Yu-Kai "Steven" Wang

ML Engineer, |

How Sampling on a Spherical (Polar) Coordinate Can be Biased

Continuing from last week’s discussion of the volume of a hypersphere, we’re going to look at how to uniformly sample points on a 3D sphere (the same idea applies to N-dimensional hyperspheres) and the common pitfall that comes with it.

Let’s say you are building an algorithm that spawns Pokémon at random locations all around the globe for the game Pokémon GO. The first thing you might do is uniformly sample locations on Earth to spawn your Pokémon. To do so, we’ll assume that the Earth is a perfect unit sphere (radius equal to 1).
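
To make the pitfall concrete, here is a minimal sketch. It assumes the "naive" approach means drawing both spherical angles uniformly, and it compares that against one common unbiased alternative: normalizing Gaussian draws onto the unit sphere. The function names and the `|z| > 0.5` check are illustrative choices, not part of the original post.

```python
import numpy as np

def sample_naive(n, rng):
    """Draw both angles uniformly. This is the biased approach."""
    theta = rng.uniform(0.0, np.pi, n)        # polar angle measured from the +z axis
    phi = rng.uniform(0.0, 2.0 * np.pi, n)    # azimuthal angle
    return np.stack([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)], axis=1)

def sample_uniform(n, rng):
    """Normalize isotropic Gaussian draws onto the unit sphere."""
    v = rng.normal(size=(n, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def polar_cap_fraction(points):
    """Fraction of points with |z| > 0.5 (the two caps near the poles)."""
    return float(np.mean(np.abs(points[:, 2]) > 0.5))

rng = np.random.default_rng(0)
naive = sample_naive(200_000, rng)
uniform = sample_uniform(200_000, rng)
print(polar_cap_fraction(naive), polar_cap_fraction(uniform))
```

The two caps with `|z| > 0.5` cover exactly half of the sphere's surface, so a uniform sampler should put about 1/2 of its points there. The naive sampler puts roughly 2/3 of its points there, which means the poles end up overcrowded.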

Volume of a Hypersphere (N-Ball) is Weird...

This week I was watching Netflix’s latest sci-fi adaptation of Three Body Problem (SPOILER ALERT). The story reveals how the San-Ti alien race built the Sophon, a supercomputer the size and mass of a single proton, to sabotage scientific advancement on Earth. While the TV series doesn’t dive too deep into the methodology behind it, the novel (strongly recommended!) describes how the San-Ti engineers were able to take advantage of the dimensions of these particles. The Sophon was originally a regular proton (11D by default) “unfolded” into a planet-sized 2D plane, which allowed the alien scientists to etch a humongous supercomputer circuit onto its surface. This planet-sized supercomputer proton was then “folded” back into its original 11-dimensional form, making it virtually impossible to detect since it has the same size and mass as any other proton. How cool is that?

Normalization Strategies: Batch vs Layer vs Instance vs Group Norm

Normalization has been a standard technique for vision-related tasks for a while, and there are dozens of different strategies out there. It can be overwhelming to try to understand each of them.

Recall that no matter what strategy we pick, the goal of normalization is to “shift” the target samples into a certain distribution. This is usually good for stabilizing training, as normalization standardizes the input.

Let’s look at how the four most widely used normalization strategies work, and why you might choose one over another. Throughout this post we will be looking at an example with an input of shape N x C x H x W, where N is the batch size, C is the number of channels, and H x W is the size of an image.
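
As a rough preview, the four strategies can be viewed as differing mainly in which axes of the N x C x H x W tensor the mean and variance are computed over. The NumPy sketch below makes that explicit. It is a simplified illustration: the function names and the `eps` value are assumptions, and the learnable scale and shift parameters and running statistics that real implementations usually include are omitted.

```python
import numpy as np

def normalize(x, axes, eps=1e-5):
    """Standardize x using the mean and variance taken over the given axes."""
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def batch_norm(x):
    # One set of statistics per channel, shared across the batch: reduce over N, H, W.
    return normalize(x, (0, 2, 3))

def layer_norm(x):
    # One set of statistics per sample: reduce over C, H, W.
    return normalize(x, (1, 2, 3))

def instance_norm(x):
    # One set of statistics per sample and channel: reduce over H, W.
    return normalize(x, (2, 3))

def group_norm(x, num_groups):
    # Split channels into groups, then reduce over each group's channels and H, W.
    n, c, h, w = x.shape
    grouped = x.reshape(n, num_groups, c // num_groups, h, w)
    return normalize(grouped, (2, 3, 4)).reshape(n, c, h, w)

x = np.random.default_rng(0).normal(size=(4, 6, 5, 5))  # N=4, C=6, H=W=5
print(batch_norm(x).shape, group_norm(x, num_groups=3).shape)
```

In this sketch, `group_norm` with a single group matches `layer_norm`, and with one group per channel it matches `instance_norm`, which is a handy way to remember how the strategies relate.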

How Multi-Channel Convolution Works

As you probably already know, a single-channel convolution works by sliding a 2D filter, usually smaller than the input matrix, across the height and width dimensions. For every sliding “window”, we then compute the weighted sum. The resulting output is a smaller 2D matrix.

simple convolution
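
The sliding-window description above can be sketched in a few lines of NumPy. This version assumes a stride of 1, no padding, and no kernel flipping (the cross-correlation convention commonly used in deep learning); the function name is illustrative.

```python
import numpy as np

def conv2d_single_channel(image, kernel):
    """Slide a 2D kernel over a 2D image and take the weighted sum at each window."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    # With stride 1 and no padding, the output shrinks by (kernel size - 1).
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(window * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)  # values 0..15 laid out row by row
kernel = np.ones((2, 2))                          # every weight is 1, so each output is a window sum
result = conv2d_single_channel(image, kernel)
print(result.shape)
```

Here a 4 x 4 input and a 2 x 2 filter give a 3 x 3 output, which matches the "smaller 2D matrix" described above.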

Most of the time, however, we are dealing with tensors that have more than one channel (a colored image for example). Things get even more complicated when we want to have a different number of input and output channels.