This week I was watching Netflix’s latest sci-fi adaptation, 3 Body Problem (SPOILER ALERT). The story reveals how the San-Ti alien race built the Sophon, a supercomputer with the size and mass of a single proton, to sabotage scientific advancement on Earth. While the TV series doesn’t dive too deep into the methodology, the novel (strongly recommended!) describes how San-Ti engineers exploited the higher dimensions of the particle, unfolding the proton into a lower-dimensional surface so circuitry could be etched onto it.
Normalization has long been a standard technique for vision tasks, and there are dozens of different strategies out there. Trying to understand each of them can be overwhelming.
Recall that no matter which strategy we pick, the goal of normalization is to “shift” the target samples into a known distribution, typically one with zero mean and unit variance. This usually stabilizes training, since every layer sees standardized inputs.
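As a concrete sketch, here is the core computation that all of these strategies share; the only real difference between them is which axes the mean and variance are computed over. The names `normalize`, `gamma`, `beta`, and `eps` below are my own, chosen for illustration:

```python
import numpy as np

def normalize(x, axes, gamma=1.0, beta=0.0, eps=1e-5):
    """Shift x toward zero mean and unit variance along `axes`,
    then apply a learnable scale (gamma) and shift (beta)."""
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)  # standardize
    return gamma * x_hat + beta              # re-scale and re-shift

# Example: a batch of 8 feature maps, shape (N, C, H, W)
x = np.random.randn(8, 3, 32, 32)
out = normalize(x, axes=(0, 2, 3))  # batch-norm-style axes
```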
Let’s look at how the four most widely used normalization strategies work, and why you might choose one over another.
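Assuming the four in question are batch, layer, instance, and group normalization (the most common picks in vision models), they all ship with PyTorch, and none of them changes the shape of the input:

```python
import torch
import torch.nn as nn

x = torch.randn(8, 64, 32, 32)  # (N, C, H, W) feature maps

bn = nn.BatchNorm2d(num_features=64)              # stats over (N, H, W), per channel
ln = nn.LayerNorm([64, 32, 32])                   # stats over (C, H, W), per sample
inorm = nn.InstanceNorm2d(num_features=64)        # stats over (H, W), per sample and channel
gn = nn.GroupNorm(num_groups=8, num_channels=64)  # stats over (H, W) within channel groups

for layer in (bn, ln, inorm, gn):
    assert layer(x).shape == x.shape  # normalization preserves the tensor shape
```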
As you probably already know, a single-channel convolution works by sliding a 2D filter, usually smaller than the input matrix, across the height and width dimensions. At every position of the sliding “window”, we compute the weighted sum of the overlapped values. Without padding, the resulting output is a smaller 2D matrix.
(Figure: simple convolution)
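To make the sliding-window idea concrete, here is a minimal single-channel convolution in NumPy (no padding, stride 1); the function name `conv2d_single` is mine:

```python
import numpy as np

def conv2d_single(x, w):
    """Slide a 2D filter w over a 2D input x (no padding, stride 1).
    Each output element is the weighted sum of one window."""
    H, W = x.shape
    kH, kW = w.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kH, j:j + kW] * w)
    return out

x = np.arange(25, dtype=float).reshape(5, 5)
w = np.ones((3, 3)) / 9.0         # a simple averaging filter
print(conv2d_single(x, w).shape)  # (3, 3): smaller than the 5x5 input
```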
Most of the time, however, we are dealing with tensors that have more than one channel (a colored image, for example). Things get more complicated still when the number of input channels and output channels differ.
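In that multi-channel case, each output channel gets its own stack of filters, one 2D filter per input channel, and the per-channel results are summed. A short PyTorch sketch, with channel counts chosen purely for illustration:

```python
import torch
import torch.nn as nn

# 3 input channels (RGB) -> 16 output channels, 3x3 filters, no padding
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)

x = torch.randn(1, 3, 32, 32)  # one RGB image
y = conv(x)

print(conv.weight.shape)  # (16, 3, 3, 3): one 3x3 filter per (output, input) channel pair
print(y.shape)            # (1, 16, 30, 30): 16 channels, spatially smaller without padding
```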