Machine Learning Projects for Mobile Applications

Finding features from an image

Let's take a look at an image of the letter X. When we feed a new image into the system, the CNN doesn't know where in the image a feature will match. Consequently, it tries to match the feature pattern at every possible position across the image. This is how we build a filter:

The math we apply here is called convolution. To calculate how well a feature matches a patch of the image, multiply each pixel value of the feature by the value of the corresponding pixel in the image. To arrive at one final value, add all of these products together and divide the sum by the total number of pixels in the feature:

For example, if both pixels are the same color, the product is 1: two white pixels (a value of 1) give (1) * (1) = 1, and two black pixels (a value of -1) give (-1) * (-1) = 1. If the colors differ, then (1) * (-1) = -1. As a result, every matching pixel contributes a value of 1, and every mismatch contributes a value of -1.
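The matching rule can be sketched in a few lines of Python. This is a minimal illustration rather than code from the book: the function name match_score and the sample 3 x 3 diagonal feature are assumptions made for the example, with 1 standing for a white pixel and -1 for a black one.

```python
def match_score(feature, patch):
    """Average of the pixel-by-pixel products of two equally sized grids."""
    products = [f * p
                for f_row, p_row in zip(feature, patch)
                for f, p in zip(f_row, p_row)]
    return sum(products) / len(products)


# Hypothetical 3 x 3 feature: a diagonal stroke (1 = white, -1 = black).
feature = [[ 1, -1, -1],
           [-1,  1, -1],
           [-1, -1,  1]]

# A patch identical to the feature: every pixel matches.
perfect = match_score(feature, feature)

# The photographic negative of the feature: every pixel mismatches.
inverted = [[-value for value in row] for row in feature]
negative = match_score(feature, inverted)

# A patch that differs from the feature in the bottom-right pixel only.
one_off = match_score(feature, [[ 1, -1, -1],
                                [-1,  1, -1],
                                [-1, -1, -1]])
```

A perfect match scores 1, the photographic negative scores -1, and the patch with one wrong pixel out of nine scores (8 - 1) / 9, or roughly 0.78.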

To complete the convolution process, we repeat this calculation with the feature grid placed at every possible position on the image. As shown in the following diagram, the 3 x 3 feature grid slides over the 7 x 7 image, which produces a 5 x 5 array of match scores. In the resulting grid, the values close to 1 represent strong matches, the values close to 0 represent no match, and the values close to -1 represent photographic negatives of our feature:
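The sliding step might look like the following sketch. It assumes a square image and a square feature, moves the feature one pixel at a time, and skips any position where the feature would hang over the edge, which is why a 7 x 7 image and a 3 x 3 feature give 7 - 3 + 1 = 5 positions per side. The names convolve and match_score and the sample X image are hypothetical, chosen for illustration.

```python
def match_score(feature, patch):
    """Average of the pixel-by-pixel products of two equally sized grids."""
    products = [f * p
                for f_row, p_row in zip(feature, patch)
                for f, p in zip(f_row, p_row)]
    return sum(products) / len(products)


def convolve(image, feature):
    """Score the feature against every patch of the image it fully covers."""
    k = len(feature)                 # feature side length, e.g. 3
    out_size = len(image) - k + 1    # positions per side, e.g. 7 - 3 + 1 = 5
    result = []
    for r in range(out_size):
        row = []
        for c in range(out_size):
            # Cut out the k x k patch whose top-left corner is at (r, c).
            patch = [image_row[c:c + k] for image_row in image[r:r + k]]
            row.append(match_score(feature, patch))
        result.append(row)
    return result


# Hypothetical 7 x 7 image of an X: both diagonals white (1), the rest black (-1).
image = [[1 if r == c or r + c == 6 else -1 for c in range(7)]
         for r in range(7)]

# Hypothetical 3 x 3 feature: a diagonal stroke.
feature = [[ 1, -1, -1],
           [-1,  1, -1],
           [-1, -1,  1]]

result = convolve(image, feature)
```

With this sample data, result[0][0] is 1.0 because the top-left patch is exactly the diagonal stroke, while result[2][2] is 5/9 (about 0.56) because the crossing strokes at the center of the X add two white pixels that the diagonal feature does not expect.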

In the next step, we repeat the convolution process for each of the other features. This gives a set of filtered images, one for each of our filters. In a CNN, this is known as the convolution layer, and it is followed by additional layers.
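Repeating the process for several features can be sketched as a function that returns one filtered image per feature. The three 3 x 3 features below (two diagonal strokes and a small cross) and the names convolve and convolution_layer are assumptions made for this example, not definitions from the book.

```python
def convolve(image, feature):
    """Average pixel-by-pixel product at every position the feature fits."""
    k = len(feature)
    n = len(image) - k + 1
    return [[sum(feature[a][b] * image[r + a][c + b]
                 for a in range(k) for b in range(k)) / (k * k)
             for c in range(n)]
            for r in range(n)]


def convolution_layer(image, features):
    """Apply every feature to the image: one filtered image per feature."""
    return {name: convolve(image, feature) for name, feature in features.items()}


# Hypothetical 7 x 7 image of an X (1 = white, -1 = black).
image = [[1 if r == c or r + c == 6 else -1 for c in range(7)]
         for r in range(7)]

# Hypothetical feature set for recognizing the strokes of an X.
features = {
    "diagonal":      [[ 1, -1, -1], [-1,  1, -1], [-1, -1,  1]],
    "anti_diagonal": [[-1, -1,  1], [-1,  1, -1], [ 1, -1, -1]],
    "cross":         [[ 1, -1,  1], [-1,  1, -1], [ 1, -1,  1]],
}

filtered = convolution_layer(image, features)
```

Each entry of filtered is a 5 x 5 grid. The cross feature peaks at the center of the image, and each diagonal feature peaks along its own stroke.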

This is where a CNN gets into heavy computation. This example uses a simple 7 x 7 image that produces a 5 x 5 result. However, a typical picture will be at least 128 x 128 pixels in size. The computation increases linearly with the number of pixels in the image, the number of features, and the number of pixels in each feature.
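A rough count of the multiplications shows how quickly the work grows. The sketch below assumes square images and features, a step of one pixel, and no padding at the edges. The function name multiplication_count is hypothetical.

```python
def multiplication_count(image_size, feature_size, num_features):
    """Multiplications needed to convolve every feature over the image once."""
    positions = (image_size - feature_size + 1) ** 2   # output grid cells
    per_position = feature_size ** 2                   # pixels in one feature
    return positions * per_position * num_features


small = multiplication_count(image_size=7, feature_size=3, num_features=1)
typical = multiplication_count(image_size=128, feature_size=3, num_features=1)
```

For the 7 x 7 example, one feature needs 25 x 9 = 225 multiplications. At 128 x 128 pixels, the same single feature needs 126 x 126 x 9 = 142,884, and that figure is multiplied again by the number of features in the layer.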