In the first part we introduced the raw image data formats, which we receive from camera. Now it’s time to get to the good stuff. First we formulate our goals and start with trying to extract an object from video feed. In order to track something we first need to find the “something”.
Think before you act?
When solving a complex problem, it may be sometimes useful to start from the desired result and work your way from the result to the beginning in the problem solving pipeline. This way we can break the problem down to several resolutions – steps in between leading to the final solution if you like. In case of our problem, this reversed pipeline would look something like in the table below.
Table 1. Problem solving pipeline reversed.
|What is the result?||How do we get there?|
|An object, with a specific shape, identified in the image (location, size, shape etc.)||Find the center of mass of the two dimensional object and its outline.|
|Outline of the detected object.||Apply convex hull algorithm to object map (extended binary image, where the background is removed).|
|Object map, where all significant (suspected) objects are separated and the background is removed.||Individualize objects from a binary image.|
|Binary image in which objects have value 1 and background value 0.||Apply algorithm to extract objects with some criteria from the background. E.g. chroma filter.|
|Chroma filter implementation to extract objects from image.||Start coding!|
Here the word “chroma” is a bit misleading, since we also inspect luma (Y) value when filtering the image. The algorithm is as follows:
- Set a desired YUV value as a target. This is basically the color we want to find from the picture i.e. if a ball is blue and the background is orange, we try to set the value as close to the blue color of the ball as possible. We also need to set a threshold value, which defines the allowed difference between the target value and the blue color we accept.
- Iterate the image data (byte array), pixel by pixel or block by block. By block we mean the unit shown in figures 1 and 2. In the case of NV12, there are four Y values, one U and one V. We could calculate the average of the four Y values, but since the values are likely to be almost the same, for the sake of optimization it’s enough to choose just one. Then we simply compare the set target values to the measured ones, as pseudo code:
IF Difference(target_value_Y, measured_Y) < threshold AND
Difference(target_value_U, measured_U) < threshold AND
Difference(target_value_V, measured_V) < threshold
(Mark this pixel/block as selected)
(Mark this pixel/block as not selected)
Figure 4. The original image on left and the image with chroma filter (red) applied on right.
In the case of NV12 we can utilize the Y plane (since it matches the full size of the image) and convert it into a virtual binary image: Y value 0 indicates 0 and Y value 255 indicates 1 (as done on the right-hand-side image in figure 4). We can then use it to map objects as explained in the next chapter.
In our code project we have a native effect, which executes the aforementioned chroma filter for a single frame: ChromaFilterEffect.
The simplest source for mapping objects is a binary image e.g. a bit array where 0 denotes background and 1 an object. What we need to do is to identify objects that are not joined (their pixels don’t touch each other). We can do this by assigning each object a unique number. The end result can be, for instance, an array of integers, where 0 still denotes background, but a value greater than 0 is an ID of a certain object. Figure 5 illustrates this process from the binary image to resulting object map.
The principle of the algorithm for mapping objects is very simple: Check each pixel of the binary image and assign a unique value to those pixels which are adjacent. If we break this down into more detail, we can have something like this, when the image is in form of an array:
- Create an object ID counter (let’s call the counter c), which provides a unique value for the pixels of each object. You can start with value c = 1.
- Go through each pixel starting from 0 to N – 1, where N is the number of pixels:
- If the pixel has value 0 (background):
- Do nothing.
- If the pixel has value 1 (an object):
- If the previous pixel had value 0:
- Set the object ID counter value so that the new value is guaranteed to be unique (let’s call this unique ID U), c = U, since this can be a new object. It is important that the value is truly unique – mere increment by one is not enough.
- If there is an adjacent non-zero pixel above (see figure 6):
- Get the object ID of the adjacent pixel (let’s call this value as A).
- Backtrack (with a separate iteration) and replace all pixels, which have the current value of counter c with value
- You can stop backtracking, if you encounter a line with no pixels having the current value of the object ID counter.
- Set the value of the object ID counter to the value of the adjacent pixel (c = A) and continue the original iteration from the index before backtracking.
- Set the value of the current pixel to match the current value of the object ID counter (value of c).
- If the previous pixel had value 0:
- If the pixel has value 0 (background):
See ImageProcessingUtils::createObjectMap method where the aforementioned algorithm is implemented with C++.
Note that after applying the algorithm described above, you will not end up with an ordered object map (map with ordered IDs: 1, 2, …, N). Instead, you will have a map where each object has a unique ID, but they can be arbitrary, e.g. 2, 5, …, N + 100. Sorting the map is quite trivial: Get a list of current IDs (one iteration), create a map where each existing ID has an ordered counterpart (2 -> 1, 5 -> 2, …, N + 100 -> N) and replace the values in the object map (second iteration).
Identifying object shapes
Now that we have an object map, we can easily see the size of each object. But what about the shape? It depends on the way we want to classify shapes e.g. are we interested in round shapes or squares. In addition, depending on our object extraction methods, the objects may not be perfectly extracted; some of what was part of the object, could have been interpreted as background. For instance, in figure 6, the largest object could have been a ball with a large stripe on it that was lost when our chroma filter was applied. Luckily, there is a simple algorithm for “fixing” the shape, called convex hull.
Convex hull can be considered as an elastic band wrapped around an object. If we apply the algorithm to the first object in the object map we created earlier, the stripe in the center is discarded and a circle shape emerges:
There are number of different convex hull algorithms. The one used in this solution is called monotone chain (see ImageProcessingUtils::createConvexHull method). For more information about monotone chain including code samples can be found from Wikipedia: http://en.wikibooks.org/wiki/Algorithm_Implementation/Geometry/Convex_hull/Monotone_chain
Convex hull gives us a list of points (pixels coordinates), which form the outline of an object. However, we still have to determine the shape of the object somehow. If it’s a ball shape we’re after, we could find the center of the object – e.g. center of mass where one pixel equals one mass unit – and then see how much the outline differs from a circle, where the radius is width or the height of the object divided by 2. The less the difference, the more the shape is like a circle, which of course is the shape of a ball in two dimensional presentation.
Now that we have found our object let’s try to track it. See the next and
the final partof this thrilling trilogy and find out if we succeeded in our goal to detect the changes in object position!