Tracking Objects from Video Feed Part IV: New Hope for Circular Shapes

So, turns out part III wasn’t the last of it – I present to you part IV, which I promise will be at least as good as The Kingdom of the Crystal Skull was as far as sequels after the third one go… In the previous chapter of this story, I mentioned few ideas to improve the reliability and robustness of various bits in the object tracking pipeline. I did, in fact, implement a very simple noise removal functionality, edge detection and chroma delta i.e. the difference between two frames presented as a binary image. I also made the system to be biased towards things, which are round, like balls (or circles if you will since we are dealing with 2D images – but I like the words ballzz better being an immature halfwit).

What’s with the noise!?

There are number of methods to remove noise from an image. Removing noise is essential especially in cases where you want to apply any sort of edge detection. One of the obvious choices is using a Gaussian filter and it really isn’t hard to find ready-made algorithms from the interwebs regardless of the coding language you chose (MUMPS and Brainfuck excluded). Did yours truly then incorporate a Gaussian filter into the project? You bet your ass he did not. Instead, I opted for, although quite similar coding-wise, a nearest neighbor smoother: For every pixel in the image calculate the average of all eight neighbors of said pixel and apply that value to the pixel in question. Super simple! And effective! However, even though this method is slightly quicker than using e.g. 5×5 Gaussian filter, it still takes its toll: I averaged approximately 350 milliseconds per 720p frame on Lumia 930[1].

Software developer enchanted by a lantern
Edges extracted from the original image with noise (center) and with noise removed (right). Hint: Click the image to enlarge.

On the edge

Canny edge detector is a pretty good choice for your edge detection needs. It’s robust, works well and has a funky name. Did I decide to go with a different solution? You betcha! Why? Well, because I’m lazy, and settled for lesser result. My implementation also performs better i.e. less milliseconds spent per frame. What I do is as follows:

  • For every pixel starting from the top left corner handling a horizontal line at a time:
    1. Calculate the difference of the current pixel to the one on the right (if one exists) – we could call this difference “gradient”
    2. Calculate the difference of the current pixel to the one below (if one exists) – let’s go crazy and call this a “gradient” too
    3. If the sum of the gradients exceeds a set threshold value, mark this pixel as part of an edge. If not, it’s not an edge.

“So, what about the pixels on the left and above?” one might ask. “I don’t give a ….” is my answer. So far my solution is sufficient for my needs and that’s all I really care about.

Chroma delta

As described in the previous part, chroma delta takes two frames and calculates their difference and presents it as a binary image. Each pixel has components with three values, whether it’s RGB or YUV, and calculating the overall difference of each of these values for each corresponding pixel in both frames and using a threshold value, we get a binary value for the corresponding pixel in the delta frame. In my case I utilize the Y-plane and set the value 0xff (255) for the pixel, which has changed significantly, or 0x0 (0) for the ones that remain fairly unchanged.

Chroma delta gone bananas
How to make a code monkey mad? Touching his/her banana, that’s how!

So to make it clear: The right-most image displays the change between the two others. See how even the noticeable change in the lighting condition still yields a result. If we know that the banana was in the place depicted in the left image, by looking at the delta image, we can deduce that it’s no longer in that position and by looking at the other shape it’s quite obvious that it moved to the right.

Them balls

When the application is in charge of choosing the object to track, it goes without saying that you have to device some sort of traits for the desired object. My life experience has thought me that round things are more likely to move than rectangular things. I suppose the first guy who realized this invented the wheel. Thus, I’m just as clever. But how do you programmatically evaluate whether a thing is round or not? Especially when despite of your more or less advanced image processing methods do not fully extract the shape of an object?

“Just call me Darth Balls… Bong.” – Jay (from Jay and Silent Bob Strike Back, 2001)

I came to a conclusion that I would have to find a way to calculate how well my neat convex hulls would match a circle. Again, I looked for an answer in the interwebs and discovered that the problem I needed the solution for is called Smallest-circle problem and luckily a sharp, Austrian dude, Emo Welzl, had already proposed a recursive algorithm, which not only solves the problem but does it in mere O(n) time. However – being a lucky bastard – since I already had my convex hulls, I could use a more straightforward solution to create my minimal enclosing circles – here’s how:

  • Find the two vertices, in the convex hull, furthest apart. Their distance is the diameter of your enclosing circle.
  • The center point of the line segment between those two vertices is also the center point of the circle (see the image below).
Minimal Enclosing Circle
Minimal enclosing circle based on the convex hull reconstructs the shape of the ball.

Now that you have the circle, you can compare the traits of the convex hull to see how well they align with the minimal enclosing circle. I tried calculating the difference of the vertices of the convex hull to the circumference of the enclosing circle and it worked out pretty well. I’m sure there are other and even better ways to evaluate the roundness (or “eccentricity” as intelligent people would say). Do also keep in mind that you need to normalize the error based on the object’s size unless you want to be biased towards smaller objects.

Teh codez

Here is the code corresponding to the scribbles in this blog post:

Note that, as of writing this, the pipeline of the Object tracking demo consists of something old, something new, something blue i.e. it’s not really functioning properly.

[1] Lumia 930 is a mobile phone designed and manufactured by a Finnish company, Nokia, which sold its mobile phone division to Microsoft on April 2014 when it decided to focus on Jedi lightsaber-combat-practice-laser-shooting-sphere-things instead.