In the previous part we provided one solution for detecting and identifying a stationary object of a certain shape in a video feed. In this part we focus on tracking the object and try to analyze a simple path of a moving object. By simple, I mean *really* simple: we try to detect the “from” and “to” positions of the object, that is, where it started and where it ended up.
When milliseconds count
Compared to detecting objects in a static image or frame, detecting object displacement presents us with a new, tough requirement: we have to analyze the frames in real time, so performance is key. We cannot simply use all the methods described earlier since, especially on mobile devices, they take too much time to compute. Ideally, depending on the frame rate and the estimated speed of the moving object relative to our field of view (FoV), our operation for tracking the object should take less than 10 milliseconds per frame. The cost of any algorithm we use grows with the frame size: the fewer pixels we have to analyze, the faster the operation.
Instead of using all the methods described earlier (chroma filter, object mapping, convex hull etc.) to track the object, we utilize them to “lock” the target object. In other words, we first identify the object we want to track, and after that we can use far lighter methods to track its position. We don’t have to process the full frame, only the area of the object with some margin. This reduces the number of pixels we have to process and lets our operations run much quicker.
Since our target object can be expected not to change color (unless we’re tracking a chameleon), we can do the following:
- Once we have detected the object in the image/frames and we know its position and size (the number of pixels horizontally and vertically where the object is thickest), we can define a rectangular cropped area with the object in the center and with a margin of e.g. 15 %.
- Apply the chroma filter to this cropped area for each frame and keep track of the position, which is defined by the intersection of virtual lines placed where we have the most pixels horizontally and vertically. Figure 9 illustrates tracking the locked target object.
- If the center point displacement exceeds our predefined delta value, we move to the next phase, where we analyze the object movement.
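The three steps above can be sketched roughly as follows. This is a minimal illustration and not the code from the project: the function names, the `(Y, U, V)` tuple layout, the per-channel chroma test and the interpretation of the margin (15 % of the object size added on each side) are all assumptions made for the example.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]      # assumed layout: (Y, U, V), each 0..255
Frame = List[List[Pixel]]         # frame[row][column]
Rect = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def crop_rect(center, size, frame_size, margin=0.15) -> Rect:
    """Rectangle centered on the object, enlarged by `margin` on every side
    (assumed meaning of the margin) and clamped to the frame."""
    cx, cy = center
    width, height = size
    frame_w, frame_h = frame_size
    half_w = width * (0.5 + margin)
    half_h = height * (0.5 + margin)
    left = max(0, int(cx - half_w))
    top = max(0, int(cy - half_h))
    right = min(frame_w, int(cx + half_w) + 1)
    bottom = min(frame_h, int(cy + half_h) + 1)
    return left, top, right, bottom


def matches_chroma(pixel: Pixel, target_uv, threshold: int) -> bool:
    """Simplified stand-in for the chroma filter: compare U and V only."""
    _, u, v = pixel
    return abs(u - target_uv[0]) <= threshold and abs(v - target_uv[1]) <= threshold


def densest_index(counts: List[int]) -> int:
    """Index with the highest count; the middle one if several are tied."""
    peak = max(counts)
    tied = [i for i, count in enumerate(counts) if count == peak]
    return tied[len(tied) // 2]


def locate_in_crop(frame: Frame, rect: Rect, target_uv, threshold: int) -> Optional[Tuple[int, int]]:
    """Intersection of the densest column and densest row inside the crop,
    or None if no pixel in the crop passes the filter."""
    left, top, right, bottom = rect
    row_counts = [0] * (bottom - top)
    col_counts = [0] * (right - left)
    for y in range(top, bottom):
        for x in range(left, right):
            if matches_chroma(frame[y][x], target_uv, threshold):
                row_counts[y - top] += 1
                col_counts[x - left] += 1
    if not row_counts or max(row_counts) == 0:
        return None
    return left + densest_index(col_counts), top + densest_index(row_counts)


def has_moved(locked, current, delta: float) -> bool:
    """True when the center point moved farther than `delta` pixels."""
    dx = current[0] - locked[0]
    dy = current[1] - locked[1]
    return dx * dx + dy * dy > delta * delta
```

Only the pixels inside the cropped rectangle are visited for each frame, which is where the speed-up compared to processing the full frame comes from.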
It moved, but where did it go?
How do we implement the next phase then? It seems that for a more accurate analysis of the object movement, we must use more complex methods than the ones we used for detecting the initial displacement of the object. What if we record the frames for later analysis? Since we cannot know or forecast when the object is going to move, the video we record might be huge, depending on the frame size! Fortunately, there is a way to store the frames while still keeping the required size fixed: a ring buffer (also known as a circular buffer). In short, a ring buffer is a fixed-size buffer: when you reach the end, you start again from the beginning and replace the frames recorded earlier. See this article about buffering video frames by Juhana Koski to learn more. Because we observe the initial displacement of the object in real time, we can record a few more frames (for the estimated time until the object exits our FoV) and then stop. After this we no longer have the real-time requirement and we can take our time analyzing what happened to the object after its initial displacement.
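To make the ring buffer idea concrete, here is a minimal sketch. The class and method names are made up for this example and the frames are treated as opaque values; the buffer in the actual project may well look different.

```python
class FrameRingBuffer:
    """Fixed-size buffer that overwrites the oldest frame once it is full."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._slots = [None] * capacity
        self._next = 0    # slot the next frame will be written to
        self._count = 0   # number of frames currently stored

    def push(self, frame) -> None:
        """Store a frame, replacing the oldest one when the buffer is full."""
        self._slots[self._next] = frame
        self._next = (self._next + 1) % len(self._slots)
        self._count = min(self._count + 1, len(self._slots))

    def frames_oldest_first(self) -> list:
        """Stored frames in recording order, oldest first."""
        capacity = len(self._slots)
        start = (self._next - self._count) % capacity
        return [self._slots[(start + i) % capacity] for i in range(self._count)]


buffer = FrameRingBuffer(capacity=3)
for frame_number in range(1, 6):   # "record" frames 1..5
    buffer.push(frame_number)
print(buffer.frames_oldest_first())  # [3, 4, 5]
```

The memory use stays constant no matter how long we wait for the object to move. In Python, `collections.deque(maxlen=n)` gives the same overwrite-the-oldest behavior out of the box.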
Let’s say that we want to get the last frame of the object until it leaves the FoV. We could use the following algorithm:
- Start iterating from the last recorded frame towards the frame of the initial displacement:
  - Treat each frame as we did in the beginning, when we found the desired object in the image using the chroma filter, object map, convex hull and shape analysis.
  - If we find an object satisfying our criteria, we stop and expect it to be the object we were tracking.
- We now have the object position from the beginning of its movement to the last known position in our FoV (see figure 10). This means we can at least calculate the angle and relative velocity of the object.
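The backward search and the final calculation could look something like the sketch below. The `detect` callback stands in for the full detection pipeline (chroma filter, object map, convex hull and shape analysis) and is assumed to return the object position or `None`; the function names, the pixels-per-second unit and the angle convention are assumptions made for this example.

```python
import math
from typing import Callable, Optional, Sequence, Tuple

Point = Tuple[float, float]


def last_frame_with_object(
    frames: Sequence,
    displacement_index: int,
    detect: Callable[[object], Optional[Point]],
) -> Optional[Tuple[int, Point]]:
    """Walk backwards from the newest frame towards the displacement frame
    and return (index, position) of the first frame where `detect` succeeds."""
    for index in range(len(frames) - 1, displacement_index, -1):
        position = detect(frames[index])
        if position is not None:
            return index, position
    return None


def angle_and_speed(start: Point, end: Point, frames_elapsed: int, fps: float) -> Tuple[float, float]:
    """Direction in degrees (0 = right, 90 = up on screen) and speed in
    pixels per second between two positions."""
    if frames_elapsed <= 0:
        raise ValueError("frames_elapsed must be positive")
    dx = end[0] - start[0]
    dy = end[1] - start[1]
    # Image coordinates grow downwards, so flip dy to get a conventional angle.
    angle = math.degrees(math.atan2(-dy, dx))
    speed = math.hypot(dx, dy) / (frames_elapsed / fps)
    return angle, speed
```

Because the search starts from the newest frame, the expensive detection usually runs only on the handful of trailing frames where the object has already left the FoV, plus the one frame where it is found.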
Challenges and future development
Lighting challenges are typical with image pattern recognition solutions. Changes in lighting conditions affect the perceived color, and that makes the selection of parameters (YUV value and threshold) for chroma filtering difficult. Camera hardware and its settings play a significant role here: the longer the exposure time, the easier it is to detect the object properly. However, with a long exposure time, it’s harder to capture the object movement. The object in motion will have a distorted shape and its color will blend with the background. Thus, it becomes more difficult to find the object in the frames when it’s moving. On the other hand, if we use a short exposure time, we get less light per frame and the color difference between the object and the background might be insufficient.
The current implementation of the solution relies on manual parameter setting for both color and threshold. In the future, we could try to at least partially automate the parameter setting. We would still have to roughly know the shape and size of the object we want to find. We could apply edge detection algorithms to boost the color filter and get more accurate results with stationary objects. Of course, when an object is moving fast, the edges may blur. However, since the current implementation provides us with the frame of the initial object displacement, we can compare it to the later frames and see the changes in e.g. chroma. The moving object will leave a trace even if it’s blurred with the background or distorted.
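As a rough idea of what such a comparison might look like, the sketch below marks the pixels whose chroma changed between the displacement frame and a later frame. This is not part of the current implementation; the function names, the `(Y, U, V)` tuple layout and the way the difference is measured (sum of absolute U and V differences) are assumptions for illustration only.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # assumed layout: (Y, U, V)
Frame = List[List[Pixel]]      # frame[row][column]


def chroma_change_mask(reference: Frame, later: Frame, threshold: int) -> List[List[bool]]:
    """True for every pixel whose U/V values differ by more than `threshold`
    between the two frames; luma (Y) is ignored on purpose."""
    return [
        [
            abs(ref[1] - cur[1]) + abs(ref[2] - cur[2]) > threshold
            for ref, cur in zip(ref_row, cur_row)
        ]
        for ref_row, cur_row in zip(reference, later)
    ]


def changed_pixel_count(mask: List[List[bool]]) -> int:
    """Number of pixels flagged as changed."""
    return sum(sum(row) for row in mask)
```

Ignoring the luma channel makes the comparison a little less sensitive to brightness changes, although a lighting change that shifts the perceived color would still show up in the mask.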
And then there was the code…
The related code project is hosted on GitHub: https://github.com/tompaana/object-tracking-demo
See the README.md file delivered with the project to learn more. The project is freely licensed, so you can utilize any bits of the code any way you like. Have fun!
EDIT: Turns out that’s not all folks. See how everything turns out here.