I have been lucky: as a perk of my job, I have had the pleasure of following and working with many exciting and innovative start-ups. It gives me twice the joy when a start-up focuses on making the world a better place, which I am happy to say happens quite often! One of those start-ups is Neuro Event Labs. Their goal is to make the treatment of epilepsy and other neurological diseases easier and more affordable. Needless to say, their work has many moving parts, but one of the technologies they use is depth cameras. In case you’ve never heard of depth cameras: in addition to Microsoft Kinect, Intel has created a wide range of RealSense devices that are unlikely to break the bank should you want to get one to play with. Depth cameras basically just measure the distance to what they see; the pixel values define distance, not color. While Neuro Event Labs solves a bunch of complex problems, I got interested in one specific ‘bit’ of a problem and wanted to write my two cents on the matter…
Here’s the general problem statement: How to find the 3-dimensional region of interest (ROI) of cyclic motion (e.g. an object following the same path repetitively) from a set of (depth) frames?
Why cyclic motion? If the object follows a random path and/or even goes outside of the frame, what’s the point, really?
As Simple as Subtraction
In one of my ‘Tracking Objects’ blog posts I wrote about calculating the delta between two frames. Unlike color images, where a single pixel can have, and normally does have, multiple values (e.g. R, G and B), depth frames make calculating the delta easier, since one pixel corresponds to a single value. Thus, calculating the difference is but a matrix subtraction!
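To make the subtraction concrete, here is a minimal sketch. The DepthFrame struct and the subtractFrames function are hypothetical names made up for this example (they are not from any camera SDK), and I assume each pixel is a 16-bit distance stored row by row.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical depth frame: one distance value per pixel, stored row by row.
struct DepthFrame {
    std::size_t width;
    std::size_t height;
    std::vector<std::uint16_t> pixels;
};

// Signed per-pixel difference (second - first).
// A negative value means the surface at that pixel got closer to the camera.
std::vector<int> subtractFrames(const DepthFrame& first, const DepthFrame& second) {
    assert(first.width == second.width && first.height == second.height);
    assert(first.pixels.size() == first.width * first.height);
    assert(second.pixels.size() == first.pixels.size());

    std::vector<int> delta(first.pixels.size());
    for (std::size_t i = 0; i < delta.size(); ++i) {
        // Cast before subtracting so that the result can be negative.
        delta[i] = static_cast<int>(second.pixels[i]) - static_cast<int>(first.pixels[i]);
    }
    return delta;
}
```

The result is kept signed on purpose: the sign tells whether something moved closer or farther, which a plain absolute difference would throw away.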
As seen in the example above, in an ideal case (no noise) only the pixels that indicate change (motion) have a nonzero value. Of course, in real life you tend to have some level of noise. Perhaps the most intuitive way to reduce the effect of noise is to use a threshold value. Let us call that value T. By observation it is not difficult to find a good threshold value, but you can also calculate an approximate T as long as you have frames that you know for certain do not contain any motion. The latter is helpful when the input device or the environment varies. Needless to say, noise in depth frames goes both ways (closer and farther), so T is an absolute value (|t|): a difference counts as motion only when its magnitude exceeds T.
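One possible way to calculate T from motion-free frames is sketched below. Taking the largest absolute difference between consecutive still frames is my own simplification for this example; the function names estimateThreshold and applyThreshold are made up, and in practice you might want to add a safety margin on top of the estimate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

using Frame = std::vector<int>;  // one depth value per pixel

// Estimates T as the largest absolute per-pixel difference seen between
// consecutive frames that are known to contain no motion (assumption:
// everything that changes in such frames is noise).
int estimateThreshold(const std::vector<Frame>& stillFrames) {
    int t = 0;
    for (std::size_t f = 1; f < stillFrames.size(); ++f) {
        assert(stillFrames[f].size() == stillFrames[f - 1].size());
        for (std::size_t i = 0; i < stillFrames[f].size(); ++i) {
            const int diff = std::abs(stillFrames[f][i] - stillFrames[f - 1][i]);
            if (diff > t) {
                t = diff;
            }
        }
    }
    return t;
}

// Zeroes every delta value whose magnitude does not exceed T.
void applyThreshold(Frame& delta, int t) {
    for (int& value : delta) {
        if (std::abs(value) <= t) {
            value = 0;
        }
    }
}
```

After applyThreshold, the nonzero values that remain are the ones we treat as motion, regardless of their sign.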
If the contours in the frame are fairly smooth (gradients are gentle), a more sound way to deal with noise is to use a convolutional method, where the evaluation of a pixel depends on the pixels surrounding it. This post is not about noise reduction, but if you want to know more, there’s no shortage of information on the internet; start by checking the Wikipedia page on the subject.
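Just to give an idea of what “depends on the pixels surrounding it” can mean, here is a sketch of about the simplest convolutional filter there is: a 3x3 box (mean) filter. This is an illustration I picked for its simplicity, not a recommendation; the function name is made up, and the pixels at the borders are averaged over the neighbors that actually exist.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Replaces every pixel with the (integer) mean of itself and its neighbors
// in a 3x3 window. Border pixels use only the neighbors inside the frame.
std::vector<int> boxFilter3x3(const std::vector<int>& frame,
                              std::size_t width, std::size_t height) {
    assert(frame.size() == width * height);
    std::vector<int> result(frame.size());

    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            long sum = 0;
            int count = 0;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    const long ny = static_cast<long>(y) + dy;
                    const long nx = static_cast<long>(x) + dx;
                    if (ny < 0 || nx < 0 ||
                        ny >= static_cast<long>(height) ||
                        nx >= static_cast<long>(width)) {
                        continue;  // neighbor falls outside the frame
                    }
                    sum += frame[static_cast<std::size_t>(ny) * width +
                                 static_cast<std::size_t>(nx)];
                    ++count;
                }
            }
            result[y * width + x] = static_cast<int>(sum / count);
        }
    }
    return result;
}
```

A single noisy pixel gets spread thin over its neighborhood, while a flat surface stays exactly as it was. The price is that sharp edges get blurred too, which is why this suits frames with gentle gradients.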
At this step we could do more than just settle for the subtracted matrix; we could, for example, replace the nonzero values with the actual distance of the object from the camera. We have the necessary data in frames 1 and 2. The combination of all the delta frames would then trace out the path of the moving object, and the resulting matrix would already give us the 2-dimensional ROI! But let’s not stop here…
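Here is a rough sketch of that idea. It rests on two assumptions of mine: the moving object is closer to the camera than whatever is behind it (so the smaller of the two depth values at a changed pixel belongs to the object), and 0 is never a valid distance, so it can stand for “no motion seen”. The names Roi2D, accumulateMotion and findRoi are made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical 2D region of interest in pixel coordinates (inclusive bounds).
struct Roi2D {
    bool empty = true;
    std::size_t xMin = 0, xMax = 0, yMin = 0, yMax = 0;
};

// Adds the motion between two frames to 'path'. Where the frames differ by
// more than 'threshold', the smaller depth (assumed to be the object) is
// stored. A value of 0 in 'path' means no motion was seen at that pixel.
void accumulateMotion(const std::vector<int>& first, const std::vector<int>& second,
                      int threshold, std::vector<int>& path) {
    assert(first.size() == second.size() && first.size() == path.size());
    for (std::size_t i = 0; i < first.size(); ++i) {
        if (std::abs(second[i] - first[i]) <= threshold) {
            continue;  // treated as noise
        }
        const int distance = std::min(first[i], second[i]);
        path[i] = (path[i] == 0) ? distance : std::min(path[i], distance);
    }
}

// The 2D ROI is the smallest rectangle containing every nonzero pixel.
Roi2D findRoi(const std::vector<int>& path, std::size_t width, std::size_t height) {
    assert(path.size() == width * height);
    Roi2D roi;
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            if (path[y * width + x] == 0) {
                continue;
            }
            if (roi.empty) {
                roi.empty = false;
                roi.xMin = roi.xMax = x;
                roi.yMin = roi.yMax = y;
            } else {
                roi.xMin = std::min(roi.xMin, x);
                roi.xMax = std::max(roi.xMax, x);
                roi.yMin = std::min(roi.yMin, y);
                roi.yMax = std::max(roi.yMax, y);
            }
        }
    }
    return roi;
}
```

You would start with a path matrix full of zeros, call accumulateMotion once for each pair of consecutive frames, and call findRoi at the end.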
Cloudy with a Chance of Points
If our object is strangely oriented and its path of motion extends along all axes, we may want to retain some of that depth information for more precise analysis. Should our delta matrix then be 3-dimensional? Well, why not! One way to represent a 3D matrix in compact form (i.e. with fewer bytes when the matrix contains a lot of zeros) is to use a point cloud.
When I first heard my colleague, Sachin, who specializes in machine learning (among other things), use the term “point cloud” I had to ask what it meant. And I felt stupid after getting the answer. Yet, I think “point set” would be better (because “set of points” is too long, obviously). At the time of writing this, the first sentence in Wikipedia defines a point cloud as follows: ‘A point cloud is a set of data points in some coordinate system’. Simple as that. The definition does not say whether the points are ordered or not etc. I guess the cloud bit gives one the impression that the points do not have to obey any specific shape as in being bounded by something.
But to the point (pun intended)! In the following figure we have the single delta frame (from figure 1) represented as a point cloud (the blue dots in figure 2 match the red pixels in figure 1):
Figure 2. Point cloud constructed based on the calculated delta frame in figure 1. Created with Plotly.
Now, if we had replaced the points with the actual depth values (z axis) of the object before populating the point cloud, and had fed in enough frames to cover the entire motion cycle, we would have ended up with the 3D space occupied by the ‘shell’ of the object! Neat, huh? Furthermore, this allows us to construct a 3-dimensional ROI. Sorry for not having another image to point this out, but I hope you can form the mental image.
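In lieu of an image, here is a small sketch of populating such a cloud. As before, I assume that the moving object is closer to the camera than the background, so the smaller depth at a changed pixel is used as z. Point3D and addFramePairToCloud are made-up names and not the API of the implementation linked further down.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

struct Point3D {
    int x;  // pixel column
    int y;  // pixel row
    int z;  // depth value at that pixel
};

// Appends one point for every pixel that changed by more than 'threshold'
// between two consecutive frames. Note that the same point can be added
// more than once over several frame pairs; duplicates are not removed here.
void addFramePairToCloud(const std::vector<int>& first, const std::vector<int>& second,
                         std::size_t width, std::size_t height, int threshold,
                         std::vector<Point3D>& cloud) {
    assert(first.size() == width * height);
    assert(second.size() == first.size());
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            const std::size_t i = y * width + x;
            if (std::abs(second[i] - first[i]) <= threshold) {
                continue;  // no motion at this pixel
            }
            // Assumption: the smaller depth belongs to the moving object.
            const int z = std::min(first[i], second[i]);
            cloud.push_back(Point3D{static_cast<int>(x), static_cast<int>(y), z});
        }
    }
}
```

Only the pixels with motion end up in the cloud, which is exactly what makes it more compact than a full 3D matrix when most of the space is empty.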
If you’re lucky or can control the environment so that the object’s path of motion is parallel to one of the axes, you can construct the 3D ROI by simply creating a bounding box around the points in your cloud using xmin, xmax, ymin, ymax, zmin and zmax. However, if the orientation is not under your control, you can still do this after applying a linear transformation (rotation) first. The rotation to pick is the one that yields the bounding box with the smallest volume.
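Both cases can be sketched in a few lines. To keep the example short I made a big simplification: the search only tries rotations around the z axis, in whole degrees, by brute force. A real solution would have to consider all three axes and would probably use something smarter than trying every angle. All the names below are made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

struct Point3D {
    double x, y, z;
};

// Axis-aligned bounding box defined by its two extreme corners.
struct Box3D {
    Point3D min;
    Point3D max;
    double volume() const {
        return (max.x - min.x) * (max.y - min.y) * (max.z - min.z);
    }
};

// The box spanned by xmin, xmax, ymin, ymax, zmin and zmax of the cloud.
Box3D boundingBox(const std::vector<Point3D>& cloud) {
    assert(!cloud.empty());
    Box3D box{cloud.front(), cloud.front()};
    for (const Point3D& p : cloud) {
        box.min.x = std::min(box.min.x, p.x);
        box.min.y = std::min(box.min.y, p.y);
        box.min.z = std::min(box.min.z, p.z);
        box.max.x = std::max(box.max.x, p.x);
        box.max.y = std::max(box.max.y, p.y);
        box.max.z = std::max(box.max.z, p.z);
    }
    return box;
}

// Rotates every point counterclockwise around the z axis.
std::vector<Point3D> rotateAroundZ(const std::vector<Point3D>& cloud, double radians) {
    const double c = std::cos(radians);
    const double s = std::sin(radians);
    std::vector<Point3D> rotated;
    rotated.reserve(cloud.size());
    for (const Point3D& p : cloud) {
        rotated.push_back(Point3D{c * p.x - s * p.y, s * p.x + c * p.y, p.z});
    }
    return rotated;
}

// Tries rotations of 0..89 degrees around z and returns the angle whose
// bounding box has the smallest volume. Going beyond 89 is pointless,
// because a further 90 degree turn only swaps the x and y extents.
int bestRotationDegrees(const std::vector<Point3D>& cloud) {
    const double pi = std::acos(-1.0);
    int best = 0;
    double bestVolume = std::numeric_limits<double>::max();
    for (int degrees = 0; degrees < 90; ++degrees) {
        const double volume =
            boundingBox(rotateAroundZ(cloud, degrees * pi / 180.0)).volume();
        if (volume < bestVolume) {
            bestVolume = volume;
            best = degrees;
        }
    }
    return best;
}
```

The 3D ROI is then the bounding box of the cloud rotated by the returned angle; to get the ROI back into camera coordinates you would rotate its corners by the same angle in the opposite direction.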
Note that you can have the bounding box updated every time you add a point to your cloud by updating the min/max values (if no transformation is necessary). While this does not reduce the complexity compared to adding all the points and calculating the bounding box afterwards (this is, by the way, a delusion that I tend to fall into continuously), it spreads the load evenly over time, which may be useful.
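A minimal sketch of such an incrementally updated bounding box could look like the following. The BoundedPointCloud class is made up for this example and is not the interface of the implementation linked below.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Point3D {
    int x, y, z;
};

// Hypothetical point cloud that keeps its axis-aligned bounding box
// up to date on every insertion.
class BoundedPointCloud {
public:
    void add(const Point3D& p) {
        if (points_.empty()) {
            // The first point defines the initial box.
            min_ = p;
            max_ = p;
        } else {
            // Constant work per point: six comparisons.
            min_.x = std::min(min_.x, p.x);
            min_.y = std::min(min_.y, p.y);
            min_.z = std::min(min_.z, p.z);
            max_.x = std::max(max_.x, p.x);
            max_.y = std::max(max_.y, p.y);
            max_.z = std::max(max_.z, p.z);
        }
        points_.push_back(p);
    }

    std::size_t size() const { return points_.size(); }

    // The corners are only meaningful once at least one point was added.
    const Point3D& min() const {
        assert(!points_.empty());
        return min_;
    }
    const Point3D& max() const {
        assert(!points_.empty());
        return max_;
    }

private:
    std::vector<Point3D> points_;
    Point3D min_{0, 0, 0};
    Point3D max_{0, 0, 0};
};
```

The total number of comparisons is the same as when scanning all the points at the end, but the box is available at any moment, for example after every processed frame.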
So that this wouldn’t be all talk, here is some code too: https://gist.github.com/tompaana/0cee553ca5d0d5c8de19fa9590356879
It’s a simple C++ point cloud implementation of my own. It’s certainly neither the fastest nor the prettiest, but I believe it provides a nice description in code form that is easy to understand. I’ve played with it a bit, but if you want to put it to serious use, I recommend running some tests first just to make sure it’s robust.