
(Universal Windows apps)²

The great majority of apps built for Windows 8.1/Windows Phone 8.1 work on Windows 10 as-is – no changes required whatsoever. But what if you want to leverage the new APIs provided by Windows 10, such as the inking API, while still supporting the Windows 8.1 version of your app? Or you might be among the few unfortunate ones who have been using an API deprecated on Windows 10; the UserInformation class, for instance, no longer works on Windows 10 and you have to use the User class instead. How do you do that without duplicating the code base and maintaining two completely separate app projects? In this article I’ll describe two approaches.

Shared code and assets in a portable project

The first approach is to move all the shared code (in practice that can be almost all of your code) into a separate portable project in your Windows 8.1 solution. First you need to create the project: right-click your solution in the solution explorer, hover over Add and select New Project…

Adding a new project to a solution

Use Class Library as the project type, name it and hit OK.

Creating a class library project

Drag all the code and asset files you want to share between the Windows 8.1 and Windows 10 apps into the newly created Class Library project.

Note that if you have a solution that supports both Windows 8.1 and Windows Phone 8.1, you have to keep at least a partial main page (the page you navigate to at start-up) in the original Windows 8.1 and Windows Phone 8.1 projects. This is because you can’t add a reference to your Class Library project in the Shared (Windows 8.1/Windows Phone 8.1) project where your App class lives, and without the reference you can’t make your app navigate to a page defined in your Class Library project at start-up. Makes sense? Ok, cool, let’s carry on…

Now that we have the code moved to the Class Library project, we must add it as a reference to the other projects so that we can access its classes and assets. Right-click References under each project in the solution explorer and select Add Reference…

Adding references to a project

On the Projects tab you should now find the Class Library project. Check the checkbox and click OK.

Adding a project in the solution to another as a reference

Now fix any minor problems you may have, and once your app builds and runs, it is time to move on to the Windows 10 solution. Create a new Universal Windows 10 application project and add the Class Library project containing the shared code to the Windows 10 solution as an existing project:

Adding an existing project to a solution

Add the Class Library project as a reference to your main Windows 10 project (as explained before), make your main project use the shared code and you’re all set! Fine – I realize it’s rarely this simple and you will need to do some tweaking to get all the other dependencies working and so on, but these are the first steps to take.

If you now want to extend the app on Windows 10 by utilizing the cool new APIs, you need to add that specific code to the main project. You can’t, of course, access code in the main project from the shared code (for many reasons, one being that this would create a circular dependency), but one solution is to define interfaces in the shared code and provide the implementations from the main project. See my example, namely the IUserInformationHelper interface in the Class Library, the Windows 10 UserInformationHelper implementation and App.xaml.cs, where the implementation is provided.
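To make the pattern concrete, here’s a minimal sketch of what that could look like. The interface and class names match the ones mentioned above, but the member signatures and the PlatformServices hook are purely illustrative and not copied from the sample; the Windows 10 implementation uses the Windows.System.User API (which may require the User Account Information capability in the app manifest).

using System.Threading.Tasks;

// In the shared Class Library: only the abstraction is visible to the shared code.
public interface IUserInformationHelper
{
    Task<string> GetDisplayNameAsync();
}

// A simple hook in the shared code through which the main project provides the implementation.
public static class PlatformServices
{
    public static IUserInformationHelper UserInformation { get; set; }
}

// In the Windows 10 main project: the implementation uses the new User class.
public class UserInformationHelper : IUserInformationHelper
{
    public async Task<string> GetDisplayNameAsync()
    {
        var users = await Windows.System.User.FindAllAsync();
        var displayName = await users[0].GetPropertyAsync(Windows.System.KnownUserProperties.DisplayName);
        return displayName as string;
    }
}

In App.xaml.cs of the Windows 10 project you would then assign PlatformServices.UserInformation = new UserInformationHelper(); before navigating to the first page, so the shared code only ever talks to the interface.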

Pros

  • Allows management of the shared code as a single project

Cons

  • Other dependencies (NuGet packages and such) may cause problems, e.g. if they aren’t universal and only work on Windows 8.1 and not on Windows 10
  • You cannot use conditional preprocessing blocks (#if) in the shared code to target a specific platform, since the compilation symbols are also shared

Conditional compilation symbols in project properties (WINDOWS_UWP is for Windows 10 apps)

Shared code and asset files as links

Another way of sharing code between solutions is to add the code and asset files as links. Using links, you don’t have to change your existing solution. Simply create a new – in this case Windows 10 – application project and start adding the files from your existing Windows 8.1 solution. Right-click your new project in the solution explorer, hover over Add and select Existing Item… Then browse to the Windows 8.1 solution folder containing the files you want to add, select the files and click Add As Link:

Adding files as links

The files are now shown in your solution explorer. However, they are not physically in your new project but remain in the Windows 8.1 application project folder. Any changes you make to these files will appear in both projects.

While adding the files individually can be tedious, the benefit here is that you can take advantage of conditional preprocessing blocks in C# code:

#if WINDOWS_UWP
    // Have your Windows 10 specific code here
#else
    // Have your Windows 8.1 specific code here
#endif

Pros

  • Conditional preprocessing blocks and compilation symbols can be used
  • Dependencies on additional libraries and NuGet packages are easier to maintain
  • Adding platform-specific features, e.g. new Windows 10 APIs, is trivial

Cons

  • Adding/removing shared code and asset files needs to be done in both solutions separately

Sample code

An example using both approaches featured in this article can be found here on GitHub.


Tracking Objects from Video Feed Part III: Detecting Object Displacement

In the previous part we provided one solution for detecting and identifying a stationary object of a certain shape in a video feed. In this part we focus on tracking the object and try to analyze a simple path of a moving object. By simple, I mean *really* simple: we try to detect the “from” and “to” positions of the object – where it started and where it ended up.

When milliseconds count

Compared to detecting objects from a static image or frame, detecting object displacement presents us with a new, tough requirement: we have to analyze the frames in real time, and thus performance is key. We cannot simply use all the methods described earlier, since, especially on mobile devices, they simply take too much time to compute. Ideally, depending on the frame rate and the estimated speed of the moving object relative to our field of view (FoV), our operation for tracking the image should take less than 10 milliseconds per frame. It is quite obvious that the complexity of any algorithm we use is relative to the frame size – the fewer pixels we have to analyze, the faster the operation.

Instead of using all the methods described earlier (chroma filter, object mapping, convex hull etc.) to track the object, we utilize them to “lock” the target object. In other words, we identify the object we want to track, and after that we can use far lighter methods to track its position. We don’t have to process the full frame, only the area of the object with some margin. This lets us reduce the resolution and run our operations much more quickly.

Since our target object can be expected not to change color (unless we’re tracking a chameleon), we can do the following:

  1. Once we have detected the object from the image/frames and we know its position and size (number of pixels horizontally and vertically where the object is thickest) we can define a rectangular cropped area with the object in the center and with a margin of e.g. 15 %.
  2. Apply the chroma filter to this cropped area for each frame and keep track of the position, which is defined by the intersecting point of virtual lines placed where we have the most pixels horizontally and vertically. Figure 9 illustrates tracking the locked target object, and a small sketch of this per-frame step follows the figure.
    • If the center point displacement exceeds our predefined delta value, we move to the next phase, where we analyze the object movement.

Figure 9. Target object locked, and tracking limited to the region marked by the green rectangle.
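Here is the promised rough sketch of that per-frame step. The ChromaMatches delegate is a placeholder for the chroma comparison described in part II, the crop rectangle is assumed to stay within the frame bounds, and none of this is the project’s actual implementation – just the idea in C#.

using System;

public static class LockedObjectTracker
{
    // Counts filter hits per column and per row inside the crop area and returns the
    // intersection of the densest column and the densest row as the object center.
    public static (int X, int Y)? FindCenter(byte[] frame, int frameWidth,
                                             int cropLeft, int cropTop, int cropWidth, int cropHeight,
                                             Func<byte[], int, int, int, bool> chromaMatches)
    {
        int[] columnHits = new int[cropWidth];
        int[] rowHits = new int[cropHeight];

        for (int y = 0; y < cropHeight; y++)
        {
            for (int x = 0; x < cropWidth; x++)
            {
                if (chromaMatches(frame, frameWidth, cropLeft + x, cropTop + y))
                {
                    columnHits[x]++;
                    rowHits[y]++;
                }
            }
        }

        int bestColumn = IndexOfMax(columnHits);
        int bestRow = IndexOfMax(rowHits);
        if (columnHits[bestColumn] == 0) return null; // Nothing passed the filter.

        return (cropLeft + bestColumn, cropTop + bestRow);
    }

    // True if the object has moved more than the allowed delta from its locked position.
    public static bool ExceedsDelta((int X, int Y) locked, (int X, int Y) current, double delta)
    {
        double dx = current.X - locked.X;
        double dy = current.Y - locked.Y;
        return Math.Sqrt(dx * dx + dy * dy) > delta;
    }

    private static int IndexOfMax(int[] values)
    {
        int best = 0;
        for (int i = 1; i < values.Length; i++)
        {
            if (values[i] > values[best]) best = i;
        }
        return best;
    }
}

Running FindCenter on each incoming frame and feeding the result to ExceedsDelta is all the real-time loop has to do until the object starts moving.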

It moved, but where did it go?

How do we implement the next phase then? It seems that for a more accurate analysis of the object movement, we must use more complex methods than we used for detecting the initial displacement of the object. What if we record the frames for later analysis? Since we may not know or be able to forecast when the object is going to move, depending on the frame size, the video we record might be huge! Fortunately, there is a way to store the frames while still keeping the required size fixed: a ring buffer (also known as a circular buffer). In short, a ring buffer is a fixed-size buffer; when you reach the end, you start again from the beginning and replace the frames recorded earlier. See this article about buffering video frames by Juhana Koski to learn more. Because we observe the initial displacement of the object in real time, we can record a few more frames (the estimated time until the object exits our FoV) and then stop. After this we no longer have the real-time requirement and we can take our time analyzing what happened to the object after its initial displacement.
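A bare-bones frame ring buffer could look like the one below – a simplified stand-in for the buffering described in the linked article, storing frame copies as plain byte arrays and overwriting the oldest frame when full.

using System;

public class FrameRingBuffer
{
    private readonly byte[][] _frames;
    private int _nextIndex; // Position where the next frame will be written.
    private int _count;     // Number of valid frames stored so far.

    public FrameRingBuffer(int capacity)
    {
        _frames = new byte[capacity][];
    }

    public int Count => _count;

    // Stores a copy of the frame, overwriting the oldest one when the buffer is full.
    public void Add(byte[] frame)
    {
        _frames[_nextIndex] = (byte[])frame.Clone();
        _nextIndex = (_nextIndex + 1) % _frames.Length;
        if (_count < _frames.Length) _count++;
    }

    // Index 0 returns the oldest stored frame, Count - 1 the newest.
    public byte[] GetFrame(int index)
    {
        if (index < 0 || index >= _count) throw new ArgumentOutOfRangeException(nameof(index));
        int oldest = (_count < _frames.Length) ? 0 : _nextIndex;
        return _frames[(oldest + index) % _frames.Length];
    }
}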

Let’s say that we want to get the last frame in which the object is still within the FoV. We could use the following algorithm:

  1. Start iterating from the last recorded frame towards the frame of the initial displacement:
    1. Treat each frame as we did in the beginning when we found the desired object from the image, using the chroma filter, object map, convex hull and shape analysis.
    2. If we find an object satisfying our criteria, we stop, expecting it to be the object we were tracking.
  2. We now have the object position from the beginning of its movement to the last known position in our FoV (see figure 10). This means we can at least calculate the angle and relative velocity of the object; a small sketch of that calculation follows figure 10.

Figure 10. Object (USB cannon projectile wrapped with a pink sticker) motion captured.
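And the promised sketch: with the first and last known positions (plus the frame timestamps) in hand, the angle and relative velocity are a couple of lines of arithmetic. The coordinates are pixels and the timestamps seconds; nothing here is specific to the project’s implementation.

using System;

public static class MotionAnalysis
{
    // Returns the direction of the movement in degrees (0 = to the right,
    // 90 = downwards in image coordinates) and the speed in pixels per second.
    public static (double AngleDegrees, double PixelsPerSecond) Analyze(
        int startX, int startY, double startTimeSeconds,
        int endX, int endY, double endTimeSeconds)
    {
        double dx = endX - startX;
        double dy = endY - startY;
        double angle = Math.Atan2(dy, dx) * 180.0 / Math.PI;
        double distance = Math.Sqrt(dx * dx + dy * dy);
        double elapsed = endTimeSeconds - startTimeSeconds;
        double speed = elapsed > 0 ? distance / elapsed : 0;
        return (angle, speed);
    }
}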

Challenges and future development

Lighting challenges are typical in image pattern recognition solutions. Changes in lighting conditions affect the perceived color, and that makes the selection of parameters (YUV value and threshold) for chroma filtering difficult. The camera hardware and its settings play a significant role here: the longer the exposure time, the easier it is to detect the object properly. However, with a long exposure time, it’s harder to capture the object movement. The object in motion will have a distorted shape and its color will blend with the background. Thus, it becomes more difficult to find the object in the frames when it’s moving. On the other hand, if we use a short exposure time, we get less light per frame and the color difference between the object and the background might be insufficient.

The current implementation of the solution relies on manually setting the parameters for both color and threshold. In the future, we could try to at least partially automate the parameter setting. We would still have to roughly know the shape and size of the object we want to find. We could apply edge detection algorithms to boost the color filter and get more accurate results with stationary objects. Of course, when an object is moving fast, the edges may blur. However, since the current implementation provides us with the frame of the initial object displacement, we can compare that to the later frames and see the changes in e.g. chroma. The moving object will leave a trace even if it’s blurred into the background or distorted.

And then there was the code…

The related code project is hosted in GitHub: https://github.com/tompaana/object-tracking-demo

See the README.md file delivered with the project to learn more. The project is freely licensed so you can utilize any bits of the code any way you like. Have fun!

That’s all folks… or is it?

EDIT: Turns out that’s not all folks. See how everything turns out here.

Tracking Objects from Video Feed Part II: Identifying a Stationary Object

In the first part we introduced the raw image data formats we receive from the camera. Now it’s time to get to the good stuff. First we formulate our goals and start by trying to extract an object from the video feed. In order to track something, we first need to find the “something”.

Think before you act?

When solving a complex problem, it may sometimes be useful to start from the desired result and work your way backwards through the problem-solving pipeline. This way we can break the problem down into several resolutions – steps in between leading to the final solution, if you like. In the case of our problem, this reversed pipeline would look something like the table below.

Table 1. Problem solving pipeline reversed.

What is the result? | How do we get there?
An object with a specific shape identified in the image (location, size, shape etc.) | Find the center of mass of the two-dimensional object and its outline.
Outline of the detected object. | Apply a convex hull algorithm to the object map (an extended binary image where the background is removed).
Object map, where all significant (suspected) objects are separated and the background is removed. | Individualize the objects from a binary image.
Binary image in which objects have value 1 and the background value 0. | Apply an algorithm that extracts objects matching some criteria from the background, e.g. a chroma filter.
Chroma filter implementation to extract objects from the image. | Start coding!

Chroma filter

Here the word “chroma” is a bit misleading, since we also inspect the luma (Y) value when filtering the image. The algorithm is as follows:

  1. Set a desired YUV value as the target. This is basically the color we want to find in the picture, i.e. if a ball is blue and the background is orange, we try to set the value as close to the blue color of the ball as possible. We also need to set a threshold value, which defines the allowed difference between the target value and a color we still accept.
  2. Iterate over the image data (byte array), pixel by pixel or block by block. By block we mean the unit shown in figures 1 and 2. In the case of NV12, there are four Y values, one U and one V per block. We could calculate the average of the four Y values, but since the values are likely to be almost the same, for the sake of optimization it’s enough to choose just one. Then we simply compare the set target values to the measured ones, as pseudocode:

IF Difference(target_value_Y, measured_Y) < threshold AND
Difference(target_value_U, measured_U) < threshold AND
Difference(target_value_V, measured_V) < threshold
THEN
(Mark this pixel/block as selected)
ELSE
(Mark this pixel/block as not selected)



Figure 4. The original image on the left and the image with the chroma filter (red) applied on the right.

In the case of NV12 we can utilize the Y plane (since it matches the full size of the image) and convert it into a virtual binary image: Y value 0 indicates 0 and Y value 255 indicates 1 (as done in the right-hand image in figure 4). We can then use it to map objects as explained in the next chapter.
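For illustration, here is a managed C# sketch of the same idea. The real implementation is the native ChromaFilterEffect mentioned below; this version assumes the stride equals the frame width, the dimensions are even, and it compares one Y sample per 2x2 block as described in step 2.

using System;

public static class ChromaFilter
{
    // Produces a width x height "virtual binary image": 255 where the block matched
    // the target YUV value within the threshold, 0 elsewhere.
    public static byte[] Apply(byte[] nv12, int width, int height,
                               byte targetY, byte targetU, byte targetV, int threshold)
    {
        byte[] binary = new byte[width * height];
        int uvPlaneOffset = width * height;

        for (int y = 0; y < height; y += 2)
        {
            for (int x = 0; x < width; x += 2)
            {
                // One Y sample per 2x2 block is enough (see step 2 above).
                byte yValue = nv12[y * width + x];
                int uvIndex = uvPlaneOffset + (y / 2) * width + (x / 2) * 2;
                byte uValue = nv12[uvIndex];
                byte vValue = nv12[uvIndex + 1];

                bool selected =
                    Math.Abs(yValue - targetY) < threshold &&
                    Math.Abs(uValue - targetU) < threshold &&
                    Math.Abs(vValue - targetV) < threshold;

                byte mark = selected ? (byte)255 : (byte)0;

                // Mark all four pixels of the block in the binary image.
                binary[y * width + x] = mark;
                binary[y * width + x + 1] = mark;
                binary[(y + 1) * width + x] = mark;
                binary[(y + 1) * width + x + 1] = mark;
            }
        }

        return binary;
    }
}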

In our code project we have a native effect, which executes the aforementioned chroma filter for a single frame: ChromaFilterEffect.

Mapping objects

The simplest source for mapping objects is a binary image, e.g. a bit array where 0 denotes background and 1 an object. What we need to do is identify objects that are not joined (their pixels don’t touch each other). We can do this by assigning each object a unique number. The end result can be, for instance, an array of integers, where 0 still denotes background, but a value greater than 0 is the ID of a certain object. Figure 5 illustrates this process from the binary image to the resulting object map.

Figure 5. Binary image (left) and object map (right).

The principle of the algorithm for mapping objects is very simple: check each pixel of the binary image and assign a unique value to those pixels which are adjacent. If we break this down into more detail, we get something like this, when the image is in the form of an array:

  1. Create an object ID counter (let’s call the counter c), which provides a unique value for the pixels of each object. You can start with value c = 1.
  2. Go through each pixel starting from 0 to N – 1, where N is the number of pixels:
    • If the pixel has value 0 (background):
      • Do nothing.
    • If the pixel has value 1 (an object):
      1. If the previous pixel had value 0:
        • Set the object ID counter value so that the new value is guaranteed to be unique (let’s call this unique ID U), c = U, since this can be a new object. It is important that the value is truly unique – mere increment by one is not enough.
      2. If there is an adjacent non-zero pixel above (see figure 6):
        1. Get the object ID of the adjacent pixel (let’s call this value A).
        2. Backtrack (with a separate iteration) and replace all pixels which have the current value of counter c with value A.
          • You can stop backtracking, if you encounter a line with no pixels having the current value of the object ID counter.
        3. Set the value of the object ID counter to the value of the adjacent pixel (c = A) and continue the original iteration from the index before backtracking.
      3. Set the value of the current pixel to match the current value of the object ID counter (value of c).

Figure 6. Object map creation process.

See the ImageProcessingUtils::createObjectMap method, where the aforementioned algorithm is implemented in C++.
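If you’d rather see the idea in C#, here is a simpler flood-fill variant – not the backtracking approach described above and not the project’s createObjectMap, but it produces an equivalent object map from the binary image using 4-connectivity (and, as a bonus, the IDs it assigns are already ordered).

using System.Collections.Generic;

public static class ObjectMapper
{
    // Turns a binary image (0 = background, non-zero = object pixel) into an object
    // map where each connected object gets its own positive ID.
    public static int[] CreateObjectMap(byte[] binary, int width, int height)
    {
        int[] objectMap = new int[width * height];
        int nextId = 0;
        var queue = new Queue<int>();
        int[] dx = { -1, 1, 0, 0 };
        int[] dy = { 0, 0, -1, 1 };

        for (int start = 0; start < binary.Length; start++)
        {
            if (binary[start] == 0 || objectMap[start] != 0) continue;

            nextId++;
            objectMap[start] = nextId;
            queue.Enqueue(start);

            while (queue.Count > 0)
            {
                int index = queue.Dequeue();
                int x = index % width;
                int y = index / width;

                // Visit the four direct neighbors and pull them into the same object.
                for (int n = 0; n < 4; n++)
                {
                    int nx = x + dx[n];
                    int ny = y + dy[n];
                    if (nx < 0 || nx >= width || ny < 0 || ny >= height) continue;

                    int neighbor = ny * width + nx;
                    if (binary[neighbor] != 0 && objectMap[neighbor] == 0)
                    {
                        objectMap[neighbor] = nextId;
                        queue.Enqueue(neighbor);
                    }
                }
            }
        }

        return objectMap;
    }
}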

Note that after applying the backtracking algorithm described above, you will not end up with an ordered object map (a map with ordered IDs: 1, 2, …, N). Instead, you will have a map where each object has a unique ID, but the IDs can be arbitrary, e.g. 2, 5, …, N + 100. Sorting the map is quite trivial: get a list of the current IDs (one iteration), create a mapping where each existing ID has an ordered counterpart (2 -> 1, 5 -> 2, …, N + 100 -> N) and replace the values in the object map (second iteration).
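That renumbering step could be sketched like this (two passes over the map, as described):

using System.Collections.Generic;

public static class ObjectMapUtils
{
    // Replaces the arbitrary object IDs in the map with ordered IDs 1, 2, ..., N.
    public static void SortObjectIds(int[] objectMap)
    {
        var idMapping = new Dictionary<int, int>();

        // First pass: assign an ordered counterpart to each existing ID.
        foreach (int id in objectMap)
        {
            if (id != 0 && !idMapping.ContainsKey(id))
            {
                idMapping[id] = idMapping.Count + 1;
            }
        }

        // Second pass: replace the values in the object map.
        for (int i = 0; i < objectMap.Length; i++)
        {
            if (objectMap[i] != 0)
            {
                objectMap[i] = idMapping[objectMap[i]];
            }
        }
    }
}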

Identifying object shapes

Now that we have an object map, we can easily see the size of each object. But what about the shape? It depends on how we want to classify shapes, e.g. whether we are interested in round shapes or squares. In addition, depending on our object extraction methods, the objects may not be perfectly extracted; some of what was part of the object could have been interpreted as background. For instance, in figure 6, the largest object could have been a ball with a large stripe on it that was lost when our chroma filter was applied. Luckily, there is a simple algorithm for “fixing” the shape, called convex hull.

A convex hull can be thought of as an elastic band wrapped around an object. If we apply the algorithm to the first object in the object map we created earlier, the stripe in the center is discarded and a circle shape emerges:

Figure 7. Convex hull algorithm applied to the first object in the object map.

Figure 8. Convex hulls (green lines around the bird and red lines around a small piece on the X-Wing fighter side panel) drawn to surround the filtered red areas.

There are a number of different convex hull algorithms. The one used in this solution is called monotone chain (see the ImageProcessingUtils::createConvexHull method). More information about monotone chain, including code samples, can be found on Wikibooks: http://en.wikibooks.org/wiki/Algorithm_Implementation/Geometry/Convex_hull/Monotone_chain
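For reference, a compact C# version of monotone chain could look roughly like the following; the project’s own implementation is the C++ createConvexHull method, so treat this purely as an illustration. Points are pixel coordinates, and the hull is returned as an ordered outline.

using System.Collections.Generic;
using System.Linq;

public static class ConvexHull
{
    // Andrew's monotone chain: returns the hull points in order around the outline.
    public static List<(int X, int Y)> Create(IEnumerable<(int X, int Y)> points)
    {
        var sorted = points.Distinct().OrderBy(p => p.X).ThenBy(p => p.Y).ToList();
        if (sorted.Count < 3) return sorted;

        var hull = new List<(int X, int Y)>();

        // Lower hull.
        foreach (var p in sorted)
        {
            while (hull.Count >= 2 && Cross(hull[hull.Count - 2], hull[hull.Count - 1], p) <= 0)
            {
                hull.RemoveAt(hull.Count - 1);
            }
            hull.Add(p);
        }

        // Upper hull.
        int lowerCount = hull.Count + 1;
        for (int i = sorted.Count - 2; i >= 0; i--)
        {
            var p = sorted[i];
            while (hull.Count >= lowerCount && Cross(hull[hull.Count - 2], hull[hull.Count - 1], p) <= 0)
            {
                hull.RemoveAt(hull.Count - 1);
            }
            hull.Add(p);
        }

        hull.RemoveAt(hull.Count - 1); // The last point equals the first one.
        return hull;
    }

    // Positive when the turn a -> b -> c is counter-clockwise (y-up coordinates).
    private static long Cross((int X, int Y) a, (int X, int Y) b, (int X, int Y) c)
    {
        return (long)(b.X - a.X) * (c.Y - a.Y) - (long)(b.Y - a.Y) * (c.X - a.X);
    }
}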

The convex hull gives us a list of points (pixel coordinates) which form the outline of an object. However, we still have to determine the shape of the object somehow. If it’s a ball shape we’re after, we could find the center of the object – e.g. the center of mass, where one pixel equals one mass unit – and then see how much the outline differs from a circle whose radius is the width or the height of the object divided by 2. The smaller the difference, the more the shape is like a circle, which of course is the shape of a ball in a two-dimensional projection.
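A rough sketch of that circularity check is below – one pixel equals one mass unit, and the closer the returned value is to zero, the more circular the outline. The helper and the relative-deviation measure are my own illustration, not the project’s code, and a non-empty outline and a positive radius are assumed.

using System;
using System.Collections.Generic;
using System.Linq;

public static class ShapeAnalysis
{
    // Measures how much the outline deviates from a circle centered at the object's
    // center of mass. Returns the average deviation relative to the expected radius
    // (width or height of the object divided by 2).
    public static double CircleDeviation(IList<(int X, int Y)> objectPixels,
                                         IList<(int X, int Y)> outline,
                                         double expectedRadius)
    {
        // Center of mass, one pixel equals one mass unit.
        double centerX = objectPixels.Average(p => (double)p.X);
        double centerY = objectPixels.Average(p => (double)p.Y);

        double totalDeviation = 0;

        foreach (var p in outline)
        {
            double distance = Math.Sqrt((p.X - centerX) * (p.X - centerX) +
                                        (p.Y - centerY) * (p.Y - centerY));
            totalDeviation += Math.Abs(distance - expectedRadius);
        }

        return totalDeviation / (outline.Count * expectedRadius);
    }
}

A small threshold on the result – say, accepting anything below 0.1 – would then decide whether the object counts as a ball.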

Now that we have found our object, let’s try to track it. See the next and final part of this thrilling trilogy and find out if we succeeded in our goal to detect the changes in object position!

Tracking Objects from Video Feed Part I: Image Data Formats

It so happened that a couple of months ago my colleague and I were introduced to an interesting challenge: could we use a phone camera to capture and analyze the path of a moving object? There were (and still are) some wild ideas about what we could do in the not-so-distant future, but, somewhat excited, we decided to investigate what we could do with the current resources (hardware and APIs) available. I personally prefer setting the bar high, but, of course, one needs to start from the ground up – at least with no superpowers.

The mighty eagle will never know what hit him…

In this three-part blog jamboree I’ll unveil where we got (so far) and how we did it.

Getting Started

So we had our challenge formulated: identify an object from the video feed (the stuff that the camera on the device feeds you) while it’s stationary. Then lock onto that object – watch it like a hawk – and if it moves, try to see where it went. To be precise, “where it went” means finding its last position in the field of view (FoV).

It was apparent to us from the beginning that we wanted to go native to get maximum performance. At this point we only had a very vague vision of the algorithms (and their complexity) we would likely use. Anyhow, we expected to be dealing with some heavy stuff. Luckily, another nice fellow at work hinted that we should check out this MediaCapture API sample, and it turned out to be a good starting point. From there on we focused on understanding the image data formats in order to be able to process the data.

Image Data Formats

The image data received from camera hardware (the camera of a smartphone, tablet or simply an external webcam) comes in the YUV color space. In short, Y (luma) defines the brightness while U and V (chroma) define the color of a pixel. The notation Y’CbCr would be more accurate, as Y’UV refers to an analog encoding scheme, but for the sake of simplicity we use U to denote Cb and V to denote Cr in this article.

Depending on the hardware, typically two specific YUV formats are used: YUV2, which is commonly used by e.g. webcams, and NV12, commonly used by smartphone cameras. Although they are both based on the YUV color space, their structure – the size and order of the Y, U and V bytes – is different. Thus, the frames need to be either converted to one specific format prior to processing or we have to implement separate methods to process the frames correctly based on the format.

 YUV2

YUV2 format has the following properties:

  • 4:2:2 chroma subsampling
  • 16 bits per pixel

YUV2 can be seen as a byte array where the first and then every second value is a Y (luma) value, while the chromas (U and V) fill the blanks in between so that U comes first, as shown in figure 1. Each chunk has the size of 4 bytes (32 bits) and covers two pixels.

Figure 1. YUV2 format.
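As a sketch, reading the Y, U and V values of a single pixel from a YUV2 buffer could look like this, assuming a stride of width * 2 bytes (i.e. no padding at the end of the rows):

public static class Yuv2Utils
{
    // Returns the Y, U and V bytes of the pixel at (x, y). Two horizontal pixels
    // share one U and one V value (layout per 4-byte chunk: Y0, U, Y1, V).
    public static (byte Y, byte U, byte V) GetPixel(byte[] yuv2, int width, int x, int y)
    {
        int chunkOffset = y * width * 2 + (x / 2) * 4;
        byte yValue = yuv2[chunkOffset + (x % 2) * 2];
        byte uValue = yuv2[chunkOffset + 1];
        byte vValue = yuv2[chunkOffset + 3];
        return (yValue, uValue, vValue);
    }
}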

For more details about the YUV2 format, visit the following pages:

 NV12

NV12 format has the following properties:

  • 4:2:0 chroma subsampling
  • 12 bits per pixel

Let’s consider a VGA image (640×480 pixels) encoded in the NV12 format. The Y plane covers the whole resolution with a byte per pixel, i.e. the Y plane is 640×480 bytes. The U/V plane, which follows, is half the size of the Y plane: 640×240 bytes (see the figure below).

Figure 2. NV12 image format.

The figure below depicts four pixels in the NV12 format. As you can see, each pixel has a dedicated Y value, but only half of the chroma (¼ of a U and ¼ of a V value, to be precise).

Figure 3. A four pixel section in NV12 image data.

The block on the right-hand side of figure 2 describes the corresponding locations of the bytes in the NV12 byte array A, where h is the height and w is the width of the image (in bytes).
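Written out in C# the indexing could look like this – again assuming the stride equals the width, with w and h as in figure 2; the helper is only an illustration:

public static class Nv12Utils
{
    // Returns the indices of the Y, U and V bytes of the pixel at (x, y)
    // in an NV12 byte array of a w x h image.
    public static (int YIndex, int UIndex, int VIndex) GetIndices(int w, int h, int x, int y)
    {
        int yIndex = y * w + x;
        int uIndex = w * h + (y / 2) * w + (x / 2) * 2; // The U/V plane starts after the Y plane.
        return (yIndex, uIndex, uIndex + 1);
    }
}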

For more details about NV12 format, visit the following pages:

Coming next…

Check out the next part where we dive straight into the methods of extracting objects from video frames.

Will the caped crusader overcome the challenge or will he be trapped by the algorithm villains? Tune in tomorrow (or right now if you like) – same dev-time, same dev-channel!

Staying in control

When working on an app even a bit more complex than a couple of views, you quickly find yourself needing to create either custom UI components (user controls in Windows development terms) or at the very least composite components. Whenever I find myself in this kind of situation, I try to generalize them and make them as self-contained as possible. In my opinion this approach has two benefits: first, obviously, I can easily reuse them later in other projects. Second, it makes for a better architecture and makes it easier to have multiple instances of the component in the same project.

I find this to be standard practice; in most cases when you run into this situation, you find a ready-made solution to your problem on Stack Overflow. Usually it’s in the form of a snippet that you just copy into your project, and sometimes you get a complete user control from some project. I recently worked on an app and found there were no solutions to a couple of my problems, so I thought I would fill in the gap and provide them here. Both solutions are quite trivial (except for the first one, if you target a specific platform/framework version), but the thought of saving the precious time of any developer facing the same problem makes me happy.

Sliding panel user control


So, to the point. Behold, a sliding panel! With buttons, text, bells and whistles! The user can drag it or animate it by tapping an icon or a button. The problem here isn’t the composite nature nor the way it is manipulated (dragging), but the performance when your project is built on a specific framework, namely Windows Phone Silverlight. I try to always work with the latest frameworks, but sometimes it’s just not possible. The performance trick used here is very conventional: render the whole layout into a bitmap and then animate that. The nice thing about this component is that it works in both Silverlight and Windows Universal apps. It is fully self-contained.
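In a Windows Universal app the trick boils down to something like the sketch below; panelContent and snapshotImage are hypothetical element names, and the Silverlight version would use WriteableBitmap instead of RenderTargetBitmap. The actual control in the GitHub project is more involved, so treat this only as the core idea.

using Windows.UI.Xaml;
using Windows.UI.Xaml.Media.Imaging;

// Before starting the slide animation, replace the live layout with a bitmap of it.
private async System.Threading.Tasks.Task PrepareForAnimationAsync()
{
    var bitmap = new RenderTargetBitmap();
    await bitmap.RenderAsync(panelContent); // Render the panel's visual tree into a bitmap.

    snapshotImage.Source = bitmap;                  // Show the snapshot...
    snapshotImage.Visibility = Visibility.Visible;
    panelContent.Visibility = Visibility.Collapsed; // ...and hide the heavy layout.

    // Now animate snapshotImage (a single Image element) instead of the full layout.
}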

See more detailed description and get the source code from my GitHub project.

Ticker text user control

Three ticker text user controls in a StackPanel layout.

This is a quite trivial and rarely needed UI component. To be honest, I was quite surprised not to find a version of it anywhere on the interwebs. I suppose one must exist somewhere and my search engine skills are just below average. Or it could be that this component is so trivial that no one bothers to even look for a ready-made solution. In any case, since I lack all discretion, I dumped my user control here (or actually to GitHub) anyway.

In case the “ticker” does not ring any bells, it’s the component with scrolling text on it, much like you see in the lower part of your television set when watching the news. My control, like the sliding panel, is supported in Windows Phone Silverlight and in Windows Universal apps.

Find the code on GitHub.