Category Archives: C#

Remotely controlled bots

You know. Because it’s good to have a fail-safe around in case of Skynet.

But to the point: Let’s say you have a great backend heavy application and you want to deliver a bot experience to broaden your user base and to provide a new way of interacting in your app context. Well your new bot can surely be made to access the application data in your backend, but how can you make it work the other way around – say, in case of notifications?

Controlling a bot from the backend. That's the bot on the right, by the way.

Hence the title – by remote control I simply refer to a backend controlling the bot remotely (over interwebs) by messages that can be interpreted as commands to execute an action.

Backchannel

That’s what we call it and apparently it’s just one word. What we mean by the word is a type of message (like the ones users send to talk to a bot and the bot uses to reply back), but we just put the meaningful content in a different place of the message object (Activity in C#). Namely, we put the message the bot should react to somehow in IMessageActivity.ChannelData instead of IMessageActivity.Text. Tadaa! End of article.

No, but it really is that simple! In a nutshell you devise a simple custom protocol that your bot knows; for example, when the IMessageActivity.Text contains “notification”, you look at the channel data content to see who to notify and with what message. Then let your implementation in the bot code do its job. Still don’t believe me? Look, here’s a sample (in C#).

Ok, you got me. What I failed to mention is that you have to have some Microsoft Bot Framework specific code in your backend. Perhaps the easiest way to implement this backchannel messaging pipeline between the backend and the bot is using Direct Line. And the easiest way to use the Direct Line is by utilizing the ready-made client components for Node.js and C#. If your backend is not compatible with Node or C# components, implementing your own Direct Line connection is quite straightforward (the first link about Direct Line describes the protocol). They are, after all, only HTTP calls. My sample comes with a super simple console app sending notification commands to the bot. You should be able to use the code almost as-is, if your backend is built with C#.
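To give you an idea, here's a rough sketch of what sending a backchannel command could look like from a C# backend using the Direct Line client NuGet package. The class name, the "notification" command and the channel data payload below are just made-up examples of the kind of custom protocol described above – check the actual sample for the real deal.

using System.Threading.Tasks;
using Microsoft.Bot.Connector.DirectLine;

public class BackchannelNotifier
{
    private readonly DirectLineClient _directLineClient;

    public BackchannelNotifier(string directLineSecret)
    {
        _directLineClient = new DirectLineClient(directLineSecret);
    }

    public async Task SendNotificationCommandAsync(string userId, string message)
    {
        // Start (or reuse) a conversation between the backend and the bot
        Conversation conversation = await _directLineClient.Conversations.StartConversationAsync();

        Activity activity = new Activity
        {
            Type = "message",
            From = new ChannelAccount("backend", "Backend"),
            Text = "notification", // The command the bot knows to look for
            ChannelData = new { UserId = userId, Message = message } // The actual payload
        };

        await _directLineClient.Conversations.PostActivityAsync(conversation.ConversationId, activity);
    }
}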

What about security? I’m not an expert, but there are three points here I want to make:

  1. The Direct Line pipeline is secured by a secret key and TLS
  2. The user cannot inject content to the channel data (think of SQL injection vulnerability) as long as the channel (e.g. Skype) is secure
  3. You can encrypt the channel data content

Note that some descriptions of backchannel say that you should also change the value of the Type property of your Activity from “message” to “event”. This is a matter of taste. The benefit is that you can be sure your backchannel message is not treated as a regular message (because the type is not “message”).

Where to, sir?

Where am I supposed to place this backchannel messaging specific code in my bot project? To me, this introduces some controversy: The bot framework utilizes Autofac, an inversion of control (IoC) container, for dealing with dependencies, and I am not a fan. In my opinion wide use of IoC leads to incoherent code and architecture with little benefit to offer. And it can make writing tests (which I don’t do unlike true professionals I guess) a pain! But that’s just me – maybe my brain is not sophisticated enough to understand these kinds of exquisite concepts.

Just to show I can do things I don’t like I integrated the backchannel bot code using Autofac in my sample. Take a look at GlobalMessageHandlerModule.cs and Global.asax.cs. I’ve created classes derived from ScorableBase, which are automatically invoked when (and only when) I forward the received Activity object to my root dialog in MessagesController.cs. Then if a backchannel message is detected, the specific scorable class (NotificationsScorable in my sample) consumes and deals with the Activity and it is never given to my dialog. Special thanks to my brilliant colleague, Lilian Kasem, for coming up with this idea!

Call me old-fashioned, but I still find the code a lot easier to understand if I simply put this logic to my MessagesController class (or equivalent) before passing anything to any dialog. That’s just the way I roll…

if (we got a valid backchannel message)
{
    // Do what needs to be done
}
else
{
    // Looks like a message from a user, let the dialog handle it
    await Conversation.SendAsync(activity, () => new RootDialog());
}

See?

Related resources

 

Chatbots as middlemen

Chatbots typically serve their customers on a 1:1 basis. They are not unlike digital assistants (Cortana, Siri, Alexa etc.) except that a chatbot is usually designed to execute a small number of predefined tasks well and to focus on a narrow subject, like filling a pizza order for example.

Building chatbots is easy, but making them clever is more difficult. Despite all the analytics on user behavior, it is still impossible to anticipate every user reaction. As the technology, Conversation as a Platform (CaaP), evolves, creating more intelligent bots becomes easier and easier, but until Skynet grows self-aware, humans still serve a purpose.

Imagine a customer service chat on a website. A bot can probably handle most of the problems a customer could have. For instance, implementing a simple FAQ bot is trivial using Microsoft’s QnA Maker. Add some additional intelligence, including a natural language understanding service and whatnot, and you have an efficient customer service bot in your hands that 9 out of 10 customers are perfectly happy with. But for that one customer, you might want to consider a fallback: Let the human – in this case a customer service agent – take over to ensure customer satisfaction.

As long as you have the human labour, implementing this isn’t rocket science. What you need to do is as follows:

  1. Make sure your bot keeps track of all the individuals the bot sees (but remember privacy policies!)
  2. Make sure your bot also keeps track of itself. This might sound weird at first, but I’ll make the reason apparent soon.
  3. Design the handover scenario. It could be based on sentiment analysis or simply a request of help by the user.
  4. Implement the message relaying logic (don’t worry – there are samples available!)

How and why to keep track of people and bots

By keeping track I mean collecting the contact information of a user (and the bot – I’ll explain later) from the bot’s perspective. You can’t send a post card to a person without knowing his or her address. The same applies to the bot framework: You cannot send a message to a user without knowing the IDs of the user and the conversation. What you’ll need at least are the ServiceUrl, the ChannelId and the ChannelAccount and ConversationAccount IDs.

The aforementioned details may be enough, but this depends on the channel (Skype, MS Teams, Slack etc.) You might as well store all the details as shown in the following tables.

Table 1. Identities in Skype (all values are of type string).

Me:
  ServiceUrl: https://skype.botframework.com
  ChannelId: skype
  ChannelAccount.Id: 29:1byUvXHHhinNxwnPCHh4MPhpfiJUbadX_Y3_sTkBspdiSke8sX_Ps6riTYRVez5jT
  ChannelAccount.Name: Tomi Paananen
  ConversationAccount.Id: 29:1byUvXHHhinNxwnPCHh4MPhpfiJUbadX_Y3_sTkBspdiSke8sX_Ps6riTYRVez5jT
  ConversationAccount.Name: (N/A in direct conversation)

Bot:
  ServiceUrl: https://skype.botframework.com
  ChannelId: skype
  ChannelAccount.Id: 28:f99fa2c3-8834-418e-b293-039205238055
  ChannelAccount.Name: Intermediator Bot Sample
  ConversationAccount.Id: 29:1byUvXHHhinNxwnPCHh4MPhpfiJUbadX_Y3_sTkBspdiSke8sX_Ps6riTYRVez5jT
  ConversationAccount.Name: (N/A in direct conversation)

The values above are from a direct conversation in Skype between my bot and me. As you can see, the channel account ID (read: my user ID) and the conversation account ID match, but that isn’t necessarily the case in other channels.

Table 2. Identities in Slack.

Me:
  ServiceUrl: https://slack.botframework.com
  ChannelId: slack
  ChannelAccount.Id: U1F3JK9A9:T1F248PJ8
  ChannelAccount.Name: tomi
  ConversationAccount.Id: B2NSU1D4Z:T1F248PJ7:C3B1ZK5D0
  ConversationAccount.Name: bottest

Bot:
  ServiceUrl: https://slack.botframework.com
  ChannelId: slack
  ChannelAccount.Id: B2NSU1D4Z:T1F248PJ8
  ChannelAccount.Name: intermediatorbot
  ConversationAccount.Id: B2NSU1D4Z:T1F248PJ7:C3B1ZK5D0
  ConversationAccount.Name: bottest

So why do we need the bot’s identity stored too? As you can see, the same bot has a different identity on different channels and in different conversations. When we send a message to a user, we need to specify who the message is from, and some channels, Slack for example, don’t allow you to send messages from bots that aren’t actually there. So in order to relay a message from one user to another on a different channel (e.g. from Skype to Slack), we need to know and use the bot’s identity in Slack in the from field.

Briefly about the technical implementation: All the activities flow through the MessagesController class in a bot built with C#, and that’s the ideal place to keep track of everything. As for the bot, it is always the receiving party when it gets a new activity, and that’s how you store the bot identities. See Sending and Receiving Activities for more information.

Finally, store the records of the users and the bot somewhere in the web, e.g. in the Azure Table storage service. Note: My sample stores the data locally (in memory), which is never, ever a good idea, because bots are essentially web apps and can have multiple instances!
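As an illustration, here's a minimal sketch (the class and method names are mine, not the sample's) of collecting both parties' identities from an incoming activity in a Bot Builder v3 C# bot:

using Microsoft.Bot.Connector;

public class PartyInformation
{
    public string ServiceUrl { get; set; }
    public string ChannelId { get; set; }
    public ChannelAccount ChannelAccount { get; set; }
    public ConversationAccount ConversationAccount { get; set; }

    // The sender of an incoming activity is the user; the recipient is the bot
    public static PartyInformation FromActivity(Activity activity, bool isBot)
    {
        return new PartyInformation
        {
            ServiceUrl = activity.ServiceUrl,
            ChannelId = activity.ChannelId,
            ChannelAccount = isBot ? activity.Recipient : activity.From,
            ConversationAccount = activity.Conversation
        };
    }
}

In MessagesController you would then store both PartyInformation.FromActivity(activity, isBot: false) and PartyInformation.FromActivity(activity, isBot: true) somewhere persistent.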

Comparison to Node.js

The essentials for relaying messages are the same whether you are building your bot using C# or Node.js SDKs. However, there are differences between the SDKs and some things are handled differently.

Table 3. Node.js counterparts for establishing user/bot identity.
C# -> Node.js (with an example path in a session)
  Activity.ServiceUrl -> IChatConnectorAddress.serviceUrl (session.message.address.serviceUrl)
  Activity.ChannelId -> IAddress.channelId (session.message.address.channelId)
  ChannelAccount.Id -> IIdentity.id (session.message.address.bot.id, session.message.address.user.id)
  ChannelAccount.Name -> IIdentity.name (session.message.address.bot.name, session.message.address.user.name)
  ConversationAccount.Id -> IIdentity.id (session.message.address.conversation.id)
  ConversationAccount.Name -> IIdentity.name (session.message.address.conversation.name)

Samples

How to create dynamic FormFlow

You might’ve guessed it from the title – Yes, this is one of my no-nonsense posts. Strictly business and by business I mean code-talk.

This post is about a building block in Microsoft Bot Framework called FormFlow and namely how to add dynamic behavior to the flow when building bots using C#. If you are not familiar with FormFlow, I suggest you study the basics before reading further. Just the basics though, that’s enough.

Building simple FormFlows is… well, simple! This is a method that creates a basic form:

public static IForm<MyClass> BuildForm()
{
    var builder = new FormBuilder<MyClass>();

    return builder
        .Field(nameof(MyClass.Property1))
        .Field(nameof(MyClass.Property2))
        // ...
        .Build();
}

Not difficult at all! And you can influence the behavior using, for instance, property attributes like:

[Serializable]
public class MyClass
{
    [Prompt("What would you like the value of this property to be?")]
    public string Property1 { get; set; }

    // ...
}

But what if you want to do some of the following:

  • Skip the questions you already know the answer to based on what the user said earlier?
  • Define the options presented for the user dynamically?
  • Change the way the question and options are presented to the user?
  • Validate the user’s response?
  • Customize the behavior of the form on the fly?

Some of the aforementioned things FormFlow tries to do for you automatically. However, usually to achieve a great experience you have to do a bit more work, and luckily, it is possible. See the resources available under Microsoft.Bot.Builder.FormFlow.Advanced namespace. One very useful class under there is called FieldReflector. Whilst you can still add ActiveDelegate and ValidateAsyncDelegate using the overloaded Field method in FormBuilder, FieldReflector allows you to do more:

.Field(new FieldReflector<MyClass>(nameof(MyClass.PropertyX))
    .SetType(typeof(MyClass.PropertyX))
    .SetActive((state) => SetFieldActive(state, nameof(MyClass.PropertyX)))
    .SetDefine(async (state, field) => await SetOptionsForFieldsAsync(state, nameof(MyClass.PropertyX), field))
    .SetAllowsMultiple(true) // Single selection vs. multi-selection
    .SetPrompt(new PromptAttribute("What type of values should this property have? {||}"))
    .SetValidate(async (state, value) => await ValidateResponseAsync(value, state, nameof(MyClass.PropertyX))))

…where SetFieldActive, SetOptionsForFieldsAsync and ValidateResponseAsync are methods defined and implemented by the developer. See this class implementing the building of the form from my Dynamic FormFlow Sample. The snippet provides a solution to all the questions presented in the bullet point list above. In case you are curious how this enables customizing the behavior on the fly, notice that you can run any arbitrary code in your response validation method (ValidateResponseAsync in the snippet).
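For reference, here's a rough sketch of what those developer-defined methods could look like – the bodies are placeholders and assume the MyClass form state from the earlier snippets; see the linked sample for real implementations.

using System.Threading.Tasks;
using Microsoft.Bot.Builder.FormFlow;
using Microsoft.Bot.Builder.FormFlow.Advanced;

public static class MyFormHelpers
{
    // Return false to skip the question, e.g. when the answer is already known
    public static bool SetFieldActive(MyClass state, string fieldName)
    {
        return true;
    }

    // Populate the options of the field dynamically based on the current state;
    // return true when the field was defined successfully
    public static Task<bool> SetOptionsForFieldsAsync(MyClass state, string fieldName, Field<MyClass> field)
    {
        return Task.FromResult(true);
    }

    // Run any arbitrary logic here, e.g. a catalog search; set IsValid to false
    // and provide Feedback when the response is not acceptable
    public static Task<ValidateResult> ValidateResponseAsync(object value, MyClass state, string fieldName)
    {
        return Task.FromResult(new ValidateResult { IsValid = true, Value = value });
    }
}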

In my Dynamic FormFlow Sample I use FormFlow to narrow down a spaceship selection from a static catalog of spaceships. Therefore, it is important that I don’t bother the user with unnecessary options; the user might’ve already told me that he/she is looking for a small ship and thus I shouldn’t later ask to select from options only available for large ships. That’s simply bad UX! Some queries I can skip in case there is only one option available. In my response validation method I do a search against the catalog using the details I’ve gathered from the user so far, and if the response is valid, I store the search results for later. With this approach I can narrow down the selection with as few questions as possible and without presenting the user with options that make no sense.

The caveats

Playing with FormFlow is not all a bed of roses. I had to fight a few errors that at first seemed odd, until I got the hang of it. Here are some of the things to keep in mind:

  • After you build your form, it is built. It now exists. And you cannot really control the instance anymore. Why is this important to understand? Because you cannot say for certain when the methods (delegates) in your form are called. It’s now in the hands of the Bot Framework. So make sure your methods (delegates) work in any situation! For starters, have null checks.
  • Do not use the IList interface as a property type in the class where you are collecting user input (Spaceship.cs in the sample). It won’t work and you’ll get a FormCanceledException with “Cannot create an instance of an interface” message. Use List instead, it works.
  • Realize that you don’t have to do everything in the form; after the form is complete, you can continue with the data in a Dialog and ask further questions etc. That’s what I did in my sample, see SpaceshipSelectionDialog.cs.

And a top tip: If you are new to FormFlow, implement and test one complex field at a time.

So it’s a magic bullet?

No, it is not. FormFlow is a handy building block, but will not solve all problems. Duh.

If you feel like your FormFlow code is turning into a horrible, uncontrollable mess and you feel like you need to compromise the UX, stop. Stop using FormFlow. You can do the same using Dialogs too, and with really complex scenarios it will most likely be much easier too. FormFlow is a solution for fairly straightforward forms – it was never meant to be used with overly complex flows. Or at least I think that’s the case.

I should have a post about managing dialog flows out soon, but in the meanwhile, here’s the big secret: IDialogContext.Wait(<method name>) lets you define the next method that will process the next user response.
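As a teaser, here's a minimal sketch of that pattern in a Bot Builder v3 dialog (the method names and the canned replies are arbitrary):

using System;
using System.Threading.Tasks;
using Microsoft.Bot.Builder.Dialogs;
using Microsoft.Bot.Connector;

[Serializable]
public class MyDialog : IDialog<object>
{
    public Task StartAsync(IDialogContext context)
    {
        // The next user message will be processed by OnFirstAnswerReceivedAsync
        context.Wait(OnFirstAnswerReceivedAsync);
        return Task.CompletedTask;
    }

    private async Task OnFirstAnswerReceivedAsync(IDialogContext context, IAwaitable<IMessageActivity> result)
    {
        IMessageActivity message = await result;
        await context.PostAsync($"You said: {message.Text}");

        // Decide here which method should process the next response
        context.Wait(OnSecondAnswerReceivedAsync);
    }

    private async Task OnSecondAnswerReceivedAsync(IDialogContext context, IAwaitable<IMessageActivity> result)
    {
        await result;
        await context.PostAsync("Thanks, that's all I needed!");
        context.Done<object>(null);
    }
}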

 

 

(Universal Windows apps)^2

The great majority of apps built for Windows 8.1/Windows Phone 8.1 work on Windows 10 as-is – no changes required whatsoever. But what if you want to leverage the new APIs provided by Windows 10, such as the inking API, while still supporting the Windows 8.1 version of your app? Or you might be among the few unfortunate ones who have been using some API deprecated on Windows 10; the UserInformation class no longer works on Windows 10 and you have to use the User class instead. How to do that without duplicating the code base and having two completely separate app projects to maintain? In this article I’ll describe two approaches.

Shared code and assets in portable project

The first approach is to include all the shared code (in practice that can be almost all of your code) in a separate portable project in your Windows 8.1 solution. First you need to create the project: Right click your solution in the solution explorer, hover on Add and select New Project…

Adding a new project to a solution

Use Class Library as the project type, name it and hit OK.

Creating a class library project

Drag all the code and asset files you want to share between both the Windows 8.1 and Windows 10 app to the newly created Class Library project.

Note that if you have a solution that supports both Windows 8.1 and Windows Phone 8.1, you have to have at least a partial main page (the page you navigate to in the start-up) in the original Windows 8.1 and Windows Phone 8.1 projects. This is due to the fact that you can’t add a reference to your Class Library project in the Shared (Windows 8.1/Windows Phone 8.1) project where your App class lives. And without the reference you can’t make your app navigate to a page defined in your Class Library project in the start-up. Makes sense? Ok, cool, let’s carry on…

Now that we have the code moved to the Class Library project, we must add it as a reference to the other projects so that we can access the classes and assets. Right click the References under the projects in the solution explorer and select Add Reference…

Adding references to a project

On the Projects tab you should now find the Class Library project. Check the checkbox and click OK.

Adding a project in the solution to another as a reference

Now fix any minor problems you may have and once your app builds and runs it is time to move on to work on the Windows 10 solution. Create a new Universal Windows 10 application project and add the Class Library project containing the shared code to the Windows 10 solution as an existing project:

Adding an existing project to a solution

Add the Class Library project as a reference to your main Windows 10 project (as explained before), make your main project use the shared code and you’re all set! Fine – I realize it’s not often this simple and you need to do some tweaking to get all the other dependencies working and so on, but these are the first steps to take.

If you now want to extend the app on Windows 10 by utilizing the cool new APIs, you need to add that specific code to the main project. You can’t, of course, access the code in the main project from the shared code (for many reasons, one being that this would create a circular dependency), but one solution is to define interfaces in the shared code and provide the implementations from the main project. See my example, namely the IUserInformationHelper interface in the Class Library, the Windows 10 UserInformationHelper implementation and App.xaml.cs where the implementation is provided.
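In its simplest form the pattern looks something like the sketch below; the real code lives in the linked example, this is just to convey the idea (the static property at the end is hypothetical).

using System;
using System.Threading.Tasks;
using Windows.System;

// In the shared Class Library project:
public interface IUserInformationHelper
{
    Task<string> GetUserDisplayNameAsync();
}

// In the Windows 10 (UWP) app project:
public class UserInformationHelper : IUserInformationHelper
{
    public async Task<string> GetUserDisplayNameAsync()
    {
        // Windows 10 specific API (Windows.System.User)
        var users = await User.FindAllAsync();

        if (users.Count == 0)
        {
            return string.Empty;
        }

        object displayName = await users[0].GetPropertyAsync(KnownUserProperties.DisplayName);
        return displayName as string ?? string.Empty;
    }
}

// In App.xaml.cs (Windows 10 project) the implementation is handed to the shared code,
// for example via a static property defined in the Class Library:
// SharedServices.UserInformationHelper = new UserInformationHelper();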

Pros

  • Allows management of the shared code as a single project

Cons

  • Other dependencies (Nuget packages and such) may cause problems e.g. if they aren’t as universal and work only on Windows 8.1 and not on Windows 10
  • You cannot use conditional preprocessing blocks in the shared code (#if) to target a specific platform since the compilation symbols are also shared
Conditional compilation symbols in project preferences (WINDOWS_UWP is for Windows 10 apps)

Shared code and asset files as links

Another way of sharing code between solutions is adding the code and asset files as links. Using links you don’t have to change your existing solution. Simply create a new – in this case – Windows 10 application project and start adding the files from your existing Windows 8.1 solution. Right click your new project in the solution explorer, hover on Add and select Existing Item… Then browse to the Windows 8.1 solution folder containing the files you want to add, select the files and click Add As Link:

Adding files as links

The files are now shown in your solution explorer. However, they are not physically in your new project but exist in the Windows 8.1 application project folder. Any changes you make to these files will appear in both projects.

While adding the files individually can be tedious, the benefit here is that you can take advantage of conditional preprocessing blocks in C# code:

#if WINDOWS_UWP
    // Have your Windows 10 specific code here
#else
    // Have your Windows 8.1 specific code here
#endif

Pros

  • Conditional preprocessing blocks and compilation symbols can be used
  • Dependencies to additional libraries and Nuget packages are easier to maintain
  • Adding platform specific features, e.g. new Windows 10 APIs, is trivial

Cons

  • Adding/removing shared code and asset files needs to be done in both solutions separately

Sample code

An example using both approaches featured in this article can be found here in GitHub.

 

Tracking Objects from Video Feed Part III: Detecting Object Displacement

In the previous part we provided one solution for detecting and identifying a stationary object of a certain shape in video feed. In this part we focus on tracking the object and try to analyze a simple path of a moving object. By simple, I mean *really* simple: we try to detect the “from” and “to” positions of the object – where it started and where it ended up.

When milliseconds count

Compared to detecting objects from a static image or frame, detecting object displacement presents us with a new, tough requirement: We have to analyze the frames in real time and thus, performance is key. We cannot simply use all the methods described earlier, since, especially on mobile devices, they simply take too much time to compute. Ideally, depending on the framerate and the estimated speed of the moving object relative to our field of view (FoV), our operation for tracking the image should take less than 10 milliseconds per frame. It is quite obvious that the complexity of any algorithms we use is relative to the frame size – the fewer pixels we have to analyze, the faster the operation.

Instead of using all the methods described earlier (chroma filter, object mapping, convex hull etc.) to track the object, we utilize them to “lock” the target object. In other words, we identify the object we want to track and after that we can use far lighter methods to track its position. We don’t have to process the full frame, but only the area of the object with some margin. This helps us reduce the resolution and run our operations much more quickly.

Since our target object can be expected not to change color (unless we’re tracking a chameleon), we can do the following:

  1. Once we have detected the object from the image/frames and we know its position and size (number of pixels horizontally and vertically where the object is thickest) we can define a rectangular cropped area with the object in the center and with a margin of e.g. 15 %.
  2. Apply chroma filter to this cropped area for each frame and keep track of the position, which is defined by the intersecting point of virtual lines placed where we have the most pixels horizontally and vertically (see the sketch below). Figure 9 illustrates tracking the locked target object.
    • If the center point displacement exceeds our predefined delta value, we move to the next phase, where we analyze the object movement.

Figure 9. Target object locked, and tracking limited to the region marked by the green rectangle.
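As a rough illustration of step 2: assuming we already have a binary mask of the cropped area (produced by the chroma filter), locating the center could look like the sketch below (the class and method names are mine).

public static class ObjectTracker
{
    // Returns the point where the "most pixels" lines intersect in a binary mask
    // (mask value 0 = background, anything else = object)
    public static void LocateCenter(byte[] mask, int width, int height, out int centerX, out int centerY)
    {
        int[] columnCounts = new int[width];
        int[] rowCounts = new int[height];

        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                if (mask[y * width + x] != 0)
                {
                    columnCounts[x]++;
                    rowCounts[y]++;
                }
            }
        }

        centerX = IndexOfMax(columnCounts);
        centerY = IndexOfMax(rowCounts);
    }

    private static int IndexOfMax(int[] values)
    {
        int maxIndex = 0;

        for (int i = 1; i < values.Length; i++)
        {
            if (values[i] > values[maxIndex])
            {
                maxIndex = i;
            }
        }

        return maxIndex;
    }
}

Comparing the newly located center to the previous one, and checking the difference against the predefined delta value, tells us whether the object has started to move.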

It moved, but where did it go?

How do we implement the next phase then? It seems that for more accurate analysis of the object movement, we must use more complex methods than we used for detecting the initial displacement of the object. What if we record the frames for later analysis? Since we may not know or be able to forecast when the object is going to move, depending on the frame size, the video we record might be huge! Fortunately, there is a way to store the frames while still keeping the required size fixed: A ring buffer (also known as circular buffer). In short, a ring buffer is a fixed size buffer and when you reach the end, you start again from the beginning and replace the frames recorded earlier. See this article about buffering video frames by Juhana Koski to learn more. Because we observe the initial displacement of the object in real time, we can record a few more frames (the estimated time until the object exits our FoV) and then stop. After this we no longer have the real-time requirement and we can take our time analyzing what happened to the object after its initial displacement.
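Here's a minimal sketch of such a frame ring buffer in C# (the frame type and capacity handling are simplified for illustration):

using System.Collections.Generic;

public class FrameRingBuffer
{
    private readonly byte[][] _frames;
    private int _nextIndex;
    private int _count;

    public FrameRingBuffer(int capacity)
    {
        _frames = new byte[capacity][];
    }

    // Overwrites the oldest frame once the buffer is full
    public void Add(byte[] frame)
    {
        _frames[_nextIndex] = frame;
        _nextIndex = (_nextIndex + 1) % _frames.Length;

        if (_count < _frames.Length)
        {
            _count++;
        }
    }

    // Returns the buffered frames from the oldest to the newest
    public IReadOnlyList<byte[]> GetFramesInOrder()
    {
        var orderedFrames = new List<byte[]>(_count);
        int startIndex = (_count < _frames.Length) ? 0 : _nextIndex;

        for (int i = 0; i < _count; i++)
        {
            orderedFrames.Add(_frames[(startIndex + i) % _frames.Length]);
        }

        return orderedFrames;
    }
}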

Let’s say that we want to get the last frame of the object until it leaves the FoV. We could use the following algorithm:

  1. Start iterating from the last recorded frame towards the frame of the initial displacement:
    1. Treat each frame as we did in the beginning when we found the desired object from the image using chroma filter, object map, convex hull and shape analysis.
    2. If we find an object satisfying our criteria, we stop, expecting it to be the object we were tracking.
  2. We now have the object position from the beginning of its movement to the last known position in our FoV (see figure 10). This means we can at least calculate the angle and relative velocity of the object.

Figure 10. Object (USB cannon projectile wrapped with pink sticker) motion captured.

Challenges and future development

Lighting challenges are typical with image pattern recognition solutions. Changes in lighting conditions affect the perceived color and that makes the selection of parameters (YUV value and threshold) for chroma filtering difficult. Camera hardware and its settings play a significant role here: The longer the exposure time, the easier it is to detect the object properly. However, with a long exposure time, it’s harder to capture the object movement. The object in motion will have a distorted shape and its color will blend with the background. Thus, it becomes more difficult to find the object in the frames when it’s moving. On the other hand, if we use a short exposure time, we get less light per frame and the color difference of the object compared to the background might be insufficient.

The current implementation of the solution relies on manual parameter setting for both color and threshold. In the future, we could try to at least partially automate the parameter setting. We would still have to roughly know the shape and size of the object we want to find. We could apply edge detection algorithms to boost the color filter and get more accurate results with stationary objects. Of course, when an object is moving fast, the edges may blur. However, since the current implementation provides us with the frame of the initial object displacement, we can compare that to the later frames and see the changes in e.g. chroma. The moving object will leave a trace even if it’s blurred with the background or distorted.

And then there was the code…

The related code project is hosted in GitHub: https://github.com/tompaana/object-tracking-demo

See the README.md file delivered with the project to learn more. The project is freely licensed so you can utilize any bits of the code any way you like. Have fun!

That’s all folks… or is it?

EDIT: Turns out that’s not all folks. See how everything turns out here.

Tracking Objects from Video Feed Part II: Identifying a Stationary Object

In the first part we introduced the raw image data formats, which we receive from the camera. Now it’s time to get to the good stuff. First we formulate our goals and start by trying to extract an object from video feed. In order to track something we first need to find the “something”.

Think before you act?

When solving a complex problem, it may sometimes be useful to start from the desired result and work your way from the result back to the beginning of the problem solving pipeline. This way we can break the problem down into several resolutions – steps in between leading to the final solution, if you like. In the case of our problem, this reversed pipeline would look something like the table below.

Table 1. Problem solving pipeline reversed.

What is the result? -> How do we get there?
  An object, with a specific shape, identified in the image (location, size, shape etc.) -> Find the center of mass of the two dimensional object and its outline.
  Outline of the detected object. -> Apply convex hull algorithm to object map (extended binary image, where the background is removed).
  Object map, where all significant (suspected) objects are separated and the background is removed. -> Individualize objects from a binary image.
  Binary image in which objects have value 1 and background value 0. -> Apply algorithm to extract objects with some criteria from the background. E.g. chroma filter.
  Chroma filter implementation to extract objects from image. -> Start coding!

Chroma filter

Here the word “chroma” is a bit misleading, since we also inspect the luma (Y) value when filtering the image. The algorithm is as follows:

  1. Set a desired YUV value as a target. This is basically the color we want to find from the picture i.e. if a ball is blue and the background is orange, we try to set the value as close to the blue color of the ball as possible. We also need to set a threshold value, which defines the allowed difference between the target value and the blue color we accept.
  2. Iterate the image data (byte array), pixel by pixel or block by block. By block we mean the unit shown in figures 1 and 2. In the case of NV12, there are four Y values, one U and one V. We could calculate the average of the four Y values, but since the values are likely to be almost the same, for the sake of optimization it’s enough to choose just one. Then we simply compare the set target values to the measured ones, as pseudo code:

IF Difference(target_value_Y, measured_Y) < threshold AND
Difference(target_value_U, measured_U) < threshold AND
Difference(target_value_V, measured_V) < threshold
THEN
(Mark this pixel/block as selected)
ELSE
(Mark this pixel/block as not selected)

 


Figure 4. The original image on left and the image with chroma filter (red) applied on right.

In the case of NV12 we can utilize the Y plane (since it matches the full size of the image) and convert it into a virtual binary image: Y value 0 indicates 0 and Y value 255 indicates 1 (as done on the right-hand-side image in figure 4). We can then use it to map objects as explained in the next chapter.

In our code project we have a native effect, which executes the aforementioned chroma filter for a single frame: ChromaFilterEffect.
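For illustration, here's a rough managed C# sketch of the same idea for NV12 data – the real implementation is the native effect above, and the method below makes simplifying assumptions (even frame dimensions, one Y sample per 2×2 block):

using System;

public static class ChromaFilter
{
    // Returns a binary mask the size of the Y plane: 255 = selected, 0 = background
    public static byte[] ApplyToNV12(
        byte[] frame, int width, int height,
        byte targetY, byte targetU, byte targetV, int threshold)
    {
        byte[] mask = new byte[width * height];
        int uvPlaneOffset = width * height; // The interleaved U/V plane follows the Y plane

        for (int blockY = 0; blockY < height; blockY += 2)
        {
            for (int blockX = 0; blockX < width; blockX += 2)
            {
                // One Y value per 2x2 block is enough (see the text above)
                byte y = frame[blockY * width + blockX];
                int uvIndex = uvPlaneOffset + (blockY / 2) * width + blockX;
                byte u = frame[uvIndex];
                byte v = frame[uvIndex + 1];

                bool selected =
                    Math.Abs(y - targetY) < threshold &&
                    Math.Abs(u - targetU) < threshold &&
                    Math.Abs(v - targetV) < threshold;

                byte value = selected ? (byte)255 : (byte)0;

                // Mark the whole 2x2 block in the mask
                mask[blockY * width + blockX] = value;
                mask[blockY * width + blockX + 1] = value;
                mask[(blockY + 1) * width + blockX] = value;
                mask[(blockY + 1) * width + blockX + 1] = value;
            }
        }

        return mask;
    }
}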

Mapping objects

The simplest source for mapping objects is a binary image e.g. a bit array where 0 denotes background and 1 an object. What we need to do is to identify objects that are not joined (their pixels don’t touch each other). We can do this by assigning each object a unique number. The end result can be, for instance, an array of integers, where 0 still denotes background, but a value greater than 0 is an ID of a certain object. Figure 5 illustrates this process from the binary image to resulting object map.

Figure 5. Binary image (left) and object map (right).

The principle of the algorithm for mapping objects is very simple: Check each pixel of the binary image and assign a unique value to those pixels which are adjacent. If we break this down into more detail, we can have something like this, when the image is in the form of an array:

  1. Create an object ID counter (let’s call the counter c), which provides a unique value for the pixels of each object. You can start with value c = 1.
  2. Go through each pixel starting from 0 to N – 1, where N is the number of pixels:
    • If the pixel has value 0 (background):
      • Do nothing.
    • If the pixel has value 1 (an object):
      1. If the previous pixel had value 0:
        • Set the object ID counter value so that the new value is guaranteed to be unique (let’s call this unique ID U), c = U, since this can be a new object. It is important that the value is truly unique – mere increment by one is not enough.
      2. If there is an adjacent non-zero pixel above (see figure 6):
        1. Get the object ID of the adjacent pixel (let’s call this value as A).
        2. Backtrack (with a separate iteration) and replace all pixels, which have the current value of counter c, with value A.
          • You can stop backtracking, if you encounter a line with no pixels having the current value of the object ID counter.
        3. Set the value of the object ID counter to the value of the adjacent pixel (c = A) and continue the original iteration from the index before backtracking.
      3. Set the value of the current pixel to match the current value of the object ID counter (value of c).

Figure 6. Object map creation process.

See the ImageProcessingUtils::createObjectMap method where the aforementioned algorithm is implemented in C++.

Note that after applying the algorithm described above, you will not end up with an ordered object map (map with ordered IDs: 1, 2, …, N). Instead, you will have a map where each object has a unique ID, but they can be arbitrary, e.g. 2, 5, …, N + 100. Sorting the map is quite trivial: Get a list of current IDs (one iteration), create a map where each existing ID has an ordered counterpart (2 -> 1, 5 -> 2, …, N + 100 -> N) and replace the values in the object map (second iteration).
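To make the idea concrete, here's a minimal C# sketch that produces an object map using a straightforward flood fill instead of the backtracking approach described above (the names are mine; the production implementation in the project is the native C++ method linked above):

using System.Collections.Generic;

public static class ObjectMapper
{
    // Returns an object map where 0 = background and values greater than 0 are object IDs
    public static int[] CreateObjectMap(byte[] binaryImage, int width, int height)
    {
        int[] objectMap = new int[width * height];
        int nextId = 1;

        for (int i = 0; i < binaryImage.Length; i++)
        {
            if (binaryImage[i] == 0 || objectMap[i] != 0)
            {
                continue; // Background or already labeled
            }

            // Flood fill all connected pixels with the same ID
            var stack = new Stack<int>();
            stack.Push(i);
            objectMap[i] = nextId;

            while (stack.Count > 0)
            {
                int index = stack.Pop();

                foreach (int neighbor in Neighbors(index, width, height))
                {
                    if (binaryImage[neighbor] != 0 && objectMap[neighbor] == 0)
                    {
                        objectMap[neighbor] = nextId;
                        stack.Push(neighbor);
                    }
                }
            }

            nextId++;
        }

        return objectMap;
    }

    private static IEnumerable<int> Neighbors(int index, int width, int height)
    {
        int x = index % width;
        int y = index / width;

        if (x > 0) yield return index - 1;
        if (x < width - 1) yield return index + 1;
        if (y > 0) yield return index - width;
        if (y < height - 1) yield return index + width;
    }
}

This simpler variant also gives you the ordered IDs (1, 2, …, N) directly, so no separate sorting pass is needed.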

Identifying object shapes

Now that we have an object map, we can easily see the size of each object. But what about the shape? It depends on the way we want to classify shapes, e.g. whether we are interested in round shapes or squares. In addition, depending on our object extraction methods, the objects may not be perfectly extracted; some of what was part of the object could have been interpreted as background. For instance, in figure 6, the largest object could have been a ball with a large stripe on it that was lost when our chroma filter was applied. Luckily, there is a simple algorithm for “fixing” the shape, called convex hull.

Convex hull can be considered as an elastic band wrapped around an object. If we apply the algorithm to the first object in the object map we created earlier, the stripe in the center is discarded and a circle shape emerges:

Figure 7. Convex hull algorithm applied to the first object in the object map.

Figure 8. Convex hulls (green lines around the bird and red lines around a small piece on the X-Wing fighter side panel) drawn to surround the filtered red areas.

There are a number of different convex hull algorithms. The one used in this solution is called monotone chain (see the ImageProcessingUtils::createConvexHull method). More information about monotone chain, including code samples, can be found in Wikibooks: http://en.wikibooks.org/wiki/Algorithm_Implementation/Geometry/Convex_hull/Monotone_chain

Convex hull gives us a list of points (pixel coordinates), which form the outline of an object. However, we still have to determine the shape of the object somehow. If it’s a ball shape we’re after, we could find the center of the object – e.g. the center of mass, where one pixel equals one mass unit – and then see how much the outline differs from a circle whose radius is the width or height of the object divided by 2. The smaller the difference, the more the shape is like a circle, which of course is the shape of a ball in a two-dimensional presentation.
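A naive sketch of that circle check could look like the following (the Point type and the scoring are made up for illustration):

using System;
using System.Collections.Generic;

public struct Point
{
    public double X;
    public double Y;
}

public static class ShapeAnalyzer
{
    // Returns the average relative deviation of the hull points from a circle
    // centered at (centerX, centerY); the closer to 0, the more circle-like the shape
    public static double CircleDeviation(IList<Point> hullPoints, double centerX, double centerY, double radius)
    {
        double totalDeviation = 0;

        foreach (Point point in hullPoints)
        {
            double distance = Math.Sqrt(
                (point.X - centerX) * (point.X - centerX) + (point.Y - centerY) * (point.Y - centerY));
            totalDeviation += Math.Abs(distance - radius) / radius;
        }

        return totalDeviation / hullPoints.Count;
    }
}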

Now that we have found our object, let’s try to track it. See the next and final part of this thrilling trilogy and find out if we succeeded in our goal to detect the changes in object position!

Tracking Objects from Video Feed Part I: Image Data Formats

It so happened that a couple of months ago my colleague and I were introduced to an interesting challenge: Could we use a phone camera to capture and analyze the path of a moving object? There were (and still are) some wild ideas about what we could do in the not-so-distant future, but somewhat excited, we decided to investigate what we could do with the current resources (hardware and APIs) available. I personally prefer setting the bar high, but, of course, one needs to start from the ground up – at least with no super powers.

The mighty eagle will never know what hit him…

In this three part blog jamboree I’ll unveil where we got (so far) and how we did it.

Getting Started

So we had our challenge formulated: Identify an object from the video feed (the stuff that the camera on the device feeds you) while it’s stationary. Then lock onto that object – watch it like a hawk – and if it moves, try to see where it went. To be precise, “where it went” means finding its last position in the field of view (FOV).

It was apparent to us from the beginning that we wanted to go native to get the maximum performance. At this point we only had a very vague vision of the algorithms (and their complexity) we would likely use. Anyhow, we expected to be dealing with some heavy stuff. Luckily, another nice fellow at work hinted us to check out this MediaCapture API sample, and it turned out to be a good starting point. From there on we focused on understanding the image data formats in order to be able to process the data.

Image Data Formats

The image data received from camera hardware (the camera of a smartphone, tablet or simply an external webcam) comes in YUV color space. In short, Y (luma) defines the brightness while U and V (chroma) define the color of a pixel. The notation Y’CbCr would be more accurate, as Y’UV refers to an analog encoding scheme, but for the sake of simplicity we use U to denote Cb and V to denote Cr in this article.

Depending on the hardware, typically two specific YUV formats are used: YUV2, which is commonly used by e.g. webcams, and NV12, commonly used by smartphone cameras. Although they are both based on the YUV color space, their structures – the size and order of the Y, U and V bytes – are different. Thus, the frames need to be either converted to some specific format prior to processing or we have to implement separate methods to process the frames correctly based on the format.

 YUV2

YUV2 format has the following properties:

  • 4:2:2 chroma subsampling
  • 16 bits per pixel

YUV2 can be seen as a byte array where the first and then every second value is a Y (luma) value, while the chromas (U and V) fill the blanks in between so that U comes first, as shown in figure 1. Each chunk has the size of 4 bytes (32 bits).

Figure 1. YUV2 format.

For more details about the YUV2 format, visit the following pages:

 NV12

NV12 format has the following properties:

  • 4:2:0 chroma subsampling
  • 12 bits per pixel

Let’s consider a VGA image (640×480 pixels) encoded in NV12 format. The size of the Y plane covers the whole resolution at a byte per pixel, i.e. the Y plane is 640×480 bytes. The U/V plane, which follows, is half the size of the Y plane: 640×240 bytes (see the figure below).

Figure 2. NV12 image format.

The figure below depicts four pixels in NV12 format. As you can see, each pixel has a dedicated Y value, but only ½ of the chroma (¼ of the U and ¼ of the V values to be precise).

Figure 3. A four pixel section in NV12 image data.

The block in figure 2 on the right-hand-side describes the corresponding location of the bytes in NV12 byte array A, where h is the height of the image and w is the width of the image (in bytes).
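Put as code, the indexing boils down to something like this sketch (assuming a pixel at coordinates (x, y), with even x and y for simplicity):

public static void GetNV12PixelValues(byte[] A, int w, int h, int x, int y,
                                      out byte luma, out byte chromaU, out byte chromaV)
{
    int yIndex = y * w + x;                         // Y plane: one byte per pixel
    int uIndex = h * w + (y / 2) * w + 2 * (x / 2); // Interleaved U/V plane follows the Y plane
    luma = A[yIndex];
    chromaU = A[uIndex];
    chromaV = A[uIndex + 1];
}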

For more details about NV12 format, visit the following pages:

Coming next…

Check out the next part where we dive straight into the methods of extracting objects from video frames.

Will the caped crusader overcome the challenge or will he be trapped by the algorithm villains? Tune in tomorrow (or right now if you like) – same dev-time, same dev-channel!

Working with Conspicuous Devices (My One Obligatory IoT Article)

IoT! The new, hip word meaning “Internet of Things” – unlike many other trends that come and go, this one is here to stay. Oh, and grow! But enough of the hyped marketing talk; I’m not good at that anyways. What I want to offer to you, developers, in this short blog post, is a small part of Windows 10 offering for IoT related app development, specifically for Bluetooth LE (BLE) beacons.

What is a BLE beacon you ask? It’s a small piece of hardware, typically run by a small battery whose lifetime varies from months to many years. It does but one thing: Transmits a signal with a small payload over and over again (hence the name beacon). Beacons can be attached in many places, both stationary and mobile. Who knows – you could have one in your pants right now!

So, a beacon alone does not do anything useful, but think of what devices receiving the signals can do! Like the whole field of IoT, it’s hard to foresee all the use cases random tech enthusiasts devise with beacons and similar devices. I already worked with one of the visionaries in the field, a company called Sensorberg, and it’s hard not to get excited by their enthusiasm alone.

But, let’s cut to the chase (I promised this would be a short article)!

Windows 10 and its new converged Bluetooth stack

Windows 8.1 did not have enablers for developers to work with beacons, but this unfortunate shortcoming is spectacularly fixed in the spanking new Windows 10. Not only that, but the whole Bluetooth stack is now converged i.e. it’s the same on all devices running Windows 10. However, you should note that some of the features have hardware dependencies, which you have to take into consideration when developing universal apps. The good news is that the same code works everywhere; you just have to catch the possible exceptions in the case of missing hardware support.

The new namespaces for working with beacons are Windows.Devices.Bluetooth.Advertisement and Windows.Devices.Bluetooth.Background. The former is the one I’ll be focusing on in this article. The latter provides the means to work with beacons using a background task.

Implementing a tricorder

What does it take to make your Windows 10 device scan for beacons? Not much. You simply construct a BLE advertisement watcher instance, give it some filters, start it and wait for beacons to come in range. Then, simply catch the event and do something with the data you received.

“Look Geordi! I received a coupon code!”

In code setting the watcher up and starting it is done like this (based on the snippet taken from the official Microsoft sample):

BluetoothLEAdvertisementWatcher watcher =
    new BluetoothLEAdvertisementWatcher();

var manufacturerData = new BluetoothLEManufacturerData();

// Then, set the company ID for the manufacturer data.
// Here we picked an unused value: 0xFFFE
manufacturerData.CompanyId = 0xFFFE;

// Finally set the data payload within the manufacturer-specific section
// Here, use a 16-bit UUID: 0x1234 -> {0x34, 0x12} (little-endian)
var writer = new DataWriter();
writer.WriteUInt16(0x1234);

// Make sure that the buffer length can fit within an advertisement payload.
// Otherwise you will get an exception.
manufacturerData.Data = writer.DetachBuffer();

// Add the manufacturer data to the advertisement filter on the watcher:
watcher.AdvertisementFilter.Advertisement.ManufacturerData.Add(manufacturerData);

watcher.Start();

To catch the received beacon data you must hook into the BluetoothLEAdvertisementWatcher.Received event, where you get the data encapsulated in BluetoothLEAdvertisementReceivedEventArgs. You will find all the transmitted data there as a raw byte array, and some of the data is provided as properties for convenience. You can check out the format of the beacon data here.
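Here's a minimal sketch of hooking into that event, continuing from the watcher snippet above (what you do with the payload is up to you):

watcher.Received += OnAdvertisementReceived;

// ...

private void OnAdvertisementReceived(
    BluetoothLEAdvertisementWatcher sender,
    BluetoothLEAdvertisementReceivedEventArgs args)
{
    // Signal strength and address of the received advertisement
    short signalStrengthInDBm = args.RawSignalStrengthInDBm;
    ulong bluetoothAddress = args.BluetoothAddress;

    // The manufacturer-specific sections contain the raw payload bytes
    foreach (var manufacturerData in args.Advertisement.ManufacturerData)
    {
        byte[] payload = new byte[manufacturerData.Data.Length];
        Windows.Storage.Streams.DataReader.FromBuffer(manufacturerData.Data).ReadBytes(payload);

        // Parse the payload (e.g. the beacon ID) here
    }
}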

If at this point you are too eager to jump right into code, I don’t mind. You can check out the official Microsoft sample code here or take a look at my awesome sample here.

Your device can be a beacon too

Windows 10 also allows you to make your device function as a beacon (and remember that Windows 10 runs on all kinds of devices, including even the smallest ones). This is handy for a number of reasons – probably many use cases exist that I can’t even imagine yet – but of course, the obvious one is testing your app: Unlike a standard, physical beacon, the beacon ID, used to identify a certain beacon, can be changed dynamically and you can easily start or stop broadcasting with a push of a button.

For turning your device into a beacon there’s a class called BluetoothLEAdvertisementPublisher. Using it is just as simple as using the watcher: construct the instance, give it a payload and hit start! Here’s an example (based on the snippet taken from the official Microsoft sample):

// Create and initialize a new publisher instance.
BluetoothLEAdvertisementPublisher publisher =
    new BluetoothLEAdvertisementPublisher();

// We need to add some payload to the advertisement. A publisher without any payload
// or with invalid ones cannot be started. We only need to configure the payload once
// for any publisher.

// Add a manufacturer-specific section:
// First, let's create a manufacturer data section
var manufacturerData = new BluetoothLEManufacturerData();

// Then, set the company ID for the manufacturer data. Here we picked an unused value: 0xFFFE
manufacturerData.CompanyId = 0xFFFE;

// Finally set the data payload within the manufacturer-specific section
// Here, use a 16-bit UUID: 0x1234 -> {0x34, 0x12} (little-endian)
var writer = new DataWriter();
UInt16 uuidData = 0x1234;
writer.WriteUInt16(uuidData);

// Make sure that the buffer length can fit within an advertisement payload. Otherwise you will get an exception.
manufacturerData.Data = writer.DetachBuffer();

// Add the manufacturer data to the advertisement publisher:
publisher.Advertisement.ManufacturerData.Add(manufacturerData);

publisher.Start();

Note that the advertising feature is a limited hardware resource, which can be used by multiple apps. So, unless your app is the only one using it on your device, be prepared to wait for the resource to become available. Luckily, you can hook into the BluetoothLEAdvertisementPublisher.StatusChanged event. One of the statuses is “Waiting”.
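For example, reacting to the status changes could look roughly like this (continuing from the publisher snippet above):

publisher.StatusChanged += OnPublisherStatusChanged;

// ...

private void OnPublisherStatusChanged(
    BluetoothLEAdvertisementPublisher sender,
    BluetoothLEAdvertisementPublisherStatusChangedEventArgs args)
{
    if (args.Status == BluetoothLEAdvertisementPublisherStatus.Waiting)
    {
        // The advertising resource is in use by another app - wait for it to become available
    }
    else if (args.Status == BluetoothLEAdvertisementPublisherStatus.Started)
    {
        // The device is now advertising as a beacon
    }
}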

 

Please, have my code

Screenshot of BLE Beacon Sample

You can check out my BLE beacon sample, which does both scanning (using the watcher) and advertising (using the publisher). It allows you to enter the desired beacon IDs. I tried to keep the code as simple as I could by adding two utility classes: Beacon and BeaconFactory. The sample is hosted in GitHub here.

Wait, there’s more…

You might have noticed that I did not cover beacon scanning scenario where the application is in the background. That’s because my dear colleague, Juhana Koski, has already covered that in his article, and it also comes with a code sample.

Do also check out this great session on BLE advertisement APIs from the 2015 //build/ conference.

Finally, for a proper developer, which I’m sure you are, code speaks more than… umm… 1000 words which are not code. Thus, check these samples to get a quick dive-in to the world of BLE beacons on Windows 10 universal platform:

Staying in control

When working on an app even a bit more complex than a couple of views, you quickly find yourself in need of creating either custom UI components (user controls in Windows development terms) or at the very least composite components. Whenever I find myself in this kind of situation, I try to generalize the components and make them as self-contained as possible. In my opinion this approach has two benefits: First, obviously, I can easily use them later in other projects. Second, it makes for a better architecture and makes it easier to have multiple instances of the component in the same project.

I find this to be a standard practice; in most cases when you run into this situation, you find a ready-made solution to your problem on Stack Overflow. Usually it’s in the form of a snippet that you just copy into your project, and sometimes you get the complete user control from some project. I recently worked on an app and found there were no solutions to a couple of my problems, so I thought I would fill in the gap and provide them here. Both solutions are quite trivial (except for the first one, if you target a specific platform/framework version), but the thought of saving the precious time of any developer facing the same problem makes me happy.

Sliding panel user control

 

So, to the point. Behold, a sliding panel! With buttons, text, bells and whistles! The user can drag it or animate it by tapping an icon or a button. The problem here isn’t the composite nature nor the way it is manipulated (dragging), but the performance when your project is built on a specific framework, namely Windows Phone Silverlight. I try to always work with the latest frameworks, but sometimes it’s just not possible. The performance trick used here is very conventional: Render the whole layout in a bitmap and then animate that. The nice thing about this component is that it works in both Silverlight and Windows Universal apps. It is fully self-contained.

See more detailed description and get the source code from my GitHub project.

Ticker text user control

In image: Three ticker text user controls in StackPanel layout.

This is a quite trivial and rarely needed UI component. To be honest, I was quite surprised not to find a version of this anywhere on the interwebs. I suppose one must be out there somewhere and my search engine skills are just below average. Or it could be that this component is so trivial that no one bothers to even look for a ready-made solution. In any case, since I lack all discretion, I dumped my user control here (or actually to GitHub) anyways.

In case the “ticker” does not ring any bells, it’s the component that has scrolling text on it, much like you see in the lower part of your television set when watching the news. My control, like the sliding panel, is supported on Windows Phone Silverlight and in Windows Universal apps.

Find the code in GitHub.