Remotely controlled bots

You know. Because it’s good to have a fail-safe around in case of Skynet.

But to the point: Let’s say you have a great backend-heavy application and you want to deliver a bot experience to broaden your user base and to provide a new way of interacting in your app context. Well, your new bot can surely be made to access the application data in your backend, but how can you make it work the other way around – say, in the case of notifications?

Controlling a bot from the backend. (That’s the bot on the right, by the way.)

Hence the title – by remote control I simply refer to a backend controlling the bot remotely (over the interwebs) with messages that can be interpreted as commands to execute an action.

Backchannel

That’s what we call it and apparently it’s just one word. What we mean by the word is a type of message (like the ones users send to talk to a bot and the bot uses to reply back), but we put the meaningful content in a different place in the message object (Activity in C#). Namely, we put the message the bot should somehow react to in IMessageActivity.ChannelData instead of IMessageActivity.Text. Tadaa! End of article.

No, but it really is that simple! In a nutshell, you devise a simple custom protocol that your bot knows: for example, when IMessageActivity.Text contains “notification”, you look at the channel data content to see whom to notify and with what message. Then let your implementation in the bot code do its job. Still don’t believe me? Look, here’s a sample (in C#).
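
To make this concrete, here’s a minimal sketch of what the bot-side check could look like; the “notification” command, the NotificationData payload class and the NotifyUserAsync helper are made-up names for illustration, not something from the sample:

if (activity.Type == ActivityTypes.Message && activity.Text == "notification")
{
    // The payload travels in the channel data as JSON;
    // NotificationData and NotifyUserAsync are hypothetical
    var notification = JsonConvert.DeserializeObject<NotificationData>(activity.ChannelData.ToString());
    await NotifyUserAsync(notification.UserId, notification.Message);
}
else
{
    // A regular message from a user, let the dialogs handle it
    await Conversation.SendAsync(activity, () => new RootDialog());
}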

Ok, you got me. What I failed to mention is that you have to have some Microsoft Bot Framework specific code in your backend. Perhaps the easiest way to implement this backchannel messaging pipeline between the backend and the bot is using Direct Line. And the easiest way to use Direct Line is by utilizing the ready-made client components for Node.js and C#. If your backend is not compatible with the Node or C# components, implementing your own Direct Line connection is quite straightforward (the first link about Direct Line describes the protocol). They are, after all, only HTTP calls. My sample comes with a super simple console app sending notification commands to the bot. You should be able to use the code almost as-is, if your backend is built with C#.
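
If you do end up rolling your own connection, a rough sketch of the backend side using plain HTTP against the Direct Line 3.0 REST API could look like this; the endpoint paths and payload shape are written from memory here, so double-check them against the protocol documentation, and directLineSecret is of course your own secret:

using (var client = new HttpClient())
{
    client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", directLineSecret);

    // Start a new conversation with the bot
    var startResponse = await client.PostAsync("https://directline.botframework.com/v3/directline/conversations", null);
    var conversationId = JObject.Parse(await startResponse.Content.ReadAsStringAsync())["conversationId"];

    // Send the backchannel command; the actual payload travels in channelData
    var activityJson = new JObject
    {
        ["type"] = "message",
        ["from"] = new JObject { ["id"] = "backend" },
        ["text"] = "notification",
        ["channelData"] = new JObject { ["userId"] = "...", ["message"] = "Your order has shipped!" }
    };

    await client.PostAsync(
        $"https://directline.botframework.com/v3/directline/conversations/{conversationId}/activities",
        new StringContent(activityJson.ToString(), Encoding.UTF8, "application/json"));
}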

What about security? I’m not an expert, but there are three points here I want to make:

  1. The Direct Line pipeline is secured by a secret key and TLS
  2. The user cannot inject content to the channel data (think of SQL injection vulnerability) as long as the channel (e.g. Skype) is secure
  3. You can encrypt the channel data content

Note that some descriptions of the backchannel say that you should also change the value of the Type property of your Activity from “message” to “event”. This is a matter of taste. The benefit is that you can be sure your backchannel message is never treated as a regular message (because the type is not “message”).

Where to, sir?

Where am I supposed to place this backchannel messaging specific code in my bot project? To me, this introduces some controversy: The Bot Framework utilizes Autofac, an inversion of control (IoC) container, for dealing with dependencies, and I am not a fan. In my opinion, wide use of IoC leads to incoherent code and architecture with little benefit to offer. And it can make writing tests (which I don’t do, unlike true professionals I guess) a pain! But that’s just me – maybe my brain is not sophisticated enough to understand these kinds of exquisite concepts.

Just to show I can do things I don’t like, I integrated the backchannel bot code using Autofac in my sample. Take a look at GlobalMessageHandlerModule.cs and Global.asax.cs. I’ve created classes derived from ScorableBase, which are automatically invoked when (and only when) I forward the received Activity object to my root dialog in MessagesController.cs. Then, if a backchannel message is detected, the specific scorable class (NotificationsScorable in my sample) consumes and deals with the Activity, and it is never given to my dialog. Special thanks to my brilliant colleague, Lilian Kasem, for coming up with this idea!

Call me old-fashioned, but I still find the code a lot easier to understand if I simply put this logic to my MessagesController class (or equivalent) before passing anything to any dialog. That’s just the way I roll…

if (we got a valid backchannel message)
{
    // Do what needs to be done
}
else
{
    // Looks like a message from a user, let the dialog handle it
    await Conversation.SendAsync(activity, () => new RootDialog());
}

See?

Related resources

 

Chatbots as middlemen

Chatbots typically serve their customers on a 1:1 basis. They are not unlike digital assistants (Cortana, Siri, Alexa etc.) except that a chatbot is usually designed to execute a small number of predefined tasks well and to focus on a narrow subject, like filling a pizza order for example.

Building chatbots is easy, but making them clever is more difficult. Despite all the analytics on user behavior, it is still impossible to anticipate every user reaction. As Conversation as a Platform (CaaP) technology evolves, creating more intelligent bots becomes easier and easier, but until Skynet grows self-aware, humans still serve a purpose.

Imagine a customer service chat on a website. A bot can probably handle most of the problems a customer could have. For instance, implementing a simple FAQ bot is trivial using Microsoft’s QnA Maker. Add some additional intellect, such as a natural language understanding service, and you have an efficient customer service bot on your hands that 9 out of 10 customers are perfectly happy with. But for that one customer, you might want to consider a fallback: Let the human – in this case a customer service agent – take over to ensure customer satisfaction.

As long as you have the human labour, implementing this isn’t rocket science. What you need to do is as follows:

  1. Make sure your bot keeps track of all the individuals the bot sees (but remember privacy policies!)
  2. Make sure your bot also keeps track of itself. This might sound weird at first, but I’ll make the reason apparent soon.
  3. Design the handover scenario. It could be based on sentiment analysis or simply a request of help by the user.
  4. Implement the message relaying logic (don’t worry – there are samples available!)

How and why to keep track of people and bots

By keeping track I mean collecting the contact information of a user (and the bot – I’ll explain later) from the bot’s perspective. You can’t send a postcard to a person without knowing his or her address. The same applies to the bot framework: You cannot send a message to a user without knowing the IDs of the user and the conversation. What you’ll need at least are:

  • The service URL
  • The channel ID
  • The channel account (the user’s ID)
  • The conversation account (the conversation’s ID)

The aforementioned details may be enough, but this depends on the channel (Skype, MS Teams, Slack etc.) You might as well store all the details as shown in the following tables.

Table 1. Identities in Skype (all values are of type string).
| | Me | Bot |
| --- | --- | --- |
| ServiceUrl | https://skype.botframework.com | https://skype.botframework.com |
| ChannelId | skype | skype |
| ChannelAccount.Id | 29:1byUvXHHhinNxwnPCHh4MPhpfiJUbadX_Y3_sTkBspdiSke8sX_Ps6riTYRVez5jT | 28:f99fa2c3-8834-418e-b293-039205238055 |
| ChannelAccount.Name | Tomi Paananen | Intermediator Bot Sample |
| ConversationAccount.Id | 29:1byUvXHHhinNxwnPCHh4MPhpfiJUbadX_Y3_sTkBspdiSke8sX_Ps6riTYRVez5jT | 29:1byUvXHHhinNxwnPCHh4MPhpfiJUbadX_Y3_sTkBspdiSke8sX_Ps6riTYRVez5jT |
| ConversationAccount.Name | (N/A in direct conversation) | (N/A in direct conversation) |

The values above are from a direct conversation in Skype between my bot and me. As you can see, the channel account ID (read: my user ID) and the conversation account ID match, but that isn’t necessarily the case in other channels.

Table 2. Identities in Slack.
| | Me | Bot |
| --- | --- | --- |
| ServiceUrl | https://slack.botframework.com | https://slack.botframework.com |
| ChannelId | slack | slack |
| ChannelAccount.Id | U1F3JK9A9:T1F248PJ8 | B2NSU1D4Z:T1F248PJ8 |
| ChannelAccount.Name | tomi | intermediatorbot |
| ConversationAccount.Id | B2NSU1D4Z:T1F248PJ7:C3B1ZK5D0 | B2NSU1D4Z:T1F248PJ7:C3B1ZK5D0 |
| ConversationAccount.Name | bottest | bottest |

So why do we need the bot’s identity stored too? As you can see, the same bot has a different identity on each channel and in each conversation. When we send a message to a user, we need to specify who the message is from, and some channels, Slack for example, don’t allow you to send messages from bots that aren’t actually there. So in order to relay a message from one user to another on a different channel (e.g. from Skype to Slack), we need to know and use the bot’s identity in Slack in the from field.

Briefly about the technical implementation: In a bot built with C#, all the activities flow through the MessagesController class, and that’s the ideal place to keep track of everything. As for the bot, it is always the recipient when a new activity arrives, and that’s how you store the bot identities. See Sending and Receiving Activities for more information.
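
A rough sketch of that bookkeeping in MessagesController might look like the following; IdentityStore.AddOrUpdateParty is a made-up storage helper, not part of any SDK:

public async Task<HttpResponseMessage> Post([FromBody] Activity activity)
{
    // The sender of an incoming activity is the user...
    IdentityStore.AddOrUpdateParty(activity.ServiceUrl, activity.ChannelId, activity.From, activity.Conversation);

    // ...and the recipient is always the bot itself, which is how we learn
    // the bot's identity on this particular channel and conversation
    IdentityStore.AddOrUpdateParty(activity.ServiceUrl, activity.ChannelId, activity.Recipient, activity.Conversation);

    if (activity.Type == ActivityTypes.Message)
    {
        await Conversation.SendAsync(activity, () => new RootDialog());
    }

    return Request.CreateResponse(HttpStatusCode.OK);
}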

Finally, store the records of the users and the bot somewhere on the web, e.g. in the Azure Table storage service. Note: My sample stores the data locally (in memory), which is never, ever a good idea, because bots are essentially web apps and can have multiple instances!

Comparison to Node.js

The essentials for relaying messages are the same whether you are building your bot using C# or Node.js SDKs. However, there are differences between the SDKs and some things are handled differently.

Table 3. Node.js counterparts for establishing user/bot identity.
| C# | Node.js | Node.js example |
| --- | --- | --- |
| Activity.ServiceUrl | IChatConnectorAddress.serviceUrl | session.message.address.serviceUrl |
| Activity.ChannelId | IAddress.channelId | session.message.address.channelId |
| ChannelAccount.Id | IIdentity.id | session.message.address.bot.id / session.message.address.user.id |
| ChannelAccount.Name | IIdentity.name | session.message.address.bot.name / session.message.address.user.name |
| ConversationAccount.Id | IIdentity.id | session.message.address.conversation.id |
| ConversationAccount.Name | IIdentity.name | session.message.address.conversation.name |

Samples

How to create dynamic FormFlow

You might’ve guessed it from the title – Yes, this is one of my no-nonsense posts. Strictly business and by business I mean code-talk.

This post is about a building block in Microsoft Bot Framework called FormFlow and namely how to add dynamic behavior to the flow when building bots using C#. If you are not familiar with FormFlow, I suggest you study the basics before reading further. Just the basics though, that’s enough.

Building simple FormFlows is… well, simple! This is a method that creates a basic form:

public static IForm<MyClass> BuildForm()
{
    var builder = new FormBuilder<MyClass>();

    return builder
        .Field(nameof(MyClass.Property1))
        .Field(nameof(MyClass.Property2))
        ...
        .Build();
}

Not difficult at all! And you can influence the behavior using, for instance, property attributes like:

[Serializable]
public class MyClass
{
    [Prompt("What would you like the value of this property to be?")]
    public string Property1 { get; set; }

    ....
}

But what if you want to do some of the following:

  • Skip the questions you already know the answer to based on what the user said earlier?
  • Define the options presented for the user dynamically?
  • Change the way the question and options are presented to the user?
  • Validate the user’s response?
  • Customize the behavior of the form on the fly?

FormFlow tries to do some of the aforementioned things for you automatically. However, to achieve a great experience you usually have to do a bit more work, and luckily, it is possible. See the resources available under the Microsoft.Bot.Builder.FormFlow.Advanced namespace. One very useful class there is called FieldReflector. Whilst you can still add ActiveDelegate and ValidateAsyncDelegate using the overloaded Field method in FormBuilder, FieldReflector allows you to do more:

.Field(new FieldReflector<MyClass>(nameof(MyClass.PropertyX))
    .SetType(null) // the field's options are defined dynamically in SetDefine below
    .SetActive((state) => SetFieldActive(state, nameof(MyClass.PropertyX)))
    .SetDefine(async (state, field) => await SetOptionsForFieldsAsync(state, nameof(MyClass.PropertyX), field))
    .SetAllowsMultiple(true) // Single selection vs. multi-selection
    .SetPrompt(new PromptAttribute("What type of values should this property have? {||}"))
    .SetValidate(async (state, value) => await ValidateResponseAsync(value, state, nameof(MyClass.PropertyX))))

…where SetFieldActive, SetOptionsForFieldsAsync and ValidateResponseAsync are methods defined and implemented by the developer. See this class implementing the building of the form in my Dynamic FormFlow Sample. The snippet provides a solution to all the questions presented in the bullet point list above. In case you are curious how this enables customizing the behavior on the fly, notice that you can run any arbitrary code in your response validation method (ValidateResponseAsync in the snippet).
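
To give you an idea of what those delegates might look like, here’s a hypothetical sketch; the state members and helper methods (IsAlreadyKnown, FindValidOptionsAsync, IsValidSelectionAsync) are made up for illustration:

private static bool SetFieldActive(MyClass state, string fieldName)
{
    // Skip the question if the answer is already known,
    // e.g. deduced from an earlier reply (assumed helper on the state class)
    return !state.IsAlreadyKnown(fieldName);
}

private static async Task<bool> SetOptionsForFieldsAsync(MyClass state, string fieldName, Field<MyClass> field)
{
    // Define the options presented to the user dynamically,
    // e.g. based on a catalog search (assumed helper)
    foreach (string option in await FindValidOptionsAsync(state, fieldName))
    {
        field.AddDescription(option, option)
             .AddTerms(option, option);
    }

    return true;
}

private static async Task<ValidateResult> ValidateResponseAsync(object value, MyClass state, string fieldName)
{
    // Any arbitrary code can run here, e.g. a search against the catalog
    // using the details gathered so far (assumed helper)
    bool isValid = await IsValidSelectionAsync(state, fieldName, value);

    return new ValidateResult
    {
        IsValid = isValid,
        Value = value,
        Feedback = isValid ? null : "Sorry, that option is not available."
    };
}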

In my Dynamic FormFlow Sample I use FormFlow to narrow down a spaceship selection from a static catalog of spaceships. Therefore, it is important that I don’t bother the user with unnecessary options; the user might’ve already told me that he/she is looking for a small ship and thus I shouldn’t later ask to select from options only available for large ships. That’s simply bad UX! Some queries I can skip in case there is only one option available. In my response validation method I do a search against the catalog using the details I’ve gathered from the user so far, and if the response is valid, I store the search results for later. With this approach I can narrow down the selection with as few questions as possible and without presenting the user with options that make no sense.

The caveats

Playing with FormFlow is not all a bed of roses. I had to fight a few errors that at first seemed odd, until I got the hang of it. Here are some of the things to keep in mind:

  • After you build your form, it is built. It now exists. And you cannot really control the instance anymore. Why is this important to understand? Because you cannot say for certain when the methods (delegates) in your form are called. It’s now in the hands of the Bot Framework. So make sure your methods (delegates) work in any situation! For starters, have null checks.
  • Do not use the IList interface as a property type in the class where you are collecting user input (Spaceship.cs in the sample). It won’t work and you’ll get a FormCanceledException with “Cannot create an instance of an interface” message. Use List instead, it works.
  • Realize that you don’t have to do everything in the form; after the form is complete, you can continue with the data in a Dialog and ask further questions etc. That’s what I did in my sample, see SpaceshipSelectionDialog.cs.

And a top tip: If you are new to FormFlow, implement and test one complex field at a time.

So it’s a magic bullet?

No, it is not. FormFlow is a handy building block, but will not solve all problems. Duh.

If you feel like your FormFlow code is turning into a horrible, uncontrollable mess and you feel like you need to compromise the UX, stop. Stop using FormFlow. You can do the same using Dialogs, and with really complex scenarios it will most likely be much easier too. FormFlow is a solution for fairly straightforward forms – it was never meant to be used with overly complex flows. Or at least I think that’s the case.

I should have a post about managing dialog flows out soon, but in the meanwhile, here’s the big secret: IDialogContext.Wait(<method name>) lets you define the next method that will process the next user response.
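
Here’s a minimal sketch of that pattern; the dialog methods and the question are made up, but the Wait call is the point:

public async Task MessageReceivedAsync(IDialogContext context, IAwaitable<IMessageActivity> result)
{
    var message = await result;
    await context.PostAsync("What size of ship are you looking for?");

    // The next message from the user will be handled by OnSizeAnsweredAsync
    context.Wait(OnSizeAnsweredAsync);
}

private async Task OnSizeAnsweredAsync(IDialogContext context, IAwaitable<IMessageActivity> result)
{
    var answer = await result;

    // ...continue the flow based on the answer and pick the next handler
    context.Wait(MessageReceivedAsync);
}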

 

 

Privates exposed

We’re all tempted, right… But don’t do it. Just don’t.

Access modifiers, as we all know, are a fundamental part of object-oriented languages. When used correctly, they help to provide clear interfaces for classes through data encapsulation and allow carefree development of software using various APIs. When you see a private (or protected) method, you think there’s a good reason why the one who implemented the class decided to do so. If for some reason you do need to go further than the public API allows you to, you usually find a valid workaround – and even then you question whether there is a better way to achieve what you are trying to accomplish.

However, in some extremely rare cases you might find yourself in a situation where there is no workaround and without access to some protected/private method you are facing a wall. Or the possible workaround costs you hours or even weeks more work when with the access you could be done in just a few minutes. What to do? Well, it’s up to you, but if you really want to take the easy (but risky) path, you can, since there really are no such things as protected or private. An access modifier is more than a recommendation though, and you should think not twice but N times before dismissing one.

You have been warned

Accessing private bits in C++ is a bit tricky. The method I’d recommend is to get the address of, i.e. pointer to the function in question. You might need to calculate the offset from e.g. the class pointer, but this can be done by ye olde trial-and-error method. You may also consider trying #ifdef hacks, but those could drive you crazy with all the other errors they might cause.

In other languages, namely those that support the magicks of reflection (Java and C# for instance), things can be far simpler. For example, in (Android) Java you access and invoke a private method as follows:

Method method = SomeClass.class.getDeclaredMethod("methodName");
method.setAccessible(true);
SomeClass someClass = new SomeClass();
method.invoke(someClass);

The constructors and members are accessible in a similar fashion. See Class and Method classes for more information.

Note that even though your code accessing and invoking private methods works now, you cannot rely on it to work in the future. If – and oftentimes when – the code you’re referencing and the private method signature changes, your code will throw a NoSuchMethodException. Therefore, it’s a no-brainer to surround the code with try-catch. But what then? What do you do when an exception is thrown and you’ve caught it? Albeit this is from programming 101, I’m gonna say it: Handle the exception gracefully; your application has to perform even when your hack of access violation trickery doesn’t! Same goes regardless of what your weapon (language) of choice is.

I warned you

 

Case Android Bluetooth socket

I was working on a cross-platform peer-to-peer web project called Thali. Furthermore, I was in charge of the native Android layer of the project (see Thali Android Connector Library). We had had issues (in addition to a number of other problems) with failing Bluetooth sockets, namely in the connection process.

We noticed that many developers reported better results using a workaround where they created a socket with a specified port. One uses the BluetoothDevice class to construct BluetoothSocket instances. However, using the publicly available methods (read: the methods intended to be used) to create sockets, one cannot define the port – instead the port is decided for you. If you really want to define the port yourself, there is a way: Use reflection to invoke the method with which you can define the port. And it’s not even protected/private, it just cannot be called directly:

// bluetoothDevice is an instance of BluetoothDevice class
Method method = bluetoothDevice.getClass().getMethod("createRfcommSocket", new Class[] { int.class });
BluetoothSocket bluetoothSocket = (BluetoothSocket) method.invoke(bluetoothDevice, 1); // 1 is the port number

This solution didn’t work for us since Thali project uses insecure RFCOMM sockets vs. the secure ones and the method for constructing insecure sockets with a specified port number is neither public nor available. Thus, to accomplish the same effect as the aforementioned code snippet, one has to access the private constructor of the BluetoothSocket class. So I created a helper method which allows you to construct both secure and insecure BluetoothSocket instances with the desired channel/port (see BluetoothUtils class in Thali Android Connectivity Library project):

public static BluetoothSocket createBluetoothSocketToServiceRecord(
        BluetoothDevice bluetoothDevice, UUID serviceRecordUuid, int channelOrPort, boolean secure) {
    Constructor[] bluetoothSocketConstructors = BluetoothSocket.class.getDeclaredConstructors();
    Constructor bluetoothSocketConstructor = null;

    for (Constructor constructor : bluetoothSocketConstructors) {
        Class<?>[] parameterTypes = constructor.getParameterTypes();
        boolean takesBluetoothDevice = false;
        boolean takesParcelUuid = false;

        for (Class<?> parameterType : parameterTypes) {
            if (parameterType.equals(BluetoothDevice.class)) {
                takesBluetoothDevice = true;
            } else if (parameterType.equals(ParcelUuid.class)) {
                takesParcelUuid = true;
            }
        }

        if (takesBluetoothDevice && takesParcelUuid) {
            // We found the right constructor
            bluetoothSocketConstructor = constructor;
            break;
        }
    }

    if (bluetoothSocketConstructor == null) {
        Log.e(TAG, "createBluetoothSocketToServiceRecord: Failed to find the expected constructor");
        return null;
    }

    // This is the constructor we should now have:
    // BluetoothSocket(int type, int fd, boolean auth, boolean encrypt, BluetoothDevice device,
    //      int port, ParcelUuid uuid) throws IOException

    // Create the parameters for the constructor
    Object[] parameters = new Object[] {
            Integer.valueOf(1), // BluetoothSocket.TYPE_RFCOMM
            Integer.valueOf(-1),
            Boolean.valueOf(secure),
            Boolean.valueOf(secure),
            bluetoothDevice,
            Integer.valueOf(channelOrPort),
            new ParcelUuid(serviceRecordUuid)
    };

    bluetoothSocketConstructor.setAccessible(true);
    BluetoothSocket bluetoothSocket = null;

    try {
        bluetoothSocket = (BluetoothSocket)bluetoothSocketConstructor.newInstance(parameters);
        Log.d(TAG, "createBluetoothSocketToServiceRecord: Socket created with channel/port " + channelOrPort);
    } catch (Exception e) {
        Log.e(TAG, "createBluetoothSocketToServiceRecord: Failed to create a new Bluetooth socket instance: " + e.getMessage(), e);
    }

    return bluetoothSocket;
}

What good did it do?

None. Jacksh*t! It did no good at all as far as I can tell.

“Paskaaks se mitään teki.” (Finnish for, roughly, “Like hell it did anything.”)

The hack didn’t solve our problems. Turns out the problem was elsewhere and my own fault (I’ll let you in on a secret, if you haven’t realized it by now: I’m not a guru. I’m not a master programmer. I’m your average software developer and, if anything, I’m lazy enough to find quick, clean solutions to problems that usually work.) That said, the hack might have proved useful on earlier versions of Android, but the possible platform issue was most likely fixed on Lollipop and newer. With the hack the Bluetooth socket worked just as well as without the trickery, and when it was bound to fail, it did so regardless.

So as final words I give you…

Reasons why NOT to access protected/private stuff

  1. 99.9 times out of 100, there’s really no need – work around it!
  2. Given that whoever wrote the code is not a complete tool, they made it inaccessible for a reason.
  3. Your hack most probably won’t be sustainable. It will break. Just wait and see. Unless, of course, no one will eveeeeer touch the code you’re referencing.
  4. As per the aforementioned – you have to keep maintaining your code constantly to make sure it stays up-to-date with the code you are referencing.
  5. You’re just asking for trouble.
  6. Go to 1.
Run away: the recommended action.

(Universal Windows apps)^2

The great majority of apps built for Windows 8.1/Windows Phone 8.1 work on Windows 10 as-is – no changes required whatsoever. But what if you want to leverage the new APIs provided by Windows 10, such as the inking API, while still supporting the Windows 8.1 version of your app? Or you might be among the few unfortunate ones who have been using some API deprecated on Windows 10; the UserInformation class no longer works on Windows 10 and you have to use the User class instead. How do you do that without duplicating the code base and having two completely separate app projects to maintain? In this article I’ll describe two approaches.

Shared code and assets in portable project

The first approach is to include all the shared code (in practice that can be almost all of your code) to a separate portable project in your Windows 8.1 solution. First you need to create the project: Right click your solution in the solution explorer, hover on Add and select New Project…

Adding a new project to a solution

Use Class Library as the project type, name it and hit OK.

Creating a class library project

Drag all the code and asset files you want to share between both the Windows 8.1 and Windows 10 app to the newly created Class Library project.

Note that if you have a solution that supports both Windows 8.1 and Windows Phone 8.1, you have to keep at least a partial main page (the page you navigate to on start-up) in the original Windows 8.1 and Windows Phone 8.1 projects. This is due to the fact that you can’t add a reference to your Class Library project in the Shared (Windows 8.1/Windows Phone 8.1) project where your App class lives. And without the reference you can’t make your app navigate to a page defined in your Class Library project on start-up. Makes sense? Ok, cool, let’s carry on…

Now that we have the code moved to the Class Library project, we must add it as a reference to the other projects so that we can access the classes and assets. Right click the References under the projects in the solution explorer and select Add Reference…

Adding references to a project

On the Projects tab you should now find the Class Library project. Check the checkbox and click OK.

Adding a project in the solution to another as a reference

Now fix any minor problems you may have and once your app builds and runs it is time to move on to work on the Windows 10 solution. Create a new Universal Windows 10 application project and add the Class Library project containing the shared code to the Windows 10 solution as an existing project:

Adding an existing project to a solution

Add the Class Library project as a reference to your main Windows 10 project (as explained before), make your main project use the shared code and you’re all set! Fine – I realize it’s not often this simple and you need to do some tweaking to get all the other dependencies working and so on, but these are the first steps to take.

If you now want to extend the app on Windows 10 by utilizing the cool new APIs, you need to add that specific code to the main project. You can’t, of course, access the code in the main project from the shared code (for many reasons, one being that this would create a circular dependency), but one solution is to define interfaces in the shared code and provide the implementations from the main project. See my example, namely the IUserInformationHelper interface in the Class Library, the Windows 10 UserInformationHelper implementation and App.xaml.cs where the implementation is provided.
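
As a sketch of the idea (the interface and class names match my example, but the bootstrapping hook at the end is made up):

// In the shared Class Library project:
public interface IUserInformationHelper
{
    Task<string> GetUserDisplayNameAsync();
}

// In the Windows 10 project, implemented with the new User API:
public class UserInformationHelper : IUserInformationHelper
{
    public async Task<string> GetUserDisplayNameAsync()
    {
        var users = await Windows.System.User.FindAllAsync();
        var displayName = await users[0].GetPropertyAsync(Windows.System.KnownUserProperties.DisplayName);
        return displayName?.ToString();
    }
}

// In App.xaml.cs (or wherever the shared code is bootstrapped),
// the Windows 10 implementation is handed to the shared code (hypothetical hook):
SharedCode.UserInformationHelper = new UserInformationHelper();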

Pros

  • Allows management of the shared code as a single project

Cons

  • Other dependencies (Nuget packages and such) may cause problems e.g. if they aren’t as universal and work only on Windows 8.1 and not on Windows 10
  • You cannot use conditional preprocessing blocks in the shared code (#if) to target a specific platform since the compilation symbols are also shared
Conditional compilation symbols in project preferences (WINDOWS_UWP is for Windows 10 apps)

Shared code and asset files as links

Another way of sharing code between solutions is adding the code and asset files as links. Using links you don’t have to change your existing solution. Simply create a new – in this case – Windows 10 application project and start adding the files from your existing Windows 8.1 solution. Right click your new project in the solution explorer, hover on Add and select Existing Item… Then browse to the Windows 8.1 solution folder containing the files you want to add, select the files and click Add As Link:

Adding files as links

The files are now shown in your solution explorer. However, they are not physically in your new project but exist in the Windows 8.1 application project folder. Any changes you make to these files will appear in both projects.

While adding the files individually can be tedious, the benefit here is that you can take advantage of conditional preprocessing blocks in C# code:

#if WINDOWS_UWP
    // Have your Windows 10 specific code here
#else
    // Have your Windows 8.1 specific code here
#endif

Pros

  • Conditional preprocessing blocks and compilation symbols can be used
  • Dependencies to additional libraries and Nuget packages are easier to maintain
  • Adding platform specific features, e.g. new Windows 10 APIs, is trivial

Cons

  • Adding/removing shared code and asset files needs to be done in both solutions separately

Sample code

An example using both approaches featured in this article can be found here on GitHub.

 

Tracking Objects from Video Feed Part IV: New Hope for Circular Shapes

So, turns out part III wasn’t the last of it – I present to you part IV, which I promise will be at least as good as The Kingdom of the Crystal Skull was, as far as sequels after the third one go… In the previous chapter of this story, I mentioned a few ideas to improve the reliability and robustness of various bits in the object tracking pipeline. I did, in fact, implement a very simple noise removal functionality, edge detection and chroma delta, i.e. the difference between two frames presented as a binary image. I also made the system biased towards things which are round, like balls (or circles if you will, since we are dealing with 2D images – but I like the word ballzz better, being an immature halfwit).

What’s with the noise!?

There are a number of methods to remove noise from an image. Removing noise is essential especially in cases where you want to apply any sort of edge detection. One of the obvious choices is using a Gaussian filter, and it really isn’t hard to find ready-made algorithms from the interwebs regardless of the coding language you choose (MUMPS and Brainfuck excluded). Did yours truly then incorporate a Gaussian filter into the project? You bet your ass he did not. Instead, I opted for a nearest neighbor smoother (quite similar coding-wise): For every pixel in the image, calculate the average of all eight neighbors of said pixel and apply that value to the pixel in question. Super simple! And effective! However, even though this method is slightly quicker than using e.g. a 5×5 Gaussian filter, it still takes its toll: I averaged approximately 350 milliseconds per 720p frame on Lumia 930[1].
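
For the curious, a sketch of the smoother could look roughly like this (not the sample’s actual code; it operates on a grayscale byte array, e.g. the Y plane):

byte[] Smooth(byte[] pixels, int width, int height)
{
    var result = new byte[pixels.Length];

    for (int y = 1; y < height - 1; y++)
    {
        for (int x = 1; x < width - 1; x++)
        {
            int sum = 0;

            // Average of the eight neighbors of the current pixel
            for (int dy = -1; dy <= 1; dy++)
            {
                for (int dx = -1; dx <= 1; dx++)
                {
                    if (dx != 0 || dy != 0)
                    {
                        sum += pixels[(y + dy) * width + (x + dx)];
                    }
                }
            }

            result[y * width + x] = (byte)(sum / 8);
        }
    }

    return result;
}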

Software developer enchanted by a lantern.
Edges extracted from the original image with noise (center) and with noise removed (right).

On the edge

The Canny edge detector is a pretty good choice for your edge detection needs. It’s robust, works well and has a funky name. Did I decide to go with a different solution? You betcha! Why? Well, because I’m lazy and settled for a lesser result. My implementation also performs better, i.e. fewer milliseconds spent per frame. What I do is as follows (a rough code sketch follows the list):

  • For every pixel starting from the top left corner handling a horizontal line at a time:
    1. Calculate the difference of the current pixel to the one on the right (if one exists) – we could call this difference “gradient”
    2. Calculate the difference of the current pixel to the one below (if one exists) – let’s go crazy and call this a “gradient” too
    3. If the sum of the gradients exceeds a set threshold value, mark this pixel as part of an edge. If not, it’s not an edge.
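
A rough sketch of the above in code (again, not the sample’s actual implementation) could look like this:

byte[] DetectEdges(byte[] pixels, int width, int height, int threshold)
{
    var edges = new byte[pixels.Length];

    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
        {
            int index = y * width + x;
            int current = pixels[index];

            // "Gradient" towards the pixel on the right (if one exists)
            int horizontalGradient = (x < width - 1) ? Math.Abs(current - pixels[index + 1]) : 0;

            // "Gradient" towards the pixel below (if one exists)
            int verticalGradient = (y < height - 1) ? Math.Abs(current - pixels[index + width]) : 0;

            // Mark as an edge if the sum of the gradients exceeds the threshold
            edges[index] = (byte)((horizontalGradient + verticalGradient > threshold) ? 0xff : 0x0);
        }
    }

    return edges;
}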

“So, what about the pixels on the left and above?” one might ask. “I don’t give a ….” is my answer. So far my solution is sufficient for my needs and that’s all I really care about.

Chroma delta

As described in the previous part, chroma delta takes two frames, calculates their difference and presents it as a binary image. Each pixel has three component values, whether it’s RGB or YUV, and by calculating the overall difference of these values for each corresponding pixel in both frames and using a threshold value, we get a binary value for the corresponding pixel in the delta frame. In my case I utilize the Y plane and set the value 0xff (255) for a pixel that has changed significantly, or 0x0 (0) for the ones that remain fairly unchanged.
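
A minimal sketch of the delta calculation, assuming we compare the Y planes of two equally sized frames:

byte[] CalculateDelta(byte[] previousYPlane, byte[] currentYPlane, int threshold)
{
    var delta = new byte[currentYPlane.Length];

    for (int i = 0; i < currentYPlane.Length; i++)
    {
        int difference = Math.Abs(currentYPlane[i] - previousYPlane[i]);

        // 0xff marks a significant change, 0x0 a pixel that stayed roughly the same
        delta[i] = (byte)(difference > threshold ? 0xff : 0x0);
    }

    return delta;
}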

Chroma delta gone bananas
How to make a code monkey mad? Touching his/her banana, that’s how!

So to make it clear: The right-most image displays the change between the two others. See how even the noticeable change in the lighting conditions still yields a usable result. If we know that the banana was in the place depicted in the left image, then by looking at the delta image we can deduce that it’s no longer in that position, and by looking at the other shape it’s quite obvious that it moved to the right.

Them balls

When the application is in charge of choosing the object to track, it goes without saying that you have to devise some sort of traits for the desired object. My life experience has taught me that round things are more likely to move than rectangular things. I suppose the first guy who realized this invented the wheel. Thus, I’m just as clever. But how do you programmatically evaluate whether a thing is round or not? Especially when your more or less advanced image processing methods do not fully extract the shape of an object?

“Just call me Darth Balls… Bong.” – Jay (from Jay and Silent Bob Strike Back, 2001)

I came to the conclusion that I would have to find a way to calculate how well my neat convex hulls would match a circle. Again, I looked for an answer in the interwebs and discovered that the problem I needed the solution for is called the smallest-circle problem, and luckily a sharp Austrian dude, Emo Welzl, had already proposed a recursive algorithm which not only solves the problem but does it in mere O(n) time. However – being a lucky bastard – since I already had my convex hulls, I could use a more straightforward solution to create my minimal enclosing circles – here’s how:

  • Find the two vertices in the convex hull furthest apart. Their distance is the diameter of your enclosing circle.
  • The center point of the line segment between those two vertices is also the center point of the circle (see the image below; a code sketch follows it).
Minimal Enclosing Circle
Minimal enclosing circle based on the convex hull reconstructs the shape of the ball.
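
In code, deriving the circle from the convex hull could look roughly like this (a sketch under the assumptions above; Point is assumed to be a simple type with X and Y as doubles):

(Point center, double radius) CreateEnclosingCircle(IList<Point> convexHull)
{
    Point a = convexHull[0];
    Point b = convexHull[0];
    double maxDistanceSquared = 0;

    // Find the two vertices furthest apart; their distance is the diameter
    for (int i = 0; i < convexHull.Count; i++)
    {
        for (int j = i + 1; j < convexHull.Count; j++)
        {
            double dx = convexHull[i].X - convexHull[j].X;
            double dy = convexHull[i].Y - convexHull[j].Y;
            double distanceSquared = dx * dx + dy * dy;

            if (distanceSquared > maxDistanceSquared)
            {
                maxDistanceSquared = distanceSquared;
                a = convexHull[i];
                b = convexHull[j];
            }
        }
    }

    // The center of the circle is the midpoint of the line segment between the two vertices
    var center = new Point((a.X + b.X) / 2, (a.Y + b.Y) / 2);
    return (center, Math.Sqrt(maxDistanceSquared) / 2);
}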

Now that you have the circle, you can compare the traits of the convex hull to see how well they align with the minimal enclosing circle. I tried calculating the difference of the vertices of the convex hull to the circumference of the enclosing circle and it worked out pretty well. I’m sure there are other and even better ways to evaluate the roundness (or “eccentricity” as intelligent people would say). Do also keep in mind that you need to normalize the error based on the object’s size unless you want to be biased towards smaller objects.
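
A sketch of that roundness measure, normalized by the radius so that smaller objects aren’t favored (Point as in the previous sketch):

double CalculateRoundnessError(IList<Point> convexHull, Point center, double radius)
{
    double totalError = 0;

    foreach (var vertex in convexHull)
    {
        double dx = vertex.X - center.X;
        double dy = vertex.Y - center.Y;
        double distanceFromCenter = Math.Sqrt(dx * dx + dy * dy);

        // How far is this vertex from the circle's circumference?
        totalError += Math.Abs(distanceFromCenter - radius);
    }

    // Normalize by the radius so that we are not biased towards smaller objects
    return (totalError / convexHull.Count) / radius;
}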

Teh codez

Here is the code corresponding to the scribbles in this blog post:

Note that, as of writing this, the pipeline of the Object tracking demo consists of something old, something new, something blue i.e. it’s not really functioning properly.


[1] Lumia 930 is a mobile phone designed and manufactured by a Finnish company, Nokia, which sold its mobile phone division to Microsoft in April 2014 when it decided to focus on Jedi lightsaber-combat-practice-laser-shooting-sphere-things instead.

Tracking Objects from Video Feed Part III: Detecting Object Displacement

In the previous part we provided one solution for detecting and identifying a stationary object of a certain shape in a video feed. In this part we focus on tracking the object and try to analyze a simple path of a moving object. By simple, I mean *really* simple: we try to detect the “from” and “to” positions of the object – where it started and where it ended up.

When milliseconds count

Compared to detecting objects in a static image or frame, detecting object displacement presents us with a new, tough requirement: We have to analyze the frames in real time and thus, performance is the key. We cannot simply use all the methods described earlier, since, especially on mobile devices, they simply take too much time to compute. Ideally, depending on the framerate and the estimated speed of the moving object relative to our field of view (FoV), our operation for tracking the image should take less than 10 milliseconds per frame. It is quite obvious that the complexity of any algorithm we use is relative to the frame size – the fewer pixels we have to analyze, the faster the operation.

Instead of using all the methods described earlier (chroma filter, object mapping, convex hull etc.) to track the object, we utilize them to “lock” the target object. In other words, we identify the object we want to track and after that we can use far lighter methods to track its position. We don’t have to process the full frame, but only the area of the object with some margin. This helps us to reduce the resolution and run our operations much quicker.

Since our target object can be expected not to change color (unless we’re tracking a chameleon), we can do the following:

  1. Once we have detected the object from the image/frames and we know its position and size (number of pixels horizontally and vertically where the object is thickest) we can define a rectangular cropped area with the object in the center and with a margin of e.g. 15 %.
  2. Apply the chroma filter to this cropped area for each frame and keep track of the position, which is defined by the intersecting point of virtual lines placed where we have the most pixels horizontally and vertically. Figure 9 illustrates tracking the locked target object, and a code sketch of finding the position follows the figure.
    • If the center point displacement exceeds our predefined delta value, we move to the next phase, where we analyze the object movement.

Figure 9. Target object locked, and tracking limited to the region marked by the green rectangle.
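
A sketch of finding the tracked position within the cropped, chroma-filtered binary image (the intersection of the row and the column with the most “on” pixels) could look like this:

(int x, int y) FindPosition(byte[] binaryImage, int width, int height)
{
    var columnCounts = new int[width];
    var rowCounts = new int[height];

    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
        {
            if (binaryImage[y * width + x] != 0)
            {
                columnCounts[x]++;
                rowCounts[y]++;
            }
        }
    }

    // The position is the intersection of the "fullest" column and row
    int bestX = 0;
    int bestY = 0;

    for (int x = 1; x < width; x++)
    {
        if (columnCounts[x] > columnCounts[bestX]) bestX = x;
    }

    for (int y = 1; y < height; y++)
    {
        if (rowCounts[y] > rowCounts[bestY]) bestY = y;
    }

    return (bestX, bestY);
}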

It moved, but where did it go?

How do we implement the next phase then? It seems that for a more accurate analysis of the object movement, we must use more complex methods than we used for detecting the initial displacement of the object. What if we record the frames for later analysis? Since we may not know or forecast when the object is going to move, depending on the frame size, the video we record might be huge! Fortunately, there is a way to store the frames while still keeping the required size fixed: A ring buffer (also known as a circular buffer). In short, a ring buffer is a fixed-size buffer; when you reach the end, you start again from the beginning and replace the frames recorded earlier. See this article about buffering video frames by Juhana Koski to learn more. Because we observe the initial displacement of the object in real time, we can record a few more frames (the estimated time until the object exits our FoV) and then stop. After this we no longer have the real-time requirement and we can take our time analyzing what happened to the object after its initial displacement.
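
A minimal ring buffer sketch for keeping the last N frames in a fixed amount of memory could look like this:

class FrameRingBuffer
{
    private readonly byte[][] _frames;
    private int _nextIndex;

    public FrameRingBuffer(int capacity)
    {
        _frames = new byte[capacity][];
    }

    public void Add(byte[] frame)
    {
        // When the end is reached, start again from the beginning,
        // replacing the frames recorded earlier
        _frames[_nextIndex] = frame;
        _nextIndex = (_nextIndex + 1) % _frames.Length;
    }
}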

Let’s say that we want to get the last frame containing the object before it leaves the FoV. We could use the following algorithm:

  1. Start iterating from the last recorded frame towards the frame of the initial displacement:
    1. Treat each frame as we did in the beginning when we found the desired object from the image using chroma filter, object map, convex hull and shape analysis.
    2. If we find an object satisfying our criteria, we stop, expecting it to be the object we were tracking.
  2. We now have the object position from the beginning of its movement to the last known position in our FoV (see figure 10). This means we can at least calculate the angle and relative velocity of the object.

Figure 10. Object (USB cannon projectile wrapped with pink sticker) motion captured.

Challenges and future development

Lighting challenges are typical with image pattern recognition solutions. Changes in lighting conditions affect the perceived color, and that makes the selection of parameters (YUV value and threshold) for chroma filtering difficult. Camera hardware and its settings play a significant role here: The longer the exposure time, the easier it is to detect the object properly. However, with a long exposure time, it’s harder to capture the object movement. The object in motion will have a distorted shape and its color will blend with the background. Thus, it becomes more difficult to find the object in the frames when it’s moving. On the other hand, if we use a short exposure time, we get less light per frame and the color difference of the object compared to the background might be insufficient.

The current implementation of the solution relies on manual parameter setting for both the color and the threshold. In the future, we could try to at least partially automate the parameter setting. We would still have to roughly know the shape and size of the object we want to find. We could apply edge detection algorithms to boost the color filter and get more accurate results with stationary objects. Of course, when an object is moving fast, the edges may blur. However, since the current implementation provides us with the frame of the initial object displacement, we can compare that to the later frames and see the changes in e.g. chroma. The moving object will leave a trace even if it’s blurred with the background or distorted.

And then there was the code…

The related code project is hosted in GitHub: https://github.com/tompaana/object-tracking-demo

See the README.md file delivered with the project to learn more. The project is freely licensed so you can utilize any bits of the code any way you like. Have fun!

That’s all folks… or is it?

EDIT: Turns out that’s not all folks. See how everything turns out here.

Tracking Objects from Video Feed Part II: Identifying a Stationary Object

In the first part we introduced the raw image data formats which we receive from the camera. Now it’s time to get to the good stuff. First we formulate our goals and start by trying to extract an object from the video feed. In order to track something, we first need to find the “something”.

Think before you act?

When solving a complex problem, it may sometimes be useful to start from the desired result and work your way from the result back to the beginning of the problem solving pipeline. This way we can break the problem down into several resolutions – steps in between leading to the final solution, if you like. In the case of our problem, this reversed pipeline would look something like the table below.

Table 1. Problem solving pipeline reversed.

| What is the result? | How do we get there? |
| --- | --- |
| An object, with a specific shape, identified in the image (location, size, shape etc.) | Find the center of mass of the two-dimensional object and its outline. |
| Outline of the detected object. | Apply a convex hull algorithm to the object map (an extended binary image, where the background is removed). |
| Object map, where all significant (suspected) objects are separated and the background is removed. | Individualize objects from a binary image. |
| Binary image in which objects have value 1 and background value 0. | Apply an algorithm to extract objects with some criteria from the background, e.g. a chroma filter. |
| Chroma filter implementation to extract objects from the image. | Start coding! |

Chroma filter

Here the word “chroma” is a bit misleading, since we also inspect the luma (Y) value when filtering the image. The algorithm is as follows:

  1. Set a desired YUV value as a target. This is basically the color we want to find in the picture, i.e. if a ball is blue and the background is orange, we try to set the value as close to the blue color of the ball as possible. We also need to set a threshold value, which defines the allowed difference between the target value and the colors we accept.
  2. Iterate the image data (byte array), pixel by pixel or block by block. By block we mean the unit shown in figures 1 and 2. In the case of NV12, there are four Y values, one U and one V. We could calculate the average of the four Y values, but since the values are likely to be almost the same, for the sake of optimization it’s enough to choose just one. Then we simply compare the set target values to the measured ones, as pseudo code:

IF Difference(target_value_Y, measured_Y) < threshold AND
Difference(target_value_U, measured_U) < threshold AND
Difference(target_value_V, measured_V) < threshold
THEN
(Mark this pixel/block as selected)
ELSE
(Mark this pixel/block as not selected)
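
The same comparison expressed as a sketch in C# (not the project’s ChromaFilterEffect), for a single NV12 block with one representative Y sample and the shared U and V samples:

bool IsBlockSelected(byte y, byte u, byte v, byte targetY, byte targetU, byte targetV, int threshold)
{
    // The block is selected only if all three components are close enough to the target
    return Math.Abs(y - targetY) < threshold &&
           Math.Abs(u - targetU) < threshold &&
           Math.Abs(v - targetV) < threshold;
}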

 


Figure 4. The original image on the left and the image with the chroma filter (red) applied on the right.

In the case of NV12 we can utilize the Y plane (since it matches the full size of the image) and convert it into a virtual binary image: Y value 0 indicates 0 and Y value 255 indicates 1 (as done on the right-hand-side image in figure 4). We can then use it to map objects as explained in the next chapter.

In our code project we have a native effect, which executes the aforementioned chroma filter for a single frame: ChromaFilterEffect.

Mapping objects

The simplest source for mapping objects is a binary image e.g. a bit array where 0 denotes background and 1 an object. What we need to do is to identify objects that are not joined (their pixels don’t touch each other). We can do this by assigning each object a unique number. The end result can be, for instance, an array of integers, where 0 still denotes background, but a value greater than 0 is an ID of a certain object. Figure 5 illustrates this process from the binary image to resulting object map.

Figure 5. Binary image (left) and object map (right).

The principle of the algorithm for mapping objects is very simple: Check each pixel of the binary image and assign a unique value to those pixels which are adjacent. If we break this down into more detail, we can have something like this, when the image is in the form of an array:

  1. Create an object ID counter (let’s call the counter c), which provides a unique value for the pixels of each object. You can start with value c = 1.
  2. Go through each pixel starting from 0 to N – 1, where N is the number of pixels:
    • If the pixel has value 0 (background):
      • Do nothing.
    • If the pixel has value 1 (an object):
      1. If the previous pixel had value 0:
        • Set the object ID counter value so that the new value is guaranteed to be unique (let’s call this unique ID U), c = U, since this can be a new object. It is important that the value is truly unique – mere increment by one is not enough.
      2. If there is an adjacent non-zero pixel above (see figure 6):
        1. Get the object ID of the adjacent pixel (let’s call this value A).
        2. Backtrack (with a separate iteration) and replace all pixels which have the current value of counter c with value A.
          • You can stop backtracking, if you encounter a line with no pixels having the current value of the object ID counter.
        3. Set the value of the object ID counter to the value of the adjacent pixel (c = A) and continue the original iteration from the index before backtracking.
      3. Set the value of the current pixel to match the current value of the object ID counter (value of c).

Figure 6. Object map creation process.

See ImageProcessingUtils::createObjectMap method where the aforementioned algorithm is implemented with C++.

Note that after applying the algorithm described above, you will not end up with an ordered object map (map with ordered IDs: 1, 2, …, N). Instead, you will have a map where each object has a unique ID, but they can be arbitrary, e.g. 2, 5, …, N + 100. Sorting the map is quite trivial: Get a list of current IDs (one iteration), create a map where each existing ID has an ordered counterpart (2 -> 1, 5 -> 2, …, N + 100 -> N) and replace the values in the object map (second iteration).
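
A sketch of that normalization step:

void SortObjectMap(int[] objectMap)
{
    var orderedIds = new Dictionary<int, int>();

    // First iteration: assign an ordered counterpart to each existing ID
    foreach (int id in objectMap)
    {
        if (id != 0 && !orderedIds.ContainsKey(id))
        {
            orderedIds[id] = orderedIds.Count + 1;
        }
    }

    // Second iteration: replace the values in the object map
    for (int i = 0; i < objectMap.Length; i++)
    {
        if (objectMap[i] != 0)
        {
            objectMap[i] = orderedIds[objectMap[i]];
        }
    }
}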

Identifying object shapes

Now that we have an object map, we can easily see the size of each object. But what about the shape? It depends on the way we want to classify shapes, e.g. whether we are interested in round shapes or squares. In addition, depending on our object extraction methods, the objects may not be perfectly extracted; some of what was part of the object could have been interpreted as background. For instance, in figure 6, the largest object could have been a ball with a large stripe on it that was lost when our chroma filter was applied. Luckily, there is a simple algorithm for “fixing” the shape, called convex hull.

A convex hull can be thought of as an elastic band wrapped around an object. If we apply the algorithm to the first object in the object map we created earlier, the stripe in the center is discarded and a circle shape emerges:

Figure 7. Convex hull algorithm applied to the first object in the object map.

Figure 8. Convex hulls (green lines around the bird and red lines around a small piece on the X-Wing fighter side panel) drawn to surround the filtered red areas.

There are a number of different convex hull algorithms. The one used in this solution is called monotone chain (see the ImageProcessingUtils::createConvexHull method). More information about monotone chain, including code samples, can be found in Wikibooks: http://en.wikibooks.org/wiki/Algorithm_Implementation/Geometry/Convex_hull/Monotone_chain
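
For reference, a compact sketch of the monotone chain algorithm (the project’s actual implementation is the C++ method mentioned above); it returns the hull in counter-clockwise order:

static List<(int X, int Y)> CreateConvexHull(List<(int X, int Y)> points)
{
    if (points.Count < 3)
    {
        return new List<(int X, int Y)>(points);
    }

    // Sort by X, then by Y
    points.Sort((a, b) => a.X != b.X ? a.X.CompareTo(b.X) : a.Y.CompareTo(b.Y));

    // Cross product of vectors OA and OB; positive means a counter-clockwise turn
    long Cross((int X, int Y) o, (int X, int Y) a, (int X, int Y) b) =>
        (long)(a.X - o.X) * (b.Y - o.Y) - (long)(a.Y - o.Y) * (b.X - o.X);

    var lower = new List<(int X, int Y)>();
    var upper = new List<(int X, int Y)>();

    // Lower hull: left to right
    foreach (var point in points)
    {
        while (lower.Count >= 2 && Cross(lower[lower.Count - 2], lower[lower.Count - 1], point) <= 0)
        {
            lower.RemoveAt(lower.Count - 1);
        }
        lower.Add(point);
    }

    // Upper hull: right to left
    for (int i = points.Count - 1; i >= 0; i--)
    {
        while (upper.Count >= 2 && Cross(upper[upper.Count - 2], upper[upper.Count - 1], points[i]) <= 0)
        {
            upper.RemoveAt(upper.Count - 1);
        }
        upper.Add(points[i]);
    }

    // The last point of each half is the first point of the other; drop the duplicates
    lower.RemoveAt(lower.Count - 1);
    upper.RemoveAt(upper.Count - 1);
    lower.AddRange(upper);
    return lower;
}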

The convex hull gives us a list of points (pixel coordinates), which form the outline of an object. However, we still have to determine the shape of the object somehow. If it’s a ball shape we’re after, we could find the center of the object – e.g. the center of mass, where one pixel equals one mass unit – and then see how much the outline differs from a circle whose radius is the width or the height of the object divided by 2. The smaller the difference, the more the shape is like a circle, which of course is the shape of a ball in a two-dimensional presentation.

Now that we have found our object, let’s try to track it. See the next and final part of this thrilling trilogy and find out if we succeeded in our goal to detect the changes in object position!