First steps with Kinect v2 for Windows

20 July 2014

In this post we take a look at how to get started developing WinRT apps with the Kinect v2 sensor for Windows.

I pre-ordered the Kinect version 2 sensor for Windows a few weeks ago. Microsoft stated that they would start shipping the new sensor, along with a public beta of the v2 Kinect SDK, on the 15th of July. I was delighted when mine actually arrived on the 15th, just in time for the Microsoft Virtual Academy Jump Start course on the v2 sensor (which was excellent)!

After downloading and installing the public beta of the v2 SDK (which also installs the necessary drivers and services) I hooked up the sensor to a USB 3.0 port and Windows instantly recognized the device and completed installation of the various drivers. That was reassuring!

I then ran the Kinect for Windows Developer Kit app. This allows you to install the source for lots of different samples. It also allows you to directly run WPF apps to test out the basic functionality of the device (the color and infrared cameras, the depth sensor, skeleton tracking, audio, etc.). This enabled me to verify that my new Kinect was working perfectly!

Note that in this post I'm not going to go into the details of all the wonderful things you can do with a Kinect device, and I'm not going to focus on all the improvements in the v2 compared to the first version. See the Microsoft Kinect for Windows site for relevant details.

Requirements for Developing Kinect for Windows Apps

The Kinect v2 device features:

  • A 1080p color camera running at 30Hz (30 FPS)
  • A 512x424 infrared camera (30Hz), plus infrared light emitters
  • Depth sensing (512x424 at 30Hz)
  • Skeleton tracking of 25 joints per person, for up to six concurrent persons
  • Face tracking
  • Four high-quality microphones

Because of the volume of data coming off the device, you need to meet the following hardware requirements:

  • A 64-bit (x64) dual-core processor running at 3.1 GHz or faster, with 2 logical cores per physical core (I'm using a quad-core i7 4770k running at 3.5GHz)
  • A USB 3.0 port dedicated to the Kinect for Windows v2 sensor, on an Intel or Renesas controller (I'm running on an Intel 9-series chipset motherboard)
  • 2 GB of RAM (I'm using 16GB)
  • A Graphics card that supports DirectX 11 (I'm using an NVIDIA GeForce GTX 750Ti 2048MB)
  • Windows or Windows Embedded 8 or 8.1 (I'm running Windows 8.1 Update 1)
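
To give a rough feel for the data volume behind these requirements, here is a back-of-the-envelope sketch for the color stream alone. It assumes a 1920x1080 frame at 4 bytes per pixel (the BGRA layout used later in this post); the format the sensor sends over USB may well be more compact, so treat the numbers as what the app handles in memory. The class and method names are made up for illustration.

```csharp
using System;

public static class KinectDataRate
{
    // Bytes needed to hold one frame of the given dimensions
    public static long BytesPerFrame(int width, int height, int bytesPerPixel)
    {
        return (long)width * height * bytesPerPixel;
    }

    // Bytes handled per second at a given frame rate
    public static long BytesPerSecond(long bytesPerFrame, int framesPerSecond)
    {
        return bytesPerFrame * framesPerSecond;
    }

    public static void Main()
    {
        // 1080p color frame, assumed 4 bytes per pixel (BGRA), at 30 FPS
        long frame = BytesPerFrame(1920, 1080, 4);
        long perSecond = BytesPerSecond(frame, 30);

        Console.WriteLine("Bytes per color frame: " + frame);
        Console.WriteLine("Bytes per second:      " + perSecond);
    }
}
```

Under those assumptions that is about 8.3 MB per frame and roughly 249 MB every second, before the depth, infrared and body data are even considered.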

I've successfully used the Kinect v2 on my i7-based desktop PC and my i5-based (256GB) Surface Pro 2. The Kinect team say they all use Surface Pro 1 or 2 devices when demoing the Kinect. And yes, you definitely need a dedicated, on-board USB 3 port (i.e. don't try connecting it to a USB hub running other devices). From comments I've seen on the Kinect developer forums, you also need to make sure your motherboard uses an Intel or Renesas USB 3.0 controller (which most modern PCs do).

Simple WinRT Kinect app: Capturing Color Frames

The complete source for this simple demo can be downloaded from here: KinectBasicDemo.zip

I used the Color Basic - XAML demo app from the v2 Toolkit as the basis for my first simple app. Essentially all the app does is to allow the user to connect to the Kinect device, and then start/stop receiving color frames. I also added the ability to track how many frames have been received and to calculate the FPS (Frames Per Second) value.

After creating a new Windows Store app using the Blank App template, the first thing you need to do is add a reference to the main Kinect assembly:

C:\Program Files\Microsoft SDKs\Kinect\v2.0-PublicPreview1407\Assemblies\Microsoft.Kinect.dll

This is most easily done using the Extensions section of the Reference Manager dialog.

Adding a reference to the Kinect assembly also adds a reference to the Microsoft Visual C++ 2013 Runtime Package for Windows.

If you try and build the project you'll get a compilation error:

The processor architecture of the project being built "Any CPU" is not supported by the referenced SDK 
"Microsoft.VCLibs, Version=12.0". Please consider changing the targeted processor architecture of your 
project (in Visual Studio this can be done through the Configuration Manager) to one of the architectures 
supported by the SDK: "x86, x64, ARM".

The error's pretty self-explanatory, so change the targeted platform to be one of the supported types. I selected "x86".

The last bit of configuration that needs to be done is to edit the app's manifest and select the Webcam and Microphone capabilities (because there's no way to independently turn on/off the Kinect's cameras and microphones).

We're now ready to start writing some code. The first thing I did was to define a simple UI in MainPage.xaml:

The main thing to note is that I'm using an Image control for the incoming color frames received from the Kinect. The Image control's Source property is data bound to a Bitmap property:

<Page
    x:Class="KinectBasicDemo.MainPage"
    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
    xmlns:local="using:KinectBasicDemo"
    xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    mc:Ignorable="d" 
    DataContext="{Binding RelativeSource={RelativeSource Mode=Self}}">

    <Grid Background="{ThemeResource ApplicationPageBackgroundThemeBrush}">
        <StackPanel Margin="20">
            <StackPanel Orientation="Horizontal">
                <TextBlock 
                    Text="Kinect Device: " 
                    Margin="10,10,0,10" 
                    FontSize="24" 
                    Foreground="BlueViolet"/>
                
                <TextBlock 
                    Text="{Binding KinectStatus, Mode=OneWay, FallbackValue=Off}" 
                    Margin="5,10,10,10" 
                    FontSize="24" 
                    Foreground="BlueViolet"/>
                
                <TextBlock 
                    Text="Frame Count: " 
                    Margin="10,10,0,10" 
                    FontSize="24" 
                    Foreground="Aqua"/>
                
                <TextBlock 
                    Text="{Binding TotalFrameCount, Mode=OneWay, FallbackValue=0}" 
                    Margin="5,10,10,10" 
                    FontSize="24" 
                    Foreground="Aqua"/>
                
                <TextBlock 
                    Text="FPS: " 
                    Margin="10,10,0,10" 
                    FontSize="24" 
                    Foreground="Crimson"/>
                
                <TextBlock 
                    Text="{Binding Fps, Mode=OneWay, FallbackValue=0}" 
                    Margin="5,10,10,10" 
                    FontSize="24" 
                    Foreground="Crimson"/>
            </StackPanel>
            
            <Button Content="Open Kinect Device" Margin="10" Click="OpenKinect"/>
            <Button Content="Start Reading Color Frames" Margin="10" Click="StartReading"/>
            <Button Content="Stop Reading Color Frames" Margin="10" Click="StopReading"/>
            
            <Image Stretch="UniformToFill" Source="{Binding Bitmap, Mode=OneWay}"/>
        </StackPanel>
    </Grid>
</Page>

The code-behind file (I'm not using an MVVM approach on such a simple demo app) defines a number of private variables and a couple of important properties:

namespace KinectBasicDemo
{
    public sealed partial class MainPage : Page, INotifyPropertyChanged
    {
        public event PropertyChangedEventHandler PropertyChanged;

        // Reference to the Kinect device itself
        private KinectSensor _kinectSensor;  

        // Gets color frames from the Kinect 1080p camera
        private ColorFrameReader _colorFrameReader;  

        // A temp buffer to hold image data from the incoming color frame as it arrives
        private byte[] _tempPixelBuffer;  

        // A writable (updatable) bitmap that we can update many times (30 FPS). 
        // Using a standard BitmapSource would result in a large amount of bitmaps 
        // needing to be garbage collected, and possible performance issues        
        private WriteableBitmap _bitmap;  

        // Data binding source for the device status TextBlock
        public string KinectStatus 
        {
            get
            {
                if(_kinectSensor == null) return "Off";
                return _kinectSensor.IsAvailable ? "Available" : "Not available";
            }
        }

        // This is the Source for the Image control
        public WriteableBitmap Bitmap
        {
            get { return _bitmap; }
            set { _bitmap = value; OnPropertyChanged();}
        }
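
The excerpt above calls OnPropertyChanged() without showing it. The helper isn't listed in this post, so what follows is only a sketch of how such a helper is commonly written, using the [CallerMemberName] attribute so that a property setter can leave out its own name. The StatusModel class and its Status property are made up for this example and aren't part of the demo; it's written as a plain console program so it can be run on its own.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.CompilerServices;

// Hypothetical stand-in for the page class, trimmed down to the notification plumbing
public class StatusModel : INotifyPropertyChanged
{
    public event PropertyChangedEventHandler PropertyChanged;

    private string _status = "Off";

    public string Status
    {
        get { return _status; }
        set { _status = value; OnPropertyChanged(); } // name supplied by the compiler
    }

    // With no argument, propertyName is filled in with the calling member's name.
    // Passing a name explicitly, e.g. OnPropertyChanged("KinectStatus"), also works.
    protected void OnPropertyChanged([CallerMemberName] string propertyName = null)
    {
        var handler = PropertyChanged;
        if (handler != null) handler(this, new PropertyChangedEventArgs(propertyName));
    }

    public static void Main()
    {
        var model = new StatusModel();
        model.PropertyChanged += (s, e) => Console.WriteLine("Changed: " + e.PropertyName);
        model.Status = "Available";
    }
}
```

Copying the event into a local variable before the null check avoids a race if a subscriber detaches between the check and the call.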

When the app starts we call InitKinect() to initialize the Kinect. The main steps we take are:

  1. Get a reference to the Kinect sensor by calling KinectSensor.GetDefault()
  2. Subscribe to the IsAvailableChanged event so we can display status info on the sensor
  3. Create a FrameDescription that contains data on the expected format of incoming color frames
  4. Create a byte array buffer to hold the color frame image data
  5. Create a WriteableBitmap. This property is data bound to the Image control's Source property and will be updated at a rate of 30 times a second as frames are received

public MainPage()
{
    InitializeComponent();
    :
    :
    // Set up the Kinect device and various temp buffers
    InitKinect();  

    // Hook into the Unloaded event so we can dispose of Kinect objects
    Unloaded += MainPageUnloaded;  
}

private async void InitKinect()
{
    try
    {
        _error = false;

        // Only one Kinect device can be connected currently - get a reference to it
        _kinectSensor = KinectSensor.GetDefault();

        // Subscribe to the Kinect's status change event
        _kinectSensor.IsAvailableChanged += (s, args) => OnPropertyChanged("KinectStatus");

        // Get info on what format frames will take as they arrive. We ask for Bgra
        // (4 bytes per pixel) to match the format the frame data is copied in later
        var colorFrameDescription = _kinectSensor.ColorFrameSource.
            CreateFrameDescription(ColorImageFormat.Bgra);

        // Create a temp buffer to hold the frame's image data as a byte array
        _tempPixelBuffer = new byte[
            colorFrameDescription.Width *
            colorFrameDescription.Height *
            colorFrameDescription.BytesPerPixel];

        // Create a writable (updatable) bitmap. This'll get updated at ~30 FPS and 
        // is data bound to the Image control's Source property
        Bitmap = new WriteableBitmap(colorFrameDescription.Width, colorFrameDescription.Height);
    }
    catch(Exception ex)
    {
        _error = true;
        _errorMessage = "Error initializing the Kinect device: " + ex.Message;
    }

    if(_error) await HandleError(_errorMessage);
}

The "Open Kinect Device" button simply calls the Open() method on the Kinect object - this will turn on the device:

private void OpenKinect(object sender, RoutedEventArgs e)
{
    // Start the Kinect (this will turn on the cameras, LEDs, etc.)
    if(_kinectSensor != null) _kinectSensor.Open();
}

The user can start/stop reading color frames using the start/stop buttons. Below is the code for the button handlers.

We get a reference to the Kinect's ColorFrameReader object by calling OpenReader(). Note that we subscribe to the FrameArrived event in StartReading() and unsubscribe in StopReading(). You also need to be careful to Dispose() of the frame reader. This is a recurring theme with Kinect development! Because so much data is being generated by the Kinect's various sensors, you need to release objects promptly rather than waiting for the garbage collector to get around to them, either through explicit use of Dispose(), or by the using { ... } pattern.

private void StartReading(object sender, RoutedEventArgs e)
{
    // Do nothing if there's no sensor, or if we're already reading (this avoids
    // leaking a reader and subscribing to FrameArrived twice)
    if(_kinectSensor == null || _colorFrameReader != null) return;

    // Get the sensor's color frame reader. Note that there's no equivalent "CloseReader" method
    _colorFrameReader = _kinectSensor.ColorFrameSource.OpenReader();  

    // Hook into the event that fires when a new frame is available
    _colorFrameReader.FrameArrived += OnFrameArrived;
}

private void StopReading(object sender, RoutedEventArgs e)
{
    if(_kinectSensor == null || _colorFrameReader == null) return;

    // Unsubscribe from the frame available event
    _colorFrameReader.FrameArrived -= OnFrameArrived;

    // Dispose of the reader (there's no "CloseReader" method)
    _colorFrameReader.Dispose();

    // Allow the GC to reclaim the object's memory
    _colorFrameReader = null;
}
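
The Dispose()/using advice is easy to try out without a sensor attached. The sketch below uses a made-up FakeFrame class (it is not a Kinect SDK type) to show that a using block calls Dispose() even when the code returns early, which is the behavior you want when a frame turns out to be unusable.

```csharp
using System;

// Hypothetical stand-in for a disposable SDK object such as a frame
public sealed class FakeFrame : IDisposable
{
    public static int LiveCount; // how many frames are currently undisposed

    private bool _disposed;

    public int Width { get; private set; }

    public FakeFrame(int width) { Width = width; LiveCount++; }

    public void Dispose()
    {
        if (_disposed) return; // safe to call more than once
        _disposed = true;
        LiveCount--;
    }
}

public static class DisposeDemo
{
    // Returns true if the frame was accepted, false if it was rejected.
    // Either way, the using block disposes the frame on the way out.
    public static bool Process(FakeFrame frame, int expectedWidth)
    {
        if (frame == null) return false;

        using (frame)
        {
            if (frame.Width != expectedWidth) return false; // early return still disposes
            return true;
        }
    }

    public static void Main()
    {
        Process(new FakeFrame(1920), 1920);
        Process(new FakeFrame(640), 1920);
        Console.WriteLine("Undisposed frames: " + FakeFrame.LiveCount);
    }
}
```

Both calls leave the live count at zero, whether the frame was accepted or rejected.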

The OnFrameArrived handler looks like this:

private async void OnFrameArrived(ColorFrameReader sender, ColorFrameArrivedEventArgs args)
{
    try
    {
        _error = false;

        var colorFrame = args.FrameReference.AcquireFrame(); // Get the actual color frame data
        if(colorFrame == null) return;

        // Important: Always dispose of frames when finished with them. If you don't, you'll not get any
        // more frame arrived events!
        using(colorFrame)
        {
            // Does the incoming frame match the expected width/height format?
            if(colorFrame.FrameDescription.Width != Bitmap.PixelWidth ||
               colorFrame.FrameDescription.Height != Bitmap.PixelHeight) 
                return; // The incoming frame was not in the expected format

            // Copy the image data into the temp pixel buffer (we have to use an intermediate 
            // byte array buffer as we can't access the bitmap's buffer directly as an array of bytes)
            if(colorFrame.RawColorImageFormat == ColorImageFormat.Bgra) 
                colorFrame.CopyRawFrameDataToArray(_tempPixelBuffer);
            else 
                colorFrame.CopyConvertedFrameDataToArray(_tempPixelBuffer, ColorImageFormat.Bgra);
        }

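        // Write the contents of the temp pixel buffer into the re-writable bitmap's buffer
        // (AsStream() is an extension method from System.Runtime.InteropServices.WindowsRuntime)
        using(var stream = Bitmap.PixelBuffer.AsStream())
        {
            stream.Write(_tempPixelBuffer, 0, _tempPixelBuffer.Length);
        }
        Bitmap.Invalidate(); // Invalidating the bitmap forces it to be redrawn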
    }
    catch(Exception ex)
    {
        _error = true;
        _errorMessage = "Error processing color frame: " + ex.Message;
    }

    if(_error) await HandleError(_errorMessage);
}

The frame is accessed via args.FrameReference.AcquireFrame() and, after checking the incoming frame is in the expected format, we copy the image data into the temporary pixel data byte array. The WriteableBitmap is then updated with the new frame image, and a redraw is forced by invalidating the bitmap.
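
The frame count and FPS values bound in the XAML aren't computed in the excerpts shown in this post. One way to do it, sketched here rather than copied from the demo source, is to count frames and divide by the elapsed time over a window of about one second. The FpsCounter class and its members are hypothetical names; the elapsed time is passed in so the logic can be exercised without a real clock.

```csharp
using System;

// Hypothetical helper: counts frames and recomputes FPS about once per second
public sealed class FpsCounter
{
    private int _framesInWindow;
    private TimeSpan _windowStart = TimeSpan.Zero;

    public long TotalFrameCount { get; private set; }
    public double Fps { get; private set; }

    // Call once per frame, passing the time elapsed since counting began
    // (for example Stopwatch.Elapsed). Returns true when Fps was refreshed.
    public bool OnFrame(TimeSpan elapsed)
    {
        TotalFrameCount++;
        _framesInWindow++;

        var window = elapsed - _windowStart;
        if (window < TimeSpan.FromSeconds(1)) return false;

        Fps = _framesInWindow / window.TotalSeconds;
        _framesInWindow = 0;
        _windowStart = elapsed;
        return true;
    }

    public static void Main()
    {
        var counter = new FpsCounter();

        // Simulate 60 frames arriving at intervals of 1/30th of a second
        for (int i = 1; i <= 60; i++)
        {
            var elapsed = TimeSpan.FromTicks(i * TimeSpan.TicksPerSecond / 30);
            if (counter.OnFrame(elapsed))
                Console.WriteLine("FPS: " + Math.Round(counter.Fps));
        }
        Console.WriteLine("Total frames: " + counter.TotalFrameCount);
    }
}
```

In the page itself, something like this could be called from the frame handler with a Stopwatch's Elapsed value, raising OnPropertyChanged for the bound properties whenever OnFrame returns true.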

The Page.Unloaded handler cleans up the resources we've used like this:

private void MainPageUnloaded(object sender, RoutedEventArgs e)
{
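    // Unsubscribe from frame events, then dispose of the frame reader and allow GC
    if(_colorFrameReader != null)
    {
        _colorFrameReader.FrameArrived -= OnFrameArrived;
        _colorFrameReader.Dispose();
        _colorFrameReader = null;
    }

    // Close the Kinect device (this turns the sensor off) and allow GC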
    if(_kinectSensor == null) return;

    _kinectSensor.Close();
    _kinectSensor = null;
}


In the next post I'll look at how to do some basic body and face tracking.