Face and Body Tracking with Kinect v2 for Windows

05 August 2014 (Updated 08 August 2014)

Building on the previous post (see First steps with Kinect v2 for Windows), this post looks at how to do basic body and face tracking with the Kinect v2 sensor for Windows.

Key Kinect Objects

The following diagram shows the key objects you need when working with the Kinect v2 for Windows. There's a single KinectSensor object, which gives access to a set of frame sources, one per data type (body, color, depth, infrared and so on). Each source opens a reader, and each reader exposes a FrameArrived event you can subscribe to. The pattern is consistent, regardless of the type of data you want to receive from the Kinect device:
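
The same shape applies whichever source you use. As a rough sketch (using the color frame source purely as an example; this snippet isn't part of the demo projects below):

// Sensor -> source -> reader -> FrameArrived, sketched with the color source.
// Assumes: using WindowsPreview.Kinect; and that this runs in your page setup code.
var sensor = KinectSensor.GetDefault();              // the single sensor object
var reader = sensor.ColorFrameSource.OpenReader();   // a reader for one source

// Subscribe to the reader's FrameArrived event
reader.FrameArrived += (s, args) =>
{
    using(var frame = args.FrameReference.AcquireFrame())
    {
        if(frame == null) return;
        // ...copy out the frame data here, then let 'using' dispose the frame promptly
    }
};

sensor.Open();  // start the device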

Kinect Face and Body Tracking Demo WinRT app

The complete source for the two demos discussed in this post can be downloaded from here: KinectMinimalFaceTrackingDemo.zip and here: KinectFaceTrackingDemo.zip

The basic steps required for tracking up to six people and their facial expressions using the Kinect for Windows v2 device are:

  1. Get a reference to the default Kinect device
  2. Get a reference to a BodyFrameReader by calling kinect.BodyFrameSource.OpenReader()
  3. Use the BodyFrameReader to subscribe to the FrameArrived event
  4. Start the Kinect device by calling kinect.Open()
  5. In the body FrameArrived handler, acquire the frame data (bodyFrame)
  6. Populate or update the list of bodies tracked by calling bodyFrame.GetAndRefreshBodyData(). You'll need to pass a reference to a list or array of Body objects
  7. Enumerate the list of bodies. Information about the state of each tracked body (e.g. HandRightState) is available
  8. If you want to track facial features for the body, create a FaceFrameSource reference and then call faceFrameSource.OpenReader()
  9. Using the FaceFrameReader, subscribe to the FrameArrived event
  10. In the face FrameArrived handler, acquire the frame data (faceFrame)
  11. Data on facial features (e.g. Happy, Engaged, etc.) can be found in the faceFrame.FaceFrameResult.FaceProperties collection (see the sketch just after this list)
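
To make step 11 concrete, a minimal face FrameArrived handler that simply dumps every detected property could look something like this. It's a sketch only, not code from the demo projects, and it assumes a FaceFrameSource and FaceFrameReader have already been created as in steps 8 and 9:

// Sketch of step 11. Assumes: using System.Diagnostics; using Microsoft.Kinect.Face;
private void OnFaceFrameArrived(FaceFrameReader sender, FaceFrameArrivedEventArgs args)
{
    if(args.FrameReference == null) return;

    using(var faceFrame = args.FrameReference.AcquireFrame())
    {
        // Both the frame and its result can be null if the face isn't currently tracked
        if(faceFrame == null || faceFrame.FaceFrameResult == null) return;

        // FaceProperties maps each requested FaceProperty (Happy, Engaged, etc.)
        // to a DetectionResult (Yes, No, Maybe or Unknown)
        foreach(var property in faceFrame.FaceFrameResult.FaceProperties)
            Debug.WriteLine("{0}: {1}", property.Key, property.Value);
    }
}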

The following WinRT 8.1 app demonstrates the essential concepts:

<Page
    x:Class="KinectMinimalFaceTrackingDemo.MainPage"
    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
    xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    DataContext="{Binding RelativeSource={RelativeSource Self}}"
    mc:Ignorable="d">

    <Page.Resources>
        <Style x:Name="RedTB" TargetType="TextBlock">
            <Setter Property="FontSize" Value="40"/>
            <Setter Property="Foreground" Value="Crimson"/>
            <Setter Property="HorizontalAlignment" Value="Right"/>
            <Setter Property="VerticalAlignment" Value="Center"/>
            <Setter Property="Margin" Value="5"/>
            <Setter Property="Grid.Column" Value="0"/>
        </Style>

        <Style x:Name="VioletTB" TargetType="TextBlock">
            <Setter Property="FontSize" Value="40"/>
            <Setter Property="Foreground" Value="BlueViolet"/>
            <Setter Property="HorizontalAlignment" Value="Left"/>
            <Setter Property="VerticalAlignment" Value="Center"/>
            <Setter Property="Margin" Value="5"/>
            <Setter Property="Grid.Column" Value="1"/>
        </Style>
    </Page.Resources>
    
    <Grid Background="{ThemeResource ApplicationPageBackgroundThemeBrush}">
        <Grid.RowDefinitions>
            <RowDefinition Height="90"/>
            <RowDefinition Height="90"/>
            <RowDefinition Height="90"/>
            <RowDefinition Height="90"/>
            <RowDefinition Height="90"/>
            <RowDefinition Height="90"/>
            <RowDefinition Height="90"/>
            <RowDefinition Height="90"/>
        </Grid.RowDefinitions>
        <Grid.ColumnDefinitions>
            <ColumnDefinition Width="350"/>
            <ColumnDefinition Width="750"/>
        </Grid.ColumnDefinitions>

        <TextBlock Style="{StaticResource RedTB}" Grid.Row="1" Text="Kinect Status"/>
        <TextBlock Style="{StaticResource RedTB}" Grid.Row="2" Text="Tracking Body"/>
        <TextBlock Style="{StaticResource RedTB}" Grid.Row="3" Text="Tracking Face"/>
        <TextBlock Style="{StaticResource RedTB}" Grid.Row="4" Text="Body Frame Count"/>
        <TextBlock Style="{StaticResource RedTB}" Grid.Row="5" Text="Face Frame Count"/>
        <TextBlock Style="{StaticResource RedTB}" Grid.Row="6" Text="Right Hand"/>
        <TextBlock Style="{StaticResource RedTB}" Grid.Row="7" Text="Face Engaged"/>

        <TextBlock Style="{StaticResource VioletTB}" Grid.Row="1" Text="{Binding KinectStatus}"/>
        <TextBlock Style="{StaticResource VioletTB}" Grid.Row="2" Text="{Binding TrackingBody}"/>
        <TextBlock Style="{StaticResource VioletTB}" Grid.Row="3" Text="{Binding TrackingFace}"/>
        <TextBlock Style="{StaticResource VioletTB}" Grid.Row="4" Text="{Binding BodyFrameCount}"/>
        <TextBlock Style="{StaticResource VioletTB}" Grid.Row="5" Text="{Binding FaceFrameCount}"/>
        <TextBlock Style="{StaticResource VioletTB}" Grid.Row="6" Text="{Binding RightHandState}"/>
        <TextBlock Style="{StaticResource VioletTB}" Grid.Row="7" Text="{Binding FaceEngaged}"/>

    </Grid>
</Page>
The code-behind for the page is where all the Kinect work happens:

using System;
using System.ComponentModel;
using System.Diagnostics;
using System.Linq;
using System.Runtime.CompilerServices;
using Windows.UI.Xaml;
using Windows.UI.Xaml.Controls;
using WindowsPreview.Kinect;
using KinectMinimalFaceTrackingDemo.Annotations;
using Microsoft.Kinect.Face;

// Note that this project MUST be compiled as x64 otherwise the Kinect Face API causes 
// memory access violations.
//
// You should also add the following post-build event to the project properties:
//
// xcopy "C:\Program Files (x86)\Microsoft SDKs\Windows\v8.0\ExtensionSDKs\Microsoft.
// Kinect.Face\2.0\Redist\CommonConfiguration\x64\NuiDatabase" "NuiDatabase" /e /y /i /r
//
// Although the app works "OK" without it, face gestures are much more reliable with the 
// NuiDatabase included (it seems to contain the trained definitions for the face features)

namespace KinectMinimalFaceTrackingDemo
{
    public sealed partial class MainPage : Page, INotifyPropertyChanged
    {
        public event PropertyChangedEventHandler PropertyChanged;

        /// <summary>Status of the Kinect device</summary>
        public string KinectStatus 
        {
            get
            {
                if(_kinectSensor == null) return "Off";
                return _kinectSensor.IsAvailable ? "Available" : "Not available";
            }
        }

        /// <summary>Flags if we're actively tracking a body or not</summary>
        public bool TrackingBody
        {
            get { return _trackingBody; }
            set { _trackingBody = value; OnPropertyChanged();}
        }
        
        /// <summary>Flags if we're actively tracking a face</summary>
        public bool TrackingFace
        {
            get { return _trackingFace; }
            set { _trackingFace = value; OnPropertyChanged();}
        }

        /// <summary>Keeps count of the number of body frames received</summary>
        public int BodyFrameCount
        {
            get { return _bodyFrameCount; }
            set { _bodyFrameCount = value; OnPropertyChanged();}
        }

        /// <summary>Keeps count of the number of face frames received</summary>
        public int FaceFrameCount
        {
            get { return _faceFrameCount; }
            set { _faceFrameCount = value; OnPropertyChanged();}
        }

        /// <summary>Flags if the currently tracked face is "engaged"</summary>
        public string FaceEngaged
        {
            get { return _faceEngaged; }
            set { _faceEngaged = value; OnPropertyChanged();}
        }

        /// <summary>Returns the current state of the tracked body's right hand</summary>
        public string RightHandState
        {
            get { return _rightHandState; }
            set { _rightHandState = value; OnPropertyChanged();}
        }

        private KinectSensor _kinectSensor;
        private BodyFrameReader _bodyFrameReader;
        private FaceFrameSource _faceFrameSource;
        private FaceFrameReader _faceFrameReader;
        private Body[] _bodies;
        private bool _trackingBody;
        private bool _trackingFace;
        private string _faceEngaged;
        private string _rightHandState;
        private int _bodyFrameCount = 0;
        private int _faceFrameCount = 0;
        private ulong _currentBodyTrackingId = ulong.MinValue;

        private const FaceFrameFeatures _faceFrameFeatures =
            FaceFrameFeatures.BoundingBoxInColorSpace |
            FaceFrameFeatures.PointsInColorSpace | 
            FaceFrameFeatures.MouthOpen |
            FaceFrameFeatures.LookingAway |
            FaceFrameFeatures.Happy |
            FaceFrameFeatures.FaceEngagement | 
            FaceFrameFeatures.Glasses |
            FaceFrameFeatures.LeftEyeClosed | 
            FaceFrameFeatures.MouthMoved | 
            FaceFrameFeatures.RightEyeClosed;

        public MainPage()
        {
            this.InitializeComponent();

            // Subscribe to the unloaded event so we can clean up resources
            Unloaded += OnUnloaded;

            // Setup the Kinect device
            InitKinect();
        }

        /// <summary>Setup the Kinect device</summary>
        private void InitKinect()
        {
            try
            {
                // Only one Kinect device can be connected currently - get a reference to it
                _kinectSensor = KinectSensor.GetDefault();

                // Subscribe to the Kinect's status change event
                _kinectSensor.IsAvailableChanged += (s, args) => OnPropertyChanged("KinectStatus");
                _kinectSensor.Open();  // Start the Kinect device

                // Subscribe to the body frame arrived event. In order to get face
                // frame data we need to subscribe to the body event
                _bodyFrameReader = _kinectSensor.BodyFrameSource.OpenReader();
                _bodyFrameReader.FrameArrived += OnBodyFrameArrived;
            }
            catch(Exception ex)
            {
                Debug.WriteLine("Error in InitKinect(). {0}", ex.Message);
            }
        }

        /// <summary>Handle the FrameArrive event to collect body data</summary>
        /// <param name="sender">Object raising the event</param>
        /// <param name="args">Info related to the event</param>
        private void OnBodyFrameArrived(BodyFrameReader sender, BodyFrameArrivedEventArgs args)
        {
            try
            {
                if(args.FrameReference == null) return;

                // Get the actual body data
                var bodyFrame = args.FrameReference.AcquireFrame();
                if(bodyFrame == null) return;

                BodyFrameCount++;  // Bump our body frame counter

                // Make SURE to QUICKLY and correctly dispose the frame - if you don't
                // the Kinect SDK will stop raising the event!
                using(bodyFrame)
                {
                    // Lazy instantiation. If we've not previously allocated the array of Body
                    // objects, allocate one slot per body the sensor can track (up to six)
                    if(_bodies == null) _bodies = new Body[bodyFrame.BodyCount];

                    // Update the array of Body objects
                    bodyFrame.GetAndRefreshBodyData(_bodies);
                }

                // Are we tracking at least one body?
                foreach(var body in _bodies.Where(body => body.IsTracked && body.TrackingId > ulong.MinValue)) 
                {
                    TrackingBody = true;  // We just track one body (first in the list) in this demo

                    // What's the tracked body's right hand doing?
                    RightHandState =  body.HandRightState.ToString();

                    // Are we going to start tracking a new face? If so, _currentBodyTrackingId
                    // will have been set to ulong.MinValue in the OnFaceTrackingIdLost handler
                    if(_currentBodyTrackingId != ulong.MinValue) return;  // Already tracking face  

                    _currentBodyTrackingId = body.TrackingId;  // Store the id of the actively tracked body

                    // We're tracking a new body/face. Create a face frame source and reader
                    // The TrackingId is used to link the body and face
                    _faceFrameSource = new FaceFrameSource(_kinectSensor, body.TrackingId, _faceFrameFeatures);
                    _faceFrameReader = _faceFrameSource.OpenReader();

                    // Subscribe to the face FrameArrived event
                    _faceFrameReader.FrameArrived += OnFaceFrameArrived;

                    // Subscribe to the TrackingIdLost event (raised when the face we're tracking is lost)
                    _faceFrameSource.TrackingIdLost += OnFaceTrackingIdLost;

                    return;
                }

                // We're not tracking a body
                TrackingBody = false;
            }
            catch(Exception ex)
            {
                Debug.WriteLine("Error in OnBodyFrameArrived(): {0}", ex.Message);               
            }
        }

        /// <summary>Event raised when the face we're tracking is lost</summary>
        /// <param name="sender">Object that raised the event</param>
        /// <param name="args">Data related to the event</param>
        private void OnFaceTrackingIdLost(FaceFrameSource sender, TrackingIdLostEventArgs args)
        {
            try
            {
                _currentBodyTrackingId = ulong.MinValue;  // Flag that we lost the current face

                // Unsubscribe to the face FrameArrived event (we'll re-subscribe in the 
                // OnBodyFrameArrived event handler as required)
                _faceFrameReader.FrameArrived -= OnFaceFrameArrived;
                _faceFrameSource.TrackingIdLost -= OnFaceTrackingIdLost;

                // Clean-up (important for GC)
                _faceFrameReader.Dispose(); 
                _faceFrameReader = null;
                _faceFrameSource = null;
                TrackingFace = false;
            }
            catch(Exception ex)
            {
                Debug.WriteLine("Error in OnFaceTrackingIdLost(): {0}", ex.Message);                                               
            }
        }

        /// <summary>Handle the face FrameArrived event</summary>
        /// <param name="sender">Object that raised the event</param>
        /// <param name="args">Data related to the event</param>
        private void OnFaceFrameArrived(FaceFrameReader sender, FaceFrameArrivedEventArgs args)
        {
            try
            {
                if(args.FrameReference == null) return;

                // Get the actual frame data
                var faceFrame = args.FrameReference.AcquireFrame();
                if(faceFrame == null) return;

                // Bump our face frame counter
                FaceFrameCount++;
                TrackingFace = true;

                // Make sure to quickly and correctly dispose the frame
                using(faceFrame)
                {
                    // FaceFrameResult can be null if the face isn't tracked in this frame
                    if(faceFrame.FaceFrameResult == null) return;

                    if(faceFrame.FaceFrameResult.TrackingId != _currentBodyTrackingId) 
                        OnFaceTrackingIdLost(null, null);

                    // Use the frame's FaceProperties collection to get info on the tracked face...
                    FaceEngaged = faceFrame.FaceFrameResult.FaceProperties[FaceProperty.Engaged].ToString();
                }
            }
            catch(Exception ex)
            {
                Debug.WriteLine("Error in OnFaceFrameArrived(): {0}", ex.Message);                               
            }
        }

        /// <summary>Handle the page Unloaded event to dispose of all objects</summary>
        private void OnUnloaded(object sender, RoutedEventArgs e)
        {
            if(_bodyFrameReader != null)
            {
                _bodyFrameReader.Dispose();
                _bodyFrameReader = null;
            }

            if(_faceFrameReader != null)
            {
                _faceFrameReader.Dispose();
                _faceFrameReader = null;
            }

            _faceFrameSource = null;

            if (_bodies != null) 
                foreach(var body in _bodies.Where(body => body != null)) 
                    body.Dispose();

            if(_kinectSensor == null) return;
            _kinectSensor.Close();
            _kinectSensor = null;
        }

        [NotifyPropertyChangedInvocator]
        private void OnPropertyChanged([CallerMemberName] string propertyName = null)
        {
            var handler = PropertyChanged;
            if(handler != null) handler(this, new PropertyChangedEventArgs(propertyName));
        }
    }
}

A few points to note. Firstly, as with all Kinect development, be sure to quickly dispose of frame objects. If you don't, you'll find the device stops raising the necessary FrameArrived events!
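
In practice that means wrapping the result of AcquireFrame() in a using block and copying out only the data you need before the block ends, as the demo above does:

// Copy out the data, then let the using block dispose the frame immediately
using(var bodyFrame = args.FrameReference.AcquireFrame())
{
    if(bodyFrame == null) return;
    bodyFrame.GetAndRefreshBodyData(_bodies);
}
// ...any slower processing of _bodies happens after the frame has been disposed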

Second, I spent quite a bit of time trying to work out why the face FrameArrived events stopped firing after about four or five seconds of working perfectly. I was convinced the issue was related to not correctly disposing of some object or other. However, after failing to track down the source of the problem I installed the latest daily build of the Kinect SDK, and this magically solved the issue! So if you're seeing similar problems with the publicly released preview version of the SDK, you may want to consider applying to the Kinect team for access to the daily builds. You can do this via the public Kinect forums (direct link).

The previous basic demo app can be expanded to provide some more interesting output. The KinectFaceTrackingDemo app shows images for the state of the left and right hands. It also shows a limited range of (crudely drawn!) cartoon faces corresponding to the state of the tracked face:
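
I won't reproduce the full KinectFaceTrackingDemo source here (it's in the download above), but the core idea is just a mapping from HandState and FaceProperty values to images. A rough sketch of that idea, where the helper methods and image file names are illustrative rather than the actual demo code, might look like this:

// Illustrative only - the helpers and image names below are hypothetical,
// not the actual KinectFaceTrackingDemo code.
private static string GetHandImage(HandState state)
{
    switch(state)
    {
        case HandState.Open:   return "Assets/HandOpen.png";
        case HandState.Closed: return "Assets/HandClosed.png";
        case HandState.Lasso:  return "Assets/HandLasso.png";
        default:               return "Assets/HandUnknown.png";
    }
}

private static string GetFaceImage(FaceFrameResult result)
{
    var props = result.FaceProperties;
    if(props[FaceProperty.Happy] == DetectionResult.Yes)     return "Assets/FaceHappy.png";
    if(props[FaceProperty.MouthOpen] == DetectionResult.Yes) return "Assets/FaceMouthOpen.png";
    if(props[FaceProperty.Engaged] == DetectionResult.Yes)   return "Assets/FaceEngaged.png";
    return "Assets/FaceNeutral.png";
}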