All you need to know about Kinect2 development.
These slides cover the system requirements for developing with the Kinect 2 sensor, the data sources the sensor exposes, and the gestures you can use to interact with touchless interfaces.
4. Kinect 2 Sensor
Depth resolution: 512×424 pixels
RGB resolution: 1920×1080 pixels (16:9)
Frame rate: 30 FPS
Mic frequency: 48 kHz
Range: from 0.5 to 4.5 m
USB hub
Power supply
3D DEPTH SENSOR
RGB CAMERA
MULTI-ARRAY MIC
Sensor
5. Kinect 1 vs Kinect 2
| Feature | Kinect for Windows 1 | Kinect for Windows 2 |
|---|---|---|
| Color Camera | 640 x 480 @ 30 fps | 1920 x 1080 @ 30 fps |
| Depth Camera | 320 x 240 | 512 x 424 |
| Max Depth Distance | ~4.0 m | ~4.5 m |
| Min Depth Distance | 80 cm (40 cm in near mode) | 50 cm |
| Horizontal Field of View | 57 degrees | 70 degrees |
| Vertical Field of View | 43 degrees | 60 degrees |
| Tilt Motor | yes | no |
| Skeleton Joints Defined | 20 joints | 25 joints |
| Full Skeletons Tracked | 2 | 6 |
| USB Standard | 2.0 | 3.0 |
| Supported OS | Win 7, Win 8 | Win 8-8.1 (WSA) |
| Price (sensor + adapter) | ~€160 | ~€200 |
7. System Requirements
• Operating System
• Windows 8/8.1 (x64)
• Windows 8/8.1 Embedded Standard (x64)
• Hardware
• 64-bit (x64) processor, i7 3.1 GHz or higher
• 4 GB memory (or more)
• Built-in USB 3.0 host controller
• DirectX 11 capable graphics adapter:
ATI Radeon (HD 5400 series, HD 6570, HD 7800),
NVidia Quadro (600, K1000M), NVidia GeForce (GT 640, GTX 660),
Intel HD 4400
• Kinect v2 sensor (with power supply and USB hub)
• Software
• .NET Framework 4.5
• Visual Studio 2012 or higher
• Microsoft Speech Platform Software Development Kit (Version 11)
• Kinect for Windows SDK
http://www.microsoft.com/en-us/download/details.aspx?id=44561
• Applications
• Windows Presentation Foundation (WPF)
• Windows Store App
• Programming languages
• C++, C#, VB.NET, …
https://dev.windows.com/en-us/kinect
10. Architecture (2)
• The sensor is a shared resource: many applications can access it simultaneously
• The sensor exposes a set of sources (functionalities)
• From every source it is possible to open readers
• Every reader raises events to acquire references to the device’s frames
• From every frame it is possible to get data about the specific source (e.g. color image, body data, etc.)
[Diagram: Sensor → Sources → Reader → Frame Reference → Frame]
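In C#, the chain above maps onto the SDK like this. A minimal sketch against the Kinect for Windows SDK v2, assuming the Microsoft.Kinect assembly is referenced and a sensor is attached:

```csharp
using Microsoft.Kinect;

// Sensor → Source → Reader → FrameReference → Frame
KinectSensor sensor = KinectSensor.GetDefault();
sensor.Open();

// Each source (Color, Depth, Body, ...) can open one or more readers
ColorFrameReader reader = sensor.ColorFrameSource.OpenReader();
reader.FrameArrived += (s, e) =>
{
    // The event args carry a frame reference, not the frame itself
    using (ColorFrame frame = e.FrameReference.AcquireFrame())
    {
        if (frame != null)
        {
            // read the frame data here
        }
    }
};
```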
11. Sensor
• Sensor usage
• Get an instance of KinectSensor
• Open the sensor
• Use the sensor
• Close the sensor
• In case of device unplug
• The KinectSensor instance remains valid
• No more frames are sent/received
• The sensor’s IsAvailable property becomes false
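The usage steps above, sketched in C# (assumes the Microsoft.Kinect assembly; comments mark each step):

```csharp
using Microsoft.Kinect;

KinectSensor sensor = KinectSensor.GetDefault(); // get an instance
sensor.IsAvailableChanged += (s, e) =>
{
    // On unplug the instance stays valid, but IsAvailable turns false
    if (!e.IsAvailable) { /* warn the user, pause processing */ }
};
sensor.Open();   // open the sensor
// ... use the sensor ...
sensor.Close();  // close it when done
```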
12. Sources
• The sensor exposes a source for every functionality
• Color source
• Depth source
• Infrared source
• Body Index source
• Body source (skeleton, hand tracking, lean…)
• Audio source
13. Readers
• Give access to frames
• Events
• Polling
• Multiple readers can be created for each source
• Readers can be paused
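A polling-style sketch, as an alternative to events (assumes `sensor` is an already opened KinectSensor):

```csharp
// Polling instead of events: ask the reader for the most recent frame
DepthFrameReader reader = sensor.DepthFrameSource.OpenReader();

reader.IsPaused = true;   // readers can be paused at any time
reader.IsPaused = false;

using (DepthFrame frame = reader.AcquireLatestFrame())
{
    if (frame != null)
    {
        // process the most recent depth frame
    }
}
```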
14. Frame References
• Access the current frame through the AcquireFrame() method
• Frame contains metadata (e.g., for color: format, width, height)
• MUST be handled quickly and then released (while a frame is held, no new frames are delivered)
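The acquire-quickly-then-release pattern looks like this in a color frame handler (a sketch; names are illustrative):

```csharp
void OnFrameArrived(object sender, ColorFrameArrivedEventArgs e)
{
    // Acquire, copy what you need, and dispose quickly:
    // while a frame is held, the sensor delivers no new ones.
    using (ColorFrame frame = e.FrameReference.AcquireFrame())
    {
        if (frame == null) return; // the frame may already be gone
        FrameDescription d = frame.FrameDescription; // metadata: width, height, format
        int width = d.Width, height = d.Height;
    }   // Dispose() releases the frame here
}
```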
15. Frame
• Access frame data
• Access raw buffer directly
• Take a local copy
16. MultiSourceFrameReader
• Allows getting a matched set of frames from multiple sources in a single event
• Delivers frames at the lowest FPS of the selected sources
MultiSourceFrameReader multiReader = sensor.OpenMultiSourceFrameReader(
    FrameSourceTypes.Color |
    FrameSourceTypes.BodyIndex |
    FrameSourceTypes.Body);

// In the MultiSourceFrameArrived handler:
var frame = args.FrameReference.AcquireFrame();
if (frame != null)
{
    // Acquire each per-source frame from the matched set
    using (var colorFrame = frame.ColorFrameReference.AcquireFrame())
    using (var bodyFrame = frame.BodyFrameReference.AcquireFrame())
    using (var bodyIndexFrame = frame.BodyIndexFrameReference.AcquireFrame())
    {
        // each frame may still be null; check before use
    }
}
19. Kinect Data Sources – Color
• 1920 x 1080 array of color pixels
• 30 or 15 fps, based on lighting conditions
• Processed image formats: RGBA, BGRA, YUY2, …
• Raw format: YUY2
• Frame data can be:
• Used in the raw format
• Converted to other formats (at a computational cost)
• The buffer is a byte array; the number of bytes per pixel depends on the chosen format (e.g. 4 bytes per pixel for BGRA)
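Copying color data into a byte array, in raw format where possible and converted to BGRA otherwise (a sketch; assumes a ColorFrameArrived handler `e`):

```csharp
using (ColorFrame frame = e.FrameReference.AcquireFrame())
{
    if (frame == null) return;
    FrameDescription d = frame.FrameDescription;      // 1920 x 1080
    byte[] pixels = new byte[d.Width * d.Height * 4]; // 4 bytes per pixel in BGRA

    if (frame.RawColorImageFormat == ColorImageFormat.Bgra)
        frame.CopyRawFrameDataToArray(pixels);
    else
        // converting from raw YUY2 has a computational cost
        frame.CopyConvertedFrameDataToArray(pixels, ColorImageFormat.Bgra);
}
```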
20. Kinect Data Sources – Infrared
• 512 x 424 pixel @ 30 fps
• Same physical sensor as the depth source
• Two sources:
• Infrared: a single infrared frame
• LongExposureInfrared: overlap of 3 frames (better signal-to-noise ratio, but blurrier images)
• Each pixel is 2 bytes (16-bit) and represents the IR intensity value
• Ambient light is removed: the SDK captures only the reflection of the infrared light projected by the device
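A common use of the 16-bit intensities is scaling them down to 8-bit gray for display (a sketch; assumes an InfraredFrameArrived handler `e`):

```csharp
using (InfraredFrame frame = e.FrameReference.AcquireFrame())
{
    if (frame == null) return;
    ushort[] irData = new ushort[512 * 424];
    frame.CopyFrameDataToArray(irData); // one 16-bit intensity per pixel

    // Keep the high byte of each intensity as an 8-bit gray value
    byte[] gray = new byte[irData.Length];
    for (int i = 0; i < irData.Length; i++)
        gray[i] = (byte)(irData[i] >> 8);
}
```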
21. Kinect Data Sources – Depth
• 512 x 424 pixel @ 30 fps
• Range: 0.5 – 4.5 meters (Extended Depth up to 8 m)
• Each pixel is 2 bytes (16-bit) and contains the distance in millimeters from the sensor’s focal plane
• Player index is not embedded in the depth data (unlike Kinect v1; use the Body Index source)
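Reading the millimeter distances (a sketch; assumes a DepthFrameArrived handler `e`):

```csharp
using (DepthFrame frame = e.FrameReference.AcquireFrame())
{
    if (frame == null) return;
    ushort[] depth = new ushort[512 * 424];
    frame.CopyFrameDataToArray(depth); // millimeters from the focal plane

    // e.g. distance of the central pixel, checked against the reliable range
    ushort mm = depth[(424 / 2) * 512 + (512 / 2)];
    bool valid = mm >= frame.DepthMinReliableDistance
              && mm <= frame.DepthMaxReliableDistance;
}
```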
23. Kinect Data Sources – Body Index
• 512 x 424 @ 30 fps
• Each pixel is 1 byte
• Pixel Data
• 0 to 5: Index of the corresponding body,
as tracked by the body source
• > 5: No tracked body at that pixel
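Interpreting the per-pixel body index (a sketch; assumes a BodyIndexFrameArrived handler `e`):

```csharp
using (BodyIndexFrame frame = e.FrameReference.AcquireFrame())
{
    if (frame == null) return;
    byte[] index = new byte[512 * 424];
    frame.CopyFrameDataToArray(index);

    for (int i = 0; i < index.Length; i++)
    {
        if (index[i] <= 5)
        {
            int bodyId = index[i]; // pixel belongs to tracked body 0..5
        }
        // values above 5 mean no tracked body at that pixel
    }
}
```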
24. Kinect Data Sources – Body
• Range is 0.5-4.5 meters
• 30fps
• Frame data is a collection of Body objects
• Each body has
• 25 joints (each joint has position in 3D space
and orientation)
• Hand tracking (open, closed, “lasso”)
• Face tracking and expressions
• Bones’ orientation
• Up to 6 simultaneous bodies
• Hand State on 2 bodies
25. Body information
• The Body class contains useful properties:
• ClippedEdges: edges of the Field of View that clip the body
• HandState [Left/Right]: { Unknown, NotTracked, Closed, Open, Lasso }
• HandConfidence [Left/Right]: { High, Low }
• IsRestricted
• IsTracked
• TrackingId: 64-bit unique id
• Joints: position in the space of each joint
• JointOrientations: orientation in space of each joint
• Lean: inclination vector of the body
• LeanTrackingState: { Inferred, NotTracked, Tracked }
• Up to 6 bodies simultaneously
• Up to 2 players’ hands simultaneously
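Reading these properties from the frame’s Body collection (a sketch; assumes `sensor` is an open KinectSensor and a BodyFrameArrived handler `e`):

```csharp
Body[] bodies = new Body[sensor.BodyFrameSource.BodyCount]; // up to 6

using (BodyFrame frame = e.FrameReference.AcquireFrame())
{
    if (frame == null) return;
    frame.GetAndRefreshBodyData(bodies);

    foreach (Body body in bodies)
    {
        if (!body.IsTracked) continue;
        Joint head = body.Joints[JointType.Head]; // one of 25 joints
        CameraSpacePoint p = head.Position;       // 3D position, in meters
        HandState right = body.HandRightState;    // Open / Closed / Lasso ...
        PointF lean = body.Lean;                  // body inclination vector
    }
}
```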
28. Kinect Data Sources – Audio
• Frame data is an Audio Beam
• Readers and events work as in the other sources
• Acquire frames through AcquireBeamFrames() method
29. Coordinate System
• ColorSpace (Coordinate System of the Color Image)
• … Color
• DepthSpace (Coordinate System of the Depth Data)
• … Depth, Infrared, BodyIndex
• CameraSpace (Coordinate System with the origin located at the sensor)
• … Body (Joint)
30. Coordinate Mapper
• Three coordinate systems
• Coordinate mapper provides conversions between each system
• Convert single or multiple points
| Name | Applies to | Dimensions | Units | Range | Origin |
|---|---|---|---|---|---|
| ColorSpacePoint | Color | 2 | pixels | 1920x1080 | Top left corner |
| DepthSpacePoint | Depth, Infrared, Body index | 2 | pixels | 512x424 | Top left corner |
| CameraSpacePoint | Body | 3 | meters | – | Infrared/depth camera |
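Mapping between the three systems goes through the sensor’s CoordinateMapper (a sketch; assumes `sensor` is an open KinectSensor and `body` a tracked Body):

```csharp
CoordinateMapper mapper = sensor.CoordinateMapper;

// Project a 3D joint position (CameraSpace, meters) onto the color image
CameraSpacePoint headPos = body.Joints[JointType.Head].Position;
ColorSpacePoint colorPoint = mapper.MapCameraPointToColorSpace(headPos);
// colorPoint.X / colorPoint.Y are pixel coordinates in the 1920x1080 frame

// The same point can also be mapped into depth space (512x424)
DepthSpacePoint depthPoint = mapper.MapCameraPointToDepthSpace(headPos);
```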
34. Kinect Region & User Controls
• The KinectRegion user control defines a part of the user interface (XAML) where the user can interact with a hand pointer
• The region must be connected to the sensor instance
• Available gestures (“out-of-the-box”) usable into a KinectRegion:
• Click
• Grab
• Pan
• Zoom
• KinectUserViewer gives a visual feedback related to the tracked state of the users
• Re-use default user controls
40. Gesture Recognition
• Heuristic: gesture is a coding problem
• Quick to do simple gestures/poses (hand over head)
• ML can also be useful to find good signals for the heuristic approach
• Machine Learning (ML) with Gesture Builder: gesture is a data problem
• Handles signals which may not be easily human-understandable (progress in a baseball swing)
• Large investment for production
• Danger of over-fitting: being too specific eliminates recognition of generic cases
41. Visual Gesture Builder (1)
• New tool integrated with v2 SDK
• Organize data using projects and solutions
• Give meaning to data by tagging gestures
• Build gestures using machine learning technology
• Adaptive Boosting (AdaBoost) trigger
• Determines whether the player is performing the gesture
• Random Forest Regression (RFR) progress
• Determines the progress of the gesture performed by the player
• Analyze / test the results of gesture detection
• Live preview of results
43. Resources
• General Info & Blog https://dev.windows.com/en-us/Kinect
• Purchase Sensor http://goo.gl/ZsMtBx
• Developer Forums https://goo.gl/bpptyq
• Twitter Account @KinectWindows
• Facebook Group http://on.fb.me/1LSflbX
• LinkedIn Group http://linkd.in/1J9gFcY
• Twitter Account @KinectDevelop
• Google Plus Page http://bit.ly/1SHtduT
Editor's Notes
Improved body tracking
The enhanced fidelity of the depth camera, combined with improvements in the software, has led to a number of body-tracking developments. The latest sensor tracks as many as six complete skeletons (compared to two with the original sensor), and 25 joints per person (compared to 20 with the original sensor). The tracked positions are more anatomically correct and stable, and the range of tracking is broader.
In interactive scenarios, your avatars will be more stable—with more accurate body position evaluation and crisper interactions—and you have the potential for bystanders to participate.
Ability to interact with your application
Includes control to visualize user
Re-use default XAML controls