
3D Motion Gesture Index

“Motion gestures” is the collective name for gestures that use 3D object, pose, and motion tracking to create a gesture interaction. The 3D motion gestures outlined by GML in this section have been designed and tested on the following time-of-flight (ToF) and structured light devices:

  • Hand and Finger Tracking Devices: Leap Motion, RealSense, Duo3D, PMD Pico.
  • Eye Tracking Devices: Tobii, Eye Tribe.
  • Head Tracking Devices: RealSense, RGB Cameras.
  • Body Tracking Devices: Kinect 1&2, SoftKinetic, Asus Xtion, Occipital Structure.
  • 3D Object Tracking Devices: RealSense, RGB Cameras.

Hand Gestures

There are thousands of high-fidelity hand motion gestures that can be described using GML. This graphic outlines a few manual and bi-manual hand pose and motion gestures that can be confidently tracked and distinguished using today's commodity input devices.

As the accuracy of hand tracking increases, the number of available high-fidelity gestures also increases. The next generation of input devices will enable hand and finger micro-gestures that will further multiply the number of distinct gesture options.

Hand Poses

In GML, hand poses can be implicitly or explicitly defined. Implicit definitions simply reference an expected hand pose by name. Explicit hand poses define the state of the hand, palm, thumb, and individual fingers. Hand poses are analyzed and matched using a skeletal model of the hand. Simple hand models use six points of reference: the five fingertips and the palm point. In more complex hand models, all 22 bones of the hand can be used to define a pose.
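As a rough illustration of pose matching on the simple six-point model, the sketch below classifies an index-thumb pinch from fingertip geometry. The function names and the millimetre threshold are illustrative assumptions; GML itself describes poses declaratively rather than in code.

```python
import math

def distance(a, b):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def match_pose(hand, pinch_threshold_mm=25.0):
    """Classify a pose from a six-point hand model (palm plus five
    fingertips, each an (x, y, z) position in millimetres).

    Returns "index_thumb_pinch_closed" when the thumb and index tips
    are within the assumed threshold, otherwise "unknown".
    """
    if distance(hand["thumb"], hand["index"]) < pinch_threshold_mm:
        return "index_thumb_pinch_closed"
    return "unknown"

# Example frame: thumb and index tips roughly 11 mm apart.
hand = {
    "palm":   (0.0, 0.0, 0.0),
    "thumb":  (30.0, 10.0, 0.0),
    "index":  (35.0, 20.0, 0.0),
    "middle": (10.0, 90.0, 0.0),
    "ring":   (0.0, 85.0, 0.0),
    "pinky":  (-10.0, 75.0, 0.0),
}
print(match_pose(hand))  # → index_thumb_pinch_closed
```

A fuller matcher would test all digit pairs and palm orientation in the same way; the point is only that implicit pose names bottom out in simple geometric tests on the skeletal model.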

Implicit Hand Poses

  • pinch: index_thumb_pinch_open, index_thumb_pinch_closed, index_middle_pinch_open, hand_grab
  • point: index_point, index_middle_point, index_middle_ring_point, index_middle_ring_pinky_point
  • trigger: index_trigger, index_middle_trigger, index_middle_ring_pinky_trigger
  • fist: fist_push, fist_hold
  • splay: splay_push, splay_wave, splay_wave_in, splay_wave_out, splay_tap
  • thumb: thumb_up, thumb_down, thumb_left, thumb_right
  • hook: index_hook, index_middle_hook, hand_hook
  • flat: flat_push, flat_wave_in, flat_wave_out, flat_swipe, flat_flick

Hand Motion Gestures

Hand motion gestures use a defined hand pose or property to establish a cluster configuration. The motion of this configuration, its life cycle on the timeline, and the way the action is mapped together define a gesture object.
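The idea can be sketched with a hypothetical classifier that qualifies a cluster by pose and then reads the centroid's life cycle as either a hold or a drag; the 5 mm tolerance is an assumption.

```python
def classify_motion(pose, centroids, hold_radius=5.0):
    """Map the life cycle of a pose-qualified cluster to a gesture name.

    pose: the implicit pose that qualifies the cluster (e.g. "pinch").
    centroids: one (x, y, z) cluster centroid per frame, in mm.
    A net displacement below hold_radius (an assumed tolerance) reads
    as a hold; anything larger reads as a drag.
    """
    x0, y0, z0 = centroids[0]
    x1, y1, z1 = centroids[-1]
    net = ((x1 - x0) ** 2 + (y1 - y0) ** 2 + (z1 - z0) ** 2) ** 0.5
    return f"{pose}_hold" if net < hold_radius else f"{pose}_drag"

print(classify_motion("pinch", [(0, 0, 0), (1, 1, 0), (2, 1, 0)]))  # → pinch_hold
print(classify_motion("fist", [(0, 0, 0), (40, 0, 0)]))             # → fist_drag
```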

Temporal 3D Motion Gestures:

  • begin: pinch_begin, index_thumb_pinch_begin, index_point_begin
  • hold: pinch_hold, splay_hold, fist_hold, thumb_up_hold
  • tap: 1_finger_tap, index_finger_tap, fist_tap
  • double tap: index_double_tap, middle_finger_double_tap, flat_tap

Spatial 3D Motion Gestures:

  • drag: point_drag, pinch_drag, trigger_drag, fist_drag
  • rotate: point_rotate, pinch_rotate, trigger_rotate
  • scale: 2_point_scale, 2_pinch_scale, 2_trigger_scale
  • swipe: point_swipe, splay_swipe
  • scroll: point_scroll, pinch_scroll
  • flick: splay_flick, flat_hand_flick
  • tilt: splay_tilt, flat_hand_tilt

Bi-manual Hand Motion Gestures

Bi-manual motion gestures require two hands to define a complete gesture context. The advantage of using both a left and a right hand to explicitly define a gesture is the additional fidelity and input bandwidth this provides. Not only can balanced manipulations be performed, but each hand can also be analyzed independently, so one can qualify the pose of the other.

Bi-manual Manipulations

Bi-manual controls can use duplicated poses between hands to create an interaction point pair. This “typed” pair can be treated as a unique cluster which can be analyzed for relative translation, rotation, and separation to provide reliable direct manipulation tools. In the examples below, different poses are used to create an interaction point cluster; each pose can be mapped to a specific 2D/3D manipulation if desired. For example, a pinch pair may be used to scale an object, and a hook pair may lock and rotate an object on a specific axis.

  • 2_finger_rotate_bimanual
  • 2_finger_scale_bimanual
  • 2_trigger_rotate_bimanual
  • 2_trigger_scale_bimanual
  • 2_pinch_rotate_bimanual
  • 2_pinch_scale_bimanual
  • 2_pinch_tilt_bimanual
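The relative rotation and separation analysis of a typed interaction point pair can be sketched as follows. The helper name and 2D screen coordinates are assumptions; a 3D version works the same way on the vector between the two hands.

```python
import math

def pair_transform(prev, curr):
    """Relative rotation (radians) and separation scale of a two-hand
    interaction point pair between two frames.

    prev, curr: ((xL, yL), (xR, yR)) — left and right hand points.
    """
    def span(pair):
        (x1, y1), (x2, y2) = pair
        return (x2 - x1, y2 - y1)  # vector from left point to right point

    v0, v1 = span(prev), span(curr)
    scale = math.hypot(*v1) / math.hypot(*v0)
    rotation = math.atan2(v1[1], v1[0]) - math.atan2(v0[1], v0[0])
    return rotation, scale

# The pair rotates a quarter turn and doubles its separation:
rot, sc = pair_transform(((0, 0), (10, 0)), ((0, 0), (0, 20)))
print(rot, sc)  # → 1.5707963267948966 2.0
```

In a 2_pinch_scale_bimanual gesture, `scale` would drive the object's size; in a 2_pinch_rotate_bimanual gesture, `rotation` would drive its orientation.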

Bi-manual Discrete and Continuous Controls

As with keyboard and mouse input, bi-manual gesture input can be divided so that the left hand controls discrete input and the right hand controls continuous input. For example, the left hand can be in an index_thumb_pinch_hold pose while the right hand performs a free-form index_point_drag. The left hand in this case qualifies the right hand's drag gesture so that a simple red line can be drawn on the screen. The left hand can then change poses to a middle_thumb_pinch_hold pose, and the index_point_drag can then draw a green line on the screen. In this way the left hand acts as the tool selector (like a keyboard shortcut) and the right hand is treated as a cursor (like a mouse or stylus).

  • index_thumb_pinch_left + index_point_right
  • middle_thumb_pinch_left + index_point_right
  • index_middle_point_left + index_point_right
  • index_trigger_left + index_point_right
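A minimal sketch of this discrete/continuous division, using a hypothetical lookup table that maps the left-hand pose to the tool driven by the right-hand drag:

```python
# Hypothetical tool table: the left-hand pose selects the pen that the
# right hand's index_point_drag will draw with.
TOOL_TABLE = {
    "index_thumb_pinch_left": "red_pen",
    "middle_thumb_pinch_left": "green_pen",
}

def active_tool(left_pose, right_gesture):
    """Return the selected tool, or None when the right hand is not
    performing the continuous drag gesture."""
    if right_gesture != "index_point_drag":
        return None
    return TOOL_TABLE.get(left_pose)

print(active_tool("index_thumb_pinch_left", "index_point_drag"))   # → red_pen
print(active_tool("middle_thumb_pinch_left", "index_point_drag"))  # → green_pen
```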


Explicit Hand Motion Gesture Descriptions

Explicit hand motion gestures use an explicit pose definition as the primary matching criterion for the gesture. With implicit descriptions, the hand pose can simply be referenced by a clear pose name. When explicitly describing a hand pose, however, the extension of each digit must be identified, along with any other distinguishing characteristics of the hand itself, such as whether it is a left or right hand, or what direction it is facing.
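An explicit pose might be modelled as a record of per-digit states, as in this sketch. The field names and the "any" wildcard are illustrative assumptions, not GML syntax.

```python
# Hypothetical explicit description of an index_point pose: each digit
# is labelled "extended" or "curled", plus handedness and palm facing.
POINT_POSE = {
    "hand": "right",
    "palm": "down",
    "thumb": "curled", "index": "extended",
    "middle": "curled", "ring": "curled", "pinky": "curled",
}

def matches(observed, template):
    """True when every constrained field of the template agrees with
    the observed hand state; "any" fields are unconstrained."""
    return all(v == "any" or observed.get(k) == v
               for k, v in template.items())

observed = dict(POINT_POSE)
print(matches(observed, POINT_POSE))                    # → True
print(matches(dict(observed, palm="up"), POINT_POSE))   # → False
```

Relaxing a field to "any" shows why implicit names are convenient: a named pose is just a fully constrained template, while explicit descriptions let the author decide which characteristics actually matter for matching.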


Head Motion Gestures

Depth-map based 3D motion tracking methods can allow for head and face identification and tracking. These can be characterized in a number of different ways. In the majority of cases a center point between the eyes is used along with a direction vector normal to the facial plane. With these two data points a simple and relatively robust set of head gestures can be confidently defined.

  • head_lean_left
  • head_lean_right
  • head_lean_forward
  • head_lean_back
  • head_up
  • head_down
  • head_nod_left_right
  • head_nod_up_down
  • head_look_forward
  • head_look_left
  • head_look_right
  • head_look_down
  • head_look_up
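A sketch of how the direction vector normal to the facial plane could be thresholded into some of these gestures; the camera coordinate convention and the 15° cone are assumptions.

```python
import math

def classify_head(direction, cone_deg=15.0):
    """Classify a head gesture from the facial-plane normal vector.

    direction: (x, y, z) in assumed camera coordinates (+x right,
    +y up, +z toward the camera from the user's face).
    """
    x, y, z = direction
    n = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / n, y / n, z / n
    # Within the cone around the camera axis: looking at the screen.
    if math.degrees(math.acos(max(-1.0, min(1.0, z)))) < cone_deg:
        return "head_look_forward"
    # Otherwise pick the dominant off-axis component.
    if abs(x) >= abs(y):
        return "head_look_right" if x > 0 else "head_look_left"
    return "head_look_up" if y > 0 else "head_look_down"

print(classify_head((0.0, 0.0, 1.0)))   # → head_look_forward
print(classify_head((1.0, 0.0, 0.2)))   # → head_look_right
```

Lean gestures would use the head centre point's displacement rather than the direction vector, but the thresholding pattern is the same.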

These head-motion based gestures can be great for adding subtle context cues to game controls and metrics or even used to directly modify the way digital content is presented on a display.


Eye Motion Gestures

Consumer eye tracking devices and methods are continuing to become more accurate. Currently there are low cost commodity tracking systems for desktop use that are fast enough and reliable enough to be used for game or application control (for instance, those made by Tobii and Eye Tribe). These desktop eye trackers can discriminate between blinks and winks, determine what part of the screen a user is looking at, and, in some cases, can even provide a proxy measurement for head motion.

  • eye_left_wink
  • eye_right_wink
  • eye_double_wink
  • eye_look_left
  • eye_look_right
  • eye_look_up
  • eye_look_down
  • eye_look_target
  • eye_look_screen

Tracking simple eye features can provide critical context about the user's state of attention. For example, the “eye_look_screen” gesture creates an event that returns the identification of the part of the screen the user is currently gazing at. This real-time contextual information can be combined with other concurrent global inputs to target the desired window or screen and provide focus for a voice command.

This context-targeting principle using eye tracking can be further extended with “eye_look_target” which returns an event if the X,Y coordinate of the gaze location matches a known display object. Objects within a scene can be directly targeted by the user's gaze. This can then prompt the application to directly respond to the user's attention. For example: a character in a game can turn and look at the user in response to the user's stare, or an enemy can attack from the shadows when the user is not looking, creating a rich, “human-centric” interaction.
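The “eye_look_target” hit test can be sketched as a simple containment check of the gaze point against known display objects; the scene table and rectangle layout below are illustrative.

```python
# Hypothetical scene objects as axis-aligned screen rectangles,
# given as (x_min, y_min, x_max, y_max) in pixels.
SCENE = {
    "enemy": (100, 100, 200, 180),
    "door":  (400, 50, 480, 300),
}

def eye_look_target(gaze_xy, scene=SCENE):
    """Return the id of the display object under the gaze point, or
    None when the user is not looking at any known object."""
    gx, gy = gaze_xy
    for obj_id, (x0, y0, x1, y1) in scene.items():
        if x0 <= gx <= x1 and y0 <= gy <= y1:
            return obj_id
    return None

print(eye_look_target((150, 150)))  # → enemy
print(eye_look_target((0, 0)))      # → None
```

A game would debounce this over time (e.g. only fire after the gaze dwells on the object for a few hundred milliseconds) so that a saccade across the enemy does not trigger its reaction.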


Body Motion Gestures

Body motion gestures have become commonly known through their use with the Microsoft Kinect controller. Longer-range depth-map sensors that allow for “room scale” 3D motion tracking (such as the Kinect, SoftKinetic, and Structure devices) allow a complete “body” to be identified and tracked in 3D space. A body skeleton is generated that allows the position, motion, and pose of the various parts of the body to be tracked in detail. However, these long-range depth sensors have limited depth resolution and are generally not well suited to distinguishing and tracking small features such as fingertips.

Implicit 3D Body Motion Gestures:

  • Body: body_lean_left, body_lean_right, body_lean_forward, body_lean_back, body_sit, body_stand, body_jump, body_squat, body_walk
  • Hands: hand_wave_left_right, hand_wave_up_down, hand_open_push, hand_closed_push, hand_clap, hand_object_golf_swing, hand_object_tennis_swing
  • Arms (left/right): arm_raised_forward, arm_raised_side, arm_point, arm_wave, arm_push, arm_throw, arm_punch, arm_folded, arm_on_hip
  • Legs: leg_kick_low (left/right), leg_kick_high (left/right), leg_raised (left/right), leg_squat, leg_lunge
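As a sketch, several of the body_lean_* gestures above can be read off the hip-to-shoulder vector of the tracked skeleton. The joint names, camera coordinate convention, and the 10° threshold are assumptions; real SDKs use their own joint enumerations and axes.

```python
import math

def classify_lean(hip, shoulder_center, threshold_deg=10.0):
    """Classify body_lean_* gestures from two skeleton joints.

    hip, shoulder_center: (x, y, z) positions in assumed camera
    coordinates (+x right, +y up, +z toward the camera).
    """
    dx = shoulder_center[0] - hip[0]
    dy = shoulder_center[1] - hip[1]
    dz = shoulder_center[2] - hip[2]
    side = math.degrees(math.atan2(dx, dy))   # lateral tilt of the spine
    front = math.degrees(math.atan2(dz, dy))  # fore/aft tilt of the spine
    if abs(side) >= abs(front):
        if side > threshold_deg:
            return "body_lean_right"
        if side < -threshold_deg:
            return "body_lean_left"
    else:
        if front > threshold_deg:
            return "body_lean_forward"
        if front < -threshold_deg:
            return "body_lean_back"
    return "body_stand"

print(classify_lean((0, 0, 0), (0.15, 0.5, 0.0)))  # → body_lean_right
print(classify_lean((0, 0, 0), (0.0, 0.5, 0.0)))   # → body_stand
```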

LED Constellation Tracking

Constellation tracking is the name given to 3D point cluster-tracking that uses known, fixed geometric patterns or feature points to uniquely locate and orient objects in 3D space. A common method in motion capture applications is the use of mounted reflective ball arrays. Other active options include structured RGB LED arrays or IR LED emitters such as in the Oculus DK2 head-mounted display. Both active and passive IR tracking methods are useful for 3D object tracking because they can be done in parallel to RGB camera scene capture without optical interference.
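Because a rigid constellation's inter-point distances are invariant under rotation and translation, an observed marker cluster can be matched to a known pattern by its distance signature, as in this sketch; the constellation names and millimetre tolerance are illustrative.

```python
import itertools
import math

def signature(points):
    """Sorted inter-point distances: a rotation- and translation-
    invariant fingerprint of a rigid marker constellation."""
    return sorted(math.dist(a, b)
                  for a, b in itertools.combinations(points, 2))

def identify(observed, known, tol=1.0):
    """Match an observed 3D point cluster to a named constellation by
    comparing distance signatures within a millimetre tolerance."""
    sig = signature(observed)
    for name, pattern in known.items():
        ref = signature(pattern)
        if len(ref) == len(sig) and all(abs(a - b) <= tol
                                        for a, b in zip(sig, ref)):
            return name
    return None

# A known pattern, and the same pattern observed after a translation:
KNOWN = {"marker_array_a": [(0, 0, 0), (100, 0, 0), (0, 50, 0)]}
observed = [(10, 10, 0), (110, 10, 0), (10, 60, 0)]
print(identify(observed, KNOWN))  # → marker_array_a
```

Once the constellation is identified, point-to-point correspondences give the object's full 3D position and orientation (e.g. via a rigid alignment such as the Kabsch algorithm).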

There are a number of advantages to using active constellation tracking for augmented reality applications. For example, RGB LED arrays can be tracked from a greater distance than traditional AR printed fiducials; they can be tracked in variable lighting conditions; and the visible color patterns can be leveraged to present engaging real-world visual feedback to users. The low power requirements also allow LED constellation technology to be packed into portable objects, which can lead to rich, tangible interactions.


3D Object Gestures & Behaviors

Desktop range depth-mapping devices such as Intel's RealSense camera have the capability to create detailed 3D scene maps and isolate medium-sized objects from the scene background. When a known object surface mesh (3D object skeleton) is defined, the 3D object features can be recognized and tracked. The 3D position and orientation of the object can also be tracked and then analyzed as a structured input stream.

  • object_drag
  • object_rotate
  • object_tilt
  • object_hold
  • object_grab
  • object_release
  • object_tip_tap

Objects can be used to create manipulation gestures, such as 3D drag, rotate, and tilt events, that can in turn be mapped to virtual 3D object manipulation or traditional 2D surface-based UI controls.
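Treating the tracked orientation as a structured input stream, an object_tilt event might be detected as a simple departure from the object's starting roll angle; the per-frame angle stream and the 20° threshold are assumptions.

```python
def detect_tilt(roll_angles, threshold_deg=20.0):
    """Scan a per-frame stream of roll angles (degrees) for a tilt.

    Returns ("object_tilt", frame_index) for the first frame whose
    roll departs from the starting value by more than the assumed
    threshold, or None if no tilt occurs.
    """
    start = roll_angles[0]
    for frame, angle in enumerate(roll_angles):
        if abs(angle - start) > threshold_deg:
            return ("object_tilt", frame)
    return None

print(detect_tilt([0, 5, 12, 25]))  # → ('object_tilt', 3)
print(detect_tilt([0, 3, 5, -8]))   # → None
```

The other object gestures in the list follow the same pattern on different channels: object_drag watches position, object_rotate watches yaw, and object_grab/object_release watch whether the mesh is occluded by a hand.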


Hand Object Gestures

  • Passive object gestures: computer vision classification and tracking methods
  • Active object gestures: IMU sensor fusion

Pen/Stylus Gestures

  • pen_stroke
  • pen_tap
  • pen_double_tap
  • pen_hold

Simple 3D Object Gestures

  • ball_grab
  • cube_grab
  • phone_grab
  • object_grab
  • object_tilt

Bi-manual Object Gestures

  • ball_tip_tap
  • cube_tip_tap
  • pen_tip_tap
  • phone_tip_tap

Sensor Gesture Index

gestures/motion/gesture_index.txt · Last modified: 2016/02/26 18:28 by paul