10-03-2011, 10:14 AM
[attachment=9912]
A Hand Gesture Recognition System Based on Local Linear Embedding
Introduction
Interaction with computers is often not a comfortable experience
Computers should be able to communicate with people through body language.
Hand gesture recognition becomes important
Interactive human-machine interface and virtual environment
Two common technologies for hand gesture recognition
glove-based method
Uses a special glove device to capture hand posture
Annoying for the user
vision-based method
3D hand/arm modeling
Appearance modeling
3D hand/arm modeling
High computational complexity
Requires many approximation processes
Appearance modeling
Low computational complexity
Real-time processing
Overview of algorithm proposed in the paper
A vision-based method applied to the problem of real-time CSL recognition
Input: 2D video sequences
two major steps
Hand gesture region detection
Hand gesture recognition
CSL and Pre-processing
Sign Language
Allows the deaf community to communicate with the hearing society
Two main elements:
A low, simple level: the signed alphabet, which mimics the letters of the native spoken language
A higher level: the signed language, which uses actions to mimic the meaning or description of the sign
CSL is the abbreviation for Chinese Sign Language
The 30 letters of the CSL alphabet ↔ the objects to be recognized
Pre-processing of Hand Gesture Recognition
Detection of Hand Gesture Regions
Aims to select the valid frames and separate the hand region from the rest of the image
Low time consumption → fast processing rate → real-time speed
Detect the skin region within the image by using color.
Each color has three components:
hue, saturation, and value
Chroma, consisting of hue and saturation, is separated from value
Chroma is invariant under different lighting conditions.
Color can be represented in RGB space, and also in YUV and YIQ spaces.
In YUV space
saturation → radial displacement (magnitude) in the U–V plane
hue → phase angle Θ
In YIQ space
The color saturation cue I is combined with Θ to reinforce the segmentation effect
Skin tones lie between red and yellow
Transform each color pixel point P from RGB to YUV and YIQ space
The skin region satisfies:
105° ≤ Θ ≤ 150°
30 ≤ I ≤ 100
Both hands and faces are detected
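The chrominance thresholds above can be sketched as a per-pixel test. This is a minimal sketch, not the paper's code: the YUV/YIQ conversion coefficients are the standard broadcast ones, and Θ is assumed to be the hue angle measured in the U–V plane.

```python
import numpy as np

def skin_mask(rgb):
    """Per-pixel skin detection using the YUV hue angle and the YIQ
    I component, with the thresholds quoted in the slides.
    rgb: H x W x 3 uint8 array."""
    r = rgb[..., 0].astype(np.float64)
    g = rgb[..., 1].astype(np.float64)
    b = rgb[..., 2].astype(np.float64)
    y = 0.299 * r + 0.587 * g + 0.114 * b      # luma
    u = 0.492 * (b - y)                        # YUV chrominance axes
    v = 0.877 * (r - y)
    theta = np.degrees(np.arctan2(v, u)) % 360.0   # hue angle in U-V plane
    i = 0.596 * r - 0.274 * g - 0.322 * b          # YIQ in-phase component
    return (theta >= 105) & (theta <= 150) & (i >= 30) & (i <= 100)
```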
On-line video stream containing hand gestures can be considered as a signal S(x, y, t)
(x,y) denotes the image coordinate
t denotes time
Convert the image from RGB to HSI to extract the intensity signal I(x,y,t)
Based on the YUV and YIQ representation, skin pixels are detected, forming a binary image sequence M’(x,y,t) – the region mask
Another binary image sequence M’’(x,y,t), reflecting motion information, is produced from every consecutive pair of intensity images – the motion mask
M(x,y,t) delineates the moving skin region via a logical AND between the corresponding region mask and motion mask sequences
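A minimal sketch of this masking step, assuming the motion mask comes from simple frame differencing of the intensity images (the differencing threshold `diff_thresh` is illustrative; the paper does not specify it):

```python
import numpy as np

def moving_skin_mask(region_masks, intensities, diff_thresh=15):
    """Combine the skin-region mask M' with a motion mask M''
    (frame differencing) via logical AND, giving M for each frame t >= 1.
    region_masks: list of boolean arrays; intensities: list of uint8 arrays."""
    masks = []
    for t in range(1, len(intensities)):
        motion = np.abs(intensities[t].astype(np.int16)
                        - intensities[t - 1].astype(np.int16)) > diff_thresh
        masks.append(region_masks[t] & motion)  # moving skin pixels only
    return masks
```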
Normalization
The detection results are transformed into gray-scale images of 36×36 pixels.
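The normalization step might look like the following sketch: crop the bounding box of the detected mask and rescale to 36×36 with nearest-neighbour resampling (the paper does not specify the resampling method, so this is an assumption):

```python
import numpy as np

def normalize_region(gray, mask, size=36):
    """Crop the detected hand region from a gray-scale frame and
    rescale it to size x size pixels (nearest-neighbour resampling)."""
    ys, xs = np.nonzero(mask)                 # bounding box of the mask
    crop = gray[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    h, w = crop.shape
    rows = np.arange(size) * h // size        # nearest-neighbour row indices
    cols = np.arange(size) * w // size
    return crop[np.ix_(rows, cols)]
```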
Locally Linear Embedding
Sparse data vs. high-dimensional space
30 different gestures, 120 samples per gesture
36×36 pixels per image
3600 training samples vs. dimensionality d = 1296
Too sparse to describe the data distribution reliably
Reduce the dimensionality of hand gesture images
Locally Linear Embedding maps the high-dimensional data to a single global coordinate system to preserve the neighbouring relations.
Given n input vectors {x1, x2, …, xn} in R^d,
the LLE algorithm produces
output vectors {y1, y2, …, yn} in R^m (m << d)
Find the k nearest neighbours of each point xi
Measure the reconstruction error from approximating each point by its neighbours, and compute the reconstruction weights that minimize this error
Compute the low-dimensional embedding by minimizing an embedding cost function with the reconstruction weights
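The three steps above can be sketched directly from the standard Roweis–Saul formulation of LLE. This is a minimal implementation for illustration, not the paper's code; the regularization term is a common practical addition (needed when k exceeds the local dimensionality) and is not stated in the slides.

```python
import numpy as np

def lle(X, k=5, m=2, reg=1e-3):
    """Minimal Locally Linear Embedding:
    1) find k nearest neighbours of each x_i,
    2) solve for reconstruction weights that best rebuild x_i from them,
    3) embed via the bottom eigenvectors of (I - W)^T (I - W)."""
    n = X.shape[0]
    # Step 1: pairwise squared distances, k nearest neighbours (excluding self)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    neighbours = np.argsort(d2, axis=1)[:, 1:k + 1]
    # Step 2: reconstruction weights minimizing the local error
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[neighbours[i]] - X[i]          # neighbours centred on x_i
        G = Z @ Z.T                          # local Gram matrix
        G += reg * np.trace(G) * np.eye(k)   # regularize (assumed, see lead-in)
        w = np.linalg.solve(G, np.ones(k))
        W[i, neighbours[i]] = w / w.sum()    # weights constrained to sum to 1
    # Step 3: minimize the embedding cost ||Y - W Y||^2
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, 1:m + 1]                  # skip the constant eigenvector
```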
Experiments
4125 images including all 30 hand gestures
60% for training, 40% for testing
For each image:
320×240 image, 24-bit color depth
Taken with a camera at different distances and orientations
Sampled at 25 frames/s
Experiment Results
Conclusion
Robust against similar postures under different lighting conditions and backgrounds
The fast detection process allows real-time video applications with low-cost equipment, such as a PC and a USB camera