Inferring Body Pose without Tracking Body Parts

ABSTRACT
A novel approach for estimating articulated body posture and motion from monocular video sequences is proposed. Human pose is defined as the instantaneous two-dimensional configuration (i.e., the projection onto the image plane) of a single articulated body in terms of the position of a predetermined set of joints. First, the human body is statistically segmented from the background, and low-level visual features are computed from the segmented body shape. The goal is to map these generally low-level visual features to body configurations. The system estimates several mappings, each one associated with a specific cluster in the visual feature space. Given a set of body motion sequences for training, unsupervised clustering is obtained via the Expectation Maximization algorithm. For each cluster, a function is estimated that maps low-level features to 2D pose. Given new visual features, a mapping from each cluster is performed, yielding a set of possible poses. From this set, the system selects the most likely pose given the learned probability distribution and the visual feature similarity between hypothesis and input. The performance of the proposed approach is characterized using real and artificially generated body postures, showing promising results.
1. INTRODUCTION
In recent years, there has been a great deal of interest in methods for tracking and analyzing human body motion by computer. Effective solutions would lead to breakthroughs in areas such as video coding, visual surveillance, human motion recognition, ergonomics, video indexing and retrieval, and human-computer interfaces, among others.
If the basic structure of the tracked body (its configuration) is reconstructed, motion analysis would be greatly simplified. In our everyday life, humans can easily estimate body part location and structure from relatively low-resolution images of the projected 3D world (e.g., watching a video). Unfortunately, this problem is inherently difficult for a computer. The difficulty stems from the number of degrees of freedom in the human body, the complex underlying probability distribution, ambiguities in the projection of human motion onto the image plane, self-occlusion, insufficient temporal or spatial resolution, etc.
In this paper, we develop an approach for estimating human body pose given a single image or a monocular image sequence containing unoccluded bodies. Given a set of body motion sequences for training, a set of clusters is built in which each has statistically similar configurations according to a given measure and model. Then, for each of the clusters, a function that maps visual features to body pose is acquired via machine learning. Given new visual features, a mapping from each cluster is performed, providing a set of possible poses. From this set, we extract the most likely pose given the learned probability distribution and the visual feature similarity between hypothesis and input.
2. GENERAL PROBLEM DEFINITION
The problem of obtaining body pose (either 2D or 3D) from visual features can be thought of as an instance of the more general problem of estimating, from data, the function that maps elements of a given (cue) space to another (target) space. In our case this function seems to be highly complex, and the mapping is many-to-many (e.g., the same visual features can represent different body pose configurations, and the same body configuration can generate different visual features due to clothing, viewpoint, etc.).
Let us define Ψ ⊂ ℝ^t to be the set of sample data points from the target space and Γ ⊂ ℝ^c, with the same cardinality as Ψ, to be the set of sample data points from the cue space. Assume that for each element ψ_i ∈ Ψ we know its counterpart γ_i ∈ Γ (i.e., the data is labeled), or that there is a way to generate γ_i, for example γ_i = ζ(ψ_i). Note that if ζ is many-to-one, its inverse does not exist.
In our case, Ψ represents the set of example human body poses, and Γ is the corresponding set of visual features taken from image projections under certain viewing conditions. We do not intend to solve the general problem of function approximation; instead, we address the specific problem of recovering pose parameters of an articulated body (the human body) from monocular visual features.
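To illustrate why a single global mapping fails when the inverse of the cue-generating function is multi-valued, consider a minimal numpy sketch (not this paper's implementation): the forward map t → t² is many-to-one, so a single least-squares fit of its inverse collapses the two branches, while one fit per target-space cluster recovers both. This is the strategy adopted here, with EM clustering and learned nonlinear maps instead of the hand-picked clusters and linear fits below.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy many-to-one forward map: target t -> cue c = t**2.
# Its inverse is two-valued (t = +/- sqrt(c)), so a single global
# regressor fitted on (cue, target) pairs cannot recover it.
targets = rng.uniform(-1.0, 1.0, 400)       # samples from the target space
cues = targets ** 2                         # corresponding cue-space samples

# One global linear fit: it collapses the two branches toward zero.
A = np.column_stack([cues, np.ones_like(cues)])
w_global, *_ = np.linalg.lstsq(A, targets, rcond=None)

# Cluster the TARGET space (here trivially by sign) and fit one simple
# map per cluster, each responsible for a subset of the domain.
preds = {}
for label, mask in (("pos", targets >= 0), ("neg", targets < 0)):
    Ac = np.column_stack([cues[mask], np.ones(mask.sum())])
    w, *_ = np.linalg.lstsq(Ac, targets[mask], rcond=None)
    preds[label] = w

c_query = 0.49                              # cue generated by t = +/- 0.7
hyp_pos = float(preds["pos"] @ [c_query, 1.0])   # hypothesis near +0.7
hyp_neg = float(preds["neg"] @ [c_query, 1.0])   # hypothesis near -0.7
```

The per-cluster fits return two plausible hypotheses (one per branch), while the global fit returns a value near zero that corresponds to no valid preimage at all.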
As stated above, our goal is to map visual features to likely body pose configurations. For training, motion capture can provide 3D marker positions and orientation of the human. Following a notation similar to that used above, the set of marker positions is denoted Ψ_3d ⊂ ℝ^t.
Visual features generated by the three-dimensional object can be obtained by pointing a video camera at the given object and analyzing the captured images. It is clear that these visual features depend on the camera parameters (e.g., camera orientation, location, focal length, etc.).
Alternatively, a computer graphics model of the 3D object (in our case, a human body model) can be used to render a set of images; we denote this rendering function R : ℝ^t → I, where I is the set of images at a given resolution. These images simulate the visual appearance of the object in question, given pose and camera parameters. Optionally, R can take a parameter ρ indicating the camera point of view (or object orientation). Images are an intermediate representation from which we can extract visual features using a function we denote by V : I → C. Following the definitions above, we have ζ(ψ) = V(R(ψ)), with ζ : ℝ^t → C.
The set 3d  c is formed by the visual features extracted from the images of  3d, using . Our goal is to estimate the function denoted , as defined above.
An alternative problem is to recover 2D marker positions, instead of 3D positions, from image features. By 2D marker positions, we mean the projection of the 3D markers onto the image plane. The 2D projections of the markers can be obtained from Ψ_3d to generate a data set Ψ_2d,ρ ⊂ S of all frames viewed from camera orientation ρ and a given distance to the object. In the same way as in the 3D case, we can render 2D marker positions to form an image; this rendering function will be denoted R̂ : S → I, a 2D approximation of R. Note that, having the set Ψ_3d from which Ψ_2d,ρ was generated, we can obtain a more accurate rendering by using R on Ψ_3d at the appropriate orientation ρ. When this is possible, we will use R instead of R̂. To generate visual features from images, we can proceed as before, using V to generate the set Γ_2d,ρ, which contains the visual features corresponding to the rendering of the set Ψ_2d,ρ.
For notational convenience, we define Ψ_2d = ∪_ρ Ψ_2d,ρ; Γ_2d can be defined similarly. We also have ζ_2d(ψ) = V(R̂(ψ)), with ζ_2d : S → C. The problem is then to approximate the inverse of ζ_2d (the 2D version of ζ) from data. In other words, given visual features, we want to find the likely 2D marker projections that generated them.
3. APPROACH OVERVIEW
The proposed approach consists of the following steps:
1. A set of 3D motion capture sequences is obtained, Ψ_3d ⊂ ℝ^t. A set of visual features Γ_3d is computed from the images that the 3D body generated (using a computer graphics rendering function, or simply captured by a video camera). By projecting the elements of Ψ_3d onto the image plane over a given number of views, we obtain a set of 2D marker positions Ψ_2d.
2. The set Ψ_2d is partitioned into several exclusive subsets via unsupervised clustering. This yields a set of m clusters, each corresponding to a group of similar pose parameters.
3. Given Ψ_2d and Γ_2d, for each cluster i we approximate a mapping function P_i from visual features to 2D pose. By clustering our target space, the mapping can be approximated with simple functions, each responsible for a subset of the domain. Although one might hope that linear functions would suffice, we estimate nonlinear functions: a multi-layer perceptron is trained for each cluster.
4. Novel data is presented in the form of human silhouettes. For each frame, visual features are extracted using V. Then, using the P_i, a set of m projected marker positions per frame is estimated.
5. The m candidate solutions provided for each frame are rendered by calling the rendering function R : ℝ^t → I, where I is the set of images at a given resolution, and their visual features are extracted. The best match with respect to the presented data can then be found via the maximum likelihood criterion. As an optional step, consistency in time can be enforced by observing some frames ahead, so that a temporally consistent estimate is finally obtained.
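The five steps above can be sketched end to end on synthetic data. This is a hedged illustration, not the paper's system: the `forward` function stands in for rendering plus feature extraction, scikit-learn's GaussianMixture supplies the EM clustering, an MLPRegressor plays the role of each per-cluster mapping P_i, and the final selection combines feature similarity with likelihood under the learned pose distribution.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)

# Synthetic stand-in for the paper's data: "poses" live in R^2, and the
# forward map pose -> visual features loses the sign of the first
# coordinate, so the inverse mapping is ambiguous.
def forward(pose):                         # stand-in for V(R(pose))
    pose = np.atleast_2d(pose)
    return np.column_stack([pose[:, 0] ** 2, pose[:, 1]])

poses = np.vstack([rng.normal([+2.0, +2.0], 0.3, (200, 2)),
                   rng.normal([-2.0, -2.0], 0.3, (200, 2))])
cues = forward(poses)

# Step 2: unsupervised EM clustering of the pose (target) space.
m = 2
gmm = GaussianMixture(n_components=m, random_state=0).fit(poses)
labels = gmm.predict(poses)

# Step 3: one specialized mapping P_i (features -> pose) per cluster.
maps = [MLPRegressor(hidden_layer_sizes=(16,), max_iter=3000, random_state=0)
        .fit(cues[labels == i], poses[labels == i]) for i in range(m)]

# Steps 4-5: each P_i yields a pose hypothesis for a novel feature
# vector; the winner is scored by feature similarity (re-generate the
# features of each hypothesis) plus likelihood under the learned pose
# distribution.
true_pose = np.array([-2.1, -1.9])
obs = forward(true_pose)

hyps = [P.predict(obs)[0] for P in maps]
score = lambda h: (np.linalg.norm(forward(h) - obs)
                   - 0.1 * gmm.score_samples(h[None, :])[0])
best = min(hyps, key=score)
```

The cluster that actually contains the true pose produces a hypothesis that both regenerates the observed features and is likely under the learned distribution, so it wins the selection; the other cluster's hypothesis is rejected.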
4. MODELING THE CONFIGURATION SPACE
Motion capture data Ψ_3d will be used to train our model. Motion capture data provides 3D position information about the location of a set of markers. In our case, the set of markers roughly corresponds to a subset of the major human body joints. This set of markers is fixed and determined beforehand.
3D marker positions are projected into 2D marker positions Ψ_2d,ρ using a perspective camera located at a fixed height and distance from the center of the body. This projection is repeated by rotating the camera around the main axis of the human at fixed increments dρ of the camera orientation; in our experiments, dρ = π/16. Note that we can make the set Ψ_2d as dense as we want by sampling at more camera orientations. To account for a wider variety of viewing conditions, we could sample the whole viewing sphere. Differences in the camera-object distance could be avoided in principle by choosing scale-invariant image features. Given marker positions for a human body in a particular frame, we can render its visual appearance using computer graphics techniques. In our case, we specify the structure of the connections between markers, and use cylinders to connect them. Fig. 1 shows two elements of the set Ψ_2d and the corresponding rendered binary images from which the visual features Γ_2d are extracted. For this implementation we chose Hu moments [11] as our visual features, mainly due to their ease of computation and their invariance to translation, scaling, and rotation on the image plane.
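For reference, Hu invariants are built from normalized central moments η_pq = μ_pq / μ_00^(1+(p+q)/2). The sketch below (pure numpy, not the paper's code) computes the first two invariants, φ1 = η20 + η02 and φ2 = (η20 − η02)² + 4η11², and checks their invariance to translation (exact) and to scaling (approximate, due to pixel discretization).

```python
import numpy as np

def hu_first_two(img):
    """First two Hu invariants of a binary image, via normalized
    central moments eta_pq = mu_pq / mu_00**(1 + (p + q) / 2)."""
    ys, xs = np.nonzero(img)
    m00 = float(len(xs))                  # mu_00 for a binary image
    cx, cy = xs.mean(), ys.mean()         # centroid
    x, y = xs - cx, ys - cy               # centered coordinates
    eta = lambda p, q: (x ** p * y ** q).sum() / m00 ** (1 + (p + q) / 2)
    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return phi1, phi2

# A filled rectangle, then the same shape translated: invariants match.
img = np.zeros((64, 64), dtype=np.uint8)
img[10:30, 10:20] = 1
shifted = np.roll(np.roll(img, 25, axis=0), 30, axis=1)
assert np.allclose(hu_first_two(img), hu_first_two(shifted))
```

Because central moments subtract the centroid and normalized moments divide out the area, the invariants are unchanged by translation and (up to discretization) by uniform scaling, which is exactly why such features suit silhouettes seen at varying image positions and distances.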