Automatic Creation of a Talking Head From a Video Sequence
Kyoung-Ho Choi and Jenq-Neng Hwang

Abstract—In this paper, a real-time system to create a talking head from a video
sequence without any user intervention is presented. In the proposed
system, a probabilistic approach, to decide whether or not extracted
facial features are appropriate for creating a three-dimensional (3-D)
face model, is presented. Automatically extracted two-dimensional
facial features from a video sequence are fed into the proposed
probabilistic framework before a corresponding 3-D face model is built
to avoid generating an unnatural or nonrealistic 3-D face model. To
extract face shape, we also present a face shape extractor based on an
ellipse model controlled by three anchor points, which is accurate and
computationally cheap. To create a 3-D face model, a least-square
approach is presented to find a coefficient vector that is necessary to
adapt a generic 3-D model into the extracted facial features.
Experimental results show that the proposed system can efficiently
build a 3-D face model from a video sequence without any user
intervention for various Internet applications including virtual
conferencing and virtual storytelling that do not require much head
movement or high-quality facial animation.
Index Terms—MPEG-4 facial object, probabilistic approach, speech-driven
talking heads, talking heads, virtual face.
I. INTRODUCTION
ADVANCES in computing power and numerical algorithms in graphics and
image-processing make it possible to build a realistic three-
dimensional (3-D) face from a video
sequence by using a regular PC camera. However, in most reported
systems, user intervention is generally required to provide feature
points at the initialization stage. In the initialization stage,
feature points in two orthogonal frames or in multiple frames have to
be provided carefully to generate a photo-realistic 3-D face model.
These techniques can build high-quality face models, but they are
computationally expensive and time consuming. For various multimedia
applications such as video conferencing, e-commerce, and virtual
anchors, integrating a talking head is highly desirable to enrich the
human-computer interface. To provide talking head solutions for these
multimedia applications, which do not require high-quality animation,
fast and easy ways to build a 3-D face model have been investigated
to generate many different face models in a short time period. However,
user intervention is still required to provide several corresponding
points in two frames from a video sequence, or feature points in a
single frontal image. In this paper, we present a real-time system that
extracts facial features automatically and builds a 3-D face model
without any user intervention from a video sequence. Approaches for
creating a 3-D face model can be classified into two groups. Methods in
the first group use a generic 3-D model, usually generated by a 3-D
scanner, and deform the 3-D model by calculating coordinates of all
vertices in the 3-D model. Lee et al. considered deformation of
vertices in a 3-D model as an interpolation of the displacements of the
given control points. They used the Dirichlet Free-Form Deformation
technique to calculate new 3-D coordinates of a deformed 3-D model.
Pighin et al. also considered model deformation as an interpolation
problem and used radial basis functions to find new 3-D coordinates for
vertices in a generic 3-D model. In contrast, methods in the second group
use multiple 3-D face models to find 3-D coordinates of all vertices in
a new 3-D model based on given feature points. They combine multiple 3
-D models to generate a new 3-D model by calculating parameters to
combine them. Blanz et al. [8] used a laser scanner (Cyberware) to
generate a 3-D model database. They considered a new face model as a
linear combination of the shapes of 3-D faces in the database. Liu et
al. [4] simplified the idea of linear combination of 3-D models by
designing key 3-D faces that can be used to build a new 3-D model by
combining the key 3-D faces linearly, eliminating the need for a large
3-D face database. The merit of the approaches in the second
group is that linearly created face objects avoid producing an
unnatural face, which is a very important aspect of creating a 3-D
face model without user intervention. Emerging Internet applications
equipped with a talking head system, such as merchandise narrators,
virtual anchors, and e-commerce, do not require high-quality facial
animation, e.g., the kind used in Shrek or Toy Story. Furthermore,
movement of a 3-D face model in those applications, i.e., rotation
along the x and y directions, can be restricted within 5–10 degrees. In
other words, although the movement of a talking head is limited, users
still do not feel uncomfortable in these applications. Recent
approaches that create a 3-D face model from a single image are
applicable to those Internet applications. Valle et al. used
manually extracted feature points and an interpolation technique based
on a radial basis function to obtain the coordinates of the polygon
mesh of a 3-D model. Kuo et al. used anthropometric and a priori
information to estimate the depth of a 3-D face model. Lin et al. used a two-
dimensional (2-D) mesh model to animate a talking head by mesh warping.
They manually adjust control points of the mesh to fit the eyes, nose, and
mouth into an input image. All these approaches, based on a single
image to obtain a 3-D face model, are computationally cheap and fast,
which are suitable to generate multiple face models in a short time.
Although depth information of a created 3-D model from these approaches
is not as accurate as that of other labor-intensive approaches,
textured 3-D face models should be good enough for various Internet
applications that do not require high quality facial animation. In this
paper, we present a real-time system that extracts facial features
automatically and builds a 3-D face model without any user
intervention. The main contribution of this paper can be summarized as
follows. Firstly, we propose a face shape extractor, which is easy and
accurate for various face shapes. We believe face shape is one of the
most important facial features in creating a 3-D face model. Our face-
shape extractor uses a model of an ellipse controlled by three anchor
points, extracting various face shapes successfully. Secondly, we
present a probabilistic network to maximally use facial feature evidence
in deciding if extracted facial features are suitable for creating a
3-D face model. To create a 3-D model from a video sequence without any
user intervention, we need to keep on extracting facial features and
checking if the extracted features are good enough to build a 3-D model
in a systematic way. We propose a facial feature net, a face shape net,
and a topology net to verify correctness of extracted facial features,
which also enable the algorithm to extract facial features more
accurately. Thirdly, a least-square approach to create a 3-D face model
based on extracted facial features is presented. Our approach to 3-D
model adaptation is similar to Liu's approach in the sense that a 3-D
model is described as a linear combination of a neutral face and some
deformation vectors. The differences are that we use a least-square
approach to find coefficients for the deformation vectors and we build
a 3-D face model from a video sequence with no user input. Lastly, a
talking head system is presented by combining an audio-to-visual
conversion technique based on constrained optimization [25] and the
proposed automatic scheme of 3-D model creation. The organization of
this paper is as follows. In Section II, the proposed face shape
extractor based on an ellipse model controlled by three anchor points
is presented. The detailed explanation of the probabilistic network is
described in Section III. In Section IV, the proposed least-square
approach to create a 3-D face model is described. In Section V,
experimental results as well as implementation of the proposed real-
time talking head system are described. Finally, conclusions and future
work are given in Section VI.
II. FACE SHAPE EXTRACTOR
Face shape is one of the most important features in creating a 3-D face
model. In this section, we propose a novel idea to extract face shape,
which is easy and accurate for various face shapes. Based on the study
of anthropometry of the head and face, a face can be classified into
one of the several types of shapes, i.e., long-narrow, square,
triangular, and trapezoid, etc., as shown in Fig. 1. Our face-shape
extractor uses a model of an ellipse controlled by three anchor points
as shown in Fig. 1(d) and (e). The idea behind the proposed model is
that we can control shape of a face by fixing the bottom anchor point
P2 and moving the left and right anchor points, P1 and P3, up and
down as shown in Fig. 1(d) and (e).

Fig. 1. Modeling of different faces using an ellipse: (a) square; (b)
triangular; (c) trapezoid; (d) long-narrow; and (e) same size in width
and height.
For different faces as shown in Fig. 1, an ellipse that contains three
anchor points, left, right, and down, can describe various face shapes
correctly and smoothly, although each ellipse requires different
parameters. For instance, to describe a long-narrow face, the left and
right anchor points need to be moved up as shown in Fig. 1(d). Based on
this observation, shape extraction is considered as a problem of finding
parameters for an ellipse that produces maximum boundary energy under the
constraint that the ellipse must contain the three anchor points P1,
P2, and P3. The detailed steps of the face shape extraction can be
described as follows.
1) Find the three anchor points P1, P2, and P3 at the 180, 270, and 0
degree directions.
2) Draw an ellipse through the three detected anchor points. If the
ellipse is written as $x^2/a^2 + y^2/b^2 = 1$, then a is the distance
between the x positions of P1 and P2, and b is the distance between the
y positions of P1 and P2 (if the face shape is symmetric, P1 and P3
share the same y position, so either can be used).
3) Add the intensities of the pixels on the ellipse that are lower than
the left and right anchor points, and record the sum.
4) Move the left and right anchor points up and down to find the
parameters of the ellipse that produces maximum boundary energy for the
face shape from an edge image [see Fig. 2(e)] using (1); a sketch of
this search is given at the end of this section.
After the positions of facial components such as the mouth and eyes are
known, as shown in Fig. 2(a), using various methods, the proposed face
shape extractor is ready to start.
We assume that a human face has a homogeneous color distribution,



Fig. 2. Detecting three anchor points. (a) Extract facial features
first. (b) Calculate the intensity average for the inside of a face:
1) draw lines from the corner of the left eye and from the nose center
and find an intersection point C, and 2) find the average intensity of
pixels within a rectangular window (size = 20×20) centered at the point
C. (c) Three anchor points P1, P2, and P3. (d) An ellipse-shaped search
window and search direction. (e) An edge image.
window and search direction. (e) An edge image.
which means statistics, e.g., means and variances, can be used as
criteria to decide whether a region is inside or outside of the face (if
statistics for the inside of a face are known). When the search procedure
for the three anchor points starts, it calculates statistics first. It
calculates an intensity average for the inside of a face by using a
window as shown in Fig. 2(b). By locating a point that has a quite
different intensity average from the previously calculated average of
the inside of a face, three anchor points can be found. In our
implementation, the threshold Tfs = 0.5 × (average intensity of the inside
of a face) is selected experimentally to locate the anchor points. Because
the search procedure for three anchor points highly depends on color
distributions, it is sensitive to color distributions of background
objects. To overcome this weak point, the threshold Tfs is adjusted
adaptively in our procedure (please refer to Section V for details).
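As a concrete illustration of the anchor-point search just described, the following is a minimal Python sketch. It assumes a grayscale frame stored as a NumPy array, a starting point C inside the face [Fig. 2(b)], and the precomputed mean intensity of the inside of the face; the helper names and the exact stopping rule are illustrative, not the authors' implementation.

import numpy as np

def find_anchor(gray, start, direction, inside_mean, tfs=0.5):
    # Walk outward from `start` until the intensity departs from the
    # face statistics, i.e., falls below Tfs * (inside average).
    x, y = start
    dx, dy = direction
    h, w = gray.shape
    while 0 <= x + dx < w and 0 <= y + dy < h:
        x, y = x + dx, y + dy
        if gray[y, x] < tfs * inside_mean:
            break
    return x, y

def find_three_anchors(gray, center, inside_mean, tfs=0.5):
    p1 = find_anchor(gray, center, (-1, 0), inside_mean, tfs)  # 180 degrees (left)
    p2 = find_anchor(gray, center, (0, 1), inside_mean, tfs)   # 270 degrees (down)
    p3 = find_anchor(gray, center, (1, 0), inside_mean, tfs)   # 0 degrees (right)
    return p1, p2, p3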
To find an optimal face shape, (1) is used to find the parameters a and
b of the ellipse:

$(a^*, b^*) = \arg\max_{a,b} \sum_{(x,y) \in \Omega} E(x, y)$    (1)

where E(x, y) is the intensity of an edge image [Fig. 2(e)] and $\Omega$
denotes the subset of pixels on the ellipse that are located lower than
the left and right anchor points.
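For concreteness, a minimal Python sketch of the boundary-energy search in (1) follows. It assumes a grayscale edge image as a NumPy array and pixel coordinates for the three anchor points; the sampling density and search range are illustrative choices, not the values used in the paper.

import numpy as np

def lower_arc_energy(edge, cx, cy, a, b, y_side):
    # Sum edge intensity over ellipse pixels located lower than the
    # side anchors (image y grows downward).
    total = 0.0
    for t in np.linspace(0.0, 2.0 * np.pi, 720):
        x = int(round(cx + a * np.cos(t)))
        y = int(round(cy + b * np.sin(t)))
        if y >= y_side and 0 <= y < edge.shape[0] and 0 <= x < edge.shape[1]:
            total += edge[y, x]
    return total

def extract_face_shape(edge, p1, p2, p3, search=15):
    # p1, p3: left/right anchors; p2: bottom anchor (kept fixed).
    best_energy, best_ab = -1.0, None
    for dy in range(-search, search + 1):   # move side anchors up/down
        y_side = p1[1] + dy
        a = abs(p1[0] - p2[0])              # horizontal semi-axis
        b = abs(p2[1] - y_side)             # vertical semi-axis
        cx, cy = p2[0], p2[1] - b           # ellipse center above P2
        e = lower_arc_energy(edge, cx, cy, a, b, y_side)
        if e > best_energy:
            best_energy, best_ab = e, (a, b)
    return best_ab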
III. PROBABILITY NETWORKS
Probabilistic approaches have been successfully used to locate human
faces from a scene and to track deformations of local features.
Cipolla et al. proposed a probabilistic framework to combine different
facial features and face groups, achieving a high confidence rate for
face detection from a complicated scene. Huang et al. used a
probabilistic network for local feature tracking by modeling locations
and velocities of selected feature points. In our automated system, a
probabilistic framework is adopted to maximally use facial feature
evidence for deciding correctness of extracted facial features before a
3-D face model is built. Fig. 3 shows the selected FDPs for the
proposed probabilistic framework. The network hierarchy used in our
approach is shown in Fig. 4, which consists of a facial feature net, a
face shape net, and a topology net. The facial feature net has a mouth
net and eye net as its subnets. The detail of each subnet is shown in
Fig. 5. In the networks, each node represents a random variable and
each arrow denotes conditional dependency between two nodes. In a study
of face anthropometry, data are collected by measuring distances and
angles among selected key points from a human face, e.g., corners of
eyes, mouth and ears, to describe the variability of a human face.
Based on the study, we characterize a frontal face by measuring
distances and covariance between key points chosen from the study. All
nodes in the proposed probability networks are classified into four
groups:
Mouth = [D(8.1, 2.2), D(8.4, 8.3), D(2.3, 8.2)];
Eyes = [D(3.12, 3.7), D(3.12, 3.8), D(3.8, 3.11), D(3.13, 3.9)];
Topology = [D(2.1, 9.15), D(2.1, 3.8), D(9.15, 2.2)];
Face Shape = [D(2.2, 2.1), D(10.7, 10.8)];

where D(P1, P2) is the distance between FDPs P1 and P2 as defined in
the MPEG-4 standard. In our network, the distance between two
feature points is defined as a random variable for each node. For
instance, we model D(3.5, 3.6), the distance between centers of the
left and right eyes, and D(2.1, 9.15), the distance between two selected
points, FDP 2.1 and FDP 9.15, shown in Fig. 5(b), as a 2-D Gaussian
distribution, estimating means, standard deviations, and correlation
coefficients. Fig. 5(c) shows graphical illustrations of the
relationship between two nodes in the proposed probability networks.
For example, the distance D1 between FDP 3.5 and FDP 3.6 and the length
D2 between FDP 8.4 and FDP 8.3 (width of mouth) are modeled as a 2-D
Gaussian distribution, where $D_i$, $\mu_i$, and $\sigma_i$ (i = 1, 2)
denote the distance between the two selected FDPs, its mean, and its
standard deviation, respectively, and $\rho_{12}$ denotes the
correlation coefficient between the two nodes D1 and D2. To model the
2-D Gaussian distributions of D(3.5, 3.6) and the distances of the
selected paired points, a face image database is used in our
simulations (the PICS database; see Section V-C). The reason we model
probability distributions based on
FDP 3.5 and FDP 3.6 is that the left and right eye centers are the
features that can be detected most reliably and accurately from a video
sequence according to our implementation. The chain rule and
conditional independence relationship are applied to calculate the
joint probability of each network. For instance, the probability of the
face shape net is defined as the joint probability of all three nodes,
D(3.5, 3.6), D(2.2, 2.1), and D(10.7, 10.8). Assuming the two
face-shape nodes are conditionally independent given D(3.5, 3.6), the
chain rule gives

P(Face Shape Net) = P(D(3.5, 3.6)) P(D(2.2, 2.1) | D(3.5, 3.6)) P(D(10.7, 10.8) | D(3.5, 3.6)).

In the same manner, the probabilities of the other networks (the mouth,
eye, and topology nets) are defined as the product of the marginal
density of D(3.5, 3.6) and the conditional densities of their member
nodes given D(3.5, 3.6).
In our implementation, P(Face Shape Net) is used to verify the face shape
extracted from our face shape extractor, and P(Mouth Net) is used to
check extracted mouth features. P(Topology Net) is used for deciding if
facial components, i.e., eyes, nose, and mouth, are located correctly
along the vertical axis. The overall joint probability P(Facial
Features, Face Shape, Topology) is used as the decision criterion for
the correctness of the extracted facial features before a 3-D face
model is built.
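To illustrate how such a net can be evaluated, the sketch below computes P(Face Shape Net) from pairwise 2-D Gaussians using the standard bivariate conditioning formula. The numerical parameters are placeholders, not the values trained from the face database.

import numpy as np
from scipy.stats import norm

def cond_pdf(d, d_ref, mu, sigma, mu_ref, sigma_ref, rho):
    # p(D = d | D(3.5, 3.6) = d_ref) for a bivariate Gaussian pair.
    m = mu + rho * (sigma / sigma_ref) * (d_ref - mu_ref)
    s = sigma * np.sqrt(1.0 - rho ** 2)
    return norm.pdf(d, loc=m, scale=s)

# Placeholder parameters (illustrative only, in pixels):
MU_E, SIG_E = 62.0, 4.0                  # D(3.5, 3.6), eye-center distance
MU_L, SIG_L, RHO_L = 180.0, 12.0, 0.7    # D(2.2, 2.1), face length
MU_W, SIG_W, RHO_W = 140.0, 10.0, 0.8    # D(10.7, 10.8), face width

def p_face_shape_net(d_eye, d_len, d_wid):
    p = norm.pdf(d_eye, loc=MU_E, scale=SIG_E)           # P(D(3.5, 3.6))
    p *= cond_pdf(d_len, d_eye, MU_L, SIG_L, MU_E, SIG_E, RHO_L)
    p *= cond_pdf(d_wid, d_eye, MU_W, SIG_W, MU_E, SIG_E, RHO_W)
    return p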
IV. A LEAST-SQUARE APPROACH TO ADAPT A
3-D FACE MODEL
Our system is devoted to creating a 3-D face model without any user
intervention from a video sequence, which means we need an algorithm
that is robust and stable to build a photo-realistic and natural 3-D
face model. A recent approach proposed by Liu et al. [4] shows that
combining multiple 3-D models linearly is a promising way to generate a
photo-realistic 3-D model. In this approach, a new face model is
described as a linear combination of key 3-D face models, e.g., big
mouth, small eyes, etc. The strong point of this approach is that the
multiple face models constrain the shape of a new 3-D face, preventing
algorithms from producing an unrealistic 3-D face model. Our approach
is similar to Liu's approach in the sense that a 3-D model is
described as a linear combination of a neutral face and some
deformation vectors. The main differences are that: 1) we use a least-
square approach to find the coefficient vector for creating a
new 3-D face model rather than an iterative approach and
2) we build a 3-D face model from a video sequence with no user input.
A. The 3-D Model
Our 3-D model is a modified version of the 3-D face model developed by
Parke and Waters [28]. We have developed a 3-D model
editor to build a complete head and shoulder model including ears and
teeth. Fig. 6(a) shows the modified 3-D model used in our system. It
has 1294 polygons and it is good enough for realistic facial animation.
Based on this 3-D model and the 3-D model editor, 16 face models have
been designed for the proposed system (more face models can be added to
make a better 3-D model), because eight position vectors and eight
shape vectors (please see Section IV-B) are a minimal requirement to
describe a 3-D face, in the sense that the shapes and locations of the
mouth, nose, and eyes are the most important features of a human frontal
face. These face models are combined linearly based on automatically
extracted facial features such as shape of face, location of eyes, nose
and mouth, and the sizes of these features. If we denote the face
geometry by a vector $F = (v_1, \ldots, v_n)^T$, where $v_i = (X_i, Y_i,
Z_i)^T$ are the vertices, and $D_i$, $i = 1, \ldots, m$, are deformation
vectors, each containing the amount of variation in size and location
for the vertices of the 3-D model, the face geometry can be described as

$F = F_0 + \sum_{i=1}^{m} c_i D_i$

where $F_0$ is a neutral face vector and $c = (c_1, c_2, \ldots, c_m)^T$
is a coefficient vector that decides the amount of variation applied to
the vertices of the neutral face model.
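In matrix form this is a single matrix-vector product; a minimal NumPy sketch follows, with array sizes as illustrative placeholders:

import numpy as np

n_vertices, m = 1000, 16              # illustrative sizes, m deformation vectors
F0 = np.zeros(3 * n_vertices)         # neutral face, (X, Y, Z) per vertex, flattened
D = np.zeros((3 * n_vertices, m))     # one deformation vector per column
c = np.zeros(m)                       # coefficient vector

F = F0 + D @ c                        # adapted face geometry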
B. The 3-D Model Adaptation
Finding an optimal 3-D model that best matches the input video
sequence can be considered as the problem of finding a coefficient
vector that minimizes the mean-square error between 3-D feature points
projected onto 2-D and feature points from the input face. We assume that all feature
points are equally important because locations as well as shapes of
facial components,
such as the mouth, eyes, and nose, are all critical to model a 3-D face from
a frontal face. In our system, all coefficients are decided at once by
solving the following least-square formulation:

$\min_{c} \sum_{j=1}^{n} \Big\| V_j - \Big( F_{0j} + \sum_{i=1}^{m} c_i D_{ij} \Big) \Big\|^2$    (10)

where n denotes the number of extracted features and m is the number of
deformation vectors. $V_j$ is an extracted feature from an input image,
which has an (x, y) location, $F_{0j}$ is the corresponding vertex on
the neutral 3-D model projected onto 2-D, and $D_{ij}$ is the
corresponding vertex on deformation vector $D_i$ projected onto 2-D
using the current camera parameters. Fig. 6(a) shows the
neutral 3-D face model and Fig. 6(b)–(f) show examples of 3-D face
models used to calculate deformation vectors in our implementation. For
instance, by subtracting a wide 3-D face model, as shown in Fig. 6(b),
from the neutral 3-D face model, shown in Fig. 6(a), a deformation
vector for a wide face is obtained. For the deformation vectors, eight
shape vectors (wide face, thin face, and big (and small) mouth, nose,
and eyes) and eight position vectors (minimum (and maximum) horizontal
and vertical translation for eyes and minimum (and maximum) vertical
translation for mouth and nose) are designed in our implementation. To
solve the least-square problem, the singular value decomposition (SVD)
is used in our implementation.
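A minimal sketch of this fit follows, assuming the 2-D projections of the feature vertices are already available as NumPy arrays; numpy.linalg.lstsq solves the problem via SVD, matching the solver choice above. The array layout is an assumption for illustration.

import numpy as np

def solve_coefficients(V, F0_2d, D_2d):
    # V:     (n, 2) extracted 2-D feature locations
    # F0_2d: (n, 2) neutral-model feature vertices projected onto 2-D
    # D_2d:  (m, n, 2) deformation-vector vertices projected onto 2-D
    m = D_2d.shape[0]
    A = D_2d.reshape(m, -1).T                  # (2n, m) design matrix
    b = (V - F0_2d).reshape(-1)                # (2n,) residual to be explained
    c, *_ = np.linalg.lstsq(A, b, rcond=None)  # SVD-based least squares
    return c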
V. IMPLEMENTATION AND EXPERIMENTAL RESULTS
A. Automatic Creation of a 3-D Face Model
In this section, the detailed implementation of the proposed real-time
talking head system is presented. To create a photo-realistic 3-D model
from a video sequence without any user intervention, the proposed
algorithms have to be integrated carefully. We assume that the user has
a neutral face as defined in the MPEG-4 standard, is looking at the
camera, and rotates in the x and y directions. The proposed algorithms
catch the best facial orientation, i.e., simply a frontal face, by
extracting and verifying facial features. By analyzing video sequences, two
requirements for the real-time system have been established, because
input is not a single image but a video sequence. First, face location
should not be computed for every frame: once the face is located, the
face location in the following frames is likely to be the same or very
close to it. Second, facial features obtained in previous frames should
be exploited to provide a better result in the current frame. Fig. 7 shows the
detailed block diagram of the proposed real-time system. The proposed
system starts with finding a face location in a video sequence by using
a method based on a normalized RG color space and frame difference.
After detecting the face location, a valley detection filter is used to
find rough positions of the facial components. After applying the
valley detection filter, the rough locations of the facial components,
i.e., eyes, nose, and mouth, are located by examining the intensity
distribution projected in the vertical and horizontal directions. Then,
the exact location of the nose is obtained by recursive thresholding,
because the nose holes always have the lowest intensity around the
nose. A threshold value is increased recursively until we reach the
number of pixels that corresponds to the nose holes.
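A minimal sketch of this recursive thresholding step, assuming a grayscale patch around the nose as a NumPy array; the target pixel count and step size are illustrative:

import numpy as np

def locate_nose_holes(patch, target_pixels=40, step=2):
    # Raise the threshold until roughly `target_pixels` of the darkest
    # pixels (the nostrils) are selected.
    t = int(patch.min())
    while np.count_nonzero(patch <= t) < target_pixels and t < 255:
        t += step
    ys, xs = np.nonzero(patch <= t)
    return xs.mean(), ys.mean()        # centroid of the nostril pixels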
To find the exact location of the mouth and eyes, several approaches
can be used. We use a
pseudo moving difference method to find exact location of facial
components, which is simple and computationally cheap. Based on the
extracted feature locations, a search area for extracting the face shape
can be found. Within this search area,
we use the face shape extractor to extract the face shape. After feature
extraction is done, the extracted features are fed into the proposed
probabilistic networks to verify the correctness and suitability before
a corresponding 3-D face model is built. The proposed probabilistic
network acts as a quality control agent in creating a 3-D face model in
the proposed system. Based on the output of the probability networks,
Tfs is adjusted adaptively to extract face shape more accurately. If
only face shape is bad, which means extracted features are correct
except face shape, the algorithm adjusts thresholds, Tfs and extracts
face shape again without moving into the next frame [see Fig. 11© and
(d)]. If extracted face shape is bad again, the algorithm moves to the
next frame and starts from detecting rough location, without detecting
face location. If all features are bad, the algorithm moves to the next
frame, locates face, and extracts all features again.
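The retry logic just described can be summarized in the following pseudocode-like Python; the helper names (locate_face, extract_features, features_ok, shape_ok, build_3d_model) and the Tfs step size are illustrative stand-ins for the components described above:

def process_frame(frame, state):
    # `state` carries the face location and the adaptive threshold Tfs.
    if state.face_box is None:
        state.face_box = locate_face(frame)        # color + frame difference
    feats = extract_features(frame, state.face_box, state.Tfs)
    if features_ok(feats):
        if shape_ok(feats):
            return build_3d_model(feats)           # least-square adaptation
        state.Tfs = min(1.0, state.Tfs + 0.1)      # only shape is bad: adjust Tfs
        feats = extract_features(frame, state.face_box, state.Tfs)
        if shape_ok(feats):
            return build_3d_model(feats)
        return None                                # next frame, keep face box
    state.face_box = None                          # all bad: relocate face
    return None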
B. Speech-Driven Talking Head System
After a virtual face is built, an audio-to-visual conversion technique
based on constrained optimization is combined with the virtual face to
make a complete talking head system. Several research results are
available for audio-to-visual conversion. In our system, we have
selected the constrained optimization technique because it is robust in
noisy environments. Our talking head system aims at generating FDPs and FAPs
for MPEG-4 talking head applications with no user input. FDPs are
obtained automatically from a video sequence, captured by a camera
connected to a PC, based on the proposed automatic scheme of facial
feature extraction and a 3-D model adaptation. FAPs are generated from
an audio-to-visual conversion based on the constrained optimization
technique. Fig. 8 shows the block diagram of the encoder for the
proposed talking head system. The FDPs and FAPs, created without any
user intervention, are
coded as an MPEG-4 bit stream and sent to a decoder via the Internet.
Because the coded bit stream contains FDPs and FAPs, no animation
artifacts are expected in the decoder. For transmitting speech via
the Internet, G.723.1, a dual-rate speech coder for multimedia
communications, is used. G.723.1, the most widely used standard codec
for Internet telephony, is selected because of its capability of
low-bit-rate coding, working at 5.3 and 6.3 kb/s (please see [29] for a
detailed explanation of G.723.1). In the initialization stage, the 3-D
coordinates and texture information for an adapted 3-D model are sent
to the decoder via the TCP protocol. Then, coded speech and animation
parameters are sent to
the decoder
via the UDP protocol in our implementation. Fig. 9(a) and (b) show
screen shots of the encoder and decoder implemented in our talking head
system.
The performance of the proposed talking head system has been evaluated
subjectively and the results are
shown in Section V-C.
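A minimal sketch of this transport split, assuming the adapted model and the parameter packets are already serialized to bytes; the host, ports, and length-prefix framing are placeholders, and the G.723.1 speech coding itself is omitted:

import socket

HOST, TCP_PORT, UDP_PORT = "127.0.0.1", 5000, 5001   # placeholder endpoints

def send_model(model_bytes):
    # Initialization: 3-D coordinates and texture go over reliable TCP.
    with socket.create_connection((HOST, TCP_PORT)) as s:
        s.sendall(len(model_bytes).to_bytes(4, "big"))
        s.sendall(model_bytes)

def send_parameters(packets):
    # Streaming: coded speech and animation parameter packets go over UDP.
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for pkt in packets:
        s.sendto(pkt, (HOST, UDP_PORT))
    s.close()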
C. Experimental Results
The proposed automatic system, which creates a 3-D face model from a
video sequence without any user intervention, produces facial features
including face shape at about 9 frames per second (fps) on a Pentium
III 600-MHz PC. Twenty feature points, as shown in Fig. 3, and 16
deformation vectors were used in our implementation [n=20 and m=16 for
(10)]. Users are required to provide a frontal view with a rotation
angle less than 5 degrees. Twenty video sequences were recorded, making
approximately 2000 frames in total. The proposed face shape extractor
was tested for the captured video sequences that have different types
of faces. Fig. 10 shows some examples of extracted face shapes for
different face shapes and orientation. The proposed face shape
extractor achieved a detection rate of 64% for 1180 selected frames
from the testing video sequences. Most errors come from the similar
color distribution between face and background and failure to detect
facial components such as eyes and mouth. Fifty frontal face images of
the PICS database from the University of Stirling
(http://pics.psych.stir.ac.uk/) were used to build the proposed probabilistic
network and the Expectation Maximization (EM) algorithm was used to
model 2-D Gaussian distributions. The proposed probabilistic network
was tested as a quality control agent in our real-time talking head
system. Fig. 11(a) and (b) show examples of rejected facial features
from the probabilistic network, preventing the creation of unrealistic
faces. Tfs, the threshold value for face shape extraction, was adjusted
automatically from 0.5 × (average intensity of the inside of a face) to
1.0 × (average intensity of the inside of a face) to improve accuracy
based on the results of the probabilistic network. If only P(Face Shape
Net) was low, Tfs was increased to find a clearer boundary of the face
[please see P2 in Fig. 2(c)]. Fig. 11(c) and (d) show examples of
feature extraction improved via adjusting threshold values. According
to the simulation results, the proposed probabilistic networks were
successfully combined with our automatic system to create a 3-D face
model. Fig. 12 shows examples of successfully created 3-D face models.
By using the probabilistic network approach, the chance of creating
unrealistic faces due to wrong facial features was reduced
significantly. The performance of the proposed talking head system was
evaluated subjectively. Twelve people participated in the subjective
assessments. A 5-point scale was used for the subjective evaluations.
Table I shows results from the subjective test and gives an idea of how
good the proposed talking head system is, even though it is created
without any user intervention. People were asked how realistic an
adapted 3-D model is and how natural its talking head is to see the
performance of the proposed system. They were also
asked to measure audio quality, audio-visual synchronization, and
overall performance. Overall results from the subjective evaluations
show that the proposed automatic scheme produces

TABLE I
SUBJECTIVE EVALUATIONS OF THE PROPOSED TALKING HEAD SYSTEM

a 3-D model that is quite realistic and good enough for various
Internet applications that do not require high-quality facial animation.
VI. CONCLUSIONS AND FUTURE WORK
We have presented an implementation of an automatic system to create a
talking head from a video sequence without any user intervention. In
the proposed system, we have presented: 1) anovel scheme to extract
face shape based on an ellipse model
controlled by three anchor points;
2) a probabilistic network to verify if extracted features are good
enough to build a 3-D face model;
3) a least-square approach to adapt a generic 3-D model to extracted
features from the input video; and 4) a talking head system that generates
FAPs and FDPs without any user intervention for MPEG-4 facial animation
systems. Based on an ellipse model controlled by three anchor points,
an accurate and computationally cheap method for face shape extraction
was developed. A least-square approach was used to calculate a required
coefficient vector to adapt a generic model to fit an input face.
Probability networks were successfully combined with our automatic
system to maximally use facial feature evidence in deciding if
extracted facial features are suitable for creating a 3-D
face model. Creating a 3-D face model with no user intervention is a
very difficult task. In this paper, an automatic scheme to build a 3-D
face model from a video sequence is presented. Although we assume that
the user has a neutral face and is looking at the
camera, we believe this is a basic requirement to build a 3-D
face model in an automatic fashion. The created 3-D model is allowed to
rotate less than 10 degrees along x and y directions because z
coordinates of vertices on the 3-D model are not calculated from input
features. The proposed speech-driven talking head system, generating
FDPs and FAPs for MPEG-4 talking head applications, is suitable for
various Internet applications, including virtual conferencing and
virtual storytelling, that do not require much head movement or
high-quality facial animation. For future research, a more accurate mouth and
eye extraction scheme can be considered to improve the quality of a created
3-D model and to handle nonneutral faces and faces with a mustache. The
current approach based on a simple parametric curve has limitations on
the shapes of mouth and eyes. In addition, to build a complete 3-D face
model, extracting hair from the head and modeling its style should be
considered in future research.
ACKNOWLEDGMENT
The authors wish to thank the anonymous reviewers for their valuable
comments.
REFERENCES
[1] W.-S. Lee, M. Escher, G. Sannier, and N. Magnenat-Thalmann, MPEG-4 compatible faces from orthogonal photos, in Proc. Int. Conf. Computer Animation, 1999, pp. 186–194.
[2] P. Fua and C. Miccio, Animated heads from ordinary images: A least-squares approach, Comput. Vis. Image Understand., vol. 75, no. 3, pp. 247–259, 1999.
[3] F. Pighin, R. Szeliski, and D. H. Salesin, Resynthesizing facial animation through 3-D model-based tracking, in Proc. 7th IEEE Int. Conf. Computer Vision, vol. 1, 1999, pp. 143–150.
[4] Z. Liu, Z. Zhang, C. Jacobs, and M. Cohen, Rapid Modeling of Animated Faces From Video, Tech. Rep. MSR-TR-2000-11.
[5] A. C. A. del Valle and J. Ostermann, 3-D talking head customization by adapting a generic model to one uncalibrated picture, in Proc. IEEE Int. Symp. Circuits and Systems, 2001, pp. 325–328.
[6] C. J. Kuo, R.-S. Huang, and T.-G. Lin, 3-D facial model estimation from single front-view facial image, IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 3, pp. 183–192, Mar. 2002.
[7] L. Moccozet and N. Magnenat Thalmann, Dirichlet free-form deformations and their application to hand simulation, in Proc. Computer Animation '97, 1997, pp. 93–102.
[8] V. Blanz and T. Vetter, A morphable model for the synthesis of 3-D faces, in Computer Graphics, Annu. Conf. Series, SIGGRAPH 1999, pp. 187–194.
[9] E. Cosatto and H. P. Graf, Photo-realistic talking-heads from image samples, IEEE Trans. Multimedia, vol. 2, no. 3, pp. 152–163, Jun. 2000.
[10] I.-C. Lin, C.-S. Hung, T.-J. Yang, and M. Ouhyoung, A speech driven talking head system based on a single face image, in Proc. 7th Pacific Conf. Computer Graphics and Applications, 1999, pp. 43–49.
[11] http://ananova [Online]
[12] R.-S. Wang and Y. Wang, Facial feature extraction and tracking in video sequences, in Proc. IEEE Int. Workshop on Multimedia Signal Processing, 1997, pp. 233–238.
[13] D. Reisfeld and Y. Yeshurun, Robust detection of facial features by generalized symmetry, in Proc. 11th IAPR Int. Conf. Pattern Recognition, 1992, pp. 117–120.
[14] M. Zobel, A. Gebhard, D. Paulus, J. Denzler, and H. Niemann, Robust facial feature localization by coupled features, in Proc. 4th IEEE Int. Conf. Automatic Face and Gesture Recognition, 2000, pp. 2–7.
[15] Y. Tian, T. Kanade, and J. Cohn, Robust lip tracking by combining shape, color and motion, in Proc. 4th Asian Conf. Computer Vision, 2000.
[16] J. Luettin, N. A. Tracker, and S. W. Beet, Active shape models for visual speech feature extraction, University of Sheffield, Sheffield, U.K., Electronic Systems Group Rep. 95/44, 1995.
[17] C. Kim and J.-N. Hwang, An integrated scheme for object-based video abstraction, in Proc. ACM Int. Multimedia Conf., 2000.
[18] L. G. Farkas, Anthropometry of the Head and Face. New York: Raven, 1994.
[19] K. C. Yow and R. Cipolla, A probabilistic framework for perceptual grouping of features for human face detection, in Proc. IEEE Int. Conf. Automatic Face and Gesture Recognition '96, 1996, pp. 16–21.
[20] H. Tao, R. Lopez, and T. Huang, Tracking facial features using probabilistic network, in Proc. Automatic Face and Gesture Recognition, 1998, pp. 166–170.
[21] ISO/IEC FDIS 14496-1 Systems, ISO/IEC JTC1/SC29/WG11 N2501, Nov. 1998.
[22] ISO/IEC FDIS 14496-2 Visual, ISO/IEC JTC1/SC29/WG11 N2502, Nov. 1998.
[23] Psychological Image Collection at Stirling (PICS). [Online] Available: http://pics.psych.stir.ac.uk/
[24] J. Luettin, N. A. Tracker, and S. W. Beet, Active shape models for visual speech feature extraction, University of Sheffield, Sheffield, U.K., Electronic Systems Group Rep. 95/44, 1995.
[25] K. H. Choi and J.-N. Hwang, Creating 3-D speech-driven talking heads: A probabilistic approach, in Proc. IEEE Int. Conf. Image Processing, 2002, pp. 984–987.
[26] F. Lavagetto, Converting speech into lip movement: A multimedia telephone for hard of hearing people, IEEE Trans. Rehabil. Eng., vol. 3, no. 1, pp. 90–102, Jan. 1995.
[27] R. R. Rao, T. Chen, and R. M. Mersereau, Audio-to-visual conversion for multimedia communication, IEEE Trans. Ind. Electron., vol. 45, no. 1, pp. 15–22, Feb. 1998.
[28] F. I. Parke and K. Waters, Computer Facial Animation. Wellesley, MA: A. K. Peters, 1996.
[29] Dual Rate Speech Coder for Multimedia Communications Transmitting at 5.3 and 6.3 kbit/s, ITU-T Recommendation G.723.1, Mar. 1996.
[30] K. H. Choi and J.-N. Hwang, A real-time system for automatic creation of 3-D face models from a video sequence, in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, 2002, pp. 2121–2124.
Kyoung-Ho Choi (M'03) received the B.S. and M.S.
degrees in electrical and electronics engineering from
Inha University, Korea, in 1989 and 1991, respectively,
and the Ph.D. degree in electrical engineering
from the University of Washington, Seattle, in 2002.
In January 1991, he joined the Electronics and
Telecommunications Research Institute (ETRI),
where he was a Leader of the Telematics Content
Research Team. He was also a Visiting Scholar at
Cornell University, Ithaca, NY, in 1995. In March
2005, he joined the Department of Information
and Electronic Engineering, Mokpo National University, Chonnam, Korea.
His research interests include telematics, multimedia signal processing
and systems, mobile computing, MPEG-4/7/21, multimedia-GIS,
audio-to-visual conversion, and audiovisual interaction.
Dr. Choi was selected as an Outstanding Researcher at ETRI in 1992.
Jenq-Neng Hwang (F'03) received the B.S. and
M.S. degrees, both in electrical engineering, from the
National Taiwan University, Taipei, Taiwan, R.O.C.,
in 1981 and 1983, respectively, and the Ph.D.
degree from the University of Southern California in
December 1988.
He spent 1983–1985 in obligatory military services.
He was then a Research Assistant in the Signal
and Image Processing Institute, Department of
Electrical Engineering, University of Southern California.
He was also a visiting student at Princeton
University, Princeton, NJ, from 1987 to 1989. In the summer of 1989, he
joined the Department of Electrical Engineering, University of
Washington,
Seattle, where he is currently a Professor. He has published more than
180 journal papers, conference papers, and book chapters in the areas
of image/video signal
image/video signal
processing, computational neural networks, multimedia system
integration,
and networking. He is the co-author of the Handbook of Neural Networks
for
Signal Processing (Boca Raton, FL: CRC Press, 2001).
Dr. Hwang served as the Secretary of the Neural Systems and
Applications
Committee of the IEEE Circuits and Systems Society from 1989 to 1991,
and
was a member of Design and Implementation of the SP Systems Technical
Committee
of the IEEE SP Society. He is also a Founding Member of the Multimedia
SP Technical Committee of IEEE SP Society. He served as the Chairman of
the
Neural Networks SP Technical Committee of the IEEE SP Society from 1996
to 1998, and the Society's representative to the IEEE Neural Network
Council
from 1997 to 2000. He served as Associate Editor for the IEEE
TRANSACTIONS
ON SIGNAL PROCESSING and IEEE TRANSACTIONS ON NEURAL NETWORKS, and
is currently an Associate Editor for the IEEE TRANSACTIONS ON CIRCUITS
AND
SYSTEMS FOR VIDEO TECHNOLOGY. He is also on the editorial board of the
Journal of VLSI Signal Processing Systems for Signal, Image, and Video
Technology.
He was a Guest Editor for the IEEE TRANSACTIONS ON MULTIMEDIA,
Special Issue on Multimedia over IP in March/June 2001, the Conference
Program
Chair for the 1994 IEEE Workshop on Neural Networks for Signal
Processing
held in Ermioni, Greece, in September 1994, the General Co-Chair of
the International Symposium on Artificial Neural Networks held in
Hsinchu,
Taiwan, R.O.C., in December 1995, the Chair of the Tutorial Committee
for the
IEEE International Conference on Neural Networks (ICNN'96) held in
Washington,
DC, in June 1996, and the Program Co-Chair of the International
Conference
on Acoustics, Speech, and Signal Processing (ICASSP) held in Seattle,
WA, in 1998. He received the 1995 IEEE Signal Processing (SP) Society's
Annual
Best Paper Award (with S.-R. Lay and A. Lippman) in the area of Neural
Networks for Signal Processing.