ASK HERE

summer project pal · 29-01-2011, 09:55 AM

Bottom-Up/Top-Down Image Parsing with Attribute Grammer
A Seminar Report
by
Remya M.R.
M105112
Department of Computer Science & Engineering
College of Engineering Trivandrum
2010-11

Abstract
The paper, presents an attribute graph grammar for image parsing on scenes with man-made
objects, such as buildings, hallways, kitchens, and living rooms. For this choose one class of
primitives 3D planar rectangles projected on images, and six graph grammar production rules.
Each production rule not only expands a node into its components but also includes a number of
equations that constrain the attributes of a parent node and those of its children. Thus proposed
graph grammar is context sensitive. The grammar rules are used recursively to produce a large
number of objects and patterns in images and thus is a type of generative model.The inference
algorithm integrates bottom-up rectangle detection which activates top-down prediction using
the grammar rules. The output of the inference is a hierarchical parsing graph with objects,
surfaces, rectangles, and their spatial relations. In the inference, the acceptance of a grammar
rule means a recognition of an object, and actions are taken to pass the attributes between
a node and its parent through the constraint equations associated with this production rule.
When an attribute is passed from a child node to a parent node, it is called bottom-up, and
the opposite is called top-down.

[attachment=8526]

1 Introduction
In real world images, especially man-made scenes, such as buildings, oces, and
living spaces, a large number of complex patterns (objects) are composed of a small set of
primitives using few operational relations. This is very similar to language where a huge set of
sentences can be generated with relatively small vocabulary and grammar rules in a hierarchy
from words, phrases, clauses, and to sentences. The objective of image parsing is to decompose
an image into its constituent components (objects, parts, surfaces, primitives) in a hierarchical
structure represented by a parsing graph.Fig.1 shows a parsing graph for a kitchen scene using
the rectangle primitives. Here only shows a subset of the rectangles for clarity. Its vertical
edges show the decomposition of scene and objects, and its horizontal links specify the spatial
relations between objects and components. Thus a parsing graph is more general than a parsing
tree in language.
In the literature the study of syntactic pattern recognition was pioneered by Fu et
al in the 1970-80s. Unfortunately its applicability was very much limited by two diculties.(i)
The primitive patterns(terminators in their visual languages)could not be computed reliably
from real images. Thus their parsing algorithms achieved little success in real images.(ii) The
string grammar(nite state automation)and stochastic context free grammars(SCFG)in early
work are not strong enough to express complex visual patterns.
5
Figure 1: The hierarchic parse graph constructed by an iterative top-down/bottom-up algo-
rithm.
6
2 Overview of the Top-Down/Bottom-Up Inference Al-
gorithm
The paper mainly focuses on the top-down and bottom-up inference. The algorithm
proceeds in three phases. Phase I is bottom-up computation.Phase I computes a number of
edge segments from the input image, and estimate a number of vanish points(usually three).
Thus the line segments converging to the same vanishing point are put in a line set. Then
generate a number of rectangle hypotheses. By drawing two pairs of line segments from two
out of the three line sets, and then evaluate it by the goodness of t to a rectangle in the edge
map. Thus an excessive number of rectangles are proposed as bottom-up particles which may
overlap or con
ict with each other. The bottom-up particles are tested and pursued one by one
in a procedure similar to the matching pursuit algorithm.
Phase 2 initializes the terminal nodes of the parse graph in a greedy way. In each
step, the algorithm picks the most promising bottom-up rectangle hypothesis with the heaviest
weight among all of the candidates and accepts it if it reduces the description length. Then,
the weights of all of the candidates that overlap or con
ict with this accepted rectangle are
reduced as in the matching pursuit algorithm.
Phase 3 integrates top-down/bottom-up inference. Each rectangle in the cur-
rent parse graph matches (often partially) to a production rule with attributes passed to the
nonterminal node. These nonterminal nodes are, in turn, matched to other production rules,
which then generate top-down proposals for predictions (see the downward arrows in Fig.1).
In phase 3,each of the ve grammar rules (omitting the scene rule) maintains a data structure
that stores all of its weighted candidates. Each step of the algorithm picks the most promising
proposal (with the heaviest weight) among all ve candidate sets. Thus, a new nonterminal
node is added to the parse graph. This corresponds to recognizing a new subconguration and
activates the following actions: 1)creating potentially new "`top-down" proposals and inserting
them into the lists.2) reweighting some proposals in the candidate sets.3) passing attributes
between a node and its parent through the constraint equations associated with this production
rule.
7
3 Related Work on Image Parsing
In the literature, the study of syntactic pattern recognition was pioneered by Fu and
You,Hanson and Riseman and Ohta et al. and other image understanding systems with top-
down/bottom-up inference in the 1970-1980s. In recent years, attribute grammars and context
sensitive graph grammars have been developed in visual diagrams parsing. In the vision lit-
erature, grammars are mostly studied in binary shape recognition, such as the grammars for
medial axis and shock graphs. Most recently, there has been a resurgence of compositional
computing for segmentation and object recognition.
Detecting rectangular structures in images has been well studied in the vision litera-
ture, especially for detecting building roofs in aerial images. One class of methods detects
edge segments and line primitives and then groups them into rectangles. The other types of
methods use Hough Transforms on edge maps to detect rectangles globally. A Markov chain
Monte Carlo method was developed in rectangular scene construction in which also uses com-
positional structures. Putting detecting rectangle structures in the broader topic of modeling
structural variability, this work is also closely related to a variety of representations, including
shape grammars with algebraic constraints. This work is also related to some previous work on
object recognition and image parsing by data-driven Markov chain Monte Carlo(DDMCMC).
8
4 Attribute Graph Grammer for Scene Representation
4.1 Attribute Graph Grammer
An attribute graph grammar is augmented from the SCFG by including attributes
and constraints on the nodes.
Denition 1.An attribute graph grammar is specied by a 5-tuple:
G = (VN; VT ; S;R; P)
where VNand VT are the sets of nonterminal and terminal nodes, respectively, and S is the
initial node for the scene. R is a set of production rules for spatial relationships. P is the
probability for the grammar.
A nonterminal node is denoted by capital letters, A;A1;A2 2 VN: and a terminal node
is denoted by lowercase letters,a; b; c; a1; a2 2 VT . Both nonterminal and terminal nodes have
a vector of attributes denoted by X (A) and x (a), respectively. R = (r1; r2; :::; rk) is a set of
production rules expanding a nonterminal node into a number of nodes in VN [ VT . Each rule
is associated with a number of constraint equations. For example, the following is a rule that
expands one node A into two nodesA1;A2 2 VN :
r : A ! (A1;A2)
The associated equations are constraints on the attributes with constraints
gi(X(A)) = fi(X(A1);X(A2)); i = 1; 2; :::; n®:
gi ()and fi () are usually projection functions that take some elements from the attribute vec-
tors. For instance, letX(A) = (X1;X2;X3) and let X(A) = (X11;X12): Then, an equation
could simply be an equivalence constraint (or assignment) for passing the information between
nodes A and A1 in either direction, X1 = X11: In the parsing process, the attributes of a child
node X11 pass it to X1 in rule r. This is called "bottom-up message passing". Then,X1 may be
passed to another child node's attribute X2, with X21 = X1: This is called "top-down message
passing."
A production rule may instantiate a nonterminal node to a terminal node:
r : A ! a
with constraints
gi(X(A)) = fi(x(a)); i = 1; 2; :::; n®:
Denition 2.A parse graph G is a tree-structured representation expanded from a root node
S by a sequence of production rules (r1; r2; :::; rk) and augmented with spatial relations and
constraints.
Denition 3.A conguration C is a set of terminal nodes (rectangles in this paper):
C = (ai; x(ai)) : ai 2 VT ; i = 1; 2; :::;K:
Denition 4.A language of grammar G is the set of all valid congurations that can be derived
by the production rules starting from a root node S. It is denoted by
(G) =
n
(C;X ©) : S

1;:::;
k ! C;
i 2 R; i = 1; 2; ; :::; k
o
:
4.2 A Class of Primitives:Rectangles
The simple grammar uses only one class of primitives, i.e.,the projection of planar
rectangles in 3-space into the image plane. As illustrated in Fig. 2, it has two pairs of parallel
line segments in 3D, which intersect at two vanishing points v1; v2 in the image plane. There-
fore,the set of terminal nodes is denoted by
VT = (a; x(a)) : x(a) 2
a:
There are many equivalent ways of dening the attributes x(a) for a rectangle.The
9
Figure 2: A planar rectangle (shaded) is described by eight variables: the two vanishing points
v1 = (x1; y1) and v2 = (x2; y2) and the four directions 1; 2; 3 and 4 at the two vanishing
points.
method chooses the variables to simplify the constraint equations and thus denote x(a) by
eight variables: two vanishing points v1 = (x1; y1) and v2 = (x2; y2), two orientations 1 and 2
for the two boundaries converging at v1,and two orientations 3 and 4 for the two boundaries
converging at v2:
x(a) = (x1; y1; x2; y2; 1; 2; 3; 4):
4.3 Six Production Rules
As a generic grammar for image interpretation, the proposed representation has the
root node S for the scene and one nonterminal node A for objects and surfaces:
VN = (S;X(S)); (A;X(A)) : X(S) = n;X(A)
A
The scene node S generates n independent objects.The object node A can be instantiated(assigned)to
a rectangle (rule r5) or can be used recursively by the other four production rules: r2-the line
production rule, r3the mesh production rule, r4the nesting production rule, and r6the cube
production rule.The six production rules are summarized in Fig.3.
For a line object of n = n(A) rectangles,
X0(A) = fv1; v2; 1; 2; 3; 4; 1; :::; n

Possibly Related Threads...
Thread		Author	Replies	Views	Last Post
	Image Processing & Compression Techniques (Download Full Seminar Report)	Computer Science Clay	42	22,990	07-10-2014, 07:57 PM Last Post: seminar report asees
	Hardware for image processing - Basics Eye – Human vision sensor ppt	computer topic	0	7,767	25-03-2014, 11:12 PM Last Post: computer topic
	sketch image match to digital image	arma	1	1,520	30-06-2013, 12:24 PM Last Post: Guest
	Image Segmentation Using Information Bottleneck Method	seminar class	4	4,023	19-01-2013, 12:45 PM Last Post: seminar details
	Digital Image Watermarking	project report helper	3	5,675	19-12-2012, 11:48 AM Last Post: seminar details
	digital image processing	project topics	1	2,289	19-11-2012, 01:46 PM Last Post: seminar details
	IMAGE COMPRESSION USING WEDGELETS	seminar class	4	3,536	08-11-2012, 12:44 PM Last Post: seminar details
	Fuzzy Random Impulse Noise Removal From Color Image Sequences	computer girl	1	1,702	24-10-2012, 01:45 PM Last Post: seminar details
	Image Edge Detection based on FPGA	seminar class	1	3,978	18-10-2012, 11:43 AM Last Post: seminar details
	Seminar on Attribute Editor	computer girl	0	855	11-06-2012, 02:16 PM Last Post: computer girl

Important Note..!

ASK HERE