
Learning Rules and Relations from Web Text
A Seminar Report
by
Jayarajan J N
M105108
Department of Computer Science & Engineering
College of Engineering Trivandrum
Kerala - 695016
2010-11


Abstract
As a step towards semantics-based searching over the web, we must be able to understand the
relationships among different entities that appear in typical web page content. We must be
able to group different entities into classes. Once the entities or objects are classified,
we can try to find the relationships among them. Once we know the relationships, we
can derive inference rules from those relations. These inference rules can then be used to process
semantics-based queries.
This seminar concentrates on the part where the inference rules are derived, based on
the paper titled "Learning First-Order Horn Clauses from Web Text". That paper investigates
the problem of learning inference rules from Web text in an unsupervised, domain-independent
manner. SHERLOCK, the machine learning system described here, is a first-order learner that
acquires over 30,000 Horn clauses from Web text. SHERLOCK embodies several innovations,
including a novel rule-scoring function based on Statistical Relevance (Salmon et al., 1971),
which is effective on ambiguous, noisy, and incomplete Web extractions. The authors' experiments
show that inference over the learned rules discovers three times as many facts (at precision 0.8)
as the TEXTRUNNER system, which merely extracts facts explicitly stated in Web text.


1 Introduction
Today's Web search engines locate pages that match keyword queries. Even sophisticated Web-based
Q/A systems merely locate pages that contain an explicit answer to a question. These
systems are helpless if the answer has to be inferred from multiple sentences, possibly on
different pages. To solve this problem, Schoenmackers et al. (2008) introduced the HOLMES
system, which infers answers from tuples extracted from text.
HOLMES's distinction is that it is domain independent and that its inference time is linear in
the size of its input corpus, which enables it to scale to the Web. However, HOLMES's Achilles
heel is that it requires hand-coded, first-order Horn clauses as input. Thus, while HOLMES's
inference run time is highly scalable, it requires substantial labor and expertise to hand-craft
the appropriate set of Horn clauses for each new domain.
Is it possible to learn effective first-order Horn clauses automatically from Web text in a
domain-independent and scalable manner? The paper refers to the set of ground facts derived
from Web text as open-domain theories. Learning Horn clauses has been studied extensively in
the Inductive Logic Programming (ILP) literature (Quinlan, 1990; Muggleton, 1995). However,
learning Horn clauses from open-domain theories is particularly challenging for several reasons.
First, the theories denote instances of an unbounded and unknown set of relations. Second,
the ground facts in the theories are noisy and incomplete. Negative examples are mostly
absent, and the system certainly cannot make the closed-world assumption typically made by
ILP systems. Finally, the names used to denote both entities and relations are rife with both
synonyms and polysemes, making their referents ambiguous and resulting in a particularly noisy
and ambiguous set of ground facts.
This paper presents a new ILP method, which is optimized to operate on open-domain
theories derived from massive and diverse corpora such as the Web, and experimentally confirms
both its effectiveness and its superiority over traditional ILP algorithms in this context. Figure 1
shows some example rules that were learned by SHERLOCK.
Figure 1: Example rules learned by SHERLOCK from Web extractions. Note that the italicized
rules are unsound.
The work on SHERLOCK makes the following contributions:
1. The paper describes the design and implementation of the SHERLOCK system, which utilizes
a novel, unsupervised ILP method to learn first-order Horn clauses from open-domain
Web text.
2. It derives an innovative scoring function that is particularly well suited to unsupervised
learning from noisy text. For Web text, the scoring function yields more accurate rules
than several functions from the ILP literature.
3. It demonstrates the utility of SHERLOCK's automatically learned inference rules. Inference
using SHERLOCK's learned rules identifies three times as many high-quality facts
(e.g., precision >= 0.8) as were originally extracted from the Web text corpus.
The remainder of the paper is organized as follows. It starts by describing previous work.
Section 3 introduces the SHERLOCK rule learning system, with Section 3.4 describing how it
estimates rule quality. Finally, the conclusion is presented.
2 Previous Work
SHERLOCK is one of the first systems to learn first-order Horn clauses from open-domain Web
extractions. The learning method in SHERLOCK belongs to the Inductive Logic Programming
(ILP) subfield of machine learning (Lavrac and Dzeroski, 2001).
2.1 FOIL and PROGOL
Classical ILP systems (e.g., FOIL (Quinlan, 1990) and Progol (Muggleton, 1995)) make strong
assumptions that are inappropriate for open domains.
Progol is an implementation of Inductive Logic Programming that combines "Inverse
Entailment" with a general-to-specific search through a refinement graph. Inverse Entailment
is used with mode declarations to derive the most specific clause within the mode language
which entails a given example. This clause is then used to guide a refinement-graph search.
First, ILP systems assume high-quality, hand-labeled training examples for each relation
of interest. Second, ILP systems assume that constants uniquely denote individuals; however,
in Web text strings such as "dad" or "John Smith" are highly ambiguous. Third, ILP systems
typically assume complete, largely noise-free data, whereas tuples extracted from Web text are
both noisy and radically incomplete. Finally, ILP systems typically utilize negative examples,
which are not available when learning from open-domain facts. One system that does not require
negative examples is LIME (McCreath and Sharma, 1997). Most prior ILP and Markov logic
structure learning systems (e.g., (Kok and Domingos, 2005)) are not designed to handle the
noise and incompleteness of open-domain, extracted facts.
2.2 NELL
The research goal of NELL (Never-Ending Language Learner) (Carlson et al., 2010) is to build a
never-ending machine learning system that acquires the ability to extract structured information
from unstructured web pages. If successful, this will result in a knowledge base (i.e., a relational
database) of structured information that mirrors the content of the Web. NELL performs
coupled semi-supervised learning to extract a large knowledge base of instances, relations, and
inference rules, bootstrapping from a few seed examples of each class and relation of interest and
a few constraints among them. In contrast, SHERLOCK focuses mainly on learning inference
rules, but does so without any manually specified seeds or constraints.
Craven et al. (1998) also used ILP to help information extraction on the Web, but required
training examples and focused on a single domain.
2.3 DIRT and RESOLVER
Two other notable systems that learn inference rules from text are DIRT (Lin and Pantel,
2001) and RESOLVER (Yates and Etzioni, 2007). However, both DIRT and RESOLVER learn
only a limited set of rules capturing synonyms, paraphrases, and simple entailments, not more
expressive multi-part Horn clauses. For example, these systems may learn the rule X acquired
Y => X bought Y, which captures different ways of describing a purchase. Applications of
these rules often depend on context (e.g., if a person acquires a skill, that does not mean they
bought the skill). To add the necessary context, ISP (Pantel et al., 2007) learned selectional
preferences (Resnik, 1997) for DIRT's rules. The selectional preferences act as type restrictions
on the arguments, and attempt to filter out incorrect inferences. While these approaches are
useful, they are strictly more limited than the rules learned by SHERLOCK.
Figure 2: The rules learned by NELL, which are automatically posted to Twitter.
The Recognizing Textual Entailment (RTE) task (Dagan et al., 2005) is to determine whether
one sentence entails another. Approaches to RTE include those of Tatu and Moldovan (2007),
which generates inference rules from WordNet lexical chains and a set of axiom templates,
and Pennacchiotti and Zanzotto (2007), which learns inference rules based on similarity across
entailment pairs. In contrast with this work, RTE systems reason over full sentences, but
benefit from being given the sentences and training data. SHERLOCK operates over simpler
Web extractions, but is not given guidance about which facts may interact.
3 System Description
SHERLOCK takes a large set of open-domain facts as input, and returns a set of weighted
Horn-clause inference rules. Other systems (e.g., HOLMES) use the rules to answer questions,
infer additional facts, etc.
SHERLOCK's basic architecture is depicted in Figure 3. To learn inference rules, SHERLOCK
performs the following steps:
1. Identify classes and instances
2. Discover relations between classes
3. Learn inference rules and evaluate their confidence
Figure 3: Architecture of SHERLOCK. SHERLOCK learns inference rules offline and provides
them to the HOLMES inference engine, which uses the rules to answer queries.
SHERLOCK reads extracted facts of the form (R, a, b) from TEXTRUNNER and utilizes
HOLMES as its inference engine.
3.1 Identifying Classes and Instances
SHERLOCK first searches for a set of well-defined classes and class instances. Instances of the
same class tend to behave similarly, so identifying a good set of instances will make it easier to
discover the general properties of the entire class.
Hearst patterns are used to identify interesting classes. These are textual patterns which
indicate hyponymy (e.g., "Class such as Instance"). The pairs extracted using these patterns
are cleaned by dropping modifiers.
SHERLOCK focuses on the less ambiguous classes by eliminating any class not appearing as
a descendant of physical entity, social group, physical condition, or event in WordNet. Beyond
this filtering it makes no use of a type hierarchy and treats classes independently.
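The class-discovery step can be sketched with a toy Hearst-pattern extractor. This is a minimal illustration, not SHERLOCK's implementation: the single "such as" pattern, the regular expression, and the head-noun heuristic standing in for modifier dropping are all simplifying assumptions.

```python
import re
from collections import defaultdict

# One simplified Hearst pattern: "<Class> such as <Instance>".
# SHERLOCK uses a larger battery of hyponymy patterns.
HEARST_SUCH_AS = re.compile(r"(\w+(?:\s\w+)?)\s+such as\s+(\w+)")

def extract_hyponyms(sentences):
    """Collect candidate class -> instances mappings from raw sentences."""
    pairs = defaultdict(set)
    for sentence in sentences:
        for cls, inst in HEARST_SUCH_AS.findall(sentence):
            # Crude stand-in for modifier dropping: keep only the head
            # noun of the class phrase (e.g. "large cities" -> "cities").
            head = cls.split()[-1].lower()
            pairs[head].add(inst.lower())
    return pairs
```

Running this over sentences like "Large cities such as Chicago have subway systems" yields the class "cities" with instance "chicago"; the WordNet filtering step would then discard any extracted class that is not a descendant of the four allowed roots.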
3.2 Discovering Relations between Classes
For every pair of classes (C1, C2), SHERLOCK finds a set of typed, candidate relations from the 100 most
frequent relations in the corpus where the first argument is an instance of C1 and the second
argument is an instance of C2. For extraction terms with multiple senses (e.g., New York), it
splits their weight based on how frequently they appear with each class in the Hearst patterns.
Relations whose pointwise mutual information falls below a threshold are discarded as rare
and meaningless:

PMI(R(C1, C2)) = p(R, C1, C2) / (p(R, ., .) * p(., C1, .) * p(., ., C2))

where p(R, ., .) is the probability that a random extraction has relation R; p(., C1, .) is the
probability that a random extraction has an instance of C1 as its first argument; p(., ., C2) is similar
for the second argument; and p(R, C1, C2) is the probability that a random extraction has relation
R and instances of C1 and C2 as its first and second arguments, respectively. A low PMI
indicates the relation occurred by random chance, which is typically due to ambiguous terms
or extraction errors.
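The PMI ratio above can be sketched as a direct frequency estimate over typed extraction triples. The relation and class names in the usage below are invented, and a real system would estimate these probabilities from millions of extractions rather than a short list:

```python
# Each extraction is a typed triple (relation, class1, class2).
def pmi(extractions, r, c1, c2):
    """Estimate PMI(R(C1, C2)) = p(R,C1,C2) / (p(R,.,.) p(.,C1,.) p(.,.,C2))."""
    n = len(extractions)
    joint = sum(1 for e in extractions if e == (r, c1, c2)) / n
    p_r = sum(1 for (R, _, _) in extractions if R == r) / n
    p_c1 = sum(1 for (_, C1, _) in extractions if C1 == c1) / n
    p_c2 = sum(1 for (_, _, C2) in extractions if C2 == c2) / n
    return joint / (p_r * p_c1 * p_c2)
```

A typed relation whose PMI falls below a chosen threshold would be discarded as a likely artifact of ambiguous terms or extraction errors.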
3.3 Learning Inference Rules
SHERLOCK attempts to learn inference rules for each typed relation in turn. SHERLOCK
receives as input a target relation R, a set of observed examples of the relation E+, a maximum
clause length k, a minimum support s, and an acceptance threshold t. SHERLOCK
generates all first-order, definite clauses up to length k, where R appears as the head of the
clause. It retains each clause that:
1. Contains no unbound variables
2. Infers at least s examples from E+
3. Scores at least t according to the scoring function
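The three retention criteria can be expressed as a single predicate. This is a hypothetical sketch: `score_fn` stands in for the statistical-relevance score of Section 3.4, candidate clauses are abstracted to their variable sets plus the facts they infer, and the head-variables-bound-in-body check is a simplification of the "no unbound variables" condition.

```python
def keep_clause(body_vars, head_vars, inferred, observed, s, t, score_fn):
    """Return True if a candidate clause passes SHERLOCK's three filters."""
    # 1. No unbound variables: every head variable must appear in the body.
    if not set(head_vars) <= set(body_vars):
        return False
    # 2. Support: the clause must infer at least s observed examples (E+).
    if len(set(inferred) & set(observed)) < s:
        return False
    # 3. Score: the clause must reach the acceptance threshold t.
    return score_fn(inferred, observed) >= t
```

With a precision-style `score_fn`, a clause binding all head variables, re-deriving two of three observed facts, and scoring 2/3 would pass thresholds s = 2, t = 0.5.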
3.4 Evaluating Rules
Statistical relevance tries to infer the simplest set of factors which explain an observation. It can
be viewed as searching for the simplest propositional Horn clause which increases the likelihood
of a goal proposition g. The two key ideas in determining statistical relevance are discovering
factors which substantially increase the likelihood of g (even if the probabilities are small in an
absolute sense), and dismissing irrelevant factors.
Statistical relevance appears useful in the open-domain context, since all the necessary
probabilities can be estimated from only positive examples. Furthermore, approximating relative
probabilities in the presence of missing data is much more reliable than determining absolute
probabilities.
For the Horn clause Head(v1, ..., vn) :- Body(v1, ..., vm) (where Body(v1, ..., vm) is a conjunction
of function-free, non-negated, first-order relations, and the vi are the typed variables
used in the rule), the body is said to help explain the head if:
1. Observing an instance of the body substantially increases the probability of observing the
head.
2. The body contains no irrelevant (conditionally independent) terms.
Definition 1: A first-order Horn clause Head(...) :- Body(...) is statistically relevant if
p(Head(...) | Body(...)) >> p(Head(...)), and if there is no clause body B'(...) that is a strict
subset of Body(...) such that p(Head(...) | Body(...)) is approximately p(Head(...) | B'(...)).
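Definition 1 can be turned into a small numeric test. The lift and slack thresholds below are invented stand-ins for the paper's statistical tests; the point is only the shape of the two conditions, applied to probabilities already estimated from positive examples.

```python
def statistically_relevant(p_head, p_head_given_body, p_head_given_subbodies,
                           lift=2.0, slack=0.95):
    """Check the two conditions of Definition 1 on estimated probabilities."""
    # Condition 1: the body substantially raises the head's probability.
    if p_head_given_body < lift * p_head:
        return False
    # Condition 2: no strict sub-body B' predicts the head (nearly) as well,
    # i.e. the body contains no irrelevant, conditionally independent terms.
    return all(p_sub < slack * p_head_given_body
               for p_sub in p_head_given_subbodies)
```

For example, a body that raises p(Head) from 0.01 to 0.3 while every sub-body stays near 0.05 would be accepted; if some sub-body alone already reaches 0.29, the extra terms are judged irrelevant and the clause is rejected.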
4 Functional Features of SHERLOCK
SHERLOCK brings several distinguishable improvements compared to similar systems. It
differs in the way it infers facts, in its weight calculation, and in the way it handles the
open-domain setting and the absence of negative examples.
4.1 Inference
As input, HOLMES requires a target atom H(...), an evidence set E, and a weighted rule set R.
It performs a form of knowledge-based model construction (Wellman et al., 1992),
first finding facts using logical inference, then estimating the confidence of each using a Markov
Logic Network (Richardson and Domingos, 2006).
Since the rules and inferences are based on a set of noisy and incomplete extractions, the
algorithms used for both weight learning and inference should capture the following characteristics
of the problem:
1. Any arbitrary unknown fact is highly unlikely to be true.
2. The more frequently a fact is extracted from the Web, the more likely it is to be true.
However, facts in E should have a confidence bounded by a threshold pmax < 1: E
contains systematic extraction errors, so some uncertainty must remain even in the most
frequently extracted facts.
3. An inference that combines uncertain facts should be less likely than each fact it uses.
4.2 Weight Learning
The weight of a rule is based on a count of its true groundings. Since Web extractions are
noisy, a raw count would be an overestimate. Therefore SHERLOCK computes the number of
true groundings for a rule i as follows:

n_i(E) = sum_j max_k prod_{B(...) in Body_ijk} p(B(...))    (1)

where:
1. n_i(E) is the number of true groundings for rule i, with evidence E.
2. p(B(...)) is an approximated confidence value which ranges from 0 to 0.75.
3. Body_ijk is the body of the k-th grounding of the j-th inferred head of the i-th rule.
Longer rules are penalized with lower weights, as they are more likely to make incorrect
inferences; otherwise, every rule would receive a high weight, since there are no negative
examples.
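Equation 1 can be sketched directly: for each inferred head j, take the single most probable grounding k (the max), score that grounding as the product of its body atoms' confidences, and sum over heads. The nested-list encoding of groundings below is an invented convenience, not SHERLOCK's data structure.

```python
def soft_grounding_count(groundings):
    """Compute n_i(E): groundings[j][k] lists the body-atom probabilities
    p(B) of the k-th grounding supporting the j-th inferred head of rule i."""
    total = 0.0
    for head_groundings in groundings:
        best = 0.0
        for body_probs in head_groundings:
            prob = 1.0
            for p in body_probs:    # product over body atoms B in Body_ijk
                prob *= p
            best = max(best, prob)  # max over groundings k
        total += best               # sum over inferred heads j
    return total
```

For instance, with two candidate groundings for one head (body probabilities 0.5 * 0.5 versus 0.75) and a single grounding of probability 0.25 for another head, n_i(E) = 0.75 + 0.25 = 1.0.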
4.3 Probabilistic Inference
After learning the weights, SHERLOCK adds the following two rules to the rule set:
1. H(...) with negative weight wprior
2. H(...) :- ExtractedFrom(H(...), sentence_i) with weight 1
HOLMES attempts to infer the truth value of each ground atom H(...) in turn by treating all
other extractions E in the corpus as evidence. Inference also requires computing n_i(E), which
is done according to Equation 1, as in weight learning.
5 Conclusion
Here we have discussed the different ILP systems for knowledge building and the
"Open Information Extraction" project, which includes RESOLVER, TEXTRUNNER, HOLMES, and
SHERLOCK. We have seen that SHERLOCK works in the open domain and learns first-order
rules without supervision, based on the papers published in connection with the above
project.
SHERLOCK takes an optimistic approach towards the learned rules while taking into account
the fact that a learned rule could be wrong; its modified weight learning and probabilistic
inference together make it possible to run in unsupervised mode.
This system may help us build accurate knowledge bases which can be used to build
high-performance AI-based systems and expert systems.
References
[1] Stefan Schoenmackers, Oren Etzioni, Daniel S. Weld, and Jesse Davis. "Learning first-order
Horn clauses from Web text". In Proceedings of EMNLP 2010.
[2] Thomas Lin, Mausam, and Oren Etzioni. "Identifying functional relations in Web text". In
Proceedings of EMNLP 2010.
[3] A. Yates and O. Etzioni. "Unsupervised methods for determining object and relation synonyms
on the Web". Journal of Artificial Intelligence Research, 2009.