INFORMATION RETRIEVAL ppt.
#1





BY
Sudheer Reddy B.


Agenda

Definition
History
Overview
Performance Measures
What IR Systems Do and How
Traditional View of IR

History:

The idea of using computers to search for relevant pieces of information was popularized by the article “As We May Think” by Vannevar Bush in 1945.
The first automated information retrieval systems were introduced in the 1950s and 1960s.
In 1992, the US Department of Defense, together with NIST, co-sponsored the Text REtrieval Conference (TREC) program, which helped drive the research behind web search engines.

Overview:

An information retrieval process begins when a user enters a query into the system.
The process may then be iterated if the user wishes to refine the query.

What IR Systems Try to Do?

Predict, on the basis of information about the user and about the knowledge resource, which information objects are likely to be the most appropriate for the user to interact with at any particular time.

How IR Systems Try to Do This

Represent the user’s information problem (the query)
Represent (surrogate) and organize (classify) the contents of the knowledge resource
Compare query to surrogates (predict relevance)
Present results to the user for interaction/judgment
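
As a rough illustration of these four steps (not the method of any particular system), here is a minimal Python sketch: the query and each document are represented as bags of words, compared by simple term overlap, and the results are presented in ranked order. The document collection, the tokenizer, and the overlap score are all assumptions made up for the example.

def tokenize(text):
    # Represent text as a bag of lower-cased word tokens (the "surrogate").
    return set(text.lower().split())

def search(query, documents):
    # documents: dict mapping a document id to its text (organize/classify step).
    surrogates = {doc_id: tokenize(text) for doc_id, text in documents.items()}
    query_terms = tokenize(query)                      # represent the information problem
    scores = {doc_id: len(query_terms & terms)         # compare query to surrogates
              for doc_id, terms in surrogates.items()}
    # Present results in decreasing order of the overlap score.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

docs = {"d1": "information retrieval systems", "d2": "database systems"}
print(search("information retrieval", docs))   # [('d1', 2), ('d2', 0)]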

Performance Measures:

The traditional goal of IR is to retrieve all and only the relevant information objects (IOs) in response to a query.
"All" is measured by recall: the proportion of relevant IOs in the collection that are retrieved.
"Only" is measured by precision: the proportion of retrieved IOs that are relevant.
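
A minimal sketch of how the two measures are computed, assuming the retrieved and relevant IOs are available as Python sets of document ids (the ids below are invented for the example):

def precision_recall(retrieved, relevant):
    # Precision ("only"): proportion of retrieved IOs that are relevant.
    # Recall ("all"): proportion of relevant IOs that were retrieved.
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d1", "d3", "d5"}
print(precision_recall(retrieved, relevant))   # precision 0.5, recall ~0.67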



#2

Information Retrieval Systems

- Information retrieval (IR) systems use a simpler data model than database systems
  - Information is organized as a collection of documents
  - Documents are unstructured, with no schema
- Information retrieval locates relevant documents on the basis of user input such as keywords or example documents
  - e.g., find documents containing the words “database systems”
- Can be used even on textual descriptions provided with non-textual data such as images
- Web search engines are the most familiar example of IR systems
- Differences from database systems
  - IR systems don’t deal with transactional updates (including concurrency control and recovery)
  - Database systems deal with structured data, with schemas that define the data organization
  - IR systems deal with some querying issues not generally addressed by database systems
    - Approximate searching by keywords
    - Ranking of retrieved answers by estimated degree of relevance

Keyword Search

- In full text retrieval, all the words in each document are considered to be keywords.
  - We use the word "term" to refer to the words in a document
- Information-retrieval systems typically allow query expressions formed using keywords and the logical connectives and, or, and not
  - Ands are implicit, even if not explicitly specified (see the sketch after this section)
- Ranking of documents on the basis of estimated relevance to a query is critical
  - Relevance ranking is based on factors such as
    - Term frequency: frequency of occurrence of the query keyword in the document
    - Inverse document frequency: how many documents the query keyword occurs in (the fewer the documents, the more importance given to the keyword)
    - Hyperlinks to documents: the more links to a document, the more important the document is
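
As promised above, a small sketch of a keyword query with implicit ands, written in Python under the assumption that documents are plain strings and that matching documents are ranked by the total term frequency of the query keywords (a deliberately crude stand-in for the relevance factors listed above):

from collections import Counter

def keyword_search(query, documents):
    # documents: dict of doc id -> text; the query keywords are implicitly and-ed.
    keywords = query.lower().split()
    results = []
    for doc_id, text in documents.items():
        counts = Counter(text.lower().split())      # term frequencies within this document
        if all(counts[k] > 0 for k in keywords):    # implicit "and": every keyword must occur
            results.append((doc_id, sum(counts[k] for k in keywords)))
    return sorted(results, key=lambda item: item[1], reverse=True)

docs = {"d1": "database systems and database design", "d2": "operating systems"}
print(keyword_search("database systems", docs))   # [('d1', 3)]
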
Relevance Ranking Using Terms

- TF-IDF (term frequency / inverse document frequency) ranking:
  - Let n(d) = number of terms in the document d
  - n(d, t) = number of occurrences of term t in the document d
  - Relevance of a document d to a term t:
        TF(d, t) = log(1 + n(d, t) / n(d))
    - The log factor is there to avoid giving excessive weight to frequent terms
  - Relevance of a document d to a query Q:
        r(d, Q) = sum over t in Q of TF(d, t) / n(t)
    where n(t) is the number of documents that contain term t (a small sketch using these formulas follows at the end of this section)
- Most systems add to the above model
  - Words that occur in the title, author list, section headings, etc. are given greater importance
  - Words whose first occurrence is late in the document are given lower importance
  - Very common words such as "a", "an", "the", "it", etc. are eliminated
    - These are called stop words
  - Proximity: if keywords in the query occur close together in the document, the document has higher importance than if they occur far apart
- Documents are returned in decreasing order of relevance score
  - Usually only the top few documents are returned, not all
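
A minimal sketch of the scoring above, assuming the reconstructed formulas TF(d, t) = log(1 + n(d, t) / n(d)) and r(d, Q) = sum over t in Q of TF(d, t) / n(t); stop words, proximity, title weighting, and the other refinements are left out, and the tiny collection is invented for the example:

import math
from collections import Counter

def tf(doc_terms, term):
    # TF(d, t) = log(1 + n(d, t) / n(d)); the log damps very frequent terms.
    counts = Counter(doc_terms)
    return math.log(1 + counts[term] / len(doc_terms))

def relevance(doc_terms, query_terms, collection):
    # r(d, Q) = sum over t in Q of TF(d, t) / n(t), where n(t) is the number of
    # documents in the collection containing t (the inverse-document-frequency part).
    score = 0.0
    for t in query_terms:
        n_t = sum(1 for terms in collection.values() if t in terms)
        if n_t:
            score += tf(doc_terms, t) / n_t
    return score

docs = {"d1": "database systems store structured data".split(),
        "d2": "retrieval systems rank documents".split()}
query = "database systems".split()
ranked = sorted(docs, key=lambda d: relevance(docs[d], query, docs), reverse=True)
print(ranked)   # ['d1', 'd2'] -- d1 matches both query terms
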
Similarity-Based Retrieval

- Similarity-based retrieval: retrieve documents similar to a given document
  - Similarity may be defined on the basis of common words
    - E.g., find the k terms in A with the highest TF(d, t) / n(t) and use these terms to find the relevance of other documents
- Relevance feedback: similarity can be used to refine the answer set of a keyword query
  - The user selects a few relevant documents from those retrieved by the keyword query, and the system finds other documents similar to these
- Vector space model: define an n-dimensional space, where n is the number of words in the document set
  - The vector for document d goes from the origin to the point whose i-th coordinate is TF(d, t_i) / n(t_i)
  - The cosine of the angle between the vectors of two documents is used as a measure of their similarity
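
A small illustrative sketch of the vector space model: each document becomes a vector with one coordinate per vocabulary word (raw term counts here, rather than the TF(d, t) / n(t) weighting on the slide, to keep the example short), and similarity is the cosine of the angle between two such vectors:

import math
from collections import Counter

def doc_vector(doc_terms, vocabulary):
    # One coordinate per vocabulary word; the value is the term count in this document.
    counts = Counter(doc_terms)
    return [counts[w] for w in vocabulary]

def cosine(u, v):
    # Cosine similarity: dot product divided by the product of the vector lengths.
    # 1.0 means the documents use terms in identical proportions, 0.0 means no terms in common.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

d1 = "information retrieval systems".split()
d2 = "database systems and information systems".split()
vocab = sorted(set(d1) | set(d2))
print(round(cosine(doc_vector(d1, vocab), doc_vector(d2, vocab)), 3))   # ~0.655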
