Signature-Based Indexing Method for Efficient Content-Based Retrieval
#1

A Signature-Based Indexing Method for Efficient Content-Based Retrieval of Relative Temporal Patterns
Abstract:
A number of algorithms have been proposed for the discovery of patterns from large databases. However, since the number of generated patterns can be large, selecting which patterns to analyze can be nontrivial. There is thus a need for algorithms and tools that can assist in the selection of discovered patterns so that subsequent analysis can be performed in an efficient and, ideally, interactive manner. In this project, we propose a signature-based indexing method to optimize the storage and retrieval of relative temporal patterns from a large database.
Existing Systems:
The inverted file indexing method concentrates on partial match retrieval, which is basically subset queries.
For each value, an inverted list stores a list of references to all occurrences of this value in the database.
Proposed System:
Focuses on supporting content-based queries of patterns stored in the database.
Patterns can be retrieved efficiently using the signature file indexing method.
System Requirements:
Hardware:
PROCESSOR : PENTIUM IV 2.6 GHz
RAM : 512 MB DDR RAM
MONITOR : 15" COLOR
HARD DISK : 20 GB
Software:
FRONT END : J2EE (JSP)
OPERATING SYSTEM : Windows XP
BACK END : SQL Server 2000
#2

A Signature-Based Indexing Method for Efficient Content-Based Retrieval of Relative Temporal Patterns
INTRODUCTION
Many rule discovery algorithms in data mining generate a large number of patterns/rules, sometimes even exceeding the size of the underlying database, with only a small fraction being of interest to the user. It is generally understood that interpreting the discovered patterns/rules to gain insight into the domain is an important phase in the knowledge discovery process. However, when there are a large number of generated rules, identifying and analyzing those that are interesting becomes difficult. For example, providing the user with a list of association rules ranked by their confidence and support might not be a good way of organizing the set of rules: this method would overwhelm the user, and not all rules with high confidence and support are necessarily interesting.
Therefore, to be useful, a data mining system must manage the generated rules by offering flexible tools for rule selection. In the case of association rule mining, several approaches for the post-processing of discovered association rules have been discussed. One approach is to group similar rules, which works well for a moderate number of rules. However, for a larger number of rules it produces too many clusters. A more flexible approach is to allow the identification of rules that are of special importance to the user through templates or data mining queries. This approach can complement the rule grouping approach and has been used to specify interesting and uninteresting classes of rules (for both association and episodic rules). The importance of data mining queries has been highlighted by the introduction of the inductive database concept, which allows the user to query both the data and the patterns, rules, and models extracted from these data.
Review of Set and Sequence Retrieval
The retrieval of data objects on set-valued attributes (set retrieval, for short) is an important research topic with wide areas of application. A significant amount of today's stored data consists of records with set-valued attributes (i.e. attributes that are sets of items). Set-valued attributes are extensively used in object-oriented databases to represent an object's multivalued attribute (Ishikawa, Kitagawa & Ohbo 1993), in multimedia databases to represent objects inside an image (Rabitti & Zezula 1990), and in data mining applications to represent market basket data (Morzy & Zakrzewicz 1998). Although advanced database systems, such as nested relational or object-oriented database systems, provide the means to store set-valued attributes in the database, they do not provide language primitives or indexes to process and query such attributes. Furthermore, some of the existing index structures proposed in the database literature, for example B+ trees (Comer 1979) and R trees (Guttman 1984), are not designed to fully support set value manipulation in general. Therefore, new types of index structure have been proposed in the literature to support queries on set-valued attributes, namely signature files and inverted files.
One of the application domains that would benefit from the possibility of performing efficient querying on sets is data mining. Several data mining techniques rely on excessive set processing, especially in the case of mining association rules using the Apriori family of algorithms. Shifting these computations from the data mining algorithms to the database engines could result in considerable time savings. Efficient set retrieval is also useful during pattern post-processing for the selection of discovered association rules according to user-defined criteria, or for the querying of the database against association rules to identify transactions that satisfy certain criteria. Recently, set retrieval using signature files has also been used as a basis for sequential pattern retrieval.
Data Flow Diagram
Modules and Description
• Data collection from the user end
• Constructing signature files from the raw database
• Query process
• Answering the query from the database
Module 1:
In this module we collect the data from the user end; it is maintained as the raw database for our project.
Module 2:
1. The raw database is converted into an equivalent id-based file.
2. The id-based database is converted into an equivalent signature file.
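The first step of Module 2 can be sketched roughly as follows. This is only an illustration: it assumes the raw database is a list of records and each record is a list of item names. The function name to_id_based and the 1-based numbering are hypothetical choices, not taken from the project.

```python
def to_id_based(raw_db):
    """Replace every distinct item by a small integer id (assumed 1-based)."""
    item_ids = {}   # item name -> integer id
    id_db = []      # the id-based version of the raw database
    for record in raw_db:
        ids = []
        for item in record:
            if item not in item_ids:
                item_ids[item] = len(item_ids) + 1
            ids.append(item_ids[item])
        id_db.append(ids)
    return item_ids, id_db


# Hypothetical raw records, for illustration only.
raw = [["book", "pen"], ["pen", "ink"]]
item_ids, id_db = to_id_based(raw)
```

The id-based records are then what gets hashed into signatures in the second step.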
Module 3:
In this module the query is converted into its equivalent signature form, i.e. into bit form, and content-based queries are evaluated. Let D be a temporal pattern database and q be a query pattern. The four forms of content-based queries that this research supports are the following:
1. Subpattern queries. Find those patterns in D that contain q.
2. Superpattern queries. Find those patterns in D that are a subpattern of q.
3. Equality queries. Find those patterns in D equal to q.
4. K-nearest subpattern queries. Find the k patterns in D most similar to q.
Superpattern queries are useful when searching for the characteristic parts of a large pattern, while k-nearest subpattern queries limit the number of patterns generated by subpattern or superpattern queries. Before a signature is computed, each temporal pattern is converted into its equivalent set.
Input: A database D of temporal patterns
Output: SignatureFile
for each p ∈ D do
    E(p) = EquivalentSet(p)
    sig_p = Signature(E(p))
    Insert ⟨sig_p, pid_p⟩ into SignatureFile
end for
return SignatureFile
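The construction loop above could look roughly like this in Python. It is a sketch under stated assumptions: the post does not define EquivalentSet or the hash behind Signature, so both are passed in as functions. toy_signature (one CRC32-selected bit per element) and the signature length F = 16 are stand-ins invented for this example.

```python
import zlib

F = 16  # signature length in bits (assumed for this example)


def toy_signature(equivalent_set):
    """Stand-in for Signature(): OR together one hashed bit per element."""
    sig = 0
    for element in equivalent_set:
        sig |= 1 << (zlib.crc32(str(element).encode()) % F)
    return sig


def build_signature_file(database, equivalent_set, signature=toy_signature):
    """database maps pid -> pattern; returns a list of (sig_p, pid_p) pairs."""
    signature_file = []
    for pid, pattern in database.items():
        e_p = equivalent_set(pattern)      # E(p) = EquivalentSet(p)
        sig_p = signature(e_p)             # sig_p = Signature(E(p))
        signature_file.append((sig_p, pid))
    return signature_file


# Hypothetical patterns; frozenset is used as a placeholder EquivalentSet.
patterns = {1: ("a", "b"), 2: ("b", "c", "d")}
signature_file = build_signature_file(patterns, equivalent_set=frozenset)
```

Keeping the pid next to each signature is what lets a query go from a matching signature back to the stored pattern.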
Pseudocode of evaluateSubPattern using BSSF
Input: Temporal pattern database D, a query pattern q
Output: AnswerSet
E(q) = EquivalentSet(q)
sig_q = Signature(E(q))
Retrieve the bit slices corresponding to the bit positions set to 1 in sig_q
Perform a bitwise intersect operation on the retrieved bit slices
for each entry where 1 is set in the resulting intersect bit slice do
    Add the corresponding pid_p into the PID list
end for
for each pid_p in the PID list do
    Retrieve p from D
    if p ⊒ q then
        Add p into AnswerSet
    end if
end for
return AnswerSet
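A small Python sketch of the subpattern evaluation, under simplifying assumptions: patterns are represented directly by their equivalent sets (Python sets), so the containment test becomes a superset test, and the signature is a toy one (one CRC32-selected bit per element, F = 32). Each bit slice is stored as an integer whose bit i belongs to the i-th pattern. All names here are illustrative.

```python
import zlib

F = 32  # signature length in bits (assumed)


def signature(items):
    """Toy set signature: one hashed bit per element (an assumption)."""
    sig = 0
    for item in items:
        sig |= 1 << (zlib.crc32(str(item).encode()) % F)
    return sig


def build_bssf(sigs):
    """Bit-sliced file: bit i of slices[j] is 1 iff signature i has bit j set."""
    slices = [0] * F
    for i, sig in enumerate(sigs):
        for j in range(F):
            if (sig >> j) & 1:
                slices[j] |= 1 << i
    return slices


def evaluate_sub_pattern(db, pids, slices, q):
    sig_q = signature(q)
    result = (1 << len(pids)) - 1          # start with every pattern
    for j in range(F):
        if (sig_q >> j) & 1:               # slices for the 1 bits of sig_q
            result &= slices[j]            # bitwise intersect
    pid_list = [pids[i] for i in range(len(pids)) if (result >> i) & 1]
    # False-drop resolution: check the real patterns of the candidates.
    return [pid for pid in pid_list if db[pid] >= q]


# Hypothetical database of equivalent sets.
db = {10: {"a", "b", "c"}, 11: {"a", "c"}, 12: {"b", "d"}}
pids = list(db)
slices = build_bssf([signature(db[pid]) for pid in pids])
answer = evaluate_sub_pattern(db, pids, slices, {"a", "c"})
```

Only the slices for the 1 bits of the query signature are read, which is the point of the bit-sliced layout.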
Pseudocode of evaluateSuperPattern using BSSF
Input: Temporal pattern database D, a query pattern q
Output: AnswerSet
E(q) = EquivalentSet(q)
sig_q = Signature(E(q))
Retrieve the bit slices corresponding to the bit positions set to 0 in sig_q
Perform a bitwise union operation on the retrieved bit slices
for each entry where 0 is set in the resulting union bit slice do
    Add the corresponding pid_p into the PID list
end for
for each pid_p in the PID list do
    Retrieve p from D
    if p ⊑ q then
        Add p into AnswerSet
    end if
end for
return AnswerSet
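The superpattern evaluation can be sketched the same way. As before, this is an illustration under assumptions: patterns are represented by their equivalent sets, the signature is a toy one (one CRC32-selected bit per element, F = 32), and each bit slice is an integer whose bit i belongs to the i-th pattern. None of the names come from the project.

```python
import zlib

F = 32  # signature length in bits (assumed)


def signature(items):
    """Toy set signature: one hashed bit per element (an assumption)."""
    sig = 0
    for item in items:
        sig |= 1 << (zlib.crc32(str(item).encode()) % F)
    return sig


def build_bssf(sigs):
    """Bit-sliced file: bit i of slices[j] is 1 iff signature i has bit j set."""
    slices = [0] * F
    for i, sig in enumerate(sigs):
        for j in range(F):
            if (sig >> j) & 1:
                slices[j] |= 1 << i
    return slices


def evaluate_super_pattern(db, pids, slices, q):
    sig_q = signature(q)
    union = 0
    for j in range(F):
        if not (sig_q >> j) & 1:           # slices for the 0 bits of sig_q
            union |= slices[j]             # bitwise union
    # A 0 in the union means the pattern sets no bit outside sig_q.
    pid_list = [pids[i] for i in range(len(pids)) if not (union >> i) & 1]
    # False-drop resolution: check the real patterns of the candidates.
    return [pid for pid in pid_list if db[pid] <= q]


# Hypothetical database of equivalent sets.
db = {10: {"a", "b", "c"}, 11: {"a", "c"}, 12: {"b", "d"}}
pids = list(db)
slices = build_bssf([signature(db[pid]) for pid in pids])
answer = evaluate_super_pattern(db, pids, slices, {"a", "b", "c"})
```

Note that this query reads the slices for the 0 bits of the query signature, so a short query (few 1 bits) touches many slices.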
Module 4:
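Here we answer the superpattern and subpattern queries from the database.
In this filtering step, false-drop resolution is performed: each candidate pattern returned by the signature match is retrieved and compared with the query, and false drops are discarded.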
UML Diagrams (figures not included in this post)
Use case diagram
Activity diagram
Collaboration diagram
Sequence diagram
Literature Review
Inverted files have been used extensively in text retrieval (Moffat & Zobel 1996, Zobel, Moffat & Ramamohanarao 1998). Their main application is supporting partial match retrieval, which is basically subset queries. This section describes the use of inverted files for supporting set retrieval. Given a collection of target sets, D, an inverted file consists of a directory containing all distinct values in D and, for each value in the directory, an inverted list that stores a list of references to all occurrences of this value in the database, that is, a list of references to the target sets containing the value. Consider the market basket data containing four transactions and five items in Figure 6.1. Using this database, an inverted file index can be created; it is shown in Figure 1.
An inverted list of an item is presented as the number n of transactions in which the item appears, followed by the transaction identifiers (tids). As shown in the figure, the inverted list of the item book holds n = 2 followed by tids 1 and 4, because the item appears in two transactions, namely transactions 1 and 4. If D contains a large number of items, the search values in the directory are usually stored in a B-tree.
Helmer and Moerkotte (1999) have adapted inverted files for set retrieval and modified the inverted list by also storing the cardinality of the target set with each target set reference, so that set queries can be answered more efficiently by using the cardinalities as a quick pre-test. As an example, in the inverted list of the item book, the references to transactions 1 and 4 are each stored together with the cardinality of the corresponding transaction.
After an inverted file has been built, subset queries can be processed as follows: for each item in the query set the appropriate list is fetched, and then all those lists are intersected. The result of the intersection contains a list of references to sets that contain the query set. Equality queries are processed the same way as subset queries, but can be improved by eliminating all references to target sets whose cardinality is not equal to the query set cardinality. When evaluating superset queries, all lists associated with the values in the query set are retrieved. Then the number of occurrences of each reference appearing in the retrieved lists is counted. A reference whose number of occurrences is not equal to the cardinality of its set is eliminated. The existence of such a reference means that the reference also appears in lists associated with values that are not in the query set, so that its set cannot be a subset of the query set.
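The three query types on an inverted file can be sketched in a few lines of Python. This is an illustration only: each inverted-list entry is a (tid, cardinality) pair in the spirit of the Helmer and Moerkotte variant, the function names are made up, and the four transactions are hypothetical (the post's figure is missing; only the fact that book occurs in transactions 1 and 4 comes from the text). Queries are assumed to be non-empty.

```python
from collections import Counter, defaultdict


def build_inverted_file(db):
    """db maps tid -> set of items; result maps item -> [(tid, cardinality)]."""
    index = defaultdict(list)
    for tid in sorted(db):
        for item in db[tid]:
            index[item].append((tid, len(db[tid])))
    return index


def _intersect_lists(index, q):
    if not q:
        raise ValueError("query set must not be empty")
    return set.intersection(*(set(index.get(item, [])) for item in q))


def subset_query(index, q):
    """Tids of target sets that contain q: intersect the lists of q's items."""
    return sorted(tid for tid, _ in _intersect_lists(index, q))


def equality_query(index, q):
    """Like subset_query, but keep only sets with the same cardinality as q."""
    return sorted(tid for tid, card in _intersect_lists(index, q)
                  if card == len(q))


def superset_query(index, q):
    """Tids of target sets contained in q: count occurrences per reference."""
    counts = Counter()
    for item in q:
        for ref in index.get(item, []):
            counts[ref] += 1
    # Keep a reference only if it was seen once per element of its own set.
    return sorted(tid for (tid, card), n in counts.items() if n == card)


# Hypothetical market basket data: four transactions, five items.
db = {1: {"book", "pen"}, 2: {"pen", "ink", "paper"},
      3: {"ink"}, 4: {"book", "ink", "ruler"}}
index = build_inverted_file(db)
```

With this data, index["book"] holds the two references (1, 2) and (4, 3), i.e. transactions 1 and 4 together with their cardinalities.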
Helmer and Moerkotte (1999) compare the performance of three signature-based indexes against that of the inverted file index in processing equality, subset and superset queries. They conclude that the inverted file index structure dominated the other index structures for subset and superset queries in terms of query processing time. Kouris et al. (2004) have used the inverted file index to improve the performance of an Apriori-based algorithm in discovering association rules. The index is accessed during support counting so that, instead of reading the original database, the mining algorithm scans the inverted file index stored in memory. Tuzhilin and Liu (2002) use an inverted file indexing scheme for querying multiple sets of discovered association rules. Comparisons between inverted files and signature files have also been made by Zobel et al. (1998) and Carterette and Can (2005).
Set Retrieval using Signature Files
The purpose of using signature files in set retrieval is to filter out the non-qualifying data objects. The basic idea is to represent the set-valued attribute of each data object as a bit pattern, called a signature, and to store the signatures in a separate file which acts as a filter to eliminate the non-qualifying data objects when processing set queries. A signature failing to match the query signature guarantees that the corresponding object can be ignored. Therefore, unnecessary object access is prevented. Since direct set comparisons are very expensive, using signatures as filters can speed up query processing in set retrieval.
Methods for Generating Set Signatures
In set retrieval with signature files, a target signature is generated for each target set and stored in the signature file. A number of signature generation methods have been proposed by Faloutsos and Christodoulakis (1987) in the context of text retrieval. These methods are Word Signatures (WS), Superimposed Coding (SC), Bit-Block Compression (BC), and Run Length Encoding (RL). The description of each method for generating target signatures is given below.
1. Word Signatures (WS). In the WS method each element of the target set is hashed into a bit pattern of a certain length. These patterns, called word signatures, are then concatenated to form the target signature.
2. Superimposed Coding (SC). In the SC method, each element in a target set is hashed to a binary bit pattern called an element signature. All element signatures are F bits long, and exactly m bits are set to '1', where m < F. F is called the length of a signature, while m is called the weight of an element signature. A target signature is then obtained by bit-wise OR-ing (superimposed coding) the element signatures of all the elements in the target set.
3. Bit-Block Compression (BC). The signature extraction process for BC is similar to SC. The difference is that the original size (length) of the signature, designated as B, is large, and for each element of a target set only one bit is set to '1' (i.e., m = 1). As a result, the bit vector B of the set signature is sparse. Therefore, before storing the signature, B is divided into groups of consecutive bits of size b and compressed.
4. Run Length Encoding (RL). The RL method is similar to both SC and BC. It differs from BC only in the compression method: RL records the distances between the positions of bits with value '1'.
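To make the RL idea concrete, here is a rough Python sketch of recording the distances between '1' bits of a sparse signature. The exact encoding used by Faloutsos and Christodoulakis is not given in this post, so the convention below (first value is the position of the first '1', later values are gaps between consecutive '1' bits) and both function names are assumptions.

```python
def run_length_encode(signature, length):
    """Return the gaps between successive 1-bit positions of a bit vector."""
    gaps = []
    previous = 0
    for pos in range(length):
        if (signature >> pos) & 1:
            gaps.append(pos - previous)   # distance from the previous 1 bit
            previous = pos
    return gaps


def run_length_decode(gaps):
    """Rebuild the bit vector from the list of gaps."""
    signature = 0
    pos = 0
    for gap in gaps:
        pos += gap
        signature |= 1 << pos
    return signature


# A sparse 16-bit signature with bits 3, 10 and 11 set.
sparse = (1 << 3) | (1 << 10) | (1 << 11)
encoded = run_length_encode(sparse, 16)
```

For a sparse vector the list of gaps is much shorter than the vector itself, which is why this only pays off when m = 1 and B is large.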
Of these four methods, the most commonly used method in set retrieval is superimposed coding (Helmer & Moerkotte 1999, Ishikawa et al. 1993, Tousidou, Bozanis & Manolopoulos 2002, Morzy & Zakrzewicz 1998). Therefore, for the rest of this chapter, unless stated otherwise, it will be assumed that superimposed coding is used to generate set signatures. Figure 6.3 illustrates the generation of a set signature using superimposed coding when F = 8 and m = 2.
Fig.2
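Since the figure is not shown here, a minimal Python sketch of superimposed coding with F = 8 and m = 2 may help. The hash is an assumption (SHA-256 of the element name plus a counter, reduced modulo F); the original text does not say which hash function is used, and the function names are illustrative.

```python
import hashlib

F = 8  # length of a signature in bits
M = 2  # weight of an element signature (bits set to 1)


def element_signature(element, f=F, m=M):
    """Hash one element to an f-bit pattern with exactly m bits set."""
    positions = set()
    counter = 0
    while len(positions) < m:             # rehash until m distinct positions
        digest = hashlib.sha256(f"{element}:{counter}".encode()).digest()
        positions.add(digest[0] % f)
        counter += 1
    return sum(1 << pos for pos in positions)


def set_signature(elements, f=F, m=M):
    """Target signature: bit-wise OR of all element signatures."""
    sig = 0
    for element in elements:
        sig |= element_signature(element, f, m)
    return sig


def may_contain(target_sig, query_sig):
    """Signature test for 'target set contains query set'."""
    return (target_sig & query_sig) == query_sig


target = set_signature({"book", "pen", "ink"})
query = set_signature({"pen"})
```

If may_contain returns False the target set can be skipped for certain; if it returns True the set is only a candidate, because different elements can set the same bits (a false drop), so the real set still has to be checked.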
#3
I would like clear information on

A Signature-Based Indexing Method for Efficient Content-Based Retrieval of Relative Temporal Patterns.
Please send it by mail.

#4
I need the full report on this project.
#5

hi
please go through the following threads..

http://studentbank.in/report-a-signature...5#pid12455
http://studentbank.in/report-a-signature...relative-t