10-03-2011, 11:59 AM
INTELLIGENT WEB MINING
Improving the efficiency of web search engines
PRESENT SCENARIO
• INDEX BASED SEARCHING
• Page Ranking Algorithm.
• PROBLEMS
• DIFFICULT FOR INEXPERIENCED USERS. (You don’t get what you want!)
• POLYSEMY PROBLEM DUE TO INDEX SEARCH.
• POSSIBLE FLAW IN PAGE RANKING ALGORITHM.
REAL TIME EXAMPLE
• SEARCH FOR “BUSH” IN WWW.GOOGLE.COM
• 266,000,000 RESULTS!!!
• THE FIRST TEN PAGES SHOW ONLY PRESIDENT BUSH.
• IS THERE ONLY PRESIDENT BUSH IN THIS WORLD?
HOW DO SEARCH ENGINES WORK?
• BY ANALYSING WEB PAGE STRUCTURE USING THE DOM TREE.
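A minimal sketch of how a page's DOM tree could be built for analysis, using only Python's standard `html.parser`. The names `Node`, `DomBuilder`, `build_dom` and `outline` are hypothetical helpers chosen for illustration; real search engines use far more robust parsers.

```python
from html.parser import HTMLParser


class Node:
    """One element of the DOM tree (hypothetical helper type)."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = []


class DomBuilder(HTMLParser):
    # Elements that never have a closing tag, so we must not descend into them.
    VOID = {"br", "img", "hr", "meta", "link", "input"}

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in self.VOID:
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag, then step out of it.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.text.append(data.strip())


def build_dom(html):
    builder = DomBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def outline(node, depth=0):
    """Return the tree as indented tag names, one per line."""
    lines = ["  " * depth + node.tag]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines
```

Calling `outline(build_dom(page_source))` gives an indented view of the element hierarchy, which is the structure an index-based engine walks when it extracts text.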
OUR PROPOSALS
TO OVERCOME THE PRESENT PROBLEMS:
• CONTEXT BASED SEARCH
• DUAL ROLE TREE STRUCTURE
• TAGGING SIMILAR WORDS TOGETHER
• CONTEXT BASED SEARCH
• IDENTIFIES CONTEXTS IN WEB PAGES THROUGH AUTOMATED KEYWORD IDENTIFICATION.
• CONTEXT WORDS BECOME NODES OF CONTEXT BASED TREE.
• NODES ARE ORDERED BY SIMILARITY TO THE KEYED-IN WORD.
• SEARCH ENGINE SEARCHES CONTEXT TREE.
• DISPLAY.
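The steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: keywords are picked by simple term frequency, and "similarity" to the keyed-in word is approximated by how many matching pages support a context. The names `build_context_tree`, `tokens` and `STOPWORDS` are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Assumption: a tiny stop-word list stands in for real keyword filtering.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "is", "to",
             "for", "on", "was", "are", "with"}


def tokens(text):
    """Lower-case words with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOPWORDS]


def build_context_tree(query, pages, keywords_per_page=2):
    """pages maps a URL to its text; returns the query with ordered context nodes."""
    query = query.lower()
    contexts = defaultdict(list)
    for url, text in pages.items():
        words = tokens(text)
        if query not in words:
            continue  # page does not mention the keyed-in word
        # Automated keyword identification: most frequent other words.
        counts = Counter(w for w in words if w != query)
        for word, _ in counts.most_common(keywords_per_page):
            contexts[word].append(url)
    # Order nodes: contexts supported by more pages come first.
    ordered = sorted(contexts.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return {
        "query": query,
        "children": [{"context": w, "pages": urls} for w, urls in ordered],
    }
```

The search step then reads the `children` list in order instead of scanning a flat index, so each context is reachable as its own branch.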
• DUAL ROLE BASED TREE
DOM TREE STRUCTURE
CONTEXT BASED TREE STRUCTURE
BUT HOW ARE CONTEXTS CREATED?
ANT ANALOGY
AN ANT IDENTIFIES THE SMELL OF FOOD.
HERE, SMELL IS THE ATTRIBUTE.
SIMILARLY, IDENTIFY THE ATTRIBUTES OF DATA.
SEARCH FOR THEM.
THE ANT LOOKS IN LIKELY PLACES.
SIMILARLY, SEARCH FOR LIKELY CLUSTERS USING CORRELATION ANALYSIS.
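One way the correlation step might look, as a sketch only. The slides do not say which correlation measure is meant; here it is assumed to be the Pearson coefficient over binary "word occurs in document" vectors, with words grouped when their correlation reaches a chosen threshold. `pearson` and `likely_clusters` are hypothetical names.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a word present in all or no documents carries no signal
    return cov / math.sqrt(vx * vy)


def likely_clusters(docs, threshold=0.5):
    """docs is a list of word sets; returns groups of correlated words."""
    vocab = sorted({w for d in docs for w in d})
    vectors = {w: [1 if w in d else 0 for d in docs] for w in vocab}

    # Union-find: merge words whose occurrence patterns correlate strongly.
    parent = {w: w for w in vocab}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for a, b in combinations(vocab, 2):
        if pearson(vectors[a], vectors[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for w in vocab:
        groups.setdefault(find(w), []).append(w)
    return sorted(sorted(g) for g in groups.values())
```

Words that tend to appear in the same documents end up in the same cluster, and each cluster is a candidate context.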
• PROTOTYPE
SEARCH FOR: BUSH
• PRESIDENT, SHRUBS, TRIBES (BUSHMEN) COULD BE POSSIBLE NODES OF THE CONTEXT TREE.
• A PRIORITY WOULD BE ASSIGNED TO EVERY NODE.
• THE USER IS LESS LIKELY TO BE DISAPPOINTED.
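A hedged sketch of how node priorities might shape the first results page: nodes are visited in priority order and their pages interleaved, so every sense of the word appears early. The priority values, the page identifiers and the function `first_page` are all hypothetical and chosen only for illustration.

```python
from itertools import zip_longest


def first_page(context_nodes, size=10):
    """Interleave pages from each context node, highest priority first."""
    ordered = sorted(context_nodes, key=lambda n: -n["priority"])
    columns = [[(n["context"], url) for url in n["pages"]] for n in ordered]
    # Take one result per context in turn until every list is exhausted.
    merged = [item
              for row in zip_longest(*columns)
              for item in row
              if item is not None]
    return merged[:size]


# Hypothetical context nodes for the query BUSH.
nodes = [
    {"context": "SHRUBS", "priority": 0.3, "pages": ["s1", "s2"]},
    {"context": "PRESIDENT", "priority": 0.6, "pages": ["p1", "p2", "p3"]},
    {"context": "TRIBES (BUSHMEN)", "priority": 0.1, "pages": ["t1"]},
]
results = first_page(nodes, size=4)
```

With this ordering the president still comes first, but shrubs and bushmen are no longer pushed beyond the first ten pages.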
BENEFITS
• EFFICIENT
• UNNECESSARY INFORMATION WILL BE ABSENT.
• IMPROVES THE PAGE RANKING ALGORITHM.