WEB MINING
PREPARED BY:
TRIPTI

INTRODUCTION
Explosive growth in the information available on the WWW
Web browsers provide easy access to data & text
Finding the desired information is still not an easy task
The profusion of resources prompted the need for web mining
SOFT WEB MINING - a good candidate for developing automated tools to find, extract, and evaluate the information a user wants from unlabeled, heterogeneous data.
WEB MINING
Discovery & analysis of useful info from the WWW
Data can be collected at
- Server side
- Client side
- Proxy servers
- Can also be obtained from an organization's database
Characteristics of web data
- Unlabeled
- Distributed
- Heterogeneous
- Semistructured
- Time varying
WEB MINING COMPONENTS & METHODOLOGIES
INFORMATION RETRIEVAL

Deals with automatic retrieval of all relevant documents
while fetching as few non-relevant documents as possible
The IR process mainly includes
- Document representation
Furnkranz [36]: bag of words & hyperlink information
Soderland [40]: sentences, phrases & named entities
- Indexing (a collection of terms with pointers to the places where the documents can be found)
POPULAR INDEXES
- AltaVista
- WebCrawler (can scan millions of documents and store an index of the words in the documents)
- Searching for documents
Search engines are used (programs written to query and retrieve info)
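The indexing and searching steps above can be sketched as a small inverted index. This is a minimal illustration, not how any particular search engine works: the toy documents and the names `tokenize`, `build_index` and `search` are assumptions made for the example.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase a document and split it into word tokens (bag of words)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical toy collection, for illustration only.
documents = {
    "d1": "Web mining discovers useful information from the web",
    "d2": "Fuzzy logic handles imprecise queries",
    "d3": "Web usage mining analyses server logs",
}
index = build_index(documents)
print(sorted(search(index, "web mining")))
```

The index stores only pointers (document ids), so a query is answered by intersecting small sets instead of scanning every document.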
INFORMATION SELECTION/EXTRACTION & PREPROCESSING
Task of identifying the specific fragments of a single document that constitute its core semantic content
METHODS USED
- Writing wrappers (hand coding) that map the documents to some data model
- Interpreting the various sites as knowledge sources and extracting information from them. To do so, the system processes the site's documents to extract relevant text fragments
- Extracting info from hypertext: each page is approached with a set of questions, so the problem reduces to identifying the text fragments that answer those specific questions

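A hand-coded wrapper of the kind listed above might look like the sketch below. The page markup, the `Product` data model and the `wrap` function are all hypothetical; the point is only that the pattern is tied to one site's formatting.

```python
import re
from dataclasses import dataclass

@dataclass
class Product:
    """Target data model that the wrapper maps the page onto."""
    name: str
    price: float

# Pattern written by hand for one hypothetical site's markup;
# a site with different formatting would need its own wrapper.
ROW = re.compile(
    r'<li class="item">\s*<b>(?P<name>[^<]+)</b>\s*-\s*'
    r'\$(?P<price>\d+(?:\.\d+)?)\s*</li>'
)

def wrap(html):
    """Extract every product row from the page as a Product record."""
    return [
        Product(m.group("name").strip(), float(m.group("price")))
        for m in ROW.finditer(html)
    ]

page = """
<ul>
  <li class="item"><b>Red car</b> - $9500</li>
  <li class="item"><b>Blue car</b> - $10250.50</li>
</ul>
"""
products = wrap(page)
for p in products:
    print(p.name, p.price)
```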
LSI (LATENT SEMANTIC INDEXING)
- A preprocessing technique for IE.
When a user requests a web page, a variety of files are accessed:
- Images
- Sound
- Video
- HTML pages
The server therefore records both relevant & irrelevant entries; the irrelevant ones need to be removed during preprocessing.
LSI transforms the original documents to a lower-dimensional space by analysing the correlational structure of terms
As a result, similar documents that do not share the same terms are still placed in the same category
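The transformation to a lower-dimensional space is commonly written as a truncated singular value decomposition of the term-document matrix. The slides do not give the formula, so the symbols below (A for the term-document matrix, k for the number of retained dimensions, d_j for the term vector of document j) are notation introduced here.

```latex
A \approx A_k = U_k \Sigma_k V_k^{\mathsf{T}}, \qquad k \ll \operatorname{rank}(A)
\\[4pt]
\hat{d}_j = \Sigma_k^{-1} U_k^{\mathsf{T}} d_j
```

Documents are then compared through their reduced vectors, which is why two documents can end up close together without sharing any term.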
GENERALIZATION
Uses pattern recognition and machine learning techniques
Machine learning systems learn about the user's interests rather than about the web itself
The major OBSTACLE when learning about the web is the labelling problem
Data mining techniques require inputs labelled as (+ve) or (-ve)
FOR EXAMPLE
Given a large set of web pages labelled as (+ve) or (-ve) examples of a homepage,
we can design a classifier that predicts whether an unknown page is a homepage or not. Unfortunately, web pages are not labelled.
Clustering techniques do not require labelled inputs and outputs
Association Rule Mining (AN INTEGRAL PART OF THIS PHASE)
X => Y
X, Y -> sets of items
Expresses that whenever a transaction T contains X, T probably contains Y as well
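The word "probably" in a rule X => Y is usually quantified by support and confidence. The sketch below computes both over a hypothetical set of transactions; the page names and the function names are assumptions for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(transactions, x, y):
    """Estimated probability that a transaction containing X also contains Y."""
    support_x = support(transactions, x)
    if support_x == 0:
        return 0.0
    return support(transactions, set(x) | set(y)) / support_x

# Hypothetical transactions: the set of pages requested in one visit.
transactions = [
    {"home", "products", "cart"},
    {"home", "products"},
    {"home", "about"},
    {"products", "cart"},
]

# Rule {products} => {cart}
print(support(transactions, {"products", "cart"}))
print(confidence(transactions, {"products"}, {"cart"}))
```

A rule is normally kept only if both values exceed thresholds chosen by the analyst.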
ANALYSIS
Data-driven problem
Presumes that sufficient data is available to extract & analyse useful information
Important for validation & interpretation of mined patterns
Uses Online Analytical Processing (OLAP) techniques
WEBMINER proposes an SQL-like querying mechanism for querying the discovered knowledge
WEB MINING CATEGORIES
WCM(Web Content Mining)

Deals with the discovery of useful information from web contents/data/documents/services.
Web content includes
- Text
- Audio
- Video
- Symbolic data
- Metadata
- Hyperlinked data
Web Text Data (3 TYPES)
1) Unstructured data (free text)
2) Semistructured data (HTML)
3) Fully structured data (tables or databases)
WSM(Web Structure Mining)
Mining the structure of hyperlinks within the web itself
The structure is the graph of links within a site or between sites
Reveals more information than the contents of the documents alone.
Rather than collecting the whole index, it focuses only on the links that are relevant and avoids irrelevant regions
Links pointing to a document indicate the popularity of the document,
while links coming out of a document indicate the variety of topics it covers
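Counting incoming and outgoing links is the simplest form of this analysis. The sketch below uses a hypothetical list of (source, target) links; `link_degrees` is an illustrative name, not part of any tool mentioned in the slides.

```python
from collections import Counter

def link_degrees(links):
    """Count in-links and out-links per page from (source, target) pairs."""
    in_deg, out_deg = Counter(), Counter()
    for src, dst in set(links):  # a repeated link is counted once
        out_deg[src] += 1
        in_deg[dst] += 1
    return in_deg, out_deg

# Hypothetical link graph of a small site.
links = [
    ("index.html", "papers.html"),
    ("index.html", "about.html"),
    ("blog.html", "papers.html"),
    ("about.html", "papers.html"),
]
in_deg, out_deg = link_degrees(links)

most_linked = max(in_deg, key=in_deg.get)     # popularity: most in-links
most_linking = max(out_deg, key=out_deg.get)  # variety: most out-links
print(most_linked, in_deg[most_linked])
print(most_linking, out_deg[most_linking])
```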
WUM(Web Usage Mining)
Mines secondary data generated by users' interaction with the web
Also known as web log mining
Works on user profiles, user access patterns, and navigation paths
Plays a key role in personalizing space, which is the need of the hour.
Uses techniques like:
- Association Rules
- Clustering
- Sequential Patterns
- Rough Sets
- Fuzzy Logic
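Mining navigation paths from a web log can be sketched as follows. The log format (user id, timestamp, page) is a simplification assumed for the example, and grouping by user alone stands in for real session detection.

```python
from collections import Counter, defaultdict

def sessions_from_log(log):
    """Group page requests by user, ordered by timestamp."""
    by_user = defaultdict(list)
    for user, _timestamp, page in sorted(log, key=lambda e: (e[0], e[1])):
        by_user[user].append(page)
    return dict(by_user)

def path_counts(sessions, length=2):
    """Count contiguous page sequences of the given length across sessions."""
    counts = Counter()
    for pages in sessions.values():
        for i in range(len(pages) - length + 1):
            counts[tuple(pages[i:i + length])] += 1
    return counts

# Hypothetical simplified log entries: (user id, timestamp in seconds, page).
log = [
    ("u1", 10, "/home"), ("u1", 25, "/products"), ("u1", 40, "/cart"),
    ("u2", 12, "/home"), ("u2", 30, "/products"),
    ("u3", 5, "/products"), ("u3", 9, "/cart"),
]
sessions = sessions_from_log(log)
counts = path_counts(sessions)
for path, n in counts.most_common():
    print(" -> ".join(path), n)
```

Frequent paths found this way are the raw material for the sequential-pattern and association-rule techniques in the list above.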
LIMITATIONS OF EXISTING WEB MINING METHODS
INFORMATION RETRIEVAL

- Subjectivity, Imprecision, and Uncertainty
- Deduction
- Page Ranking
- Dynamism, Scale, and Heterogeneity
INFORMATION EXTRACTION
- Based on the “wrapper” technique
- Limitation: each wrapper is an IE system customized for a particular site and is not universally applicable.
- Ad hoc formatting conventions used on one site are rarely relevant elsewhere.
GENERALIZATION
- Clustering
- Outliers
- Association Rule Mining
ANALYSIS
- Discovering knowledge from the information is a challenge for analysts
- The output of knowledge mining algorithms is not suitable for direct human interpretation.
- The patterns discovered are mainly in mathematical form
SOFT COMPUTING & ITS RELEVANCE
SOFT COMPUTING - a collection of methodologies that provide information-processing capabilities for handling ambiguous real-life situations
FUZZY LOGIC AND WEB MINING
FUZZY SETS – their elements possess degrees of membership
Classically, membership of an element in a set is bivalent
- Element belongs to the set (1)
- Element does not belong to the set (0)
A fuzzy set is designated by a pair (A, m)
A -> set and m : A -> [0,1] (the membership function)
Values strictly between 0 & 1 represent fuzzy members
In fuzzy logic the degree of truth of a statement can range between 0 & 1.
The degree is not restricted to the 2 truth values true -> 1 and false -> 0
Deals with reasoning that is approximate rather than precise.
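The difference between bivalent and fuzzy membership can be shown with a short sketch. The fuzzy set "relevant document" and its breakpoints 0.2 and 0.8 are assumptions chosen for the example, not values from the slides.

```python
def crisp_membership(x, members):
    """Classical set: an element is either in (1) or out (0)."""
    return 1 if x in members else 0

def relevant(score, low=0.2, high=0.8):
    """Fuzzy set 'relevant document' over a match score in [0, 1].

    Membership is 0 up to `low`, 1 from `high`, and rises linearly
    in between (breakpoints are illustrative assumptions).
    """
    if score <= low:
        return 0.0
    if score >= high:
        return 1.0
    return (score - low) / (high - low)

print(crisp_membership("d1", {"d1", "d2"}))
for s in (0.1, 0.5, 0.9):
    print(s, relevant(s))
```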
YAGER described an IR language that lets a user specify interrelationships between the desired attributes of the documents sought, using linguistic quantifiers, e.g. “at least”, “most”, “about half”
Q -> linguistic expression for a quantifier such as “most”
Represented by a fuzzy subset over I = [0,1]
For any proportion r belonging to I,
Q(r) -> degree to which r satisfies the concept indicated by
quantifier Q.
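A quantifier such as “most” can be sketched as a function Q(r) on proportions. The piecewise-linear shape and the breakpoints 0.3 and 0.8 below are assumptions for illustration; the slides do not specify how Q is defined.

```python
def most(r):
    """Q(r) for the quantifier 'most' (assumed shape).

    0 for proportions up to 0.3, 1 from 0.8, linear in between.
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must be a proportion in [0, 1]")
    if r <= 0.3:
        return 0.0
    if r >= 0.8:
        return 1.0
    return (r - 0.3) / 0.5

def satisfies_most(document_terms, wanted_terms):
    """Degree to which a document has 'most' of the wanted attributes."""
    wanted = set(wanted_terms)
    if not wanted:
        return 0.0
    r = len(wanted & set(document_terms)) / len(wanted)
    return most(r)

# Hypothetical query: the document has 3 of the 4 wanted terms, so r = 0.75.
degree = satisfies_most(
    ["web", "mining", "fuzzy"],
    ["web", "mining", "fuzzy", "neural"],
)
print(degree)
```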
Model proposed by Koczy & Gedeon
- Helps retrieve documents when it cannot be guaranteed that the user's queries include the actual words that occur in the documents to be retrieved
Model proposed by Bordogna and Pasi for semi-structured (e.g. HTML) document retrieval
Each document d ∈ D is given a fuzzy representation
D -> archive (set) of documents
t ∈ T, where T is the set of index terms
The membership function of this representation gives
the significance of term t in section s of document d
COMMERCIALLY AVAILABLE SYSTEMS
NZsearch
- Search engine based on fuzzy logic
- Considers the entire phrase rather than individual words when matching
DNS Search
- Uses FL to find the closest DNS entry to the URL you typed.
- E.g. you type gogle.com
- The system suggests possible close URLs
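The "closest entry" idea can be illustrated with the standard library's `difflib`, which ranks candidates by a string similarity ratio. This is a stand-in technique, not fuzzy logic and not how DNS Search itself works; the host list is hypothetical.

```python
import difflib

# Hypothetical list of known host names.
KNOWN_HOSTS = ["google.com", "yahoo.com", "altavista.com", "webcrawler.com"]

def suggest(typed, known=KNOWN_HOSTS, n=3, cutoff=0.6):
    """Return up to n known hosts similar to the typed name, best first."""
    return difflib.get_close_matches(typed, known, n=n, cutoff=cutoff)

suggestions = suggest("gogle.com")
print(suggestions)
```

Candidates scoring below `cutoff` are dropped, so a name that resembles nothing in the list yields no suggestion at all.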
Finder
- Uses multidimensional optimization to display the best or “most suited” matches to the query.
- Existing search engines provide exact matches to the query.
- Finder goes beyond the “yes” or “no” criterion used by SQL or Btrieve.
- Uses a SCORING MODEL
- E.g. if one is looking for a blue car but the car in the database is red, it will not ignore the entry altogether but will give it a lower score.
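The blue-car example can be sketched as a weighted scoring function. This is only an illustration of scoring instead of filtering; Finder's actual model is not described in the slides, and the records, attribute names and weights are hypothetical.

```python
def score(record, wanted, weights):
    """Weighted share of the requested attributes a record matches, in [0, 1]."""
    total = sum(weights[attr] for attr in wanted)
    if total == 0:
        return 0.0
    matched = sum(
        weights[attr]
        for attr, value in wanted.items()
        if record.get(attr) == value
    )
    return matched / total

def rank(records, wanted, weights):
    """Sort all records by score, best first, keeping partial matches."""
    return sorted(records, key=lambda r: score(r, wanted, weights), reverse=True)

# Hypothetical database and query.
cars = [
    {"id": 1, "color": "red", "type": "car"},
    {"id": 2, "color": "blue", "type": "truck"},
    {"id": 3, "color": "blue", "type": "car"},
]
wanted = {"color": "blue", "type": "car"}
weights = {"color": 1.0, "type": 2.0}  # assumed: vehicle type matters more than color

ranked = rank(cars, wanted, weights)
for car in ranked:
    print(car["id"], round(score(car, wanted, weights), 2))
```

An exact-match query would return only record 3; here the red car still appears, ranked below it.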
FL – Prospective areas of application
Gives the search engine a human-like deductive capability.
Can be used for term matching at a slight cost in precision.
For “page ranking”, the degree of closeness of hits in a document can be used, e.g. with linguistic variables like “close”, “far”, “nearby”.
NEURAL NETWORK AND WEB MINING
Parallel interconnected network of simple processing elements which is intended to interact with the objects of the real world in the same way as biological systems do.
Designated by
- Network Topology
- Weights
- Node characteristics
- Status updating rules
Characteristics
- Generalization capability
- Adaptivity to new data/information
- Speed due to massively parallel architecture
- Robustness to missing, confusing, ill-defined/noisy data.
- Capability for modeling non-linear decision boundaries.
WISCONSIN ADAPTIVE WEB ASSISTANT (WAWA-IE+IR) SYSTEM
Suggested by Shavlik
Uses 2 network models
- SCORE LINK
Uses unsupervised learning.
- SCORE PAGE
Uses supervised learning in the form of advice from users. Here the system uses knowledge-based neural nets (KBNNs) as its knowledge base to encode the users' initial knowledge, which is then refined.