E-MINE: A novel web mining approach
DEFINITION

 It is a technique that mines relevant data regions from a web page.
THE PROPOSED TECHNIQUE
 E-Mine is an effective method for automatically mining the data region from a web page.
 It relies on visual information about the rendered page, which enables the system to identify the gaps that separate records and so helps to segment data records correctly.
 The visual information also reflects the hierarchical structure of the tags.
 Observation of web pages suggests that the relevant data region occupies the major central part of the page.
SYSTEM OF THE E-MINE TECHNIQUE
HOW THE ALGORITHM WORKS

 Determining the height and width of all bounding rectangles.
 Identification of the largest rectangle.
 Identification of the container within the largest rectangle.
 Identification of the data region containing data records within the container.
STEP 1
DETERMINING HEIGHT AND WIDTH OF ALL BOUNDING RECTANGLES

 Determine the dimensions of all the bounding rectangles in the web page.
 If the dimensions are not specified in the HTML source, they can be obtained from the MSHTML parsing and rendering engine of Microsoft Internet Explorer 6.0.
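
As a rough illustration of Step 1, the sketch below assumes that a rendering engine has already reported the corner coordinates of every tag. The Node class, its field names, and collect_dimensions are hypothetical names introduced for this example only; they are not part of MSHTML or of the original description.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """A rendered tag with assumed bounding-box coordinates in pixels."""
    tag: str
    left: float
    top: float
    right: float
    bottom: float
    children: List["Node"] = field(default_factory=list)

    @property
    def width(self) -> float:
        return self.right - self.left

    @property
    def height(self) -> float:
        return self.bottom - self.top


def collect_dimensions(root: Node) -> List[Tuple[str, float, float]]:
    """Walk the tree depth-first and list (tag, width, height) for every node."""
    result = []
    stack = [root]
    while stack:
        node = stack.pop()
        result.append((node.tag, node.width, node.height))
        # Push children in reverse so they are visited in document order.
        stack.extend(reversed(node.children))
    return result


# Made-up page layout, used only to exercise the function.
body = Node("body", 0, 0, 1000, 800, [
    Node("div", 0, 0, 1000, 100),
    Node("table", 0, 100, 1000, 700, [Node("tr", 0, 100, 1000, 250)]),
])
dims = collect_dimensions(body)
```

Here dims holds one (tag, width, height) entry per tag, which is the input the later steps work from.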
STEP 2
IDENTIFICATION OF THE LARGEST RECTANGLE

 From the height and width obtained in the previous step, we compute the area of each bounding rectangle (area = height × width).
 The rectangle with the greatest area is taken as the largest rectangle.
PROCEDURE FOR IDENTIFICATION OF LARGEST RECTANGLE
 Procedure getMaxRect
 Input: <body> of the HTML source
 Maximum Rectangle = none
 for each child of <body> tag
 begin
 Find the coordinates of the bounding rectangle of the child
 if Maximum Rectangle is none or area of the bounding rectangle > area of Maximum Rectangle
 then Maximum Rectangle = child
 endif
 end
 Output: Maximum Rectangle
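
The getMaxRect procedure could be sketched in Python roughly as follows. This is a minimal sketch, assuming the width and height of each tag are already known; the Node class and the sample sizes are hypothetical and serve only to make the example runnable.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical rendered tag with known width and height in pixels."""
    tag: str
    width: float
    height: float
    children: List["Node"] = field(default_factory=list)

    @property
    def area(self) -> float:
        return self.width * self.height


def get_max_rect(body: Node) -> Optional[Node]:
    """Return the direct child of body with the largest bounding-rectangle area."""
    max_rect = None
    for child in body.children:
        # The first child seen becomes the initial maximum.
        if max_rect is None or child.area > max_rect.area:
            max_rect = child
    return max_rect


# Made-up layout: a header, a main block, and a footer.
body = Node("body", 1000, 750, [
    Node("header", 1000, 100),
    Node("main", 1000, 600),
    Node("footer", 1000, 50),
])
largest = get_max_rect(body)
```

With these sample sizes, largest is the "main" node. The function returns None when body has no children, a case the pseudocode leaves open.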
STEP 3
IDENTIFICATION OF THE CONTAINER WITHIN THE LARGEST RECTANGLE

 Once the largest rectangle is obtained, we examine the bounding rectangles of its child tags and select as the container a child whose area exceeds half the area of the largest rectangle.
 The rationale is that the data records are expected to lie inside the largest rectangle of this set.
 Procedure getContainer
 Input: Maximum Rectangle, the largest of all bounding rectangles.
 List_of_Children = list of all the child tags of Maximum Rectangle
 for each tag in List_of_Children
 begin
 if area of bounding rectangle of tag > half the area of Maximum Rectangle
 then container = tag
 endif
 end
 Output: container
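
A possible Python rendering of getContainer is given below, under the same assumptions as before: tag sizes are already known, and the Node class is a hypothetical stand-in for whatever the rendering engine reports.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical rendered tag with known width and height in pixels."""
    tag: str
    width: float
    height: float
    children: List["Node"] = field(default_factory=list)

    @property
    def area(self) -> float:
        return self.width * self.height


def get_container(max_rect: Node) -> Optional[Node]:
    """Return a child whose area exceeds half the area of max_rect, if any."""
    container = None
    threshold = max_rect.area / 2
    for child in max_rect.children:
        if child.area > threshold:
            container = child
    return container


# Made-up layout: a narrow sidebar next to a wide content cell.
max_rect = Node("table", 1000, 600, [
    Node("td", 200, 600),
    Node("td", 800, 600),
])
container = get_container(max_rect)
```

In this sample the 800-pixel-wide cell covers more than half of the table and is chosen as the container. If no child passes the threshold, the sketch returns None; how the original method handles that case is not stated.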
IRRELEVANT PORTION TO BE FILTERED
STEP 4
IDENTIFICATION OF THE DATA REGION CONTAINING DATA RECORDS WITHIN THE CONTAINER

 A filter removes irrelevant data from the container: child tags whose bounding rectangle is shorter than the average child height are discarded.
PROCEDURE FOR FILTER
 Input: The container obtained from the previous step.
 totalHeight = 0
 for each child tag within container
 totalHeight += height of the bounding rectangle of child
 end for
 averageHeight = totalHeight / number of children of container
 for each child within container
 if height of child's bounding rectangle < averageHeight
 then discard child from container
 endif
 end for
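
The filter can be illustrated with a short Python sketch. It assumes each child of the container is represented as a (tag, height) pair; that representation, the function name filter_children, and the sample heights are all hypothetical.

```python
from typing import List, Tuple


def filter_children(children: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Keep only the children whose height is at least the average height."""
    if not children:
        return []
    average_height = sum(height for _, height in children) / len(children)
    # Children shorter than the average are treated as irrelevant and dropped.
    return [(tag, height) for tag, height in children if height >= average_height]


# Made-up container: three tall records between two short navigation rows.
children = [
    ("nav", 20),
    ("record1", 120),
    ("record2", 110),
    ("record3", 130),
    ("pager", 20),
]
kept = filter_children(children)
```

The average height here is 400 / 5 = 80, so the two 20-pixel rows are discarded and the three records remain.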
ADVANTAGES
 Overcomes the disadvantages of existing automated approaches such as the MDR algorithm.
 It enables the system to identify gaps that separate records, which helps to segment data records correctly.
 The visual information also contains information about the hierarchical structure of the tags.
DISADVANTAGES
 It may extract a large amount of unwanted data.
 The relevant data region extracted from a web page may not be of interest to the user.
CONCLUSION
 This is a new approach to extracting structured data from web pages.
 E-Mine is a purely visual, structure-oriented method that can correctly identify the data regions.
 E-Mine overcomes the drawbacks of existing methods and performs significantly better than them.