E-MINE: A novel web mining approach
#1

Sir/Madam,
I am Sampath, studying B.E. (ISE) at VCET, Mangalore. I would like a seminar report and PPT on the topic "E-MINE: A novel web mining approach".
Please send them to my mail ID "sampathputtur[at]gmail.com".




Thank you.
Reply
#2
[attachment=11879]
E-MINING-A NOVEL WEB MINING APPROACH
DEFINITION

 It is a technique that mines relevant data regions from a web page.
THE PROPOSED TECHNIQUE
 E-Mine – An effective method to mine the data region from a web page automatically
 It enables the system to identify gaps that separate records, which helps to segment data records correctly.
 The visual information also contains information about the hierarchical structure of the tags.
 Observation of web pages shows that the relevant data region occupies the major central part of the page.
SYSTEM OF THE e-Mine TECHNIQUE
HOW THE ALGORITHM WORKS

 Determining the height and width of all bounding rectangles.
 Identification of the largest rectangle.
 Identification of the container within the largest rectangle.
 Identification of the data region containing data records within the container.
STEP 1
DETERMINING HEIGHT AND WIDTH OF ALL BOUNDING RECTANGLES

 Determine the dimensions of all the bounding rectangles in the web page.
 If the height and width of a tag are not specified as attributes, the MSHTML parsing and rendering engine of Microsoft Internet Explorer 6.0 can be used to obtain the coordinates of its bounding rectangle.
STEP 2
IDENTIFICATION OF THE LARGEST RECTANGLE

 Based on the height and width of the bounding rectangles obtained in the previous step, we determine the area of each bounding rectangle.
 Among these rectangles, we determine the largest one.
PROCEDURE FOR IDENTIFICATION OF LARGEST RECTANGLE
 Procedure getMaxRect
 Input: <body> of the HTML source
 Maximum Rectangle = none; area of Maximum Rectangle = 0
 for each child of the <body> tag
 begin
   find the coordinates of the bounding rectangle of the child
   if area of the bounding rectangle > area of Maximum Rectangle
   then Maximum Rectangle = child
   endif
 end
 Output: Maximum Rectangle
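
The getMaxRect procedure above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Node` class, its fields, and the sample sizes are hypothetical stand-ins for rendered tags and their bounding rectangles, not something taken from the original presentation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a rendered tag and its bounding rectangle."""
    tag: str
    width: float
    height: float
    children: List["Node"] = field(default_factory=list)

    @property
    def area(self) -> float:
        return self.width * self.height


def get_max_rect(body: Node) -> Optional[Node]:
    """Return the direct child of <body> with the largest bounding-rectangle area."""
    max_rect: Optional[Node] = None
    max_area = 0.0
    for child in body.children:
        if child.area > max_area:
            max_rect, max_area = child, child.area
    return max_rect


# Made-up page layout: a header, a main table, and a footer.
body = Node("body", 1000, 1300, [
    Node("div#header", 1000, 100),
    Node("table#main", 1000, 1100),
    Node("div#footer", 1000, 100),
])
print(get_max_rect(body).tag)  # table#main
```

Starting the running maximum at zero means a `<body>` with no children simply yields `None`.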
STEP 3
 Identification of the container within the largest rectangle
 Once the largest rectangle is obtained, we form the set of bounding rectangles of its child tags and determine the one having the largest area in that set.
 The reason for determining the largest rectangle within this set is that only this rectangle will contain the data records.
 Procedure getContainer
 Input: the largest rectangle (Maximum Rectangle) out of all bounding rectangles
 List_of_Children = list of all the child tags of Maximum Rectangle
 for each tag in List_of_Children
 begin
   if area of bounding rectangle of tag > half the area of Maximum Rectangle
   then container = tag
   endif
 end
 Output: container
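
A minimal Python sketch of the getContainer procedure is given here for illustration. The `Node` class and the sample layout are assumptions made for this example; only the "more than half the area" test comes from the procedure above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a rendered tag and its bounding rectangle."""
    tag: str
    width: float
    height: float
    children: List["Node"] = field(default_factory=list)

    @property
    def area(self) -> float:
        return self.width * self.height


def get_container(max_rect: Node) -> Optional[Node]:
    """Return the child whose area exceeds half the area of the largest rectangle."""
    container: Optional[Node] = None
    for tag in max_rect.children:
        if tag.area > max_rect.area / 2:
            container = tag
    return container


# Made-up layout: a narrow sidebar next to a wide content cell.
max_rect = Node("table#main", 1000, 1100, [
    Node("td#sidebar", 200, 1100),
    Node("td#content", 800, 1100),
])
print(get_container(max_rect).tag)  # td#content
```

If no child covers more than half of the largest rectangle, this sketch returns `None`; the procedure above does not say what should happen in that case.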
IRRELEVANT PORTION TO BE FILTERED
STEP 4

 Identification of the data region containing data records within the container
 A filter is used to remove the irrelevant data from the container.
PROCEDURE FOR FILTER
 Input: the container obtained from the previous step
 totalHeight = 0
 for each child tag within container
   totalHeight += height of the bounding rectangle of child
 end for
 averageHeight = totalHeight / number of children of container
 for each child within container
   if height of child's bounding rectangle < averageHeight
   then discard child from container
   endif
 end for
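
The filter can be sketched in Python as below. This is a rough illustration, assuming each child of the container is represented as a hypothetical `(label, height)` pair; the labels and heights are invented for the example.

```python
from typing import List, Tuple

# Hypothetical representation: (label, height of the child's bounding rectangle).
Child = Tuple[str, float]


def filter_container(children: List[Child]) -> List[Child]:
    """Discard children whose height is below the average height of all children."""
    if not children:
        return []
    average_height = sum(height for _, height in children) / len(children)
    return [(label, height) for label, height in children if height >= average_height]


children = [
    ("ad banner", 40),
    ("record 1", 120),
    ("record 2", 130),
    ("record 3", 125),
    ("copyright", 20),
]
kept = filter_container(children)
print([label for label, _ in kept])  # ['record 1', 'record 2', 'record 3']
```

Here the average height is 435 / 5 = 87, so the two short children are discarded. A child whose height equals the average is kept, since the procedure discards only strictly smaller children.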
ADVANTAGES
 Overcomes the disadvantages of the existing automated approaches, e.g. the MDR algorithm.
 It enables the system to identify gaps that separate records, which helps to segment data records correctly.
 The visual information also contains information about the hierarchical structure of the tags.
DISADVANTAGES
 It may extract a large amount of unwanted data.
 The relevant data region extracted from a web page may not be of interest to the user.
CONCLUSION
 This is a new approach to extract structured data from web pages
 eMine is a pure visual structure oriented method that can correctly identify the data regions.
 eMine overcomes the drawbacks of existing methods and performs significantly better than existing methods.
Reply
#3
I think the disadvantages you mentioned are wrong, because it extracts only the center portion, which contains only useful data and no unwanted data.
Reply
#4
For information about the topic WEB MINING (full report, PPT, and related topics), refer to the page links below:

http://studentbank.in/report-web-mining-...ars-report

http://studentbank.in/report-web-mining

http://studentbank.in/report-web-mining?page=2

http://studentbank.in/report-web-mining?page=4

http://studentbank.in/report-e-mine-a-no...g-approach

http://studentbank.in/report-open-web-ap...web-mining

http://studentbank.in/report-webmining-f...nalization

http://studentbank.in/report-web-mining--24452

http://studentbank.in/report-signed-appr...t-outliers

http://studentbank.in/report-web-mining?pid=65307

http://studentbank.in/report-frequent-pa...b-log-data
Reply
#5
send me this ppt
Reply
#8
seminars report on e-mining for computer science
Reply
#9
E-MINE: A NOVEL WEB MINING APPROACH



ABSTRACT

In recent years, government agencies and industrial enterprises have been using the web as a medium of publication. Hence, a large collection of documents, images, text files and other forms of data in structured, semi-structured and unstructured forms is available on the web. It has become increasingly difficult to identify relevant pieces of information, since the pages are often cluttered with irrelevant content such as advertisements, copyright notices, etc., surrounding the main content. Thus, we propose a technique that mines the relevant data regions from a web page. This technique is based on three important observations about data regions on the web.

Introduction

Extracting the regularly structured data records from web pages is an important problem. So far, several attempts have been made to deal with the problem. The main disadvantage with the existing automatic approaches is their assumption that the relevant information of a data record is contained in a contiguous segment of HTML code, which is not always true. Thus, we propose a more effective method to mine the data region in a web page. The algorithm, eMine, finds the data regions formed by all types of tags using visual cues.

Related Work

The main related work in the area of mining data records in a web page is MDR (Mining Data Records). MDR is a well-known approach that directly exploits the regularities in the HTML tag structure. The MDR algorithm uses the HTML tag tree of the web page to extract data records from the page. However, an incorrect tag tree may be constructed because of the misuse of HTML tags, which in turn makes it impossible to extract data records correctly.

The Proposed Technique

We propose a novel and effective method, eMine, to mine the data region from a web page automatically. The basic criterion that eMine uses is the location on the screen at which each tag is rendered, i.e. visual information.

How the Algorithm works?

The algorithm takes the HTML source of the web page as input. In step 2, we scan the HTML document for tags and identify the height and width of all the bounding rectangles, which gives the area of each bounding rectangle. Step 3 finds the largest rectangle among all the bounding rectangles. Step 4 identifies the container, which holds most of the relevant data region (and also some irrelevant regions). Step 5 identifies the actual relevant data region by filtering out the irrelevant regions.
The following sections provide more details about the individual modules associated with the algorithm.

Determining the Height and width of all bounding rectangles

In the first step of the proposed technique, we determine the dimensions of all the bounding rectangles in the web page. A <table> tag in a web page may carry explicit height and width attributes; where they are present, we extract them. If they are not specified, the MSHTML parsing and rendering engine of Microsoft Internet Explorer 6.0 can be used. This parsing and rendering engine of the web browser gives us the coordinates of a bounding rectangle. We scan the HTML file for tags. For each tag encountered, we determine the coordinates of the bounding rectangle of the corresponding tag and plot it.
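
As a rough illustration of the attribute-extraction part of this step, the sketch below uses Python's standard `html.parser` module to collect the explicit width and height attributes of <table> tags. The class name `TableSizeCollector` and the sample HTML are assumptions made for this example. It does not render the page, so tags without explicit attributes are reported as `None` and would still need a rendering engine, as described above.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class TableSizeCollector(HTMLParser):
    """Collects the explicit width/height attributes of every <table> tag."""

    def __init__(self) -> None:
        super().__init__()
        # One (width, height) pair per <table>; None means "not specified".
        self.sizes: List[Tuple[Optional[str], Optional[str]]] = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            attr_map = dict(attrs)
            self.sizes.append((attr_map.get("width"), attr_map.get("height")))


# Made-up sample page: one table with explicit dimensions, one without.
html = """
<body>
  <table width="800" height="600"><tr><td>records</td></tr></table>
  <table><tr><td>no explicit size</td></tr></table>
</body>
"""
collector = TableSizeCollector()
collector.feed(html)
print(collector.sizes)  # [('800', '600'), (None, None)]
```

The attribute values come back as strings, so they would have to be converted (and percentage values handled) before any area is computed.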

Conclusion

In this paper, we have proposed a new approach to extract structured data from web pages. Although the problem has been studied by several researchers, existing techniques make many strong assumptions. eMine is a pure visual structure oriented method that can correctly identify the data regions. Most of the current algorithms fail to correctly determine the data region, when the data region consists of only one data record. Also, most of the approaches fail in the case where a series of data records is separated by an advertisement, followed again by a single data record. eMine works correctly for the above case. Further, the comparisons are made on numbers, unlike other methods where strings or trees are compared. Thus eMine overcomes the drawbacks of existing methods and performs significantly better than existing methods.



Reply
#10
Hi friends, my name is Bharath Chandra. I am studying B.Tech ECE, final year.
Please send me info about the technical seminar topic "APPLE - a novel approach for direct energy weapon control". My mail ID is bharath007chandra[at]gmail.com
Reply
#11
For information about the topic "A novel web mining approach" (full report, PPT, and related topics), refer to the page links below:

http://studentbank.in/report-e-mine-a-no...e=threaded

http://studentbank.in/report-e-mine-a-no...g-approach
Reply
