Locating and Recognizing Text in WWW Images

DANIEL LOPRESTI dpl[at]research.bell-labs.com
Bell Laboratories, Lucent Technologies, Inc., 600 Mountain Avenue, Murray Hill, NJ 07974, USA
JIANGYING ZHOU jiangying[at]summus.com
Summus Ltd., Suite 2200, 2000 Center Point Drive, Columbia, SC 29210, USA
Received April 1, 1999; Revised December 30, 1999; Accepted December 30, 1999

Abstract.
The explosive growth of the World Wide Web has resulted in a distributed database consisting of hundreds of millions of documents. While existing search engines index a page based on the text that is readily extracted from its HTML encoding, an increasing amount of the information on the Web is embedded in images. This situation presents a new and exciting challenge for the fields of document analysis and information retrieval, as WWW image text is typically rendered in color and at very low spatial resolutions. In this paper, we survey the results of several years of our work in the area. For the problem of locating text in Web images, we describe a procedure based on clustering in color space followed by a connected-components analysis that seems promising. For character recognition, we discuss techniques using polynomial surface fitting and “fuzzy” n-tuple classifiers. Also presented are the results of several experiments that demonstrate where our methods perform well and where more work needs to be done. We conclude with a discussion of topics for further research.

Keywords: document analysis, information retrieval, optical character recognition, WWW image text

1. Introduction

Traditionally, the field of document analysis has focused on the translation of information contained in paper documents to an electronic form. The myriad of problems that arise in the process have been studied for decades. Some are widely accepted to be difficult, while others have been addressed satisfactorily, at least in certain special cases. Many of the problems still considered open have nonetheless received a good deal of attention in the literature, and are well-understood, if not yet solved. A recent development of note, however, is the explosive growth of the World Wide Web (WWW) and the rapid proliferation of electronic documents it has fostered.
Since 1993, the number of WWW servers has been increasing at an exponential rate, currently doubling every six months; it now totals over 7,000,000 (Zakon). The popular Alta Vista search engine indexes 250,000,000 Web pages, and processes tens of millions of HTTP requests each day (Alta Vista, Search Engine Watch). While it is now generally acknowledged, even by members of the Web community, that electronic documents will never totally supplant paper ones (World Wide Web Consortium), there are compelling reasons to consider whether the analysis techniques originally developed for paper documents might have applications in the online world.
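
As a rough illustration of the text-location idea named in the abstract (clustering in color space followed by a connected-components analysis), the sketch below groups pixels of similar color and then collects 4-connected regions of each group. It is not the authors' procedure. Uniform color quantization stands in for real clustering, and the function names, the `levels` parameter, and the list-of-rows image format are all assumptions made for this example.

```python
from collections import deque


def quantize(pixel, levels=4):
    """Map an (R, G, B) pixel to a coarse color bin.

    Assumption: uniform quantization is a crude stand-in for clustering
    in color space; it is not the method described in the paper.
    """
    step = 256 // levels
    return tuple(channel // step for channel in pixel)


def connected_components(image, levels=4):
    """Group 4-connected pixels that fall into the same color bin.

    `image` is a list of rows, each row a list of (R, G, B) tuples.
    Returns a list of (color_bin, [(row, col), ...]) pairs.
    """
    height, width = len(image), len(image[0])
    bins = [[quantize(p, levels) for p in row] for row in image]
    seen = [[False] * width for _ in range(height)]
    components = []
    for r in range(height):
        for c in range(width):
            if seen[r][c]:
                continue
            color = bins[r][c]
            seen[r][c] = True
            queue = deque([(r, c)])
            pixels = []
            # Breadth-first flood fill over same-bin neighbors.
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx] and bins[ny][nx] == color):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append((color, pixels))
    return components
```

In a setting like the one the abstract describes, the resulting components could then be filtered by size or shape to pick out character-like regions. That filtering step is left out here because the excerpt does not specify it.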