I am Nikanth Kumbhare, a final-year B.Tech. student in Information Technology. My project topic is web crawlers. I just need a reference IEEE PPT. Please help, I kindly request!
A web crawler, sometimes called a spider, is an Internet bot that systematically browses the World Wide Web, usually for the purpose of web indexing (web spidering). Web search engines and some other sites use web crawling or spidering software to update their own web content or their indexes of other sites' web content. Web crawlers copy the pages they visit for later processing by a search engine, which indexes the downloaded pages so that users can search much more efficiently.
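To make the "systematically browses" part concrete, here is a minimal sketch of a breadth-first crawl loop in Python. It is only an illustration, not how any particular search engine does it. The names `crawl`, `LinkExtractor` and `FAKE_WEB` are made up for this example, and the pages come from an in-memory dictionary instead of real HTTP requests so that it runs without a network connection.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl starting at seed.

    fetch(url) is assumed to return the page HTML as a string,
    or None when the page cannot be retrieved.
    """
    frontier = deque([seed])   # URLs waiting to be visited
    seen = {seed}              # URLs already queued, to avoid loops
    pages = {}                 # url -> downloaded HTML
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return pages


# Hypothetical in-memory "web" standing in for real HTTP fetches.
FAKE_WEB = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/">home</a> <a href="/c#top">C</a>',
    "http://example.test/b": "no links here",
}

pages = crawl("http://example.test/", FAKE_WEB.get)
```

In a real crawler the `fetch` argument would be a function that downloads the page over HTTP, and the `pages` dictionary would be handed to an indexer instead of being kept in memory.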
Crawlers consume resources on the systems they visit and often visit sites without approval. Issues of scheduling, load, and "politeness" come into play when large collections of pages are accessed. Public sites that do not wish to be crawled have mechanisms for letting the crawling agent know. For example, including a robots.txt file can request that bots index only parts of a website, or nothing at all.
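As a small illustration of the robots.txt mechanism, the sketch below uses `urllib.robotparser` from the Python standard library to check whether a URL may be fetched. The rules in `ROBOTS_TXT`, the agent names `MyCrawler` and `BadBot`, and the helper `make_checker` are all invented for this example. A real crawler would usually load the file from the site itself with `set_url()` and `read()` instead of parsing a string.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: BadBot
Disallow: /
"""


def make_checker(robots_txt):
    """Build a parser from robots.txt text that is already in memory."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = make_checker(ROBOTS_TXT)

# Any crawler may fetch public pages but not /private/.
public_ok = rules.can_fetch("MyCrawler", "http://example.test/index.html")
private_ok = rules.can_fetch("MyCrawler", "http://example.test/private/data.html")

# BadBot is asked to stay away from the whole site.
badbot_ok = rules.can_fetch("BadBot", "http://example.test/index.html")
```

Note that robots.txt is only a request. A polite crawler calls something like `can_fetch` before every download, but nothing in the protocol forces a bot to obey.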
Because the number of pages on the Internet is extremely large, even the largest crawlers fall short of making a complete index. For that reason, search engines struggled to give relevant results in the early years of the World Wide Web, before 2000. Modern search engines have improved greatly on this; today, relevant results are returned almost instantly. A web crawler may also be called a web spider, an ant, an automatic indexer, or (in the FOAF software context) a Web scutter.