seminar class · 04-05-2011, 03:06 PM

Abstract
Phishing websites, fraudulent sites that impersonate a trusted third party to gain access to private data, continue to cost Internet users over a billion dollars each year. In this paper, we describe the design and performance characteristics of a scalable machine learning classifier we developed to detect phishing websites. We use this classifier to maintain Google’s phishing blacklist automatically. Our classifier analyzes millions of pages a day, examining the URL and the contents of a page to determine whether or not a page is phishing. Unlike previous work in this field, we train the classifier on a noisy dataset consisting of millions of samples from previously collected live classification data. Despite the noise in the training data, our classifier learns a robust model for identifying phishing pages which correctly classifies more than 90% of phishing pages several weeks after training concludes.
1 Introduction
Phishing is a social engineering crime generally defined as impersonating a trusted third party to gain access to private data. For example, an adversary might send the victim an email directing him to a fraudulent website that looks like a page belonging to a bank. The adversary can use any information the victim enters into the phishing page to drain the victim’s bank account or steal the victim’s identity. Despite increasing public awareness, phishing continues to be a major threat to Internet users. Gartner estimates that phishers stole $1.7 billion in 2008, and the Anti-Phishing Working Group identified roughly twenty thousand unique new phishing sites each month between July and December of 2008 [3], [17]. To help combat phishing, Google publishes a blacklist of phishing URLs and phishing URL patterns [7], [29]. The anti-phishing features in Firefox 3, Google Chrome, and Apple Safari use this blacklist. We provide access to the list to other clients through our public API [18].

In order for an anti-phishing blacklist to be effective, it must be comprehensive, error-free, and timely. A blacklist that is not comprehensive fails to protect a portion of its users. One that is not error-free subjects users to unnecessary warnings and ultimately trains its users to ignore the warnings. A blacklist that is not timely may fail to warn its users about a phishing page in time to protect them. Considering that phishing pages only remain active for an average of approximately three days, with the majority of pages lasting less than a day, a delay of only a few hours can significantly degrade the quality of a blacklist [2], [30].

Currently, human reviewers maintain some blacklists, like the one published by PhishTank [25]. With PhishTank, the user community manually verifies potential phishing pages submitted by community members to keep their blacklist mostly error-free. Unfortunately, this review process takes a considerable amount of time, ranging from a median of over ten hours in March, 2009 to a median of over fifty hours in June, 2009, according to PhishTank’s statistics. Omitting verification to improve the timeliness of the data is not a good option for PhishTank. Without verification, the list would have many false positives coming from either innocent confusion or malicious abuse.

An automatic classifier could handle this verification task. Previously published efforts have shown that a classification system could examine the same signals a human reviewer uses to evaluate whether a page is phishing [13], [16], [20], [21], [35]. Such a system could add verified phishing pages to the blacklist automatically, substantially reducing the verification time and improving the throughput. With higher throughput, the system could even examine large numbers of questionable, automatically collected URLs to look for otherwise missed phishing pages.

This paper describes such an automatic phishing classifier that we built and currently use to evaluate phishing pages and maintain our blacklist. Since its activation in November, 2008, this system evaluates millions of potential phishing pages every day. To evaluate each page, the classifier considers features regarding the page’s URL, content, and hosting information. We retrain this classifier daily using approximately ten million samples from classification data collected over the last three months.
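
To make the idea of URL-based features more concrete, here is a minimal Python sketch. It is not the classifier described in the paper: the feature names, the token list, the weights, and the logistic scoring function are all assumptions chosen for illustration, and the excerpt above does not specify which features or model the authors actually use. Content and hosting features are left out entirely.

```python
import ipaddress
import math
from urllib.parse import urlsplit

# Hypothetical tokens often seen in credential-harvesting paths.
# This list is an assumption, not taken from the paper.
SUSPICIOUS_TOKENS = ("login", "signin", "verify", "account", "secure")


def host_is_ip(host: str) -> bool:
    """Return True if the host is a literal IP address rather than a name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def url_features(url: str) -> dict:
    """Extract a few illustrative numeric features from a URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    path = (parts.path + "?" + parts.query).lower()
    return {
        "host_is_ip": 1.0 if host_is_ip(host) else 0.0,
        "num_host_labels": float(len(host.split("."))) if host else 0.0,
        "has_at_sign": 1.0 if "@" in parts.netloc else 0.0,
        "suspicious_tokens": float(sum(tok in path for tok in SUSPICIOUS_TOKENS)),
    }


# Hand-picked weights for illustration only. A real system would learn
# them from labeled (and, as the abstract notes, noisy) training data.
WEIGHTS = {
    "host_is_ip": 2.5,
    "num_host_labels": 0.2,
    "has_at_sign": 1.5,
    "suspicious_tokens": 1.0,
}
BIAS = -3.0


def phishing_score(url: str) -> float:
    """Map the URL features to a score in (0, 1) with a logistic function."""
    feats = url_features(url)
    z = BIAS + sum(WEIGHTS[name] * value for name, value in feats.items())
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    for u in ("http://192.0.2.7/bank/login.php", "https://www.example.com/"):
        print(u, round(phishing_score(u), 3))
```

With these assumed weights, the URL hosted on a raw IP address with a "login" path scores higher than the plain example.com URL. In a system like the one described, the weights would be fit during the daily retraining rather than set by hand.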

Download full report
http://isocisoc/conferences/ndss/10/pdf/08.pdf
