Large-Scale Automatic Classification of Phishing Pages

Abstract
Phishing websites, fraudulent sites that impersonate a trusted third party to gain access to private data, continue to cost Internet users over a billion dollars each year. In this paper, we describe the design and performance characteristics of a scalable machine learning classifier we developed to detect phishing websites. We use this classifier to maintain Google's phishing blacklist automatically. Our classifier analyzes millions of pages a day, examining the URL and the contents of a page to determine whether or not a page is phishing. Unlike previous work in this field, we train the classifier on a noisy dataset consisting of millions of samples from previously collected live classification data. Despite the noise in the training data, our classifier learns a robust model for identifying phishing pages which correctly classifies more than 90% of phishing pages several weeks after training concludes.
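The abstract describes a classifier that decides whether a page is phishing by examining its URL and contents. This excerpt does not specify the paper's actual features or model, so what follows is only a minimal sketch, assuming a logistic scorer (a sigmoid applied to a weighted sum of features) over a few hypothetical binary features. The feature names, weights, bias, and 0.5 threshold are all invented for illustration; a real system would learn its weights from labeled training data rather than set them by hand.

```python
import math
import re
from urllib.parse import urlparse

# Hypothetical feature weights, hand-picked for illustration only.
# A deployed classifier would learn these from labeled training data.
WEIGHTS = {
    "host_is_ip": 2.0,
    "many_subdomains": 1.0,
    "has_at_sign": 1.5,
    "long_url": 0.5,
    "has_password_field": 1.5,
    "mentions_login": 1.0,
}
BIAS = -4.0  # hypothetical intercept: pages with no signals score low

IPV4 = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def extract_features(url: str, html: str) -> dict[str, float]:
    """Map a URL and its page HTML to hypothetical binary features."""
    host = urlparse(url).hostname or ""
    page = html.lower()
    return {
        # URL features
        "host_is_ip": float(bool(IPV4.match(host))),
        "many_subdomains": float(host.count(".") >= 4),
        "has_at_sign": float("@" in url),
        "long_url": float(len(url) > 75),
        # Page-content features
        "has_password_field": float('type="password"' in page),
        "mentions_login": float("sign in" in page or "log in" in page),
    }


def phishing_score(url: str, html: str) -> float:
    """Return a score in (0, 1): the sigmoid of BIAS plus the weighted feature sum."""
    feats = extract_features(url, html)
    z = BIAS + sum(WEIGHTS[name] * value for name, value in feats.items())
    return 1.0 / (1.0 + math.exp(-z))


def is_phishing(url: str, html: str, threshold: float = 0.5) -> bool:
    """Flag the page when its score reaches the (hypothetical) threshold."""
    return phishing_score(url, html) >= threshold
```

For example, a page served from a raw IP address with a password form and a "Sign in" prompt scores about 0.62 and is flagged, while a plain page on `https://example.com/about` scores about 0.02. A linear model like this is cheap to evaluate, which matters when millions of pages are scored each day.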
1 Introduction
Phishing is a social engineering crime generally defined as impersonating a trusted third party to gain access to private data. For example, an adversary might send the victim an email directing him to a fraudulent website that looks like a page belonging to a bank. The adversary can use any information the victim enters into the phishing page to drain the victim's bank account or steal the victim's identity. Despite increasing public awareness, phishing continues to be a major threat to Internet users. Gartner estimates that phishers stole $1.7 billion in 2008, and the Anti-Phishing Working Group identified roughly twenty thousand unique new phishing sites each month between July and December of 2008 [3], [17]. To help combat phishing, Google publishes a blacklist of phishing URLs and phishing URL patterns [7], [29]. The anti-phishing features in Firefox 3, Google Chrome, and Apple Safari use this blacklist. We provide access to the list to other clients through our public API [18].
In order for an anti-phishing blacklist to be effective, it must be comprehensive, error-free, and timely. A blacklist that is not comprehensive fails to protect a portion of its users. One that is not error-free subjects users to unnecessary warnings and ultimately trains its users to ignore the warnings. A blacklist that is not timely may fail to warn its users about a phishing page in time to protect them. Considering that phishing pages only remain active for an average of approximately three days, with the majority of pages lasting less than a day, a delay of only a few hours can significantly degrade the quality of a blacklist [2], [30].
Currently, human reviewers maintain some blacklists, like the one published by PhishTank [25]. With PhishTank, the user community manually verifies potential phishing pages submitted by community members to keep their blacklist mostly error-free.
Unfortunately, this review process takes a considerable amount of time, ranging from a median of over ten hours in March 2009 to a median of over fifty hours in June 2009, according to PhishTank's statistics. Omitting verification to improve the timeliness of the data is not a good option for PhishTank. Without verification, the list would have many false positives coming from either innocent confusion or malicious abuse.
An automatic classifier could handle this verification task. Previously published efforts have shown that a classification system could examine the same signals a human reviewer uses to evaluate whether a page is phishing [13], [16], [20], [21], [35]. Such a system could add verified phishing pages to the blacklist automatically, substantially reducing the verification time and improving the throughput. With higher throughput, the system could even examine large numbers of questionable, automatically collected URLs to look for otherwise missed phishing pages.
This paper describes such an automatic phishing classifier that we built and currently use to evaluate phishing pages and maintain our blacklist. Since its activation in November 2008, this system has evaluated millions of potential phishing pages every day. To evaluate each page, the classifier considers features regarding the page's URL, content, and hosting information. We retrain this classifier daily using approximately ten million samples from classification data collected over the last three months.
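The last paragraph says the classifier is retrained daily on roughly ten million samples drawn from the previous three months of classification data. How those samples are selected is not described here, so the following is only a sketch under stated assumptions: a sliding window of 90 days stands in for "three months", and when the window holds more samples than the cap, the most recent ones are kept. The `Sample` record and the `training_window` function are hypothetical names introduced for illustration.

```python
from collections.abc import Iterable
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Sample:
    """One hypothetical labeled example from past live classification."""
    seen_at: datetime
    url: str
    label: bool  # True if the page was judged to be phishing


def training_window(
    samples: Iterable[Sample],
    now: datetime,
    days: int = 90,                  # assumption: "three months" ~ 90 days
    max_samples: int = 10_000_000,   # assumption: cap near "ten million"
) -> list[Sample]:
    """Select samples seen in the last `days` days, newest first, up to max_samples.

    Keeping the newest samples when over the cap is an assumed policy,
    not one stated in the source.
    """
    cutoff = now - timedelta(days=days)
    recent = [s for s in samples if cutoff <= s.seen_at <= now]
    recent.sort(key=lambda s: s.seen_at, reverse=True)
    return recent[:max_samples]
```

Rerunning this selection each day slides the window forward: the oldest samples age out while newly collected live-classification data enters, so the retrained model tracks recent phishing campaigns.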

Download full report
http://isocisoc/conferences/ndss/10/pdf/08.pdf