data mining full report
Presented by:
Chris Nelson

Data Mining
 New buzzword, old idea.
 Inferring new information from already collected data.
 Traditionally the job of Data Analysts.
 Computers have changed this.
It is far more efficient to comb through data with a machine than to eyeball statistical data.
Data Mining – Two Main Components
 Wikipedia definition: “Data mining is the entire process of applying computer-based methodology, including new techniques for knowledge discovery, from data.”
 Knowledge Discovery
Concrete information gleaned from known data: facts you may not have known, but which are supported by the recorded data.
(e.g., the diapers-and-beer example from the previous presentation)
 Knowledge Prediction
Uses known data to forecast future trends, events, etc. (e.g., stock market predictions)
 Wikipedia note: “some data mining systems such as neural networks are inherently geared towards prediction and pattern recognition, rather than knowledge discovery.” These include applications in AI and symbol analysis.
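As a rough illustration of knowledge discovery, the diapers-and-beer pattern can be expressed with two standard association measures, support and confidence. The sketch below is a minimal example under stated assumptions: the transactions and item names are invented for illustration and do not come from the presentation.

```python
# Hypothetical toy transactions; the item names are made up for illustration.
TRANSACTIONS = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"diapers", "beer", "milk"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of all transactions that contain the whole itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    base = count(antecedent, transactions)
    if base == 0:
        return 0.0  # the antecedent never occurs, so there is no evidence
    both = count(set(antecedent) | set(consequent), transactions)
    return both / base


if __name__ == "__main__":
    print("support(diapers, beer):", support({"diapers", "beer"}, TRANSACTIONS))
    print("confidence(diapers -> beer):",
          confidence({"diapers"}, {"beer"}, TRANSACTIONS))
```

A rule such as "diapers -> beer" is usually considered interesting only when both its support and its confidence are high enough; the thresholds are a judgment call.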
Data Mining vs. Data Analysis
 In terms of software and the marketing thereof
Data Mining != Data Analysis
 Data Mining implies that the software applies some intelligence beyond simple grouping and partitioning of data in order to infer new information.
 Data Analysis is more in line with standard statistical software (e.g., web stats). Such tools usually present information about subsets and relations within the recorded data set (e.g., browser/search engine usage, average visit time, etc.)
Data Mining Subtypes
 Data Dredging
The process of scanning a data set for relations and then coming up with a hypothesis for the existence of those relations.
 Metadata
Data that describes other data. Can describe an individual element, or a collection of elements.
Wikipedia example: “In a library, where the data is the content of the titles stocked, metadata about a title would typically include a description of the content, the author, the publication date and the physical location”
 Applications for Data Dredging in business include Market and Risk Analysis, as well as trading strategies.
Applications for Science include disaster prediction.
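The Data Dredging idea above (scan first, hypothesize afterwards) can be sketched as a scan over every pair of columns for correlation. This is only an illustration: the column names and values are hypothetical, and Pearson correlation is one assumed choice of "relation" among many. A relation found this way is still only a hypothesis until it is checked against fresh data.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / math.sqrt(var_x * var_y)


def strongest_relations(columns, top=3):
    """Rank every pair of columns by absolute Pearson correlation."""
    pairs = combinations(sorted(columns), 2)
    scored = [(a, b, pearson(columns[a], columns[b])) for a, b in pairs]
    return sorted(scored, key=lambda row: abs(row[2]), reverse=True)[:top]


# Hypothetical data set; names and numbers are invented for illustration.
COLUMNS = {
    "ad_spend": [1, 2, 3, 4, 5],
    "sales": [2, 4, 6, 8, 10],
    "rainfall": [5, 1, 4, 2, 3],
}

if __name__ == "__main__":
    for a, b, r in strongest_relations(COLUMNS):
        print(f"{a} vs {b}: r = {r:.2f}")
```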
Propositional vs. Relational Data
 Old data mining methods relied on Propositional Data, that is, data related to a single, central element that can be represented in vector format (e.g., the purchasing history of a single user; Amazon uses such vectors in its related item suggestions [a multidimensional dot product]).
 Current, advanced data mining methods rely on Relational Data, or data that can be stored and modeled easily through use of relational databases. An example of this would be data used to represent interpersonal relations.
 Relational Data is more interesting than Propositional data to miners in the sense that an entity, and all the entities to which it is related, factor into the data inference process.
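To make the vector representation of Propositional Data concrete, the sketch below compares purchase-history vectors with a normalized dot product (cosine similarity). It is a minimal example and not Amazon's actual method: the user names, the product columns, and the choice of cosine normalization are all assumptions made for illustration.

```python
import math

# Hypothetical purchase counts per user over the same four products.
PURCHASES = {
    "alice": [2, 0, 1, 0],
    "bob": [4, 0, 2, 0],
    "carol": [0, 3, 0, 1],
}


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine_similarity(u, v):
    """Dot product scaled by the vector lengths; 0.0 if either is all zeros."""
    norm_u = math.sqrt(dot(u, u))
    norm_v = math.sqrt(dot(v, v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot(u, v) / (norm_u * norm_v)


def most_similar(user, purchases):
    """Return the other user whose purchase vector is closest to user's."""
    others = [name for name in purchases if name != user]
    return max(
        others,
        key=lambda name: cosine_similarity(purchases[user], purchases[name]),
    )


if __name__ == "__main__":
    print("most similar to alice:", most_similar("alice", PURCHASES))
```

Each user is treated in isolation here, which is exactly the limitation of propositional data: nothing about who is related to whom enters the calculation.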
Key Component of Data Mining
 Whether Knowledge Discovery or Knowledge Prediction, data mining takes information that was once quite difficult to detect and presents it in an easily understandable format (e.g., graphical or statistical).
 Data mining techniques involve sophisticated algorithms, including Decision Tree Classification, Association Detection, and Clustering.
 Since data mining is not on the test, I will keep things superficial.
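In the same superficial spirit, here is a minimal sketch of one of the techniques named above, clustering. It uses one-dimensional k-means (Lloyd's algorithm) with fixed starting centers; the data values and starting centers are hypothetical, and real tools handle many dimensions and choose starting points more carefully.

```python
def assign(values, centers):
    """Group each value with the index of its nearest center."""
    clusters = [[] for _ in centers]
    for v in values:
        nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
        clusters[nearest].append(v)
    return clusters


def kmeans_1d(values, centers, iterations=10):
    """Lloyd's algorithm on 1-D data, starting from the given centers."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = assign(values, centers)
        # Move each center to the mean of its cluster; an empty cluster
        # keeps its previous center.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, assign(values, centers)


# Hypothetical visit durations in minutes, invented for illustration.
DURATIONS = [1, 2, 3, 20, 21, 22]

if __name__ == "__main__":
    centers, clusters = kmeans_1d(DURATIONS, centers=[0, 10])
    print("centers:", centers)
    print("clusters:", clusters)
```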
Uses of Data Mining
 AI/Machine Learning
Combinatorial/Game Data Mining
Good for analyzing winning strategies to games, and thus developing intelligent AI opponents (e.g., chess).
 Business Strategies
Market Basket Analysis
Identify customer demographics, preferences, and purchasing patterns.
 Risk Analysis
Product Defect Analysis
Analyze product defect rates for given plants and predict possible complications (read: lawsuits) down the line.
 User Behavior Validation
Fraud Detection
In the realm of cell phones
Comparing phone activity to calling records. Can help detect calls made on cloned phones.
Similarly, with credit cards, comparing purchases with historical purchases. Can detect activity with stolen cards.
 Health and Science
Protein Folding
Predicting protein interactions and functionality within biological cells. Applications of this research include determining causes and possible cures for Alzheimer's, Parkinson's, and some cancers (caused by protein "misfolds").
Extra-Terrestrial Intelligence
Scanning radio telescope recordings for possible transmissions from other planets.
 For more information see Stanford’s Folding@home and UC Berkeley’s SETI@home projects. Both involve participation in a widely distributed computer application.
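The credit card case under Fraud Detection above (comparing purchases with historical purchases) can be sketched as a simple outlier check. This is an assumption-laden illustration rather than how any card issuer actually works: the amounts are invented, and the rule "more than three standard deviations from the historical mean" is just one common heuristic.

```python
import statistics


def flag_unusual(history, new_amounts, threshold=3.0):
    """Return the new amounts that lie more than `threshold` sample
    standard deviations away from the mean of the purchase history.

    `history` needs at least two values for a standard deviation to exist.
    """
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        # Every historical purchase was identical, so any different
        # amount counts as unusual.
        return [a for a in new_amounts if a != mean]
    return [a for a in new_amounts if abs(a - mean) / stdev > threshold]


# Hypothetical purchase amounts for one cardholder.
HISTORY = [20, 25, 22, 30, 18, 27, 24, 21]

if __name__ == "__main__":
    print("flagged:", flag_unusual(HISTORY, [26, 950, 19]))
```

A flagged purchase is only a candidate for review; a legitimate but unusual purchase would be flagged in the same way.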
Sources of Data for Mining
 Databases (most obvious)
 Text Documents
 Computer Simulations
 Social Networks
Privacy Concerns
 Mining of public and government databases is done, though people have raised, and continue to raise, concerns.
 Wiki quote:
"data mining gives information that would not be available otherwise. It must be properly interpreted to be useful. When the data collected involves individual people, there are many questions concerning privacy, legality, and ethics."
Prevalence of Data Mining
 Your data is already being mined, whether you like it or not.
 Many web services require that you allow access to your information [for data mining] in order to use the service.
 Google mines email data in Gmail accounts to present account owners with ads.
 Facebook requires users to allow access to information from non-Facebook pages. Facebook privacy policy:
"We may use information about you that we collect from other sources, including but not limited to newspapers and Internet sources such as blogs, instant messaging services and other users of Facebook, to supplement your profile."
 This allows access to your blog RSS feed (rather innocuous), as well as information obtained through partner sites (worthy of concern).
Data Mining Controversies
 Latest one: Facebook's Beacon advertising program (just popped up on Slashdot within the last week)
 What Beacon does:
“when you engage in consumer activity at a [Facebook] partner website, such as Amazon, eBay, or the New York Times, not only will Facebook record that activity, but your Facebook connections will also be informed of your purchases or actions.” [taken from http://trickytrickywhiteboy.blogspot2007...eacon.html]
Controversies continued
 Implications: “Thus where Facebook used to be collecting data only within the confines of its own website, it will now extend that ability to harvest data across other websites that it partners with. Some of the companies that have signed on to participate on the advertising side include Coca-Cola, Sony, Verizon, Comcast, Ebay — and the CBC. The initial list of 44 partner websites participating on the data collection side include the New York Times, Blockbuster, Amazon, eBay, LiveJournal, and Epicurious.”
[Remember the privacy policy on the previous slide]
 The verdict is still out. This may violate an old (100+ years) New York law prohibiting advertising that uses endorsements without the endorser’s consent.
 Facebook currently offers users no way to opt out of Beacon (once it has been activated?). Users can close their accounts, but account data is never deleted.
Bottom Line
 Data obtained through Data Mining is incredibly valuable
 Companies are understandably reluctant to give up data they have obtained.
 Expect the prevalence of Data Mining, and of (possibly subversive) mining methods, to increase in the years to come.