06-04-2011, 11:15 AM
Why Mine Data? Commercial Viewpoint
• Lots of data is being collected and warehoused
– Web data, e-commerce
– Purchases at department/grocery stores
– Bank/credit card transactions
• Computers have become cheaper and more powerful
• Competitive pressure is strong
– Provide better, customized services for an edge (e.g. in Customer Relationship Management)
Why Mine Data? Scientific Viewpoint
• Data collected and stored at enormous speeds (GB/hour)
– Remote sensors on a satellite
– Telescopes scanning the skies
– Microarrays generating gene expression data
– Scientific simulations generating terabytes of data
• Traditional techniques infeasible for raw data
• Data mining may help scientists
– in classifying and segmenting data
– in hypothesis formation
Mining Large Data Sets – Motivation
• There is often information “hidden” in the data that is not readily evident
• Human analysts may take weeks to discover useful information
• Much of the data is never analyzed at all
What is Data Mining?
• Many definitions
– Non-trivial extraction of implicit, previously unknown and potentially useful information from data
– Exploration & analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns
What is (not) Data Mining?
• What is not Data Mining?
– Looking up a phone number in a phone directory
– Querying a Web search engine for information about “Amazon”
• What is Data Mining?
– Discovering that certain names are more prevalent in certain US locations (O’Brien, O’Rourke, O’Reilly… in the Boston area)
– Grouping together similar documents returned by a search engine according to their context (e.g. Amazon rainforest, Amazon.com)
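The document-grouping example above can be sketched in a few lines of Python. This is only a minimal illustration, not a method taken from the slides: the snippets are made-up data, and the stopword list, the similarity threshold of 0.2, and the names bag_of_words, cosine and group_snippets are all assumptions chosen for the example. It uses word-count vectors with cosine similarity and a simple greedy grouping, where a real system would use a proper clustering algorithm.

```python
import math
import re
from collections import Counter

# Hypothetical search-result snippets for the query "Amazon" (made-up data).
SNIPPETS = [
    "Amazon rainforest trees and river wildlife",
    "Amazon.com online shopping and book orders",
    "Deforestation threatens rainforest wildlife along the Amazon river",
    "Track your Amazon.com orders and shopping cart online",
]

# Assumed stopword list: words that carry no context, including the query term.
STOPWORDS = {"amazon", "and", "the", "your", "along", "com"}


def bag_of_words(text):
    """Count the lowercase words in a snippet, ignoring stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_snippets(snippets, threshold=0.2):
    """Greedy grouping: put each snippet into the first group whose
    first member is similar enough, otherwise start a new group."""
    vectors = [bag_of_words(s) for s in snippets]
    groups = []  # each group is a list of snippet indices
    for i, vec in enumerate(vectors):
        for group in groups:
            if cosine(vec, vectors[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


if __name__ == "__main__":
    for group in group_snippets(SNIPPETS):
        print([SNIPPETS[i] for i in group])
```

Nobody tells the program which snippets are about the rainforest and which are about the retailer; the grouping comes out of the word overlap alone. That is the difference from a directory lookup or a plain search query, which only return what was asked for.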