05-06-2012, 02:53 PM
Machine Learning with WEKA
WEKA: the software
Machine learning/data mining software written in Java (distributed under the GNU General Public License)
Used for research, education, and applications
Complements “Data Mining” by Witten & Frank
Main features:
Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods
Graphical user interfaces (incl. data visualization)
Environment for comparing learning algorithms
WEKA: versions
There are several versions of WEKA:
WEKA 3.0: “book version” compatible with description in data mining book
WEKA 3.2: “GUI version” adds graphical user interfaces (book version is command-line only)
WEKA 3.3: “development version” with lots of improvements
This talk is based on the latest snapshot of WEKA 3.3 (soon to be WEKA 3.4)
Explorer: pre-processing the data
Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
Data can also be read from a URL or from an SQL database (using JDBC)
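To make the ARFF format more concrete, here is a minimal sketch in plain Java that reads the simplest kind of ARFF text: a `@relation` line, a few `@attribute` lines, then comma-separated rows after `@data`. The class name `ArffSketch` and the weather sample are hypothetical, made up for illustration. This is not WEKA's own loader, and it ignores quoting, sparse data, and date attributes.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a tiny reader for the simplest ARFF files.
// Not WEKA's loader; quoting, sparse rows, and dates are not handled.
class ArffSketch {
    String relation = "";
    final List<String> attributeNames = new ArrayList<>();
    final List<String[]> rows = new ArrayList<>();

    // Hypothetical sample data, used only for this example.
    static final String SAMPLE = String.join("\n",
        "% hypothetical weather data",
        "@relation weather",
        "@attribute outlook {sunny, overcast, rainy}",
        "@attribute temperature numeric",
        "@attribute play {yes, no}",
        "@data",
        "sunny,85,no",
        "overcast,83,yes");

    static ArffSketch parse(String text) {
        ArffSketch result = new ArffSketch();
        boolean inData = false;
        for (String rawLine : text.split("\\R")) {
            String line = rawLine.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // skip blank lines and % comments
            }
            String lower = line.toLowerCase();
            if (inData) {
                // After @data, every line is one comma-separated instance.
                String[] values = line.split(",", -1);
                for (int i = 0; i < values.length; i++) {
                    values[i] = values[i].trim();
                }
                result.rows.add(values);
            } else if (lower.startsWith("@relation")) {
                result.relation = line.substring("@relation".length()).trim();
            } else if (lower.startsWith("@attribute")) {
                // The first token after the keyword is the attribute name;
                // the rest is its type, which this sketch does not interpret.
                String[] parts = line.substring("@attribute".length())
                                     .trim().split("\\s+", 2);
                result.attributeNames.add(parts[0]);
            } else if (lower.startsWith("@data")) {
                inData = true;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ArffSketch data = parse(SAMPLE);
        System.out.println("Relation:   " + data.relation);
        System.out.println("Attributes: " + data.attributeNames);
        System.out.println("Instances:  " + data.rows.size());
    }
}
```

The header/data split is the point to notice: attribute names and types are declared once up front, so every data row can stay a bare list of values.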
Pre-processing tools in WEKA are called “filters”
WEKA contains filters for:
Discretization, normalization, resampling, attribute selection, transforming and combining attributes, …
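As a rough idea of what two of these filters do to a numeric attribute, the sketch below implements min-max normalization and equal-width discretization in plain Java. The class `FilterSketch` and its method names are hypothetical, chosen for illustration. It does not use or reproduce WEKA's filter classes, which offer more options than shown here.

```java
import java.util.Arrays;

// Illustrative only: plain-Java versions of two common pre-processing steps.
// These are not WEKA's filter implementations.
class FilterSketch {

    // Min-max normalization: rescale values linearly into [0, 1].
    static double[] normalize(double[] values) {
        double min = Arrays.stream(values).min().orElse(0.0);
        double max = Arrays.stream(values).max().orElse(0.0);
        double range = max - min;
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            // A constant attribute has zero range; map it to 0 by convention.
            out[i] = (range == 0.0) ? 0.0 : (values[i] - min) / range;
        }
        return out;
    }

    // Equal-width discretization: assign each value a bin index 0..bins-1.
    static int[] discretize(double[] values, int bins) {
        double[] scaled = normalize(values);
        int[] out = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            // The maximum scales to exactly 1.0, so clamp it into the last bin.
            out[i] = Math.min((int) (scaled[i] * bins), bins - 1);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] temperature = {64, 72, 85}; // hypothetical attribute values
        System.out.println(Arrays.toString(normalize(temperature)));
        System.out.println(Arrays.toString(discretize(temperature, 3)));
    }
}
```

Both methods return new arrays and leave their input untouched, which mirrors the general idea of a filter: one dataset goes in, a transformed copy comes out.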