WEKA - Explorer
dr inż. Jacek Grekow

Agenda
- WEKA: a suite of machine learning programs
- The Explorer module
- Classification

WEKA
Weka: a flightless bird found in New Zealand.
Copyright: Martin Kramer (mkramer@wxs.nl)

WEKA: product features
- Machine learning / data mining software written in Java (distributed under the GNU General Public License)
- Used in research, education, and applications
- Main features:
  - A comprehensive set of tools for data preprocessing, classification, regression, clustering, and evaluation
  - Graphical user interface (including data visualization)
  - An environment for comparing learning algorithms

WEKA: product features (continued)
- 49 data preprocessing tools
- 76 classification/regression algorithms
- 8 clustering algorithms
- 15 attribute/subset evaluators + 10 search algorithms for feature selection
- 3 algorithms for finding association rules
- 3 graphical user interfaces:
  - The Explorer (exploratory data analysis)
  - The Experimenter (experimental environment)
  - The KnowledgeFlow (new process-model-inspired interface)
Where to download WEKA
http://www.cs.waikato.ac.nz/ml/weka/
- Stable GUI version
- Book version
- Developer version
Click here to download a self-extracting executable that includes Java VM 6.0 (weka-3-7-1jre.exe; 37.6 MB)

Weka GUI Chooser

WEKA OutOfMemory exception
The -Xmx option sets the maximum Java heap size available to WEKA:
java -Xmx512m
You can use -Xmx2g to set it to 2 GB.
> java -Xmx1000m -classpath "%CLASSPATH%;d:\Program Files\Weka-3-7\weka.jar" weka.gui.Main
WEKA only deals with flat files
@relation heart-disease-simplified

@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}

@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present

Explorer: data preprocessing
- Data can be imported from several formats: ARFF, CSV, C4.5, binary
- Data can also be read from a URL or from an SQL database using JDBC
- Preprocessing tools in WEKA are called filters
- WEKA provides many different filters: discretization, normalization, resampling, attribute selection, attribute combination, ...

Explorer - Section Tabs
1. Preprocess. Choose and modify the data being acted on.
2. Classify. Train and test learning schemes that classify or perform regression.
3. Cluster. Learn clusters for the data.
4. Associate. Learn association rules for the data.
5. Select attributes. Select the most relevant attributes in the data.
6. Visualize. View an interactive 2D plot of the data.

Preprocess Status Box
The status box appears at the very bottom of the window. It displays messages that keep you informed about what's going on. For example, if the Explorer is busy loading a file, the status box will say that.
TIP: right-clicking the mouse anywhere inside the status box brings up a little menu. The menu gives two options:
- Memory information. Display in the log box the amount of memory available to WEKA.
- Run garbage collector.
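The ARFF layout shown earlier in this section (a header of @relation and @attribute declarations followed by comma-separated @data rows, with ? marking a missing value) can be illustrated with a short reader. This is a minimal Python sketch, not WEKA's own loader: the function name parse_arff is hypothetical, and it assumes unquoted values and only numeric or nominal attributes.

```python
def parse_arff(text):
    """Parse a small ARFF document into (relation, attributes, rows).

    Simplified sketch: no quoted values, sparse data, date or string types.
    """
    relation = None
    attributes = []  # (name, kind); kind is a type name or a list of nominal labels
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and ARFF comments
            continue
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = []
            for (name, kind), value in zip(attributes, values):
                if value == "?":  # "?" marks a missing value
                    row.append(None)
                elif kind in ("numeric", "real", "integer"):
                    row.append(float(value))
                else:
                    row.append(value)  # nominal label kept as text
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            spec = spec.strip()
            if spec.startswith("{"):
                labels = [s.strip() for s in spec.strip("{}").split(",")]
                attributes.append((name, labels))
            else:
                attributes.append((name, spec.lower()))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


# The heart-disease example from the slide.
SAMPLE = """@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
"""

relation, attributes, rows = parse_arff(SAMPLE)
```

In this sketch the missing cholesterol value in the last row is returned as None, so later code has to decide how to treat it.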
Log Button
Clicking on this button brings up a separate window containing a scrollable text field. Each line of text is stamped with the time it was entered into the log. As you perform actions in WEKA, the log keeps a record of what has happened.

Loading Data
- Open file... Brings up a dialog box allowing you to browse for the data file on the local file system.
- Open URL... Asks for a Uniform Resource Locator address for where the data is stored.
- Open DB... Reads data from a database. (Note that to make this work you might have to edit the file in weka/experiment/DatabaseUtils.props.)
- Generate... Enables you to generate artificial data from a variety of DataGenerators.
Explorer: building classification models
- Classifiers in WEKA are models that predict nominal or numeric quantities
- Implemented algorithms: decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes nets, ...
- Meta-classifiers: bagging, boosting, stacking, error-correcting output codes, data cleansing, ...
Test Options
- Use training set. The classifier is evaluated on how well it predicts the class of the instances it was trained on.
- Supplied test set. The classifier is evaluated on how well it predicts the class of a set of instances loaded from a file. Clicking the Set... button brings up a dialog allowing you to choose the file to test on.
- Cross-validation. The classifier is evaluated by cross-validation, using the number of folds that are entered in the Folds text field.
- Percentage split. The classifier is evaluated on how well it predicts a certain percentage of the data which is held out for testing. The amount of data held out depends on the value entered in the % field.
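The cross-validation and percentage split options above differ only in how instances are divided between training and testing. The following Python sketch shows both divisions on instance indices. It is an illustration, not WEKA's implementation: the function names are hypothetical, the splits are not stratified by class, and it assumes the value in the % field denotes the share of data used for training, with the remainder held out for testing.

```python
import random


def percentage_split(n_instances, train_percent, seed=1):
    """Shuffle indices, then hold out the last (100 - train_percent)% for testing."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    n_train = round(n_instances * train_percent / 100)
    return indices[:n_train], indices[n_train:]


def cross_validation_folds(n_instances, folds, seed=1):
    """Yield (train, test) index lists; each instance is tested exactly once."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    for k in range(folds):
        test = indices[k::folds]  # every folds-th shuffled index, starting at k
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


train, test = percentage_split(10, 66)
cv = list(cross_validation_folds(10, 5))
```

With cross-validation the classifier is trained once per fold and the per-fold results are pooled, so every instance contributes to the reported statistics; a percentage split trains only once and tests on the held-out part.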
The Classifier Output Text
- Run information. A list of information giving the learning scheme options, relation name, instances, attributes and test mode that were involved in the process.
- Classifier model (full training set). A textual representation of the classification model that was produced on the full training data.
- Summary. A list of statistics summarizing how accurately the classifier was able to predict the true class of the instances under the chosen test mode.
- Detailed Accuracy By Class. A more detailed per-class breakdown of the classifier's prediction accuracy.
- Confusion Matrix. Shows how many instances have been assigned to each class. Elements show the number of test examples whose actual class is the row and whose predicted class is the column.
- Source code (optional). This section lists the Java source code if one chose Output source code in the More options dialog.
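The confusion matrix convention described above (rows are actual classes, columns are predicted classes) can be made concrete with a small Python sketch. The function names are hypothetical and the predictions below are made-up illustrative values, not output from any WEKA run; the class labels are borrowed from the heart-disease example.

```python
def confusion_matrix(actual, predicted, labels):
    """Count instances per (actual, predicted) pair: row = actual, column = predicted."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of correctly classified instances: the diagonal over the total."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def recall_per_class(matrix):
    """Per-class share of actual instances that were predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


labels = ["present", "not_present"]
# Made-up values for illustration only.
actual = ["present", "present", "not_present", "not_present", "present"]
predicted = ["present", "not_present", "not_present", "present", "present"]

matrix = confusion_matrix(actual, predicted, labels)
overall = accuracy(matrix)
per_class = recall_per_class(matrix)
```

Correct predictions sit on the diagonal, so the off-diagonal cells show which classes are being confused with which.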
The Result List
- View in separate window. Opens a new independent window for viewing the results.
- Save result buffer. Brings up a dialog allowing you to save a text file containing the textual output.
- Load model. Loads a pre-trained model object from a binary file.
- Save model. Saves a model object to a binary file. Objects are saved in Java serialized object form.
- Re-evaluate model on current test set. Takes the model that has been built and tests its performance on the data set that has been specified with the Set... button under the Supplied test set option.
- Visualize classifier errors. Brings up a visualization window that plots the results of classification. Correctly classified instances are represented by crosses, whereas incorrectly classified ones show up as squares.
- Visualize tree or Visualize graph. Brings up a graphical representation of the structure of the classifier model, ...
Summary of the WEKA package
- A comprehensive set of tools for data preprocessing, classification, regression, clustering, and evaluation
- Graphical user interface (including data visualization)
- An environment for comparing learning algorithms
- A set of Java classes ready to use in user applications

Thank you for your attention