Database used for the DMX data mining examples
    ID  Outlook   Temperature  Humidity  Windy  PLAY
    1   sunny     hot          high      false  N
    2   sunny     hot          high      true   N
    3   overcast  hot          high      false  T
    4   rain      mild         high      false  T
    5   rain      cool         normal    false  T
    6   rain      cool         normal    true   N
    7   overcast  cool         normal    true   T
    8   sunny     mild         high      false  N
    9   sunny     cool         normal    false  T
    10  rain      mild         normal    false  T
    11  sunny     mild         normal    true   T
    12  overcast  mild         high      true   T
    13  overcast  hot          normal    false  T
    14  rain      mild         high      true   N

SQL Server 2005: DMX queries

Creating the mining structure
    DROP MINING STRUCTURE [mstenis];
    GO
    CREATE MINING STRUCTURE [mstenis]
    (
        [ID]          LONG KEY,
        [OUTLOOK]     TEXT DISCRETE,
        [TEMPERATURE] TEXT DISCRETE,
        [HUMIDITY]    TEXT DISCRETE,
        [WINDY]       TEXT DISCRETE,
        [PLAY]        TEXT DISCRETE
    );

Creating a mining model: decision trees
    ALTER MINING STRUCTURE [mstenis]
    ADD MINING MODEL [TenisDT]
    (
        [ID],
        [OUTLOOK],
        [TEMPERATURE],
        [HUMIDITY],
        [WINDY],
        [PLAY] PREDICT
    )
    USING Microsoft_Decision_Trees
    WITH DRILLTHROUGH;

Creating a mining model: overriding the default parameters
    ALTER MINING STRUCTURE [mstenis]
    ADD MINING MODEL [TenisDT1]
    (
        [ID],
        [OUTLOOK],
        [TEMPERATURE],
        [HUMIDITY],
        [WINDY],
        [PLAY] PREDICT
    )
    USING Microsoft_Decision_Trees (MINIMUM_SUPPORT = 1, SCORE_METHOD = 1)
    WITH DRILLTHROUGH;

Creating a mining model: Microsoft_Decision_Trees parameters
The [TenisDT1] model above is created with (MINIMUM_SUPPORT = 1, SCORE_METHOD = 1) instead of the defaults. The available parameters are:
    MAXIMUM_INPUT_ATTRIBUTES   Maximum number of input attributes. Default: 255.
    MAXIMUM_OUTPUT_ATTRIBUTES  Maximum number of output attributes. Default: 255.
    SCORE_METHOD               Method used to build the tree: entropy (1), Bayesian with K2 prior (2), or Bayesian with Dirichlet prior (3). Default: 3.
    SPLIT_METHOD               Method used to split a node: binary (1), complete (2), or both (3). Default: 3.
    MINIMUM_SUPPORT            Minimum number of cases that allows a node to be split further. Default: 10.
    COMPLEXITY_PENALTY         Penalty that controls the complexity of the tree; small values let the tree grow.
    FORCED_REGRESSOR           Forces the given columns to be used as regressors.

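SCORE_METHOD = 1 selects an entropy-based score. As a rough illustration of what such a score measures, the sketch below computes the textbook information gain of each input column on the 14 cases from the first slide. This is only a sketch of the general technique: the deck does not describe how Analysis Services computes its score internally, and the names `CASES`, `entropy`, `information_gain` and `GAINS` are made up for this example.

```python
from collections import Counter
from math import log2

# Cases copied from the table on the first slide:
# (OUTLOOK, TEMPERATURE, HUMIDITY, WINDY, PLAY)
CASES = [
    ("sunny", "hot", "high", "false", "N"),
    ("sunny", "hot", "high", "true", "N"),
    ("overcast", "hot", "high", "false", "T"),
    ("rain", "mild", "high", "false", "T"),
    ("rain", "cool", "normal", "false", "T"),
    ("rain", "cool", "normal", "true", "N"),
    ("overcast", "cool", "normal", "true", "T"),
    ("sunny", "mild", "high", "false", "N"),
    ("sunny", "cool", "normal", "false", "T"),
    ("rain", "mild", "normal", "false", "T"),
    ("sunny", "mild", "normal", "true", "T"),
    ("overcast", "mild", "high", "true", "T"),
    ("overcast", "hot", "normal", "false", "T"),
    ("rain", "mild", "high", "true", "N"),
]
ATTRIBUTES = ("OUTLOOK", "TEMPERATURE", "HUMIDITY", "WINDY")


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())


def information_gain(cases, column):
    """Entropy of PLAY minus the weighted entropy after splitting on one column."""
    groups = {}
    for case in cases:
        groups.setdefault(case[column], []).append(case[-1])
    after = sum(len(g) / len(cases) * entropy(g) for g in groups.values())
    return entropy([case[-1] for case in cases]) - after


# Textbook information gain per input column (not the server's internal score).
GAINS = {name: information_gain(CASES, i) for i, name in enumerate(ATTRIBUTES)}

if __name__ == "__main__":
    for name, gain in sorted(GAINS.items(), key=lambda item: -item[1]):
        print(f"{name:12s} {gain:.3f}")
```

On these cases OUTLOOK has the largest gain (about 0.247 bits), so an entropy-driven tree would be expected to split on it first.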
Loading data into the model
    DELETE FROM MINING MODEL [TenisDT1];
    GO
    INSERT INTO MINING STRUCTURE [mstenis]
    (
        [ID],
        [OUTLOOK],
        [TEMPERATURE],
        [HUMIDITY],
        [WINDY],
        [PLAY]
    )
    OPENQUERY ([Zglebianie],
        'SELECT [ID], [OUTLOOK], [TEMPERATURE], [HUMIDITY], [WINDY], [PLAY]
         FROM dbo.tennis');
Note: dbo.tennis is the name of a table, not of a data source (DataSource). The data source is the first argument of OPENQUERY, here [Zglebianie].

Loading data into the model: restricting the training set
    DELETE FROM MINING MODEL [TenisDT1];
    GO
    INSERT INTO MINING STRUCTURE [mstenis]
    (
        [ID],
        [OUTLOOK],
        [TEMPERATURE],
        [HUMIDITY],
        [WINDY],
        [PLAY]
    )
    OPENQUERY ([Zglebianie],
        'SELECT [ID], [OUTLOOK], [TEMPERATURE], [HUMIDITY], [WINDY], [PLAY]
         FROM dbo.tennis
         WHERE ID < 10');
The WHERE ID < 10 condition restricts the training set.

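To see what the ID filters select, the sketch below applies the training condition (ID < 10) and the condition used by the later test queries ([ID] > 10) to the IDs from the first slide. It is only an illustration in Python; the name `PLAY_BY_ID` and the other variables are invented for this example and nothing here talks to Analysis Services.

```python
from collections import Counter

# (ID -> PLAY) pairs copied from the table on the first slide.
PLAY_BY_ID = {1: "N", 2: "N", 3: "T", 4: "T", 5: "T", 6: "N", 7: "T",
              8: "N", 9: "T", 10: "T", 11: "T", 12: "T", 13: "T", 14: "N"}

train_ids = [i for i in PLAY_BY_ID if i < 10]   # WHERE ID < 10
test_ids = [i for i in PLAY_BY_ID if i > 10]    # WHERE [ID] > 10
# IDs that fall into neither subset.
unused_ids = [i for i in PLAY_BY_ID if i not in train_ids and i not in test_ids]

# Class distribution of PLAY in each subset.
train_classes = Counter(PLAY_BY_ID[i] for i in train_ids)
test_classes = Counter(PLAY_BY_ID[i] for i in test_ids)

if __name__ == "__main__":
    print("training cases:", len(train_ids), dict(train_classes))
    print("test cases:    ", len(test_ids), dict(test_classes))
    print("unused IDs:    ", unused_ids)
```

With these two conditions the case with ID = 10 is used neither for training nor for testing, because both comparisons are strict.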
Training set: CASES
    SELECT * FROM [TenisDT1].CASES;

Dimension content: DIMENSION_CONTENT
    SELECT * FROM [TenisDT1].DIMENSION_CONTENT;

Model content: CONTENT
    SELECT * FROM [TenisDT1].CONTENT;

Displaying the cases that belong to a node
    SELECT * FROM [TenisDT1].CASES
    WHERE IsInNode('00000000400');

Displaying the content of descendant nodes
    SELECT * FROM [TenisDT1].CONTENT
    WHERE IsDescendant('00000000402');

    SELECT * FROM [TenisDT1].CONTENT
    WHERE IsDescendant('000000004');

Testing the model
    SELECT
        [PLAY] AS [Czy zagra],
        PredictProbability([PLAY], 1) AS [Prawdopodobienstwo],
        PredictNodeId([PLAY]) AS [Węzeł],
        PredictHistogram([PLAY]) AS [Statystyka],
        TopCount(PredictHistogram([PLAY]), $AdjustedProbability, 3) AS [najwyższe po],
        PredictSupport([PLAY], 1) AS [Liczba],
        PredictStdev([PLAY]) AS [StDev],
        PredictVariance([PLAY]) AS [Warjancja]
    FROM [TenisDT1]
    NATURAL PREDICTION JOIN
    (
        SELECT 'normal' AS [HUMIDITY], 'Sunny' AS [OUTLOOK]
    ) AS test

Testing the model: FLATTENED result
    SELECT FLATTENED
        [PLAY] AS [Czy zagra],
        PredictProbability([PLAY], 1) AS [Prawdopodobienstwo],
        PredictNodeId([PLAY]) AS [Węzeł],
        PredictHistogram([PLAY]) AS [Statystyka],
        TopCount(PredictHistogram([PLAY]), $AdjustedProbability, 3) AS [najwyższe po],
        PredictSupport([PLAY], 1) AS [Liczba],
        PredictStdev([PLAY]) AS [StDev],
        PredictVariance([PLAY]) AS [Warjancja]
    FROM [TenisDT1]
    NATURAL PREDICTION JOIN
    (
        SELECT 'normal' AS [HUMIDITY], 'Sunny' AS [OUTLOOK]
    ) AS test

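PredictHistogram returns, for each possible value of the predicted column, statistics such as its support and probability. The sketch below imitates that idea by simply counting PLAY values among the cases from the first slide that match the singleton input (HUMIDITY = 'normal', OUTLOOK = 'Sunny'). It is an approximation under stated assumptions: a trained model answers from the tree node the case falls into, and its numbers depend on the training set and parameters, so the real query may return different values. The function name `histogram` and the dictionary keys are invented for this example, and values are compared case-insensitively here, which is an assumption of the sketch.

```python
from collections import Counter

COLUMNS = ("outlook", "temperature", "humidity", "windy", "play")
ROWS = [
    ("sunny", "hot", "high", "false", "N"),
    ("sunny", "hot", "high", "true", "N"),
    ("overcast", "hot", "high", "false", "T"),
    ("rain", "mild", "high", "false", "T"),
    ("rain", "cool", "normal", "false", "T"),
    ("rain", "cool", "normal", "true", "N"),
    ("overcast", "cool", "normal", "true", "T"),
    ("sunny", "mild", "high", "false", "N"),
    ("sunny", "cool", "normal", "false", "T"),
    ("rain", "mild", "normal", "false", "T"),
    ("sunny", "mild", "normal", "true", "T"),
    ("overcast", "mild", "high", "true", "T"),
    ("overcast", "hot", "normal", "false", "T"),
    ("rain", "mild", "high", "true", "N"),
]
# One dictionary per case, keyed by lower-case column name.
CASES = [dict(zip(COLUMNS, row)) for row in ROWS]


def histogram(cases, **inputs):
    """Count PLAY values among the cases that match all given input values."""
    wanted = {key.lower(): value.lower() for key, value in inputs.items()}
    matching = [c for c in cases
                if all(c[key].lower() == value for key, value in wanted.items())]
    counts = Counter(c["play"] for c in matching)
    total = sum(counts.values())
    # Most frequent value first; an empty list means no case matched.
    return [{"PLAY": value, "support": n, "probability": n / total}
            for value, n in counts.most_common()]


if __name__ == "__main__":
    for row in histogram(CASES, humidity="normal", outlook="Sunny"):
        print(row)
```

For example, `histogram(CASES, outlook="rain")` lists both PLAY values with their raw frequencies, which is the kind of nested table that FLATTENED expands into separate rows.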
Testing the algorithm on part of the data from the table: top predictions
    SELECT TOP 4
        test.humidity,
        test.windy,
        test.temperature,
        TenisDT1.PLAY,
        PredictProbability(TenisDT1.PLAY) AS PO
    FROM TenisDT1
    PREDICTION JOIN
        OPENQUERY(zglebianie,
            'SELECT HUMIDITY, WINDY, TEMPERATURE, OUTLOOK
             FROM dbo.tennis
             WHERE [ID] > 10') AS test
    ON  TenisDT1.HUMIDITY = test.HUMIDITY
    AND TenisDT1.WINDY = test.WINDY
    AND TenisDT1.TEMPERATURE = test.TEMPERATURE
    AND TenisDT1.OUTLOOK = test.OUTLOOK
    ORDER BY PredictProbability(PLAY) DESC

Testing the algorithm on part of the data from the table: full statistics
    SELECT
        [PLAY] AS [Czy zagra],
        PredictProbability([PLAY], 1) AS [PO],
        PredictNodeId([PLAY]) AS [Węzeł],
        PredictHistogram([PLAY]) AS [Statystyka],
        TopCount(PredictHistogram([PLAY]), $AdjustedProbability, 3) AS [najwyższe po],
        PredictSupport([PLAY], 1) AS [Liczba],
        PredictStdev([PLAY]) AS [StDev],
        PredictVariance([PLAY]) AS [Warjancja]
    FROM [TenisDT1]
    PREDICTION JOIN
        OPENQUERY(zglebianie,
            'SELECT HUMIDITY, WINDY, TEMPERATURE, OUTLOOK
             FROM dbo.tennis
             WHERE [ID] > 10') AS test
    ON  TenisDT1.HUMIDITY = test.HUMIDITY
    AND TenisDT1.WINDY = test.WINDY
    AND TenisDT1.TEMPERATURE = test.TEMPERATURE
    AND TenisDT1.OUTLOOK = test.OUTLOOK