Bioinformatics V. Data Analysis in the R Language


DATA ANALYSIS
Statistical methods of data analysis
  - data exploration
  - hypothesis testing
  - Bayesian analysis
Machine learning methods
  - supervised learning
  - unsupervised learning

R
Programming language
  - object-oriented
  - functional
  - vectorized
Environment for data analysis
  - implementations exist for nearly all statistical methods
      - built-in packages
      - add-on packages
  - most machine learning methods
  - many data visualization methods

R
  - open-source software
  - large and active community of users and developers
  - de facto standard in statistics
  - increasingly a standard in machine learning
  - many bioinformatics tools (Bioconductor)
  - available on Windows, Linux, and Mac
  - terminal environment and batch processing
  - integrated environment
  - easy automation of routine tasks

R
> ?iris

CALCULATOR
> 2+2
[1] 4
> 98+79
[1] 177

> 2+2*3
[1] 8
> (2+2)*3
[1] 12

> sqrt(2+2*3)
[1] 2.828427
> (2+2*3)^0.333
[1] 1.998614
> (2+2*3)^(1/3)
[1] 2

VARIABLES
> 2+2->FOUR
> FOUR
[1] 4
> FOUR^2
[1] 16
> FOUR_bis <- 2+2
> FOUR_t = 2+2

VECTORS
> myvec<-c(2,1,3,4,5)
> myvec
[1] 2 1 3 4 5
> myvec*3
[1]  6  3  9 12 15
> myvec*3->newvec
> newvec
[1]  6  3  9 12 15
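Arithmetic on vectors works element by element, which is what myvec*3 relies on. A minimal sketch of a few related operations; the second vector `other` and its values are illustrative assumptions, not part of the original slide:

```r
myvec <- c(2, 1, 3, 4, 5)
other <- c(10, 20, 30, 40, 50)   # illustrative second vector (assumption)

vec_sum  <- myvec + other        # element-wise sum of two vectors
vec_prod <- myvec * other        # element-wise product
middle   <- myvec[2:4]           # elements 2 to 4
large    <- myvec[myvec > 2]     # logical indexing: keep elements greater than 2

print(vec_sum)
print(large)
print(c(length = length(myvec), sum = sum(myvec), mean = mean(myvec)))
```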

LISTS
> student1<-c(Imie="Adam",Nazwisko="Abacki",Przedmiot="Bioinformatyka",OcenaKolokwium="5",OcenaEgzamin="5")
> student2<-c(Imie="Bogdan",Nazwisko="Babacki",Przedmiot="Bioinformatyka",OcenaKolokwium="4",OcenaEgzamin="4")
> team1<-list(student1,student2)
> team1
[[1]]
            Imie         Nazwisko        Przedmiot   OcenaKolokwium     OcenaEgzamin
          "Adam"         "Abacki" "Bioinformatyka"              "5"              "5"

[[2]]
            Imie         Nazwisko        Przedmiot   OcenaKolokwium     OcenaEgzamin
        "Bogdan"        "Babacki" "Bioinformatyka"              "4"              "4"
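Elements of a list are retrieved with double brackets, and entries of a named vector by name. A short sketch using the same two students; the named list `named_team` is a hypothetical variant added only for illustration:

```r
student1 <- c(Imie = "Adam", Nazwisko = "Abacki", Przedmiot = "Bioinformatyka",
              OcenaKolokwium = "5", OcenaEgzamin = "5")
student2 <- c(Imie = "Bogdan", Nazwisko = "Babacki", Przedmiot = "Bioinformatyka",
              OcenaKolokwium = "4", OcenaEgzamin = "4")
team1 <- list(student1, student2)

second  <- team1[[2]]                  # the whole named vector of the second student
surname <- team1[[2]][["Nazwisko"]]    # a single entry as a bare string
print(surname)

# Hypothetical variant: naming the list elements allows access with $
named_team <- list(adam = student1, bogdan = student2)
first_name <- named_team$bogdan[["Imie"]]
print(first_name)
```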

ARRAYS
> team2<-c(student1,student2)
> team2
            Imie         Nazwisko        Przedmiot   OcenaKolokwium     OcenaEgzamin
          "Adam"         "Abacki" "Bioinformatyka"              "5"              "5"
            Imie         Nazwisko        Przedmiot   OcenaKolokwium     OcenaEgzamin
        "Bogdan"        "Babacki" "Bioinformatyka"              "4"              "4"
> dim(team2)<-c(5,2)
> team2
     [,1]             [,2]
[1,] "Adam"           "Bogdan"
[2,] "Abacki"         "Babacki"
[3,] "Bioinformatyka" "Bioinformatyka"
[4,] "5"              "4"
[5,] "5"              "4"
> t(team2)
     [,1]     [,2]      [,3]             [,4] [,5]
[1,] "Adam"   "Abacki"  "Bioinformatyka" "5"  "5"
[2,] "Bogdan" "Babacki" "Bioinformatyka" "4"  "4"

ARRAYS
> t(team2)->team2
> colnames(team2)<-c("Imie","Nazwisko","Przedmiot","OcenaKolokwium","OcenaEgzamin")
> rownames(team2)<-c("student1","student2")
> team2
         Imie     Nazwisko  Przedmiot        OcenaKolokwium OcenaEgzamin
student1 "Adam"   "Abacki"  "Bioinformatyka" "5"            "5"
student2 "Bogdan" "Babacki" "Bioinformatyka" "4"            "4"

ARRAYS
> team2<-t(team2)
> team2
               student1         student2
Imie           "Adam"           "Bogdan"
Nazwisko       "Abacki"         "Babacki"
Przedmiot      "Bioinformatyka" "Bioinformatyka"
OcenaKolokwium "5"              "4"
OcenaEgzamin   "5"              "4"
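The same matrix can also be assembled in one step instead of setting dim and the row and column names separately. A sketch under the assumption that the two student vectors are defined as on the LISTS slide; `team_rows` is a hypothetical name:

```r
student1 <- c(Imie = "Adam", Nazwisko = "Abacki", Przedmiot = "Bioinformatyka",
              OcenaKolokwium = "5", OcenaEgzamin = "5")
student2 <- c(Imie = "Bogdan", Nazwisko = "Babacki", Przedmiot = "Bioinformatyka",
              OcenaKolokwium = "4", OcenaEgzamin = "4")

# rbind() stacks the vectors as rows; column names come from the vector names,
# row names from the argument names
team_rows <- rbind(student1 = student1, student2 = student2)
print(dim(team_rows))
print(team_rows["student2", "Nazwisko"])

# t() gives the layout with students in columns
team_cols <- t(team_rows)
print(team_cols)
```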

DATA FRAMES
> data.frame(team2)->team2
> team2$student1
          Imie       Nazwisko      Przedmiot OcenaKolokwium   OcenaEgzamin
          Adam         Abacki Bioinformatyka              5              5
Levels: 5 Abacki Adam Bioinformatyka

DATA FRAMES
> team2$student3<-c("Cyprian","Cebacki","Bioinformatyka",3,3)
> team2
                     student1       student2       student3
Imie                     Adam         Bogdan        Cyprian
Nazwisko               Abacki        Babacki        Cebacki
Przedmiot      Bioinformatyka Bioinformatyka Bioinformatyka
OcenaKolokwium              5              4              3
OcenaEgzamin                5              4              3

> team2$student4<-c("Damian","Debacki","Bioinformatyka",NA,NA)
> team2
                     student1       student2       student3       student4
Imie                     Adam         Bogdan        Cyprian         Damian
Nazwisko               Abacki        Babacki        Cebacki        Debacki
Przedmiot      Bioinformatyka Bioinformatyka Bioinformatyka Bioinformatyka
OcenaKolokwium              5              4              3           <NA>
OcenaEgzamin                5              4              3           <NA>
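In the frame above every column mixes names and grades, so the grades are stored as text. A sketch of an alternative layout, one student per row with numeric grade columns; this layout and the name `students` are assumptions for illustration, not the structure used on the slides:

```r
students <- data.frame(
  Imie           = c("Adam", "Bogdan", "Cyprian", "Damian"),
  Nazwisko       = c("Abacki", "Babacki", "Cebacki", "Debacki"),
  OcenaKolokwium = c(5, 4, 3, NA),
  OcenaEgzamin   = c(5, 4, 3, NA),
  stringsAsFactors = FALSE      # keep names as plain character strings
)

str(students)

# Numeric columns can be summarized directly; NA values must be skipped explicitly
mean_exam <- mean(students$OcenaEgzamin, na.rm = TRUE)
graded    <- students[!is.na(students$OcenaEgzamin), ]
print(mean_exam)
print(graded)
```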

SEQUENCES
> 1:10
 [1]  1  2  3  4  5  6  7  8  9 10
> seq(from=0,to=11,by=1)
 [1]  0  1  2  3  4  5  6  7  8  9 10 11
> seq(from=0,to=111,by=11)
 [1]   0  11  22  33  44  55  66  77  88  99 110
> seq(from=0.02,to=1.03,by=0.15)
[1] 0.02 0.17 0.32 0.47 0.62 0.77 0.92
> seq(from=0,to=0,length.out=20)
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
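A few related ways of generating regular vectors, shown as a brief sketch; the specific values are arbitrary examples:

```r
zeros   <- rep(0, 20)                  # twenty zeros, same values as seq(from=0,to=0,length.out=20)
grid    <- seq(0, 1, length.out = 5)   # five equally spaced points from 0 to 1
down    <- seq(10, 1, by = -3)         # a decreasing sequence
indices <- seq_len(5)                  # 1 2 3 4 5, safe even when the length is 0

print(grid)
print(down)
print(rev(indices))
```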

DATA FRAMES
> team2[1:2,]
         student1 student2 student3 student4
Imie         Adam   Bogdan  Cyprian   Damian
Nazwisko   Abacki  Babacki  Cebacki  Debacki

> team2[2:3,]
                student1       student2       student3       student4
Nazwisko          Abacki        Babacki        Cebacki        Debacki
Przedmiot Bioinformatyka Bioinformatyka Bioinformatyka Bioinformatyka
> team2[c(2,4),]
               student1 student2 student3 student4
Nazwisko         Abacki  Babacki  Cebacki  Debacki
OcenaKolokwium        5        4        3     <NA>
> team2[c("Nazwisko","OcenaEgzamin"),]
             student1 student2 student3 student4
Nazwisko       Abacki  Babacki  Cebacki  Debacki
OcenaEgzamin        5        4        3     <NA>

> team2[,c(2,4)]
                     student2       student4
Imie                   Bogdan         Damian
Nazwisko              Babacki        Debacki
Przedmiot      Bioinformatyka Bioinformatyka
OcenaKolokwium              4           <NA>
OcenaEgzamin                4           <NA>
> team2[c("student3","student1")]
                     student3       student1
Imie                  Cyprian           Adam
Nazwisko              Cebacki         Abacki
Przedmiot      Bioinformatyka Bioinformatyka
OcenaKolokwium              3              5
OcenaEgzamin                3              5
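Besides numbers and names, rows can be selected with a logical condition. A minimal sketch on the built-in iris data; the thresholds are arbitrary and the names `big_setosa` and `wide_petals` are illustrative:

```r
# Rows where both conditions hold; all columns are kept
big_setosa <- iris[iris$Species == "setosa" & iris$Sepal.Length > 5.5, ]
print(big_setosa)

# subset() expresses the same idea and can also pick columns
wide_petals <- subset(iris, Petal.Width > 2.4, select = c(Petal.Width, Species))
print(wide_petals)

# Number of rows per species
n_virginica <- nrow(iris[iris$Species == "virginica", ])
print(n_virginica)
```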

SAMPLE
> sample(1:150,10)
 [1]  30  59   6 150  49  93  75  65  81  48
> dim(iris)
[1] 150   5
> sample(1:150,10)->mysample
> mysample
 [1]  82  99  65  22 112   8  38   2 114   7
> iris[mysample,]
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
82           5.5         2.4          3.7         1.0 versicolor
99           5.1         2.5          3.0         1.1 versicolor
65           5.6         2.9          3.6         1.3 versicolor
22           5.1         3.7          1.5         0.4     setosa
112          6.4         2.7          5.3         1.9  virginica
8            5.0         3.4          1.5         0.2     setosa
38           4.9         3.6          1.4         0.1     setosa
2            4.9         3.0          1.4         0.2     setosa
114          5.7         2.5          5.0         2.0  virginica
7            4.6         3.4          1.4         0.3     setosa
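Each call to sample() returns a different draw, as the two calls above show. When a draw has to be repeatable, the random number generator can be seeded first. A small sketch; the seed value 42 is an arbitrary assumption:

```r
set.seed(42)                 # arbitrary seed chosen for illustration
s1 <- sample(1:150, 10)

set.seed(42)                 # the same seed reproduces the same draw
s2 <- sample(1:150, 10)

print(identical(s1, s2))

# sample() draws without replacement by default, so no row is picked twice
print(anyDuplicated(s1))

reproducible_rows <- iris[s1, ]
print(reproducible_rows)
```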

DATA FRAMES
> mysample
 [1]  82  99  65  22 112   8  38   2 114   7
> order(mysample)
 [1]  8 10  6  4  7  3  1  2  5  9
> mysample[order(mysample)]
 [1]   2   7   8  22  38  65  82  99 112 114
> iris[mysample[order(mysample)],]
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
2            4.9         3.0          1.4         0.2     setosa
7            4.6         3.4          1.4         0.3     setosa
8            5.0         3.4          1.5         0.2     setosa
22           5.1         3.7          1.5         0.4     setosa
38           4.9         3.6          1.4         0.1     setosa
65           5.6         2.9          3.6         1.3 versicolor
82           5.5         2.4          3.7         1.0 versicolor
99           5.1         2.5          3.0         1.1 versicolor
112          6.4         2.7          5.3         1.9  virginica
114          5.7         2.5          5.0         2.0  virginica
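order() returns positions rather than values, which is why it can also reorder the rows of a data frame. A sketch using the sample values printed above; the name `by_species` and the choice of sort keys are illustrative:

```r
mysample <- c(82, 99, 65, 22, 112, 8, 38, 2, 114, 7)   # values from the slide

# sort() returns the sorted values; order() returns the positions that sort them
print(sort(mysample))
print(identical(sort(mysample), mysample[order(mysample)]))

# Descending order
descending <- mysample[order(mysample, decreasing = TRUE)]
print(descending)

# Ordering data frame rows by one column, ties broken by a second (descending) column
iris10 <- iris[mysample, ]
by_species <- iris10[order(iris10$Species, -iris10$Petal.Length), ]
print(by_species)
```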

DATA FRAMES
> irissample
 [1] 143  13 121 145  98  72  36  51  35  48  76 129   7 147  70  83 103  29  27  91
> order(irissample)
 [1] 13  2 19 18  9  7 10  8 15  6 11 16 20  5 17  3 12  1  4 14
> irissample[order(irissample)]
 [1]   7  13  27  29  35  36  48  51  70  72  76  83  91  98 103 121 129 143 145 147
> iris[irissample[order(irissample)],]->iris20
> iris20
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
7            4.6         3.4          1.4         0.3     setosa
13           4.8         3.0          1.4         0.1     setosa
27           5.0         3.4          1.6         0.4     setosa
29           5.2         3.4          1.4         0.2     setosa
35           4.9         3.1          1.5         0.2     setosa
36           5.0         3.2          1.2         0.2     setosa
48           4.6         3.2          1.4         0.2     setosa
51           7.0         3.2          4.7         1.4 versicolor
70           5.6         2.5          3.9         1.1 versicolor
72           6.1         2.8          4.0         1.3 versicolor
76           6.6         3.0          4.4         1.4 versicolor
83           5.8         2.7          3.9         1.2 versicolor
91           5.5         2.6          4.4         1.2 versicolor
98           6.2         2.9          4.3         1.3 versicolor
103          7.1         3.0          5.9         2.1  virginica
121          6.9         3.2          5.7         2.3  virginica
129          6.4         2.8          5.6         2.1  virginica
143          5.8         2.7          5.1         1.9  virginica
145          6.7         3.3          5.7         2.5  virginica
147          6.3         2.5          5.0         1.9  virginica

DATA FRAMES
> iris[mysample[order(mysample)],]->iris10
> iris10$Sepal.Dim<-iris10$Sepal.Length+iris10$Sepal.Width
> iris10
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species Sepal.Dim
2            4.9         3.0          1.4         0.2     setosa       7.9
7            4.6         3.4          1.4         0.3     setosa       8.0
8            5.0         3.4          1.5         0.2     setosa       8.4
22           5.1         3.7          1.5         0.4     setosa       8.8
38           4.9         3.6          1.4         0.1     setosa       8.5
65           5.6         2.9          3.6         1.3 versicolor       8.5
82           5.5         2.4          3.7         1.0 versicolor       7.9
99           5.1         2.5          3.0         1.1 versicolor       7.6
112          6.4         2.7          5.3         1.9  virginica       9.1
114          5.7         2.5          5.0         2.0  virginica       8.2

DATA FRAMES
> iris10$Petal.Dim<-iris10$Petal.Length+iris10$Petal.Width
> iris10$Petal.Radius<-sqrt(iris10$Petal.Length*iris10$Petal.Length+iris10[,4]*iris10[,4])
> iris10$Sepal.Radius<-sqrt(iris10[,1]*iris10[,1]+iris10[,2]*iris10[,2])
> iris10
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species Sepal.Dim Petal.Dim Petal.Radius Sepal.Radius
2            4.9         3.0          1.4         0.2     setosa       7.9       1.6     1.414214     5.745433
7            4.6         3.4          1.4         0.3     setosa       8.0       1.7     1.431782     5.720140
8            5.0         3.4          1.5         0.2     setosa       8.4       1.7     1.513275     6.046487
22           5.1         3.7          1.5         0.4     setosa       8.8       1.9     1.552417     6.300794
38           4.9         3.6          1.4         0.1     setosa       8.5       1.5     1.403567     6.080296
65           5.6         2.9          3.6         1.3 versicolor       8.5       4.9     3.827532     6.306346
82           5.5         2.4          3.7         1.0 versicolor       7.9       4.7     3.832754     6.000833
99           5.1         2.5          3.0         1.1 versicolor       7.6       4.1     3.195309     5.679789
112          6.4         2.7          5.3         1.9  virginica       9.1       7.2     5.630275     6.946222
114          5.7         2.5          5.0         2.0  virginica       8.2       7.0     5.385165     6.224147

DATA FRAMES
> iris$Sepal.Dim<-iris$Sepal.Length+iris$Sepal.Width
> iris$Petal.Dim<-iris$Petal.Length+iris$Petal.Width
> iris$Sepa.Radius<-sqrt(iris[,1]*iris[,1]+iris[,2]*iris[,2])
> iris$Petal.Radius<-sqrt(iris[,3]*iris[,3]+iris[,4]*iris[,4])
> iris[1:10,]
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Dim Petal.Dim Sepa.Radius Petal.Radius
1           5.1         3.5          1.4         0.2  setosa       8.6       1.6    6.185467     1.414214
2           4.9         3.0          1.4         0.2  setosa       7.9       1.6    5.745433     1.414214
3           4.7         3.2          1.3         0.2  setosa       7.9       1.5    5.685948     1.315295
4           4.6         3.1          1.5         0.2  setosa       7.7       1.7    5.547071     1.513275
5           5.0         3.6          1.4         0.2  setosa       8.6       1.6    6.161169     1.414214
6           5.4         3.9          1.7         0.4  setosa       9.3       2.1    6.661081     1.746425
7           4.6         3.4          1.4         0.3  setosa       8.0       1.7    5.720140     1.431782
8           5.0         3.4          1.5         0.2  setosa       8.4       1.7    6.046487     1.513275
9           4.4         2.9          1.4         0.2  setosa       7.3       1.6    5.269725     1.414214
10          4.9         3.1          1.5         0.1  setosa       8.0       1.6    5.798276     1.503330

> names(iris)
[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"      "Sepal.Dim"    "Petal.Dim"    "Sepa.Radius"  "Petal.Radius"
> names(iris)[6]
[1] "Sepal.Dim"
> names(iris)[8]
[1] "Sepa.Radius"
> names(iris)[8]<-"Sepal.Radius"
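Derived columns can also be added in a single call, and a column can be renamed by its name instead of its position. A sketch that works on a copy so that the built-in iris data stay untouched; `iris2` and the misspelled column are illustrative assumptions:

```r
# transform() evaluates the expressions inside the data frame, so columns
# can be referred to without the iris$ prefix
iris2 <- transform(iris,
                   Sepal.Dim   = Sepal.Length + Sepal.Width,
                   Sepa.Radius = sqrt(Sepal.Length^2 + Sepal.Width^2))  # deliberate typo

# Renaming by name does not depend on the column position
names(iris2)[names(iris2) == "Sepa.Radius"] <- "Sepal.Radius"

print(names(iris2))
print(head(iris2, 3))
```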

PLOTS
> plot(iris[,6:9])
[Figure: scatterplot matrix of Sepal.Dim, Petal.Dim, Sepal.Radius, and Petal.Radius]

> plot(iris[,6:9],col=as.numeric(iris$Species))
[Figure: the same scatterplot matrix with points colored by species]

BOXPLOTS
> boxplot(iris[1:50,6],iris[51:100,6],iris[101:150,6])
[Figure: boxplots of Sepal.Dim for the three species, labeled 1, 2, 3]

> boxplot(iris$Petal.Radius[1:50],iris$Petal.Radius[51:100],iris$Petal.Radius[101:150])
[Figure: boxplots of Petal.Radius for the three species, labeled 1, 2, 3]

> boxplot(names=c("setosa","versicolor","virginica"),iris$Petal.Radius[1:50],iris$Petal.Radius[51:100],iris$Petal.Radius[101:150])
[Figure: boxplots of Petal.Radius labeled setosa, versicolor, virginica]
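The row ranges 1:50, 51:100, and 101:150 work because iris happens to be sorted by species. A sketch of the formula interface, which groups by the Species column directly and labels the boxes automatically; Petal.Length is used here only so that the example runs on the unmodified data set:

```r
# One box per level of Species, labeled with the species names
boxplot(Petal.Length ~ Species, data = iris, ylab = "Petal length")

# The same grouping applied to a numeric summary
med <- tapply(iris$Petal.Length, iris$Species, median)
print(med)
```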

PLOTS
> plot(iris$Petal.Length,iris$Petal.Width,col=as.numeric(iris$Species))
[Figure: scatter plot of iris$Petal.Width (y) against iris$Petal.Length (x), colored by species]

PLOTS
> plot(iris20$Radius,iris20$Sepal.Length,col=as.numeric(iris20$Species))
[Figure: scatter plot with axes labeled "Petal Length" (y) and "Radius" (x), colored by species]

PLOTS
> plot(xlab="Petal Length",ylab="Petal Width",iris$Petal.Length,iris$Petal.Width,col=as.numeric(iris$Species))
[Figure: scatter plot of Petal Width (y) against Petal Length (x), colored by species]

> plot(xlab="Petal Length",ylab="Petal Width",iris$Petal.Length,iris$Petal.Width,col=as.numeric(iris$Species),pch=as.numeric(iris$Species)+16,cex=1.5)
[Figure: the same scatter plot with larger, filled symbols that differ by species]

PLOTS
> ?plot
Description
  Generic function for plotting of R objects. For more details about the graphical parameter arguments, see par.
  For simple scatter plots, plot.default will be used. However, there are plot methods for many R objects, including functions, data.frames, density objects, etc. Use methods(plot) and the documentation for these.
Usage
  plot(x, y, ...)

SCATTERPLOT
> ?plot3d
No documentation for 'plot3d' in specified packages and libraries:
you could try '??plot3d'
> ??plot3d
> library(rgl)
Warning message:
package 'rgl' was built under R version 2.15.3
> plot3d(iris[,1:3])

SCATTERPLOT
> plot3d(iris[,1:3],col=as.numeric(iris$Species))

DATA EXPLORATION
histogram
> hist(iris[,1])
> hist(iris[,2])
> hist(iris[,3])
> hist(iris[,4])

> boxplot(iris[,1:4])
> boxplot(iris[,1:4],notch=TRUE)
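Numeric summaries complement the histograms and boxplots. A minimal sketch on the unmodified iris data; the name `species_means` is illustrative:

```r
# Minimum, quartiles, median, mean, and maximum of each measurement
print(summary(iris[, 1:4]))

# Mean of every measurement within each species
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)
print(species_means)
```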

HYPOTHESIS TESTING
H1 - research hypothesis (the alternative hypothesis)
H0 - null hypothesis, representing baseline knowledge
e.g. all iris varieties have the same petal length

HYPOTHESIS TESTING
Standard procedure
1. Choose the appropriate statistical test
2. Compute the test statistic for the experiment
3. Compute the probability of obtaining data at least as extreme as those observed, assuming H0 is true (the p-value)
4. If this probability is smaller than a significance level fixed in advance (e.g. 5%, 1%, 0.1%), reject the null hypothesis H0 and accept H1.

HYPOTHESIS TESTING
H0: all iris varieties have the same petal length
> boxplot(iris[iris$Species=="setosa",3],iris[iris$Species=="virginica",3],iris[iris$Species=="versicolor",3])
[Figure: boxplots of column 3 (Petal.Length) for setosa, virginica, and versicolor, labeled 1, 2, 3]

HYPOTHESIS TESTING
> y1<-iris[iris$Species=="setosa",2]
> y2<-iris[iris$Species=="virginica",2]
> y3<-iris[iris$Species=="versicolor",2]
> t.test(y1,y2)

        Welch Two Sample t-test

data:  y1 and y2
t = 6.4503, df = 95.547, p-value = 4.571e-09
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.3142808 0.5937192
sample estimates:
mean of x mean of y
    3.428     2.974
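The steps of the standard procedure can be traced by hand for the comparison of y1 and y2. A minimal sketch, assuming Welch's unequal-variance t-test (the test t.test runs by default) and a significance level of 5%; the variable names are illustrative:

```r
y1 <- iris[iris$Species == "setosa", 2]      # column 2 is Sepal.Width
y2 <- iris[iris$Species == "virginica", 2]
alpha <- 0.05                                # significance level fixed in advance (assumption)

# Step 2: test statistic and Welch-Satterthwaite degrees of freedom
v1 <- var(y1) / length(y1)                   # squared standard error of mean(y1)
v2 <- var(y2) / length(y2)
t_stat <- (mean(y1) - mean(y2)) / sqrt(v1 + v2)
df <- (v1 + v2)^2 / (v1^2 / (length(y1) - 1) + v2^2 / (length(y2) - 1))

# Step 3: two-sided p-value under H0
p_value <- 2 * pt(-abs(t_stat), df)

# Step 4: decision
reject_h0 <- p_value < alpha
print(c(t = t_stat, df = df, p = p_value))
print(reject_h0)

# Cross-check against the built-in test
res <- t.test(y1, y2)
```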

HYPOTHESIS TESTING
[Figure: plot of three groups labeled 1, 2, 3; vertical axis from 2.0 to 4.0]
> t.test(y1,y3)

        Welch Two Sample t-test

data:  y1 and y3
t = 9.455, df = 94.698, p-value = 2.484e-15
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.5198348 0.7961652
sample estimates:
mean of x mean of y
    3.428     2.770

> t.test(y2,y3)

        Welch Two Sample t-test

data:  y2 and y3
t = 3.2058, df = 97.927, p-value = 0.001819
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.07771636 0.33028364
sample estimates:
mean of x mean of y
    2.974     2.770
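Running three separate tests raises the chance of a false positive, and a null hypothesis about all varieties at once can also be tested in one step. A sketch of both ideas; the Holm correction and the Welch one-way test are choices made for this illustration, not methods taken from the slides:

```r
# All pairwise Welch t-tests (pool.sd = FALSE) with Holm-adjusted p-values
pw <- pairwise.t.test(iris$Sepal.Width, iris$Species,
                      pool.sd = FALSE, p.adjust.method = "holm")
print(pw$p.value)

# A single test of H0: all three species have the same mean
# (oneway.test does not assume equal variances by default)
ow <- oneway.test(Sepal.Width ~ Species, data = iris)
print(ow$p.value)
```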

MODEL BUILDING
[Figure: scatter plot of iris$Petal.Width (y) against iris$Petal.Length (x)]

MODEL BUILDING
> lm(iris$Petal.Length~iris$Petal.Width)->iris.lm
> iris.lm

Call:
lm(formula = iris$Petal.Length ~ iris$Petal.Width)

Coefficients:
     (Intercept)  iris$Petal.Width
           1.084             2.230

MODEL BUILDING
> plot(iris$Petal.Width,iris$Petal.Length)
> abline(1.084,2.230)
[Figure: scatter plot of iris$Petal.Length (y) against iris$Petal.Width (x) with the fitted line]
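Instead of retyping the rounded coefficients, the fitted model can be passed to abline() directly. A sketch that fits the same regression with the data argument, which also makes predict() usable on new values; the new petal widths are arbitrary examples:

```r
# Same model as above, written with data = so that column names stay short
iris.lm <- lm(Petal.Length ~ Petal.Width, data = iris)
print(coef(iris.lm))

plot(iris$Petal.Width, iris$Petal.Length,
     xlab = "Petal Width", ylab = "Petal Length")
abline(iris.lm, col = "red")          # line taken from the model object

# Predicted petal length for hypothetical petal widths
new_flowers <- data.frame(Petal.Width = c(0.5, 1.5))
pred <- predict(iris.lm, newdata = new_flowers)
print(pred)
```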

MODEL BUILDING
> summary(iris.lm)

Call:
lm(formula = iris$Petal.Length ~ iris$Petal.Width)

Residuals:
     Min       1Q   Median       3Q      Max
-1.33542 -0.30347 -0.02955  0.25776  1.39453

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)       1.08356    0.07297   14.85   <2e-16 ***
iris$Petal.Width  2.22994    0.05140   43.39   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4782 on 148 degrees of freedom
Multiple R-squared:  0.9271, Adjusted R-squared:  0.9266
F-statistic:  1882 on 1 and 148 DF,  p-value: < 2.2e-16
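The numbers printed by summary() are also available as components of the returned object. A short sketch; the model is refitted here so that the block runs on its own, and the name `s` is illustrative:

```r
iris.lm <- lm(iris$Petal.Length ~ iris$Petal.Width)
s <- summary(iris.lm)

print(s$r.squared)        # Multiple R-squared
print(s$adj.r.squared)    # Adjusted R-squared
print(s$sigma)            # residual standard error
print(s$coefficients)     # estimates, standard errors, t values, p-values

# 95% confidence intervals for the intercept and the slope
print(confint(iris.lm))
```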

MODEL BUILDING
> plot(iris.lm)
[Figure: Residuals vs Fitted; residuals against fitted values, observations 108, 115, and 135 labeled]
[Figure: Normal Q-Q; standardized residuals against theoretical quantiles, observations 108, 115, and 135 labeled]
[Figure: Scale-Location; standardized residuals against fitted values, observations 108, 115, and 135 labeled]
[Figure: Residuals vs Leverage; standardized residuals against leverage with Cook's distance, observations 115, 142, and 145 labeled]

MACHINE LEARNING
> library(randomForest)
> randomForest(x=iris[,1:4],y=iris$Species,ntree=2000,importance=TRUE)->rf.1
> rf.1

Call:
 randomForest(x = iris[, 1:4], y = iris$Species, ntree = 2000, importance = TRUE)
               Type of random forest: classification
                     Number of trees: 2000
No. of variables tried at each split: 2

        OOB estimate of  error rate: 4.67%
Confusion matrix:
           setosa versicolor virginica class.error
setosa         50          0         0        0.00
versicolor      0         47         3        0.06
virginica       0          4        46        0.08
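The error rates in this output follow directly from the confusion matrix. A sketch in base R that retypes the matrix printed above (rows are the true species, columns the predicted ones) and recomputes both quantities; the object names are illustrative:

```r
species <- c("setosa", "versicolor", "virginica")
conf <- matrix(c(50,  0,  0,
                  0, 47,  3,
                  0,  4, 46),
               nrow = 3, byrow = TRUE,
               dimnames = list(actual = species, predicted = species))

# Per-class error: share of each row that falls off the diagonal
class_error <- 1 - diag(conf) / rowSums(conf)
print(class_error)

# Overall error: share of all 150 flowers that were misclassified
oob_error <- 1 - sum(diag(conf)) / sum(conf)
print(round(100 * oob_error, 2))
```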