Nauka Przyroda Technologie



Podobne dokumenty
Latent Dirichlet Allocation Models and their Evaluation IT for Practice 2016

Fig 5 Spectrograms of the original signal (top) extracted shaft-related GAD components (middle) and

deep learning for NLP (5 lectures)


Zakopane, plan miasta: Skala ok. 1: = City map (Polish Edition)

Proposal of thesis topic for mgr in. (MSE) programme in Telecommunications and Computer Science


Karpacz, plan miasta 1:10 000: Panorama Karkonoszy, mapa szlakow turystycznych (Polish Edition)

Zarządzanie sieciami telekomunikacyjnymi

DUAL SIMILARITY OF VOLTAGE TO CURRENT AND CURRENT TO VOLTAGE TRANSFER FUNCTION OF HYBRID ACTIVE TWO- PORTS WITH CONVERSION

OpenPoland.net API Documentation

Knovel Math: Jakość produktu

Wojewodztwo Koszalinskie: Obiekty i walory krajoznawcze (Inwentaryzacja krajoznawcza Polski) (Polish Edition)

EXAMPLES OF CABRI GEOMETRE II APPLICATION IN GEOMETRIC SCIENTIFIC RESEARCH

Rozpoznawanie twarzy metodą PCA Michał Bereta 1. Testowanie statystycznej istotności różnic między jakością klasyfikatorów

Tychy, plan miasta: Skala 1: (Polish Edition)


Hard-Margin Support Vector Machines

Helena Boguta, klasa 8W, rok szkolny 2018/2019

Weronika Mysliwiec, klasa 8W, rok szkolny 2018/2019

Analysis of Movie Profitability STAT 469 IN CLASS ANALYSIS #2

ARNOLD. EDUKACJA KULTURYSTY (POLSKA WERSJA JEZYKOWA) BY DOUGLAS KENT HALL

Katowice, plan miasta: Skala 1: = City map = Stadtplan (Polish Edition)

Formularz recenzji magazynu. Journal of Corporate Responsibility and Leadership Review Form

y = The Chain Rule Show all work. No calculator unless otherwise stated. If asked to Explain your answer, write in complete sentences.

DOI: / /32/37

QUANTITATIVE AND QUALITATIVE CHARACTERISTICS OF FINGERPRINT BIOMETRIC TEMPLATES

Network Services for Spatial Data in European Geo-Portals and their Compliance with ISO and OGC Standards

Akademia Morska w Szczecinie. Wydział Mechaniczny

Towards Stability Analysis of Data Transport Mechanisms: a Fluid Model and an Application

Krytyczne czynniki sukcesu w zarządzaniu projektami

TTIC 31210: Advanced Natural Language Processing. Kevin Gimpel Spring Lecture 9: Inference in Structured Prediction

ERASMUS + : Trail of extinct and active volcanoes, earthquakes through Europe. SURVEY TO STUDENTS.

OPTYMALIZACJA PUBLICZNEGO TRANSPORTU ZBIOROWEGO W GMINIE ŚRODA WIELKOPOLSKA

POLITYKA PRYWATNOŚCI / PRIVACY POLICY


Instrukcja obsługi User s manual

Machine Learning for Data Science (CS4786) Lecture 11. Spectral Embedding + Clustering

ZASTOSOWANIE MODELU GOMPERTZ A W INŻYNIERII ROLNICZEJ

MaPlan Sp. z O.O. Click here if your download doesn"t start automatically

General Certificate of Education Ordinary Level ADDITIONAL MATHEMATICS 4037/12


P R A C A D Y P L O M O W A

Rev Źródło:

KONSPEKT DO LEKCJI MATEMATYKI W KLASIE 3 POLO/ A LAYER FOR CLASS 3 POLO MATHEMATICS


Machine Learning for Data Science (CS4786) Lecture 24. Differential Privacy and Re-useable Holdout

LEARNING AGREEMENT FOR STUDIES

Wydział Fizyki, Astronomii i Informatyki Stosowanej Uniwersytet Mikołaja Kopernika w Toruniu

Miedzy legenda a historia: Szlakiem piastowskim z Poznania do Gniezna (Biblioteka Kroniki Wielkopolski) (Polish Edition)

Linear Classification and Logistic Regression. Pascal Fua IC-CVLab

Few-fermion thermometry

PORÓWNANIE ALGORYTMÓW OPTYMALIZACJI GLOBALNEJ W MODELOWANIU ODWROTNYM PROCESÓW SUSZENIA PRODUKTÓW ROLNICZYCH

Wojewodztwo Koszalinskie: Obiekty i walory krajoznawcze (Inwentaryzacja krajoznawcza Polski) (Polish Edition)

Machine Learning for Data Science (CS4786) Lecture11. Random Projections & Canonical Correlation Analysis

Egzamin maturalny z języka angielskiego na poziomie dwujęzycznym Rozmowa wstępna (wyłącznie dla egzaminującego)


Extraclass. Football Men. Season 2009/10 - Autumn round

Revenue Maximization. Sept. 25, 2018

Warsztaty Ocena wiarygodności badania z randomizacją

Wprowadzenie do programu RapidMiner, część 2 Michał Bereta 1. Wykorzystanie wykresu ROC do porównania modeli klasyfikatorów

Pielgrzymka do Ojczyzny: Przemowienia i homilie Ojca Swietego Jana Pawla II (Jan Pawel II-- pierwszy Polak na Stolicy Piotrowej) (Polish Edition)

ZGŁOSZENIE WSPÓLNEGO POLSKO -. PROJEKTU NA LATA: APPLICATION FOR A JOINT POLISH -... PROJECT FOR THE YEARS:.

ABOUT NEW EASTERN EUROPE BESTmQUARTERLYmJOURNAL

Przedmioty do wyboru oferowane na stacjonarnych studiach II stopnia (magisterskich) dla II roku w roku akademickim 2015/2016

SSW1.1, HFW Fry #20, Zeno #25 Benchmark: Qtr.1. Fry #65, Zeno #67. like

FORMULARZ REKLAMACJI Complaint Form

Emilka szuka swojej gwiazdy / Emily Climbs (Emily, #2)

TTIC 31210: Advanced Natural Language Processing. Kevin Gimpel Spring Lecture 8: Structured PredicCon 2

The data reporting such indexes for a number of years (about twelve years of such data are were fitted to a logistic curve:

tum.de/fall2018/ in2357

Sargent Opens Sonairte Farmers' Market

MS Visual Studio 2005 Team Suite - Performance Tool

Council of the European Union Brussels, 7 April 2016 (OR. en, pl)

Wojewodztwo Koszalinskie: Obiekty i walory krajoznawcze (Inwentaryzacja krajoznawcza Polski) (Polish Edition)

User s manual for icarwash

Blow-Up: Photographs in the Time of Tumult; Black and White Photography Festival Zakopane Warszawa 2002 / Powiekszenie: Fotografie w czasach zgielku

Stargard Szczecinski i okolice (Polish Edition)

Domy inaczej pomyślane A different type of housing CEZARY SANKOWSKI

SubVersion. Piotr Mikulski. SubVersion. P. Mikulski. Co to jest subversion? Zalety SubVersion. Wady SubVersion. Inne różnice SubVersion i CVS

UMOWY WYPOŻYCZENIA KOMENTARZ

DODATKOWE ĆWICZENIA EGZAMINACYJNE

Wprowadzenie do programu RapidMiner, część 5 Michał Bereta

PROGRAM STAŻU. Nazwa podmiotu oferującego staż / Company name IBM Global Services Delivery Centre Sp z o.o.

INSPECTION METHODS FOR QUALITY CONTROL OF FIBRE METAL LAMINATES IN AEROSPACE COMPONENTS

The use of objective statistical processing for presenting the data in scientific publications

Wykaz linii kolejowych, które są wyposażone w urządzenia systemu ETCS

Is there a relationship between age and side dominance of tubal ectopic pregnancies? A preliminary report

ZGŁOSZENIE WSPÓLNEGO POLSKO -. PROJEKTU NA LATA: APPLICATION FOR A JOINT POLISH -... PROJECT FOR THE YEARS:.

Ankiety Nowe funkcje! Pomoc Twoje konto Wyloguj. BIODIVERSITY OF RIVERS: Survey to students

Wykaz linii kolejowych, które są wyposażone w urzadzenia systemu ETCS

INSTRUKCJE JAK AKTYWOWAĆ SWOJE KONTO PAYLUTION

Steeple #3: Gödel s Silver Blaze Theorem. Selmer Bringsjord Are Humans Rational? Dec RPI Troy NY USA

Pro-tumoral immune cell alterations in wild type and Shbdeficient mice in response to 4T1 breast carcinomas

Raport bieżący: 44/2018 Data: g. 21:03 Skrócona nazwa emitenta: SERINUS ENERGY plc

The impact of the global gravity field models on the orbit determination of LAGEOS satellites

Dolny Slask 1: , mapa turystycznosamochodowa: Plan Wroclawia (Polish Edition)

ANKIETA ŚWIAT BAJEK MOJEGO DZIECKA

The Lorenz System and Chaos in Nonlinear DEs

Transkrypt:

Nauka Przyroda Technologie ISSN 1897-7820 http://www.npt.up-poznan.net Dział: Rolnictwo Copyright Wydawnictwo Uniwersytetu Przyrodniczego w Poznaniu 2010 Tom 4 Zeszyt 4 ALICJA SZABELSKA 1, MICHAŁ SIATKOWSKI 2, TERESA GOSZCZURNA 1, JOANNA ZYPRYCH 1 1 Department of Mathematical and Statistical Methods Poznań University of Life Sciences 2 Department of Agricultural Engineering Poznań University of Life Sciences COMPARISON OF GROWTH MODELS IN PACKAGE R Summary. In this paper we present five growth models (growth curves; growth curve models): exponential, Gompertz, logistic, log-logistic and Weibull. For each growth curve the formula and graphical representation are described. Next, we introduce in package R methods for estimation of the model parameters and goodness of fit to the data criteria: Akaike Information Criterion and Bayesian Information Criterion. This criteria permit comparing and choosing the best model. In further part an example illustrating the process of estimating parameters and choosing the best model for a given dataset is presented. Key words: Akaike Information Criterion, Bayesian Information Criterion, growth curve models, package R Introduction The growth curve models are a useful tool for modelling natural events that involve investigations of changes of response in time. In HUNT (1982) and HANUSZ et AL. (2008) are described the exemplary applications of the chosen growth curves in agricultural issues. For a given model we can consider time as a continuous variable instead of discrete one. The basic premise in growth curve models is that there exists a functional relationship between response and time. This relationship may be modelled. The models considered in this paper differ from each other by the form and number of parameters. This results in different shapes of the curves. Two questions should be answered when modelling such type of the process. Which growth curve should be used and what will be the values of parameters in the chosen model? These questions can be answered using mathematical methods, such as estimation. Five models are presented explicitly in this paper. The goal is to provide the reader with various functions in package R for this kind of analysis and a ready-made code for

2 Szabelska A., Siatkowski M., Goszczurna T., Zyprych J., 2010. Comparison of growth models in package R. Nauka identification of the best model for specific sets of data. The utilization of growth curves in package R is supported by the example. Material and methods First, five growth curves are presented with graphical representation: exponential, Gompertz, logistic, log-logistic and Weibull. To stress the flexibility of the models we display graphs for two exemplary sets of parameters for each model. Exponential model (Fig. 1): x f exp x α = 10, β = 5, γ = 25 α = 10, β = 5, γ = 25 Fig. 1. Plots of exponential model for example two sets of parameters Rys. 1. Wykres modelu wykładniczego dla dwóch przykładowych zestawów parametrów Gompertz model (Fig. 2): Logistic model (Fig. 3): Log-logistic model (Fig. 4): f x exp exp x f f x x 1 exp x 1 exp ln x ln

Szabelska A., Siatkowski M., Goszczurna T., Zyprych J., 2010. Comparison of growth models in package R. Nauka 3 α = 10, β = 5, γ = 0.08, δ = 40 α = 10, β = 5, γ = 0.08, δ = 40 Fig. 2. Plots of Gompertz model for example two sets of parameters Rys. 2. Wykres modelu Gompertza dla dwóch przykładowych zestawów parametrów α = 10, β = 5, γ = 0.15, δ = 35, ε = 0.5 α = 10, β = 5, γ = 0.15, δ = 35, ε = 0.5 Fig. 3. Plots of logistic model for example two sets of parameters Rys. 3. Wykres modelu logistycznego dla dwóch przykładowych zestawów parametrów Weibull model (Fig. 5): f x exp exp ln x ln More information about the models can be found in works by SEBER and WILD (1989) and FINNEY (1979). To be able to use these growth curves it is required to find the values of the parameters in the model. For this purpose statistical package R (R: A LANGUAGE... 2009) can be used. In package R there is package drc (RITZ and STREIBIG 2005). There exists function drm that estimates parameters for all five models introduced above. The estimation is based on Nelder-Mead, quasi-newton and conjugate-

4 Szabelska A., Siatkowski M., Goszczurna T., Zyprych J., 2010. Comparison of growth models in package R. Nauka α = 10, β = 5, γ = 7, δ = 65, ε = 1.5 α = 10, β = 5, γ = 7, δ = 65, ε = 1.5 Fig. 4. Plots of log-logistic model for example two sets of parameters Rys. 4. Wykres modelu log-logistycznego dla dwóch przykładowych zestawów parametrów α = 10, β = 5, γ = 2.5, δ = 60 α = 10, β = 5, γ = 2.5, δ = 60 Fig. 5. Plots of Weibull model for example two sets of parameters Rys. 5. Wykres modelu Weibulla dla dwóch przykładowych zestawów parametrów -gradient algorithms (NOCEDAL and WRIGHT 1999). The utilization of this function was illustrated using two artificial datasets presented in Table 1 and Table 2. Usage of drm function can be obtained in R by writing a following code: library(drc) dataset1<-read.table("dataset1.txt") result.m_k<-drm(response~time, data = dataset1, fct = model_k) summary(result.m_k) plot(result.m_k)

Szabelska A., Siatkowski M., Goszczurna T., Zyprych J., 2010. Comparison of growth models in package R. Nauka 5 Table 1. The first set of data Tabela 1. Pierwszy zbiór danych Time 10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 Response 3 4 6 6 9 16 20 24 35 42 47 64 80 93 120 Time 160 170 180 190 200 210 220 230 240 250 260 270 280 290 300 Response 133 161 174 178 192 196 206 211 213 221 224 229 229 230 232 Table 2. The second set of data Tabela 2. Drugi zbiór danych Time 1 5 9 13 17 21 25 29 33 37 41 45 49 Response 100 100 100 100 100 100 100 100 100 99 97 96 93 Time 53 57 61 65 69 73 77 81 85 89 93 97 101 Response 88 85 80 73 67 64 62 58 57 55 52 52 52 The first two lines of the code switch on the drc package and include data that will be used. In the third line the drm function is performed. The expression model_k written with italic font should be replaced with the appropriate name of a chosen growth curve model function from the drc package. The names of functions of the considered models are given in Table 3. Table 3. Names of functions of growth curve models in drc package Tabela 3. Nazwy funkcji modeli wzrostu w pakiecie drc Name of model Exponential Gompertz Logistic Log-logistic Weibull Name of function EXD.3() G.4() L.5() LL.5() W1.4() The fourth line of the code presents values of estimated coefficients with additional information about the precision of the estimation procedure. The last line generates the plot of the model with estimated values, as well as with the data points. It is also worth mentioning that drc package contains more growth curves than the presented ones. Mostly, they are similar to one of the presented models, but with the reduced number of parameters. Readers can find more about other models in help of drm function in R.

6 Szabelska A., Siatkowski M., Goszczurna T., Zyprych J., 2010. Comparison of growth models in package R. Nauka When parameters for every model are specified we have to decide which model will be used as the mathematical description of the studied process in the experiment. Since all the models can be considered from statistical point of view, we can make the decision based on some statistical criteria. We use Akaike Information Criterion AIC (AKAIKE 1974) and Bayesian Information Criterion BIC (SCHWARZ 1978) as a tool of comparison between the models. Both of these values are standard measures of goodness of fit to nonlinear data. It is considered that the model with the lowest value of AIC or alternatively BIC is the most suitable one. The AIC can be calculated in R by using of the function AIC from stats package as follows: AIC(result.m_1, result.m_2, result.m_3, result.m_4, result.m_5) The BIC can be derived with usage of the same function. However, the parameters of this function need to be used in the following form: AIC(result.m_1, result.m_2, result.m_3, result.m_4, result.m_5, k=log(nrow(dataset1))). Results The estimation of parameters for the five considered models and choice of the most suitable model was undergone for the sets of data from Table 1 and Table 2 separately. Table 4 gathers the values of the estimated parameters based on the first dataset. Table 4. The estimates of parameters for the dataset 1 Tabela 4. Estymatory parametrów dla danych nr 1 Model exponential Gompertz logistic log-logistic Weibull α = 1 738.86 β = 39.19 γ = 1 628.74 α = 9.39 β = 119.93 γ = 380 969.66 δ = 78 732.23 α = 0.03 β = 2.17 γ = 231.86 δ = 141.66 ε = 1.20 α = 6.24 β = 5.66 γ = 234.66 δ = 178.73 ε = 0.52 α = 2.72 β = 12.00 γ = 267.75 δ = 140.47 In Table 5 the results of estimation of parameters for the second dataset are presented. Five so-called empirical models were obtained using estimated parameters, which we would like to compare. Thus, we calculated AIC and BIC to find the model with the best fit. The obtained values are presented in Table 6 and Table 7. Table 6 and Table 7 reveal that the order of AIC and BIC values are the same for both datasets. This means that both coefficients indicate the same choice of the model. In the case of first dataset the lowest values of AIC, equal to 169.76, and BIC, equal to 178.17, has logistic model. However, for the second dataset log-logistic model indicates the best fit with values 53.01 for AIC and 60.56 for BIC. In Figure 6 the plots of the most accurate model with estimated parameters are presented. In addition, the plots contain the points from datasets 1 and 2, respectively.

Szabelska A., Siatkowski M., Goszczurna T., Zyprych J., 2010. Comparison of growth models in package R. Nauka 7 Table 5. The estimates of parameters for the dataset 2 Tabela 5. Estymatory parametrów dla danych nr 2 Model exponential Gompertz logistic log-logistic Weibull α = 13 964.57 β = 112.59 γ = 23 448.18 α = 0.07 β = 99.90 γ = 48.82 δ = 59.06 α = 0.15 β = 49.01 γ = 100.18 δ = 54.71 ε = 0.42 α = 7.18 β = 48.59 γ = 100.07 δ = 61.23 ε = 0.78 α = 5.05 β = 52.97 γ = 100.50 δ = 68.10 Table 6. The values of AIC and BIC coefficients for the dataset 1 Tabela 6. Wartości współczynników AIC i BIC dla danych nr 1 Criterion Model exponential Gompertz logistic log-logistic Weibull AIC 268.92 363.79 169.76 170.71 204.91 BIC 274.52 370.80 178.17 179.12 211.92 Table 7. The values of AIC and BIC coefficients for the dataset 2 Tabela 7. Wartości współczynników AIC i BIC dla danych nr 2 Criterion Model exponential Gompertz logistic log-logistic Weibull AIC 177.98 57.98 54.01 53.01 78.89 BIC 183.02 64.27 61.56 60.56 85.18 Fig. 6. Models with the best fit for the dataset 1 and the dataset 2 Rys. 6. Najlepiej dopasowane modele dla danych nr 1 i danych nr 2

8 Szabelska A., Siatkowski M., Goszczurna T., Zyprych J., 2010. Comparison of growth models in package R. Nauka Conclusions Considered five growth curves showed differential flexibility in representation of the data. The results showed that when modelling data from Table 1 and Table 2, logistic or log-logistic model should be used. For both of these models there is small difference in values of AIC and BIC. This result suggests that both of the curves could be chosen as satisfactory. Gompertz and Weibull models could also be acceptable in some cases. However, simple structure of the exponential model can lead to weak fit to data compromising its practical usefulness in this kind of problems. The methods introduced in Material and methods section show the procedure of finding the most suitable model (with the appropriate parameters) to the data. Furthermore, the statistical issues of estimation of parameters and the choice of the model can be successfully and easily overtaken by usage of package R. In our opinion the proposed algorithm for detecting the description of the growth process with suitable growth curve model may find practical application in automating such kinds of analysis. Acknowledgments The authors wish to thank for the reviewers comments and suggestions that improved their manuscript. References AKAIKE H., 1974. A new look at the statistical model identification. IEEE Trans. Autom. Control 19: 716-723. FINNEY D.J., 1979. Bioassay and the practise of statistical inference. Int. Stat. Rev. 47: 1-12. HANUSZ Z., SIARKOWSKI Z., OSTROWSKI K., 2008. Zastosowania modelu Gompertza w inżynierii rolniczej. Inż. Roln. 7: 105. HUNT R., 1982. Plant growth curves: the functional approach to plant growth analysis. Arnold, London. NOCEDAL J., WRIGHT S.J., 1999. Numerical optimization. Springer, New York. R: A LANGUAGE and environment for statistical computing. 2009. R Development Core Team. R Foundation for Statistical Computing, Vienna, Austria. [http://www.r-project.org]. RITZ C., STREIBIG J.C., 2005. Bioassay analysis using R. J. Stat. Softw. 12, 5: 1-22. SCHWARZ G., 1978. Estimating the dimension of a model. Ann. Stat. 6: 461-464. SEBER G.A.F., WILD C.J., 1989. Nonlinear regression. Wiley, New York. PORÓWNANIE KRZYWYCH WZROSTU W PAKIECIE R Streszczenie. W pracy przedstawiono pięć modeli krzywych wzrostu: wykładniczy, Gompertza, logistyczny, log-logistyczny i Weibulla. Korzystając z ogólnodostępnego pakietu statystycznego R, w szczególności z pakietu drc, przeprowadzono estymację parametrów rozpatrywanych modeli krzywych wzrostu. Zaprezentowano w ten sposób funkcje dopasowujące do danych. Ponadto wyznaczono wartości współczynników AIC oraz BIC, określające miarę dopasowania modeli do

Szabelska A., Siatkowski M., Goszczurna T., Zyprych J., 2010. Comparison of growth models in package R. Nauka 9 danych empirycznych, umożliwiając wskazanie modelu najlepszego. Otrzymane wyniki przedstawiono w postaci graficznej. Słowa kluczowe: Informacyjne Kryterium Akaike, Bayesowskie Kryterium Informacyjne Schwarza, krzywe wzrostu, pakiet R Corresponding address Adres do korespondencji: Alicja Szabelska, Katedra Metod Matematycznych i Statystycznych, Uniwersytet Przyrodniczy w Poznaniu, ul. Wojska Polskiego 28, 60-637 Poznań, Poland, e-mail: aszab@up.poznan.pl Accepted for print Zaakceptowano do druku: 18.05.2010 For citation Do cytowania: Szabelska A., Siatkowski M., Goszczurna T., Zyprych J., 2010. Comparison of growth models in package R. Nauka