XLIV Conference on Mathematical Statistics in Będlewo, 3-7 December 2018
Organizers
Banach Center, Institute of Mathematics of the Polish Academy of Sciences
Faculty of Mathematics and Computer Science, Nicolaus Copernicus University in Toruń
Komisja Statystyki Komitetu Matematyki Polskiej Akademii Nauk

Sponsors
Banach Center, Institute of Mathematics of the Polish Academy of Sciences
Działalność Upowszechniająca Naukę, Polska Akademia Nauk
Faculty of Mathematics and Computer Science, Nicolaus Copernicus University in Toruń
Institute of Mathematics of the Polish Academy of Sciences
Polski Koncern Naftowy ORLEN Spółka Akcyjna

Scientific Committee
prof. Teresa Ledwina, Institute of Mathematics of the Polish Academy of Sciences
prof. Jan Mielniczuk, Institute of Computer Science of the Polish Academy of Sciences
prof. Wojciech Niemiro, University of Warsaw, Nicolaus Copernicus University in Toruń

Organizing Committee
dr hab. Aleksander Zaigrajew (Chair)
dr Agnieszka Goroncy
dr Krzysztof Jasiński
dr Wojciech Rejchel

Editor
Krzysztof Jasiński
Contents
1 Conference programme
2 Invited Speaker
3 Abstracts
4 List of Participants
Conference programme
Monday, 3rd of December

8:00  Breakfast
9:00-9:10  Opening of the conference
9:10-9:55  Krzysztof Łatuszyński: Bayesian Inference in Intractable Likelihood Models, part I
9:55-10:10  Coffee break
10:10-10:30  Teresa Ledwina: Intermediate efficiency of some weighted goodness-of-fit statistics
10:30-10:50  Dominik Szynal, Waldemar Wołyński: On Families of Tests for Uniformity
10:50-11:10  Hryhorii Chereda, Tim Beissbarth, Frank Kramer: Graph-based Convolutional Neural Networks for analyzing pathways in cancer
11:10-11:40  Coffee break
11:40-12:00  Maryia Shpak: Structure learning for Continuous Time Bayesian Networks via penalized maximum likelihood methods
12:00-12:20  John Noble, Łukasz Rajkowski: Properties of the MAP estimator in a selection of Bayesian mixture models
12:20-12:40  Małgorzata Łazęcka: The random projection method in the search for sparse signal representations
13:00  Lunch
15:30-15:50  Nikolay I. Nikolov: Distance-based models for imperfect ranking in ranked set sampling
15:50-16:10  Dominik Ambroziak: Improving graphical lasso
16:10-16:40  Coffee break
16:40-17:00  Adam Mieldzioc: Estimation of a covariance matrix with a given structure, illustrated with metabolomic data
17:00-17:20  Mateusz Wilk, Aleksander Zaigrajew: Optimal designs for heteroscedastic RCR models
18:00  Dinner
Tuesday, 4th of December

8:00  Breakfast
9:00-9:45  Krzysztof Łatuszyński: Bayesian Inference in Intractable Likelihood Models, part II
9:45-10:00  Coffee break
10:00-10:20  Mateusz John: Comparison of estimators of Toeplitz-structured covariance matrices in a high-dimensional multivariate model
10:20-10:40  Krzysztof Rudaś: Shrinkage estimators in uplift modeling
10:40-11:00  Monika Mokrzycka: Comparison of algorithms for computing maximum likelihood and minimum loss estimators in doubly multivariate models
11:00-11:30  Coffee break
11:30-11:50  Piotr Pawlas, Dominik Szynal: Characteristics and properties of k-th record values from normal distribution
11:50-12:10  Krzysztof Jasiński: Characterizations of geometric distributions based on discrete records
12:10-12:30  Agnieszka Piliszek: When does the last double record occur?
13:00  Lunch
15:00-15:20  Tadeusz Bednarski, Piotr Nowak: Scale Fisher consistency of partial likelihood estimator for regression parameters in the Cox model with arbitrary frailty
15:20-15:40  Michał Biel: Pointwise confidence intervals in the inverse regression model with a convolution operator
15:40-16:00  Jakub Wojdyła: Nonparametric bootstrap confidence bands for unfolding sphere size distributions
16:00-16:30  Coffee break
16:30-16:50  Mariusz Kubkowski, Jan Mielniczuk: Procedure of selection in binary model for general loss function and random predictors
16:50-17:10  Michał Kos: Asymptotic Control of False Discovery Rate in Linear Regression via SLOPE
17:10-17:30  Tomasz Górecki, Mirosław Krzyśko, Waldemar Wołyński: Regularized generalized canonical correlation analysis for functional data
18:00  Dinner
19:00  Meeting of committee
Wednesday, 5th of December

7:30-8:30  Breakfast
8:30-14:30  Excursion
14:30-15:30  Lunch
15:30-16:15  Krzysztof Łatuszyński: Bayesian Inference in Intractable Likelihood Models, part III
16:15-16:45  Coffee break
16:45-17:05  Jan Mielniczuk, Mariusz Kubkowski: Distributions of Interaction Information
17:05-17:25  Piotr Pokarowski: Prediction error of penalized least squares for linear models
17:25-17:45  Wojciech Rejchel: Rank-based model selection
19:00  Conference Dinner
Thursday, 6th of December

8:00  Breakfast
9:00-9:45  Krzysztof Łatuszyński: Bayesian Inference in Intractable Likelihood Models, part IV
9:45-10:00  Coffee break
10:00-10:20  Małgorzata Bogdan: On the properties of thresholded LASSO
10:20-10:40  Konrad Furmańczyk: Estimation of autocovariance matrix for high dimensional linear process
10:40-11:00  Joanna Karłowska-Pik: Development of a system for defining human appearance through DNA analysis
11:00-11:30  Coffee break
11:30-11:50  Mariusz Bieniek: Some properties of lifetime distributions of reliability systems
11:50-12:10  Anna Dembińska, Agnieszka Goroncy: Moments of discrete lifetimes of reliability systems with DNID components
12:10-12:30  Magdalena Szymkowiak: Support dependent generalized aging intensities
13:00  Lunch
15:00-15:20  Tomasz Rychlik: Verifying stochastic orderings of system lifetimes
15:20-15:40  Wojciech Niemiro: Fixed relative precision estimators of growth rate for compound Poisson and Lévy processes
15:40-16:00  Błażej Miasojedow: The Wasserstein Distance as a Dissimilarity Measure for Mass Spectra with Application to Spectral Deconvolution
16:00-16:30  Coffee break
16:30-16:50  Magdalena Alama-Bućko, Aleksander Zaigrajew: Optimal confidence regions for location and scale parameters of Gumbel and Burr distributions based on k-th record values
16:50-17:10  Agnieszka Kulawik: Robust estimation and its application to a classification method
17:10-17:30  Piotr Sulewski: Modification of Anderson-Darling goodness-of-fit test of normality
18:00-19:00  Dinner
19:00  Concert
Friday, 7th of December

8:00  Breakfast
9:00-9:20  Przemysław Grzegorzewski: On some dispersion measures for random intervals
9:20-9:40  Augustyn Markiewicz: Linear Prediction Sufficiency in the Linear Model
9:40-10:00  Mariusz Grządziel: On computing REML estimators of variance components in linear mixed models
10:00-10:20  Coffee break
10:20-10:40  Marek Męczarski: On estimation of the adjustment coefficient in actuarial statistics
10:40-11:00  Grzegorz Wyłupek: Data-Driven Kaplan-Meier One-Sided Two-Sample Tests
11:00-11:20  Karol Opara: Estimation of parameters of mechanical suspension models
11:20-11:30  Closing of the conference
12:00  Lunch
2018 AWARDS FOR YOUNG PARTICIPANTS

1. Michał Biel, Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie
2. Mariusz Kubkowski, Politechnika Warszawska i Instytut Podstaw Informatyki PAN
3. Małgorzata Łazęcka, Politechnika Warszawska
4. Agnieszka Piliszek, Politechnika Warszawska
5. Łukasz Rajkowski, Uniwersytet Warszawski
6. Krzysztof Rudaś, Politechnika Warszawska
7. Jakub Wojdyła, Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie
Invited Speaker
Krzysztof Łatuszyński obtained MSc (2002) and PhD (2008, advisor Wojciech Niemiro) degrees in Mathematics from the University of Warsaw and an MSc degree in Econometrics from the Warsaw School of Economics. Between 2008 and 2012 he held Research Associate positions at the Universities of Warwick and Toronto. Since 2012 he has worked as Assistant Professor and, since 2016, Associate Professor at the Department of Statistics, University of Warwick. He is also a Faculty Fellow of the Alan Turing Institute in London. His research is funded by the Royal Society through the Royal Society University Research Fellowship (2014-2019). He works in the area of computational statistics, with a main focus on the theory and methodology of Markov chain Monte Carlo and related algorithms.
Bayesian Inference in Intractable Likelihood Models
Krzysztof Łatuszyński
Alan Turing Institute, Department of Statistics, University of Warwick

Complex statistical models with an intractable likelihood function are becoming ever more common in applied statistics. Unfortunately, vanilla MCMC procedures used for posterior sampling and Bayesian inference require pointwise evaluations of the likelihood and are not applicable in such settings. I will briefly review the standard toolbox for MCMC sampling with intractable likelihoods to introduce pseudo-marginal methods and ABC, among others. Next I will focus on an approach that involves unbiased estimators of the likelihood and the application of Bernoulli Factory type algorithms to circumvent the issue. The Bernoulli Factory is a fundamental problem about algorithmic processing of randomness that dates back to John von Neumann. Celebrated by probabilists and computer scientists, it has recently attracted the attention of statisticians. I will present MCMC-motivated solutions to the Bernoulli Factory problem and discuss its extensions. The lectures will be illustrated with examples of Bayesian inference for complex diffusion processes, such as jump diffusions or Markov switching diffusions, as well as with Bayesian model selection.
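As an illustration only, and not one of the algorithms announced in the lectures, the simplest Bernoulli Factory is von Neumann's classical trick for turning a coin with unknown bias p into a fair coin, i.e. the target function f(p) = 1/2. The sketch below assumes a hypothetical helper `biased_coin` that stands in for the black-box source of randomness; the names and the chosen bias 0.8 are assumptions made for the example.

```python
import random

def biased_coin(p, rng):
    """Hypothetical black-box coin: returns 1 with probability p, else 0."""
    return 1 if rng.random() < p else 0

def von_neumann_fair_coin(flip):
    """Flip the coin in pairs until the two outcomes differ; return the first.

    The pairs (1, 0) and (0, 1) both have probability p * (1 - p),
    so the output is fair whatever the unknown p in (0, 1) is.
    """
    while True:
        a, b = flip(), flip()
        if a != b:
            return a

rng = random.Random(0)
flip = lambda: biased_coin(0.8, rng)   # the algorithm never looks at 0.8
n = 20000
freq = sum(von_neumann_fair_coin(flip) for _ in range(n)) / n
print(freq)  # empirical frequency of ones, expected to be close to 0.5
```

The algorithm uses only the outcomes of the flips, never the value of p, which is the defining constraint of the Bernoulli Factory problem.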
Abstracts
Optimal confidence regions for location and scale parameters of Gumbel and Burr distributions based on k-th record values
Magdalena Alama-Bućko (1), Aleksander Zaigrajew (2)
(1) Institute of Mathematics and Physics, UTP University of Science and Technology in Bydgoszcz
(2) Faculty of Mathematics and Computer Science, Nicolaus Copernicus University in Toruń

We consider a family of distributions with location and scale parameters and the problem of estimating these parameters by strong confidence regions of the smallest area. The construction of confidence regions based on linear combinations of two or more order statistics was presented in [1,2]. Here, two k-th record values are used instead of order statistics to construct confidence regions for the location and scale parameters of Gumbel and Burr distributions. The results are compared with those based on order statistics.

[1] A. Zaigraev, M. Alama-Bućko (2013), On optimal choice of order statistics in large samples for the construction of confidence regions for the location and scale, Metrika 76, 577-593
[2] A. Zaigraev, M. Alama-Bućko (2018), Optimal choice of order statistics under confidence region estimation in case of large samples, Metrika 81, 283-305
Improving graphical lasso
Dominik Ambroziak

I will talk about methods for selecting sparse conditional-dependence graphs for Gaussian random variables. In recent years two main approaches to this task have been developed. The first is based on selecting the neighbourhood of every vertex [2]. The second requires estimation of a precision matrix (inverse covariance matrix), which in the case of Gaussian random vectors determines the structure of the conditional-dependence graph. The second approach is implemented in the graphical lasso method (GL), which finds an estimator minimizing the negative log-likelihood with an $\ell_1$ penalty [1]. I will also present an original method for sparse graph selection, based on the assumption that GL selects a supergraph of the true dependency graph. In this method, after selecting a graph with GL, we search for a graph that minimizes the negative log-likelihood with an $\ell_0$ penalty in a family of subgraphs chosen similarly to the way proposed for linear model selection [3]. The results were obtained in collaboration with P. Pokarowski.

[1] Friedman, J., Hastie, T., Tibshirani, R. (2008), Sparse inverse covariance estimation with the graphical lasso, Biostatistics 9(3), 432-441
[2] Meinshausen, N., Bühlmann, P. (2006), High-dimensional graphs and variable selection with the lasso, The Annals of Statistics 34(3), 1436-1462
[3] Pokarowski, P., Mielniczuk, J. (2015), Combined l1 and greedy l0 penalized least squares for linear model selection, Journal of Machine Learning Research 16(5), 961-992
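For orientation, the graphical lasso criterion referred to in [1] can be sketched as follows. The notation is an assumption introduced here and does not come from the abstract: $S$ is the sample covariance matrix, $\Theta$ a candidate precision matrix, $\lambda \ge 0$ a tuning parameter, and $\|\Theta\|_1$ the sum of absolute values of the entries of $\Theta$ (some variants penalize only the off-diagonal entries).

```latex
\hat{\Theta}_{\mathrm{GL}}
  = \arg\min_{\Theta \succ 0}
    \Big\{ -\log\det\Theta + \operatorname{tr}(S\Theta) + \lambda \,\|\Theta\|_1 \Big\},
\qquad
\|\Theta\|_1 = \sum_{j,k} |\Theta_{jk}| .
```

The zero pattern of the off-diagonal part of $\hat{\Theta}_{\mathrm{GL}}$ defines the selected graph. The second stage described above replaces the $\ell_1$ penalty with an $\ell_0$ penalty over subgraphs of this graph; the exact form of that penalty and of the family of subgraphs is not given in the abstract and is not reproduced here.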
Pointwise confidence intervals in the inverse regression model with a convolution operator
Michał Biel
Wydział Matematyki Stosowanej, Akademia Górniczo-Hutnicza w Krakowie

We consider the problem of constructing asymptotic pointwise confidence intervals in the inverse regression model with a convolution operator. The problem was studied in [1]. Filling substantial gaps in the proofs made it possible to modify the bootstrap confidence intervals. The corrected confidence intervals give considerably better simulation results than the original ones, whose correctness appears doubtful.

[1] N. Bissantz, M. Birke (2009), Asymptotic normality and confidence intervals for inverse regression models with convolution-type operators, Journal of Multivariate Analysis 100(10), 2364-2375
Some properties of lifetime distributions of reliability systems
Mariusz Bieniek
Instytut Matematyki, Uniwersytet Marii Curie-Skłodowskiej, Lublin

The talk will discuss selected properties of the lifetime distributions of coherent systems in a model where component lifetimes depend on previous failures. Using the VDP property of the densities of generalized order statistics, we examine unimodality and bimodality, logarithmic convexity, and the preservation of the IFR property by the lifetime distribution of the whole system. In particular, we present results for classical coherent systems, in which the component lifetimes are independent and identically distributed.

[1] M. Bieniek, M. Burkschat (2018), On unimodality of the lifetime distribution of coherent systems with failure-dependent component lifetimes, J. Appl. Prob. 55, 473-487
[2] M. Bieniek, M. Burkschat, T. Rychlik (2018), Conditions on unimodality and logconcavity for densities of coherent systems with an application to Bernstein operators, J. Math. Anal. Appl. 467, 863-873
On the properties of thresholded LASSO
Małgorzata Bogdan
Uniwersytet Wrocławski

We will present new asymptotic theoretical results illustrating the very good selection properties of the thresholded LASSO. We will consider a setup with a fixed design matrix and increasing signal magnitude, and a scenario with a random Gaussian design matrix in the asymptotic setup of Approximate Message Passing theory. We will also discuss the application of the knockoff methodology to selecting the appropriate threshold. This is joint work with Emmanuel J. Candes, Weijie Su, Patrick Tardivel and Asaf Weinstein.
Graph-based Convolutional Neural Networks for analyzing pathways in cancer
Hryhorii Chereda (1), Tim Beissbarth (1), Frank Kramer (2)
(1) University Medical Center Goettingen, Institute of Medical Bioinformatics
(2) University of Augsburg, IT-Infrastructure for Translational Medical Research

Gene expression data is commonly available in cancer research and represents a snapshot of the current status of a specific tumor tissue. This high-dimensional data can be analyzed in order to predict diagnoses and prognoses and to suggest treatment plans. Biological pathways are represented by directed graphs detailing interactions between molecules, and gene expression data can be assigned to the vertices of these pathways.

In recent years deep learning has been applied to a wide range of problems in various areas. Deep learning tools such as convolutional neural networks (CNNs) have already shown prominent results in applications such as visual object recognition, object detection and speech recognition. Furthermore, CNNs have been applied to bioinformatic challenges such as predicting the effects of mutations in non-coding DNA on gene expression and disease. Recently, CNNs have been extended to graph-structured data. We plan to map gene-expression data to the vertices of biological pathways and feed this graph-structured data into CNNs in order to classify patients.

The usual CNN architecture consists of three types of layers: convolutional layers, pooling layers, and fully connected layers. The first two types exploit the structure of the data to prepare informative features for the fully connected layers. In our work, we consider three popular but different approaches developed for applying CNNs to graph-structured data. Our research aims to compare these approaches in order to address the question of whether graph-based CNNs can provide valuable classification improvements by utilizing prior pathway knowledge.

Preliminary results show that using the WNT signaling pathways as prior knowledge does not seem to improve the performance of the classifier in the case of breast cancer patients. Hence, future work will concern different ways of integrating the prior knowledge.
Estimation of autocovariance matrix for high dimensional linear process
Konrad Furmańczyk
Department of Applied Mathematics, Warsaw University of Life Sciences

Under some mild restrictions, the rate of the error bounds for estimators of $p \times p$ autocovariance matrices for a high dimensional linear process is given. We show that these estimators are consistent in the operator norm in the sub-Gaussian case when $p = O(n^{\gamma/2})$ for some $\gamma > 1$, and in the general case when $p^{2/\beta}\sqrt{\log p / n} \to 0$ as $p = p(n) \to \infty$ and the sample size $n \to \infty$. Our results are compared with the results obtained by Bickel and Levina for independent data and by Chen et al. and Bhattacharjee and Bose for dependent data. Additionally, we present non-asymptotic bounds for the error probabilities of those estimators.

[1] M. Bhattacharjee and A. Bose (2014), Estimation of autocovariance matrices for infinite dimensional vector linear process, J. Time Ser. Anal. 35, 262-281
[2] P. J. Bickel and E. Levina (2008), Covariance regularization by thresholding, Ann. Stat. 36, 2577-2604
[3] P. J. Bickel and E. Levina (2008), Regularized estimation of large covariance matrices, Ann. Stat. 36, 199-227
[4] X. Chen, M. Xu and W. B. Wu (2013), Covariance and precision matrix estimation for high-dimensional time series, Ann. Stat. 41, 2994-3021
[5] K. Furmańczyk (2018), Estimation of autocovariance matrix for high dimensional linear process, Preprint
Moments of discrete lifetimes of reliability systems with DNID components
Agnieszka Goroncy (1), Anna Dembińska (2)
(1) Faculty of Mathematics and Computer Science, Nicolaus Copernicus University in Toruń
(2) Faculty of Mathematics and Information Science, Warsaw University of Technology

We consider a set of possibly dependent and not necessarily identically distributed discrete random variables. For this setting we compute moments of the respective order statistics and present applications of this result to reliability theory. We introduce a method for establishing the expected lifetimes of coherent systems consisting of possibly dependent and nonhomogeneous components. We focus on the case of systems whose component lifetimes have multivariate geometric distributions.

[1] K. Davies and A. Dembińska (2018), Computing moments of discrete order statistics from non-identical distributions, J. Comput. Appl. Math. 328, 340-354
[2] A. Dembińska (2018), On reliability analysis of k-out-of-n systems consisting of heterogeneous components with discrete lifetimes, IEEE Transactions on Reliability 99, 1-13
[3] A. Dembińska and A. Goroncy (2018), Moments of discrete lifetimes of coherent systems with possibly dependent and nonhomogeneous components, Manuscript
[4] J.D. Esary and A.W. Marshall (1973), Multivariate geometric distributions generated by a cumulative damage process, Naval Postgraduate School Technical Report, NPS55EY73041A
[5] J. Navarro, J.M. Ruiz, C.J. Sandoval (2007), Properties of coherent systems with dependent components, Commun. Stat.-Theory Methods 36, 175-191
Regularized generalized canonical correlation analysis for functional data
Tomasz Górecki, Mirosław Krzyśko, Waldemar Wołyński
Wydział Matematyki i Informatyki, Uniwersytet im. Adama Mickiewicza w Poznaniu

Generalized canonical correlation analysis (GCANO) is a method that allows several data sets to be analyzed jointly through dimension reduction (Carroll (1968)). The basic problem of GCANO is the construction of a series of components that maximize the dependence between multiple data sets. The method will be presented for multivariate functional data. Multivariate functional data (Ramsay & Silverman (2005), Horváth & Kokoszka (2012)) are understood as realizations of multivariate random processes. The methodology will be illustrated on real data.

[1] J.D. Carroll (1968), A generalization of canonical correlation analysis to three or more sets of variables, In: Proceedings of the 76th Annual Convention of the American Psychological Association, 227-228
[2] L. Horváth, P. Kokoszka (2012), Inference for Functional Data with Applications, Springer, New York
[3] A. Markos, A.I. D'Enza (2016), Incremental generalized canonical correlation analysis, In: Analysis of Large and Complex Data, Studies in Classification, Data Analysis, and Knowledge Organization, 185-194
[4] J.O. Ramsay, B.W. Silverman (2005), Functional Data Analysis, Springer, New York
On computing REML estimators of variance components in linear mixed models
Mariusz Grządziel
Wrocław University of Environmental and Life Sciences

The REML likelihood function in linear mixed models may have multiple local maxima; compare [1] and [3]. There are several approaches to computing its global maximum that can be applied in some special models: the branch-and-bound type algorithm proposed by Lavine et al. [4] and the algebraic methods in which the problem of computing the REML estimator reduces to finding all roots of a certain polynomial (or system of polynomial equations); compare [1] and [2]. Their efficiency will be illustrated with examples.

[1] E. Gross, M. Drton, S. Petrović (2012), Maximum likelihood degree of variance component models, Electronic Journal of Statistics 6, 993-1016
[2] M. Grządziel (2016), On the maximum likelihood degree of linear mixed models with two variance, preprint arXiv:1608.08789v2
[3] L. Henn, J. S. Hodges (2014), Multiple local maxima in restricted likelihoods and posterior distributions for mixed linear models, International Statistical Review 82, 90-105
[4] M. Lavine, A. Bray, J. S. Hodges (2015), Approximately exact calculations for linear mixed models, Electronic Journal of Statistics 9, 2293-2323
On some dispersion measures for random intervals
Przemysław Grzegorzewski
Faculty of Mathematics and Information Science, Warsaw University of Technology

Measures of dispersion play a key role in both descriptive and inferential statistics. However, most contributions on measures of dispersion are limited to univariate real data. Recently, interval-valued data have drawn increasing interest. Quite often a real random variable is imprecisely observed, or is so uncertain that the results are recorded as real intervals containing the true outcomes of the experiment (epistemic view). Sometimes the experimental data appear as essentially interval-valued data describing precise information (e.g. ranges of fluctuations of some physical measurements, or the time interval spanned by an activity). Such intervals correspond to the ontic view.

In the epistemic approach we assume that the interval-valued observations are perceptions of the unknown, unobserved true outcomes of a real-valued random variable. In the ontic approach, on the other hand, we no longer deal with usual real-valued random variables but with random intervals. In what follows, we restrict our attention to samples created as realizations of random intervals.

Although central tendency measures for random intervals have been extensively examined in the literature, it seems that the only measures of dispersion considered in this context have been the sample variance and standard deviation. The lack of measures based on quantiles, like the range or interquartile range, can be partly explained by the fact that interval data are not linearly ordered. However, using a suitable interpretation of these measures, we can generalize them to random intervals as well. Hence, our goal is to propose a generalization of the range and the interquartile range that can be used to characterize the dispersion in a sample of random intervals (see Grzegorzewski, 2019).

[1] P. Grzegorzewski (2019), Measures of dispersion for interval data, In: Destercke S. et al. (Eds.), Uncertainty Modelling in Data Science, Springer, pp. 91-98
Characterizations of geometric distributions based on discrete records
Krzysztof Jasiński
Faculty of Mathematics and Computer Science, Nicolaus Copernicus University in Toruń

We present characterization theorems based on discrete records. It should be noted that the distribution theory of records is more complicated when the parent distribution is discrete than in the continuous case, because of the presence of ties among observations. We obtain characterizations based on kth records (see Jasiński (2018a)) and weak kth records (see Jasiński (2018b)), respectively. We give conditions involving certain properties of these records that identify the associated cumulative distribution function F. We extend the results of Dembińska (2008) and Dembińska and López-Blázquez (2005).

[1] A. Dembińska (2008), kth records from geometric distribution, Statist. Probab. Lett. 78, 1662-1670
[2] A. Dembińska, F. López-Blázquez (2005), A characterization of geometric distribution through kth weak records, Commun. Statist. Theor. Meth. 34, 2345-2351
[3] K. Jasiński (2018a), Characterizations of geometric distributions based on kth record values, Statistics 52, 1116-1127
[4] K. Jasiński (2018b), Weak kth records from geometric distribution and some characterizations, Commun. Stat. Theor. Meth., doi: 10.1080/03610926.2018.1529245
Comparison of estimators of Toeplitz-structured covariance matrices in a high-dimensional multivariate model
Mateusz John
Instytut Matematyki, Politechnika Poznańska

We consider an experimental model in which many characteristics are measured on each object, or a single characteristic is measured at many time points. We assume that the covariance matrix for the characteristics/time points has a Toeplitz structure. The estimation of such a covariance structure was considered, among others, by Cui et al. (2016) and Filipiak et al. (2018). The aim of this work is to compare estimators of Toeplitz-structured covariance matrices obtained by projection onto the cone of nonnegative definite matrices (see Filipiak et al., 2018) with estimators obtained by the shrinkage method. We allow for the situation in which the sample size does not exceed the number of characteristics/time points.

[1] X. Cui, C. Li, J. Zhao, L. Zeng, D. Zhang, J. Pan (2016), Covariance structure regularization via Frobenius norm discrepancy, Linear Algebra Appl. 510, 124-145
[2] K. Filipiak, A. Markiewicz, A. Mieldzioc, A. Sawikowska (2018), On projection of a positive definite matrix on a cone of nonnegative definite Toeplitz matrices, Electron. J. Linear Algebra 33, 74-82
Development of a system for defining human appearance through DNA analysis
Joanna Karłowska-Pik
Nicolaus Copernicus University in Toruń

The presentation will discuss the stages and statistical methods used in designing a system that defines human appearance on the basis of DNA material. The research was carried out in cooperation with the group led by prof. Wojciech Branicki and dr Ewelina Pośpiech from the Malopolska Centre of Biotechnology of the Jagiellonian University.

The first problem concerned the initial selection of about 1000 SNPs that determine features of human appearance such as hair morphology, hair, skin and eye colour, and the occurrence of premature baldness or hair graying, based on a database with a small number of records and a huge number of SNPs. For the SNP preselection, single-marker tests and the HyperLASSO and MOSGWA algorithms were used. The second task was to create rankings of the most informative SNPs based on a new database with pre-selected SNPs and a higher number of records, taking into account interactions between SNPs, and to choose the variables for the models. Finally, for each feature several MLP models were built and tested to select the best ones. This made it possible to discover some new SNPs that determine human appearance and have not been considered so far.

[1] F. Frommlet, M. Bogdan, D. Ramsey (2016), Phenotypes and Genotypes. The Search for Influential Genes, Springer-Verlag, London
[2] N. De Jay et al. (2013), mRMRe: an R package for parallelized mRMR ensemble feature selection, Bioinformatics 29(18), 2365-2368
[3] P. Romański, L. Kotthoff, FSelector: Selecting Attributes, https://cran.r-project.org/web/packages/fselector/index.html
Asymptotic Control of False Discovery Rate in Linear Regression via SLOPE
Michał Kos
Uniwersytet Wrocławski

Sorted L-One Penalized Estimator (SLOPE) is a relatively new convex optimization procedure for identifying important predictors in high-dimensional multiple regression models. When the design matrix $X_{n \times p}$ is orthogonal, SLOPE with the sequence of tuning parameters selected according to Benjamini-Hochberg (BH) controls the False Discovery Rate (FDR). In this talk we will present new results illustrating asymptotic FDR control by SLOPE when the elements of the design matrix are i.i.d. variables from a normal distribution. We will discuss both the low dimensional set-up, where p is fixed and n goes to infinity, and the high dimensional set-up, where p may diverge to infinity much more quickly than n. We will illustrate our asymptotic results with computer simulations.

[1] M. Bogdan, E. van den Berg, C. Sabatti, W. Su, E. Candes (2015), SLOPE - Adaptive Variable Selection via Convex Optimization, Annals of Applied Statistics 9(3), 1103-1140
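To make the BH-type tuning sequence concrete, here is a minimal sketch of the sequence used for SLOPE in [1], $\lambda_i = \sigma\,\Phi^{-1}(1 - iq/(2p))$, together with the sorted $\ell_1$ penalty $\sum_i \lambda_i |b|_{(i)}$. The function names, the FDR level q = 0.1 and the toy coefficient vector are assumptions made for the example; the sketch only evaluates the penalty and does not solve the SLOPE optimization problem.

```python
from statistics import NormalDist

def bh_lambda_sequence(p, q, sigma=1.0):
    """BH-type sequence: lambda_i = sigma * Phi^{-1}(1 - i*q/(2p)), i = 1..p."""
    inv_cdf = NormalDist().inv_cdf
    return [sigma * inv_cdf(1.0 - i * q / (2.0 * p)) for i in range(1, p + 1)]

def sorted_l1_norm(b, lambdas):
    """Sorted L1 penalty: sum_i lambda_i * |b|_(i), with |b|_(1) >= ... >= |b|_(p)."""
    magnitudes = sorted((abs(x) for x in b), reverse=True)
    return sum(lam * m for lam, m in zip(lambdas, magnitudes))

# Toy example (hypothetical values): p = 5 predictors, target FDR level q = 0.1.
lambdas = bh_lambda_sequence(p=5, q=0.1)
b = [0.0, -3.0, 1.5, 0.0, 0.5]
penalty = sorted_l1_norm(b, lambdas)
print(lambdas)
print(penalty)
```

Because the sequence is non-increasing, the largest coefficient in absolute value receives the largest penalty weight, which is what links SLOPE to the step-up thresholds of the BH procedure.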
Procedure of selection in binary model for general loss function and random predictors
Mariusz Kubkowski, Jan Mielniczuk
Warsaw University of Technology, Institute of Computer Science of the Polish Academy of Sciences

We consider a random sample $(X_i^{(n)}, Y_i^{(n)}) \in \mathbb{R}^{p_n} \times \{0, 1\}$ for $i = 1, \ldots, n$, with possibly $p_n > n$. The sample satisfies $P(Y_i^{(n)} = 1 \mid X_i^{(n)}) = q(X_i^{(n)})$, where $q : \mathbb{R}^{p_n} \to (0, 1)$. We discuss the problem of finding a consistent estimator $\hat{\beta}_n \in \mathbb{R}^{p_n}$ of $\beta_n$, where $\beta_n$ minimizes the risk function $R_n(b) = E\rho(b^T X_1^{(n)}, Y_1^{(n)})$ for $b \in \mathbb{R}^{p_n}$ and $\rho : \mathbb{R} \times \{0, 1\} \to \mathbb{R}$. In the talk we will show conditions for the existence and uniqueness of $\beta_n$. Moreover, we will present a two-stage procedure for finding $\hat{\beta}_n$ as above and present results on its consistency. In particular, a probabilistic bound on the $\ell_1$ distance $\|\hat{\beta}_L - \beta_n\|_1$ for the Lasso estimate, needed for its screening consistency, will be discussed. The bound involves the Lipschitz constant $L$ of the loss, the subgaussianity parameter $s_n$ and the restricted eigenvalue
$$\kappa(d, s) = \inf_{\Delta \in C(d,s)} \frac{\Delta^T E XX^T \Delta}{\|\Delta\|_2^2},$$
where $C(d, s) = \{\Delta \in \mathbb{R}^{p_n} \setminus \{0\} : \|\Delta_{s^c}\|_1 \le d\,\|\Delta_s\|_1\}$ and $\Delta_s$ is the subvector of the vector $\Delta$ with indexes of coefficients restricted to the set $s$ of active variables. We will additionally show a probabilistic bound for the consistency of the Generalized Information Criterion for such variables. Theoretical results are supported by a simulation study.

[1] M. Kubkowski, J. Mielniczuk (2018), Selection consistency of two-step selection for misspecified binary model, submitted
[2] M. Kubkowski, J. Mielniczuk (2017), Active sets of predictors for misspecified logistic regression, Statistics 51(5), 1023-1045
Robust estimation and its application to a classification method
Agnieszka Kulawik
Institute of Mathematics, University of Silesia in Katowice

The presentation will concern a method of obtaining empirical discriminant functions using robust estimators of the parameters of the multivariate normal model. Results of applying the method to real data, describing the chemical analysis of wine, will also be shown. In the empirical example, three estimators of the parameters of the multivariate normal model will be compared: the maximum likelihood estimator and two robust estimators.

[1] A. Kulawik, S. Zontek (2016), Robust estimation in the multivariate normal model, Discussiones Mathematicae Probability and Statistics 36(1-2), 53-66
[2] P. Rousseeuw (1985), Multivariate estimation with high breakdown point, W. Grossmann, G. Pflug, I. Vincze and W. Wertz (Eds.), Mathematical Statistics and Applications, Reidel Publishing Company, Dordrecht, 283-297
Intermediate efficiency of some weighted goodness-of-fit statistics
Teresa Ledwina
Institute of Mathematics, Polish Academy of Sciences, ul. Kopernika 18, 51-617 Wrocław, Poland

This contribution compares the Anderson-Darling and some Eicker-Jaeschke statistics to the classical unweighted Kolmogorov-Smirnov statistic. The goal is to provide a quantitative comparison of such tests and to study their real ability to detect departures from the hypothesized distribution that occur in the tails. The contribution covers the case when, under the alternative, some moderately large portion of probability mass is allocated towards the tails. It is demonstrated that the approach allows for a tractable, analytic comparison between the given test and the benchmark, and for a reliable quantitative evaluation of weighted statistics. Finite sample results illustrate the proposed approach and confirm the theoretical findings. In the course of the investigation we also prove that a slight and natural modification of the solution proposed by Borovkov and Sycheva (1968) leads to a statistic which is a member of the Eicker-Jaeschke class and can be considered an attractive competitor of the very popular supremum-type Anderson-Darling statistic. The talk is based on two papers written jointly with Bogdan Ćmiel and Tadeusz Inglot.

[1] A.A. Borovkov, N.M. Sycheva (1968), On asymptotically optimal non-parametric criteria, Theory Probab. Appl. 13, 359-393
[2] B. Ćmiel, T. Inglot, T. Ledwina (2018), Intermediate efficiency of some weighted goodness-of-fit statistics, submitted
[3] T. Inglot, T. Ledwina, B. Ćmiel (2018), Intermediate efficiency in nonparametric testing problems with an application to some weighted statistics, ESAIM: Probability and Statistics, accepted
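To make the statistics compared in the abstract above concrete, the following minimal Python sketch computes the classical Kolmogorov-Smirnov statistic and a supremum-type weighted statistic for a fully specified null distribution. The weight $1/\sqrt{t(1-t)}$ is the usual supremum-type Anderson-Darling weighting and is an assumption here; the function name `sup_statistics` is hypothetical, and neither the Eicker-Jaeschke variants nor the intermediate efficiency calculations of the talk are reproduced.

```python
import math


def sup_statistics(sample, cdf):
    """Return (KS, weighted) supremum statistics for H0: sample ~ cdf."""
    # Transform to the uniform scale under the null hypothesis.
    u = sorted(cdf(x) for x in sample)
    n = len(u)
    ks = 0.0
    weighted = 0.0
    for i, t in enumerate(u, start=1):
        # The empirical CDF jumps from (i-1)/n to i/n at t; between jumps
        # both suprema are attained at an endpoint of the interval.
        dev = max(abs(i / n - t), abs((i - 1) / n - t))
        ks = max(ks, dev)
        if 0.0 < t < 1.0:
            # Assumed weighting: 1 / sqrt(t (1 - t)), which inflates
            # deviations occurring in the tails.
            weighted = max(weighted, dev / math.sqrt(t * (1.0 - t)))
    return math.sqrt(n) * ks, math.sqrt(n) * weighted


if __name__ == "__main__":
    data = [0.05, 0.21, 0.48, 0.93, 1.70, 2.90]
    ks, w = sup_statistics(data, lambda x: 1.0 - math.exp(-x))
    print(ks, w)
```

Because $\sqrt{t(1-t)} \le 1/2$, the weighted statistic is always at least twice the unweighted one, so the two are only comparable through their own null distributions.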
The random projection method in the search for sparse signal representations (Metoda losowych rzutów w szukaniu rzadkich reprezentacji sygnału)
Małgorzata Łazęcka
Warsaw University of Technology

The talk is based on the master's thesis entitled "Metoda losowych rzutów w szukaniu rzadkich reprezentacji sygnału". The main ideas of compressed sensing theory will be presented: the usefulness of the restricted isometry property (RIP) for lossless reconstruction of sparse signals, and a way of constructing matrices with this property using random variables with strictly sub-Gaussian distributions. It will be shown that a theorem from M. A. Davenport's work "Concentration of measure for sub-gaussian random variables" (cf. R. Baraniuk et al. 2014) is false, and a corrected version of the theorem will then be presented.

[1] R. Baraniuk, M. A. Davenport, R. DeVore, M. Wakin (2007), A simple proof of the restricted isometry property for random matrices, Constructive Approximation 28(3), 253-263
[2] R. Baraniuk, M. A. Davenport, M. F. Duarte, C. Hegde (2014), An Introduction to Compressive Sensing, https://legacy.cnx.org/content/col11133/1.5/
[3] E. J. Candès, M. B. Wakin (2008), An introduction to compressive sampling, IEEE Signal Processing Magazine 25(2), 21-30
Linear Prediction Sufficiency in the Linear Model
Augustyn Markiewicz
Poznań University of Life Sciences

A linear statistic $Fy$ is called linearly prediction sufficient, or shortly BLUP-sufficient, for the new observation $y_*$, say, if there exists a matrix $A$ such that $AFy$ is the best linear unbiased predictor, BLUP, for $y_*$. We review some properties of linear prediction sufficiency that have not received much attention in the literature and provide some clarifying comments. In particular, we consider the best linear unbiased prediction of the error term related to $y_*$. We also explore some interesting properties of mixed linear models, including the connection between a particular extended linear model and its transformed version.

[1] J. Isotalo, A. Markiewicz and S. Puntanen (2018), Some properties of linear prediction sufficiency in the linear model, In: M. Tez and D. von Rosen (eds.), Trends and Perspectives in Linear Statistical Inference, Contributions to Statistics, Springer, 111-129
On estimation of the adjustment coefficient in actuarial statistics (O estymacji współczynnika dopasowania w statystyce aktuarialnej)
Marek Męczarski
Institute of Econometrics, SGH Warsaw School of Economics

The talk reviews methods of estimating the adjustment coefficient, a parameter that is important in actuarial statistics because it appears in bounds on the ruin probability. The results are not new (they date from the last decades of the 20th century), but they are not widely known and are rarely discussed in textbooks and monographs.

[1] M. Csörgő, J. Steinebach (1991), On the estimation of the adjustment coefficient in risk theory via intermediate order statistics, Insurance: Mathematics and Economics 10, 37-50
[2] M. Csörgő, J. Teugels (1990), Empirical Laplace transform and approximation of compound distributions, Journal of Applied Probability 27, 88-101
[3] J. Grandell (1979), Empirical bounds for ruin probabilities, Stochastic Processes and their Applications 8, 243-255
[4] U. Herkenrath (1986), On the estimation of the adjustment coefficient in risk theory by means of stochastic approximation procedures, Insurance: Mathematics and Economics 5, 305-313
[5] V. Mammitzsch (1986), A note on the adjustment coefficient in ruin theory, Insurance: Mathematics and Economics 5, 147-149
[6] S. M. Pitts, R. Grübel, P. Embrechts (1996), Confidence bounds for the adjustment coefficient, Advances in Applied Probability 28, 802-827
The Wasserstein Distance as a Dissimilarity Measure for Mass Spectra with Application to Spectral Deconvolution
Błażej Miasojedow
University of Warsaw

We propose a new approach to the comparison of mass spectra using a metric known in computer science as the Earth Mover's Distance and in mathematics as the Wasserstein distance. We argue that this approach allows for natural and robust solutions to various problems in the analysis of mass spectra. In particular, we show an application to the problem of deconvolution, in which we infer the proportions of several overlapping isotopic envelopes of similar compounds. Combined with the previously proposed generator of isotopic envelopes, IsoSpec, our approach works for a wide range of masses and charges in the presence of several types of measurement inaccuracies. To reduce the computational complexity of the solution, we derive an effective implementation of the Interior Point Method as the optimization procedure. The software for mass spectral comparison and deconvolution based on the Wasserstein distance is available at https://github.com/mciach/wassersteinms.

[1] S. Majewski, M. A. Ciach, M. Startek, W. Niemyska, B. Miasojedow, A. Gambin (2018), The Wasserstein Distance as a Dissimilarity Measure for Mass Spectra with Application to Spectral Deconvolution
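As an illustration of the metric described in the abstract above, the sketch below computes the first-order Wasserstein (Earth Mover's) distance between two centroided spectra on the m/z axis, using the one-dimensional identity $W_1 = \int |F(x) - G(x)|\,dx$ for the cumulative distributions of the normalised intensities. The representation of a spectrum as a list of `(mz, intensity)` pairs and the function name `wasserstein_1d` are assumptions made for this example; it does not use or mirror the API of the wassersteinms package, and the deconvolution step and the interior point solver are not reproduced.

```python
def wasserstein_1d(spec_a, spec_b):
    """First-order Wasserstein distance between two spectra.

    Each spectrum is a list of (mz, intensity) pairs with positive
    intensities; intensities are normalised to sum to one.
    """
    def normalise(spec):
        total = sum(intensity for _, intensity in spec)
        return [(mz, intensity / total) for mz, intensity in spec]

    # Signed point masses: +w for the first spectrum, -w for the second.
    events = [(mz, w) for mz, w in normalise(spec_a)]
    events += [(mz, -w) for mz, w in normalise(spec_b)]
    events.sort()

    distance = 0.0
    cdf_diff = 0.0          # F(x) - G(x) just to the left of the current peak
    previous_mz = events[0][0]
    for mz, w in events:
        distance += abs(cdf_diff) * (mz - previous_mz)
        cdf_diff += w
        previous_mz = mz
    return distance


if __name__ == "__main__":
    a = [(100.0, 1.0)]
    b = [(101.0, 2.0)]
    print(wasserstein_1d(a, b))  # prints 1.0: all mass moves by one m/z unit
```

Unlike a peak-by-peak comparison, this distance changes continuously when a peak is slightly shifted along the m/z axis.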
Estimation of a covariance matrix with a given structure, illustrated with metabolomic data (Estymacja macierzy kowariancji z zadaną strukturą na przykładzie danych metabolomicznych)
Adam Mieldzioc
Poznań University of Life Sciences

We consider the problem of estimating a covariance matrix with a given structure. We apply the methods described in [1], [2] and [3] to assess the covariance structure of metabolomic data ([4]). We choose among the compound symmetry, first-order autoregression and Toeplitz structures, using a discrepancy function based on the Frobenius norm or on the entropy loss function. If the value of the discrepancy function for the selected structure is too large, we use cluster analysis and heat maps to select better-fitting block structures whose blocks have the compound symmetry, first-order autoregression or Toeplitz structure. Finally, we determine the estimate of the covariance matrix with the selected structure. The presented results were obtained in collaboration with Monika Mokrzycka and Aneta Sawikowska.

Acknowledgements. The research was partially supported by project no. WND-POIG.01.03.01-00-101/08.

[1] X. Cui, X. Li, J. Zhao, L. Zeng, D. Zhang, J. Pan (2016), Covariance structure regularization via Frobenius norm discrepancy, Linear Algebra Appl. 510, 124-145
[2] K. Filipiak, A. Markiewicz, A. Mieldzioc, A. Sawikowska (2018), On projection of a positive definite matrix on a cone of nonnegative definite Toeplitz matrices, Electronic Journal of Linear Algebra 33, 74-82
[3] L. Lin, N. J. Higham, J. Pan (2014), Covariance structure regularization via entropy loss function, Computational Statistics and Data Analysis 72, 315-327
[4] A. Mieldzioc, M. Mokrzycka, A. Sawikowska (2018), Estimation of the covariance matrix for metabolomic data, submitted
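The idea of choosing a structure by a Frobenius-norm discrepancy, mentioned in the abstract above, can be sketched as follows. The example projects a symmetric matrix onto the linear spaces of symmetric Toeplitz and compound symmetry matrices (by averaging the relevant entries) and reports the Frobenius distance to each projection. This is a simplified illustration under stated assumptions: it ignores the nonnegative definiteness constraint treated in [2], it omits the autoregressive structure and the entropy loss, and all function names are hypothetical.

```python
import math


def project_toeplitz(a):
    """Frobenius projection onto symmetric Toeplitz matrices.

    The k-th diagonal (above and below the main one) is replaced by its mean.
    """
    n = len(a)
    means = []
    for k in range(n):
        values = [a[i][i + k] for i in range(n - k)]
        values += [a[i + k][i] for i in range(n - k)]
        means.append(sum(values) / len(values))
    return [[means[abs(i - j)] for j in range(n)] for i in range(n)]


def project_compound_symmetry(a):
    """Frobenius projection onto compound symmetry matrices (n >= 2)."""
    n = len(a)
    diag = sum(a[i][i] for i in range(n)) / n
    off = sum(a[i][j] for i in range(n) for j in range(n) if i != j)
    off /= n * (n - 1)
    return [[diag if i == j else off for j in range(n)] for i in range(n)]


def frobenius_discrepancy(a, b):
    """Frobenius norm of the difference a - b."""
    return math.sqrt(sum((x - y) ** 2
                         for row_a, row_b in zip(a, b)
                         for x, y in zip(row_a, row_b)))


if __name__ == "__main__":
    s = [[1.0, 2.0, 3.0], [2.0, 5.0, 6.0], [3.0, 6.0, 9.0]]
    for name, proj in [("Toeplitz", project_toeplitz),
                       ("compound symmetry", project_compound_symmetry)]:
        print(name, frobenius_discrepancy(s, proj(s)))
```

Since every compound symmetry matrix is also Toeplitz, the Toeplitz discrepancy can never exceed the compound symmetry one, so a raw comparison of discrepancies favours the richer structure.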
Distributions of Interaction Information
Jan Mielniczuk, Mariusz Kubkowski
Institute of Computer Science, Polish Academy of Sciences; Warsaw University of Technology

Interaction Information is one of the most promising measures of interaction strength, having many desirable properties. However, its use for interaction detection has been hindered by the fact that, apart from the simple case of overall independence, the asymptotic distribution of its estimate has not been known. In the talk we discuss the asymptotic distributions of its empirical versions, which are needed for formal testing of interactions. We prove that, for a trivariate qualitative vector, the standardized empirical interaction information converges to the normal law unless the distribution coincides with its Kirkwood approximation. In the opposite case the convergence is to the distribution of weighted centered chi-squared random variables. This case is of special importance, as it roughly corresponds to interaction information being zero, and the asymptotic distribution can be used for the construction of formal tests for interaction detection. The result generalizes the result of Han (1980) for the case when all coordinate random variables are independent. The derivation relies on studying the structure of the covariance matrix of the asymptotic distribution and its eigenvalues. For the case of a $3 \times 3 \times 2$ contingency table, corresponding to the study of two interacting Single Nucleotide Polymorphisms (SNPs) for the prediction of a binary outcome, we provide a complete description of the asymptotic law and construct approximate critical regions for testing of interactions when the two SNPs are possibly dependent. We show in numerical experiments that the test based on the derived asymptotic distribution is easy to implement and yields actual significance levels consistently closer to the nominal ones than the test based on the chi-squared reference distribution.

[1] M. Kubkowski, J. Mielniczuk (2018), Testing the significance of attributes interactions revisited, manuscript
[2] T.S. Han (1980), Multiple mutual informations and multiple interactions in frequency data, Information and Control 46(1), 26-45
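For readers unfamiliar with the quantity studied in the abstract above, the following sketch computes the plug-in (empirical) interaction information from a three-way contingency table. It assumes the sign convention $II(X_1; X_2; Y) = I((X_1, X_2); Y) - I(X_1; Y) - I(X_2; Y)$, written in terms of entropies of marginal tables; other sources use the opposite sign. The data layout (a dictionary mapping cells `(x1, x2, y)` to counts) and the function names are assumptions for the example, and the asymptotic test discussed in the talk is not implemented.

```python
import math
from collections import defaultdict


def entropy(counts):
    """Plug-in Shannon entropy (natural logarithm) of a table of counts."""
    total = sum(counts.values())
    return -sum(c / total * math.log(c / total)
                for c in counts.values() if c > 0)


def marginal(counts, axes):
    """Sum a table of counts over all coordinates not listed in axes."""
    table = defaultdict(float)
    for cell, c in counts.items():
        table[tuple(cell[a] for a in axes)] += c
    return table


def interaction_information(counts):
    """Empirical II(X1; X2; Y) for cells keyed by (x1, x2, y)."""
    def h(*axes):
        return entropy(marginal(counts, axes))

    # II = H(X1,X2) + H(X1,Y) + H(X2,Y) - H(X1) - H(X2) - H(Y) - H(X1,X2,Y)
    return (h(0, 1) + h(0, 2) + h(1, 2)
            - h(0) - h(1) - h(2) - h(0, 1, 2))


if __name__ == "__main__":
    # Y = X1 XOR X2 with independent uniform X1, X2: a pure interaction.
    xor_table = {(0, 0, 0): 25, (0, 1, 1): 25, (1, 0, 1): 25, (1, 1, 0): 25}
    print(interaction_information(xor_table))  # equals log(2) up to rounding
```

In the XOR example neither variable alone carries information about the outcome, yet the pair determines it, which is the situation a positive interaction information is meant to capture under this convention.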
Comparison of algorithms for computing maximum likelihood and least loss estimators in doubly multivariate models (Porównanie algorytmów wyznaczania estymatorów największej wiarogodności i najmniejszej straty w modelach podwójnie wielowymiarowych)
Monika Mokrzycka
Institute of Plant Genetics, Polish Academy of Sciences, Poznań

The problem of maximum likelihood estimation (MLE) of the covariance structure $\Psi \otimes \Sigma$, where one component has the compound symmetry structure, has been considered by many authors. However, the proposed estimators were not given in explicit form, and hence they are usually computed only numerically. The aim of this talk is to present an alternative method of determining the maximum likelihood estimator of the covariance structure and to show that the new algorithm is considerably faster than the algorithms known from the literature. Another estimator of the covariance structure can be obtained as the solution of the problem of minimizing the entropy loss function. It will be shown that, for the structure of a Kronecker product of two arbitrary positive definite matrices (with no structure imposed), there is a relation between the maximum likelihood estimators and the least loss estimators. For the case when one of these matrices has the compound symmetry structure, three methods of determining estimators of the considered structure using the entropy loss function (ELE) will be presented, together with a comparison of the speed of the proposed algorithms. In addition, a comparison of the statistical properties of MLE and ELE, such as bias, variability and loss, will be presented. The presented results were obtained in collaboration with K. Filipiak and D. Klein.

[1] K. Filipiak, D. Klein, M. Mokrzycka (2018), Estimators comparison of separable covariance structure with one component as compound symmetry matrix, Electronic Journal of Linear Algebra 33, 83-98
Fixed relative precision estimators of growth rate for compound Poisson and Lévy processes
Wojciech Niemiro
Faculty of Mathematics, Informatics and Mechanics, University of Warsaw; Faculty of Mathematics and Computer Science, Nicolaus Copernicus University in Toruń

We consider compound Poisson processes or, more generally, Lévy processes $X(t)$ with positive bounded jumps. The problem is to estimate the growth rate $\mu = EX(t)/t$ with fixed relative precision, i.e. to construct an estimator $\hat\mu$ such that $\Pr(|\hat\mu - \mu| < \mu\varepsilon) \ge 1 - \alpha$ for a given precision parameter $\varepsilon$ and confidence parameter $\alpha$, given a trajectory $X(t)$ for $0 \le t \le T$. Such an estimator must be sequential, i.e. the length $T$ of the observed trajectory must be random and chosen adaptively. Assume that the upper bound on jumps is known (w.l.o.g. equal to 1). We consider the estimator $\hat\mu_b = b/T_b$, where $T_b = \min\{t : X(t) \ge b\}$, with a suitably chosen $b = b(\varepsilon, \alpha)$. We show that this estimator is nearly worst-case optimal in a certain asymptotic sense, for $\varepsilon \to 0$ and $\alpha \to 0$. The worst case turns out to be the process with jumps equal to 1, i.e. the Poisson process with intensity $\mu$.

[1] P. Dagum, R. Karp, M. Luby, S. Ross (2000), An optimal algorithm for Monte Carlo estimation, SIAM J. Comput. 29(5), 1484-1496
[2] L. Gajek, W. Niemiro, P. Pokarowski (2013), Optimal Monte Carlo integration with fixed relative precision, J. Complexity 29(1), 4-26
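The stopping rule behind the estimator $\hat\mu_b = b/T_b$ in the abstract above can be illustrated by a small simulation. The sketch below observes a simulated compound Poisson process until it first reaches the level $b$ and returns $b/T_b$. The threshold `b` is simply passed in by the caller; the choice $b = b(\varepsilon, \alpha)$ from the talk is not given in the abstract and is not reproduced. The function name and the jump samplers are hypothetical.

```python
import random


def estimate_growth_rate(b, intensity, jump_sampler, rng):
    """Simulate a compound Poisson process until X(t) >= b; return b / T_b.

    intensity    -- rate of the underlying Poisson process (simulation only)
    jump_sampler -- function rng -> jump size in (0, 1]
    """
    t = 0.0
    x = 0.0
    while x < b:
        t += rng.expovariate(intensity)   # waiting time to the next jump
        x += jump_sampler(rng)            # positive jump bounded by 1
    # t is now T_b, the first time the trajectory is at or above level b.
    return b / t


if __name__ == "__main__":
    rng = random.Random(2018)
    # Poisson process with intensity 2 (all jumps equal to 1): mu = 2.
    print(estimate_growth_rate(2000, 2.0, lambda r: 1.0, rng))
    # Jumps uniform on (0, 1) with intensity 2: mu = 2 * 0.5 = 1.
    print(estimate_growth_rate(2000, 2.0, lambda r: r.random(), rng))
```

The observation time is random: a process with a small growth rate is automatically observed for longer, which is what makes a guarantee in terms of relative rather than absolute error possible.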
Distance-based models for imperfect ranking in ranked set sampling
Nikolay I. Nikolov
Institute of Mathematics and Informatics, Bulgarian Academy of Sciences

In this work, we consider some statistical measures of deviation from perfect ranking in the framework of ranked set sampling (RSS). We use a nonparametric approach to testing the null hypothesis of perfect ranking. Distance-based Mallows models with an appropriate distance on permutations are suggested for the case of imperfect ranking. Some asymptotic results for the corresponding error probability matrix are derived for the models based on Spearman's footrule, Spearman's rho and the Lee distance. We propose an EM algorithm for estimating the unknown parameter in the Mallows models in order to compare the power of the presented test statistics. As an application of the proposed models for imperfect ranking in n-cycle RSS, we consider an illustrative example.

This is joint work with Eugenia Stoimenova (Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, Sofia, Bulgaria).
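The three distances on permutations named in the abstract above, and the resulting Mallows probabilities, can be sketched for small set sizes by direct enumeration. The conventions below are assumptions: Spearman's rho is taken with the square root, the Lee distance is $\sum_i \min(|\pi(i) - \sigma(i)|, N - |\pi(i) - \sigma(i)|)$, and the Mallows model is $P_\theta(\pi) \propto \exp(\theta\, d(\pi, e))$ centred at the identity $e$, so that negative $\theta$ favours nearly perfect rankings. The EM algorithm and the test statistics of the talk are not reproduced, and all function names are hypothetical.

```python
import math
from itertools import permutations


def footrule(p, q):
    """Spearman's footrule: sum of absolute rank differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def spearman_rho(p, q):
    """Spearman's rho distance (Euclidean distance between rank vectors)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def lee(p, q):
    """Lee distance: rank differences measured cyclically modulo N."""
    n = len(p)
    return sum(min(abs(a - b), n - abs(a - b)) for a, b in zip(p, q))


def mallows_pmf(n, theta, distance):
    """Mallows probabilities on all permutations of 1..n (small n only)."""
    identity = tuple(range(1, n + 1))
    weights = {p: math.exp(theta * distance(p, identity))
               for p in permutations(identity)}
    normaliser = sum(weights.values())
    return {p: w / normaliser for p, w in weights.items()}


if __name__ == "__main__":
    pmf = mallows_pmf(3, -1.0, footrule)
    for perm, prob in sorted(pmf.items(), key=lambda item: -item[1]):
        print(perm, round(prob, 4))
```

Enumeration costs $N!$ operations, so this is only practical for the small set sizes typical of ranked set sampling.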
Scale Fisher consistency of partial likelihood estimator for regression parameters in the Cox model with arbitrary frailty
Tadeusz Bednarski, Piotr Nowak
Institute of Economic Sciences, University of Wroclaw, Faculty of Law, Administration and Economics, ul. Uniwersytecka 22/26, 50-145 Wroclaw, Poland

Properties of the partial likelihood estimator for the Cox regression model are studied under frailty. It is shown that the estimator is scale Fisher consistent under practically arbitrary frailty and a large class of covariate distributions. A simulation experiment indicates its good asymptotic behavior when the explanatory variables are linear transforms of independent and symmetric random variables, but it also indicates its high bias under some nonsymmetric distributions of strongly correlated explanatory variables.

[1] P. Ruud (1983), Sufficient conditions for the consistency of maximum likelihood estimation despite misspecification of distribution in multinomial discrete choice models, Econometrica 51(1), 225-228
[2] T. Stoker (1986), Consistent estimation of scaled coefficients, Econometrica 54(6), 1461-1481
Estimation of parameters of mechanical suspension models
Karol Opara
Systems Research Institute, Polish Academy of Sciences

Analysis of vibrations within a vehicle allows for the evaluation of road evenness and ride comfort. The transition between the vibrations recorded by the sensors and the road unevenness profile depends on the vehicle's suspension, which acts as a mechanical filter. Classical suspension models consist of masses connected by springs and dampers. Robust estimation of their parameters requires the use of domain knowledge to appropriately constrain the search space. The choice of the loss function is not obvious because there are various ways of quantifying the similarity between two spatial series. Apart from the differences between the estimated and the actual profile height, there are also localization inaccuracies (errors in variables). The study details the model identification issues, aiming at quantifying the road pavement condition with the use of smartphones as vibration and localization sensors.

[1] M. W. Sayers, S. M. Karamihas (1998), The little book of profiling, University of Michigan 705
Characteristics and properties of k-th record values from normal distribution
Piotr Pawlas, Dominik Szynal
Maria Curie-Skłodowska University in Lublin

Balakrishnan and Chan (1998) studied properties of record values based on a sample from the normal distribution. They presented a relation satisfied by the second single moment and the product moment of successive record values from the normal distribution. They also discussed BLUE estimators and the prediction of future record values from the normal distribution, and proposed a test of spuriosity of the current record value. Similar problems for the k-th record values from the normal distribution were studied by Chacko and Mary (2013). We generalize the results presented in the above papers using higher order moments. The main result contains a new characterization of the normal distribution.

[1] N. Balakrishnan, P. S. Chan (1998), On the normal record values and associated inference, Statist. Probab. Lett. 39, 73-80
[2] M. Chacko, M. Shy Mary (2013), Estimation and prediction based on k-th record values from normal distribution, Statistica, anno LXXIII, 4, 505-515