Nonconvex Variance Reduced Optimization with Arbitrary Sampling

Podobne dokumenty
Hard-Margin Support Vector Machines

Machine Learning for Data Science (CS4786) Lecture11. Random Projections & Canonical Correlation Analysis

Linear Classification and Logistic Regression. Pascal Fua IC-CVLab

Gradient Coding using the Stochastic Block Model

Machine Learning for Data Science (CS4786) Lecture 11. Spectral Embedding + Clustering

Supervised Hierarchical Clustering with Exponential Linkage. Nishant Yadav

TTIC 31210: Advanced Natural Language Processing. Kevin Gimpel Spring Lecture 9: Inference in Structured Prediction

Mixed-integer Convex Representability

Revenue Maximization. Sept. 25, 2018

tum.de/fall2018/ in2357

Label-Noise Robust Generative Adversarial Networks

Machine Learning for Data Science (CS4786) Lecture 24. Differential Privacy and Re-useable Holdout

TTIC 31210: Advanced Natural Language Processing. Kevin Gimpel Spring Lecture 8: Structured PredicCon 2

DM-ML, DM-FL. Auxiliary Equipment and Accessories. Damper Drives. Dimensions. Descritpion

Analysis of Movie Profitability STAT 469 IN CLASS ANALYSIS #2

Agnostic Learning and VC dimension

Steeple #3: Gödel s Silver Blaze Theorem. Selmer Bringsjord Are Humans Rational? Dec RPI Troy NY USA

The Lorenz System and Chaos in Nonlinear DEs

Few-fermion thermometry

deep learning for NLP (5 lectures)

Wprowadzenie do programu RapidMiner, część 2 Michał Bereta 1. Wykorzystanie wykresu ROC do porównania modeli klasyfikatorów

MoA-Net: Self-supervised Motion Segmentation. Pia Bideau, Rakesh R Menon, Erik Learned-Miller

Rozpoznawanie twarzy metodą PCA Michał Bereta 1. Testowanie statystycznej istotności różnic między jakością klasyfikatorów

General Certificate of Education Ordinary Level ADDITIONAL MATHEMATICS 4037/12

Proposal of thesis topic for mgr in. (MSE) programme in Telecommunications and Computer Science

Previously on CSCI 4622

Helena Boguta, klasa 8W, rok szkolny 2018/2019

Logistic Regression. Machine Learning CS5824/ECE5424 Bert Huang Virginia Tech

Towards Stability Analysis of Data Transport Mechanisms: a Fluid Model and an Application

Weronika Mysliwiec, klasa 8W, rok szkolny 2018/2019

Home Software Hardware Benchmarks Services Store Support Forums About Us

Primal Formulation. Find u h V h such that. A h (u h, v h )= fv h dx v h V h, where. u h v h dx. A h (u h, v h ) = DGFEM Primal Formulation.

New Roads to Cryptopia. Amit Sahai. An NSF Frontier Center

Maximum A Posteriori Chris Piech CS109, Stanford University

Counting Rules. Counting operations on n objects. Sort, order matters (perms) Choose k (combinations) Put in r buckets. None Distinct.

Department of Electrical- and Information Technology. Dealing with stochastic processes

Machine Learning for Data Science (CS4786) Lecture 8. Kernel PCA & Isomap + TSNE

Realizacja systemów wbudowanych (embeded systems) w strukturach PSoC (Programmable System on Chip)

Latent Dirichlet Allocation Models and their Evaluation IT for Practice 2016

Compressing the information contained in the different indexes is crucial for performance when implementing an IR system

RESONANCE OF TORSIONAL VIBRATION OF SHAFTS COUPLED BY MECHANISMS

OPTYMALIZACJA LICZBY WARSTW DLA ALOKACJI NEYMANA


Zakopane, plan miasta: Skala ok. 1: = City map (Polish Edition)

Instrukcja obsługi User s manual

Zmiany techniczne wprowadzone w wersji Comarch ERP Altum

OpenPoland.net API Documentation

TTIC 31190: Natural Language Processing

EXAMPLES OF CABRI GEOMETRE II APPLICATION IN GEOMETRIC SCIENTIFIC RESEARCH

Instrukcja konfiguracji usługi Wirtualnej Sieci Prywatnej w systemie Mac OSX

Cracow University of Economics Poland. Overview. Sources of Real GDP per Capita Growth: Polish Regional-Macroeconomic Dimensions

OBWIESZCZENIE MINISTRA INFRASTRUKTURY. z dnia 18 kwietnia 2005 r.

Stability of Tikhonov Regularization Class 07, March 2003 Alex Rakhlin

STATISTICAL METHODS IN BIOLOGY

Lecture 18 Review for Exam 1

Presented by. Dr. Morten Middelfart, CTO

Zarządzanie sieciami telekomunikacyjnymi

Oxford PWN Polish English Dictionary (Wielki Slownik Polsko-angielski)

Cracow University of Economics Poland


EN 71. (389 x 410 x h272cm) (389 x 410 x h272cm) fungoo.eu FLEPPI

Camspot 4.4 Camspot 4.5

Rachunek lambda, zima

PRZYKŁAD ZASTOSOWANIA DOKŁADNEGO NIEPARAMETRYCZNEGO PRZEDZIAŁU UFNOŚCI DLA VaR. Wojciech Zieliński

Neural Networks (The Machine-Learning Kind) BCS 247 March 2019

USB firmware changing guide. Zmiana oprogramowania za przy użyciu połączenia USB. Changelog / Lista Zmian

Convolution semigroups with linear Jacobi parameters

DUAL SIMILARITY OF VOLTAGE TO CURRENT AND CURRENT TO VOLTAGE TRANSFER FUNCTION OF HYBRID ACTIVE TWO- PORTS WITH CONVERSION

PassMark - CPU Benchmarks - List of Benchmarked CPUs

EGARA Adam Małyszko FORS. POLAND - KRAKÓW r

Wydział Fizyki, Astronomii i Informatyki Stosowanej Uniwersytet Mikołaja Kopernika w Toruniu

ZGŁOSZENIE WSPÓLNEGO POLSKO -. PROJEKTU NA LATA: APPLICATION FOR A JOINT POLISH -... PROJECT FOR THE YEARS:.

P R A C A D Y P L O M O W A

A sufficient condition of regularity for axially symmetric solutions to the Navier-Stokes equations

No matter how much you have, it matters how much you need

Wykorzystanie możliwości serwerów Online Judge w przygotowaniu drużyny oraz w organizacji zawodów w programowaniu

Knovel Math: Jakość produktu

photo graphic Jan Witkowski Project for exhibition compositions typography colors : : janwi@janwi.com

HAPPY ANIMALS L01 HAPPY ANIMALS L03 HAPPY ANIMALS L05 HAPPY ANIMALS L07

HAPPY ANIMALS L02 HAPPY ANIMALS L04 HAPPY ANIMALS L06 HAPPY ANIMALS L08

Counting quadrant walks via Tutte s invariant method

Fig 5 Spectrograms of the original signal (top) extracted shaft-related GAD components (middle) and

Bayesian graph convolutional neural networks

Strangeness in nuclei and neutron stars: many-body forces and the hyperon puzzle

DO MONTAŻU POTRZEBNE SĄ DWIE OSOBY! INSTALLATION REQUIRES TWO PEOPLE!

CS 6170: Computational Topology, Spring 2019 Lecture 09

Employment. Number of employees employed on a contract of employment by gender in Company

Modulacja i kodowanie. Labolatorium. Kodowanie źródłowe Kod Huffman a

Wykaz linii kolejowych, które są wyposażone w urządzenia systemu ETCS

DODATKOWE ĆWICZENIA EGZAMINACYJNE

Najlepsze drukarki 3D

GRY EDUKACYJNE I ICH MOŻLIWOŚCI DZIĘKI INTERNETOWI DZIŚ I JUTRO. Internet Rzeczy w wyobraźni gracza komputerowego

Home Software Hardware Benchmarks Services Store Support Forums About Us

Wykaz linii kolejowych, które są wyposażone w urzadzenia systemu ETCS

Stargard Szczecinski i okolice (Polish Edition)

y = The Chain Rule Show all work. No calculator unless otherwise stated. If asked to Explain your answer, write in complete sentences.

Maszyny wektorów podpierajacych w regresji rangowej

DO MONTAŻU POTRZEBNE SĄ DWIE OSOBY! INSTALLATION REQUIRES TWO PEOPLE!

HAPPY K04 INSTRUKCJA MONTAŻU ASSEMBLY INSTRUCTIONS DO MONTAŻU POTRZEBNE SĄ DWIE OSOBY! INSTALLATION REQUIRES TWO PEOPLE! W5 W6 G1 T2 U1 U2 TZ1

reinforcement learning through the optimization lens Benjamin Recht University of California, Berkeley

Transkrypt:

Nonconvex Variance Reduced Optimization with Arbitrary Sampling Samuel Horváth Peter Richtárik

Empirical Risk Minimization nx min x2r d f(x) := 1 n i=1 f i (x)

Empirical Risk Minimization nx min x2r d f(x) := 1 n n is big i=1 f i (x)

Empirical Risk Minimization nx min x2r d f(x) := 1 n i=1 f i (x) n is big non-convex, L i -smooth krf i (x) rf i (y)k applel i kx yk

Baseline Variance Reduced SGD Methods SVRG Johnson & Zhang NIPS 2013 SAGA Defazio, Bach & Lacoste-Julien NIPS 2014 SARAH Nguyen, Liu, Scheinberg & Takáč ICML 2017

Baseline Variance Reduced SGD Methods SVRG Johnson & Zhang NIPS 2013 Uniform sampling SAGA Defazio, Bach & Lacoste-Julien NIPS 2014 SARAH Nguyen, Liu, Scheinberg & Takáč ICML 2017 Uniform sampling Uniform sampling

Baseline Variance Reduced SGD Methods Mini-batch SVRG Konečný & Richtárik FAMS 2017 SAGA Reddi, Hefny, Sra, Poczos, Smola CDC 2016 SARAH Nguyen, Liu, Scheinberg & Takáč 2017

Baseline Variance Reduced SGD Methods Mini-batch Mini-batch size SVRG Konečný & Richtárik FAMS 2017 SAGA Reddi, Hefny, Sra, Poczos, Smola CDC 2016 SARAH Nguyen, Liu, Scheinberg & Takáč 2017 Uniform sampling Uniform sampling Uniform sampling

Contributions Analysis of SVRG, SAGA and SARAH in the arbitrary sampling paradigm Construction of optimal minibatch sampling

Contributions Richtárik & Takáč (OL 2016; arxiv 2013) Qu, Richtárik & Zhang (NIPS 2015) Qu & Richtárik (COAP 2016) Chambolle, Ehrhardt, Richtárik & Schoenlieb (SIOPT 2018) Hanzely & Richtárik (AISTATS 2019) Qian, Qu & Richtárik (ICML 2019) Gower, Loizou, Qian, Sailanbayev, Shulgin & Richtárik (ICML 2019) Analysis of SVRG, SAGA and SARAH in the arbitrary sampling paradigm Construction of optimal minibatch sampling

Contributions Richtárik & Takáč (OL 2016; arxiv 2013) Qu, Richtárik & Zhang (NIPS 2015) Qu & Richtárik (COAP 2016) Chambolle, Ehrhardt, Richtárik & Schoenlieb (SIOPT 2018) Hanzely & Richtárik (AISTATS 2019) Qian, Qu & Richtárik (ICML 2019) Gower, Loizou, Qian, Sailanbayev, Shulgin & Richtárik (ICML 2019) Analysis of SVRG, SAGA and SARAH in the arbitrary sampling paradigm Construction of optimal minibatch sampling First optimal/importance sampling for minibatches!

S S Data Sampling (i.e., Mini-batching) Mechanisms Sampling: a random subset of {1, 2,, n} Probability matrix P 2 R n n associated with sampling P ij := Prob({i, j} S) Probability vector p 2 R n associated with sampling p i := Prob({i} S) =P ii Proper sampling: p i > 0 for all i =1, 2,...,n

S S Data Sampling (i.e., Mini-batching) Mechanisms Sampling: a random subset of {1, 2,, n} A sampling is uniquely defined by assigning probabilities to all 2 n subsets of {1, 2,..., n} Probability matrix P 2 R n n associated with sampling P ij := Prob({i, j} S) Probability vector p 2 R n associated with sampling p i := Prob({i} S) =P ii Proper sampling: p i > 0 for all i =1, 2,...,n

S S <latexit sha1_base64="wyrux/pk9vh5a+mivjphdxvu0v4=">aaack3icbvc7tsmwfhv4lviqmljynegmvzv0gavsvrzgebqqnvhlue5r4dirfqokov4pc7/caampsfifukudrzmdnxop7r0nsgu34hmvztz8wulscmmlvlq2vrfz2dq+ncrtlhwoekp3i2ky4jj1ging3vqzkkscxuxxxxp/6ozpw5w8gdxlyukgksecerbsv9j2z3etbwupxi6+5tdcqvyribjgkgm3idwhht8uplvjpteraru86dcatwcgwnsk269uvbo3bf5l/bmpohlo+5vhg6vzwirqqyzp+v4kyue0ccryubxkhqwexpmh61kqscjmwex/hen9qwymp8rkap6q3xmfsyzjk8hojgrg5rc3ef/zehner2hbzzobk/rruzwjdappismdrhkfkvtcqob2vkxhxnydtt6ylch//fjfctmo+17dp2tuw+1zhsw0i/bqaflriwqhe3skooiio/santglc+88ow/o+9fondpl7kafcd4+ath6ptm=</latexit> <latexit sha1_base64="wyrux/pk9vh5a+mivjphdxvu0v4=">aaack3icbvc7tsmwfhv4lviqmljynegmvzv0gavsvrzgebqqnvhlue5r4dirfqokov4pc7/caampsfifukudrzmdnxop7r0nsgu34hmvztz8wulscmmlvlq2vrfz2dq+ncrtlhwoekp3i2ky4jj1ging3vqzkkscxuxxxxp/6ozpw5w8gdxlyukgksecerbsv9j2z3etbwupxi6+5tdcqvyribjgkgm3idwhht8uplvjpteraru86dcatwcgwnsk269uvbo3bf5l/bmpohlo+5vhg6vzwirqqyzp+v4kyue0ccryubxkhqwexpmh61kqscjmwex/hen9qwymp8rkap6q3xmfsyzjk8hojgrg5rc3ef/zehner2hbzzobk/rruzwjdappismdrhkfkvtcqob2vkxhxnydtt6ylch//fjfctmo+17dp2tuw+1zhsw0i/bqaflriwqhe3skooiio/santglc+88ow/o+9fondpl7kafcd4+ath6ptm=</latexit> <latexit sha1_base64="wyrux/pk9vh5a+mivjphdxvu0v4=">aaack3icbvc7tsmwfhv4lviqmljynegmvzv0gavsvrzgebqqnvhlue5r4dirfqokov4pc7/caampsfifukudrzmdnxop7r0nsgu34hmvztz8wulscmmlvlq2vrfz2dq+ncrtlhwoekp3i2ky4jj1ging3vqzkkscxuxxxxp/6ozpw5w8gdxlyukgksecerbsv9j2z3etbwupxi6+5tdcqvyribjgkgm3idwhht8uplvjpteraru86dcatwcgwnsk269uvbo3bf5l/bmpohlo+5vhg6vzwirqqyzp+v4kyue0ccryubxkhqwexpmh61kqscjmwex/hen9qwymp8rkap6q3xmfsyzjk8hojgrg5rc3ef/zehner2hbzzobk/rruzwjdappismdrhkfkvtcqob2vkxhxnydtt6ylch//fjfctmo+17dp2tuw+1zhsw0i/bqaflriwqhe3skooiio/santglc+88ow/o+9fondpl7kafcd4+ath6ptm=</latexit> <latexit sha1_base64="wyrux/pk9vh5a+mivjphdxvu0v4=">aaack3icbvc7tsmwfhv4lviqmljynegmvzv0gavsvrzgebqqnvhlue5r4dirfqokov4pc7/caampsfifukudrzmdnxop7r0nsgu34hmvztz8wulscmmlvlq2vrfz2dq+ncrtlhwoekp3i2ky4jj1ging3vqzkkscxuxxxxp/6ozpw5w8gdxlyukgksecerbsv9j2z3etbwupxi6+5tdcqvyribjgkgm3idwhht8uplvjpteraru86dcatwcgwnsk269uvbo3bf5l/bmpohlo+5vhg6vzwirqqyzp+v4kyue0ccryubxkhqwexpmh61kqscjmwex/hen9qwymp8rkap6q3xmfsyzjk8hojgrg5rc3ef/zehner2hbzzobk/rruzwjdappismdrhkfkvtcqob2vkxhxnydtt6ylch//fjfctmo+17dp2tuw+1zhsw0i/bqaflriwqhe3skooiio/santglc+88ow/o+9fondpl7kafcd4+ath6ptm=</latexit> Data Sampling (i.e., Mini-batching) Mechanisms Sampling: a random subset of {1, 2,, n} A sampling is uniquely defined by assigning probabilities to all 2 n subsets of {1, 2,..., n} Probability matrix Probability vector Proper sampling: P 2 R n n associated with sampling P ij := Prob({i, j} S) p 2 R n associated with sampling p i := Prob({i} S) =P ii p i > 0 for all i =1, 2,...,n Examples Standard sampling: S = {i} with probability 1 n for all i =1, 2,...,n Standard mini-batch sampling: S = C with probability 1 ( n b) for all C {1, 2,...,n} such that C = b

S S <latexit sha1_base64="wyrux/pk9vh5a+mivjphdxvu0v4=">aaack3icbvc7tsmwfhv4lviqmljynegmvzv0gavsvrzgebqqnvhlue5r4dirfqokov4pc7/caampsfifukudrzmdnxop7r0nsgu34hmvztz8wulscmmlvlq2vrfz2dq+ncrtlhwoekp3i2ky4jj1ging3vqzkkscxuxxxxp/6ozpw5w8gdxlyukgksecerbsv9j2z3etbwupxi6+5tdcqvyribjgkgm3idwhht8uplvjpteraru86dcatwcgwnsk269uvbo3bf5l/bmpohlo+5vhg6vzwirqqyzp+v4kyue0ccryubxkhqwexpmh61kqscjmwex/hen9qwymp8rkap6q3xmfsyzjk8hojgrg5rc3ef/zehner2hbzzobk/rruzwjdappismdrhkfkvtcqob2vkxhxnydtt6ylch//fjfctmo+17dp2tuw+1zhsw0i/bqaflriwqhe3skooiio/santglc+88ow/o+9fondpl7kafcd4+ath6ptm=</latexit> <latexit sha1_base64="wyrux/pk9vh5a+mivjphdxvu0v4=">aaack3icbvc7tsmwfhv4lviqmljynegmvzv0gavsvrzgebqqnvhlue5r4dirfqokov4pc7/caampsfifukudrzmdnxop7r0nsgu34hmvztz8wulscmmlvlq2vrfz2dq+ncrtlhwoekp3i2ky4jj1ging3vqzkkscxuxxxxp/6ozpw5w8gdxlyukgksecerbsv9j2z3etbwupxi6+5tdcqvyribjgkgm3idwhht8uplvjpteraru86dcatwcgwnsk269uvbo3bf5l/bmpohlo+5vhg6vzwirqqyzp+v4kyue0ccryubxkhqwexpmh61kqscjmwex/hen9qwymp8rkap6q3xmfsyzjk8hojgrg5rc3ef/zehner2hbzzobk/rruzwjdappismdrhkfkvtcqob2vkxhxnydtt6ylch//fjfctmo+17dp2tuw+1zhsw0i/bqaflriwqhe3skooiio/santglc+88ow/o+9fondpl7kafcd4+ath6ptm=</latexit> <latexit sha1_base64="wyrux/pk9vh5a+mivjphdxvu0v4=">aaack3icbvc7tsmwfhv4lviqmljynegmvzv0gavsvrzgebqqnvhlue5r4dirfqokov4pc7/caampsfifukudrzmdnxop7r0nsgu34hmvztz8wulscmmlvlq2vrfz2dq+ncrtlhwoekp3i2ky4jj1ging3vqzkkscxuxxxxp/6ozpw5w8gdxlyukgksecerbsv9j2z3etbwupxi6+5tdcqvyribjgkgm3idwhht8uplvjpteraru86dcatwcgwnsk269uvbo3bf5l/bmpohlo+5vhg6vzwirqqyzp+v4kyue0ccryubxkhqwexpmh61kqscjmwex/hen9qwymp8rkap6q3xmfsyzjk8hojgrg5rc3ef/zehner2hbzzobk/rruzwjdappismdrhkfkvtcqob2vkxhxnydtt6ylch//fjfctmo+17dp2tuw+1zhsw0i/bqaflriwqhe3skooiio/santglc+88ow/o+9fondpl7kafcd4+ath6ptm=</latexit> <latexit sha1_base64="wyrux/pk9vh5a+mivjphdxvu0v4=">aaack3icbvc7tsmwfhv4lviqmljynegmvzv0gavsvrzgebqqnvhlue5r4dirfqokov4pc7/caampsfifukudrzmdnxop7r0nsgu34hmvztz8wulscmmlvlq2vrfz2dq+ncrtlhwoekp3i2ky4jj1ging3vqzkkscxuxxxxp/6ozpw5w8gdxlyukgksecerbsv9j2z3etbwupxi6+5tdcqvyribjgkgm3idwhht8uplvjpteraru86dcatwcgwnsk269uvbo3bf5l/bmpohlo+5vhg6vzwirqqyzp+v4kyue0ccryubxkhqwexpmh61kqscjmwex/hen9qwymp8rkap6q3xmfsyzjk8hojgrg5rc3ef/zehner2hbzzobk/rruzwjdappismdrhkfkvtcqob2vkxhxnydtt6ylch//fjfctmo+17dp2tuw+1zhsw0i/bqaflriwqhe3skooiio/santglc+88ow/o+9fondpl7kafcd4+ath6ptm=</latexit> Data Sampling (i.e., Mini-batching) Mechanisms Sampling: a random subset of {1, 2,, n} A sampling is uniquely defined by assigning probabilities to all 2 n subsets of {1, 2,..., n} Probability matrix Probability vector Proper sampling: P 2 R n n associated with sampling P ij := Prob({i, j} S) p 2 R n associated with sampling p i := Prob({i} S) =P ii p i > 0 for all i =1, 2,...,n Examples Standard sampling: S = {i} with probability 1 n for all i =1, 2,...,n Standard mini-batch sampling: S = C with probability 1 ( n b) for all C {1, 2,...,n} such that C = b Arbitrary sampling paradigm = perform iteration complexity analysis for any proper sampling

<latexit sha1_base64="fors//lgdcrlu94d3yhmshgoank=">aaach3icbvfnbxmxepuubdrwlckry4gikaei7jykekeqcofybgkrxcvk63gtq17vyp5fisz/fx4un/4ntrqu0jkspef35nk8m0wjpmuk+rxft7zu37m7vdo7d//bw0f93ccntm4nfxneq9qcfcwkjbwyoeqlzhojwfuocvqcf1zrp9+fsblwx3hvikxicy1lyrkgku//onpltz14b0t4cvqga6bkldiktq1yj4fkdy5ucjkjzh6+ee9lw7hlvbsq6cax3v9xa1yobmuuh8vr+uw/d7pg6jz+bnti+qjhshepxmqdrlo8p0jgysbgjkg7mcbdhof9n3rw87ysgrli1k7tpmhmmyosk+f7tlwiyfyczcu0qm0qyto36cld88dmokxnobphw151ofzzu6qkkfkxxnjr2pr8nzztstzmnnrni0lzi0jlqwbrwc8fztiijmovaonghr8cx7awzayr64uhpndbvglo9sdpmk4/7w+opntj2czpytmyjcl5s47ij3jmjorhw9gl6hv0eo/er+i38effahx1nifkn4jf/wztysjs</latexit> <latexit sha1_base64="fors//lgdcrlu94d3yhmshgoank=">aaach3icbvfnbxmxepuubdrwlckry4gikaei7jykekeqcofybgkrxcvk63gtq17vyp5fisz/fx4un/4ntrqu0jkspef35nk8m0wjpmuk+rxft7zu37m7vdo7d//bw0f93ccntm4nfxneq9qcfcwkjbwyoeqlzhojwfuocvqcf1zrp9+fsblwx3hvikxicy1lyrkgku//onpltz14b0t4cvqga6bkldiktq1yj4fkdy5ucjkjzh6+ee9lw7hlvbsq6cax3v9xa1yobmuuh8vr+uw/d7pg6jz+bnti+qjhshepxmqdrlo8p0jgysbgjkg7mcbdhof9n3rw87ysgrli1k7tpmhmmyosk+f7tlwiyfyczcu0qm0qyto36cld88dmokxnobphw151ofzzu6qkkfkxxnjr2pr8nzztstzmnnrni0lzi0jlqwbrwc8fztiijmovaonghr8cx7awzayr64uhpndbvglo9sdpmk4/7w+opntj2czpytmyjcl5s47ij3jmjorhw9gl6hv0eo/er+i38effahx1nifkn4jf/wztysjs</latexit> <latexit sha1_base64="fors//lgdcrlu94d3yhmshgoank=">aaach3icbvfnbxmxepuubdrwlckry4gikaei7jykekeqcofybgkrxcvk63gtq17vyp5fisz/fx4un/4ntrqu0jkspef35nk8m0wjpmuk+rxft7zu37m7vdo7d//bw0f93ccntm4nfxneq9qcfcwkjbwyoeqlzhojwfuocvqcf1zrp9+fsblwx3hvikxicy1lyrkgku//onpltz14b0t4cvqga6bkldiktq1yj4fkdy5ucjkjzh6+ee9lw7hlvbsq6cax3v9xa1yobmuuh8vr+uw/d7pg6jz+bnti+qjhshepxmqdrlo8p0jgysbgjkg7mcbdhof9n3rw87ysgrli1k7tpmhmmyosk+f7tlwiyfyczcu0qm0qyto36cld88dmokxnobphw151ofzzu6qkkfkxxnjr2pr8nzztstzmnnrni0lzi0jlqwbrwc8fztiijmovaonghr8cx7awzayr64uhpndbvglo9sdpmk4/7w+opntj2czpytmyjcl5s47ij3jmjorhw9gl6hv0eo/er+i38effahx1nifkn4jf/wztysjs</latexit> <latexit sha1_base64="fors//lgdcrlu94d3yhmshgoank=">aaach3icbvfnbxmxepuubdrwlckry4gikaei7jykekeqcofybgkrxcvk63gtq17vyp5fisz/fx4un/4ntrqu0jkspef35nk8m0wjpmuk+rxft7zu37m7vdo7d//bw0f93ccntm4nfxneq9qcfcwkjbwyoeqlzhojwfuocvqcf1zrp9+fsblwx3hvikxicy1lyrkgku//onpltz14b0t4cvqga6bkldiktq1yj4fkdy5ucjkjzh6+ee9lw7hlvbsq6cax3v9xa1yobmuuh8vr+uw/d7pg6jz+bnti+qjhshepxmqdrlo8p0jgysbgjkg7mcbdhof9n3rw87ysgrli1k7tpmhmmyosk+f7tlwiyfyczcu0qm0qyto36cld88dmokxnobphw151ofzzu6qkkfkxxnjr2pr8nzztstzmnnrni0lzi0jlqwbrwc8fztiijmovaonghr8cx7awzayr64uhpndbvglo9sdpmk4/7w+opntj2czpytmyjcl5s47ij3jmjorhw9gl6hv0eo/er+i38effahx1nifkn4jf/wztysjs</latexit> From Standard Sampling to Arbitrary Sampling SVRG with Arbitrary Sampling x + X = x i2s 1 (rf i (x) np i rf i (ˆx)) + rf(ˆx)!

<latexit sha1_base64="fors//lgdcrlu94d3yhmshgoank=">aaach3icbvfnbxmxepuubdrwlckry4gikaei7jykekeqcofybgkrxcvk63gtq17vyp5fisz/fx4un/4ntrqu0jkspef35nk8m0wjpmuk+rxft7zu37m7vdo7d//bw0f93ccntm4nfxneq9qcfcwkjbwyoeqlzhojwfuocvqcf1zrp9+fsblwx3hvikxicy1lyrkgku//onpltz14b0t4cvqga6bkldiktq1yj4fkdy5ucjkjzh6+ee9lw7hlvbsq6cax3v9xa1yobmuuh8vr+uw/d7pg6jz+bnti+qjhshepxmqdrlo8p0jgysbgjkg7mcbdhof9n3rw87ysgrli1k7tpmhmmyosk+f7tlwiyfyczcu0qm0qyto36cld88dmokxnobphw151ofzzu6qkkfkxxnjr2pr8nzztstzmnnrni0lzi0jlqwbrwc8fztiijmovaonghr8cx7awzayr64uhpndbvglo9sdpmk4/7w+opntj2czpytmyjcl5s47ij3jmjorhw9gl6hv0eo/er+i38effahx1nifkn4jf/wztysjs</latexit> <latexit sha1_base64="fors//lgdcrlu94d3yhmshgoank=">aaach3icbvfnbxmxepuubdrwlckry4gikaei7jykekeqcofybgkrxcvk63gtq17vyp5fisz/fx4un/4ntrqu0jkspef35nk8m0wjpmuk+rxft7zu37m7vdo7d//bw0f93ccntm4nfxneq9qcfcwkjbwyoeqlzhojwfuocvqcf1zrp9+fsblwx3hvikxicy1lyrkgku//onpltz14b0t4cvqga6bkldiktq1yj4fkdy5ucjkjzh6+ee9lw7hlvbsq6cax3v9xa1yobmuuh8vr+uw/d7pg6jz+bnti+qjhshepxmqdrlo8p0jgysbgjkg7mcbdhof9n3rw87ysgrli1k7tpmhmmyosk+f7tlwiyfyczcu0qm0qyto36cld88dmokxnobphw151ofzzu6qkkfkxxnjr2pr8nzztstzmnnrni0lzi0jlqwbrwc8fztiijmovaonghr8cx7awzayr64uhpndbvglo9sdpmk4/7w+opntj2czpytmyjcl5s47ij3jmjorhw9gl6hv0eo/er+i38effahx1nifkn4jf/wztysjs</latexit> <latexit sha1_base64="fors//lgdcrlu94d3yhmshgoank=">aaach3icbvfnbxmxepuubdrwlckry4gikaei7jykekeqcofybgkrxcvk63gtq17vyp5fisz/fx4un/4ntrqu0jkspef35nk8m0wjpmuk+rxft7zu37m7vdo7d//bw0f93ccntm4nfxneq9qcfcwkjbwyoeqlzhojwfuocvqcf1zrp9+fsblwx3hvikxicy1lyrkgku//onpltz14b0t4cvqga6bkldiktq1yj4fkdy5ucjkjzh6+ee9lw7hlvbsq6cax3v9xa1yobmuuh8vr+uw/d7pg6jz+bnti+qjhshepxmqdrlo8p0jgysbgjkg7mcbdhof9n3rw87ysgrli1k7tpmhmmyosk+f7tlwiyfyczcu0qm0qyto36cld88dmokxnobphw151ofzzu6qkkfkxxnjr2pr8nzztstzmnnrni0lzi0jlqwbrwc8fztiijmovaonghr8cx7awzayr64uhpndbvglo9sdpmk4/7w+opntj2czpytmyjcl5s47ij3jmjorhw9gl6hv0eo/er+i38effahx1nifkn4jf/wztysjs</latexit> <latexit sha1_base64="fors//lgdcrlu94d3yhmshgoank=">aaach3icbvfnbxmxepuubdrwlckry4gikaei7jykekeqcofybgkrxcvk63gtq17vyp5fisz/fx4un/4ntrqu0jkspef35nk8m0wjpmuk+rxft7zu37m7vdo7d//bw0f93ccntm4nfxneq9qcfcwkjbwyoeqlzhojwfuocvqcf1zrp9+fsblwx3hvikxicy1lyrkgku//onpltz14b0t4cvqga6bkldiktq1yj4fkdy5ucjkjzh6+ee9lw7hlvbsq6cax3v9xa1yobmuuh8vr+uw/d7pg6jz+bnti+qjhshepxmqdrlo8p0jgysbgjkg7mcbdhof9n3rw87ysgrli1k7tpmhmmyosk+f7tlwiyfyczcu0qm0qyto36cld88dmokxnobphw151ofzzu6qkkfkxxnjr2pr8nzztstzmnnrni0lzi0jlqwbrwc8fztiijmovaonghr8cx7awzayr64uhpndbvglo9sdpmk4/7w+opntj2czpytmyjcl5s47ij3jmjorhw9gl6hv0eo/er+i38effahx1nifkn4jf/wztysjs</latexit> From Standard Sampling to Arbitrary Sampling SVRG with Arbitrary Sampling x + = x X i2s 1 Unbiased estimator of the gradient np i (rf i (x) rf i (ˆx)) + rf(ˆx)!

<latexit sha1_base64="fors//lgdcrlu94d3yhmshgoank=">aaach3icbvfnbxmxepuubdrwlckry4gikaei7jykekeqcofybgkrxcvk63gtq17vyp5fisz/fx4un/4ntrqu0jkspef35nk8m0wjpmuk+rxft7zu37m7vdo7d//bw0f93ccntm4nfxneq9qcfcwkjbwyoeqlzhojwfuocvqcf1zrp9+fsblwx3hvikxicy1lyrkgku//onpltz14b0t4cvqga6bkldiktq1yj4fkdy5ucjkjzh6+ee9lw7hlvbsq6cax3v9xa1yobmuuh8vr+uw/d7pg6jz+bnti+qjhshepxmqdrlo8p0jgysbgjkg7mcbdhof9n3rw87ysgrli1k7tpmhmmyosk+f7tlwiyfyczcu0qm0qyto36cld88dmokxnobphw151ofzzu6qkkfkxxnjr2pr8nzztstzmnnrni0lzi0jlqwbrwc8fztiijmovaonghr8cx7awzayr64uhpndbvglo9sdpmk4/7w+opntj2czpytmyjcl5s47ij3jmjorhw9gl6hv0eo/er+i38effahx1nifkn4jf/wztysjs</latexit> <latexit sha1_base64="fors//lgdcrlu94d3yhmshgoank=">aaach3icbvfnbxmxepuubdrwlckry4gikaei7jykekeqcofybgkrxcvk63gtq17vyp5fisz/fx4un/4ntrqu0jkspef35nk8m0wjpmuk+rxft7zu37m7vdo7d//bw0f93ccntm4nfxneq9qcfcwkjbwyoeqlzhojwfuocvqcf1zrp9+fsblwx3hvikxicy1lyrkgku//onpltz14b0t4cvqga6bkldiktq1yj4fkdy5ucjkjzh6+ee9lw7hlvbsq6cax3v9xa1yobmuuh8vr+uw/d7pg6jz+bnti+qjhshepxmqdrlo8p0jgysbgjkg7mcbdhof9n3rw87ysgrli1k7tpmhmmyosk+f7tlwiyfyczcu0qm0qyto36cld88dmokxnobphw151ofzzu6qkkfkxxnjr2pr8nzztstzmnnrni0lzi0jlqwbrwc8fztiijmovaonghr8cx7awzayr64uhpndbvglo9sdpmk4/7w+opntj2czpytmyjcl5s47ij3jmjorhw9gl6hv0eo/er+i38effahx1nifkn4jf/wztysjs</latexit> <latexit sha1_base64="fors//lgdcrlu94d3yhmshgoank=">aaach3icbvfnbxmxepuubdrwlckry4gikaei7jykekeqcofybgkrxcvk63gtq17vyp5fisz/fx4un/4ntrqu0jkspef35nk8m0wjpmuk+rxft7zu37m7vdo7d//bw0f93ccntm4nfxneq9qcfcwkjbwyoeqlzhojwfuocvqcf1zrp9+fsblwx3hvikxicy1lyrkgku//onpltz14b0t4cvqga6bkldiktq1yj4fkdy5ucjkjzh6+ee9lw7hlvbsq6cax3v9xa1yobmuuh8vr+uw/d7pg6jz+bnti+qjhshepxmqdrlo8p0jgysbgjkg7mcbdhof9n3rw87ysgrli1k7tpmhmmyosk+f7tlwiyfyczcu0qm0qyto36cld88dmokxnobphw151ofzzu6qkkfkxxnjr2pr8nzztstzmnnrni0lzi0jlqwbrwc8fztiijmovaonghr8cx7awzayr64uhpndbvglo9sdpmk4/7w+opntj2czpytmyjcl5s47ij3jmjorhw9gl6hv0eo/er+i38effahx1nifkn4jf/wztysjs</latexit> <latexit sha1_base64="fors//lgdcrlu94d3yhmshgoank=">aaach3icbvfnbxmxepuubdrwlckry4gikaei7jykekeqcofybgkrxcvk63gtq17vyp5fisz/fx4un/4ntrqu0jkspef35nk8m0wjpmuk+rxft7zu37m7vdo7d//bw0f93ccntm4nfxneq9qcfcwkjbwyoeqlzhojwfuocvqcf1zrp9+fsblwx3hvikxicy1lyrkgku//onpltz14b0t4cvqga6bkldiktq1yj4fkdy5ucjkjzh6+ee9lw7hlvbsq6cax3v9xa1yobmuuh8vr+uw/d7pg6jz+bnti+qjhshepxmqdrlo8p0jgysbgjkg7mcbdhof9n3rw87ysgrli1k7tpmhmmyosk+f7tlwiyfyczcu0qm0qyto36cld88dmokxnobphw151ofzzu6qkkfkxxnjr2pr8nzztstzmnnrni0lzi0jlqwbrwc8fztiijmovaonghr8cx7awzayr64uhpndbvglo9sdpmk4/7w+opntj2czpytmyjcl5s47ij3jmjorhw9gl6hv0eo/er+i38effahx1nifkn4jf/wztysjs</latexit> <latexit sha1_base64="lmndbt30co6xq28rzf/6r/4u7hs=">aaacchicbvc7tsmwfhv4lvikmdjg0saxvukxwjaqwbilrb9sg0wo67rwhtuyhaqqysjcr7awgbarn8dg3+cmgadltefn3kt7zwktrpv23w9rzxvtfwozslxd3tnd27cpdjtkpbktnhzmyf6ifgguk7ammpfeigmkq0a64erm5ncfifru8hs9tygfoxgnecvigymwt5wkopakdikjcoblgc8dgakjewpqou5g19y6wwaue68knvcifdhfg6haauy4xgwp1ffcrpszkppirvlqifukqxicrqrvkecxux5wbmnhmvggxflica0l9fdghmklpnfojmokx2rrm4n/ef1ur5d+rnmsaslx/fcumqgfnluch1qsrnnueiqlnb9cpeamem26q5osvmxiy6ttqhtu3btr1jrxzr0vcaxowtnwwavoglvqam2awsn4bq/gzxqyxqx362m+umkvo0fgd6zph+iul/q=</latexit> <latexit sha1_base64="lmndbt30co6xq28rzf/6r/4u7hs=">aaacchicbvc7tsmwfhv4lvikmdjg0saxvukxwjaqwbilrb9sg0wo67rwhtuyhaqqysjcr7awgbarn8dg3+cmgadltefn3kt7zwktrpv23w9rzxvtfwozslxd3tnd27cpdjtkpbktnhzmyf6ifgguk7ammpfeigmkq0a64erm5ncfifru8hs9tygfoxgnecvigymwt5wkopakdikjcoblgc8dgakjewpqou5g19y6wwaue68knvcifdhfg6haauy4xgwp1ffcrpszkppirvlqifukqxicrqrvkecxux5wbmnhmvggxflica0l9fdghmklpnfojmokx2rrm4n/ef1ur5d+rnmsaslx/fcumqgfnluch1qsrnnueiqlnb9cpeamem26q5osvmxiy6ttqhtu3btr1jrxzr0vcaxowtnwwavoglvqam2awsn4bq/gzxqyxqx362m+umkvo0fgd6zph+iul/q=</latexit> <latexit sha1_base64="lmndbt30co6xq28rzf/6r/4u7hs=">aaacchicbvc7tsmwfhv4lvikmdjg0saxvukxwjaqwbilrb9sg0wo67rwhtuyhaqqysjcr7awgbarn8dg3+cmgadltefn3kt7zwktrpv23w9rzxvtfwozslxd3tnd27cpdjtkpbktnhzmyf6ifgguk7ammpfeigmkq0a64erm5ncfifru8hs9tygfoxgnecvigymwt5wkopakdikjcoblgc8dgakjewpqou5g19y6wwaue68knvcifdhfg6haauy4xgwp1ffcrpszkppirvlqifukqxicrqrvkecxux5wbmnhmvggxflica0l9fdghmklpnfojmokx2rrm4n/ef1ur5d+rnmsaslx/fcumqgfnluch1qsrnnueiqlnb9cpeamem26q5osvmxiy6ttqhtu3btr1jrxzr0vcaxowtnwwavoglvqam2awsn4bq/gzxqyxqx362m+umkvo0fgd6zph+iul/q=</latexit> <latexit sha1_base64="lmndbt30co6xq28rzf/6r/4u7hs=">aaacchicbvc7tsmwfhv4lvikmdjg0saxvukxwjaqwbilrb9sg0wo67rwhtuyhaqqysjcr7awgbarn8dg3+cmgadltefn3kt7zwktrpv23w9rzxvtfwozslxd3tnd27cpdjtkpbktnhzmyf6ifgguk7ammpfeigmkq0a64erm5ncfifru8hs9tygfoxgnecvigymwt5wkopakdikjcoblgc8dgakjewpqou5g19y6wwaue68knvcifdhfg6haauy4xgwp1ffcrpszkppirvlqifukqxicrqrvkecxux5wbmnhmvggxflica0l9fdghmklpnfojmokx2rrm4n/ef1ur5d+rnmsaslx/fcumqgfnluch1qsrnnueiqlnb9cpeamem26q5osvmxiy6ttqhtu3btr1jrxzr0vcaxowtnwwavoglvqam2awsn4bq/gzxqyxqx362m+umkvo0fgd6zph+iul/q=</latexit> From Standard Sampling to Arbitrary Sampling SVRG with Arbitrary Sampling x + = x X i2s 1 Unbiased estimator of the gradient np i (rf i (x) rf i (ˆx)) + rf(ˆx) Standard sampling:! Arbitrary sampling p i = 1 n for all i

Convergence Rate I n + n2/3 O

<latexit sha1_base64="ti37wostf5wqbv622wqejirgk/m=">aaacenicbvc7tgjbfj3ff+iltbszccbqkf0alyngxbitesqsktnhlkyynv1nzo1k4rts/bubc42xtblzbxwehyinucnjoffm3nu8idolbfvbsq2srq1vpdczw9s7u3vz/yo6cmnjouzdhsqmrxrwjqcmmebqjcsqwopq8ayxe79xd1kxunzoyqttgpqe8xkl2kidbdgfudlal2psjlxbpe6wx3gouqpbmny53gexisv4kpkdbm4u2vpgzelmsq7nue1kv9xusomahkacknvy7ei3eyi1oxzggtdwebe6id1ogspiakqdtf8a4xojdlefslnc46n6eyihgvldwdodadf9tehnxp+8vqz9s3bcrbrrehs2yi851ige5io7talvfggiozkzwzhte0monilmtajo4svlpf4uoxbjus7nkufzonlocb2janlqkaqgk1rfnutri3pgr+jnerjerhfry9aasuyzh+gprm8f8d2cya==</latexit> <latexit sha1_base64="ti37wostf5wqbv622wqejirgk/m=">aaacenicbvc7tgjbfj3ff+iltbszccbqkf0alyngxbitesqsktnhlkyynv1nzo1k4rts/bubc42xtblzbxwehyinucnjoffm3nu8idolbfvbsq2srq1vpdczw9s7u3vz/yo6cmnjouzdhsqmrxrwjqcmmebqjcsqwopq8ayxe79xd1kxunzoyqttgpqe8xkl2kidbdgfudlal2psjlxbpe6wx3gouqpbmny53gexisv4kpkdbm4u2vpgzelmsq7nue1kv9xusomahkacknvy7ei3eyi1oxzggtdwebe6id1ogspiakqdtf8a4xojdlefslnc46n6eyihgvldwdodadf9tehnxp+8vqz9s3bcrbrrehs2yi851ige5io7talvfggiozkzwzhte0monilmtajo4svlpf4uoxbjus7nkufzonlocb2janlqkaqgk1rfnutri3pgr+jnerjerhfry9aasuyzh+gprm8f8d2cya==</latexit> <latexit sha1_base64="ti37wostf5wqbv622wqejirgk/m=">aaacenicbvc7tgjbfj3ff+iltbszccbqkf0alyngxbitesqsktnhlkyynv1nzo1k4rts/bubc42xtblzbxwehyinucnjoffm3nu8idolbfvbsq2srq1vpdczw9s7u3vz/yo6cmnjouzdhsqmrxrwjqcmmebqjcsqwopq8ayxe79xd1kxunzoyqttgpqe8xkl2kidbdgfudlal2psjlxbpe6wx3gouqpbmny53gexisv4kpkdbm4u2vpgzelmsq7nue1kv9xusomahkacknvy7ei3eyi1oxzggtdwebe6id1ogspiakqdtf8a4xojdlefslnc46n6eyihgvldwdodadf9tehnxp+8vqz9s3bcrbrrehs2yi851ige5io7talvfggiozkzwzhte0monilmtajo4svlpf4uoxbjus7nkufzonlocb2janlqkaqgk1rfnutri3pgr+jnerjerhfry9aasuyzh+gprm8f8d2cya==</latexit> <latexit sha1_base64="ti37wostf5wqbv622wqejirgk/m=">aaacenicbvc7tgjbfj3ff+iltbszccbqkf0alyngxbitesqsktnhlkyynv1nzo1k4rts/bubc42xtblzbxwehyinucnjoffm3nu8idolbfvbsq2srq1vpdczw9s7u3vz/yo6cmnjouzdhsqmrxrwjqcmmebqjcsqwopq8ayxe79xd1kxunzoyqttgpqe8xkl2kidbdgfudlal2psjlxbpe6wx3gouqpbmny53gexisv4kpkdbm4u2vpgzelmsq7nue1kv9xusomahkacknvy7ei3eyi1oxzggtdwebe6id1ogspiakqdtf8a4xojdlefslnc46n6eyihgvldwdodadf9tehnxp+8vqz9s3bcrbrrehs2yi851ige5io7talvfggiozkzwzhte0monilmtajo4svlpf4uoxbjus7nkufzonlocb2janlqkaqgk1rfnutri3pgr+jnerjerhfry9aasuyzh+gprm8f8d2cya==</latexit> Convergence Rate I # data points n + n2/3 O Ekrf(x)k 2 apple

<latexit sha1_base64="ti37wostf5wqbv622wqejirgk/m=">aaacenicbvc7tgjbfj3ff+iltbszccbqkf0alyngxbitesqsktnhlkyynv1nzo1k4rts/bubc42xtblzbxwehyinucnjoffm3nu8idolbfvbsq2srq1vpdczw9s7u3vz/yo6cmnjouzdhsqmrxrwjqcmmebqjcsqwopq8ayxe79xd1kxunzoyqttgpqe8xkl2kidbdgfudlal2psjlxbpe6wx3gouqpbmny53gexisv4kpkdbm4u2vpgzelmsq7nue1kv9xusomahkacknvy7ei3eyi1oxzggtdwebe6id1ogspiakqdtf8a4xojdlefslnc46n6eyihgvldwdodadf9tehnxp+8vqz9s3bcrbrrehs2yi851ige5io7talvfggiozkzwzhte0monilmtajo4svlpf4uoxbjus7nkufzonlocb2janlqkaqgk1rfnutri3pgr+jnerjerhfry9aasuyzh+gprm8f8d2cya==</latexit> <latexit sha1_base64="ti37wostf5wqbv622wqejirgk/m=">aaacenicbvc7tgjbfj3ff+iltbszccbqkf0alyngxbitesqsktnhlkyynv1nzo1k4rts/bubc42xtblzbxwehyinucnjoffm3nu8idolbfvbsq2srq1vpdczw9s7u3vz/yo6cmnjouzdhsqmrxrwjqcmmebqjcsqwopq8ayxe79xd1kxunzoyqttgpqe8xkl2kidbdgfudlal2psjlxbpe6wx3gouqpbmny53gexisv4kpkdbm4u2vpgzelmsq7nue1kv9xusomahkacknvy7ei3eyi1oxzggtdwebe6id1ogspiakqdtf8a4xojdlefslnc46n6eyihgvldwdodadf9tehnxp+8vqz9s3bcrbrrehs2yi851ige5io7talvfggiozkzwzhte0monilmtajo4svlpf4uoxbjus7nkufzonlocb2janlqkaqgk1rfnutri3pgr+jnerjerhfry9aasuyzh+gprm8f8d2cya==</latexit> <latexit sha1_base64="ti37wostf5wqbv622wqejirgk/m=">aaacenicbvc7tgjbfj3ff+iltbszccbqkf0alyngxbitesqsktnhlkyynv1nzo1k4rts/bubc42xtblzbxwehyinucnjoffm3nu8idolbfvbsq2srq1vpdczw9s7u3vz/yo6cmnjouzdhsqmrxrwjqcmmebqjcsqwopq8ayxe79xd1kxunzoyqttgpqe8xkl2kidbdgfudlal2psjlxbpe6wx3gouqpbmny53gexisv4kpkdbm4u2vpgzelmsq7nue1kv9xusomahkacknvy7ei3eyi1oxzggtdwebe6id1ogspiakqdtf8a4xojdlefslnc46n6eyihgvldwdodadf9tehnxp+8vqz9s3bcrbrrehs2yi851ige5io7talvfggiozkzwzhte0monilmtajo4svlpf4uoxbjus7nkufzonlocb2janlqkaqgk1rfnutri3pgr+jnerjerhfry9aasuyzh+gprm8f8d2cya==</latexit> <latexit sha1_base64="ti37wostf5wqbv622wqejirgk/m=">aaacenicbvc7tgjbfj3ff+iltbszccbqkf0alyngxbitesqsktnhlkyynv1nzo1k4rts/bubc42xtblzbxwehyinucnjoffm3nu8idolbfvbsq2srq1vpdczw9s7u3vz/yo6cmnjouzdhsqmrxrwjqcmmebqjcsqwopq8ayxe79xd1kxunzoyqttgpqe8xkl2kidbdgfudlal2psjlxbpe6wx3gouqpbmny53gexisv4kpkdbm4u2vpgzelmsq7nue1kv9xusomahkacknvy7ei3eyi1oxzggtdwebe6id1ogspiakqdtf8a4xojdlefslnc46n6eyihgvldwdodadf9tehnxp+8vqz9s3bcrbrrehs2yi851ige5io7talvfggiozkzwzhte0monilmtajo4svlpf4uoxbjus7nkufzonlocb2janlqkaqgk1rfnutri3pgr+jnerjerhfry9aasuyzh+gprm8f8d2cya==</latexit> Convergence Rate I # data points n + n2/3 O Ekrf(x)k 2 apple KEY QUANTITY: DEPENDS ON THE SAMPLING

Convergence Rate II := b Ln 2 nx i=1 v i L 2 i p i

<latexit sha1_base64="lz5g7p8kggfaagfrymokojesgpi=">aaacehicbvc7tsmwfhv4lvikmljyvaimku4cs6ukfoyoraipqqmr4zqtvcejbaepivijlpwkcwmistky8te4bqzoodkvjs65v/feeyscke0439bk6tr6xmzpq7y9s7u3bx8cdlscsklbjoax7avyuc4ebwumoe0lkuio4lqbjk+nfvebssvicacncfuipbqszarri/n2mdt3ayyzzl53q4ljhvjm5k5kiz9jdztfc9j0mev5dswpojpazyikugefwr795q5ikkzuamkxun3kjnrlsnsmcjqx3vtrbjmxhtk+oqjhvhnz7kecnhplamnymhiazttfexmoljpegemmsb6prw8q/uf1ux1eehktsaqpipnfycqhjue0hthgkhlnj4zgipm5fzirnrlok2hzhiawx14mnvovovv0w6s0roo4suaynibzgmafaiab0ajtqmajeaav4m16sl6sd+tj3rpiftnh4a+szx/jkz0x</latexit> <latexit sha1_base64="lz5g7p8kggfaagfrymokojesgpi=">aaacehicbvc7tsmwfhv4lvikmljyvaimku4cs6ukfoyoraipqqmr4zqtvcejbaepivijlpwkcwmistky8te4bqzoodkvjs65v/feeyscke0439bk6tr6xmzpq7y9s7u3bx8cdlscsklbjoax7avyuc4ebwumoe0lkuio4lqbjk+nfvebssvicacncfuipbqszarri/n2mdt3ayyzzl53q4ljhvjm5k5kiz9jdztfc9j0mev5dswpojpazyikugefwr795q5ikkzuamkxun3kjnrlsnsmcjqx3vtrbjmxhtk+oqjhvhnz7kecnhplamnymhiazttfexmoljpegemmsb6prw8q/uf1ux1eehktsaqpipnfycqhjue0hthgkhlnj4zgipm5fzirnrlok2hzhiawx14mnvovovv0w6s0roo4suaynibzgmafaiab0ajtqmajeaav4m16sl6sd+tj3rpiftnh4a+szx/jkz0x</latexit> <latexit sha1_base64="lz5g7p8kggfaagfrymokojesgpi=">aaacehicbvc7tsmwfhv4lvikmljyvaimku4cs6ukfoyoraipqqmr4zqtvcejbaepivijlpwkcwmistky8te4bqzoodkvjs65v/feeyscke0439bk6tr6xmzpq7y9s7u3bx8cdlscsklbjoax7avyuc4ebwumoe0lkuio4lqbjk+nfvebssvicacncfuipbqszarri/n2mdt3ayyzzl53q4ljhvjm5k5kiz9jdztfc9j0mev5dswpojpazyikugefwr795q5ikkzuamkxun3kjnrlsnsmcjqx3vtrbjmxhtk+oqjhvhnz7kecnhplamnymhiazttfexmoljpegemmsb6prw8q/uf1ux1eehktsaqpipnfycqhjue0hthgkhlnj4zgipm5fzirnrlok2hzhiawx14mnvovovv0w6s0roo4suaynibzgmafaiab0ajtqmajeaav4m16sl6sd+tj3rpiftnh4a+szx/jkz0x</latexit> <latexit sha1_base64="lz5g7p8kggfaagfrymokojesgpi=">aaacehicbvc7tsmwfhv4lvikmljyvaimku4cs6ukfoyoraipqqmr4zqtvcejbaepivijlpwkcwmistky8te4bqzoodkvjs65v/feeyscke0439bk6tr6xmzpq7y9s7u3bx8cdlscsklbjoax7avyuc4ebwumoe0lkuio4lqbjk+nfvebssvicacncfuipbqszarri/n2mdt3ayyzzl53q4ljhvjm5k5kiz9jdztfc9j0mev5dswpojpazyikugefwr795q5ikkzuamkxun3kjnrlsnsmcjqx3vtrbjmxhtk+oqjhvhnz7kecnhplamnymhiazttfexmoljpegemmsb6prw8q/uf1ux1eehktsaqpipnfycqhjue0hthgkhlnj4zgipm5fzirnrlok2hzhiawx14mnvovovv0w6s0roo4suaynibzgmafaiab0ajtqmajeaav4m16sl6sd+tj3rpiftnh4a+szx/jkz0x</latexit> Convergence Rate II P ij := Prob({i, j} S) Constants satisfying: P pp > Diag(p 1 v 1,p 2 v 2,...,p n v n ). krf i (x) rf i (y)k applel i kx yk Expected mini-batch size: b =E S <latexit sha1_base64="tpgusjk+wuyjr/gurkqvcvd3bus=">aaab+xicbvbntwixej3fl8svvy9egshee9nlohctojhxifgqbdakwwo0tn1n2yuhc//eiwen8eo/8ea/scaefhzjjc/vzwrmxhhzpo3nftu5tfwnza38dmfnd2//wd08augouytwscqj1qyxppxjwjfmcnqmfcui5pqphn7m/kcrvzpf8tgmyxoi3jesxwg2vuq4bim8qmlbcxq7rzohsanjfr2ynwdajx5gipch1ng/2t2ijijkqzjwuuv7sqlsrawjne4l7uttgjmh7towprilqon0fvkunvmli3qrsiunmqu/j1istb6l0hykbaz62zuj/3mtxpqug5tjodfukswixskridasbtrlihldx5zgopi9fzebvpgyg1bbhuavv7xkgpwy75x9+0qxep3fkyctoivz8oecqnahnagdgre8wyu8oanz4rw7h4vwnjpnhmmfoj8/mdqsuw==</latexit> <latexit sha1_base64="tpgusjk+wuyjr/gurkqvcvd3bus=">aaab+xicbvbntwixej3fl8svvy9egshee9nlohctojhxifgqbdakwwo0tn1n2yuhc//eiwen8eo/8ea/scaefhzjjc/vzwrmxhhzpo3nftu5tfwnza38dmfnd2//wd08augouytwscqj1qyxppxjwjfmcnqmfcui5pqphn7m/kcrvzpf8tgmyxoi3jesxwg2vuq4bim8qmlbcxq7rzohsanjfr2ynwdajx5gipch1ng/2t2ijijkqzjwuuv7sqlsrawjne4l7uttgjmh7towprilqon0fvkunvmli3qrsiunmqu/j1istb6l0hykbaz62zuj/3mtxpqug5tjodfukswixskridasbtrlihldx5zgopi9fzebvpgyg1bbhuavv7xkgpwy75x9+0qxep3fkyctoivz8oecqnahnagdgre8wyu8oanz4rw7h4vwnjpnhmmfoj8/mdqsuw==</latexit> <latexit sha1_base64="tpgusjk+wuyjr/gurkqvcvd3bus=">aaab+xicbvbntwixej3fl8svvy9egshee9nlohctojhxifgqbdakwwo0tn1n2yuhc//eiwen8eo/8ea/scaefhzjjc/vzwrmxhhzpo3nftu5tfwnza38dmfnd2//wd08augouytwscqj1qyxppxjwjfmcnqmfcui5pqphn7m/kcrvzpf8tgmyxoi3jesxwg2vuq4bim8qmlbcxq7rzohsanjfr2ynwdajx5gipch1ng/2t2ijijkqzjwuuv7sqlsrawjne4l7uttgjmh7towprilqon0fvkunvmli3qrsiunmqu/j1istb6l0hykbaz62zuj/3mtxpqug5tjodfukswixskridasbtrlihldx5zgopi9fzebvpgyg1bbhuavv7xkgpwy75x9+0qxep3fkyctoivz8oecqnahnagdgre8wyu8oanz4rw7h4vwnjpnhmmfoj8/mdqsuw==</latexit> <latexit sha1_base64="tpgusjk+wuyjr/gurkqvcvd3bus=">aaab+xicbvbntwixej3fl8svvy9egshee9nlohctojhxifgqbdakwwo0tn1n2yuhc//eiwen8eo/8ea/scaefhzjjc/vzwrmxhhzpo3nftu5tfwnza38dmfnd2//wd08augouytwscqj1qyxppxjwjfmcnqmfcui5pqphn7m/kcrvzpf8tgmyxoi3jesxwg2vuq4bim8qmlbcxq7rzohsanjfr2ynwdajx5gipch1ng/2t2ijijkqzjwuuv7sqlsrawjne4l7uttgjmh7towprilqon0fvkunvmli3qrsiunmqu/j1istb6l0hykbaz62zuj/3mtxpqug5tjodfukswixskridasbtrlihldx5zgopi9fzebvpgyg1bbhuavv7xkgpwy75x9+0qxep3fkyctoivz8oecqnahnagdgre8wyu8oanz4rw7h4vwnjpnhmmfoj8/mdqsuw==</latexit> b nx v i L 2 i := nx Ln 2 i=1 p i L = 1 n L i i=1 # data points p i := Prob({i} S) =P ii

<latexit sha1_base64="lz5g7p8kggfaagfrymokojesgpi=">aaacehicbvc7tsmwfhv4lvikmljyvaimku4cs6ukfoyoraipqqmr4zqtvcejbaepivijlpwkcwmistky8te4bqzoodkvjs65v/feeyscke0439bk6tr6xmzpq7y9s7u3bx8cdlscsklbjoax7avyuc4ebwumoe0lkuio4lqbjk+nfvebssvicacncfuipbqszarri/n2mdt3ayyzzl53q4ljhvjm5k5kiz9jdztfc9j0mev5dswpojpazyikugefwr795q5ikkzuamkxun3kjnrlsnsmcjqx3vtrbjmxhtk+oqjhvhnz7kecnhplamnymhiazttfexmoljpegemmsb6prw8q/uf1ux1eehktsaqpipnfycqhjue0hthgkhlnj4zgipm5fzirnrlok2hzhiawx14mnvovovv0w6s0roo4suaynibzgmafaiab0ajtqmajeaav4m16sl6sd+tj3rpiftnh4a+szx/jkz0x</latexit> <latexit sha1_base64="lz5g7p8kggfaagfrymokojesgpi=">aaacehicbvc7tsmwfhv4lvikmljyvaimku4cs6ukfoyoraipqqmr4zqtvcejbaepivijlpwkcwmistky8te4bqzoodkvjs65v/feeyscke0439bk6tr6xmzpq7y9s7u3bx8cdlscsklbjoax7avyuc4ebwumoe0lkuio4lqbjk+nfvebssvicacncfuipbqszarri/n2mdt3ayyzzl53q4ljhvjm5k5kiz9jdztfc9j0mev5dswpojpazyikugefwr795q5ikkzuamkxun3kjnrlsnsmcjqx3vtrbjmxhtk+oqjhvhnz7kecnhplamnymhiazttfexmoljpegemmsb6prw8q/uf1ux1eehktsaqpipnfycqhjue0hthgkhlnj4zgipm5fzirnrlok2hzhiawx14mnvovovv0w6s0roo4suaynibzgmafaiab0ajtqmajeaav4m16sl6sd+tj3rpiftnh4a+szx/jkz0x</latexit> <latexit sha1_base64="lz5g7p8kggfaagfrymokojesgpi=">aaacehicbvc7tsmwfhv4lvikmljyvaimku4cs6ukfoyoraipqqmr4zqtvcejbaepivijlpwkcwmistky8te4bqzoodkvjs65v/feeyscke0439bk6tr6xmzpq7y9s7u3bx8cdlscsklbjoax7avyuc4ebwumoe0lkuio4lqbjk+nfvebssvicacncfuipbqszarri/n2mdt3ayyzzl53q4ljhvjm5k5kiz9jdztfc9j0mev5dswpojpazyikugefwr795q5ikkzuamkxun3kjnrlsnsmcjqx3vtrbjmxhtk+oqjhvhnz7kecnhplamnymhiazttfexmoljpegemmsb6prw8q/uf1ux1eehktsaqpipnfycqhjue0hthgkhlnj4zgipm5fzirnrlok2hzhiawx14mnvovovv0w6s0roo4suaynibzgmafaiab0ajtqmajeaav4m16sl6sd+tj3rpiftnh4a+szx/jkz0x</latexit> <latexit sha1_base64="lz5g7p8kggfaagfrymokojesgpi=">aaacehicbvc7tsmwfhv4lvikmljyvaimku4cs6ukfoyoraipqqmr4zqtvcejbaepivijlpwkcwmistky8te4bqzoodkvjs65v/feeyscke0439bk6tr6xmzpq7y9s7u3bx8cdlscsklbjoax7avyuc4ebwumoe0lkuio4lqbjk+nfvebssvicacncfuipbqszarri/n2mdt3ayyzzl53q4ljhvjm5k5kiz9jdztfc9j0mev5dswpojpazyikugefwr795q5ikkzuamkxun3kjnrlsnsmcjqx3vtrbjmxhtk+oqjhvhnz7kecnhplamnymhiazttfexmoljpegemmsb6prw8q/uf1ux1eehktsaqpipnfycqhjue0hthgkhlnj4zgipm5fzirnrlok2hzhiawx14mnvovovv0w6s0roo4suaynibzgmafaiab0ajtqmajeaav4m16sl6sd+tj3rpiftnh4a+szx/jkz0x</latexit> Convergence Rate II P ij := Prob({i, j} S) Constants satisfying: P pp > Diag(p 1 v 1,p 2 v 2,...,p n v n ). krf i (x) rf i (y)k applel i kx yk Expected mini-batch size: b =E S <latexit sha1_base64="tpgusjk+wuyjr/gurkqvcvd3bus=">aaab+xicbvbntwixej3fl8svvy9egshee9nlohctojhxifgqbdakwwo0tn1n2yuhc//eiwen8eo/8ea/scaefhzjjc/vzwrmxhhzpo3nftu5tfwnza38dmfnd2//wd08augouytwscqj1qyxppxjwjfmcnqmfcui5pqphn7m/kcrvzpf8tgmyxoi3jesxwg2vuq4bim8qmlbcxq7rzohsanjfr2ynwdajx5gipch1ng/2t2ijijkqzjwuuv7sqlsrawjne4l7uttgjmh7towprilqon0fvkunvmli3qrsiunmqu/j1istb6l0hykbaz62zuj/3mtxpqug5tjodfukswixskridasbtrlihldx5zgopi9fzebvpgyg1bbhuavv7xkgpwy75x9+0qxep3fkyctoivz8oecqnahnagdgre8wyu8oanz4rw7h4vwnjpnhmmfoj8/mdqsuw==</latexit> <latexit sha1_base64="tpgusjk+wuyjr/gurkqvcvd3bus=">aaab+xicbvbntwixej3fl8svvy9egshee9nlohctojhxifgqbdakwwo0tn1n2yuhc//eiwen8eo/8ea/scaefhzjjc/vzwrmxhhzpo3nftu5tfwnza38dmfnd2//wd08augouytwscqj1qyxppxjwjfmcnqmfcui5pqphn7m/kcrvzpf8tgmyxoi3jesxwg2vuq4bim8qmlbcxq7rzohsanjfr2ynwdajx5gipch1ng/2t2ijijkqzjwuuv7sqlsrawjne4l7uttgjmh7towprilqon0fvkunvmli3qrsiunmqu/j1istb6l0hykbaz62zuj/3mtxpqug5tjodfukswixskridasbtrlihldx5zgopi9fzebvpgyg1bbhuavv7xkgpwy75x9+0qxep3fkyctoivz8oecqnahnagdgre8wyu8oanz4rw7h4vwnjpnhmmfoj8/mdqsuw==</latexit> <latexit sha1_base64="tpgusjk+wuyjr/gurkqvcvd3bus=">aaab+xicbvbntwixej3fl8svvy9egshee9nlohctojhxifgqbdakwwo0tn1n2yuhc//eiwen8eo/8ea/scaefhzjjc/vzwrmxhhzpo3nftu5tfwnza38dmfnd2//wd08augouytwscqj1qyxppxjwjfmcnqmfcui5pqphn7m/kcrvzpf8tgmyxoi3jesxwg2vuq4bim8qmlbcxq7rzohsanjfr2ynwdajx5gipch1ng/2t2ijijkqzjwuuv7sqlsrawjne4l7uttgjmh7towprilqon0fvkunvmli3qrsiunmqu/j1istb6l0hykbaz62zuj/3mtxpqug5tjodfukswixskridasbtrlihldx5zgopi9fzebvpgyg1bbhuavv7xkgpwy75x9+0qxep3fkyctoivz8oecqnahnagdgre8wyu8oanz4rw7h4vwnjpnhmmfoj8/mdqsuw==</latexit> <latexit sha1_base64="tpgusjk+wuyjr/gurkqvcvd3bus=">aaab+xicbvbntwixej3fl8svvy9egshee9nlohctojhxifgqbdakwwo0tn1n2yuhc//eiwen8eo/8ea/scaefhzjjc/vzwrmxhhzpo3nftu5tfwnza38dmfnd2//wd08augouytwscqj1qyxppxjwjfmcnqmfcui5pqphn7m/kcrvzpf8tgmyxoi3jesxwg2vuq4bim8qmlbcxq7rzohsanjfr2ynwdajx5gipch1ng/2t2ijijkqzjwuuv7sqlsrawjne4l7uttgjmh7towprilqon0fvkunvmli3qrsiunmqu/j1istb6l0hykbaz62zuj/3mtxpqug5tjodfukswixskridasbtrlihldx5zgopi9fzebvpgyg1bbhuavv7xkgpwy75x9+0qxep3fkyctoivz8oecqnahnagdgre8wyu8oanz4rw7h4vwnjpnhmmfoj8/mdqsuw==</latexit> b nx v i L 2 i := nx Ln 2 i=1 p i L = 1 n L i i=1 # data points p i := Prob({i} S) =P ii Optimal rate: minimize over {(v i,p i )} n i=1

Convergence Rate III

Experiments

Nonconvex Variance Reduced Optimization with Arbitrary Sampling Samuel Horváth 1 Peter Richtárik 1,2,3 1 KAUST 2 University of Edinburgh 3 Moscow Institute of Physics and Technology min xœr d The Problem nx f(x) := 1 f i(x) (1) n i=1 f i is L i-smooth but non-convex n is big Arbitrary Sampling Sampling: arandomset-valuedmappings with values being subsets of [n] :={1, 2,...,n}. A sampling is used to generate minibatches in each iteration. Probability matrix associated with sampling S : def P ij =Prob({i, j} S) Probability vector associated with sampling S : def p =(p 1,...,p n), p i =Prob(iœS) Minibatch size: b =E[ S ] (expected size of S) Proper sampling: Sampling for which p i > 0 for all i œ [n] Arbitrary sampling = any proper sampling Key Lemma Let 1, 2,..., n be vectors in R d def and let = 1 P n n i=1 i be their average. Let S be a proper sampling. Let v =(v 1,...,v n) > 0 be such that P pp Diag(p 1v 1, p 2v 2,...,p nv n). (2) Then 2 6 X i E 4 iœs np 2 3 7 5 Æ 1 X n v i i n 2 Î iî 2. i=1 p i Whenever (2) holds, it must be the case that v i Ø 1 p i. Optimal Sampling & Superlinear Speedup Under our analysis, the independent sampling S ú defined by 8 >< def (b + k n) P Li kj=1 if i Æ k Lj, p i = >: 1, if i>k, is optimal, where k is the largest integer satisfying 0 <b+ k n Æ P k i=1 Li. Lk All 3 methods enjoy superlinear speed in b up to the minibatch size b max := max{b bl n Æ P n i=1 L i}. # Stochastic Gradient Evaluations to Achieve E apple ÎÒf(x)Î 2 Æ Alg Uniform sampling Arbitrary sampling [NEW] S ú (Best Sampling) [NEW] ( ) ( ) 8 9 SVRG max n, (1+4/3)Lmaxc1n2/3 2/3 < 2/3 (1+4 /3) Lc1n (1+4(n b) 3n ) Lc1n = [1] max n, max n, : ; SAGA n + 2Lmaxc2n2/3 2/3 (1+ ) Lc2n [2] n + n + (1+n b 2/3 n ) Lc2n SARAH n + n b n 1 L2 maxc3 [3] n + L 2 c3 2 n + n b L 2 n 2 c3 2 Constants: L max =max i L i L = 1 n Pi L i c 1,c 2,c 3 = universal constants := L b P n vil 2 2 n 2 i i=1 pi Numerical Results Main Contributions Samplings SVRG with Arbitrary Sampling We develop arbitrary sampling variants of 3 popular variance-reduced methods for solving the non-convex problem (1): SVRG [1], SAGA [2], SARAH [3]. We are able calculate the optimal sampling out of all samplings of a given minibatch size. This is the first time an optimal minibatch sampling was computed (from the class of all samplings). We design importance sampling & approximate importance sampling for minibatches, which vastly outperform standard uniform minibatch strategies in practice. Uniform S u :Everysubsetof[n] of size b (minibatch size) is chosen with the same probability: 1 /( n b) Independent S ú :Foreachi œ [n] we independently flip a coin, and with probability p i include element i into S. Approximate Independent S a :Fixsome k œ [n] and let a = Ák max iæk p ië. Wenow sample a single set S Õ of cardinality a using the uniform minibatch sampling S u.subsequently, we apply an independent sampling S ú to select elements of S Õ,withselectionprobabilities p Õ i = kp i/a. TheresultingrandomsetisS a. Algorithm 1: SVRG x 0 = x 0 m = x 0, M = ÁT/mË; for s =0to M 1 do x s+1 0 = x s m; g s+1 = 1 P n n i=1 Òf i( x s ) for t =0to m 1 do Draw a random subset (minibatch) S t S vt s+1 = P 1 (Òf itœst it (x s+1 npi t t ) Òf it ( x s )) + g s+1 x s+1 t+1 = x s+1 t vt s+1 end x s+1 = x s+1 m end Output: Iterate x a chosen uniformly random from {{x s+1 t } m t=0} M s=0 References [1] Sashank J Reddi, Ahmed Hefny, Suvrit Sra, Barnabás Póczos, and Alex Smola. Stochastic variance reduction for nonconvex optimization. In The 33th International Conference on Machine Learning, pages 314 323, 2016. [2] Sashank J Reddi, Suvrit Sra, Barnabás Póczos, and Alex Smola. Fast incremental method for smooth nonconvex optimization. In Decision and Control (CDC), 2016 IEEE 55th Conference on, pages1971 1977.IEEE,2016. [3] Lam M Nguyen, Jie Liu, Katya Scheinberg, and Martin Taká. Stochastic recursive gradient algorithm for nonconvex optimization. arxiv:1705.07261, 2017. Poster: Pacific Ballroom #95 (Today 6:30 9:00 PM)

Thank you!