Nonconvex Variance Reduced Optimization with Arbitrary Sampling Samuel Horváth Peter Richtárik
Empirical Risk Minimization
$$\min_{x \in \mathbb{R}^d} f(x) := \frac{1}{n} \sum_{i=1}^{n} f_i(x)$$
$n$ is big.
Each $f_i$ is non-convex and $L_i$-smooth: $\|\nabla f_i(x) - \nabla f_i(y)\| \le L_i \|x - y\|$.
Baseline Variance Reduced SGD Methods
SVRG: Johnson & Zhang (NIPS 2013), uniform sampling
SAGA: Defazio, Bach & Lacoste-Julien (NIPS 2014), uniform sampling
SARAH: Nguyen, Liu, Scheinberg & Takáč (ICML 2017), uniform sampling
Baseline Variance Reduced SGD Methods: Mini-batch
SVRG: Konečný & Richtárik (FAMS 2017), mini-batch size, uniform sampling
SAGA: Reddi, Hefny, Sra, Poczos & Smola (CDC 2016), mini-batch size, uniform sampling
SARAH: Nguyen, Liu, Scheinberg & Takáč (2017), mini-batch size, uniform sampling
Contributions
Analysis of SVRG, SAGA and SARAH in the arbitrary sampling paradigm.
Earlier work in the arbitrary sampling paradigm: Richtárik & Takáč (OL 2016; arXiv 2013); Qu, Richtárik & Zhang (NIPS 2015); Qu & Richtárik (COAP 2016); Chambolle, Ehrhardt, Richtárik & Schoenlieb (SIOPT 2018); Hanzely & Richtárik (AISTATS 2019); Qian, Qu & Richtárik (ICML 2019); Gower, Loizou, Qian, Sailanbayev, Shulgin & Richtárik (ICML 2019).
Construction of optimal minibatch sampling: first optimal/importance sampling for minibatches!
Data Sampling (i.e., Mini-batching) Mechanisms
Sampling: $S$ is a random subset of $\{1, 2, \dots, n\}$.
A sampling is uniquely defined by assigning probabilities to all $2^n$ subsets of $\{1, 2, \dots, n\}$.
Probability matrix $P \in \mathbb{R}^{n \times n}$ associated with sampling $S$: $P_{ij} := \mathrm{Prob}(\{i, j\} \subseteq S)$.
Probability vector $p \in \mathbb{R}^n$ associated with sampling $S$: $p_i := \mathrm{Prob}(\{i\} \subseteq S) = P_{ii}$.
Proper sampling: $p_i > 0$ for all $i = 1, 2, \dots, n$.
Examples
Standard sampling: $S = \{i\}$ with probability $\frac{1}{n}$ for all $i = 1, 2, \dots, n$.
Standard mini-batch sampling: $S = C$ with probability $1 / \binom{n}{b}$ for all $C \subseteq \{1, 2, \dots, n\}$ such that $|C| = b$.
Arbitrary sampling paradigm = perform iteration complexity analysis for any proper sampling.
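The definitions of $P$, $p$ and properness can be checked on a small case by brute-force enumeration. The following is a minimal sketch, not the authors' code: the helper names `uniform_minibatch_sampling` and `probability_matrix` and the values `n = 5`, `b = 2` are illustrative assumptions.

```python
from itertools import combinations
from math import comb

def uniform_minibatch_sampling(n, b):
    """Standard mini-batch sampling: every subset of size b has probability 1 / C(n, b)."""
    prob = 1.0 / comb(n, b)
    return {frozenset(c): prob for c in combinations(range(n), b)}

def probability_matrix(sampling, n):
    """P[i][j] = Prob({i, j} is a subset of S); the diagonal P[i][i] is the vector p."""
    P = [[0.0] * n for _ in range(n)]
    for subset, prob in sampling.items():
        for i in subset:
            for j in subset:
                P[i][j] += prob
    return P

n, b = 5, 2  # hypothetical toy sizes
sampling = uniform_minibatch_sampling(n, b)
P = probability_matrix(sampling, n)
p = [P[i][i] for i in range(n)]
is_proper = all(p_i > 0 for p_i in p)  # proper sampling: every index can be drawn
```

For this sampling the enumeration should agree with the closed forms $p_i = b/n$ and $P_{ij} = \frac{b(b-1)}{n(n-1)}$ for $i \ne j$.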
From Standard Sampling to Arbitrary Sampling
SVRG with Arbitrary Sampling:
$$x^{+} = x - \eta \left( \sum_{i \in S} \frac{1}{n p_i} \left( \nabla f_i(x) - \nabla f_i(\hat{x}) \right) + \nabla f(\hat{x}) \right)$$
The term in parentheses is an unbiased estimator of the gradient $\nabla f(x)$; $\eta$ denotes the stepsize.
Standard sampling: $p_i = \frac{1}{n}$ for all $i$. Arbitrary sampling: any proper sampling $S$ with probabilities $p_i$.
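Unbiasedness of the SVRG estimator under a non-uniform sampling can be verified exactly on a toy problem. This sketch assumes scalar least-squares losses $f_i(x) = \frac{1}{2}(a_i x - y_i)^2$ and an independent sampling (each index $i$ is included with probability $p_i$, independently); the data, the probabilities and all function names are hypothetical choices made for illustration.

```python
from itertools import combinations

def grad_fi(i, x, a, y):
    # Gradient of the assumed toy loss f_i(x) = 0.5 * (a[i] * x - y[i])**2 (scalar x).
    return a[i] * (a[i] * x - y[i])

def full_grad(x, a, y):
    n = len(a)
    return sum(grad_fi(i, x, a, y) for i in range(n)) / n

def svrg_estimator(S, x, x_hat, p, a, y):
    # sum_{i in S} (grad f_i(x) - grad f_i(x_hat)) / (n p_i) + grad f(x_hat)
    n = len(a)
    correction = sum(
        (grad_fi(i, x, a, y) - grad_fi(i, x_hat, a, y)) / (n * p[i]) for i in S
    )
    return correction + full_grad(x_hat, a, y)

def independent_sampling_prob(S, p):
    # Probability that an independent sampling produces exactly the set S.
    prob = 1.0
    for i, p_i in enumerate(p):
        prob *= p_i if i in S else 1.0 - p_i
    return prob

a = [1.0, 2.0, -1.5, 0.5]
y = [0.3, -1.0, 2.0, 0.7]
p = [0.2, 0.9, 0.5, 0.4]   # non-uniform, proper (all p_i > 0)
x, x_hat = 1.3, -0.4
n = len(a)

# Exact expectation: enumerate all 2^n subsets weighted by their probability.
expected_g = 0.0
for size in range(n + 1):
    for S in combinations(range(n), size):
        expected_g += independent_sampling_prob(S, p) * svrg_estimator(S, x, x_hat, p, a, y)
```

The weights $\frac{1}{n p_i}$ are what make `expected_g` coincide with `full_grad(x, a, y)`; with $p_i = \frac{1}{n}$ they reduce to 1.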
Convergence Rate I
$$\mathcal{O}\left( \frac{n + \alpha\, n^{2/3}}{\varepsilon} \right)$$
$n$: # data points
$\varepsilon$: target accuracy, $\mathbb{E}\|\nabla f(x)\|^2 \le \varepsilon$
$\alpha$: KEY QUANTITY, DEPENDS ON THE SAMPLING
Convergence Rate II
$$\alpha := \frac{b}{\bar{L}^2 n^2} \sum_{i=1}^{n} \frac{v_i L_i^2}{p_i}$$
$n$: # data points
$b = \mathbb{E}|S|$: expected mini-batch size
$\bar{L} = \frac{1}{n} \sum_{i=1}^{n} L_i$, where $\|\nabla f_i(x) - \nabla f_i(y)\| \le L_i \|x - y\|$
$p_i := \mathrm{Prob}(\{i\} \subseteq S) = P_{ii}$, where $P_{ij} := \mathrm{Prob}(\{i, j\} \subseteq S)$
$v_1, \dots, v_n$: constants satisfying $P - pp^\top \preceq \mathrm{Diag}(p_1 v_1, p_2 v_2, \dots, p_n v_n)$
Optimal rate: minimize $\alpha$ over $\{(v_i, p_i)\}_{i=1}^{n}$.
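To see how the sampling enters through the key quantity $b \sum_i v_i L_i^2 / p_i$ divided by $\bar{L}^2 n^2$ (written `alpha` below), it can be evaluated for two samplings. This is a sketch under stated assumptions: for standard mini-batch sampling it uses $p_i = b/n$, $v_i = \frac{n-b}{n-1}$, and for independent sampling it uses $v_i = 1 - p_i$ with $p_i$ proportional to $L_i$; both choices satisfy the matrix inequality on the slide. The smoothness constants are made up.

```python
def alpha(L, p, v, b):
    """Key quantity: b / (L_bar^2 n^2) * sum_i v_i L_i^2 / p_i."""
    n = len(L)
    L_bar = sum(L) / n
    return b / (L_bar ** 2 * n ** 2) * sum(
        v_i * L_i ** 2 / p_i for L_i, p_i, v_i in zip(L, p, v)
    )

L = [1.0, 2.0, 3.0, 4.0]  # hypothetical smoothness constants
n, b = len(L), 2

# Standard (uniform) mini-batch sampling of size b.
p_uniform = [b / n] * n
v_uniform = [(n - b) / (n - 1)] * n
alpha_uniform = alpha(L, p_uniform, v_uniform, b)

# Independent sampling with p_i proportional to L_i (valid here since all p_i <= 1).
p_importance = [b * L_i / sum(L) for L_i in L]
v_importance = [1.0 - p_i for p_i in p_importance]
alpha_importance = alpha(L, p_importance, v_importance, b)
```

On this toy instance the importance-style sampling yields the smaller value (0.4 versus 0.8), which is the effect the optimization over $\{(v_i, p_i)\}$ is meant to exploit.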
Convergence Rate III
Experiments
Nonconvex Variance Reduced Optimization with Arbitrary Sampling
Samuel Horváth (1), Peter Richtárik (1,2,3)
(1) KAUST, (2) University of Edinburgh, (3) Moscow Institute of Physics and Technology

The Problem
$\min_{x \in \mathbb{R}^d} f(x) := \frac{1}{n} \sum_{i=1}^n f_i(x)$   (1)
$f_i$ is $L_i$-smooth but non-convex
$n$ is big

Arbitrary Sampling
Sampling: a random set-valued mapping $S$ with values being subsets of $[n] := \{1, 2, \dots, n\}$. A sampling is used to generate minibatches in each iteration.
Probability matrix associated with sampling $S$: $P_{ij} \overset{\text{def}}{=} \mathrm{Prob}(\{i, j\} \subseteq S)$
Probability vector associated with sampling $S$: $p \overset{\text{def}}{=} (p_1, \dots, p_n)$, $p_i = \mathrm{Prob}(i \in S)$
Minibatch size: $b = \mathbb{E}[|S|]$ (expected size of $S$)
Proper sampling: a sampling for which $p_i > 0$ for all $i \in [n]$
Arbitrary sampling = any proper sampling

Key Lemma
Let $\zeta_1, \zeta_2, \dots, \zeta_n$ be vectors in $\mathbb{R}^d$ and let $\bar{\zeta} \overset{\text{def}}{=} \frac{1}{n} \sum_{i=1}^n \zeta_i$ be their average. Let $S$ be a proper sampling. Let $v = (v_1, \dots, v_n) > 0$ be such that
$P - pp^\top \preceq \mathrm{Diag}(p_1 v_1, p_2 v_2, \dots, p_n v_n)$.   (2)
Then
$\mathbb{E}\left[ \left\| \sum_{i \in S} \frac{\zeta_i}{n p_i} - \bar{\zeta} \right\|^2 \right] \le \frac{1}{n^2} \sum_{i=1}^n \frac{v_i}{p_i} \|\zeta_i\|^2$.
Whenever (2) holds, it must be the case that $v_i \ge 1 - p_i$.

Optimal Sampling & Superlinear Speedup
Under our analysis, the independent sampling $S^*$ defined by
$p_i \overset{\text{def}}{=} \begin{cases} (b + k - n) \frac{L_i}{\sum_{j=1}^k L_j}, & \text{if } i \le k, \\ 1, & \text{if } i > k, \end{cases}$
is optimal, where $k$ is the largest integer satisfying $0 < b + k - n \le \frac{\sum_{i=1}^k L_i}{L_k}$.
All 3 methods enjoy superlinear speedup in $b$ up to the minibatch size $b_{\max} := \max\{b \mid b L_n \le \sum_{i=1}^n L_i\}$.
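The optimal probabilities above can be computed directly from the smoothness constants. The following Python sketch does so under one assumption that the poster does not state explicitly: the constants are indexed in nondecreasing order, $L_1 \le \dots \le L_n$ (the code sorts them internally and maps the result back to the original order). The function name `optimal_probabilities` is hypothetical.

```python
def optimal_probabilities(L, b):
    """Probabilities p_i of the independent sampling S* for minibatch size b.

    Assumption: the formula is applied to the L_i sorted in nondecreasing order.
    """
    n = len(L)
    if not 0 < b <= n:
        raise ValueError("b must satisfy 0 < b <= n")

    order = sorted(range(n), key=lambda i: L[i])   # indices by increasing L_i
    L_sorted = [L[i] for i in order]

    # Find the largest k with 0 < b + k - n <= (sum_{i<=k} L_i) / L_k.
    prefix, k, prefix_k = 0.0, None, None
    for j in range(1, n + 1):
        prefix += L_sorted[j - 1]
        m = b + j - n
        if m > 0 and m * L_sorted[j - 1] <= prefix:
            k, prefix_k = j, prefix

    # p_i = (b + k - n) L_i / sum_{j<=k} L_j for the k smallest L_i, else 1.
    p = [1.0] * n
    for rank in range(k):
        p[order[rank]] = (b + k - n) * L_sorted[rank] / prefix_k
    return p


p_small_batch = optimal_probabilities([1.0, 2.0, 3.0, 4.0], 2)   # here k = n
p_large_batch = optimal_probabilities([1.0, 2.0, 3.0, 4.0], 3)   # here k < n
```

In both cases the probabilities sum to $b$, since $(b + k - n) + (n - k) = b$. When $k = n$ they reduce to $p_i = b L_i / \sum_j L_j$, that is, sampling proportional to the smoothness constants.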
Number of Stochastic Gradient Evaluations to Achieve $\mathbb{E}[\|\nabla f(x)\|^2] \le \epsilon$

SVRG
  Uniform sampling [1]: $\max\left\{ n, \frac{(1 + 4/3) L_{\max} c_1 n^{2/3}}{\epsilon} \right\}$
  Arbitrary sampling [NEW]: $\max\left\{ n, \frac{(1 + 4\alpha/3) \bar{L} c_1 n^{2/3}}{\epsilon} \right\}$
  $S^*$ (Best Sampling) [NEW]: $\max\left\{ n, \frac{\left(1 + \frac{4(n - b)}{3n}\right) \bar{L} c_1 n^{2/3}}{\epsilon} \right\}$
SAGA
  Uniform sampling [2]: $n + \frac{2 L_{\max} c_2 n^{2/3}}{\epsilon}$
  Arbitrary sampling [NEW]: $n + \frac{(1 + \alpha) \bar{L} c_2 n^{2/3}}{\epsilon}$
  $S^*$ (Best Sampling) [NEW]: $n + \frac{\left(1 + \frac{n - b}{n}\right) \bar{L} c_2 n^{2/3}}{\epsilon}$
SARAH
  Uniform sampling [3]: $n + \frac{n - b}{n - 1} \cdot \frac{L_{\max}^2 c_3}{\epsilon^2}$
  Arbitrary sampling [NEW]: $n + \frac{\alpha \bar{L}^2 c_3}{\epsilon^2}$
  $S^*$ (Best Sampling) [NEW]: $n + \frac{n - b}{n} \cdot \frac{\bar{L}^2 c_3}{\epsilon^2}$

Constants: $L_{\max} = \max_i L_i$; $\bar{L} = \frac{1}{n} \sum_i L_i$; $c_1, c_2, c_3$ = universal constants; $\alpha := \frac{b}{\bar{L}^2 n^2} \sum_{i=1}^n \frac{v_i L_i^2}{p_i}$

Numerical Results [figures omitted]

Main Contributions
We develop arbitrary sampling variants of 3 popular variance-reduced methods for solving the non-convex problem (1): SVRG [1], SAGA [2], SARAH [3].
We are able to calculate the optimal sampling out of all samplings of a given minibatch size. This is the first time an optimal minibatch sampling has been computed (from the class of all samplings).
We design importance sampling and approximate importance sampling for minibatches, which vastly outperform standard uniform minibatch strategies in practice.

Samplings
Uniform $S_u$: Every subset of $[n]$ of size $b$ (minibatch size) is chosen with the same probability: $1 / \binom{n}{b}$.
Independent $S^*$: For each $i \in [n]$ we independently flip a coin, and with probability $p_i$ include element $i$ in $S$.
Approximate Independent $S_a$: Fix some $k \in [n]$ and let $a = \lceil k \max_{i \le k} p_i \rceil$. We now sample a single set $S'$ of cardinality $a$ using the uniform minibatch sampling $S_u$. Subsequently, we apply an independent sampling $S^*$ to select elements of $S'$, with selection probabilities $p'_i = k p_i / a$. The resulting random set is $S_a$.
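The three samplings can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: indices are 0-based, the function names are hypothetical, and for the approximate independent sampling the set $S'$ is assumed to be drawn from the first $k$ indices only, which the text does not say explicitly. Under that assumption each $i < k$ is included with probability $\frac{a}{k} \cdot \frac{k p_i}{a} = p_i$.

```python
import math
import random


def uniform_sampling(n, b, rng):
    """S_u: every subset of size b is equally likely."""
    return set(rng.sample(range(n), b))


def independent_sampling(p, rng):
    """S*: include each index i independently with probability p[i]."""
    return {i for i, pi in enumerate(p) if rng.random() < pi}


def approximate_independent_sampling(p, k, rng):
    """S_a: uniform pre-selection of a candidates, then independent thinning.

    Assumption: candidates are drawn from the first k indices only.
    """
    a = math.ceil(k * max(p[:k]))              # a = ceil(k * max_{i<=k} p_i)
    candidates = rng.sample(range(k), a)       # S' of cardinality a
    # Keep candidate i with probability p'_i = k * p_i / a (at most 1 by choice of a).
    return {i for i in candidates if rng.random() < k * p[i] / a}


rng = random.Random(0)
p = [0.2, 0.4, 0.6, 0.8]
S_u = uniform_sampling(len(p), 2, rng)
S_star = independent_sampling(p, rng)
S_a = approximate_independent_sampling(p, len(p), rng)
```

The approximate variant only needs $a$ coin flips per minibatch instead of $n$, which is the practical reason to prefer it when $k$ is large and the probabilities are small.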
SVRG with Arbitrary Sampling
Algorithm 1: SVRG
  $\tilde{x}^0 = x^0_m = x^0$, $M = \lceil T/m \rceil$
  for $s = 0$ to $M - 1$ do
    $x^{s+1}_0 = x^s_m$; $g^{s+1} = \frac{1}{n} \sum_{i=1}^n \nabla f_i(\tilde{x}^s)$
    for $t = 0$ to $m - 1$ do
      Draw a random subset (minibatch) $S_t \sim S$
      $v^{s+1}_t = \sum_{i_t \in S_t} \frac{1}{n p_{i_t}} \left( \nabla f_{i_t}(x^{s+1}_t) - \nabla f_{i_t}(\tilde{x}^s) \right) + g^{s+1}$
      $x^{s+1}_{t+1} = x^{s+1}_t - \eta v^{s+1}_t$
    end
    $\tilde{x}^{s+1} = x^{s+1}_m$
  end
  Output: Iterate $x_a$ chosen uniformly at random from $\{\{x^{s+1}_t\}_{t=0}^m\}_{s=0}^M$

References
[1] Sashank J. Reddi, Ahmed Hefny, Suvrit Sra, Barnabás Póczos, and Alex Smola. Stochastic variance reduction for nonconvex optimization. In The 33rd International Conference on Machine Learning, pages 314-323, 2016.
[2] Sashank J. Reddi, Suvrit Sra, Barnabás Póczos, and Alex Smola. Fast incremental method for smooth nonconvex optimization. In 2016 IEEE 55th Conference on Decision and Control (CDC), pages 1971-1977. IEEE, 2016.
[3] Lam M. Nguyen, Jie Liu, Katya Scheinberg, and Martin Takáč. Stochastic recursive gradient algorithm for nonconvex optimization. arXiv:1705.07261, 2017.

Poster: Pacific Ballroom #95 (Today 6:30-9:00 PM)
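Algorithm 1 can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions rather than the authors' code: the step size is called `eta`, minibatches are drawn with an independent sampling with inclusion probabilities `p`, gradients are passed as a list of callables returning lists of floats, and the toy quadratics at the end are made up for illustration. The function returns all iterates so that the caller can pick the output uniformly at random.

```python
import math
import random


def svrg(grads, p, x0, eta, m, T, rng):
    """SVRG with an independent sampling; returns the list of all iterates.

    grads[i](x) must return the gradient of f_i at x as a list of floats.
    """
    n, d = len(grads), len(x0)
    x = list(x0)
    iterates = [list(x)]
    for _ in range(math.ceil(T / m)):                 # M = ceil(T / m) epochs
        snapshot = list(x)                            # reference point of the epoch
        snap_grads = [g(snapshot) for g in grads]
        full = [sum(sg[j] for sg in snap_grads) / n for j in range(d)]
        for _ in range(m):
            # Draw a minibatch: index i is included with probability p[i].
            batch = [i for i in range(n) if rng.random() < p[i]]
            v = list(full)
            for i in batch:
                gi = grads[i](x)
                w = 1.0 / (n * p[i])                  # importance weight 1 / (n p_i)
                for j in range(d):
                    v[j] += w * (gi[j] - snap_grads[i][j])
            x = [x[j] - eta * v[j] for j in range(d)]
            iterates.append(list(x))
    return iterates


# Toy problem (assumed): f_i(x) = 0.5 * L_i * (x - 1)^2 in one dimension.
L = [1.0, 2.0, 3.0, 4.0]
grads = [(lambda x, Li=Li: [Li * (x[0] - 1.0)]) for Li in L]
p = [2 * Li / sum(L) for Li in L]                     # probabilities proportional to L_i
rng = random.Random(0)
iterates = svrg(grads, p, [5.0], 0.05, 10, 100, rng)
x_a = rng.choice(iterates)                            # output iterate chosen at random
```

Setting every `p[i]` to 1 makes each minibatch the full dataset, in which case the estimator equals the exact gradient and the method reduces to gradient descent; that is a convenient sanity check for the update rule.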
Thank you!