reinforcement learning through the optimization lens Benjamin Recht University of California, Berkeley

Wielkość: px
Rozpocząć pokaz od strony:

Download "reinforcement learning through the optimization lens Benjamin Recht University of California, Berkeley"

Transkrypt

1 reinforcement learning through the optimization lens Benjamin Recht University of California, Berkeley

2 trustable, scalable, predictable

3 Control Theory! Reinforcement Learning is the study of how to use past data to enhance the future manipulation of a dynamical system

4 Disciplinary Biases AE/CE/EE/ME CS Control Theory Reinforcement Learning RL Control continuous discrete model action data action IEEE Transactions Science Magazine

5 Disciplinary Biases AE/CE/EE/ME CS Control Theory Reinforcement Learning Today s talk will try to unify these camps and point out how to merge their perspectives. RL Control continuous discrete model action data action IEEE Transactions Science Magazine

6 Main research challenge: What are the fundamental limits of learning systems that interact with the physical environment? How well must we understand a system in order to control it? statistical learning theory theoretical foundations robust control theory core optimization

7 Control theory is the study of dynamical systems with inputs y G xt u x t+1 = Ax t + Bu t y t = Cx t + Du t Simplest case of such systems are linear systems xt is called the state, and the dimension of the state is called the degree, d. ut is called the input, and the dimension is p. yt is called the output, and the dimension is q. For today, will only consider C=I, D=0 (xt observed)

8 Reinforcement Learning Control theory is the study of dynamical systems with inputs ^ discrete y G x t u p(x t+1 past) =p(x t+1 x t, u t ) p(y t past) =p(y t x t, u t ) Simplest example: Partially Observed Markov Decision Process (POMDP) xt is the state, and it takes values in [d] ut is called the input, and takes values in [p]. yt is called the output, and takes values in [q]. For today, will only consider when xt observed (MDP).

9 <latexit sha1_base64="otgopnlc3lpbujxkzhalqk3gehs=">aaacoxicbvhbahsxejw3tzs9oe1jx0rnwqhx7izc8xiibaet5mg9oanyyzkrhdsiwmmrrsvm8u/0a/ra/kx/plrhgdrugobwzlw0z/jksudx/kcv3bp95+69vfv7dx4+evykffd03blvbq6fucze5ubqsy1dkqtwsriiza7wir961+gx39e6afq3wlsyljdvciifukcydm9m4dpij7zrs6q3vouh1/nzta+szw+extfupkpdrn2j+/eq+c5i1qdd1jhidlrpuddcl6hjkhbulmqvptvykklhcn/shvygrmckowa1lojserxwkr8mtmenxoania/yfytqkj1blhnilifmbltryp9pi0+t47swuvkewlwpmnjfyfdgi15ii4luigaqvoa/cjedc4kckxttvr0rfbub1hovptafbrgk5mqhka6pbkmbreopuin+fbtjz3i6oxs1tg3k7ns5lch9s3aufbitha6sbnu/c86p+knctz6/7py+xz9mjz1nl1ixjewno2uf2yanmwa/2e/2i/2ootgnabb9uu6nwuuaz2wjotffdanq+g==</latexit> <latexit sha1_base64="swbjujk950ux4q9mabdzrzvkgic=">aaacjxicbvfdsxtbfj2s9aoxndg+fpoynfgus9gvrr/aytocffbb0agql+xu5cyznj1dzu6whcx9nx1t/0//twdjbjn4yebwzv2ye0+ckwnj9/9vvkuxyyuray+r669eb9tqm1vxns2nwlzivwpuy7copmy2svj4mxmejfz4e99/k/wbn2istpuvjtime+hr2zmcyffr/e0wkmgvgpmpn/lxpoyi7/ewzyok6g2/6u+cl4jgchpsgufrziw866yit1ctugbtj/azcgswjixccfuut5ibuic+dhzukkani8kky77tmc7vpcy9txzcpq0oilf2lmqumwea2hmtjj/tojn1jsnc <latexit sha1_base64="+5ynezhvzc7ginha9qex+xhlpxg=">aaacihicbvfdsxtbfj1sbavph7e++jiychfk2bxb+icicu2dd4qncsmy3j3cjbdnz5ezu8ww+fv6qj+p/6azmyum6ywbm+fc75swmhyh4e9g8grt9zu36xvnd+8/fnxsbx26cxlpffzurnn7l4jdtqz7tkzxrraiwarxnr0/q/xbn2gd5eyhtwummxgbgpec9lts2i4t <latexit sha1_base64="dox/ktybitgjchwuzwtodyh8jia=">aaacghicbvfdsxtbfj1s1apt/aipvgwgiykkuyk09elaqr98udqqjeu4o7ljls7obmfufspi7+hr/vn+ Controller Design G u xt x t+1 = Ax t + Bu t x K u t = t ( t ) A dynamical system is connected in feedback with a controller that tries to get the closed loop to behave. Actions decided based on observed trajectories t =(u 1,...,u t 1, x 0,...,x t ) A mapping from trajectory to action is called a policy, t ( t ) Optimal control: find policy that minimizes some objective.

10 <latexit sha1_base64="vs+14vgxeycwqa4/abiirwhhyzg=">aaadgnicbvjnb9naelxnv0n5sohizuvelyooshescfspoia49fbe01bkgmu9gser7q6t3tfkspxpupjhucguxpg3rfmjsmjilmbfe/n2z8zpiyxfmpzlb1euxrt+y+tmz/vw7tt3uzv3tm1egg4jnsvcnkfmghqarihqwnlhgkluwll6cdjwz5/awjhre1wuecs21sitnkgdku5xmsju6iozwxz1jwxdosrn55uswijxgwqys6hioevt6k2dajwq4zjauiuv7kf1xxnymgb/nucgthcpgjgdyuxpa2ohogws5k79okrjpsn+qgfqvndolnehr9fcojiia5w6fpskfvfs7yxdcblkm4napoe1czzs+dgd5lxuojflzu04cgumnr0klse1wvoogl9guxi7vdmfnq6w86zji4dmsjyb92kks/tfioopaxcqdcpmmnada8d/cemss+dxjxrrimh+evfwsoi5azzdjsiar7lwcengulcspmogcxqrxlll6v0ax+mkmpda8hwca6jeorrmqauomnbnv9vbisx5wlqlr83k/rdotqh7r8vuob0cuf9e722i3uki9ffvjqdphle4jn4/7r28alez5t3whnp9l/keeqfeo+/yg3nc3/yj/4x/mvgsfau+bz8upyhf1tz3vil4+rs0rp43</late <latexit sha1_base64="otgopnlc3lpbujxkzhalqk3gehs=">aaacoxicbvhbahsxejw3tzs9oe1jx0rnwqhx7izc8xiibaet5mg9oanyyzkrhdsiwmmrrsvm8u/0a/ra/kx/plrhgdrugobwzlw0z/jksudx/kcv3bp95+69vfv7dx4+evykffd03blvbq6fucze5ubqsy1dkqtwsriiza7wir961+gx39e6afq3wlsyljdvciifukcydm9m4dpij7zrs6q3vouh1/nzta+szw+extfupkpdrn2j+/eq+c5i1qdd1jhidlrpuddcl6hjkhbulmqvptvykklhcn/shvygrmckowa1lojserxwkr8mtmenxoania/yfytqkj1blhnilifmbltryp9pi0+t47swuvkewlwpmnjfyfdgi15ii4luigaqvoa/cjedc4kckxttvr0rfbub1hovptafbrgk5mqhka6pbkmbreopuin+fbtjz3i6oxs1tg3k7ns5lch9s3aufbitha6sbnu/c86p+knctz6/7py+xz9mjz1nl1ixjewno2uf2yanmwa/2e/2i/2ootgnabb9uu6nwuuaz2wjotffdanq+g==</latexit> <latexit sha1_base64="dox/ktybitgjchwuzwtodyh8jia=">aaacghicbvfdsxtbfj1s1apt/aipvgwgiykkuyk09elaqr98udqqjeu4o7ljls7obmfufspi7+hr/vn+ Optimal control h i PT minimize E e t=1 C t(x t, u t ) x G x t u e s.t. x t+1 = f t (x t, u t, e t ) u t = t ( t ) Ct is the cost. If you maximize, it s called a reward. et is a noise process ft is the state-transition function t =(u 1,...,u t 1, x 0,...,x t ) is an observed trajectory t ( t ) is the policy. This is the optimization decision variable.

11 <latexit sha1_base64="q0lcccsuikrn3vfvuntoazacfvw=">aaacnhicbvhtahnbfj2svwv9su1poqwgmcuydougf4rssy1yoajjc8myzm7ejennz5azo5kw7cp4np61d9k3ctajybivdhpmnpsx9960kmjign42gjtbd+/d336w8/dr4ydpm7vp+ly7w6hhtdtmkmuwpfdqq4esrgodle8lxkbxh2v98gcyk7t6jrmc4pynlrgjztbtsfpvncnxdvtrd/s07rlsop9+e1wl+7dqdgwm0r4kzvbydedgn0g0bc2ytitktxepm81ddgq5znyoordaugqgbzdq7qydhylxazaggyek5wdjct5rrv96jqmjbfxrsofsvxely62d5an3zblo7lpwk//tbg5h7+nsqmihkl4onhksoqb1eggmdhcumw8yn8l/lfijm4yjh+jklxnuavhkj+xukcf1bmusxcka5kklmdoh6q7kt0jk+o0ps8/feij/vz+2ltsnyizqds79pttbhrnfslq+/k3qp+xgytf6+rz1dlxcztz5tl6qnonio3jezsgf6rfofpjf5de5cfadk+bz8gxhgjswmxtkxyl+h5o0zuo=</latexit> <latexit sha1_base64="+5ynezhvzc7ginha9qex+xhlpxg=">aaacihicbvfdsxtbfj1sbavph7e++jiychfk2bxb+icicu2dd4qncsmy3j3cjbdnz5ezu8ww+fv6qj+p/6azmyum6ywbm+fc75swmhyh4e9g8grt9zu36xvnd+8/fnxsbx26cxlpffzurnn7l4jdtqz7tkzxrraiwarxnr0/q/xbn2gd5eyhtwummxgbgpec9lts2i4t Optimal control G u xt x t+1 = F(u t, u t 1, u t 2,...) x K u t = t ( t ) t A dynamical system is connected in feedback with a controller that tries to get the closed loop to behave. Optimal control: find policy that minimizes some objective. Major challenge: how to perform optimal control when the system is unknown? Today: Reinvent RL attempting to answer this question

12 HVAC ROOM t ( u)+ ( u u + pi) = + g M T = Q +ṁ s c p (T s T ) sensor state action

13 Identify everything Identify a coarse model We don t need no stinking models! HVAC ROOM t ( u)+ ( u u + pi) = + g M T = Q +ṁ s c p (T s T ) sensor state action PDE control High performance aerodynamics model predictive control reinforcement learning PID control? We need robust fundamentals to distinguish these approaches

14 But PID control works 50 Bode Diagram One decade Magnitude (db) dB Gain crossover point 0.5-6dB -20 Loglog slope = Frequency (rad/sec) 2 parameters suffice for 95% of all control applications. How much needs to be modeled for more advanced control? Can we learn to compensate for poor models, changing conditions?

15 <latexit sha1_base64="vs+14vgxeycwqa4/abiirwhhyzg=">aaadgnicbvjnb9naelxnv0n5sohizuvelyooshescfspoia49fbe01bkgmu9gser7q6t3tfkspxpupjhucguxpg3rfmjsmjilmbfe/n2z8zpiyxfmpzlb1euxrt+y+tmz/vw7tt3uzv3tm1egg4jnsvcnkfmghqarihqwnlhgkluwll6cdjwz5/awjhre1wuecs21sitnkgdku5xmsju6iozwxz1jwxdosrn55uswijxgwqys6hioevt6k2dajwq4zjauiuv7kf1xxnymgb/nucgthcpgjgdyuxpa2ohogws5k79okrjpsn+qgfqvndolnehr9fcojiia5w6fpskfvfs7yxdcblkm4napoe1czzs+dgd5lxuojflzu04cgumnr0klse1wvoogl9guxi7vdmfnq6w86zji4dmsjyb92kks/tfioopaxcqdcpmmnada8d/cemss+dxjxrrimh+evfwsoi5azzdjsiar7lwcengulcspmogcxqrxlll6v0ax+mkmpda8hwca6jeorrmqauomnbnv9vbisx5wlqlr83k/rdotqh7r8vuob0cuf9e722i3uki9ffvjqdphle4jn4/7r28alez5t3whnp9l/keeqfeo+/yg3nc3/yj/4x/mvgsfau+bz8upyhf1tz3vil4+rs0rp43</late <latexit sha1_base64="otgopnlc3lpbujxkzhalqk3gehs=">aaacoxicbvhbahsxejw3tzs9oe1jx0rnwqhx7izc8xiibaet5mg9oanyyzkrhdsiwmmrrsvm8u/0a/ra/kx/plrhgdrugobwzlw0z/jksudx/kcv3bp95+69vfv7dx4+evykffd03blvbq6fucze5ubqsy1dkqtwsriiza7wir961+gx39e6afq3wlsyljdvciifukcydm9m4dpij7zrs6q3vouh1/nzta+szw+extfupkpdrn2j+/eq+c5i1qdd1jhidlrpuddcl6hjkhbulmqvptvykklhcn/shvygrmckowa1lojserxwkr8mtmenxoania/yfytqkj1blhnilifmbltryp9pi0+t47swuvkewlwpmnjfyfdgi15ii4luigaqvoa/cjedc4kckxttvr0rfbub1hovptafbrgk5mqhka6pbkmbreopuin+fbtjz3i6oxs1tg3k7ns5lch9s3aufbitha6sbnu/c86p+knctz6/7py+xz9mjz1nl1ixjewno2uf2yanmwa/2e/2i/2ootgnabb9uu6nwuuaz2wjotffdanq+g==</latexit> <latexit sha1_base64="dox/ktybitgjchwuzwtodyh8jia=">aaacghicbvfdsxtbfj1s1apt/aipvgwgiykkuyk09elaqr98udqqjeu4o7ljls7obmfufspi7+hr/vn+ Optimal control h i PT minimize E e t=1 C t(x t, u t ) x G x t u e s.t. x t+1 = f t (x t, u t, e t ) u t = t ( t ) Ct is the cost. If you maximize, it s called a reward. et is a noise process ft is the state-transition function t =(u 1,...,u t 1, x 0,...,x t ) is an observed trajectory t ( t ) is the policy. This is the optimization decision variable.

16 <latexit sha1_base64="mr94ezqth17vzwjopx3thjsdtck=">aaacg3icbvfbaxnred5ztdz6aaqpvhwmqousdktbfsguffshdxwnlsrlmd2zjeppztlntiqu+sw+6o/y33g2jwasbwy+vm/uu5saaqfp71zy5+69nfu7d/yepnr8zl998prbcjvx2fnoo39vqebnfntm <latexit sha1_base64="ljvoamm0zxufsbpjh2gmnfj3dfs=">aaacl3icbvfdaxnbfj2svwv9avwtvgwgiquju0wwfvccfu1dh1psbcfzl7utm2tofcwzdyvhmx/gr/fvf4r/xtk0bzn4yebwzrn3zr03l5t0fmd/gtgdjbv37m8+2hr46pgtp9s7z756wzqbxwgvdzc5eftsyjckkbwshilofv7kvx9r/ei7oi+toadpgamgkzfdkyaclw03+77uwuxv4tm3c55k1xvrktfelly/72phpbjmfl <latexit sha1_base64="oi5ov9kcoehyn9bwcwwjat8txu4=">aaac4hicbvhlihnbfk1ux2p7yujstwfqrkyyxslmufagfhqxixgnm5buqnx1taeyquqm6ryknp0brsstn+xop3fpdrjlknih4hdouy+6n6uudbjhv4lwytvr12/s3ixu3b5z915v9/5nv9zwwfcuqrtngxegpiehslrwxlngolnwll286fszl2cdlm0nnfeqal4yozgco6fgptubn7jpwvqkjhku0jsz5mjlri0yfuiztzioxgiw+t+xy2rppo02k+ikyqfa85fdto7suttu9enbvai6ddgk9mkqtse7qzrkpag1gbskozdicyvpwy1koacnktpbxcufl2dkoeeaxnos1tlsx57j6as0/hmkc/zyrso1c3odeacfdoo2ty78nzaqcxkuntjunyiry0atwlesabdjmkslatxcay6s9lnsmewwc/sxwouyqf2bwptjm6unfguog6zcgvrusqeouttdr5p3uin6krtht2qxxb+ql9vje29lide9o/hnnk+3zp4gbhp922d4fpbywd686b+/xl <latexit sha1_base64="+ojsm2yurvuosb1y5f0dyh5w9ea=">aaaco3icbvhbbtnaen2ywym3fb55wrgbioscjzbahkcvqikhpbroakxyisbritpqem3tjqsek5/b1/akh8hfse4digkjrfbonllpwmlyhia/osgvq9eu39i5uxvr9p2 Newton s Laws z t+1 = z t + v t v t+1 = v t + a t ma t = u t minimize TX 1 t=0 subject to x t+1 = 1 (xt ) 1 > apple x t + apple 0 1/m u t x t = apple zt v t

17 <latexit sha1_base64="mr94ezqth17vzwjopx3thjsdtck=">aaacg3icbvfbaxnred5ztdz6aaqpvhwmqousdktbfsguffshdxwnlsrlmd2zjeppztlntiqu+sw+6o/y33g2jwasbwy+vm/uu5saaqfp71zy5+69nfu7d/yepnr8zl998prbcjvx2fnoo39vqebnfntm <latexit sha1_base64="c3vasflhsyp2zenoal07qvhm1sk=">aaaci3icbvfdsxtbfj1sbwttq7e++ncxoaeqoyrdeaqlgvhbffdbulofzf3utm6swznzzeaujcz5nx1tf5d/xtmyqpp0wsdhnpsx9540v9jrgd7ugmcrz1+8xh219vrn2/wn+ua7ny4r <latexit sha1_base64="oi5ov9kcoehyn9bwcwwjat8txu4=">aaac4hicbvhlihnbfk1ux2p7yujstwfqrkyyxslmufagfhqxixgnm5buqnx1taeyquqm6ryknp0brsstn+xop3fpdrjlknih4hdouy+6n6uudbjhv4lwytvr12/s3ixu3b5z915v9/5nv9zwwfcuqrtngxegpiehslrwxlngolnwll286fszl2cdlm0nnfeqal4yozgco6fgptubn7jpwvqkjhku0jsz5mjlri0yfuiztzioxgiw+t+xy2rppo02k+ikyqfa85fdto7suttu9enbvai6ddgk9mkqtse7qzrkpag1gbskozdicyvpwy1koacnktpbxcufl2dkoeeaxnos1tlsx57j6as0/hmkc/zyrso1c3odeacfdoo2ty78nzaqcxkuntjunyiry0atwlesabdjmkslatxcay6s9lnsmewwc/sxwouyqf2bwptjm6unfguog6zcgvrusqeouttdr5p3uin6krtht2qxxb+ql9vje29lide9o/hnnk+3zp4gbhp922d4fpbywd686b+/xl <latexit sha1_base64="+ojsm2yurvuosb1y5f0dyh5w9ea=">aaaco3icbvhbbtnaen2ywym3fb55wrgbioscjzbahkcvqikhpbroakxyisbritpqem3tjqsek5/b1/akh8hfse4digkjrfbonllpwmlyhia/osgvq9eu39i5uxvr9p2 Newton s Laws z t+1 = z t + v t v t+1 = v t + a t ma t = u t minimize TX (x t ) 2 1 t=0 subject to x t+1 = +ru 2 t apple x t + apple 0 1/m u t x t = apple zt v t

18 <latexit sha1_base64="j4lebcdojzuwdwucyrzfilabtyq=">aaadhhicbvlfb9mwehbcr1eydpdii0ufggxvcsdba5xgamhdhjzot0lnfjmu01qznci+obyr/wqv/co8iv6r+g+wuydrjpoiu3zf3dl3n/nkcanr9dsil12+cvxaxvxojzubt253t+4cmblwli1okup9khpdbfdsbbweo6k0izix7dg/e+p5489mg16qiswqlkoyvbzglicdsu63jgdtrizrmiwak0ttswrezq3kikv+htx4iu4kgvme23dnilgb46tqhnq4scmgj6awmyvb3jwo8tyd08f40hu8g+vl30fve82nm0itpo1u+td3neeudcdu8ac/bov2flrzlinowtskvvnw7ux9agn4yhc3qq+1dpbtbwkykwktmqiqidhjokogde2au8hcmlvhfafnzmrgllremppa5uyb/mahe1yu2n0k8bl9t8isacxc5i7t78ascx78hzeuoxizwq6qgpii5wcvtcbqyi8pnndnkiifcwjv3n0v0xlx6wyn4sopy94voyut2hmtoc0nba0vmadnhggysmkvn8q+50lgt0qzvo/v+cu6tp7efsunhmytffds1kmlyu6qeh39f4ojp/046sehz3u7e600g+geuo+2uyxeof30ar2gealbzvasebumwq/h9/bh+pm8nqzamrtoxcjffwdk/f3k</latexit> <latexit sha1_base64="mr94ezqth17vzwjopx3thjsdtck=">aaacg3icbvfbaxnred5ztdz6aaqpvhwmqousdktbfsguffshdxwnlsrlmd2zjeppztlntiqu+sw+6o/y33g2jwasbwy+vm/uu5saaqfp71zy5+69nfu7d/yepnr8zl998prbcjvx2fnoo39vqebnfntm <latexit sha1_base64="c3vasflhsyp2zenoal07qvhm1sk=">aaaci3icbvfdsxtbfj1sbwttq7e++ncxoaeqoyrdeaqlgvhbffdbulofzf3utm6swznzzeaujcz5nx1tf5d/xtmyqpp0wsdhnpsx9540v9jrgd7ugmcrz1+8xh219vrn2/wn+ua7ny4r <latexit sha1_base64="oi5ov9kcoehyn9bwcwwjat8txu4=">aaac4hicbvhlihnbfk1ux2p7yujstwfqrkyyxslmufagfhqxixgnm5buqnx1taeyquqm6ryknp0brsstn+xop3fpdrjlknih4hdouy+6n6uudbjhv4lwytvr12/s3ixu3b5z915v9/5nv9zwwfcuqrtngxegpiehslrwxlngolnwll286fszl2cdlm0nnfeqal4yozgco6fgptubn7jpwvqkjhku0jsz5mjlri0yfuiztzioxgiw+t+xy2rppo02k+ikyqfa85fdto7suttu9enbvai6ddgk9mkqtse7qzrkpag1gbskozdicyvpwy1koacnktpbxcufl2dkoeeaxnos1tlsx57j6as0/hmkc/zyrso1c3odeacfdoo2ty78nzaqcxkuntjunyiry0atwlesabdjmkslatxcay6s9lnsmewwc/sxwouyqf2bwptjm6unfguog6zcgvrusqeouttdr5p3uin6krtht2qxxb+ql9vje29lide9o/hnnk+3zp4gbhp922d4fpbywd686b+/xl <latexit sha1_base64="+ojsm2yurvuosb1y5f0dyh5w9ea=">aaaco3icbvhbbtnaen2ywym3fb55wrgbioscjzbahkcvqikhpbroakxyisbritpqem3tjqsek5/b1/akh8hfse4digkjrfbonllpwmlyhia/osgvq9eu39i5uxvr9p2 Simplest Example: LQR minimize s.t. E h 1 T P T t=1 x t Qx t + u t Ru t i x t+1 = Ax t + Bu t + e t minimize TX (x t ) 2 1 t=0 subject to x t+1 = x t = +ru 2 t apple x t + apple zt v t apple 0 1/m u t

19 The Linearization Principle If a machine learning algorithm does crazy things when restricted to linear models, it s going to do crazy things on complex nonlinear models too. Would you believe someone had a good SAT solver if it couldn t solve 2-SAT? This has been a fruitful research direction: Recurrent neural networks (Hardt, Ma, R. 2016) Generalization and Margin in Neural Nets (Zhang et al 2017) Residual Networks (Hardt and Ma 2017) Bayesian Optimization (Jamieson et al 2017) Adaptive gradient methods (Wilson et al 2017)

20 <latexit sha1_base64="j4lebcdojzuwdwucyrzfilabtyq=">aaadhhicbvlfb9mwehbcr1eydpdii0ufggxvcsdba5xgamhdhjzot0lnfjmu01qznci+obyr/wqv/co8iv6r+g+wuydrjpoiu3zf3dl3n/nkcanr9dsil12+cvxaxvxojzubt253t+4cmblwli1okup9khpdbfdsbbweo6k0izix7dg/e+p5489mg16qiswqlkoyvbzglicdsu63jgdtrizrmiwak0ttswrezq3kikv+htx4iu4kgvme23dnilgb46tqhnq4scmgj6awmyvb3jwo8tyd08f40hu8g+vl30fve82nm0itpo1u+td3neeudcdu8ac/bov2flrzlinowtskvvnw7ux9agn4yhc3qq+1dpbtbwkykwktmqiqidhjokogde2au8hcmlvhfafnzmrgllremppa5uyb/mahe1yu2n0k8bl9t8isacxc5i7t78ascx78hzeuoxizwq6qgpii5wcvtcbqyi8pnndnkiifcwjv3n0v0xlx6wyn4sopy94voyut2hmtoc0nba0vmadnhggysmkvn8q+50lgt0qzvo/v+cu6tp7efsunhmytffds1kmlyu6qeh39f4ojp/046sehz3u7e600g+geuo+2uyxeof30ar2gealbzvasebumwq/h9/bh+pm8nqzamrtoxcjffwdk/f3k</latexit> <latexit sha1_base64="mr94ezqth17vzwjopx3thjsdtck=">aaacg3icbvfbaxnred5ztdz6aaqpvhwmqousdktbfsguffshdxwnlsrlmd2zjeppztlntiqu+sw+6o/y33g2jwasbwy+vm/uu5saaqfp71zy5+69nfu7d/yepnr8zl998prbcjvx2fnoo39vqebnfntm <latexit sha1_base64="c3vasflhsyp2zenoal07qvhm1sk=">aaaci3icbvfdsxtbfj1sbwttq7e++ncxoaeqoyrdeaqlgvhbffdbulofzf3utm6swznzzeaujcz5nx1tf5d/xtmyqpp0wsdhnpsx9540v9jrgd7ugmcrz1+8xh219vrn2/wn+ua7ny4r <latexit sha1_base64="oi5ov9kcoehyn9bwcwwjat8txu4=">aaac4hicbvhlihnbfk1ux2p7yujstwfqrkyyxslmufagfhqxixgnm5buqnx1taeyquqm6ryknp0brsstn+xop3fpdrjlknih4hdouy+6n6uudbjhv4lwytvr12/s3ixu3b5z915v9/5nv9zwwfcuqrtngxegpiehslrwxlngolnwll286fszl2cdlm0nnfeqal4yozgco6fgptubn7jpwvqkjhku0jsz5mjlri0yfuiztzioxgiw+t+xy2rppo02k+ikyqfa85fdto7suttu9enbvai6ddgk9mkqtse7qzrkpag1gbskozdicyvpwy1koacnktpbxcufl2dkoeeaxnos1tlsx57j6as0/hmkc/zyrso1c3odeacfdoo2ty78nzaqcxkuntjunyiry0atwlesabdjmkslatxcay6s9lnsmewwc/sxwouyqf2bwptjm6unfguog6zcgvrusqeouttdr5p3uin6krtht2qxxb+ql9vje29lide9o/hnnk+3zp4gbhp922d4fpbywd686b+/xl <latexit sha1_base64="+ojsm2yurvuosb1y5f0dyh5w9ea=">aaaco3icbvhbbtnaen2ywym3fb55wrgbioscjzbahkcvqikhpbroakxyisbritpqem3tjqsek5/b1/akh8hfse4digkjrfbonllpwmlyhia/osgvq9eu39i5uxvr9p2 Simplest Example: LQR minimize s.t. E h 1 T P T t=1 x t Qx t + u t Ru t i x t+1 = Ax t + Bu t + e t minimize TX (x t ) 2 1 t=0 subject to x t+1 = x t = +ru 2 t apple x t + apple zt v t apple 0 1/m u t

21 <latexit sha1_base64="j4lebcdojzuwdwucyrzfilabtyq=">aaadhhicbvlfb9mwehbcr1eydpdii0ufggxvcsdba5xgamhdhjzot0lnfjmu01qznci+obyr/wqv/co8iv6r+g+wuydrjpoiu3zf3dl3n/nkcanr9dsil12+cvxaxvxojzubt253t+4cmblwli1okup9khpdbfdsbbweo6k0izix7dg/e+p5489mg16qiswqlkoyvbzglicdsu63jgdtrizrmiwak0ttswrezq3kikv+htx4iu4kgvme23dnilgb46tqhnq4scmgj6awmyvb3jwo8tyd08f40hu8g+vl30fve82nm0itpo1u+td3neeudcdu8ac/bov2flrzlinowtskvvnw7ux9agn4yhc3qq+1dpbtbwkykwktmqiqidhjokogde2au8hcmlvhfafnzmrgllremppa5uyb/mahe1yu2n0k8bl9t8isacxc5i7t78ascx78hzeuoxizwq6qgpii5wcvtcbqyi8pnndnkiifcwjv3n0v0xlx6wyn4sopy94voyut2hmtoc0nba0vmadnhggysmkvn8q+50lgt0qzvo/v+cu6tp7efsunhmytffds1kmlyu6qeh39f4ojp/046sehz3u7e600g+geuo+2uyxeof30ar2gealbzvasebumwq/h9/bh+pm8nqzamrtoxcjffwdk/f3k</latexit> Simplest Example: LQR minimize s.t. E h 1 T P T t=1 x t Qx t + u t Ru t i x t+1 = Ax t + Bu t + e t Suppose (A,B) unknown What is the optimal estimation/design scheme? How many samples are needed for near optimal control?

22 <latexit sha1_base64="vs+14vgxeycwqa4/abiirwhhyzg=">aaadgnicbvjnb9naelxnv0n5sohizuvelyooshescfspoia49fbe01bkgmu9gser7q6t3tfkspxpupjhucguxpg3rfmjsmjilmbfe/n2z8zpiyxfmpzlb1euxrt+y+tmz/vw7tt3uzv3tm1egg4jnsvcnkfmghqarihqwnlhgkluwll6cdjwz5/awjhre1wuecs21sitnkgdku5xmsju6iozwxz1jwxdosrn55uswijxgwqys6hioevt6k2dajwq4zjauiuv7kf1xxnymgb/nucgthcpgjgdyuxpa2ohogws5k79okrjpsn+qgfqvndolnehr9fcojiia5w6fpskfvfs7yxdcblkm4napoe1czzs+dgd5lxuojflzu04cgumnr0klse1wvoogl9guxi7vdmfnq6w86zji4dmsjyb92kks/tfioopaxcqdcpmmnada8d/cemss+dxjxrrimh+evfwsoi5azzdjsiar7lwcengulcspmogcxqrxlll6v0ax+mkmpda8hwca6jeorrmqauomnbnv9vbisx5wlqlr83k/rdotqh7r8vuob0cuf9e722i3uki9ffvjqdphle4jn4/7r28alez5t3whnp9l/keeqfeo+/yg3nc3/yj/4x/mvgsfau+bz8upyhf1tz3vil4+rs0rp43</late Optimal control h i PT minimize E e t=1 C t(x t, u t ) s.t. x t+1 = f t (x t, u t, e t ) x G x t u e u t = t ( t ) generic solutions with known dynamics: Batch Optimization Dynamic Programming

23 <latexit sha1_base64="vs+14vgxeycwqa4/abiirwhhyzg=">aaadgnicbvjnb9naelxnv0n5sohizuvelyooshescfspoia49fbe01bkgmu9gser7q6t3tfkspxpupjhucguxpg3rfmjsmjilmbfe/n2z8zpiyxfmpzlb1euxrt+y+tmz/vw7tt3uzv3tm1egg4jnsvcnkfmghqarihqwnlhgkluwll6cdjwz5/awjhre1wuecs21sitnkgdku5xmsju6iozwxz1jwxdosrn55uswijxgwqys6hioevt6k2dajwq4zjauiuv7kf1xxnymgb/nucgthcpgjgdyuxpa2ohogws5k79okrjpsn+qgfqvndolnehr9fcojiia5w6fpskfvfs7yxdcblkm4napoe1czzs+dgd5lxuojflzu04cgumnr0klse1wvoogl9guxi7vdmfnq6w86zji4dmsjyb92kks/tfioopaxcqdcpmmnada8d/cemss+dxjxrrimh+evfwsoi5azzdjsiar7lwcengulcspmogcxqrxlll6v0ax+mkmpda8hwca6jeorrmqauomnbnv9vbisx5wlqlr83k/rdotqh7r8vuob0cuf9e722i3uki9ffvjqdphle4jn4/7r28alez5t3whnp9l/keeqfeo+/yg3nc3/yj/4x/mvgsfau+bz8upyhf1tz3vil4+rs0rp43</late RL Methods x G x t u e h i PT minimize E e t=1 C t(x t, u t ) approximate dynamic programming s.t. x t+1 = f t (x t, u t, e t ) u t = t ( t ) model-based direct policy search How to solve optimal control when the model f is unknown? Model-based: fit model from data Model-free - Approximate dynamic programming: estimate cost from data - Direct policy search: search for actions from data

24 <latexit sha1_base64="wtso4cvxqkkm5sp9ikogdmu8yxs=">aaadghicbvjnb9qwehxcv7t8behixwjftrxvnkfileoliolg0emr3bbsoksod7jr1xeie4j2ifjhupjhucgu3pg3ontuynczydl4vednz4ytqkmlqfdh82/cvhx7zszm5+69+w8edrcendm8nakgile5uui4bsu1dfgigovcam8sbefj5vhdn38by2wut3feqjtxizapfbwdfhe/swqmulfcgd6vk6xqdsusffzlustmfowablowczwmsfwujoepshhebjnffr6e9edtehrjfxbjbhnjdjnymswisdbgdndqwmyc+nly0woaxmt3odgzzjz1g0ewqjojhrzx6tdq4/zvcbcxdijf0pukbjmeaemk3viins5fmyfgobi1ozaomhj2kiucv2jpoedikk9g5flnm7brtehmtz85zezt3lilks7qf09upln2nivo2ftfrnin+d9uvgk6h1vsfywcflcxpawimnnmnhqsdqhuc5dwyar7kxvtbrhan8clwxbebyilsqpzqaxix7cckpyh4q60gbmxuqmqei+vop+4tvs4gdg162wbuv9wtita3wp3s/tomtgnjfxt/3py9miqbopw48ve4zt2nbvkcxlk+iqkr8gh+uboyjaib9pb8/a91/43/4f/0/91jfw99sxjsht+77+2e/1q</late <latexit sha1_base64="qv2lanekunbubcf2z1ek6m/o+og=">aaacnxicbvhbahsxejw3tzs9oe1jhypqcg4jzrcuksfqc+2dksmt44b3wwblss2ilyq0g2ww/0k/pq/tf/rvqnvcqo0oca7nzeuzp7bkeorj363o1u07d+/t3d9/8pdr4yftg6cx3lro4eayzdxlar6v1dggsqovrumoc4xd4updow+v0xlp9ddawmxkmgo5kqiouhm7o89rokqwpavrnznz9bqcncna03gv0ye/4qkoig934l68cr4lkjxoshwc5wetlb0buzwossjwfptelriahemhclmfvh4ticuy4ihadsx6rf6ttosvajpme+pc08rx7l8vnztel8oizjzam7+tnet/tfffk9osltpwhfrcdjpuipphzx34wdoupbybghay/jwlgtgqfk64mwxv26ly2ksev1okm8ytvtgchatsi5ugdbnv/veqxb <latexit sha1_base64="z65z1fewznivdrnx0njfmyzk9iw=">aaaczhicbvfnb9naen2yrxk+0nlksiicpakn7aqjxooqgqshqiqcnjvi15psnvaq67w1o64sbxzlx/fdohof/8a6nyikjlts2/fezozojaspdpr+95z36/adu/e27rcfphz0+elne+fc5kvmfmbymeulmrguheidfcj5rae5zgpjh+ord7u+vobaifx9wxnbowwsjaacatoq7gzdfncg16clvft0iiagkzatkv5lvqshkbpy4pffxdrt/acii8xm3v85te8bx28w414z4+5icxnqjjtdv+8vg26coafd0srzvn2kwknoyowrzbkmgqv+gzefjyjjxrxd0vac2bukfosggoybyc4nunexjpnqaa7duuix7l8zfjjj5tnyotpa1kxrnfk/bvti9dcyqhulcsvugk1lstgn9tjprgjoum4dakafeytlkwhg6ia+0mvzu+bs5sd2virb8glfyyxouimjdccmhkp/zt8ikelnuiaeictfp6orw8u99yirapzo3gbv7obzlsryh/8mod/ob34/+ps6e/y2wc0weuaekx4jybtytd6smzigjhwjp8hp8ss79dczxnvj9vpnzloyet7x35/24uw=</latexit> <latexit sha1_base64="7wws5p4/hlk3102z185pb4izzt8=">aaadjnicbvjnbxmxepuuxyv8pxdgwmuiokrvarwlkoasqajaofrqrnnwipev13e2vm3vyp6telb7f7jyr7ghxi2fgjdzeekyydlovzlnzzynhrqwwvcn51+7fupmra3bntt3791/0n1+egbz0ja+zlnmzuvklzdc8yeikpyimjyqvplz9pkw4c+vulei16cwl3isakbfrdakdkq6x0nkm6eragyd15wudyeonj9vsmihxgde4x1mfivpmlzv64tkimeusd6bebglsioyrpwnu3yyqh+wwh6zwc4xiptcteirzqmigp2zq96lajza5iqayir+dua9vfrowhxtyicnsch6bghddwjx4/ansbcxbuei8gystukptxgsbhsxgeesvfwdk9taurqweds5eexyn3bpeuhzjc34ykwakm7jarhbgj9zybhpcuoobrxa/+2oqlj2rljx2wzjrnmn+d9uvmlkvvwjxztanvtencklhhw3rugxmjybnluemipcwzgbukmzodtxbllof5yttflnsi1ypuzrqiqzgopay0frozupqimhjf5itcxhjxn/wcfb0p03ihng94/dn9g7g8xokgh9/zvj2fmgcopow4vewevwmi30bd1ffrshl+gavucnaiiy99gbeo+8i/+l/83/7v9ylvpe2/miryt/6zcbuwox</latexit> h i PT minimize E e t=1 C t(x t, u t ) s.t. x t+1 = f(x t, u t, e t ) u t = t ( t ) Model-based RL Collect some simulation data. Should have x t+1 '(x t, u t )+ t Fit dynamics with supervised learning: ˆ' = arg min ' NX 1 t=0 x t+1 '(x t, u t ) 2 Solve approximate problem: h i PT minimize E! t=1 C t(x t, u t ) s.t. x t+1 = '(x t, u t )+! t u t = ( t )

25 <latexit sha1_base64="euyqlm8oqoqnwpvqldlbjjbjanm=">aaadnxicbvjlbxmxepyurxietehixskikhrfuwgjlpvkacehhxastfk8jbyon7fqe1f2lcry+7u48jc4cenc+qt400uicsnzm/7m5znpasgfhsj6horxrl67fmprzuvw7tt3t9s794y2lw3ja5bl3jyl1hipnb+aamnpcsopsiu/ts9e1/7tt9xykes+laqekdrvihomgofg7w8k5vohhtwglionzduiks3ntgktlpjck7ylirrq7preiokmfgt+mqidwaiiisistd3bikiewyhkhjixv65fywjlnwqhcxxex/mxnd/bj7xg+7hc3j7u+rjmqkjt1nahw7ec+9t9umih+fwtdfshe83h0cjct5onj9udqbstbw8acwn0ucph450gizoclypryjjao4qjahjfdgst3m9fwl5qdkgnforntrw3ivuuuskppdlbww780ycx6l8zjiprfyr1kfvu7lqvbv/ng5wqvuyc0eujxlplrlkpmes45g1phoem5miblbnh34rzjpp1g2d3pcuydshzyiruxmrb8glfqyxmwvapwg6kelb9vo6dkbj/pnrixs3ox68vw7v33oipapu057+qfrwr7amj19e/aqyfdeoog5887xwendrsoqfoidpdmxqbdtf7diwgiaw7qs8ybmpwa/gj/bn+ugwngybnplqr8pcfxx4jrw==</latexit> <latexit sha1_base64="uo+1skyh9ob/xn5keqmafdwhdva=">aaac03icbvfdixmxfe3hr3x92k4++hisqotsokxqfvtu0ieck253f5pustn3pmgtzgxyr7beerff/vn+ch+dr/pupq1gwy8edufcj9xzp4wsdnu9h43o2vubn2/t3n69c/fe/b3m/omtl5dwwfdkkrdnu+5asqndlkjgrlda9vtb6ftida2ffglrzg6ocv7awppmyfqkjogancfhlinlylcqbpz7iilisc1sy4vntmaan/dpo3ladcrpvoib8ilnmupmaq+lqao2gyxp0wfqoklyc96vklmym2fn0mz1ur1f0g0qr0clrojost8ysyqxpqadqnhnrngvwlhnfqvquo2y0khbxqxpybsg4rrc2c+cqoitwcq0zw14bumc/bfcc+3cxe9dzr2d29rq8n/aqmt05dhlu5qiriwhpawimnpavppicwlvpaaurax/pwlgg4syzf+bsuhdgfjbx Simple Example: LQR minimize s.t. lim T!1 E h Obvious strategy : Estimate (A,B), build control. 1 T x t+1 = Ax t + Bu t + e t P T t=1 x t Qx t + u t Ru t i Gaussian noise Run an experiment for T steps with random input. Then minimize (A,B) P T i=1 kx i+1 Ax i Bu i k 2 If T Õ 2 (d + p) min( c ) 2 where c = A c A + BB controllability Gramian then A Â B ˆB and w.h.p. [Dean, Mania, Matni, R.,Tu, 2017] [Mania, R., Simchowitz, Tu, 2018]

26 <latexit sha1_base64="vs+14vgxeycwqa4/abiirwhhyzg=">aaadgnicbvjnb9naelxnv0n5sohizuvelyooshescfspoia49fbe01bkgmu9gser7q6t3tfkspxpupjhucguxpg3rfmjsmjilmbfe/n2z8zpiyxfmpzlb1euxrt+y+tmz/vw7tt3uzv3tm1egg4jnsvcnkfmghqarihqwnlhgkluwll6cdjwz5/awjhre1wuecs21sitnkgdku5xmsju6iozwxz1jwxdosrn55uswijxgwqys6hioevt6k2dajwq4zjauiuv7kf1xxnymgb/nucgthcpgjgdyuxpa2ohogws5k79okrjpsn+qgfqvndolnehr9fcojiia5w6fpskfvfs7yxdcblkm4napoe1czzs+dgd5lxuojflzu04cgumnr0klse1wvoogl9guxi7vdmfnq6w86zji4dmsjyb92kks/tfioopaxcqdcpmmnada8d/cemss+dxjxrrimh+evfwsoi5azzdjsiar7lwcengulcspmogcxqrxlll6v0ax+mkmpda8hwca6jeorrmqauomnbnv9vbisx5wlqlr83k/rdotqh7r8vuob0cuf9e722i3uki9ffvjqdphle4jn4/7r28alez5t3whnp9l/keeqfeo+/yg3nc3/yj/4x/mvgsfau+bz8upyhf1tz3vil4+rs0rp43</late h i PT minimize E e t=1 C t(x t, u t ) s.t. x t+1 = f t (x t, u t, e t ) u t = t ( t ) Approximate Dynamic Programming

27 <latexit sha1_base64="rli7jlp75guz6h4r9bivhozu6ym=">aaadmhicbvjnj9mwehxc11k+undkyqhytdqqshasxcqtwnby2enxtlsr1svyxke11nyie4jsovcjupjh4is48itwukgwlsnfgr838+yzlzitwkiqfpf8a9dv3ly1c7t15+69+w/auw9pbzobxicslak5j6nlumg+aqgsn2eguxvlfhzfhnb82udurej1gfyznym60cirjikdovzxevof0cu1hq6qusqqrvscfquswijxivd4dxnfyrnh5dsq4ktybkbe5ioqyrhwh8b4mijueue/j6ch990xccdyvb9wpwleygkzqhpvo4bbreh4cdwe4urvc587gf7nigqhhevyww5zfsqtroyarfvbhot589qo3qkgwtrwdhi2sqc1myp2vrmzpyxxxaot1nppggqwc3igmoru9nzyjliluubtl2qquj2v6y1x+jld5jhjjfs04dx6b0djlburfbvkel92k6vb/3hthjjxs1lolaeu2evfss4xpli2dm+f4qzkyiwugeheitmsgsragxvllrv2xtmvscoi14klc76bsijauadadookxu9vhgkp8xuqlt6unfvdotma7r4rcwg2f+z+ht3bknaghjvr305onw/cybcevogcvg6s2ugp0vpursf6iq7qozrce8s8j96rn/jo/c/+n/+h//oy1peankfosvi/fgnhzqxm</latexit> <latexit sha1_base64="vmo6japptd64iivb8nxgxouc0de=">aaackhicbvfdsxtbfj1srdw02qipvgwnhyswsfsk9avovwgpplhqohcx5e7kbji4o7vm3jwejq/9nx1tf47/prnjcibxwsdhnpsx9544v9ks7z/uvgcbzzdfbg3xx77a2x3d2nvv2awwarsiu5m5icgikhq7jenhtw4q0ljhdxx3vunx92iszpqvtximuxhqmugb5kiocbjdi8qrd8g0nw7zz/wsslrjodgur42m3/fnwddbsabntojlak8w3g4yuasossiwth/4oyulgjjc4br+w1jmqdzbepsoakjrhuvsiyl/65gbtzljniy+yx9xljbao0ljl5kcjeyqvpfpaf2ckqowldovclwyd0okxsnj1un4qboupcyogdds/zwlergq5a63ngxwo0extek <latexit sha1_base64="q+v5fer10yz4ggdyxoob2qde5v8=">aaacw3icbvhbattaef2rtzs9oe1jx5aagksmkukheqmepqf9ccwltrowhvitr/ai1ursjornor/qz7sv7yd0zbtq2x1yojxzzmznjq2kmbigpzrenbv37j84ehj46pgtp8+6r89hpqw1hyevzalvu2zacgvdfcjhttlailtctzpftprnn9bgloorliuiczztihocoaos7qdryvpgxwt0je4korjbn/qiyf1fvw7osemyztpuxjajhwyiicmxbxooo8bp1r4+bmfei9kc46tbcwfhkug+idagrzzxnrx14sm05hubcrlkxoyjsmlymo2cs2goj7wbivgczwdsogifmniubm/oa8dmavzq9xtsfftvhmwfmcsidc52drortet/thgn2wlshapqbmxxjbjauixpu0u6fro4yqudjgvh/kr5ngng0e16q8uqdgv8axk7qjxg5rr2wikl1myrbrbgqrvt2q9csvqfkuov2h3/vv3zvvbfi5la079yb1xbntkdjnpd/z4ynqyicbb9ftm7f7c5zqf5sv4rn0tkltknh8k1grjovpof5bf57v16uac9xfu9zibnbdkkr/kdepdecg==</latexit> <latexit sha1_base64="pkddr1xni4717umkasm7i9gzej0=">aaac0hicbvhlattafb2rryr9oe0ym6gmybfjpfjon4xqtlsllnkhnyakxgh8jq8ajctmvberontbv+pn9au6bf+gi8eb2u6fgcm59zh33ksswqdn/ew5n27eun1nb//g7r37dx72dx9ntvlrdhneyljfjsyafaomkfdczawbfymeiyq/7fsll6cnknvnxfyqfsxtihwcoaxifhbwis6hibi6zl36iozmz2ehvnzult210ilor7vlj2lymjwnsfo2jrtoqwkpbnqan/mx3w7t68wrug6ortbhko4pvlg3crol/duykhwcx4e9kjyvvc5aizfmmmd3kowaplfwce1bwbuogm9zbogfihvgomblqkufwmzg01lbp5cu2h8rglyysywsm9ltyra1jvyfftsyvowaoaoaqfgrqwktkza0s5tohaaocmkb41ryv1i+z5pxtmzvtfn1robvbnisaiv4oymtvuicnbokasyyun1wztshjf3elkfnncfxqm3bycm3ihnormf2usrdsbyh8bft3wxtz2pfg/sfng9oxq9ps0eoybmyjd55qu7ie3jojosth+qx+u3+ob+dhfpv+xav6vtwny/jrjjf/wjmcoov</latexit> <latexit sha1_base64="bgz4tmjmns5hqxzoe92twkwzkl8=">aaacghicbvfbsxtbfj6s9rzv0t72zwgqepc4k4iickkf9sghljooxcwcnzwkq2zn15mzkrdkd/ja/qz+m87gfjqkbw58fn+5nyhv0plv/y55ax/wnza3tss7u3v7b5xdo7znmiowjrkvmkcilcq Dynamic Programming h i PT minimize E e t=1 C t(x t, u t )+C f (x T+1 ) s.t. Terminal value: V T+1 (x) =C f (x T+1 ) Recursive formula (recurse backwards): V k (x) =min k ( k ) = arg min u Optimal Policy: x t+1 = f t (x t, u t, e t ), x 1 = x u t = t ( t ) u =: V 1 (x) C k (x, u)+e e [V k+1 (f k (x, u, e))] Value function Cost-to-go C k (x k, u)+e e [V k+1 (f k (x k, u, e))]

28 <latexit sha1_base64="djbuqd6sqjxgss29bfvvuzlzhwe=">aaadjhicbvjnb9naef2brxk+uhanlisiukeq2qgjjfsplfzw6cgfpk0up9z6m0lw3v1bu2pkypnvcowpcemcupbbwkdgigkjwfvmzcybnr0nmrqwg+cx51+6foxqty3rrrs3b e2zt3hay8fsm5jrhfqtqmecbek4za0wlek6s83d1/oqzgcts3cd5bipfplpmbgfoqlj9lupgkntjjghzqpsyakuqsytscs2u+aivfuwjxxcwjovbfumy4dcyuypl3amrs7l/pkxoeepzu3pun3sb5gvvy306r4j7zuvf/rpfrkxnoiqipovtyrfuudi5bse0q982intnoctyikcpmxvg7u7qdrzg10hyga5prbdveqnonpjcguyumbxdmmhw5orqcalu3nxcxvg5m8lqqc0u2fg5enmkpnlmme5s4z6ndmh+w1eyze1cjs6zfio7gqvj/8wgou5ej0qhsxxb84tgk1xstgm9jjowbjjkuqomg+husvmmgcbrlxopy0i7a740svnkwvb0dcusxainc6qfvezoeqryvzcsfmla0sn6o3+jtryob+2lqud77nd9mfrjwrjbslj6/ovg+eu3dlrh0cvo7l6zmg3ygdwkwyqkr8gu+ub6zec4d b/5x/7v/w/95kep7tc09smt+7z8pdp9v</latexit> <latexit sha1_base64="dy8oi6n3bn2d1dkwloypctkty8u=">aaaea3iclvplbtnafhushiw8wtjb5opallqlsisk2bsvl2drrvkstflgicatitpqegznjfgikzcs+rj2ic0fwn/waywtb5gedspzpj733jfvdzbwprtr/iyvk1euxru+c6n689bto3d39+71vzxkqnsk5rg8clcinana00xzepfiiqoa0/pg8k1up/9epwkx6op5qv0ih4jngmhauqo90pf+yhsfelljtnw2ox4mkgjiznilk9lxrkdbyn5zyoled6aqd/c7byidsfccnk1w9qcgp+ra2n0mkiycah+hav14m35aqyzmynninstw0reqfiik8r/rcl+6kakzlb8o1mefhoytdmdrxra+tfurk3ong7ghpepuk7eesowgzf9vel6eymgeh0ea4orkrgmpcernv2ynm7k52q25lxdxybt4bag5xwnbafpohjm0okitjpuaeg6ifyolzottripsrrnmlnfibxbm2zrvfoutwrpljgess3sjdqv2bw+di6xmuwcv+qqotvto/ss2spxkhw+ysfjnbvkmmqqcdaz5fskysuo0n1uaiws2vibtldhrdlfxsixij5ssdwjmqwakhtmnluuzltisiuoim5f3zd4zzuejfgpo85mtrdzsbm68zsht6vdu/hciusw2a/e2p/826b+1plfldz7vtl4wo9lxhjqpnibjoc+de+ed03z6din9kj8o18qpk58rxyvfkt+x0nkp8lnvrj3kj9+u9u7g</latexit> <latexit sha1_base64="adb8qmdtu4tflqnxv6h14vnn31g=">aaacnxicbvfdaxnbfj2svwv9svxrhw4girunuyk0l0lrsvsqssr8fnj1utu5sybozi4zdyvhyv/w1/rv/4f/prpjfkzihyhdofdj7j1xpqql3/9b8+7t3h/wcpfr3umnt589r++/6ns0nwj7ilwpuyzbopiaeyrj4wvmejjy4sc+/llqg59orex1l+yzhglmtbxlaesoqn7sr93mlcq674pfif/ek/jjle+s0b0t1rt+y18g3wzbbrqsik60xwuvrqnie9qkffg7dpymwgimsafwsxevw8xaxmmehw5qsncgxxklbx/jmbefp8y9txzj/ltrqgltpildzgi0tztasf5pg+y0pg4lqbocuivvohguokw8va8fsyoc1nwbeea6v3ixbqoc3bxxpix7zyjwnilmuzyiheegq2hgbhxpkrkqutyqojnk8e+glw/lyztuvne2ljunciljvms7q/thvrizjng8/zbof2gffiv49rfx8rmyzpe9yq9zkwxsij2wc9zhpsbyl3bdfrm/3oh31wt7f6tur1b <latexit sha1_base64="xgebh7uy6gt6q+hya5tae+6qm3i=">aaacjxicbvfdaxnbfj2srdba2krfcr4mbiepenalpx2ouqrqpvqhypig0nw5o7ljhszoljn3jwgpv8zx/t/+g2ftccbxwjchc+73jtmllfn+74r3agv78zodp7vp9vafh1rrl3o2zy3arkhvavoxwfrsy5ckkexnbigjfd7g04+lfvsnjzwp7ta8wzcbszyjkyacfvupe1gnmys6tf6eu+/rew9hnrjf1brf8hfgn0gwbhw2thzuq4r3w1tkcwoscqwdbh5gyqggpfb4v3uxw8xatggmawc1jgjdyjhcpx/jmcefpcy9txzb/htrqgltpimdzwi Simplest Example: LQR minimize s.t. E h PT 1 t=1 x t Qx t + u t Ru t + x T P Tx T i x t+1 = Ax t + Bu t + e t Dynamic Programming: V T 1 (x T 1 )=mine x T u T 1 1Qx T 1 + u T 1Ru T 1 + V T (x T ) =min u T 1 apple xt 1 u T 1 V T (x T )=x T P T x T apple Q 0 0 R + A B PT A B apple xt 1 u T Tr(P T ) u T 1 = (B P T B + R) 1 B P T Ax T =: K T 1 x T 1 V T (x T 1 )=x T 1P T 1 x T 1 P T 1 = Q + A P T A A P T B (R + B P T B) 1 B P T A

29 <latexit sha1_base64="euyqlm8oqoqnwpvqldlbjjbjanm=">aaadnxicbvjlbxmxepyurxietehixskikhrfuwgjlpvkacehhxastfk8jbyon7fqe1f2lcry+7u48jc4cenc+qt400uicsnzm/7m5znpasgfhsj6horxrl67fmprzuvw7tt3t9s794y2lw3ja5bl3jyl1hipnb+aamnpcsopsiu/ts9e1/7tt9xykes+laqekdrvihomgofg7w8k5vohhtwglionzduiks3ntgktlpjck7ylirrq7preiokmfgt+mqidwaiiisistd3bikiewyhkhjixv65fywjlnwqhcxxex/mxnd/bj7xg+7hc3j7u+rjmqkjt1nahw7ec+9t9umih+fwtdfshe83h0cjct5onj9udqbstbw8acwn0ucph450gizoclypryjjao4qjahjfdgst3m9fwl5qdkgnforntrw3ivuuuskppdlbww780ycx6l8zjiprfyr1kfvu7lqvbv/ng5wqvuyc0eujxlplrlkpmes45g1phoem5miblbnh34rzjpp1g2d3pcuydshzyiruxmrb8glfqyxmwvapwg6kelb9vo6dkbj/pnrixs3ox68vw7v33oipapu057+qfrwr7amj19e/aqyfdeoog5887xwendrsoqfoidpdmxqbdtf7diwgiaw7qs8ybmpwa/gj/bn+ugwngybnplqr8pcfxx4jrw==</latexit> Simplest Example: LQR minimize s.t. lim T!1 E h 1 T x t+1 = Ax t + Bu t + e t P T t=1 x t Qx t + u t Ru t i When (A,B) known, optimal to build control ut = Kxt u t = (B PB + R) 1 B PAx t =: Kx t P = Q + A PA A PB (R + B PB) 1 B PA Discrete Algebraic Riccati Equation Dynamic programming has simple form because quadratics are miraculous. Solution is independent of noise variance. For finite time horizons, we could solve this with a variety of batch solvers. Note that the solution is only time invariant on the infinite time horizon.

30 <latexit sha1_base64="vs+14vgxeycwqa4/abiirwhhyzg=">aaadgnicbvjnb9naelxnv0n5sohizuvelyooshescfspoia49fbe01bkgmu9gser7q6t3tfkspxpupjhucguxpg3rfmjsmjilmbfe/n2z8zpiyxfmpzlb1euxrt+y+tmz/vw7tt3uzv3tm1egg4jnsvcnkfmghqarihqwnlhgkluwll6cdjwz5/awjhre1wuecs21sitnkgdku5xmsju6iozwxz1jwxdosrn55uswijxgwqys6hioevt6k2dajwq4zjauiuv7kf1xxnymgb/nucgthcpgjgdyuxpa2ohogws5k79okrjpsn+qgfqvndolnehr9fcojiia5w6fpskfvfs7yxdcblkm4napoe1czzs+dgd5lxuojflzu04cgumnr0klse1wvoogl9guxi7vdmfnq6w86zji4dmsjyb92kks/tfioopaxcqdcpmmnada8d/cemss+dxjxrrimh+evfwsoi5azzdjsiar7lwcengulcspmogcxqrxlll6v0ax+mkmpda8hwca6jeorrmqauomnbnv9vbisx5wlqlr83k/rdotqh7r8vuob0cuf9e722i3uki9ffvjqdphle4jn4/7r28alez5t3whnp9l/keeqfeo+/yg3nc3/yj/4x/mvgsfau+bz8upyhf1tz3vil4+rs0rp43</late <latexit sha1_base64="q+v5fer10yz4ggdyxoob2qde5v8=">aaacw3icbvhbattaef2rtzs9oe1jx5aagksmkukheqmepqf9ccwltrowhvitr/ai1ursjornor/qz7sv7yd0zbtq2x1yojxzzmznjq2kmbigpzrenbv37j84ehj46pgtp8+6r89hpqw1hyevzalvu2zacgvdfcjhttlailtctzpftprnn9bgloorliuiczztihocoaos7qdryvpgxwt0je4korjbn/qiyf1fvw7osemyztpuxjajhwyiicmxbxooo8bp1r4+bmfei9kc46tbcwfhkug+idagrzzxnrx14sm05hubcrlkxoyjsmlymo2cs2goj7wbivgczwdsogifmniubm/oa8dmavzq9xtsfftvhmwfmcsidc52drortet/thgn2wlshapqbmxxjbjauixpu0u6fro4yqudjgvh/kr5ngng0e16q8uqdgv8axk7qjxg5rr2wikl1myrbrbgqrvt2q9csvqfkuov2h3/vv3zvvbfi5la079yb1xbntkdjnpd/z4ynqyicbb9ftm7f7c5zqf5sv4rn0tkltknh8k1grjovpof5bf57v16uac9xfu9zibnbdkkr/kdepdecg==</latexit> <latexit sha1_base64="pkddr1xni4717umkasm7i9gzej0=">aaac0hicbvhlattafb2rryr9oe0ym6gmybfjpfjon4xqtlsllnkhnyakxgh8jq8ajctmvberontbv+pn9au6bf+gi8eb2u6fgcm59zh33ksswqdn/ew5n27eun1nb//g7r37dx72dx9ntvlrdhneyljfjsyafaomkfdczawbfymeiyq/7fsll6cnknvnxfyqfsxtihwcoaxifhbwis6hibi6zl36iozmz2ehvnzult210ilor7vlj2lymjwnsfo2jrtoqwkpbnqan/mx3w7t68wrug6ortbhko4pvlg3crol/duykhwcx4e9kjyvvc5aizfmmmd3kowaplfwce1bwbuogm9zbogfihvgomblqkufwmzg01lbp5cu2h8rglyysywsm9ltyra1jvyfftsyvowaoaoaqfgrqwktkza0s5tohaaocmkb41ryv1i+z5pxtmzvtfn1robvbnisaiv4oymtvuicnbokasyyun1wztshjf3elkfnncfxqm3bycm3ihnormf2usrdsbyh8bft3wxtz2pfg/sfng9oxq9ps0eoybmyjd55qu7ie3jojosth+qx+u3+ob+dhfpv+xav6vtwny/jrjjf/wjmcoov</latexit> h i PT minimize E e t=1 C t(x t, u t ) s.t. x t+1 = f t (x t, u t, e t ) u t = t ( t ) Approximate Dynamic Programming Recursive formula: V k (x) =min u C k (x, u)+e e [V k+1 (f k (x, u, e))] Optimal Policy: k ( k ) = arg min u C k (x k, u)+e e [V k+1 (f k (x k, u, e))]

31 <latexit sha1_base64="c96nrog7aifhrw62vbldao2qo+c=">aaadihicbvjdb9mwfhxc1ygfa+grf4ukqrntlsck8vkpmba87geiuk2qs8hxndsa7ut2dwqj8md45y/whniex4ptzrjtuvkk63pppfa9j0khhyug+o35n27eun1n527n3v0hd3e7vuenni8n4xowy9ycj9ryktsfgadjzwvdquokp0suj5r62rdurmj1z1gwpfi00yivjikd4u53kvbm6ioaq5d1jwxdisrjf5uswijxldd4dxnfyz4k1bs65ktyfkbeliquybtwf0tofjayzfqpegh4alci4acmyz8ykc0hiqsvtemynnil1/k8rpeip9fca97wswcpu8oifgjagdyahcl1rh1d3o0hw2avedsj26sp2jije15ezjkrfdfajlv2ggyfre4objpcjvpaxlb2stm+dammituowm21xs8cmsnpbtynaa/qfzsqqqxdqsqxm/3yzvod/q82lsf9fvvcfyvwza4uskujicenrxgmdgcgly6hzaj3vszm1fagzsi1w1babwdrk1sluguwz/ggkmebhjrqclduueamqt4lkfenqi0+bsy6rjrzpjx4kzib9udy/s16f4vsdak317+dnl4yhsew/piyp37twrodnqcnaibcdijg6am6qrpevj536i291/43/4f/0/91rfw9tucxwgv/z1+kywfm</latexit> <latexit sha1_base64="w0y04clrch/hfcnyey3rxb6ejpw=">aaacynicbvfnixnbeo2mx+v6ldwjl8ygjbjcjah6erzx0umok5rsqmyyejo1k2a7e4bugklo5ua/8pd49kp/wp4kikksahi896qqqyqrplayht87wbxrn27eorp9fofuvfspuicpp7asdycjl2vpljnmqqonexqo4biywfqm4sk7omv1iy9grcj1z1xvkchwajelztbtaxcyteockcx6ywf9twmldorqhp71l8n6qj/rjeofhossc++a1eets8hxrv/m5q17cinbbesxwctt9sjrua56ckit6jftnkcnnssel7xwojflzu0scitmhdmouitmok4tvixfsqjmhmqmwczupx9dn3pmtvps+kerrtl/mxxt1q5u5p3tghzfa8n/abma81eje7qqettfnmprsbgk7tlpxbjgkfcemg6e/yvlc2yyr7/yns7r2hxwnuncstacl3pyyyuu0tbpwkdfhg6ncu+flpqt05ao2x3/ux3zvu6/fyvaoxz7u+rbgdkfjnpf/ygyph9f4sj6+kj3+mz7mipymdwhfrkrl+sufcdnzei4+uz+kj/kvzaotlak3myadly5j8hobf9/a3dt4nc=</latexit> <latexit sha1_base64="gpskemdf6g9ld6tv+mg9j/edltg=">aaacyhicbvfdi9nafj3gr3x96uqjl4nfslcurar9erzxuwqfvtz2f5oqjtobdnizszi50zaqf/+vp8unx/vfogkr2nyla4dzzr137r1zjyxfmpzr865dv3hz1shtwzt3791/0d96olflbtimesllc5kxc1jogknaczevaayycrfz1umnx3wby0wpp+oygksxqotcciaosvvncsx8rubf05izilzcp03d0hn/mawd+ozgbvokxyrhpmuad23aqbtlyhfkj+la9ppopiqgii0o5pik/ue4cldb90g0aqoyibp0qjfes5lxcjryyaydrmgfscmmci6hpyxrcxxjv6yaqyoakbbjs5q+pu8dm6n5adztsffsvxknu9yuveac3rh2v+vi/2ntgvnxssn0vsnovm6u15jisbtv0pkwwfeuhwdccpdxyufmmi5u4vtdvrur4futnitac17oyievueddhgkbfro6m6p5l6sk50xbetrt+k/qynay/1yuau3w1f1vb3tmd5bod/37ypj8fiwj6nolwfgbzwkoygpyhpgkii/jmflazsiycpkd/cs/yg/vo1d5x73l2ur1njmpyfz43/4ai/xgwg==</latexit> <latexit sha1_base64="/+atgfozwlrx7ggx96wxxxb/mzq=">aaacuhicbvhbahsxejw3l6tpzwkf+yjqcg4jzrcugvou6kl7kieu1k7axrazsrww1g1pttgs/qb+tv+bv6l27ujsd0bwdm6zkwymt1j4jopbvntv/oohb4epjh4/efrsefv4xdcb0je+yeyad5od51jopkcbkt9yx0hlkl/n836tx//kzgujv+ps8lrbocvummbaze3+mbsxobr0f9n8hi7bwmcwtf9fz8qaoqvra73jroanyaqrdpbk7u7ci5ug+ydzga7zxfv23erhe8nkxtuycd6pkthiwofdwsrfhy1lzy2worr8fkagxx1and2u6jvatojuuha00oa9m1gb8n6p8ubugdo/q9xk/7rridpztblalsg1wz80lsvfq+vr0ylwnkfcbgdmifbxymbgggey8nyrtw3l2vyn1alugpkj32elltbbid1hbulxxvwfhzt0g2hpl0uxw39qkfvl3u+ieojplsmw9cmeoswk2r3/phi+7svxl/n6rnpxcboaq/kkvczdkpd35ij8ivdkqbj5rx6tp+q2+hd9iipirk1ra5pzkmxf5p4cvujzow==</latexit> P 1 minimize E e t=1 t C(x t, u t ) s.t. x t+1 = f(x t, u t, e t ) u t = ( t ) discount factor Approximate Dynamic Programming Bellman Equation: V (x) =minc(x, u)+ E e [V (f(x, u, e))] u Optimal Policy: (x) = arg min C(x, u)+ E e [V (f(x, u, e))] u Generate algorithms using the insight: V (x k ) C(x k, u k )+ V (x k+1 )+ k

32 <latexit sha1_base64="c96nrog7aifhrw62vbldao2qo+c=">aaadihicbvjdb9mwfhxc1ygfa+grf4ukqrntlsck8vkpmba87geiuk2qs8hxndsa7ut2dwqj8md45y/whniex4ptzrjtuvkk63pppfa9j0khhyug+o35n27eun1n527n3v0hd3e7vuenni8n4xowy9ycj9ryktsfgadjzwvdquokp0suj5r62rdurmj1z1gwpfi00yivjikd4u53kvbm6ioaq5d1jwxdisrjf5uswijxldd4dxnfyz4k1bs65ktyfkbeliquybtwf0tofjayzfqpegh4alci4acmyz8ykc0hiqsvtemynnil1/k8rpeip9fca97wswcpu8oifgjagdyahcl1rh1d3o0hw2avedsj26sp2jije15ezjkrfdfajlv2ggyfre4objpcjvpaxlb2stm+dammituowm21xs8cmsnpbtynaa/qfzsqqqxdqsqxm/3yzvod/q82lsf9fvvcfyvwza4uskujicenrxgmdgcgly6hzaj3vszm1fagzsi1w1babwdrk1sluguwz/ggkmebhjrqclduueamqt4lkfenqi0+bsy6rjrzpjx4kzib9udy/s16f4vsdak317+dnl4yhsew/piyp37twrodnqcnaibcdijg6am6qrpevj536i291/43/4f/0/91rfw9tucxwgv/z1+kywfm</latexit> <latexit sha1_base64="azszeztxx0p7gmuxegkrcofcx2k=">aaacvnicbvhbihnbeo2mt914y+qjl41bsdcemuxqfyg4igr7sismu5azhp5ozatzvgzdnziw5jv8gh980v+xj4lgegsaduecquqqykophibhz1zw6/adu/eojtv3hzx89lhz8mtitgu5jlmrxl5nzieugsyoumj1aygptmjvdnpw6fffwdph9fdclpaovmirc87qu2nn8/flbzgo+vqtpdualzrwdodzvn9yptwsygk5tmlcmkuynaqb0msb9wd6/dikyo5jo+10w2g4dnoioi3okm1cpcetjj4zxinqycvzbhqfjsy1syi4hfu7rhyujn+waqyeaqbajfv65hv94zkzzy31tynds/9m1ew5t1szdzbjuh2tif+ntsvm3ys10gwfopmmuv5jioy2c6qzyygjxhraubx+r5tpmwuc/zp3uqxrl8b3jqkxlrbczgcplbhayzzpabutupmq/iikpf+ydvs8wfjf1zdt5n57uqh0g3n/s90/mpudrpvrpwst02eudqplv93ru+1pjsgz8pz0serekxh5rc7imhdynfwgv8jvybtkgqrmxhq0tjlpyu4eiz+p/nre</latexit> <latexit sha1_base64="w0y04clrch/hfcnyey3rxb6ejpw=">aaacynicbvfnixnbeo2mx+v6ldwjl8ygjbjcjah6erzx0umok5rsqmyyejo1k2a7e4bugklo5ua/8pd49kp/wp4kikksahi896qqqyqrplayht87wbxrn27eorp9fofuvfspuicpp7asdycjl2vpljnmqqonexqo4biywfqm4sk7omv1iy9grcj1z1xvkchwajelztbtaxcyteockcx6ywf9twmldorqhp71l8n6qj/rjeofhossc++a1eets8hxrv/m5q17cinbbesxwctt9sjrua56ckit6jftnkcnnssel7xwojflzu0scitmhdmouitmok4tvixfsqjmhmqmwczupx9dn3pmtvps+kerrtl/mxxt1q5u5p3tghzfa8n/abma81eje7qqettfnmprsbgk7tlpxbjgkfcemg6e/yvlc2yyr7/yns7r2hxwnuncstacl3pyyyuu0tbpwkdfhg6ncu+flpqt05ao2x3/ux3zvu6/fyvaoxz7u+rbgdkfjnpf/ygyph9f4sj6+kj3+mz7mipymdwhfrkrl+sufcdnzei4+uz+kj/kvzaotlak3myadly5j8hobf9/a3dt4nc=</latexit> <latexit sha1_base64="jqwqz2wojja3hljcmszje1ktl10=">aaacwxicbvfdixmxfe3hr3x92k4++hissi2wmiocviglvfshd1u0uwudowtso9owswzibqrlmd/lr9fh/svm2gq29uli4zx7b3lutusplibhz1zw6/adu/eo7h8/epjo8un79mmllzzhmogflmx1yixiowgcaivclwaysivcptfdrr/6bsakqn/fvqmjyrkwmeamptvrj8bdzd/16lvh5n5jazwzprinfcnfmlyf61kfdswhw2mshj5v7qym427wfpsh13dnvdiifihjrn0jb+e66cgitqbdtnexo20l8bzgtofglpm10ygsmamyqcel1mexs1ayfsnymhqomqkbvgvbnx3hmtnncuoprrpm/62omlj2pvkf2xix+1pd/k+boszejpxqpupqfpnq5itfgjyzphnhgkncecc4ef6vlc+yyrz9phdewfcuge84qzzoc17myy+vuetdpgkbfro6cvv9ellsl0xbompm/ff1bru5+0hkam1/5nepewfjfihr/vgpwewrqrqoovhrzvn77wqoydpynhrjrn6qc/kzxjaj4eq7+uf+kd/bmbbbgzhnatda1jwloxfufwccm9xj</latexit> <latexit sha1_base64="gpskemdf6g9ld6tv+mg9j/edltg=">aaacyhicbvfdi9nafj3gr3x96uqjl4nfslcurar9erzxuwqfvtz2f5oqjtobdnizszi50zaqf/+vp8unx/vfogkr2nyla4dzzr137r1zjyxfmpzr865dv3hz1shtwzt3791/0d96olflbtimesllc5kxc1jogknaczevaayycrfz1umnx3wby0wpp+oygksxqotcciaosvvncsx8rubf05izilzcp03d0hn/mawd+ozgbvokxyrhpmuad23aqbtlyhfkj+la9ppopiqgii0o5pik/ue4cldb90g0aqoyibp0qjfes5lxcjryyaydrmgfscmmci6hpyxrcxxjv6yaqyoakbbjs5q+pu8dm6n5adztsffsvxknu9yuveac3rh2v+vi/2ntgvnxssn0vsnovm6u15jisbtv0pkwwfeuhwdccpdxyufmmi5u4vtdvrur4futnitac17oyievueddhgkbfro6m6p5l6sk50xbetrt+k/qynay/1yuau3w1f1vb3tmd5bod/37ypj8fiwj6nolwfgbzwkoygpyhpgkii/jmflazsiycpkd/cs/yg/vo1d5x73l2ur1njmpyfz43/4ai/xgwg==</latexit> <latexit sha1_base64="9joy7emzi33vky9ahbtpwqdn0gg=">aaackxicbvfdaxnbfj1s/wjrv9o+6sngebioybcitg9cuehbprro2kj2cxcnn5uhm7plzf1jwplir/fv/43/prnxbzn4yebwzv2ye09akokodh+3gp179x883n3bf/t4ydnn7ypdk5exvubq5cq3nyk4vnlgkcqpvcksgk4vxqe372v9+htaj3pzlryfjhoyi6dsahlq3h4rf7i77/g3paabxvqacvuu+wv3flz2xu1o2a9xwbdb1iaoa+jifnbk4kkuso2ghalnrlfyufkbjskulvfj0meb4hyyhhloqknlqtuas/7kmxm+za1/hvik/beiau3cqqc+uwpn3kzwk//trivnt5nkmqiknolpogmpoow8vgmfsiuc1midefb6v3ixawuc/oxwpqx6fyjwnqnmpzein+agq2hofjzpkdriu29vfzrk8s9ghd+x2yz+qr5tlxc/yeysoz739pjevri3jno8/za4oulhyt+6fn0zvgus2wxp2uv P 1 minimize E e t=1 t C(x t, u t ) s.t. x t+1 = f(x t, u t, e t ) u t = ( t ) Approximate Dynamic Programming Q(x, u) =C(x, u)+e e [ V (f(x, u, e))] Bellman Equation: V (x) =minc(x, u)+ E e [V (f(x, u, e))] u Q(x, u) =C(x, u)+ E e applemin u 0 Q(f(x, u, e), u 0 ) Optimal Policy: (x) = arg min C(x, u)+ E e [V (f(x, u, e))] u (x) = arg min Q(x, u) u

33 <latexit sha1_base64="c96nrog7aifhrw62vbldao2qo+c=">aaadihicbvjdb9mwfhxc1ygfa+grf4ukqrntlsck8vkpmba87geiuk2qs8hxndsa7ut2dwqj8md45y/whniex4ptzrjtuvkk63pppfa9j0khhyug+o35n27eun1n527n3v0hd3e7vuenni8n4xowy9ycj9ryktsfgadjzwvdquokp0suj5r62rdurmj1z1gwpfi00yivjikd4u53kvbm6ioaq5d1jwxdisrjf5uswijxldd4dxnfyz4k1bs65ktyfkbeliquybtwf0tofjayzfqpegh4alci4acmyz8ykc0hiqsvtemynnil1/k8rpeip9fca97wswcpu8oifgjagdyahcl1rh1d3o0hw2avedsj26sp2jije15ezjkrfdfajlv2ggyfre4objpcjvpaxlb2stm+dammituowm21xs8cmsnpbtynaa/qfzsqqqxdqsqxm/3yzvod/q82lsf9fvvcfyvwza4uskujicenrxgmdgcgly6hzaj3vszm1fagzsi1w1babwdrk1sluguwz/ggkmebhjrqclduueamqt4lkfenqi0+bsy6rjrzpjx4kzib9udy/s16f4vsdak317+dnl4yhsew/piyp37twrodnqcnaibcdijg6am6qrpevj536i291/43/4f/0/91rfw9tucxwgv/z1+kywfm</latexit> <latexit sha1_base64="azszeztxx0p7gmuxegkrcofcx2k=">aaacvnicbvhbihnbeo2mt914y+qjl41bsdcemuxqfyg4igr7sismu5azhp5ozatzvgzdnziw5jv8gh980v+xj4lgegsaduecquqqykophibhz1zw6/adu/eojtv3hzx89lhz8mtitgu5jlmrxl5nzieugsyoumj1aygptmjvdnpw6fffwdph9fdclpaovmirc87qu2nn8/flbzgo+vqtpdualzrwdodzvn9yptwsygk5tmlcmkuynaqb0msb9wd6/dikyo5jo+10w2g4dnoioi3okm1cpcetjj4zxinqycvzbhqfjsy1syi4hfu7rhyujn+waqyeaqbajfv65hv94zkzzy31tynds/9m1ew5t1szdzbjuh2tif+ntsvm3ys10gwfopmmuv5jioy2c6qzyygjxhraubx+r5tpmwuc/zp3uqxrl8b3jqkxlrbczgcplbhayzzpabutupmq/iikpf+ydvs8wfjf1zdt5n57uqh0g3n/s90/mpudrpvrpwst02eudqplv93ru+1pjsgz8pz0serekxh5rc7imhdynfwgv8jvybtkgqrmxhq0tjlpyu4eiz+p/nre</latexit> <latexit sha1_base64="jqwqz2wojja3hljcmszje1ktl10=">aaacwxicbvfdixmxfe3hr3x92k4++hissi2wmiocviglvfshd1u0uwudowtso9owswzibqrlmd/lr9fh/svm2gq29uli4zx7b3lutusplibhz1zw6/adu/eo7h8/epjo8un79mmllzzhmogflmx1yixiowgcaivclwaysivcptfdrr/6bsakqn/fvqmjyrkwmeamptvrj8bdzd/16lvh5n5jazwzprinfcnfmlyf61kfdswhw2mshj5v7qym427wfpsh13dnvdiifihjrn0jb+e66cgitqbdtnexo20l8bzgtofglpm10ygsmamyqcel1mexs1ayfsnymhqomqkbvgvbnx3hmtnncuoprrpm/62omlj2pvkf2xix+1pd/k+boszejpxqpupqfpnq5itfgjyzphnhgkncecc4ef6vlc+yyrz9phdewfcuge84qzzoc17myy+vuetdpgkbfro6cvv9ellsl0xbompm/ff1bru5+0hkam1/5nepewfjfihr/vgpwewrqrqoovhrzvn77wqoydpynhrjrn6qc/kzxjaj4eq7+uf+kd/bmbbbgzhnatda1jwloxfufwccm9xj</latexit> <latexit sha1_base64="9joy7emzi33vky9ahbtpwqdn0gg=">aaackxicbvfdaxnbfj1s/wjrv9o+6sngebioybcitg9cuehbprro2kj2cxcnn5uhm7plzf1jwplir/fv/43/prnxbzn4yebwzv2ye09akokodh+3gp179x883n3bf/t4ydnn7ypdk5exvubq5cq3nyk4vnlgkcqpvcksgk4vxqe372v9+htaj3pzlryfjhoyi6dsahlq3h4rf7i77/g3paabxvqacvuu+wv3flz2xu1o2a9xwbdb1iaoa+jifnbk4kkuso2ghalnrlfyufkbjskulvfj0meb4hyyhhloqknlqtuas/7kmxm+za1/hvik/beiau3cqqc+uwpn3kzwk//trivnt5nkmqiknolpogmpoow8vgmfsiuc1midefb6v3ixawuc/oxwpqx6fyjwnqnmpzein+agq2hofjzpkdriu29vfzrk8s9ghd+x2yz+qr5tlxc/yeysoz739pjevri3jno8/za4oulhyt+6fn0zvgus2wxp2uv <latexit sha1_base64="881ddg/md6xwd8c/m07dn8n1mzg=">aaacu3icbvfnb9naen2yj5bylckry4oinvwjyezi5yjuuqqcemgfasvfljxebjzfu2trdxylsvkp+dxcepwy1m4qjggklz7eezozm5nvulgmwx+d4nbto3d3du/t3x/w8nhj7v6ts1s6w/iilbi01xlyloxmixqo+xvlokhm8qusog30q6/cwfhqz7ioekig12iqgkcn0u77i/48lqyulq5pdfvlyjk9/usd0tghpydgsui0dgdl2ituxvg0hlid1qc9m+32wmhybt0g0qr0ycro0/1oek9k5htxycryo47ccpmadaom+xivdpzxwari+dhddyrbpg4hxtixnpnqawn800hb9t+mgps1c5v5pwkc2u2tif+njr1oxye10jvdrtlno6mtfevabi9ohoem5cidyeb4v1i2awmm/y7xurs1k87wjqnntgtwtvggk3gobjxposoqupmq/ickpj9aw3om8hn+ux3zru6/e7laozjzh9shw2z/kghz/dvg8uuwcofrxaveydvvaxbjm/kc9elejskj+ujoyygw8o18jz/jr+bnwiivgbyxbp1vzloyfoh7dske2bk=</latexit> <latexit sha1_base64="a3vylzvfult7aea2lan03t/af8q=">aaadaxicbvjnbxmxepuux234sssfiytphjqoabslkmofvklicoiheastli1wjneswlg9k3sweq32xju/wg1x5zfwn/gfennunakjwxp6783ym+nhjoxfipjt+bdu37l7b2u7dv/bw0ep6zu75zbndycet2vqlofmghqaeihqwmvmgkmhhivh9ktslz6dsslvn3cewucxsryjwrk6kq5/2+7graqytowqnhwpy+ysnrbzenqib2gzpiwawyt242tpkpmblkna6tssmmlmyt/+gezjpprtlnbxke+xdlngmt0iy3a+34qmge+wvyvrjaatlijugnajgmqzz/gon4islocknhljro2hqyadghkuxejzi3ilgentnoa+g5opsinimbasvnbmqkepcucjxba3mwqmrj2roxnwd7frwkx+t+vnoho9kitocgtnry4a5zjisqsd0eqy4cjndjbuhhsr5rnmgee3qzvbfruz4cudflncc54msmzknkfhjrsaiglddvw8f1lsj0xbelon+vp1zsu5+u6mbdr2qfsourvhdgsj18e/cc5fdskge3zfny7fllezrz6rpdikitkix+qdosm9wskf76n33nvzv/rf/r/+zyur7y1znpcv8h/9bfbf8pc=</latexit> P 1 minimize E e t=1 t C(x t, u t ) s.t. x t+1 = f(x t, u t, e t ) u t = ( t ) Approximate Dynamic Programming Q(x, u) =C(x, u)+e e [ V (f(x, u, e))] Bellman Equation: Q(x, u) =C(x, u)+ E e applemin u 0 Q(f(x, u, e), u 0 ) Optimal Policy: (x) = arg min Q(x, u) u Q(x k, u k ) C(x k, u k )+ min u 0 Q(x k+1, u 0 )+ k Q-learning: Q new (x k, u k )=(1 )Q old (x k, u k ) C(x k, u k )+ min Q old (x k+1, u 0 ) u 0

34 <latexit sha1_base64="vs+14vgxeycwqa4/abiirwhhyzg=">aaadgnicbvjnb9naelxnv0n5sohizuvelyooshescfspoia49fbe01bkgmu9gser7q6t3tfkspxpupjhucguxpg3rfmjsmjilmbfe/n2z8zpiyxfmpzlb1euxrt+y+tmz/vw7tt3uzv3tm1egg4jnsvcnkfmghqarihqwnlhgkluwll6cdjwz5/awjhre1wuecs21sitnkgdku5xmsju6iozwxz1jwxdosrn55uswijxgwqys6hioevt6k2dajwq4zjauiuv7kf1xxnymgb/nucgthcpgjgdyuxpa2ohogws5k79okrjpsn+qgfqvndolnehr9fcojiia5w6fpskfvfs7yxdcblkm4napoe1czzs+dgd5lxuojflzu04cgumnr0klse1wvoogl9guxi7vdmfnq6w86zji4dmsjyb92kks/tfioopaxcqdcpmmnada8d/cemss+dxjxrrimh+evfwsoi5azzdjsiar7lwcengulcspmogcxqrxlll6v0ax+mkmpda8hwca6jeorrmqauomnbnv9vbisx5wlqlr83k/rdotqh7r8vuob0cuf9e722i3uki9ffvjqdphle4jn4/7r28alez5t3whnp9l/keeqfeo+/yg3nc3/yj/4x/mvgsfau+bz8upyhf1tz3vil4+rs0rp43</late h i PT minimize E e t=1 C t(x t, u t ) s.t. x t+1 = f t (x t, u t, e t ) u t = t ( t ) Direct Policy Search

35 <latexit sha1_base64="bnc2p7snadzkkyaqpftniewfeuq=">aaact3icbvhbbtnaen2ys9tws+grlxurkjvqzcmkyltvkochd+gstlicovv6bi+6xpvdmwpq+x/4gl6bv2gdgokkjdts0tlzn6huamn3f/e8gzdv3d7z3evfuxvv/opb/sntw1rgwlqwqjdnkbcgumoukbsclwzehik4iy7etprznzawc/2zlixmc5fqtfakctricbxgkkkuhtfi2drknf29mee9qk9c1geukiui+mpzjw74m87dsyajqwped0hhxdjimpth/sr4ngg6mgsdtrb7vxkyf7lkqznuwtpz4jc0d+uipykmh1ywsievr min <latexit sha1_base64="zcjli+x7yefdo <latexit sha1_base64="xncvduignbply0pxcjg0898xebs=">aaac4nicbvhlitrafk1kfizjq0exbgobiq3sjiogimlga0vm0ai9m9ajovk5syqpvelvzta9it/gttz6v678fhdwutthd3uh4hdofds9j6mlmoj7pxx358rva9d3b+zdvhx7zt3b/r1juzwaw5rxstknctmghyipcprwwmtgzslhjdl71esn56cnqnqnnncqlsxxihocoaxiqrkqlkhg33vhodnyalirfuhdkmgrjo2blm5r7/l5x7eljwq4cyef8c5hdfket38sohrkkqfrraeweyerjqddf+wvgm6dyawgzbwten+jwrtitqkkuwtgzak/xqi1jqwx0o2fjyga8toww8xcxuowubu4s0cfwsalwaxtu0gx7l8vlsunmzejzez3nztat/5pmzwypytaoeogqfhlokyrfcvah5mmqgnhobeacs3sxykvmgycrrvruxa9a+brm7qxjrk8smgdlxibmlnsajzmqh6r9q2qkn5kytcj/si/vdu2l73xihdohh9zv9vok9kaemyefxsch4wdfxx8edi8flmyzpc8ia+jrwlylbysd2rcpost7+sn4zo7bup+dr+4x5eprroquu/wwv32c33a6oi=</latexit> min <latexit sha1_base64="54fqrxzqyz <latexit sha1_base64="cihlzcnuhll5+u92cyhc02yuj8o=">aaacuxicbvfbi9nafj7g2269dfxrl8gidefkiokcl4uu6mm+vls7c00ik8lpctzjjmycynaqp+sv8xx9nu66ewzrgqmf33fuj6kuwvl9q4f34+at23f29od3791/8hb08ojulrwrmjelks15iiwo1danjaxnlqfrjarokov3nx72hyzfun+lvqvritkns5schbwpjsmemtsnmeas2kapdrgffqjjppr8ogz5cx4wgvikat60cbxg4sxhj/bogijo+7r4npan/tr4lgh6mga9zekdqrsmpawl0csvshyr+bvfrhyhvnaow9pcjesfygdhobyf2khzr9vyz45j+bi0zjxxnftvrimka1df4ik72e221ph/0xy1ld9edeqqjtdyutgyvpxk <latexit sha1_base64="nigzqa0cicsnamj1ol8sebl/xgu=">aaaczxicbvfnb9naen2yrzz8pxdksiicjrkk7aqjsr1ufagolqictjfiy1pvjvaq67w1o66alubkv+j/cockv4f1ahbjggmlp/dm3uzmjkuubn3/e8e7dv3gzvs7u93bd+7eu9/be3biikpzmpbcfnqamanskjigqantugplewmnydmrrj89b21eot7hsoqoz6ksc8ezoirutcmeuqes05otaytl3d0nc6fig54zjrkgq+ltguymsysxb+ryloplw7/isj7rcjyjwewqrt0q1ly1int9f+svgm6doav90sy43ute4bzgvq4kuwtgzak/xmjzoeas6m5ygsgzp2mpzbxulact2dukavremxo6klr7cumk/bfcstyyzz64zgyus6k15p+0wywlg8gkvvyiil81wlssykgbfdk50mbrlh1gxav3v8ozphlht/ <latexit sha1_base64="i0aeayt9hihuk/vjtsdd2czegl8=">aaachhicbvfdaxpbfb03bwrtrzr57mtqksii7lyjlyueaqstxyeu1ijoinfhqw6znv1m7gzl8zfktf1r/tedvqtve+hc4zz7fanusuu+/7vkht14epy Sampling to Search min z2r d (z) =<latexit sha1_base64="zcjli+x7yefdo p(z) E p [ (z)] apple<latexit sha1_base64="54fqrxzqyz # E p(z;#) [ (z)] =: J(#) Search over probability distributions Use function approximations that might not capture optimal distribution Can build (incredibly high variance) stochastic gradient estimates by sampling: rj(#) =E p(z;#) [ (z)r # log p(z; #)]

36 <latexit sha1_base64="dqebqq2yo/bjn7jctcmtxxkfwfa=">aaacq3icbvhdahnbfj6svwvqt6qx3gygwgy07iqokelxb6x0iqwmlwaxchzykh06o7vmnc1nl7yjt+nt+wk+jbnpxcbxwigp7zv/jymutbqevxvery3bd+5u3mtu3x/w8ffr+/grzusjsc9ylzutbcwqqbfpkhsefayhsxqej6efav34di2vuf5o0wljdczajquactsw9xrpj87auioenxcfejqbpulsfzknq8k/ep9pna141eulf9hhcxpyagfdyg58hyql0gyl6w23g3e0ykwzosahwnpbgbquv666fapnzai0wia4hqkohnsqoy2r+yizvuoyer/nxrkmpmdvzlsqwtvnehdzz29xtzr8nzyoafw2rqquskitrhuns8up5/w1+egafksmdoaw0s3krqogblmblnwz1y5qlg1snzdainyek6yiczlgsiuugdt1vtvx <latexit sha1_base64="oo9jp+19oa2n8nsqhpl71jm7sec=">aaacynicbvfbaxnbfj6st7beun30ztaog5cwwwqlrsgqkjkhikytzjdwdvyko3r2dpk5w5osefnf+ut89fx/hlnplcbpgyfvvu/ct1iqaskifra8w7fv3l23s7t3/8hdr4/b+09obfezgunrqmkcjwbrsy1dkqtwrdqiealwndl/3+inf2islpq3mpuy5zdvciifkkpg7efupcfrmk6jczcuicgcf/b/f7r85vsesu08gmtsn3f5df6lpz+6fphox+1o0auwxrdbuaidtrlbel8vr2khqhw1cqxwjskgplh2oavqunilkoslihoy4shbdtnauf7ov+avhjpyswhcc50u2esrnetwzvleeezamd3ugvimbvtr5dcups4rqi2uck0qxangztj5kg0kujmhqbjpeuuiawoc3mrxqixzlyjwjqkvky1fkeigq+isddjsiuugdtnv/veqxb+ctrwvpxn9u13arvy/ykkk+6rv7qq7w87uiohm+rfbyuevdhrhl9ed43er0+ywz+w581ni3rbj9okn2jaj9op9yr/zh6/vgw/m1veuxmsv85stmff9l/yh4fa=</latexit> <latexit sha1_base64="nbbn4o5cmynrkblxni3vhtvu0jm=">aaac23icbvfnixnbeo2mh7uux1k9emkmwgqkzcycwiiskuhhdxhn7kjmcdwdmkyzpt1dd82yyzctn/hqv/ip+de86sgebbqzsadh9xtv1v2vkljjs0hwvendu37j5s7urb3bd+7eu9/df3bii8oihilcfeysaytkahyrjivnpuhie4wnyfnrrj+9qgnlot/svmq4h5mwqrrajpp005c8kpp4nmykv+jzsgfkfpqaehwkivewqamlmjqhwzkx/ulw77w/rfv3ymhzrv1wgp8ujt1emahwwbdbuay9to7hzl8tr9ncvdlqegqshydbsxhtekqhclkxvrzleocww7gdgnk0cb0yzmmfogbk08k444zbsf9w1jbbo88tl5kdzbatnet/thff6yu4lrqsclw4eiitfkecn+7yqtqosm0daggk+ysxgtgzye1g45vv7xlfxit1zawlkkbyyhvdkgfhwqqcpg6mqt9kpfgh0jyfn67/uv3brvbfyjkk+/tylvr3t5ldqsk2/dvg5gaqbopw/bpe0av1anbzi/ay+sxkz9kre8egbmqe+8z+sj/slxd7n7zp3pervk+zrnninsl7+htfc+ml</latexit> <latexit sha1_base64="7ypzmnmmi6uohph3ssog/g0xytm=">aaacy3icbvfdi9nafj3gr931q6upvgwwiqupysioilcooa8rvltdhsaum8ltmuxkemzulm1jh/1x/hfffduf4arbxbzegdicc+bo3hotskllqfc94127fupmrb39g9t37t673z18mlzlbqsorklkc5aarsu1jkiswrpkibsjwtpk/e2rn16gsblun2leyvxapuvmcibhtbvjvzysmnikcey+j4a59bd9hmlifeyb6aim5uiwdi4y45w/epmxczyjs5z6w3s62j92e8egwbxfbeea9ni6htpdthylpagl1cquwdsjg4rixvwuquhyikotvidoicojgxokthgzcmdjnzgm5bpsuoogwbh/3migshzejm5zaov2w2vj/2mtmmyv4kbqqibu4uqhwa04lbxnk6fsoca1dwceke6vxorgqjdlfoovve8kxcykzwwtpsht3givxzibr1qkaqrup2resax4j9cwn7sp/1fd21b238pmkn164har+ztmt5bwo/5dmd4ahmeg/pisd/x6vzo99og9zj4l2xn2zn6zirsxwb6xh+wn++v98ky38l5cwb3o+s5dtlhe19/on+ht</latexit> <latexit sha1_base64="fa1z8kn/imf/v9cyloes1fopwme=">aaacz3icbvfbi9nafj7g27reuvroy2arwpcsikcwlcxe0id96kldxwxcojmejsnojmhmzn1uipjqv/jv+ad81z/gpfvfth4y+pi+c5nznaru0plvf+94v65eu35j6+b2rdt37t7r7tw/skvlbi5foqpzkobfjtwoszlck9ig5inc4+t0vasfn6gxstafaf5ileoq5uwkieff3y97pmybsisp3zrxxfyvdsmzmjqhwaajfc5owsnrjvsxax5qsbte9d+mhoeqsplqfq+ntdok4m7ph/ql4jsgwiiew8yo3ule4bqqvy6ahajrj4ffuls7xlioblbdymij4hrsndioiucb1qstgv7ymvm+k4x7mvic/beihtzaez64zhzhu6615p+0suwzf1etdvkrane5afyptgvvhevtavcqmjsawkj3vy4ymcdi+b4yzdg7rlgysx1easmkka6xis7jgcmtug5st1vvb6vs/d1oyw9aj/+orm0r91/lvjj9cucoqwcbye4gwbr9m+do6tdwh8hhs97+y+vptthd9oj1wcces332jo3yman2jf1gp9kv79d75h32vlymep1lzqo2et7x30+55pi=</latexit> Reinforce Algorithm J(#) :=E p(z;#) [ (z)] Z r # J(#) = (z)r # p(z; #)dz Z = (z) Z r# p(z; #) p(z; #)dz p(z; #) = ( (z)r # log p(z; #)) p(z; #)dz = E p(z;#) [ (z)r # log p(z; #)]

37 <latexit sha1_base64="dqebqq2yo/bjn7jctcmtxxkfwfa=">aaacq3icbvhdahnbfj6svwvqt6qx3gygwgy07iqokelxb6x0iqwmlwaxchzykh06o7vmnc1nl7yjt+nt+wk+jbnpxcbxwigp7zv/jymutbqevxvery3bd+5u3mtu3x/w8ffr+/grzusjsc9ylzutbcwqqbfpkhsefayhsxqej6efav34di2vuf5o0wljdczajquactsw9xrpj87auioenxcfejqbpulsfzknq8k/ep9pna141eulf9hhcxpyagfdyg58hyql0gyl6w23g3e0ykwzosahwnpbgbquv666fapnzai0wia4hqkohnsqoy2r+yizvuoyer/nxrkmpmdvzlsqwtvnehdzz29xtzr8nzyoafw2rqquskitrhuns8up5/w1+egafksmdoaw0s3krqogblmblnwz1y5qlg1snzdainyek6yiczlgsiuugdt1vtvx <latexit sha1_base64="xncvduignbply0pxcjg0898xebs=">aaac4nicbvhlitrafk1kfizjq0exbgobiq3sjiogimlga0vm0ai9m9ajovk5syqpvelvzta9it/gttz6v678fhdwutthd3uh4hdofds9j6mlmoj7pxx358rva9d3b+zdvhx7zt3b/r1juzwaw5rxstknctmghyipcprwwmtgzslhjdl71esn56cnqnqnnncqlsxxihocoaxiqrkqlkhg33vhodnyalirfuhdkmgrjo2blm5r7/l5x7eljwq4cyef8c5hdfket38sohrkkqfrraeweyerjqddf+wvgm6dyawgzbwten+jwrtitqkkuwtgzak/xqi1jqwx0o2fjyga8toww8xcxuowubu4s0cfwsalwaxtu0gx7l8vlsunmzejzez3nztat/5pmzwypytaoeogqfhlokyrfcvah5mmqgnhobeacs3sxykvmgycrrvruxa9a+brm7qxjrk8smgdlxibmlnsajzmqh6r9q2qkn5kytcj/si/vdu2l73xihdohh9zv9vok9kaemyefxsch4wdfxx8edi8flmyzpc8ia+jrwlylbysd2rcpost7+sn4zo7bup+dr+4x5eprroquu/wwv32c33a6oi=</latexit> <latexit sha1_base64="jj1kekwesntnojcyyygyte4uvyc=">aaacjnicbvftaxnben6cl631ldvp4pffikqg4u7eciiwfeyhfqho2kjyhhobstjkd+/ynstnj+kv8av+hv+ne2kekziw8pa8m8/szosljs9x/lsv3bh56/bw9p2du/fup3jy3n104ovkkeyrqhfulaepmiz2mvjjwekqtk7xnj99bpttc3secvun5ywmbiawxqsaa5w1n1xmmzn0zgtzvxw7pafhu2tizntzuxp34kxitzasqucs4zjbbaxduaeqg <latexit sha1_base64="pp9kp46qo/izm8uhki7xz6n6m8w=">aaacwnicbvfti9naen7gt7v61topflksqgthsurqeohqg/pdcrxt3uetwmq7tdzunnf3clyn/vx+mvuqf8rnr0jfhfh4ej5nznzmkljjs75/3fju3b5z997efvv+g4ephnconpzzojicr6jqhbliwkksgkckseffardyrof5mvvq6oexakws9fealxjlkgo5lqliuxhn0/5j70c8owwvwvcgbpgsz9/xcjjjhu/zueoiik7xdaseqillzwn4u57yjjtdf+avg++cyaw6bbxd+kavhzncvdlqegqshqd+svhtakqhcneok4slibmkohzqq442qpdzl/glx0z4tdduaejldj2jhtzaez44zw6u2w2tif+njsuavolqqcukuiubrtnkcsp4s0q+kqyfqbkdiix0f+uiawoc3ko3uixrlyg2jqmvki1fmcetvtevgxckrcpb6maq+kqqxb+atvxuphn9u13zru4dy1ssptx199t9hbm7slc9/l1w9niq+ipg86vu0fvvafbym/ac9vjaxrmj9pen2ygj9otds9/sj3fsffo+e/bg6rvwou/zrng//wln8t5q</latexit> <latexit sha1_base64="/6wnxjvti3ogtr3oxgqus2ijbjg=">aaacshicbvfna9taef2rx2n6eac95rlebbyagikeggif0aasqw4prrodlcropby2wq3e7ijeff4x/tw9tsf+m64cl8z2bhye78282zmjcyut+f6fhvfo8zonz9aer794+er1rnpzzaxnsyowk3kvm14mfpxu2cvjcnufqchihvdx+rnwr27qwjnrbzqpmmxgrovicibhrc2jwq0yspagqtj3wzr/5p+zll/na1bfuspt9vco3bsn7kbnlt/xz8fxqtahltapi2izeq6gusgz1cquwnsp/ilcyllkoxc6pigtfibsggpfqq0z2rcattnlo44z8lfu3npez+z9igoyaydz7dizomquazx5knyvaxqyvlixjaewd41gpeku83plfcgnclitb0ay6f7krqigblnflnszercofiapbkstrt7ejvbrlrlwpexkqop6qupuksw/grb8xi4t+qc621pun8ixjlt37q6nd1es3ugc5fwvgsv9tub3gi8hrenp89osss22zdosyb/ymttjf6zlbpvbf Reinforce Algorithm J(#) :=E p(z;#) [ (z)] rj(#) =E p(z;#) [ (z)r # log p(z; #)] Sample z k p(z; # k ) Compute G(z k, # k )= (z k )r #k log p(z k ; # k ) Update # k+1 = # k k G(z k, # k )

38 <latexit sha1_base64="qqxcfhphmdwqztemz4v5odvaqpm=">aaachxicbvhfaxnben6cra1t1vqffvkahbbacfek+qqfbfvqh4qmletomlc3sybu7r27c6xxyh/iq/5p/jfupsmyxigbj++b35owmhyh4z9w8ght/fhg5pot7z2nz563d19cuqkycnuq0iw9tsghjom9jtz4xvqepnv4ld58bpsrw7socvonjyumoywmdukbe2rqbv+qmrkz10fryr <latexit sha1_base64="dqebqq2yo/bjn7jctcmtxxkfwfa=">aaacq3icbvhdahnbfj6svwvqt6qx3gygwgy07iqokelxb6x0iqwmlwaxchzykh06o7vmnc1nl7yjt+nt+wk+jbnpxcbxwigp7zv/jymutbqevxvery3bd+5u3mtu3x/w8ffr+/grzusjsc9ylzutbcwqqbfpkhsefayhsxqej6efav34di2vuf5o0wljdczajquactsw9xrpj87auioenxcfejqbpulsfzknq8k/ep9pna141eulf9hhcxpyagfdyg58hyql0gyl6w23g3e0ykwzosahwnpbgbquv666fapnzai0wia4hqkohnsqoy2r+yizvuoyer/nxrkmpmdvzlsqwtvnehdzz29xtzr8nzyoafw2rqquskitrhuns8up5/w1+egafksmdoaw0s3krqogblmblnwz1y5qlg1snzdainyek6yiczlgsiuugdt1vtvx <latexit sha1_base64="jj1kekwesntnojcyyygyte4uvyc=">aaacjnicbvftaxnben6cl631ldvp4pffikqg4u7eciiwfeyhfqho2kjyhhobstjkd+/ynstnj+kv8av+hv+ne2kekziw8pa8m8/szosljs9x/lsv3bh56/bw9p2du/fup3jy3n104ovkkeyrqhfulaepmiz2mvjjwekqtk7xnj99bpttc3secvun5ywmbiawxqsaa5w1n1xmmzn0zgtzvxw7pafhu2tizntzuxp34kxitzasqucs4zjbbaxduaeqg <latexit sha1_base64="pp9kp46qo/izm8uhki7xz6n6m8w=">aaacwnicbvfti9naen7gt7v61topflksqgthsurqeohqg/pdcrxt3uetwmq7tdzunnf3clyn/vx+mvuqf8rnr0jfhfh4ej5nznzmkljjs75/3fju3b5z997efvv+g4ephnconpzzojicr6jqhbliwkksgkckseffardyrof5mvvq6oexakws9fealxjlkgo5lqliuxhn0/5j70c8owwvwvcgbpgsz9/xcjjjhu/zueoiik7xdaseqillzwn4u57yjjtdf+avg++cyaw6bbxd+kavhzncvdlqegqshqd+svhtakqhcneok4slibmkohzqq442qpdzl/glx0z4tdduaejldj2jhtzaez44zw6u2w2tif+njsuavolqqcukuiubrtnkcsp4s0q+kqyfqbkdiix0f+uiawoc3ko3uixrlyg2jqmvki1fmcetvtevgxckrcpb6maq+kqqxb+atvxuphn9u13zru4dy1ssptx199t9hbm7slc9/l1w9niq+ipg86vu0fvvafbym/ac9vjaxrmj9pen2ygj9otds9/sj3fsffo+e/bg6rvwou/zrng//wln8t5q</latexit> <latexit sha1_base64="/6wnxjvti3ogtr3oxgqus2ijbjg=">aaacshicbvfna9taef2rx2n6eac95rlebbyagikeggif0aasqw4prrodlcropby2wq3e7ijeff4x/tw9tsf+m64cl8z2bhye78282zmjcyut+f6fhvfo8zonz9aer794+er1rnpzzaxnsyowk3kvm14mfpxu2cvjcnufqchihvdx+rnwr27qwjnrbzqpmmxgrovicibhrc2jwq0yspagqtj3wzr/5p+zll/na1bfuspt9vco3bsn7kbnlt/xz8fxqtahltapi2izeq6gusgz1cquwnsp/ilcyllkoxc6pigtfibsggpfqq0z2rcattnlo44z8lfu3npez+z9igoyaydz7dizomquazx5knyvaxqyvlixjaewd41gpeku83plfcgnclitb0ay6f7krqigblnflnszercofiapbkstrt7ejvbrlrlwpexkqop6qupuksw/grb8xi4t+qc621pun8ixjlt37q6nd1es3ugc5fwvgsv9tub3gi8hrenp89osss22zdosyb/ymttjf6zlbpvbf <latexit sha1_base64="ys3klrjikvg9dud1jmmn7l7zgpg=">aaacz3icbvhlbtnafj2yvympprbkmyjcslsibiren5uqqijff6kgbuvswdetm3jk8diaus4nvhbb/orf4afywicwtonoeq40o6nz7mpm3kru0plv/2h5167fuhlr6/b2nbv37u+0dx+c2kiyaoeiuiu5s8cikhqhjenhwwkq8kthazk9bvttczrwfvodzuqmcphqozecyffx+2n4dozsjijrbc+y8wp+j8n4mx6cktmghonudj/hwa+5+b4pixinuge05fgt6htevvla68xtjt/3f8e3qbaehbamqbzbisjxiaocnqkf1o4cv6sodj2ludjfdiuljygmpjhysblso3phwpw/ccyytwrjjia+yk9w1jbbo8stl5kdpxzda8j/aaokjvtrlxvzewpxowhsku4fbxzly2lqkjo5amji91yuujagypm+mmxru0sx8pp6otjsfgncyxvdkafhwqqcpg5+vb+vsvh3oc0/ktou/qqubsn338ipjpv0yc1x9zas3ukcdfs3wcnzfud3g+mxncnxy9vssufsmeuygl1kh+wdg7ahe+w7+8l+sd/esffj++j9vuz1wsuah2wlvg9/ajo45di=</latexit> <latexit sha1_base64="jwaab2pnwnllfm05bp8bogpjn2w=">aaac1nicbvfnixnbeo2mx+v6svk9emkmqoiazkrqkjufbt3syuwzg8imq6wnjmm2p2forlksh3gtr/4rf4m/wqte7zle2sqwndzee13v/wpckgnj93+0veuxr1y9tnn998bnw7f32vt3tmxegoedkavcdmdguumna5kkcfgyhgys8hr89qrwt8/rwjnrdzqvmmpgomuqbzcj4jyu3u8vwnmwnewchj/gywhyjk7kqbd4mpawnscqegfof0v+zxnl3mjjp77ipwyonvfc7vh9vym+dyiv6lbvhcf7rshmclfmqekoshyu+avflesphclfblhaleccwqrhdmri0ezvk8wcp3bmwtpcukojn+zfgxvk1s6zsxnmqfo7qdxk/7rrsenzqjk6kam1wa5ks8up53wwpjegbam5aycmdg/lygoupxlxr01pehco1n5szuotrz7gbqtorgycazeyklr+vfvgksxfg7b8se6m9fd1bwu5+1pojnlhr27hurdldgsjnupfbidp+ohfd9497ry+xk1mh91j91mxbewzo2rv2tebmmg+s5/sf/vtdb3p3hfv69lqtvz37rk18r79aak+59q=</latexit> Reinforce Algorithm J(#) :=E p(z;#) [ (z)] Sample z k p(z; # k ) Compute G(z k, # k )= (z k )r #k log p(z k ; # k ) Update # k+1 = # k k G(z k, # k ) Generic algorithm for solving discrete optimization: z 2 { 1, 1} d p(z; #) = dy i=1 exp(z i # i ) exp( # i )+exp(# i ) # k+1 = # k k (z k )(z k + tanh(# k )) Does this solve any discrete problem?

39 <latexit sha1_base64="vs+14vgxeycwqa4/abiirwhhyzg=">aaadgnicbvjnb9naelxnv0n5sohizuvelyooshescfspoia49fbe01bkgmu9gser7q6t3tfkspxpupjhucguxpg3rfmjsmjilmbfe/n2z8zpiyxfmpzlb1euxrt+y+tmz/vw7tt3uzv3tm1egg4jnsvcnkfmghqarihqwnlhgkluwll6cdjwz5/awjhre1wuecs21sitnkgdku5xmsju6iozwxz1jwxdosrn55uswijxgwqys6hioevt6k2dajwq4zjauiuv7kf1xxnymgb/nucgthcpgjgdyuxpa2ohogws5k79okrjpsn+qgfqvndolnehr9fcojiia5w6fpskfvfs7yxdcblkm4napoe1czzs+dgd5lxuojflzu04cgumnr0klse1wvoogl9guxi7vdmfnq6w86zji4dmsjyb92kks/tfioopaxcqdcpmmnada8d/cemss+dxjxrrimh+evfwsoi5azzdjsiar7lwcengulcspmogcxqrxlll6v0ax+mkmpda8hwca6jeorrmqauomnbnv9vbisx5wlqlr83k/rdotqh7r8vuob0cuf9e722i3uki9ffvjqdphle4jn4/7r28alez5t3whnp9l/keeqfeo+/yg3nc3/yj/4x/mvgsfau+bz8upyhf1tz3vil4+rs0rp43</late <latexit sha1_base64="ry/q9u1/eodhnqdeowrn6r+2wk4=">aaadkxicbvlbbtnaelxnryrlu3idlxurvskiyeziikgisavxur+kanpkwwotn5nk1d21ttuueoy/ifd+hdfglr9hnqrbukaynhvombm7m05zksyg4q8/uht5ytvrw9cbn27eur3d3llzblpccbjwtgbmnguwpnawqiestnmdtkusttkz/zo/oqdjraapcj5drnhei7hgdb2unl/sfczcl8wynq9kkasgvwk2k5xqqolpujfdqhxdazqwr6qkhas7ryivltdgibwfskrci6qpr2q/wfzssxeoezmpxpsu7gwpe7xvzkkfrrxzi+o/6i7ufbsxs9ybwqfi3i4+o/i5pwcgp4cs06cgr6tnjs1w2asxqs4m0sppeas4thb8mi4yxijqycwzdhifocbodgwx4houlosmn7ejdf2qmqibl4vxvushq0zknbn3asql9n+kkilr5yp1ynpqdporwf9xwwlhz+js6lxa0hx50biqbdns74qmhagocu4sxo1wbyv8ygzj6da6dsvcowe+1kk5k7tg2qg2uikznmybflaxoeuuytdcsvkbauso6s39yz1ttbdfiola2z1wv43uxbc7husb47+yh <latexit sha1_base64="o4j6otjoeqzpjitxt8gszcrluky=">aaadnxicbvjnixnbej0zv9b4ldwjl8zgsnglzerqkjwfvfsqw4qb7ei6dj2dmqtz7p6hu2zjhoz3efvvepamxv0l9iqjmsschur3xlv1vxwcswgx3//mb9eu37h5a+92487de/cfnpcfjmyagw5dnsruxmtmghqahihqwkvmgklywnl8evlx51dgrej1gs4zmcg20yirnkgdouzxgsnm6iizw5zliwxzocpof4uswijxgursjlqxnmdx8bamcojwkkykzqykehicu5urqmcjspx0rk4i7cycio+ws42yzxfcaz3r9rbxzvs49ufykios/fufqhvbg23ilo4inbmdiszdxterznaoya7whbsncnpapzlqtvq9/srirhpwtsur7tta9yd0mvjcguyumbxjsj/hxkvdwsw4/nmlgeoxbazj52qmwe6k1ahl8tqhu5kkxh2nzix+g1ewze1sxu5zdc1ucxx4p26cy/jyugid5qiarwslussykmpvzcomcjrl5zbuhhsr4xnmgee33y0qq9wz8i1oikwubu+nsivkxkbhdrsaiglddvw8e1ksj0xbmqhw+id1asu680bmbnrdgftcursjd h i PT minimize E e t=1 C t(x t, u t ) s.t. x t+1 = f t (x t, u t, e t ) u t = t ( t ) Direct Policy Search Policy Gradient h PT i minimize E et,u t t=1 C t(x t, u t ) s.t. x t+1 = f t (x t, u t, e t ) u t p(u x t ; #) probabilistic policy Random Search h PT i minimize E et,! t=1 C t(x t, u t ) s.t. x t+1 = f t (x t, u t, e t ) u t = ( t ; # +!) parameter perturbation Reinforce applied to either problems does not depend on the dynamics Both are Derivative-free algorithms!

40 <latexit sha1_base64="ry/q9u1/eodhnqdeowrn6r+2wk4=">aaadkxicbvlbbtnaelxnryrlu3idlxurvskiyeziikgisavxur+kanpkwwotn5nk1d21ttuueoy/ifd+hdfglr9hnqrbukaynhvombm7m05zksyg4q8/uht5ytvrw9cbn27eur3d3llzblpccbjwtgbmnguwpnawqiestnmdtkusttkz/zo/oqdjraapcj5drnhei7hgdb2unl/sfczcl8wynq9kkasgvwk2k5xqqolpujfdqhxdazqwr6qkhas7ryivltdgibwfskrci6qpr2q/wfzssxeoezmpxpsu7gwpe7xvzkkfrrxzi+o/6i7ufbsxs9ybwqfi3i4+o/i5pwcgp4cs06cgr6tnjs1w2asxqs4m0sppeas4thb8mi4yxijqycwzdhifocbodgwx4houlosmn7ejdf2qmqibl4vxvushq0zknbn3asql9n+kkilr5yp1ynpqdporwf9xwwlhz+js6lxa0hx50biqbdns74qmhagocu4sxo1wbyv8ygzj6da6dsvcowe+1kk5k7tg2qg2uikznmybflaxoeuuytdcsvkbauso6s39yz1ttbdfiola2z1wv43uxbc7husb47+yh <latexit sha1_base64="upao0ytgznfi8wjp4jzh6wbyake=">aaadbnicbvjda9rafj3er1q/tvoowucizmfdeilukeqhqn3oq8xdtrbjw81kdnfozbjmbsoucd999y/4jr76n/wlvjrzruvueifwoofmutp3jimkmoj7px332vubn29t3n68c/fe/qetryfhji814wowy1yfjmc4fiopukdkp4xmkcwsnytn+7v+csg1ebnq46zguqzjjuacavoqbn058ekeshtegmyjr+jqxrpkpkivnguwv7gbzm/6dn+bxtgty+zquivxbdshs3ncsfrzs6r/mpjtueeiif6ban35mbzxhgejptnin1enm9y41fz7/qloogga0cznhcvbthsmosszrpbjmgyy+avglc0vtpl5zlgaxga7hzefwqgg4yaqfrob02ewseko1/ztsbfsvycqyiyzzyl1zoats6rv5p+0yymj11elvfeiv+yy0aiufhnal4kmqnogcmybmc3sxsmbgaagdl1lxrbzbwdll6mmprist/kkk3gkgixpogygvp2q6kbist+cmvswnvef1cbwsvdojawa7qh9j1rnzwwxeqyofx0cv+offi/4sn3ee9uszom8jk+jrwkyq/bie3jebosrx84t57nzwv3sfnw/ud8vra7tnhlelsr98rszcvet</latexit> Policy Gradient h PT i minimize E et,u t t=1 C t(x t, u t ) s.t. x t+1 = f t (x t, u t, e t ) u t p(u x t ; #) Direct Policy Search probabilistic policy G(, #) = TX C(x t, u t )! XT 1 r # log p # (u t x t ; #)! t=1 t=0

41 <latexit sha1_base64="j4lebcdojzuwdwucyrzfilabtyq=">aaadhhicbvlfb9mwehbcr1eydpdii0ufggxvcsdba5xgamhdhjzot0lnfjmu01qznci+obyr/wqv/co8iv6r+g+wuydrjpoiu3zf3dl3n/nkcanr9dsil12+cvxaxvxojzubt253t+4cmblwli1okup9khpdbfdsbbweo6k0izix7dg/e+p5489mg16qiswqlkoyvbzglicdsu63jgdtrizrmiwak0ttswrezq3kikv+htx4iu4kgvme23dnilgb46tqhnq4scmgj6awmyvb3jwo8tyd08f40hu8g+vl30fve82nm0itpo1u+td3neeudcdu8ac/bov2flrzlinowtskvvnw7ux9agn4yhc3qq+1dpbtbwkykwktmqiqidhjokogde2au8hcmlvhfafnzmrgllremppa5uyb/mahe1yu2n0k8bl9t8isacxc5i7t78ascx78hzeuoxizwq6qgpii5wcvtcbqyi8pnndnkiifcwjv3n0v0xlx6wyn4sopy94voyut2hmtoc0nba0vmadnhggysmkvn8q+50lgt0qzvo/v+cu6tp7efsunhmytffds1kmlyu6qeh39f4ojp/046sehz3u7e600g+geuo+2uyxeof30ar2gealbzvasebumwq/h9/bh+pm8nqzamrtoxcjffwdk/f3k</latexit> <latexit sha1_base64="aqbp24lviyjw82kpzzckyu6idtc=">aaacpxicbvhbbhmxehwwwwmxpuwrf4sikqkkdhesfaluusr4kfilsvop2a5mnuli1etd2wouajxf4gt4hx/gb/bug0qsrrj8fm5cpdnpoaslmpzdcg7dvnp33s795oohjx7vtvb2bzz3rmbf5co3lylyvfjjnyqpvcwmqpyqveivtyr94hsak3pdo0wbcqztlsdsahkqayunnrgbozjii+uypksjahnv4/oerl7y8+rmr7irx1+qu5m02me3ri1vg2gf2mxlz8leix6nc+ey1cquwdumwoliegxjoxdzhdmlbyhrmolqqw0z2risw1vy554z80lu/nhea/bfibiyaxdz6j0zojnd1cryf9rq0eqwlquuhkewn4umtnhketunppygbamfbycm9h/lygygbplprlwpcxco1jop505lky9xg1u0jwoetegzsf11vx6usvgvoc0/ldmz/vv92krufjbtsfb1qv+zpthy9gujnse/dqzvulhyjc7fto/fr1azw56yz6zdivaohbnp7iz1mwdf2q/2k/0kxgsfg14wuhengquyj2znguqpro <latexit sha1_base64="np4otb/lzbrm8sy8kda7c6pm1ec=">aaaczxicbvfdaxnbfj2sx7v+pfroy2aqurfhvwr9eyovfcxyswkd2xs5ozubdj2zxwbu2ir18+q/8n/47qv+bmfticbxwsdhnpsx99y0lmjigh5vbveuxrt+y+vm9q3bd+7ea+/cp7gfm4z3wselm0jbcik076nayqel4absyu/t84ngp/3mjrwfpszzyuckxlrkggf6kmkp3iexapwyvwl+udny8hzbmokc/lukmdv0j8ygywkkod/oxghudx5bp5ikx4x1wxw8f/lq7rkk0wtpnittttglf0e3qbqehbkmo2snnyqzgjnfntij1g6jsmrrbqyfk7zejp3ljbbzgpohhxout6nqyufnh3smo3lh/nnif+y/fruoa2cq9znnunzda8j/auoh+ctrjxtpkgt2osh3kmjbgz9pjgxnkgceadpc/5wycrhg6f1fmbloxxk2skk1dvqwiunrrmqpgvck5aha6gar6q2qkn4cbemhge/wj+rbnnl3jrglte8p/wn17kayp0i0bv8mohnwi8je9pf5z//18jrb5cf5rlokii/ipnlhjkifmpkn/ca/ya/gq+ccl8h8mjvolwsekjuivv4gvjrkjq==</latexit> Policy Gradient for LQR minimize s.t. E h 1 T P T t=1 x t Qx t + u t Ru t i x t+1 = Ax t + Bu t + e t Greedy strategy : Build control ut = Kxt Sample a bunch of random vectors: t N (0, 2 I) Collect samples from control u t = Kx t + t : = {x 1,...,x T } Compute cost: C( ) = TX t=1 Update: K new K old t C( ) x t Qx t + u t Ru t T 1 X t=0 t x t policy gradient only has access to 0-th order information!!!

42 <latexit sha1_base64="o4j6otjoeqzpjitxt8gszcrluky=">aaadnxicbvjnixnbej0zv9b4ldwjl8zgsnglzerqkjwfvfsqw4qb7ei6dj2dmqtz7p6hu2zjhoz3efvvepamxv0l9iqjmsschur3xlv1vxwcswgx3//mb9eu37h5a+92487de/cfnpcfjmyagw5dnsruxmtmghqahihqwkvmgklywnl8evlx51dgrej1gs4zmcg20yirnkgdouzxgsnm6iizw5zliwxzocpof4uswijxgursjlqxnmdx8bamcojwkkykzqykehicu5urqmcjspx0rk4i7cycio+ws42yzxfcaz3r9rbxzvs49ufykios/fufqhvbg23ilo4inbmdiszdxterznaoya7whbsncnpapzlqtvq9/srirhpwtsur7tta9yd0mvjcguyumbxjsj/hxkvdwsw4/nmlgeoxbazj52qmwe6k1ahl8tqhu5kkxh2nzix+g1ewze1sxu5zdc1ucxx4p26cy/jyugid5qiarwslussykmpvzcomcjrl5zbuhhsr4xnmgee33y0qq9wz8i1oikwubu+nsivkxkbhdrsaiglddvw8e1ksj0xbmqhw+id1asu680bmbnrdgftcursjd <latexit sha1_base64="6+d+t3hyrysuqvej1zizpuxu/ba=">aaada3icbvjna9wwejxdryt92qs39ik6lxhpstih0fxsaimkhxxs2k0ca8dotbjxrjknna5zhi+99o/0vnrtd+n/6a+ovn6g7g4hbi/3nmy0mxqvghsiw9+ef+fuvfsp1ty3hj56/orpz3pr1bsvpmxac1ho8xexthdfbsbbspnsmyjhgp2nlg8b/eykacml9qwmjuskyrxpocxgqltzbf3owgayvwdxivlotumromhcgptwpo4ztainaitrhjtkppbvr/wfbpnd4mam3zid55lgnk3ke3gh3zbsrbpqu9ty9q2zdrphp5wfxgxrhhtrpe7sts+jxwwtjfnabtfmgiuljnav5vsweioudcsjvsq5gzqoigqmsbpb1fi1y8y4k7q7cvcmvx3demnmvi6cuxkymgwtif+ndsvi9hllvvkbu7qtlfucq4gbleax14ycmdpaqoburzhoibsquf0tvjnllhld6mrev4rtysywwahxoikjdqnjugq6skdccpyzkiopet6bf6pl28jbb55zmnvh7koo3orzlsrahv8qon3tr2e/+vs2e/b+vpo19ak9ragk0dt0gd6iezrafp3xnntd75x/1f/u//b/tlbfm995hhbc//uxpln0yg==</latexit> <latexit sha1_base64="eh+lm1eg20caiqwbiclxhabyf/4=">aaacxnicbvhlattafb2rrzr9oe2ym6gmiimxuii0m5racskii4tgscbsxdx4wh4ygomzq9rggppx/zyuum1/oynhhdruhweo59z3tusllqxbj4537/6dh492hu8+efrs+yvu3stlw1rg4eguqjdxkvhuuuoijcm8lg1cniq8sm+ogv3qfo2vhb6grylxdpmwuymahjv0z4/9qmgxg0f0c4zmsndnbzxsocu/slwe1hqqlr9c8cn/ntcgsqjpiyozwfnrsbu45yljzzunn3r7wtbygd8gyqt6rlwzzk8tr5ncvdlqegqshydbsxht2pfc4xi3qiywig4gw7gdgnk0cb2afcnfombcp4vxtxnfsf9g1jbbu8ht55kdzeym1pd/08yvtt/etdrlrajfxafpptgvvfkkn0idgttcarbgul65miebqw7da1vwuusua5pu80pluuxwg1u0jwooteg5sn1mvr9lpfhn0jafnpv/q7q0jex/kpkkozh1n9x9lwd3khbz/dvgcn8ybspw/f3v8gn7mh32mr1hpgvze3bittgzgzhbvrof7bf77z142qu8r3euxqenecxwzpv2b2p33yw=</latexit> (μ,λ)-evolution Bandit <latexit sha1_base64="kdgzh6hio9nc9jpwvim6pvpovhi=">aaacnhicbvftaxnben5cfwnrs9p6uzdficzqwp0i9ktliqufk1rs0kjyhnobsbj0b+/yns0jr36cv8av9of4b9xli5jegywh55mzz2cmlzs0fia/a8hwg4ephm/v7d55+uz5xn3/ogdzzwr2ra5yc52crsu1dkmswuvcigspwqv0plppv7dormz1jc0kjdmyazmsashtsf1tpzm4bumtjgjxyz6wlktkoo7m3y95pzln6nal1ojjvrg2w0xwtratqymt4ylzr8wdys5chpqeamv7uvhqxhovkrtodwfoyghibsby91bdhjyufxpn+rvpdpkon/5p4gv234osmmtnweozm <latexit sha1_base64="hbpv3rsxivbvxbvtlqyeh9u0gig=">aaack3icbvftaxnben5c1db6llb8jmhiefio4u4klqhstkbclyqmlesomlezxjbuy7e7jw1hvvlr/kp/xn/jxhrbja4mpdzpve9ekukpjn+3oo1bt+9sbt3dvnf/wcnh7z3dc28rj7avrllumgepshrskysfl6vd0lnci/zqbanffepnptvfavpipqewciwfu Random Search h PT i minimize E et,! t=1 C t(x t, u t ) s.t. x t+1 = f t (x t, u t, e t ) u t = ( t ; # +!) Direct Policy Search! TX G(!, #) = C(x t, u t ) r log p(!) t=1 parameter perturbation G (m) (!, #) = 1 m mx i=1 C(# +! i ) C(#! i ) 2! i TX C(#) = C(x t, u t ) t=1! i N (0, I) random finite difference approximation to the gradient aka Strategies SPSA Convex Opt

43 <latexit sha1_base64="j4lebcdojzuwdwucyrzfilabtyq=">aaadhhicbvlfb9mwehbcr1eydpdii0ufggxvcsdba5xgamhdhjzot0lnfjmu01qznci+obyr/wqv/co8iv6r+g+wuydrjpoiu3zf3dl3n/nkcanr9dsil12+cvxaxvxojzubt253t+4cmblwli1okup9khpdbfdsbbweo6k0izix7dg/e+p5489mg16qiswqlkoyvbzglicdsu63jgdtrizrmiwak0ttswrezq3kikv+htx4iu4kgvme23dnilgb46tqhnq4scmgj6awmyvb3jwo8tyd08f40hu8g+vl30fve82nm0itpo1u+td3neeudcdu8ac/bov2flrzlinowtskvvnw7ux9agn4yhc3qq+1dpbtbwkykwktmqiqidhjokogde2au8hcmlvhfafnzmrgllremppa5uyb/mahe1yu2n0k8bl9t8isacxc5i7t78ascx78hzeuoxizwq6qgpii5wcvtcbqyi8pnndnkiifcwjv3n0v0xlx6wyn4sopy94voyut2hmtoc0nba0vmadnhggysmkvn8q+50lgt0qzvo/v+cu6tp7efsunhmytffds1kmlyu6qeh39f4ojp/046sehz3u7e600g+geuo+2uyxeof30ar2gealbzvasebumwq/h9/bh+pm8nqzamrtoxcjffwdk/f3k</latexit> <latexit sha1_base64="gbej/5jhfqpxmesx5rjmy6uj8yg=">aaacnhicbvhtahnbfj2sx7v+pfptkmegpqbhv4t2z9gcyipungkhu4s7k7vj0nmzzeaoniztg/g0/tuh8w2ctsoyxasdh3pux9x78kpjr3h8uxvdu37j5q2t29t37t67/6c983dojlccb8ioy09zckikxgfjunhawyqyv3isn71t9jovaj00+gvnk8xkmgpzsaeuqhh7ez+ncgsca8033ucveqqqmsgylj90uwk/e8lt7fm43yl78sl4jkiwomowctzeawxpxahfoiahwllreleu1wbjcoux26l3wie4gymoatrqosvqxuyx/flgjrwwnjxnfmh+w1fd6dy8zenmctrz61pd/k8besr2s1rqyhnqctwo8iqt4c15+erafktmaycw <latexit sha1_base64="b7badqzs7+mpsxwxgaronsg5mmk=">aaacmhicbvhtahnbfj2sx7v+tfpp/wwgiyusdougp4mktvckfqqtzndyd3kzuxrmdpm5kw1lhscn8a8+im/jbbrbjf4yojxz5n7mlsbpcfy7e924eev2na272/fup3j4agf38akva6dwqepduvmcpgqyogrijeevqzc5xrp88m2rn31d56m0x3hwywagsdqhbryoi51uamuzejiyncbtbbr5mo/f+y1xgph6ii/2givux4uqmybzgq5yxsnfbidlx6wqdvpwgrwfjxhfwqooswmcb6e1xwr <latexit sha1_base64="kwygz+w7cnkjjvpxrgavefkydqs=">aaachnicbvfdaxnbfj1sty1ra2iffrkmqoisdkvpxwqhfrtsq0stfpil3j3cjenmz5ezozvh6u/xvx+t/8bznijjvdbw5pz7fzncsuth+lsshdx4ehhufvr7/otp8bn64/nazs4i7itmzeymaytkauytjiu3uufie4xxyeky1k9v0viz6w+0yjfoyablvaogt43rdtcmfs5bn1+ptgsv/wdcb4adcg18h0qb0gqb640blxg0yyrluznqyo0wcnokczakhck72shzzeesyizddzwkaoni3fsdf+wzc Random Search for LQR minimize s.t. E h 1 T P T t=1 x t Qx t + u t Ru t i x t+1 = Ax t + Bu t + e t Greedy strategy : Build control ut = Kxt Sample a random perturbation: N (0, 2 I) Collect samples from control u t =(K + )x t : = {x 1,...,x T } Compute cost: J( )= T t=1 x t Qx t + u t Ru t Update: K K t J( )

44 Reinforce is NOT Magic What is the variance? Necessarily becomes derivative free as you are accessing the decision variable by sampling Approximation can be far off But it s certainly super easy!

45 Deep Reinforcement Learning Simply parameterize Q-function or policy as a deep net Note, ADP is tricky to analyze with function approximation Policy search is considerably more straightforward: make the log-prob a deep net.

46 <latexit sha1_base64="mr94ezqth17vzwjopx3thjsdtck=">aaacg3icbvfbaxnred5ztdz6aaqpvhwmqousdktbfsguffshdxwnlsrlmd2zjeppztlntiqu+sw+6o/y33g2jwasbwy+vm/uu5saaqfp71zy5+69nfu7d/yepnr8zl998prbcjvx2fnoo39vqebnfntm <latexit sha1_base64="c3vasflhsyp2zenoal07qvhm1sk=">aaaci3icbvfdsxtbfj1sbwttq7e++ncxoaeqoyrdeaqlgvhbffdbulofzf3utm6swznzzeaujcz5nx1tf5d/xtmyqpp0wsdhnpsx9540v9jrgd7ugmcrz1+8xh219vrn2/wn+ua7ny4r <latexit sha1_base64="oi5ov9kcoehyn9bwcwwjat8txu4=">aaac4hicbvhlihnbfk1ux2p7yujstwfqrkyyxslmufagfhqxixgnm5buqnx1taeyquqm6ryknp0brsstn+xop3fpdrjlknih4hdouy+6n6uudbjhv4lwytvr12/s3ixu3b5z915v9/5nv9zwwfcuqrtngxegpiehslrwxlngolnwll286fszl2cdlm0nnfeqal4yozgco6fgptubn7jpwvqkjhku0jsz5mjlri0yfuiztzioxgiw+t+xy2rppo02k+ikyqfa85fdto7suttu9enbvai6ddgk9mkqtse7qzrkpag1gbskozdicyvpwy1koacnktpbxcufl2dkoeeaxnos1tlsx57j6as0/hmkc/zyrso1c3odeacfdoo2ty78nzaqcxkuntjunyiry0atwlesabdjmkslatxcay6s9lnsmewwc/sxwouyqf2bwptjm6unfguog6zcgvrusqeouttdr5p3uin6krtht2qxxb+ql9vje29lide9o/hnnk+3zp4gbhp922d4fpbywd686b+/xl <latexit sha1_base64="+ojsm2yurvuosb1y5f0dyh5w9ea=">aaaco3icbvhbbtnaen2ywym3fb55wrgbioscjzbahkcvqikhpbroakxyisbritpqem3tjqsek5/b1/akh8hfse4digkjrfbonllpwmlyhia/osgvq9eu39i5uxvr9p2 Simplest Example: LQR minimize TX (x t ) 2 1 t=0 subject to x t+1 = x t = +ru 2 t apple x t + apple zt v t apple 0 1/m u t samples nominal control and ADP with 10 samples

47 <latexit sha1_base64="mr94ezqth17vzwjopx3thjsdtck=">aaacg3icbvfbaxnred5ztdz6aaqpvhwmqousdktbfsguffshdxwnlsrlmd2zjeppztlntiqu+sw+6o/y33g2jwasbwy+vm/uu5saaqfp71zy5+69nfu7d/yepnr8zl998prbcjvx2fnoo39vqebnfntm <latexit sha1_base64="c3vasflhsyp2zenoal07qvhm1sk=">aaaci3icbvfdsxtbfj1sbwttq7e++ncxoaeqoyrdeaqlgvhbffdbulofzf3utm6swznzzeaujcz5nx1tf5d/xtmyqpp0wsdhnpsx9540v9jrgd7ugmcrz1+8xh219vrn2/wn+ua7ny4r <latexit sha1_base64="oi5ov9kcoehyn9bwcwwjat8txu4=">aaac4hicbvhlihnbfk1ux2p7yujstwfqrkyyxslmufagfhqxixgnm5buqnx1taeyquqm6ryknp0brsstn+xop3fpdrjlknih4hdouy+6n6uudbjhv4lwytvr12/s3ixu3b5z915v9/5nv9zwwfcuqrtngxegpiehslrwxlngolnwll286fszl2cdlm0nnfeqal4yozgco6fgptubn7jpwvqkjhku0jsz5mjlri0yfuiztzioxgiw+t+xy2rppo02k+ikyqfa85fdto7suttu9enbvai6ddgk9mkqtse7qzrkpag1gbskozdicyvpwy1koacnktpbxcufl2dkoeeaxnos1tlsx57j6as0/hmkc/zyrso1c3odeacfdoo2ty78nzaqcxkuntjunyiry0atwlesabdjmkslatxcay6s9lnsmewwc/sxwouyqf2bwptjm6unfguog6zcgvrusqeouttdr5p3uin6krtht2qxxb+ql9vje29lide9o/hnnk+3zp4gbhp922d4fpbywd686b+/xl <latexit sha1_base64="+ojsm2yurvuosb1y5f0dyh5w9ea=">aaaco3icbvhbbtnaen2ywym3fb55wrgbioscjzbahkcvqikhpbroakxyisbritpqem3tjqsek5/b1/akh8hfse4digkjrfbonllpwmlyhia/osgvq9eu39i5uxvr9p2 Simplest Example: LQR minimize TX (x t ) 2 1 t=0 subject to x t+1 = x t = +ru 2 t apple x t + apple zt v t apple 0 1/m u t samples nominal control and ADP with 10 samples

48 Extraordinary Claims Require Extraordinary Evidence* * only if your prior is correct How can we dismiss an entire field which claims such success? blog.openai.com/openai-baselines-dqn/ Reinforcement learning results are tricky to reproduce: performance is very noisy, algorithms have many moving parts which allow for subtle bugs, and many papers don t report all the required tricks. RL algorithms are challenging to implement correctly; good results typically only come after fixing many seemingly-trivial bugs. Average Return Average Return arxiv: HalfCheetah-v1 (TRPO, Different Random Seeds) Random Average (5 runs) Random Average (5 runs) Timesteps 10 6 HalfCheetah-v1 (TRPO, Codebase Comparison) 500 There has to be a better way! Schulman 2015 Schulman 2017 Duan Timesteps 10 6

49 Coarse-ID control G u y K

50 Coarse-ID control v Δ ^ G w u High dimensional stats bounds the error Coarse-grained model is trivial to fit y K Design robust control for feedback loop

51 <latexit sha1_base64="6bzqpf3vhyc8ozgggdlvwhkvsy4=">aaac2nicbvfnbxmxehwwrxk+ujhysyha5aprbouehjcqgashchvbakxsduv1jsmotndlz6ke7z44ia78lp4af4mrhpcmqsini1l+em9mpj6xfqodhegpvndu/iwllzyut69cvxb9rmfz5gexl1bcqoyqt4ezckdqwicqfbwwfotofbxkxy8a/eajwie5eu/zahitjgbhkav5ku2m4wwmacphrzjxlvj1o84lsijya4sgb5vggxo/qr0rf5nlq37n7/hyltqt8hluh73h8um/tje/nkuh3+azfooto512dga0bjx2umevxarfb9esdnky9tpnvhkpcllqmcsvcg4yhqulvh2hvocnlb0uqh6lcqw9bgz1sbvysm3vembex7n1xxbfsp9wvei7n9ezz9scpu6s1pd/04yljz8mfzqijddy9kfxqtjlvnkuh6efswrugzaw/axctouvkrwhk68sehcgv35szuqdmh/bgvbrjkzwpapsak3zq+ovksxfcep4hk6m9ff1brt56yvovfup9rzr5v5asjckorv+dtdy6t3rrw8fd3f7s2c22g12h22xid1hu+w122cdjtl39pp9yr+djpgcfam+nqygrwxnlbyswbc/ks3qpw==</latexit> Coarse-ID control (static case) minimize u x Qx subject to x = Bu + x 0 B unknown! Collect data: {(x i, u i )} x i = Bu i + x 0 + e i Estimate B: minimize B P N i=1 kbu i + x 0 x i k 2 Guarantee: kb ˆBk apple with high probability ˆB Note: x = ˆBu + x 0 + B u Robust optimization problem: minimize sup k u B kapple kq 1/2 (x Bu)k subject to x = ˆBu + x 0

52 <latexit sha1_base64="6bzqpf3vhyc8ozgggdlvwhkvsy4=">aaac2nicbvfnbxmxehwwrxk+ujhysyha5aprbouehjcqgashchvbakxsduv1jsmotndlz6ke7z44ia78lp4af4mrhpcmqsini1l+em9mpj6xfqodhegpvndu/iwllzyut69cvxb9rmfz5gexl1bcqoyqt4ezckdqwicqfbwwfotofbxkxy8a/eajwie5eu/zahitjgbhkav5ku2m4wwmacphrzjxlvj1o84lsijya4sgb5vggxo/qr0rf5nlq37n7/hyltqt8hluh73h8um/tje/nkuh3+azfooto512dga0bjx2umevxarfb9esdnky9tpnvhkpcllqmcsvcg4yhqulvh2hvocnlb0uqh6lcqw9bgz1sbvysm3vembex7n1xxbfsp9wvei7n9ezz9scpu6s1pd/04yljz8mfzqijddy9kfxqtjlvnkuh6efswrugzaw/axctouvkrwhk68sehcgv35szuqdmh/bgvbrjkzwpapsak3zq+ovksxfcep4hk6m9ff1brt56yvovfup9rzr5v5asjckorv+dtdy6t3rrw8fd3f7s2c22g12h22xid1hu+w122cdjtl39pp9yr+djpgcfam+nqygrwxnlbyswbc/ks3qpw==</latexit> Coarse-ID control (static case) minimize u x Qx subject to x = Bu + x 0 B unknown! Collect data: {(x i, u i )} x i = Bu i + x 0 + e i Estimate B: minimize B P N i=1 kbu i + x 0 x i k 2 Guarantee: kb ˆBk apple with high probability ˆB Solve robust optimization problem: minimize sup k u B kapple kq 1/2 (x Bu)k subject to x = ˆBu + x 0 Relaxation: (Triangle inequality!) minimize kq 1/2 xk + kuk u subject to x = ˆBu + x 0

53 Coarse-ID control (static case) minimize u x Qx subject to x = Bu + x 0 B unknown! Collect data: {(x i, u i )} x i = Bu i + x 0 + e i Estimate B: Guarantee: P N minimize B i=1 kbu i x i k 2 kb ˆBk apple with high probability ˆB Relaxation: (Triangle inequality!) minimize kq 1/2 xk + kuk u subject to x = ˆBu + x 0 Generalization bound cost(û) apple cost(u? )+4 ku? kkq 1/2 x? k ku? k 2

54 <latexit sha1_base64="6bzqpf3vhyc8ozgggdlvwhkvsy4=">aaac2nicbvfnbxmxehwwrxk+ujhysyha5aprbouehjcqgashchvbakxsduv1jsmotndlz6ke7z44ia78lp4af4mrhpcmqsini1l+em9mpj6xfqodhegpvndu/iwllzyut69cvxb9rmfz5gexl1bcqoyqt4ezckdqwicqfbwwfotofbxkxy8a/eajwie5eu/zahitjgbhkav5ku2m4wwmacphrzjxlvj1o84lsijya4sgb5vggxo/qr0rf5nlq37n7/hyltqt8hluh73h8um/tje/nkuh3+azfooto512dga0bjx2umevxarfb9esdnky9tpnvhkpcllqmcsvcg4yhqulvh2hvocnlb0uqh6lcqw9bgz1sbvysm3vembex7n1xxbfsp9wvei7n9ezz9scpu6s1pd/04yljz8mfzqijddy9kfxqtjlvnkuh6efswrugzaw/axctouvkrwhk68sehcgv35szuqdmh/bgvbrjkzwpapsak3zq+ovksxfcep4hk6m9ff1brt56yvovfup9rzr5v5asjckorv+dtdy6t3rrw8fd3f7s2c22g12h22xid1hu+w122cdjtl39pp9yr+djpgcfam+nqygrwxnlbyswbc/ks3qpw==</latexit> Coarse-ID control (static case) minimize u x Qx subject to x = Bu + x 0 B unknown! Collect data: {(x i, u i )} x i = Bu i + x 0 + e i Estimate B: minimize B P N i=1 kbu i + x 0 x i k 2 Guarantee: kb ˆBk apple with high probability ˆB Relaxation: (Triangle inequality!) minimize kq 1/2 xk + kuk u subject to x = ˆBu + x 0 Generalization bound cost(û) =cost(u? )+O( )

55 Coarse-ID optimization minimize x f(x) subject to g(x; #) apple 0 # unknown! Collect data: {(x i, g(x i ; #)} Estimate #: ˆ# Guarantee: dist(#, ˆ#) apple with high probability Relaxation: minimize ˆfrobust (x) x subject to g(x; ˆ#) apple 0 Generalization bound f(ˆx) apple f(x? )+err, kf ˆfrobust k, x?, #

56 <latexit sha1_base64="euyqlm8oqoqnwpvqldlbjjbjanm=">aaadnxicbvjlbxmxepyurxietehixskikhrfuwgjlpvkacehhxastfk8jbyon7fqe1f2lcry+7u48jc4cenc+qt400uicsnzm/7m5znpasgfhsj6horxrl67fmprzuvw7tt3t9s794y2lw3ja5bl3jyl1hipnb+aamnpcsopsiu/ts9e1/7tt9xykes+laqekdrvihomgofg7w8k5vohhtwglionzduiks3ntgktlpjck7ylirrq7preiokmfgt+mqidwaiiisistd3bikiewyhkhjixv65fywjlnwqhcxxex/mxnd/bj7xg+7hc3j7u+rjmqkjt1nahw7ec+9t9umih+fwtdfshe83h0cjct5onj9udqbstbw8acwn0ucph450gizoclypryjjao4qjahjfdgst3m9fwl5qdkgnforntrw3ivuuuskppdlbww780ycx6l8zjiprfyr1kfvu7lqvbv/ng5wqvuyc0eujxlplrlkpmes45g1phoem5miblbnh34rzjpp1g2d3pcuydshzyiruxmrb8glfqyxmwvapwg6kelb9vo6dkbj/pnrixs3ox68vw7v33oipapu057+qfrwr7amj19e/aqyfdeoog5887xwendrsoqfoidpdmxqbdtf7diwgiaw7qs8ybmpwa/gj/bn+ugwngybnplqr8pcfxx4jrw==</latexit> <latexit sha1_base64="uo+1skyh9ob/xn5keqmafdwhdva=">aaac03icbvfdixmxfe3hr3x92k4++hisqotsokxqfvtu0ieck253f5pustn3pmgtzgxyr7beerff/vn+ch+dr/pupq1gwy8edufcj9xzp4wsdnu9h43o2vubn2/t3n69c/fe/b3m/omtl5dwwfdkkrdnu+5asqndlkjgrlda9vtb6ftida2ffglrzg6ocv7awppmyfqkjogancfhlinlylcqbpz7iilisc1sy4vntmaan/dpo3ladcrpvoib8ilnmupmaq+lqao2gyxp0wfqoklyc96vklmym2fn0mz1ur1f0g0qr0clrojost8ysyqxpqadqnhnrngvwlhnfqvquo2y0khbxqxpybsg4rrc2c+cqoitwcq0zw14bumc/bfcc+3cxe9dzr2d29rq8n/aqmt05dhlu5qiriwhpawimnpavppicwlvpaaurax/pwlgg4syzf+bsuhdgfjbx Simple Example: LQR minimize s.t. lim T!1 E h Gaussian noise Obvious strategy : Estimate (A,B), build control ut = Kxt 1 T x t+1 = Ax t + Bu t + e t P T t=1 x t Qx t + u t Ru t i Run an experiment for T steps with random input. Then minimize (A,B) P T i=1 kx i+1 Ax i Bu i k 2 If T Õ 2 (d + p) min( c ) 2 where c = A c A + BB controllability Gramian then A Â B ˆB and w.h.p. [Dean, Mania, Matni, R.,Tu, 2017] [Mania, R., Simchowitz, Tu, 2018]

57 <latexit sha1_base64="qybpfo5ltn9luocv7dk5zoe+lxe=">aaadhhicbvjnbxmxepuuxyvqsohixsjc2qhpuxuq4acoupgk2hykanpkcro5xm9i1fzubs9q5pqvcowpcenckfg3ejnuahjgsvt03njgm8/dgjnt4vhven65e+/+g7whtuep1588rw88o9f5qqjtkpzn6myinevm0q5hhtozqleshpyedi/2kv30g1wa5flytaraf3gkwcyinp4a1h+gtgfidyi0xsyeuubwwqbpg5wznwaiti/hhmqhvg3tyyhwwckbzvgjs7hzxs5mhlhvmt6sbjporagzo0nz3g4lo20hnyg6ppyvrtednljsndbngtkxylg4ew7sbctwettkn4ums8fodeqnedueblwfyrw0wdyobhtbh6u5kqwvhncsds+jc9o3wblgohu1vgpayhkbr7tnocsc6r6dbttbv55jyzyrf6sbu/b2dyuf1hmx9jnvuhpzq8j/ab3szo/6lsminfsswaos5ndkslihpkxryvjea0wu82+fziz9tow3cahlthzbycik9qqujoqpxwk5utike1jtizct1vr2n3eov2kpyafy4eb1zss5+srgzohwx/8u2vxj9oyky+tfbsft7stetr68aex+nfuzbl6alyaccxgldsfncas6gatrwevgffah/b7+dh+fv2epytc/8xwsrpjnh9uaaes=</latexit> Simple Example: LQR Obvious strategy : Estimate (Â, ˆB), build control u t = ˆKx t minimize u P sup lim 1 T k A k 2 apple A, k B k 2 apple B T!1 T t=1 x t Qx t + u t Ru t s.t. x t+1 =(Â + A)x t +(ˆB + B )u t Solving an SDP relaxation of this robust control problem yields J(ˆK) J? apple C cl J? r 2 min( c ) 1/2 (d + p) + kk? k 2 T w.h.p. c = A c A + BB controllability Gramian cl := k(zi A BK? ) 1 k H1 closed loop gain This also tells you when your cost is finite! [Dean, Mania, Matni, R., Tu 2017]

58 Why robust? x t+1 = x t u t + e t Slightly unstable system, system ID tends to think some nodes are stable

59 Least-squares estimate may yield unstable controller Robust synthesis yields stable controller

60 Model-free performs worse than model-based

61 Why has no one done this before? Our guarantees for least-squares estimation required some heavy machinery. Indeed, best bounds building on papers from last few years Our SDP relaxation uses brand new techniques in controller parameterization (Matni et al) Naive robust synthesis is nonconvex and requires solving very large SDPs The Singularity has arrived!

62 <latexit sha1_base64="eumgloqutsxkdwjk4wwvpysqp+q=">aaadknicbvjnaxsxen3dfqxur5z2miuokayj4+yaqnsqksmk2d6kte4clmnkrwylsnqnnftilp1hvfap9bz67q+p1nygtjsgelw3mthm0ygt3eau3frbg4ephj/zelp59vzfy+3qzqttk+aash5nrarpr8qwwrxraqfbzjpnibwjdja6pcr1sx9mg56q7zdl2ecsiejjtgk4alj9hceaunso8zsa7rt1/fyqgyc6shcaycgu0bfu4eyfhxmpydbiswcqpawikjw8hhal1zs5j0muiile3qvn6xd2pz5ofwgp4zvoojs+gbaw5pmp1cvyxgmwapecbphekotwmoxldszssrgf4iqjipxcurldai1qrvnamybegpq3jjphjj/asupzyrrqqyzpx1ega0s0ccpyucg5yrmhl2tc+g4qipkz2pl+c/twmqkap9odbwjo3r9hitrmjkcusxzergsl+t+tn8p4w8bylexaff00guccqypks1dcnamgzg4qqrl7k6jt4jyeztkvlvpagamrk9jrxhgajmynfxanmjjsmjceq3iqe8yfqn+imqhbonknurklhh7mew6m0xx/rtu3kp0h8fr6n8fpqxlhzfjru9rhx6u1w96u98ylvdh77x16x7wtr+drf9f/5lf9tvaz+b3cbn8wqyg/vppaw4ng7z9wcawm</latexit> where Even LQR is not simple!!! minimize J := t=1 x T t Qx t + u T t Ru t s.t. J(ˆK) J? apple C cl J? c = A c A + BB controllability Gramian x t+1 = Ax t + Bu t + e t Gaussian noise r 2 min( c ) 1/2 (d + p) log(1/ ) + kk? k 2 n cl := k(zi A BK? ) 1 k H1 closed loop gain Hard to estimate Control insensitive to mismatch Easy to estimate Control very sensitive to mismatch 50 papers on Cosma Shalizi s blog say otherwise! Need to fix learning theory for time series.

63 <latexit sha1_base64="euyqlm8oqoqnwpvqldlbjjbjanm=">aaadnxicbvjlbxmxepyurxietehixskikhrfuwgjlpvkacehhxastfk8jbyon7fqe1f2lcry+7u48jc4cenc+qt400uicsnzm/7m5znpasgfhsj6horxrl67fmprzuvw7tt3t9s794y2lw3ja5bl3jyl1hipnb+aamnpcsopsiu/ts9e1/7tt9xykes+laqekdrvihomgofg7w8k5vohhtwglionzduiks3ntgktlpjck7ylirrq7preiokmfgt+mqidwaiiisistd3bikiewyhkhjixv65fywjlnwqhcxxex/mxnd/bj7xg+7hc3j7u+rjmqkjt1nahw7ec+9t9umih+fwtdfshe83h0cjct5onj9udqbstbw8acwn0ucph450gizoclypryjjao4qjahjfdgst3m9fwl5qdkgnforntrw3ivuuuskppdlbww780ycx6l8zjiprfyr1kfvu7lqvbv/ng5wqvuyc0eujxlplrlkpmes45g1phoem5miblbnh34rzjpp1g2d3pcuydshzyiruxmrb8glfqyxmwvapwg6kelb9vo6dkbj/pnrixs3ox68vw7v33oipapu057+qfrwr7amj19e/aqyfdeoog5887xwendrsoqfoidpdmxqbdtf7diwgiaw7qs8ybmpwa/gj/bn+ugwngybnplqr8pcfxx4jrw==</latexit> Simplest Example: LQR minimize s.t. lim T!1 E How many samples are needed for near optimal control? Lots of asymptotic work in the 80s (adaptive control) Fietcher, 1997: PAC, discounted costs, many assumptions on contractivity, bugs in proof. Abbas-Yadkori & Szepesvári, 2011: Regret, exponential in dimension, no guarantee on parameter convergence, NP-hard subroutine. Ibrahimi et al, 2012: require sparseness in state transitions Ouyang et al, 2017: Bayesian setting, unrealistic assumptions, not implementable Abeille and Lazaric, 2015: Suboptimal bounds h 1 T x t+1 = Ax t + Bu t + e t P T t=1 x t Qx t + u t Ru t i

64 The Linearization Principle If a machine learning algorithm does crazy things when restricted to linear models, it s going to do crazy things on complex nonlinear models too. What happens when we return to nonlinear models?

65 <latexit sha1_base64="geqtxxms1vozi4hkkogwziugzue=">aaadzxicbvndb9mwfm1apkb42ucrf4ufizdndppm6dtgd0xcqnlwnmmpjsd1vquxexyn28jck/+pn975ithxgnjxpsjh555z73xsrfnccgxhj5vo99bto3dx79n3hzx89hht/cmnpc0kososjqk8ixboeybosdgv0jnmusyjhb5hs706fzynmmepofjxgr1zfc5yzahwmjpbx/kzrvscivlhqeiwrepcrgkhlb25cujejirpjguxzu+nqrimoxi8w5emfxxgxrmfuydpbzatqgjhaiftehq7vprsrxf5hpnzbxspdn1wulh6/2zlq5cygcwlznhb8moweavfgx14wtincmuotndp+x3hvh3n79pbpetupv00y4y2sq7g4lckgragwou1pxbqpu7ivsmlck/bbv6o6zzsd3ng7ptu4/eccgvpmu5mvdyneoofrtr3e54bgb9jqh/g1fjxqoe6agp5zmrph0bkbahmnd7urd1pwbfi2cqu9ye0e3hec4jmjk0errnwafrm/hzq2dog3izngjsatwddamn4tvy9nksk4fqokua8p0uwu+mss30delrzyzhtdjozpv1tdqxmnb+xzxwswavnteccsv3ojtbsoqpepm+veksvhktpvpyryf/ltgsvb+osiaxqvbdtkc4sofjq320wyzislvxpgilkelzaplhiovqfyoupgja3fbom3o3bnvrobuy+br/gqvxmem69tjc1y+1a+9bqglmk87bzufolu3ah3xn3uvvvsdsrreep9u90v/0cyxirpg==</latexit> Random search of linear policies outperforms Deep Reinforcement Learning Larger is better

66 <latexit sha1_base64="geqtxxms1vozi4hkkogwziugzue=">aaadzxicbvndb9mwfm1apkb42ucrf4ufizdndppm6dtgd0xcqnlwnmmpjsd1vquxexyn28jck/+pn975ithxgnjxpsjh555z73xsrfnccgxhj5vo99bto3dx79n3hzx89hht/cmnpc0kososjqk8ixboeybosdgv0jnmusyjhb5hs706fzynmmepofjxgr1zfc5yzahwmjpbx/kzrvscivlhqeiwrepcrgkhlb25cujejirpjguxzu+nqrimoxi8w5emfxxgxrmfuydpbzatqgjhaiftehq7vprsrxf5hpnzbxspdn1wulh6/2zlq5cygcwlznhb8moweavfgx14wtincmuotndp+x3hvh3n79pbpetupv00y4y2sq7g4lckgragwou1pxbqpu7ivsmlck/bbv6o6zzsd3ng7ptu4/eccgvpmu5mvdyneoofrtr3e54bgb9jqh/g1fjxqoe6agp5zmrph0bkbahmnd7urd1pwbfi2cqu9ye0e3hec4jmjk0errnwafrm/hzq2dog3izngjsatwddamn4tvy9nksk4fqokua8p0uwu+mss30delrzyzhtdjozpv1tdqxmnb+xzxwswavnteccsv3ojtbsoqpepm+veksvhktpvpyryf/ltgsvb+osiaxqvbdtkc4sofjq320wyzislvxpgilkelzaplhiovqfyoupgja3fbom3o3bnvrobuy+br/gqvxmem69tjc1y+1a+9bqglmk87bzufolu3ah3xn3uvvvsdsrreep9u90v/0cyxirpg==</latexit> Larger is better

67 AverageReward300 0 Swimmer-v Hopper-v HalfCheetah-v Ant-v Episodes AverageReward Walker2d-v Episodes Humanoid-v Episodes Larger is better

68 <latexit sha1_base64="rli7jlp75guz6h4r9bivhozu6ym=">aaadmhicbvjnj9mwehxc11k+undkyqhytdqqshasxcqtwnby2enxtlsr1svyxke11nyie4jsovcjupjh4is48itwukgwlsnfgr838+yzlzitwkiqfpf8a9dv3ly1c7t15+69+w/auw9pbzobxicslak5j6nlumg+aqgsn2eguxvlfhzfhnb82udurej1gfyznym60cirjikdovzxevof0cu1hq6qusqqrvscfquswijxivd4dxnfyrnh5dsq4ktybkbe5ioqyrhwh8b4mijueue/j6ch990xccdyvb9wpwleygkzqhpvo4bbreh4cdwe4urvc587gf7nigqhhevyww5zfsqtroyarfvbhot589qo3qkgwtrwdhi2sqc1myp2vrmzpyxxxaot1nppggqwc3igmoru9nzyjliluubtl2qquj2v6y1x+jld5jhjjfs04dx6b0djlburfbvkel92k6vb/3hthjjxs1lolaeu2evfss4xpli2dm+f4qzkyiwugeheitmsgsragxvllrv2xtmvscoi14klc76bsijauadadookxu9vhgkp8xuqlt6unfvdotma7r4rcwg2f+z+ht3bknaghjvr305onw/cybcevogcvg6s2ugp0vpursf6iq7qozrce8s8j96rn/jo/c/+n/+h//oy1peankfosvi/fgnhzqxm</latexit> <latexit sha1_base64="pkddr1xni4717umkasm7i9gzej0=">aaac0hicbvhlattafb2rryr9oe0ym6gmybfjpfjon4xqtlsllnkhnyakxgh8jq8ajctmvberontbv+pn9au6bf+gi8eb2u6fgcm59zh33ksswqdn/ew5n27eun1nb//g7r37dx72dx9ntvlrdhneyljfjsyafaomkfdczawbfymeiyq/7fsll6cnknvnxfyqfsxtihwcoaxifhbwis6hibi6zl36iozmz2ehvnzult210ilor7vlj2lymjwnsfo2jrtoqwkpbnqan/mx3w7t68wrug6ortbhko4pvlg3crol/duykhwcx4e9kjyvvc5aizfmmmd3kowaplfwce1bwbuogm9zbogfihvgomblqkufwmzg01lbp5cu2h8rglyysywsm9ltyra1jvyfftsyvowaoaoaqfgrqwktkza0s5tohaaocmkb41ryv1i+z5pxtmzvtfn1robvbnisaiv4oymtvuicnbokasyyun1wztshjf3elkfnncfxqm3bycm3ihnormf2usrdsbyh8bft3wxtz2pfg/sfng9oxq9ps0eoybmyjd55qu7ie3jojosth+qx+u3+ob+dhfpv+xav6vtwny/jrjjf/wjmcoov</latexit> <latexit sha1_base64="wjxwbc8ktns7bqu2kdxmd9r45bw=">aaacxxicbvfdi9nafj3gr3x96uqjl4nfslguzbhcl4xfvdahfahouwtncjppttp0zhjmbpawepxx/hfbv/0dtrovboufgcm592puuwkphcug+nhx7ty9d//bwcpdr4+fph3wpxo+suvloix5iqtzntilumgyo0aj16ubplijv+nivnwvbsbyueivucohvizxihocoaos7igqrrl6yz49prezeasetuqqoefjwl8oqj59qypfcj6m9ccmqagjjgq4pzokdhs/a2sh1qd6/ciifi5x0u0fw2addb+eg9ajmxglr504mhw8uqcrs2btnaxkjgtmuhajzwfuwsgzx7acpg5qpsdg9xr1hr52zixmhxfpi12z/1butfm7uqnlblewu1pl/k+bvpidxlxqzywg+e2grjiuc9r6sgfcaee5cobxi9xfkz8zwzg6t7emrhuxwlc2qzevfryywq4rcymgodickiz0u1v9iaskx5i29ll1+k/q2ray/0hkau3g0p1u9/es3uhcxfv3wer4gabd8ppb3tn7zwkoyevyivgkjo/igflermrmoplofpjf5ld34skpvzvbvk+zqxlbtsl79gfagt6r</latexit> Model Predictive Control h i PT minimize E e t=1 C t(x t, u t )+C f (x T+1 ) s.t. Optimal Policy: x t+1 = f t (x t, u t, e t ), x 1 = x u t = t ( t ) k ( k ) = arg min u C k (x k, u)+e e [V k+1 (f k (x k, u, e))] MPC: use the policy at every time step 1 (x) = arg min u C k (x, u)+e e [V 1 (f 1 (x, u, e))] MPC ethos: plan on short time horizons, use feedback to correct modeling error and disturbance. MPC trades improved computation for control cleverness, requiring significant planning for each action.

69 Model Predictive Control Videos from Todorov Lab

70 Actionable Intelligence Control Theory Reinforcement Learning is the study of how to use past data to enhance the future manipulation of a dynamical system

71 Actionable Intelligence is the study of how to use past data to enhance the future manipulation of a dynamical system As soon as a machine learning system is unleashed in feedback with humans, that system is an actionable intelligence system, not a machine learning system.

Machine Learning for Data Science (CS4786) Lecture 11. Spectral Embedding + Clustering

Machine Learning for Data Science (CS4786) Lecture 11. Spectral Embedding + Clustering Machine Learning for Data Science (CS4786) Lecture 11 Spectral Embedding + Clustering MOTIVATING EXAMPLE What can you say from this network? MOTIVATING EXAMPLE How about now? THOUGHT EXPERIMENT For each

Bardziej szczegółowo

Machine Learning for Data Science (CS4786) Lecture 24. Differential Privacy and Re-useable Holdout

Machine Learning for Data Science (CS4786) Lecture 24. Differential Privacy and Re-useable Holdout Machine Learning for Data Science (CS4786) Lecture 24 Differential Privacy and Re-useable Holdout Defining Privacy Defining Privacy Dataset + Defining Privacy Dataset + Learning Algorithm Distribution

Bardziej szczegółowo

Linear Classification and Logistic Regression. Pascal Fua IC-CVLab

Linear Classification and Logistic Regression. Pascal Fua IC-CVLab Linear Classification and Logistic Regression Pascal Fua IC-CVLab 1 aaagcxicbdtdbtmwfafwdgxlhk8orha31ibqycvkdgpshdqxtwotng2pxtvqujmok1qlky5xllzrnobbediegwcap4votk2kqkf+/y/tnphdschtadu/giv3vtea99cfma8fpx7ytlxx7ckns4sylo3doom7jguhj1hxchmy/irhrlgh67lxb5x3blis8jjqynmedqujiu5zsqqagrx+yjcfpcrydusshmzeluzsg7tttiew5khhcuzm5rv0gn1unw6zl3gbzlpr3liwncyr6aaqinx4wnc/rpg6ix5szd86agoftuu0g/krjxdarph62enthdey3zn/+mi5zknou2ap+tclvhob9sxhwvhaqketnde7geqjp21zvjsfrcnkfhtejoz23vq97elxjlpbtmxpl6qxtl1sgfv1ptpy/yq9mgacrzkgje0hjj2rq7vtywnishnnkzsqekucnlblrarlh8x8szxolrrxkb8n6o4kmo/e7siisnozcfvsedlol60a/j8nmul/gby8mmssrfr2it8lkyxr9dirxxngzthtbaejv

Bardziej szczegółowo

tum.de/fall2018/ in2357

tum.de/fall2018/ in2357 https://piazza.com/ tum.de/fall2018/ in2357 Prof. Daniel Cremers From to Classification Categories of Learning (Rep.) Learning Unsupervised Learning clustering, density estimation Supervised Learning learning

Bardziej szczegółowo

Machine Learning for Data Science (CS4786) Lecture11. Random Projections & Canonical Correlation Analysis

Machine Learning for Data Science (CS4786) Lecture11. Random Projections & Canonical Correlation Analysis Machine Learning for Data Science (CS4786) Lecture11 5 Random Projections & Canonical Correlation Analysis The Tall, THE FAT AND THE UGLY n X d The Tall, THE FAT AND THE UGLY d X > n X d n = n d d The

Bardziej szczegółowo

Hard-Margin Support Vector Machines

Hard-Margin Support Vector Machines Hard-Margin Support Vector Machines aaacaxicbzdlssnafiyn9vbjlepk3ay2gicupasvu4iblxuaw2hjmuwn7ddjjmxm1bkcg1/fjqsvt76fo9/gazqfvn8y+pjpozw5vx8zkpvtfxmlhcwl5zxyqrm2vrg5zw3vxmsoezi4ogkr6phieky5crvvjhriqvdom9l2xxftevuwcekj3lktmhghgniauiyutvrwxtvme34a77kbvg73gtygpjsrfati1+xc8c84bvraowbf+uwnipyehcvmkjrdx46vlykhkgykm3ujjdhcyzqkxy0chur6ax5cbg+1m4bbjptjcubuz4kuhvjoql93hkin5hxtav5x6yyqopnsyuneey5ni4keqrxbar5wqaxbik00icyo/iveiyqqvjo1u4fgzj/8f9x67bzmxnurjzmijtlybwfgcdjgfdtajwgcf2dwaj7ac3g1ho1n4814n7wwjgjmf/ys8fenfycuzq==

Bardziej szczegółowo

Analysis of Movie Profitability STAT 469 IN CLASS ANALYSIS #2

Analysis of Movie Profitability STAT 469 IN CLASS ANALYSIS #2 Analysis of Movie Profitability STAT 469 IN CLASS ANALYSIS #2 aaaklnictzzjb9tgfmcnadpg7oy0lxa9edva9kkapdarhyk2k7gourinlwsweyzikuyiigvyleiv/cv767fpf/5crc1xt9va5mx7w3m/ecuqw1kuztpx/rl3/70h73/w4cog9dhhn3z62d6jzy+yzj766txpoir9nzszisjynetqr+rvlfvyoozu5xbybpsxb1wahul8phczdt2v4zgchb7uecwphlyigrgkjcyiflfyci0kxnmr4z6kw0jsokvot8isntpa3gbknlcufiv/h+hh+eur4fomd417rvtfjoit5pfju6yxiab2fmwk0y/feuybobqk+axnke8xzjjhfyd8kkpl9zdoddkazd5j6bzpemjb64smjb6vb4xmehysu08lsrszopxftlzee130jcb0zjxy7r5wa2f1s2off2+dyatrughnrtpkuprlcpu55zlxpss/yqe2eamjkcf0jye8w8yas0paf6t0t2i9stmcua+inbi2rt01tz22tubbqwidypvgz6piynkpobirkxgu54ibzoti4pkw2i5ow9lnuaoabhuxfxqhvnrj6w15tb3furnbm+scyxobjhr5pmj5j/w5ix9wsa2tlwx9alpshlunzjgnrwvqbpwzjl9wes+ptyn+ypy/jgskavtl8j0hz1djdhzwtpjbbvpr1zj7jpg6ve7zxfngj75zee0vmp9qm2uvgu/9zdofq6r+g8l4xctvo+v+xdrfr8oxiwutycu0qgyf8icuyvp/sixfi9zxe11vp6mrjjovpmxm6acrtbia+wjr9bevlgjwlz5xd3rfna9g06qytaoofk8olxbxc7xby2evqjmmk6pjvvzxmpbnct6+036xp5vdbrnbdqph8brlfn/n/khnfumhf6z1v7h/80yieukkd5j0un82t9mynxzmk0s/bzn4tacdziszdhwrl8x5ako8qp1n1zn0k6w2em0km9zj1i4yt1pt3xiprw85jmc2m1ut2geum6y6es2fwx6c+wlrpykblopbuj5nnr2byygfy5opllv4+jmm7s6u+tvhywbnb0kv2lt5th4xipmiij+y1toiyo7bo0d+vzvovjkp6aoejsubhj3qrp3fjd/m23pay8h218ibvx3nicofvd1xi86+kh6nb/b+hgsjp5+qwpurzlir15np66vmdehh6tyazdm1k/5ejtuvurgcqux6yc+qw/sbsaj7lkt4x9qmtp7euk6zbdedyuzu6ptsu2eeu3rxcz06uf6g8wyuveznhkbzynajbb7r7cbmla+jbtrst0ow2v6ntkwv8svnwqnu5pa3oxfeexf93739p93chq/fv+jr8r0d9brhpcxr2w88bvqbr41j6wvrb+u5dzjpvx+veoaxwptzp/8cen+xbg==

Bardziej szczegółowo

Revenue Maximization. Sept. 25, 2018

Revenue Maximization. Sept. 25, 2018 Revenue Maximization Sept. 25, 2018 Goal So Far: Ideal Auctions Dominant-Strategy Incentive Compatible (DSIC) b i = v i is a dominant strategy u i 0 x is welfare-maximizing x and p run in polynomial time

Bardziej szczegółowo

Gradient Coding using the Stochastic Block Model

Gradient Coding using the Stochastic Block Model Gradient Coding using the Stochastic Block Model Zachary Charles (UW-Madison) Joint work with Dimitris Papailiopoulos (UW-Madison) aaacaxicbvdlssnafj3uv62vqbvbzwarxjsqikaboelgzux7gcaeywtsdp1mwsxeaepd+ctuxcji1r9w5984bbpq1gmxdufcy733bcmjutn2t1fawl5zxsuvvzy2t7z3zn29lkwyguktjywrnqbjwigntuuvi51uebqhjlsdwfxebz8qiwnc79uwjv6mepxgfcoljd88uiox0m1hvlnzwzgowymjn7tjyzertmvpareju5aqkndwzs83thawe64wq1j2httvxo6eopirccxnjekrhqae6wrkuuykl08/gmnjryqwsoqurubu/t2ro1jkyrzozhipvpz3juj/xjdt0ywxu55mina8wxrldkoetukairuekzbubgfb9a0q95fawonqkjoez/7lrdi6trzbcm7pqvwrio4yoarh4aq44bzuwq1ogcba4be8g1fwzjwzl8a78tfrlrnfzd74a+pzb2h+lzm=

Bardziej szczegółowo

SSW1.1, HFW Fry #20, Zeno #25 Benchmark: Qtr.1. Fry #65, Zeno #67. like

SSW1.1, HFW Fry #20, Zeno #25 Benchmark: Qtr.1. Fry #65, Zeno #67. like SSW1.1, HFW Fry #20, Zeno #25 Benchmark: Qtr.1 I SSW1.1, HFW Fry #65, Zeno #67 Benchmark: Qtr.1 like SSW1.2, HFW Fry #47, Zeno #59 Benchmark: Qtr.1 do SSW1.2, HFW Fry #5, Zeno #4 Benchmark: Qtr.1 to SSW1.2,

Bardziej szczegółowo

Proposal of thesis topic for mgr in. (MSE) programme in Telecommunications and Computer Science

Proposal of thesis topic for mgr in. (MSE) programme in Telecommunications and Computer Science Proposal of thesis topic for mgr in (MSE) programme 1 Topic: Monte Carlo Method used for a prognosis of a selected technological process 2 Supervisor: Dr in Małgorzata Langer 3 Auxiliary supervisor: 4

Bardziej szczegółowo

Verification in POMDPs

Verification in POMDPs Verification in OMDs rivacy, machine teaching and other belief-related problems Ufuk Topcu The University of Texas at Austin Slides originally prepared by Bo Wu. http://u-t-autonomous.info Ufuk Topcu rotecting

Bardziej szczegółowo

deep learning for NLP (5 lectures)

deep learning for NLP (5 lectures) TTIC 31210: Advanced Natural Language Processing Kevin Gimpel Spring 2019 Lecture 6: Finish Transformers; Sequence- to- Sequence Modeling and AJenKon 1 Roadmap intro (1 lecture) deep learning for NLP (5

Bardziej szczegółowo

TTIC 31210: Advanced Natural Language Processing. Kevin Gimpel Spring Lecture 9: Inference in Structured Prediction

TTIC 31210: Advanced Natural Language Processing. Kevin Gimpel Spring Lecture 9: Inference in Structured Prediction TTIC 31210: Advanced Natural Language Processing Kevin Gimpel Spring 2019 Lecture 9: Inference in Structured Prediction 1 intro (1 lecture) Roadmap deep learning for NLP (5 lectures) structured prediction

Bardziej szczegółowo

Zakopane, plan miasta: Skala ok. 1: = City map (Polish Edition)

Zakopane, plan miasta: Skala ok. 1: = City map (Polish Edition) Zakopane, plan miasta: Skala ok. 1:15 000 = City map (Polish Edition) Click here if your download doesn"t start automatically Zakopane, plan miasta: Skala ok. 1:15 000 = City map (Polish Edition) Zakopane,

Bardziej szczegółowo

Wojewodztwo Koszalinskie: Obiekty i walory krajoznawcze (Inwentaryzacja krajoznawcza Polski) (Polish Edition)

Wojewodztwo Koszalinskie: Obiekty i walory krajoznawcze (Inwentaryzacja krajoznawcza Polski) (Polish Edition) Wojewodztwo Koszalinskie: Obiekty i walory krajoznawcze (Inwentaryzacja krajoznawcza Polski) (Polish Edition) Robert Respondowski Click here if your download doesn"t start automatically Wojewodztwo Koszalinskie:

Bardziej szczegółowo

Previously on CSCI 4622

Previously on CSCI 4622 More Naïve Bayes aaace3icbvfba9rafj7ew423vr998obg2gpzkojyh4rcx3ys4lafzbjmjifdototmhoilml+hf/mn3+kl+jkdwtr64gbj+8yl2/ywklhsfircg/dvnp33s796mhdr4+fdj4+o3fvywvorkuqe5zzh0oanjakhwe1ra5zhaf5xvgvn35f62rlvtcyxpnm50awundy1hzwi46jbmgprbtrrvidrg4jre4g07kak+picee6xfgiwvfaltorirucni64eeigkqhpegbwaxglabftpyq4gjbls/hw2ci7tr2xj5ddfmfzwtazj6ubmyddgchbzpf88dmrktfonct6vazputos5zakunhfweow5ukcn+puq8m1ulm7kq+d154pokysx4zgxw4nwq6dw+rcozwnhbuu9et/tgld5cgslazuci1yh1q2ynca/u9ais0kukspulds3xxegvtyfycu8iwk1598e0z2xx/g6ef94ehbpo0d9ok9yiowsvfskh1ix2zcbpsdvaxgww7wj4zdn+he2hogm8xz9s+e7/4cuf/ata==

Bardziej szczegółowo

Neural Networks (The Machine-Learning Kind) BCS 247 March 2019

Neural Networks (The Machine-Learning Kind) BCS 247 March 2019 Neural Networks (The Machine-Learning Kind) BCS 247 March 2019 Neurons http://biomedicalengineering.yolasite.com/neurons.php Networks https://en.wikipedia.org/wiki/network_theory#/media/file:social_network_analysis_visualization.png

Bardziej szczegółowo

MaPlan Sp. z O.O. Click here if your download doesn"t start automatically

MaPlan Sp. z O.O. Click here if your download doesnt start automatically Mierzeja Wislana, mapa turystyczna 1:50 000: Mikoszewo, Jantar, Stegna, Sztutowo, Katy Rybackie, Przebrno, Krynica Morska, Piaski, Frombork =... = Carte touristique (Polish Edition) MaPlan Sp. z O.O Click

Bardziej szczegółowo

Karpacz, plan miasta 1:10 000: Panorama Karkonoszy, mapa szlakow turystycznych (Polish Edition)

Karpacz, plan miasta 1:10 000: Panorama Karkonoszy, mapa szlakow turystycznych (Polish Edition) Karpacz, plan miasta 1:10 000: Panorama Karkonoszy, mapa szlakow turystycznych (Polish Edition) J Krupski Click here if your download doesn"t start automatically Karpacz, plan miasta 1:10 000: Panorama

Bardziej szczegółowo

Tychy, plan miasta: Skala 1: (Polish Edition)

Tychy, plan miasta: Skala 1: (Polish Edition) Tychy, plan miasta: Skala 1:20 000 (Polish Edition) Poland) Przedsiebiorstwo Geodezyjno-Kartograficzne (Katowice Click here if your download doesn"t start automatically Tychy, plan miasta: Skala 1:20 000

Bardziej szczegółowo

Helena Boguta, klasa 8W, rok szkolny 2018/2019

Helena Boguta, klasa 8W, rok szkolny 2018/2019 Poniższy zbiór zadań został wykonany w ramach projektu Mazowiecki program stypendialny dla uczniów szczególnie uzdolnionych - najlepsza inwestycja w człowieka w roku szkolnym 2018/2019. Składają się na

Bardziej szczegółowo

Few-fermion thermometry

Few-fermion thermometry Few-fermion thermometry Phys. Rev. A 97, 063619 (2018) Tomasz Sowiński Institute of Physics of the Polish Academy of Sciences Co-authors: Marcin Płodzień Rafał Demkowicz-Dobrzański FEW-BODY PROBLEMS FewBody.ifpan.edu.pl

Bardziej szczegółowo

TTIC 31210: Advanced Natural Language Processing. Kevin Gimpel Spring Lecture 8: Structured PredicCon 2

TTIC 31210: Advanced Natural Language Processing. Kevin Gimpel Spring Lecture 8: Structured PredicCon 2 TTIC 31210: Advanced Natural Language Processing Kevin Gimpel Spring 2019 Lecture 8: Structured PredicCon 2 1 Roadmap intro (1 lecture) deep learning for NLP (5 lectures) structured predic+on (4 lectures)

Bardziej szczegółowo

Katowice, plan miasta: Skala 1: = City map = Stadtplan (Polish Edition)

Katowice, plan miasta: Skala 1: = City map = Stadtplan (Polish Edition) Katowice, plan miasta: Skala 1:20 000 = City map = Stadtplan (Polish Edition) Polskie Przedsiebiorstwo Wydawnictw Kartograficznych im. Eugeniusza Romera Click here if your download doesn"t start automatically

Bardziej szczegółowo

Convolution semigroups with linear Jacobi parameters

Convolution semigroups with linear Jacobi parameters Convolution semigroups with linear Jacobi parameters Michael Anshelevich; Wojciech Młotkowski Texas A&M University; University of Wrocław February 14, 2011 Jacobi parameters. µ = measure with finite moments,

Bardziej szczegółowo

Department of Electrical- and Information Technology. Dealing with stochastic processes

Department of Electrical- and Information Technology. Dealing with stochastic processes Dealing with stochastic processes The stochastic process (SP) Definition (in the following material): A stochastic process is random process that happens over time, i.e. the process is dynamic and changes

Bardziej szczegółowo

ARNOLD. EDUKACJA KULTURYSTY (POLSKA WERSJA JEZYKOWA) BY DOUGLAS KENT HALL

ARNOLD. EDUKACJA KULTURYSTY (POLSKA WERSJA JEZYKOWA) BY DOUGLAS KENT HALL Read Online and Download Ebook ARNOLD. EDUKACJA KULTURYSTY (POLSKA WERSJA JEZYKOWA) BY DOUGLAS KENT HALL DOWNLOAD EBOOK : ARNOLD. EDUKACJA KULTURYSTY (POLSKA WERSJA Click link bellow and free register

Bardziej szczegółowo

Towards Stability Analysis of Data Transport Mechanisms: a Fluid Model and an Application

Towards Stability Analysis of Data Transport Mechanisms: a Fluid Model and an Application Towards Stability Analysis of Data Transport Mechanisms: a Fluid Model and an Application Gayane Vardoyan *, C. V. Hollot, Don Towsley* * College of Information and Computer Sciences, Department of Electrical

Bardziej szczegółowo

MoA-Net: Self-supervised Motion Segmentation. Pia Bideau, Rakesh R Menon, Erik Learned-Miller

MoA-Net: Self-supervised Motion Segmentation. Pia Bideau, Rakesh R Menon, Erik Learned-Miller MoA-Net: Self-supervised Motion Segmentation Pia Bideau, Rakesh R Menon, Erik Learned-Miller University of Massachusetts Amherst College of Information and Computer Science Motion Segmentation P Bideau,

Bardziej szczegółowo

Miedzy legenda a historia: Szlakiem piastowskim z Poznania do Gniezna (Biblioteka Kroniki Wielkopolski) (Polish Edition)

Miedzy legenda a historia: Szlakiem piastowskim z Poznania do Gniezna (Biblioteka Kroniki Wielkopolski) (Polish Edition) Miedzy legenda a historia: Szlakiem piastowskim z Poznania do Gniezna (Biblioteka Kroniki Wielkopolski) (Polish Edition) Piotr Maluskiewicz Click here if your download doesn"t start automatically Miedzy

Bardziej szczegółowo

Maximum A Posteriori Chris Piech CS109, Stanford University

Maximum A Posteriori Chris Piech CS109, Stanford University Maximum A Posteriori Chris Piech CS109, Stanford University Previously in CS109 Game of Estimators Estimators Maximum Likelihood Non spoiler alert: this didn t happen in game of thrones aaab7nicbva9swnbej2lxzf+rs1tfomqm3anghywarvlcoydkjpsbfasjxt7x+6cei78cbslrwz9pxb+gzfjfzr4yodx3gwz84jecoou++0u1ty3nrek26wd3b39g/lhucveqwa8ywiz605adzdc8sykllytae6jqpj2ml6d+e0nro2i1qnoeu5hdkhekbhfk7u7j1lvne/75ypbc+cgq8tlsqvynprlr94gzmneftjjjel6boj+rjukjvm01esntygb0yhvwqpoxi2fzc+dkjordegya1skyvz9pzhryjhjfnjoiolilhsz8t+vm2j47wdcjslyxralwlqsjmnsdziqmjoue0so08lestiiasrqjlsyixjll6+s1kxnc2ve/wwlfpphuyqtoiuqehafdbidbjsbwrie4rxenmr5cd6dj0vrwclnjuepnm8fuskpig==

Bardziej szczegółowo

Rachunek lambda, zima

Rachunek lambda, zima Rachunek lambda, zima 2015-16 Wykład 2 12 października 2015 Tydzień temu: Własność Churcha-Rossera (CR) Jeśli a b i a c, to istnieje takie d, że b d i c d. Tydzień temu: Własność Churcha-Rossera (CR) Jeśli

Bardziej szczegółowo

Wojewodztwo Koszalinskie: Obiekty i walory krajoznawcze (Inwentaryzacja krajoznawcza Polski) (Polish Edition)

Wojewodztwo Koszalinskie: Obiekty i walory krajoznawcze (Inwentaryzacja krajoznawcza Polski) (Polish Edition) Wojewodztwo Koszalinskie: Obiekty i walory krajoznawcze (Inwentaryzacja krajoznawcza Polski) (Polish Edition) Robert Respondowski Click here if your download doesn"t start automatically Wojewodztwo Koszalinskie:

Bardziej szczegółowo

Inverse problems - Introduction - Probabilistic approach

Inverse problems - Introduction - Probabilistic approach Inverse problems - Introduction - Probabilistic approach Wojciech Dȩbski Instytut Geofizyki PAN debski@igf.edu.pl Wydział Fizyki UW, 13.10.2004 Wydział Fizyki UW Warszawa, 13.10.2004 (1) Plan of the talk

Bardziej szczegółowo

Rozpoznawanie twarzy metodą PCA Michał Bereta 1. Testowanie statystycznej istotności różnic między jakością klasyfikatorów

Rozpoznawanie twarzy metodą PCA Michał Bereta   1. Testowanie statystycznej istotności różnic między jakością klasyfikatorów Rozpoznawanie twarzy metodą PCA Michał Bereta www.michalbereta.pl 1. Testowanie statystycznej istotności różnic między jakością klasyfikatorów Wiemy, że możemy porównywad klasyfikatory np. za pomocą kroswalidacji.

Bardziej szczegółowo

Stargard Szczecinski i okolice (Polish Edition)

Stargard Szczecinski i okolice (Polish Edition) Stargard Szczecinski i okolice (Polish Edition) Janusz Leszek Jurkiewicz Click here if your download doesn"t start automatically Stargard Szczecinski i okolice (Polish Edition) Janusz Leszek Jurkiewicz

Bardziej szczegółowo

aforementioned device she also has to estimate the time when the patients need the infusion to be replaced and/or disconnected. Meanwhile, however, she must cope with many other tasks. If the department

Bardziej szczegółowo

OpenPoland.net API Documentation

OpenPoland.net API Documentation OpenPoland.net API Documentation Release 1.0 Michał Gryczka July 11, 2014 Contents 1 REST API tokens: 3 1.1 How to get a token............................................ 3 2 REST API : search for assets

Bardziej szczegółowo

Formularz dla osób planujących ubiegać się o przyjęcie na studia undergraduate (I stopnia) w USA na rok akademicki

Formularz dla osób planujących ubiegać się o przyjęcie na studia undergraduate (I stopnia) w USA na rok akademicki Formularz dla osób planujących ubiegać się o przyjęcie na studia undergraduate (I stopnia) w USA na rok akademicki 2017-2018 Zanim zaczniesz wypełniać formularz, zapoznaj się z Instrukcjami! Imię i nazwisko:

Bardziej szczegółowo

Blow-Up: Photographs in the Time of Tumult; Black and White Photography Festival Zakopane Warszawa 2002 / Powiekszenie: Fotografie w czasach zgielku

Blow-Up: Photographs in the Time of Tumult; Black and White Photography Festival Zakopane Warszawa 2002 / Powiekszenie: Fotografie w czasach zgielku Blow-Up: Photographs in the Time of Tumult; Black and White Photography Festival Zakopane Warszawa 2002 / Powiekszenie: Fotografie w czasach zgielku Juliusz and Maciej Zalewski eds. and A. D. Coleman et

Bardziej szczegółowo

Emilka szuka swojej gwiazdy / Emily Climbs (Emily, #2)

Emilka szuka swojej gwiazdy / Emily Climbs (Emily, #2) Emilka szuka swojej gwiazdy / Emily Climbs (Emily, #2) Click here if your download doesn"t start automatically Emilka szuka swojej gwiazdy / Emily Climbs (Emily, #2) Emilka szuka swojej gwiazdy / Emily

Bardziej szczegółowo

Wojewodztwo Koszalinskie: Obiekty i walory krajoznawcze (Inwentaryzacja krajoznawcza Polski) (Polish Edition)

Wojewodztwo Koszalinskie: Obiekty i walory krajoznawcze (Inwentaryzacja krajoznawcza Polski) (Polish Edition) Wojewodztwo Koszalinskie: Obiekty i walory krajoznawcze (Inwentaryzacja krajoznawcza Polski) (Polish Edition) Robert Respondowski Click here if your download doesn"t start automatically Wojewodztwo Koszalinskie:

Bardziej szczegółowo

18. Przydatne zwroty podczas egzaminu ustnego. 19. Mo liwe pytania egzaminatora i przyk³adowe odpowiedzi egzaminowanego

18. Przydatne zwroty podczas egzaminu ustnego. 19. Mo liwe pytania egzaminatora i przyk³adowe odpowiedzi egzaminowanego 18. Przydatne zwroty podczas egzaminu ustnego I m sorry, could you repeat that, please? - Przepraszam, czy mo na prosiæ o powtórzenie? I m sorry, I don t understand. - Przepraszam, nie rozumiem. Did you

Bardziej szczegółowo

The Lorenz System and Chaos in Nonlinear DEs

The Lorenz System and Chaos in Nonlinear DEs The Lorenz System and Chaos in Nonlinear DEs April 30, 2019 Math 333 p. 71 in Chaos: Making a New Science by James Gleick Adding a dimension adds new possible layers of complexity in the phase space of

Bardziej szczegółowo

Weronika Mysliwiec, klasa 8W, rok szkolny 2018/2019

Weronika Mysliwiec, klasa 8W, rok szkolny 2018/2019 Poniższy zbiór zadań został wykonany w ramach projektu Mazowiecki program stypendialny dla uczniów szczególnie uzdolnionych - najlepsza inwestycja w człowieka w roku szkolnym 2018/2019. Tresci zadań rozwiązanych

Bardziej szczegółowo

y = The Chain Rule Show all work. No calculator unless otherwise stated. If asked to Explain your answer, write in complete sentences.

y = The Chain Rule Show all work. No calculator unless otherwise stated. If asked to Explain your answer, write in complete sentences. The Chain Rule Show all work. No calculator unless otherwise stated. If asked to Eplain your answer, write in complete sentences. 1. Find the derivative of the functions y 7 (b) (a) ( ) y t 1 + t 1 (c)

Bardziej szczegółowo

TTIC 31190: Natural Language Processing

TTIC 31190: Natural Language Processing TTIC 31190: Natural Language Processing Kevin Gimpel Spring 2018 Lecture 17: Machine TranslaDon; SemanDcs Roadmap words, morphology, lexical semandcs text classificadon simple neural methods for NLP language

Bardziej szczegółowo

JĘZYK ANGIELSKI ĆWICZENIA ORAZ REPETYTORIUM GRAMATYCZNE

JĘZYK ANGIELSKI ĆWICZENIA ORAZ REPETYTORIUM GRAMATYCZNE MACIEJ MATASEK JĘZYK ANGIELSKI ĆWICZENIA ORAZ REPETYTORIUM GRAMATYCZNE 1 Copyright by Wydawnictwo HANDYBOOKS Poznań 2014 Wszelkie prawa zastrzeżone. Każda reprodukcja lub adaptacja całości bądź części

Bardziej szczegółowo

Selection of controller parameters Strojenie regulatorów

Selection of controller parameters Strojenie regulatorów Division of Metrology and Power Processes Automation Selection of controller parameters Strojenie regulatorów A-9 Automatics laboratory Laboratorium automatyki Developed by//opracował: mgr inż. Wojciech

Bardziej szczegółowo

Program doradczy EducationUSA - formularz zgłoszeniowy

Program doradczy EducationUSA - formularz zgłoszeniowy 1. Podstawowe informacje 1. Informacje kontaktowe Imię i nazwisko Adres korespondencyjny Miejscowość Kod pocztowy Login Skype Adres e-mail Numer telefonu 2. Jak dowiedziałeś/aś się o Programie doradczym

Bardziej szczegółowo

Bayesian graph convolutional neural networks

Bayesian graph convolutional neural networks Bayesian graph convolutional neural networks Mark Coates Collaborators: Soumyasundar Pal, Yingxue Zhang, Deniz Üstebay McGill University, Huawei Noah s Ark Lab February 13, 2019 Montreal 2 / 36 Introduction

Bardziej szczegółowo

Realizacja systemów wbudowanych (embeded systems) w strukturach PSoC (Programmable System on Chip)

Realizacja systemów wbudowanych (embeded systems) w strukturach PSoC (Programmable System on Chip) Realizacja systemów wbudowanych (embeded systems) w strukturach PSoC (Programmable System on Chip) Embeded systems Architektura układów PSoC (Cypress) Możliwości bloków cyfrowych i analogowych Narzędzia

Bardziej szczegółowo

Klasa 6: Ocenę dopuszczającą otrzymuje uczeń, który:

Klasa 6: Ocenę dopuszczającą otrzymuje uczeń, który: Klasa 6: Ocenę dopuszczającą otrzymuje uczeń, który: Częściowo poprawnie stosuje czas Present Simple. Opowiada o sobie i swojej rodzinie. Stosuje podstawowe słownictwo. Częściowo poprawnie stosuje czas

Bardziej szczegółowo

Label-Noise Robust Generative Adversarial Networks

Label-Noise Robust Generative Adversarial Networks Label-Noise Robust Generative Adversarial Networks Training data rcgan Noisy labeled Conditioned on clean labels Takuhiro Kaneko1 Yoshitaka Ushiku1 Tatsuya Harada1, 2 1The University of Tokyo 2RIKEN Talk

Bardziej szczegółowo

DOI: / /32/37

DOI: / /32/37 . 2015. 4 (32) 1:18 DOI: 10.17223/1998863 /32/37 -,,. - -. :,,,,., -, -.,.-.,.,.,. -., -,.,,., -, 70 80. (.,.,. ),, -,.,, -,, (1886 1980).,.,, (.,.,..), -, -,,,, ; -, - 346, -,.. :, -, -,,,,,.,,, -,,,

Bardziej szczegółowo

Strangeness in nuclei and neutron stars: many-body forces and the hyperon puzzle

Strangeness in nuclei and neutron stars: many-body forces and the hyperon puzzle Strangeness in nuclei and neutron stars: many-body forces and the hyperon puzzle Diego Lonardoni FRIB Theory Fellow In collaboration with: S. Gandolfi, LAL J. A. Carlson, LAL A. Lovato, AL & IF F. Pederiva,

Bardziej szczegółowo

www.irs.gov/form990. If "Yes," complete Schedule A Schedule B, Schedule of Contributors If "Yes," complete Schedule C, Part I If "Yes," complete Schedule C, Part II If "Yes," complete Schedule C, Part

Bardziej szczegółowo

Pielgrzymka do Ojczyzny: Przemowienia i homilie Ojca Swietego Jana Pawla II (Jan Pawel II-- pierwszy Polak na Stolicy Piotrowej) (Polish Edition)

Pielgrzymka do Ojczyzny: Przemowienia i homilie Ojca Swietego Jana Pawla II (Jan Pawel II-- pierwszy Polak na Stolicy Piotrowej) (Polish Edition) Pielgrzymka do Ojczyzny: Przemowienia i homilie Ojca Swietego Jana Pawla II (Jan Pawel II-- pierwszy Polak na Stolicy Piotrowej) (Polish Edition) Click here if your download doesn"t start automatically

Bardziej szczegółowo

Wojewodztwo Koszalinskie: Obiekty i walory krajoznawcze (Inwentaryzacja krajoznawcza Polski) (Polish Edition)

Wojewodztwo Koszalinskie: Obiekty i walory krajoznawcze (Inwentaryzacja krajoznawcza Polski) (Polish Edition) Wojewodztwo Koszalinskie: Obiekty i walory krajoznawcze (Inwentaryzacja krajoznawcza Polski) (Polish Edition) Robert Respondowski Click here if your download doesn"t start automatically Wojewodztwo Koszalinskie:

Bardziej szczegółowo

New Roads to Cryptopia. Amit Sahai. An NSF Frontier Center

New Roads to Cryptopia. Amit Sahai. An NSF Frontier Center New Roads to Cryptopia Amit Sahai An NSF Frontier Center OPACity Panel, May 19, 2019 New Roads to Cryptopia What about all this space? Cryptography = Hardness* PKE RSA MPC DDH ZK Signatures Factoring IBE

Bardziej szczegółowo

The Overview of Civilian Applications of Airborne SAR Systems

The Overview of Civilian Applications of Airborne SAR Systems The Overview of Civilian Applications of Airborne SAR Systems Maciej Smolarczyk, Piotr Samczyński Andrzej Gadoś, Maj Mordzonek Research and Development Department of PIT S.A. PART I WHAT DOES SAR MEAN?

Bardziej szczegółowo

Test sprawdzający znajomość języka angielskiego

Test sprawdzający znajomość języka angielskiego Test sprawdzający znajomość języka angielskiego Imię i Nazwisko Kandydata/Kandydatki Proszę wstawić X w pole zgodnie z prawdą: Brak znajomości języka angielskiego Znam j. angielski (Proszę wypełnić poniższy

Bardziej szczegółowo

ERASMUS + : Trail of extinct and active volcanoes, earthquakes through Europe. SURVEY TO STUDENTS.

ERASMUS + : Trail of extinct and active volcanoes, earthquakes through Europe. SURVEY TO STUDENTS. ERASMUS + : Trail of extinct and active volcanoes, earthquakes through Europe. SURVEY TO STUDENTS. Strona 1 1. Please give one answer. I am: Students involved in project 69% 18 Student not involved in

Bardziej szczegółowo

Export Markets Enterprise Florida Inc.

Export Markets Enterprise Florida Inc. Export Markets Enterprise Florida Inc. IV webinarium 5.03.2015 Współfinansowany ze środków Europejskiego Funduszu Rozwoju Regionalnego w ramach Programu Operacyjnego Współpracy Transgranicznej Republika

Bardziej szczegółowo

Wprowadzenie do programu RapidMiner, część 5 Michał Bereta

Wprowadzenie do programu RapidMiner, część 5 Michał Bereta Wprowadzenie do programu RapidMiner, część 5 Michał Bereta www.michalbereta.pl 1. Przekształcenia atrybutów (ang. attribute reduction / transformation, feature extraction). Zamiast wybierad częśd atrybutów

Bardziej szczegółowo

Relaxation of the Cosmological Constant

Relaxation of the Cosmological Constant Relaxation of the Cosmological Constant with Peter Graham and David E. Kaplan The Born Again Universe + Work in preparation + Work in progress aaab7nicdvbns8nafhypx7v+vt16wsycp5kioseifw8ekthwaepzbf7apztn2n0ipfrhepggifd/jzf/jzs2brudwbhm5rhvtzakro3rfjqlpewv1bxyemvjc2t7p7q719zjphi2wcisdr9qjyjlbblubn6ncmkccoweo6vc7zyg0jyrd2acoh/tgeqrz9ryqdo7sdgq9qs1t37m5ibu3v2qqvekpqyfmv3qry9mwbajnexqrbuemxp/qpxhtoc00ss0ppsn6ac7lkoao/yns3wn5mgqiykszz80zkz+n5jqwotxhnhktm1q//zy8s+vm5nowp9wmwygjzt/fgwcmitkt5oqk2rgjc2hthg7k2fdqigztqgklwfxkfmfte/qnuw3p7xgzvfhgq7gei7bg3nowdu0oqumrvaiz/dipm6t8+q8zamlp5jzhx9w3r8agjmpzw==

Bardziej szczegółowo

www.irs.gov/form990. If "Yes," complete Schedule A Schedule B, Schedule of Contributors If "Yes," complete Schedule C, Part I If "Yes," complete Schedule C, Part II If "Yes," complete Schedule C, Part

Bardziej szczegółowo

Ankiety Nowe funkcje! Pomoc magda.szewczyk@slo-wroc.pl. magda.szewczyk@slo-wroc.pl. Twoje konto Wyloguj. BIODIVERSITY OF RIVERS: Survey to students

Ankiety Nowe funkcje! Pomoc magda.szewczyk@slo-wroc.pl. magda.szewczyk@slo-wroc.pl. Twoje konto Wyloguj. BIODIVERSITY OF RIVERS: Survey to students Ankiety Nowe funkcje! Pomoc magda.szewczyk@slo-wroc.pl Back Twoje konto Wyloguj magda.szewczyk@slo-wroc.pl BIODIVERSITY OF RIVERS: Survey to students Tworzenie ankiety Udostępnianie Analiza (55) Wyniki

Bardziej szczegółowo

Dolny Slask 1: , mapa turystycznosamochodowa: Plan Wroclawia (Polish Edition)

Dolny Slask 1: , mapa turystycznosamochodowa: Plan Wroclawia (Polish Edition) Dolny Slask 1:300 000, mapa turystycznosamochodowa: Plan Wroclawia (Polish Edition) Click here if your download doesn"t start automatically Dolny Slask 1:300 000, mapa turystyczno-samochodowa: Plan Wroclawia

Bardziej szczegółowo

Nonlinear data assimilation for ocean applications

Nonlinear data assimilation for ocean applications Nonlinear data assimilation for ocean applications Peter Jan van Leeuwen, Manuel Pulido, Jacob Skauvold, Javier Amezcua, Polly Smith, Met Ades, Mengbin Zhu Comparing observations and models Ensure they

Bardziej szczegółowo

Wojewodztwo Koszalinskie: Obiekty i walory krajoznawcze (Inwentaryzacja krajoznawcza Polski) (Polish Edition)

Wojewodztwo Koszalinskie: Obiekty i walory krajoznawcze (Inwentaryzacja krajoznawcza Polski) (Polish Edition) Wojewodztwo Koszalinskie: Obiekty i walory krajoznawcze (Inwentaryzacja krajoznawcza Polski) (Polish Edition) Robert Respondowski Click here if your download doesn"t start automatically Wojewodztwo Koszalinskie:

Bardziej szczegółowo

Narzędzia wspomagające projektowanie UR SISO Design. step, bode, margin, rlocus lqr, lqreg kalman,...

Narzędzia wspomagające projektowanie UR SISO Design. step, bode, margin, rlocus lqr, lqreg kalman,... Narzędzia wspomagające projektowanie UR SISO Design Obiekt LTI (Linear Time-Invariant System) Linear Analysis Tools LTI Viewer step, impluse bode, nyquist pool/zero map... Matlab+Control+... Schemat pod

Bardziej szczegółowo

Steps to build a business Examples: Qualix Comergent

Steps to build a business Examples: Qualix Comergent How To Start a BUSINESS Agenda Steps to build a business Examples: Qualix Comergent 1 Idea The Idea is a Piece of a Company 4 2 The Idea is a Piece of a Company Investing_in_New_Ideas.wmv Finding_the_Problem_is_the_Hard_Part_Kevin

Bardziej szczegółowo

Domy inaczej pomyślane A different type of housing CEZARY SANKOWSKI

Domy inaczej pomyślane A different type of housing CEZARY SANKOWSKI Domy inaczej pomyślane A different type of housing CEZARY SANKOWSKI O tym, dlaczego warto budować pasywnie, komu budownictwo pasywne się opłaca, a kto się go boi, z architektem, Cezarym Sankowskim, rozmawia

Bardziej szczegółowo

Egzamin maturalny z języka angielskiego na poziomie dwujęzycznym Rozmowa wstępna (wyłącznie dla egzaminującego)

Egzamin maturalny z języka angielskiego na poziomie dwujęzycznym Rozmowa wstępna (wyłącznie dla egzaminującego) 112 Informator o egzaminie maturalnym z języka angielskiego od roku szkolnego 2014/2015 2.6.4. Część ustna. Przykładowe zestawy zadań Przykładowe pytania do rozmowy wstępnej Rozmowa wstępna (wyłącznie

Bardziej szczegółowo

Wroclaw, plan nowy: Nowe ulice, 1:22500, sygnalizacja swietlna, wysokosc wiaduktow : Debica = City plan (Polish Edition)

Wroclaw, plan nowy: Nowe ulice, 1:22500, sygnalizacja swietlna, wysokosc wiaduktow : Debica = City plan (Polish Edition) Wroclaw, plan nowy: Nowe ulice, 1:22500, sygnalizacja swietlna, wysokosc wiaduktow : Debica = City plan (Polish Edition) Wydawnictwo "Demart" s.c Click here if your download doesn"t start automatically

Bardziej szczegółowo

EGZAMIN MATURALNY Z JĘZYKA ANGIELSKIEGO POZIOM ROZSZERZONY MAJ 2010 CZĘŚĆ I. Czas pracy: 120 minut. Liczba punktów do uzyskania: 23 WPISUJE ZDAJĄCY

EGZAMIN MATURALNY Z JĘZYKA ANGIELSKIEGO POZIOM ROZSZERZONY MAJ 2010 CZĘŚĆ I. Czas pracy: 120 minut. Liczba punktów do uzyskania: 23 WPISUJE ZDAJĄCY Centralna Komisja Egzaminacyjna Arkusz zawiera informacje prawnie chronione do momentu rozpoczęcia egzaminu. Układ graficzny CKE 2010 KOD WPISUJE ZDAJĄCY PESEL Miejsce na naklejkę z kodem dysleksja EGZAMIN

Bardziej szczegółowo

Narzędzia wspomagające projektowanie - Matlab. PID Tunner. step, bode, margin, rlocus lqr, lqreg kalman,...

Narzędzia wspomagające projektowanie - Matlab. PID Tunner. step, bode, margin, rlocus lqr, lqreg kalman,... Narzędzia wspomagające projektowanie - Matlab Obiekt LTI (Linear Time-Invariant System) Schemat pod Simulinkiem SCDesign linearyzacja SCOptimization linearyzacja Linear Analysis Tools LTI Viewer step,

Bardziej szczegółowo

Miedzy legenda a historia: Szlakiem piastowskim z Poznania do Gniezna (Biblioteka Kroniki Wielkopolski) (Polish Edition)

Miedzy legenda a historia: Szlakiem piastowskim z Poznania do Gniezna (Biblioteka Kroniki Wielkopolski) (Polish Edition) Miedzy legenda a historia: Szlakiem piastowskim z Poznania do Gniezna (Biblioteka Kroniki Wielkopolski) (Polish Edition) Piotr Maluskiewicz Click here if your download doesn"t start automatically Miedzy

Bardziej szczegółowo

Marzec: food, advertising, shopping and services, verb patterns, adjectives and prepositions, complaints - writing

Marzec: food, advertising, shopping and services, verb patterns, adjectives and prepositions, complaints - writing Wymagania na podstawie Podstawy programowej kształcenia ogólnego dla szkoły podstawowej język obcy oraz polecanego podręcznika New Matura Success Intermediate * Cele z podstawy programowej: rozumienie

Bardziej szczegółowo

On the Use of Stochastic Optimization in Chemical and Process Engineering

On the Use of Stochastic Optimization in Chemical and Process Engineering Rzeszów University of Technology Faculty of Chemistry Department of Chemical and Process Engineering, Rzeszów,, Poland POLAND Warszawa Kraków Rzeszów On the Use of Stochastic Optimization in Chemical and

Bardziej szczegółowo

Zarządzanie sieciami telekomunikacyjnymi

Zarządzanie sieciami telekomunikacyjnymi SNMP Protocol The Simple Network Management Protocol (SNMP) is an application layer protocol that facilitates the exchange of management information between network devices. It is part of the Transmission

Bardziej szczegółowo

PSB dla masazystow. Praca Zbiorowa. Click here if your download doesn"t start automatically

PSB dla masazystow. Praca Zbiorowa. Click here if your download doesnt start automatically PSB dla masazystow Praca Zbiorowa Click here if your download doesn"t start automatically PSB dla masazystow Praca Zbiorowa PSB dla masazystow Praca Zbiorowa Podrecznik wydany w formie kieszonkowego przewodnika,

Bardziej szczegółowo

GRY EDUKACYJNE I ICH MOŻLIWOŚCI DZIĘKI INTERNETOWI DZIŚ I JUTRO. Internet Rzeczy w wyobraźni gracza komputerowego

GRY EDUKACYJNE I ICH MOŻLIWOŚCI DZIĘKI INTERNETOWI DZIŚ I JUTRO. Internet Rzeczy w wyobraźni gracza komputerowego GRY EDUKACYJNE I ICH MOŻLIWOŚCI DZIĘKI INTERNETOWI DZIŚ I JUTRO Internet Rzeczy w wyobraźni gracza komputerowego NAUKA PRZEZ ZABAWĘ Strategia nauczania: Planowe, Zorganizowane Lub zainicjowane przez nauczyciela

Bardziej szczegółowo

Jak zasada Pareto może pomóc Ci w nauce języków obcych?

Jak zasada Pareto może pomóc Ci w nauce języków obcych? Jak zasada Pareto może pomóc Ci w nauce języków obcych? Artykuł pobrano ze strony eioba.pl Pokazuje, jak zastosowanie zasady Pareto może usprawnić Twoją naukę angielskiego. Słynna zasada Pareto mówi o

Bardziej szczegółowo

Poland) Wydawnictwo "Gea" (Warsaw. Click here if your download doesn"t start automatically

Poland) Wydawnictwo Gea (Warsaw. Click here if your download doesnt start automatically Suwalski Park Krajobrazowy i okolice 1:50 000, mapa turystyczno-krajoznawcza =: Suwalki Landscape Park, tourist map = Suwalki Naturpark,... narodowe i krajobrazowe) (Polish Edition) Click here if your

Bardziej szczegółowo

Wojewodztwo Koszalinskie: Obiekty i walory krajoznawcze (Inwentaryzacja krajoznawcza Polski) (Polish Edition)

Wojewodztwo Koszalinskie: Obiekty i walory krajoznawcze (Inwentaryzacja krajoznawcza Polski) (Polish Edition) Wojewodztwo Koszalinskie: Obiekty i walory krajoznawcze (Inwentaryzacja krajoznawcza Polski) (Polish Edition) Robert Respondowski Click here if your download doesn"t start automatically Wojewodztwo Koszalinskie:

Bardziej szczegółowo

METHOD 2 -DIAGNOSTIC OUTSIDE

METHOD 2 -DIAGNOSTIC OUTSIDE VW MOTOMETER BOSCH METHOD 1 - OBD 2 METHOD 2 -DIAGNOSTIC OUTSIDE AFTER OPERATION YOU MUST DISCONECT ACU OR REMOVE FUSE FOR RESTART ODOMETER PO ZROBIENIU LICZNIKA ZDJĄĆ KLEMĘ LUB WYJĄĆ 2 BEZPIECZNIKI OD

Bardziej szczegółowo

Zdecyduj: Czy to jest rzeczywiście prześladowanie? Czasem coś WYDAJE SIĘ złośliwe, ale wcale takie nie jest.

Zdecyduj: Czy to jest rzeczywiście prześladowanie? Czasem coś WYDAJE SIĘ złośliwe, ale wcale takie nie jest. Zdecyduj: Czy to jest rzeczywiście prześladowanie? Czasem coś WYDAJE SIĘ złośliwe, ale wcale takie nie jest. Miłe przezwiska? Nie wszystkie przezwiska są obraźliwe. Wiele przezwisk świadczy o tym, że osoba,

Bardziej szczegółowo

TEORIA CZASU PRESENT SIMPLE I PRESENT CONTINUOUS

TEORIA CZASU PRESENT SIMPLE I PRESENT CONTINUOUS TEORIA CZASU PRESENT SIMPLE I PRESENT CONTINUOUS Present Simple-czas teraźniejszy prosty Present Continuous-czas teraźniejszy ciągły UŻYWAMY: gdy mówimy o rutynie gdy mówimy o harmonogramach gdy mówimy

Bardziej szczegółowo

TEORIA CZASU FUTURE SIMPLE, PRESENT SIMPLE I CONTINOUS ODNOSZĄCYCH SIĘ DO PRZYSZŁOŚCI ORAZ WYRAŻEŃ BE GOING TO ORAZ BE TO DO SOMETHING

TEORIA CZASU FUTURE SIMPLE, PRESENT SIMPLE I CONTINOUS ODNOSZĄCYCH SIĘ DO PRZYSZŁOŚCI ORAZ WYRAŻEŃ BE GOING TO ORAZ BE TO DO SOMETHING TEORIA CZASU FUTURE SIMPLE, PRESENT SIMPLE I CONTINOUS ODNOSZĄCYCH SIĘ DO PRZYSZŁOŚCI ORAZ WYRAŻEŃ BE GOING TO ORAZ BE TO DO SOMETHING Future Simple-czas przyszły prosty Be going to- zamierzenia, plany

Bardziej szczegółowo

Assignment 3.1 (SA and LA)

Assignment 3.1 (SA and LA) AW 11 Name u J2T0v1V9g XKru_ttaF NS^oQfOtXwUagr\eu GLHLOC^.F C XAqldlO FrEiIg\hXtHsO xrcewsqeprtvheed`. Assignment 3.1 (SA and LA) Find the area of each. 1) 11 m 2) 6.5 m 5 km 2.9 km 3) 4) 4.3 ft 6.2 km

Bardziej szczegółowo

Agnostic Learning and VC dimension

Agnostic Learning and VC dimension Agnostic Learning and VC dimension Machine Learning Spring 2019 The slides are based on Vivek Srikumar s 1 This Lecture Agnostic Learning What if I cannot guarantee zero training error? Can we still get

Bardziej szczegółowo

Dolny Slask 1: , mapa turystycznosamochodowa: Plan Wroclawia (Polish Edition)

Dolny Slask 1: , mapa turystycznosamochodowa: Plan Wroclawia (Polish Edition) Dolny Slask 1:300 000, mapa turystycznosamochodowa: Plan Wroclawia (Polish Edition) Click here if your download doesn"t start automatically Dolny Slask 1:300 000, mapa turystyczno-samochodowa: Plan Wroclawia

Bardziej szczegółowo

Sargent Opens Sonairte Farmers' Market

Sargent Opens Sonairte Farmers' Market Sargent Opens Sonairte Farmers' Market 31 March, 2008 1V8VIZSV7EVKIRX8(1MRMWXIVSJ7XEXIEXXLI(ITEVXQIRXSJ%KVMGYPXYVI *MWLIVMIWERH*SSHTIVJSVQIHXLISJJMGMEPSTIRMRKSJXLI7SREMVXI*EVQIVW 1EVOIXMR0E]XS[R'S1IEXL

Bardziej szczegółowo

European Crime Prevention Award (ECPA) Annex I - new version 2014

European Crime Prevention Award (ECPA) Annex I - new version 2014 European Crime Prevention Award (ECPA) Annex I - new version 2014 Załącznik nr 1 General information (Informacje ogólne) 1. Please specify your country. (Kraj pochodzenia:) 2. Is this your country s ECPA

Bardziej szczegółowo

Hakin9 Spam Kings FREEDOMTECHNOLOGYSERVICES.CO.UK

Hakin9 Spam Kings FREEDOMTECHNOLOGYSERVICES.CO.UK Hakin9 Spam Kings FREEDOMTECHNOLOGYSERVICES.CO.UK Hi, I m an associate editor at Hakin9 magazine. I came across your blog and think you would make a great author, do you have anything you would like to

Bardziej szczegółowo

Lecture 20. Fraunhofer Diffraction - Transforms

Lecture 20. Fraunhofer Diffraction - Transforms Lecture 20 Fraunhofer Diffraction - Transforms Fraunhofer Diffraction U P ' &iku 0 2 zz ) e ik PS m A exp ' &iku 0 2 zz ) e ik PS exp ik 2z a [x 2 m % y 2 m ]. m A exp ik 2z a & x m ) 2 % (y 0 & y m )

Bardziej szczegółowo

Wymagania na podstawie Podstawy programowej kształcenia ogólnego dla szkoły podstawowej język obcy oraz polecanego podręcznika New Exam Challanges 4 *, wyd. Pearson Cele z podstawy programowej: rozumienie

Bardziej szczegółowo