Linear Classification and Logistic Regression Pascal Fua IC-CVLab 1
<latexit sha1_base64="qq3/volo5ub4xouhnmbrwbu7wn8=">aaagcxicbdtdbtmwfafwdgxlhk8orha31ibqycvkdgpshdqxtwotng2pxtvqujmok1qlky5xllzrnobbediegwcap4votk2kqkf+/y/tnphdschtadu/giv3vtea99cfma8fpx7ytlxx7ckns4sylo3doom7jguhj1hxchmy/irhrlgh67lxb5x3blis8jjqynmedqujiu5zsqqagrx+yjcfpcrydusshmzeluzsg7tttiew5khhcuzm5rv0gn1unw6zl3gbzlpr3liwncyr6aaqinx4wnc/rpg6ix5szd86agoftuu0g/krjxdarph62enthdey3zn/+mi5zknou2ap+tclvhob9sxhwvhaqketnde7geqjp21zvjsfrcnkfhtejoz23vq97elxjlpbtmxpl6qxtl1sgfv1ptpy/yq9mgacrzkgje0hjj2rq7vtywnishnnkzsqekucnlblrarlh8x8szxolrrxkb8n6o4kmo/e7siisnozcfvsedlol60a/j8nmul/gby8mmssrfr2it8lkyxr9dirxxngzthtbaejv Linear Classification y(x; w) = 1 if w T x + w 0 1 otherwise. 0, w = [w 1,...,w n ] w = [w 0,w 1,...,w n ] 2
<latexit sha1_base64="kygw32pos1dfqilziwku7emswic=">aaag8xicfztdbtmwfidtwcoifxtciw4sjqyx0jtubm6qjsy0nrexphbtvhev7titttijymdtlevbuepc8hk8bm+dk3baqccsnt3x9/neoucxtukhtof9qi3cur1yv7n01713/8hdr8srj09unkwmt1gcxmmhesvdefgwfjrknstlrnkqt+n5tsnbfzxvio6aeplwnisdsascew2m+ss/meudeeukfinoo3drzb3t8uu0ht4hhonorgmuy/qkjfoewthfoegjy6dwkeplukl4skbr5yik/lkgumkxxsytac+wppe4r0ihpetiny9nisarjqf3nily+0hlauay3yxczcp/asf95vvv <latexit sha1_base64="rrwoitkmjgdqscqkwhjxgrxzsra=">aaags3icfdtdbtm <latexit sha1_base64="kg9pconuuudtvxu+tjykjge44vm=">aaaeihicfznpb9mwgie9l <latexit sha1_base64="qrjfeiqscdqxz4srnutfjjul0wk=">aaaeihicfzpdbtmwfic9lcaop+vgcjcwhrjcucw7gdxnk5o46rhsu3aqq8pxndra7es20x 3 Distance to the Decision Boundary w T x kwk y(x) kwk w 0 kwk y(x) =w T x + w 0 y(x) kwk = wt x kwk + w 0 kwk < Signed distance to the decision boundary.
What about Data that is NOT Linearly Separable? 4 Map to a higher dimensional space in which it is!
What about Data that is NOT Linearly Separable? This is possible surprisingly often. This is the basis of many ML algorithms. 5
Polynomial Features 6
Wikipedia Dimension d Dimension D >> d Machine learning and data mining. https://en.wikipedia.org/wiki/machine_learning 7
8 Mapping to a Higher Dimension How to define f? How to find the hyperplane?
9 Deep Learning High dimensional feature vector
<latexit sha1_base64="/e1iuaiazzgzwfiryssh1hy0zlo=">aaagy3icfdtdbtmwgibhb7ayon62wreiywjc2hbmzu7gznjeqcymdqypxts1xeu4tmotdilbwvtfuq6uhlo4bi6a+8dtiucrxyk1+utntdq4aois4druaj+xlu/cxancw71fxxvw8nhj9y3nm53miri2tznudqoiwcilaxtuetbnfcmisfgnukxpvxpfloapbjljxvqcxjjhnbjjlwbrnt+r4sz24qejcgfv47fy17kyfhlfky9o8lzdv2j5wxggd7azykrfh6xv1xzrswo7gzcfttd8ob1srdzzw5tmgklde6j1z6tlpl8qzthnwfn1c80yqi9jzhp2leqw3s9m91biv3ylxfgq7esapfv994ycck0nircligaof226ejp1cho97xdczrlhkl5/ujqn2kr4ule45iprk0zsqkji9rtioiskugo3s1r1pzj7m4o17yu/z0wrk6rxhu9ulmi4tdcx+2+m020hl39co/0v1dy22ez9lktmenfzuvinsvcnwxaerams7bexbertiygm7u/fkhu1/jyizgh0q+chjh8bp3l8gpix4y3glcfbwnuod4b3ho8c7zp+dvx8ukelbqefca4qaa8cp8cp42eigtajoggekrpewgphh8 <latexit sha1_base64="chc5mer7qbbr5ec0tb1jgbqaxou=">aaahhnicfzrbb9mwfidtwegotw3e4owifqmxquonofeyaajmy5n2qaxrp6aqhmdjrcvofls3ovwx/glvvmkvwelbta7hory68ff5+hjcu3fihbttx6w1bw/xh5u3hleeph32/mxm1streq0stjo4cqok7sjbqspju1izknacemtckltc20bow0oscbrxkzmjszehgfofyirvu2+r9n1xsub5sivh9afjkhund6fbucqjyd6bauq4poyzhq8kejfqqxkqo559yfdrc/87ktnxlpxqvzarvb2gzn/1uztbeccqqxp7smljlsqu3wkcbhevykc5r8baa1cpakxoc0b9vpm0ib88kaaccjqkbmtwajheqqgvem8xvah4jpuub7wi91kkc7wyzeqf2v1mo5u79qwoypgewq7ihof5yckvlqkseo4t75n22tvctmt28yaz1gfbtjv7lntb668dl8idpja9mh+nbseym6jeuhyqjm5akbjhwxsqjgo5ykr006l8gbxtlr74uak+xelrutwjruyicxovmrderlk88s7wguj/uzelpb5iwvf0ih8q5rxnzxj4dlbnhku4owqugpsoqvjvwkhn+uluyhjyphjfxhnzo+rd6qakygicqcufzm4e3sdsph Perceptron Minimize: NX E(w) = (w T x n )t n n=1 Center the x n s so that w 0 = 0. Set w 1 to 0. Iteratively, pick a random index n. If x n is correctly classified, do nothing. Otherwise, w t+1 = w t + t n x n. 10
<latexit sha1_base64="tdes/naocamykpu3cy8uq6pgwka=">aaagxnicbdtbbtmwgafwflyyammbxjc4weygguau7alshdqxtwotng3urh11qrzhsa3ftpc4a6iob3hlbjwad4htvwxfjawojn//z3aayp4k5ply3v+te/exltspvh7ajx6vpllb33h6niv5slmxjngs9n2sszhl1lvcxaw/srkrfsx6/uv+471rlmy8kr1vtthqkejykfoi9nbo/q/2wcrlxa4ksvnsvq3tcgv7bdpfwpe4ybx2p/ub9bp91beowahwhex51aykrmhci9tdtud4svhxseb4kifb3tm+d27vcu0ru0iuwrtog2gm7hfev/jejvk65rlz9lsyywc+jr5jetrwji6w7851s7mbt12mvg3hcfsvhm5ctfhto43wn13hntvkdrx5z9oat7prxti3hcq0f0wqgpmsg3jura31fhsnmattngdsquglidhadyurlbtws9dso1d6jebhkuplkjqbvvtrezflpfb1uha1zhatgfyfdxivfhhwx Linear Classification y(x; w) = 1 if wt x 0, 1 otherwise. x = [1,x 1,...,x n ] 11
<latexit sha1_base64="pbzfmmjup+5namg/ftocyz22dqw=">aaagbhicfdrva9nahmdx23r11n+b+ksgcdiuoczorvdwyxcouchmhhxt6hxjcrmkx3jjvluskyevxlfju33om/a1eo0q+utvltdy4z7fpmlbe+sjsq7r+dkze+pmxo3w/o36nbv37j9ywhx4zlpccnkswzkztscttfqqw065rhzyi7koetkozjzh3j6xxqospxtdxpy0j1mvkcgdxzpdemeizpakozyuurymkbnh5ijl+2qfn6yu2zeuxfilrebldgg5sd4yhxqpzcmwtcbhweni3bmwzqlqmnui4dz2m43c9upunbkjroqssdln4ozhsuvhlgtpe+x4ksv63k+e1n+u/6sojlf/papk2tqhdnypuevbarstxmxdwkvve6vk88ljvfz+ufqk1gv0tgu0veyklwz9wivr/l6p6hpdhfmbw2cfph8wi/f8dt/l0ncxmdwscrnrflh5z4vz2mi6lltpn9bp/wutin02/r4m0enex52ubksq2wghgqdcqqp6nav Pacman Apprenticeship Examples are state s. Correct actions are those taken by experts. Feature vectors defined over pairs f(a,s). Score of a pairs taken to be w. f(a,s). Adjust w so that 8a, w (a,s) w (a, s) when a* is the correct action for state s. http://ai.berkeley.edu/project_overview.html 12
Perceptron Ambiguities Two different solutions among infinitely many. The perceptron has no way to favor one over the other. 13
Logistic Regression 14
<latexit sha1_base64="vsuhinusqzc9fjndsky+d1zfd1w=">aaaf7hicbdrnb9mwgadwd6xjljd1corissiniu3plnabtuzt2krnq2rxjqzutuo01minspy1wdrvwq1x5ttx46pgdbxw9mfsokf+/f0ijzjpipvzz/tzw7u3xt+4v/mg8fdr4ydbze2nl1mcp0j2rrzfaz+ztebkyk5vnpl9jjvm80j2+pvh5b0bmwyqnh1bjhko2dioualmxdeoeetzovamvfzqdsvnjyzflbrv7pp89oq+oz6ffum494y+ptorr99tr0vzlqmxysijeemyojamkokjbblci/lcuhnbp0kac8zvpgyxr93c0gr/lho1d7w9b9eoltrlyocs28voe/2zh8qi19jyebktdnpeyocls60skdu7n2cyyekajexalyzpmq3lxshn6uvxe9awtt1jlf30/juizdrlcs1dujm7yvat6vyfdxibvh2wyis5lubclrtmizoptpwgkpxcrourmeiv2ysve5yyyd13afhgtkwsnxmn4/oj+cabln61ca/lo/l8xy+bhym/ax6c/bt4kfio8a7ylvau8h7whvi+8d7yk+bxzhsrcqysdm3agxpkarhahgqgekbagijaiajj4gpke+at5eqbgeibdvwjj4hhyc1wizwhnio/ax6dfap8inwgfia8af5upwei6po/acgi8nx1bn0ga2cocagdh/pqnmqv3j24unzfa7v60/7owyflvbrjnpmxzje0yrtyqd6sc9ilgvyqbds2as26qx+tf6t/v4uu1zzjnhhq6j9+a633gd8=</latexit> <latexit sha1_base64="jd7jbjiqs498zhh4pun0yez8/pc=">aaafynicbdtratswfazgtuu2luu29moxg2fwbrsq9m7wy0eoxqsthcrnutguwzytuckylpzugl/abvd0e4y9weq0sb2fcqyh8/2wbgnoukhhro//2tl9mhg+fbb3fprif/ty1eud/rujq5lxkgmpy1lcdzci56evvvjzuxkqesmnyf248+mkl0bofglrgseklnkrcuata13fhrz5x/5mebgitsur2a67w8h3knwsujy3tfjj5off2lihprvm8nyuvyyxln3tbz+7mqekm7jzpgfrfxcd1mt06a7cepvuv3c0vbltq8qlfbvl07eu+t+bvzy7irurf5xloxs8kkukz7xxvbsxipizk2txufyk96wew9ksmus+zsjk+zpppwienlfy2s79uim6q5ksow3bnp8bp0n+dvwc+qxwc+qt4bpkifaq+rt4fpkm+az5lfbb56negoiertskwbpkddhdnqygkkjalofahgil4avks+bl5ekageabbvwh18a1cgvciq+av8hxwffi18dxyb+apycvgdfdtwac6upvgfhzxpv3ujcwcikcyxgyt60brkf/9odi5tnx4opvptkjb8l78pee5dp5qr6saxisrllyg/wc/b6+gb57hfq7o9vpdujagnp/agsf7k0=</latexit> sha1_base64="litsnoieonisodnkty4wotywuvy=">aaagahicbdrbb9mwfadwb7yywocn6y5pg6awosnlahixrdwntdo0phxtqkvkczzwwpxksbm2hhwervdz+cdccdoo5hpllz78+/slssv7acsl8rxfzr37a+unbxsp3uebj5883drevjbjnlhwpumuzh2fsbbxmhuvvxhrpxkjwo9yz7/q1n67yznksxyuipqnbrnhposukd012nyq9tmyxyvxtpcvrhjxxugpwttjowaifswaqkmnqsahqmdzdmroj6amrig77mguy5jlphhvuuut+7mwvarm0jrlzrpkm1h0rm34bnmc/b5ea8yu/b06rbopsnqfomwildo6d9rr6rwt2yataigtqm6zd/g6j4ho+mmsbh4kbzoq5rgrgbuyxchdd1tm3h7wagvp2/fma+yivsz20hkcjbbxvuagoblgsairkxlq9li1lemmoi3qhcwlswm9imm20gvmbjpdcn5wfbzqmwgesaz/syl57n0vjrfsfslxsuhurk5apfk/g+qqfdcsezzmisv08aawj0dvzn3wepcmuruvuia04/pdgu6i3molz9pfmzvsraiidwb7b9xag5a4fogflgdvtekhhh9afmt4kexhhh9bfm74uevdw7uw9wzvwd43vg/5pegx2t2vbdesxorgg+5btg2nlgebeqisqbgagdakja0fwz4xfgi550aawwfhula8mtyxxbmulm8nzy2/mfzg8qnhu8tnhs8slwwv6j+bercn/wkuroxpagdxygzoredhdhsq+jzqr949dnhxzr+t688e2ka7abc1uru9rr/qj3sguog6y+e788p5uf67sdpyxdxb95zlbfymgapx/a8yikrg</latexit> sha1_base64="pvvvsgkcrw7llx51ucw1dwzrc2g=">aaagc3icbdtdbtmwfafwzgxlhk8nlndztafqeuwtnybxm62axiztgtk6dtrv5thoay2os9hzg0iegvt4nh6ee5y2g7nguquj//4+sezkfhixqzrnx+7kvdw12v31b97dr4+fpn3yfhyhrzys2ieiemnpx5jglkydxvree0lkmfcj2vwv2pv3b2gqmyjpvz7qacejmiwmykwnhpuug3w6ynhbfoxsky09vfxqptarwrraxk4oycj0negrqhiw7ubmzaxqjbv4ix70oszpivpxpzfxkt9twctaoelsmdvvuke8pwzbn5groi/wbhdy4o/qaz1nlnavoddfpeiqplctqdy2bps0yrk0yqrnlqdrdac644tpud/hozwqzhgtghmixshdd5tp3h7wcgonuducdbcl1qlycrbjbli5+gufgmscxopewmp+q5moqyftxuhu7wamayljfr7rvi5jzkkcflozkuglngkgfkn+xqpms3dxfjhlmxnfjzlwy7ls1et/rj+p8mogyhgskrqt+ypclak9m9xbq8bsslsu6wktlol3btlgeo+vpk8pxxrcbody7wzyd8p+c1cg6if+wbyu5zifgn5o+zhhr5yfg35s+bnh55z3do9y3jw8a3np8j7ll4zfavewethiykudb7hvotgcwb4ericwamfobeirmdj8zpny8lhljbkbzgw44dxyybiwxbmulm8mzyy/mfzg8onhe8unhk8tzw3pqz+beecn/wier8xpcgd+ygzoredbdltl6jzqld89dnhxbrel68/nnb39xb207mw5207datnvnt3nk3pmdbzijtzv7g/359rv2lztu/zihl1xf2ueo8aovf0dl7rfpw==</latexit> sha1_base64="csak9axmy8ak8uohna7rwg/1ezy=">aaagc3icbdtdbtmwfafwblyywtcgl7s52gc1ckz2nybxm62axiztgtk6dtrv5thoay2os9hzg0iegvt4nh6ee5y2g7nguquj//4+sezkfhixqzrnx+69+yurtqdrd71hj588fba+8fxciiwltenejnkejywnwew7iqmi9pkuyu5htotftsvv3tbumhgfqzyha45hmqszwuppdtdcb/l0xokckcrzv1p6qkqgs2eisiiaif1rwfdoazaivcus2ohmybjugcvwfj3odyztfodvsi+vi3/agneacjkkyqqrpi54e9icbzajqb/hlsdkwd+h0zqbkkerqggkszfufabau61t3dzplhnplfwbhudxgq50xhftor6pcyohzsjamfiqjyo77zafup3g4fp2c6c5g2axruwx7szg2xbj5qskbmk4jrwjsjt9vjnrgwknipgo2sfm0gstkzyifv3gmfm5kgznvcirprnakfl9ixxmzu+ukdcxmue+tnksxnlzqsn/wt9t4ydbweikuzqm8wefwqr6n6udh4cllkgo1wumkdpvcmsm9r4rfz4eiumecm6x3hnkh5t95qba1up8sdgoyyu/npzq8ipdjyw/nvzy8npdzy3vgn6xvgt41/ke4t3llw2/1o4tjbcrwfyh33dfcmi4stwijebgbclqcirwygt4ypkx4wplgtmczapww7nlwnbhutjcwz4znll+y/in5rpdj5zpdz9anhuev38ci8bp/wuijort5q78xaycwig2gwix1w3uwr577ojid6el68+723v7i3tpzdl0tpy603leo3voj+fm6tjehbnf3r/uz9xftc3avu3lphrpxax54rij9u4pmfrfqq==</latexit> <latexit sha1_base64="htyjykboisytpkesmnuqqwv16qq=">aaafnnicbdtdbtmwfadwbywwylchl9xyteibyfo6g7gbtvrjbflhjrvrr1mqx3faa3esju7tkmpl8dtcwlvwnrhdjtg9sxtp5pz+thurshshmtw2/wdj896wdf/b9spao8dpnj6r7zy/sqms4alhoybkbi5lrsbd0dnsb2iqj4ipnxb996a98p5mjkmmwq4uyjfsbbjkx3kmtwtcf7u/3/zyph1b80rqqzvfrupo9+gh6psi/96l5pw+ofny3muo6w37wf4oiovwqmiq1bgy72x9c7yiz0qemgcstycto9ajkiva8kbunsdlrcz4dzuiosldpkq6kpefvdhxpunrp0rme2q67p4/o2qqtqvlmqriepqu26j5lw0z7b8fltkmmy1cfrurnwvur3rxrtstiea6kezbudkwysmfsorxbu6y5oqi55fslprkxz2uhvaodbabuh55xfvrfgl8bpkp8fpkz8dpkhebd5h3gpeq94h3kq+ad5bfa782xltlmjbgaauxuiuca+fipq8epbtwfrdwuwacfij8cnykxeoqkciggcvkefaiuqaukwfam+qz4dpkofac+rz4hhkbvfj8bccgzv8foavk8/uvvacgoijqhof2vznbqlv+9+di6vcgzerlw8brp9w9te1ekldkl7tio3jevpal0ioc/ca/ys/y26lw 15 Reintroducing Probabilities y(x) =w T x + w 0 > 0 assigns a label to each x but not a probability. We would like a soft version such that y(x) p(c 1 x), / p(x C 1)p(C 1 ) p(x). (Bayes rule) > We write y(x) =f(w T x + w 0 ). How should we choose f?
16 Brightness as a Feature Some algorithm brightness +1 (salmon) brightness -1 (sea bass)
<latexit sha1_base64="cqdm0wwhiibbu9wj1cr6p77p6ye=">aaahjhicfztdbtmwfidtwcoip9vgbokbiw20oalkdgm3kybgndzpy0jr2lfxlem4qbxydo7th4vi3mkd8dtcis644sl4ajyue7ges6l21n93to6d1ega0ex53s/a3i2b8/vbc7fdo3fv3v9cwn5wmolcytleihgyhacmjjstpqiqie1uesschlsc852ktwzezltwezvoszehmnoiyqt0vg/pn+sc8pbwbfboghcg+gqoisinpayzus7sulaaa6me31a9dtz7hq9gqj4afvf1vakolxaiuqggvpvbcybbsujprbcgjka8qfkicvngparuvwcl+aa8m+brcgqoyyjv6fwilw3pnajsk80qxyweh9os+oekcv81xnh1sayoq4qa1xrnbfmxbqmpm/larzdactzxg72lfa/htqawa38ardjtcdxbnn8eq4fzpvcjj7qjju+lqqs7ubqnphrhnpeu4xmuk44oowik6xatx1ocp3ombjfuihj6nyez/2yuigxzmaxa1l33s1lwtv7formkxnylytncey4vbhtlcvacvm8ahfqsrjkxdhcwvpckcb9jhjv+i1z4mui1shko675niurkyocfrdjmaftqtcvwo4queym/fhx0pzgjsdymn9cobkkwq5uc7pyfrhygcirdsnrdymkqc8aqfhfgsft2vo5uicacyfcmvmfxfypvw/za4acwpzh4icwbbm9avgxwlsxbbm9b/mzgz9ugzrjimjbvitb4yhfscgzxmdse0bkiybais4gnhlu8b/c+xsk1bgojzodm4slgwulk4mriucfziw8mprd40obdi48mprl42obj61/cjv4kgcxf0wwfdmgkh5awywo7 Brightness as a Feature Some algorithm brightness 1+1 (salmon) 0 (sea bass) Given the training set brightness {(x n,t n ) 1applenappleN } with tn = 1 t n = 0 if salmon, if seabass, estimate p(t = 1 or 0 x = b). 17
<latexit sha1_base64="szjxtdb8exyz4qzrc+0vnfad6mk=">aaahthicnztdbtmwfidtwugip9vghokbiwnu7k/jlocbsrnjgpu0uar17vrxlem4qbxycymztspyw1vyclwfttqour4twlkto/n9pqdxhdtrqbnhwt8r1qcpfx7vfh+bt54+e760vplipantgjmwdomw7jgoiqhlpcwocegniglitkdazuv+wdtxje5oym/eoci9hnxopyqrkkn+suuhdihpeyyc6vp13gzwrz/v2tcjotsnyl7bbdclec6a9tj1xeigmhqnfayaoycw0p/rnwdga9wm1oxyn4unde08szf0flbjrp/rcnn5ugeohcgoqm+hosf3flmzmtnhkbtsmzjlkvy2bqne/6/ojvnjhpom1vhdhis7sz3vl69z21y5gb7y02dnmi5mf2vhfbohthnhagcosbq2fylehmjbcubye6yjirc+rd7pypajrpjevp6whlyvgrd4ysx/xiaye3tfhlisjjkjtybeijlnrfiu1k2f97gxur6lgna8aeslarahki4ecglmsajgmka4pvk/ajxacjufpkam/ezks8tkrnb9gpeyitbezyck5y6ncvlsptwsovteymeijp4myncgtxk+r2glwu5wmniqz7dyacfjdvlcncenqxwyhurrhc5b3rv6u8erbzufkvxq40ckp9l4sckpnx6m8dontxte0nhb4w2ndxte0fifwi+kdzozkgigrykjcefjwofy466rck4mej4iejrgk9zx+edha41tqghue5jcmczdhycafwoxgk8vnmr8sufxgh8qfkjxkcjhgh8rfkx9jez0j4brkj3ov2anqncicfuqsc8fezna81enhpzvbnsy/raztvdpeq0ugq+nn0bdsi0pxp7xxwgalqnxflwxqq+qq7x3nvjdntjrq5xpmpegmmr8n+ovty8=</latexit> <latexit sha1_base64="zpvnvxpdokii0toosaq4gwnixb0=">aaagkxicfdtvt9nagafwa2xi/axyyvjmipemfbiae/ufcrejggexywxkt+d1et0u9nqmvy4ttf89/wf/b9/qa69bdt57keu2phk+37u2z7k6cabs02z+mju/dxuhdmfxbv3e/qcphy0tpz5noywrsiwiieo6lk9loelzmsoeshmnkms3kg33yqf09lamqyrcezoozu/zfqh8jbixrfolbyxvfc0bfj2u0s3k/isl3clo7tcxlmlr3njg6wvj9 Sigmoid Function P (t n =1 x n = b) = P (x n = b t n = 1)P (t n = 1) P (x n = b) P (x n = b t n = 1)P (t n = 1) = P (x n = b t n = 1)P (t n = 1) + P (x n = b t n = 0)P (t n = 0) 1 = s(a) = 1+ P (x n=b t n =0)P (t n =0) P (x n =b t n =1)P (t n =1) 1 1 + exp( a) = (a) s(a) Likelihood Prior with a = ln( P (x n = b t n = 0)P (t n = 0) P (x n = b t n = 1)P (t n = 1) ) 1 (a) = 1 + exp( a) @ @a = (1 ) a 18
<latexit sha1_base64="omroygomnyzcy7stf/ads2qsua8=">aaaggxicfdtva9nagmdxm3n1xl+bvhlfbifqio5kbxrlmjxjbrbzyv07mliul0t6lhcjl8vwevnv+b8ivtw/w2tx0afpxkdtw32+szudjsxturjp+750a/n2supo6l3n3v0hdx+trt8+lbjsm95hwzrpxkglngrfo0aylpdyzakmu94nz3en3r3guhczojgtna8ktzsibapglg3x2gfknsoe6ck7j+g71n12xwepagaxpqxqn8ddtr1+nfbda7wbv581wpfn636rbjnbyli24w16s8pfgz8fnsj8aa/xv54guczkyzvhks2kvu/lzlbrbqrlee0ezcfzys5pwvt2vftyyldnbr12x9ivyi0zbv/kulpvf8+oqcykiqxtkakzfys2xbzo+qwj3w4qoflscmwuviguu9dk7nqf3uhozkw6sqnlwtjf6rirtztm7g47wudu70xzi3vdzznx1gt6zrvqnug6ru29jcgr6xrtknsf0e7/cwur2gz2fkmiz4m8pqmcvbokpjsqhtvexttoopgly6skkqqcck/ue4n5em8c6pva95efad9afgj8epkj8bpkhead5f3gxeq94d3kz8dpphu0ufbquhsfehiinafnykmibbek4hgemqos4anyefarcifaifagguvkgfamuqfukjfas+qxwc+qxwk/rd4gpky+at5b/xj5/ddgnk2of68gj2bwhijdgozawd5m/cvhjx5otzz9o3/z2tj5mh+srpjn5dlpep+8itvke2mtdmhkg/lbfpjfjevgq+e1tq7sw0vzc54qcdte/wzwzlsb</latexit> <latexit sha1_base64="eaqva9z9oy6pf9vbuq+ypqyauey=">aaagqhicfdrbt9swfafwfebhmthgo027wendgazquwwxjeshgeiwtqk0qk4qx3fsi9ijhkdnlewd7rdvmqcuba9mwgr09h7/oimt2ksinupg41dtafnzsv356gv35dr6q9cbm29u0jhtlhvohmwq55gurvyyjuy6yr1emsk8ihw9u1bl3tftky/ltz4mbcbikhnakdgmndyyyi+fxbyk4qhck12cqrgnszkooef5/gnc2j0ojdo+qu0dpzrhzz+5uea7fad7yqynpbrkh8d2vgum2uuuztj/enbwy6tx0jgnzbfnebhlzed7ulnydvsxzqstmkyktfvnrqihbvga04ivls5slhb6r0lwn6ukgqwdyrzcjfpooj4kym 1 Parameter Model In theory: a = ln( P (x n = b t n = 0)P (t n = 0) P (x n = b t n = 1)P (t n = 1) ) In practice: These probabilities are hard to estimate. Introduce a parameterized model and choose the parameters that maximize its likelihood. Model: a x B y n = P (t n =1 x n = x) = (a) = (x B) 19
<latexit sha1_base64="hnh5li4qkqqwl+glkghelmsedzs=">aaagyhicfdrva9nahmdx23r11j/b9jh65haou7s2iiiimdbh3gbzwrp2nkvclpf0wo4s7i7bssjl8nx4vf+et30lxruk/vqbc7t8um83axolcfnuwtds/pybv3z9oxzj8wb91u07d5ewv+4d2awwxlr5lmamgziruqlf20mxim5ubfnhkjrhyebyo6fcwjnpqzfkrv+xrmtycub80md5fq1iw3gzbvvltffrsp/sd5qgtladtddg/u0ldqp9zk+9p436yhm12whodo <latexit sha1_base64="z7c6amlqww4cwmlfdro+e1vt9na=">aaags3icfdrva9nahmdx2zrzxn+dplkfbicwh4x0t/sjmdrl3gbzqrt2nlvclpf0wc4jl8u6egk+gp/qg/ef+dp8jj7wmlx0199cooxhfb5jmopgz2kra9f9vrr846a1cmv1tn3n7r37dxprd0/ytfcmd1kap6rv05zhiufdlxtm+5nivpox7/lnuzpvnxoviztp6gngh5jgiqgfo9osjrppvihd+rj Maximizing the Model Likelihood p(t =[t 1,...,t n ] x =[b 1,...,b n ]) = Y n y t n n (1 y n ) 1 t n ; (Naive Bayes) E(B) = ln(p(t x)), = X {t n ln y n +(1 t n )ln(1 y n )}. n de db = X n X (y n t n ). +1 (salmon) 0 (sea bass) B* s(x-b*) B = arg min E(B) B 20
<latexit sha1_base64="u8b5qlgetfg5o02gxzrcuffetgy=">aaagixicfdrva9nahmdx2zq649ttofgkoaqrgalp9kfqy9ygc0k7djrlxc6x9fgucxex/ihkdfjun+krexwgvhuvwuv//c0fwn7c55s0cftcihxa+p6pre07d51793ceua933uepn+ztnuu8viwpwz7mahxszvor8aerjuxjqneqw5spwqtu46m5v1rk2ccscj6vnmlelbg1dunscu/ap/tbw8ndzz0ckpvxue88dakclzjnhqvu60nhl8y0osoilvladurnc8quamindsyo5hpatfdzey/ssutfubkfzhjt6r9nvfrqvzkhlsu1m71pzejnniln/g5aiawodc/y9q/fzeqz3gse2oue4sykkztqpos9v4/nqklm2ffjbh+4frbf+/a6nwquqmnvqyqgkpf0wdtns4lxzxrbkli/oz3+f2qr2kz9viwrbsjvtqqgv1db8wbcsorvtesggv+wxeqarvuq9uqjp10hcrtapwj+hpwy+dhye+anyafab8ihwifir8bhymfax8gvgf80l2ijokcg6aoh8ba5a86qrxeiihtemqhifctae+qz4dpkqobaoeacl8hz4dlya9wgl4gxyofa58gxwbfil8cxyffav+hfik//boym1enmfwqfbn0udghqtyhdszuboycezt8cduz82sc75bl5tl6sdnll3pop5iwmcsmr+uk+ot+c787p6z13e2u9+e4tcdi/fgo8fzoh</latexit> sha1_base64="v+62nktsd0qly1tskucve3kostk=">aaahlxicpzrbt9swfidtdlawsqf7m/zigyykgijzy/acnmeqf4mbrglrxvwo46qwsvmldrsk8jp3sn+ylzlpyzyaiu2z1ojoff+olzm1n4x4qhznz63+ym7+zwphlf168c3bpewvxas0zhlkwjso4qtjkzrfxlkw4ipinwhcipai1vzu9krevmvjymn5qczd1hmkldzglcid6q/ujpzyygvoih7kzcie9yva30e45aegzdhw7gbc2mzbqmjua1ig3n8tsgu4trdtpu5sob1ultqsii4el8grbjapaj0umxbxv6/mmoyr30b/ontdal3v9dpzu1tgbajcz8makvg/q1ivdm9jjv37k52a/vkas+1ua5mbow3wrok476/mv8d+tdpbpkirsdou6wxvlyej4jrihy2zla0jvseh6+pqeshsxl51roe+6oypgjjrp6lqlx38rk5emo6fp01b1ccdzwxykdbnvpc1l3m5zbstddjrkevixahsm+tzhfevjxvaaml1whedeh2csjejjb8zvzeenei6z0owebunmzkmiw6fuah3fujpzfscyow9qko/ibohtvy9n1fepyinlrzvfzkut8dz8v2isg0s2r2nhsd642jvv+g6vakqvalkb4afgpwi8codhwn+bpblwc8n3gk8zfa24g2ddwdvgpwa8ovyggymagxivpaa9wxoaacg930g+iyqbeaidceepdt4apcbwtkhajceabgweax4bhafudj4bnhm8fvabw1+b/idwueajww+bnxs/eve6r+bkig/na0gtqbwygh7 sha1_base64="wivsrub7cdrumssvlzvvhf7cmmw=">aaahohicpzrbt9swfidtdlaw3wb7mvzigyy6nfcyl+0fcceqf4mbrglzxvwo46qwsvmldrsk8jp3sn+ylzlpyzyaiu2z1orofj/psv3x3jdiqxkcn7x6k7n5p42fz/bzfy9fvv5cenorxllcwyvguzx0pjkyievwulxfrdnmgbfexnre1u7j29cssxksz9v4yhqchjihnbklu/2lmsqec7nmscrduvby475eq5sipzwupdla3/6imlzxkbca+xowkpe3i1ibttnf6/cth9emkkutvurh98vgsdblvihuik2iq6txpyx+5wz0r0pxrdzvtz/p766dfabyp3drqod/rgk12r2nmfrvt3yc+osrzoztdwqg7jrysabjtl80/w77mc0ek4pgje27rjnuvzwkitoiftboujyk9iqerktdsqrle3l1mwr0qwd8fmsj/kifquz9gtkratownjyfuyn0lpxjh1g3u8hxxs7lmfnm0kmjiiuqilf5zzdpe0zvnnyboqnxa0v0qprxkn0zbfyn6b0k7ejxprmyhkg4wcsxsft1grv6byh+veapivzeijr6m6gvpnaq70cuusniysxhu0woyxpwvhy3kgwbs3zdyygi/ngxt1t0nd5uccob8j3a9wx+apibwq8bpzt4oednbm8b3jj4g/c2wtuadwx+cfhleuazbgegmsp4ghsgp4btg/s+ehxdcaigbiyqah4afad4wocca4ebggbcgdwgpda4alwzpam8m/g14ncgvwh8xuajwecghwm+nv4l4vipqemuh89weedqodke sha1_base64="4ny7wirajkcjqde6a1tejg4nlws=">aaahohicpzrbt9swfidtdlaw3wb7mvzigyykwlhdy/achgcii8rnorssrirhcvkl2kksb1pf+zl72g/zy5y0me7nkkzzanv0vs/n2k5rdxjyrdwbpyvvz3pzz2sll+yxr16/ebu49o4yidkyshanwijuucrhizespbgkwwcymylcklxd692ct29ynpbixqjxkpuecst3osvkp/plfyldfnczkzahcj23x32jvrcqtnggsh3u2fldgnvyjwnnpa1zlhk7eaeap+6gxspegtpcran1kujoyrkcynqrytf9sqlxdpwyh9gvmi3+dei65brk9jp9nqzyasr2c7+gev7pklbl3duyse/uzcegv7js3giwa5mbmw1wrok46y/nf8berfpbpkihszku0xyqxkzixwnichuncrssek0c1twhjiilvay8gtn6pdme8qnyf6rczfbhjiyijbklv5ucqeeyy4rky6ybkv9rl+nymcom6asrn4zirai4zsjjmamqhoua0jjrtsi6ipo4lb6mnv7g9f5idqzrng5ztfqur2eyxpr6jhk9twb/lqknrc7vrb39tdqxumvl9xokkbxxujlhvtzdxqm4brax57anjbulkrbe/7jy3cu7zd5u8esb8n3a9w1+cpihwy8apzl4beaxbm8b3jj4g/c2wtuadwx+bfhvcuazbgegmsq4glsgp4btg3seedxd8h0g+iyqab4yfad4wocca4ebggbcgdwcpdk4alwzpau8nfgn4dcgvwx81uajwecghwm+nv4l4uspqemyncxwemdqodae <latexit sha1_base64="8ofevy7+ai92vicmflv9gq1x5hw=">aaahgxicfztdttswfibtnjrw/zxtatqnntrugfqnn5uekbcsyidb2erpuv1vjuokfrftoq6lirix2dpsbtrtrvyge4w5pww7ncnsk5pzft7xt2pvgpfenxo/s3n37s6x7y3crzx4+ojxk+ri09mkthvllrphsep4jgerl6yluy5yz6gyev7e2t75bshbf0wlpjynejxkpufcyqnoitapfvux9ljizuyihsrvvikatz0vtiy20drcssr6euem6eiwstq299eo5hpouitfbvjq5m1tjuoi4wr+xmobjkrfi4qdrwjmn/pm38nr8hay6tzm53v3bdqt6plqcg+invmq3dyuql2xk+r8gdcmqhcdydk/nlm/utsonyyn2ye7dzacatvul84/x35mu8gkphfjkq7bgoperptmngj5bacjgxj6tklwnaekgiw9bliloxplmj4kymv+uqnj9t8egrfjmhaemqxrg2swfcmbwdfvwdtexuuw1uzsqxcfayr0jir9rt5xjopobajcftdjrxrazczo8xvu8dtm5qlyoan7ycgu0bfaztbroscxuzlbinek6dary2vrrp8te7poeta53qkiisjuvjjsvhpcridnzc08r1swzcmac0hmtmkvmxcbvakqtati9wdfs/g+4pswpwd8woingj9yvav4y+jtwnsw7wdesfgz4gffas0ybbjequab7 Computing the Derivatives We have: y n = (x B) dy n db = (x B)(1 (x B)) = y n (1 y n ) d ln(y n ) db = 1 dy n y n db d ln(1 y n ) db =(1 y n ) = 1 dy n 1 y n db = y n Therefore: E(B) = X n {t n ln y n +(1 t n )ln(1 y n )}. ) de db = X n {t n (y n 1) + (1 t n )y n }, = X n (y n t n ). 21
<latexit sha1_base64="cjq3woqyyfvhps76f18xrztgqtm=">aaagg3icfzrpb9mwfmcz0cii/zy4crfyh9qxvkkvikghawmsl40hdwunulso67br4irznlwryffgwlfhwggeocfx4ntgz9mom22w0jz598vl86tjj/tcifvw34xfw6xy7ttld8179x88fls88vgocmkgyseovib1hbqrz/xjixe5rzohi4g6hmk7jzukt88ii9zab/ekjd2krr47ddhicqq/urlx4fucpctihulj9yfhiadsxuderhrnuyhvcenf5k2m61+ymrrwjnyr9llfmxyaofqar8ddveaqzxxh7kapaukfthbakfihajrttgv1cmeo2llqeufjnqi/1vhxgbc03iowgdqbfri4gnckfkzxcyfzjfmcp9p4wyfpnd4p8kngpwweadwpnidu/xcw8st+faa6pwt7bwfhf3au4jcr6wty6ipgulkemmfvin0bfaky5howwxowks8ahwxhiakzvuw5tdytpdfmgxdgfjrz0kylmt9rsahgaywgkjnbvfs3uuiiwgkpsa7h3jvtvngyhyttsfohke+hcsyyaoojkqobnbi3qhuwhkzghbgqaird6awj6lurapdlakivhvzwxlozgyyksqkzjypjkesycbvttf6vvkmanget5nickkk1a+bmryvyu3xsqqblfu36vq1i9pdxryavdvam7dxynfjx0f/+dqcbjinxufyyo6hrwyhvyeocu9gjspw4irkeezqixrn6ijkoj7kzlqvrcmyahggtl89bnjv7hea0ihlqsfptlmieqcmrwdfmw1c94fphzimpz180jd3aa6aosjbwgchcs2sasoyniweei/k/chmcqiby80subkfnhm017a/n1a3tvb1lxlpjmve1boolsww8mw6mqwoxppe+lr6xfps/ll+vf5z/naulc/kztwxtlp/8a4bidsq=</latexit> Linear Model p(c p(x C 1 )p(c 1 ) 1 x) 1 = p(x C 1 )p(c 1 )+p(x C 2 )p(c 2 ), 1 = 1 + exp( a), = (a), where a = log( p(x C 1)p(C 1 ) p(x C 2 )p(c 2 ) ). > We write y(x) = (w T x + w 0 ) (Bayes rule)! 1! 0 22
<latexit sha1_base64="uw+e66n5cjpg/0ecbfeyqkcyxt8=">aaah1xicfztdbts2fiavr607dwub7wbabg6wbjbbj7b6s2hbgkjz1rvasg6i6xsma1aujrmvkywi/anod8vu9xj9qb7nknlew7aradnh5/t4eetrioqmlwowel/t+ezgzvvd25/7d7748u69+7tfvszzshi6jhmwy4silzrjgg4vuxm9kctfpmrokhpz3pdrnmqs5ejcrqo64tgvlgeek5oa7u68+y2xogyulmrmmjfcsrxsiw3bcir6aiqgn9uhoixeglh/ndwa6n2axyxkcvung5i59kafgtchbyuznlmfrtrlqtnlgaxeqwe1d1aekflwf6bo0ymf4bdzaspkhpsqq6l4ru05u19dcadnwiz1wochbrbqerwbuqxwdihh+vihz5jnktzbk1r2zdq0djmfmgvognab7vgvdiayevxtondu7jtppthj4wmqk970okg9y6zter5c0hrnur0mf2xbnb/+a3lbcri4yvchftavn3wdzta2ugq2vj196cmq4ivbnr2/nzgctapcinwee95mvjju3vwgxtmpobwkzlgsx+ggubonpwiko7wpqpiwmlzbkr2bugboy4luz1mn35tmdik5gekufltzqzm05mw54pexovaz8jprkh9j40olp000e0wlqcdrhziqa5vdczghzpisla1mgilkplcgmywxueyi++hxau5f0lnt94+csqxy+uajlfool7w5txt1m+htihnb0ut/j5ysnvr7+qmftwr/uklrsa1rswnrpe/q2veroausc47nk0xrst0etdzc0go2f2rxpw5/zvfndn9u8ecop7f4ucohfh86fgtxkcmvlh7h8fcwf9vs0dudwwz2kkqwjxxole4chsewedtcklhc4gipxvohzyw+czhjlsacgvucozy3eo5wzxhl8mrilcpnfp87fghxhcoxfl86fgxxlfmv4wcfbiizfxa9aj+1hvnholaf47p5l4bx35xu8plrywjipx/tpx6yeave9r71vvmcl/r+9b57v3svvkfhonc6yefnzlf31k27b7t/r9xozmbo1541uv/8c/dptzk=</latexit> 23 Logistic Regression For the training set {(x n,t n ) 1applenappleN } where t n 2 {0, 1}, wehave p(t w) = Y n y t n n {1 y n } 1 t n ; (Naive Bayes) E(w) = ln(p(t w)), X = {t n ln y n +(1 t n )ln(1 y n )} ; n re(w) = X n (y n t n )x n. > The objective function is a differentiable function of the parameter vector, which enables us to use standard optimization techniques.
Height and Weight 24
Addressing Perceptron Ambiguities Perceptron Logistic But. 25
Outliers Can Cause Problems Logistic regression tries to minimize the error-rate at training time. Can result in poor classification rates at test time. > Support Vector Machines. 26
From Binary to Multi-Class k classes. Naively using k (k-1)/2 binary classifiers results in ambiguities. 27
<latexit sha1_base64="fmrdnwx/mzztdtjx29lo1xjqvwa=">aaae1xichvjnb9naehvmomv8txdkmqkhsqfedjmahipk4ycekhxutkmyabterjml3t2wxjcxlm+ikxeu8c/4mfwb1q6rnh+oi1k7njdv3uzo+koqrdp1/1bsa9xrn5awbzq3bt+5e29l9f5hjgnfajpiukqwjymamkgbmumqtkakyu6h9mgfvs3xo1oqiibfgu5gtmnxx7caeaxnqltqlygf9plimaacfawzg3ipah9qqeicrsbx0eegoacuaqm4u3os7rco/mkgbapyx93hyye5j/auct91ag2nlpooepzrgy9j0cmqgvq9ymczgvl5oytwc6ajmpg5n+a1uavkknorfbfgryficuondxdkg9avaiufkyezawteum8wyawboxvwab2ctubomxp3kizc7eecdxxan1h/oe0bcgzipbyoq0chgucpm/lcl5muwabzcettomcu+vny5bngqb0jrqakmheiyp7t5hvnnhxere/ykxx/53pvrqy5dbcwmhe80lmzstvvrlb+oj4kmadcf5vs9tyr7qryaubcfjhiii4wgei+brtxye6jtlqsbaaptasxr5p5hiyiepmryh5fcfdnjsd6em1iexar1o518lktmjgknrxktciiq9as8v2hhlou6daxdiakmv6bdldcrjs1n/sm6jhizrf5gus3srbbsvgu4gdpk8tm8omp/njgmwl6kkhpfrhp4em5fdkft+bwzapp5hrgexcjbifpxpyp2psd67xzunxw3ib3cwttz7cc+bl10hpk1s3pemhtwo+tfatpevvyp+1f9u/qutwrfqt+p0u1kyxngtvl1r//acjshzu=</latexit> Linear Discriminant K classifiers of the form y k (x) =wk T x + w0 k. Decision boundary (w k w j )x +(wk 0 wj 0 ) > 0. Decision regions are convex. (w k w j )x A +(w 0 k w 0 j ) > 0, (w k w j )x B +(w 0 k w 0 j ) > 0, )8, if x = x A +(1 )x B, then (w k w j )x +(w 0 k w 0 j ) > 0. 28
<latexit sha1_base64="a96wcfnu70qndyxd60tloaedaxg=">aaadg3icbvfdb9mwfhuwbqn8dfaid1c0seotqqrcgpehmv6qxqqhrvunpkso67rebcckdk2i8rpgv/bvcnkkkqa/+oiec889utepouuvbf82dsw7u3fv7d3vpxj46pgt/v7tyztkeklhjojrmvfxsjmtdkyy4nqsjxqln9mrp/xy81ffazkysf6oiqyzgresbyxgpute/5fr0wwtjvnusb+06vxcgoj1aghhoe21vrddfibaugiirpssxugfb66fv4yjcp2vf3690h8oh1bjwwvwjh90+0kcvbp+ahllgyrwpmcffraarl9g76hwbhpo1umacw7wdbisfopqgupyvm5vu3r9gt20mwdd4gzaag3eubdv/htneckelaqjmhxswm1knchguhz0s5tgmir4qacasixooiubbvfwslfmta4gkgqa6r8djrzpwghfkwvwy3sbq4v/46azct7nsibjtffj1oocjnelqq8fc5zqonihasyj01mblhgcidjh0uurdeuiibdejetpqqk9k916ih+uk6ra4q9b/hvvbruolkb1dfytftxh8xafd/iixredaolsvkawl8+qsh/a2t5rf1yoho49dl6mbscnm5pvoefojtpadnqljtendi7gibgvjbpj1phs7pqh5sh8s5bugjuez6j1zkm/z4w <latexit sha1_base64="2et/ai+plmjfzmmmgmhzou9y5bu=">aaag3xicfdrnb9mwgafwdiwywtsgehcufhmtmhbqjig4ie2mawztxpdatvmdisdxum9xejlo2yjykrviygfgcnwfvg1o22l1pbbu6al//zyx3tr+ftnctfp/gks3lm82b63ctu/cvxf/weraw9m8ltgmhzzgke/5kccxtuhhubgtxsyjyn5muv7ftu3diee5tzo2kdpimhqlnkqyctxlrtueq59enklqtknku9oxyomtgihhdi29cwafjqnsqx8kvfpp7avvywkg7l9izlvhtelrb+cye+eolbwwaj Multi-Class Linear Classification K classifiers of the form y k (x) =w T k x + w0 k Assign x to class k if y k (x) >y j (x) for all j 6= k. k = arg max j w j T x 2 3 2 3 6 4 y 1. y K 7 5 = 6 4 w T 1. w T K k = arg max j 7 5 x y j 29
<latexit sha1_base64="a96wcfnu70qndyxd60tloaedaxg=">aaadg3icbvfdb9mwfhuwbqn8dfaid1c0seotqqrcgpehmv6qxqqhrvunpkso67rebcckdk2i8rpgv/bvcnkkkqa/+oiec889utepouuvbf82dsw7u3fv7d3vpxj46pgt/v7tyztkeklhjojrmvfxsjmtdkyy4nqsjxqln9mrp/xy81ffazkysf6oiqyzgresbyxgpute/5fr0wwtjvnusb+06vxcgoj1aghhoe21vrddfibaugiirpssxugfb66fv4yjcp2vf3690h8oh1bjwwvwjh90+0kcvbp+ahllgyrwpmcffraarl9g76hwbhpo1umacw7wdbisfopqgupyvm5vu3r9gt20mwdd4gzaag3eubdv/htneckelaqjmhxswm1knchguhz0s5tgmir4qacasixooiubbvfwslfmta4gkgqa6r8djrzpwghfkwvwy3sbq4v/46azct7nsibjtffj1oocjnelqq8fc5zqonihasyj01mblhgcidjh0uurdeuiibdejetpqqk9k916ih+uk6ra4q9b/hvvbruolkb1dfytftxh8xafd/iixredaolsvkawl8+qsh/a2t5rf1yoho49dl6mbscnm5pvoefojtpadnqljtendi7gibgvjbpj1phs7pqh5sh8s5bugjuez6j1zkm/z4w <latexit sha1_base64="2et/ai+plmjfzmmmgmhzou9y5bu=">aaag3xicfdrnb9mwgafwdiwywtsgehcufhmtmhbqjig4ie2mawztxpdatvmdisdxum9xejlo2yjykrviygfgcnwfvg1o22l1pbbu6al//zyx3tr+ftnctfp/gks3lm82b63ctu/cvxf/weraw9m8ltgmhzzgke/5kccxtuhhubgtxsyjyn5muv7ftu3diee5tzo2kdpimhqlnkqyctxlrtueq59enklqtknku9oxyomtgihhdi29cwafjqnsqx8kvfpp7avvywkg7l9izlvhtelrb+cye+eolbwwaj Multi-Class Logistic Regression K classifiers of the form y k (x) =w T k x + w0 k =s( ) Assign x to class k if y k (x) >y j (x) for all j 6= k. k = arg max j w j T x 2 3 2 3 6 4 y 1. y K 7 5 = 6 4 w T 1. w T K k = arg max j 7 5 x y j 30
<latexit sha1_base64="e+skd7wqfsnvy7n4rsahcgbbc+o=">aaahu3icfztrbtmwfedtsa2jmnjgbykxaxoohv3vjgeqjqsjuo0nbqypxyvqejmuk7qnnza4a6uqd+breiup4yfv4qun7wcuxyylvb3n+n46tmz5lg1ftfort3rrewu1v3a7cofu+r37g5spzkmvcjbpyc/1go6fqujstlqccpd0/iagzrmkby3qkw9fkcckhm+kqu96ddmc2hqjivpmzm4lwsshpcafoqocnn1jcgd4rcjq5gh8adcalab4cl7jc9obwjgazoixktkqgqtemiyyobzmddmczjy3icgyioh8juugz8tcy3p0qzkwbm+ajahcb2uayfc5yz3zezskqvmrg0qlus78d6v5592sdqfz1wgim+yj2cjlxxiaxukpkwdhwghyzllovq9kxmmlyql+rmjwqphohca7wbfhot+apb32qauace9fushmxna1us0g0ipapng25upm3fx5bpsejhjharsodlu1qi96mqoexs5jcjakiy/wcdmkk0oogal7cbabcxgim31ge4g8uabz9uqmglewndjlmgyjqbji0ur1rbsj+2uvptypbof41sioxca8kd4aoe8dgou7lqhcazx/feabknsu5anugg+ixetatmtd9z4jkpccnriiwgfoksi1obccrjejlf+kmvqfgfjhatnndqrlfha9esngesp0dlhw3eisqgfymsyey0jullqasbfamwt2jqj8uoghgj9s+jhgjxv+rpgmwpsabym8pfg2wtsa7yi8o/gpcv+y3qafaykg0ipycrc0jhwond7vk0jfe2xbewxncbtuahyg8ihgkvueqglm4uzjnsi9jqufc41hco80fqhwc42pft7w+ethe41pft7v3hj2+k/ayi1pfyuwe1u40ys6ktst9cytlz6cenc+v6nj+mpe9shr+am6zjw2toyiutnegafgw+pmabk49zx3lfc992p15+rv/fj+eayu5ezzhhrkyk//as/klsc=</latexit> 31 Multi-Class Logistic Regression p(c k x) = exp(a k ) Pj exp(a j) with a k = w T k x + w k, X X E(w 1,...,w K ) = t nk ln(y nk ), re(w) wj = X n n k (y nj t nj )x n. > The extension is to the multi-class case is natural.
Multi-Class Results 32