3.7 Novelty & Diversity Redundancy in returned results (e.g., near duplicates) has a negative effect on retrieval effectiveness No benefit in showing relevant yet redundant results Bernstein and Zobel [3] identified near duplicates in TREC GOV2; mean MAP dropped by 20% when treating them as irrelevant and increased by 16% when omitting them from results Novelty: How can we avoid redundancy in results? 92
Novelty & Diversity Ambiguity of query needs to be reflected in results to cater for the uncertainty about the user s information need Query ambiguity comes in different forms topic (e.g., jaguar, eclipse, defender, cookies) intent (e.g., java 10 download (transactional) vs. features (information)) time (e.g., olympic games 2012, 2016, 2020) Diversity: How well do results reflect query ambiguity? 93
Implicit vs. Explicit Diversification Implicit diversification methods do not represent query aspects explicitly and instead operate directly on document contents and their (dis)similarity Maximum Marginal Relevance [3] Explicit diversification methods represent query aspects explicitly (e.g., as categories, subqueries or key phrases) and consider which query aspects documents relate to IA-Diversify [4] PM-1 and PM-2 [5,6] 94
<latexit sha1_base64="y/xsgiisde2it553vpll7c00w3e=">aaaclxicdvdbattaef2pt9s9urfoq6gi+seyjjyuasllwdbs+lka0jojziuzrcbkyu2lu6ssi/qt/bv+qz+gj107ccvnoyzm4zyzs7un0dw3lk1/bognm7du39m627t3/8hdr/3ht46sagzdkvo1micfwky5xknjrsytbrbeuenxsxiz0o+/obfcys9uqtexuek+5wycp2b977srpdfrtwvepxkuy+hz11iwfrvw3kw0xrmlffomjdbtqjqacmpienhalrr463y5inaionu5nbr5st++w+jkdgx4t4dy6c2o4dwzg836g3s8m64qug6y8bqng8kzsq7dwf8nlrvrberhard2neu1y1swjrmaux5tlgpgc6jwffin/va8rvajdga5kxu4eptn23w0g2ilwtqlkk6ratzzjlkotxbq2l/m3fwgb7nujupjvlagbtk1pqhkuadxhycflslme8l/ysy1odxpbdcj7njdrufjucog+j842h1nhn/ag0wolniiw+qfeuvikpf9mihvysgzekz+bs+dyrchz8px4dvw3cvogfzupcubfx78dqznzry=</latexit> <latexit sha1_base64="y/xsgiisde2it553vpll7c00w3e=">aaaclxicdvdbattaef2pt9s9urfoq6gi+seyjjyuasllwdbs+lka0jojziuzrcbkyu2lu6ssi/qt/bv+qz+gj107ccvnoyzm4zyzs7un0dw3lk1/bognm7du39m627t3/8hdr/3ht46sagzdkvo1micfwky5xknjrsytbrbeuenxsxiz0o+/obfcys9uqtexuek+5wycp2b977srpdfrtwvepxkuy+hz11iwfrvw3kw0xrmlffomjdbtqjqacmpienhalrr463y5inaionu5nbr5st++w+jkdgx4t4dy6c2o4dwzg836g3s8m64qug6y8bqng8kzsq7dwf8nlrvrberhard2neu1y1swjrmaux5tlgpgc6jwffin/va8rvajdga5kxu4eptn23w0g2ilwtqlkk6ratzzjlkotxbq2l/m3fwgb7nujupjvlagbtk1pqhkuadxhycflslme8l/ysy1odxpbdcj7njdrufjucog+j842h1nhn/ag0wolniiw+qfeuvikpf9mihvysgzekz+bs+dyrchz8px4dvw3cvogfzupcubfx78dqznzry=</latexit> <latexit sha1_base64="y/xsgiisde2it553vpll7c00w3e=">aaaclxicdvdbattaef2pt9s9urfoq6gi+seyjjyuasllwdbs+lka0jojziuzrcbkyu2lu6ssi/qt/bv+qz+gj107ccvnoyzm4zyzs7un0dw3lk1/bognm7du39m627t3/8hdr/3ht46sagzdkvo1micfwky5xknjrsytbrbeuenxsxiz0o+/obfcys9uqtexuek+5wycp2b977srpdfrtwvepxkuy+hz11iwfrvw3kw0xrmlffomjdbtqjqacmpienhalrr463y5inaionu5nbr5st++w+jkdgx4t4dy6c2o4dwzg836g3s8m64qug6y8bqng8kzsq7dwf8nlrvrberhard2neu1y1swjrmaux5tlgpgc6jwffin/va8rvajdga5kxu4eptn23w0g2ilwtqlkk6ratzzjlkotxbq2l/m3fwgb7nujupjvlagbtk1pqhkuadxhycflslme8l/ysy1odxpbdcj7njdrufjucog+j842h1nhn/ag0wolniiw+qfeuvikpf9mihvysgzekz+bs+dyrchz8px4dvw3cvogfzupcubfx78dqznzry=</latexit> <latexit sha1_base64="gki4usoazpcibs7gxwssaexw1q8=">aaaclxicdvbdi9nafj3er7v+vx3wqzbgh5rcbpmsgvsilcjii7ii3v3ykevmcpsdmpmjmzeyjerp+o/8d/4ah53wlljxlwp3cm69z2zoxlfsuzp+d8jr12/cvlvzu3fn7r37d/ophx0701ibe2eqy09zcfhjjrosvofpbrfuxufjvni90k++onxs6m+0rhgqonrylgwqp2b9b7zrhder2ili2hcxovrutrxsyrvcdbgvce6xb960al7lty0wyfgnclsnvrd/2s1g0v4uz3ubqzef++m7jc5nv4b/dcig3ojbwz7tanyfpop9dfxrvzcn1z0dse0dzfo/egfeo1ctqmc5syytadqcjskq7hq8cvidwecjz5dw4g+ftiuahwsx27khq0e5abuodktsqtm3vpkvuggdb5o5mquc3p1ltvodast13rbq4bu1bjoj80elixytpzj5v0xmtsf8n1xsaeffokby48yedj0fz2ug0f/b8f448/jji8hhwsaohfaupwcxy9hldsjessm2yyl9dj4fwyaon4svwjfh29+jybdzecy2kvzwc4uazmi=</latexit> Maximum Marginal Relevance Carbonell and Goldstein [3] return the next document d as the one having maximum marginal relevance (MMR) given the set S of already-returned documents 3 4 arg max sim(q, d) (1 ) max d œs d Õ œs sim(dõ,d) with λ as a tunable parameter controlling relevance vs. novelty and sim as a similarity measure (e.g., cosine similarity between query and document vectors) 95
<latexit sha1_base64="2mdpahaddhqdmgcygz/ydcigh5u=">aaacynicdvblaxsxgjq3fbjui07bw3ty6oslrnc3fjplwdbllgwx1a/wlotw/myllvaqpa0xyn9efktuusxh9f557udd0ehwjwbme2gyyag2yxjv8a4epx7ytpms9fzfy1eh7apxyy1krwbebbnqmmenjbywmtqwmeofmgcmjln+benpzkfpkoqfzi0h4xhz0aul2dgpbacxkcqevfzx5x/1y13y1jiq7tuyqewywcj0i/+th0sl5qmd+zet/lpql2otnrt0z+0qulyzj7urtjth/zjcwh9ion49w87glaoxtnu38vyqkknhcmnaz6jqmsrizshhulxiuopejmdlmofqygkqsusqhixa79uofpidtmzd055pmdd6zbmhisdmts9mquqgz/qfx83ijlg0kkwbgjivpjyyadd2kgeg3khb93wqsie4p+mayqmxacdecd13tgq5eu478p9pxsf9ypefnzudk21pqineoq+oiyl0bq3qkrqiesloet2go/s7ce21vcpvztbqnxy7u/s9vpd/afhfvau=</latexit> <latexit sha1_base64="2mdpahaddhqdmgcygz/ydcigh5u=">aaacynicdvblaxsxgjq3fbjui07bw3ty6oslrnc3fjplwdbllgwx1a/wlotw/myllvaqpa0xyn9efktuusxh9f557udd0ehwjwbme2gyyag2yxjv8a4epx7ytpms9fzfy1eh7apxyy1krwbebbnqmmenjbywmtqwmeofmgcmjln+benpzkfpkoqfzi0h4xhz0aul2dgpbacxkcqevfzx5x/1y13y1jiq7tuyqewywcj0i/+th0sl5qmd+zet/lpql2otnrt0z+0qulyzj7urtjth/zjcwh9ion49w87glaoxtnu38vyqkknhcmnaz6jqmsrizshhulxiuopejmdlmofqygkqsusqhixa79uofpidtmzd055pmdd6zbmhisdmts9mquqgz/qfx83ijlg0kkwbgjivpjyyadd2kgeg3khb93wqsie4p+mayqmxacdecd13tgq5eu478p9pxsf9ypefnzudk21pqineoq+oiyl0bq3qkrqiesloet2go/s7ce21vcpvztbqnxy7u/s9vpd/afhfvau=</latexit> <latexit sha1_base64="2mdpahaddhqdmgcygz/ydcigh5u=">aaacynicdvblaxsxgjq3fbjui07bw3ty6oslrnc3fjplwdbllgwx1a/wlotw/myllvaqpa0xyn9efktuusxh9f557udd0ehwjwbme2gyyag2yxjv8a4epx7ytpms9fzfy1eh7apxyy1krwbebbnqmmenjbywmtqwmeofmgcmjln+benpzkfpkoqfzi0h4xhz0aul2dgpbacxkcqevfzx5x/1y13y1jiq7tuyqewywcj0i/+th0sl5qmd+zet/lpql2otnrt0z+0qulyzj7urtjth/zjcwh9ion49w87glaoxtnu38vyqkknhcmnaz6jqmsrizshhulxiuopejmdlmofqygkqsusqhixa79uofpidtmzd055pmdd6zbmhisdmts9mquqgz/qfx83ijlg0kkwbgjivpjyyadd2kgeg3khb93wqsie4p+mayqmxacdecd13tgq5eu478p9pxsf9ypefnzudk21pqineoq+oiyl0bq3qkrqiesloet2go/s7ce21vcpvztbqnxy7u/s9vpd/afhfvau=</latexit> <latexit sha1_base64="p1sdvl+az3gddasnappa/dsxnl0=">aaacynicdvdpixmxge1hxwvv3xz71mnglxvqz6yi9rjq8lixovl7azrdkem/dkotsuwyy5ywf55/hhdv7lhvpmmfa/er+b7vve+dvewyqk0yfq15dx4+ontcf9j4+uz5+uwzdtntolaepkqworyz1sbodlnddyofvib5xmcebd/v/fktke1f/snsjcqcb3k6pgqbj6xnnczs2ulpp5f+lr/rgqewlhgvkkklxwzwphv5b/xykrfk7cqpae5pyr+mfxrl0j23q+jmxrw+jltzcfudca//let9aoyddma4bd7hk0ekdrkhdgu9jejpeouvoyrb2ygldrktld7aeocss1cj3ydgyntu2hy0xxx0yquejkyludy7np2ihjubyzetymtwpv85btbdxnjcfgzy4ryk2mcqxdvbngplhg0+7ijuosh9sqcmg7gloczk6l6jzcpv86cd//9knuhhjn982xknd0xv0qv0cnvrhn6hebpgyzrfbh1b39ep9lp2zwt4la/9o+rvdjttdatv5s/xllux</latexit> Intent-Aware Selection Agrawal et al. [4] model query aspects as categories (e.g., from a topic taxonomy such as DMOZ) query q belongs to category c with probability P[c q] document d relevant to query q and category c with probability P[d q, c] Given a query q, a baseline retrieval result R, they aim to find a subset of documents S of size k that maximizes P[S q ]= ÿ A P[c q ] 1 Ÿ B (1 P[d q, c ]) c dœs 96
<latexit sha1_base64="2mdpahaddhqdmgcygz/ydcigh5u=">aaacynicdvblaxsxgjq3fbjui07bw3ty6oslrnc3fjplwdbllgwx1a/wlotw/myllvaqpa0xyn9efktuusxh9f557udd0ehwjwbme2gyyag2yxjv8a4epx7ytpms9fzfy1eh7apxyy1krwbebbnqmmenjbywmtqwmeofmgcmjln+benpzkfpkoqfzi0h4xhz0aul2dgpbacxkcqevfzx5x/1y13y1jiq7tuyqewywcj0i/+th0sl5qmd+zet/lpql2otnrt0z+0qulyzj7urtjth/zjcwh9ion49w87glaoxtnu38vyqkknhcmnaz6jqmsrizshhulxiuopejmdlmofqygkqsusqhixa79uofpidtmzd055pmdd6zbmhisdmts9mquqgz/qfx83ijlg0kkwbgjivpjyyadd2kgeg3khb93wqsie4p+mayqmxacdecd13tgq5eu478p9pxsf9ypefnzudk21pqineoq+oiyl0bq3qkrqiesloet2go/s7ce21vcpvztbqnxy7u/s9vpd/afhfvau=</latexit> <latexit sha1_base64="2mdpahaddhqdmgcygz/ydcigh5u=">aaacynicdvblaxsxgjq3fbjui07bw3ty6oslrnc3fjplwdbllgwx1a/wlotw/myllvaqpa0xyn9efktuusxh9f557udd0ehwjwbme2gyyag2yxjv8a4epx7ytpms9fzfy1eh7apxyy1krwbebbnqmmenjbywmtqwmeofmgcmjln+benpzkfpkoqfzi0h4xhz0aul2dgpbacxkcqevfzx5x/1y13y1jiq7tuyqewywcj0i/+th0sl5qmd+zet/lpql2otnrt0z+0qulyzj7urtjth/zjcwh9ion49w87glaoxtnu38vyqkknhcmnaz6jqmsrizshhulxiuopejmdlmofqygkqsusqhixa79uofpidtmzd055pmdd6zbmhisdmts9mquqgz/qfx83ijlg0kkwbgjivpjyyadd2kgeg3khb93wqsie4p+mayqmxacdecd13tgq5eu478p9pxsf9ypefnzudk21pqineoq+oiyl0bq3qkrqiesloet2go/s7ce21vcpvztbqnxy7u/s9vpd/afhfvau=</latexit> <latexit sha1_base64="2mdpahaddhqdmgcygz/ydcigh5u=">aaacynicdvblaxsxgjq3fbjui07bw3ty6oslrnc3fjplwdbllgwx1a/wlotw/myllvaqpa0xyn9efktuusxh9f557udd0ehwjwbme2gyyag2yxjv8a4epx7ytpms9fzfy1eh7apxyy1krwbebbnqmmenjbywmtqwmeofmgcmjln+benpzkfpkoqfzi0h4xhz0aul2dgpbacxkcqevfzx5x/1y13y1jiq7tuyqewywcj0i/+th0sl5qmd+zet/lpql2otnrt0z+0qulyzj7urtjth/zjcwh9ion49w87glaoxtnu38vyqkknhcmnaz6jqmsrizshhulxiuopejmdlmofqygkqsusqhixa79uofpidtmzd055pmdd6zbmhisdmts9mquqgz/qfx83ijlg0kkwbgjivpjyyadd2kgeg3khb93wqsie4p+mayqmxacdecd13tgq5eu478p9pxsf9ypefnzudk21pqineoq+oiyl0bq3qkrqiesloet2go/s7ce21vcpvztbqnxy7u/s9vpd/afhfvau=</latexit> <latexit sha1_base64="p1sdvl+az3gddasnappa/dsxnl0=">aaacynicdvdpixmxge1hxwvv3xz71mnglxvqz6yi9rjq8lixovl7azrdkem/dkotsuwyy5ywf55/hhdv7lhvpmmfa/er+b7vve+dvewyqk0yfq15dx4+ontcf9j4+uz5+uwzdtntolaepkqworyz1sbodlnddyofvib5xmcebd/v/fktke1f/snsjcqcb3k6pgqbj6xnnczs2ulpp5f+lr/rgqewlhgvkkklxwzwphv5b/xykrfk7cqpae5pyr+mfxrl0j23q+jmxrw+jltzcfudca//let9aoyddma4bd7hk0ekdrkhdgu9jejpeouvoyrb2ygldrktld7aeocss1cj3ydgyntu2hy0xxx0yquejkyludy7np2ihjubyzetymtwpv85btbdxnjcfgzy4ryk2mcqxdvbngplhg0+7ijuosh9sqcmg7gloczk6l6jzcpv86cd//9knuhhjn982xknd0xv0qv0cnvrhn6hebpgyzrfbh1b39ep9lp2zwt4la/9o+rvdjttdatv5s/xllux</latexit> Intent-Aware Selection What is the intuition behind this probability? A B P[S q ]= ÿ P[c q ] 1 Ÿ (1 P[d q, c ]) c dœs c{ Probability that d is relevant to q, {not Probability that no document S is relevant to q, c { in Probability that at least one document in S is relevant to q, c It is thus the probability that an average user finds at least one relevant result among the documents in S 97
Intent-Aware Selection Probability P[c q] can be estimated using query classification methods (e.g., Naïve Bayes) Probability P[d q, c] can be decomposed into probability P[c d] that document belongs to category query likelihood P[q d] that documents generates query Agrawal et al. [5] show that finding an optimal subset S is an NP-hard problem, but propose a greedy algorithm that gives an approximation guarantee 98
<latexit sha1_base64="7valo+qgwtclpku2wklqiyewvm8=">aaacthicdzdnahrbfivvt2km40/g6c6bwtle0onuijinmodgliqenssqbprqmjudyurpqhpxkpqzfbn3ltyyf3dptgqrnbgyg4ec+jjn3oi6trhc+sz7lvtw1m9t3n6807977/6drchd7wonf5bhhgmh7wlnhqqucok5f3hqlfjzczyp568v85opab3x6r1fgiwlbrsfcuz9tkrbqcgmdyxchra2vgvjk1iyq6dvmjkckxkdqudm7+bkoelmp7rh2hbpomydw96c+6fvyjin9rjlkzuqj7o7g44fq6fdavcjmgq2kkg8e9s5szwzvgzues4etv1i4dbqnqcnnthmuio2da1qid4uv+oiikp0zegawqkdlc4tzx3dlnsfr5q11nnpa/fp4362xwauzmkjyjhrmkqtf3tn5zxaehj6dplwjhrxty4v1oonvfjmtrtfbpuxnr8dkp/d8d4oj3z0yjjev+ojnmehnsau5pasxvagdmecdd7dv/gof8mx5gfyk/l9ndplrncewyp6g38ah2g2vw==</latexit> <latexit sha1_base64="7valo+qgwtclpku2wklqiyewvm8=">aaacthicdzdnahrbfivvt2km40/g6c6bwtle0onuijinmodgliqenssqbprqmjudyurpqhpxkpqzfbn3ltyyf3dptgqrnbgyg4ec+jjn3oi6trhc+sz7lvtw1m9t3n6807977/6drchd7wonf5bhhgmh7wlnhqqucok5f3hqlfjzczyp568v85opab3x6r1fgiwlbrsfcuz9tkrbqcgmdyxchra2vgvjk1iyq6dvmjkckxkdqudm7+bkoelmp7rh2hbpomydw96c+6fvyjin9rjlkzuqj7o7g44fq6fdavcjmgq2kkg8e9s5szwzvgzues4etv1i4dbqnqcnnthmuio2da1qid4uv+oiikp0zegawqkdlc4tzx3dlnsfr5q11nnpa/fp4362xwauzmkjyjhrmkqtf3tn5zxaehj6dplwjhrxty4v1oonvfjmtrtfbpuxnr8dkp/d8d4oj3z0yjjev+ojnmehnsau5pasxvagdmecdd7dv/gof8mx5gfyk/l9ndplrncewyp6g38ah2g2vw==</latexit> <latexit sha1_base64="7valo+qgwtclpku2wklqiyewvm8=">aaacthicdzdnahrbfivvt2km40/g6c6bwtle0onuijinmodgliqenssqbprqmjudyurpqhpxkpqzfbn3ltyyf3dptgqrnbgyg4ec+jjn3oi6trhc+sz7lvtw1m9t3n6807977/6drchd7wonf5bhhgmh7wlnhqqucok5f3hqlfjzczyp568v85opab3x6r1fgiwlbrsfcuz9tkrbqcgmdyxchra2vgvjk1iyq6dvmjkckxkdqudm7+bkoelmp7rh2hbpomydw96c+6fvyjin9rjlkzuqj7o7g44fq6fdavcjmgq2kkg8e9s5szwzvgzues4etv1i4dbqnqcnnthmuio2da1qid4uv+oiikp0zegawqkdlc4tzx3dlnsfr5q11nnpa/fp4362xwauzmkjyjhrmkqtf3tn5zxaehj6dplwjhrxty4v1oonvfjmtrtfbpuxnr8dkp/d8d4oj3z0yjjev+ojnmehnsau5pasxvagdmecdd7dv/gof8mx5gfyk/l9ndplrncewyp6g38ah2g2vw==</latexit> <latexit sha1_base64="ckqnqxaret8tv8cf08+qplyjwbi=">aaacthicdzbpsxwxgmyz2/qn26rbeuwlubcldwdgbl0ubc/1ifjavcezhkz23tfs/plkxsxmz/kbeovbi36bhnstoxhchlbxizafz/o+gtyl5sy6jlmjwm/ezs0vll5rv/+wtlzs+fjpykqxodcniitzuhilnenoo+y4nggdrjqcjsvr3mn+fahgmiv/uymgxjbksigjxawr6oxnvbufsagwrf3pgn/dmtzqupgbzpjewck4dn16ijdwm3te+0gdfw2ynmxydea+fj1u0ttmhovfqtpr7qslpjoson+ygajjadjrtqw9trptck+my5rd3c7gfjshi1lbkuk00wbyx4es4mxkng4oiqcb+6armdatye1elc9mqdzzrfkqnxkktm8ed8od3dopxw4kdvmdpu7b0gs8ksgew+kdsvzoyiq/2zgtb5exinqo2wtyt0m9/zvar8przi8n/goru7szlworfuzrab2labvtou/oepurrvfon7pfd9f19de6jx6erlvrdgcvzag1/w+krryd</latexit> <latexit sha1_base64="crxpajnoccayqjn3omm/fooao08=">aaacm3icdzdnsgmxfixv+g/9qz87n8vuxehnkoiuc27cciq2fzxhymrrdz1mypirs5jx8e3cudv3ehfiut/bdnrffq+bfjytg7gnksntjgievbhxicmp6znzytz8wujsdxmlo0wuklapsiu6s4jglgxynsykecyvep6k2e36+8o8e4nkm5gdmoheijnexi4zjczzctuidc5js4taskwyyya9gi3ssrfulcz1ys9+ma45rtadxnywvo0vnbvlhdrba1dqkk6+hxec5hwzq1oi9xkzkcayrblguywqya5retonptwngsqsvwr7kdganrinhwaeo45sufhiaanxesctpyyn5mrutitog5lox5+by73iskzmbjpqshkt39aupr+fohkh+ycdp5bocdtppyugb31oqbk64bcouhp+oqj9d53trtpx8u69tffve8zaomzajjrhf1pwaefqbgp38acp8otdey/eq/f29xtm+55zhrf5h59ho67q</latexit> <latexit sha1_base64="crxpajnoccayqjn3omm/fooao08=">aaacm3icdzdnsgmxfixv+g/9qz87n8vuxehnkoiuc27cciq2fzxhymrrdz1mypirs5jx8e3cudv3ehfiut/bdnrffq+bfjytg7gnksntjgievbhxicmp6znzytz8wujsdxmlo0wuklapsiu6s4jglgxynsykecyvep6k2e36+8o8e4nkm5gdmoheijnexi4zjczzctuidc5js4taskwyyya9gi3ssrfulcz1ys9+ma45rtadxnywvo0vnbvlhdrba1dqkk6+hxec5hwzq1oi9xkzkcayrblguywqya5retonptwngsqsvwr7kdganrinhwaeo45sufhiaanxesctpyyn5mrutitog5lox5+by73iskzmbjpqshkt39aupr+fohkh+ycdp5bocdtppyugb31oqbk64bcouhp+oqj9d53trtpx8u69tffve8zaomzajjrhf1pwaefqbgp38acp8otdey/eq/f29xtm+55zhrf5h59ho67q</latexit> <latexit sha1_base64="crxpajnoccayqjn3omm/fooao08=">aaacm3icdzdnsgmxfixv+g/9qz87n8vuxehnkoiuc27cciq2fzxhymrrdz1mypirs5jx8e3cudv3ehfiut/bdnrffq+bfjytg7gnksntjgievbhxicmp6znzytz8wujsdxmlo0wuklapsiu6s4jglgxynsykecyvep6k2e36+8o8e4nkm5gdmoheijnexi4zjczzctuidc5js4taskwyyya9gi3ssrfulcz1ys9+ma45rtadxnywvo0vnbvlhdrba1dqkk6+hxec5hwzq1oi9xkzkcayrblguywqya5retonptwngsqsvwr7kdganrinhwaeo45sufhiaanxesctpyyn5mrutitog5lox5+by73iskzmbjpqshkt39aupr+fohkh+ycdp5bocdtppyugb31oqbk64bcouhp+oqj9d53trtpx8u69tffve8zaomzajjrhf1pwaefqbgp38acp8otdey/eq/f29xtm+55zhrf5h59ho67q</latexit> <latexit sha1_base64="dlbkm10jpvzgyutirhf2glezw+y=">aaacm3icdzdlsgmxgiuz9vbrrerstbebf9kzfseuc27ccbxtbtrdken/1tdjjcyzsyr5dd/envt9b3enlvudtgtd1oihki9z8gf+e4myku15r05uaxllds2/xtjy3nreke7utrvpjyew4tgx3qgrigkclu11df0habmohk40opvkntuqivlkwo8fbawpezqgbgtrhuxpvykldclkphhs+akmsyqzv5l/pdvum9p/ztlhsfj2kjvvotiivcvt2yujmzph8dpvc5iysdsjsvk9qid0yldulmsqffxugcbkhifqw57aamrghsazadmejy0mmiekzhtxudbgptsyrqsmw/pm3ow4h2kcqt+f60e9mdqrqyae2gykxm0pw587ikdaq92lsrska9idlbtjdfcuw0ryvbgyfww9vx2u/od2rvk1fhlsbtrnrexratper6iktlednammaigchtatekyvzqpz5rw7hz9pc85szh/nyfn6bsyirnw=</latexit> Intent-Aware Selection (IA-SELECT) Greedy algorithm iteratively builds up the set S by selecting in each round the document with highest marginal utility ÿ P[ c S ]P[q d ]P[c d ] c with P[ c S] as the probability that none of the documents already in S is relevant to query q and category c P[ c S ]= Ÿ dœs (1 P[q d ]P[c d ]) which is initialized as P[c q] 99
Intent-Aware Selection One problem with IA-SELECT is that its results degenerate: it does not select more results for a category c, once it has been fully satisfied by a single highly relevant result, which is not effective for information needs that require more than a single document 100
Diversity by Proportionality Dang and Croft [5,6] develop the proportionality-based explicit diversification methods PM-1 and PM-2 Given a query q and a baseline result R, their objective is to find a subset of documents S of size k, so that S proportionally represents the query aspects Example: Query jaguar refers to aspect car with 75% probability and to query aspect cat with 25% probability S 1 more proportional than S 2 more proportional than S 3 101
<latexit sha1_base64="u516yjnpoztsbcgayfhwe64qyw8=">aaacgxicdzdnsgmxfixv+g/9qz+4cvpsrla6m0xqzcgng6gcvceow514w0mnk5ckxtl2sdy51zdwj25d+q4+hgnvrrupgxyckxu4j1epnzyi3r2jyanpmdm5+clc4tlysnf17dzirmbuydkv+jjbqynpqgg5telsaukrphsrdi6g+uwptoeyo7n9rzhadszbnkf1vlzcbly0srwx80febe6zmjd2s+eglpadsjuyqvqxwsrodsq1driphhc/mtesdqvllqvozfuykbvlqc1nkq0kza4hhaydbbrcqkeihevtkoks7o/hdjmuzkj8tn5ymkmwpi+sp6zaeznujlj2lcbm1+e2drjlpfndsxlz2qhzv2fcsx4nie0o90/6fqyc4xyyfoqwbn2btettctgouhp+oij9d+fvsuj4dl9co/zqcezgc7zhb0i4gbocqx0awoaohuarnrx779l78v6/nk543zprmcbv7rni46it</latexit> <latexit sha1_base64="u516yjnpoztsbcgayfhwe64qyw8=">aaacgxicdzdnsgmxfixv+g/9qz+4cvpsrla6m0xqzcgng6gcvceow514w0mnk5ckxtl2sdy51zdwj25d+q4+hgnvrrupgxyckxu4j1epnzyi3r2jyanpmdm5+clc4tlysnf17dzirmbuydkv+jjbqynpqgg5telsaukrphsrdi6g+uwptoeyo7n9rzhadszbnkf1vlzcbly0srwx80febe6zmjd2s+eglpadsjuyqvqxwsrodsq1driphhc/mtesdqvllqvozfuykbvlqc1nkq0kza4hhaydbbrcqkeihevtkoks7o/hdjmuzkj8tn5ymkmwpi+sp6zaeznujlj2lcbm1+e2drjlpfndsxlz2qhzv2fcsx4nie0o90/6fqyc4xyyfoqwbn2btettctgouhp+oij9d+fvsuj4dl9co/zqcezgc7zhb0i4gbocqx0awoaohuarnrx779l78v6/nk543zprmcbv7rni46it</latexit> <latexit sha1_base64="u516yjnpoztsbcgayfhwe64qyw8=">aaacgxicdzdnsgmxfixv+g/9qz+4cvpsrla6m0xqzcgng6gcvceow514w0mnk5ckxtl2sdy51zdwj25d+q4+hgnvrrupgxyckxu4j1epnzyi3r2jyanpmdm5+clc4tlysnf17dzirmbuydkv+jjbqynpqgg5telsaukrphsrdi6g+uwptoeyo7n9rzhadszbnkf1vlzcbly0srwx80febe6zmjd2s+eglpadsjuyqvqxwsrodsq1driphhc/mtesdqvllqvozfuykbvlqc1nkq0kza4hhaydbbrcqkeihevtkoks7o/hdjmuzkj8tn5ymkmwpi+sp6zaeznujlj2lcbm1+e2drjlpfndsxlz2qhzv2fcsx4nie0o90/6fqyc4xyyfoqwbn2btettctgouhp+oij9d+fvsuj4dl9co/zqcezgc7zhb0i4gbocqx0awoaohuarnrx779l78v6/nk543zprmcbv7rni46it</latexit> <latexit sha1_base64="w1qn5onig9rsymkedws5n4dtw0e=">aaacgxicdzdnsgmxfiuz9a/wv6rgxk2xg0hpzbtblgtu3agv7a+0w3an3tbqysqkabgmfrj3bvul3ilbv76dd2fa66iwd4f8njmbucesmdpg8z6dznlyyupadj23sbm1vzpf3wtomvau61teqrui0bizbougmrhbuihwkmzm1l+y5m0hks1ecmngegmovyr1gqvjrtb/0okqookwzoo03dnvisucfpxxmc96pbi3uwer/nl09opkplqy/+rccjrgmbgag9zt35mmseezrmmc5zodjrjoh3rybk+crbwkprqcjrrnxxyt4kiddlrexjgc13reowwtg7mbnymh+gyi/edz060ekuvkwgbcbtbf1k1rw5lbj1dzw9yrkrtka9idtbudwxuxa1vclyyoc7ae3w4k/0ojxpitx58vq5vzuvlysi7imfhjoamss1ijdulja3kiz+tfexrentfn/edpxpnn7jm5or/fzjch2q==</latexit> Sainte-Laguë Method Ensuring proportionality is a classic problem that also arises when assigning parliament seats to parties Sainte-Laguë method as used in New Zealand Let v i denote the number of votes received by party p i Let s i denote the number of seats allocated to party p i While not all seats have been allocated assign next seat to party p i with highest quotient v i 2 s i +1 Increment number of seats s i allocated to party p i 102
PM-1 PM-1 is a naïve adaptation of the Sainte-Laguë method to the problem of selecting documents from D for S members of parliament (MoP) belong to a single party only, hence a document d represents only a single aspect c, namely the one for which it has highest probability P[d c] Allocate the k seats available to the query aspects (parties) according to their popularity P[c q] using Sainte-Laguë when allocated a seat, the query aspect (party) c assigns it to the document (MoP) not yet in S having highest P[d c] Problem: Documents relate to more than a single query aspect in practice, but PM-1 cannot handle this 103
<latexit sha1_base64="u516yjnpoztsbcgayfhwe64qyw8=">aaacgxicdzdnsgmxfixv+g/9qz+4cvpsrla6m0xqzcgng6gcvceow514w0mnk5ckxtl2sdy51zdwj25d+q4+hgnvrrupgxyckxu4j1epnzyi3r2jyanpmdm5+clc4tlysnf17dzirmbuydkv+jjbqynpqgg5telsaukrphsrdi6g+uwptoeyo7n9rzhadszbnkf1vlzcbly0srwx80febe6zmjd2s+eglpadsjuyqvqxwsrodsq1driphhc/mtesdqvllqvozfuykbvlqc1nkq0kza4hhaydbbrcqkeihevtkoks7o/hdjmuzkj8tn5ymkmwpi+sp6zaeznujlj2lcbm1+e2drjlpfndsxlz2qhzv2fcsx4nie0o90/6fqyc4xyyfoqwbn2btettctgouhp+oij9d+fvsuj4dl9co/zqcezgc7zhb0i4gbocqx0awoaohuarnrx779l78v6/nk543zprmcbv7rni46it</latexit> <latexit sha1_base64="u516yjnpoztsbcgayfhwe64qyw8=">aaacgxicdzdnsgmxfixv+g/9qz+4cvpsrla6m0xqzcgng6gcvceow514w0mnk5ckxtl2sdy51zdwj25d+q4+hgnvrrupgxyckxu4j1epnzyi3r2jyanpmdm5+clc4tlysnf17dzirmbuydkv+jjbqynpqgg5telsaukrphsrdi6g+uwptoeyo7n9rzhadszbnkf1vlzcbly0srwx80febe6zmjd2s+eglpadsjuyqvqxwsrodsq1driphhc/mtesdqvllqvozfuykbvlqc1nkq0kza4hhaydbbrcqkeihevtkoks7o/hdjmuzkj8tn5ymkmwpi+sp6zaeznujlj2lcbm1+e2drjlpfndsxlz2qhzv2fcsx4nie0o90/6fqyc4xyyfoqwbn2btettctgouhp+oij9d+fvsuj4dl9co/zqcezgc7zhb0i4gbocqx0awoaohuarnrx779l78v6/nk543zprmcbv7rni46it</latexit> <latexit sha1_base64="u516yjnpoztsbcgayfhwe64qyw8=">aaacgxicdzdnsgmxfixv+g/9qz+4cvpsrla6m0xqzcgng6gcvceow514w0mnk5ckxtl2sdy51zdwj25d+q4+hgnvrrupgxyckxu4j1epnzyi3r2jyanpmdm5+clc4tlysnf17dzirmbuydkv+jjbqynpqgg5telsaukrphsrdi6g+uwptoeyo7n9rzhadszbnkf1vlzcbly0srwx80febe6zmjd2s+eglpadsjuyqvqxwsrodsq1driphhc/mtesdqvllqvozfuykbvlqc1nkq0kza4hhaydbbrcqkeihevtkoks7o/hdjmuzkj8tn5ymkmwpi+sp6zaeznujlj2lcbm1+e2drjlpfndsxlz2qhzv2fcsx4nie0o90/6fqyc4xyyfoqwbn2btettctgouhp+oij9d+fvsuj4dl9co/zqcezgc7zhb0i4gbocqx0awoaohuarnrx779l78v6/nk543zprmcbv7rni46it</latexit> <latexit sha1_base64="w1qn5onig9rsymkedws5n4dtw0e=">aaacgxicdzdnsgmxfiuz9a/wv6rgxk2xg0hpzbtblgtu3agv7a+0w3an3tbqysqkabgmfrj3bvul3ilbv76dd2fa66iwd4f8njmbucesmdpg8z6dznlyyupadj23sbm1vzpf3wtomvau61teqrui0bizbougmrhbuihwkmzm1l+y5m0hks1ecmngegmovyr1gqvjrtb/0okqookwzoo03dnvisucfpxxmc96pbi3uwer/nl09opkplqy/+rccjrgmbgag9zt35mmseezrmmc5zodjrjoh3rybk+crbwkprqcjrrnxxyt4kiddlrexjgc13reowwtg7mbnymh+gyi/edz060ekuvkwgbcbtbf1k1rw5lbj1dzw9yrkrtka9idtbudwxuxa1vclyyoc7ae3w4k/0ojxpitx58vq5vzuvlysi7imfhjoamss1ijdulja3kiz+tfexrentfn/edpxpnn7jm5or/fzjch2q==</latexit> PM-2 PM-2 is a probabilistic adaptation of the Sainte-Laguë method that considers to what extent documents related to query aspects Let v i = P[c i q] and s i denote the proportion votes/seats received by/assigned to aspect (party) c i While not all seats have been allocated select query aspect c i with highest quotient v i 2 s i +1 104
<latexit sha1_base64="1wger7epyuseuk2erac3vfrpm4m=">aaacdhicdzblaxsxfiu10zqp9xgnza5zddwflcbzmifmaeimm0ikcrlidmmd+dqvly0usrnixpzo0v/qdxefko6zce0uah3op3tap1kcgzumv4lw2cbzza3tnc6ll69e73b33lwy2wikqyq51fcvgossxqflluov0gii4nhzzb7c88tb1ibj+tzofrycjjubmwrww2x3juf+8qjyo3ysgbrbkrwunx+zkkwfoqz1plxajvphpfhwyxa8xpnomwle6azrxunn5pfjxnszmv3lmlzlt5fg/fr+onwrxys77q32ywloyu7vfcrpi7c2limx11mqbofaw0y5tp28maiazmcc15aqukgln0ep0or5kvaybogmcivmvqadycxcvgumaptj1ayknfmozh/hdnxaofarxmjnpvtilwyn7z+zvaj9ycm3eviqb/g/mysdxbteanxsxf62hv/pywfr0+kih2defz/pdu4feilb5b15tw5jrj6taflkzsiqupkt/a02g63gt3gq9sipd0/dylnzlqxmgp8d9nxasg==</latexit> <latexit sha1_base64="1wger7epyuseuk2erac3vfrpm4m=">aaacdhicdzblaxsxfiu10zqp9xgnza5zddwflcbzmifmaeimm0ikcrlidmmd+dqvly0usrnixpzo0v/qdxefko6zce0uah3op3tap1kcgzumv4lw2cbzza3tnc6ll69e73b33lwy2wikqyq51fcvgossxqflluov0gii4nhzzb7c88tb1ibj+tzofrycjjubmwrww2x3juf+8qjyo3ysgbrbkrwunx+zkkwfoqz1plxajvphpfhwyxa8xpnomwle6azrxunn5pfjxnszmv3lmlzlt5fg/fr+onwrxys77q32ywloyu7vfcrpi7c2limx11mqbofaw0y5tp28maiazmcc15aqukgln0ep0or5kvaybogmcivmvqadycxcvgumaptj1ayknfmozh/hdnxaofarxmjnpvtilwyn7z+zvaj9ycm3eviqb/g/mysdxbteanxsxf62hv/pywfr0+kih2defz/pdu4feilb5b15tw5jrj6taflkzsiqupkt/a02g63gt3gq9sipd0/dylnzlqxmgp8d9nxasg==</latexit> <latexit sha1_base64="1wger7epyuseuk2erac3vfrpm4m=">aaacdhicdzblaxsxfiu10zqp9xgnza5zddwflcbzmifmaeimm0ikcrlidmmd+dqvly0usrnixpzo0v/qdxefko6zce0uah3op3tap1kcgzumv4lw2cbzza3tnc6ll69e73b33lwy2wikqyq51fcvgossxqflluov0gii4nhzzb7c88tb1ibj+tzofrycjjubmwrww2x3juf+8qjyo3ysgbrbkrwunx+zkkwfoqz1plxajvphpfhwyxa8xpnomwle6azrxunn5pfjxnszmv3lmlzlt5fg/fr+onwrxys77q32ywloyu7vfcrpi7c2limx11mqbofaw0y5tp28maiazmcc15aqukgln0ep0or5kvaybogmcivmvqadycxcvgumaptj1ayknfmozh/hdnxaofarxmjnpvtilwyn7z+zvaj9ycm3eviqb/g/mysdxbteanxsxf62hv/pywfr0+kih2defz/pdu4feilb5b15tw5jrj6taflkzsiqupkt/a02g63gt3gq9sipd0/dylnzlqxmgp8d9nxasg==</latexit> <latexit sha1_base64="pbw0sdhom6qhmzd/wwjiwku8j84=">aaacdhicdzblaxsxfiu10zzj3uecdtkshphcstn5mecydgtttsgfoalkhugofo3klkakpakxyn5n6x/iurtcfxeyce0uah3op3tap1kcgzumv4lw2fmxg5tbl3uvxr95u93fexdhzkmpjqjkul9vyjczgkewwy5xsioiiunlnt994je3qa2t9bldkcwetgs2yrsst8r+tc794zhkb/lea3w3jwvdmd8wjys+r1nrfaq0g7eoeukt/exlt/ljm9oi0s2ivmabyophjfmxmvvlmlvlf5dgw/rhonwrxcs7hzbuzsr+ft6wtbfyw8rbmossvbzwoc2jhnte3hhuqocwxwtifsjuhzuifgj1yhv7wynau7hlcyvqgtbmiao1u4d9swpwus4tvoa/cds5lhyrvwoxpp4tputgxvefzcvu/rdk2yipltf8n0zcwejdiobqawiv256v57gd6glxmywzr78fdk6ou6k2yaeyr/zjro7icflkzsiiupkt/ak2gs3gd7gbdskp/56gqbfznqxmgp8fejg/9g==</latexit> <latexit sha1_base64="sj3cxbuym+npyyptzx8rh1qspzw=">aaacohicdzbbsxwxgia/svrtqnvte/mydc+csdmril6ebs9ecgqucs4wfbo/xenojihjieuyx+i/8dzr/qw99sacp/6cztcw2yovixl43ytwvbkqulfx/doyetc7935+4unjcwn540pz9dojkzvm1goykposr0mfl6lnus3otglckrd0mg/3x/npnwndzxlsr4psgyos9zld662suw0yhu6f430jtpoamuuy0u6idizjde0su4nmxduv7lvdz81w3n6mxwpfq6c9oenw9wtmdjg1n5ilyspbpwufgnpeizvnhwrlwuf1i6kmkwrdhna5xgov6dqnsaqyejqdeyxrkendzpqp0kewzityv6zaezlt5liolebmv89tfzd1vfsvpzl5biiu6hnfyttmsfvfo2+jkfpe8dozqebln5fapqvpe6wbvp5/hyrvw8lmu+p5akvv3x3ucrzgdb7conrgb7pwaifqawa38b1+wh1wf/wkholh56szwd83n2fkwe8/49+wpq==</latexit> <latexit sha1_base64="sj3cxbuym+npyyptzx8rh1qspzw=">aaacohicdzbbsxwxgia/svrtqnvte/mydc+csdmril6ebs9ecgqucs4wfbo/xenojihjieuyx+i/8dzr/qw99sacp/6cztcw2yovixl43ytwvbkqulfx/doyetc7935+4unjcwn540pz9dojkzvm1goykposr0mfl6lnus3otglckrd0mg/3x/npnwndzxlsr4psgyos9zld662suw0yhu6f430jtpoamuuy0u6idizjde0su4nmxduv7lvdz81w3n6mxwpfq6c9oenw9wtmdjg1n5ilyspbpwufgnpeizvnhwrlwuf1i6kmkwrdhna5xgov6dqnsaqyejqdeyxrkendzpqp0kewzityv6zaezlt5liolebmv89tfzd1vfsvpzl5biiu6hnfyttmsfvfo2+jkfpe8dozqebln5fapqvpe6wbvp5/hyrvw8lmu+p5akvv3x3ucrzgdb7conrgb7pwaifqawa38b1+wh1wf/wkholh56szwd83n2fkwe8/49+wpq==</latexit> <latexit sha1_base64="sj3cxbuym+npyyptzx8rh1qspzw=">aaacohicdzbbsxwxgia/svrtqnvte/mydc+csdmril6ebs9ecgqucs4wfbo/xenojihjieuyx+i/8dzr/qw99sacp/6cztcw2yovixl43ytwvbkqulfx/doyetc7935+4unjcwn540pz9dojkzvm1goykposr0mfl6lnus3otglckrd0mg/3x/npnwndzxlsr4psgyos9zld662suw0yhu6f430jtpoamuuy0u6idizjde0su4nmxduv7lvdz81w3n6mxwpfq6c9oenw9wtmdjg1n5ilyspbpwufgnpeizvnhwrlwuf1i6kmkwrdhna5xgov6dqnsaqyejqdeyxrkendzpqp0kewzityv6zaezlt5liolebmv89tfzd1vfsvpzl5biiu6hnfyttmsfvfo2+jkfpe8dozqebln5fapqvpe6wbvp5/hyrvw8lmu+p5akvv3x3ucrzgdb7conrgb7pwaifqawa38b1+wh1wf/wkholh56szwd83n2fkwe8/49+wpq==</latexit> <latexit sha1_base64="lztk82y5tan4omagz2eg64avd6e=">aaacohicdzdnsgmxfiuz/lv/qi7ddhyjcj2pklorbddubawrgjmmd+jtjz1mqpirs5gn8u3cuduncodobfc+gwmtsbupifk4jwnck8qmarogz97i6nj4xotudgvmdm5+obq4dkpfosg2qcieok9by8zybbpmmjyxcogngz6lnf1efnadsjorn5iuxjhdo2ctrse4k6lu6yt5u35vx/ejlgjqiyqvvswttvhz2kgxplhx5y97xzzjtrbwn8ke/l/qqpfpseygokqq79gloaxh3namtl5ohnlefprhnmoyehuajdaotpecqgksvwzbkdga1r2ohebauce2p/1qaifr3exph5oduro2uye6bll963pt2okty2vhmkcu66mnmtp1ghrsvg6x4labjnizbiydzgdwnubaldb1h2xf1fpdgf8/ng7ug46pn2t7o4oipsgkwsvrpeg2yr45ieekssi5iw/kktx5996l9+q9fv0d8qzvlsmqvi9patuwuq==</latexit> PM-2 select document d having highest score v i 2 s i +1 P[d c i ]+(1 ) ÿ j =i v j 2 s j +1 P[d c j ] with parameter λ trading of relatedness to aspect c i vs. all other aspects update s i for all query aspects as s i = s i + P[d c i ] q j P[d c j ] 105
Summary Returning relevant yet redundant results to users is not beneficial; MMR returns documents that are relevant and novel, i.e. different from already-returned results Results should cover different interpretations of the query to reflect different information needs; diversification methods (e.g., IA-SELECT and PM) return results that are relevant and cover different interpretations 106
Literature [1] C. D. Manning, P. Raghavan, and H. Schütze: Introduction to Information Retrieval, Cambridge University Press, 2008 (Chapter 12) [2] W. B. Croft, D. Metzler, and T. Strohman: Search Engines Information Retrieval in Practice, Pearson Education, 2009 (Chapter 7) [3] J. Carbonell and J. Goldstein: The Use of MMR, Diversity-Based Reranking for Reordering Documents and Producing Summaries, SIGIR 1998 107
Literature [4] R. Agrawal et al.: Diversifying Search Results, WSDM 2009 [5] V. Dang and W. Bruce Croft: Diversity by Proportionality: An Election-based Approach to Search Result Diversification, SIGIR 2012 [6] V. Dang and W. Bruce Croft: Term Level Search Result Diversification, SIGIR 2013 108