Supervised Hierarchical Clustering with Exponential Linage Nishant Yadav Ari Kobren Nicholas Monath Andrew McCallum
At train time, learn A :2 X! Y Supervised Clustering
<latexit sha1_base64="owgj5x0xq+dfya1o3lczk+pidi=">aaab8nicbvdlssnafl2pr1pfvzdugvwvrirdfl147kcfuabymq6aydozslmjvbcp8onc0xc+jxu/bsnbrbaemdgcm69zlntaq36hnftmltfwnzq7xd2dnd2z+ohh61juo1zs2qhnldbgmugqt5chyn9gmxkfgnxbyl/udj6ynv/irpwlyjkspokuojv6/zjgmbkr3cwg1zpx9+zwv4lfbouaa6qx/2homnmjfjbjon5xojbrjryktis08nswidbhrwspjzeyqzspp3dordn1iafsunp190zgymomcwgn84hm2cvf/7xeitf1hgzpmgxxwupcjf5eb3u0ouguuxtyrqzw1wl46jjhrtsxvbgr988ippx9r9yx8ua43boo4ynmapnimpv9cae2hccygoeizxehpqexheny/fampdo7hd5zph3bqvc=</latexit> <latexit sha1_base64="owgj5x0xq+dfya1o3lczk+pidi=">aaab8nicbvdlssnafl2pr1pfvzdugvwvrirdfl147kcfuabymq6aydozslmjvbcp8onc0xc+jxu/bsnbrbaemdgcm69zlntaq36hnftmltfwnzq7xd2dnd2z+ohh61juo1zs2qhnldbgmugqt5chyn9gmxkfgnxbyl/udj6ynv/irpwlyjkspokuojv6/zjgmbkr3cwg1zpx9+zwv4lfbouaa6qx/2homnmjfjbjon5xojbrjryktis08nswidbhrwspjzeyqzspp3dordn1iafsunp190zgymomcwgn84hm2cvf/7xeitf1hgzpmgxxwupcjf5eb3u0ouguuxtyrqzw1wl46jjhrtsxvbgr988ippx9r9yx8ua43boo4ynmapnimpv9cae2hccygoeizxehpqexheny/fampdo7hd5zph3bqvc=</latexit> <latexit sha1_base64="owgj5x0xq+dfya1o3lczk+pidi=">aaab8nicbvdlssnafl2pr1pfvzdugvwvrirdfl147kcfuabymq6aydozslmjvbcp8onc0xc+jxu/bsnbrbaemdgcm69zlntaq36hnftmltfwnzq7xd2dnd2z+ohh61juo1zs2qhnldbgmugqt5chyn9gmxkfgnxbyl/udj6ynv/irpwlyjkspokuojv6/zjgmbkr3cwg1zpx9+zwv4lfbouaa6qx/2homnmjfjbjon5xojbrjryktis08nswidbhrwspjzeyqzspp3dordn1iafsunp190zgymomcwgn84hm2cvf/7xeitf1hgzpmgxxwupcjf5eb3u0ouguuxtyrqzw1wl46jjhrtsxvbgr988ippx9r9yx8ua43boo4ynmapnimpv9cae2hccygoeizxehpqexheny/fampdo7hd5zph3bqvc=</latexit> Supervised Clustering At train time, learn A :2 X! Y At test time, use on new set of points A<latexit sha1_base64="owgj5x0xq+dfya1o3lczk+pidi=">aaab8nicbvdlssnafl2pr1pfvzdugvwvrirdfl147kcfuabymq6aydozslmjvbcp8onc0xc+jxu/bsnbrbaemdgcm69zlntaq36hnftmltfwnzq7xd2dnd2z+ohh61juo1zs2qhnldbgmugqt5chyn9gmxkfgnxbyl/udj6ynv/irpwlyjkspokuojv6/zjgmbkr3cwg1zpx9+zwv4lfbouaa6qx/2homnmjfjbjon5xojbrjryktis08nswidbhrwspjzeyqzspp3dordn1iafsunp190zgymomcwgn84hm2cvf/7xeitf1hgzpmgxxwupcjf5eb3u0ouguuxtyrqzw1wl46jjhrtsxvbgr988ippx9r9yx8ua43boo4ynmapnimpv9cae2hccygoeizxehpqexheny/fampdo7hd5zph3bqvc=</latexit>
<latexit sha1_base64="owgj5x0xq+dfya1o3lczk+pidi=">aaab8nicbvdlssnafl2pr1pfvzdugvwvrirdfl147kcfuabymq6aydozslmjvbcp8onc0xc+jxu/bsnbrbaemdgcm69zlntaq36hnftmltfwnzq7xd2dnd2z+ohh61juo1zs2qhnldbgmugqt5chyn9gmxkfgnxbyl/udj6ynv/irpwlyjkspokuojv6/zjgmbkr3cwg1zpx9+zwv4lfbouaa6qx/2homnmjfjbjon5xojbrjryktis08nswidbhrwspjzeyqzspp3dordn1iafsunp190zgymomcwgn84hm2cvf/7xeitf1hgzpmgxxwupcjf5eb3u0ouguuxtyrqzw1wl46jjhrtsxvbgr988ippx9r9yx8ua43boo4ynmapnimpv9cae2hccygoeizxehpqexheny/fampdo7hd5zph3bqvc=</latexit> <latexit sha1_base64="owgj5x0xq+dfya1o3lczk+pidi=">aaab8nicbvdlssnafl2pr1pfvzdugvwvrirdfl147kcfuabymq6aydozslmjvbcp8onc0xc+jxu/bsnbrbaemdgcm69zlntaq36hnftmltfwnzq7xd2dnd2z+ohh61juo1zs2qhnldbgmugqt5chyn9gmxkfgnxbyl/udj6ynv/irpwlyjkspokuojv6/zjgmbkr3cwg1zpx9+zwv4lfbouaa6qx/2homnmjfjbjon5xojbrjryktis08nswidbhrwspjzeyqzspp3dordn1iafsunp190zgymomcwgn84hm2cvf/7xeitf1hgzpmgxxwupcjf5eb3u0ouguuxtyrqzw1wl46jjhrtsxvbgr988ippx9r9yx8ua43boo4ynmapnimpv9cae2hccygoeizxehpqexheny/fampdo7hd5zph3bqvc=</latexit> <latexit sha1_base64="owgj5x0xq+dfya1o3lczk+pidi=">aaab8nicbvdlssnafl2pr1pfvzdugvwvrirdfl147kcfuabymq6aydozslmjvbcp8onc0xc+jxu/bsnbrbaemdgcm69zlntaq36hnftmltfwnzq7xd2dnd2z+ohh61juo1zs2qhnldbgmugqt5chyn9gmxkfgnxbyl/udj6ynv/irpwlyjkspokuojv6/zjgmbkr3cwg1zpx9+zwv4lfbouaa6qx/2homnmjfjbjon5xojbrjryktis08nswidbhrwspjzeyqzspp3dordn1iafsunp190zgymomcwgn84hm2cvf/7xeitf1hgzpmgxxwupcjf5eb3u0ouguuxtyrqzw1wl46jjhrtsxvbgr988ippx9r9yx8ua43boo4ynmapnimpv9cae2hccygoeizxehpqexheny/fampdo7hd5zph3bqvc=</latexit> Supervised Clustering At train time, learn A :2 X! Y At test time, use on new set of points A<latexit sha1_base64="owgj5x0xq+dfya1o3lczk+pidi=">aaab8nicbvdlssnafl2pr1pfvzdugvwvrirdfl147kcfuabymq6aydozslmjvbcp8onc0xc+jxu/bsnbrbaemdgcm69zlntaq36hnftmltfwnzq7xd2dnd2z+ohh61juo1zs2qhnldbgmugqt5chyn9gmxkfgnxbyl/udj6ynv/irpwlyjkspokuojv6/zjgmbkr3cwg1zpx9+zwv4lfbouaa6qx/2homnmjfjbjon5xojbrjryktis08nswidbhrwspjzeyqzspp3dordn1iafsunp190zgymomcwgn84hm2cvf/7xeitf1hgzpmgxxwupcjf5eb3u0ouguuxtyrqzw1wl46jjhrtsxvbgr988ippx9r9yx8ua43boo4ynmapnimpv9cae2hccygoeizxehpqexheny/fampdo7hd5zph3bqvc=</latexit>
<latexit sha1_base64="owgj5x0xq+dfya1o3lczk+pidi=">aaab8nicbvdlssnafl2pr1pfvzdugvwvrirdfl147kcfuabymq6aydozslmjvbcp8onc0xc+jxu/bsnbrbaemdgcm69zlntaq36hnftmltfwnzq7xd2dnd2z+ohh61juo1zs2qhnldbgmugqt5chyn9gmxkfgnxbyl/udj6ynv/irpwlyjkspokuojv6/zjgmbkr3cwg1zpx9+zwv4lfbouaa6qx/2homnmjfjbjon5xojbrjryktis08nswidbhrwspjzeyqzspp3dordn1iafsunp190zgymomcwgn84hm2cvf/7xeitf1hgzpmgxxwupcjf5eb3u0ouguuxtyrqzw1wl46jjhrtsxvbgr988ippx9r9yx8ua43boo4ynmapnimpv9cae2hccygoeizxehpqexheny/fampdo7hd5zph3bqvc=</latexit> <latexit sha1_base64="owgj5x0xq+dfya1o3lczk+pidi=">aaab8nicbvdlssnafl2pr1pfvzdugvwvrirdfl147kcfuabymq6aydozslmjvbcp8onc0xc+jxu/bsnbrbaemdgcm69zlntaq36hnftmltfwnzq7xd2dnd2z+ohh61juo1zs2qhnldbgmugqt5chyn9gmxkfgnxbyl/udj6ynv/irpwlyjkspokuojv6/zjgmbkr3cwg1zpx9+zwv4lfbouaa6qx/2homnmjfjbjon5xojbrjryktis08nswidbhrwspjzeyqzspp3dordn1iafsunp190zgymomcwgn84hm2cvf/7xeitf1hgzpmgxxwupcjf5eb3u0ouguuxtyrqzw1wl46jjhrtsxvbgr988ippx9r9yx8ua43boo4ynmapnimpv9cae2hccygoeizxehpqexheny/fampdo7hd5zph3bqvc=</latexit> <latexit sha1_base64="owgj5x0xq+dfya1o3lczk+pidi=">aaab8nicbvdlssnafl2pr1pfvzdugvwvrirdfl147kcfuabymq6aydozslmjvbcp8onc0xc+jxu/bsnbrbaemdgcm69zlntaq36hnftmltfwnzq7xd2dnd2z+ohh61juo1zs2qhnldbgmugqt5chyn9gmxkfgnxbyl/udj6ynv/irpwlyjkspokuojv6/zjgmbkr3cwg1zpx9+zwv4lfbouaa6qx/2homnmjfjbjon5xojbrjryktis08nswidbhrwspjzeyqzspp3dordn1iafsunp190zgymomcwgn84hm2cvf/7xeitf1hgzpmgxxwupcjf5eb3u0ouguuxtyrqzw1wl46jjhrtsxvbgr988ippx9r9yx8ua43boo4ynmapnimpv9cae2hccygoeizxehpqexheny/fampdo7hd5zph3bqvc=</latexit> Supervised Clustering At train time, learn A :2 X! Y At test time, use on new set of points A<latexit sha1_base64="owgj5x0xq+dfya1o3lczk+pidi=">aaab8nicbvdlssnafl2pr1pfvzdugvwvrirdfl147kcfuabymq6aydozslmjvbcp8onc0xc+jxu/bsnbrbaemdgcm69zlntaq36hnftmltfwnzq7xd2dnd2z+ohh61juo1zs2qhnldbgmugqt5chyn9gmxkfgnxbyl/udj6ynv/irpwlyjkspokuojv6/zjgmbkr3cwg1zpx9+zwv4lfbouaa6qx/2homnmjfjbjon5xojbrjryktis08nswidbhrwspjzeyqzspp3dordn1iafsunp190zgymomcwgn84hm2cvf/7xeitf1hgzpmgxxwupcjf5eb3u0ouguuxtyrqzw1wl46jjhrtsxvbgr988ippx9r9yx8ua43boo4ynmapnimpv9cae2hccygoeizxehpqexheny/fampdo7hd5zph3bqvc=</latexit> Select a clustering algorithm Learn a dissimilarity function
<latexit sha1_base64="owgj5x0xq+dfya1o3lczk+pidi=">aaab8nicbvdlssnafl2pr1pfvzdugvwvrirdfl147kcfuabymq6aydozslmjvbcp8onc0xc+jxu/bsnbrbaemdgcm69zlntaq36hnftmltfwnzq7xd2dnd2z+ohh61juo1zs2qhnldbgmugqt5chyn9gmxkfgnxbyl/udj6ynv/irpwlyjkspokuojv6/zjgmbkr3cwg1zpx9+zwv4lfbouaa6qx/2homnmjfjbjon5xojbrjryktis08nswidbhrwspjzeyqzspp3dordn1iafsunp190zgymomcwgn84hm2cvf/7xeitf1hgzpmgxxwupcjf5eb3u0ouguuxtyrqzw1wl46jjhrtsxvbgr988ippx9r9yx8ua43boo4ynmapnimpv9cae2hccygoeizxehpqexheny/fampdo7hd5zph3bqvc=</latexit> <latexit sha1_base64="owgj5x0xq+dfya1o3lczk+pidi=">aaab8nicbvdlssnafl2pr1pfvzdugvwvrirdfl147kcfuabymq6aydozslmjvbcp8onc0xc+jxu/bsnbrbaemdgcm69zlntaq36hnftmltfwnzq7xd2dnd2z+ohh61juo1zs2qhnldbgmugqt5chyn9gmxkfgnxbyl/udj6ynv/irpwlyjkspokuojv6/zjgmbkr3cwg1zpx9+zwv4lfbouaa6qx/2homnmjfjbjon5xojbrjryktis08nswidbhrwspjzeyqzspp3dordn1iafsunp190zgymomcwgn84hm2cvf/7xeitf1hgzpmgxxwupcjf5eb3u0ouguuxtyrqzw1wl46jjhrtsxvbgr988ippx9r9yx8ua43boo4ynmapnimpv9cae2hccygoeizxehpqexheny/fampdo7hd5zph3bqvc=</latexit> <latexit sha1_base64="owgj5x0xq+dfya1o3lczk+pidi=">aaab8nicbvdlssnafl2pr1pfvzdugvwvrirdfl147kcfuabymq6aydozslmjvbcp8onc0xc+jxu/bsnbrbaemdgcm69zlntaq36hnftmltfwnzq7xd2dnd2z+ohh61juo1zs2qhnldbgmugqt5chyn9gmxkfgnxbyl/udj6ynv/irpwlyjkspokuojv6/zjgmbkr3cwg1zpx9+zwv4lfbouaa6qx/2homnmjfjbjon5xojbrjryktis08nswidbhrwspjzeyqzspp3dordn1iafsunp190zgymomcwgn84hm2cvf/7xeitf1hgzpmgxxwupcjf5eb3u0ouguuxtyrqzw1wl46jjhrtsxvbgr988ippx9r9yx8ua43boo4ynmapnimpv9cae2hccygoeizxehpqexheny/fampdo7hd5zph3bqvc=</latexit> Supervised Clustering At train time, learn A :2 X! Y At test time, use on new set of points A<latexit sha1_base64="owgj5x0xq+dfya1o3lczk+pidi=">aaab8nicbvdlssnafl2pr1pfvzdugvwvrirdfl147kcfuabymq6aydozslmjvbcp8onc0xc+jxu/bsnbrbaemdgcm69zlntaq36hnftmltfwnzq7xd2dnd2z+ohh61juo1zs2qhnldbgmugqt5chyn9gmxkfgnxbyl/udj6ynv/irpwlyjkspokuojv6/zjgmbkr3cwg1zpx9+zwv4lfbouaa6qx/2homnmjfjbjon5xojbrjryktis08nswidbhrwspjzeyqzspp3dordn1iafsunp190zgymomcwgn84hm2cvf/7xeitf1hgzpmgxxwupcjf5eb3u0ouguuxtyrqzw1wl46jjhrtsxvbgr988ippx9r9yx8ua43boo4ynmapnimpv9cae2hccygoeizxehpqexheny/fampdo7hd5zph3bqvc=</latexit> Select a clustering algorithm Learn a dissimilarity function Clustering Algorithm Mismatch leads to poor Training Procedure generalization
Hierarchical Agglomerative Clustering!3
Hierarchical Agglomerative Clustering!3
Hierarchical Agglomerative Clustering Iteratively merge two closest clusters!3
Hierarchical Agglomerative Clustering Iteratively merge two closest clusters!3
Hierarchical Agglomerative Clustering Iteratively merge two closest clusters!3
Hierarchical Agglomerative Clustering Iteratively merge two closest clusters!3
Hierarchical Agglomerative Clustering Iteratively merge two closest clusters!3
Hierarchical Agglomerative Clustering Iteratively merge two closest clusters!3
Hierarchical Agglomerative Clustering Iteratively merge two closest clusters!3
Hierarchical Agglomerative Clustering Iteratively merge two closest clusters!3
Linage Function Inter-Cluster distance given by Linage Function!4
Linage Function Single Linage (Minimum Pairwise Dissimilarity) Inter-Cluster distance given by Linage Function!4
Linage Function Average Linage (Average Pairwise Dissimilarity) Single Linage!5
Linage Function Complete Linage (Maximum Pairwise Dissimilarity) Single Linage Average Linage!6
Linage Function Best linage function for a dataset is, a priori, unnown Single Linage Average Linage!7 Complete Linage
In this wor: 1. Exponential Linage: Learnable family of linage functions 2. Training objective to jointly optimize linage & dissimilarity function!8
Exponential Linage Single Linage Average Linage Complete Linage!9
Exponential Linage Single Linage Average Linage Complete Linage 0 Weight 1 Weighted Average with Learnable Parameter!9
Exponential Linage Single Linage J(, ) = n 0 X i=1 X C u,v 2P (i) max n o 0, (C u0,v 0) (C u,v) Average Linage Complete Linage Algorithm 1 train ExpLin(X, C?,T, 1, 2) Init:, for t =1,..., T do J 0 T (0) j {x j } 8 x j 2 X for round i =1,...,n 0 do {T (i) }l i HAC-Round({T {C (i) } l i C (i) {C (i) } l i {lvst (i) }l i (i 1) } l i 1 ) P (i) {C u,v 2 C (i) C (i) : C u 6= C v } P (i) + {C u,v 2 P (i) : 9C j? s.t. C u,c v C j?} P (i) P (i) \P (i) + C u 0,v 0 arg min C u,v 2P (i) for C u,v 2 P (i) J 1 @J @ (C u,v ) 0 + Weight 1 J + max n do 0, (C u0,v 0) (C u,v ) Weighted Average with Learnable Parameter 2 @J @!9 o
Experimental Setup Entity Resolution REXA AMINER UMIST Faces Noun Phrase Coreference!10
Experimental Setup Entity Resolution REXA AMINER UMIST Faces Noun Phrase Coreference Four Linage Functions Single Linage Average Linage Complete Linage Exponential Linage!10
Experimental Setup Entity Resolution REXA AMINER UMIST Faces Noun Phrase Coreference Four Linage Functions Single Linage Average Linage Complete Linage Exponential Linage Evaluated using Dendrogram Purity Averaged across 50 different train/test/dev splits!10
Results Dataset: Rexa 90.0 89.1 Dendrogram Purity 86.3 82.5 78.8 81.7 84.6 80.1 75.0 Single Average Complete Exponential Linage Function
Results Dataset: Rexa 90.0 89.1 Dendrogram Purity 86.3 82.5 78.8 81.7 84.6 80.1 75.0 Single Average Complete Exponential Linage Function
Summary!12
Summary Exponential Linage: Learnable family of linage functions!12
Summary Exponential Linage: Learnable family of linage functions X J(, ) = n 0 i=1 X C u,v2p (i) max n o 0, (C u0,v 0) (C u,v) Algorithm 1 train ExpLin(X, C?,T, 1, 2) Init:, for t =1,..., T do J 0 T (0) j {x j } 8 x j 2 X for round i =1,...,n 0 do {T (i) }li {C (i) } li C (i) (i 1) HAC-Round({T } li 1 (i) {lvst }li {C (i) } li ) P (i) {C u,v 2 C (i) C (i) : C u 6= C v } P (i) + {C u,v 2 P (i) : 9C j? s.t. C u,c v C j?} P (i) P (i) \P (i) + C u 0,v 0 arg min C u,v2p (C (i) u,v ) + for C u,v 2 P (i) n do o J J + max 0, (C u0,v 0) (C u,v ) 1 @J @ 2 @J @ Training Objective & Algorithm: Jointly Optimizing Dissimilarity & Linage Function!12
Summary Exponential Linage: Learnable family of linage functions X J(, ) = n 0 i=1 X C u,v2p (i) max n o 0, (C u0,v 0) (C u,v) Algorithm 1 train ExpLin(X, C?,T, 1, 2) Init:, for t =1,..., T do J 0 T (0) j {x j } 8 x j 2 X for round i =1,...,n 0 do {T (i) }li {C (i) } li C (i) (i 1) HAC-Round({T } li 1 (i) {lvst }li {C (i) } li ) P (i) {C u,v 2 C (i) C (i) : C u 6= C v } P (i) + {C u,v 2 P (i) : 9C j? s.t. C u,c v C j?} P (i) P (i) \P (i) + C u 0,v 0 arg min C u,v2p (C (i) u,v ) + for C u,v 2 P (i) n do o J J + max 0, (C u0,v 0) (C u,v ) 1 @J @ 2 @J @ Training Objective & Algorithm: Jointly Optimizing Dissimilarity & Linage Function Effective Empirical Results!12
Single Linage Average Linage Thans for listening! Chec out our poster #196 today at 6:30pm in Pacific Ballroom! Complete Linage Exponential Linage Paper: Code: Train Test!13