A Fast Approximate AIB Algorithm for Distributional Word Clustering
Distributional word clustering merges words with similar probability distributions to obtain more reliable parameter estimates, more compact classification models, and in some cases better classification performance. Agglomerative Information Bottleneck (AIB) is a typical word clustering algorithm and has been applied to both traditional text classification and, more recently, image recognition. Despite its theoretical elegance, AIB is computationally expensive, especially when clustering a large number of words. Unlike existing solutions to this issue, we analyze the characteristics of its objective function, the loss of mutual information, and show that good candidate word pairs for merging can be identified easily using only the ratio of each word's word-class joint probabilities. Based on this finding, we propose a fast approximate AIB algorithm and show that it significantly improves the computational efficiency of AIB while maintaining, or even slightly improving, its classification performance. Experiments on benchmark text and image classification data sets show that our algorithm achieves a speedup of more than 100 times over the state-of-the-art method on large real data sets.
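For readers unfamiliar with the objective mentioned in the abstract, the following is a minimal sketch of the standard AIB merge cost, that is, the loss of mutual information I(W;C) incurred when two words are merged into one cluster. It is not the fast approximate algorithm presented in the talk. The function names `mutual_information` and `merge_cost` and the toy joint probability table are assumptions made for illustration only.

```python
import numpy as np

def mutual_information(joint):
    """I(W;C) for a joint probability table with rows = words, columns = classes."""
    pw = joint.sum(axis=1, keepdims=True)   # marginal p(w), shape (n_words, 1)
    pc = joint.sum(axis=0, keepdims=True)   # marginal p(c), shape (1, n_classes)
    mask = joint > 0                        # skip zero cells (0 * log 0 = 0)
    return float(np.sum(joint[mask] * np.log(joint[mask] / (pw @ pc)[mask])))

def merge_cost(joint, i, j):
    """Loss of I(W;C) when words i and j are merged into a single cluster."""
    pi, pj = joint[i], joint[j]             # rows p(w_i, c) and p(w_j, c)
    merged = pi + pj                        # joint probabilities of the merged cluster

    def term(row):
        # sum_c p(w, c) * log( p(c | w) / p(c | merged cluster) )
        mask = row > 0
        cond_w = row[mask] / row.sum()
        cond_m = merged[mask] / merged.sum()
        return float(np.sum(row[mask] * np.log(cond_w / cond_m)))

    return term(pi) + term(pj)

# Hypothetical toy data: 3 words, 2 classes, entries sum to 1.
joint = np.array([[0.10, 0.05],
                  [0.20, 0.10],
                  [0.05, 0.50]])

cost_01 = merge_cost(joint, 0, 1)   # words 0 and 1 share the class ratio 2:1
cost_02 = merge_cost(joint, 0, 2)   # words 0 and 2 have very different ratios
```

In this toy table, words 0 and 1 have the same ratio between their two word-class joint probabilities, so merging them loses no mutual information, whereas merging words 0 and 2 has a strictly positive cost. This is consistent with the abstract's observation that the ratio alone can point to good merge candidates; how the talk's algorithm exploits it is not shown here.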
Lei Wang received the B.Eng. and M.Eng. degrees from Southeast University, China, in 1996 and 1999, respectively, and the Ph.D. degree from the School of EEE, Nanyang Technological University, Singapore, in 2004. He is now a Senior Lecturer in the Faculty of Engineering and Information Sciences, University of Wollongong. He was awarded the Australian Post-doctoral Fellowship by the Australian Research Council in 2007 and the Early Career Researcher Award by the Australian Academy of Science in 2009. His research interests include machine learning, pattern recognition, and computer vision. He has published more than 80 peer-reviewed papers, including papers in highly regarded journals and conferences such as IEEE TPAMI, IEEE TNN, CVPR, ICCV and ECCV. He served as an Area Chair of the Pacific-Rim Symposium on Image and Video Technology in 2010, 2011 and 2013, and has been a Technical Program Committee member of 20+ international conferences and workshops. He is also a regular reviewer for 20+ international journals and a Senior Member of the IEEE.
Time and Venue
December 15, 2:30 PM
Conference Room, Central Building (中心楼)