Qi He (何奇)
Seek Creative Essence in Truth
2007 Student Travel Grant, to present a paper at SIGIR 2007 in Amsterdam, the Netherlands
2007 Student Travel Grant, to present a paper at the SIAM Conference on Data Mining 2007 in Minneapolis, Minnesota, U.S.A.
2006 Microsoft Research Fellowship, presented in person by Dr. Hsiao-Wuen Hon (Managing Director of MSRA); one of 2 awardees from Singapore (36 from the Asia-Pacific region)
Graduate Scholarship, to pursue the Ph.D. program at Nanyang Technological University, January 2005 – January 2008
2002 Lenovo Fellowship, for outstanding academic performance in the M.Eng. program
2001 IBM Best Student Award, honored among the best students in China
2001 Suntek Fellowship, for outstanding academic performance in the M.Eng. program
2000 Jinghua Fellowship, for outstanding academic performance in the B.Eng. program

Microsoft Research Asia, Beijing
Research internship, September 2007 – February 2008
Analyze Query Sessions for Web Search Interactions
Finished work:
· Proposed a novel method using a Variable Memory Markov (VMM) model for Web search query prediction. Query prediction accuracy increases significantly as query history accumulates within the same query session. Because full-length Markov models have several theoretical and practical drawbacks for simulating user behaviors such as query sessions, we proposed a VMM model instead. The VMM efficiently captures most of the temporal characteristics of queries while retaining a much lower compression ratio (relative to full-length Markov models). Our method is especially well suited to real-life scenarios in which the user has already issued two or more queries. The encouraging results pave the way for a wealth of end-to-end applications in Web search interaction, such as query suggestion, query expansion, and query substitution. This work was submitted to SIGIR 2008.
Future work:
· Analyze user behaviors from query sessions in more detail.
During the above experiments, we found that query sessions in fact contain a great deal of unexpected noise: billions of query sessions must be aggregated to obtain any statistically valid results. User behaviors may also exhibit irregularities; e.g., a historical query could be repeated frequently. Poor data quality is a major obstacle to further study, so the effective preprocessing and aggregation of search engine query logs remains a major challenge for future work.
· Study how to re-rank search results via query session analysis. Results from query session analysis can be applied to facilitate Web search interactions, and user feedback could help re-rank the search results. A ranking benchmark based on user behavior would be more reliable. For example, a query session could help narrow the search scope according to a particular user's past search patterns.

Nanyang Technological University, Singapore
PhD research, January 2005 – May 2008
Bursty Event Detection from Temporal Text Streams
· Detected periodic and aperiodic events from news streams. Automatically extracting historical events from a news stream is an open research problem, because the performance of traditional document-clustering methods depends heavily on the granularity of the similarity measure between news articles. Instead, we proposed to generate events from bursty words. We found that bursty words have varying bursty periods, and that identifying events from bursty words occurring during the same bursty period is both more accurate and more efficient. As a result, the bursty periods of words enabled the automatic discovery of bursty periodic and aperiodic events. This work was published at SIGIR 2007, among other publications.
· Analyzed the effect of bursty words on bursty event clustering. Document clustering has been a classical method for identifying topics/events in a news stream.
However, previous research has neither focused on detecting only important (bursty) events nor incorporated the bursty properties of words into the clustering. In this research, we proposed embedding news documents into a Euclidean space consisting only of bursty words. The encouraging results showed that clustering in this new space allows important events to be uncovered quickly from a large-scale news corpus. The work was published at the SIAM Conference on Data Mining 2007, among other publications.
· Proposed a new research problem called Anticipatory Event Detection (AED). The vast majority of existing research on event detection targets only historical events, which restricts its applicability to real-world problems. In practice, people want to be notified of impending events of personal interest as early as possible. This is especially true in the financial sector; e.g., stock brokers are extremely sensitive to timely notification of political events happening anywhere in the world. In AED, a user subscribes to topics of interest by specifying a few keywords that describe his/her anticipated event. Whenever our detector determines that the event has been triggered, a notification is sent to the user. We trained the anticipatory event detector as a classifier using Support Vector Machines. Experiments verified the high precision of this classifier for various types of user-defined events, such as company Merger and Acquisition events. The work was published at ER 2006, among other publications.
The following keywords were automatically detected by my own bursty word detection algorithm.
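The bursty-period idea described above (group bursty words that burst during the same period into one event) can be illustrated with a deliberately simple stand-in. This is a minimal sketch under stated assumptions, not the detector from the SIGIR 2007 work: a word's daily document frequency is flagged as bursty when it exceeds mean + k·std (an assumed threshold rule), consecutive bursty days form a bursty period, and words whose periods overlap are merged into one candidate event. The function names, `k = 1.5`, and the toy counts are all hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Optional, Sequence, Tuple

Period = Tuple[int, int]  # inclusive (start_day, end_day)


def bursty_periods(series: Sequence[int], k: float = 1.5) -> List[Period]:
    """Return maximal runs of days whose count exceeds mean + k * std.

    The threshold rule is an illustrative assumption, not the original method.
    """
    threshold = mean(series) + k * pstdev(series)
    periods: List[Period] = []
    start: Optional[int] = None
    for day, count in enumerate(series):
        if count > threshold and start is None:
            start = day  # a burst begins
        elif count <= threshold and start is not None:
            periods.append((start, day - 1))  # the burst ended the day before
            start = None
    if start is not None:  # burst still open on the last day
        periods.append((start, len(series) - 1))
    return periods


def group_into_events(
    word_series: Dict[str, Sequence[int]], k: float = 1.5
) -> List[Tuple[Period, List[str]]]:
    """Merge bursty words whose bursty periods overlap into candidate events."""
    items = sorted(
        (period, word)
        for word, series in word_series.items()
        for period in bursty_periods(series, k)
    )
    events: List[Tuple[Period, List[str]]] = []
    for (start, end), word in items:
        if events and start <= events[-1][0][1]:
            # Overlaps the current event: extend its span and add the word.
            (first, last), words = events[-1]
            events[-1] = ((first, max(last, end)), words + [word])
        else:
            events.append(((start, end), [word]))
    return events


# Hypothetical daily document frequencies over 10 days.
daily_df = {
    "earthquake": [0, 0, 1, 0, 9, 8, 0, 0, 0, 0],
    "tsunami": [0, 0, 0, 0, 7, 9, 1, 0, 0, 0],
    "election": [1, 0, 0, 0, 0, 0, 0, 0, 8, 9],
    "weather": [3, 3, 3, 3, 3, 3, 3, 3, 3, 3],  # steady, never bursty
}

for period, words in group_into_events(daily_df):
    print(period, words)
# prints:
# (4, 5) ['earthquake', 'tsunami']
# (8, 9) ['election']
```

Because each word gets its own threshold, a frequent but steady word such as "weather" never qualifies as bursty, while rare words with a sharp spike do. This matches the intuition that events are carried by words that burst together rather than by overall document similarity.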
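To illustrate the variable-memory idea behind the query prediction work from the MSRA internship, here is a minimal sketch, assuming a much simpler context-selection rule than a real VMM. It is not the model submitted to SIGIR 2008: a standard VMM (e.g., a prediction suffix tree) keeps a longer context only when it changes the next-query distribution significantly, whereas this hypothetical `VariableMemoryPredictor` simply backs off from the longest suffix of the session to shorter ones until a context has been seen at least `min_support` times. The class name, `max_order`, `min_support`, and the toy sessions are assumptions for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Context = Tuple[str, ...]


class VariableMemoryPredictor:
    """Hypothetical, simplified variable-memory Markov predictor for query sessions.

    Every context (the last k queries, 0 <= k <= max_order) stores counts of
    the query that followed it. Prediction backs off from the longest suffix
    of the current session to shorter ones until it finds a context observed
    at least `min_support` times, so the memory length varies per history.
    """

    def __init__(self, max_order: int = 3, min_support: int = 2) -> None:
        self.max_order = max_order
        self.min_support = min_support
        self.counts: Dict[Context, Counter] = defaultdict(Counter)

    def fit(self, sessions: Sequence[Sequence[str]]) -> "VariableMemoryPredictor":
        for session in sessions:
            for i, next_query in enumerate(session):
                # Count next_query under every context length 0..max_order.
                for k in range(min(i, self.max_order) + 1):
                    self.counts[tuple(session[i - k:i])][next_query] += 1
        return self

    def select_context(self, history: Sequence[str]) -> Context:
        # Try the longest usable suffix first, then shorter ones.
        for k in range(min(len(history), self.max_order), -1, -1):
            context = tuple(history[len(history) - k:])
            if sum(self.counts.get(context, Counter()).values()) >= self.min_support:
                return context
        return ()

    def predict(self, history: Sequence[str], top_n: int = 3) -> List[str]:
        context = self.select_context(history)
        return [q for q, _ in self.counts.get(context, Counter()).most_common(top_n)]


# Hypothetical toy query sessions.
sessions = [
    ["jaguar", "jaguar car", "jaguar price"],
    ["jaguar", "jaguar car", "jaguar dealer"],
    ["jaguar", "jaguar car"],
    ["zoo", "jaguar", "jaguar animal"],
    ["zoo", "jaguar", "jaguar animal"],
]
model = VariableMemoryPredictor(max_order=2, min_support=2).fit(sessions)
print(model.predict(["zoo", "jaguar"], top_n=1))   # ['jaguar animal']
print(model.predict(["news", "jaguar"], top_n=1))  # ['jaguar car']
```

The example shows why a variable memory helps once a user has issued two or more queries: with the two-query context ("zoo", "jaguar") the sketch predicts the animal sense, while an unseen two-query context falls back to the one-query context ("jaguar") and predicts the more common car sense. A fixed first-order model would always give the same answer after "jaguar".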
Top Conferences/Journals in Data Mining-Related Fields
Conferences
Journals
Professional Activities
Paper Reviewer
· PAKDD (Pacific-Asia Conference on Knowledge Discovery and Data Mining), 2008
· Infoscale (ICST Conference on Scalable Information Systems), 2007
· CIKM (ACM Conference on Information and Knowledge Management), 2006
Mentor, NTU Research Peer Mentorship Program
This section introduces the research topics I have been interested in over the past years. Each topic comes with a simple yet clear brief (with examples where I can), followed by its recent challenging/open problems (as of my latest update). Please do not hesitate to contact me if you have any comments or find any problems. Research collaboration is always welcome.
Temporal Vector Space/Probability Model for Topical/Categorical Data