Key Contributions

Mining Evolution of Tree & Graph-Structured Data (2002 – present)

Much real-world data can be represented as trees and graphs, and a range of data mining techniques has recently been proposed for such complex structured data. Existing techniques focus on snapshot collections of tree- and graph-structured data, whereas in real applications these collections keep evolving. Our key contribution in this research is a system called TREASURE-MINER for discovering novel knowledge from the historical change patterns of tree- and graph-structured data. To the best of our knowledge, no previous work has considered mining changes to historical tree- and graph-structured data (our research results first appeared in ACM CIKM 2004). Such mining results can be useful in many applications, including e-commerce, web site maintenance, web site personalization, effective web advertisement, web event detection, XML index maintenance, intelligent XML query processing, and XML caching. In this context, we have designed and implemented several novel mining algorithms. Our research results have appeared in premier conferences and journals such as ACM SIGKDD, ACM WWW, VLDB, ACM CIKM, and DKE.

 

 

Efficient XPath Processing in Tree-Unaware RDBMS (2005 – present)

With the rapid emergence of XML as the de facto standard for exchanging data on the Web, interest in efficiently querying growing XML data sources using the existing relational framework has increased. Current approaches for evaluating XPath expressions in an RDBMS can arguably be categorized into tree-aware and tree-unaware types. In the former, the database kernel is modified so that it understands tree-shaped data. In the latter, the kernel is left unmodified, primarily for the sake of portability and ease of implementation on top of an off-the-shelf RDBMS. The research community has shown that tree-aware approaches are more scalable and perform orders of magnitude faster than some tree-unaware approaches. Our key contribution in this domain is a novel schema-oblivious, tree-unaware system called XCALIBUR, which we have been developing since 2005. The XCALIBUR project is an exploration of how far we can push the idea of using mature tree-unaware RDBMS technology to design and build a full-fledged XPath processor. To the best of our knowledge, we are the first to show that, contrary to popular belief, a schema-oblivious tree-unaware strategy can outperform state-of-the-art schema-conscious tree-unaware approaches (such as Shared Inlining), commercial XPath support in RDBMS (such as MS SQL 2005), and several native tree-aware approaches (such as TJFast). We are also the first to show that the performance gap between tree-unaware approaches and tree-aware approaches such as MonetDB/XQuery can be significantly reduced, and that a tree-unaware approach can even be faster in certain cases. Our research results are published in DASFAA and ACM CIKM.
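To make the tree-unaware idea concrete, the following is a minimal sketch of how an XPath descendant step can be answered by an unmodified relational engine. It uses a generic pre/post region encoding stored in SQLite; this is an illustrative assumption and not XCALIBUR's actual storage scheme or query translation. The table name `node` and the functions `shred`, `load`, and `descendants` are hypothetical names chosen for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET


def shred(xml_text):
    """Return (pre, post, level, tag) rows, one per element, in document order.

    Generic region encoding (an assumption for illustration): 'pre' is the
    preorder rank and 'post' the postorder rank of each element.
    """
    rows = []
    counter = {"pre": 0, "post": 0}

    def visit(elem, level):
        pre = counter["pre"]
        counter["pre"] += 1
        index = len(rows)
        rows.append(None)  # reserve the slot so rows stay in document order
        for child in elem:
            visit(child, level + 1)
        post = counter["post"]
        counter["post"] += 1
        rows[index] = (pre, post, level, elem.tag)

    visit(ET.fromstring(xml_text), 0)
    return rows


def load(xml_text):
    """Store the shredded document in one schema-oblivious table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE node (pre INTEGER PRIMARY KEY, post INTEGER, "
        "level INTEGER, tag TEXT)"
    )
    conn.executemany("INSERT INTO node VALUES (?, ?, ?, ?)", shred(xml_text))
    return conn


def descendants(conn, ancestor_tag, descendant_tag):
    """Evaluate //ancestor_tag//descendant_tag as a plain SQL self-join.

    d is a descendant of a exactly when d.pre > a.pre and d.post < a.post.
    """
    sql = """
        SELECT DISTINCT d.pre
        FROM node AS a JOIN node AS d
          ON d.pre > a.pre AND d.post < a.post
        WHERE a.tag = ? AND d.tag = ?
        ORDER BY d.pre
    """
    return [row[0] for row in conn.execute(sql, (ancestor_tag, descendant_tag))]


DOC = "<lib><book><title/><author/></book><journal><title/></journal></lib>"
conn = load(DOC)
book_titles = descendants(conn, "book", "title")
all_titles = descendants(conn, "lib", "title")
```

The single `node` table is the same for every document regardless of its schema, which is what "schema-oblivious" refers to here; the database kernel sees only integers and strings and an ordinary join.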

 

 

Scalable XML Change Detection (2002 – 2006)

Since online information changes frequently, being able to quickly detect changes in XML documents is important to many applications. Our study showed that existing main-memory algorithms for XML change detection do not scale: they fail to detect changes to large XML documents because they run out of memory. Our novel contribution is an XML change detection system called XANADUE, which addresses the scalability issue by detecting changes using a relational database system. The XANADUE project is an investigation of how far we can push the idea of using mature RDBMS technology to design and build a full-fledged XML change management system, without modifying the database kernel to make it tree-aware. To the best of our knowledge, XANADUE is the first system to detect changes to XML data using relational backends (our research results first appeared in DEXA 2004). Our results show that XANADUE is orders of magnitude faster and more scalable than the main-memory-based approaches for larger XML documents. More importantly, our approach produces results of much higher quality than the main-memory-based approaches. Our research results appeared in premier conferences and journals such as ACM SIGMOD, ACM CIKM, DKE, and ER.
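As a rough illustration of pushing change detection into a relational engine, the sketch below flattens two versions of a document into (path, value) rows and lets SQL set operations find the differences. This is a deliberately simplified assumption for illustration, not XANADUE's algorithm: it identifies nodes by positional paths, so an insertion among same-tag siblings would shift later positions and be reported as updates. The names `path_rows`, `detect_changes`, `old_doc`, and `new_doc` are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def path_rows(xml_text):
    """Flatten a document into (path, text) rows with positional path steps."""
    rows = []

    def visit(elem, path):
        rows.append((path, (elem.text or "").strip()))
        seen = {}
        for child in elem:
            seen[child.tag] = seen.get(child.tag, 0) + 1
            visit(child, f"{path}/{child.tag}[{seen[child.tag]}]")

    root = ET.fromstring(xml_text)
    visit(root, f"/{root.tag}[1]")
    return rows


def detect_changes(old_xml, new_xml):
    """Compare two versions inside the database rather than in Python memory."""
    conn = sqlite3.connect(":memory:")
    for name, text in (("old_doc", old_xml), ("new_doc", new_xml)):
        conn.execute(f"CREATE TABLE {name} (path TEXT PRIMARY KEY, value TEXT)")
        conn.executemany(f"INSERT INTO {name} VALUES (?, ?)", path_rows(text))

    # Paths present only in the old version were deleted.
    deleted = [r[0] for r in conn.execute(
        "SELECT path FROM old_doc EXCEPT SELECT path FROM new_doc ORDER BY path")]
    # Paths present only in the new version were inserted.
    inserted = [r[0] for r in conn.execute(
        "SELECT path FROM new_doc EXCEPT SELECT path FROM old_doc ORDER BY path")]
    # Paths present in both versions with different text were updated.
    updated = [r[0] for r in conn.execute(
        "SELECT o.path FROM old_doc AS o JOIN new_doc AS n ON o.path = n.path "
        "WHERE o.value <> n.value ORDER BY o.path")]
    conn.close()
    return {"inserted": inserted, "deleted": deleted, "updated": updated}


OLD = "<shop><item><price>10</price></item><note>hi</note></shop>"
NEW = "<shop><item><price>12</price><stock>3</stock></item></shop>"
changes = detect_changes(OLD, NEW)
```

With an on-disk database in place of `:memory:`, the comparison itself is carried out by the engine's join and set-difference operators, which is the general reason a relational backend can handle documents that do not fit in main memory.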

 

 

Warehousing the Web (1998 – 2001)

How does anyone find and manage relevant information among the millions of pages linked together in unpredictable tangles on the Internet? What makes the Web so exciting is its potential to transcend geography and bring information on myriad topics directly to the desktop. Yet without any consistent ... How does one locate and retrieve desired information easily and quickly in this huge information repository? This research addresses the problem of efficiently managing historical Web information from the database perspective. We investigate issues in the construction of a Web data warehouse called WHOWEDA, which materializes and manages useful information from the Web in order to support strategic decision making. Managing data in a Web warehouse requires us to develop a suitable data model and a set of operators to manipulate the data components in the model, and to look into efficient storage structures and indexing mechanisms for Web data. To the best of our knowledge, this is the first systematic effort to build a Web warehouse (our first research results appeared in IEEE ADL 1998). The results of this research are published in several international conferences and journals, including DEXA, ICDCS, DASFAA, ER, DKE, and TKDE.

 
