Project Title
Proper Name Resolution in Bibliographic Databases
Name of Academic Supervisor
Helen Paik
Email of Academic Supervisor
hpaik@cse.unsw.edu.au
Name of Joint Supervisor(s)
John Shepherd
Email of Joint Supervisor(s)
jas@cse.unsw.edu.au
CSE or NICTA Project
CSE
Research Area
Databases
Abstract of Research Project
Building accurate and consistent bibliographic databases is a difficult task. One particular problem is that proper names, while very useful in identifying entities, are not effective primary keys. For example, several authors may have the same name, or one person may use several variations on their name. While building a database from a number of external sources, where such ambiguities may be rife, it is useful to provide automated assistance in resolving names to ensure database accuracy. The aim of this project is to investigate methods for resolving proper names in the context of bibliographic database entries.
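As a rough illustration of the kind of automated assistance meant here, the sketch below normalises author strings and lists which variants could refer to the same person. It is a minimal sketch under simple assumptions: Western "Given Surname" or "Surname, Given" order, and initials allowed to match full given names. The function names are hypothetical, and this is not the method the project will develop.

```python
import re
import unicodedata


def normalise(name: str) -> tuple[str, list[str]]:
    """Split a raw author string into (surname, given-name tokens).

    Accepts "Surname, Given" and "Given Surname" forms; lower-cases the text and
    strips accents so that, e.g., "Müller" and "Muller" compare equal."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(c for c in text if not unicodedata.combining(c)).lower()
    if "," in text:
        surname, _, given = text.partition(",")
    else:
        parts = text.split()
        surname, given = parts[-1], " ".join(parts[:-1])
    return surname.strip(), re.findall(r"[a-z]+", given)


def compatible(a: str, b: str) -> bool:
    """True if two name strings could denote the same person: same surname, and
    each pair of given names is equal or one is an initial of the other."""
    surname_a, given_a = normalise(a)
    surname_b, given_b = normalise(b)
    if surname_a != surname_b:
        return False
    for x, y in zip(given_a, given_b):  # missing middle names are tolerated
        if len(x) == 1 or len(y) == 1:
            if x[0] != y[0]:
                return False
        elif x != y:
            return False
    return True


def candidate_matches(names: list[str]) -> dict[str, list[str]]:
    """For each name, list the other names it is compatible with."""
    return {n: [m for m in names if m != n and compatible(n, m)] for n in names}


names = ["John Shepherd", "Shepherd, John", "J. Shepherd", "Jane Shepherd", "Helen Paik"]
matches = candidate_matches(names)
```

Here "J. Shepherd" is compatible with both "John Shepherd" and "Jane Shepherd", which are not compatible with each other. That is exactly the kind of ambiguity a semi-automatic tool would refer to a human rather than resolve silently.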
Novelty and Contribution
Investigation of the issues involved in resolving proper names; analysis of existing approaches; development of improved methods for proper name resolution.
Expected Outcomes
Methods for resolving proper names in environments where ambiguities and inconsistencies are present. A system that can read bibliographic entries in a variety of formats and integrate them semi-automatically into a bibliographic database.
Reference Material/Links
Contact the supervisors
Project Title
Near-duplicate Web Page Detection for Search Engines
Name of Academic Supervisor
Wei Wang
Email of Academic Supervisor
weiw@cse.unsw.edu.au
Name of Joint Supervisor(s)
Sunanda Patro
Email of Joint Supervisor(s)
sunandap@cse.unsw.edu.au
CSE or NICTA Project
CSE
Research Area
Databases
Abstract of Research Project
The aim of the project is to develop and implement algorithms and systems that allow search engines to detect near-duplicate web pages effectively and efficiently. This capability is essential for focused web crawling and for search result ranking. The task is challenging: for example, such systems must cope with 8 billion Web pages.
Further details on this topic can be found in the references or by contacting the academic supervisor.
Through this project, students will learn the fundamentals and the current state of the art in near-duplicate object detection in the context of Web Search. Students will work closely with a team of researchers and PhD students.
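One classic way to make near-duplicate detection tractable is shingling combined with MinHash. Each page is reduced to a set of overlapping word sequences (shingles), and a short signature of per-hash minima is kept, so the similarity of two pages can be estimated without comparing their full text. The sketch below assumes word 4-shingles and a hypothetical hash family built from seeded BLAKE2b digests. It illustrates the technique only and is not the project's system.

```python
import hashlib


def shingles(text: str, k: int = 4) -> set[str]:
    """Word k-shingles: every run of k consecutive lower-cased words."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Exact Jaccard similarity |A ∩ B| / |A ∪ B|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def _hash(seed: int, s: str) -> int:
    # Hypothetical hash family: 64-bit BLAKE2b digest of "seed:shingle".
    digest = hashlib.blake2b(f"{seed}:{s}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash(sh: set[str], num_perm: int = 128) -> list[int]:
    """Signature: for each of num_perm hash functions, the minimum hash over the set."""
    return [min(_hash(i, s) for s in sh) for i in range(num_perm)]


def estimate_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """The fraction of agreeing signature positions estimates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


page_a = "the quick brown fox jumps over the lazy dog near the river bank"
page_b = "the quick brown fox jumps over the lazy dog near the river bend"
sa, sb = shingles(page_a), shingles(page_b)
exact = jaccard(sa, sb)                              # 9 shared shingles out of 11 distinct
approx = estimate_jaccard(minhash(sa), minhash(sb))  # estimated from 128 integers per page
```

At the scale of billions of pages, signatures are usually grouped further with locality-sensitive hashing (banding) so that only candidate pairs are compared; that step is omitted here.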
Novelty and Contribution
Students can gain a deep understanding of the challenges of some practical problems in Web Search, and gain hands-on experience in developing state-of-the-art solutions in that context.
Project Title
Google-style Keyword Search in Databases
Name of Academic Supervisor
Wei Wang
Email of Academic Supervisor
weiw@cse.unsw.edu.au
Name of Joint Supervisor(s)
Yi Luo
Email of Joint Supervisor(s)
luoyi@cse.unsw.edu.au
CSE or NICTA Project
CSE
Research Area
Databases
Abstract of Research Project
This project focuses on providing Google-style keyword search over databases. Huge amounts of data are stored in relational databases nowadays, yet the dominant way of accessing those data is through sophisticated SQL queries, which require knowledge of the database schema. We will investigate effective and efficient ways to retrieve information using free-form keyword search.
This project builds upon our previous study of the problem in the SPARK project (see Ref 1). The outcome will be an improved version of the SPARK system that supports keyword search in major relational database systems (e.g., Oracle and MySQL).
You are expected to work closely with an internationally leading researcher and PhD students, and gain rich experience in database and information retrieval areas.
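To make the problem concrete, the sketch below performs the simplest possible schema-agnostic keyword search over a SQLite database. It discovers tables and TEXT columns from the catalogue, so the user needs no knowledge of the schema, and it ranks individual rows by how many keywords they contain. It assumes ordinary rowid tables and plain substring matching, and it is not the SPARK algorithm: keyword search systems of this kind typically also assemble answers from several tuples connected by foreign keys, which this sketch does not attempt. The function name and sample data are hypothetical.

```python
import sqlite3


def keyword_search(conn: sqlite3.Connection, keywords: list[str]) -> list[tuple[int, str, int]]:
    """Scan every TEXT column of every table and return (hits, table, rowid)
    for rows containing at least one keyword, best matches first."""
    results = []
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        # PRAGMA table_info rows: (cid, name, declared type, notnull, default, pk)
        cols = [r[1] for r in conn.execute(f'PRAGMA table_info("{table}")')
                if r[2].upper() == "TEXT"]
        if not cols:
            continue
        col_list = ", ".join(f'"{c}"' for c in cols)
        for row in conn.execute(f'SELECT rowid, {col_list} FROM "{table}"'):
            text = " ".join(v for v in row[1:] if v is not None).lower()
            hits = sum(1 for k in keywords if k.lower() in text)
            if hits:
                results.append((hits, table, row[0]))
    results.sort(key=lambda r: (-r[0], r[1], r[2]))
    return results


# Made-up sample data for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE paper (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER)")
conn.executemany("INSERT INTO author VALUES (?, ?)", [(1, "Wei Wang"), (2, "Yi Luo")])
conn.executemany("INSERT INTO paper VALUES (?, ?, ?)",
                 [(1, "Keyword search in relational databases", 1),
                  (2, "Near-duplicate detection", 2)])
hits = keyword_search(conn, ["keyword", "databases"])  # [(2, 'paper', 1)]
```

A query such as "wang keyword" shows the limitation: the author row and the paper row each match one keyword, and only a join through author_id would reveal that together they answer the query.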
Novelty and Contribution
Students can work on a novel topic and develop a practical solution with application to Enterprise Search and Web Search.
Expected Outcomes
The outcome of the project will be an improved version of the SPARK system for keyword search over databases. Successful completion of this project could lead to research publications and become part of an honours thesis or postgraduate research thesis.
Reference Material/Links
Project Title
Mobile Data Management for the Future
Name of Academic Supervisor
Raymond Wong
Email of Academic Supervisor
wong@cse.unsw.edu.au
Name of Joint Supervisor(s)
William Shui
Email of Joint Supervisor(s)
bill.shui@nicta.com.au
CSE or NICTA Project
CSE
Research Area
Databases
Abstract of Research Proposal
The aim of this project is to allow computer data (of various formats) to be stored and processed in a succinct representation: a space-efficient representation that also keeps access and update costs low for all processing operations. This is essential for devices with limited resources, such as smartphones, PDAs, and sensor networks. Basic research has been completed, and a preliminary prototype has been implemented and demonstrated. Building on the current results, the project involves further development of the prototype on Linux or Windows (preferred) for handheld devices and mobile phones. The candidate will work with a team of researchers, PhD students and collaborators from industry.
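Succinct representations are commonly built on bit vectors that support rank (count the 1 bits before a position) and select (find the k-th 1 bit) queries. The sketch below shows a simplified, static version with a single-level rank directory (one cumulative count per 64-bit word), used to navigate a tree stored as balanced parentheses in 2 bits per node. It is an illustrative assumption, not the group's prototype: practical succinct structures use a two-level directory to keep the overhead to o(n) bits, and supporting cheap updates, as the project requires, is considerably harder than this static case. The class name is hypothetical.

```python
class RankBitVector:
    """Static bit vector with a rank directory: one cumulative count per 64-bit word."""

    WORD = 64

    def __init__(self, bits: list[int]):
        self.n = len(bits)
        self.words = []
        for start in range(0, self.n, self.WORD):
            w = 0
            for offset, b in enumerate(bits[start:start + self.WORD]):
                if b:
                    w |= 1 << offset
            self.words.append(w)
        # counts[j] = number of 1 bits in words[0 .. j-1]
        self.counts = [0]
        for w in self.words:
            self.counts.append(self.counts[-1] + bin(w).count("1"))

    def access(self, i: int) -> int:
        """Bit at position i."""
        return (self.words[i // self.WORD] >> (i % self.WORD)) & 1

    def rank1(self, i: int) -> int:
        """Number of 1 bits in positions [0, i)."""
        q, r = divmod(i, self.WORD)
        if r == 0:
            return self.counts[q]
        return self.counts[q] + bin(self.words[q] & ((1 << r) - 1)).count("1")

    def select1(self, k: int) -> int:
        """Position of the k-th 1 bit (1-based), by binary search over rank1."""
        lo, hi = 0, self.n
        while lo < hi:
            mid = (lo + hi) // 2
            if self.rank1(mid + 1) < k:
                lo = mid + 1
            else:
                hi = mid
        return lo


# The tree ( () (()) ): a root with two children, the second having one child.
bp = RankBitVector([1, 1, 0, 1, 1, 0, 0, 0])  # 1 = "(", 0 = ")"
i = 4                                         # opening parenthesis of the grandchild
preorder = bp.rank1(i + 1)                    # its 1-based preorder number: 4
depth = 2 * bp.rank1(i + 1) - (i + 1)         # its depth, counting the root as 1: 3
```

The tree itself occupies 8 bits for 4 nodes; navigation operations reduce to rank and select over those bits rather than following pointers.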
Expected Outcomes
The outcome of the project will be an improved version of the current prototype that allows more efficient querying and processing of data on mobile devices. The underlying data structures and techniques will be simplified to ease training and to accelerate future development built upon our system. Finally, the outcome should allow different applications to be integrated easily with our system. This project can be further extended to become part of an honours thesis or postgraduate research thesis.
Reference Material/Links
Feel free to email the academic supervisor for further details of the project (e.g., technical papers).
Commercial collaborators of the project include http://www.greenpea.net
Project Title
Cooperative Query Processing for XML
Name of Academic Supervisor
Raymond Wong
Email of Academic Supervisor
wong@cse.unsw.edu.au
Name of Joint Supervisor(s)
Franky Lam
Email of Joint Supervisor(s)
Franky.lam@nicta.com.au
CSE or NICTA Project
NICTA
Research Area
Databases
Abstract of Research Proposal
Semistructured data, such as XML, allows users to structure a document in a way that precisely captures the semantics of the data. This, however, poses a substantial barrier to casual and non-expert users who wish to query such data, as the data's structure forms the basis of all XML query languages. Without an accurate understanding of this structure, users are unable to issue meaningful queries. The problem is compounded because data adhering to different schemas are likely to be contained within the same data warehouse or spread across multiple enterprise databases. This project aims to develop a mechanism for meaningfully querying such data with no prior knowledge of its structure. The candidate will work with a team of researchers, PhD students and collaborators from industry.
Expected Outcomes
The project will build on research carried out in our group over the past few years, and it can be adjusted and extended according to the student's interests. The system should return approximate answers to such queries over semistructured data such as XML, so that it returns useful results even when a specific query value cannot be matched. This project can be further extended to become part of an honours thesis or postgraduate research thesis.
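The sketch below illustrates both ideas on a toy scale with Python's standard ElementTree and difflib modules. Elements are located by tag anywhere in the document, so the query does not depend on where each schema places them, and candidate values are ranked by string similarity, so a query value that matches nothing exactly still yields near answers. The function name, the similarity measure and the threshold are assumptions chosen for illustration; they do not describe the group's approach to cooperative query processing.

```python
import difflib
import xml.etree.ElementTree as ET


def relaxed_query(root: ET.Element, tag: str, value: str, threshold: float = 0.5):
    """Find elements with the given tag at any depth (no path knowledge needed)
    and return (score, text) pairs whose similarity to `value` is at least
    `threshold`, best first. An exact match scores 1.0."""
    answers = []
    for elem in root.iter(tag):
        text = (elem.text or "").strip()
        score = difflib.SequenceMatcher(None, text.lower(), value.lower()).ratio()
        if score >= threshold:
            answers.append((round(score, 3), text))
    answers.sort(key=lambda a: -a[0])
    return answers


# Two sources with different structures stored in one (made-up) warehouse.
doc = ET.fromstring(
    "<warehouse>"
    "<library><book><title>Database Systems</title><author>Raymond Wong</author></book></library>"
    "<catalogue><item><meta><author>R. Wong</author></meta></item></catalogue>"
    "</warehouse>"
)
by_author = relaxed_query(doc, "author", "Raymond Wong")  # exact match first, then "R. Wong"
by_title = relaxed_query(doc, "title", "Database System")  # no exact match; nearest title returned
```

A real cooperative query processor would also relax the structure of the query itself (for example, substituting related tags), which this sketch leaves out.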
Reference Material/Links
Feel free to email the academic supervisor for further details of the project (e.g., technical papers).