Data Quality in Web Information Systems
Xiaofang Zhou, Shazia Sadiq and Ke Deng
School of Information Technology and Electrical Engineering, The University of Queensland, Australia
Abstract
The World Wide Web has brought revolutionary changes to the way people and organizations generate, disseminate and use
data. With unprecedented access to massive amounts of data and the powerful information-gathering capabilities of Web-based
technologies, the traditional closed-world assumption of database systems has been challenged. More and more data from the
Web are used today as essential information sources, directly or indirectly, for all types of decision making, not only in
personal applications but also in many business and scientific ones. A user of such Web data, however, has to constantly
rely on their own judgement of data quality dimensions such as correctness, currency, consistency and completeness. This is an
unreliable and often very difficult process, as the quality of this judgement itself often depends on the quality of other
information obtained from the Web, and the relationships among the data used can be very complex and sometimes hidden from the user.
While the issue of data quality is as old as data itself, it is now exposed at a much higher, broader and more critical level
due to the scale, diversity and ubiquity of Web Information Systems. The intrinsic mismatch between the intended use
and the actual use of data on the Web is a fundamental cause of poor data quality in Web-based applications. In this talk,
we will introduce the notion of data quality, from its roots in management information systems research to new issues and challenges
in the context of large-scale Web Information Systems. After a brief introduction to organizational and architectural solutions
to the data quality problem, this talk will focus on current research activities and results on computational solutions
from the database community in data profiling, record linking, conditional functional constraints, data provenance and data
uncertainty. These technical solutions will be examined for their promise and limitations in addressing the problem of data quality
in Web Information Systems. Finally, we will discuss a list of open research problems.
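The abstract lists conditional functional constraints among the computational approaches without defining them. As a rough illustration only, the following sketch checks one well-known form, a conditional functional dependency: an ordinary functional dependency (here zip -> city) that is required to hold only on tuples matching a constant pattern (here country = "UK"). The records, attribute names and the helper `cfd_violations` are hypothetical and are not taken from the talk.

```python
from collections import defaultdict

# Hypothetical records; field names and values are illustrative only.
records = [
    {"country": "UK", "zip": "EH4 1DT", "city": "Edinburgh"},
    {"country": "UK", "zip": "EH4 1DT", "city": "London"},        # conflicts with the row above
    {"country": "US", "zip": "07974", "city": "Murray Hill"},
    {"country": "US", "zip": "07974", "city": "New Providence"},  # outside the UK pattern, so not constrained
]


def cfd_violations(rows, condition, lhs, rhs):
    """Check a single-pattern conditional functional dependency condition => (lhs -> rhs).

    `condition` maps attribute names to constants; only rows matching all of them
    are constrained. Returns (lhs_key, rows) pairs where rows agree on `lhs` but
    disagree on `rhs`.
    """
    groups = defaultdict(list)
    for row in rows:
        # Rows that do not match the constant pattern are ignored.
        if all(row.get(attr) == value for attr, value in condition.items()):
            key = tuple(row[a] for a in lhs)
            groups[key].append(row)

    violations = []
    for key, group in groups.items():
        # Same lhs values but more than one distinct rhs value is a violation.
        if len({tuple(r[a] for a in rhs) for r in group}) > 1:
            violations.append((key, group))
    return violations


if __name__ == "__main__":
    for key, group in cfd_violations(records, {"country": "UK"}, ["zip"], ["city"]):
        print(key, [r["city"] for r in group])
```

With the pattern country = "UK", only the two Edinburgh/London rows are reported; dropping the condition (passing an empty dict) turns the check into a plain functional dependency, which also flags the two US rows. The conditional form lets a rule that is valid for one part of the data be enforced there without producing false alarms elsewhere.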