02-05-2011, 04:40 PM
Abstract— In this paper, our research objective is to develop a
database virtualization technique so that data analysts or other
users who apply data mining methods to their jobs can use all
ubiquitous databases on the Internet as if they were a single
database, thereby reducing workloads such as data collection
from Internet databases and data cleansing. In this study, we
first examine the advantages of the XML schema and propose a
database virtualization method by which ubiquitous databases
such as relational databases, object-oriented databases, and
XML databases can be used as if they all behaved as a single
database. Next, we show that this virtualization method can
describe ubiquitous database schemas in a unified fashion using
the XML schema. Moreover, it incorporates a high-level concept
of distributed management of databases of the same type and of
different types, together with a location transparency feature.
Finally, we develop a common schema generation method and
propose a virtual database query language for use in a
virtualized ubiquitous database environment.
Keywords: database virtualization; data mining;
XML schema; ubiquitous databases; database
integration; database query
I. INTRODUCTION
Nowadays, massive amounts of data are collected daily
in ubiquitous sensor network environments. With such data
available and elaborately structured, it is more important
than ever to extract knowledge and trends from them
using data mining techniques. Such data are valuable, for
example, in supporting analysis and decision-making in
businesses. They normally exist in databases of various
types (called ubiquitous databases hereinafter) that are
usually distributed and may be located anywhere. A salient
problem, however, is that a person who engages in data
mining using ubiquitous databases has to spend much time
on database selection and data collection, for example,
which are merely preparatory steps to the actual data
mining tasks. What such a person really wants is to
concentrate on the work of analysis and rule extraction.
In our study, the primary objective is therefore to
develop a virtualization technique so that the data analyst or
other user can use all ubiquitous databases as if they were
a single database, thereby helping to reduce
the user’s workload.
II. ASSOCIATED STUDIES
Earlier reports [1], [2] have described studies of
database virtualization technology.
One report [1] proposed a system that actively and reliably
delivers information to all users in a mobile computing
environment, sourced from various types of database groups
connected by a wide-area network. By image-copying the data
of the local database group to a meta-database through basic
search and build operations, for example, the system is
intended to combine data from local databases of different
types.
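The copy-based idea can be illustrated with a minimal sketch. It is not the system of [1]; it only assumes that each local database is searched with its own query and that the results are built into one common table of a meta-database. The function name build_meta_database, the table names, and the sample rows are all hypothetical, and SQLite stands in for the local databases.

```python
import sqlite3

def build_meta_database(local_dbs):
    """Copy rows from each local database into one meta-database table."""
    meta = sqlite3.connect(":memory:")
    meta.execute("CREATE TABLE readings (source TEXT, sensor TEXT, value REAL)")
    for name, (conn, query) in local_dbs.items():
        # "search": read rows from the local database with its own query
        rows = conn.execute(query).fetchall()
        # "build": write them into the meta-database in the common layout
        meta.executemany(
            "INSERT INTO readings VALUES (?, ?, ?)",
            [(name, sensor, value) for sensor, value in rows],
        )
    meta.commit()
    return meta

# Two hypothetical local databases with different table layouts.
db_a = sqlite3.connect(":memory:")
db_a.execute("CREATE TABLE temp (sensor_id TEXT, celsius REAL)")
db_a.execute("INSERT INTO temp VALUES ('t1', 21.5)")

db_b = sqlite3.connect(":memory:")
db_b.execute("CREATE TABLE measurements (id INTEGER, name TEXT, reading REAL)")
db_b.execute("INSERT INTO measurements VALUES (1, 'h1', 40.0)")

meta = build_meta_database({
    "a": (db_a, "SELECT sensor_id, celsius FROM temp"),
    "b": (db_b, "SELECT name, reading FROM measurements"),
})
result = meta.execute(
    "SELECT source, sensor, value FROM readings ORDER BY source"
).fetchall()
```

In a copy-based design like this, the meta-database reflects the local databases only as of the last copy, so it must be rebuilt or refreshed when the sources change.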
The data integration technique Teiid, which is described
in [2], enables virtualization of various types of databases.
Through such virtual databases, one can access data sources
such as relational databases, web databases, and application
software such as ERP and CRM in real time, and all of them
can be integrated for use. Teiid has its own query engine.
Real-time data integration is accomplished by connecting
business application software, through the JDBC/SOAP access
layer, with data sources that are accessed through the
connector framework.
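The query-time approach can be sketched in a few lines. This is not Teiid's API or architecture; it only illustrates the general idea of per-source connectors behind one virtual table, where data are fetched from the sources at the moment of the query. The class VirtualTable, the connector functions, and the sample data are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

class VirtualTable:
    """Presents rows from several sources as one table, read at query time."""
    def __init__(self):
        self.connectors = []

    def add_connector(self, fetch):
        self.connectors.append(fetch)

    def select(self, predicate=lambda row: True):
        # Each connector is called on every query, so results are current.
        for fetch in self.connectors:
            for row in fetch():
                if predicate(row):
                    yield row

def relational_connector(conn):
    """Maps a relational table onto the common (name, price) row shape."""
    def fetch():
        for name, price in conn.execute("SELECT name, price FROM products"):
            yield {"name": name, "price": price}
    return fetch

def xml_connector(root):
    """Maps XML <item> elements onto the same common row shape."""
    def fetch():
        for item in root.iter("item"):
            yield {"name": item.get("name"), "price": float(item.get("cost"))}
    return fetch

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price REAL)")
conn.execute("INSERT INTO products VALUES ('book', 12.0)")
root = ET.fromstring('<items><item name="pen" cost="1.5"/></items>')

virtual = VirtualTable()
virtual.add_connector(relational_connector(conn))
virtual.add_connector(xml_connector(root))

cheap = list(virtual.select(lambda r: r["price"] < 5))
# A later change in a source is visible to the next query without any copy.
conn.execute("INSERT INTO products VALUES ('clip', 0.2)")
cheap_after = list(virtual.select(lambda r: r["price"] < 5))
```

Because nothing is copied, the second query sees the newly inserted row directly, which is the sense in which access through a virtual database is real-time.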
In our study, we considered metadata, UML, the E–R
model, and the XML schema as candidates for accomplishing
database virtualization, so that ubiquitous databases can
be used as if they were a single database. We then compared
the advantages and disadvantages of each, as follows: