To be useful, a data warehouse must be correct and consistent. Hence, when selecting the data sources for
building the data warehouse, it is essential to know the concept and structure of all candidate data sources and
the dependencies between them. In a perfect world, this knowledge stems from an integrated, enterprise-wide data model. In practice,
however, an explicit model is often not available.
This paper proposes an approach for identifying data sources for a data warehouse, even without detailed knowledge
of the interdependencies between data sources. Furthermore, the approach restricts the number of potential data sources. Hence,
it reduces the time needed to build and maintain a data warehouse and increases the quality of the data in the
warehouse.
Keywords Data Warehouses - Data Source Identification - Multiple Sequence Analysis