Modern software development methods such as Extreme Programming (XP) favor the use of frequently repeated tests, so-called
regression tests, to catch new errors when software is updated or tuned, by checking that the software still produces the
right results for a reference input. Regression testing is also very valuable for Extract–Transform–Load (ETL) software, as
ETL software tends to be very complex and error-prone. However, regression testing of ETL software is currently cumbersome
and requires considerable manual effort. In this paper, we describe a novel, easy-to-use, and efficient semi-automatic test framework
for regression testing of ETL software. By automatically analyzing the schema, the tool detects how tables are related, and uses
this knowledge, along with optional user specifications, to determine exactly what data warehouse (DW) data should be identical
across test ETL runs, leaving out change-prone values such as surrogate keys. The framework also provides tools for quickly
detecting and displaying differences between the current ETL results and the reference results. In summary, manual work for
test setup is reduced to a minimum, while still ensuring an efficient testing procedure.
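
To make the comparison idea concrete, the following is a minimal sketch, not the framework itself. It assumes a hypothetical two-table star schema (product_dim, sales_fact) in SQLite, and it hard-codes the join that the framework would derive from the schema. The helper names build_dw, comparable_rows, and diff_runs are illustrative assumptions. Surrogate keys are used only to join the fact table to its dimension and are then dropped, so two runs that assign different keys to the same data still compare as equal.

```python
import sqlite3
from collections import Counter


def build_dw(rows, first_key):
    """Simulate one ETL run into a tiny in-memory DW (hypothetical schema).

    Surrogate keys start at first_key, so they can differ between runs.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE sales_fact ("
        "product_id INTEGER REFERENCES product_dim(product_id), amount INTEGER)"
    )
    keys = {}
    for name, amount in rows:
        if name not in keys:
            keys[name] = first_key + len(keys)
            conn.execute("INSERT INTO product_dim VALUES (?, ?)", (keys[name], name))
        conn.execute("INSERT INTO sales_fact VALUES (?, ?)", (keys[name], amount))
    return conn


def comparable_rows(conn):
    """Join fact to dimension and keep only values expected to be stable.

    The surrogate key is used for the join but left out of the result.
    A Counter is used so that duplicate rows are compared as a multiset.
    """
    cur = conn.execute(
        "SELECT d.name, f.amount FROM sales_fact f "
        "JOIN product_dim d ON f.product_id = d.product_id"
    )
    return Counter(cur.fetchall())


def diff_runs(reference, current):
    """Return (rows missing from current, rows unexpected in current)."""
    ref, cur = comparable_rows(reference), comparable_rows(current)
    return ref - cur, cur - ref


# Reference run, a rerun with different surrogate keys and load order,
# and a run in which one measure value has changed.
reference = build_dw([("apple", 10), ("pear", 5), ("apple", 7)], first_key=1)
same_data = build_dw([("pear", 5), ("apple", 10), ("apple", 7)], first_key=100)
changed = build_dw([("apple", 10), ("pear", 6), ("apple", 7)], first_key=1)

print(diff_runs(reference, same_data))  # both parts are empty
print(diff_runs(reference, changed))    # reports the changed pear row
```

In this sketch the rerun with shifted surrogate keys yields no differences, while the run with a changed measure yields one missing and one unexpected row, which is the kind of difference report described above.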