The data Transfer application
This is an introductory article about my (relatively) new open source command line application named Transfer.
Transfer is about transforming and transferring data between files and databases using recipes.
I always wanted to be able to make some transformation to data as easy as possible. Of course, there are many software solutions suitable for this task, generically named ETL (Extract, Transform and Load) and one of the applications that I liked most is Pentaho Data Integration (Kettle).
The concept is simple, read a data table from a source, row by row,
optionally make some transformations on it, and transfer it to the
destination table. The required definitions of the transformations
are hold in configurations files named
- easily extensible multi database support (currently: Firebird and PostgreSQL);
- file (.csv and .xls) and database input readers;
- database output writer;
- plug-in system for simple data transformations;
The recipe is a file in Apache format (parsed with the Config::General Perl module) and describes the source and the destination and how to transform the data. Transformations are made using plugins.
The simplest recipe defines only the source, the destination and the mapping for the fields from the source with those of the destination.
In this example the structure of the source and destination tables is
identical, but the role of the
headermap configuration is to allow
mapping fields with different names - a rename field transformation.
# Transformation recipe <recipe> version = 1 syntaxversion = 1 name = Copy table description = Users data from db1_prod to db1_dev </recipe> <config> <source> reader = db target = db1_prod table = users </source> <destination> writer = db target = db1_dev table = users </destination> </config> <tables> <table users> description = Users table logfield = <headermap> user_id = user_id name = name dep_id = dep_id </headermap> </table> </tables> <transform column/> <transform row/>
plugin is a Perl module specialized to make simple transformations.
It receives a hash reference containing info about the field (and some
extra info needed for logging) and the current value. Transforms the
value using Perl functions and returns the new value. Plugin
functions can be chained together to make complex transformations.
transform column and
transform row sections of the
uses plugins to make column and row transformations.
This is the introductory article about my Transfer application - transforming and transferring data between files and databases using recipes. More elaborate examples will be presented in my next article on the subject.
The source code is on GitHub: https://github.com/stefansbv/transfer
Comments on the subject are welcome, but will be moderated, of course ;)
Some concepts, blocks of code and even entire modules, comments and documentation are borrowed and/or inspired from the excellent Sqitch project by @theory. Thank you!Comments powered by Disqus