The data Transfer application

This is an introductory article about my (relatively) new open-source command-line application named Transfer.

Transfer is about transforming and transferring data between files and databases using recipes.

I have always wanted to be able to transform data as easily as possible. Of course, there are many software solutions suitable for this task, generically named ETL (Extract, Transform and Load) tools, and one of the applications I like most is Pentaho Data Integration (Kettle).

The concept is simple: read a data table from a source, row by row, optionally apply some transformations to it, and transfer it to the destination table. The definitions required for the transformations are held in configuration files named recipes.
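In rough, simplified Perl the core loop looks like the sketch below. The reader, writer and plugin method names here are made up for illustration only and do not reflect the application's actual internal API.

# Illustration only: read each record from the source, let every
# configured plugin transform it, then write it to the destination.
while ( my $record = $reader->next_record ) {
    $record = $_->transform($record) for @plugins;
    $writer->insert($record);
}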

Features:

  • easily extensible multi-database support (currently: Firebird and PostgreSQL);
  • file (.csv and .xls) and database input readers;
  • database output writer;
  • plug-in system for simple data transformations;

Concepts

The Recipe

The recipe is a file in Apache format (parsed with the Config::General Perl module) that describes the source, the destination and how to transform the data. Transformations are made using plugins.

The simplest recipe defines only the source, the destination and the mapping of the fields from the source to those of the destination.

In this example the structure of the source and destination tables is identical, but the role of the headermap configuration is to allow mapping fields with different names - a field rename transformation.

# Transformation recipe
<recipe>
  version               = 1
  syntaxversion         = 1
  name                  = Copy table
  description           = Users data from db1_prod to db1_dev
</recipe>

<config>
  <source>
    reader              = db
    target              = db1_prod
    table               = users
  </source>

  <destination>
    writer              = db
    target              = db1_dev
    table               = users
  </destination>
</config>

<tables>
  <table users>
    description         = Users table
    logfield            =
    <headermap>
      user_id           = user_id
      name              = name
      dep_id            = dep_id
    </headermap>
  </table>
</tables>

<transform column/>

<transform row/>
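
For the curious, a recipe like the one above can be loaded into a plain Perl data structure with the Config::General module. The snippet below is only a minimal sketch with a made-up file name, not the code Transfer uses internally.

use Config::General;

# Parse the recipe file into a nested hash
my $conf   = Config::General->new( -ConfigFile => 'recipe-users.conf' );
my %recipe = $conf->getall;

print $recipe{config}{source}{table},       "\n";   # users
print $recipe{config}{destination}{target}, "\n";   # db1_dev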

Plugins

A plugin is a Perl module specialized in making simple transformations. It receives a hash reference containing information about the field (plus some extra information needed for logging) and the current value, transforms the value using Perl functions, and returns the new value. Plugin functions can be chained together to make complex transformations.

The transform column and transform row sections of the recipe use plugins to make column and row transformations.
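
To give an idea of what a plugin transformation might look like, here is a hedged sketch of a whitespace-trimming function written as a plain Perl subroutine. The parameter layout (a hash reference holding the current value plus field and logging information) follows the description above only loosely; the actual plugin interface in Transfer may differ.

# Hypothetical plugin function: strip leading/trailing whitespace
sub clean_spaces {
    my $p     = shift;                 # hashref: field info, logging info, current value
    my $value = $p->{value};
    return $value unless defined $value;
    $value =~ s/^\s+|\s+$//g;          # trim both ends
    return $value;
}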

Conclusion

This is the introductory article about my Transfer application - transforming and transferring data between files and databases using recipes. More elaborate examples will be presented in my next article on the subject.

The source code is on GitHub: https://github.com/stefansbv/transfer

Comments on the subject are welcome, but will be moderated, of course ;)

Acknowledgements

Some concepts, blocks of code and even entire modules, comments and documentation are borrowed from and/or inspired by the excellent Sqitch project by @theory. Thank you!
