Open Source ETL Tools

June 10, 2016

Imagine that you have been charged with getting data from multiple sources (say, a flat file and a query from your data warehouse) and you need to bring it together so that it can feed a report or a dashboard. What are your options? You’re not a developer who writes scripts, and you’re tired of copying and pasting into Excel spreadsheets. You also don’t have any budget for a tool. As the Data Integration (DI) space moves toward a model that encourages self-service integration and analysis, this scenario is becoming more common, so we decided to take a look at some of the free, open source tools available for this task.

As anyone who has combined data between sources to enable analysis will tell you, DI (or ETL) is more than just combining data. DI involves designing data structures that support the analysis being performed, moving the data between systems (the actual Extract, Transform, Load processes), monitoring and improving data quality, and ensuring the security of the data. Here are just a few of the items that you’ll need to consider:

Data structure design
Data movement
Data quality
Data security & privacy
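
To make the data movement item concrete, here is a minimal sketch in plain Python of the scenario from the opening: a flat file and a warehouse query are extracted, joined, and loaded into a report table. Everything in it is an assumption made for illustration: the `region`, `target`, and `amount` columns, the sample values, and the in-memory SQLite database standing in for a data warehouse. It is not how Kettle or Talend work internally; it only shows the kind of steps those tools let you assemble without writing a script.

```python
import csv
import io
import sqlite3

# Hypothetical flat file: sales targets per region (illustrative data only).
FLAT_FILE = """region,target
East,100
West,150
"""


def build_demo_warehouse():
    """Create an in-memory SQLite database that stands in for a warehouse."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("East", 40), ("East", 70), ("West", 75)],
    )
    return conn


def extract_flat_file(text):
    """Extract: read the CSV text into a dict of region -> target."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["region"]: int(row["target"]) for row in reader}


def extract_warehouse(conn):
    """Extract: run an aggregate query against the stand-in warehouse."""
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    return conn.execute(query).fetchall()


def transform(targets, actuals):
    """Transform: join both sources on region and compute percent of target."""
    rows = []
    for region, total in actuals:
        target = targets.get(region)
        pct = round(100 * total / target, 1) if target else None
        rows.append((region, total, target, pct))
    return rows


def load(conn, rows):
    """Load: write the combined rows to a table a report could read."""
    conn.execute(
        "CREATE TABLE report (region TEXT, actual REAL, target INTEGER, pct REAL)"
    )
    conn.executemany("INSERT INTO report VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_pipeline():
    """Run extract, transform, and load, then return the report rows."""
    conn = build_demo_warehouse()
    rows = transform(extract_flat_file(FLAT_FILE), extract_warehouse(conn))
    load(conn, rows)
    return conn.execute(
        "SELECT region, actual, target, pct FROM report ORDER BY region"
    ).fetchall()
```

Calling `run_pipeline()` returns one row per region with the actual total, the target, and the percent of target. Keeping extract, transform, and load as separate functions mirrors the way graphical DI tools present them as separate steps.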

We took two of the most popular open source DI tools, Pentaho Kettle and Talend, to see how they handle some of these tasks. Both tools are free and offer an active user community to assist with questions, a pool of existing users large enough for long-term support, out-of-the-box connections to most file formats and databases, and intuitive user interfaces that guide you through your transformations. These transformations not only format your data as needed but also allow you to validate the data quality and handle errors gracefully.
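
As a rough illustration of what validating data quality and handling errors gracefully can mean in practice, the sketch below checks each record and routes bad ones to a reject list instead of stopping the whole run. The field names (`region`, `amount`) and the rules are hypothetical, and this is plain Python rather than anything taken from Kettle or Talend.

```python
def validate_row(row):
    """Return a list of problems found in one record (empty means clean)."""
    problems = []
    if not str(row.get("region") or "").strip():
        problems.append("missing region")
    try:
        if float(row.get("amount")) < 0:
            problems.append("negative amount")
    except (TypeError, ValueError):
        problems.append("amount is not a number")
    return problems


def split_clean_and_rejected(rows):
    """Separate clean records from rejected ones, keeping the reasons."""
    clean, rejected = [], []
    for row in rows:
        problems = validate_row(row)
        if problems:
            # Keep the bad record and its reasons so someone can review it later.
            rejected.append((row, problems))
        else:
            clean.append(row)
    return clean, rejected
```

The clean rows continue on to the load step, while the rejected rows, together with their reasons, can be written to a separate file or table for follow-up.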
