The process of moving and blending data from multiple sources and turning it into a useful consolidated form (often to facilitate downstream analytics) can be very simple or very complex. Developer-focused tools for data integration (DI) like Informatica PowerCenter or Microsoft SSIS have been in use for many years and the market continues to grow. Per Gartner’s Magic Quadrant from July 2015:
“…the data integration tool market was worth approximately $2.4 billion in constant currency at the end of 2014, an increase of 6.9% from 2013. The growth rate is above the average for the enterprise software market as a whole…”
Yes, you can still choose to write code (hello Python!) for all your DI needs. But with the many tool options available, including free community versions, here are three key reasons why we advocate using a tool rather than writing all custom code:
In many cases, data integration or ETL scripts are created based on an individual developer’s own coding preferences – their personal best practices and even their choice of programming language. Across multiple team members and over time, this disparity of code can be difficult to maintain, causing production support and enhancement work to take longer.
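As an illustration of the kind of one-off script described above, here is a minimal sketch of a hand-rolled ETL step in Python. The file layout, column names, and cleaning rules are all hypothetical – they stand in for one developer's personal conventions, which is exactly the disparity that makes such scripts hard for the next person to maintain.

```python
# A minimal, hypothetical hand-rolled ETL script. The source data, column
# choices, and "cleaning" rules reflect one developer's personal conventions;
# a teammate's version of the same job might look entirely different.
import csv
import io

# Stand-in for a raw source file, with the inconsistencies ETL must absorb.
RAW_CSV = """id,amount,region
1, 100 ,east
2,250,WEST
3,  75,East
"""

def extract(text):
    """Read raw rows from a CSV string (stand-in for reading a source file)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Apply this developer's ad-hoc cleaning rules."""
    cleaned = []
    for row in rows:
        cleaned.append({
            "id": int(row["id"]),
            "amount": float(row["amount"].strip()),
            # Lower-casing is one developer's choice; another might title-case.
            "region": row["region"].strip().lower(),
        })
    return cleaned

def load(rows):
    """Write the consolidated result (here, back out as a CSV string)."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["id", "amount", "region"])
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

result = load(transform(extract(RAW_CSV)))
print(result)
```

Nothing here is wrong as such – it is the accumulation of many scripts like this, each with slightly different conventions, that creates the maintenance burden.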
A DI tool has a standard interface: while a developer may change the order in which functions are performed, the function screens remain constant across users and solutions. Yes, you need to learn the basic tool functions, but once you do, navigating, understanding, and extending solutions built in the tool is usually much faster and easier than traversing lines of code.
Tools provide modules to support common functions. Drag and drop objects to use them, enter the required information, and draw a line between objects to create dependencies. In addition, there are usually immediate quality checks and alerts to let you know when something is missing before you continue building. The experience is visual and usually fairly intuitive.
By comparison, cutting and pasting code, or starting with existing scripts and modifying them, is less intuitive and likely to take longer to update and verify – especially when it’s someone else’s code.
Remember when all websites were created by software developers? Nowadays anyone comfortable with a web browser can use a drag and drop interface to create an impressive-looking website. Tools exist and persist where there is demand. Drag-and-drop pre-defined functions allow someone with less experience or a less technical background to learn faster and become competent sooner. Even if you are an accomplished software developer, the pre-configured functions, automation, and real-time alerts a tool provides can save time – and most tools give you the option to extend capabilities by executing external scripts or applications when needed.
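To make the external-script escape hatch concrete, here is a hedged sketch of the kind of custom step a DI tool might shell out to: a small Python script that validates a batch file and signals success or failure through its exit code, which most tools can check before continuing a job. The file format and the minimum-row rule are hypothetical, purely for illustration.

```python
# A hypothetical external validation script a DI tool might invoke as a
# custom step. It checks that a delimited batch file has at least a minimum
# number of data rows; the tool reads the exit code (0 = pass, 1 = fail).
import os
import sys
import tempfile

def validate_row_count(path, minimum=1):
    """Return True if the file at `path` has at least `minimum` data rows."""
    with open(path) as f:
        rows = sum(1 for line in f if line.strip())
    return rows - 1 >= minimum  # subtract the header line

# Demonstration with a temporary file standing in for the tool's batch output.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
    f.write("id,amount\n1,100\n2,250\n")
    demo_path = f.name

ok = validate_row_count(demo_path, minimum=2)
os.remove(demo_path)
print("exit code:", 0 if ok else 1)

# In real use the tool would pass the file path as an argument and the
# script would end with: sys.exit(0 if ok else 1)
```

A script like this lets you keep the bulk of a solution in the tool while dropping down to code only for the odd check or transformation the tool doesn't cover.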
With the volume and variety of Big Data came data repositories beyond the enterprise data warehouse and, out of necessity, a move away from a single source of data truth. Many BI vendors are building out their data integration functions to create a comprehensive self-service solution for business users – DIY for all your data needs. Data integration is a vital part of how businesses execute. If getting more consistent, faster results using a more flexible staffing model would address some of your DI pain points, a thoughtful review of your tool options is a good next step. If cost is a primary driver, Excella performed an assessment of two of the leading open source, community DI tools – Pentaho and Talend.
You can read our review here: https://www.excella.com/services/data-analytics/open-source-data-integration-tool-comparison.