ETL Developer, Data Integration Developer, Data Engineer – whatever you call the role, in our data-driven society there is increasing demand for people who can design, build and manage data workflows. While many new courses targeting data science skill sets are popping up at universities and other educational institutions, these are often heavy on statistical analysis and machine learning techniques and light on building automated data workflows. Yet a reliable method for data delivery is the lifeblood of any analytics or data migration effort.
At Excella, we’ve been looking for programs that provide the baseline data integration skills needed to design and build these data pipelines. We started to explore how we might provide core skill sets for new junior employees whose career goals are focused in this area. This culminated in the recent introduction of the Excella Data Integration Bootcamp, a one-week program that gives new junior employees insight into data integration design concepts and tools. Through a mix of lectures, hands-on labs and group exercises, the program sets the context for what’s possible.
If this career path interests you, here’s an overview of our curriculum and the topics we chose for the first iteration of the Bootcamp. These are what we see as the core skill sets for anyone wanting to pursue a role in this in-demand field:
Warehouses, lakes, marts, repositories – there are many different terms used to describe the components of a typical analytics environment. There are also many different tools and technologies: business intelligence tools, data visualization tools, NoSQL platforms, data science tools, data integration tools! This class introduces a high-level analytics environment framework as a starting point for any solution design and discusses the different types of data needs and target audiences, from raw data stored in a lake and accessed by advanced users to cleansed, aggregated data presented in executive dashboards. At Excella, we recommend blending proven and new design techniques and tools to deliver an analytics platform tailored to each organization.
Yes, this is still relevant! This class gives an introductory view of Ralph Kimball’s dimensional modeling concepts (aka star schemas). Many business intelligence tools use star schema terms and promote similar data structure design concepts in their data mapping layer. Understanding why data is often split and merged into new forms to support analytics, along with the ability to navigate between a DIMENSION table and a FACT table, is an essential building block for anyone designing and building analytics solutions. Our bootcampers have baseline SQL and relational database skills, which are a prerequisite for this class.
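To make the DIMENSION/FACT distinction concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The retail tables (`dim_date`, `dim_product`, `fact_sales`) and their rows are hypothetical, invented for illustration: the fact table holds numeric measures plus a foreign key to each dimension, the dimension tables hold the descriptive attributes, and a typical query joins them to slice the measures.

```python
import sqlite3

# Hypothetical retail star schema: two dimension tables and one fact table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (
    date_key      INTEGER PRIMARY KEY,  -- surrogate key, e.g. 20240101
    calendar_date TEXT,
    month_name    TEXT
);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category     TEXT
);
-- The fact table stores numeric measures plus a foreign key to each dimension.
CREATE TABLE fact_sales (
    date_key     INTEGER REFERENCES dim_date (date_key),
    product_key  INTEGER REFERENCES dim_product (product_key),
    quantity     INTEGER,
    sales_amount REAL
);
""")

conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [
    (20240101, "2024-01-01", "January"),
    (20240201, "2024-02-01", "February"),
])
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)", [
    (1, "Widget", "Hardware"),
    (2, "Gadget", "Hardware"),
    (3, "Manual", "Books"),
])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
    (20240101, 1, 10, 100.0),
    (20240101, 3, 2, 30.0),
    (20240201, 2, 5, 75.0),
    (20240201, 3, 1, 15.0),
])

# A typical "star join": aggregate the facts by attributes held in the dimensions.
rows = conn.execute("""
    SELECT d.month_name, p.category, SUM(f.sales_amount) AS total_sales
    FROM fact_sales AS f
    JOIN dim_date AS d ON f.date_key = d.date_key
    JOIN dim_product AS p ON f.product_key = p.product_key
    GROUP BY d.date_key, d.month_name, p.category
    ORDER BY d.date_key, p.category
""").fetchall()

for month, category, total in rows:
    print(month, category, total)
```

Each result row is a (month, category, total sales) tuple. Swapping `p.category` for `p.product_name` rolls the same facts up at a finer grain without touching the fact table, which is much of the appeal of the design.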
It’s a 21st-century buzzword, but what exactly does Data Science encompass? This class outlines typical data scientist activities at Excella: from data collection and exploration, through applying advanced statistics and machine learning techniques, to presenting complex findings in a compelling visual format. It’s a team effort at Excella – we don’t subscribe to the unicorn data scientist mindset. Data integration developers and data visualization experts work with data scientists throughout the delivery cycle.
Where to start? Excella’s experts review the basic building blocks of a robust and flexible data pipeline. From onboarding a new data source to presenting data to analysts, we cover the key steps (and some of the pitfalls to avoid) in designing and building data workflows.
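The pipeline building blocks described above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not Excella’s course material: the CSV source, the `customers` table and the validation rules are hypothetical, and an in-memory SQLite database stands in for the target. It shows two common pitfalls being avoided: bad rows are set aside with a reason instead of being silently dropped, and the load runs in a single transaction so a failure does not leave a half-loaded table.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract from a newly onboarded source (note the messy values).
RAW_CSV = """customer_id,signup_date,state
101,2024-03-01,va
102,,MD
abc,2024-03-05,dc
104,2024-03-07, Va
"""


def extract(text):
    """Read raw rows from the CSV source as dictionaries keyed by header."""
    return list(csv.DictReader(io.StringIO(text)))


def validate(rows):
    """Split rows into valid records and rejects, keeping a reason for each reject."""
    valid, rejects = [], []
    for row in rows:
        if not row["customer_id"].isdigit():
            rejects.append((row, "customer_id is not numeric"))
        elif not row["signup_date"]:
            rejects.append((row, "signup_date is missing"))
        else:
            valid.append(row)
    return valid, rejects


def transform(rows):
    """Cast types and standardize values to match the target table."""
    return [
        (int(r["customer_id"]), r["signup_date"], r["state"].strip().upper())
        for r in rows
    ]


def load(conn, records):
    """Insert all records in one transaction: commit on success, roll back on error."""
    with conn:
        conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", records)


def run_pipeline(text, conn):
    """Extract, validate, transform, load; return the load count and the rejects."""
    valid, rejects = validate(extract(text))
    load(conn, transform(valid))
    return len(valid), rejects


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, signup_date TEXT, state TEXT)"
)
loaded, rejects = run_pipeline(RAW_CSV, conn)
print(f"{loaded} loaded, {len(rejects)} rejected")
for row, reason in rejects:
    print(row["customer_id"], "->", reason)
```

Here two of the four rows load and two are rejected (one with a missing date, one with a non-numeric ID). In a production pipeline the rejects would typically be written to an error table or file for follow-up rather than printed.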
This class is a full day working with one of the popular data integration tools available today, Talend Open Studio for Data Integration. This free, open source tool lets developers select common data integration tasks and combine them into visual workflows that collect, cleanse, transform and enhance data. (There is also a paid version that includes vendor support and additional functionality.) Using Excella’s internal Data Innovations Lab hosted on Amazon Web Services, bootcampers work with public data sets and build simple data workflows in Talend that take data from an Amazon S3 bucket and load it into an Amazon Redshift table. It’s a chance to put core data integration concepts into practice.
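Talend builds the S3-to-Redshift load visually, but it helps to know the native mechanism Amazon Redshift provides for this kind of load: the `COPY` command, which bulk-loads files from S3 in parallel. The Python sketch below only assembles a `COPY` statement as a string. The table name, bucket path, IAM role ARN and region are hypothetical placeholders, and actually running the statement would require a Redshift connection through a driver such as `redshift_connector` or `psycopg2`, which is not shown.

```python
def build_copy_statement(table, s3_uri, iam_role_arn, region=None, header_rows=1):
    """Assemble a Redshift COPY statement that loads CSV files from S3 into `table`.

    Values are interpolated directly, so only pass trusted configuration,
    never user input.
    """
    if not s3_uri.startswith("s3://"):
        raise ValueError(f"expected an s3:// URI, got {s3_uri!r}")
    clauses = [
        f"COPY {table}",
        f"FROM '{s3_uri}'",             # a single file or a key prefix
        f"IAM_ROLE '{iam_role_arn}'",   # role Redshift assumes to read the bucket
        "FORMAT AS CSV",
        f"IGNOREHEADER {header_rows}",  # skip the header row(s) in each file
    ]
    if region:
        # Needed when the bucket is in a different region from the cluster.
        clauses.append(f"REGION '{region}'")
    return "\n".join(clauses) + ";"


# Hypothetical placeholders; substitute your own table, bucket and role.
sql = build_copy_statement(
    table="public.trips",
    s3_uri="s3://example-bucket/public-data/trips/",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",
    region="us-east-1",
)
print(sql)
```

Pointing `FROM` at a key prefix rather than a single file lets Redshift load every matching object in one command, which is generally faster than inserting rows one at a time.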
Experienced developers know that source code management is critical for version control and for parallel streams of work when developing solutions. As DevOps becomes more prevalent, source code management is also a prerequisite for a continuous integration pipeline. This class explains why source code management is so important and provides hands-on practice with GitHub, a popular platform built on the Git version control system.
DevOps is intended to increase communication and collaboration between development and operations teams to produce better outcomes, such as increased automation, greater efficiency and improved solution quality. Excella’s DevOps Lead delivers this class, outlining the core concepts and the compelling reasons for introducing DevOps into solution delivery.
While developer tools are a great baseline for solution delivery, no tool can do everything. There will be times when custom code is needed to go beyond what a tool provides. Python is a popular programming language in the analytics space, with many libraries for working with data at scale. At Excella we use Python to deliver initial solution prototypes and get early feedback from end users before self-service tools have been selected and deployed. We also use Python to enable advanced data transformation capabilities by extending data integration tool functionality. This class introduces the Python language and its basic applications for delivering analytics.
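As an example of the kind of custom transformation that might extend a data integration tool, here is a minimal Python sketch that standardizes dates arriving in inconsistent formats. The list of accepted formats is a hypothetical assumption (including US-style month/day order for slash-separated dates); values that match none of them come back as `None` so they can be routed for review rather than guessed at.

```python
from datetime import datetime

# Hypothetical formats observed in incoming files, tried in order.
# %b matches English month abbreviations under Python's default C locale.
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d-%b-%Y", "%Y%m%d")


def normalize_date(value):
    """Return the date as an ISO 8601 string, or None if no known format matches."""
    value = value.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue  # not this format; try the next one
    return None


raw = ["2024-03-01", "03/05/2024", "07-Mar-2024", "20240309", "March 9"]
cleaned = [normalize_date(v) for v in raw]
print(cleaned)
# → ['2024-03-01', '2024-03-05', '2024-03-07', '2024-03-09', None]
```

The order of `KNOWN_FORMATS` matters when formats overlap: with `%m/%d/%Y` listed, `03/05/2024` is read as 5 March, so a source that uses day/month order would need its own format list.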
Excella is recognized locally and nationally as a thought leader in Agile delivery. All bootcampers attend our two-day public class to learn the concepts, tools and ceremonies of Scrum, one of the Agile frameworks we employ for solution delivery. At the end of the class, attendees take an online test to earn the Certified ScrumMaster (CSM) credential.
Automating data pipelines with appropriate error handling and data quality frameworks, and implementing DevOps practices like Continuous Integration and Continuous Deployment, add efficiency and reliability to your business’s data fuel line. Knowing what’s possible can help junior developers navigate a new career in the evolving and exciting data and analytics landscape. Interested in learning more about Excella? Contact us.