

9 Topics to Include in a Data Integration Bootcamp


May 24, 2017

ETL Developer, Data Integration Developer, Data Engineer – whatever you call the role, in our data-driven society there is increasing demand for people who can design, build, and manage data workflows. While many new courses targeted at data science skillsets are popping up at universities and other educational institutions, these are often heavy on statistical analysis and machine learning techniques while light on building automated data workflows. A reliable method for data delivery is the lifeblood of any analytics or data migration effort.

At Excella, we've been looking for programs that provide the baseline data integration skills needed to design and build these data pipelines. We started to explore how we might provide core skillsets for new junior employees whose career goals were focused here. This culminated in the recent introduction of the Excella Data Integration Bootcamp. The goal of the one-week program is to give new junior employees insight into data integration design concepts and tools. Through a mix of lectures, hands-on labs, and group exercises, the intent is to set the context for what's possible.

If this career path interests you, here's a look at our program curriculum and the topics we chose for the first iteration of the Bootcamp. These are what we see as the core skillsets for anyone wanting to pursue a role in this in-demand field:

1. An Overview of Modern Analytics Environments

Warehouses, lakes, marts, repositories – there are many different terms used to describe the components of a typical analytics environment. There are also many different tools and technologies – business intelligence tools, data visualization tools, NoSQL platforms, data science tools, data integration tools! This class introduces a high-level analytics environment framework as a starting point for any solution design and discusses the different types of data needs and target audiences. From raw data stored in a lake and accessed by advanced users, to cleansed, aggregated data presented in executive dashboards, at Excella we recommend using a blend of proven and new design techniques and tools to deliver a tailored analytics platform for an organization.

2. Dimensional Modeling 101

Yes, this is still relevant! This class gives an introductory view into Ralph Kimball's dimensional modeling concepts (aka star schemas). Many business intelligence tools use star schema terms and promote similar data structure design concepts in their data mapping layers. Understanding why data is often split and merged into new forms to support analytics, along with the ability to navigate between a DIMENSION table and a FACT table, is an essential building block for anyone designing and building analytics solutions. Our bootcampers have baseline SQL and relational database skills, which are a prerequisite for this class.
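To make the pattern concrete, here is a minimal sketch of a star-schema query using Python's built-in sqlite3 module. The fact_sales and dim_date tables are hypothetical illustrations, not part of the bootcamp materials:

```python
import sqlite3

# Build a tiny, hypothetical one-dimension star schema in memory.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date (
        date_key   INTEGER PRIMARY KEY,  -- surrogate key
        full_date  TEXT,
        month_name TEXT,
        year       INTEGER
    );
    CREATE TABLE fact_sales (
        date_key    INTEGER REFERENCES dim_date(date_key),
        product     TEXT,
        sale_amount REAL                 -- additive measure
    );
    INSERT INTO dim_date VALUES (20170524, '2017-05-24', 'May', 2017);
    INSERT INTO fact_sales VALUES (20170524, 'Widget', 19.99);
    INSERT INTO fact_sales VALUES (20170524, 'Gadget', 5.00);
""")

# The essential star-schema move: join the FACT to its DIMENSION,
# then aggregate the measure by a descriptive attribute.
for row in conn.execute("""
    SELECT d.month_name, d.year, SUM(f.sale_amount) AS total_sales
    FROM fact_sales f
    JOIN dim_date d ON d.date_key = f.date_key
    GROUP BY d.month_name, d.year
"""):
    print(row)  # ('May', 2017, 24.99)
```

Every report query against a star schema follows this same shape: facts supply the numbers, dimensions supply the labels and filters.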

3. Data Science Introduction

It’s a 21st century buzzword, but what exactly does Data Science encompass? This class outlines typical data scientist activities at Excella: collecting and exploring data, applying advanced statistics and machine learning techniques, and presenting complex findings in a compelling visual format. It’s a team effort at Excella – we don’t subscribe to the unicorn data scientist mindset – data integration developers and data visualization experts work with data scientists throughout the delivery cycle.

4. Data Integration Practices 101

Where to start? Excella’s experts review the basic building blocks of a robust and flexible data pipeline. From the on-boarding of a new data source to the presentation of data to analysts, we cover the key steps (and some of the pitfalls to avoid) when designing and building data workflows.
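As a rough illustration of those building blocks, here is a stripped-down extract-transform-load script in Python. The file names, field names, and cleansing rules are invented for the example; a production pipeline would add logging, auditing, and an error-handling path for rejected records:

```python
import csv
from pathlib import Path

def extract(source: Path) -> list[dict]:
    """Extract: read raw records from a delimited source file."""
    with source.open(newline="") as f:
        return list(csv.DictReader(f))

def transform(rows: list[dict]) -> list[dict]:
    """Transform: cleanse and standardize, setting bad records aside."""
    clean = []
    for row in rows:
        try:
            clean.append({
                "customer_id": int(row["customer_id"]),
                "email": row["email"].strip().lower(),
            })
        except (KeyError, ValueError):
            pass  # a real pipeline would route rejects to an error table
    return clean

def load(rows: list[dict], target: Path) -> None:
    """Load: write conformed records to the target."""
    with target.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["customer_id", "email"])
        writer.writeheader()
        writer.writerows(rows)

if __name__ == "__main__":
    load(transform(extract(Path("customers_raw.csv"))), Path("customers_clean.csv"))
```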

5. Lab: Hands on Learning in Talend Open Studio for Data Integration

This is a whole day working with one of the popular data integration tools available today – Talend Open Studio for Data Integration. This is a free, open source tool that allows developers to select common data integration tasks and combine them into visual workflows to collect, cleanse, transform and enhance data. (There is also a paid version that includes vendor support and additional functionality.) Leveraging Excella’s internal Data Innovations Lab hosted in Amazon Web Services, bootcampers work with public data sets and build simple data workflows in Talend, taking data from an AWS S3 repository and loading it into a Redshift database table. It’s a chance to put core data integration concepts into practice.
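For context, the visual workflow the lab builds in Talend corresponds to a short piece of code under the hood. Here is a rough sketch of a comparable S3-to-Redshift load using the psycopg2 driver and Redshift's COPY command; the cluster endpoint, credentials, bucket, table, and IAM role are all placeholders:

```python
import psycopg2  # assumes a reachable Redshift cluster and valid credentials

# Placeholder connection details -- substitute your own cluster endpoint.
conn = psycopg2.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="analytics",
    user="etl_user",
    password="changeme",
)

# Redshift's COPY command bulk-loads a table directly from S3.
copy_sql = """
    COPY public.trips
    FROM 's3://example-bucket/public-data/trips.csv'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
    CSV IGNOREHEADER 1;
"""

with conn, conn.cursor() as cur:
    cur.execute(copy_sql)
```

A tool like Talend adds value on top of this: reusable components, metadata management, and visual documentation of the workflow.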

6. Lab: Using GitHub for Source Code Management

Experienced developers know that source code management is critical for version control and parallel work threads when developing solutions. As DevOps becomes more prevalent, source code management is also required to enable a continuous integration pipeline. This class introduces the reasons why source code management is so important and provides hands-on practice in a popular version control tool – GitHub.

7. Introducing DevOps

DevOps is intended to increase communication and collaboration between development and operations teams to produce better outcomes. These could take the form of increased automation, greater efficiency, and improved solution quality. Excella’s DevOps Lead delivers this class and outlines the core concepts and compelling reasons for introducing DevOps into solution delivery.
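One small, concrete example of what this looks like day to day: automated tests that a continuous integration server runs on every commit. The sketch below uses pytest conventions, and the standardize_email rule is a hypothetical transformation, not part of the class materials:

```python
# test_transforms.py -- the kind of fast, automated check a CI server
# runs on every commit before code is merged.

def standardize_email(raw: str) -> str:
    """Hypothetical cleansing rule: trim whitespace, lowercase the address."""
    return raw.strip().lower()

def test_standardize_email():
    assert standardize_email("  Jane.Doe@Example.COM ") == "jane.doe@example.com"

def test_standardize_email_is_idempotent():
    once = standardize_email(" a@b.com ")
    assert standardize_email(once) == once
```

If a change breaks a rule like this, the pipeline fails fast and the team finds out in minutes rather than in production.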

8. Lab: Python for Analytics

While developer tools are a great baseline for solution delivery, no tool can do everything. There will be times when custom code is needed to extend beyond the tool. Python is a popular programming language in the analytics space and has many libraries for working with data at scale. At Excella we use Python to deliver initial solution prototypes and get early feedback from end users before self-service tools have been selected and deployed. We also use Python to enable advanced data transformations by extending data integration tool functionality. This class introduces the Python language and basic applications for delivering analytics.
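As a small taste of the lab, here is a sketch of a typical prototyping step using the pandas library (assumed installed); the orders data is invented for illustration:

```python
import pandas as pd

# Hypothetical raw extract: one row per order, with a missing amount.
orders = pd.DataFrame({
    "region": ["East", "West", "East", "West"],
    "order_date": ["2017-05-01", "2017-05-02", "2017-05-02", "2017-05-03"],
    "amount": [120.0, 80.0, None, 200.0],
})

# Typical prototype steps: fix types, handle missing values, aggregate.
orders["order_date"] = pd.to_datetime(orders["order_date"])
orders["amount"] = orders["amount"].fillna(0.0)

daily_by_region = (
    orders.groupby(["region", "order_date"], as_index=False)["amount"].sum()
)
print(daily_by_region)
```

A few dozen lines like this are often enough to put real numbers in front of end users and gather feedback before committing to a tool.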

9. Certified ScrumMaster Training

Excella is recognized as a thought leader in Agile delivery, both locally and nationally. All bootcampers attend our two-day public class to learn the concepts, tools, and ceremonies associated with Scrum, one of the Agile frameworks we employ for solution delivery. At the end of the class, attendees take an online test to obtain their Certified ScrumMaster certification.

Automating data pipelines with appropriate error handling and data quality frameworks, plus implementing DevOps concepts like Continuous Integration and Continuous Deployment, adds greater efficiency and reliability to the data fuel line for your business. Knowing what’s possible can help junior developers navigate a new career in the evolving and exciting data and analytics landscape. Interested in learning more about Excella? Contact us.
