Your company has invested in data science: you’ve built data teams, hired expensive data scientists, bought tools, and set your goals. So why isn’t it working? Chances are, it’s how you are organizing your data team.
Inevitably, if you try to build a data science team or project within a silo, isolated from the rest of the company, you are bound to see limited results. More often than not, this is exactly what happens – analytics is seen as an “add-on” to current work, instead of a key part of the integrated engineering ecosystem. How can we expect successful data science projects with such separation, each team with its own incentives and deadlines, working in parallel but never really reaching a pivotal integration point?
You see – data science is a greedy business by nature; ask us what we need, and we’ll inevitably ask for more data, more computing speed, or more time to research. It’s this fundamental nature of data science that leads so many to misunderstand the investment in data science, and why many data science related technologies are trudging through Gartner’s “Trough of Disillusionment.”
Many of the pitfalls of data science can be remedied by borrowing from a fellow practice – DevOps. I’ve seen many definitions of DevOps from around the internet, but perhaps my favorite is:
DevOps is the practice of operations and development engineers participating together in the entire service lifecycle, from design through the development process to production support.
I believe there are two takeaways here: cross-functionality and integrated delivery. DevOps, unlike data science, is inherently agile in nature – springing from an Agile-ization of traditional software engineering and infrastructure work. Fast iterations and deployments are built into the DevOps mindset. What do these seemingly disparate fields have in common? Both are becoming increasingly essential to running the foundation of modern solutions, and in reality, they complement each other well. To build a modern analytics ecosystem, we need to unify data science and DevOps processes.
In traditional IT environments, teams can have different priorities. For data scientists, this can be a disaster. Has anyone else seen a stack of non-deployable Jupyter Notebooks piling up? To guard against these kinds of problems, the core of the DevOps mentality can be applied:
- Align the objectives of all teams working on an individual project or product; we need common, integrated goals instead of disparate ones
- Be fundamentally agile together; allocate resources and integrate features based on measurable value added for the customer
- Build cross-functional teams
At Excella, we’ve had great success embedding data scientists, DevOps engineers, and data engineers together in cross-functional teams. Since doing so, I’ve noticed several improvements in the data science workflow:
- Increasing Cross-Functional Skills: Data scientists are not exactly known for writing practical, maintainable code. Integrating your data science and DevOps engineering practices helps data scientists think more about critical things such as runtime, storage, and overall deployability.
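To make “deployability” concrete, here is a purely illustrative before-and-after sketch (the function name and numbers are invented for the example): the same logic as a typical exploratory notebook cell, and refactored into a parameterized function that a pipeline, API handler, or unit test can call.

```python
# Exploratory notebook style: hard-coded paths and hidden state,
# hard to deploy or test.
#   df = pd.read_csv("/Users/alice/Desktop/data_final_v2.csv")
#   result = df["revenue"].mean() * 1.07

# Deployable style: explicit inputs, no hidden state, trivially testable.
def adjusted_mean(values, adjustment=1.07):
    """Return the mean of `values`, scaled by `adjustment`."""
    if not values:
        raise ValueError("values must be non-empty")
    return sum(values) / len(values) * adjustment

print(adjusted_mean([100.0, 200.0, 300.0]))  # ~214.0
```

The point isn’t the arithmetic – it’s that the second form has a clear interface, so a DevOps engineer can wire it into a deployment without untangling a notebook.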
- Developing Self Service Solutions: Data scientists and DevOps engineers can work together to create self-service solutions to deployment problems. When we create reusable scripts for cloud-oriented tasks, for instance, we are increasing everyone’s productivity.
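As one hypothetical example of such a reusable script (the function name and bundle layout below are invented for illustration), a data scientist and a DevOps engineer might jointly maintain a small helper that packages a trained model file into a versioned bundle with a checksum manifest, so anyone on the team can produce a deployable artifact the same way:

```python
import hashlib
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path

def package_artifact(model_path, out_dir, version):
    """Copy a model file into a versioned bundle with a manifest.

    Creates <out_dir>/<version>/ containing the model file and a
    manifest.json recording its SHA-256 checksum and build time,
    so deployments can verify exactly what they are running.
    """
    model_path = Path(model_path)
    bundle = Path(out_dir) / version
    bundle.mkdir(parents=True, exist_ok=True)
    shutil.copy(model_path, bundle / model_path.name)
    manifest = {
        "model_file": model_path.name,
        "version": version,
        "sha256": hashlib.sha256(model_path.read_bytes()).hexdigest(),
        "built_at": datetime.now(timezone.utc).isoformat(),
    }
    (bundle / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return bundle
```

Because the helper is a plain function, the data scientist can call it at the end of a training run and the DevOps engineer can call it from CI – the same code path either way.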
- Making an Environment for Data Science: Data scientists are naturally experimental; the agile engineering practices of DevOps create an atmosphere that is both cheap and safe for data scientists to experiment and iterate on new ideas. In the end, you’ll see faster delivery from the increased autonomy you’ll create.
For 2018, let’s all be stronger together. Be sure to unify your technical teams; your products and your employees will be stronger for it!