Your company has invested in data science; you’ve created data teams, invested in expensive data scientists and tools, and set your goals. So why isn’t it working? Chances are the problem is how you are organizing your data team.
If you try to build a data science team or project within a silo, isolated from the rest of the company, you are bound to see limited results. More often than not, this is exactly what happens – analytics is seen as an “add-on” to current work, instead of a key part of the integrated engineering ecosystem. How can we expect to have successful data science projects with such separation? Each team has its own incentives and deadlines, and the teams work in parallel without ever really reaching a pivotal integration point.
You see – data science is a greedy business by nature; ask us what we need, and we’ll inevitably ask for more data, more computing speed, or more time to research. This fundamental nature leads so many to misunderstand the investment in data science, and it is why many data-science-related technologies are trudging through Gartner’s “Trough of Disillusionment.”
Many of the pitfalls of data science can be remedied by borrowing from a fellow practice – DevOps. I’ve seen many definitions of DevOps from around the internet, but perhaps my favorite is:
DevOps is the practice of operations and development engineers participating together in the entire service lifecycle, from design through the development process to production support.
I believe there are two takeaways here: cross-functionality and integrated delivery. DevOps, unlike data science, is inherently agile in nature – springing from an Agile-ization of traditional software engineering and infrastructure work. Fast iterations and deployments are built into the DevOps mindset. What do these seemingly disparate fields have in common? Both are becoming increasingly essential to the foundation of modern solutions, and in reality they complement each other well. To build a modern analytics ecosystem, we need to unify data science and DevOps processes.
In traditional IT environments, teams can have different priorities. For data scientists, this can be a disaster. Has anyone else seen a stack of non-deployable Jupyter Notebooks piling up? To guard against these kinds of problems, the core of the DevOps mentality can be applied:
At Excella, we’ve had great success with this approach, embedding data scientists and DevOps engineers, along with data engineers, in cross-functional teams. With it, I’ve noticed several improvements in the data science workflow:
For 2018, let’s all be stronger together. Be sure to unify your technical teams; your products and your employees will be stronger for it!