Sniffing Out Technical Debt in Machine Learning Solutions
With the democratization and open-sourcing of machine learning tools, there has been an explosion of interest in incorporating machine learning into existing systems or building standalone machine learning solutions.
However, as Google researchers Sculley et al. observe in their recent paper, “Hidden Technical Debt in Machine Learning Systems”, the widespread adoption of these tools and techniques is both a blessing and a curse. They enable increasingly impressive solutions to long-standing industry problems, such as image recognition and natural language processing, but they also bring new and unexpected opportunities for “technical debt”. For the uninitiated, “technical debt” is programming shorthand for the problems that arise when software is built quickly, with an emphasis on the most convenient solution rather than the best one.
In the data science arena, new techniques are developing faster than existing systems can be updated to incorporate the old ones, and you likely do not have time to read every white paper put out by the Googles and Facebooks of the world and internalize its main lessons. This post is an attempt to summarize Sculley et al.’s original paper for easy consumption.
The tech industry uses the term “smells” for easily identifiable coding patterns that usually indicate technical debt is accruing nearby. To make the problems easier to recognize, I’ll summarize each of the paper’s main takeaways as a “smell”, along with recommendations for mitigation.
As Sculley et al. point out, “[machine learning systems’ technical debt] may be difficult to detect because it exists at the system level rather than the code level.” In other words, issues caused by the hasty deployment of these systems often arise from factors external to (or in addition to) how the code was written.
Here are five common symptoms that suggest your model might be starting to smell funny:
Sculley et al. go into much greater detail on the potential causes of technical debt, and are much more thorough in their recommendations for how to address them proactively. If you still have questions, read the full paper yourself, discuss it with your team, and above all be vigilant when developing and depending upon a machine learning solution for your business decisions!