I’ve worked in the Data & Analytics space for over 20 years, and for most of that time the words “datamarts”, “data warehouse”, or “business intelligence” would draw blank stares from most folks. Today, if I use the term “Big Data” to describe my vocation, there is often a glimmer of recognition, courtesy of recent news headlines and a growing realization of how much data exists about all of us. But what exactly does the term mean, and why is it so popular now?
Structured vs Unstructured Data
Let’s be clear – large data volumes are not new, and despite the name, there is more to Big Data than volume. For decades, companies have been processing millions, billions, and even trillions of data points on a regular basis to make sure you get your phone bill or to log your banking transactions, and then analyzing that data en masse. Data from these systems is typically tabular and easily stored in rows and columns. For a visual, think of an Excel worksheet with each individual data point stored in a cell assigned to a column and row. This is known as structured data, and relational databases have been built over the last three decades to store and manage it.
Now think of data displayed on a website – it can include paragraphs of text, images, video, and hyperlinks. If you were to organize these examples on the same Excel worksheet, how would you do it, and how easy would it be to make sense of it at volume? Translating this ‘unstructured data’ and loading it into the inflexible data models favored by relational databases is difficult and expensive. This is where Big Data technology comes to the forefront. Big Data platforms offer alternative ways to store your data and provide data models that are better suited to unstructured data types.
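To make the contrast concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the billing rows, the web page contents, and the helper names `total_due` and `flatten` are all hypothetical, and no particular Big Data product works exactly this way. It shows how naturally tabular data fits rows and columns, and how awkward a web page becomes when you force it into the same shape.

```python
import json

# Structured data: every record has the same columns, like worksheet rows.
# (Hypothetical billing records, invented for this illustration.)
columns = ("account_id", "billing_month", "amount_due")
rows = [
    (1001, "2024-01", 42.50),
    (1002, "2024-01", 17.25),
]

# Unstructured or loosely structured data: a hypothetical web page whose
# content is nested and varies in shape from page to page.
page = {
    "url": "https://example.com/post",
    "text": ["First paragraph...", "Second paragraph..."],
    "images": [{"src": "chart.png", "alt": "Sales chart"}],
    "links": ["https://example.com/about"],
}


def total_due(records):
    """Sum one column across all rows; trivial when the layout is fixed."""
    idx = columns.index("amount_due")
    return sum(record[idx] for record in records)


def flatten(doc, prefix=""):
    """Force a nested document into flat column/value pairs."""
    flat = {}
    if isinstance(doc, dict):
        for key, value in doc.items():
            flat.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(doc, list):
        for position, value in enumerate(doc):
            flat.update(flatten(value, f"{prefix}{position}."))
    else:
        flat[prefix.rstrip(".")] = doc
    return flat


# Flattening works, but the "columns" now depend on how many paragraphs,
# images, and links each page happens to contain.
flat_page = flatten(page)

# Document-style storage keeps the page whole instead of reshaping it.
document = json.dumps(page)

print(total_due(rows))
print(sorted(flat_page))
```

The flattened page ends up with column names such as `text.0`, `text.1`, and `images.0.alt`, so a page with three paragraphs would need a different set of columns than a page with two. Storing the page as a single document sidesteps that problem, which is the general idea behind the alternative data models described above.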
Why Not House Everything on Hadoop or NoSQL?
If you’re starting from scratch without a significant investment in a data warehouse or datamarts, putting everything in a Big Data environment can make economic sense. Hadoop and NoSQL platforms are designed to store a combination of structured and unstructured data sources. Many Big Data tools are open source, free or low cost, and can be deployed on inexpensive hardware.
Sounds good, right? Well, don’t write off your data warehouse yet. First, consider the following:
Depending on your tolerance for risk and the skills available within your team, moving all of your analytics workload into Hadoop or NoSQL may not make economic sense. Industry experts recommend a mix of platforms to support different data source types, unified via data integration processes and then presented cohesively within a business intelligence tool.
According to industry estimates, 80% of the data that exists today is unstructured and considered ‘machine-unfriendly’ because it doesn’t fit into the structured model used by relational databases. Cue Big Data solutions like Hadoop and NoSQL platforms, which complement (not supplant) your existing data infrastructure.
Interested in learning more?
Understanding your data landscape, and which data platforms, tools, and practices are the best fit for your organization, is the first step toward implementing an effective data solution (Big Data or not). Stay tuned for future posts on how to manage and get the most out of your Big Data.