When I’m talking with clients about how they are using their Big Data environments, the common response is “we’ve given our Data Science Team access”. These statistical boffins are well prepared for the challenges of analyzing unstructured, semi-structured and structured data at volume.
In contrast, I see a plethora of roles from finance, marketing, sales and other business departments using off-the-shelf BI tools to access data in existing data warehouses and datamarts – and making data-driven decisions. No statistics or computer science degree required. Three decades of honing data integration techniques and the evolution of tools and platforms have made structured data available to the business masses. A key part of this enablement was presenting one cohesive view of data across a department or enterprise – the data warehouse.
The enterprise data warehouse (EDW) concept is simple and common sense – merge disparate data sources together into one central repository – a single version of the truth for the enterprise. In practice, building an EDW was often a mammoth effort; viewed as a necessary step, but not undertaken lightly. The Data Warehousing Institute (TDWI) declared in its BI Maturity Model that an organization did not become a BI ‘Adult’ until the perilous ‘Chasm’ of challenges had been crossed and a functional EDW established. To those of you who successfully crossed the Chasm – congratulations, it’s a feat worth celebrating!
Today, organizations are often supplementing their EDW with a Hadoop environment and adding a few truckloads of unstructured data from varying sources into it. Many of you want access to this new, promising data ALONGSIDE your current information AND without having to be a data scientist to get it.
How to Survive (and Thrive) in a New World of Data
This is the big question we’re focused on at Excella – how do you best access data from unstructured/semi-structured sources (aka Big Data) and structured sources (the data you have in your Warehouse or Marts) with the objective of providing a coherent view across ALL your data sources?
Here are the options we’re exploring at Excella:
- Creating structured (summary) data from unstructured/semi-structured data and adding this into an existing Data Warehouse – when does this approach work and when does it not work?
- Moving everything into Hadoop – is it worth it?
- Using an open source NoSQL platform as the intermediary for structured and unstructured workloads (instead of Hadoop).
- Using data virtualization or data federation tools to bridge the gap, sourcing data on demand and leaving it stored disparately.
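To make the first option a little more concrete, here is a minimal Python sketch of turning semi-structured records into structured summary rows that could then be loaded into a warehouse table. Everything in it is an assumption for illustration only: the sample events, the field names (`customer_id`, `channel`, `amount`) and the `summarize` helper are hypothetical, and a real pipeline would read from your actual sources and load the result with your own ETL tooling.

```python
import json
from collections import defaultdict

# Hypothetical semi-structured input: one JSON event per line.
# Note that the fields vary from record to record.
RAW_EVENTS = """\
{"customer_id": 1, "channel": "web", "amount": 20.0}
{"customer_id": 1, "channel": "mobile", "amount": 5.5, "device": "ios"}
{"customer_id": 2, "channel": "web"}
"""


def summarize(lines):
    """Aggregate raw events into one fixed-shape summary row per customer."""
    totals = defaultdict(lambda: {"event_count": 0, "total_amount": 0.0})
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        row = totals[event["customer_id"]]
        row["event_count"] += 1
        # Events without an amount contribute zero to the total.
        row["total_amount"] += event.get("amount", 0.0)
    # Every output row has the same columns, so it fits a relational table.
    return [{"customer_id": cid, **vals} for cid, vals in sorted(totals.items())]


summary_rows = summarize(RAW_EVENTS.splitlines())
for row in summary_rows:
    print(row)
```

The trade-off is visible even at this scale: the summary rows are easy to query with existing BI tools, but detail that was not summarized (here, the `device` field) never reaches the warehouse.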
Stay tuned as we publish our observations in the coming weeks, and subscribe to get access to all Excella blog posts. Have an alternate option you’d like us to prove out? Contact us!