It is an exciting time to be in Data Integration and Solutions Architecture. We’re reaching the point where Big Data is becoming more common and database management systems (DBMS) are becoming ever more specialized toward particular use cases. This has provided a multitude of options to solve a specific business problem, but the massive array of options in the market today can sometimes be overwhelming.
Here are my top considerations to include when you are evaluating a data storage solution:
Try to define the basics of the data model first, then allow this to drive you toward a database platform category. As an example, when Craigslist was putting together an archive database to track changes in posts using their standard relational database, they felt restricted by its highly structured data model. Because of this, they turned to a data model based on JSON documents so that they could add unlimited tags and multiple historical lines. The switch to a document repository allowed them to reduce their engineering and maintenance costs while enabling real-time analytics.
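To make the document-model idea concrete, here is a minimal Python sketch of an archived post held as one JSON-style document. The field names (`post_id`, `tags`, `history`) and the `record_change` helper are assumptions made up for illustration; they are not Craigslist's actual schema or code.

```python
import json

# Hypothetical archived post stored as a single JSON-style document.
# All field names are illustrative assumptions, not a real schema.
post = {
    "post_id": 1001,
    "title": "Used bicycle",
    "tags": ["bikes", "for-sale"],
    "history": [
        {"version": 1, "price": 150},
    ],
}


def record_change(doc, **changes):
    """Append a new history line to the document and return it."""
    next_version = doc["history"][-1]["version"] + 1
    doc["history"].append({"version": next_version, **changes})
    return doc


# Adding a tag or a new kind of history field needs no schema change.
post["tags"].append("price-drop")
record_change(post, price=120, note="seller lowered price")

# The whole record round-trips through JSON as one unit.
serialized = json.dumps(post)
```

In a fixed relational schema, the extra `note` field or an open-ended list of tags would typically call for new columns or additional tables, which is the kind of restriction described above.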
Factor in planned expansion with web-based applications to understand whether real-time access to high-volume data tables is required. This type of requirement can lead you to select a key-value or column store database instead of a traditional RDBMS. A key-value store is ideal for a lightning-fast response but isn’t good for a complex set of tables. In contrast, a column store is very fast at finding a particular row while still offering a large number of relational-style fields to match against.
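The difference in access pattern can be sketched with plain Python dictionaries. This is only a toy model under stated assumptions: the keys, column names, and the `kv_get` and `column_get` helpers are hypothetical and do not correspond to any particular product's API.

```python
# Key-value model: one opaque value per key, retrieved by key only.
# The store cannot filter on anything inside the value.
kv_store = {
    "session:42": b'{"user": "ana", "cart": [3, 7]}',
}

# Column store model (toy): a row key leads to many named columns,
# and a caller can ask for just the columns it needs.
wide_rows = {
    "user:ana": {
        "name": "Ana",
        "city": "Austin",
        "plan": "pro",
        "last_login": "2015-03-01",
    },
}


def kv_get(key):
    """Single lookup by key; returns the raw value or None."""
    return kv_store.get(key)


def column_get(row_key, columns):
    """Find one row by key, then return only the requested columns."""
    row = wide_rows.get(row_key, {})
    return {name: row[name] for name in columns if name in row}


session_blob = kv_get("session:42")
profile = column_get("user:ana", ["city", "plan"])
```

Both lookups start from a single key, which is what keeps them fast; the column store simply exposes more structure once the row is found.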
While many relational DBMSs are optimized for storage, the separation of data across tables can make it slow to pull data from the system because of the complicated joins required. A need to handle these complicated joins very quickly could lead you to select a tool in the graph category of databases.
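As a rough illustration of why relationship-heavy queries suit a graph model, the sketch below answers a "friends of friends" question two ways: by emulating a self-join over a table of rows, and by following neighbor references in an adjacency list. The data and function names are invented for this example, and a real RDBMS would use indexes rather than nested scans, so treat it as a picture of the access pattern and not as a benchmark.

```python
from collections import defaultdict

# Relational view: a table of (person, friend) rows.
friend_rows = [
    ("ana", "ben"),
    ("ben", "cy"),
    ("ben", "dee"),
    ("cy", "eve"),
]


def friends_of_friends_join(rows, person):
    """Emulate a self-join: match each of the person's rows against all rows."""
    return sorted({
        second_friend
        for first_person, first_friend in rows if first_person == person
        for second_person, second_friend in rows if second_person == first_friend
    })


# Graph view: each node keeps direct references to its neighbors.
graph = defaultdict(list)
for person, friend in friend_rows:
    graph[person].append(friend)


def friends_of_friends_graph(adjacency, person):
    """Follow two hops of neighbor references starting from one node."""
    return sorted({
        second_hop
        for first_hop in adjacency.get(person, [])
        for second_hop in adjacency.get(first_hop, [])
    })


via_join = friends_of_friends_join(friend_rows, "ana")
via_graph = friends_of_friends_graph(graph, "ana")
```

Both functions return the same people; the graph version only touches the nodes it walks through, while the join version works against the whole table for each hop.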
Finally, it’s important to select products that have a certain critical mass in the marketplace if you are using the database platform for a mission-critical application. A large user base provides a community that can answer questions, as well as a talent pool for development resources. To stay current on the top database platforms and their relative positions in the market, I use the data collected at http://db-engines.com/en/. It tracks the overall popularity of each DBMS, its relative position within each sub-classification, and trends toward open source software. When the time comes to make the final decision, it may be helpful to get professional vendor assistance to guide you through the details and help with the initial data modeling and migration.