Tableau recently graduated its Maestro beta program into a licensed data cleansing and modification tool called Tableau Prep. Prep comes bundled with Tableau Desktop in Tableau's new Creator license. The two products are bundled to shorten the time it takes to get clean data into Tableau, and thereby the time to analysis, by simplifying the data cleansing process.
A quick tip: with Tableau, the shininess of a new feature (or in this case, a whole new tool) can sometimes obscure whether you actually need it. If your dataset is already clean, or your cleanup needs are simple (like renaming fields or changing data types), the legacy Tableau Data Source screen is still the fastest way to proceed. That said, Tableau Prep does simplify much of the manual cleansing work that the legacy Data Source screen requires.
Tableau Prep’s Top Features:
Simplifying data cleanup is the reason Tableau Prep exists. Before Prep, if you used Tableau Desktop and wanted to clean your data, you either cleaned it in the source system (the original database or Excel spreadsheet, for example) or used a combination of calculated fields and groups directly in Tableau. Prep lets you handle data cleanup in one place and makes it easy to see which fields each step affects.
Providing an Overview of Your Data Prior to Analyzing It
Before you begin analyzing your data, Tableau Prep gives you an overview of all of your fields, including histograms that show how often each value occurs in each field. Sorting any of these fields shows which values are most and least frequent in your dataset. Prep also shows the number of distinct values in each field.
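To make the overview concrete, here is a minimal Python sketch of the kind of per-field summary described above: a frequency count for every value, sorted from most to least common, plus a distinct-value count. It only illustrates the computation and is not how Prep implements its profile; the `profile` function and the sample rows are hypothetical.

```python
from collections import Counter


def profile(rows):
    """Summarize each field: value frequencies (most common first) and distinct count."""
    fields = {}
    for row in rows:
        for name, value in row.items():
            # One Counter per field tallies how often each value appears.
            fields.setdefault(name, Counter())[value] += 1
    return {
        name: {"frequencies": counts.most_common(), "distinct": len(counts)}
        for name, counts in fields.items()
    }


# Hypothetical sample rows, for illustration only.
rows = [
    {"region": "East", "status": "Open"},
    {"region": "West", "status": "Closed"},
    {"region": "East", "status": "Open"},
]
summary = profile(rows)
print(summary["region"])  # {'frequencies': [('East', 2), ('West', 1)], 'distinct': 2}
```

Sorting by frequency mirrors sorting a field in Prep's overview, where the most and least common values surface immediately.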
Documenting Your Cleanup Process
With Tableau's legacy data preparation screen, it was up to you to document your process in a tool outside of Tableau. Prep removes this grunt work by presenting each step of your process as a clean, intuitive flow diagram. This not only helps you keep track of what you have done, but also lets others pick up your Prep flow and quickly understand it.
Saving a flow in Prep lets you reuse the same cleanup process in the future. Running the flow is still a manual step, but it takes only a single click.
Tableau Prep’s Limitations:
Tableau Prep is not an ETL tool that you can automate to run without your involvement, and it will not replace your current ETL processes or stored procedures. For now, the only way to run a Tableau Prep flow is from within Prep itself.
You also cannot schedule your Prep flows. Because every run is manual, Prep is not a viable option for reports that require regular data updates.
Prep can save the cleansed dataset as an extract or publish it directly to Tableau Server. However, if you want to use the data outside of Tableau, CSV is currently the only export format.
Although Tableau has not published any data volume limits, Prep does not handle large datasets well. When we connected to a dataset of 10 million records, the field histograms did not populate. Simply converting two fields from integers to dates and running the flow caused Prep to fail with the message “An error has occurred while running the flow”. Until Tableau provides additional guidance, Prep is best used with smaller datasets.
As it stands, there are few use cases for Tableau Prep without Tableau Desktop. More mature, feature-rich data preparation tools let you automate the cleanup process, export the data in whatever format you need, and schedule those exports.
Even so, Tableau Prep is a major improvement over the data preparation capabilities previously available in Tableau Desktop. Because Prep reduces the manual work of cleaning data before it is imported into Tableau, and because it is bundled directly with Desktop, users can spend more time on data analysis and less time on cleanup.