Incrementally Add to a Tableau Data Extract With Dynamic Parameters

The problem with Tableau’s incremental refresh feature is that it still requests all the data from the database, before discarding everything old and keeping everything new. If you get billed for the amount of data you request from the database, or have a complex SQL statement with a lot of calculations that takes a long time to run, or a relatively small amount of memory or disk space on the database server, this can be a big problem.

For instance, I frequently need to use data from a Google Analytics to BiqQuery export. Each day’s data is stored in a different table, and any given query can only stitch together a maximum of 1,000 tables, so if I need to import three or more years of data into Tableau, I’m in trouble. Plus, I’m charged for every query (after hitting a certain monthly quota), so I don’t want to request years of data and discard most of it every time I update the Tableau extract.

But thanks to Tableau 2020.1’s new dynamic parameters, we can write custom Tableau SQL queries that only request the most recent data from the database. We can even write queries that only request a small slice of data at a time, build up to the full dataset, and then request and append just the latest data going forward.

Requirements

Since dynamic parameters were introduced in Tableau 2020.1, you’ll need to be using that version or later. Likewise, if you’re using Tableau Server, you’ll need version 2020.1 or later.

You’ll also need a dataset that has a column that only ever increases (or only ever decreases, I suppose) as data gets added. A date/time field is an obvious example, like the date and time that a user lands on your website, but the internal ID of a new sale or user could work too, as long as each new ID would go after each previous one when placed in numeric or alphabetical order.

Finally, it’s important that once data enters your database, it never changes. The nature of this incremental refresh means that old data never gets re-examined; if a user gets deleted or a sale gets refunded and the row gets deleted from the database, but it’s already loaded into your Tableau extract, it will stay there.

For this example, I’ve made a simple demo table for an imaginary sales database. It has two fields:

  • date - datetime - the date when the sale was made
  • sale_id - integer - a random integer

Loading Everything Since the Last Refresh

First, connect to your database in Tableau. Create a New Custom SQL query:

SELECT
	date,
	sale_id
FROM 
	sales
WHERE 
	date >= '2020-01-01'

(My made-up data only goes back to the first of January, so this will get everything.)

Hit Preview Results to make sure that this query will work.

Tableau’s Preview Results window

Now, select '2020-01-01' (including the quotes) and click Insert Parameter > Create a New Parameter. Call it Latest Data Date, set the Data type to Date, and set the Current value to the earliest date you want your data to range from - in my case, 01/01/2020. Press OK, and then OK again.

The Latest Data Date static parameter

Load a worksheet. Create a new Calculated Field, call it [Calc] Latest Data Date, and give it this calculation:

{FIXED: MAX([date])}

This {FIXED} syntax creates a Level of Detail Expression; it means that we’re taking the maximum value of date regardless of what level of detail is in the currently open view.

This in turn means that we can use it to create a dynamic parameter. Edit your Latest Data Date parameter (not the calculated field we just created), and change Value when workbook opens to [Calc] Latest Data Date. Note that the Current value changes.

The Latest Data Date dynamic parameter

Edit your data source, change the Connection to Extract, click Edit, and tick Incremental refresh. Set Identify new rows using column to date, and press OK.

Save the extract and the workbook. From now on, when you open the workbook, all you need to do is press Data > Refresh all data extracts and Tableau will request everything since the last time the data was updated and append it to the extract.

Loading Data Into an Extract in Slices

But what if my database system’s resources are so low that it can’t handle a request of more than one week’s worth of data at a time? Let’s look at how to request the data in week-long slices.

The steps are similar. Create the custom SQL query, but add another date condition:

SELECT
	date,
	sale_id
FROM 
	sales
WHERE 
	date >= '2020-01-01'
AND	date < '2020-01-01' + INTERVAL '7 DAYS'

(This syntax works in PostgreSQL, but in other SQL variants you may have to add a week in a different way, like by using DATEADD().)

Again, create a Latest Data Date parameter to replace '2020-01-01' (both of them, this time). When you get these initial results, they’ll cover just a single week’s data.

One week’s data in Tableau

Set up the calculated field as above, and set the parameter to use this calculated field when the workbook opens. Make sure your connection uses an extract and that the extract uses an incremental refresh (using column date), then refresh the extract, save it, save your workbook, and close it.

Re-open the workbook. Click Data > Refresh All Extracts and click Refresh.

Two weeks’ data in Tableau

Success!

A Word of Warning

Be very careful not to request a full refresh of your data source if using this technique. If you do, then Tableau will empty the entire extract, and then request a new set of data with the SQL - but the parameter will still be set to the value it had when you opened the workbook, rather than to the earliest date in the dataset.

If you need to do a full refresh, make sure you manually change your parameter’s value to the earliest date you want to go back to first - and remove the parameter’s reference to the calculated field.

Michael James Williams avatar
About Michael James Williams
Michael James Williams is a data analyst, writer, and editor.