Tableau be careful using an Incremental Refresh

//

With a very large data extract in Tableau it can be better to use an incremental extract refresh.

On the Tableau Server there can be multiple different Tableau Sites. The developers of each site share the same server resource to update the data extracts/hyper files.

Although site admins only have visibility into their Tableau site extract refreshes, other sites will be updating their data on the same server, potentially at the same time. Consequently, many sites can be attempting to update their data sources at the same time, adversely impacting each other, but with none knowing about it.

Should your data extract be sat in the refresh queue, awaiting update for a period without any apparent reason, this is likely to be caused by others consuming the extract resource.

Server admins have visibility into everything happening on the extract server; some server admins are happy to share the extract server activity with site admins, others are less willing.

Therefore, with long running extract refreshes, keep in mind your actions can adversely impact others, and an aim should be for extracts to update as quickly as possible.

Therefore, sometimes in Tableau an incremental update is a better choice than a full update.

What is an incremental update in Tableau?

When a data source is slow to refresh, negatively impacting other users of the server, an incremental refresh could be a better option.

An incremental update, using a user-specified key, only extracts new records from that specified point.

set up a Tableau extract refresh as an incremental refresh using the Order Date as the key field
Using the Order Date as the key field in an incremental refresh

For example, with web click data, the numbers of records can easily be in the billions, which takes time to pull into Tableau. To return this volume of records can block the Tableau data server for a significant period of time.

Situations such as web traffic are ideal for an incremental data refresh.

Tell Tableau which is the key field, the field it will use to identify which are new records. Ideally use an ID or a Date field.

As part of the incremental update Tableau scans the relevant field and checks the current extract for the largest value

Only those data source records having a key field greater than the maximum in the current Tableau extract are imported.

For example, if using an ID, if the largest ID in the current Tableau extract is 489, Tableau will incrementally update with all records having an ID of 490 and above.

An incremental refresh is a great option to accelerate the time and frequency of data extract updates. Whether this is practical depends on the data source structure.

When to use an incremental extract refresh

Before switching to an incremental refresh first understand how the data source populates.

An article on the Tableau site about Optimising Incremental Refreshes says “Updates to existing data and deletions are only included in full refreshes

Data warehouses are a good example where it is important to understand the data structure.

Some data warehouses update / alter existing records rather than inserting a new record when their source data changes. This would rule out using an incremental refresh.

Back to the previous example, using the ID, if record ID 450 changes retrospectively this change would not reflect after an incremental refresh.

Similarly, if record ID 450 is deleted, this would not delete from the Tableau data source.

Incremental refreshes only add data and only look forward.

If reporting data where the source data can alter historically AND your Tableau data needs to reflect the changes then a Full Refresh is the option.

The incremental refresh would miss the alterations / deletions and only insert new records.

When deciding the Tableau data extract refresh type check the data source does not change historically.

Alternatively, if it can change historically, check it’s not important to immediately reflect these changes in the data.

Tableau allows an extract to refresh both Incrementally and Fully.

Therefore, dependent on the requirements, on working days, when it’s important not to clog up the Tableau server, run daily incremental updates.

On the weekend, when server performance may matter less, carry out full extract refreshes to capture historic changes.

Leave a Comment