Do you sometimes find Tableau data extracts are slow to refresh on the Tableau Server? Is your extract taking too long and not refreshing?
Usually this is caused by the database / the query pulling from the database.
Simply put, if it’s slow to run on the SQL Server it will be slow to refresh an extract on the Tableau Server.
However, it also appears that Tableau can be the cause of slow updating in some circumstances.
Why is my Tableau extract refresh slow?
We have identified that a slow refresh on the database server would also mean a slow refresh on the Tableau server.
But what if the refresh is quick on the database server?
There are also things on the Tableau Server that can cause extracts updates to be slow.
I have just come across this information on extract performance by one of the Tableau staff on the Tableau forum. I think it is very useful so I’ve copied it in full to the bottom of this article.
What causes Tableau to be slow to refresh an extract?
Pulling out some of the key points of the text, slow extract refreshes on the Tableau side are caused by:
- Wide data sources – a lot of columns slow down the data pull into Tableau
- Network latency – data has to travel across a network; the distance to travel and bandwidth will dictate the speed
- Data source content – Tableau sorts and optimises all of the fields in the data sources; leading to faster usage on the front end
How can I speed up the Tableau extract refresh?
Now we have ascertained the cause of slow refresh, we can understand how to make it faster.
If the query is slow on the database server, speak to a DBA about altering the indexes. This can make a huge difference to query performance.
Remove unnecessary columns. For example you may pull in multiple key fields from different tables that you won’t use in any dashboards. Any fields you are sure won’t be used aren’t needed, so lose them.
Reduce the number of rows. Perhaps there is no need to pull in data from 10 years ago. Only pull in data that is needed on a regular basis. If there’s a question about 10 year old data every few months, then answer that question on the ad-hoc basis. Don’t slow down the daily data refresh to speed up answering occasional ad-hoc requests.
Check the data type. Fields of free text can be slow; they are often large in terms of bytes (so slow to pull across a network), can contain “dodgy” characters that confuse Tableau, and are very difficult for Tableau to optimise. Therefore, lose them if possible. Boolean and Numeric data types are the fastest to process.
See if the Tableau Server can reside close to the main data servers, reducing the distance data has to travel.
Pre-aggregate the data. The smaller the data set, the quicker it will be to refresh the extract.
You see there are a number of things to look at to speed up extract refreshes. If your dashboards are slow there are also a number of things you can check to speed up dashboard performance.
The copied text about slow extract refreshes:
“When thinking about extracts, the “width” of the final extract (all columns in the extract) is often as, and sometimes more important than the height/number of rows. So keep that in mind for the rest of the “commentary”.
I just pulled ~5M rows from SQL (via a 10-table join) and it took 5 min, 10 sec. If the output isn’t too wide and your database is responsive, ~1M rows / minute is fairly standard.
If you have a particularly “fast” data source, you can do better. For example, when extracting rows from a huge local extract with not too many columns, I was doing about 80M an hour, or ~1.33M / minute. Yay!
Flip this equation – I had a client pulling ~190M rows from a SQL Server via a fairly ugly query – it also returned 100+ columns, some of which contained extremely unique values (which makes for slower sorting). It took these folks about 8 hours to return results from SQL, and then an additional 10+ hours to process/sort because of the super-high cardinality in some of their fields – about 174K rows / min. Boo!
So…I’d say that 11M rows in 2-3 hours is too slow. The things you should look at:
- How long does it take after you start the extract process for SQL Server to start returning records (based on the Tableau dialog that says “X rows retrieved so far”)? This is pure “SQL think time”
- How long does it take to return those 11M rows? This is a combination of SQL Processing time and latency in terms of returning 11M rows across your network
- How long does Tableau take in the “Optimizing” and “Sorting” phases – this is pure Tableau time
If you see the lion’s share of your time being sucked up in bullets 1 & 2, you should see if you can optimize SQL – look at the tables/query you’re using and add appropriate indexes, etc. etc.
If Tableau is taking a long time to optimize/sort, you may have lots of columns and/or columns that contain very unique values which take longer to sort. Consider removing those fields from your data source before you create the extract.”
Use the power of Tableau Server to speed up a local data extract refresh
On a related theme, there is a way to use Tableau Server to accelerate creating a Tableau data extract.
Note this only works for database connections, where it’s possible to create a view (a query in Access) or a stored procedure.
In the database, create a view or stored procedure, selecting the fields to load into the Tableau extract.
But, instead of selecting all, in the view / procedure, only SELECT TOP 1.
In Tableau Desktop, connect to the database and write the query to retrieve the data from the view / stored procedure. For example:
SELECT * FROM vw_MyTableauData
Once connected to the view / stored procedure in Tableau Desktop, extract the data, creating a hyper file.
That creates the data extract structure with the contents of the view / procedure – i.e. the 1 record.
Publish the data extract to the Tableau Server.
Next, in the database, edit the view / stored procedure, removing the TOP 1 from the SELECT statement, so it now returns everything.
Finally, within Tableau Server, refresh the extract. It will pull from the updated view / stored procedure, therefore returning all records.
This is far quicker than pulling all of the data in Tableau Desktop and publishing the full data source.
Utilise the power of Tableau Server when possible, it’s almost always more powerful than your desktop machine.