Data blending is a way to combine data in Tableau. Blending provides a quick and simple way to bring information from multiple data sources into a view. It can be misunderstood, but, when using a data blend correctly, it is an efficient way to merge data sources in Tableau.
Many Tableau developers find data blending frustrating. It has a few limitations which aren’t always clear, however, once data blending is understood, it is a valuable part of the Tableau toolkit. Some of the data blending limitations are covered below.
This guide to Tableau data blending covers:
- Tableau data blend vs a join
- Data blending best practice
- How to blend data in Tableau
- Limitations of data blending in Tableau
- The Tableau data blending asterisk
- Filtering when blending data
- Tableau dashboard performance when blending data
Data blending can be very useful, but can also be problematic. When used well it provides a simple way to add additional data to a dashboard. When used incorrectly it can bring down a Tableau Server.
With data extracts, there’s no alternative but to blend data sources. It isn’t possible to join a Tableau data extract to another data source, therefore, when using extracts, data blending is the only option.
I use data blending in some articles on other topics. For example, I used it in the post showing how to create panel charts in Tableau, where the abbreviated State Name was blended in from a different data source.
Also, in the post demonstrating how to do date scaffolding in Tableau, one of the techniques showed to how to scaffold data using a data blend.
It’s also useful when creating filled maps using a shape (kml) file. In this post showing coronavirus in England by local authority, the primary data source is the kml file, and the coronavirus numbers are blended in from a separate data source.
Is a data blend the same as a join?
A very common question about data blending, is whether a blend is the same as a join? Data blending is hard to define. It’s a type of left outer join…but it’s not a proper join. Blending is like a flexible left join, best used to join in measures from another data source – dimensions often give an asterisk instead of an expected value.
A key difference between a data blend and a join is the order it occurs. With a join, the data is joined and then aggregated.
With data blending, the data is aggregated and then joined.
As with a database join, relationships should be exist between some fields, connecting the data sources. Similar to joining TableA.Field ON TableB.Field, for example.
However, the connections between the fields aren’t enforced in the workbook. There’s a chain symbol showing whether the connection is active in the worksheet. If it’s orange then it’s connected; if grey it’s not active.
Also, standard joins happen within a data source. This is a big difference to blending.
Data blending happens within the worksheet, not the data source.
The connecting fields are activated within each worksheet – so the blended data sources can join on different fields in different worksheets within the same Tableau workbook.
Standard joins occur at the record level, while data blending happens after the numbers have aggregated. This is a key difference between a join and a blend. A join joins the records then aggregates; a blend aggregates then joins.
Do the new Tableau Relationships make a data blend obsolete?
The simple answer is “no”. Tableau Relationships don’t make data blending obsolete.
When using published data extracts, which is normal in enterprise environments (it helps create a secure single source of the truth), blending is the only option to bring in additional data. (Apart from altering the published data source, of course.)
Therefore, data blending still has a prominent place in the Tableau ecosystem, and is important for developers to understand.
Recommendations for Tableau data blending
From a high level, as data blending can provide an output that may not be fully expected, try and follow these recommendations:
- Be aware which data source is the Primary source
- Ensure the appropriate data connections are activate in the worksheet
- Set up the joins at a high level, the least granular level possible
- If filtering the view, set the data source containing the filter fields as Primary
- Try and keep the number of blended data sources to a minimum. This is easier to understand and troubleshoot.
- Use data source filters to remove redundant data from the secondary source
- Try and keep secondary data sources small – blending big data extracts will have a big negative performance hit and could bring down your Tableau server
- Joins are case sensitive in a data blend; this can impact joins on text data types
- Use a data blend to bring in single dimension values (per join field) OR measures
How to set up a data blend
Firstly, the Tableau workbook has to connect to at least two data sources.
To set up the relationships, on the menu bar, Data – Edit Blend Relationships.
Tableau will guess if there are related fields, this will be those in the Automatic relationship setting.
Set the blend relationships
When blending data sources in Tableau, it may be necessary to alter the “Automatic” relationships. To set the blend relationships manually, click Custom and amend them.
When data blending with dates, often it will be necessary to change the default relationships.. With dates, the relationship should be set up at the correct “level” in the date hierarchy. The correct date part should join with the corresponding date part.
Defining the relationships doesn’t enforce the relationship. The “joins” are activated within the worksheet by clicking the chain link symbols in the secondary source.
Within the same workbook it’s possible to have the same data sources connected by different join fields in different worksheets.
An orange chain link means that specific join is active within that worksheet. The broken chain shows that join isn’t active within that worksheet.
To switch the connection on or off, click the chain symbol.
Key points about data blending
An important point about data blending, the relationships should always blend at the least-granular level. For example, if you have a file containing annual targets, blend on the year. Even if the annual targets are broken down daily, when reporting at an annual level, blend on the year, not the day.
Avoid blending on ID = ID, especially with large data sources. This can bring down your Tableau Server.
This is one of the key differences between a standard join. With a join, the aim is to join at the most granular level. With a blend it’s the opposite; blend at the least granular level.
If possible, avoid blending a large data set with a large data set, it’s likely the dashboard performance will be bad; potentially unusable.
In addition, the joins are case sensitive. Therefore, if blending text fields, it can be a good idea to convert them to upper case first using the UPPER function.
Also, a data blend is always at the worksheet level. The blend occurs only within that specific worksheet, it isn’t shared across data sources and dashboards. This means a data blend can’t be published as a data source for others to use. The blend applies to that worksheet only.
Primary and Secondary sources in data blending
When blending in Tableau, there is always a primary source and a secondary data source. It is important to understand which is the primary source, it can impact your view.
To reiterate, which data source is primary and which is secondary does matter. As it’s a type of left join, all fields will be included from the primary and related from the secondary.
In a worksheet, the primary source is defined by the field that is first placed on to a worksheet. The data source of the first field becomes the primary data source, and other data sources will be secondary. Even if the worksheet is cleared, if a field was on that worksheet, the primary data source remains set.
To change the primary data source on a worksheet, create a new worksheet and start again.
Data blending example
To better explain Tableau data blending, we can use a simple example. Download this example from Tableau Public and experiment yourself.
Using the sample superstore data source and a dummy file containing a budget for every month/year for every country, we can report sales vs budget. Note the budget numbers are generated using a random number, so probably won’t make sense!
Comparing actuals vs a budget is a common ask, and works very well in a bullet chart. Create a bullet chart, then we can experiment with different joins, filters and formulas to see what works and what doesn’t.
Firstly, pull in the superstore file and the budget file as separate data sources. Next, set up the relationships.
Tableau will assume the relationships, but these aren’t always correct. Especially when blending dates, set up the relationships at the right level.
In this example, the orders are daily and the budget is monthly. Therefore the relationship should join the date parts month to month and year to year. For completeness, also join the MY date parts.
Remember, setting up these relationships doesn’t enforce them throughout the dashboard. Blend relationships are only activated within each individual worksheet.
So, we have the bullet chart, with the sales data being the primary source and blending in the budget numbers data from the secondary source.
To show the monthly sales numbers vs budget, the blend needs activating only between the date fields.
Activating the blend between country and region doesn’t break anything, but also doesn’t add anything – therefore, for this view, the geographic blends are not required.
Switch the dimensions to show Region sales vs budget instead of Year and Month. Now, for this to display the correct budget number, the join between Region OR the join between Country (which rolls up to Region) have to be activated.
With this join, the budget numbers are aggregated up to the Region level, ignoring anything to do with dates. The date component doesn’t need joining for this view. However, if the user is able to filter by date, that relationship would also need activating.
If the Country is being compared against budget, the join needs to be activated against the Country. Activating against the Region only would return the budget at the region level, against each country – which isn’t what we want.
The image below shows the consequence of only joining the Region. The budget shown against each Country is actually the budget of the Region because of the join.
The level of the join fields is very important for 2 reasons:
- To return the correct numbers
- For performance – better to join at a high level and only join when necessary
Care is needed when defining the correct relationship for the view. Take an example where there is a budget for a Country (secondary source), but no sales have happened, meaning the country isn’t in the primary data set. If reporting at the Region level, while activating the relationship at the Country level, this budget would not appear in the budget numbers because that country isn’t in the primary data source.
However, activating the relationship at the Region level would show the correct budget for the region (assuming the country is in a region that’s in the primary data set – e.g. the country has made some sales).
This same problem would also exist for a standard join. The workaround in Tableau is to create a scaffold data source, so all possible combinations of required data are forced to exist.
Limitations of Tableau data blending
The following situations are commonly seen when data blending. There are often difficulties caused by the following:
- Calculations not working with a data blend
- The asterisk when data blending
- Filters not working as expected with a data blend
- View Data only showing “all data” from the primary data source and none from the secondary
- Dashboard performance when data blending
Not all calculations work with a data blend
Some of the standard Tableau formula calculations don’t work with data from a secondary source.
They sometimes give no error message, until dragged into the view, when the pill turns red. Other times the error message can read “Cannot blend the secondary data source because one or more fields use an unsupported aggregation”.
Aggregations not supported by a data blend cause this error. The unsupported aggregations include:
Also, Level Of Detail (LOD) calculations can cause errors with data blending.
For example, if trying to FIX a value from the secondary source within a calculated field in the primary source, it will error. The error message will be:
“All fields must be aggregate or constant when using table calculation functions or fields from multiple data sources”
For example, creating this type of calculated field in the primary data source will error:
As an alternative, build the FIXED calculation into the secondary source as a calculated field. Then reference that calculated field from the secondary source in the primary source:
However, to work, it requires the connected blending fields to appear in the view. This could be the Detail shelf if you don’t want it visible, but without it, the calculation will error.
You can also consider using table calculations, instead of LOD.
The Tableau data blending asterisk
Very common when blending data in Tableau, the asterisk! This happens when bringing in a non-aggregate value – such as a dimension – from the secondary source and there are multiple values per joining field. Tableau doesn’t know which value to show, therefore, the asterisk displays.
For example, if blending data sources on a Region, which is a higher level than Country, and using the Country from the secondary source. If there is only one country in a region, it will return that country name. If there is more than one country in a region in the secondary source data, Tableau can’t aggregate multiple members of a dimension, so will return the asterisk.
In the above example there are multiple countries per region, therefore, Tableau displays the asterisk.
The options to get rid of the asterisk of a Tableau data blend are:
- Make sure there is only 1 member of each dimension per joining field that is placed into the view. Clearly, this isn’t always possible.
- Aggregate the dimension to show only the MIN or MAX value.
- Create a calculated field, with some business logic, to choose which to show in the case of multiple.
- (This is more of a workaround) Use a viz in tooltip to show the members of the dimension instead, coming from a different worksheet. For example, blend in the count of countries from the secondary source, and use a viz in tooltip to list the country names.
Note this isn’t a limitation of a data blend, but the asterisk can confuse those not understanding it. Arguably the asterisk is even a good thing. A similar situation with a standard database join would create duplication in the records, meaning the totals inflate, which is not good.
Filtering with a data blend
How and what a filter does with a data blend depends on 2 things:
- The active joins
- Whether the filter is from the primary or secondary data source
Filtering from the primary data source
It’s preferable to filter from the primary data source. These filters function like “normal” filters. For example, these filters can show “Relevant Values” and can apply to all of the worksheets on a dashboard.
To filter both data sources with a data blend, join the fields. Earlier in this article, I wrote always join at a high level as it gives better performance. Joining on the Region is the example. This is great, until you want to filter on Country or Date. If joining on Region, filtering on Country or Date won’t filter the secondary source as they are not connected on these fields.
Filtering from the secondary data source
It’s possible to filter from the secondary data source, however, it’s not recommended if avoidable. It has limitations. This filters only the data from the secondary source AND the primary source if there are connected fields.
Watch out for unexpected filtering of the primary source.
For example, if the primary and secondary are connected on Region and there is a filter on Country from the secondary source, this will filter the secondary source on the country (as expected), but it will filter the primary source on the connected field, the Region. Clearly, this behaviour is not wanted; it looks to the user they are filtering Country, but they are inadvertently filtering the Region of the Country in the Primary source.
There are other limitations. For example:
- Only Relevant Values doesn’t work with a filter from the secondary source in a data blend
- Null appears as a filter value and there’s no way to get rid of it – often this value doesn’t exist in the secondary source. Even functions to handle null values do nothing in this situation.
If there are items in the secondary source but not in the primary – for example, a country could have a budget but not yet made any sales – if the join includes the Country field, this budget won’t be included. Not joining the Country field would mean the budget is included in the aggregate budget.
To summarise, try and avoid filtering from the secondary source, unless it’s a data source filter.
Use a data source filter on the secondary source
I do recommend using a data source filter on the secondary source, should the data source contain data not needed. For example, perhaps there are certain years or certain business lines that are irrelevant for the dashboard. In which case, remove them at the data source level by using a data source filter.
Removing redundant data reduces the size of the data source and therefore reduce the number of queries…speeding up Tableau dashboard performance.
“View Data” with a data blend
When clicking the View Data symbol from the tooltip, the Summary data contains all fields in the view, including those fields from the secondary source.
However, switch to Full Data and it no longer includes the data from the secondary source. Instead, it only shows data from the primary data source.
Dashboard performance when data blending
When blending larger data sources with, this can have a negative impact on performance. That statement will sometimes be an understatement – if blending large data sources at a granular level, the performance impact can be huge.
Blending data at a granular level using large data sources can even bring down a Tableau Server. Or if not bringing down the server, it can degrade performance for ALL other users of dashboards on the server.
Therefore, be careful with joins and data sources. The impact of getting these wrong can be large without you, as the developer, having any idea of the poor performance all dashboard users are experiencing.
As already mentioned in this article, join the data sources at the highest possible level.
Use Data Blending to create a fixed value without a LOD
A final thing to mention. Sometimes there’s a need to have a constant value, but that constant may sometimes need to change. For example, a constant could be sales from last year, but on 1st Jan each year, “last year” changes. Plus sales figures can alter retroactively for various business reasons.
If this slowly changing constant comes from a data source, it’s possible to blend it into a worksheet. To make sure the value doesn’t change, apart from when the underlying data changes, pull the value from a secondary data source, but make sure there are no active connections between the primary and secondary data sources. No active connections will mean the number becomes a constant.
This can be an alternative to a FIXED calculation, but it will perform better then FIXED with a large data set, meaning the dashboard is quicker.
Used right, data blending is powerful
Data blending is useful to have in your Tableau armoury. I have to confess, it’s not my favourite thing; it’s fraught with complications and I often find the behaviour frustrating. However, sometimes it’s the best way to achieve a desired outcome.