(If using Tableau 9+ check this updated article on using LOD calculations to create bins from a measure.)
This post is a continuation of a previous article showing how to create bins from a measure introducing a situation where data blending is a solution. Credit to Richard Leeke for supplying the solution.
In the previous article we created bins from a calculated field and discovered that when we use these bins they can re-calculate and give us results that we were not expecting. To recap I had some data showing an enquiry count per listing per month. I want to sum these enquiries for each listing to put each listing into a bin dependent on the total number of enquiries it had received over the time period.
Once the bins are created I would like 2 pieces of information:
- The number of listings within each enquiry bin
- The monthly distribution of enquiries received per bin (i.e. listings in bin 1 received 20% of enquiries in Month 1, 50% in Month 2 and the remaining 30% in Month 3; Listings in bin 2 received 50% of enquiries in Month 1, 25% in Month 2 and 25% in Month 3; etc.)
Number 1 is quite straightforward and was demonstrated in the previous post while I showed how part 2 is far from simple due to the calculated fields recalculating when I attempt to partition the enquiry counts per bin by month. This is because our bins are a measure and not a dimension. To do part 2 we need to use data blending, which is like doing a SQL self join.
To blend (join) data sets first we need at least 2 data connections in the workbook. For this particular task I want to self join – in other words join the same data set to itself. The reason for this is because we need to assign a fixed enquiry count bin for each ListingId which will not re-calculate. These bins are created in 1 dataset. The calculations/data manipulations are then carried out in the other dataset.
The first thing to do is right click in the area where the Data connections are stored (top left of your workbook) and select Duplicate. Now you’ll have 2 copies of the same dataset in your workbook.
Next step is to alter the relationship between the 2 datasets – the 2 datasets should be connected by ListingID only and not both ListingID and Month, which is the default join. The Relationships are set in the Data menu.
The Custom option needs to be selected and the Month, which is entered as a default join, has to be removed for this to work.
Using the 2 Datasets
As mentioned previously the reason for joining the 2 datasets is to make the bins static in 1 of the datasets and allowing data manipulation in the other dataset without affecting the bins. The first thing we need to do is take the EnqCountBins pill from dataset 1 and place it on the Columns shelf. Then we will use dataset 2 for everything else.
Note the orange tick mark against the EnqCountBins pill – this shows it’s from a different dataset to the other pills displayed on the sheet
This chart looks exactly the same as when only using 1 dataset – but the difference is we can now drag the Month pill into the Colour shelf to split the enquiry count by month of enquiry.
The next challenge is to show these values as a %. Unfortunately none of the quick table calculations work so we need to create the calculation.
For each bin we want to know 2 things:
- How many enquiries happened in total within that bin
- How many enquiries happened each month in each bin
This means using the Window_Sum table calculation, which just sums everything in a partition. This whitepaper on the Tableau site is useful should you want more information on table calculations and partitions.
Number 1, the number of enquiries in total within each bin, is relatively simple. The formula is WINDOW_SUM(SUM(EnquiryCount)) and I have called the field EnqCountPerBin. By default this partitions across the table – in other words by each bin, which is exactly what is wanted.
Number 2, the number of enquiries in each month in each bin, is more complicated. We need to partition by month and also by bin. For this we also use the FIRST function which if I’m being honest I don’t fully understand, Rickard Leeke supplied the formula and I have copied it. I have called the field EnqCountFormMonth and the formula is:
IF (FIRST()==0) THEN
There’s an explanation of the above formula at this post, with the explanation being “That just returns the result for the first row in the partition and NULL for all other rows”
Unfortunately there’s more to do than just dragging and dropping the calculated fields. To begin drag the EnqCountForMonth pill to the Rows shelf – depending on the settings of AGG(EnqCountBins) this could give you a weird looking graph. If you are seeing weird results first thing to do is right click on the AGG(EnqCountsBins) pillnd make sure the option ‘Ignore in Table Calculations’ is NOT selected.
Next thing is to check the EnqCountForMonth is calculating along the correct partitions. Right click into the pill and set it to compute along the ListingId so the description is as follows:
If the description isn’t exactly like this check the ‘Ignore in Table Calculations’ setting again.
Next step is to test the EnqCountPerBin is working properly. Drag this pill into the Label section and it should show the total enquiry count for the bin duplicated for each month – but it won’t until the way it calculates along the partitions is altered. Again right click into the pill and select Compute Using Advanced. Make sure the setings of this pill are as follows:
If all has gone well the chart will now look like this:
The next step is to create a calculation using these values to get the % of enquiries per month per bin, so each bin will equal 100%. I’m calling the calculated field PercentPerMonthPerBin, in effect it’s a Percent of Total built in table calculation but this visualisation is too complex for the in-built table calculations to work.
The formula is: [EnqCountForMonth]/[EnqCountPerBin]. Drag the PercentPerMonthPerBin pill to the rows shelf in place of EnqCountForMonth and the visualisation is almost complete. Just format the PercentPerMonthPerBin pill as a %, remove the label and the chart is now complete. It should look like this: