If using Tableau 9+ check this updated article on using LOD calculations to create bins from a measure.

This post is a continuation of a previous article showing how to create bins from a measure introducing a situation where data blending is a solution. Credit to Richard Leeke for supplying the solution.

In the previous article we created bins from a calculated field. We discovered these bins re-calculate when introducing additional fields, therefore giving results that we were not expecting. To recap I had some data showing an enquiry count per listing per month. I want to sum these enquiries for each listing to put each listing into a bin dependent on the total number of enquiries.

There are two pieces of information I would like:

  1. The number of listings within each enquiry bin
  2. The monthly distribution of enquiries received per bin. For example listings in bin 1 received 20% of enquiries in Month 1, 50% in Month 2 and the remaining 30% in Month 3; Listings in bin 2 received 50% of enquiries in Month 1, 25% in Month 2 and 25% in Month 3; etc.

Number 1 is quite simple. The previous post describes it and also shows how #2 is far from simple. This is due to the calculated fields recalculating when I attempt to partition the enquiry counts per bin by month. Our bins are a measure and not a dimension, meaning they are fluid, not static. To do #2 we need to use data blending, which can be similar to a SQL self join.

Data Blending

To blend (join) data sets first we need at least 2 data connections in the workbook. For this particular task I want to Self Join. In other words blend the same data set to itself. This is because we need to assign a fixed enquiry count bin for each ListingId, we don’t want it to re-calculate. Therefore create the bins in one dataset while the calculations are occur out in the other.

The first thing to do is right click in the area where the Data connections are stored (top left of your workbook) and select Duplicate. Now you’ll have two copies of the same data source in your workbook.

Next step is to alter the relationship between the two data sources. The the two data sources should ONLY be connected by ListingID and NOT both ListingID and Month, which is the default join. The Relationships are set in the Data menu.

Select the Custom option and remove the Month, which is entered as a default join.

Using the two Data sources

As mentioned previously the reason for joining the two data sources is to make the bins static in one of the datasets. Thereby allowing data manipulation in the other data source without affecting the bins. The first thing we need to do is take the EnqCountBins pill from dataset 1 and place it on the Columns shelf. We will use dataset 2 for everything else.

       

Note the orange tick mark against the EnqCountBins pill. This indicates it’s from a secondary data source of that sheet. That’s important for this technique to know the data is coming from the intended data source. Sometimes, especially when dealing with identical data sources, it is simple to accidentally use data from the ‘wrong’ data source.

This chart looks exactly the same as when only using 1 data source. The difference is we can now drag the Month pill into the Colour shelf to split the enquiry count by month of enquiry.

The next challenge is to show these values as a %. Unfortunately none of the quick table calculations work so we need to create the calculation.

Create the calculations

For each bin we want to know 2 things:

  1. How many enquiries happened in total within that bin
  2. How many enquiries happened each month in each bin

This means using the Window_Sum table calculation, which sums everything in a partition. This whitepaper on the Tableau site is useful should you want more information on table calculations and partitions.

Number 1, the number of enquiries in total within each bin, is relatively simple:

EnqCountPerBin: WINDOW_SUM(SUM(EnquiryCount))

By default this partitions across the table – in other words by each bin, which is exactly what is wanted.

Number 2, the number of enquiries in each month in each bin, is more complicated. We need to partition by month and also by bin. For this we also use the FIRST function which to ensure only one value is returned per partition. Without FIRST the same value will return multiple times making the chart look strange with multiple values piled upon top of each other.

EnqCountFormMonth:

IF (FIRST()==0) THEN
WINDOW_SUM(SUM([EnquiryCount]))
END

There’s an explanation of the above formula at this post, with the explanation being “That just returns the result for the first row in the partition and NULL for all other rows”

Increase the complexity – set the calculation partitions

Unfortunately there’s more to do than just dragging and dropping the calculated fields. Firstly drag the EnqCountForMonth pill to the Rows shelf – depending on the settings of AGG(EnqCountBins) this could give you a weird looking graph. If you are seeing weird results first thing to do is right click on the AGG(EnqCountsBins) pill and make sure the option ‘Ignore in Table Calculations’ is NOT selected.

Next thing is to check the EnqCountForMonth is calculating along the correct partitions. Right click into the pill and set it to compute along the ListingId so the description is as follows:

If the description isn’t exactly like this check the ‘Ignore in Table Calculations’ setting again.

Next step is to test the EnqCountPerBin is working properly. Drag this pill into the Label section. It should show the total enquiry count for the bin duplicated for each month but to make this work we alter the way it calculates along the partitions. Again right click into the pill and select Compute Using Advanced. Make sure the settings of this pill are as follows:

If all has gone well the chart will now look like this:

The next step is to create a calculation using these values to get the % of enquiries per month per bin, so each bin will equal 100%.

PercentPerMonthPerBin: [EnqCountForMonth]/[EnqCountPerBin]

Finish the chart

Put this new field on the Rows shelf, replacing EnqCountForMonth, and the visualisation is almost complete. Finally format the PercentPerMonthPerBin pill as a %, remove the label and the chart is now complete. It should look like this: