How to turn data into information
Many organisations have a lot of data without much idea what to do with it. Converting data into useful information is a journey. The Information Ladder shows that journey, starting from the simple, all the way to complex machine learning.
The difference between data and information
In the workplace I often hear the words “data” and “information” used mistakenly. These are two distinct things, often confused for the same thing.
In a world where many organisations are striving to become more data-driven, it is important to understand the distinction.
Data is the raw material used to produce information.
Information is a product of data; without data, there won’t be information.
Data comes in many shapes and sizes. Some data changes frequently, some is static.
Frequently changing data includes website data; some websites have many clicks per second. Currency exchange rates are another good example, these fluctuate throughout the day.
Static data would include the capital city of a country or the tenor of a bond.
Examples of data include:
- Click data on a website
- The written content of your emails
- The static attributes of a book
- A list of countries with their country code
- Names of event attendees
- Whether an email recipient engaged with the email
- The holdings of an investment account
- Static attributes of a fixed income product
The explosion of the internet has led to a significant increase in data. I won’t even attempt to quantify how much data, (others have), but everything that happens online is another data point. Every email, web click, like, video view, tweet, etc is stored somewhere.
Gaining value from the data is a significant challenge. It takes time for organisations to become data-driven. It takes time to understand the data and it takes time to work out what to do with the data.
Therefore, when starting on the journey to make an organisation more data-driven, it can be difficult to manage expectations.
The Information Ladder represents the conversion of data into useful information.
Climbing the Information Ladder is a journey.
The lessons learned on each rung enable progression to the next.
Each rung leads to a deeper understanding of the data. At the bottom rungs of the ladder, the data work is often simpler. Moving up the ladder often leads to more advanced data work and a larger amount of data sources. At the upper rungs, the data work becomes more complex.
What are the rungs of the Data to Information Ladder?
These are often the starting point of any business information. Usually table style reports, often someone getting data, putting it in Excel/Powerpoint and sending it around. Usually, this will be one or two data sources combined, keeping within the limits of basic Excel. Sometimes data is distributed in big Excel files and finding the information may take some work by the recipient, either scanning through a table or building pivot tables, etc.
Like the above, only this time the user can drag and drop limited data sources to find information themselves. Or perhaps they ask their “data team” to provide information on an ad-hoc basis. This is generally in the form of Excel. Often there are limitations to the data available. It’s also prone to error as not all users may know the correct filters to apply / quirks of the data / etc to use effectively. The positive for the user is that getting answers, as long as the data is available, should be quite quick.
A well designed dashboard should answer many business questions. Dashboards should provide the means for the user to answer their initial standard questions (i.e. how are the sales vs budget?) and also answer the follow on questions (i.e. which products are the top sellers? Which salespeople are performing well/poorly?). Usually, these are automated and web-based, putting the information at the fingertips of those who need it.
Dashboards should answer the majority of standard business questions in a mature business intelligence environment.
This is where users are proactively notified if something requires their attention. Business rules drive alerts. Some alerts require multiple sources of data, some only one source. It notifies someone they need to aware of some information and perhaps take some action. Examples include:
- · some dodgy data is in a CRM system that requires cleaning
- · a trader has breached a trading limit; the trader and their management needs to know
- · an item in a fund portfolio has moved more than usual and the fund/portfolio manager should be aware
This is where the data complexity starts to increase. For statistical analysis to be meaningful the underlying data has to be good. The previous rungs of the ladder should ensure the data is of good quality. If people are using the information from previous rungs any existing data issues should have been corrected.
This is where data is used to try and better understand something. For example, in a subscription business what factors lead to a subscription renewal? With PPI claims what characteristics of the loan lead to compensation? Within asset management what causes fund outflows?
Based on what we have learnt during the statistical analysis, what is the likely aggregate outcome? For example, what do we think the subscription renewal rate will be? How many PPI claims will require manual investigation? How much AUM is likely to be lost over the coming 12 months?
This is where we attempt to predict what will happen in the future at an individual level.
Predictive analytics has been in widespread use for a long time in financial services. For example:
- Your Credit Score is used to calculate the risk of defaulting on a loan – which is then used to decide a) whether to offer a loan and b) what price (interest rate) to offer the loan
- Car Insurance – your past claim history, age, type of car, etc. contribute to your insurance cost; they analyse the probability of an incident based on the incident history of those in a similar cohort and then price your insurance based on the claim likelihood
- Life Insurance – data about your age, health and lifestyle are gathered before pricing your life insurance; using past data the insurer is working out the probability you will die while being insured. So an overweight smoker in their 50s will pay far more than an athletic clean living 20 something
This could easily be a separate topic. Machine Learning is a subset of Artificial Intelligence, which this article explains well. It should help better understand clients and subsequently make their experience more tailored to their needs. It can be further broken down:
- Scenario modelling – forecasts and predictions under what-if scenarios
- Decision support – extends scenario modelling, the goal here being to optimise decision making
- Bots and recommender systems – use machine learning to influence customer behaviour. Some leading-edge organisations excel in this area, such as Amazon with their recommendation engine. For example, what does this customer/client want? Based on the behaviour of similar customers can we affect the decision-making process? Can we win more business/maintain their business?
The reality of the Information Ladder
In the real world, there can be significant cross-over between the rungs. Also, it’s not necessary to step on every rung of the ladder. For example, many data projects start on ladder rung 3, “interactive”. Rung 4, “alerts”, is often a by-product of “interactive”.
Similarly, “predictive analytics” and “machine learning” can also be quite similar.
However, what is true is the data understanding and amount of data sources does increase as one progresses up the ladder.
Starting at the top of the ladder simply would not be possible without climbing the rungs below.
Climbing the Information Ladder
Often data to information projects start small with simpler data sets. All organisations are different, some are data-driven, other less so.
However, once these projects start to gain traction and start to prove their value they can lead to significant cultural change.
The interest in data tends to rapidly spread throughout organisations of all sizes.
There is a natural progression to expand to more data and use existing data in different ways.
Simply by answering one question is likely to lead to more questions.
An example of climbing the information ladder in an asset management context
Using asset management, here are some examples of items that could fit on each rung of the ladder, depending on the data maturity of the asset manager:
- How is a product performing vs the benchmark? (Static report)
- Can I see the holdings of a product? (is there a place I can see/request this data?)
- The product is performing like this today, how was it performing 2 years ago? What was the profile of the holding then vs now? (Interactive dashboard)
- This product is under-performing the benchmark by more than it should (send an Alert)
- What is causing the underperformance? (Statistical analysis)
- If performance remains as it is are we likely to see outflows? (Forecasting)
- Are there characteristics of the product that could mean a significant deviation from the benchmark? (Predict which products are could lead to at-risk clients)
- What products would be suitable to recommend to our clients? (Can we fulfill their needs with one of our products?)