The I in BI…

18 min readMay 12, 2023

Five simple rules to extract and present meaning from data. The Applied Data Analysis (ADA) last lecture.

The I in BI stands for intelligence. Author image.

Tl;dr;

Purpose defines form and function. Start with purpose. Who are you building it for? What will they use it for? How will it work? What will it show?
Intelligence is actionable. Insight is gleaned. Metrics track performance, exceptions and models. Present intelligence first, then metrics.
Focus on the relevant. If you are asked to show the ducks, show the ducks. Don’t show blank rows or columns, unless you are making a point.
What gets measured improves. But that depends on what you are measuring against. To track growth, first build consensus on the right baseline and benchmarks. Then use the ones you pick, consistently.
Understand the model and the data that describes it. Predict, forecast and track accuracy of prediction. Without accountability, there is no self correction or improvement.
The I in BI stands for intelligence. If you are not presenting or showcasing intelligence, it is not BI.

My orientation notes for fresh data analysts. As a technologist I was so focused on cool tools that I often missed the context. A few lessons from a career in parsing, analyzing and presenting data. From an internal exercise with a focus on content over tools.

One. The difference between intelligence, insights and metrics.

A metric is a measure that we use to track and improve performance. Or meet operational benchmarks. For instance:

a) Number of visitors it takes to convert an order.

b) Average order value.

c) Number of impressions for a click.

d) Change in number of orders for a promoted product.

Each of the above trackers is a metric that can be used to improve performance in a specific area.

Improve your conversion rates (a), increase average order value (b), increase click through rates ( c), track effectiveness of a promotional campaign (d).

Metrics are important. They are useful when we have clarity on why or what. If you are placing a metric on your dashboard, ask why and what first.

Insight is something we glean from data which is not visible with a direct or cursory review. Insights come from mixing context with directed analysis, inspired by a specific question.

Here is an illustrative example that maps our path to an insight.

We run a ecommerce site that sells excel templates and PDF study guides to consultants, traders and model builders in the computational finance space. For our online store, the top seven order origin countries contribute 58% of all orders and 63% of all revenues.

Is this an insight? Can I put it to work? Is it useful? Should it be on our sales dashboard?

The 58% / 63% ratio is a calculated value, not raw data. Some level of processing goes into generating it. Is it useful? Yes. It is a statistic so better than data but not an insight. Statistics are compiled by processing data but depending on context, may or may not be useful.

How do we turn this into an insight? An insight should lead to actionable intelligence.

For instance, a question we pose every year as part of our internal store review. Which of these seven countries should we focus sales effort and marketing dollars on?

Given population, size, economic foot print, are there countries that had a disproportionate share of sales? So we could focus our efforts on them (actionable intelligence).

Version 1. Customer orders and order value. Author data and analysis.

If you can’t see it at data level, transform data to a form where information we need is visible. This requires some calculations.

Two countries meet this criteria. US and Singapore have a higher disproportionate share of revenues compared to share of orders. The higher share is explained when we calculate and compare average order value (AOV) by country.

Which countries have a higher disproportionate share of revenues relative to orders and average order value? The answer: Singapore first, then US.

Why is that? Possibly because the two territories represent global and regional financial centers for their markets. But so does UK. While UK is our second important market, it doesn’t stand out on the basis of these two criteria. And if that criteria worked, we would also expect to see Paris, Berlin, Hong Kong and Tokyo on this list.

Why does this matter? Insight often leads to actionable intelligence. What is the actionable intelligence here?

Which countries should our efforts focus on to grow traffic, conversions and sales. US and Singapore. Why?

Three reasons. a) Higher average order value across top 7 markets b) Higher order numbers and order value on a relative market size basis. c) consistent traction over a decade of traffic.

Why not UK, India, France or UAE? They are also in the top 7 list.

That is a good question. To answer it, we would need to change the granularity of our analysis. We address it later.

Actionable intelligence is the cutting edge of our sword. A focused laser sharp push that helps us get to our targeted goal or objective. It is at a more directed level of focus (granularity) than data or insight and is often a calculated result. Actionable intelligence also has a goal. In our case that goal is higher sales.

Sometimes intelligence and insight also incorporate external context.

In our specific instance, Singapore, a much smaller market has the same standing as two larger, competitive markets. Because Singapore is a much smaller market and we already have traction and validation, we may own and dominate Singapore due to simple network effects. Smaller market and community already familiar and comfortable with our product with less competition. Every new sale or customer brings us closer to dominating a smaller market. Depending on the product we sell, dominating smaller markets leads to higher margins than simply being present in a much larger market.

This is context. It is not reflected within data. It is not meta data. But it is relevant to understanding our analysis and the final recommendation.

An oil tanker navigates the Bosporus strait during a mist covered morning. Intelligence or metrics? Author image.

Understand the difference between intelligence, insights and metrics. Put that understanding to work in building better reporting for customers.

Two. Purpose. Intelligence over metrics.

Data, intelligence, insights, dashboards all have a common theme. Purpose. Find it.

How do you find purpose? Answer a question. To find what you are seeking you must know what you are looking for.

As ecommerce store owners we have different objectives and wear different hats. In the example above, our objective is to identify which markets we should spend our growth dollars on. Using traffic and conversion analytics the answer is US and Singapore. It wouldn’t be evident on a cursory review. The US maybe, but certainly not Singapore. To get to that insight we had to specify a criteria.

What was the criteria? We want to focus on markets that have a disproportionate share of revenues, compared to orders. Essentially higher average order value (AOV).

We can also invert the purpose and focus. For instance, which markets we should walk away from? Or the right strategy to grow revenues from markets we want to focus on? The two new questions would require new criteria and different line of inquiry.

Purpose defines form and function. Author image.

Most beginner dashboards are metric focused. That works well for operations, compliance and governance teams. Metric focused teams want to see values, relative percentage change, periodic change, seasonal change and deviations from expected, anticipated behavior. Show that.

For strategy and executive teams, we need actionable intelligence. Actionable intelligence has two attributes: a) It explains or helps us better understand data. b) It sets direction for what should be our next step.

Simply put, what does this (result) mean and what should we do (action)?

When we add data to our dashboard, ask if this is a statistic or an insight? A statistic belongs within the metric section. Insights belong within the executive section.

Intelligence and insights, given the definition above, are often calculated. Are we presenting raw data, processed data, filtered, calculated or derived data.

Order value for a single order (a single row) is raw data. All orders (rows), where order value is greater than $19 is filtered data. The sum of all orders shipped in a given month to customers in the US is collated or processed data. Average order value for all orders shipped to US customers in a given period is a calculated value. A trend line of monthly average order value by month for US customers from 2012 to 2023 is derived data.

If it is filtered, calculated or derived, it will go to the executive dashboard. If it is raw or processed it belongs in the operational dashboard.

Which one are you building? Author image.

Purpose determines what goes up on our dashboards. Operational dashboards focus on tracking exceptions and metrics. Performance dashboards focus on the business model that generates results we are trying to improve. Strategic dashboards focus on tracking core thesis (relationships), deviation from thesis that require action and intelligence.

When it comes to ordering dashboards, present intelligence and insights first, then related metrics, then data.

This is what has changed that we need to look at (intelligence).

Followed by, here is why we think this has changed (insights) and;

What we need to track to fix it (metrics).

From data to intelligence. Author image.

What is a conversion opportunity? We track traffic, average order value and trend by average order value by country. We also use traffic analysis to gauge where we are getting traffic from.

The identification of opportunity to write Founder puzzles, our book on modeling for founders, is owed to traffic analysis. For our finance site, we receive fair share of traffic related to financial modeling across multiple social sites. In ’21 funding for startups and founders was a key theme but we didn’t have a product to sell. So we wrote one using notes from our acceleration training practice creating relevant collateral. The insight from traffic was that we had an audience but didn’t have a product. The insight from context was the broad topic. The actionable intelligence was the specific topic for a specific audience we had access to. The action, creating and launching the product. The aim, converting customers we already had into repeat buyers for a new product with broader appeal than our conventional products.

First ask, who will use this report for what? How does the content in this report helps them do what they need to do?

Three. Focus on the relevant.

Omit needless rows and columns. Author image.

What do you think the following dataset shows? Is there any intelligence or insight here?

This is a list of markets. Customers from these markets visited our site but had no sales across 12 years. The blanks rows represents annual sales by year.

Does this help in answering the top markets question? No.

Would you reach a different conclusion, if I added more fidelity to the dataset? Break it down by months of the year? No.

Then don’t show it. Filter it out.

If we are trying to zero down our top 5 markets, is there any value in showing either of the two extracts above? To answer the top 5 market question, what would we prefer? The two blank extracts above or, the table below?

Stay focused on the point. Do you really need those pesky decimal places?

If data doesn’t add value, if it doesn’t contribute to the discussion, don’t include it. Not everything you have in your tables needs to be seen. Prune and trim unnecessary rows and tables. You really don’t need those decimal places.

Trimming is not just applicable to white noise. If you are answering the top 7 market question, only show the top 7 markets. There is no need to show the 8th (my apologies, Paris).

Here is the revised cleaner presentation. Without decimals and Paris.

While Denver’s AOV is 3 times higher than Singapore, Denver’s number of order are too low to be credible. Singapore, London and New York with their larger number of orders represent a deeper flow of orders compared to Mumbai, Denver or Riyadh.

Once again context comes into play. The Denver orders were the result of a one time consulting opportunity. Once the engagement was complete, the order flow dried up.

Bandwidth, processing capacity, focus and attention is at a premium with consumers of intelligence. Don’t waste it on white noise.

Sometimes we need cuts with different dimensions. One such cut is customer churn. The cut above with a few changes is what we track internally to keep an eye on our core business within our core markets.

The cut shows why Singapore is a special market for us. A higher share of returning customers and higher AOV on orders from returning customers. And so are India and the US. UK meets one criteria (higher share) but fails the other (higher AOV).

Omit needless rows, columns and numbers.

Four. Tracking Growth. Find the right benchmark.

Dissecting and representing growth is hard. More so for new products and startups. Growth turns out to be a minefield when it comes to tracking dashboards. How do you measure and represent growth correctly?

Here is our first attempt. Our version one. For tracking growth. Growth as collated data by month and year. It is the output of a simple pivot table run on gross sales. Data presented in raw form collated by month and year.

It is as useful as a dictionary.

Here we go again. Version 1. Collated author data.

The data set shows revenue figures by month and year for 12 years. If you look at the right most column (Grand Total) you can see year on year growth has been uneven. The right most column is the most intelligent column in the table above. It shows us a trend. Not a visual trend but still a trend. We can make an infinitely more intelligent call by replacing the above table with the simple plot below.

While the plot does a better job of showing revenue trend line on an annualized basis, it doesn’t answer the growth question. It builds everything around the intelligence column we highlighted above but drops granularity.

If you are trying to dissect month on month or seasonal growth, you need a variation of the original cut. For this analysis we were tracking monthly sales performance mid year on a month on month basis. If we don’t have data at monthly granularity, we can’t measure or account for trends or differences our interventions are making in sales.

For instance, in May ’23, we are quite keen on determining what our year end figure would like? More so given sales booked till April ’23 have been disastrous. While we are interested in the year end figure, we are also interested in tracking where we are with respect to the full year target, adjusted for seasonality. Also any possibility that we could turn around the figures for the full year.

Here is our version two, our second pass. We compare current month, current year, with the same month, previous year. January 2021 compared to January 2020.

Sales down 16% when we compare the two months (1st column, 8th row). Similarly November 2021 when compared to November 2020 up by 201% (11th column, 8th row).

The month on previous month comparison addresses the seasonality question. For instance, December is a slow month and often a month of decline in our business. The last 15 days of December and first 15 days of January are the slowest days for our business. The same holds true for last two weeks of June and the first week of July.

Here is a slightly readable version three. The only difference between version two and three is how data is presented. Higher growth numbers are highlighted and presented in bold. They stand out.

Why not do a month on previous month comparison? Compare Feb with Jan, Jan with Dec? What would the plot look like if we did that?

This is our version four. When you remove the impact of seasonality, and just compare current month with previous month, everything looks absolutely fantastic.

Let’s do a quick customer / user feedback check. As a store owner we are not happy with either of the above cuts.

Sales for ’23 stand at US$ 3,222. Projected full year sales under US$ 10,000. That is 40% down from sales for full year ’22. This year is a disaster, to put it mildly from a sales perspective. Yet version four shows 3 of the 4 months of ’23 with positive growth on a month on month basis.

The seasonality adjusted version three does better for ’23 but has issues with astronomical growth in some months in earlier years. If we had a bad year the year before, our recent year will look great on a year on year basis. Also true in reverse. Clearly both these models (version three and version four) of growth are not working well for us as store owners. We need something different.

What if we established a seasonal baseline? It could be the average revenue for that month across all years. Or revenue that corresponds to what we term as a normal month for a normal year, free of outlier, or extra-ordinary events. Pick a year, the average of the most recent three years, or the average of all the years combined. Your call.

For a minute lets assume that this the average revenue for every month across the ten full years. We use that as a baseline. We compare and plot the change in revenue every month, not with previous years, or prior months but baseline revenues for that month.

Version 6. “You are the one that I want.”

This is version five of our plot. Majority of January and December growth figures are negative, which ties in with our seasonal model.

All problematic 2000%, 1300% and 1140% growth months are gone. They are now represented by reasonable numbers.

For 2023, we had a great January but terrible Feb, Mar and Apr, which ties in with our original read for the year and shows the assessment we want to see.

Without the right benchmark the tools, plots, charts we use are meaningless and can’t be put to work. With the right benchmark we get the information we need as a store owner in one glance.

Identify and use the right benchmarks and baselines. It is not the tools or plots. It is the benchmark. Don’t build the benchmark in isolation. Build it in consultation with your customers.

From an effort, pricing and timeline perspective, budget for three iterations. Irrespective of whether you are planning for an internal project or an external client engagement. The first cut we build will be binned. The second one will be presented. The third one will be tweaked and finalized. You may get lucky and do it in two or win the bad luck lottery and not converge at all.

Setting expectations for three iterations helps avoid disappointment and frustration. If we don’t budget for three iterations, someone on our team, the client team or the account management team will blow a gasket.

There is one final challenge with baselines that you must avoid at all costs.

Once we have the right metric tied down, we can even plot this table as a surface. Surfaces are not popular with audiences. They are only appreciated by teams working on optimization challenges. They help visualize a baseline if you need help or assistance with setting the right benchmark and highlight trends visually. For instance, despite our original impression of December, January being slow months, the reality appears to be quite different.

We love surfaces so these next few lines come with a heavy heart. Never start or end a presentation with a surface. Surface are a mistaken dominance tool. We think they are a secret weapon that we can use to establish mastery of our subject area and data set. In practice though we use them to sabotage a presentation or nuke any hope of building engagement with our audiences. We also use them to redirect a discussion when we have lost the argument or want to confuse our viewers and escape in the chaos we have just unleashed with our plot. They are a glorified version of a smoke grenade in Excel.

Baselines and benchmarks impact analysis and recommendations. Pick them carefully. Apply them consistently.

Five. Understand the model.

Our thesis so far is good tools and reports help us discern signals from noise. Great tools convert signals into actionable intelligence.

Only possible if we understand the business we are trying to model. There is making the effort to understand a business and there is understanding a business.

The two are totally different things.

a) Do you understand the business well enough to add value for an owner who has spent a decade working in that space?

b) We do that by drilling down to the core of how the underlying business creates and generate value.

For instance, in our ecommerce store, the business model is driven by content and impressions. Clear calls to action convert those impressions into clicks and orders. If you want to grow revenues you have two paths.

a) Get more traffic (via new content and clicks) and,

b) Convert more visitors (conversion optimization and higher order values).

So clicks, conversion and average order values are your growth ladders. Your metrics focus on them because what gets measured, improves. Your actionable intelligence should be aligned with these two goals for it to have real impact.

If you are not aware of the path or don’t understand it, you can’t help navigate it.

All models are not the same. The model above is based on a complex relationship between content and sales. It assumes you have digital assets (artefacts) that can attract traffic. It assumes you have a funnel that can effectively convert that traffic into sales.

Others models may have no discernable patterns. For instance, here is the daily price series for Bitcoin from 2014 to 2022.

To understand a phenomenon for which we don’t have a clean model, we opt for studying change. Nicholas Nassim Taleb calls this the generator function. How do, or what process controls and changes daily prices?

Dissecting change gives us three benefits.

One, we get one step closer to discerning the underlying signal.

Two, we have a model for validating change in our predictive structure we build ourselves.

Three, if the change model allows us to forecast future behavior, we can use these forecasts to assess impact of our interventions.

A histogram for daily price changes. The generator function.

Become one with the distribution. Understand how it is generated, how it shifts, evolves and changes with time. It happens overtime, with specialization and dedicated effort but it will make your work stand out.

Some models like our content and sales model above are relationship based. Some use direct data. Others use estimates.

Alternate estimation models. There is always more than one path to a number.

There is often more than one way to an answer or a forecast. There is no harm in exploring as many of them, test them with users as well as with real data and keep the ones that work best.

The deeper our ability to understand how a process works, the context of environment we operate in, the more value we can add with our work.

Good dashboards also have an accountability function. Is what we are doing, working? Difficult to account efficacy of interventions without some predictive and tracking capacity. Predictive capabilities used to be hard. But this is an area where technology has simplified the process and made tools more accessible.

Predict and track. Without accountability there is no self correction or improvement.

“To thine own self be true,
And it must follow, as the night the day,
Thou canst not then be false to any man.”
Act 1, Scene 3. Hamlet, William Shakespeare.

Context

The context for this case study is based on four questions.

Who? Me. I run a digital ecommerce store that sells Excel templates and PDF study guides in an exotic financial space called computational finance.

What? Building a dashboard for an ecommerce store owner. Using anonymized historical sales data for ten full years and two partial years. Not concerned with visuals at this stage, more concerned with content.

Why? Objective? To increase sales, revenues and average order value. To track performance and identify deviations before they become bigger headaches. I do this exercise every year, this year, I build a live case around it.

How? Ideally identify actionable intelligence that can be put to work immediately.

The Dataset? Traffic analytics data and analysis for our site from Google Analytics collated with revenue received through our payment consolidator. The author has full ownership of the dataset and is sharing analysis and results, but not the full dataset for this article.