December 5, 2024

How to leverage Metabase for efficient self-service analytics?

Lucas Leroux
Manager Data Science and Analytics
Pierre-Henri Cumenge
Cofounder and CTO @ Theodo Data & AI
Coralie Betbeder-Déchelette
Data Lead @ Joko
Boris Dorado
Head of Data & AI @ Klee
Emmanuel Martin-Chave
VP Data @ BlaBlaCar
Christelle Marfaing
Chief Data Officer @ May
Nolwenn Belliard
Head of Analytics @ Sumup
Yoann Benoit
Head of Data & Product @ Hymaïa
Lucas de Vries
Head of Data Freelance
Claire Gouze
Head of Data Freelance
Christophe Blefari (aka Blef)
Staff Data Engineer - Freelance
Yanni Iyeze
Data Analyst Senior @ Exotec
Edouard Flouriot
Director of Data Analytics @ Sorare
Governance and monitoring tips from the Modern Data Network with Pledg, Modjo and Joko.
partager l'article
No items found.

Metabase is a popular visualization tool to implement self-service analytics. It’s cost-effective and simple to implement. It’s also intuitive and easy-to-use for technical and non-technical users.

Yet, it can quickly turn into a nightmare with an exploding number of dashboards and broken questions, users losing time not finding what they need and uncontrolled warehouse computing cost.

Self-service initiatives fail because of poor end-user experience and knowledge, missing governance and unmonitored usage. The Modern Data Network shares tactical insights from Pledg, Joko and Modjo on their successful self-service analytics journey.

Implementing an efficient self-service strategy with Metabase relies on:

  • Clear prioritization of teams and use cases to start step-by-step
  • Identification and training of data champions in business teams
  • Strong governance, especially around organizing official vs. user-generated questions and collections
  • Detailed and actionable monitoring of self-service adoption and usage

Read on for more details on how to implement those in your team.

How Pledg leverages BigQuery logs to monitor Metabase self-service usage

At Pledg, we believe that business teams should be empowered to create their own analytics. However, to ensure they are utilizing data correctly and to avoid any unpleasant surprises, it is crucial to monitor data usage closely. This includes tracking which tables are used, identifying failed queries, and being aware of the ones that are excessively costly or time-consuming. In today’s modern data stack paradigm, the data warehouse has emerged as the central hub for diverse and numerous data consumers.

Consequently, we believe the most effective place to construct a robust data-usage lighthouse is within the warehouse itself. Let’s explore how we achieved this using BigQuery logs!

1. Get insights about your queries in the INFORMATION_SCHEMA dataset

Each time a query is executed in BigQuery, it’s recorded in the INFORMATION_SCHEMA.JOBS view, providing a wealth of information, including:

  • Execution Duration
  • Bytes Processed/Billed
  • Error Logs
  • DML Statistics (number or rows inserted, deleted, updated)
  • Referenced Tables
  • Cache Hits
  • The Executed SQL Query

You can learn more about INFORMATION_SCHEMA in the Google doc here.

This data is invaluable for understanding query performance and behavior. However, the challenge is in effectively navigating this extensive dataset. That’s where job metadata becomes crucial.

2. Enriching BigQuery Jobs with Metadata for Better Analysis

What you need to effectively navigate your BigQuery logs is being able to trace back every single job. Of course, the first step is to have separated service accounts for every data consumer tool, but you should also go at a more granular level by enriching every job with metadata that describes exactly where it comes from, how it was triggered etc. For custom-built data tools using BigQuery client libraries, you may add labels for metadata purposes. But most of the time, you don’t have this kind of liberty. To cope with this, we decided to add the metadata in the SQL query itself, as a comment. Let’s see how we did it with Metabase, our preferred BI tool at Pledg.

It’s actually Metabase that gave us the idea of injecting metadata in the query itself. When you execute a “Question” (the Metabase lingo for a query and its visualisation) the actual query being run on your warehouse contains a comment mentioning which User ID triggered the question :

To track which Question a query is coming from, it would be helpful to also have the Question ID logged (a feature not yet implemented in Metabase, despite popular demand). To address this, we developed a custom procedure to append this information to all Metabase questions in a nightly batch (see our script here). As a result, every BigQuery job initiated by Metabase now begins with a comment specifying both the User ID and the Question ID, greatly facilitating our navigation through BigQuery logs.

Applying a similar approach to all your BigQuery consumers will significantly enhance your ability to track data usage.

3. Add metadata about data consumers

The previous step successfully links BigQuery jobs with their respective consumers. The next phase involves adding detailed data about these consumers into BigQuery. This data can then be used to create various indicators and segmentation in our data-usage statistics.

Continuing with the Metabase example, relevant metadata includes Question names, creators, collections, and dashboards. At Pledg, we use Airbyte Cloud to sync all our data into BigQuery and we leverage its Metabase connector to import all Metabase metadata into BigQuery.

After this setup, we have a new raw_metabase dataset in BigQuery, equipping us with all the necessary information to start constructing our data-usage monitoring system.

4. Synthesizing Data with DBT and Analyzing the Results

With job insights on one side (courtesy of the INFORMATION_SCHEMA.JOBS view) and consumer metadata on the other (our raw_metabase dataset), we combine these elements using DBT to create a new “Data usage” mart that gives us usage statistics for every Metabase Question. The link between the INFORMATION_SCHEMA.JOBS view and the raw_metabase dataset is done thanks to the comment added in every query in step 2 (Question ID and User ID are extracted from the SQL query with a simple regular expression).

Let’s see a few views that we built on this mart and that we use every day to monitor data usage across the organization!

Most costly Metabase questions
Failed questions (with the actual error message !)
Search snippets within queries to assess if your data assets are actually used

With the recent release of Metabase, limiting the access to logs, this setup makes even more sense !

Joko improves users’ experience on Metabase thanks to better governance

2023 was an amazing year for the Data team at Joko. Metabase served as the cornerstone of our progress in better understanding our business and becoming more data-driven. However, fostering data access and usage also brings challenges. As our usage of Metabase has evolved over time, our governance of the tool also evolved. To ensure a great experience for our 50 users from all teams (B2B, Sales, Marketing, Product, Top management…), we have recently implemented a new organisation governance strategy for Metabase. Let us guide you through it 👇🏻

Former Governance & Related Challenges 😱

A few months ago, the volume of questions and dashboards was much lower than it is now (cf. chart below as an example for data lovers).

At that time, we had implemented a simple yet functional organizational structure. We had one collection per team (a collection being equivalent to a folder in Metabase), all collections contained only dashboards, except for one collection that housed all the questions (a question corresponds to a cart displayed as a table or chart in Metabase).

It was a good way to start. However, as the volume of content increased, some user pain points also increased. Users expressed the following feedback:

  • Difficulties in searching for content
  • Doubts on where and how they should create questions and dashboards
  • Doubts on permissions (e.g., whether they are allowed to modify a dashboard or a collection)
  • Lack of awareness regarding what is reliable or official, resulting in the need to recreate content

Our solution to improve user experience 🥰

To tackle these challenges, we have reorganized Metabase by reshaping the structure of each team’s collection. Now, each collection is made of two distinct types of collections:

  • Official collections: They exclusively contain official content, dashboards, and questions that have been curated or reviewed by the data team. It includes reliable KPIs. You can identify them by the official badge (this badge is available with the Pro plan, if you don’t have it you can use a ⭐in the title).
  • User Content collection: It is dedicated to user-generated content and is designed for self-service analytics.

Here is an example of the Marketing collection:

The UA (User Acquisition) collection consists of official dashboards, and all the official questions used in these dashboards are stored in the “Official Questions — Marketing” collection.

The User Content section contains various sub-collections that are used for self-service analytics within the Marketing team.

This organization relies on permissions that need to be adjusted to maintain official content:

  • Official collections: all users have view rights, while only the Data team and the Data champion of the corresponding team have edit rights. The Data champion, a recently introduced role representing the ‘data expert’ in each team, is responsible for maintaining their team’s Metabase collection, fostering a data-driven culture, and acting as the main point of contact with the data team regarding current and future data projects.
  • User Content collection: all users have view and edit rights.

We already received positive feedback from users regarding these changes. Users have noticed improvements in search, tool clarity, and awareness regarding reliable content. This project is aligned with our goal of driving our Data strategy and projects based on (i) user feedback and (ii) metadata related to the usage of Metabase. To further enhance our efforts in this direction, we are working on regularly measuring user satisfaction through a survey inspired by NPS (Net Promoter Score). Additionally, we started using the metadata available in Metabase, such as views and creations per team, cohorts of usage. If you would like to learn more about these initiatives, please don’t hesitate to reach out!

Modjo started self-service analytics with Metabase

The Data Analytics team was created one year ago at Modjo with the arrival of a data engineer and a data analyst. The first mission of the team was to build the data stack (Airbyte, Snowflake, dbt, Metabase). As we were a team of two people, we quickly realized that we would need to empower business stakeholders on analytics, so that we do not become a bottleneck. Metabase seemed to be the right visualization tool to pursue this goal, as it stood out for its ease of use by non-tech users (with no-code “questions”, the Metabase lingo for a query and its visualization). While empowering teams, we also knew that we would have to establish company-wide standards on Metabase usage to guarantee the reliability of analytics.

Self-service is only beginning at Modjo, but our return on experience might be valuable for those who want to get started but don’t know how to approach the problem. Here is the 3-step process that we followed.

1. Prioritize teams according to four criteria

The first step was to identify the teams that would be empowered first. Our prioritization framework depended on four factors ranked here by importance:

  • The need for data access: do users already have access to data through other tools? At Modjo, the Revenue team could directly access sales data within Salesforce, while product managers had no access to features data.
  • The data skills for building reports (are they already manipulating data with Mixpanel, Salesforce or Chargebee?) or for writing SQL queries
  • The impact of self-service on the company: some teams needed to make basic visualizations for order of magnitude or debugging (Product & Tech) while others needed to build official dashboards (Revenue)
  • The effort for the Data Analytics (DA) team to build appropriate data models and train users

The matrix below made us start with the Product, then Tech and finally Revenue teams according to the first factor:

2. Identify and train data champions

As everyone couldn’t be trained on Metabase usage, we had to select a few “data champions” to empower among those teams that would be responsible to broadcast the knowledge later on.

  • For Revenue teams: at Modjo, Revenue teams are composed of Sales, Operations and Finance. Sales representatives are not entitled to create analytics assets, so we rather chose to train one data champion in the Operations and one in the Finance teams, who would be responsible to create official dashboards to monitor the achievement of sales targets. The choice of the data champions has been made on their appetence and their experience in handling data. Each team has a different perimeter, so we organized individual training sessions focusing on the specific data models for each perimeter.
  • Product & Tech teams were already familiar with the data models (as they are in charge of building them) and their need was limited to basic questions, very close to raw models. This made us choose all product managers and software engineering leads as data champions and allowed us to organize only one joint training session, mixing theory and practice. We also created a “Training” folder in Metabase, with basic questions to be duplicated and used as a basis for customisation.

3. Measure self-service adoption

To monitor self-service adoption, we created a dashboard tracking the number of Metabase questions created per user over time. It reveals that four teams are now creating questions on a weekly basis (Product, Tech, Ops and Finance):

What’s next?

Self-service is now a reality at Modjo. As the company grows and becomes more and more data-driven, our next challenge will be to scale the initiative while preserving a high-level of quality on the produced analytics assets. Here are some projects that are in our backlog for 2024:

  • Create a community of data champions (create a Slack channel where the data team can share news on marts models and answer questions, organize a monthly meeting to make a collegial review of one specific dashboard, etc.)
  • Provide data champions with an up-to-date documentation on exposed marts models and predefined calculation rules (using the Metabase “segments” and “metrics” features)
  • Create a “Metabase charter of use” to educate on good practices (store created assets in appropriate folders, systematically check results accuracy, regularly clean broken or deprecated dashboards, etc.)
  • Define clear guidelines on Metabase permissions (folder and data access)

Conclusion

Metabase is a great tool to implement self-service analytics. It’s intuitive and easy-to-use for the data team and business users. Yet, it requires strong governance and monitoring to get the best results.

We hope the tactical insights shared in this article will help your data team make their self-service analytics projects work. Afterall, who does not like happy business users and a well-organised Metabase?

We’d love to hear from you! Feel free to share your own tactics, governance and feedback.

Partageons ensemble notre savoir

Rejoignez notre communauté d’experts en data et boostez votre réseau, vos compétences et votre impact dans le domaine
nous rejoindre

Nos prochains événements

TAG
This is some text inside of a div block.
This is some text inside of a div block.
TAG
This is some text inside of a div block.
This is some text inside of a div block.
TAG
This is some text inside of a div block.
This is some text inside of a div block.