We’re excited to share a deep dive into dbt Labs, the makers of dbt, which is currently the industry standard for data transformation. In its 2023 State of Data and AI Report¹, Databricks named dbt the fastest-growing software in the data and AI category among Databricks customers, and Snowflake recognized dbt Labs as its Data Integration Partner of the Year. In just a few short years, dbt has become the must-have software tool for data analysts and engineers to perform data transformation.

dbt Labs is one of the key companies at the center of the modern data infrastructure movement. As a consequence, its business can be thought of as an example of the classic strategy of selling “picks and shovels” during a gold rush, because virtually any company attempting to leverage data to implement artificial intelligence (AI) is likely to need dbt to be successful.

"Every AI app starts with data, and having a comprehensive data and analytics platform is more important than ever."

- Satya Nadella, Microsoft CEO

According to Databricks’ recently published 2023 State of Data + AI report, dbt is the fastest-growing data and AI tool among Databricks customers in terms of year-over-year customer growth: the number of customers using dbt with Databricks more than tripled during the previous 12 months.

Source: 2023 State of Data + AI

As more companies seek to do more advanced things, more often, with their increasingly large data sets, they will inevitably run up against the limitations of traditional approaches to transforming data, and seek out a better way with dbt. dbt allows teams to take raw data and refine it into an end state that enables companies to do something useful with it — such as personalize their customer experience, generate sales and inventory reports, or train their own AI and ML models, to name just a few examples.

dbt has already won over data practitioners everywhere with its ease of use and its intuitive yet powerful approach. We expect its growth to continue for years into the future as it cements its central place in the technology stack of businesses ranging from startups to the Fortune 500.

Powering the industrial revolution of data

From the workbench to the assembly line

At Fundrise, we use dbt Cloud for many of our own key data processes across the business, ranging from investor analytics to real-time cashflow forecasting for real estate analysis.

And we’re far from alone: dbt is used across nearly every industry, by businesses as diverse as McDonald’s Nordics, Nasdaq, and JetBlue. With over 3,600 customers using dbt Cloud (more paying customers than Snowflake had when it went public in 2020)² and tens of thousands of potential future paying customers using open source dbt Core, it has become ubiquitous and much-loved by data teams doing advanced analytics.

dbt's homepage leads with the headline “Ship trusted data products faster.”

The concept of a data product is a great framework for conveying the power of dbt to the vast majority of us who aren’t data engineers or data scientists by trade. Like physical goods, data, in order to be useful to a business for anything greater than the narrow purpose of recordkeeping, must be assembled by combining multiple raw sources / materials, and then molded or shaped via a specific set of steps to produce the desired output.

For example, imagine that an e-commerce company wants to build a predictive data model that estimates future customer lifetime value (CLV) for each of its customers. At a minimum, the model would likely need to take into account each customer's past orders, activity on the website, and engagement with ads or email marketing, as well as demographic attributes like age, gender, place of residence, and anything else that might be an indicator of willingness and ability to spend money.

Applying our manufacturing analogy, each of these data sources can be thought of as the raw materials needed to build a data product. Physical raw materials typically need quite a bit of cleanup and preparation before they can be used in manufacturing, and this is true in the world of data too. Without getting too far into the specifics, the many different software systems that a typical company uses to run its business each store data in a manner that is good for running that particular software (i.e., maintaining and retrieving individual records), but is messy and inefficient for doing anything else, like understanding what the data is telling you in the aggregate.

Generating a simple list of products a customer has bought might involve querying a dozen database tables, filtering out incomplete orders, and sifting through tens if not hundreds of unneeded columns to find the handful of data points you actually care about from a business perspective. And that cleaned-up list of purchases would be just one of a dozen or more inputs to a predictive customer behavior model.

dbt enables the data team to break apart processes like the one described above into discrete steps with highly structured requirements about the inputs and expected outputs of each step, and even specifications for what happens when something goes wrong. On top of that, the software makes it easy for anyone to observe the results of each step in the “assembly line” before the next one begins. As on an assembly line, this enables any downstream consumer of an output to confidently rely on the data being in a known, exact state that serves as the starting point for the work that needs to be done next.
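For readers who want to see the assembly-line idea in concrete terms, here is a minimal sketch in plain Python. It is not dbt's own syntax or API (dbt projects are typically written in SQL), and every name in it (`RawOrder`, `stage_orders`, `check_staged_orders`, `purchases_by_customer`, and the sample data) is hypothetical, invented purely for illustration. It shows a first step that cleans raw orders, a check that must pass before the next step runs, and a second step that builds on the checked output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawOrder:
    """A hypothetical raw record, as an operational system might store it."""
    order_id: int
    customer_id: int
    product: str
    status: str         # e.g. "complete" or "abandoned"
    internal_note: str  # a column the business does not need downstream


def stage_orders(raw_orders):
    """Step 1: keep only completed orders and only the columns we care about."""
    return [
        {"order_id": o.order_id, "customer_id": o.customer_id, "product": o.product}
        for o in raw_orders
        if o.status == "complete"
    ]


def check_staged_orders(rows):
    """Check between steps: fail loudly if the output is not in the expected state."""
    order_ids = [r["order_id"] for r in rows]
    if len(order_ids) != len(set(order_ids)):
        raise ValueError("order_id must be unique")
    if any(r["customer_id"] is None for r in rows):
        raise ValueError("customer_id must not be null")
    return rows


def purchases_by_customer(staged_rows):
    """Step 2: group the checked output of step 1 into one list per customer."""
    result = {}
    for r in staged_rows:
        result.setdefault(r["customer_id"], []).append(r["product"])
    return result


# Made-up sample data for the sketch.
raw = [
    RawOrder(1, 100, "boots", "complete", "gift wrap"),
    RawOrder(2, 100, "socks", "abandoned", ""),
    RawOrder(3, 200, "hat", "complete", ""),
]

staged = check_staged_orders(stage_orders(raw))
purchases = purchases_by_customer(staged)
print(purchases)  # {100: ['boots'], 200: ['hat']}
```

The point of the sketch is the shape rather than the details: each step has a defined input and output, and a failed check stops the line before bad data reaches anything downstream.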

The same principles apply when it comes time to summarize enterprise data into metrics. For example, rather than having multiple slightly different variations of monthly sales volume floating around the company (should it be based on date of order placement, or date of order fulfillment?), dbt creates a central facility for defining and documenting these metrics, so that everyone from the analyst to the CEO can trust the numbers they are using.
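The idea of a single, shared metric definition can also be sketched in a few lines of plain Python. Again, this is an illustration under our own assumptions rather than how the dbt Semantic Layer is actually configured: the function name `monthly_sales_volume`, the field names, and the choice to count sales by order placement date are all hypothetical.

```python
from collections import defaultdict
from datetime import date


def monthly_sales_volume(orders):
    """One shared definition of the metric.

    Assumption made for this sketch: sales are counted in the month the
    order was placed, not the month it was fulfilled.
    """
    totals = defaultdict(float)
    for order in orders:
        month = order["placed_on"].strftime("%Y-%m")
        totals[month] += order["amount"]
    return dict(totals)


# Made-up sample data: the first order is placed in January but fulfilled in February.
orders = [
    {"placed_on": date(2023, 1, 30), "fulfilled_on": date(2023, 2, 2), "amount": 50.0},
    {"placed_on": date(2023, 2, 10), "fulfilled_on": date(2023, 2, 12), "amount": 20.0},
]

monthly = monthly_sales_volume(orders)
print(monthly)  # {'2023-01': 50.0, '2023-02': 20.0}
```

Because every report calls the same definition, the order that straddles two months is counted the same way everywhere, instead of landing in January in one dashboard and February in another.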

At its annual conference, Coalesce, dbt Labs announced a number of major new dbt Cloud features that solve some of the biggest data challenges that companies face. These new features will help customers like Fundrise solve problems of complexity. As organizations, and their data, grow in complexity, it becomes difficult for data teams to stay agile while maintaining quality. The new features aimed at addressing those concerns include dbt Explorer, Cloud CLI, new partner adapters, and the next generation of the dbt Semantic Layer. dbt Labs also announced the new dbt Mesh paradigm, which builds on these capabilities to equip teams to collaborate across projects in support of a data mesh architecture. You can check out the latest news here.

Recap

To summarize, dbt enables a step-change improvement in the way organizations work with their data, which, in our opinion, is as significant a leap forward as the transition from artisans individually handcrafting items on a workbench to mass production on an assembly line. If, like us, you believe in the power of data to transform the way businesses operate, there’s no company we can think of that’s closer to the center of this revolution than dbt Labs.

Want to help dbt grow even faster?

We're believers in the dbt product and mission, and now we (and our venture investors) are literally invested in their growth.

If you lead a technology organization and want to embrace a truly modern data strategy, dbt Labs is essential.

Modernize your approach to data