Today, everyone likes to talk about data. Big data, data storage, data warehouses, data science, data analytics, etc. The word data is now used so frequently that it’s often unclear exactly what people mean by it or what problem they are actually trying to solve.
But beneath this hackneyed buzzword a quiet revolution is underway, one that is spawning a generation of promising new startups and transforming how nearly every business operates.
Data has always been important, but only with the recent advent of cloud computing have businesses begun to recognize the full scale of the opportunity in capturing and leveraging the vast troves of data created through their daily operations.
The reality is that most of what traditional professionals think of as their “work” is, at its essence, the exercise of gathering data, organizing it into a structured format, and then drawing conclusions, which are then distributed to various stakeholders both internally and externally. This effort describes the primary activity of functions such as accounting, finance, strategy, operations, analytics, and so on. Even people in many marketing, product, sales, and engineering roles spend a large percentage of their time accumulating and repackaging data.
To understand the role of data within organizations today, one can imagine a process similar to that of manufacturing.
Raw data is generated organically through a wide variety of activities, such as spending behaviors, flow of funds, usage of social media platforms, GPS tracking, smart home devices, and pretty much every application one has on their phone. Virtually all of our consumer and business activities are now captured digitally. This raw data, like raw materials, is extracted from various sources and fed into a central data storage facility such as a data warehouse.
Once in the warehouse, the data must go through a cleaning and sorting process to ensure it is suitable for production, after which it gets transformed from its raw state into more structured formats. In the world of data, this may constitute adding various layers of meaning to the previously raw and unstructured data sets, making them useful or understandable within a certain business context.
This cleaned and structured data can then be combined with other data through various analytics processes until it’s ready to be delivered to the end customer for consumption (e.g. business intelligence tools or other data-enabled applications).
Today, these activities have more technical descriptors such as data extraction, transformation, orchestration, observability, etc. However, in simple terms, just as with a manufacturing process, the basic idea is that as the data moves through the system it becomes increasingly refined and of ever greater use and value to the end customer.

Take, as one example, corporate expenditures. For any company, every expense begins as unstructured raw data. Each data point gets created by spending that occurs on corporate credit cards or through ACH transfers, checks, and wires. All these expenditures eventually make their way onto a single ledger, which gets sorted according to general accounting guidelines. This is the work of an accounting team that reports how much money was spent based on these broad guidelines.
But who spent it? Why did they spend it? What was the outcome or benefit? And, most importantly, was it a good idea? Answering these questions becomes the work of a different department (typically the finance team), which must overlay more information atop the expenditure data set in order to derive greater meaning. They in turn likely pass the outputs of their analysis on to sales, marketing, and product teams, who then overlay their own additional data sets to inform even deeper decisions, such as what new products to build, where and how to market them, and how much to invest in a sales force to distribute them.
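To make the expenditure example a little more concrete, here is a minimal Python sketch of the same flow: raw spending records are cleaned into a ledger, then overlaid with a second data set to answer "who spent it?" Every name and value here (`raw_expenses`, `ACCOUNT_BY_MEMO`, `EMPLOYEES`, the amounts) is a hypothetical stand-in for illustration, not a description of any real accounting system.

```python
from collections import defaultdict

# Hypothetical raw expense events, as they might arrive from cards, ACH, and wires.
raw_expenses = [
    {"id": 1, "method": "card", "amount": "120.50", "memo": "Online ads", "employee_id": "e1"},
    {"id": 2, "method": "ach", "amount": "4000", "memo": "Office rent", "employee_id": "e2"},
    {"id": 3, "method": "wire", "amount": "950.00", "memo": "online ads ", "employee_id": "e1"},
]

# Hypothetical categorization rules, standing in for the accounting guidelines.
ACCOUNT_BY_MEMO = {"online ads": "Marketing", "office rent": "Facilities"}

# Hypothetical overlay data a finance team might add: which team each employee is on.
EMPLOYEES = {"e1": {"team": "Growth"}, "e2": {"team": "Operations"}}


def to_ledger(raw):
    """Clean raw records (normalize memos, parse amounts) and assign an account."""
    ledger = []
    for row in raw:
        memo = row["memo"].strip().lower()
        ledger.append({
            "id": row["id"],
            "amount": float(row["amount"]),
            "account": ACCOUNT_BY_MEMO.get(memo, "Uncategorized"),
            "employee_id": row["employee_id"],
        })
    return ledger


def spend_by_team(ledger):
    """Overlay team information on the ledger and total spend per (team, account)."""
    totals = defaultdict(float)
    for entry in ledger:
        team = EMPLOYEES[entry["employee_id"]]["team"]
        totals[(team, entry["account"])] += entry["amount"]
    return dict(totals)


ledger = to_ledger(raw_expenses)
team_totals = spend_by_team(ledger)
print(team_totals)
```

Each step adds a layer of meaning to the one before it. The ledger answers "how much was spent, and on what," while the overlay begins to answer "by whom." A production system would likely use a decimal type for money rather than floats.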
Ultimately, a prime activity of any organization is simply gathering data, sorting it appropriately, and labeling it with descriptions, all with the higher aim of finding patterns that indicate meaning or signal opportunity.
The cloud & mobile transformed everything
Data always existed, even before the advent of computers. But the widespread adoption of software in the early 90s sparked a Cambrian explosion in data that changed not only the volume created but also the ease with which it could be captured, stored, and analyzed.
Over the last decade two parallel innovations have unlocked a second explosion in the data space: mobile devices and the cloud.
Today, thanks primarily to our phones, everyone everywhere is constantly creating more and more data around nearly everything that they do. In fact (as is often now quoted), 90% of all the data in the world was created in just the last two years. And we’re not slowing down. It is estimated that by 2025 the world will create roughly 465 exabytes of data every day. For context, every word ever spoken by every human in the entirety of our existence would add up to only 5 exabytes. In short, the amount of data that we will create over the next several decades is so vast it is almost impossible to comprehend.
But all this data being created wouldn’t be all that interesting if we did not have a reliable way to store it and use it. Enter the cloud.
Cloud computing and, more specifically, cloud storage had two huge impacts. First, by scaling compute and storage independently, cloud storage providers like AWS have dramatically reduced costs, allowing companies to pay only for what they use without being saddled with the expense of wasted capacity. The closest analogy is electricity: many companies used it even before the invention of the central power plant, but it was only once electricity became a shared utility that its full industrial potential was unleashed.
Now, not only can companies store and analyze exponentially more data, but smaller, earlier-stage companies can also operate data-intensive businesses without needing huge sums of money for infrastructure, which in turn leads to more innovation and more data-dependent products and services.
The second impact was the vast improvement to the speed and scale at which data analysis could take place. Though we humans possess the capacity for long-term memory, we struggle to quickly recall precise information without some assistance; stored data is similar, in that it is only as useful as the speed at which it can be queried. Improvements in massively parallel processing and computing speeds have reduced query times by orders of magnitude, allowing for an overhaul in the approach to analytics, which in turn has widened the aperture of the kind of analysis that can be done. A clear example has been the shift from an “Extract, transform, load” (ETL) process to an “Extract, load, transform” (ELT) process. Prior to this evolution, speed was often a gating item, and companies were forced to slim down the scope of their datasets and do computationally intensive transformations before loading data into their own warehouses. With modern cloud databases, companies can load massive amounts of raw data into a warehouse first and then run a nearly unlimited number of powerful transformations on it extremely quickly.
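For readers who prefer to see the ELT ordering spelled out, the following is a minimal sketch in Python. It uses the standard library's in-memory SQLite database purely as a stand-in for a cloud warehouse; the table names, columns, and rows are all hypothetical. The point is only the order of operations: the raw rows are loaded untouched, and the transformation runs afterward, inside the database.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for a cloud warehouse.
conn = sqlite3.connect(":memory:")

# Extract + Load: raw rows go in as-is (amounts still text, irrelevant events included).
conn.execute("CREATE TABLE raw_events (user_id TEXT, event TEXT, amount TEXT)")
raw_rows = [
    ("u1", "purchase", "20.00"),
    ("u1", "refund", "-5.00"),
    ("u2", "purchase", "42.00"),
    ("u2", "page_view", None),
]
conn.executemany("INSERT INTO raw_events VALUES (?, ?, ?)", raw_rows)

# Transform: done after loading, with SQL running inside the warehouse itself.
conn.execute(
    """
    CREATE TABLE revenue_by_user AS
    SELECT user_id, SUM(CAST(amount AS REAL)) AS net_revenue
    FROM raw_events
    WHERE amount IS NOT NULL
    GROUP BY user_id
    """
)

revenue = dict(
    conn.execute("SELECT user_id, net_revenue FROM revenue_by_user").fetchall()
)
print(revenue)
```

Because the raw table is preserved, a different question tomorrow (say, counting page views) only requires a new query, not a new ingestion pipeline. Under an ETL approach, the filtering and casting would have happened before loading, and the discarded rows would no longer be available.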
In short, the limiting factor on companies will increasingly become their own imagination when it comes to analyzing and utilizing their vast troves of data.
The opportunity
We believe there is an opportunity to bring a broad, macro-based thematic investing approach to the venture and growth equity space. In other words, rather than attempt to select the one company out of thousands that will be most successful, we aim first to select the industry or sector that has the clearest potential to harness macro growth trends and use them as tailwinds; map the dozen or two dozen companies that are clearly leaders in the space; and seek to invest in them as long-term partners and shareholders.
And today, we believe the data sector (a.k.a. data infrastructure or the modern data stack) is one of the biggest such opportunities, if not the biggest.
How big of an opportunity exactly? Well… how valuable do you believe meaning is?
Much like the ways the decade-long trends of mobile and cloud reshaped how nearly every company operates, we expect this next wave of data infrastructure adoption to have similarly widespread impacts (and, as a result, the companies that lead it to see similarly outsized growth in value).
As a relatively young technology company, we’ve experienced firsthand both the challenges and the benefits of building a business on top of a scalable layer of software systems, one that allows for a greater volume of data capture as well as greater speed of analysis and application.
But in the grand scheme of things, we’d expect that Fundrise is on the earliest end of the adoption curve for such technologies. Likely, somewhere between 95% and 99% of all businesses across the country have yet to integrate their operations in any meaningful way into the type of data supply chain we outlined above. Think of the local convenience store owner, independent online retailer, or small lawn maintenance company… or any one of the more than 32 million small businesses that exist across the US. Many of these companies have now moved to some form of online presence, adopted social media platforms, and switched to cloud-based sales and payments processing solutions such as Square and Stripe. The logical next step will be for them to begin to gather and utilize the wealth of data that exists to make their businesses more successful. We expect this trend to create a tidal wave of adoption for new data infrastructure startups.
Unique advantages
Importantly, the current up-and-coming generation of great data software companies is both benefiting from and driving several trends that should make this cresting wave that much larger.
First, thanks to the cloud (and as outlined above), these companies benefit from the relatively low startup and scaling costs associated with not having to maintain their own hardware. So while these companies share a market opportunity similar in size to that of the large cloud providers themselves, they do not bear the commensurate and often large CapEx costs necessary to build and run their own data centers. As a result, they should not only be able to maintain the higher margins of a software business but also be more nimble, acting quickly to iterate on and develop an ever-improving product experience.
Second, because of the lower startup costs, these companies are able to focus their energy on making their tools not only user friendly but, importantly, also able to be run and operated with little to no technical or engineering talent. It’s worth noting that much of what this next generation of companies offers has already been available to those large organizations with both the engineering resources and deep analytical skill sets necessary to build such capabilities in-house. What these new tools are doing is expanding the benefits of better, more accessible, more available data to those small businesses that do not have such in-house resources (and likely will never employ the engineers and data scientists needed to develop them).
Third — and, again, similar to the description above — many of the future customers of these companies have already adopted other software tools or digital-based products for their payments, accounting, marketing, sales, etc. Each of these previously adopted tools can now be connected to and integrated with new data infrastructure systems. In short, all of these tools are already generating data today — the companies that use them just need to connect their systems into a modern data supply stack that can start to provide these businesses with even the simplest forms of analysis to start to incorporate into their operations.
It’s because of the convergence of these macro and micro trends that we believe the opportunity within the sector is so unique.
Challenges and risks
To be clear, the strategy of focusing on a specific sector and building broad-based exposure to it does not mean that every company within that space will be successful, or that none will fail outright. In fact, it is likely that some will fall short of expectations and that some will fail entirely.
It also (as has been shown recently) does not mean that one can simply ignore the fundamentals of good investment practice, such as doing basic diligence or understanding relative price / value. We do, however, believe that over a longer investment period, investing broadly across such a sector has a higher probability of producing positive results than attempting to pick a needle out of a haystack.
The data sector and the new generation of data startups within it are also not immune from competition or challenges.
For one, as has occurred in other industries before, the benefit of having lower startup and operating costs also makes the space prone to competition from the large incumbent data storage providers such as AWS, Microsoft, Snowflake, and Databricks, who over time may decide to vertically integrate some of the independent tools created by these new companies as features within their overall suite of offerings. (However, there is also a strong potential that in such an instance many of these same companies would be as likely to be acquired / integrated as to be competed out of business.)
And lastly, more data in and of itself is not necessarily a universally positive thing.
Data without meaning is noise. In many ways, too much data can be worse than the absence of it because of the former’s tendency to become paralyzing. All of us have experienced the consternation of choosing among thousands of TV shows to watch or hundreds of thousands of articles to read. If we must select one out of millions, we feel almost certain that we will make the wrong choice. It becomes information overload, and we end up spending all our time trying to make a decision rather than experiencing the benefit of it.
Worse still, a proliferation of data can make it impossible to reach agreement on true causes and effects, for with enough data, anyone can find the patterns necessary to either support or attack most every argument.
This phenomenon, closely related to what statisticians call “p-hacking” (or data dredging), can be a dangerous side effect of the seemingly innocuous decision to be “data driven.” We suspect it is one of the primary reasons many large groups and organizations struggle to act: they cannot reach broad consensus on any shared understanding.
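The risk described above can be made concrete with a small simulation, sketched here in Python under stated assumptions. We generate one random "outcome" and 200 random "metrics" that are, by construction, completely unrelated to it, then count how many metrics nonetheless correlate with the outcome strongly enough to look meaningful. The function names, the sample sizes, and the seed are hypothetical choices for illustration; the 0.361 cutoff is the approximate correlation needed for conventional significance (p < 0.05, two-tailed) with 30 observations.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_spurious_hits(num_metrics=200, num_samples=30, threshold=0.361, seed=0):
    """Count pure-noise metrics whose correlation with a noise outcome exceeds the threshold."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(num_samples)]
    hits = 0
    for _ in range(num_metrics):
        # Each metric is independent noise, so any strong correlation is a coincidence.
        metric = [rng.gauss(0, 1) for _ in range(num_samples)]
        if abs(pearson(metric, outcome)) > threshold:
            hits += 1
    return hits


hits = count_spurious_hits()
print(f"{hits} of 200 unrelated metrics look 'significant'")
```

With a 5% false-positive rate per test, roughly 10 of the 200 metrics would be expected to clear the bar by chance alone. An analyst who reports only those metrics has found a "pattern" where none exists, which is why more data, searched without discipline, can make agreement harder rather than easier.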
Of course, a silver lining in such a challenge is the opportunity to automate the data analysis and decision-making process, which is itself the foundation of the opportunity in an arguably even more cutting-edge sector: artificial intelligence and machine learning.
--
Our goal here is to share with our investors our thinking around the data infrastructure sector as an area of focus within our broader investment strategy for the Fundrise Innovation Fund, as well as to give context for specific investments that we may make going forward. We expect that over the next several months we’ll provide further detail and insight into both this sector and the specific companies within it that we find particularly interesting.
Additionally, we will share with our investors similar thoughts and analysis around other sectors that we intend for the Fund to focus on in the near term.
— The Fundrise Innovation Fund Team