The text below is a transcript of the audio from Episode 48 of Onward, "Will AI Surpass Us? with Robert Nishihara, co-founder of AnyScale".
---
Ben: Hello and welcome to Onward. My guest today is Robert Nishihara. Robert is the co-founder of AnyScale. He and his team created Ray, software that is now essential to building and scaling AI. It is used at virtually every company in the world that deploys AI, including Apple, OpenAI, Uber, Spotify, Amazon, and Nvidia.
Robert is truly knowledgeable about what it takes to build machine learning and artificial intelligence systems, and so I was excited to ask him the many questions I have about AI.
For full disclosure, AnyScale is a Fundrise portfolio company. Before we get started, I want to remind you that this podcast is not investment advice, it is intended for informational and entertainment purposes only.
Ben: Robert, welcome to Onward.
Rob: Thank you so much.
Ben: So I'm excited to talk to you about AI 'cause you know more about AI than I do, that's for sure. But before we do, I wanna establish your background so people know where you're coming from and why I'm excited to talk to you and ask you questions about AI. So can you give a thumbnail sketch of how you got here?
Rob: Yeah, absolutely. So going back to the start of my career, I was a PhD student at UC Berkeley, working on machine learning research, and this was five or six years ago. A lot of the work we were trying to do at the time was designing algorithms for reinforcement learning, deep learning training, learning from data, optimization algorithms, these kinds of problems. And I got interested in machine learning research because machine learning seemed like such a potentially impactful field. If you can make breakthroughs or advances in our ability to learn from data generally, then that could lead to breakthroughs in a lot of other areas. So that's what I got started on in around 2013. Partway through the PhD program, I began working on systems and tools for AI instead of AI research directly. And the reason for this was that AI is very computationally intensive. If you are building models, if you're deploying models, you're often running these things on GPUs and other hardware accelerators, and often not just one, but a number of these GPUs and servers in the cloud. And so even though my coworkers and I were trying to spend our time designing algorithms, what we actually found ourselves spending much of our time doing was the software engineering to scale these algorithms: managing clusters of servers, getting these algorithms to run on cheaper, unreliable hardware, moving data around quickly and shuffling the data, these kinds of software engineering challenges. That was a huge bottleneck for our research, and it was a bottleneck for many other researchers working in AI at the time, where if you wanna do research in AI, you often spend a lot of time doing software engineering, building the tools to actually run and scale these algorithms. And so we thought it would be important and valuable to build better tools for scaling these types of workloads across a bunch of hardware, a bunch of computational resources.
And so that led us to start an open source project called Ray. So Python is the language that everyone uses for AI, and Ray you can think of as scalable Python. It's a way to take Python applications and programs and modify them to run across a bunch of GPUs or other compute clusters. And it's used for things like training models, deploying those models, doing model serving, processing a bunch of multimodal data, different types of data. And so we were working on Ray for a while in grad school. Of course, in grad school, the main output of your work is publications; you write papers. And here we were doing something a little different. We were focused on building open source software, but we were just excited to build the open source software, get people to use it, recruit other grad students to work with us and help develop the software. And so around the time we were graduating from Berkeley and thinking about what we wanted to do next, my co-founders Philipp and Ion and I wanted to keep working on Ray. We thought this is an important problem. AI seems to be taking off, and if AI is gonna take off, the need for scale is only going to grow. It's still a hard problem. We'd been working on Ray and trying to make it easier to scale these AI algorithms, but we hadn't solved the problem. It was still hard to do, and so we wanted to keep working on it, and we started AnyScale to commercialize Ray. That was in 2019. But looking back, when we started AnyScale, the whole point of Ray was scale, distributed computing for AI, and back in 2019, the market for distributed computing for AI didn't really exist. It was early adopters, really the earliest adopters. I think we had only one or two serious production Ray users at the time that we started the company. One of them was a Chinese company called Ant Group, which makes Alipay, and they continue to be a power user of Ray today actually.
They run about a third of their compute on Ray, and this is for, you know, large GPU clusters for model serving, online learning, batch inference, many different types of AI use cases. So they were the first serious production Ray user. Then in the next few years, Uber started using Ray. Pinterest started using Ray to build all of their models. And the real inflection point for Ray was ChatGPT and generative AI. When ChatGPT happened, overnight every business started caring about AI, and the need for scale just became far more obvious. You have larger models, you have more data, people are talking about scaling laws. So what we were pitching with Ray, with distributed computing for AI, started to land, and that was a real inflection point for us.
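To make the "scalable Python" idea from a moment ago concrete, here is a minimal sketch of Ray's task API; the function and numbers are illustrative, not something from the episode:

```python
import ray

ray.init()  # start (or connect to) a Ray cluster

# One decorator turns an ordinary Python function into a remote task.
@ray.remote
def square(x):
    return x * x

# Each call is scheduled onto whatever machine in the cluster has capacity.
futures = [square.remote(i) for i in range(100)]

# ray.get blocks until the distributed work finishes and collects the results.
print(sum(ray.get(futures)))
```

The same code runs unchanged on a laptop or a multi-node cluster, which is the point Rob is making about taking Python programs and scaling them out.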
Ben: Okay. This is gonna be so fun, 'cause I didn't even think about it until you said it, that we can talk about China too, which is another one of my favorite topics. So, you're so modest, but now you have companies using it. Every major AI company uses Ray: OpenAI, Anthropic, and then Uber, Canva, Spotify, Netflix, DoorDash, Instacart, Amazon. Isn't it widespread among every major company?
Rob: Yeah, it's very widespread. To give you a few examples, you mentioned Amazon. Amazon actually wrote a blog post recently about migrating an exabyte-scale data processing workload to Ray and saving over a hundred million dollars a year by making that switch. And that is pretty stunning from my perspective.
DeepSeek open sourced a bunch of their internal software, and they're using Ray for some of their data pipelines. We work with foundation model companies, tech companies, AI startups, biotech, finance. It's a variety of impressive use cases that are really stunning to see, and it's very different from when we started the company.
Ben: I wanna come back to AnyScale, but I want to go to AI for a bit, because I feel like I'm like everybody: there's this revolution happening, and it's both exciting and a little scary. And here you were, an AI researcher. I don't know if you'd still count yourself as an AI researcher.
Rob: Well, I like the idea of it.
Ben: You're an AI researcher to me, and you've developed algorithms, and so what is the day-to-day work of developing an AI algorithm? Like what does that actually mean?
Rob: That's a great question, and this has changed quite a bit over time. So when I started the PhD program, a lot of the progress in AI was driven by computer vision and by the ImageNet benchmark. ImageNet is an object recognition benchmark. You have a bunch of images and then labels, like this is a car, this is a dog, et cetera, and you try to build models that learn from the data, and then you run them on your test set and you see how accurate you are with labeling the objects. Now, a lot of the innovation at the time was going into modeling: how do we design a better neural network architecture, or perhaps a different optimization algorithm to more efficiently learn from the data? This is an interesting blend of thinking about the right way to design a neural network that, you know, has the capacity to learn the relevant concepts, and also thinking about what is computationally efficient to run. The model has to be powerful enough to represent what you're trying to learn, and it has to be something that you can actually run efficiently to take advantage of hardware and parallelization and things like that. And so every year somebody would come out with a new neural network architecture that did better than the previous one. Now the nature of the research has changed. Of course, you still have advances in modeling and ways to optimize these neural networks, but a lot of that research has shifted to being about the data. With ImageNet, the dataset was static. You have your ImageNet dataset and it's split into your training dataset and your test dataset, and it's just a given, and all the innovation goes into other places.
Now, the paradigm is totally flipped on its head, where the data is really the variable that you optimize over. Now, benchmarks typically don't have training sets and test sets. They just have the test set, and you can train on whatever you want. And the choice of what data to train on is one of the main things you are optimizing over and trying to solve for. People put a lot of energy into and spend a lot of money on buying data, selecting the right data, generating synthetic data. There's a huge amount of effort that goes into that.
Ben: Let me try to make sure that's clear, 'cause it's so interesting. I don't know, how many images was AlexNet trained on originally?
Rob: Oh man, it was millions. I don't remember the exact number.
Ben: Okay, well, let's just say 3 million. The point is that there were 3 million images, and then the whole effort was to design an algorithm or algorithms that would be able to recognize images in that dataset, like cats or dogs. And the more accurate it was, the better the algorithm, and people would develop new algorithms.
You're saying neural networks. I feel like most people don't know what that is, including me, so we may have to have you explain. But before we do that, what happened was people started saying, okay, actually the transformer, I don't know how much the transformer has changed, the GPT, actually is not the thing that's changing as much as the data is changing.
So it kind of flipped on its head the thing you were changing to get better outcomes.
Rob: Yeah, it's an important conceptual shift. It sounds obvious when you say that data is central to AI, but there's a lot of additional maturity that we have around our understanding of that. Now, the quality of the data matters a ton.
Ben: And the scale of the data.
Rob: On the scale of the data, I think there are some open questions here about quality versus scale.
I mean, certainly we know that scale matters. A lot of the results we see with pre-training language models are direct consequences of scale. But for post-training and things like learning a new skill, learning to be a good software engineer, learning to perform some agentic task, do you need millions of people generating tons and tons of data to learn that task? Or do you want the 10 best people in the world at doing it to generate the highest quality data and learn from that instead? I think there are some interesting open questions around that.
Ben: Now there are so many things I wanna follow up on, but let me just go back. Why don't you explain to me what a neural network is, because I see the image of the layers and the parameters, so I'd be interested to hear your description of it.
Rob: There are different ways to explain this, and I think you can just substitute the word algorithm or function in your head. A lot of people are maybe familiar with the idea of linear regression: if you have a plot of data points, like X, Y pairs, you're trying to determine how a different value of X affects the value of Y. Maybe X is how many cigarettes a day you smoke, and Y is how long you live, or something like that. And you plot this on a piece of paper and you fit a line to it, and that gives you a way to take X and predict the output Y.
Ben: Yeah, the line of best fit.
Rob: Yeah, and that's of course a very simple setup, but you can imagine, okay, what if X is not just a single number? Maybe it's two numbers or a hundred numbers or a million numbers. Y is also not just one number I'm predicting; I'm predicting a bunch of numbers. And you can imagine X is an image or a sentence. You can turn a sentence into a bunch of numbers. You can turn a video into a bunch of numbers. And Y is some property of that video or that sentence that you're trying to predict. So instead of going from X to Y, where it's just one dimension to one dimension,
it's many dimensions to many dimensions. And instead of just fitting a line, maybe you're fitting some more complicated curve, like a polynomial or something even more complicated. That neural network, or deep learning model, or function, is just some more complicated curve that you are fitting to your data. An interesting thing about these neural networks is that the way you construct a more complex curve or function is by composing a bunch of simple functions together, like linear algebra operations. And you start by just initializing it randomly. You generate a random line or a random curve, and you iteratively look at some of the data and tweak the curve to better fit those data points, and then sample a few other data points and tweak the curve to fit those better.
And you just keep doing this until eventually it's pretty good. I don't know if that explanation helps things or confuses things more.
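To illustrate that "initialize randomly, then iteratively tweak" loop, here is a toy sketch in Python; it fits a line rather than a deep network, and all the numbers are made up:

```python
import numpy as np

# Toy data: y is roughly 3x + 1 plus some noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
y = 3 * x + 1 + rng.normal(0, 0.1, 200)

# Start with a random "curve" (here just a line: slope w, intercept b).
w, b = rng.normal(), rng.normal()
lr = 0.1  # how far to tweak the curve on each step

for step in range(1000):
    i = rng.integers(0, len(x), 16)   # look at a few data points
    pred = w * x[i] + b
    err = pred - y[i]
    # Nudge the parameters in the direction that reduces the squared error.
    w -= lr * np.mean(err * x[i])
    b -= lr * np.mean(err)

print(w, b)  # ends up close to 3 and 1
```

A neural network does the same thing, except the "curve" is a composition of many simple functions with millions or billions of parameters instead of two.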
Ben: Definitely both, but let me go back to the interesting point you made about quality versus quantity. So, okay, let's talk about quality for a minute. If you're trying to develop, say, art, for example, and you want it to be good, you don't wanna train on bad artists, you wanna train on good artists. But the less data you have, don't you need better and better algorithms, more efficient algorithms, because you don't have as much data?
So is that a dimension that's still seeing a lot of progress, or is that actually not the problem?
Rob: So you're just saying the quantity of data. I don't want to diminish the importance of scale, of having lots of data, but I think it'll be different for different stages of training. The pre-training stage especially, where you are learning to model the world, what word comes next or what frame in the video comes next,
just predicting the next thing, I think that especially benefits from tons and tons of data. I think some of the questions are around what comes after that, for the next stage of training, where you are learning some perhaps more specialized skill. What type of data do you want? And there, I think, with humans as an example, people can learn skills with one or two examples, often not even with an example. Take case studies in business school. Not that I've been to business school, but in business school you do a lot of case studies.
You learn from cases. There aren't that many case studies. You do a handful of these things and you talk about them in depth and you learn a lot from each one. And that gives you an idea of how to generalize to future ones. I actually think that there are intermediate points; you can sort of interpolate between learning from tons and tons of data and learning from small numbers of high quality examples.
For example, with synthetic data. So I may be able to see a small number of data points, or see a description of a procedure for solving a problem, and then generate a bunch of additional synthetic examples that I can then learn from. If you show me how to do a math problem, I might be able to make up a bunch of new math problems of the same variety and learn from and solve those in my head.
Ben: Great, 'cause I've been wondering about synthetic data, 'cause I also read that it's lower quality data. So if you overtrain on it, you'd end up with a circular reference. You overfit to a synthetic outcome.
Rob: Maybe if you do it poorly. I don't think it has to be that way. Certainly there are ways things can go wrong, ways it can not work, but I think the question is more like, are there ways that it can be effective, can work very well? And I actually think that synthetic data will be used, and AI broadly will be used, to improve data quality. So if you think about, maybe to really oversimplify things, the way that we've made previous generations of models smarter is by gathering more data from the internet and training on basically more data from the internet. Now that has started to reach its limits. Maybe you can still push that further, but I think going forward the path to building smarter models is not to get twice as much data from the internet. Rather, it's going to be to turn the data into higher quality data. And the way to get higher quality data is going to be to use AI to generate the data and curate the data. How do you get higher quality data? Well, one way is to filter out low quality data, or correct mistakes in the data, or generate additional examples based on your existing examples.
Those are all things that you can do with AI.
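As a hedged sketch of the curation idea, one could use an LLM itself as a quality filter over candidate training examples. The model name, prompt, and threshold below are placeholder assumptions, not anything Rob describes:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def quality_score(example: str) -> int:
    """Ask a model to rate a candidate training example from 1 (junk) to 5."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any capable model would do
        messages=[
            {"role": "system",
             "content": "Rate the quality of this training example from 1 to 5. "
                        "Reply with a single digit."},
            {"role": "user", "content": example},
        ],
    )
    # A sketch: assumes the model actually replies with a leading digit.
    return int(resp.choices[0].message.content.strip()[0])

examples = ["The capital of France is Paris.", "asdf qwerty zzz"]
# Keep only the examples the model judges to be high quality.
curated = [e for e in examples if quality_score(e) >= 4]
```

The same pattern extends to correcting mistakes or generating additional examples: the model is applied to the data rather than trained on it directly.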
Ben: Filtering makes sense to me, and I can understand synthetic data when you have an objective answer, like math or software. Even software is not objective, but at least it can compile. But for things that are less objective, synthetic data sometimes feels to me like trying to pull myself up by my bootstraps.
I don't really understand how AI can generate data that makes the AI better without rapidly diminishing returns.
Rob: It's a great question, and I agree, it does sound circular and a little bit like it shouldn't work. When I first heard of this concept, it's like, wait, you're training on data generated from the model, but if the model generated the data, doesn't it already know that thing? But just because the model knows something, it doesn't mean that it has necessarily figured out all of the implications of the facts that it knows. You can imagine getting smarter by putting in more data, or you can imagine getting smarter by putting in more compute. And putting in more compute, imagine, is like thinking harder about something. The model has learned from stuff on the internet, but there's a lot of contradictory stuff on the internet.
There's a lot of stuff that's wrong on the internet. And so you can imagine that by having the model think harder about things and put in more computational effort, it could become more self-consistent. It could identify what things are incorrect or resolve contradictions, or it could derive consequences of facts that it knows: it read fact A, and it learned fact B and fact C, but it hasn't put them together to figure out some consequence of that.
So I don't know if that provides some intuition about why it should be possible to get smarter by thinking harder about something. I mean, also with people, some stuff you learn by getting data, by reading and so forth, but other ways you get smarter by just thinking harder about something.
Ben: The parallel with people is really apt, and it has been a great way for me to build understanding. I wanna go back to when you were talking about how you need to learn a model of the whole world versus when you need to learn something specific. If I'm abroad and I see something and it stings me once, I'm probably gonna avoid that thing.
How many times do I have to get stung before I say, avoid that thing with the yellow stripes and black lines? You can have sparse amounts of data and still learn the thing, and that's how people are. And so it makes sense that that would extrapolate out. But there are parts of the predictions about the future, 'cause there are a lot of people predicting the future, where I feel like the extrapolations start to break down for me. And so one of those is the idea of AI liftoff, where AI is developing its own algorithms and getting recursive and doing it really fast. And that's why I'm interested in what it is and what it actually takes to develop algorithms, because you've done it.
And I'm trying to imagine how I can see AI actually helping people. I can see AI developing better algorithms. I'm not debating that. I'm just having a hard time seeing it having this recursive, self-enclosed space where it just has this explosion that goes a hundred times faster than people can think, and then lifting off.
Rob: Things will definitely get faster, and AI is already useful for software development, making developers more productive and helping with running research experiments, becoming more and more capable over time, and getting to a point where AI can independently think of new experiments to run, run those experiments, evaluate the results, and then come up with the next experiments. I think one area where you have to be careful: it's natural to say, okay, AI can recursively improve itself, and it's easy to go from that observation to thinking that progress will become exponential, that there'll be an exponential trajectory in how smart these models get. And I think there are some assumptions built into that, around how difficult it is to get to the next unit of intelligence. If each unit of intelligence, and I'm speaking in very vague terms, is equally difficult, then yes, maybe this recursive improvement will lead to some exponential increase. But it's also possible that the next unit of intelligence is harder to achieve than the previous unit of intelligence. And so it's not obvious to me what the perceived rate of intelligence increase will be, what that curve will look like. I think it could play out in a number of ways.
Ben: Yeah, that's a good parallel, 'cause if you think about the amount of scientific progress a hundred years ago compared to now, there are orders of magnitude more scientists, more economic input going into it. But the amount of new ideas and breakthroughs that have come out of it has declined as a percentage of the amount of input it takes.
I've seen this before, maybe you've seen it: maybe it takes 20 scientists a year to produce something where 50 years ago it would take three scientists a year. So it requires more input to get equal output of progress.
Rob: It seems likely that in any given field there's low-hanging fruit, and then progressive discoveries become more difficult. I think theoretical physics may be a good example, where you build these particle accelerators, and to get to the next discovery, maybe you need a particle accelerator that is orders of magnitude bigger or, you know, uses more energy.
And so even if in some ways our abilities are growing at a very fast pace, it can still take a long time to get to the next breakthrough.
Ben: Or like in the field of physics, the standard model was developed and there really hasn't been much since. All the low-hanging fruit, all the Nobel Prizes, sort of came from an era, and they haven't made much progress in 50 years. They made a little bit of progress, but it's like the golden era for that industry or that profession was in the past.
Rob: On the flip side, I do think there's still a ton of low-hanging fruit in a lot of...
Ben: in AI
Rob: Well, I mean broadly across the world, the types of products people could build, the types of problems that could be solved. I think there are generally more problems that are worth tackling than there are people really trying to solve them.
Ben: Yeah, but the point I'm trying to come back to, and I'll come back to it over and over again, is that I have a number of friends who are anxious about AI. I have a close friend who said he's anxious about what it's gonna do to his career and what it's gonna do to his investment portfolio. And I think the closer you get to the work, the less anxious you are about it.
That's my opinion, because you just know about all the practical aspects of what it actually takes to make it happen.
Rob: Yeah, I have no idea what it's gonna do to his investment portfolio, but there certainly is a case to be made that things can take a lot longer than you would like. Robotics is a notorious example of things taking longer than you would like. I don't know if you've ridden in a Waymo, but they're fantastic. Self-driving cars are really starting to work well now. It's a great product, a fantastic experience, but the first self-driving car demos were, I think, over 30 years ago. And so it's not crazy for something in robotics to lag from demo to production by several decades. Hopefully that'll be faster with humanoids, and the progress builds on itself.
And so I think a lot of these problems will get solved faster. But things historically have taken longer than you would've liked.
Ben: And there are two leaps here, though, I feel like, that are merged in most people's heads. One leap is that AI becomes hugely useful, and it's a coworker with people and it's complementary. Sometimes it's better than humans, sometimes it's worse, and the whole is greater than the sum of the parts. And then there's this other leap, which is that it gets beyond humans and doesn't need humans and puts you outta work.
And everybody sits on welfare because AI does everything better than people. And I think that second leap is a giant leap, much bigger than the first. So it's the second one that I think most people are upset about or worried about.
Rob: The first one is a little easier to visualize, a little easier to see the path to get there. You can certainly imagine virtual coworkers as a fairly natural extension of how we use AI tools today. I think a lot of us, myself included, are talking to ChatGPT continuously throughout the day, whether it's for software engineering related tasks or for writing related tasks.
I actually was recently translating a blog post that Tencent wrote about how they use Ray for WeChat, for serving all their models for WeChat. Fantastic blog post, but it's in Mandarin and I don't speak Mandarin, so I've been using AI tools to essentially translate it. And the quality of the tools is not good enough where you can just plug it into Google Translate or plug it into ChatGPT and then ship the result.
The role that I've found is that of course AI tools are great now for generating text, generating content, but the bottleneck really becomes the review of that content. Where you need a person today is to manage these tools and to hold the quality bar, to say, okay, this is good, or that's bad.
You need to redo that part, change this. When one bottleneck gets eliminated, you may have another bottleneck that appears. And of course, as the tools become more capable, what you can accomplish with a single person becomes far greater. So I think the virtual coworker is a very natural extension of what we have today, but taking it a lot farther.
Ben: If we just move to the present, and I'm gonna get back to AGI in a minute, but if you move to the present, GPT-5 and other similar models probably come out in the next six to 12 months. And then GPT-6, I don't know if that's like two to three years from now, but there will be a five, there will be a six.
Feels like a really good bet. And so the gap you're describing right now, between having to review the translation, I can see that closing really soon. But the idea of not needing people, or even beyond that, which is what people sometimes call superintelligence, it just seems like people who are out there worried about that, or even preaching it, are underestimating the amount of bottlenecks in the world and the practical aspects.
As someone who's doing it and has done it, I'd love to hear your thoughts about the kind of bottlenecks that are out there.
Rob: I think a lot of these people who are coming up with different timelines do think very deeply about it and present a range of possible outcomes. I think they recognize a level of uncertainty about how the future will unfold. One big bottleneck is reliability, and I'll go back to robotics for a second.
If you think about what is the gap, why does it take so long to go from a demo to production? We have very impressive robotics demos today, and I think over the coming year or two you'll continue to see really mind-blowing demos of robots doing all sorts of different household tasks or complex tasks. So what's stopping me from buying a humanoid for my home to cook dinner and all these things? Think about the environment you're operating in. I don't have kids, but I might have a little baby that's crawling around, and this robot is holding a boiling pot of soup and moving it around, and it's heavy, and gears degrade and hardware degrades over time, and wifi connectivity is unreliable, and a lot of the hardware and sensors are not nearly as good as what humans have.
So the level of reliability that you need to achieve, not just for the span of performing one task, but really over the lifetime of the robot, is an incredibly high bar. And collecting the data is very hard, right? To collect this type of data is very labor intensive. We haven't yet been able to get the reliability for these kinds of complex tasks to where we want, and so we are starting by automating safer situations or lower-stakes situations.
Ben: Right, industrial robots and stuff. So are robotics companies also using Ray as part of their infrastructure?
Rob: Yeah, yeah. We work with a number of autonomous vehicle companies. For example, companies like, you know, Nuro are using Ray, and Zoox, Ford, BMW.
Ben: But like Figure, or those humanoid robots as well?
Rob: Some of the foundation model companies, Physical Intelligence, some of these types of companies as well.
Ben: I think one of the reasons why people find AI so nerve-wracking is that there are lots of animals that have the physical intelligence you're describing, but humans have this type of intelligence that we thought was special, that made us unique and extraordinary. And so it strikes, I think, more at our ego.
And our identity, more than any other technology has. Probably a few hundred years ago that wouldn't have been so central to our idea of what's valuable, when you had to defend yourself from Viking attacks.
Rob: What these models have been able to accomplish is incredible. It's very striking. A few weekends ago, I was solving problems with a friend, and we were friends from math camp, like back in high school, telling each other different brain teasers or math problems. And one particular math problem he gave me, I was working on it a bunch, and then I came up with some solution, and he asked, can you come up with a better solution?
And then he said, oh, o3 can get the solution that you just came up with, but it can't get the better solution. And then I was like, okay, now I have to figure this out just to prove that I can do math better than the model, at least for now.
Ben: Well, I definitely can't. Let me go back to bottlenecks for a minute, 'cause I think about AnyScale and I think of you as the guy behind the guy. Because for any of these companies to do AI, they have to have infrastructure, and there's a heck of a lot of infrastructure. Your distributed compute is only a small part of it, I mean, it's an important part, but there's so much that has to be working behind the scenes.
And before we got on the call, we were talking about different people who use Ray. And all of that stuff has to be essentially autonomous for there to be AI liftoff, for AI to get to a place where it's recursive, explosively recursive, with exponential growth in intelligence. It has to be able to do so many things that currently humans are running around fixing: bottlenecks, faults happening.
There are all these problems every time there's a training run.
Rob: I don't know if it has to be fully automated, but I do think things are heading more in that direction. There's a lot of manual babysitting of these training runs, as you're talking about, and the trend there is toward increasing automation around failure handling. Today, the way this works is you have people who are on-call engineers who wake up and look at all the log messages, see what went wrong, run some diagnostics, try to figure it out, and then restart things. And not just for machine learning, but for running software services in general, AI tools are starting to be really useful for, not necessarily automating this, but assisting: helping diagnose what went wrong, helping suggest ways of mitigating the issue. The trend is for an increasing fraction of scenarios to be handled automatically.
And of course, usually it doesn't go down to zero, but it gets smaller.
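For a flavor of what that automation looks like at the Ray layer, here is a minimal sketch using Ray's built-in task retries; the preprocessing function is hypothetical:

```python
import ray

ray.init()

# Ray retries failed tasks automatically, which absorbs a fraction of the
# failures that an on-call engineer would otherwise handle by hand.
@ray.remote(max_retries=3, retry_exceptions=True)
def flaky_preprocessing_step(shard_id: int) -> int:
    # Hypothetical work: load and transform one shard of data.
    # A transient node or network failure here triggers an automatic retry.
    return shard_id

results = ray.get([flaky_preprocessing_step.remote(i) for i in range(8)])
print(results)
```

Retries handle the routine failures; the harder cases, the ones that still page a person, are where Rob says AI assistance is starting to help with diagnosis.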
Ben: Yeah, I mean, there's no question to me that it's making progress. But for the exponential forecast to come true, it has to be able to go at superhuman speeds, has to be recursive at this very, very high speed. If it has any human bottleneck anywhere in the system, it can only go so fast.
Rob: Yeah, I don't fully buy the exponential claim. A lot of times people use the word exponential just to mean something is large, versus actually exponential. You know, something could be quadratic and people will refer to it as exponential. So what is actually exponential here? GDP is exponential. What here is actually exponential?
Ben: I feel like one of the best descriptions was on the Dwarkesh Podcast with Scott Alexander from Slate Star Codex and a guy who used to be at OpenAI, I'm forgetting his name. And their argument goes that you hit AGI, and then what's really different for computers is the speed at which they think. And so 200 years in human time is, whatever, six months or one month in computer time, in real time.
And so they can just make a lot more progress a lot faster. It seems like it just happened overnight. But I look at that and I think about all the bottlenecks, just go down the list of bottlenecks, and, you know, you've lived it. For it to have, for example, some sort of explosive growth, it probably has to redesign the hardware. It has to have different chips.
Rob: I think AI will. Will lead to better chips. There are many factors at play here. Individual chips will continue to get faster and so the, the serial thinking speed of these models can get faster and you get some speed up there. Now, of course, like with Moore's law, there will. Likely be diminishing returns there at some point. Then you have the ability to parallelize these things, right? To run many copies of a model and for them to share information, then go off and do their own thinking, come back, share additional information that can give you some speed up. Compared to the what you could do serially. And of course there is this, like you said, we're gonna design better hardware, we'll come up with more efficient algorithms, so there'll be progress in a number of dimensions.
And I do think there'll be vast speedups compared to what we can do today. Products like deep research are massive speedups compared to what performing that kind of task with a human looked like before they were released. So I definitely buy the argument that many things will become way faster. If certain types of thinking and reasoning are the bottleneck, then we will be able to explore many different hypotheses, do it much faster, search lots of data much faster, draw conclusions much faster. And then, yes, when you're interacting with the real world, other bottlenecks will appear. Maybe that is running some biology experiment and then just waiting for the cells to grow and then getting the results. So other bottlenecks will appear. But I completely buy the idea that many situations that are bottlenecked by just reasoning and thinking time will go away.
Or that bottleneck will go away. Maybe I'm just being nitpicky, but the word exponential doesn't feel very meaningful to me to describe this. I would just say it's a big speedup.
Ben: So AnyScale's business, and the whole purpose of Ray, is to manage one of these layers of complexity?
Rob: Yeah, look, with any new paradigm shift or new technology, like the emergence of the internet or the cloud or mobile, whatever it may be, you need a big infrastructure build-out to really enable that new technological wave. And when I think about the infrastructure build-out, there is a hardware component and a software component. With AI, the hardware component is these hardware accelerators, the NVIDIA GPUs and TPUs and other similar things. But there's a huge software infrastructure build-out that needs to happen to really connect the AI applications to that hardware, in the same way that you need an operating system on your laptop, which runs the applications and maps them onto the hardware, your CPU, and different devices and so forth. That problem is much harder for AI. The hardware is much more complex, because you have all these different accelerators, the hardware can fail, it's at a massive scale, and the applications are also far more complex because of the scale as well. So the need for this software infrastructure layer in between, that connects the applications to the hardware, is just far greater. And that is a big part of what we're trying to build. There's an open source tech stack emerging, and Ray is an important part of that. We mentioned Uber and Pinterest and Spotify and all these companies; Ray is central to how they run machine learning, not to mention all these Chinese companies that you've talked about.

There are many different pieces to this software stack for running computationally intensive AI workloads. You're probably familiar with PyTorch. PyTorch is a deep learning framework. It is responsible for running models efficiently on the hardware accelerators, down at the level of the low-level GPU code. And around PyTorch there is an ecosystem of tools, things like vLLM, which is an inference engine for transformers, for running these language models in a really performant way, or tools like DeepSpeed for, you know, sharding these models across a bunch of different GPUs. So anyway, there's this PyTorch layer for running models efficiently on GPUs. Then there's the Ray layer, what we're building, which is about solving the distributed systems challenges, the scaling challenges. How do I divide up the work between different machines? How do I recover from machine failures? How do I move data efficiently? How do I manage the different processes on different machines that are actually running the work? And then of course there's the Kubernetes layer, which is about spinning up containers and managing containers and running multiple workloads or users within the same set of compute resources. So there are a few of these tools, Kubernetes, Ray, PyTorch, vLLM, which really are starting to come together to provide this open source software stack for powering AI. Of course, it's early days, and so this will evolve.
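To ground one layer of that stack, here is a minimal sketch of what using the vLLM inference layer looks like; the model name is a placeholder, and a machine with a suitable GPU is assumed:

```python
from vllm import LLM, SamplingParams

# The vLLM layer: load an open-weights model and batch-generate efficiently
# on GPUs, the "running models in a really performant way" piece of the stack.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # placeholder model

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(
    ["What does a distributed scheduler do?",
     "Explain model sharding in one sentence."],
    params,
)
for out in outputs:
    print(out.outputs[0].text)
```

In the stack Rob describes, a layer like Ray would then scale this kind of inference across many machines, and Kubernetes would manage the containers underneath.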
Ben: Right. I mean, we're so early in AI. We're talking about AI getting to geometric versus exponential growth, so that's still easily a thousand times more than today. You can imagine AI being everywhere all the time, and so there's a massive build-out of infrastructure. I'm old enough to remember the internet in the nineties, and this is just like that.
You know, you had your 14.4 modem and all that noise, and so there's a lot of scale left that's going to be necessary. I wanna go back to China for a second, 'cause it's interesting to me that the early adopters were Chinese companies, and they're still sophisticated drivers of progress. I remember in the mid-2010s there wasn't this sense of fear about China.
It's really changed. The environment's changed. There's a sense of competition, and then some. So how do you think about it, 'cause DeepSeek really woke a lot of people up to this competitive arms race with AI. How do you think about that dynamic and the realities of it?
Rob: So Ray has done very well in China. I mentioned Ant Group, which makes Alipay, was the first serious Ray user. I have no idea why they decided to adopt Ray early on when no one else was using it. It seems like kind of a risky bet to make, but they did. And since then, of course, Alibaba, Tencent, ByteDance, a number of these Chinese tech giants have adopted Ray very heavily. Tencent just wrote this blog post on how they use Ray to power all of their models for WeChat, and this is everything from traditional machine learning applications like recommendations and ranking to generative models dealing with content generation or processing lots of short-form video, things like that. So we have seen a huge amount of growth there. Now, you mentioned DeepSeek.
One of the most interesting things about DeepSeek is that I think it shows that open source models are here to stay and that they're gonna be very, very competitive and very compelling. We've seen a tremendous amount of demand for running DeepSeek, and of course the Llama models as well, for running different open source models, because people value the ability to really control the model at a low level, to customize it for their use cases, to be able to manage the infrastructure. And of course there have always been debates about whether open models will be widespread or not, or if we'll all be using just a single model from a single company. And I think it's pretty clear now that these open models are here to stay and are going to be very competitive.
Ben: That's good for you, right?
Rob: That is good for us. Yes, because as models are getting bigger and more capable and there's more demand for running these models, it also becomes harder to do it because you need to run them across a bunch of chips or you need to run them on a ton of data, and the scale is greater.
The compute required is greater, and the complexity is greater. There are so many ways that AI applications are getting more complex: the scale of the compute, the scale of the data, the scale of the model, the different types of hardware that people are using, different types of GPUs and accelerators, the amount of inference-time compute that people are running.
How long these models are thinking for, the number of models that are being composed together. If you look at the sort of modern agentic applications people are building, it's not one model. It's many models composed together in very dynamic patterns. So there's tremendous complexity, and the more complexity there is with running these setups and the infrastructure, the harder it is to do and the more value we can provide.
Ben: Selling picks and shovels in the gold rush. In your view, is there any real lead over China? I mean, in practice, China is very sophisticated on some dimensions. And on others, is even the arms race a bad formula to be accepting?
Rob: Well, I think that there is a lead, that the top models from OpenAI and others are better than DeepSeek, but it's very dynamic, right? These new models are getting released all the time, and things can change quickly. One of the reasons I think these open models are here to stay is that there are actually a lot of incentives to open source models that you've built. For example, if you have the world's best model, I can see the case for keeping it proprietary and not sharing it. But if you have the second best model, or the third best model, or the fourth best model, you're not the market leader. Then to build a business, you typically need to be the best at something, and if you have the fourth best model, you could still have the number one best open model. Having the best open model might be a better place to be than having the fourth best non-open model. And if you're worried that you might be giving up some strategic advantage, well, the model you're talking about is gonna be out of date in six months anyway.
There are gonna be much better models, and so any advantage you're giving up is temporary. So I think there are a lot of reasons to open source these models, and I think you'll continue to see more models being open sourced, and they'll get better and better and really be competitive.
Ben: Yeah, we're building AI applications, so in a way we're up near the consumer. And one of the big challenges with some of the most popular or most leading-edge models is just that they're not steerable. It's really, really hard to get the model to give you the outcome that you know is right. Like, actually, I know better, at least in my area, than the model does.
And so steerability, I think, was undervalued. We invested in a lot of these companies, and I've just come away being like, if you can't steer it...
Rob: Yeah. One company we work with is Runway ML. They do video generation. They have some of the most impressive video generation results out there, and it's been fascinating to watch what they've been building. The earlier generations of models pretty much just took a text prompt and generated a video, and you could generate such cool videos, but you couldn't really generate the thing you had in your head. You wanted something specific, and you got something else that's really cool, but not the thing you had in mind. And it's been very interesting to see how, with some of their more recent releases, they've tried to make these things far more controllable,
steerable, to give more choices to designers and creative folks who are trying to go from some concept in their mind, and give them a lot of different knobs and a lot of different choices to really creatively control the thing. So that's just one example, but certainly, to build useful products, we're going to need to make these things steerable.
Ben: Really good example. I was just looking at that today. If you're a director of a movie and the AI just keeps producing the wrong thing, I mean, it's just, you can't direct a movie with that technology.
Rob: Yeah,
Ben: It's not...
Rob: We'll get there.
Ben: Yeah, the Runway ML people, like, I met the Greek guy. He was awesome.
Rob: Yes. Oh yes.
Ben: Just interested in your view of AGI.
What do you think it looks like? When do you think it happens? Or do you think it's actually a nonsense term?
Rob: I don't think you necessarily have a consensus definition of the term, and you can come up with different definitions, each of which could be meaningful. I expect progress to feel continuous as opposed to discontinuous, where there's a binary: we didn't have it today, and tomorrow we do have it.
I think these tools are just gonna get better and better, and the pace will be quick in a lot of different domains. I expect coding tools to become far better in a relatively short period of time, and they're already extremely impressive. And I think, in tandem, our capabilities to use these tools will grow as we learn to use them, incorporate them into our workflows, and figure out how to make the most of them. We, as individual workers, are gonna become far more productive and far more capable, able to do things we just never would've done otherwise. There are different ways to look at this. As models get smarter, does that mean people will do less thinking and less work, and more of the thought process that we have will get outsourced to the AI? Or will the amount of thinking and reasoning that we do actually rise in tandem with the models? One optimistic perspective that I have here is that the level of rigor in our thinking can increase dramatically, because we'll be able to ask so many more questions and get answers to them. In the past, there are many questions we just wouldn't ask. You might read a news article with some statistics about some policy, and that was it. You might have follow-up questions, but you couldn't ask the follow-up questions about how the data was collected, whether they controlled for this thing, or what happens if you slice the data in a different way. You wouldn't ask that question because you wouldn't be able to get an answer. But once we have tools that are able to do research on our behalf, that are able to investigate things, fact check things, then I imagine we're gonna ask these types of questions all the time. In the same way that, because you can run Google searches, you fire off dozens of Google searches every day, we're gonna ask dozens of research questions every day and get rigorous answers. I think our standards for rigor in our thinking will really rise. That's something I'm excited about.
Ben: I hadn't thought about it this way. One of the memes at one point was Idiocracy, that the whole country, the whole world, would get dumber. But actually that seems almost impossible. With AI, everyone will get smarter, and they'll be smarter and they'll act smarter, especially when AI becomes agentic.
And so it's not even that you'll ask for the research; you'll just make better decisions without even realizing it. So that's my optimistic side of it. And then the thing I worry about: I was at this DC policy dinner with a bunch of people, a Republican and Democratic mix of people who had been in the White House, and they were asking what the policies should be around AI, and they just had no idea.
Rob: Yeah, I mean, policy is quite complex.
Ben: Well, the problem with policy, and I think this is true, though it's not what people wanna believe, is that you can't really come up with a policy until you know what the consequences are.
Rob: Totally. One point that other people have made is that if you have a lot of uncertainty about how the world will unfold, you don't wanna come up with a policy that is very strongly dependent on one particular future path. If you have a policy that's gonna be a great policy if we have superintelligence in two years, but is a terrible policy otherwise, then that's a little risky.
You want policies that are robust to different ways the future could unfold.
Ben: Right, but you're not gonna get that. But anyways, here's my last question. My son asked me this. My son's 13, and he asked me what job or profession he should be training for, studying for, that will be highly resistant to AI, a job he'd want, not some menial job that survives because AI's really bad at having opposable thumbs or something.
And that's a long way off, but he's worried. What do you think I should say to him?
Rob: Of course, it's very difficult to know what will be the most interesting job in that timeframe. I have no idea. That said, I think there are general heuristics that you can follow that are probably going to lead you in a good direction, and a lot of the skills you learn being a good rigorous thinker, whether that is through math or programming or running science experiments, or being a good communicator, learning to really refine your thoughts and convince others, these are gonna be valuable skill sets for the vast majority of professions.
So I think you can focus on learning core skills that are likely very generalizable. I think as a young person, in the first 20-plus years of life, you will often wanna learn very general, fundamental skills and leave a lot of optionality open, and not necessarily go super deep on one specific domain.
And then of course, at some point you have to start specializing and closing off options, but we'll have more information at that point.
Ben: When I was with the DC policy people, I asked them this question, and it was funny to hear the different answers. I said, can you teach agency? 'Cause it seems like in a world where intelligence is at your fingertips, you need to have high agency, you know, be a self-starter, autonomous. You can go out and learn anything if you want to.
And I said, can you teach curiosity? Can you teach agency? Everybody said yes, which actually I was really skeptical of.
Rob: I mean, I hope the answer is yes. I have no idea how to teach it, but I've always believed I could take the vast majority of people and get them really excited about math if I had the opportunity to try to do that.
Ben: Next time we'll do the next podcast on that one.
Rob: I've never had the opportunity to really test this.
Ben: You gotta go on 3Blue1Brown or...
Rob: Hey, I do think there's a lot of low-hanging fruit here in education.
The potential to teach people skills far, far faster than we currently do through coursework and usual lectures. Something I'm very excited for is people, myself included, being able to learn orders of magnitude faster.
Ben: Yeah, being a young person, I mean, that's what's so exciting. The amount of learning you're gonna be able to do is gonna outpace anything before. Think about a hundred years ago versus now: what people can learn, how fast they learn, what's available to you if you want it. It's exponential. I mean, I don't know, maybe it's not exponential.
Maybe it's still geometric.
Rob: Yeah.
Ben: Either way, Robert, thanks for coming on Onward.
Rob: Hey, thank you so much. This was a lot of fun.
Ben: You have been listening to Onward, featuring Robert Nishihara, co-founder of AnyScale. My name is Ben Miller, CEO of Fundrise. We invite you again to please send your comments and questions to onward@fundrise.com.
And if you like what you heard, rate and review us on Apple Podcasts, and be sure to follow us wherever you listen to podcasts.
Finally, for more information on Fundrise sponsored investment products, including relevant legal disclaimers, check out our show notes. Thanks so much for listening, and we'll see you next episode.