How Lightwheel is Building the Simulation Infrastructure of Physical AI with Steve Xie


Steve Xie spent years leading simulation at Cruise and NVIDIA before founding Lightwheel — and in that time he watched simulation go from a tool that was "great for showcasing to investors" to what he believes will become the core infrastructure layer for all of physical AI.
In this episode, we sit down with Steve to break down Lightwheel's three-pillar framework for simulation infrastructure: World, Behavior, and Evaluation — and why getting all three right is what separates serious simulation from everything else. We also get into the physical measurement factory, the data scale that Lightwheel is hitting in 2025, and why RoboFinals may become the industry-standard benchmark for frontier robotics models.
In this episode we discuss:
- Why simulation started as a "toy" at Cruise and how Steve changed that.
- The difference between a visually realistic asset and a physically accurate one.
- Why Lightwheel operates one of the world's largest robotics arm factories.
- How egocentric data and simulation data work together in the behavior layer.
- The data pyramid: why real teleoperation is just the tip of the iceberg.
- Why academic benchmarks are maxing out and what RoboFinals does differently.
- How World, Behavior, and Eval form a flywheel — not just a stack.
- The agentic core Steve sees sitting at the center of it all.
- Why robotics data collection may eventually require a billion people.
About Steve Xie:
Steve Xie is the Co-Founder and CEO of Lightwheel. He brings over a decade of experience building simulation infrastructure across some of the most demanding environments in physical AI. Steve led the simulation department at Cruise during the early days of autonomous vehicles, then joined NVIDIA where he worked closely with the Omniverse team and developed his vision for simulation as next-generation physical AI infrastructure. He founded Lightwheel to build that infrastructure from the ground up.
Follow Steve on LinkedIn: https://www.linkedin.com/in/stevexiecbs/
About Lightwheel:
Lightwheel is building the simulation infrastructure that physical AI needs to succeed — spanning world generation, behavior data, and evaluation. Their products include SimReady Assets, EgoSuite for egocentric data collection, and RoboFinals, an industrial-grade robotics evaluation platform co-developed with NVIDIA.
SimReady Assets: https://simready.com/
Learn more at: https://lightwheel.ai/
Resources mentioned in this episode:
LW-BenchHub: https://github.com/LightwheelAI/LW-BenchHub
LeIsaac: https://github.com/LightwheelAI/leisaac
IsaacLab-Arena: https://github.com/isaac-sim/IsaacLab-Arena
Thanks to Lightwheel for making this episode possible. Learn about how Lightwheel is making physical AI successful at https://lightwheel.ai
Jonathan Stephens: It was quite early and simulation was mainly like a toy. Great at showcasing to investors, but it wasn't that useful internally. I was pretty lucky that I got to talk to Rev, who was the VP of Omniverse at the time. And he showcased the vision of Omniverse, which is the simulation platform of everything. So I started to see that simulation is not just a tool or a time machine for self-driving, but it could be a platform, a next-generation platform, a next-generation infrastructure for this whole physical AI industry. So that got me more passionate about simulation and about building my own company. That was Steve Xie, founder and CEO of Lightwheel. And in today's episode, we're going to dive into what it takes to build successful simulation infrastructure. Currently, Lightwheel is scaling up to be the number one company in simulation infrastructure for physical AI. In full transparency, I am the chief evangelist at Lightwheel, and I want to thank Lightwheel for making this episode possible. This is the Thinking Machine podcast. Now let's get to it. Steve, welcome to the Thinking Machine podcast. I'm really excited to have you on this episode. Yeah, thank you so much, Jonathan. You have been hosting this really amazing podcast and it's my great pleasure to be here. Yeah. So the goal for us on the Thinking Machine podcast in general is getting industry voices: people who have gone through the different experiences of building startups in physical AI, people who are doing research, all sorts of different aspects. I'm trying to get different voices, and I think what's great about having you on, and we'll intro you here in a minute, is that you're building a company in physical AI and you have experience of working not just in startups. You've done it with OEMs and large companies. So you have this really broad experience. That's why I'm excited to have you on: you're not just a researcher.
You're not just a company builder. I just want to start with your past, then. In the past you worked at, was it Cruise, NVIDIA? What were you doing at those companies that kind of led you towards starting Lightwheel? Yeah, so I was at Cruise back in 2018. At that time I was leading the simulation department at Cruise. Basically, Cruise is an AV robotaxi company, and simulation is extremely important for the company: to help them do testing in simulation, and also to generate synthetic data to help train, at that time, the perception and prediction algorithms. But when I was there, it was quite early, and simulation was mainly like a toy. It was great at showcasing to investors, but it wasn't that useful internally. So we faced a lot of challenges and also internal doubt. At the time, I was mainly focused on two things. The first one is being able to measure where we are with simulation, like how realistic we are with simulation, because I feel that's the most important thing. If we don't even know where we are, then how can we actually tell other people that they can trust us? So that's the first thing I focused on. The second thing is, once we knew where we were, how could we improve it? We mainly improved it through AI, through generative AI at the time. And after we were able to do those two things, we were able to successfully show internal teams that synthetic data is truly useful to help improve algorithm performance and to help benchmark algorithm performance, the key goals for simulation at the time. So that's basically my experience at Cruise. Because I was doing a pretty good job at Cruise, and it was pretty well known in the industry, in Silicon Valley, NVIDIA wanted to talk to me, because at that time they were looking for a simulation leader for their autonomous driving department too. So that's how I got into NVIDIA.
But I would say NVIDIA is very different from Cruise, right? Because Cruise is an AV company vertically integrating everything itself. NVIDIA is probably one of the best providers in the world, building a platform for everyone. So basically, I got to understand the angle of being a provider for large OEMs and how we should be building simulation from that side. And also at the time, I was pretty lucky that I got to talk to Rev, who was the VP of Omniverse at the time. And he showed me the vision of Omniverse, which is the simulation platform of everything physical, from robotics to self-driving to industrial simulation. And that's where I started to see that simulation is not just a tool or a time machine for self-driving, but it could be a platform, a next-generation platform, a next-generation infrastructure for this whole physical AI industry. So that got me more passionate about simulation and about building my own company around simulation. So what did you see then at NVIDIA? You're working there, you're building the simulation, you're building that platform for them. Then why would you want to leave? You know, that's a question I always get. Why would someone who is in a leadership position at one of these big companies that are making things happen want to leave? What did you see in that market that said, you know what, I need to do my own thing? Yeah, totally. So I mean, NVIDIA still is, in my opinion, the best place to work. It's a really, really amazing place. The culture is amazing. The people there are amazing. They are really working on the hard technology stuff. It's a really, really great place to work. But for me, I also have my own goal. I always dreamed of being an entrepreneur. When I was a young kid, probably five years old, I started dreaming of being an entrepreneur someday, right?
It's just my family culture that I wanted to be an entrepreneur. The more excited I was at NVIDIA, the more great work I did at NVIDIA, the more I felt I was becoming more and more ready to start my own thing. The other thing is, I really see that in order for this bigger vision that NVIDIA is pulling off, the three computers, especially the simulation computer for physical AI, NVIDIA also needs partners. And I could be an external partner, probably making a much bigger contribution than as an internal employee, right? So from those considerations, I decided to chase my dream and found my own company. You know, there's a DNA of an entrepreneur. You can't pull that out of them. I'm not an entrepreneur myself. I mean, everyone is a little bit if they want to be, but I get that. I used to work at a large company as well, and I didn't like how fast they moved. NVIDIA does move pretty fast on things, but I wanted to work on the cutting edge, and what's more cutting edge than startups? Because they're on the bleeding edge, and it's got to work or it doesn't, right? So that DNA, you can't get away from it. That makes sense, why you would want to leave. You know, that's part of you. I admire that. So then you founded Lightwheel, and one thing is, Lightwheel builds simulation infrastructure, right? That's what we do. That's what you do. I guess the question I then get a lot is: isn't NVIDIA's Omniverse simulation infrastructure? What's the difference, how is that different? Yeah, totally. So I would say, yeah, both of us are building the infrastructure. I really think that simulation is such a big problem to solve, but also a big opportunity, and everyone should be building the simulation infrastructure. And by simulation infrastructure, I mean the physics solver. I mean the SimReady assets. I also mean the framework and also the applications to generate successful simulation.
And by successful simulation, I don't mean the typical graphics, pretty much like gaming simulation, rather than the physics. So first we need to understand the mission of simulation for physical AI. The mission is not to let people enjoy another game, to enjoy the game in a virtual world, but to help accelerate deploying robotics in the physical world. So physical realism, physical accuracy, physical quality is the key of simulation for physical AI, right? And that's actually where we are putting most of the work. So I would say that at Lightwheel, we are focusing on providing the kind of simulation that delivers the highest-quality physics. And in order to provide simulation for physics, we need a few things. First, we will need a really, really awesome physics solver. The physics solver needs to be not only realistic, but also highly efficient to run on GPUs, because we need to run them in parallel at a very large scale. And also the solver needs to be connected with a solver ecosystem. It couldn't be one independent solver, for example maybe a rigid-body solver, and that's it. No, it needs to be connected with every other solver, together as an ecosystem. So that's how we collaborate with Newton from NVIDIA, because Newton is really a great next-generation, GPU-accelerated physics solver ecosystem. And by doing that, we not only build SimReady assets on top of it, we not only calibrate the solver, but we are also building solvers ourselves on top of Newton, together building this open ecosystem. So I think the openness of the ecosystem is really, really important. So back to your point: NVIDIA is building simulation infrastructure, and we are too. And all of us need to work together in order to make simulation successful. So I would say this is the physics software part, which is extremely important.
The second thing is actually the assets, right? And I think, to a lot of people's misconception, especially people coming from the gaming industry or the self-driving industry, they may not pay that much attention to SimReady assets, because they can feel it's just a typical technical artist workflow. And I even get people asking me, hey, I can buy this asset for $9.90 on websites. And I was like, no, it's totally different. The $9.90 asset is just a visually realistic asset, but what we are looking for is a physically accurate asset. For example, a fridge, right? You need to have the magnetic force of this fridge. You need to have all the hinges, the joints, right? In order to get it accurate, to connect with the physics solver, in order to generate the most accurate forces. So the SimReady asset is extremely important, because you need to calibrate all those physical parameters, the forces, but you also need to optimize it to make it highly efficient to run on GPU physics solvers. So that's the SimReady asset, which Lightwheel specializes in. Then there is another thing. Is that where we started, by the way? Is that where Lightwheel started, with the assets? That's exactly where we started. Because we saw a problem that no one else wanted to solve. They feel that they are looking for other people to solve it, but they are not solving it themselves. But the more I looked into it at the time, the more I found out it is truly intriguing, and truly difficult. The more I worked on it, the more I found out it is extremely difficult to build high-quality SimReady assets at scale. Basically, for my part, I love the problems which are both much needed and also extremely challenging. I love to solve those high-impact problems. So that's why, and also how, we jumped into SimReady assets. Yeah.
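Steve's fridge example can be made concrete with a small sketch. The Python below is purely illustrative (the episode says real SimReady assets encode measured physics in USD, not in a Python class like this, and the parameter names and numbers are made up): the force a robot must apply at the handle is dominated by the magnetic seal while the door is closed, and drops to hinge friction once the seal is broken, which is exactly the behavior a "$9.90" visual-only asset cannot reproduce.

```python
from dataclasses import dataclass

@dataclass
class FridgeDoorParams:
    """Hypothetical calibrated parameters for a SimReady-style fridge door.

    Names and values are illustrative only; real SimReady assets encode
    measured physics (joints, limits, forces) in USD.
    """
    seal_force_n: float         # peak magnetic seal force felt at the handle (N)
    hinge_friction_n: float     # effective hinge friction felt at the handle (N)
    door_open_deg: float = 0.0  # current door angle

    def handle_force_required(self) -> float:
        """Force a robot gripper must apply at the handle to move the door."""
        if self.door_open_deg < 1.0:        # seal still engaged
            return self.seal_force_n + self.hinge_friction_n
        return self.hinge_friction_n        # seal broken: only friction remains

# A visual-only asset has no such parameters, so a policy trained on it
# never learns the hard initial tug; a measured asset does.
fridge = FridgeDoorParams(seal_force_n=45.0, hinge_friction_n=4.0)
print(fridge.handle_force_required())       # closed door: 49.0
fridge.door_open_deg = 15.0
print(fridge.handle_force_required())       # seal broken: 4.0
```

A physics solver consuming an asset like this can then generate the correct contact forces during manipulation, which is the calibration work the conversation describes.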
But I would say that's just a very superficial level of this physical AI simulation problem. The more we worked on SimReady assets, the more we found there's actually another core problem that needs to be solved in order to build high-quality physical AI simulation, which is physical measurement. So why is that? A lot of people may think, hey, you are building simulations, so everything you are doing is in a computer, right? I'm just looking at your computer screen, so I can follow all of your product. But that's actually not it. We are operating probably one of the world's largest robotics arm factories, with hundreds of arms doing automated data collection and manipulation with all the real objects, for example the fridges, the microwaves, the ovens, the dishwashers, the tables, all the different types of tools, things like that, in a factory. It's a real factory. Why do we need that? The major reason... We were calling it the physical AI factory, right? Oh yeah, we are calling it the physical measurement factory. I've got to get that right. Sorry, I was starting to cut you off. No worries, no worries. So actually we are showcasing a mini factory at GTC next week, which I'm super excited about. But basically, the core concept is, the more we work on SimReady assets, the more we find we need to scale the data collection of the real world, especially the real assets, in order for us to build scalable, high-quality SimReady assets. So how about we start with this factory, right? So we built a smaller factory, and then we started building bigger and bigger factories. I would say this is extremely important, because without all the physical parameters we collect from this factory, there is no way we can scale the productionization of the SimReady assets. So right now the SimReady asset pipeline is mainly, I would say, 99% automated work.
In that, we have the 3D reconstruction, we have the automated physical-factory data collection measuring all the core physics of those real objects. Also, we are using USD, which is the open standard for describing SimReady assets, introduced by NVIDIA and also a few others. So let me just kind of reiterate what you've told me, because there's a lot of information you just gave me, which is really good. We're building simulation infrastructure at Lightwheel, and we started with: you have to have a high-quality physics solver, right? That's basically an engine that does the math to make sure that physics interacts within a simulation environment correctly. So we have a great plastic-bag-with-oranges example, where to get the physics of that plastic bag right, you can make it look pretty in a game, but it's actually got to work exactly as it would in real life, with normal physics on planet Earth, or else when you transfer that simulation to a robot in real life, it will fail. If you can't even get the physics right in an engine, this falls apart. But then the next step is we have to take these assets. We have a refrigerator we use as a great example, right? Everyone has one, everyone has opened the door. It takes a lot of force to open it, but then it takes almost no force once that seal is broken. So we've got to make sure that we are building that into these assets. I'm going to pause you for a second and ask you a quick question on that. Now we've got the physics on that refrigerator. How important is it for it to look exactly like a visual replica? Because I also wonder if that is not as important, because we all use great vision models that just know what a refrigerator is for the robot. Is that visual fidelity as important as the physics? So I would say visual fidelity is also quite important. The major reason is, basically, though the models are getting trained on much more visual data,
they still need to look at exactly that kind of fridge to find the handle and other places in order to manipulate it. But here's the thing, right? The visual quality is not that difficult to build, because there's already the gaming industry, there's already the self-driving industry. So in my opinion, it is a solved problem. The difficulty is more in, I would say, the optimization of efficiency. So you need to understand the best trade-off between the visual quality and the efficiency of the asset in simulation, the data. That is the most important thing. If we put extremely high-quality visuals there, for example movie-level visuals, it would look extremely realistic. But the problem is it would take too much compute, and it wouldn't be efficient to run at large parallelizable jobs. So from our perspective, we are actually closing a loop, in that for every new asset we introduce, we actually close the whole loop of sim-to-real but also real-to-sim, meaning that we would actually use the data generated from the simulation to train the algorithm and deploy the algorithm in sim and also on the real robotic arm, manipulating the same object, to test the efficiency of the data and also the quality of the data. By this, we actually understand where the best combination is: we should have good visuals, but there's no need to have the best visuals. So I see your background at Cruise informing this, where you built simulation but had to figure out how useful it is. Same here: we're creating these SimReady assets in sim, and then you have to test them in sim, and then back in the real world. You've got to go in and out of simulation to make sure that every time you do that transfer, you're still getting the same result, right? So you were thinking you had a good asset, but you actually have to prove you have a good asset before you can release it
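The sim-to-real/real-to-sim loop Steve describes can be summarized as a gap check. A minimal sketch, with made-up numbers and a made-up 5% threshold (the episode does not state Lightwheel's actual acceptance criteria): measure task success for the same policy and object in sim and on the real arm, and only release the asset when the two agree closely enough.

```python
def sim_to_real_gap(sim_successes: int, real_successes: int, trials: int) -> float:
    """Absolute difference between sim and real success rates on the same task."""
    return abs(sim_successes - real_successes) / trials

def asset_is_releasable(sim_successes: int, real_successes: int,
                        trials: int, max_gap: float = 0.05) -> bool:
    """Release an asset only if the policy behaves the same in sim and on the real arm.

    The 5% threshold is illustrative, not Lightwheel's actual criterion.
    """
    return sim_to_real_gap(sim_successes, real_successes, trials) <= max_gap

# 94/100 in sim vs 91/100 on the real arm: a 3% gap, within tolerance.
print(asset_is_releasable(94, 91, 100))  # True
# 94/100 in sim vs 70/100 on the real arm: the asset's physics don't transfer.
print(asset_is_releasable(94, 70, 100))  # False
```

The point of the loop is that "looks good in sim" is never the release criterion; agreement across the transfer is.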
to the world to use. Okay, so then getting back: we've got the physics engine, we have the SimReady assets, which are very important. If you don't have those things, you really can't run simulation, right? As you've been discussing the infrastructure layer that we're building here at Lightwheel, that's all encompassed in just one part. That is the world, right? There's also an environment. We have to put them in an environment, a room somewhere, right? And so that's just one part. And then the next part is behavior, as you've described it, as part of the infrastructure. Can you break down what that behavior component is? Yeah, totally. Thank you so much, Jonathan, for bringing up this world, behavior, and eval framework that we use. So I would say the idea is: the world is basically how we leverage simulation to build the physically accurate and also diverse worlds and the SimReady assets. The behavior is where we use the world to automatically generate data. And the data should be the robotics trajectories. So for example, a robot manipulating the fridge, a robot opening the oven, a robot inserting cables, things like that, which is also extremely important. So behavior is actually where we generate the data, right? And we provide this data to our customers in order for them to train their foundation models. So you need physics-accurate data at scale, and so we're using simulation so you can, as you said, automate a lot of this collection, right? To put a robot out in a data center and have it pull a bunch of cables and reinsert them, you either have to go to a different data center to get a variation, or you have to physically move these several-ton server racks and put different ones in. That's not efficient. Is that where this lies? Is it for efficiency that we're doing this all in simulation and not just putting out robots in real life? Yeah, totally.
So we actually have two ways of generating behavior data, and both are very important. Basically, there is this data pyramid concept introduced by Yuke Zhu from UT Austin, who is also the co-leader of GEAR at NVIDIA. Yuke's point with the data pyramid is that the real teleoperation data, the real robotics data, is very limited. It's just the tip of the iceberg, a very small part, maybe 1% or even less than 1% of the data needed to train robotics. And the major data needed to train robotics is, one, simulation data, and two, egocentric data, so human videos. And at Lightwheel, we really see behavior as both of those things. One, simulation data. Two, egocentric, real human egocentric videos. And we are betting on those two things together. So for the simulation data, we leverage the simulation worlds. We leverage our automatic data generation pipeline to generate synthetic data from simulation. And this data is extremely scalable. It's extremely fast. And it can also be in different robot embodiments to provide to our customers, which makes it an extremely important data source. The second data source is, basically, we have collectors worldwide wearing the egocentric devices, for example the glasses, the cameras, and also the wrist cameras, doing different types of jobs. A lot of the time they are doing jobs. They are doing the house cleaning work, the housework-related chores in their homes. Others are doing different types of work in factories, in grocery stores, in coffee shops, things like that. And that data is extremely scalable and diverse too. And the other good part is, it's also from the real world. So we provide those two scalable sources of data to our customers, which we think are extremely important.
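The data pyramid can be sketched as a simple budget split. Only the "1% or less real teleoperation" figure comes from the conversation; the even split of the remainder between simulation data and egocentric human video is an illustrative assumption, not a stated Lightwheel ratio.

```python
def data_pyramid_hours(total_hours: float,
                       real_teleop_frac: float = 0.01) -> dict:
    """Split a training-data budget along the data-pyramid idea.

    real_teleop_frac reflects the episode's "maybe 1% or less" figure;
    the 50/50 split of the rest is a made-up illustrative assumption.
    """
    rest = total_hours * (1.0 - real_teleop_frac)
    return {
        "real_teleop": total_hours * real_teleop_frac,  # tip of the pyramid
        "simulation": rest / 2,                          # scalable synthetic data
        "egocentric": rest / 2,                          # human video from collectors
    }

# For a million-hour budget, real teleop supplies only 10,000 hours;
# simulation and egocentric video must cover everything else.
print(data_pyramid_hours(1_000_000))
```

This is why the conversation treats simulation and egocentric collection, not more teleoperation, as the scalable base of the pyramid.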
I think so, especially now, not just to have sim teleop, because I feel like at least one paper or more a week I'm seeing of researchers finding that if they mix in a large amount of egocentric data and a little bit of real robot data, they're getting fairly good zero-shot results. They're getting robots to do things unseen. They're not quite sure of the mix, but they know that egocentric data matters, and what you're building is high quality, right? Some of these teams are just scraping the internet. Then they get VFX, which is not physics-correct examples. They're getting cartoons, all these things that they have to figure out, either filter out or just accept in the data. What Lightwheel provides is this high-quality, annotated, physics-plausible data, because it's captured in the right way. We even have the arm state data if you want. So you get this high-quality data, and then you can use the robot teleop data to get much further down the line. I think I'm even seeing teams build a robot in seven months and deploy it. So right now it's all about how fast you can find the right mix. And us serving all that in different ways for them to test, I feel like it's critical now. You can't just rely on one type of data. I totally agree with you. I totally agree with you. Yeah. The first half of last year, we delivered tens of thousands of hours of data. But the second half of last year, we delivered probably close to 1 million hours of data. And this year, in just these few months, we've delivered probably more than two million hours of data already. So you can see the scale increases exponentially. I'm really seeing that this year we will probably deliver more than 10 million hours of data in total. I am seeing more and more projects that say, oh, we have 44,000 hours of egocentric capture, because that's all they could find, right? Exactly. It's not that they don't want more.
If you go read the papers from this research, it'll say in the limitations: a limitation is we only had so much data that we could collect, or find, or that was high quality. So there's a thirst for it. There are infinite edge cases. We have an infinitely complex world. You need as much data as possible to have a well-informed underlying model to do these tasks that we're asking them to do out in the real world. And so that's what we're doing. First, it's actually a scale thing. But not just scale, you're trying to hit quality. Because more... Yeah, I would say quality and scale are the two most critical, and probably the only two most critical, requirements that we need to hit. Yeah, no, that makes sense. Okay, so then we went with the environment, the world, that's your assets. And then we do behaviors, and that could be egocentric, could be sim teleop. And then we need to test that these models we're training actually work. So tell me about eval. At the start of this video, I did an intro talking about the concept of those three. Can you break down eval and why that is critical? At this point, I see more people talking about it, but not enough people talking about it. Totally. So I would say, now we have the world, now we have the behavior, which is basically the data we generate. We are providing scalable, exponential data growth to our customers. And I would say now the bigger problem the industry faces is: where are we, right? Now we have the enormous data, we can train our models, but can we really understand how good our model is? That's the major problem. So that's the question that eval is trying to answer: to help the industry, to help the customers, to help our partners understand where their robots are. What are their capabilities? What are their boundaries? Things like that.
So one of the typical problems with eval for the industry is this: there are so many great open-source evals provided by academia. There is the BEHAVIOR challenge, there is RoboCasa, there is LIBERO. They are amazing. All the papers refer to them, and they are, I would say, the backbone of academia for robotics, because you have to take a test on BEHAVIOR, you have to take a test on RoboCasa, in order to showcase that your research is providing insight to academia. But the problem is, they are good for academia. They are not good for industry. In industry, we are running on much larger compute clusters. Many of our customers have tens of thousands of GPUs to train their models. The industry is getting much more data than academia. For example, academia probably has thousands, at most 10,000, hours of data. That's it. But industry is getting hundreds of thousands, if not millions, of hours of data already, and much of it is from us. Then the question is, okay, where can I find the eval? Where can I find the benchmark that can help us understand robotics capability at this scale? On the academic benchmarks, the frontier robotics foundation models already do pretty well. They are already reaching, say, 99%, 100% pass rates. So it's high nineties, and they're all at exactly the same place on the leaderboard. I don't know which one's truly better. Exactly. Exactly. They're all very similar. Now the problem is, okay, can you provide me a really difficult test that can run at large scale? So both the difficulty and the scalability are the two parts that are super important. And I would say that real-world eval cannot provide this scale. It is not possible to go to 10,000 homes, right?
To deploy the robots in 10,000 homes, running eval maybe 10 times every day, it is not possible to do that, right? So we have to do it in simulation. So that's actually where we started RoboFinals, because many of our customers see we are extremely good with SimReady assets, sim tasks, and also open-source benchmarks. They are like, hey, can you provide us a closed-source benchmark to help us understand where we are? But it needs to be very accurate. It needs to be very high quality. It needs to be diverse. It needs to be scalable. So basically, we are working with the industry. We are working with manufacturing partners, we are working with agriculture partners, we are working with hospital partners, we are working with many others, and we are also working in homes. We are collecting all those real things. We are collecting the real tasks, because you need to actually help the robots understand where they are and benchmark them against the real world. So we get the real things, we get the tasks, the real SOPs, right? And we also measure the real physics through our scalable physical measurement factory. And then we put all of them into high-quality SimReady assets, scenes, tasks, benchmarks. Together, we provide RoboFinals, which is basically the industrial-grade robotics benchmark for the frontier models. So it's like, with the academic benchmarks, everyone has been taking a year-end final test in school. But then we're saying, okay, now we're going to make you do the full SAT, or, to get into your master's program, take a much more difficult test. Let's see how well you truly can score on that. I guess I didn't even realize the scale of how many scenes are in RoboFinals as well. So could academia use RoboFinals? Is there use for it there, or is it really just for industry? Definitely. I totally see that academia can use it.
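The scale argument (10,000 homes times 10 runs a day is physically impossible, but trivial in simulation) can be sketched as a tiny eval harness. This is not RoboFinals' actual API, just an illustrative stand-in: run a policy across many simulated scenes, aggregate per-scene success rates into one score, and note that the seeded rollouts make the whole benchmark reproducible, which a fleet of physical homes could never be.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def run_episode(policy, scene_id: int, seed: int) -> bool:
    """Stand-in for one simulated rollout; returns task success.

    A real harness would step a physics simulator; here a seeded RNG
    compares against the policy's per-scene success probability.
    """
    rng = random.Random(scene_id * 100_003 + seed)  # deterministic per rollout
    return rng.random() < policy(scene_id)

def benchmark(policy, n_scenes: int = 10_000, episodes_per_scene: int = 10) -> float:
    """Mean success rate over n_scenes x episodes_per_scene rollouts.

    10,000 scenes x 10 runs is infeasible physically but cheap in sim,
    especially with GPU-parallel solvers; threads stand in for that here.
    """
    def scene_score(scene_id: int) -> float:
        wins = sum(run_episode(policy, scene_id, s) for s in range(episodes_per_scene))
        return wins / episodes_per_scene

    with ThreadPoolExecutor() as pool:
        scores = list(pool.map(scene_score, range(n_scenes)))
    return sum(scores) / len(scores)

# A toy "policy": succeeds often on easy scenes, rarely on every 10th, harder one.
toy_policy = lambda scene_id: 0.9 if scene_id % 10 else 0.2
print(f"overall success rate: {benchmark(toy_policy, n_scenes=100):.2f}")
```

Because every rollout is seeded, two runs of the same policy produce identical scores, which is what makes leaderboard-style comparison between models meaningful.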
And actually we provide it in two ways. One, we provide it as an enterprise version to the top robotics companies and frontier-model partners out there, and many of them are already adopting RoboFinals. The other thing is that we want to provide the infrastructure, I would say, for free. I mean, we are actually selling the benchmark, but we are providing the infrastructure for free, because we want everyone to adopt this standard. If we can have the same standard for robotics evaluation infrastructure, then at least we are speaking the same language, and then we can talk about the more challenging tests. So we should be focusing on monetizing the tests, but we should also, I would say, be helping the whole industry and academia adopt this infrastructure, which we open source. We open source it together with NVIDIA; it's called Isaac Lab Arena, which is basically the next-generation, large-scale robotics evaluation benchmark framework. And on top of it, we open source LW Bench Hub, which you can see as the execution layer of the benchmarks on top of Isaac Lab Arena. We want to help everyone adopt it. In LW Bench Hub, we also provide quite a few open-source benchmarks that academia and industry can run on. There are many scenes and many tasks there, extremely high-quality SimReady assets and scenes. So it's just a start. This year, we will provide more to the whole industry and academia that everyone can run on. So I'll make sure I link Isaac Lab Arena and LW Bench Hub here, because I don't think people are necessarily aware that both of those even exist at this point. Maybe Isaac Lab Arena, but I think people get confused because they hear Isaac Lab Arena and think: I already know Isaac Lab. Well, that's different. It's kind of a new thing. Yeah, I see that.
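For readers who want a concrete picture: a large-scale evaluation harness of the kind described here boils down to registering tasks, rolling a policy out many times across randomized scenes, and reporting success rates. The sketch below is a minimal, self-contained illustration in plain Python; every name in it (`BenchmarkTask`, `run_benchmark`, the toy task) is hypothetical and is not the Isaac Lab Arena or LW Bench Hub API.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class BenchmarkTask:
    """One eval task: a randomized scene sampler plus a success check (hypothetical)."""
    name: str
    sample_scene: Callable[[random.Random], dict]  # produces a randomized scene config
    is_success: Callable[[dict, str], bool]        # did the policy solve this scene?

def run_benchmark(tasks: List[BenchmarkTask],
                  policy: Callable[[dict], str],
                  episodes_per_task: int = 100,
                  seed: int = 0) -> Dict[str, float]:
    """Roll the policy out many times per task and report per-task success rates."""
    rng = random.Random(seed)
    results: Dict[str, float] = {}
    for task in tasks:
        successes = 0
        for _ in range(episodes_per_task):
            scene = task.sample_scene(rng)   # a new domain-randomized episode
            action = policy(scene)           # one stand-in "rollout"
            successes += task.is_success(scene, action)
        results[task.name] = successes / episodes_per_task
    return results

# Toy usage: a "pick the heaviest object" task over randomized object weights.
task = BenchmarkTask(
    name="pick_heaviest",
    sample_scene=lambda rng: {"weights": {o: rng.uniform(0.1, 5.0)
                                          for o in ("cup", "pan", "jar")}},
    is_success=lambda scene, picked: picked == max(scene["weights"],
                                                   key=scene["weights"].get),
)
oracle = lambda scene: max(scene["weights"], key=scene["weights"].get)
scores = run_benchmark([task], oracle, episodes_per_task=50)
```

The point of the structure is the one made in the conversation: because each episode is sampled rather than physically staged, scaling to thousands of scenes is a compute problem, not a deployment problem.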
I like the fact that we're all working from the same benchmark vocabulary, if that's the right term. If you understand the benchmark and I've been using the benchmark, there's no "well, how do I know yours is better than mine?" if we use the same benchmark system. It evens the field of where you land. Let's say I'm a large company, maybe I'm one of Google's labs building frontier VLA models, and I want to use RoboFinals. With those benchmark results, do they just keep them internal? Is it someone saying, "I want to benchmark against what we deployed three months ago"? Or are they trying to create a leaderboard so other companies can compete against them? How does that work? Benchmarks are still kind of new to me. Yeah, totally. So basically, we are also providing a leaderboard to the whole industry. We are working on it right now. Now that we have the benchmark, we should be able to help everyone benchmark where they are, right? And also to showcase a leaderboard. It's definitely coming, yeah. OK, so just not yet, and that makes sense. But as people adopt it and use it more, then we'll have a good leaderboard, because I think that's something people like to know: I have a new frontier model, where does it land among the others? It's amazing not to be arguing over which benchmark to use, one or the other; it evens the playing field of where everyone lines up. Totally. Yeah. It is extremely important to let everyone understand where they are. Not only to help them benchmark against themselves, say a day ago or a month ago, but also to help them benchmark against the industry, especially the top players in the world. So that's, in my opinion, the most important thing, and it's a lot of work.
We are working on, I would say, scaling the benchmark, because we are working with a lot of industrial partners and getting a lot of data in order to build the most realistic, diverse, physically accurate, scalable benchmark in the world, while at the same time leveraging that benchmark to provide a leaderboard that helps everyone understand where they are. Yeah, makes sense. Okay, so we have the three portions of the infrastructure: we have the World, we have the Behavior, and we have the Evaluation. You kind of need all three. You can't run simulation without doing all three, right? You need to build a world, do some actions, train a model, and evaluate. Then you know where to improve, where to get more assets, create more assets, try new behaviors. It's almost like a flywheel or an ecosystem. One thing that I've been diving into is world models. Where do they fit into that? Is that something we're going to be using for multiple parts of the ecosystem, or is it more just for world generation? Yeah, it's basically everywhere. We are leveraging world models everywhere. And I really see that world models and simulation go hand in hand. It's fully integrated into our pipeline. For example, in world generation, you are getting a lot of data from EgoSuite. EgoSuite is Lightwheel's egocentric data product, right? We have already delivered more than 1 million hours of EgoSuite data to the industry, and our goal this year is to hit more than 10 million. EgoSuite brings in a lot of different types of real data: the real scene, the real trajectory from human data collectors, and also the real task. And we are leveraging world models, for example Marble, to digitize and transfer those real things into USDs and put them into RoboFinals. Right.
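To make the real-to-sim idea concrete: the pipeline described here (egocentric capture, physical measurement, world-model reconstruction, benchmark-ready scene) can be sketched as a few typed stages. The sketch below is a toy illustration only; the class names, the `scenes/...usd` layout, and the plausibility filter are all assumptions, not Lightwheel's actual pipeline.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class EgoCapture:
    """One slice of egocentric data: a capture id plus measured physics (hypothetical)."""
    capture_id: str
    objects: List[str]
    friction: float   # from a physical measurement step
    mass_kg: float

@dataclass
class SimSceneAsset:
    """A SimReady-style scene: a geometry reference plus physics parameters (hypothetical)."""
    usd_path: str
    objects: List[str]
    friction: float
    mass_kg: float

def reconstruct_scene(cap: EgoCapture) -> SimSceneAsset:
    """Stand-in for the world-model step that turns a real capture into a sim scene."""
    return SimSceneAsset(
        usd_path=f"scenes/{cap.capture_id}.usd",  # hypothetical file layout
        objects=list(cap.objects),
        friction=cap.friction,
        mass_kg=cap.mass_kg,
    )

def build_benchmark(captures: List[EgoCapture]) -> List[SimSceneAsset]:
    """Turn a batch of real captures into benchmark-ready scenes,
    dropping captures whose measured physics are implausible."""
    return [reconstruct_scene(c) for c in captures
            if c.friction > 0 and c.mass_kg > 0]

caps = [
    EgoCapture("kitchen_001", ["cup", "pan"], friction=0.6, mass_kg=0.3),
    EgoCapture("kitchen_002", ["jar"], friction=-1.0, mass_kg=0.5),  # bad measurement
]
scenes = build_benchmark(caps)
```

The design point is the one Steve makes: the physics parameters travel with the scene from measurement to benchmark, so a physically accurate asset is more than a visually realistic one.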
So that's basically how we scale RoboFinals: leveraging EgoSuite, our real data collection from the real world, and our real-to-sim pipeline. That's been a great unlock, right? Yes. Our team can spend more time creating SimReady assets and improving those, and less time creating the parts of the scene we're not actually interacting with. Marble can fill in the backdrop that you still need to have. Exactly. You don't want to spend time creating that. Exactly. Yeah. So I would say that's extremely potent. And also in the behavior layer, meaning the data layer, we have Auto DataGen, which leverages our sims, our automated data-collection algorithms, and also world models, for example Cosmos, to augment them and help us provide more data for our customers. So that's also where we use world models. Yeah, I feel like there's a bit of a misunderstanding of everything world models can and will do. Some people are really focused on real-time frame generation, thinking of something like Genie 3 where you're walking around. But then there's Cosmos, which says: hey, I have a robot doing interactions in this kitchen; I want the kitchen to be completely different, but I want the exact same interaction, so we can easily redo it without having to go there or even set up a new sim environment. Or there's World Labs' Marble, where you can just say, I want a scene. We don't need exact replicas most of the time. We just say: I need a kitchen of this theme, or I need a warehouse that looks like this specific kind of scene. You don't need a one-for-one replica of a warehouse, because we're building models that generalize across styles and scenes. I need it working in more than just one kitchen. So that's the great thing, right?
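The "same interaction, different kitchen" augmentation just described has a simple shape: hold one recorded action trajectory fixed and pair it with many scene-appearance variants. Here is a toy sketch of that shape in plain Python; it is not the Cosmos API, and the style knobs are invented for illustration.

```python
from itertools import product
from typing import Dict, List, Tuple

Trajectory = List[str]       # e.g. ["reach", "grasp", "lift"]
SceneStyle = Dict[str, str]  # appearance knobs only; the actions never change

def augment(trajectory: Trajectory,
            styles: List[SceneStyle]) -> List[Tuple[SceneStyle, Trajectory]]:
    """Pair one fixed trajectory with every scene variant.
    Only the scene appearance varies; the interaction is reused verbatim."""
    return [(style, list(trajectory)) for style in styles]

# Cross two appearance knobs to get 3 x 3 = 9 scene variants.
counters = ["marble", "wood", "steel"]
lighting = ["daylight", "warm", "dim"]
styles = [{"counter": c, "lighting": l} for c, l in product(counters, lighting)]

episodes = augment(["reach", "grasp", "lift"], styles)
```

One teleoperated demonstration becomes nine training episodes here, which is the multiplier that makes this kind of augmentation attractive at data scale.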
You can just spin up 100 kitchens in a day, where a 3D artist would take a month or more to create 100 kitchens in Isaac Sim. Totally. All right. So then, just to wrap this up: which part of that stack do you think will look most different, say, two years from now? We have the World, we have the Behavior, we have the Eval. Which one is going to change the most? Where are we going to see a big advance? Or is it going to be all three? Totally. So I would say, first: we introduced the three layers, the World, the Behavior, the Eval. But actually, there will be many more connections among the three, and also between real and sim. We are truly building the flywheel. For example, we started as purely simulation. Then we found out that to scale simulation, we need real data. So we got into egocentric data collection. And now we are building a really scalable pipeline from real to sim: from EgoSuite, from the behavior layer, to the world layer, to the eval layer, which is amazing. But the other thing we are building right now is: how can we best leverage our eval layer? Our eval layer right now provides the evaluation framework, the evaluation process, to the whole industry and to academia, to help them understand where they are. But can we also use our eval layer to help us understand where we are with our World, with our Behavior? That's what we are building right now, which is amazing. We are leveraging our eval layer to benchmark our EgoSuite, meaning using simulation evaluation to help us. Then we get this flywheel: from the real behavior, from the real EgoSuite, we are getting into simulation.
But also from simulation, we are improving our real data collection from EgoSuite. Then we are building the real flywheel. So this is basically my major focus right now: building the real, scalable flywheel, I would say, from real to sim, but also from sim to real. So you would say that... Yeah, go on. I was going to say: in two years, you can't point to one thing that will have stood out, because they all need each other. They all grow together. Yeah. And the second thing I see is that for these three layers going into a loop, there needs to be a center. There needs to be a core. So what is the core? I really see that the core is the agent. The core is AI-native. So basically, we are the infrastructure. We are not data operations. We are not simulation operations. We are the engineering infrastructure, the data infrastructure, the simulation infrastructure of physical AI. And in order to scale this infrastructure, in order to scale data, we need a core agent sitting in the middle that helps us understand the data quality, the simulation quality, and helps us improve it. That, in my opinion, is the other most important thing. So think about this: for egocentric data collection, in my opinion, in five years there could be 100 million to 1 billion people collecting data for robotics. Why? Because there are millions of cars collecting data for Tesla, and that's just self-driving. Self-driving is mainly vision-based. It is not that physics-based. You don't need to touch the world in self-driving. You just need to make sure you don't hit the other cars, you don't hit people; you don't need to make sure that you can touch them. But for robotics, you need to touch everything. So the diversity, the physical parameters, everything is much more difficult.
Robotics is probably a 1,000-times more difficult problem than self-driving, and 1,000 multiplied by 1 million gets you a billion, right? So I really imagine that in five years you need a billion people collecting data for robotics somehow, either actively or passively. And in order to drive a data infrastructure that large, you need to be able to automate everything. Right? So I would say the agentic part is extremely potent, and it's our core right now. Rather than having technical artists, like real human technical artists, we have technical-artist agents. Rather than having a data-quality manager, we have data-quality agents. Those are extremely important in our infrastructure, in our system. So I really see that in the next two years, the one thing is to have this flywheel, to connect real to sim and also connect sim back to real. The second thing is to really strengthen the core, to let agents drive everything. That actually connects well with the first podcast I put out with Kamati Richards. He's ex-Amazon and he's building an operating system for robots. And again, it's that agentic layer. He's saying the only way we're going to do this fast and at scale is to not have a human sitting there doing the menial tasks. We have an agent handle that and we just guide it, because there's enough information out there now. It shouldn't need that one person sitting there touching code necessarily. So that makes sense. I'm hearing that from multiple people in robotics now, saying we'll be interacting with an agentic layer as these ecosystems get built. There will still be a lot of people building in the background, but as an end user, it will no longer be Jonathan saying: hey, I want, I don't know...
I need this kind of environment to test and train on. I'll just tell Lightwheel's agent to handle that, and it will pull the pieces it needs, generate the backgrounds it needs, run the tests and then the benchmark, right? There's not a human sitting there just moving files around all day long. Totally. Yeah. I will say that next time, maybe you can interview my agent. Yeah, yeah. You'll be sipping cocktails somewhere on the beach while your agent does the talking. Yeah, exactly. I'm looking forward to the day when I report to my agent. Yes, report to your agent. Well, that's a good place to wrap up this podcast. Steve, it's been great having you here. I hope people now better understand all that Lightwheel is bringing as far as simulation infrastructure, and how it differs from, say, Omniverse as an infrastructure layer, because they both are; you can't have one without the other. It's really a big ecosystem. And we're not even the whole pie of the chain for building robots. I know people working on hardware who are trying to solve tactile sensors and all those sorts of things, and vision systems, that we all need to work with. I think you've said this at the end of your talks as well: it's not just us, and we're not solving everything. We're trying to solve some of the hardest things we know how to do well, and it's going to take the entire robotics industry to solve this really hard challenge. Exactly. Yeah. We all need to work together to make simulation successful, to make robotics successful. Yep. And as you said, there are billions of hours of data we could collect; we still don't have everything we need. So it's not something one company is going to do alone, but we'll go far all together as an industry. I'm excited for that. Can't wait to talk to your agents and see what they have to say about it.
Thanks for coming on the podcast. We'll do a check-in maybe in six months and see where we're at with this agentic layer. Sounds good. Yeah, thank you, Jonathan. I'm really happy to be here. Yeah, thank you. Well, that's it for this episode. Thank you for listening, and please subscribe on whatever platform you're listening on, because that makes a big difference in growing this show. And as always, I'm going to link all of the project repositories and other topics we brought up in this episode in the show notes so you can find them more easily. This is the Thinking Machine podcast. Thanks for listening. See you in the next one.


