
Daily robotics journey updates
Published on May 5, 2025
What is this? Every day in May, I'll write about what I worked on, key insights, and new questions that came up. Sharing this here helps me stay accountable and forces me to reflect on whether what I'm doing day-to-day is efficient in pursuit of my question for May: do I want to build a robotics company?
A bit sick and learning about point clouds
Tuesday, 27th of May
- I tried continuing to read the book AI for Robotics but couldn't concentrate because of a headache and light fever. The next chapter would have been about LiDAR and methods for analysing point clouds, so I watched this talk instead. It was most interesting for its overview of the typical problems related to point clouds: identifying objects via segmentation, object classification, surface detection, and generating point clouds (e.g. 3D avatars) from a prompt.
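To make the surface-detection problem concrete, here's a minimal sketch using Open3D's RANSAC plane segmentation. The file name and thresholds are placeholders I made up, not anything from the talk:

```python
# Hedged sketch: fit the dominant plane (e.g. a floor or tabletop) in a point cloud
# and split it from the remaining points. Parameters are illustrative.
import open3d as o3d

pcd = o3d.io.read_point_cloud("scan.ply")  # hypothetical LiDAR scan

# RANSAC plane fit: returns the plane coefficients and the indices of inlier points.
plane_model, inliers = pcd.segment_plane(distance_threshold=0.02,
                                         ransac_n=3,
                                         num_iterations=1000)
surface = pcd.select_by_index(inliers)
objects = pcd.select_by_index(inliers, invert=True)  # everything above the surface
print("plane equation ax+by+cz+d=0 with coefficients:", plane_model)
```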
- I talked to an investor whom I was introduced to and who helps connect cofounders. I also met Julian (founder of codarobotics.ai), whom I first got to know via X. He is trying to build a world model and sell it to robotics companies for evaluations.
Flying to San Francisco & starting "AI for Robotics"
Monday, 26th of May
- I'm moving to San Francisco for two months where I'll continue working and learning about robotics together with five other friends in a hacker house.
- I read the first 120 pages of AI for Robotics on the plane. The book is less a complete guide to building robotic foundation models than a medium-depth overview of the subfields feeding into robotics and the most important research in each of them.
Better chess piece detection & code cleaning
Thursday, 22nd of May
- I had 50 more images labelled, and the new model works much better. Not perfect yet, but I can now use this model to pre-annotate new data even faster.
- I cleaned up the code for downloading and cleaning the data, both to make the pipeline reproducible and to make it easier to recover if I lose files.
Board detection
Wednesday, 21st of May
- My labeller finished annotating the first 60 images. It's not 100% perfect yet, but it looks promising and is good enough to wrap a first version of the model into a workflow. Ishrat, my labeller, will continue helping me annotate more data. I paid €50 to get the 60 images annotated, and it's definitely worth the time it saves me. Someone on X suggested using SAM to do the labelling automatically. I'm curious whether that would work, but for now, manual labelling is working well, and I don't want to lose too much time experimenting. I also found more chess piece datasets on Roboflow. If I want the model to generalise across different board types, merging these datasets to train a unified model could be interesting.
- Detecting the chessboard: I started working on detecting the individual tiles so I can build a FEN/PGN representation of the game. This blog post was very interesting. After rewriting parts of the code, I’m starting to get some good results!
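For illustration, here's a hedged sketch of the square-mapping idea in OpenCV: given the four outer board corners, a homography turns piece-detection centres into square names. The corner coordinates, the detection, and the a8-at-top-left convention are all made up for the example, not my actual code:

```python
# Hypothetical sketch: map detected piece centres to board squares via a homography.
# Assumes the four outer board corners are already known (e.g. extrapolated from
# cv2.findChessboardCorners on the inner 7x7 grid) and detections come from YOLO.
import cv2
import numpy as np

def square_of(point, H):
    """Project an image point into board coordinates (one unit per tile) and name it."""
    bx, by = cv2.perspectiveTransform(np.array([[point]], dtype=np.float32), H)[0][0]
    file_idx = min(int(bx), 7)
    rank_idx = min(int(by), 7)
    return "abcdefgh"[file_idx] + str(8 - rank_idx)  # a8 sits at board coordinate (0, 0)

# Homography from image corners (top-left, top-right, bottom-right, bottom-left)
# to an 8x8 board frame.
corners_img = np.array([[120, 80], [860, 95], [880, 790], [100, 770]], dtype=np.float32)
corners_board = np.array([[0, 0], [8, 0], [8, 8], [0, 8]], dtype=np.float32)
H, _ = cv2.findHomography(corners_img, corners_board)

detections = [((485, 430), "white_pawn")]  # illustrative YOLO bounding-box centres
for centre, piece in detections:
    print(piece, "on", square_of(centre, H))
```

From a full set of such square assignments, building the FEN string is then just bookkeeping.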
Chess piece detection
Tuesday, 20th of May
- I'm flying to SF on Monday. I'm not sure whether collecting data here makes sense if I can't use it in SF initially. So the obvious things to work on this week are 1) reading out the chess board and 2) feeding it into a chess engine to get the next move. I'm going to train a simple YOLO model to detect the pieces. I looked into existing datasets and found this one on Kaggle. I trained a YOLO model on it, but it doesn't transfer well to my data. I need to train on more of my own images; maybe playing 1-2 games and photographing every other position from different angles will do. I'll use a freelancer from Upwork whom I've worked with before to do the labelling. I looked at labelling platforms and decided to go with Roboflow (which is primarily a model-training company) to annotate the data, as their interface is fast and downloading the images with labels is easy too. Below is a picture of the predictions of a model trained on a different dataset applied to one of my pictures.
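For reference, the fine-tuning itself is only a few lines with the ultralytics package. This is a hedged sketch, not my exact setup; the checkpoint, dataset file, and hyperparameters are illustrative:

```python
# Minimal fine-tuning sketch with ultralytics. "chess_pieces.yaml" stands in for a
# Roboflow export (images plus YOLO-format labels); settings are placeholders.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")             # small pretrained checkpoint
model.train(data="chess_pieces.yaml",  # dataset config with train/val paths and classes
            epochs=50, imgsz=640)
results = model.predict("my_board_photo.jpg", conf=0.4)  # run on one of my own images
```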
- I met Simon in person at the office today - we originally connected a few weeks ago through my posts on X. He's visiting Berlin and wanted to see the SO-100 robot I built. His family runs a precision steel casting business, which is exactly the kind of company - a European manufacturing SME - that could be my future customer. Talking to him was incredibly valuable: we discussed how companies like his approach their processes, how much robotics they already use, where they struggle, and what they're looking for. Simon is a founder himself and is considering building his next company in robotics. He's one of those rare people who deeply understand both manufacturing and startups, which made the conversation especially insightful. He was also very kind and invited me to visit their factory once I'm back from SF - and even offered to introduce me to other companies in the region. I have no connections to SMEs, so I'm especially happy to have met Simon!
Back in Berlin & work on Substack
Monday, 19th of May
- I spent most of the day working on my Substack post about what I'm aiming to achieve during my one-month robotics sprint. Writing it all out in one continuous piece is an interesting challenge. It's forcing me to connect ideas I've considered individually but never really thought about how they connect.
- In the afternoon, I talked to a friend who's trying to build rockets - actual rockets, trying to compete with SpaceX and Isar Aerospace. Then in the evening, I met two other friends, both with robotics backgrounds, but now working on AI agents. The first conversation left me way more optimistic about my own ideas than the second. The rocket friend made the impossible feel exciting. The robotics guys were encouraging but repeated how hard robotics is. They're not wrong. But something about the way they said it made the whole thing feel… heavy. I've noticed this before. Some conversations give you energy. Others consume it - not maliciously, but by being realistic in a way that starts to feel like inertia. Are those energy-giving conversations just sugar highs? Maybe. Doubt might be more accurate, but it's rarely more useful. Especially when you're still in the phase where the only way to find out is to keep going.
London
Thursday, 15th of May
- I'm trying to understand whether I want to move to Paris or London after the next two months in SF. A friend was celebrating her birthday, which was a good occasion to fly here.
- I met two VCs whom I had met before. One is an old friend working at a Series A & later-stage fund whom I wanted to see again; the other is an early-stage VC I had met only once before and want to get to know a bit before I'm in a position where I'm fundraising.
- Talk: Connecting Robotics and Foundation Models, by Google Robotics, discussing several of their recent papers.
- They're taking a very different approach from Physical Intelligence and are doing more work where the robot plans its tasks explicitly.
- Diversity of data once more trumps quantity: A model trained on 75% of the available tasks performs approximately as well as a model trained on 50% of the data.
- Larger models are less likely to exhibit catastrophic forgetting when trained on new data.
- Talk: Bernt Børnich from 1X Robotics. He also stressed the importance of data. Connecting this with the previous talk's point about data diversity, I wonder whether household robotics with humanoids is so interesting to many companies precisely because of the diversity of data you could collect there.
Symposium Münster
Tuesday & Wednesday, 13th & 14th of May
- I was invited to speak for 45 minutes about recent developments in AI and join a panel. I enjoyed giving a speech with a call to action to start building. You can find the slides here if you're interested. I had agreed to speak a long time ago. After considering how much time this took, I'm not sure if I'd do it again.
Monday, 12th of May - Day off
Zurich Robotics Hackathon
Friday to Sunday, 9th-11th of May
- We were a team of four and attached an SO-100 arm to a Unitree Go2 quadruped, with the ultimate goal of having it retrieve objects in the room. We trained an ACT policy on picking up a tin can and a roll of tape (~60 episodes per item); this worked with ~60% accuracy. We used the OpenAI API to perceive the environment, interact with the user via voice, and trigger the ACT policy when the model detected that the robot was close to the object.
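The perception-to-policy handoff looked roughly like the sketch below. This is a simplified reconstruction, not our actual hackathon code: the model choice, prompt, and policy stub are placeholders, and the voice interaction is left out:

```python
# Rough sketch of the handoff: a vision-language model judges whether the target
# object is within reach, then control passes to the trained ACT policy.
import base64
from openai import OpenAI

client = OpenAI()

def trigger_act_policy():
    # Placeholder: in the real system this ran the trained ACT pick-up policy.
    print("running ACT policy...")

def object_is_close(image_path: str, target: str) -> bool:
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    resp = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[{"role": "user", "content": [
            {"type": "text",
             "text": f"Is the {target} within the robot arm's reach? Answer yes or no."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ]}],
    )
    return resp.choices[0].message.content.strip().lower().startswith("yes")

if object_is_close("camera_frame.jpg", "tin can"):
    trigger_act_policy()
```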
- You can find our demo here: https://x.com/gdb/status/1921963245071475107 Greg Brockman, president of OpenAI, retweeted it, which I thought was super cool!
- I had to dig much deeper into the LeRobot code and learnt a lot. Going forward, I want to spend much more time reading and working with the lower-level code, not just the higher-level API. It was really cool playing around with the Unitree Go2 and the Unitree humanoid, and I again met some very cool people. Interestingly, while each project worked at some point, the demos of all six teams failed in the final presentations!
- Flying to Zurich on Friday, I watched the videos from Sequoia's AI Ascent. I particularly liked Jim Fan's talk on the future of robotics and Bret's talk on agents.
Moving from Phospho to LeRobot code & market research
Thursday, 8th of May
- Today, I wanted to understand the robotics market in terms of size and growth, and in particular how it compares to the AI market. I created a table of current market size, CAGR, and geographic split. I'm not sure how to think about market size estimations and how much they should guide my thinking.
- Previously, I've used Phospho, an interface for the SO-100. It saves you a lot of time, but also hides a lot of the complexity. It was great for getting started, but I want to write my own code and make modifications myself. For this I need the LeRobot code from HuggingFace. It's going to be a bit of work to dig into the codebase, but I already sense it will be very useful for learning how to write code that interacts with a robot.
- I also recorded a new dataset with (1) a context camera and (2) red tape on the SO-100's claw to hopefully make things easier for the ML model. I used Phospho to train an ACT model, but it crashed. These feedback loops are taking too long. I want to train multiple models on my own GPUs concurrently and also need more control over the interface. Another reason to move to the LeRobot codebase.
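As a first step into the codebase, loading a recorded dataset with LeRobot's dataset class looks roughly like this. The HuggingFace repo id is a hypothetical placeholder for my own upload:

```python
# Hedged sketch: inspecting a recorded SO-100 dataset with LeRobot before training.
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

ds = LeRobotDataset("dominique/so100_chess_moves")  # hypothetical repo id
print(f"{ds.num_episodes} episodes, {len(ds)} frames")

frame = ds[0]  # one timestep: camera images, robot state, and the recorded action
print(sorted(frame.keys()))
```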
- One shortcoming of the SO-100, in my opinion, is its short reach (~40cm) and low payload (~300g). I've been keeping my eyes peeled for which robot would be the next hardware level up, with more capabilities. Reading through the LeRobot code, I found out about the Koch v1.1 (building on the original version by Alexander Koch). It costs approximately 668€. I thought this might be an upgrade to the SO-100, but the reach looks similar, and as for lift, I might as well upgrade to the stronger servo motor for the SO-100. This is a cool demo though. As I understand it, the next level beyond the SO-100 is probably the Agilex PiPER (2,699€).
Wednesday, 7th of May - Day off
Pi0 paper, ACT model
Tuesday, 6th of May
- I found this project where someone built two robots playing chess against each other. He seems to be using digital markers overlaid on the video, together with an ACT model, to teach the robot where to lift the pieces. This is the second SO-100 chess project I've seen, in addition to this one.
- The ACT model I started training yesterday crashed. I restarted a new training run; somehow this model isn't working either. Next step: use the context camera and try training a model directly via LeRobot instead of Phospho. I'd like to be able to debug better by seeing the model's predictions and verifying that it has access to the camera. Phospho is cool, but its features are a bit limited at the moment.
- I read the π0 paper from Physical Intelligence. The most interesting things I learnt were:
- To train robotic foundation models, you actually want to pre-train on very diverse and 'messy' data, including mistakes of the robot and how it recovered from them. You need this to teach the robot how to recover from small issues. You then want to fine-tune on fast, clean execution data to teach the model how it should execute in an ideal case.
- LLMs are enabled by internet-scale data to train on. Nothing comparable exists for robotics, so data collection is a huge moat for robotics companies. I wonder if a platform brokering training data among robotics companies would be a viable business? Physical Intelligence used 10,000 hours of pre-training data from 7 robot configurations and 68 tasks; that's about 3.4 years of data collection if you assume 8 hours per day. Further reinforcing the importance of data, they actively ask companies that can collect data to reach out to them.
- Their key results are: (a) pre-training on diverse data improves performance in all but one eval case; (b) fine-tuning for too long on a task can actually lead to performance degradation; (c) the model does not have any 'emergent' properties yet, in the sense that it has a 0% success rate on tasks it hasn't seen; (d) breaking a task into smaller subtask prompts gives the model better overall success rates.
- Idea: Virtually every user working with the SO-100 uploads their data to HuggingFace, where it's publicly accessible. It would be interesting to copy the approach of the π0 paper, extend a VLM by training it on this data, and evaluate whether this leads to faster learning of a task.
Aloha paper + new chess recording
Monday, 5th of May
- I read the ALOHA paper, which states that part of its contribution is the development of a low-cost arm for 20,000€. This struck me as extremely expensive. My SO-100 costs 200€ for a leader and a follower arm, and I was expecting the next quality level to be more like 2,000-3,000€. But the cheapest Universal Robots arm costs 23,500€, and robots by Franka are not that different. How are they so expensive while Unitree's humanoid costs $16,000? It seems like many robotics start-ups are building their own arms to keep costs low, such as Tau Robotics or AQL Robotics. The cheapest arm I found was the Agilex PiPER for 2,500€.
- I decided that by the end of the month, I want to have written a robotics 101 guide summarising everything I've learnt. I started the post, looked into some robotic arm producers, and researched the founding year and total funding of some of the big robotics start-ups.
- The model I trained yesterday doesn't work either. Since I moved offices today, I collected another dataset of 50 e2 -> e4 pawn moves to rule out that the change in lighting is the cause of the error. The new model still doesn't work: it goes through the motions, but doesn't open the claw or touch the chess pieces. After I asked on the Phospho Discord, someone suggested that (a) I might be moving too fast in the training recordings (check out a recording here) and (b) there might be an issue with the contrast between the gripper and the chess piece. This leads me to believe that I may have overestimated the ability of Gr00t and robotic foundation models as a whole. Next, I will train an ACT policy to see whether that works better.
- Ordered a webcam (I hope a second video source will help the ML model learn), plus grip tape and a table mount for the camera.
- Defined goals for this week:
- Learn how to train ACT, Gr00t, and Pi0 models with the SO-100 arm, and do so on a toy example.
- My understanding is that the three methods above are the three big archetypes of robotic learning. I want to read the three papers, understand the differences between them, and summarise this in my robotics 101 blog post.
- Gather ideas for what I want to build this weekend at the robotics hackathon in Zurich.
Questions I developed today
- Will I have to build my own robot to iterate at a low cost and get a robot that is a bit stronger than the SO-100? Robots from companies like Universal Robots and Franka are really expensive!
- The ALOHA paper uses four cameras: two on the grippers, and two filming from the top and the side of the table. How important is the number of cameras for good training results?
First training with the SO-100
Sunday, 4th of May
- I trained a simple Gr00t model to move a pawn from E2 to E4 using 50 samples, but it's not really working yet as you can see here: https://x.com/DominiqueCAPaul/status/1919029034895167952
- I found out that I may not have been training for long enough and that this might be the issue. I started with 10 epochs, then extended to 25, but without success. I'm starting a 50-epoch training run overnight now.