Daily Blog

Published on May 5, 2025

What is this? Every day in May, I'll write about what I worked on, key insights, and new questions that came up. Sharing this here helps me stay accountable and forces me to reflect on whether what I am doing day-to-day is an efficient way to pursue my question for May: Do I want to build a robotics company?

OpenVLA paper

Wednesday, 7th of May

  • Train a pi0/pi0.5 model.

Pi0 paper, ACT model

Tuesday, 6th of May

  • Read the Pi0 paper. Review the Gr00t paper and compare.
  • The ACT model I started training yesterday crashed. I restarted a new training run.
  • I found this project where someone built two robots playing chess against each other. He seems to be using digital markers on the video together with an ACT model to teach the robot where to lift the pieces. This is the second SO-100 chess project I've seen, in addition to this one.

Aloha paper + new chess recording

Monday, 5th of May

  • I read the ALOHA paper, which states that part of its contribution is the development of a low-cost arm for 20,000€. This struck me as extremely expensive. My SO-100 costs 200€ for a leader and a follower arm, and I was expecting the next quality level to be more like 2,000-3,000€. But the cheapest Universal Robots arm costs 23,500€, and robots by Franka are priced similarly. How are they so expensive while Unitree's humanoid costs $16,000? It seems like many robotics start-ups, such as Tau Robotics or AQL Robotics, are building their own arms to keep costs low. The cheapest arm I found was the Agilex Piper for 2,500€.
  • I decided that by the end of the month, I want to have written a robotics 101 guide summarising everything I learnt. I started the post, did research on some robotic arm producers and researched the founding year and total funding of some of the big robotics start-ups.
  • The model I trained yesterday doesn't work either. I collected another dataset of 50 e2 -> e4 pawn moves because I moved office today and wanted to rule out the change in lighting as the cause of the error. The new model still doesn't work. It goes through the motions, but doesn't open the claw or touch the chess pieces. When I asked on the Phospho Discord, someone suggested that (a) I might be moving too fast in the training recordings (check out a recording here) and (b) there might be an issue with the contrast between the gripper and the chess piece. This leads me to believe that I may have overestimated the ability of Gr00t and robotic foundation models as a whole. Next I will train an ACT policy to see whether that works better.
  • Ordered a webcam, grip tape, and a table mount for the camera. I hope a second video source will help the ML model learn.
  • Defined goals for this week:
    • Learn how to train an ACT, Gr00t, and Pi0 model with the SO-100 arm and do so with a toy example.
    • My understanding is that the three methods above are the three big archetypes of robotic learning. I want to read the three papers, understand the differences between them, and summarise this in my robotics 101 blog post.
    • Gather ideas for what I want to build this weekend at the robotics hackathon in Zurich.
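
One way to test the Discord suggestion that I might be moving too fast in the recordings is to measure how quickly the joints change between frames. The snippet below is only a rough sketch under assumptions: it treats each episode as a plain list of frames with one joint angle per joint, and the frame rate and speed limit are illustrative values I made up, not numbers from Phospho or the SO-100.

```python
def max_joint_speed(positions, fps):
    """Return the largest per-joint speed (angle units per second) in one episode.

    positions: list of frames, each a list of joint angles (assumed format).
    fps: recording frame rate in frames per second.
    """
    if len(positions) < 2:
        return 0.0
    max_step = 0.0
    # Compare each frame with the next one and keep the biggest joint change.
    for prev, curr in zip(positions, positions[1:]):
        for a, b in zip(prev, curr):
            max_step = max(max_step, abs(b - a))
    return max_step * fps


def flag_fast_episodes(episodes, fps, speed_limit):
    """Return the indices of episodes whose peak joint speed exceeds speed_limit."""
    return [
        i for i, episode in enumerate(episodes)
        if max_joint_speed(episode, fps) > speed_limit
    ]


# Hypothetical toy data: two joints, three frames per episode.
slow_episode = [[0.0, 0.0], [1.0, 0.5], [2.0, 1.0]]
fast_episode = [[0.0, 0.0], [5.0, 0.0], [10.0, 0.0]]

# fps=30 and speed_limit=90.0 are illustrative assumptions only.
flagged = flag_fast_episodes([slow_episode, fast_episode], fps=30, speed_limit=90.0)
print(flagged)
```

Episodes that get flagged would be the first candidates to re-record more slowly. A sensible speed limit would have to come from comparing against recordings that are known to train well.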

Questions I developed today

  • Will I have to build my own robot to iterate at a low cost, and have a robot that is a bit stronger than the SO-100? Robots from companies like Universal Robots and Franka are really expensive!
  • The Aloha paper uses four cameras, two of which are on the grippers and another two which are filming from the top and the side of the table. How important is the number of cameras for good training results?

First training with the SO-100

Sunday, 4th of May

  • I trained a simple Gr00t model to move a pawn from E2 to E4 using 50 samples, but it's not really working yet as you can see here: https://x.com/DominiqueCAPaul/status/1919029034895167952
  • I found out that I wasn't training for long enough, which might be the issue. I started with 10 epochs, then extended to 25, but without success. Starting a training run with 50 epochs overnight now.
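
To get a feel for how much training 10, 25, and 50 epochs actually represent, it helps to convert epochs into optimizer steps. This is a back-of-the-envelope sketch, assuming every recorded frame counts as one training sample. The frames per episode and the batch size below are placeholder values, not my real settings.

```python
import math


def optimizer_steps(num_episodes, frames_per_episode, batch_size, epochs):
    """Estimate the number of gradient updates for a training run.

    Assumes each frame is one training sample and that the last,
    partially filled batch of an epoch still counts as a step.
    """
    total_frames = num_episodes * frames_per_episode
    steps_per_epoch = math.ceil(total_frames / batch_size)
    return steps_per_epoch * epochs


# 50 episodes comes from the dataset above.
# frames_per_episode=300 and batch_size=64 are hypothetical placeholders.
for epochs in (10, 25, 50):
    steps = optimizer_steps(
        num_episodes=50, frames_per_episode=300, batch_size=64, epochs=epochs
    )
    print(f"{epochs} epochs -> {steps} optimizer steps")
```

With small datasets, the step count is often a more useful number than the epoch count when comparing against training recipes that are specified in steps.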