Research Notes on Humanoid Robots

After Elon’s announcement of the Tesla Bot [1], many people mocked the silly on-stage presentation and joked about the forever Level-5 Tesla Autopilot. The last time a humanoid robot caught my eye was Honda’s Asimo, when I was a kid. I saved money to buy the Robosapien toy because it was the only bipedal humanoid robot toy available.

A decade has passed, and the world has given us videos such as Atlas from Boston Dynamics [2]. Where are we with the dream of humanoid robots? Can the Tesla Bot be delivered to happy customers in a reasonable time frame (< 5 years)? And the most crucial question: what commercial value do bipedal robots have now?

Though by no means an expert, I set out to do some research to understand the scope of the problem and the potential answers.

Why

Besides the obvious satisfaction of seeing a man-made object perform the most human-like behavior, walking, there are many more reasons why we need bipedal robots. It is also hard to tell whether these are practical reasons, or whether the technology is simply cool and we are trying to retrofit a reason behind it.

The most compelling answer I have seen so far goes like this: even in the United States, where building codes strongly favor accessibility, many private venues are hard to access on wheels alone. Getting on and off a truck with a lift is a hassle. There may be small steps on the way to your balcony / backyard. More than 10 years after robot vacuums were first introduced, these small wheeled robots still get trapped among wires and chair legs, and cannot vacuum stairs at all. To solve the “last mile” problem, many people believe that legged robots will be the ultimate answer. Among bipedal, quadrupedal and hexapedal configurations, the bipedal ones seem to require the fewest degrees of freedom (which often means fewer actuators, especially high-powered ones), and are thus likely to be more power-efficient.

Many generic tasks we have today are designed for humans. They often require the height of a human, two hands, and two legs. It makes sense for a humanoid robot to perform these generic tasks.

The devil, however, comes from the fact that we have been building machines that can handle high-value and repeatable tasks for over a century. Many of the generic tasks left today either require high adaptability or are low-value in themselves. After a century of standardization, these generic tasks are often both.

Have we reached a point where most high-value tasks are industrialized, and the remaining ones are long-tail, low-value tasks that cumulatively make commercial sense for a humanoid robot to adapt to? There is no clear-cut answer. To make matters even more interesting, there are second-order effects. If humanoid robots succeed in volume, it may become less economical to devise dedicated automated solutions for lower-end specialized tasks. If you need examples, look no further than the Android-based, phone-like devices all around us, from the meeting displays outside your conference rooms to your smart TVs and even the latest digital toys for your babies. That is the economy of scale at work.

How

While people dreamed up humanoid robots that automate day-to-day chores a century ago, it is tricky to say whether we now have all the relevant knowledge to build them. We need to break it down.

1. Do we know how to build bipedals that can self-balance?

It seems that we have known how since Honda’s Asimo [3]. However, Asimo used what is often called an active dynamic walking system. It requires active control over every joint (i.e. every joint requires an actuator). This is not energy efficient.

Most demonstrations of bipedal robots fall into this category, including Atlas from Boston Dynamics. There are a few with some passive joints. Digit from Agility Robotics [4] is one of the known commercial products that tries to be energy efficient by leveraging a passive dynamic walking design.

Do we know how to build bipedals that can self-balance in any circumstances? Most of today’s research focuses on this area: how to make bipedals walk / run faster, how to keep them balanced while carrying different weights, and how to keep them balanced on uneven / slippery / difficult terrain.

For day-to-day chores, a robot is not going to face much difficult terrain, nor will it need to do parkour. From the systems engineering perspective, falling gently would be a better proposition. On the other hand, to have any practical use, a bipedal robot must balance with an unknown weight distribution. Carrying a water bottle probably won’t change the weight distribution much. But what about lifting a sofa?
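To get a feel for the water bottle versus sofa question, here is a rough sketch of how a payload shifts the combined center of mass. It is a one-dimensional, purely static model, and every number in it (robot mass, payload mass, reach, foot length) is a made-up assumption for illustration. Real bipedal controllers reason about dynamics rather than this static check.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PointMass:
    mass_kg: float
    x_m: float  # horizontal offset from the ankle, forward is positive


def combined_com_x(masses: List[PointMass]) -> float:
    """Mass-weighted average of the horizontal positions."""
    total = sum(m.mass_kg for m in masses)
    return sum(m.mass_kg * m.x_m for m in masses) / total


def statically_stable(com_x: float, heel_x: float = -0.08, toe_x: float = 0.16) -> bool:
    """Static check only: is the center of mass above the foot (heel to toe)?"""
    return heel_x <= com_x <= toe_x


# Hypothetical numbers, chosen only to illustrate the point.
robot = PointMass(mass_kg=60.0, x_m=0.02)
bottle = PointMass(mass_kg=0.5, x_m=0.40)    # held at arm's length
sofa_end = PointMass(mass_kg=30.0, x_m=0.50)  # one end of a sofa

bottle_com = combined_com_x([robot, bottle])
sofa_com = combined_com_x([robot, sofa_end])

print(f"with bottle: {bottle_com:.3f} m, stable={statically_stable(bottle_com)}")
print(f"with sofa:   {sofa_com:.3f} m, stable={statically_stable(sofa_com)}")
```

Under these assumed numbers the bottle barely moves the center of mass, while the sofa pushes it past the toes, so the robot would have to lean back or step to compensate. Since the payload mass is not known in advance, it would have to be estimated on the fly.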

2. Do we know how to build robot arms?

We’ve been building robot arms for many decades now. The general trend seems to be towards cheaper and more flexible / collaborative designs. Many of these low-cost products consolidated around successful manufacturers such as KUKA or Universal Robots. New entrants that targeted the low-cost segment, such as Rethink Robotics, had their misses. Acquisitions happened in this space, and KUKA, Rethink Robotics and Universal Robots are all owned by bigger companies now.

However, high precision, high degree-of-freedom robot arms are still rather expensive. A sub-millimeter precision 6 degree-of-freedom robot arm can cost anywhere from 25k to 100k. A UR3e [5] weighs 11kg, with limited payload capacity. Arms with higher payload capacity weigh much more (> 20kg).

For home use, there are fewer constraints on what we can lift: repeatable precision can probably be relaxed to the ~0.1 mm range rather than the ~0.01 mm range. Pressure sensors and pose sensors can be camera-based. We haven’t yet seen a robot arm that meets these requirements and costs around 5k.

3. How does a humanoid robot sense the world?

A humanoid robot needs to sense the environment, make smart decisions when navigating, and respond to some fairly arbitrary requests to be useful.

During the past decade, we’ve come very far on these fronts. Indoor LIDAR sensors [6] have been used broadly in robot vacuums, and the volume, in turn, drove down the cost. Any robot vacuum today can build an accurate floor plan within its first run.

Besides LIDAR, cameras have come a long way, becoming high-resolution and useful in many more settings. They can help guide the last-centimeter grasp [7], sense pressure, or detect materials [8]. The ubiquity of cameras in our technology stack makes them exceptionally cheap. They serve as the basis for many different sensory tasks.

4. How does a humanoid robot operate?

While we have much more knowledge on how to navigate indoor environments [9] to accomplish tasks, we are not nearly as far along on how the human-computer interface is going to work when operating a humanoid robot. Much of the work in this area is based on imitation, i.e. a human performs some tasks and the robot tries to do the same. No matter how good the imitation is, this kind of interaction is fragile because we cannot immediately grasp how well the imitation is going to generalize.

If we show a humanoid robot how to fetch a cup and pour in water, can we be assured that it can do the same with a mug? A plastic cup? A pitcher? What about pouring in coke? Coffee? Iced tea? The common knowledge required for such generalization is vast. But if it cannot generalize, it is like teaching a toddler how to walk - it’s going to be frustrating.

At the other end of the spectrum, we’ve come a long way in giving computers an objective and letting them figure out how. We don’t need to tell our rovers on Mars how to drive - we only need to point in a direction, and they can get there themselves. We also don’t need to tell our robot arms how to hold a cup - we only need to tell them to lift it up without flipping it.

We may be able to compose these discrete yet autonomous actions to accomplish useful tasks. Humanoid robots, particularly those for educational purposes, have offered graphical programming interfaces for a long time [10]. However, I cannot help but feel these are much like touchscreens before the iPhone: they exist and work, but they are in no way a superior method of interaction, nor a productivity booster for getting things done.
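As a toy illustration of what composing such actions could look like, here is a minimal sketch. Everything in it is hypothetical: the Action type, the run_task helper, and the world state are invented for this note and do not correspond to any real robot API. Each action only knows when it is ready and what it changes; the task is just an ordered list of actions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

World = Dict[str, str]  # hypothetical, very coarse world state


@dataclass
class Action:
    name: str
    ready: Callable[[World], bool]   # precondition check
    apply: Callable[[World], World]  # effect, returns a new state


def run_task(actions: List[Action], world: World) -> Tuple[World, List[str]]:
    """Run actions in order and stop at the first one whose precondition fails."""
    done: List[str] = []
    for action in actions:
        if not action.ready(world):
            break
        world = action.apply(world)
        done.append(action.name)
    return world, done


go_to_kitchen = Action(
    "go_to_kitchen",
    ready=lambda w: True,
    apply=lambda w: {**w, "robot_at": "kitchen"},
)
pick_up_cup = Action(
    "pick_up_cup",
    ready=lambda w: w["robot_at"] == "kitchen" and w["holding"] == "nothing",
    apply=lambda w: {**w, "holding": "cup"},
)
pour_water = Action(
    "pour_water",
    ready=lambda w: w["holding"] == "cup",
    apply=lambda w: {**w, "cup": "full"},
)

start: World = {"robot_at": "living_room", "holding": "nothing", "cup": "empty"}
final, done = run_task([go_to_kitchen, pick_up_cup, pour_water], start)
skipped_world, skipped_done = run_task([pick_up_cup, pour_water], start)
print(done, final)
```

The hard part this sketch hides is inside each action: "pick_up_cup" has to work for a mug, a plastic cup or a pitcher without being told how.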

Where

If someone wants to invest early in this space, where should they start?

SoftBank Robotics [11] had been acquiring companies in this space until recently. Their most prominent products are Pepper and Aldebaran’s NAO robot. However, they haven’t had any new releases for some years. The sale of Boston Dynamics doesn’t send a positive signal about their continuing investment in this space.

Agility Robotics [12] is a recent startup focused on efficient bipedal humanoid robots. Their Digit robots are impressive and have been shipping to other companies for experimentation for quite some time. Their earlier Cassie bipedal robots are more open. You can download their models and experiment with them in MuJoCo [13] today. The Digit bipedal robots focus on last-mile (or last-100-feet?) package delivery. This puts them in direct competition with autonomous vehicles and quadcopters. The pitch is about versatility against autonomous vehicles on difficult terrain (lawns, steps, stairs), and efficiency against quadcopters (heavier packages).

Boston Dynamics hasn’t been serious about practical humanoid robots so far. On the other hand, Spot Mini has been shipping world-wide for a while now. The difficulty of building a practical humanoid robot at Boston Dynamics comes from their technical direction. Spot Mini uses electric actuators. Electric actuators are easy to maintain and replace. They do require gearboxes, which can introduce other failure points and latency. However, they can be modular and thus serviceable. Atlas uses hydraulic actuators. While these provide high force with low latency, they are expensive to maintain, and breakages often mean messy oil leaks [14]. It will be interesting to see whether they have an electric-actuator-based variant in the works.

Then come the Chinese companies. Those working in this space are excellent at reducing cost. UBTech’s Alpha 1E robot [15], a direct competitor to the NAO robot in the education space, is 1/18th the cost. HiWonder’s TonyPi / TonyBot [16], a much less polished product, is 1/18th the cost too. It features Raspberry Pi / Arduino compatibility and is thus friendlier to tinkerers.

That said, both companies are far from a practical human-sized robot. While UBTech has been touring their human-sized robot Walker X [17] for a few years now, there is no shipping date and it looks like the Honda Asimo from 15 years ago. The company doesn’t seem to have the software / hardware expertise needed to ship such a highly integrated product. HiWonder gives no indication that it is interested in human-sized robots. While both are cheap, neither seems to be on the right technological path to deliver highly integrated human-sized humanoids.

Unitree, a robotics company focused on legged locomotion, successfully shipped the quadrupedal Unitree Go1 [18] at 1/25th the cost of Spot Mini. Their previous models have seen some success in the entertainment business. While it mostly looks like a Mini Cheetah derivative [19], it is the first accessible commercial product on the market. It remains to be seen if Go1 can find its way into homes, and if so, whether that can help the company fund other quests in the home sector. The company has no official plan to enter the humanoid robot market, and even if they do, the technology requirements would look quite different from those of a Mini Cheetah variant.

Roborock’s robot vacuums [20] have been quietly gathering home data world-wide for some time now. On the software side and the software / hardware integration side, they are quite advanced. Their robot vacuums are generally considered to be smart at navigation. Their well-rounded robots have been gaining market share against iRobot around the world while also steadily rising in price. It has been remarkable to observe how they manage both with better products. They haven’t stated any plan to ship legged robots, let alone a humanoid one. But their hardware is the most widely accessible among the above companies. Their software has been tested in the wild. It seems they would have quite a bit of synergy with the humanoid robot space.

Lastly, there is Tesla. The company hasn’t shipped any legged robots, nor any in-home robotics systems. Tesla Bot looks quite like Honda Asimo in technological direction. However, there are no technical details on exactly how it will work. The best we can guess is that these details are still in flux. That said, Tesla has done a stellar job at system integration when taking their vehicles from zero to one. As we discussed above, we have disparate technologies that can somewhat work, but integrating them well into one coherent product, knowing where to cut features and where to retain the maximum utility, is a challenge waiting for an intelligent team to figure out from zero to one. I wouldn’t be so quick to dismiss the possibility that Tesla can do this again.

This is an incomplete research note from someone who has done nothing significant in the stated space. You should do your own research to validate the claims I made here. Any insights from insiders will be greatly appreciated. Because of the nature of this research note, unlike the other essays I have posted here, I am providing references.

[1] https://www.youtube.com/watch?v=HUP6Z5voiS8

[2] https://www.youtube.com/watch?v=tF4DML7FIWk

[3] https://asimo.honda.com/downloads/pdf/asimo-technical-information.pdf

[4] https://www.youtube.com/watch?v=e0AhxwAKL7s

[5] https://www.universal-robots.com/products/ur3-robot/

[6] https://www.slamtec.com/en/Lidar/A3

[7] https://bair.berkeley.edu/blog/2018/11/30/visual-rl/

[8] https://ai.facebook.com/blog/reskin-a-versatile-replaceable-low-cost-skin-for-ai-research-on-tactile-perception/

[9] https://github.com/UZ-SLAMLab/ORB_SLAM3

[10] http://doc.aldebaran.com/1-14/getting_started/helloworld_choregraphe.html

[11] https://www.softbankrobotics.com/

[12] https://www.agilityrobotics.com/