The most consequential humanoid release of this week was not a new robot. It was a model checkpoint, a Hugging Face URL, and a license file.
On April 28, NVIDIA shipped Isaac GR00T N1.7, its open Vision-Language-Action (VLA) foundation model for humanoid robots, under the Apache 2.0 license: weights live on GitHub and Hugging Face the same day, with an announcement thread on the NVIDIA developer forums. The underlying research shipped as EgoScale, an arXiv paper that contains, almost in passing, what NVIDIA is calling the first dexterity scaling law.
The rest of the humanoid industry has spent six years arguing about which sensor stack, which actuator topology, which OEM’s hands matter. NVIDIA is now arguing the answer to all three is none of them — what matters is how many hours of strangers’ kitchen-counter, factory-floor, and surgical-tray GoPro footage you have, and the function relating that number to robot dexterity is now published, log-linear, and free.
What was actually released
Per the Hugging Face model card, the GitHub repo, and NVIDIA’s developer-forum thread:
- License: Apache 2.0. Full commercial use: material handling, packaging, inspection, anything else; Apache 2.0 carries no field-of-use restrictions. No per-seat fee, no usage gate, no NVIDIA Enterprise contract required to ship a robot running it.
- Backbone: Cosmos-Reason2-2B vision-language model, replacing N1.6’s earlier reasoning stack. Same 2B-parameter envelope.
- Pretraining data: EgoScale, 20,854 hours of human egocentric video spanning 20+ task categories, “from manufacturing and retail to healthcare and home environments.” That is more than 20× larger than all prior published human-to-robot policy-transfer datasets combined.
- Architecture: Flow-based VLA policy. Pretrained on the ~20K hours of egocentric video with a human wrist-and-hand action-prediction objective, then trained on a mixture of that video and diverse robot demonstration data. Output is direct torque from pixels and state history, the same end-to-end architecture Figure shipped with Helix 02 in March.
- Validated platforms: Unitree G1, AGIBot Genie 1, and the YAM bimanual tabletop manipulator. Three different chassis, three different actuator counts, three different gripper topologies — same checkpoint, no per-platform fine-tune required to get baseline performance.
- Hand spec: 22 degrees of freedom per hand, with finger-level control targeting contact-rich tasks like small-parts assembly.
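For orientation, here is the minimal “pull the weights” path. The `snapshot_download` call is the standard Hugging Face API; the repo ID is an assumption based on the release naming, and the commented control loop below it is a sketch of what the model card describes (pixels plus state history in, torques out), not a confirmed interface:

```python
# Fetch the released weights (standard Hugging Face API; the repo ID
# "nvidia/GR00T-N1.7" is assumed from the release naming, not verified).
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="nvidia/GR00T-N1.7")

# Hypothetical control loop per the model card: RGB frames plus
# proprioceptive state history in, direct joint torques out.
# policy = load_policy(local_dir)                 # hypothetical loader
# while robot.ok():
#     obs = {"rgb": camera.read(), "state": robot.state_history()}
#     torques = policy.act(obs)                   # flow-based VLA head
#     robot.apply_torques(torques)
```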
The Apache-2.0 piece is the line everyone is going to skip past and shouldn’t. NVIDIA could have gated this behind Enterprise, behind Omniverse Cloud, behind a per-robot royalty. Instead it shipped the same license you find on Kubernetes and the Apache web server. Every humanoid OEM in Shenzhen, Sunnyvale, Tokyo, and Munich woke up Wednesday to a baseline competitor brain that costs zero and can be forked.
The scaling law
The scientific contribution that’s going to outlive the model file is the EgoScale paper. The headline plot puts hours of human egocentric video on a log-scaled x-axis against average completion rate on a 22-DoF dexterous manipulation benchmark on the y-axis. The line is clean and log-linear from 1,000 hours to 20,854 hours, with no flattening visible at the right edge.
Concretely:
- 1,000 hours of human GoPro footage: baseline dexterity score on the benchmark.
- 20,000 hours: completion rate more than doubles.
- Validation loss on human wrist/hand action prediction: follows a clean log-linear relationship with data volume. NVIDIA’s claim is that this loss extrapolates predictably as the hours scale, and that the loss correlates with real robot performance on long-horizon tasks.
This is the same shape as the GPT-3-era language-model scaling laws (Kaplan et al. 2020). Same log-linear curve, same no-asymptote-yet, same “throw more data at it and the metric keeps climbing.” Except instead of throwing in more Reddit and arXiv tokens, NVIDIA is throwing in more egocentric human video.
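For concreteness, here is the claimed shape reconstructed from the two data points above, where $\alpha$ is the 1,000-hour baseline completion rate. The coefficients are illustrative back-of-envelope values, not the paper's published fit:

$$S(h) = \alpha + \beta \log_{10}\frac{h}{1{,}000\ \text{hr}}, \qquad S(1{,}000) = \alpha$$

$$S(20{,}854) \ge 2\alpha \;\Longrightarrow\; \beta \ge \frac{\alpha}{\log_{10} 20.854} \approx 0.76\,\alpha$$

Read literally: every tenfold increase in egocentric hours buys roughly three-quarters of the baseline completion rate again, for as long as the line stays straight.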
If that scaling holds past 20K hours (which the paper claims the loss curves predict it will), then the next metric that matters in humanoid robotics is exactly one number: how many hours of egocentric human video you can point at the model. Compute is not the bottleneck (NVIDIA sells the compute). Hardware is not the bottleneck (Apache 2.0 means every OEM gets baseline parity). Data is the bottleneck. Whoever can capture, buy, or scrape the most ego-video wins the next round.
Why the platform list matters
The model card validates GR00T N1.7 on three platforms specifically:
- Unitree G1 — the Chinese mass-market humanoid for which Unitree is targeting 20,000 shipments in 2026, ahead of its STAR-market IPO. Compact chassis, common gripper topology.
- AGIBot Genie 1 — Shanghai-based AGIBot’s general-purpose humanoid, the platform that crossed 10,000 units in late March. Different actuator count, different gripper.
- YAM bimanual — a tabletop dual-arm research rig, a completely different form factor from the other two.
Three radically different physical platforms, one checkpoint, baseline performance on all three out of the box. That’s the cross-embodiment claim NVIDIA has been building toward since Isaac GR00T N1 at GTC 2024, and N1.7 is the first version where the validation matrix is wide enough to call the claim shipped.
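What that looks like from the integrator’s seat, as a hypothetical sketch: the `Gr00tPolicy` class, the `embodiment` argument, and the tag strings below are invented for illustration, not NVIDIA’s confirmed API. Only the pattern (one checkpoint, per-platform embodiment metadata, no retraining) comes from the model card.

```python
# Hypothetical sketch: one checkpoint, three embodiment configs.
# `Gr00tPolicy`, `embodiment=`, and the tag strings are illustrative
# names, NOT a confirmed NVIDIA API.
from gr00t_sketch import Gr00tPolicy  # hypothetical wrapper module

CHECKPOINT = "nvidia/GR00T-N1.7"  # assumed Hugging Face repo ID

# Joint count, gripper topology, and camera layout are declared as
# per-platform metadata; the weights themselves are shared.
for tag in ("unitree_g1", "agibot_genie1", "yam_bimanual"):
    policy = Gr00tPolicy.from_pretrained(CHECKPOINT, embodiment=tag)
    obs = policy.dummy_observation()   # pixels + proprioceptive state
    torques = policy.act(obs)          # direct torque output per joint
    print(tag, torques.shape)          # shape differs per actuator count
```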
For the OEM side, the implication is brutal in a useful way: the differentiator is no longer “we have a model” or “we have a chassis.” Everyone has a model now. The differentiator is whose hands can take advantage of a 22-DoF dexterity ceiling. That collapses the hardware competition onto a handful of axes: actuator count, finger sensor density, and BOM cost per DoF. Whoever ships the cheapest 22-DoF hand at scale is the platform that gets the GR00T fleet upgrade for free, because the model already handles 22 DoF natively.
What this does to the China–US race
Fold this into the China humanoid forecast TrendForce published earlier this month: Chinese humanoid output is expected to surge 94% in 2026, with Unitree and AGIBot together capturing roughly 80% of global shipments. NVIDIA validating GR00T N1.7 on both the Unitree G1 and the AGIBot Genie 1 is, charitably, a developer-relations decision; uncharitably, it’s NVIDIA recognizing where the volume actually is and shipping the model that maximizes the addressable robot population.
The US side: Figure has Helix 02 and won’t be running GR00T (different stack, different vertical-integration thesis). Tesla Optimus runs Tesla’s own foundation model. Boston Dynamics’ Atlas is moving onto its own NVIDIA-backed pipeline but is committed for 2026 and 2027 to Hyundai and Google DeepMind. Apptronik’s Apollo runs its own learned controllers.
So the GR00T fleet, as of April 28, looks like this: large Chinese mass-market chassis on a US-published Apache-2.0 brain, with a smaller US research-rig validation. Anyone in Europe or Korea standing up a humanoid OEM in 2026 can now skip the foundation-model build entirely and ship N1.7 from week one. The cost of participating in the humanoid race just dropped to zero. The cost of winning it is now exactly the cost of acquiring more hours of egocentric human video than your competitors.
What LostJobs is watching
- Whether the EgoScale loss curve actually holds past 20K hours. NVIDIA claims it will. The first independent group to push GR00T N1.7 through another 50K or 100K hours of egocentric data and report whether the dexterity benchmark keeps climbing log-linearly will settle whether this is GPT-3-style scaling or whether the curve flattens the way plenty of vision-model scaling laws have. (A sketch of that extrapolation check follows this list.)
- Whether the next wave of humanoid OEM funding rounds prices in “no foundation-model moat.” Figure raised at a $39B valuation partly on the bet that Helix is differentiated. If the market reads N1.7 as good enough, the next Series C for any non-vertically-integrated humanoid will have to justify the model differentiation in dollar terms. Watch Skild AI, Physical Intelligence, and any of the smaller Chinese players raising in Q2.
- Whether the egocentric-video data market becomes a thing. If hours of GoPro footage are the new GPU shortage, expect a market to form: data brokers selling work-context egocentric video to humanoid trainers, the way Scale AI sold annotation. The first $50M+ funding round for an “egocentric video for robots” startup will be the signal.
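A minimal version of the extrapolation check from the first bullet, assuming access to held-out benchmark scores at each data scale. The (hours, completion-rate) points below are placeholders chosen to match the shape the paper describes, not numbers from EgoScale:

```python
# Fit the published log-linear trend and ask what it predicts at
# 50K/100K hours. Data points are illustrative placeholders, NOT
# numbers from the EgoScale paper.
import numpy as np

hours = np.array([1_000, 2_000, 5_000, 10_000, 20_854])
score = np.array([0.21, 0.27, 0.35, 0.41, 0.47])  # placeholder rates

# Linear fit in log10(hours): score ~ alpha + beta * log10(h)
beta, alpha = np.polyfit(np.log10(hours), score, deg=1)

for h in (50_000, 100_000):
    pred = alpha + beta * np.log10(h)
    print(f"{h:>7,} hrs -> predicted completion rate {pred:.2f}")

# If real measurements at 50K+ hours land well below these predictions,
# the curve is flattening and the "GPT-3-style" read is wrong.
```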
The dry coda: the most-shared thing on Wednesday morning was not the GR00T release or the Levie op-ed. It was a one-line tweet from a robotics engineer who downloaded the GR00T-N1.7 weights at 4:23 AM Pacific, ran the bimanual coin-flip benchmark on a YAM rig in his garage by 7:15 AM, and posted: “Three years of my dissertation just got obsoleted by a model card.” The reply with the most likes was: “At least it’s Apache 2.0. Last week it would have cost you a per-seat license to be obsoleted.”