AI State 2026

Notes to catch up on the foundational layer

Alan Arguello · December 31, 2025

Since June 2025 I have tried to catch up with advances in AI. And I say "tried" because, given how fast the market evolves, the Foundational Layer and the Application Layer are two things that each require a huge amount of time on their own.

For quick context: I worked two years as a Software Engineer between 2020 and 2022, before the LLM era. Back then my life was about learning as much software engineering as possible, every single day. Since then I have taken a more passive role on the technical side and shifted toward being more of a generalist.

I use LLMs daily (ChatGPT, Gemini, Claude, etc.), but the technical depth I was operating at was no longer the same. So even though I had context, by this point I was rusty and not up to date on many things.

In the last few months, these were my main sources to get back up to speed:

  • Doing vibe coding more actively, especially with tools like v0 or Lovable, and with IDE assistants like Codex, Claude Code, Cursor, etc.
  • Starting a WhatsApp group with engineers I knew, to understand what they were thinking.
  • Building a product in the application layer (in my case, real time interactive digital avatars).
  • Attending and organizing hackathons in San Francisco.
  • Consuming more opinions from researchers at the bleeding edge (Karpathy, Fei-Fei Li, Yann LeCun).
  • Podcasts with technical guests and themes (Dwarkesh, Lex Fridman, a16z).
  • Spending time inside the X (Twitter) bubble, which is where you learn about most things first.

I should also clarify that a large part of my learning focus was on training and data, because the initial motivator was evaluating the feasibility of starting a data labeling company. To do that, I needed to better understand the real need, the current state, and where the market could be heading.

I am still on this learning path, so any suggestions or feedback are welcome.

Quick map: what I learned (high level)

If you only read one section, make it this:

  • The foundational layer is no longer "just models". It is chips + data centers + power + data pipelines + training recipes.
  • Text was a lucky dataset. As soon as you leave text (audio, video, robotics, enterprise), data becomes expensive and constrained.
  • Energy and grid connectivity are becoming first-order constraints, not footnotes.
  • Evaluation is turning into its own bottleneck. We can build stronger models faster than we can measure what "stronger" means.
  • In robotics and the physical world, the bottleneck is usually not cleverness. It is experience, operations, and data.

The base: Foundational Layer

I will split this piece into my learnings on the Foundational Layer, and leave the application layer for a second part.

A simple definition that helped me:

  • Foundational Layer: the stack that makes frontier models possible (compute, infrastructure, energy, data, training, evaluation).
  • Application Layer: everything built on top (products, workflows, integrations, distribution, UX, pricing, trust, compliance).

How did we get here?

AI has gone through many waves. Sometimes there was excitement, and many other times it cooled down because the promise felt far away in practical terms.

Two milestones that are worth remembering:

  1. ImageNet: in the late 2000s ImageNet was formalized as a massive, hierarchical dataset, built largely through large scale collection and labeling using Amazon Mechanical Turk.
  2. AlexNet (2012): a model that unintentionally became a symbol of a very concrete advance. Training deep nets on GPUs instead of CPUs changed the performance game.

Still, as useful as these advances were for the field, they did not explode into the mainstream because they felt like lab progress, not something that millions of people would suddenly use.

Fast forward to November 2022: OpenAI launches ChatGPT, and mass adoption makes the topic unavoidable.

Part of what is interesting is that ChatGPT takes advantage of an earlier breakthrough: the 2017 paper Attention Is All You Need, which proposes the Transformer architecture. On top of that, it became increasingly clear what people summarize as scaling laws: with more compute, more data, and the right setup, models tend to improve in a fairly predictable way.
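
To make "fairly predictable" concrete: the scaling-laws paper in the references (Kaplan et al., 2020) fits test loss as a power law in model size and data. A minimal sketch of that form, with the paper's rough exponents quoted only to show the shape, not as exact constants:

```latex
% L = test loss, N = non-embedding parameters, D = training tokens.
% N_c and D_c are fitted constants; Kaplan et al. report roughly
% \alpha_N \approx 0.076 and \alpha_D \approx 0.095 for their setup.
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}
\qquad
L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}
```

The practical reading: each doubling of parameters or data buys a roughly constant fractional reduction in loss, which is why ever-larger training runs could be planned with some confidence.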

The pillars driving model development today

At a high level, this is how I see it:

  • Chips (compute hardware)
  • Infrastructure (data centers, networks, deployment)
  • Energy (to power all of that)
  • Data (and the evaluation and labeling layer that makes it useful)
  • Training algorithms (Transformers and whatever comes next)

There is also a transversal pillar: talent. It is not a minor detail; it hangs over every other pillar.

A mental model that made this easier for me:

  • Chips are the engine.
  • Data centers are the factory.
  • Energy is the fuel.
  • Data is the raw material.
  • Training algorithms are the recipe.
  • Evaluation is the test suite.
  • Talent is the constraint across all of it.

Challenges by pillar

1) Chips

The biggest structural bottleneck is the concentration of leading edge manufacturing in Taiwan, particularly at TSMC. Many summaries cite that a very large share of leading edge production is concentrated there. The exact number is less important than the direction: it is a supply chain choke point.

That introduces a massive geopolitical risk. No one can stop depending on that overnight.

A practical detail that is easy to miss: "chips" is not just the GPU. It is also memory (HBM), packaging, interconnect, and the overall system design that makes training scale. In other words, compute is a supply chain, not a SKU.

On the types of chips:

  • For training, GPUs have been the de facto standard. NVIDIA, for example, positions the H100 as an accelerator for LLM and Transformer workloads.
  • For inference, there is a whole world of optimization. TPUs and specialized architectures aim to improve cost and performance for serving.
  • In parallel, startups like Groq have tried to attack inference with custom hardware and compilation (LPUs). In December 2025 Groq announced a licensing agreement with NVIDIA for its LPU technology.

Key takeaway: Compute is not only about "how fast is the chip". It is about how reliably you can get it, cluster it, and feed it.

Open question: Does inference become a commodity (cheap, abundant), or does it remain strategically scarce because it is still gated by power and supply chain?

2) Infrastructure (data centers and deployment)

Once you have chips, you need to put them into clusters and operate them. Here the challenge is brutally physical:

  • There are not enough chips.
  • There is not enough data center capacity at the pace demand requires.
  • And above all, there is not enough energy and electrical connectivity ready on time.

Data centers are not just "rent racks". They are land, permits, cooling, transformers, networking, logistics, and long lead times. One of the most important constraints is simply: can you get enough power to the site, fast enough, with reasonable reliability?

Two families of ideas that are emerging:

  • Compute off Earth: proposals exploring data centers in space or even more extreme concepts. In 2025 there were reports about plans for data centers on the Moon, more as a signal of direction than something immediate.
  • Alternative compute markets: from decentralized networks like Akash to marketplaces like Vast.ai that aim to monetize GPUs and idle capacity.

There is also the edge angle: running inference closer to the user, or closer to the physical world (factories, cameras, robots). Microsoft, for example, describes Azure Extended Zones as small footprint extensions of Azure for low latency or data residency needs.

My read is that infrastructure is entering an industrial mode. Less software only, more real world constraints.

Key takeaway: The next bottleneck after "buy GPUs" is often: power + cooling + deployment reality.

Open question: Will the winning infrastructure play be hyperscalers only, or will new markets (decentralized, regional, vertical) meaningfully absorb demand?

3) Energy

Data centers imply energy. And the magnitude is no longer a footnote.

Data centers are already a relevant electricity consumer at the scale of entire countries, with growth driven by digitalization and AI. Beyond any single number, what matters is the trend: demand is growing faster than the ability to connect and distribute it efficiently.

From my side, returning to electrical engineering for a moment: in many countries the grid carries technical losses (materials, poor configuration, inefficiencies) and non-technical losses (theft, lack of metering, little to no optimization in homes and businesses). The smart grid dream has existed for years, but implementing it well is extremely hard.

What seems interesting to me:

  • Fusion, with companies like Helion.
  • Optimization, through sensors, control, fine grained measurement, and software to reduce waste.
  • In the short term, more boring but necessary electrical infrastructure like transmission, substations, and interconnection.
  • Also worth watching: anything that reduces time-to-power (permits, grid interconnect, capacity planning). That is not sexy, but it matters.

During an xAI hackathon in December, I heard a phrase from Jimmy Ba, xAI cofounder, that stuck with me: "If we had infinite compute and energy, we would be much further ahead." It captures the bottleneck.

Key takeaway: Energy is not only "how much power exists". It is how fast you can connect it, where, and at what reliability.

Open question: Will power constraints push more training to regions with cheap energy, and more inference to the edge, or will the stack remain centralized?

4) Data (and evaluation)

With energy and compute, you train. And then you hit data.

The reason LLMs advanced so fast is that text is an extremely abundant data type. But the problem becomes obvious as soon as you leave text, or as soon as you need more private or real world data (manufacturing, banking, sensors, robotics).

A framing that helped me: outside text, data is often locked, not missing.

  • Locked behind privacy, contracts, and incentives.
  • Locked behind operations (collecting it is labor).
  • Locked behind instrumentation (you need sensors, pipelines, QA).
  • Locked behind evaluation (you need to know what "good" looks like).

Three typical paths:

  1. Synthetic data
  2. Models that need less data
  3. Collecting and labeling new data

4.1 Synthetic data and model collapse

Synthetic data helps, but it carries a real risk: model collapse. Training new models on data generated by previous models can degrade the distribution, collapse diversity, and amplify artifacts.

In simple terms, if you only feed yourself your own answers, you end up living at the center of your own bell curve, and the tails disappear.

My practical takeaway: synthetic data is most useful when you can anchor it with reality (see the sketch after this list), for example:

  • You generate candidates synthetically.
  • You verify, filter, or score them against something real (tests, simulators, human experts, or ground truth data).
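
A minimal Python sketch of that generate-then-verify loop. The generate_candidates stub is a hypothetical stand-in for a model call (hard-coded here so the example runs); the anchor to reality is the ground-truth test each candidate must pass.

```python
# Sketch: generate candidates synthetically, keep only what survives a real check.

def generate_candidates(prompt: str, n: int) -> list[str]:
    # Hypothetical stand-in for a model call; hard-coded so the sketch runs.
    return ["def solve(x):\n    return x * 2",
            "def solve(x):\n    return x + 2"][:n]

def passes_ground_truth(candidate: str, tests: list[tuple[int, int]]) -> bool:
    # The "reality" anchor: the candidate must reproduce known input/output pairs.
    scope: dict = {}
    try:
        exec(candidate, scope)  # candidate is expected to define solve(x)
        return all(scope["solve"](x) == y for x, y in tests)
    except Exception:
        return False

def build_synthetic_set(prompt: str, tests: list[tuple[int, int]], n: int = 50) -> list[str]:
    # Only verified candidates make it into the training set.
    return [c for c in generate_candidates(prompt, n) if passes_ground_truth(c, tests)]

print(build_synthetic_set("double the input", tests=[(1, 2), (3, 6)]))
```

The same shape works with simulators, human experts, or real measurements as the verifier; the verifier is what keeps the synthetic distribution from drifting away from reality.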

4.2 Models that need less data

This is the most interesting path conceptually, but also the hardest. The idea that a system can learn a lot from little data, or through self play (like AlphaGo), is powerful, but not trivial to generalize.

A lot of the difficulty is not the idea, it is the environment. Games have clean rules and fast feedback loops. Real life is noisy, slow, and expensive.

4.3 Collecting and labeling data

This is the expensive but direct path. Millions of new examples, high quality, with good evaluation control.

The market has evolved:

  • Before, the narrative was crowd labor.
  • Today, for many frontier problems, what matters is expertise (or at least a smaller group of well-trained specialists, with strong QA).

There is also the topic of RL and verification:

  • RLHF became the popular term.
  • But for certain domains, you can use verifiable feedback, framed as Reinforcement Learning with Verifiable Rewards (RLVR); a minimal sketch follows this list.
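
A minimal sketch of what "verifiable" means in practice, assuming a toy output format (a final "Answer: ..." line, which is an assumption of this example rather than any standard): the reward is a programmatic check instead of a learned preference model.

```python
import re

def verifiable_reward(model_output: str, ground_truth: str) -> float:
    # Reward is 1.0 if the final "Answer: ..." line matches ground truth, else 0.0.
    match = re.search(r"Answer:\s*(.+?)\s*$", model_output.strip(), re.MULTILINE)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == ground_truth.strip() else 0.0

# Such a reward can drive an RL loop with no human labeler in the inner loop,
# which is why math and code are natural first domains for this.
print(verifiable_reward("Working it out step by step...\nAnswer: 42", "42"))  # 1.0
```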

Finally, there is a challenge that feels underestimated: evaluating reasoning. Not only does the final answer matter, the path does too. And in many real domains, there is real human ambiguity.

Key takeaway: Data is not only "more examples". It is often collection + incentives + QA + eval design.

Open question: Do we get a breakthrough in learning with less real data, or do we mostly solve it by building better pipelines to collect and verify real world data?

5) Training algorithms

The Transformer was the big breakthrough. But beyond optimizations and variations, there has not been another equivalent leap that changed the base architecture of the entire industry.

What has happened is refinement in a few directions:

  • Scaling efficiently, with approaches like mixture of experts (a toy routing sketch follows this list).
  • Making inference cheaper, with better decoding, distillation, quantization, and system-level optimizations.
  • Exploring alternatives, such as state space models like Mamba.
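
To make the mixture-of-experts idea concrete, here is a toy top-k routing sketch in plain NumPy. It only illustrates the core trick: each token touches just k of the n experts, so parameter count grows without per-token compute growing in proportion. Real MoE layers (Switch Transformers, cited in the references) add load balancing and capacity limits, and live inside a Transformer block; none of that is shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2

# Each "expert" is just a random linear map in this toy.
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]
router = rng.normal(size=(d_model, n_experts))

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route one token vector to its top-k experts and mix their outputs."""
    logits = x @ router                    # one score per expert, shape (n_experts,)
    top = np.argsort(logits)[-top_k:]      # indices of the k highest-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()               # softmax over only the chosen experts
    # Only k of the n experts do any work for this token; the rest of the
    # parameters exist but stay idle, which is the whole point.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.normal(size=d_model)
print(moe_forward(token).shape)  # (16,)
```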

A useful simplification: training is not just pretraining anymore. A lot of capability and product usefulness comes from post-training (instruction tuning, preference optimization, RL, tool use). So "algorithms" includes the whole training recipe, not only the base architecture.

The consensus seems to be that doing the same thing but bigger eventually hits walls like data, energy, evaluation, and cost.

Key takeaway: The next gains are often less about a new architecture and more about better recipes, better systems, and better evaluation.

Open question: Is there another architecture-level shift coming, or do we get most progress from scaling and refinement until constraints force a new paradigm?

General challenges

1) Benchmark overoptimization and the evaluation crisis

We optimize for benchmarks because they are easy to measure and market.

  • Chatbot Arena influences public perception.
  • SWE-bench attempts to measure real software engineering capability.
  • LiveCodeBench focuses on contamination resistant evaluation.
  • ARC-AGI raises the question of how to measure generalization.

High scores do not equal intelligence in a broad sense.

The deeper issue: benchmarks are static targets in a dynamic system. Once everyone knows the test, the ecosystem starts training against the test, directly or indirectly (including via contamination). That does not mean progress is fake, it means measurement gets harder.

A mental model that helped me: evals are like unit tests (a toy harness after this list makes the analogy concrete).

  • If your unit tests are bad, you can ship broken software while looking green.
  • If your evals are weak, you can ship a model that looks strong on paper but fails in real use.
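
A toy version of the analogy, with a hypothetical call_model standing in for whatever model is being evaluated: each eval case is a prompt plus a programmatic check, exactly like a unit test, and weak checks make the suite look green without telling you much.

```python
def call_model(prompt: str) -> str:
    # Hypothetical stand-in for the model under evaluation; hard-coded so the sketch runs.
    return "4" if "2 + 2" in prompt else "I don't know"

EVAL_CASES = [
    ("What is 2 + 2? Reply with just the number.", lambda out: out.strip() == "4"),
    ("What is the capital of France? One word.",   lambda out: "paris" in out.lower()),
]

def run_evals() -> float:
    # Each case is prompt + programmatic check, exactly like a unit test.
    passed = sum(check(call_model(prompt)) for prompt, check in EVAL_CASES)
    return passed / len(EVAL_CASES)

# Weak checks can keep this "green" while missing what actually matters in use.
print(f"pass rate: {run_evals():.0%}")
```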

2) Monetization

Money is being burned because compute is expensive and the race is competitive.

Labs are pushed to build product surfaces:

  • Subscriptions
  • APIs
  • Media products like Sora

Monetization is not just about profit. It is a requirement to fund training, infrastructure, and talent.

One simple way to frame it: every lab is trying to solve a moving target of unit economics while still racing on capability. That pressure shapes product choices.

Large World Models

A point from Fei-Fei Li captures the idea well: language is generative, but worlds follow complex physical rules, and representing a world consistently is far harder than modeling a one-dimensional signal like language.

This is why world models, simulation, and learning dynamics matter.

When people say "world models", I interpret it as: models that can represent how a world evolves, not only what it looks like. Ideally:

  • They can simulate counterfactuals (what happens if I do X?).
  • They are interactive (not only passive video generation).
  • They preserve consistency over time (objects, physics, causality).
  • They can be used for planning. (A toy interface with these properties is sketched below.)
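
A toy sketch of the interface those properties imply, with a trivial hand-written dynamics function standing in for the learned model. The physics here is an assumption for illustration only; the point is the predict / rollout / plan shape.

```python
from dataclasses import dataclass

@dataclass
class State:
    position: float
    velocity: float

def predict_next(state: State, action: float, dt: float = 0.1) -> State:
    """A learned model would go here; this toy just integrates simple dynamics."""
    velocity = state.velocity + action * dt
    return State(position=state.position + velocity * dt, velocity=velocity)

def rollout(state: State, actions: list[float]) -> list[State]:
    """Counterfactual: what happens if I take this sequence of actions?"""
    trajectory = [state]
    for a in actions:
        trajectory.append(predict_next(trajectory[-1], a))
    return trajectory

def plan(state: State, candidate_plans: list[list[float]], target: float) -> list[float]:
    """Pick the action sequence whose predicted end state lands closest to the target."""
    return min(candidate_plans,
               key=lambda acts: abs(rollout(state, acts)[-1].position - target))

start = State(position=0.0, velocity=0.0)
best = plan(start, candidate_plans=[[1.0] * 10, [0.5] * 10, [-1.0] * 10], target=0.05)
print(best[:3])
```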

Recent signals:

  • DeepMind with interactive world models (Genie 3)
  • Meta with predictive video representations (V-JEPA)
  • NVIDIA with physical AI foundation models (Cosmos)

The promise is reducing real world data needs through simulation. The risk is simulations that are too clean to reflect reality.

Key takeaway: If we can simulate better, we can learn faster. But simulation quality is the real bottleneck.

Robotics

Robotics is advancing, but it still hits the classic bottleneck of data and real world experience.

Progress often requires:

  • Teleoperation or demonstrations
  • Trajectory datasets
  • Evaluation in noisy real conditions

Large datasets like Open X-Embodiment aim to scale learning across robots and tasks.

Many argue that in robotics, data and experience matter more than compute. I interpret it as: compute helps, but there is no shortcut to messy reality.

World models could reduce the marginal cost of collecting demonstrations, but in the short term robotics remains AI plus operations plus hardware plus data.

Key takeaway: Robotics is where the foundational layer constraints collide: data collection is expensive, evaluation is hard, and deployment is physical.

Closing thoughts

If I had to summarize my current model of the foundational layer in one line:

AI progress is still fast, but increasingly gated by the physical world (chips, power, deployment) and by measurement (data quality and evaluation).

References

  1. OpenAI, "Introducing ChatGPT" (Nov 30, 2022)
  2. Vaswani et al., "Attention Is All You Need" (2017)
  3. Kaplan et al., "Scaling Laws for Neural Language Models" (2020)
  4. Deng et al., "ImageNet: A Large-Scale Hierarchical Image Database" (CVPR 2009)
  5. Krizhevsky et al., "ImageNet Classification with Deep Convolutional Neural Networks" (NeurIPS 2012)
  6. IEA, "Electricity 2024"
  7. Groq and NVIDIA Licensing Agreement (Dec 16, 2025)
  8. Groq LPU Inference Engine
  9. Google Cloud TPU
  10. U.S. Trade Administration, Taiwan Semiconductors Guide
  11. NVIDIA H100 GPU
  12. Reuters, Lunar Data Center Plans (May 2025)
  13. Akash Network
  14. Vast.ai
  15. Microsoft Azure Extended Zones
  16. Helion Energy
  17. Shumailov et al., Model Collapse
  18. Surge AI Workforce
  19. RLVR Paper
  20. DeepSeek-R1 Technical Report
  21. Switch Transformers
  22. Mamba State Space Models
  23. Chatbot Arena
  24. SWE-bench
  25. LiveCodeBench
  26. ARC-AGI
  27. ChatGPT Pricing
  28. OpenAI Sora
  29. Fei-Fei Li, World Models Essay
  30. DeepMind Genie 3
  31. Meta V-JEPA
  32. NVIDIA Cosmos
  33. Open X-Embodiment
  34. Epoch AI Robotics Data Bottleneck