Notes to catch up on the foundational layer
Since June 2025 I have tried to catch up with advances in AI. And I say "tried" because, given how fast the market evolves, the Foundational Layer and the Application Layer are two things that each require a huge amount of time on their own.
For quick context: I worked two years as a Software Engineer between 2020 and 2022, before the LLM era. Back then my life was about learning as much software engineering as possible, every single day. Since then I stepped back from the hands-on technical side and shifted toward a more generalist role.
I use LLMs daily (ChatGPT, Gemini, Claude, etc.), but I was no longer operating at the same technical depth. So even though I had context, by this point I was rusty and out of date on many things.
In the last few months, these were my main sources to get back up to speed:
I should also clarify that a large part of my learning focus was on training and data, because the initial motivator was evaluating the feasibility of starting a data labeling company. To do that, I needed to better understand the real need, the current state, and where the market could be heading.
I am still on this learning path, so any suggestions or feedback are welcome.
If you only read one section, make it this:
I will split this piece into my learnings on the Foundational Layer and leave the Application Layer for a second part.
A simple definition that helped me:
AI has gone through many waves. Sometimes there was excitement, and many other times it cooled down because the promise felt far away in practical terms.
Two milestones that are worth remembering:
Still, as useful as these advances were for the field, they did not explode into the mainstream because they felt like lab progress, not something that millions of people would suddenly use.
Fast forward to November 2022: OpenAI launches ChatGPT, and mass adoption makes the topic unavoidable.
Part of what is interesting is that ChatGPT takes advantage of an earlier breakthrough: the 2017 paper Attention Is All You Need, which proposes the Transformer architecture. On top of that, what people now summarize as scaling laws became increasingly clear: with more compute, more data, and the right setup, models tend to improve in a fairly predictable way.
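To make "fairly predictable" concrete, the usual way to write it down is a Chinchilla-style power law, where loss falls smoothly as parameters and tokens grow. A minimal sketch in Python, using constants in the spirit of the published fits (they vary by model family, so treat them as illustrative):

```python
# Chinchilla-style scaling law: L(N, D) = E + A / N**alpha + B / D**beta
# N = parameters, D = training tokens. Constants are illustrative, in the
# spirit of Hoffmann et al. (2022), not something to reuse for real planning.
E, A, B = 1.69, 406.4, 410.7
alpha, beta = 0.34, 0.28

def predicted_loss(n_params: float, n_tokens: float) -> float:
    """Predicted pretraining loss under the assumed power law."""
    return E + A / n_params**alpha + B / n_tokens**beta

# Scaling parameters and tokens together improves loss in a smooth, predictable way.
for n, d in [(1e9, 20e9), (10e9, 200e9), (100e9, 2e12)]:
    print(f"N={n:.0e}, D={d:.0e} -> predicted loss {predicted_loss(n, d):.3f}")
```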
At a high level, this is how I see it:
There is also a transversal pillar: talent. It is not a minor detail: it casts a shadow over everything else.
A mental model that made this easier for me:
The biggest structural bottleneck is the concentration of leading-edge manufacturing in Taiwan, particularly at TSMC. Many summaries note that a very large share of leading-edge production is concentrated there. The exact number matters less than the direction: it is a supply chain choke point.
That introduces a massive geopolitical risk. No one can stop depending on that overnight.
A practical detail that is easy to miss: "chips" is not just the GPU. It is also memory (HBM), packaging, interconnect, and the overall system design that makes training scale. In other words, compute is a supply chain, not a SKU.
On the types of chips:
Key takeaway: Compute is not only about "how fast is the chip". It is about how reliably you can get it, cluster it, and feed it.
Open question: Does inference become a commodity (cheap, abundant), or does it remain strategically scarce because it is still gated by power and supply chain?
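One way I keep the "cluster it and feed it" part of that takeaway concrete is the standard back-of-envelope for dense transformers: training compute ≈ 6 × parameters × tokens, divided by whatever the cluster actually delivers after utilization losses. A rough sketch where the model size, GPU throughput, and utilization are all assumptions, not vendor numbers:

```python
# Back-of-envelope training time: FLOPs ≈ 6 * N * D for a dense transformer,
# divided by delivered cluster throughput. All hardware figures are assumptions.
n_params = 70e9          # model parameters
n_tokens = 2e12          # training tokens
total_flops = 6 * n_params * n_tokens

n_gpus       = 8192
peak_per_gpu = 1e15      # assumed ~1 PFLOP/s peak per accelerator
utilization  = 0.40      # assumed model FLOPs utilization after comms and stalls

delivered_flops = n_gpus * peak_per_gpu * utilization
days = total_flops / delivered_flops / 86_400
print(f"≈ {days:.1f} days of training under these assumptions")
```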
Once you have chips, you need to put them into clusters and operate them. Here the challenge is brutally physical:
Data centers are not just "rent racks". They are land, permits, cooling, transformers, networking, logistics, and long lead times. One of the most important constraints is simply: can you get enough power to the site, fast enough, with reasonable reliability?
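The power question lends itself to the same kind of back-of-envelope check. A sketch where the per-accelerator draw, host overhead, and PUE are all placeholder assumptions:

```python
# Rough site power for a GPU cluster. Every input here is an assumption.
n_gpus        = 8192
watts_per_gpu = 700      # assumed accelerator board power
host_overhead = 1.5      # CPUs, networking, storage on top of GPU draw
pue           = 1.3      # power usage effectiveness: cooling and conversion losses

it_load_mw   = n_gpus * watts_per_gpu * host_overhead / 1e6
site_load_mw = it_load_mw * pue
print(f"IT load ≈ {it_load_mw:.1f} MW, site load ≈ {site_load_mw:.1f} MW")
# At this scale the question stops being "which GPU" and becomes
# "which substation can deliver ~11 MW, and how soon".
```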
Two families of ideas that are emerging:
There is also the edge topic: running inference closer to the user, or closer to the physical world (factories, cameras, robots). Microsoft describes Azure Extended Zones as small-footprint extensions of Azure for low latency or data residency.
My read is that infrastructure is entering an industrial mode. Less software only, more real world constraints.
Key takeaway: The next bottleneck after "buy GPUs" is often: power + cooling + deployment reality.
Open question: Will the winning infrastructure play be hyperscalers only, or will new markets (decentralized, regional, vertical) meaningfully absorb demand?
Data centers imply energy. And the magnitude is no longer a footnote.
Data centers are already a relevant consumer at country scale, with growth driven by digitalization and AI. Beyond any single number, what matters is the trend: demand is growing faster than the ability to connect and distribute efficiently.
From my side, going back to electrical engineering: in many countries there are technical losses (materials, poor configuration, inefficiencies) and non-technical losses (theft, lack of metering, zero optimization in homes and businesses). The smart grid dream has existed for years, but implementing it well is extremely hard.
What seems interesting to me:
During an xAI hackathon in December, I heard a phrase from Jimmy Ba, xAI cofounder, that stuck with me: "If we had infinite compute and energy, we would be much further ahead." It captures the bottleneck.
Key takeaway: Energy is not only "how much power exists". It is how fast you can connect it, where, and at what reliability.
Open question: Will power constraints push more training to regions with cheap energy, and more inference to the edge, or will the stack remain centralized?
With energy and compute, you train. And then you hit data.
The reason LLMs advanced so fast is that text is an extremely abundant data type. But the problem becomes obvious as soon as you leave text, or as soon as you need more private or real world data (manufacturing, banking, sensors, robotics).
A framing that helped me: outside text, data is often locked, not missing.
Three typical paths:
Synthetic data helps, but it carries a real risk: model collapse. Training new models on data generated by previous models can degrade the distribution, collapse diversity, and amplify artifacts.
In simple terms, if you only feed yourself your own answers, you end up living at the center of your own Gaussian bell.
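A toy way to see that effect: fit a distribution to data, sample the next "dataset" from the fit, refit, and repeat. Under these simplified assumptions (a single Gaussian, finite samples, no fresh real data mixed back in), diversity tends to shrink generation after generation:

```python
import numpy as np

# Toy model-collapse loop: each generation is fit only on samples
# produced by the previous generation's fitted Gaussian.
rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=100)    # generation 0: real data

for gen in range(31):
    mu, sigma = data.mean(), data.std()            # "train" on the current data
    if gen % 5 == 0:
        print(f"gen {gen:2d}: mean={mu:+.3f}, std={sigma:.3f}")
    data = rng.normal(mu, sigma, size=100)         # next generation: synthetic only

# The std tends to drift downward and the tails thin out: without fresh real
# data, the distribution collapses toward the center of its own bell.
```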
My practical takeaway: synthetic data is most useful when you can anchor it with reality, like:
This is the most interesting path conceptually, but also the hardest. The idea that a system can learn a lot from little data, or through self play (like AlphaGo), is powerful, but not trivial to generalize.
A lot of the difficulty is not the idea, it is the environment. Games have clean rules and fast feedback loops. Real life is noisy, slow, and expensive.
This is the expensive but direct path. Millions of new examples, high quality, with good evaluation control.
The market has evolved:
There is also the topic of RL and verification:
Finally, there is a challenge that feels underestimated: evaluating reasoning. Not only does the final answer matter, the path does too. And in many real domains, there is genuine human ambiguity about what counts as correct.
Key takeaway: Data is not only "more examples". It is often collection + incentives + QA + eval design.
Open question: Do we get a breakthrough in learning with less real data, or do we mostly solve it by building better pipelines to collect and verify real world data?
The Transformer was the big breakthrough. But beyond optimizations and variations, there has not been another equivalent leap that changed the base architecture of the entire industry.
What has happened is refinement in a few directions:
A useful simplification: training is not just pretraining anymore. A lot of capability and product usefulness comes from post-training (instruction tuning, preference optimization, RL, tool use). So "algorithms" includes the whole training recipe, not only the base architecture.
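To make "the whole training recipe" slightly more concrete, here is a minimal sketch of one common post-training ingredient, a DPO-style preference loss. The function and the dummy log-probabilities are mine for illustration; real implementations compute sequence log-probs from an actual policy and a frozen reference model:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO-style preference loss on per-example sequence log-probabilities.

    Pushes the policy to prefer the chosen response over the rejected one,
    relative to a frozen reference model, without training a separate reward model.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

# Dummy sequence log-probs standing in for real model outputs.
pc = torch.tensor([-12.0, -9.5]); pr = torch.tensor([-13.0, -9.0])
rc = torch.tensor([-12.5, -9.8]); rr = torch.tensor([-12.8, -9.1])
print(dpo_loss(pc, pr, rc, rr))
```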
The consensus seems to be that doing the same thing but bigger eventually hits walls like data, energy, evaluation, and cost.
Key takeaway: The next gains are often less about a new architecture and more about better recipes, better systems, and better evaluation.
Open question: Is there another architecture-level shift coming, or do we get most progress from scaling and refinement until constraints force a new paradigm?
We optimize for benchmarks because they are easy to measure and market.
High scores do not equal intelligence in a broad sense.
The deeper issue: benchmarks are static targets in a dynamic system. Once everyone knows the test, the ecosystem starts training against the test, directly or indirectly (including via contamination). That does not mean progress is fake, it means measurement gets harder.
A mental model that helped me: evals are like unit tests.
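In that spirit, a tiny eval harness looks a lot like a test suite: fixed cases, a pass criterion, a score. A minimal sketch where `ask_model` is a placeholder standing in for a real API call, and the cases are invented:

```python
# Minimal evals-as-unit-tests sketch. `ask_model` is a stand-in for a real endpoint.
CASES = [
    {"prompt": "What is 17 * 24?", "check": lambda out: "408" in out},
    {"prompt": "Name the capital of Japan.", "check": lambda out: "tokyo" in out.lower()},
]

def ask_model(prompt: str) -> str:
    # Placeholder model; in practice this would call an LLM API.
    return {"What is 17 * 24?": "17 * 24 = 408",
            "Name the capital of Japan.": "Tokyo"}.get(prompt, "")

def run_evals():
    passed = sum(case["check"](ask_model(case["prompt"])) for case in CASES)
    print(f"{passed}/{len(CASES)} cases passed")
    # Like unit tests: green means "did not regress on what we measured",
    # not "the model is generally intelligent".

run_evals()
```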
Money is being burned because compute is expensive and the race is competitive.
Labs are pushed to build product surfaces:
Monetization is not just about profit. It is a requirement to fund training, infrastructure, and talent.
One simple way to frame it: every lab is trying to solve a moving target of unit economics while still racing on capability. That pressure shapes product choices.
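A crude way to see that pressure: inference cost per million tokens is roughly the GPU-hour cost divided by delivered throughput, and both numbers move constantly. A sketch with assumed figures (the hourly rate and tokens-per-second are placeholders, not quotes from any provider):

```python
# Toy inference unit economics. All inputs are assumptions for illustration.
gpu_hour_cost  = 2.50      # assumed $/GPU-hour
tokens_per_sec = 1500      # assumed delivered throughput per GPU with batched serving

tokens_per_hour   = tokens_per_sec * 3600
cost_per_m_tokens = gpu_hour_cost / tokens_per_hour * 1e6
print(f"≈ ${cost_per_m_tokens:.2f} per million output tokens at these assumptions")
# Margins then hinge on utilization, batching, context length, and model size,
# which is why pricing and product surfaces keep shifting.
```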
An observation from Fei-Fei Li captures the shift well: language is generative, but worlds follow complex physical rules. Representing a world consistently is far more complex than modeling a one-dimensional signal like language.
This is why world models, simulation, and learning dynamics matter.
When people say "world models", I interpret it as: models that can represent how a world evolves, not only what it looks like. Ideally:
Recent signals:
The promise is reducing real world data needs through simulation. The risk is simulations that are too clean to reflect reality.
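As a deliberately tiny version of "represent how a world evolves": a world model is, at minimum, a learned next-state predictor you can roll forward without touching the real system. A sketch that fits a linear dynamics model on synthetic transitions, with invented dynamics and names:

```python
import numpy as np

# Tiny "world model": learn next_state ≈ f(state, action) from synthetic rollouts,
# then roll the learned model forward ("imagination") without the real system.
rng = np.random.default_rng(0)

A_true = np.array([[1.0, 0.1], [0.0, 0.95]])   # invented ground-truth dynamics
B_true = np.array([[0.0], [0.1]])

# Collect transitions (s, a, s') from the "real" environment.
states, actions, next_states = [], [], []
s = np.zeros(2)
for _ in range(500):
    a = rng.uniform(-1, 1, size=1)
    s_next = A_true @ s + B_true @ a + rng.normal(0, 0.01, size=2)
    states.append(s); actions.append(a); next_states.append(s_next)
    s = s_next

# Fit the world model by least squares: s' ≈ W @ [s; a].
X = np.hstack([np.array(states), np.array(actions)])            # shape (500, 3)
W = np.linalg.lstsq(X, np.array(next_states), rcond=None)[0].T  # shape (2, 3)

# Imagine a 10-step rollout under a constant action, using only the learned model.
s_img = np.zeros(2)
for _ in range(10):
    s_img = W @ np.concatenate([s_img, [0.5]])
print("imagined state after 10 steps:", s_img.round(3))
```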
Key takeaway: If we can simulate better, we can learn faster. But simulation quality is the real bottleneck.
Robotics is advancing, but it still hits the classic bottleneck of data and real world experience.
Progress often requires:
Large datasets like Open X-Embodiment aim to scale learning across robots and tasks.
Many argue that in robotics, data and experience matter more than compute. I interpret it as: compute helps, but there is no shortcut to messy reality.
World models could reduce the marginal cost of collecting demonstrations, but in the short term robotics remains AI plus operations plus hardware plus data.
Key takeaway: Robotics is where the foundational layer constraints collide: data collection is expensive, evaluation is hard, and deployment is physical.
If I had to summarize my current model of the foundational layer in one line:
AI progress is still fast, but increasingly gated by the physical world (chips, power, deployment) and by measurement (data quality and evaluation).

Let's schedule a free call to map your operations and identify cases with clear returns.