
AGI: Where Are We, Really?

Not the hype version. Not the doomer version. The version from someone who actually has to make technology decisions with real money.

This is an opinion piece — a practitioner's analysis of publicly available research, industry statements, and benchmark data, not a peer-reviewed scientific assessment. Sources are linked inline and listed at the end.

I've been building enterprise systems for over 20 years. I've lived through every hype cycle — SOA, microservices, blockchain, "digital transformation." Every single time, some executive stood on a stage and promised the world was about to change forever. Every single time, the people who bet their architecture on the keynote regretted it.

So when Sam Altman says he's "confident" OpenAI knows how to build AGI, I pay attention. But I also check his fundraising calendar.

Let me tell you what I actually see happening.


Nobody Can Even Agree on What AGI Means

This is the part that drives me crazy. We're having a global debate about when AGI arrives, and we haven't agreed on what it is.

The optimist definition is basically "a system that can do most knowledge work as well as a skilled human." If that's your bar, honestly? We're close. I watched Claude refactor a 2,000-line Apex class last month and it did a better job than most mid-level developers would. That's not nothing.

The hard definition is "a system that can learn anything a human can, handle genuinely novel situations, and maintain coherent understanding over time." If that's your bar? We're not even in the same zip code.

This definitional ambiguity isn't just academic — IEEE Spectrum published a deep dive on why tracking AGI progress is so difficult when researchers can't agree on the target. And as AIMultiple's analysis of 9,800+ expert predictions shows, forecasts range from 2027 to "never," largely depending on which definition people use.

Here's why this matters: every CEO claiming AGI is "2-3 years away" is using the easy definition. They're not telling you that. And the difference between those two definitions isn't a few engineering sprints — it might be an entirely different scientific paradigm.


Follow the Money

I could give you a balanced, diplomatic summary of what each lab leader is saying about AGI timelines. But let me just be direct about it:

Sam Altman (OpenAI) said in January 2026 that they're "confident they know how to build AGI." He's also projecting $30 billion in revenue. AGI claims attract capital. You do the math. (Covered extensively in MIT Technology Review's 2026 outlook.)

Dario Amodei (Anthropic) says he's "more confident than ever" — powerful AI capabilities in 2-3 years. Anthropic is raising at a $60B+ valuation. Coincidence? I'm not that naive.

Demis Hassabis (DeepMind) quietly shifted his estimate from "10 years" to "3-5 years." Google is betting its future on Gemini. Of course the timeline is shrinking.

The one person I actually trust? Shane Legg, also at DeepMind, who has said the same thing for over a decade: 50% probability of minimal AGI by 2028. He's the only one whose prediction doesn't conveniently align with a fundraising round.

To be fair: these leaders have access to internal capabilities we don't see. Their optimism could reflect genuine technical progress, not just fundraising strategy. 80,000 Hours' comprehensive analysis makes a credible case that the convergence of scaling, RL breakthroughs, and test-time compute creates real reasons for shorter timelines. I'm skeptical — but I could be wrong.

I'm not saying these people are lying. I'm saying you should evaluate their AGI claims the same way you'd evaluate a keynote promising their latest release will "transform your business." With polite skepticism and one hand on your wallet.


My Honest Assessment of What Works Right Now

Forget AGI. Here's what I'd actually stake my reputation on, based on what I've seen in production — not demos, not blog posts, production:

The Honest Scorecard — what I'd actually bet money on

Green:
  • Code generation — I use this daily. It's the real deal. Not perfect, but genuinely useful.
  • Document analysis — Feed it a 50-page SOW, get a decent summary in seconds. Saves hours.
  • Tool orchestration — The sleeper hit. Chaining API calls, data pipelines, multi-step workflows. This is where it gets serious.
  • Data extraction — Boring but bulletproof. PDFs, emails, forms into clean structured data.
  • Chat interfaces — Table stakes at this point. If you're not offering this, your competitors are.

Yellow:
  • Autonomous workflows — Impressive when they work. Spectacular failures when they don't. Don't let it run unsupervised.
  • Complex planning — Great demos. Shaky in production. Falls apart the moment ambiguity shows up.
  • Research synthesis — Getting better fast, but still hallucinates on edge cases. Always verify.
  • Self-correction — The biggest improvement of 2025. Models that check their own work. Not reliable enough yet, but the trajectory is exciting.

Red:
  • Novel scientific reasoning — Not even close. Can't generate a genuinely new hypothesis to save its life.
  • Physics understanding — Embarrassingly bad. Fails basic spatial reasoning that toddlers handle.
  • Long-term memory — Context windows are not memory. Stop pretending they are.
  • Real creativity — Remixing is not creating. Fight me.

The green stuff? Ship it. The yellow stuff? Experiment carefully. The red stuff? Don't plan around it. Not yet.
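Why is data extraction "bulletproof" when so much else is shaky? Because the pattern that makes it work in production is boring: constrain the model to a schema and validate everything it returns before it touches your systems. Here's a minimal Python sketch of that validation gate — the `call_model` function is a stand-in for whichever LLM client you use, and the schema fields are hypothetical, not from any real SOW format:

```python
import json

# Fields we require from the model, with expected types.
# These field names are illustrative, not a real SOW schema.
SCHEMA = {"vendor": str, "total_value": float, "start_date": str}

def call_model(document: str) -> str:
    """Stand-in for a real LLM call. In production this would be your
    provider's API, prompted to return strict JSON and nothing else."""
    return '{"vendor": "Acme Corp", "total_value": 125000.0, "start_date": "2026-03-01"}'

def extract(document: str) -> dict:
    """Parse and validate model output; fail loudly instead of letting
    malformed data leak into downstream systems."""
    data = json.loads(call_model(document))  # raises on non-JSON output
    for field, ftype in SCHEMA.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], ftype):
            raise TypeError(f"bad type for {field}: {type(data[field]).__name__}")
    return data

result = extract("...50-page SOW text...")
print(result["vendor"])  # only validated data reaches this point
```

The model does the messy reading; deterministic code does the trusting. That division of labor is what moves a capability from the yellow column to the green one.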


The Number That Should Sober Everyone Up

The big technical story of 2025 was refinement loops — models that check and correct their own work mid-task. It's genuinely clever engineering, and it pushed ARC-AGI-2 scores from 31% to 54% (ARC Prize 2025 results).

But here's the number nobody puts in their keynotes:

Raw LLM score on ARC-AGI-2 (no tricks, no scaffolding): 0%
Humans score ~95%, typically solving these puzzles in under two attempts.

Zero. When you strip away the engineering scaffolding — the refinement loops, the chain-of-thought prompting, the tool use — and test the raw model on novel reasoning tasks, every frontier LLM scores zero.

That doesn't mean AI is useless. I just showed you a whole list of things it's great at. But it means the gap between "very useful tool" and "general intelligence" isn't a gentle slope. It's a cliff. And no amount of marketing is going to close it.

What's actually missing? The ability to apply multiple rules at once (compositional reasoning), handle situations it's never seen before (not just pattern-match), understand meaning beyond surface form (symbolic understanding), and remember things across long time horizons (real memory, not context windows). These aren't bugs — they're limitations well-documented in Stanford HAI's 2025 AI Index Report, which tracks exactly where current systems excel and where they plateau.


The Scaling Debate (And Why I Don't Care Who Wins)

There's a war happening in AI research right now over one question: can we just keep making models bigger and eventually hit AGI?

One camp says yes — reinforcement learning, synthetic data, and test-time compute are new axes that keep delivering gains. The other camp, led by Yann LeCun and increasingly Ilya Sutskever (the guy who co-founded OpenAI), says no — we've hit a fundamental wall and need entirely new ideas. Understanding AI's 2026 predictions gives a good overview of where this debate stands.

Here's my take: I don't care who's right.

Even if scaling never produces AGI, it's going to keep producing incredibly useful tools. The jump from "GPT-3.5 can barely write a for loop" to "Claude builds production features in my codebase" happened in 30 months. That trajectory — not AGI, but that trajectory — is what should drive your architecture decisions.


Where the Smart Money Is

Prediction markets are the closest thing we have to honest forecasts, because people are betting real money:

  • Polymarket gives OpenAI a 9% chance of achieving AGI by 2027.
  • Metaculus community median: first "weakly general" AI system by February 2028.
  • Research community median: somewhere between 2028 and 2040, depending on whose definition you use (AIMultiple's meta-analysis of 9,800+ predictions).
  • Stanford HAI's 2026 outlook: "Better benchmarks, better agents — not AGI."

Translation: the smart money says we'll get systems that feel general by the late 2020s but actually aren't. Real general intelligence — the kind where the AI redesigns your entire system better than you can — is much further out. Maybe decades. Maybe never with current approaches. The Council on Foreign Relations frames 2026 as a decisive year not for achieving AGI, but for establishing the governance and competitive frameworks that will shape how we get there.


So What Should You Actually Do?

Three questions. That's all you need.

Q: Are you waiting for AGI before investing in AI?
Then you're already behind. The tools shipping right now justify investment. Every quarter you wait, your competitors pull further ahead. Stop waiting for the perfect. Start building with the very good.

Q: Are you designing your systems specifically for AGI?
Stop. Nobody knows what AGI will look like, which means nobody knows what "AGI-ready" architecture means. Instead, build systems that are composable, API-first, and treat AI as a pluggable service layer. If AGI shows up in 2028, great — your architecture adapts. If it shows up in 2045, you've spent the intervening years getting real value instead of waiting.

Q: Is your data actually ready?
Be honest. Because here's the unglamorous truth that won't get likes on LinkedIn: data quality is the single highest-ROI investment you can make in AI. Every AI capability — current and future — runs on clean, governed, accessible data. If yours is a mess, nothing else matters. Fix that first.
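What does "AI as a pluggable service layer" actually look like? Hide the provider behind one narrow interface, so swapping models — or whatever shows up in 2028 — is a one-line change at the composition root. A hedged Python sketch; the class and method names here are mine, not any vendor's API:

```python
from typing import Protocol

class CompletionService(Protocol):
    """The only AI surface the rest of the system is allowed to see."""
    def complete(self, prompt: str) -> str: ...

class EchoBackend:
    """Trivial stand-in backend; a real one would wrap a vendor SDK
    behind this same one-method interface."""
    def complete(self, prompt: str) -> str:
        return f"[echo] {prompt}"

class SummarizerApp:
    """Application code depends on the interface, never on a vendor."""
    def __init__(self, ai: CompletionService):
        self.ai = ai

    def summarize(self, text: str) -> str:
        return self.ai.complete(f"Summarize: {text}")

# Swap EchoBackend for any other CompletionService; nothing else changes.
app = SummarizerApp(EchoBackend())
print(app.summarize("quarterly report"))
```

The point isn't this particular interface — it's that the blast radius of a model change is one constructor argument, not a rewrite.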

And if you're running a Salesforce org? Make it agent-readable now. Field descriptions, clean APIs, scoped permissions. I wrote a whole post on how to set that up.

One more thing — if you want to track actual AGI progress instead of relying on CEO tweets, watch ARC-AGI-3 (coming late 2026). It tests interactive reasoning: exploration, planning, memory. It's the closest thing we have to a real-world intelligence test. More signal in one benchmark than a year of keynotes.


Bottom Line

The progress is real. The hype is also real. Learning to tell them apart is the most valuable skill in enterprise technology right now.

Don't build for AGI. Build for the trajectory. Build systems that are modular enough to absorb whatever comes next — whether that's AGI in 2028 or incrementally smarter tools for the next twenty years. Either way, you win.

The best time to start was last year. The second-best time is right now.


Sources: