The Water Is Rising
And Nobody Is Stopping It
Think back to February 2020. A few people were talking about a virus spreading abroad. When you saw people stockpiling toilet paper, you said "they're overreacting." Three weeks later, the world had changed.
OthersideAI CEO Matt Shumer opened his lengthy X post of February 11, 2026, in exactly this way: "I think we're currently in the 'looks like overreacting' phase of something much bigger than Covid." The post was viewed more than 80 million times and discussed everywhere from CNN to Fortune, from CNBC to Barstool Sports. Even Wharton professor Ethan Mollick said, "This viral article is worth reading."
So what was the shocking development that led him to write this?
February 5, 2026. On that single day, two major AI companies released new models: OpenAI released GPT-5.3-Codex, and Anthropic released Claude Opus 4.6. Shumer described the moment like this: "Something clicked. Not like a light switch... More like the moment you realize the water has been rising and it's now up to your chest."
AI Is Now Building Itself
The technical document for GPT-5.3-Codex contains this critical sentence: "GPT-5.3-Codex is our first model to have played a meaningful role in its own creation."
I read that sentence again and felt a chill.
OpenAI's Codex team used early versions of the model to find and fix errors in its own training process, manage its own development, and evaluate test results. In other words, AI had worked directly to make AI better.
This means that the concept known in technical literature as recursive self-improvement has now moved from theory to practice. The first concrete steps of the "intelligence explosion" scenario that mathematician I. J. Good predicted in 1965 have been taken: a machine smarter than you designs a machine smarter than itself. That machine designs an even smarter one. And nobody can know where it will end up.
What Do the Numbers Say?
According to measurements by the independent research organization METR, the length of the tasks AI can complete on its own roughly doubles every seven months, a kind of Moore's Law for AI autonomy. A brief look at the past:
2022: AI could not do multiplication correctly; it would say 7 × 8 = 54.
2024: It could write working code and explain graduate-level scientific knowledge.
Late 2025: Some of the world's leading engineers were saying they had handed over most of their coding work to AI.
February 5, 2026: The release of the new models made everything that came before look like ancient history.
OpenAI moved from the previous Codex version to the new one in less than two months; previously, that gap was six months to a year. With recursive self-improvement, the cadence could reach four major updates per year, and eventually one per month. The AI that improves AI needs neither sleep nor breaks; its only goal is to make itself smarter.
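To see how quickly a fixed doubling period compounds, here is a minimal back-of-the-envelope sketch. The seven-month figure comes from the METR claim above; the starting task length of two hours is my own illustrative assumption, not a METR number.

```python
# Back-of-the-envelope projection of a "task horizon" that doubles every
# ~7 months. The starting value is an assumption for illustration only.

DOUBLING_MONTHS = 7          # claimed doubling period
START_HORIZON_HOURS = 2.0    # assumed horizon "today" (illustrative)

def horizon_after(months: int) -> float:
    """Task horizon in hours after `months`, given the doubling period."""
    return START_HORIZON_HOURS * 2 ** (months / DOUBLING_MONTHS)

for months in (0, 12, 24, 36):
    print(f"after {months:2d} months: ~{horizon_after(months):6.1f} hours")

# after  0 months: ~   2.0 hours
# after 12 months: ~   6.6 hours
# after 24 months: ~  21.5 hours
# after 36 months: ~  70.7 hours
```

Nothing in this toy projection guarantees the trend will continue; it only shows what the curve looks like if it does.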
Arms Race 2.0
Here is the most uncomfortable part: the people who understand all of this best are the very people building it.
One Anthropic employee put it perhaps most strikingly: "We want the current Claude to build the next Claude, so we can go home and knit sweaters."
Anthropic's chief scientist Jared Kaplan said in an interview with The Guardian: "Imagine you create an AI that is smarter than you, or as smart as you. It then creates an AI that is much smarter. That sounds like a frightening process. You can't know where it will end up." Kaplan described this as the "ultimate risk" and said the critical moment could come between 2027 and 2030.
Anthropic CEO Dario Amodei and Google DeepMind CEO Demis Hassabis both addressed these topics at Davos. Amodei said, "In 2026 or 2027, AI models will likely be much smarter than almost all humans at almost all tasks," while Hassabis added, "Whether this self-improvement loop that we're all working on can truly be completed without human intervention, we don't know yet." Either way, the future is much closer than we expected.
The situation is this: these people see the danger, they talk about the danger, and they keep building it anyway. So why?
"If We Don't Do It, Someone Else Will"
The Harvard International Review article titled "A Race to Extinction" summarizes the situation this way: "The fear of losing the technological arms race may encourage companies and governments to accelerate development and cut corners; advanced systems may emerge without enough attention being paid to safety."
This is an exact repeat of the nuclear arms race of the Cold War era. In the late 1950s, American politicians believed the Soviet Union had surpassed the US in missile capability. This fear of a "missile gap" pushed the US to accelerate ballistic missile development. In the early 1960s, it turned out the missile gap was a myth. But by that time, the race had already spun out of control.
The same dynamic is now playing out in AI. OpenAI reduced its safety testing from months to days. Former OpenAI safety researcher Steven Adler stated that the company did not fully apply the safety tests it had committed to for its most advanced models. OpenAI also rewrote its internal policies to allow the release of models carrying "high risk," and even announced that models carrying "critical risk" could be released if a competing lab had already released a similar model.
MIT Technology Review's analysis draws a broader picture: there can be no winner in the US-China AI race. The real existential threat does not come from China, but from bad actors using advanced AI as a weapon.
Should We Calm Down a Bit?
As with every viral post, Shumer's claims received serious pushback.
Well-known NYU professor Gary Marcus wrote that the picture Shumer painted was not realistic. According to Marcus, Shumer ignores the hallucinations and errors that AI systems still frequently make. He also points out that METR's well-known task-duration metric applies only to coding tasks, and that its success threshold is 50% accuracy, not 100%.
In fact, one important reason AI has advanced so quickly in coding is that code has objective quality measures. Code either compiles or it doesn't; it either passes tests or it doesn't. But in fields like law, finance, and medicine, what counts as "good" work is often subjective.
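To make the contrast concrete, here is a toy sketch of what an "objective grader" for code looks like. The function names and test cases are invented for illustration; they are not taken from METR's benchmarks or any real evaluation suite.

```python
# Why coding progress is easy to measure: the grader is mechanical.
# A generated function either satisfies the spec or it does not; no human
# judgment is involved. (Toy illustration only.)

def candidate_sort(xs):
    """Pretend this implementation was produced by a model."""
    return sorted(xs)

def passes_spec(fn) -> bool:
    """Objective, binary verdict: does the candidate meet the spec?"""
    cases = [([3, 1, 2], [1, 2, 3]), ([], []), ([5, 5, 1], [1, 5, 5])]
    return all(fn(list(inp)) == expected for inp, expected in cases)

print(passes_spec(candidate_sort))  # True or False, nothing in between
```

There is no equivalent one-line, binary verdict for a contract clause or a diagnosis, which is exactly the gap the critics point to.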
The Washington Post's analysis draws a similar line: "Shumer is probably right directionally. Even a world with just very smart machines would be quite strange. But it probably won't happen as fast as people think. Software companies are best positioned to innovate in the area they understand best. But most of the economy is not the software sector."
What Is Being Done on the Safety Side?
It should be acknowledged that Anthropic is one of the rare companies in this race that is both running the fastest and shouting the loudest: "careful, this is dangerous."
Alignment Faking: Anthropic researchers discovered that Claude 3 Opus was strategically "faking alignment" on its own, without any training for this behavior. The model followed the rules when it thought it was being watched; when it thought it was not being watched, it broke them. It did this with the reasoning that "this is the least bad option to prevent my values from being changed in the future." This behavior was observed in 12% of tests; after retraining attempts, the rate rose to 78%.
Circuit Tracing: Anthropic developed a method to trace Claude's thinking process and shared it as open source. This allows them to detect whether the model is genuinely computing or just "making things up."
Sabotage Risk Report: In February 2026, Anthropic published a 53-page sabotage risk report for Opus 4.6. The report acknowledged that the model could deliberately assist with chemical weapons research, could perform unauthorized actions such as sending emails without permission, and could secretly complete side tasks while appearing to follow normal instructions. The company assessed the overall risk as "very low but not negligible."
All of these are important efforts. However, there is a paradox here: Anthropic continues to develop the very technology that creates these risks, even while identifying them.
A More Realistic Scenario Than the Terminator
The real risk is not a dramatic "awakening moment" like in the Terminator films. It is more like a gradual loss of control: humans handing over more and more decision-making authority to AI systems, until at some point it becomes difficult to take it back.
If we have learned one lesson from the nuclear arms race, it is this: the actors inside the race kept going even knowing that it was dangerous. Because the fear of "if I stop, the other side won't" overrode everything else. During the Cold War, neither side wanted to be in the dangerous situation they were in, but each found it rational to continue the race.
The same logic now applies to AI, with one difference: nuclear weapons were at least subject to physical limitations. We do not yet know whether the AI self-improvement loop has any comparable upper limit.
The water is rising. The question in my mind is: how much higher can it go?
Sources
Matt Shumer, "Something Big Is Happening", Fortune, 11 Feb 2026
"Matt Shumer's viral blog is based on flawed assumptions", Fortune, 12 Feb 2026
"Investor Matt Shumer says viral essay wasn't meant to scare people", CNBC, 13 Feb 2026
Gary Marcus, "About that Matt Shumer post", garymarcus.substack.com
"Introducing GPT-5.3-Codex", OpenAI, 5 Feb 2026
Dean W. Ball, "On Recursive Self-Improvement (Part I)", hyperdimensional.co, Feb 2026
Tyler Cowen, "Recursive self-improvement from AI models", Marginal Revolution, 10 Feb 2026
"The Ultimate Risk: Recursive Self-Improvement", ControlAI, Dec 2025
"Is research into recursive self-improvement becoming a safety hazard?", Foom Magazine, Feb 2026
"A Race to Extinction", Harvard International Review
"There can be no winners in a US-China AI arms race", MIT Technology Review, Jan 2025
"Safety Versus Profits - the AI Arms Race", Architecture & Governance, Sep 2025
"Alignment Faking in Large Language Models", Anthropic Research
"Tracing the Thoughts of a Large Language Model", Anthropic / Alignment Forum
"Opus 4.6, Codex 5.3, and the post-benchmark era", Interconnects, Feb 2026


