Something Is Coming
The next wave is verified binaries, not better autocomplete
Matt Shumer’s “Something Big Is Happening” went viral this week and frames AI as a shock event on the scale of COVID. The urgency is real, but the framing is already behind. AI writing code is the present. AI shipping executables is the next step, and it is closer to capital markets technology than most banks want to admit.
People keep asking why AI got good at coding before it got good at everything else. The usual explanation is that it was tech bias rather than strategy. That explanation misses the point, and the question is becoming obsolete anyway, because the shift from code generation to verified output generation is already underway.
Code is the control plane. Once a system can generate, modify, test, and ship software, you have a factory that turns intent into production systems. That factory will not stop at human-readable source code, because source code becomes an intermediate artifact once the pipeline can prove correctness through tests, evaluation harnesses, and runtime verification.
Why software came first
Software has one advantage that most white-collar domains do not: you know when you are wrong. Code compiles or it does not. Tests pass or fail. Latency fits the budget or blows it up. There’s no “it feels about right” or “let’s circle back after we socialize this.” You get automatic scoring, infinite practice problems, and instant feedback, which is why software became the first target.
This also explains why the “AI can’t replace judgment” comfort line is aging badly. Judgment becomes legible once it’s operationalized into constraints and evaluated against outcomes. Coding happened to be the domain where that loop could be tightened first.
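To make that feedback loop concrete, here is a minimal sketch of automatic scoring: build the candidate, run its tests, return pass or fail, no human in the loop. The cargo commands and the directory argument are stand-ins chosen for illustration, not any lab's actual harness.

```python
import subprocess


def score_candidate(source_dir: str) -> bool:
    """Score one generated change: compile it, then run its tests.

    Toy illustration of the feedback loop described above. The build
    and test commands are placeholders, not a specific pipeline.
    """
    # Step 1: does it build at all? A hard, automatic yes/no signal.
    build = subprocess.run(["cargo", "build"], cwd=source_dir, capture_output=True)
    if build.returncode != 0:
        return False

    # Step 2: do the tests pass? Another unambiguous signal, produced
    # instantly and without a human reading anything.
    tests = subprocess.run(["cargo", "test"], cwd=source_dir, capture_output=True)
    return tests.returncode == 0
```

That binary signal is the whole trick: it can be generated millions of times, which is what "infinite practice problems" means in practice.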
From code to binaries
The most important part of the story is that “coding first” is not about Python or Rust. It is about producing verified, working systems regardless of the intermediate representation.
Anthropic’s work on building a C compiler is the cleanest public hint of where this goes. They ran 16 Claude agents in parallel against a shared repository, coordinated through a simple locking mechanism. Nearly 2,000 sessions later, the output was a roughly 100,000-line Rust-based C compiler that can build Linux 6.9 across multiple architectures. The detail that matters is not “look, lots of Rust.” The detail that matters is that a working compiler emerged from iterative testing against a specification, with CI and constraints doing most of the steering rather than a human reading every line.
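Anthropic has not published the coordination code, so the following is only a guess at what a "simple locking mechanism" for parallel agents sharing one repository could look like: an atomic, file-based claim on whichever path an agent wants to edit. Every name here (LOCK_DIR, claim, the agent ids) is hypothetical.

```python
import os
import time
from contextlib import contextmanager

LOCK_DIR = "/tmp/agent-locks"  # hypothetical location, illustration only


@contextmanager
def claim(path: str, agent_id: str, poll_seconds: float = 1.0):
    """Claim exclusive access to one file in the shared repo.

    A sketch of what simple agent coordination could look like,
    not Anthropic's actual mechanism.
    """
    os.makedirs(LOCK_DIR, exist_ok=True)
    lock_file = os.path.join(LOCK_DIR, path.replace("/", "__") + ".lock")
    while True:
        try:
            # O_CREAT | O_EXCL makes creation atomic: exactly one agent wins.
            fd = os.open(lock_file, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.write(fd, agent_id.encode())
            os.close(fd)
            break
        except FileExistsError:
            time.sleep(poll_seconds)  # another agent holds the file; wait
    try:
        yield
    finally:
        os.remove(lock_file)  # release so the next agent can take its turn


# An agent would wrap an edit in the claim and let CI judge the result:
# with claim("src/parser.rs", agent_id="agent-07"):
#     ...edit, build, test...
```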
That pattern generalizes. Once the system can go from requirements and tests to a verified executable, human-readable code becomes evidence rather than the artifact. Code review still has a role while intent is unclear and tests are incomplete, but it stops being the primary gate when proof becomes the gate.
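What "proof becomes the gate" could look like in practice: a behavioral check run against the produced executable, never against its source. A minimal sketch, assuming a hypothetical toycc binary and a hand-written spec of input/output pairs; a real gate would add performance budgets, fuzzing, and differential testing.

```python
import subprocess

# Hypothetical behavioral spec: (argv, stdin, expected stdout) triples.
# Placeholder content; a real gate carries far richer constraints.
SPEC = [
    (["--version"], "", "toycc 0.1\n"),
    (["-"], "int main(void){return 0;}", "ok\n"),
]


def gate(binary_path: str) -> bool:
    """Accept or reject an executable on observed behavior alone.

    The point of the sketch: the gate never reads source code, it only
    checks that the artifact satisfies the stated constraints.
    """
    for argv, stdin, expected in SPEC:
        result = subprocess.run(
            [binary_path, *argv],
            input=stdin,
            capture_output=True,
            text=True,
            timeout=30,
        )
        if result.returncode != 0 or result.stdout != expected:
            return False
    return True
```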
The compounding problem
AI labs are software factories. When Claude gets better at coding, it gets better at improving the pipeline that builds the next Claude. Better data pipelines. Tighter evaluation harnesses. Cleaner orchestration. Faster debugging. This is an assembly line upgrade rather than nicer demos.
The compiler example makes the compounding tangible. Parallel agents, shared repo, constraints and tests enforcing honesty through evidence rather than supervision. That’s what “coding first” actually unlocked: long-running autonomous work inside an environment that can correct and verify itself.
What this means for banks
Banks should care, but not because “developers will disappear.” The first wave consumes friction around the business rather than replacing the business. Markets IT is full of glue work: adapters, pipelines, reconciliations, test scaffolding, entitlement plumbing, audit evidence packaging, and the operational choreography that turns a trading decision into a controlled capability.
A request seems simple until entitlements show up, then data lineage, then downstream accounting, then control evidence, then UAT environments that don’t match prod, then vendor release cycles that run on geological time. The work explodes because the surrounding system is messy, not because the idea is hard.
AI compresses that build layer fast. Service wrappers, adapters, mocks, UI stubs for ops, scripts that clean messy data, boilerplate logging, and alerting become cheap when generation actually works. That gets even cheaper when the output is not “here’s code,” but “here’s a verified executable that satisfies these constraints.”
This is the part most banks misread. Regulated environments already run on black-box validation. Vendor systems ship as binaries. Third-party models are opaque. Control frameworks are built around behavior verification rather than implementation transparency. A world where AI ships verified binaries is not alien to banking. It’s familiar. The novelty is the speed.
The constraint layer does not compress at the same rate. What are we allowed to do with client data. Who owns this dataset. What counts as an exception. What evidence satisfies auditors. Who signs off and carries liability. These are operating model questions rather than coding questions, and most banks are weak there.
AI exposes that weakness. Vague ownership gets surfaced faster. Hand-wavy data architecture spawns endpoints nobody can govern. Checkbox controls let you ship faster, which means louder incidents, messier remediation, and longer audit fights. Then everyone blames “AI risk” instead of admitting the boring truth: the operating model was incoherent before the AI arrived.
The labor shift nobody wants to say out loud
Entry-level roles exist because seniors need leverage. Juniors produce first drafts, middle layers absorb friction, seniors carry accountability. AI changes the economics of that leverage.
Fewer juniors makes sense because first drafts become cheap. A thinner middle makes sense because coordination and follow-up get automated. More senior operators become critical because constraint definition, verification strategy, and accountable sign-off do not disappear.
That shift accelerates if the intermediate artifact becomes a verified binary rather than readable code, because the traditional learning path of “read code, internalize patterns, build intuition” loses relevance. You cannot read a binary. You can only specify, test, observe, and control.
Markets technology gets hit early because failure is priced immediately. Demos become cheap, shipping becomes cheap, and failure becomes cheap, which pushes differentiation toward detection speed, rollback discipline, blast-radius containment, and evidence of control. DORA matters because it decides whether faster building becomes real capacity rather than higher incident volume.
The uncomfortable truth
AI makes building cheaper, which puts a spotlight on everything that prevents building from becoming production outcomes. Slow prioritization fora hurt more. Ambiguous architecture hurts more. Performative data ownership hurts more. Teams ship faster and break faster, and the gains get eaten by incident calls and remediation unless someone owns constraints with teeth.
Banks that treat this as “AI risk” will build elaborate approvals for generated code. Banks that understand what is coming will invest in specification discipline, verification frameworks, and operational telemetry that works regardless of how the executable was produced. That group ships faster and breaks less because it optimized for the world that is arriving rather than the world where code review is the center of gravity.
Lars