Will Better AI Erase the Limits of Solo Development? Five Questions Where the Optimistic and Pessimistic Scenarios Diverge

Posted Jun 4, 2026

10 min read

AI-Generated Content

This article was generated by AI. The accuracy of the content is not guaranteed, and we accept no responsibility for any damages resulting from use of this article. By continuing to read, you agree to the Terms of Use.

Who this is for: Engineers and developers who aren’t satisfied with where AI is today and want to build “what happens next” into their own decision-making criteria
Assumed background: Basic hands-on experience with AI coding tools
Reading time: 10 minutes

Overview

“Once AI gets smart enough, one person will be able to build anything.” This claim and its counterargument—”Even as AI advances, some things will only ever be doable by humans”—are both being made in 2026 with a fair amount of supporting evidence.

The trouble is that the premise you hold shapes your career choices, your hiring strategy, and your product investment decisions. Stack those decisions on top of each other, and three years out you end up standing in a completely different place.

In the companion article (The Conditions Under Which Solo Development Works—and Five Patterns Where It Breaks), I laid out today’s limits. This article answers a different question: as AI keeps improving, how far do those five patterns change, and what stays the same?

To state the conclusion up front: the “optimistic scenario” and the “pessimistic scenario” lead to completely different predictions. No one can declare which is right at this point. What matters is being conscious of which premise you are operating on.

Setting the terms: defining optimism and pessimism

Before we begin, let me align on terminology.

Optimistic scenario: AI capability keeps improving at the current pace, and within two to five years agentic AI becomes able to autonomously complete the majority of complex software development tasks. Progress on benchmarks transfers to real-world production environments.

Pessimistic scenario: Benchmark scores keep rising, but the problems of real-world complexity, context-dependence, and accountability are not solved by improved performance. Domains that require expert human judgment remain.

Neither position is “AI is useless.” The difference lies in the estimate of “how far substitution reaches” and “what human role remains.”

Question 1: Will the security holes be sealed?

Optimistic scenario

In 2026, researchers’ “vericoding,” an approach in which AI generates code that is formally proven correct against a specification, has entered the demonstration stage. Meanwhile, an AI-driven vulnerability discovery system called “AISLE” autonomously found 12 vulnerabilities in the January 2026 OpenSSL security release¹. If agents can run loops of self-testing and self-correction, code generation and vulnerability verification could become self-contained within a single agent.
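The self-testing and self-correction loop can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how AISLE or any vericoding system works: the “agent” is simulated by a hypothetical list of candidate implementations of a toy length-prefixed field parser, and the verifier is a hand-written check suite. The names `checks`, `naive_parse`, `bounded_parse`, and `generate_verify_loop` are invented for this example. The loop accepts the first candidate that passes every check and keeps failure messages as feedback.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional

# Hypothetical toy format: the first byte of a field is its payload length.
# A candidate parser takes a buffer and an offset and returns the payload.
Candidate = Callable[[bytes, int], bytes]


def checks(candidate: Candidate) -> List[str]:
    """Hypothetical verifier: run a tiny test suite, return failure messages."""
    failures: List[str] = []
    # Normal input: length byte 3, followed by exactly 3 payload bytes.
    if candidate(b"\x03abc", 0) != b"abc":
        failures.append("normal input returned the wrong payload")
    # Hostile input: the length byte claims 255 bytes, but only 2 follow.
    try:
        candidate(b"\xffab", 0)
        failures.append("over-long length prefix was accepted")
    except ValueError:
        pass  # rejecting the input is the desired behavior
    return failures


def naive_parse(buf: bytes, off: int) -> bytes:
    n = buf[off]
    # Python slicing silently truncates, hiding the bogus length field.
    return buf[off + 1 : off + 1 + n]


def bounded_parse(buf: bytes, off: int) -> bytes:
    if off >= len(buf):
        raise ValueError("offset out of range")
    end = off + 1 + buf[off]
    if end > len(buf):
        raise ValueError("length prefix exceeds buffer")
    return buf[off + 1 : end]


@dataclass
class LoopResult:
    accepted: Optional[Candidate] = None
    attempts: int = 0
    feedback: List[str] = field(default_factory=list)


def generate_verify_loop(proposals: Iterable[Candidate], max_attempts: int = 5) -> LoopResult:
    """Try candidates in order until one passes all checks or the budget runs out."""
    result = LoopResult()
    for candidate in proposals:
        if result.attempts >= max_attempts:
            break
        result.attempts += 1
        failures = checks(candidate)
        if not failures:
            result.accepted = candidate
            break
        # In a real agent, this feedback would steer the next generation.
        result.feedback.extend(failures)
    return result


if __name__ == "__main__":
    outcome = generate_verify_loop([naive_parse, bounded_parse])
    print(outcome.attempts, outcome.feedback)
```

Note what the sketch does not establish: `bounded_parse` is accepted because it passes the two checks someone wrote, not because it is secure in any broader sense. The loop is only as strong as its verifier.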

Pessimistic scenario

Veracode’s 2025 report showed that even as model generations advance, “the ability to write functionally correct code” and “the ability to write secure code” move independently of each other. On top of that, the spread of agentic AI has created a new attack surface. The OWASP Top 10 for Agentic Applications 2026 catalogs agent-specific risks such as goal hijacking, tool misuse, and memory poisoning². The result is a double challenge: using AI to defend security while keeping AI itself from becoming a security problem.

The underlying question: is verification “confirmation that can be automated” or “confirmation that requires human judgment”? That classification itself shifts as AI evolves.

Question 2: Will the specialized-domain trap disappear?

Optimistic scenario

Specialized AI for medicine, law, and finance is rapidly improving in accuracy. More importantly, regulatory frameworks themselves have begun to be built on the assumption that AI will be used. If a “human-supervised” structure becomes established, one that logs the basis for each AI decision and has humans audit those logs, then the barrier to a single developer partnering with AI to build a specialized-domain product comes down.
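The “human-supervised” structure described above can be pictured as an append-only decision log in which no AI output counts as final until a named person signs off. The sketch below is a hypothetical illustration: the field names and the functions `record_decision`, `sign_off`, `is_final`, and `to_log_line` are invented for this example and are not drawn from any regulation or product.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class DecisionRecord:
    model_id: str                    # which model produced the output
    input_sha256: str                # digest of the input instead of the raw data
    output: str                      # what the model recommended
    rationale: str                   # the basis the model gave for its output
    created_at: str                  # UTC timestamp, ISO 8601
    reviewer: Optional[str] = None   # human auditor; None until reviewed
    approved: Optional[bool] = None  # the reviewer's verdict


def record_decision(model_id: str, raw_input: str, output: str, rationale: str) -> DecisionRecord:
    return DecisionRecord(
        model_id=model_id,
        input_sha256=hashlib.sha256(raw_input.encode("utf-8")).hexdigest(),
        output=output,
        rationale=rationale,
        created_at=datetime.now(timezone.utc).isoformat(),
    )


def sign_off(record: DecisionRecord, reviewer: str, approved: bool) -> DecisionRecord:
    """Return a reviewed copy; the original record is never mutated."""
    if not reviewer.strip():
        raise ValueError("a named human reviewer is required")
    return replace(record, reviewer=reviewer, approved=approved)


def is_final(record: DecisionRecord) -> bool:
    # An AI output only takes effect once a person has approved it.
    return record.reviewer is not None and record.approved is True


def to_log_line(record: DecisionRecord) -> str:
    # One JSON object per line, suitable for an append-only audit file.
    return json.dumps(asdict(record), sort_keys=True)
```

Making sign-off return a new copy keeps the model’s original output and the human verdict side by side in the log.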

Pessimistic scenario

Where legal responsibility sits does not move away from humans, no matter how smart AI becomes. Regulations such as the EU AI Act mandate human oversight of decisions made by “high-risk AI systems,” and the bar is getting stricter. The essence of the “specialized-domain trap” is not AI accuracy but the structure of accountability. Even if AI is 99% accurate, the question of who is responsible for the remaining 1% does not change.
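To make the remaining 1% concrete, here is the arithmetic under a purely hypothetical volume of N = 10,000 AI-assisted decisions, with accuracy a = 0.99 taken from the example above and assumed constant across all decisions:

$$
E[\text{errors}] = N\,(1 - a) = 10{,}000 \times (1 - 0.99) = 100
$$

Higher accuracy shrinks that number but does not make it zero, and each of those cases still needs a person who answers for it.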

The underlying question: “an error in AI’s output” and “who that error is attributed to” are separate. Even as performance rises, the latter does not change.

Question 3: Will agents clear the wall at scale?

Optimistic scenario

On SWE-bench Verified (a benchmark for autonomously resolving real GitHub Issues), top models reached 93.9% as of April 2026³. That means resolving roughly nine out of ten of the benchmark’s real-world tasks without human help. If agents can run a cycle of autonomously executing performance tests, analyzing bottlenecks, and improving the architecture, then the scaling problem shifts from “something a solo developer can’t notice” to “something the agent handles automatically.”

Pessimistic scenario

There’s an important caveat to the SWE-bench numbers. OpenAI’s investigation confirmed that 59.4% of the hardest unsolved problems had flawed test cases³, and on the uncontaminated, higher-difficulty SWE-bench Pro the same model drops to 45.9%. Furthermore, research indicates that as a codebase grows, “reading code” becomes more of a bottleneck than “writing code,” and scaling that comprehension ability remains an open challenge⁴. The complexity of production environments—unexpected user behavior, skewed data distributions, dependencies on other systems—does not show up in benchmarks.

The underlying question: success on controlled tasks and success in uncontrolled production environments are separate matters, and whether the first transfers to the second remains an open question.

Question 4: Will technical debt resolve itself automatically?

Optimistic scenario

If the ability to grasp an entire codebase and perform consistent refactoring improves, AI may become able to periodically clean up the duplication, contradictions, and design drift that arise from a “paste it and make it run” style. If not just the “speed of writing” but also the “ability to organize” improves, there may come a point where the speed of debt accumulation and the speed of resolution flip.

Pessimistic scenario

The core of technical debt is not “code quality” but “the transmission of design intent.” “Why was it structured this way?” “Which trade-offs were in mind when it was written?”—this information is not written in the code; it lives in human heads. An empirical study (arXiv 2603.28592) suggests that technical debt in AI-generated code arises not from “the volume of code” but from “the absence of context”⁵. Even if AI can organize code, without recorded intent there’s no way to confirm whether the next decision is “organizing in the right direction.”

The underlying question: “writing code with intent” and “passing that intent on” are different acts. The latter is inseparable from the human practice of documentation and design records.
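One common way to keep design intent outside people’s heads is a short decision record stored next to the code. The sketch below loosely follows the widely used Architecture Decision Record (ADR) layout of context, decision, and consequences, with an added list of rejected alternatives; the `DesignDecision` class and the example content are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DesignDecision:
    number: int
    title: str
    status: str          # e.g. "accepted" or "superseded by 0012"
    context: str         # why a decision was needed at all
    decision: str        # what was chosen
    consequences: str    # trade-offs knowingly accepted
    rejected: List[str] = field(default_factory=list)  # alternatives considered and dropped

    def to_markdown(self) -> str:
        lines = [
            f"# {self.number:04d}. {self.title}",
            "",
            f"Status: {self.status}",
            "",
            "## Context",
            self.context,
            "",
            "## Decision",
            self.decision,
            "",
            "## Consequences",
            self.consequences,
        ]
        if self.rejected:
            lines += ["", "## Alternatives rejected"]
            lines += [f"- {alt}" for alt in self.rejected]
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    adr = DesignDecision(
        number=7,
        title="Store sessions in the database",
        status="accepted",
        context="One server today; a restart must not log every user out.",
        decision="Persist sessions in a table in the primary database.",
        consequences="One extra query per request, acceptable at current traffic.",
        rejected=[
            "In-memory sessions (lost on restart)",
            "A separate cache service (one more system for a solo developer to run)",
        ],
    )
    print(adr.to_markdown())
```

The Context and Alternatives sections carry the weight here: they record the “why” that cannot be recovered from the code alone.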

Question 5: Will data solve the “make it mine” problem in UX?

Optimistic scenario

In an environment where large amounts of user-behavior data can be collected, AI can run A/B tests on its own, analyze click heatmaps, and serve different screens to different user segments. The parts of design where designers used to decide by gut feeling “what resonates with users” could potentially be replaced by data-driven decisions.
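For reference, this is roughly what the statistical core of an automated A/B test decides. The sketch uses a standard two-sided two-proportion z-test on click-through counts; the function names, the 0.05 threshold, and the example counts are assumptions for illustration, and real experimentation platforms add safeguards (sample-size planning, corrections for repeatedly peeking at results) that are omitted here.

```python
import math
from typing import Tuple


def two_proportion_z_test(clicks_a: int, n_a: int, clicks_b: int, n_b: int) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in click rates B - A."""
    p_a = clicks_a / n_a
    p_b = clicks_b / n_b
    # Pooled rate under the null hypothesis that both variants perform the same.
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in the data; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for a standard normal Z.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def pick_variant(clicks_a: int, n_a: int, clicks_b: int, n_b: int, alpha: float = 0.05) -> str:
    z, p_value = two_proportion_z_test(clicks_a, n_a, clicks_b, n_b)
    if p_value >= alpha:
        return "no decision"
    return "B" if z > 0 else "A"


if __name__ == "__main__":
    # Hypothetical counts: 200 of 5,000 users clicked A; 260 of 5,000 clicked B.
    print(two_proportion_z_test(200, 5000, 260, 5000))
    print(pick_variant(200, 5000, 260, 5000))
```

The output says which variant won and how confident the test is, but nothing about why users preferred it.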

Pessimistic scenario

An analysis of vibe-coding UX (arXiv 2509.10652) points out that the core problem of UX design lies not in knowing “what users click” but in empathizing with “what users are trying to understand”⁶. User interviews, walkthrough tests, and contextual inquiry exist to get users to put into words “why they did it,” which behavioral data cannot supply. A larger volume of data does not replace this: the measurements increase, but understanding does not.

The underlying question: predicting “what users will do” and understanding “what users intend” require different kinds of information.

Three structures that remain beyond either scenario

Lining up the five questions, certain “hard-to-change” structures emerge regardless of the optimism/pessimism divide.

1. Accountability is not solved by performance

No matter how accurate AI is, the answer to “who takes responsibility” does not change. This is not a technical problem but a problem of social and legal design.

2. Context continuity is held by someone

Why a product ended up with this spec, which user’s voice it came from, what was decided to be cut—this context does not remain in the code or the logs. Holding it and carrying it into the next decision tends to remain a human role.

3. Asymmetric verification does not vanish easily

The cost of AI generating an output keeps falling, but “confirming whether that output is correct” is a separate problem. In specialized domains, finding an error requires correct knowledge, and that is independent of AI’s generative ability.

Conclusion: less “which is right” than “which one am I operating on”

The optimistic scenario is not baseless fantasy. SWE-bench at 93.9%, formal verification, autonomous vulnerability discovery: these are happening today. At the same time, the pessimistic scenario is not baseless conservatism. Benchmark contamination, security that stays flat across model generations, accountability that does not shift as performance improves: today’s data shows these too.

The problem is that which premise you hold changes your actions now.

On an optimistic premise, the rational move now is to “invest in AI and trim headcount.” Holding off on hiring specialists and waiting for AI’s coverage to expand becomes a realistic option. On a pessimistic premise, you need to start now: investing in people with specialized knowledge, building the habit of recording design intent, and setting up user-research processes.

We don’t know which is right. But operating unconsciously on one premise or the other is dangerous.

Being aware of whether you lean optimistic or pessimistic, and deliberately watching for the signals that would break your premise: this, I think, is the most solid approach available right now for avoiding misjudgments in a rapidly changing environment.

You may also be interested in these related articles:

“With AI, You Don’t Need Designers or Engineers”—The Conditions Under Which Solo Development Works, and Five Patterns Where It Breaks - A companion article organizing today’s failure patterns
Work You Can Hand to AI, and Work You Must Never Let Go Of - The delegation line from an individual-skills perspective
The “Solo Development” Bus Factor Problem in the AI Era - The perspective of organizational continuity risk
Does AI Really Erode Your Thinking Ability? - A scientific look at cognitive offloading
From “Tightly Coupled” to “Loosely Coupled” Team Collaboration - A productivity-design argument of team vs. solo

References

References corresponding to the citation numbers in the text are listed in numerical order.

1. AISLE Discovered 12 out of 12 OpenSSL Vulnerabilities - AISLE (2026). The primary announcement reporting that an AI-driven vulnerability discovery system autonomously found 12 vulnerabilities (some of them long-standing) in the January 2026 OpenSSL release. Also reported by outlets such as Schneier on Security. [Reliability: Medium]
2. State of AI Agent Security 2026 Report: When Adoption Outpaces Control - Gravitee.io (2026). The reality of agentic AI adoption outpacing control. 80.9% of AI-adopting organizations have agents in testing or production, but only 14.4% release them through full security approval. [Reliability: Medium]
3. SWE-bench Leaderboard 2026: All Model Scores, Rankings & What They Actually Mean - CodeAnt AI (2026). A commentary on the trajectory of SWE-bench Verified scores, including the contamination problem. Includes OpenAI’s report of 59.4% flawed test cases. [Reliability: Medium]
4. SWE-AGI: Benchmarking Specification-Driven Software Construction with MoonBit in the Era of Autonomous Agents - arXiv:2602.09447 (2026). The code-comprehension bottleneck as codebase size grows. [Reliability: Medium (preprint)]
5. Debt Behind the AI Boom: A Large-Scale Empirical Study of AI-Generated Code in the Wild - arXiv:2603.28592 (2026). An empirical study of technical debt in AI-generated code. [Reliability: Medium (preprint)]
6. Vibe Coding for UX Design: Understanding UX Professionals’ Perceptions of AI-Assisted Design and Development - arXiv:2509.10652 (2025). UX professionals’ perceptions of vibe coding. [Reliability: Medium (preprint)]

Additional references (not cited by number in the text)

Why SWE-bench Verified no longer measures frontier coding capabilities - OpenAI (2026). OpenAI’s view on the limits of the benchmark. [Reliability: Medium–High]
GenUI vs. Vibe Coding: Who’s Designing? - Nielsen Norman Group (2025). The limits of AI-generated UI in UX design. [Reliability: Medium–High]
2025 GenAI Code Security Report - Veracode (2025). The flat-security problem of AI-generated code. [Reliability: Medium–High]

This post is licensed under CC BY 4.0 by the author.
