Where "AI Means You Don't Need an Axis" Falls Apart: AI Only Made Implementation Cheap
This article was generated by AI. The accuracy of the content is not guaranteed, and we accept no responsibility for any damages resulting from use of this article. By continuing to read, you agree to the Terms of Use.
- Who this is for: Engineers who feel (or are being told) that “with AI, deep specialization is no longer needed,” and tech leads and HR who plan engineer development and team structure on an AI-first basis
- Assumed knowledge: Some exposure to AI coding assistants (Copilot, agentic tools, and the like)
- Reading time: about 14 min
Overview
“With AI around, you don’t need a deep specialist axis anymore.” In the career discourse of the AI era, this voice keeps getting louder. AI handles the implementation; whatever you don’t know, you can just ask the AI.
But this argument overlooks one thing. What AI made cheap is only the middle of software development: the implementation step of writing code. Coding and testing account for only 25–35% of the entire pipeline from concept to production release [1]. The rest, “what to build” (requirements, specs, product judgment) and “how to bring it to production quality” (review, design, security, productionization), has not gotten cheaper. If anything, the more AI generates, the heavier the review and verification load becomes. Developers’ single biggest frustration is now fixing AI output that is “almost right, but subtly wrong” [2].
In other words, the axis did not disappear. The place it’s needed shifted—reassigned from the middle to both ends. The product axis of “what to build” and the technical axis of “how to get it to production”—AI has taken over neither. It’s not that “you no longer need an axis”; it’s that even people without an axis can now start building. The entrance widened, but the exit (actually shipping to production) is unchanged.
This article starts from that picture, “only the middle got cheap,” and uses primary sources to examine, one by one, the chain of “with AI, you don’t need X” arguments: you can win shallow-and-broad; you don’t need an axis at all; just mass-produce; you don’t need juniors. They all land in the same place. AI amplifies people who have an axis and, conversely, sinks those who don’t.
This piece is a sister article in a trilogy. The structural argument for “can you actually succeed shallow-and-broad without building an axis” is covered in the main article, and “if AI does it, can checking be light?” is covered in the other sister article.
AI only made implementation cheap—the axis is reassigned to both ends
First, let’s clear up the biggest misconception: conflating “AI writes code” with “AI does development.”
Writing code is only one part of software development as a whole. One consulting firm’s analysis reports that coding and testing make up 25–35% of the entire concept-to-launch pipeline, and that speeding that part up just pushes the bottleneck “downstream, into review, integration, and release” [1]. Indeed, in shops where AI-generated code has grown, the load has moved to review and verification. In a survey of roughly 49,000 developers, the top frustration was fixing AI output that is “almost right, but subtly wrong,” and trust in AI accuracy actually fell [2].
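The bottleneck claim can be made concrete with a back-of-the-envelope calculation. The sketch below assumes an Amdahl's-law-style model, which is this example's assumption and not something the cited analysis states: if a share p of the pipeline is coding and testing, and AI speeds up only that share by a factor s, the overall speedup is 1 / ((1 - p) + p / s). The function name and the sample speedup factors are hypothetical.

```python
def overall_speedup(impl_share: float, impl_speedup: float) -> float:
    """Amdahl-style overall speedup when only the implementation share gets faster.

    impl_share:   fraction of the pipeline spent on coding and testing (0..1)
    impl_speedup: how many times faster that fraction becomes (assumed value)
    """
    if not 0.0 <= impl_share <= 1.0:
        raise ValueError("impl_share must be between 0 and 1")
    if impl_speedup <= 0:
        raise ValueError("impl_speedup must be positive")
    return 1.0 / ((1.0 - impl_share) + impl_share / impl_speedup)


if __name__ == "__main__":
    # 25% and 35% are the article's range; 2x and 10x are illustrative guesses.
    for share in (0.25, 0.35):
        for speedup in (2.0, 10.0):
            result = overall_speedup(share, speedup)
            print(f"share={share:.2f}, implementation {speedup:.0f}x faster -> overall {result:.2f}x")
```

Under this model, even an unbounded speedup of a 35% share caps the overall gain at 1 / 0.65, roughly 1.54x. The untouched 65–75% sets the limit, which is one way to read "the bottleneck moves downstream."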
This picture, in which the middle gets cheap and both ends come to the foreground, is starting to show up as a division of labor in the field. One SaaS company opened a coding agent to its entire staff (including product managers, designers, and support), and in 60 days 11 non-engineers got 40 pull requests merged. But the operating rule was explicit: non-engineers and agents create the code; engineers review and merge it. Direct access to production infrastructure, deploying without review, and access to sensitive data are all off-limits to non-engineers [3]. “Building” was democratized, but “finishing to production quality” stayed with engineers, the people who hold the technical axis.
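An operating rule like this can be written down as an explicit merge gate. The following is a minimal sketch of one possible encoding, not the company's actual tooling: the role names, the `PullRequest` fields, and the reading of "off-limits" as "changes touching production infrastructure or sensitive data are blocked for non-engineer authors" are all assumptions made for illustration.

```python
from dataclasses import dataclass

ENGINEER = "engineer"  # hypothetical role label


@dataclass(frozen=True)
class PullRequest:
    author_role: str                 # e.g. "pm", "designer", "support", "agent", "engineer"
    approver_roles: tuple[str, ...]  # roles of everyone who approved the PR
    touches_prod_infra: bool = False
    touches_sensitive_data: bool = False


def can_merge(pr: PullRequest, merger_role: str) -> bool:
    """Return True if this hypothetical policy allows the merge."""
    # Only engineers merge, and at least one engineer must have approved.
    if merger_role != ENGINEER or ENGINEER not in pr.approver_roles:
        return False
    # Non-engineer authors (including agents) may not change restricted areas.
    restricted = pr.touches_prod_infra or pr.touches_sensitive_data
    if pr.author_role != ENGINEER and restricted:
        return False
    return True
```

The point of the sketch is where the check sits: anyone can author a change, but the decision to ship passes through someone holding the technical axis.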
The well-known engineer Simon Willison has named this shift “vibe engineering.” Now that anyone can produce code with AI, an engineer’s differentiator has moved from writing code to engineering discipline: planning, specs, testing, review, productionization, and accountability. In his framing, if you review the LLM-written code, test it thoroughly, and can explain how it works, then that isn’t vibe coding; it’s software development [4].
And what’s easily missed is the other end: “what to build” needs an axis too. The reason non-engineers can crank out “good” prototypes at blistering speed is precisely that they hold an axis of understanding for users and the product. Give AI a vague instruction and it fills the gaps with its own assumptions, and the result falls apart [5]. The ability to define requirements precisely and judge what is worth building is exactly what the main article in this series calls the non-technical axis.
To sum up: what AI made cheap is only the middle, implementation. Both ends, “what to build” (the product axis) and “how to get it to production quality” (the technical axis), have not gotten cheaper. So:
- If the builder holds the technical axis (an engineer), the axis is baked in from the start, and one person can carry it all the way to production. The distance from prototype to production is short.
- If the builder lacks the technical axis (a non-engineer / raw AI output), the axis shows up downstream as a division of labor. Someone has to finish it.
Either way, the total amount of axis does not disappear; only where you invest it changes. A person who truly holds no axis at all—neither a product axis nor a technical axis—can’t even make a good prototype with AI, let alone finish it for production.
An honest caveat. The geometry itself, that “value concentrates at both ends,” is still in the realm of opinion and observation. There’s a competing view that “value is instead distributed across the whole pipeline flow.” And the strong claim that “implementation was commoditized overnight” is also overstated: one randomized trial found that experienced developers were slower when using AI on codebases they knew well [6]. But that slowdown suggests that the more skilled you are, the more you pay to verify AI output, which, ironically, supports the main thread: the verification axis has come to the foreground. This article does not assert “concentration at both ends”; it stays within the bound that “the place an axis is needed has been reassigned from the middle to both ends” [7].
Can “shallow and broad” win with AI?—it sinks instead
Here comes a fair counterargument: “Even if both ends need an axis, surely a shallow-and-broad person can use AI to fill in the missing parts?”
True, there is data pointing toward AI leveling skills. In a GitHub Copilot experiment, the treatment group completed the task about 56% faster, and less-experienced developers benefited the most [8]. Large RCTs across multiple companies likewise found that productivity gains were significant among juniors [9]. The “jagged frontier” study of consultants found that within the range where AI is capable, the bottom half of performers improved by 43%, a genuine lift [10].
But the lift works only on routine, isolated tasks with a visible answer. That same jagged-frontier study also reported that beyond the range where AI is capable, the AI-using group was 19 percentage points less likely to reach a correct answer [10]. And the clearest demonstration that results split on whether you have an axis came from an RCT of 640 entrepreneurs. Given AI, high performers improved by just over 15%, while low performers did about 8% worse. The difference lay in “discerning what to delegate to AI and what to judge yourself” [11]. Far from closing the gap, AI widened it.
The skilled side is consistent too. As noted above, the deeper a developer’s context, the higher the cost of verifying AI output, sometimes to the point of slowing them down [6]. When one large study summarized AI’s effect as “it doesn’t fix teams, it amplifies them,” it was pointing to this asymmetry [12]. Without an axis, there is no foundation to amplify in the first place.
“Benchmarks are high, so no axis needed”—the numbers are crumbling
The most extreme counterargument is: “AI does all the specialist work, so there’s no point in a human holding an axis.” The widely quoted line that “the hottest new programming language is English” captures that mood [13].
The basis for it usually boils down to one thing: AI posts high scores on coding benchmarks [14]. But those very numbers are now on shaky ground. In one benchmark, about 30% of the successful cases had the solution embedded in the task description or comments, and most of the tasks were created during the period when they would have entered the models’ training data. Memorization is mixed in with reasoning. Tellingly, OpenAI itself stopped using a flagship benchmark, stating it could “no longer measure frontier coding ability” [15]. The strongest numerical basis for “you don’t need an axis” was padded with contamination and memorization.
Even granting AI’s capability, “therefore humans don’t need an axis” doesn’t follow. As the automation classic “Ironies of Automation” notes, the more automation advances, the more important and difficult the human’s leftover work (handling exceptions and monitoring) becomes [16]. In software, this shows up as a “verification gap.” Subjects using an AI assistant wrote significantly more vulnerable code, and were also more likely to mistakenly believe they had written secure code [17]. In some models, more than 20% of the packages AI recommends don’t exist (21.7% for open-source models, versus 5.2% for commercial ones), making them a breeding ground for a new class of attack [18]. The net that catches “almost-right errors” is the axis.
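One way to picture that net for nonexistent packages is a gate that stops any AI-suggested dependency that has not been vetted. The sketch below is an illustration, not something from the cited study: the allowlist contents, the function names, and the example package name `fastjsonx` are hypothetical. The name normalization follows the rule described in PEP 503.

```python
import re

# Hypothetical allowlist; in practice it might come from a lockfile
# or an internal package registry.
VETTED_PACKAGES = frozenset({"requests", "numpy", "pandas"})


def normalize(name: str) -> str:
    """Normalize a package name as PEP 503 describes:
    runs of '-', '_' and '.' become a single '-', then lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def unvetted(suggested: list[str], vetted: frozenset[str] = VETTED_PACKAGES) -> list[str]:
    """Return the suggested names that are not on the vetted list, in input order."""
    vetted_normalized = {normalize(v) for v in vetted}
    return [name for name in suggested if normalize(name) not in vetted_normalized]


if __name__ == "__main__":
    # "fastjsonx" stands in for a plausible-sounding name an assistant might invent.
    for name in unvetted(["Requests", "fastjsonx"]):
        print(f"needs human review before install: {name}")
```

An allowlist does not prove that a package is fake. It only forces every unknown name to stop for a human decision, and that pause is where a hallucinated or squatted name gets caught.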
“Fast, so just mass-produce”—it returns as debt
“Since you can build fast, just mass-produce and worry about quality and design later.” This doesn’t hold either. One large industry report estimates that for every 25% increase in AI adoption, delivery stability drops by about 7% [19], and a study analyzing over 200 million lines found duplicated code rose from 8.3% in 2021 to 12.3% in 2024, making 2024 the first year on record in which “added duplication” surpassed “refactoring” [20]. A Fortune-50-scale analysis found that as AI-generated code grew, paths leading to privilege escalation increased by 322% [21]. “Shallow, broad, mass-produced” comes back as measurable debt. The only people who can fend off that debt are those who hold the design and security axis.
The derivative arguments fail the same way. “You don’t need fundamentals (just ask the AI)”: but “almost-right errors” are invisible to those without fundamentals [2]. “You don’t need to learn programming”: if software could be written rigorously in natural language, there would have been no need to invent programming languages in the first place. The essential complexity of software does not vanish when you change the tool [22]. “You don’t need to learn for yourself (AI will teach you)”: results come in the short term, but skipping the hard-won process of acquiring skills thins out long-term skill formation [23]. Every one of these holds only when a person with an axis uses AI, and breaks down when a person without an axis dumps the whole thing on it.
Even when you become the one “hiring” AI
You can also reframe this as a change of stance. Using AI means assigning work, evaluating the results, and bearing responsibility: in effect, the role of the one who hires, the manager. This shift has even been described as “every employee becomes the boss of agents” [24][25] (though it’s heavily a vision of the future; only about 6% of companies have fully entrusted core operations to AI [26]).
The problem comes next. What a good employer needs is an eye for telling good results from bad, and the judgment to allocate what to delegate versus what to keep for yourself. That, in the end, is the axis itself. As the “jack-of-all-trades” theory holds, an entrepreneur (the one who hires) is constrained across the whole business by their weakest skill [27]. The more someone lacks the axis of evaluation and judgment, the worse an employer they make, dumping everything on AI and causing accidents. The more that AI can be entrusted with, the more everyone is pushed toward being the one who hires. And to be a good employer, you need an axis.
“No need for juniors”—the soil that grows axes dries up
Finally, one organizational argument: “Since AI takes over routine work, we don’t need junior hires or new-grad training.”
That hiring is thinning is a fact. A study using U.S. payroll data reported that in occupations more exposed to AI, employment of young workers (ages 22–25) fell in relative terms, and employment of young software developers dropped about 20% from its peak [28]. A large analysis also finds that at companies adopting GenAI, junior hiring fell while senior employment kept rising [29] (both are preprints, and note this is reduced hiring, not layoffs). That said, this decline overlaps with the period of shrinking tech hiring driven by rate hikes, so AI cannot be pinned down as the sole cause [30].
The problem is that there’s a pipeline trap here. Seniors are the grown-up form of people who built an axis by getting their hands dirty in their junior years. If you don’t grow juniors, seniors run dry a few years later. Microsoft technical executives, too, urge companies to keep hiring and developing young talent, even at the cost of an initial productivity dip [31]. “Since we have AI, we don’t need new grads” risks, in exchange for near-term efficiency, drying up the very soil that grows axes.
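The delayed nature of the pipeline trap can be seen in a toy stock-and-flow model. Everything in the sketch below is an assumption chosen for illustration, not data from the cited sources: the starting headcounts, the 25% yearly promotion rate, the 10% yearly senior attrition, and the function name itself.

```python
def simulate_seniors(years: int, junior_hires_per_year: float,
                     seniors: float = 100.0, juniors: float = 40.0,
                     promotion_rate: float = 0.25,
                     senior_attrition: float = 0.10) -> list[float]:
    """Toy model: each year a share of juniors is promoted to senior
    and a share of seniors leaves. Returns the senior headcount per year."""
    history = []
    for _ in range(years):
        promoted = juniors * promotion_rate
        leaving = seniors * senior_attrition
        juniors = juniors - promoted + junior_hires_per_year
        seniors = seniors + promoted - leaving
        history.append(round(seniors, 1))
    return history


if __name__ == "__main__":
    print("keep hiring 10 juniors per year:", simulate_seniors(8, 10.0))
    print("stop hiring juniors:            ", simulate_seniors(8, 0.0))
```

With these made-up parameters, steady junior hiring keeps the senior count flat, while a hiring freeze leaves the senior count unchanged in the first year and shrinking in every year after. The cost of the decision shows up only once the junior pool that feeds promotions has emptied out.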
Summary
- What AI made cheap is only the middle: implementation. Coding and testing are only 25–35% of the whole pipeline; both ends, “what to build” (the product axis) and “how to get it to production quality” (the technical axis), have not gotten cheaper [1][2][3][4].
- The axis did not disappear; the place it’s needed was reassigned from the middle to both ends. If the builder has an axis it’s baked in from the start; if not, it shows up downstream as a division of labor. The total amount of axis is unchanged. A person who truly has no axis at all can neither build nor finish, even with AI.
- “You can win shallow-and-broad with AI” doesn’t hold. Leveling only works on routine, isolated tasks, and people without an axis sink instead [10][11]. AI amplifies people who hold an axis [12].
- “Benchmarks are high, so no axis needed” can’t be supported either. Those numbers have lost their reliability to contamination and memorization [15], and the more automation advances, the more a verification axis is needed (Ironies of Automation) [16][17].
- “Fast, so just mass-produce” returns as quality, stability, and security debt, measurable enough to count [19][20][21].
- The one who uses AI shifts to being “the one who hires,” but a good employer needs an axis of evaluation and judgment [24][27].
- “We don’t need new grads” carries a pipeline trap that dries up the soil that grows axes (though AI can’t be pinned as the sole cause) [28][29][30].
- An honest caveat: the geometry of “concentration at both ends” is observation-based, and there’s a competing view that “value is distributed across the whole flow.” “Instant commoditization of implementation” is also overstated; experienced developers can even slow down with AI [6]. This article stays within the bound that “the place an axis is needed has been reassigned.”
This article is a sister piece in a trilogy.
- Why the Axis-less Generalist Hits a Ceiling — why you need an axis (the structural argument)
- Is “AI Does It, So Checking Can Be Light” True? — how to design the substance of checking
Related
Take a look at other articles related to this theme:
- Why the Axis-less Generalist Hits a Ceiling - the main piece of this series; argues the necessity of an axis from structure
- I-shaped, T-shaped, π-shaped: A Skill Matrix of Depth and Breadth - a systematic take on defining “axis,” “depth,” and “breadth”
- Generalist or Specialist? AI-Era Role Design That Changes with Company Size - role differentiation in the AI era, argued from the organizational-design side
References
References corresponding to the citation numbers in the body are listed in order.
[1] From Pilots to Payoff: Generative AI in Software Development - Bain & Company (2025). Coding and testing are 25–35% of the whole pipeline; speeding them up pushes the bottleneck downstream (review, integration, release). 【Reliability: medium–high (consulting study)】
[2] Stack Overflow Developer Survey 2025: AI - Stack Overflow (2025). About 49,000 respondents. Trust in AI accuracy fell to 29%; 66% spend time fixing output that is “almost right, but subtly wrong.” 【Reliability: medium–high】
[3] Our entire company ships code now: 40 PRs from non-engineers in 60 days - epilot dev blog (2026). 11 non-engineers merged 40 PRs in 60 days; the operating model is “non-engineers/agents build, engineers review and merge.” A self-reported case (caveat: a vendor-leaning success story). 【Reliability: medium】
[4] Vibe engineering - Simon Willison (2025). With AI letting anyone produce code, the differentiator moves from code generation to planning, specs, testing, review, productionization, and accountability. A primary essay by a well-known engineer (observation-based). 【Reliability: medium (opinion)】
[5] Spec-driven development with AI: Get started with a new open-source toolkit - GitHub Blog (2025). Vague instructions get filled by AI’s own assumptions and the result collapses; specs are framed as a device for naming decisions up front. 【Reliability: medium (vendor dev blog)】
[6] Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity - METR (2025). RCT with 16 experienced OSS developers and 246 tasks. Completion was 19% slower with AI, while participants mistakenly believed they were faster. Small-scale and context-specific; generalize with care. 【Reliability: medium–high】
[7] When AI turns software development inside out - VentureBeat (2026). Observation that the geometry of development flips from “diamond” to “barbell,” with humans engaging deeply at both ends: requirements definition and result verification. Opinion (not empirical). 【Reliability: medium (opinion)】
[8] The Impact of AI on Developer Productivity: Evidence from GitHub Copilot - Peng, Kalliamvakou, Cihon, Demirer, arXiv:2302.06590 (2023). Treatment group ~56% faster; less-experienced developers benefited most. Single isolated task, code quality unmeasured, heterogeneous effects at suggestion level. 【Reliability: medium】
[9] The Effects of Generative AI on High-Skilled Work: Evidence from Three Field Experiments with Software Developers - Cui et al., Management Science (2025). RCT across three companies, 4,867 people. Productivity gains were significant among juniors and those with shorter tenure. 【Reliability: high】
[10] Navigating the Jagged Technological Frontier - Dell’Acqua et al., Organization Science (2026). BCG experiment with 758 people. Within the frontier, bottom-half performers improved by 43% and top-half performers by 17%; beyond the frontier, the AI group was 19 percentage points less likely to reach a correct answer. 【Reliability: high】
[11] The Uneven Impact of Generative AI on Entrepreneurial Performance - Otis et al., Harvard Business School Working Paper 24-042. RCT of 640 entrepreneurs in Kenya. High performers improved, low performers got worse, and the gap widened. 【Reliability: medium–high】
[12] DORA 2025: State of AI-assisted Software Development - Google Cloud DORA (2025). “AI doesn’t fix teams, it amplifies them.” AI adoption is negatively related to stability. 【Reliability: medium】
[13] The hottest new programming language is English - Andrej Karpathy (2023). The view that “natural language is the new programming language.” Cited not as evidence but as an individual opinion symbolizing the no-axis-needed argument. 【Reliability: needs verification (individual opinion)】
[14] SWE-bench Verified - A human-verified coding benchmark measuring resolution of real-repo issues (official). For interpreting the scores, see the caveat in [15]. 【Reliability: medium–high (official; caveat on measurement validity)】
[15] Cluster of contamination/memorization findings: Why we no longer evaluate on SWE-bench Verified - OpenAI / SWE-Bench+ (solution leakage) / memorization-or-reasoning analysis. Solution leakage, weak tests, and training-data contamination inflate high scores. 【Reliability: medium–high】
[16] Ironies of Automation - Lisanne Bainbridge, Automatica 19(6), 775–779 (1983). DOI: 10.1016/0005-1098(83)90046-8. A classic showing that as automation advances, the human’s monitoring, exception handling, and judgment become more important and more difficult. 【Reliability: high】
[17] Do Users Write More Insecure Code with AI Assistants? - Perry, Srivastava, Kumar, Boneh, ACM CCS (2023). The AI-assistant group wrote significantly more vulnerable code and was more likely to wrongly believe it was secure. 【Reliability: high】
[18] We Have a Package for You! A Comprehensive Analysis of Package Hallucinations by Code Generating LLMs - Spracklen et al., USENIX Security Symposium (2025). 5.2% of recommended packages in commercial models and 21.7% in open-source models don’t exist (~205,000 hallucinations), which can become a breeding ground for slopsquatting attacks. 【Reliability: high】
[19] 2024 Accelerate State of DevOps Report - Google DORA (2024). For every 25% increase in AI adoption, delivery stability is estimated to drop about 7.2%. 【Reliability: medium–high】
[20] AI Copilot Code Quality: 2025 Research - GitClear (2025). Over 210 million lines analyzed. Duplicated code rose 8.3% (2021) → 12.3% (2024); in 2024, added duplication exceeded refactoring. Vendor analysis, correlational; causality unestablished. 【Reliability: medium】
[21] 4x Velocity, 10x Vulnerabilities: AI Coding Assistants Are Shipping More Risks - Apiiro (2025). Fortune-50 scale, thousands of repos. As AI-generated code grew, privilege-escalation paths +322% and design flaws +153%. Vendor analysis. 【Reliability: medium】
[22] No Silver Bullet: Essence and Accidents of Software Engineering - Frederick P. Brooks, IEEE Computer (1987). A classic argument that the essential complexity of software cannot be removed by tools. 【Reliability: high (classic)】
[23] Kids who offload critical thinking to AI may learn less - Hechinger Report (2024). Summarizes studies in BJET and others. The more heavily ChatGPT is used, the worse the later performance on AI-free tasks. 【Reliability: medium】
[24] 2025 Work Trend Index: The Year the Frontier Firm Is Born - Microsoft WorkLab (2025). Proposes that “every employee becomes an agent boss.” Caveat: from a vendor with its own AI products, and a projection. 【Reliability: medium (vendor study; projection)】
[25] To Thrive in the AI Era, Companies Need Agent Managers - Srinivasan & Wei, Harvard Business Review (2026). Argues for the emergence of an “agent manager” role overseeing AI agents. Opinion based on a single case. 【Reliability: medium (opinion)】
[26] Harvard Business Review survey: only 6% of companies trust AI agents - Fortune reporting / HBR Analytic Services (2025). Only 6% of companies have fully entrusted core operations to AI agents. 【Reliability: medium】
[27] Balanced Skills and Entrepreneurship - Edward P. Lazear, American Economic Review (2004) / Journal of Labor Economics (2005). The “jack-of-all-trades” theory. A model in which entrepreneurs = generalists (the business is constrained by their weakest skill) and the employed = specialists. 【Reliability: high (theory)】
[28] Canaries in the Coal Mine? Six Facts about the Recent Employment Effects of Artificial Intelligence - Brynjolfsson, Chandar, Chen, Stanford Digital Economy Lab (2025). ADP payroll data. In high-AI-exposure occupations, relative employment of ages 22–25 fell about 16%, and young software developers dropped about 20% from the peak. The authors themselves note other possible factors. 【Reliability: medium–high】
[29] Generative AI as Seniority-Biased Technological Change - Hosseini & Lichtinger, SSRN 5425555 (2025). Data on 62 million people and 285,000 firms. At GenAI-adopting firms, junior hiring fell while seniors rose. Reduced hiring, not layoffs. Preprint. 【Reliability: medium (preprint; large-scale difference analysis)】
[30] Tracking the Impact of AI on the Labor Market - Yale Budget Lab (2026). No clear correlation between AI exposure and employment confirmed; “no evidence of large-scale labor-market disruption (speculative at this point).” Note the confound: the tech-hiring decline overlaps with the end of ZIRP and rate hikes. 【Reliability: medium–high】
[31] The Looming Junior Developer Pipeline Crisis - InfoQ summary / Russinovich & Hanselman, Communications of the ACM (2026). Recommends continuing to hire and develop young talent even at the cost of an initial productivity dip (peer opinion). 【Reliability: medium】