AI Vibe Coding vs. Writing Code by Hand: Sorting Out the Productivity-Growth Trade-off for Junior Engineers, with Data

Posted Apr 13, 2026

AI-Generated Content

This article was generated by AI. The accuracy of the content is not guaranteed, and we accept no responsibility for any damages resulting from use of this article. By continuing to read, you agree to the Terms of Use.

Target audience: Junior engineers and engineering managers navigating AI coding strategy
Prerequisites: Basic experience with GitHub Copilot, Cursor, Claude Code, etc.
Reading time: 15 minutes

Overview

“Is writing code by hand a stupid thing to do in 2026?” This question keeps resurfacing on developer social media, and there is no clean answer. The productivity camp points to MIT’s RCT (+27-39% for less-experienced developers)¹ and argues that going all-in on AI is the only rational move. The growth camp points to Anthropic’s RCT (17-point drop in conceptual understanding, Cohen’s d=0.738)² and warns that skill formation is being quietly destroyed. Both datasets are real, and neither can be ignored.

This article does not take a side. Instead, it lays out the major data points available as of April 2026, organizes them by camp, and makes the trade-off structure visible. From there, we offer decision frames you can apply based on your own career stage and priorities.

Two companion articles take explicit positions on the same question—a productivity-first guide and a growth-first guide. Reading this piece alongside them should give you the raw material to form your own answer.

The Two Camps and Their Key Data

Camp A: Productivity First—“Juniors Especially Should Use AI”

The central evidence here is Demirer et al. (MIT Sloan, 2024), a family of RCTs¹. Across Microsoft, Accenture, and an anonymous Fortune 100 electronics manufacturer, GitHub Copilot was randomly assigned to a total of 4,867 developers, with weekly tasks completed as the outcome measure.

Overall average: +26%
Less-experienced developers: +27-39%
Experienced developers: +8-13%

This is one of the largest-sample AI productivity RCTs ever run. The “AI lifts juniors more” pattern also shows up in Kazemitabaar et al. at CHI 2023 (69 participants aged 10-17): a 1.15x completion rate on code-creation tasks and 1.8x scores³. The study found no negative effect on handwritten modification tasks, and the AI group did slightly (non-significantly) better on a retention test one week later.

Market data since 2025 reinforces the story. In the Stack Overflow Developer Survey 2025, 44% of learning developers used AI tools (up from 37% in 2024)⁴. The JetBrains State of Developer Ecosystem 2025 (n=24,534) found that 85% of developers use AI regularly and 68% expect that AI proficiency will become a job requirement⁵.

Camp B: Growth First—“Leaning on AI Makes Your Understanding Shallow”

The central evidence here is the Anthropic 2026 RCT (Shen & Tamkin)². 52 software engineers (mostly juniors) were randomly assigned to AI-assisted or hand-coding conditions while learning a new Python library, Trio.

Quiz score: AI group 50% vs. hand-coding group 67% (17-point gap, Cohen’s d=0.738, p=0.01)
Gap is largest on debugging
Productivity gain: not statistically significant

“You don’t get faster, and you understand less.” That is the harshest reading of this RCT.
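
To put the effect size in context, the reported gap and Cohen's d can be combined to back out the spread of quiz scores. The following is a minimal sketch, assuming the standard definition of Cohen's d (mean difference divided by the pooled standard deviation) and using the rounded group means quoted above, so the implied value is approximate. The function names are illustrative, not taken from the study.

```python
def cohens_d(mean_a: float, mean_b: float, pooled_sd: float) -> float:
    """Standardized mean difference: (mean_a - mean_b) / pooled_sd."""
    return (mean_a - mean_b) / pooled_sd


def implied_pooled_sd(mean_a: float, mean_b: float, d: float) -> float:
    """Rearrange the definition of d to recover the pooled SD."""
    return (mean_a - mean_b) / d


# Figures quoted in this article (group means are rounded percentages).
hand_mean = 67.0
ai_mean = 50.0
reported_d = 0.738

sd = implied_pooled_sd(hand_mean, ai_mean, reported_d)
print(f"Implied pooled SD: {sd:.1f} percentage points")
print(f"Round-trip d: {cohens_d(hand_mean, ai_mean, sd):.3f}")
```

Under these assumptions the implied pooled SD is about 23 points, so the 17-point gap is roughly three quarters of a standard deviation: sizable, but with substantial overlap between the two groups.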

But a second critical finding comes from the cluster analysis. The highest-scoring group was the “conceptual inquiry only” cluster (n=7)—engineers who used AI as a conversational partner for concepts rather than a code generator. It wasn’t that all AI users did worse; how you use AI is what splits the outcomes². This becomes the core of the hybrid approach proposed later.

A supporting data point is Prather et al. (ICER 2024, 21 participants, observation + eye-tracking)⁶. They document an “illusion of competence” and three metacognitive difficulties (Interruption, Mislead, Progression), and show a bimodal split under AI assistance: some students accelerate, others stall.

In Japan, Kawamura & Uchida (Nara KOSEN, 2025) found that AI groups finished tasks faster and with lower variance, but found no significant difference in conceptual test scores, and flagged that “AI may reduce opportunities for thinking and exploration”⁷. On the cognitive science side, Gerlich (2025, n=666) reported a correlation of r=+0.72 between AI use and cognitive offloading, and r=-0.75 between offloading and critical thinking. Younger users showed higher AI dependence and lower critical thinking scores⁸.
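
One way to read the Gerlich correlations is to square them. Under a simple linear model, r² is the share of variance two measures have in common. This is a rough sketch for intuition only: the dictionary keys are illustrative labels, and a high r² still says nothing about which way causation runs.

```python
def shared_variance(r: float) -> float:
    """Proportion of variance shared by two variables under a linear model (r squared)."""
    return r * r


# Correlations as quoted in this article.
correlations = {
    "AI use vs. cognitive offloading": 0.72,
    "cognitive offloading vs. critical thinking": -0.75,
}

for label, r in correlations.items():
    print(f"{label}: r={r:+.2f}, shared variance ~{shared_variance(r):.0%}")
```

Both pairs share a little over half of their variance, which is strong for survey data but remains correlational.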

The Structure That Emerges When You Line the Data Up

Put the major studies side by side and an asymmetry between productivity and understanding comes into focus.

Study	Subjects	Key finding	Caveat
MIT Sloan 2024¹	4,867 developers (RCT + staggered rollout)	+27-39% productivity for juniors, +8-13% for veterans	Code quality not measured
CHI 2023 Kazemitabaar³	69 novice learners aged 10-17	1.15x completion, 1.8x score. No harm on handwritten modification	Generalizing to adult juniors requires care
Anthropic 2026²	52 engineers (mostly junior)	17-point drop in understanding (Cohen’s d=0.738, a medium-to-large effect by Cohen’s benchmarks). No significant productivity gain	Specific to learning a new library
ICER 2024 Prather⁶	21 students	Documents “illusion of competence”. Bimodal outcomes	Observational; limited causal inference
Nara KOSEN 2025⁷	KOSEN students	Less time, no difference in understanding	Small sample
METR 2025⁹	16 veteran OSS developers	19% slowdown with AI, but developers thought they were 20% faster	Veterans, not juniors
Gerlich 2025⁸	666 general workers	Offloading and critical thinking r=-0.75	Correlational; causation unclear

Two patterns emerge from this table.

flowchart TB
    A["What you measure"] --> B["Tasks completed<br>(quantitative productivity)"]
    A --> C["Understanding & debugging<br>(qualitative skill)"]

    B --> D["MIT Sloan et al.<br>Juniors gain most"]
    C --> E["Anthropic et al.<br>Juniors lose most"]

    D --> F["Looks positive on<br>short-term business metrics"]
    E --> G["Review load, bug rate,<br>maintainability in 3-5 years"]

    classDef pos stroke:#2ea44f,stroke-width:3px
    classDef neg stroke:#cf222e,stroke-width:3px
    class D,F pos
    class E,G neg

Measured by quantity (tasks completed), juniors with AI look strong. Measured by quality (understanding and debugging), juniors with AI look risky. That difference in what is being measured is how both datasets can be true at the same time.

Clearing Up a Myth: “Vibe Coding Makes You 3x Faster”

A major source of confusion in this debate is the “vibe coding is 3x / 5x faster” claim. When you trace it to primary sources, the number is not backed by RCTs.

Most of the figures in circulation come from self-reported surveys. Bubble’s 2025 State of Visual Development shows user self-reports of “10x or more: 23.5%”, “5-10x: 16.7%”, and “3-5x: 19.1%”¹⁰. But there is no control group, the sample is specific to Bubble users, and causal inference is impossible.

Rigorous RCTs paint a different picture.

MIT Sloan 2024 (RCT): juniors +27-39%, overall +26%¹
METR 2025 (RCT, veterans): 19% slowdown, developers thought they were 20% faster⁹
Anthropic 2026 (RCT, juniors): productivity gain not statistically significant²

What RCTs actually show is a modest picture: “used well, tens of percent gain; used poorly, a slowdown.” When you see “3x” or “5x”, check whether the primary source is an RCT or a self-reported survey. That is the basic literacy move for 2026.
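
To compare these figures against a “3x” claim on the same scale, the percentages can be converted into speed multipliers. This is a sketch under stated assumptions: the MIT Sloan numbers are treated as throughput gains (more tasks completed), and the METR slowdown is read as 19% more time per task. The function names are illustrative.

```python
def throughput_multiplier(gain_pct: float) -> float:
    """Convert a '+X% tasks completed' figure into a speed multiplier."""
    return 1 + gain_pct / 100


def multiplier_from_extra_time(extra_time_pct: float) -> float:
    """Convert 'tasks took X% longer' into a speed multiplier (below 1 means slower)."""
    return 1 / (1 + extra_time_pct / 100)


results = {
    "MIT Sloan, juniors (low end)": throughput_multiplier(27),
    "MIT Sloan, juniors (high end)": throughput_multiplier(39),
    "MIT Sloan, overall": throughput_multiplier(26),
    # Assumption: the 19% slowdown is interpreted as 19% more time per task.
    "METR, veterans": multiplier_from_extra_time(19),
}

for label, multiplier in results.items():
    print(f"{label}: {multiplier:.2f}x (a '3x' claim would be 3.00x)")
```

On this scale, every RCT figure cited here falls between roughly 0.8x and 1.4x, well short of the self-reported multiples.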

The most striking finding in the METR study is that developers misjudge their own speed⁹: perceived speed drifts away from measured speed. A junior who feels they are “writing at crazy speed with AI” may be writing at roughly the same pace, or slower, when actually measured. Prather et al. document a parallel illusion on the understanding side, the “illusion of competence”⁶.
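
A personal version of this check is easy to run: time a few comparable tasks with and without AI, write down how much faster you felt, and compare. The sketch below is a hypothetical self-tracking helper, not the METR methodology. The task timings are invented for illustration, and comparing plain averages of unmatched tasks is a crude measure.

```python
from statistics import mean


def measured_speedup_pct(minutes_without_ai: list[float], minutes_with_ai: list[float]) -> float:
    """Percent change in average task time; positive means faster with AI."""
    baseline = mean(minutes_without_ai)
    return (baseline - mean(minutes_with_ai)) / baseline * 100


def perception_gap(perceived_pct: float, measured_pct: float) -> float:
    """How far the felt speedup sits above the measured one, in percentage points."""
    return perceived_pct - measured_pct


# Hypothetical log: minutes spent on similar-sized tasks.
without_ai = [60, 50, 70]
with_ai = [66, 72, 78]
felt_speedup = 20.0  # "I feel about 20% faster with AI"

measured = measured_speedup_pct(without_ai, with_ai)
gap = perception_gap(felt_speedup, measured)
print(f"Measured: {measured:+.0f}%, felt: {felt_speedup:+.0f}%, gap: {gap:.0f} points")
```

A large positive gap is the signal worth acting on: it means the feeling of speed is not showing up on the clock.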

Three Frames for Making the Call in Practice

Organizing the data alone won’t decide it for you. Apply these three frames to your own context.

Frame 1: Your Career Time Horizon

Optimizing for the next year and optimizing for the next 5-10 years produce different answers.

Short-term (next review, next project): productivity camp. Capture the 27-39% from the MIT study.
Medium-term (next job switch, next promotion): both matter. Hiring teams evaluate “can use AI” AND “has the fundamentals”.
Long-term (10-year career): lean growth camp. Debugging skill, code reading, and design judgment compound mainly through hands-on practice.

If you are in your early 20s with 30-40 working years ahead, the compounding from fundamentals beats short-term productivity. If you are already in your 30s and need immediate impact in your current role, prioritizing productivity is reasonable.

Frame 2: The Nature of the Task

Not all tasks are equal. New-material learning vs. known territory, maintenance vs. greenfield all shift the optimum.

flowchart TB
    A["Task nature"] --> B["New learning"]
    A --> C["Known work"]
    A --> D["Maintenance"]
    B --> E["Hand + AI Q&A"]
    C --> F["Full AI use"]
    D --> G["Hand-first"]

    classDef hand stroke:#2ea44f,stroke-width:3px
    classDef ai stroke:#6366f1,stroke-width:3px
    class E,G hand
    class F ai

It matters that the Anthropic RCT produced its 17-point understanding gap on a new-library learning task². That result does not necessarily transfer to “writing CRUD in a stack you already know.” Hand-first when you are stepping into unknown territory; AI-heavy in territory you have already mastered—that kind of split is rational.

Frame 3: The Reality of Your Evaluation Environment

Look clearly at what actually gets rewarded in your environment.

Startup, zero-to-one phase, solo: a working product is what counts. Productivity camp is rational.
Large company, legacy maintenance: debugging and reading skill are the evaluation axes. Growth camp is rational.
Preparing for an engineering interview cycle: on-site interviews are increasingly AI-restricted. Handwriting skill is required.
Already trusted, long-term employment: pick based on your own preferences and learning philosophy.

There is no universal answer of “using AI is smarter” or “writing by hand is smarter.” What gets rewarded in your environment, and what kind of engineer you want to be in 3-5 years—that’s what decides it.

A Concrete Hybrid: Start with 70/30

You don’t have to commit fully to one camp. As a practical rule of thumb that holds both camps’ claims together, try 70% hand / 30% AI for new learning, and 70% AI / 30% hand for known territory.

Context	Hand-coding share	What AI is for
Learning a new library or framework	70%	Concept questions, error-message interpretation
Routine implementation in known tech	30%	Boilerplate generation, test generation
Debugging	80%	Last-resort hint questions
Code review / refactoring	50%	Generating alternatives, surfacing angles
Tech selection / design decisions	90%	Brainstorming comparison axes
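
The table above can be turned into a small lookup for planning a work session. This is a minimal sketch: the context keys and the `split_minutes` helper are illustrative names, and the shares are the rule-of-thumb values from the table, not measured optima.

```python
# Hand-coding share per context, taken from the table above.
HAND_SHARE = {
    "new_library": 0.70,
    "routine_known_tech": 0.30,
    "debugging": 0.80,
    "review_refactoring": 0.50,
    "design_decisions": 0.90,
}


def split_minutes(context: str, total_minutes: int) -> tuple[int, int]:
    """Return (hand_minutes, ai_minutes) for a session of the given length."""
    hand = round(total_minutes * HAND_SHARE[context])
    return hand, total_minutes - hand


for context in HAND_SHARE:
    hand, ai = split_minutes(context, 120)
    print(f"{context}: {hand} min by hand, {ai} min with AI")
```

Keeping the shares in one place makes them easy to adjust later without changing how a session is planned.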

This split aims for the midpoint between how the top cluster in the Anthropic RCT used AI (“conceptual inquiry only”)² and the productivity gains MIT documented¹. In Bjork & Bjork’s “desirable difficulties” framework¹¹, the goal is to keep the necessary cognitive load while offloading only the unproductive load to AI. The numbers themselves are not a rigorously optimal split; they are a starting point that you recalibrate every three months using the self-assessment below. In practice, many engineers will want to start at 80/20 for new learning and gradually reduce the hand-share, or start at 50/50 for known territory and gradually raise the AI-share.

Re-evaluate the mix every three months. Is your debugging getting faster? Can you explain more of the code you write? Is the amount you can write without AI growing? You have to collect this data on yourself—nobody else will.
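
The three questions above can be recorded as numbers and compared quarter to quarter. The sketch below is one hypothetical way to do it: the field names, the 10-point step, and the adjustment rule (lower the hand share only when all three measures improve, raise it when at most one does) are assumptions for illustration, not recommendations drawn from the cited studies.

```python
from dataclasses import dataclass


@dataclass
class QuarterlyCheck:
    median_debug_minutes: float   # lower is better
    explainable_share: float      # 0.0-1.0 of your code you can explain; higher is better
    no_ai_tasks_completed: int    # tasks finished without AI; higher is better


def improvements(prev: QuarterlyCheck, curr: QuarterlyCheck) -> int:
    """Count how many of the three measures improved since last quarter."""
    return sum([
        curr.median_debug_minutes < prev.median_debug_minutes,
        curr.explainable_share > prev.explainable_share,
        curr.no_ai_tasks_completed > prev.no_ai_tasks_completed,
    ])


def next_hand_share(hand_share: float, prev: QuarterlyCheck, curr: QuarterlyCheck,
                    step: float = 0.10) -> float:
    """Hypothetical rule: ease off hand-coding only when everything improved."""
    improved = improvements(prev, curr)
    if improved == 3:
        hand_share -= step
    elif improved <= 1:
        hand_share += step
    # Clamp to a valid share and round away floating-point noise.
    return round(min(1.0, max(0.0, hand_share)), 2)


last_quarter = QuarterlyCheck(45, 0.60, 4)
this_quarter = QuarterlyCheck(35, 0.70, 6)
print(next_hand_share(0.80, last_quarter, this_quarter))
```

Here all three measures improved, so a new-learning split that started at 80/20 moves to 70/30 for the next quarter.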

Summary: “Stupid” Depends Entirely on How You Use It

As of April 2026, the data points to this answer.

Full delegation to AI is dangerous for juniors: Anthropic’s 17 points, ICER’s “illusion of competence”, and Gerlich’s cognitive offloading all point in the same direction.
Full rejection of AI is inefficient for juniors: MIT’s 27-39% and CHI 2023’s 1.15x completion rate show the short-term productivity benefit.
Juniors who use AI in conceptual-inquiry mode do best: this is the usage pattern of the top-scoring cluster in the Anthropic RCT.

The answer to “is writing code by hand a stupid thing to do?” is “it depends on how you use AI.” Refusing AI entirely is a short-term disadvantage. Handing everything to AI is a long-term liability. The sweet spot is a third mode: write it yourself, with AI as a conversational partner.

For more concrete practice, the productivity-first guide lays out four principles, and the growth-first guide lays out a four-step procedure. Pick based on your priorities, and revisit the ratio every three months.

Related reading that fills out the 2026 picture of AI and skill formation: why vibe coding fails for experts, the AI deskilling paradox, and AI as a “skill equalizer”.

Footnotes

1. Demirer, M., Cui, Z., Musolff, L., Jaffe, S., Peng, S., & Salz, T. (2024). “The Effects of Generative AI on High Skilled Work: Evidence from Three Field Experiments with Software Developers.” SSRN Working Paper ID 4945566. Article version: MIT Sloan, November 4, 2024. RCT + staggered rollout across three companies, n=4,867. Overall +26%, juniors +27-39%, veterans +8-13%.
2. Shen, J. H., & Tamkin, A. (2026). “How AI assistance impacts the formation of coding skills.” Anthropic, published January 29, 2026. 52 engineers (mostly junior), learning a new Python library (Trio) in an RCT. Quiz scores: AI group 50% vs. hand-coding group 67% (Cohen’s d=0.738, p=0.01). No significant productivity gain. The top-scoring cluster (65%+) was the “conceptual inquiry only” group, n=7. https://www.anthropic.com/research/AI-assistance-coding-skills, paper: arXiv:2601.20245
3. Kazemitabaar, M., Chow, J., Ma, C. K. T., Ericson, B. J., Weintrop, D., & Grossman, T. (2023). “Studying the effect of AI Code Generators on Supporting Novice Learners in Introductory Programming.” CHI 2023. 69 participants aged 10-17. 1.15x completion rate, 1.8x scores, no negative effect on handwritten modification tasks. https://arxiv.org/abs/2302.07427
4. Stack Overflow. (2025). “2025 Developer Survey: AI.” 44% of learning developers use AI tools. https://survey.stackoverflow.co/2025/ai
5. JetBrains. (2025). “The State of Developer Ecosystem 2025.” n=24,534 across 194 countries. 85% use AI regularly, 68% expect “AI proficiency will become a job requirement.” https://devecosystem-2025.jetbrains.com/artificial-intelligence
6. Prather, J., et al. (2024). “The Widening Gap: The Benefits and Harms of Generative AI for Novice Programmers.” ICER ’24. 21 participants, observation + eye-tracking. Documents “illusion of competence” and the “Interruption / Mislead / Progression” pattern. https://arxiv.org/abs/2405.17739
7. Kawamura, T. & Uchida, S. (2025). “The Impact of Generative AI Programming on Learning Outcomes.” Nara National College of Technology (KOSEN). AI groups finished faster with less variance but no difference in conceptual understanding. https://www.jsise.org/wp-content/uploads/2025/02/2024_kansai_p09.pdf
8. Gerlich, M. (2025). “AI Tools in Society: Impacts on Cognitive Offloading and the Future of Critical Thinking.” Societies, 15(1), 6. n=666. AI use and cognitive offloading r=+0.72; offloading and critical thinking r=-0.75. https://www.mdpi.com/2075-4698/15/1/6. A correction to Table 4 was published in September 2025 as Societies 15(9), 252; the author states that the scientific conclusions are unchanged.
9. METR. (July 2025). “Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity.” RCT with 16 developers and 246 tasks. 19% slowdown when using AI, though developers believed they were 20% faster. https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/
10. Bubble. (2025). “2025 State of Visual Development and AI App Building.” User self-report survey with figures such as “10x or more: 23.5%” and “5-10x: 16.7%”. No control group. https://bubble.io/blog/2025-state-of-visual-development-ai-app-building/
11. Bjork, E. L., & Bjork, R. A. (2011). “Making Things Hard on Yourself, But in a Good Way: Creating Desirable Difficulties to Enhance Learning.” UCLA Bjork Learning and Forgetting Lab. https://bjorklab.psych.ucla.edu/wp-content/uploads/sites/13/2016/04/EBjork_RBjork_2011.pdf

This post is licensed under CC BY 4.0 by the author.