AI Vibe Coding vs. Writing Code by Hand: Sorting Out the Productivity-Growth Trade-off for Junior Engineers, with Data

  • Target audience: Junior engineers and engineering managers navigating AI coding strategy
  • Prerequisites: Basic experience with GitHub Copilot, Cursor, Claude Code, etc.
  • Reading time: 15 minutes

Overview

“Is writing code by hand a stupid thing to do in 2026?” This question keeps resurfacing on developer social media, and there is no clean answer. The productivity camp points to MIT’s RCT (+27-39% for less-experienced developers) [1] and argues that going all-in on AI is the only rational move. The growth camp points to Anthropic’s RCT (17-point drop in conceptual understanding, Cohen’s d=0.738) [2] and warns that skill formation is being quietly destroyed. Both datasets are real, and neither can be ignored.

This article does not take a side. Instead, it lays out the major data points available as of April 2026, organizes them by camp, and makes the trade-off structure visible. From there, we offer decision frames you can apply based on your own career stage and priorities.

Two companion articles take explicit positions on the same question—a productivity-first guide and a growth-first guide. Reading this piece alongside them should give you the raw material to form your own answer.

The Two Camps and Their Key Data

Camp A: Productivity First—“Juniors Especially Should Use AI”

The central evidence here is Demirer et al. (MIT Sloan, 2024), a family of RCTs [1]. Across Microsoft, Accenture, and an anonymous Fortune 100 electronics manufacturer, GitHub Copilot was randomly assigned to a total of 4,867 developers, with the number of tasks completed per week as the outcome measure.

  • Overall average: +26%
  • Less-experienced developers: +27-39%
  • Experienced developers: +8-13%

This is one of the largest-sample AI productivity RCTs ever run. The “AI lifts juniors more” pattern also shows up in Kazemitabaar et al. at CHI 2023 (69 participants aged 10-17): a 1.15x completion rate on code-creation tasks and 1.8x higher scores [3]. The study found no negative effect on handwritten modification tasks, and the AI group did slightly (though not significantly) better on a retention test one week later.

Market data since 2025 reinforces the story. In the Stack Overflow Developer Survey 2025, 44% of developers learning to code used AI tools (up from 37% in 2024) [4]. The JetBrains State of Developer Ecosystem 2025 (n=24,534) found that 85% of developers use AI regularly and 68% expect that “AI proficiency will become a job requirement” [5].

Camp B: Growth First—“Leaning on AI Makes Your Understanding Shallow”

The central evidence here is the Anthropic 2026 RCT (Shen & Tamkin) [2]. A total of 52 software engineers (mostly juniors) were randomly assigned to AI-assisted or hand-coding conditions while learning a new Python library, Trio.

  • Quiz score: AI group 50% vs. hand-coding group 67% (17-point gap, Cohen’s d=0.738, p=0.01)
  • Gap is largest on debugging
  • Productivity gain: not statistically significant

“You don’t get faster, and you understand less.” That is the harshest reading of this RCT.

But a second critical finding comes from the cluster analysis. The highest-scoring group was the “conceptual inquiry only” cluster (n=7): engineers who used AI as a conversational partner for concepts rather than as a code generator. Not all AI users did worse; how you use AI is what splits the outcomes [2]. This becomes the core of the hybrid approach proposed later.

A supporting data point is Prather et al. (ICER 2024, 21 participants, observation + eye-tracking) [6]. They document an “illusion of competence” along with three metacognitive difficulties (“Interruption”, “Mislead”, and “Progression”), and show a bimodal split under AI assistance: some students accelerate, others stall.

In Japan, Kawamura & Uchida (Nara KOSEN, 2025) found that AI groups finished tasks faster and with lower variance, but found no significant difference in conceptual test scores, and flagged that “AI may reduce opportunities for thinking and exploration” [7]. On the cognitive science side, Gerlich (2025, n=666) reported a correlation of r=+0.72 between AI use and cognitive offloading, and r=-0.75 between offloading and critical thinking. Younger users showed higher AI dependence and lower critical thinking scores [8].
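As a back-of-the-envelope aid for reading the effect sizes quoted in this section, the sketch below converts them into more intuitive quantities. It assumes that Cohen’s d was computed as the raw difference in mean quiz scores divided by a pooled standard deviation on the same percent scale, which this article does not confirm. The function names are illustrative and do not come from any of the cited papers.

```python
def implied_pooled_sd(mean_a: float, mean_b: float, cohens_d: float) -> float:
    """Pooled standard deviation implied by two group means and a reported d.

    Assumption: d = (mean_a - mean_b) / pooled_sd (the textbook definition).
    """
    if cohens_d == 0:
        raise ValueError("cohens_d must be non-zero")
    return abs(mean_a - mean_b) / abs(cohens_d)


def shared_variance(r: float) -> float:
    """Fraction of variance two variables share, given a Pearson r (r squared)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be between -1 and 1")
    return r * r


if __name__ == "__main__":
    # Figures quoted above: quiz scores 67% vs. 50%, d = 0.738.
    sd = implied_pooled_sd(67, 50, 0.738)
    print(f"Implied pooled SD of quiz scores: {sd:.1f} points")

    # Figures quoted above: r = +0.72 and r = -0.75.
    print(f"AI use vs. offloading: {shared_variance(0.72):.0%} shared variance")
    print(f"Offloading vs. critical thinking: {shared_variance(-0.75):.0%} shared variance")
```

Under that assumption, a 17-point gap at d=0.738 implies a pooled standard deviation of roughly 23 points, and each of the Gerlich correlations corresponds to roughly half of the variance being shared. Shared variance is a descriptive quantity; it says nothing about the direction of causation.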

The Structure That Emerges When You Line the Data Up

Put the major studies side by side and an asymmetry between productivity and understanding comes into focus.

| Study | Subjects | Key finding | Caveat |
|---|---|---|---|
| MIT Sloan 2024 [1] | 4,867 developers (RCT + staggered rollout) | +27-39% productivity for juniors, +8-13% for veterans | Code quality not measured |
| CHI 2023 Kazemitabaar [3] | 69 novice learners aged 10-17 | 1.15x completion, 1.8x score. No harm on handwritten modification | Generalizing to adult juniors requires care |
| Anthropic 2026 [2] | 52 engineers (mostly junior) | 17-point drop in understanding (Cohen’s d=0.738, large effect). No significant productivity gain | Specific to learning a new library |
| ICER 2024 Prather [6] | 21 students | Documents “illusion of competence”. Bimodal outcomes | Observational; limited causal inference |
| Nara KOSEN 2025 [7] | KOSEN students | Less time, no difference in understanding | Small sample |
| METR 2025 [9] | 16 veteran OSS developers | 19% slowdown with AI, but developers thought they were 20% faster | Veterans, not juniors |
| Gerlich 2025 [8] | 666 general workers | Offloading and critical thinking r=-0.75 | Correlational; causation unclear |

Two patterns emerge from this table.

flowchart TB
    A["What you measure"] --> B["Tasks completed<br>(quantitative productivity)"]
    A --> C["Understanding & debugging<br>(qualitative skill)"]

    B --> D["MIT Sloan et al.<br>Juniors gain most"]
    C --> E["Anthropic et al.<br>Juniors lose most"]

    D --> F["Looks positive on<br>short-term business metrics"]
    E --> G["Review load, bug rate,<br>maintainability in 3-5 years"]

    classDef pos stroke:#2ea44f,stroke-width:3px
    classDef neg stroke:#cf222e,stroke-width:3px
    class D,F pos
    class E,G neg

Measured by quantity, juniors-with-AI is strong. Measured by quality, juniors-with-AI is dangerous. That is the structure that lets both datasets be true simultaneously: each camp is measuring a different thing.

Clearing Up a Myth: “Vibe Coding Makes You 3x Faster”

A major source of confusion in this debate is the “vibe coding is 3x / 5x faster” claim. When you trace it to primary sources, the number is not backed by RCTs.

Most of the figures in circulation come from self-reported surveys. Bubble’s 2025 State of Visual Development shows user self-reports of “10x or more: 23.5%”, “5-10x: 16.7%”, and “3-5x: 19.1%” [10]. But there is no control group, the sample is specific to Bubble users, and causal inference is impossible.

Rigorous RCTs paint a different picture.

  • MIT Sloan 2024 (RCT): juniors +27-39%, overall +26% [1]
  • METR 2025 (RCT, veterans): 19% slowdown, developers thought they were 20% faster [9]
  • Anthropic 2026 (RCT, juniors): productivity gain not statistically significant [2]

What RCTs actually show is a modest picture: “used well, tens of percent gain; used poorly, a slowdown.” When you see “3x” or “5x”, check whether the primary source is an RCT or a self-reported survey. That is the basic literacy move for 2026.

The most striking finding in the METR study is that developers misjudge their own speed [9]. Perceived speed drifts away from measured speed. A junior who feels they are “writing at crazy speed with AI” may be writing at roughly the same pace, or slower, when actually measured. The Anthropic study documents the same illusion on the understanding side, calling it “illusion of competence” [2].
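To make the perception gap concrete, here is a minimal sketch of the arithmetic, plus a helper for checking your own log. It assumes that “19% slowdown” means tasks took 19% longer and that “20% faster” means developers believed tasks took 20% less time; both readings are simplifications. The function names and the sample log are hypothetical.

```python
from statistics import mean


def time_ratio(percent_change_in_time: float) -> float:
    """Convert a percent change in task time into a ratio against baseline.

    +19 means tasks took 19% longer (ratio 1.19); -20 means 20% less time (0.80).
    """
    return 1 + percent_change_in_time / 100


def perception_gap(measured_ratio: float, perceived_ratio: float) -> float:
    """How many times longer the work really took than it felt like it took."""
    return measured_ratio / perceived_ratio


def personal_gap(log: list[tuple[float, float]]) -> float:
    """Mean measured/guessed ratio over (guessed_minutes, measured_minutes) pairs."""
    return mean(measured / guessed for guessed, measured in log)


if __name__ == "__main__":
    # METR figures as quoted above, under the stated assumptions.
    gap = perception_gap(time_ratio(+19), time_ratio(-20))
    print(f"Tasks took about {gap:.2f}x as long as developers believed")

    # Hypothetical personal log: guess the duration first, then check the clock.
    my_log = [(30, 45), (60, 60), (20, 30)]
    print(f"My own gap: {personal_gap(my_log):.2f}x")
```

A ratio above 1.0 in your own log means tasks are taking longer than they feel, which is the drift the METR result describes.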

Three Frames for Making the Call in Practice

Organizing the data alone won’t decide it for you. Apply these three frames to your own context.

Frame 1: Your Career Time Horizon

Optimizing for the next year and optimizing for the next 5-10 years produce different answers.

  • Short-term (next review, next project): productivity camp. Capture the 27-39% from the MIT study.
  • Medium-term (next job switch, next promotion): both matter. Hiring teams evaluate “can use AI” AND “has the fundamentals”.
  • Long-term (10-year career): lean growth camp. Debugging skill, code reading, and design judgment only compound through handwritten practice.

If you are in your early 20s with 30-40 working years ahead, the compounding from fundamentals beats short-term productivity. If you are already in your 30s and need immediate impact in your current role, prioritizing productivity is reasonable.

Frame 2: The Nature of the Task

Not all tasks are equal. Whether you are learning new material or working in known territory, and whether the work is maintenance or greenfield, shifts the optimum.

flowchart TB
    A["Task nature"] --> B["New learning"]
    A --> C["Known work"]
    A --> D["Maintenance"]
    B --> E["Hand + AI Q&A"]
    C --> F["Full AI use"]
    D --> G["Hand-first"]

    classDef hand stroke:#2ea44f,stroke-width:3px
    classDef ai stroke:#6366f1,stroke-width:3px
    class E,G hand
    class F ai

It matters that the Anthropic RCT produced its 17-point understanding gap on a new-library learning task [2]. That result does not necessarily transfer to “writing CRUD in a stack you already know.” A rational split is hand-first when you are stepping into unknown territory and AI-heavy in territory you have already mastered.
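The flowchart in this section can be written down as a small lookup, which is handy if you want to tag tasks in a personal log. This is only a sketch: the enum and function names are hypothetical, and the three categories are simply the ones the flowchart uses.

```python
from enum import Enum


class TaskNature(Enum):
    NEW_LEARNING = "new learning"
    KNOWN_WORK = "known work"
    MAINTENANCE = "maintenance"


# Mapping copied from the flowchart in this section.
RECOMMENDED_MODE = {
    TaskNature.NEW_LEARNING: "hand + AI Q&A",
    TaskNature.KNOWN_WORK: "full AI use",
    TaskNature.MAINTENANCE: "hand-first",
}


def recommend_mode(task: TaskNature) -> str:
    """Return the working mode the flowchart suggests for a kind of task."""
    return RECOMMENDED_MODE[task]


if __name__ == "__main__":
    for task in TaskNature:
        print(f"{task.value}: {recommend_mode(task)}")
```

Keeping the mapping in one dictionary makes it easy to adjust as your own experience shifts a category from “new learning” to “known work”.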

Frame 3: The Reality of Your Evaluation Environment

Look clearly at what actually gets rewarded in your environment.

  • Startup, zero-to-one phase, solo: a working product is what counts. Productivity camp is rational.
  • Large company, legacy maintenance: debugging and reading skill are the evaluation axes. Growth camp is rational.
  • Preparing for an engineering interview cycle: on-site interviews are increasingly AI-restricted. Handwriting skill is required.
  • Already trusted, long-term employment: pick based on your own preferences and learning philosophy.

There is no universal answer of “using AI is smarter” or “writing by hand is smarter.” What gets rewarded in your environment, and what kind of engineer you want to be in 3-5 years—that’s what decides it.

A Concrete Hybrid: Start with 70/30

You don’t have to commit fully to one camp. As a practical rule of thumb that holds both camps’ claims together, try 70% hand / 30% AI for new learning, and 70% AI / 30% hand for known territory.

| Context | Hand-coding share | What AI is for |
|---|---|---|
| Learning a new library or framework | 70% | Concept questions, error-message interpretation |
| Routine implementation in known tech | 30% | Boilerplate generation, test generation |
| Debugging | 80% | Last-resort hint questions |
| Code review / refactoring | 50% | Generating alternatives, surfacing angles |
| Tech selection / design decisions | 90% | Brainstorming comparison axes |

This split aims for the midpoint between how the top cluster in the Anthropic RCT used AI (“conceptual inquiry only”) [2] and the productivity gains MIT documented [1]. In Bjork & Bjork’s “desirable difficulties” framework [11], the goal is to keep the necessary cognitive load while offloading only the unproductive load to AI. The numbers themselves are not a rigorously optimal split; they are a starting point that you recalibrate every three months using the self-assessment below. In practice, many engineers will want to start at 80/20 for new learning and gradually reduce the hand-share, or start at 50/50 for known territory and gradually raise the AI-share.

Re-evaluate the mix every three months. Is your debugging getting faster? Can you explain more of the code you write? Is the amount you can write without AI growing? You have to collect this data on yourself—nobody else will.
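One way to make the quarterly review mechanical is sketched below. The starting shares are copied from the table in this section; the adjustment rule (move 10 points depending on how many of the three self-assessment questions you answer “yes”) is a hypothetical illustration, not something the cited studies prescribe.

```python
# Starting hand-coding shares (percent), copied from the table in this section.
STARTING_HAND_SHARE = {
    "learning a new library or framework": 70,
    "routine implementation in known tech": 30,
    "debugging": 80,
    "code review / refactoring": 50,
    "tech selection / design decisions": 90,
}


def recalibrate(
    hand_share: int,
    debugging_faster: bool,
    can_explain_more: bool,
    writes_more_unaided: bool,
    step: int = 10,
) -> int:
    """Hypothetical quarterly rule for adjusting the hand-coding share.

    All three answers "yes": fundamentals are growing, so allow more AI.
    At most one "yes": fundamentals are stalling, so write more by hand.
    Exactly two "yes": leave the share unchanged.
    """
    yes_count = sum([debugging_faster, can_explain_more, writes_more_unaided])
    if yes_count == 3:
        hand_share -= step
    elif yes_count <= 1:
        hand_share += step
    return max(0, min(100, hand_share))  # keep the share within 0-100


if __name__ == "__main__":
    share = STARTING_HAND_SHARE["learning a new library or framework"]
    share = recalibrate(share, True, True, True)
    print(f"Next quarter: {share}% hand / {100 - share}% AI")
```

The step size and thresholds are arbitrary defaults; the point is only that the three questions get written down and answered on a schedule.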

Summary: “Stupid” Depends Entirely on How You Use It

As of April 2026, the data points to this answer.

  • Full delegation to AI is dangerous for juniors: Anthropic’s 17 points, ICER’s “illusion of competence”, and Gerlich’s cognitive offloading all point in the same direction.
  • Full rejection of AI is inefficient for juniors: MIT’s 27-39% and CHI 2023’s 1.15x completion rate show the short-term productivity benefit.
  • Juniors who use AI in conceptual-inquiry mode do best: this is the usage pattern of the top-scoring cluster in the Anthropic RCT.

The answer to “is writing code by hand a stupid thing to do?” is “it depends on how you use AI.” Refusing AI entirely is a short-term disadvantage. Handing everything to AI is a long-term liability. The sweet spot is a third mode: write it yourself, with AI as a conversational partner.

For more concrete practice, the productivity-first guide lays out four principles, and the growth-first guide lays out a four-step procedure. Pick based on your priorities, and revisit the ratio every three months.

Related reading that fills out the 2026 picture of AI and skill formation: why vibe coding fails for experts, the AI deskilling paradox, and AI as a “skill equalizer”.

Footnotes

  1. Demirer, M., Cui, Z., Musolff, L., Jaffe, S., Peng, S., & Salz, T. (2024). “The Effects of Generative AI on High Skilled Work: Evidence from Three Field Experiments with Software Developers.” SSRN Working Paper ID 4945566. Article version: MIT Sloan, November 4, 2024. RCT + staggered rollout across three companies, n=4,867. Overall +26%, juniors +27-39%, veterans +8-13%.

  2. Shen, J. H., & Tamkin, A. (2026). “How AI assistance impacts the formation of coding skills.” Anthropic, published January 29, 2026. 52 engineers (mostly junior), learning a new Python library (Trio) in an RCT. Quiz scores: AI group 50% vs. hand-coding group 67% (Cohen’s d=0.738, p=0.01). No significant productivity gain. The top-scoring cluster (65%+) was the “conceptual inquiry only” group, n=7. https://www.anthropic.com/research/AI-assistance-coding-skills, paper: arXiv:2601.20245

  3. Kazemitabaar, M., Chow, J., Ma, C. K. T., Ericson, B. J., Weintrop, D., & Grossman, T. (2023). “Studying the effect of AI Code Generators on Supporting Novice Learners in Introductory Programming.” CHI 2023. 69 participants aged 10-17. 1.15x completion rate, 1.8x scores, no negative effect on handwritten modification tasks. https://arxiv.org/abs/2302.07427

  4. Stack Overflow. (2025). “2025 Developer Survey: AI.” 44% of learning developers use AI tools. https://survey.stackoverflow.co/2025/ai

  5. JetBrains. (2025). “The State of Developer Ecosystem 2025.” n=24,534 across 194 countries. 85% use AI regularly, 68% expect “AI proficiency will become a job requirement.” https://devecosystem-2025.jetbrains.com/artificial-intelligence

  6. Prather, J., et al. (2024). “The Widening Gap: The Benefits and Harms of Generative AI for Novice Programmers.” ICER ‘24. 21 participants, observation + eye-tracking. Documents “illusion of competence” and the “Interruption / Mislead / Progression” pattern. https://arxiv.org/abs/2405.17739

  7. Kawamura, T. & Uchida, S. (2025). “The Impact of Generative AI Programming on Learning Outcomes.” Nara National College of Technology (KOSEN). AI groups finished faster with less variance but no difference in conceptual understanding. https://www.jsise.org/wp-content/uploads/2025/02/2024_kansai_p09.pdf

  8. Gerlich, M. (2025). “AI Tools in Society: Impacts on Cognitive Offloading and the Future of Critical Thinking.” Societies, 15(1), 6. n=666. AI use and cognitive offloading r=+0.72; offloading and critical thinking r=-0.75. https://www.mdpi.com/2075-4698/15/1/6. A correction to Table 4 was published in September 2025 as Societies 15(9), 252; the author states that the scientific conclusions are unchanged.

  9. METR. (July 2025). “Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity.” RCT with 16 developers and 246 tasks. 19% slowdown when using AI, though developers believed they were 20% faster. https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/

  10. Bubble. (2025). “2025 State of Visual Development and AI App Building.” User self-report survey with figures such as “10x or more: 23.5%” and “5-10x: 16.7%”. No control group. https://bubble.io/blog/2025-state-of-visual-development-ai-app-building/

  11. Bjork, E. L., & Bjork, R. A. (2011). “Making Things Hard on Yourself, But in a Good Way: Creating Desirable Difficulties to Enhance Learning.” UCLA Bjork Learning and Forgetting Lab. https://bjorklab.psych.ucla.edu/wp-content/uploads/sites/13/2016/04/EBjork_RBjork_2011.pdf

This post is licensed under CC BY 4.0 by the author.