Only Those with Strong Metacognition Become More Creative with AI — The Cognitive Science Behind the Widening AI Divide

  • Target audience: IT engineers who use AI tools in their daily work
  • Prerequisites: Basic hands-on experience with AI tools such as ChatGPT, GitHub Copilot, etc.
  • Reading time: 18 minutes

Overview

“AI democratizes creativity.” Research is pouring cold water on this optimistic view. In a field experiment involving 250 employees, only those with high metacognitive ability (the capacity to plan, monitor, and refine) showed improved creativity when using AI [1]. But does the binary distinction of “high vs. low metacognition” that the research presents fully capture what happens in practice? Many people who use AI effectively don’t personally handle all three components of metacognition; they delegate some to AI while exercising their own judgment at other stages. This article examines the importance of metacognition that the research demonstrates, while digging deeper into the practical question of where human judgment enters the process.

Is “Anyone Can Be Creative with AI” Really True?

As AI tools have proliferated, so has the narrative that “AI democratizes creativity.” The vision: anyone can simply prompt an AI and produce professional-grade writing, designs, or code.

However, according to a Gallup survey cited in an HBR article, only 26% of employees using generative AI reported improved creativity [1]. In other words, roughly three out of four people using AI do not feel that it has enhanced their creativity.

What this number suggests is that AI’s impact on creativity is far from uniform — there are substantial individual differences in who benefits. So what determines the difference?

Study Design

The study by Lu, Sun, Li, Foo, & Zhou (2026) was a field experiment conducted at a technology consulting firm in China [1]. A total of 250 employees were randomly assigned to two groups: one used ChatGPT for their work while the other did not. After a one-week observation period, each employee’s creativity was evaluated by both their supervisor and external raters.

What Is Metacognition?

Metacognition is the ability to “think about thinking,” and it consists of three components:

flowchart TB
    M["Metacognition"]
    P["Planning<br>Thinking through task<br>steps in advance"]
    Mo["Monitoring<br>Tracking whether your<br>approach is working"]
    R["Refining<br>Adjusting methods when<br>you notice lack of progress"]
    M --> P
    M --> Mo
    M --> R

In a software engineering context:

  • Planning: Thinking ahead about what to ask AI and how to decompose the problem
  • Monitoring: Continuously evaluating whether AI output aligns with your objectives
  • Refining: Adjusting prompts or strategies when results fall short of expectations

Key Findings

The results were unambiguous:

Employees with high metacognitive ability showed significantly improved creativity when using AI. Both supervisors and external raters judged their ideas as more novel and more useful. In contrast, employees with low metacognitive ability showed no creativity gains from AI use [1].
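
A result of this shape, where an effect appears for one subgroup but not another, is commonly written as a moderation (interaction) model. The form below is an illustrative sketch only, not the specification reported in the study; the symbols $C_i$, $A_i$, $M_i$ and the coefficients are assumptions introduced here.

```latex
\[
  C_i = \beta_0 + \beta_1 A_i + \beta_2 M_i + \beta_3 \,(A_i \times M_i) + \varepsilon_i
\]
% C_i : creativity rating of employee i
% A_i : 1 if employee i was assigned to use AI, 0 otherwise
% M_i : metacognitive ability of employee i (pre-measured)
% Effect of AI for employee i:  \beta_1 + \beta_3 M_i
```

Under this reading, the reported finding corresponds to $\beta_3 > 0$, with the effect of AI, $\beta_1 + \beta_3 M_i$, close to zero when $M_i$ is low.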

Why Metacognition Matters — Two Mechanisms

The study identified two mechanisms through which AI enhances creativity [1]:

1. Information and Knowledge Expansion

AI enables rapid access to diverse information. However, creatively leveraging that information as “novel combinations” requires the ability to ask “How does this relate to my problem?” and “What other angles could I explore?” — in other words, metacognition.

When metacognitive ability is low, people tend to accept AI output at face value, and no genuine information expansion occurs.

2. Freeing Up Cognitive Capacity

AI reduces cognitive load by handling routine tasks like summarizing, drafting, and research. The freed-up cognitive capacity should then be directed toward more complex problem-solving.

But metacognition is the key here too. Whether that freed capacity goes toward “deeper thinking” or “intellectual disengagement” depends on metacognitive ability. People with high metacognition invest the freed capacity into creative thinking, while those with low metacognition become more dependent on AI and increasingly passive.

Limitations of This Study

This study has several limitations. The one-week observation period is too short to reveal long-term effects. The sample is limited to a single company in China, leaving cultural and industry variations untested. While the sample size (n=250) provides adequate statistical power, larger-scale replications are needed.

Benefits Differ Between Experts and Novices — A Human-AI Co-Creation Experiment

Beyond metacognition, other factors influence who benefits from AI. Wang, Kim, Peng, & Wang (2025) examined how differences in design experience affect the outcomes of AI co-creation [2].

Study Design

They conducted a 2×2 factorial experiment (design process × experience level), comparing a traditional creative design process (TCDP) with a human-AI co-creative design process (HAI-CDP). Creativity was measured across five dimensions: novelty, elaboration, quality, number of ideas, and perceived creative support capability.

Key Findings

| Experience Level | Primary AI Benefit | Mechanism |
| --- | --- | --- |
| Novices | Increased quantity of ideas | Providing starting points for ideation |
| Experts | Improved quality and elaboration of output | Amplifying existing skills |

For novices, AI helps escape the “blank page” state of not knowing where to begin. But without the ability to evaluate, select, and refine the generated ideas — precisely metacognitive skills — quantity does not convert into quality.

Experts can critically evaluate AI output and combine it with their own knowledge to refine it. The result makes clear that “AI + novice ≠ expert” [2]. AI does not close the skills gap; it amplifies existing skills.

AI Is an Amplifier

Synthesizing these studies reveals a clear pattern: AI functions not as an equalizer but as an amplifier.

A study by Fügener, Walzner, & Gupta (2025), published in Management Science, classifies AI’s role into “automation” and “augmentation,” noting that augmentation works only when human capability is already present [3]. When AI plays a complementary rather than substitutive role, its effect is proportional to the human’s existing skill level.

The argument so far is straightforward: high metacognition yields AI benefits; low metacognition does not. But does this binary framing accurately capture what happens in practice?

The Gap Between Research and Practice — Metacognition Is Not All-or-Nothing

The Model Research Assumes

In the field experiment described above, metacognitive ability was treated as a pre-measured individual trait. The model: “high metacognition people” and “low metacognition people” exist, and this trait determines whether AI use succeeds or fails.

But is the way people actually use AI in practice more nuanced than that?

What Actually Happens in Practice — Partial Delegation

Consider, for example, the following workflow:

  1. The human provides a brief directional instruction (something like “write code for this problem”)
  2. AI generates structured output (AI handles most of the planning)
  3. The human reviews the output and evaluates: “This perspective is missing” or “The design approach is off”
  4. AI makes corrections
  5. The human makes the final judgment on whether to adopt the output

In this workflow, most of the planning is delegated to AI, while human resources are concentrated on monitoring and refining. This differs from the research model that assumes “humans lead all three components,” but it is also clearly different from a “zero metacognition” pattern.
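
The five steps above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not a real tool integration: `co_create`, `Review`, `fake_ai`, and `fake_human` are hypothetical names introduced here, and the AI and the human are both stand-in functions.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Review:
    """Outcome of a human review (steps 3 and 5)."""
    accepted: bool
    feedback: str = ""


def co_create(
    seed: str,
    generate: Callable[[str, str, str], str],
    review: Callable[[str], Review],
    max_rounds: int = 3,
) -> Optional[str]:
    """Run the partial-delegation loop; return the adopted draft, or None."""
    # Steps 1-2: the human gives a brief seed, the AI plans and drafts.
    draft = generate(seed, "", "")
    for _ in range(max_rounds):
        verdict = review(draft)          # step 3: the human evaluates
        if verdict.accepted:
            return draft                 # step 5: the human adopts the output
        # Step 4: the AI revises using the human's feedback.
        draft = generate(seed, draft, verdict.feedback)
    return None  # an unreviewed draft is never adopted


# Hypothetical stand-ins for the AI and the human reviewer.
def fake_ai(seed: str, previous: str, feedback: str) -> str:
    return f"{seed} v1" if not previous else f"{previous} + fix({feedback})"


def fake_human(draft: str) -> Review:
    if "fix(" in draft:
        return Review(accepted=True)
    return Review(accepted=False, feedback="refresh tokens")


result = co_create("auth", fake_ai, fake_human)
print(result)  # auth v1 + fix(refresh tokens)
```

The design choice worth noting is the final `return None`: in this sketch, output that no human has evaluated is never adopted, which is what separates this workflow from full delegation.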

flowchart TB
    subgraph Model Assumed by Research
        direction TB
        A1["Human plans"] --> A2["Human instructs AI"] --> A3["Human evaluates"] --> A4["Human directs revisions"]
    end

    subgraph Common Practice in the Field
        direction TB
        B1["Human gives direction<br>(brief seed)"] --> B2["AI plans & structures"] --> B3["Human evaluates<br>'This perspective is missing'"] --> B4["AI revises"] --> B5["Human decides"]
    end

Reframing the Question

Given this reality, the question shifts from “Is metacognitive ability high or low?” to “Where in the process does human judgment enter?”

Research by Tankelevitch et al., presented at CHI 2024, systematically characterized the “metacognitive demands” that generative AI places on users, showing that metacognitive judgment is required at multiple stages: prompt creation, output evaluation, and iterative refinement [4]. Crucially, this research does not argue that “humans must handle every stage.” Rather, it demonstrates that “human metacognitive judgment must be substantively functioning at least at some stage.”

The Distribution of Metacognition — Three Patterns

Integrating research findings with field realities, the ways metacognition figures into AI use can be classified into at least three patterns:

Pattern 1: Human Involvement at Every Stage

Human plans → Human instructs AI → Human evaluates → Human directs revisions

The ideal type assumed by research. Human metacognition functions at every stage, from problem structuring and AI instruction to output evaluation and strategy revision. High-quality output is expected, but the cognitive cost is also high.

Pattern 2: Delegate Planning, Focus on Evaluation and Refinement

Human sets direction → AI plans & executes → Human evaluates & refines → AI re-executes

Much of the planning stage is delegated to AI, while the human focuses on monitoring and refinement. The ability to judge “this perspective is missing” or “the structure is off” — that is, domain knowledge and evaluative skill — is a prerequisite. Planning metacognition is weak, but monitoring and refinement metacognition are functional.

Pattern 3: Full Delegation to AI

Human says "do it" → AI plans & executes → Human accepts as-is

Human judgment is substantively absent from planning, monitoring, and refinement alike. This corresponds to the state research classifies as “low metacognition,” where AI output is accepted without scrutiny. No creativity improvement can be expected.

The Essential Difference Between Patterns

There is a qualitative difference between Patterns 1 and 2. Pattern 1 deploys metacognition across all fronts, while Pattern 2 concentrates it on specific stages. But between Patterns 2 and 3, there is a more fundamental rupture: Pattern 2 involves human judgment-based intervention; Pattern 3 does not.

flowchart TB
    P1["Pattern 1<br>Involved at every stage"]
    P2["Pattern 2<br>Focused on evaluation<br>and refinement"]
    P3["Pattern 3<br>Full delegation to AI"]

    P1 --- |"Qualitative difference<br>(degree of focus)"| P2
    P2 --- |"Fundamental rupture<br>(presence vs. absence<br>of judgment)"| P3

    style P1 fill:#4a9,stroke:#333,color:#fff
    style P2 fill:#4a9,stroke:#333,color:#fff
    style P3 fill:#d55,stroke:#333,color:#fff

In short, the presence or absence of metacognition matters far more than its quantity. Humans do not need to handle all three components, but at some stage in the process, a human must be making substantive judgments.
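
The distinction can be made concrete with a small classifier. This is a simplified sketch: `Stage` and `classify_pattern` are hypothetical names, and treating every partial combination of stages as Pattern 2 is an assumption made here for brevity (the article describes Pattern 2 specifically as monitoring plus refining).

```python
from __future__ import annotations

from enum import Enum


class Stage(Enum):
    PLANNING = "planning"
    MONITORING = "monitoring"
    REFINING = "refining"


def classify_pattern(human_stages: set[Stage]) -> int:
    """Map the stages where a human exercises substantive judgment to a pattern."""
    if not human_stages:
        return 3  # no human judgment anywhere: full delegation
    if human_stages == set(Stage):
        return 1  # human judgment at every stage
    return 2      # judgment concentrated on some stages (simplifying assumption)


print(classify_pattern({Stage.MONITORING, Stage.REFINING}))  # 2
```

Note that the first branch tests for presence versus absence, while the second only tests for degree, which mirrors the two kinds of difference described above.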

The Matthew Effect — What Is Truly Dangerous?

“The Matthew Effect” — “For to every one who has will more be given, and he will have abundance; but from him who has not, even what he has will be taken away.” This concept, named by sociologist of science Robert K. Merton in 1968, is acquiring new significance in the age of AI.

Building on the discussion so far, the Matthew Effect in the AI era manifests not simply as a matter of “high vs. low metacognition” but as whether someone maintains human judgment somewhere in their process.

At the individual level: Even with partial delegation as in Pattern 2, the cycle of evaluation and refinement maintains and strengthens the human’s own judgment over time. In Pattern 3, however, dependence on AI deepens, creating a risk that evaluative ability itself atrophies [1][2]. This is a self-reinforcing loop: those who exercise judgment continue to sharpen it, while those who don’t continue to lose it.

At the organizational and societal level: These individual-level disparities ripple outward. Research in Behaviour & Information Technology identifies an “AI divide” between those who can sustainably benefit from AI and those who cannot [5]. The IMF’s 2026 report also notes that while AI-related skill shifts are pushing up employment and wages, they risk deepening polarization through the hollowing out of the middle tier [6]. What both studies suggest is that the source of inequality lies not in access to tools but in how they are used.

The Reality in Software Engineering

The discussion so far directly relates to how software engineers use AI. GitHub Copilot, Cursor, Claude Code — the way people engage with these tools maps neatly onto the three patterns described above.

Pattern 1 Example: Involved from the Design Stage

"Implement authentication. JWT-based with refresh token
rotation, using Redis for session management."
→ AI implements → Human verifies edge cases → Directs revisions

The human handles problem structuring, technology selection, and constraint specification. AI handles implementation.
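
As an illustration of the kind of edge case a human might verify in this example, the sketch below models refresh token rotation with reuse detection. It is a hypothetical, minimal model rather than a production design: a plain dictionary stands in for Redis, tokens are opaque random strings rather than JWTs, and `TokenStore`, `issue`, and `rotate` are names introduced here.

```python
from __future__ import annotations

import secrets


class TokenStore:
    """In-memory stand-in for a Redis-backed refresh token store (hypothetical)."""

    def __init__(self) -> None:
        self._active: dict[str, str] = {}  # live token -> user id
        self._used: dict[str, str] = {}    # rotated-out token -> user id

    def issue(self, user_id: str) -> str:
        token = secrets.token_urlsafe(32)
        self._active[token] = user_id
        return token

    def rotate(self, token: str) -> str:
        """Exchange a live refresh token for a new one; the old one stops working."""
        if token in self._used:
            # Edge case: an already-rotated token is presented again, which
            # suggests it was stolen. Revoke every live token for that user.
            self._revoke_all(self._used[token])
            raise PermissionError("refresh token reuse detected")
        user_id = self._active.pop(token, None)
        if user_id is None:
            raise PermissionError("unknown refresh token")
        self._used[token] = user_id
        return self.issue(user_id)

    def _revoke_all(self, user_id: str) -> None:
        for stale in [t for t, u in self._active.items() if u == user_id]:
            del self._active[stale]
```

An implementation that rotates tokens but skips the reuse branch would still run and pass a happy-path test, which is why this step depends on a reviewer who knows to look for it.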

Pattern 2 Example: Set Direction, Focus on Evaluation

"Implement authentication."
→ AI proposes design and implementation → "The refresh token handling
is weak — fix this." → AI revises → Human reviews and adopts

Planning is left to AI, but the human critically evaluates the output and identifies problems. Domain knowledge is what makes it possible to judge that something is “weak.”

Pattern 3 Example: Accept Output As-Is

"Implement authentication."
→ AI implements → Committed as-is

AI handles planning, and there is no evaluation. If it runs, it ships. Security issues and design trade-offs go unrecognized.

Is Pattern 2 “Being Lazy”?

Pattern 2 might at first glance appear inferior to Pattern 1. But that is not necessarily the case.

The condition for Pattern 2 to work is that the ability to evaluate output is sufficient. Even without crafting the plan yourself, if you can look at AI output and say “this is inadequate” or “this direction is wrong,” the quality of the final output is maintained.

Conversely, the risk of Pattern 2 materializes when AI is used in domains beyond one’s evaluative capacity. Cases where you judge AI output as “looks fine” in a technical area you don’t understand — this is effectively falling into Pattern 3 while believing you are in Pattern 2.

Can Metacognition Be Trained?

An important point the research makes is that metacognitive ability is not a fixed trait; it can be improved through training [1]. The authors of the HBR article propose interventions including short training modules, social-psychological exercises, and workflow design.

Additionally, a study by Schmidt, Dörner, & Bernholt (2025), published in the International Journal of Human-Computer Interaction, developed and validated a scale for “Collaborative AI Metacognition” [7]. This scale explained unique variance in predicting AI use effectiveness, independent of general metacognition. In other words, metacognitive skills specific to AI use exist and can be measured and cultivated.

What Specifically Should Be Trained?

However, the general exhortation to “train your metacognition” is probably insufficient. Drawing on the discussion thus far, what specifically needs training can be distilled into three points:

1. The ability to recognize your own process

Recognizing whether you are closer to Pattern 1, 2, or 3. This itself is metacognition. The ability to ask yourself: “Am I genuinely evaluating AI’s output right now, or am I just letting it pass?”

2. Evaluative ability — the capacity to say “no” to AI output

The prerequisite for Pattern 2 to function. Domain knowledge is what enables judgments like “this design is weak” or “this argument is flawed.” As Wang et al.’s research shows, experts benefit more from AI because they can critically evaluate and refine output [2]. The ability to leverage AI effectively is ultimately grounded in the ability to do the work without AI.

3. The judgment to pull back — knowing the limits of your evaluative capacity

The most frequently overlooked yet arguably most important form of metacognition. In domains where you cannot adequately evaluate, Pattern 2 collapses into Pattern 3. Can you recognize “I cannot judge AI output in this domain” and choose alternative means (consulting a human expert, learning the subject yourself, etc.)?

AI Adoption Does Not Benefit Everyone Equally

“Deploy AI tools and everyone’s productivity goes up” — this premise contradicts the research reviewed here. AI adoption without training risks widening rather than narrowing the gap.

Research by Sun, Hu, & Su (2026), published in Frontiers in Psychology, identified two psychological pathways through which AI collaboration enhances work engagement: the perception of “meaningful work” and “creative self-efficacy” [8]. However, the study also suggested that poorly implemented AI adoption can undermine engagement.

What organizations considering AI adoption need to think about is not how to operate the tools but how to shape the way people use them. Specifically:

  • Embed processes that prompt questioning of AI output — “Why this approach?” and “What alternatives exist?” — into daily workflows
  • Create opportunities for engineers to reflect on their current AI usage pattern (which of Patterns 1 through 3 they most closely resemble)
  • Evaluate the effectiveness of AI use not only by “speed” but also by “quality of judgment” and “continued learning”
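
One way the first item might be embedded in a workflow is a lightweight check on AI-assisted changes before review. The sketch below is hypothetical: `ChangeRecord`, `missing_reflections`, and the idea of storing answers alongside a change are assumptions introduced here, not a feature of any particular tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# The two questions quoted in the list above.
REQUIRED_QUESTIONS = ("Why this approach?", "What alternatives exist?")


@dataclass
class ChangeRecord:
    """A proposed change plus the author's written reflections (hypothetical)."""
    title: str
    ai_assisted: bool
    answers: dict[str, str] = field(default_factory=dict)


def missing_reflections(change: ChangeRecord) -> list[str]:
    """Return the questions an AI-assisted change has left unanswered."""
    if not change.ai_assisted:
        return []
    return [q for q in REQUIRED_QUESTIONS if not change.answers.get(q, "").strip()]


change = ChangeRecord(
    title="Add auth",
    ai_assisted=True,
    answers={"Why this approach?": "Matches our existing session model."},
)
print(missing_reflections(change))  # ['What alternatives exist?']
```

A check like this only confirms that an answer was written, not that it reflects real evaluation, so it complements rather than replaces the reflection and evaluation practices in the other two items.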

Conclusion

AI is not a universal equalizer; it is an amplifier of existing cognitive skills. As the field experiment with 250 participants demonstrated, only those with high metacognitive ability see creativity improvements from AI use [1].

However, the binary of “high vs. low metacognition” does not fully capture what happens in practice. Many people who effectively use AI do not personally handle all three metacognitive components — they delegate parts of planning to AI while exercising judgment at the evaluation and refinement stages.

What truly matters is whether substantive human judgment exists somewhere in the process. When all three components are absent — when AI output is accepted as-is, without questioning, without revision — AI loses its function as an amplifier, and no creativity improvement occurs.

The question being asked is not “whether to use AI” or “whether metacognitive ability is high.” It is whether you are aware of where, in your own process, your own judgment enters. And that awareness is nothing other than the essence of metacognition itself.

References

References corresponding to in-text citation numbers are listed in numerical order below.

  1. Why AI Boosts Creativity for Some Employees but Not Others - Lu, Sun, Li, Foo, & Zhou. Harvard Business Review (2026). Original paper: “How and For Whom Using Generative AI Affects Creativity: A Field Experiment,” Journal of Applied Psychology. n=250, field experiment, peer-reviewed. [Reliability: High]

  2. Exploring creativity in human–AI co-creation: a comparative study across design experience - Wang, Kim, Peng, & Wang. Frontiers in Computer Science (2025). 2×2 factorial experiment, peer-reviewed. [Reliability: High]

  3. Roles of Artificial Intelligence in Collaboration with Humans: Automation, Augmentation, and the Future of Work - Fügener, Walzner, & Gupta. Management Science (2025). Peer-reviewed. [Reliability: High]

  4. The Metacognitive Demands and Opportunities of Generative AI - Tankelevitch et al. Proceedings of CHI 2024. Peer-reviewed conference paper. [Reliability: High]

  5. Bridging the gap: inequalities that divide those who can and cannot create sustainable outcomes with AI - Behaviour & Information Technology (2025). Peer-reviewed. [Reliability: High]

  6. Bridging Skill Gaps for the Future: New Jobs Creation in the AI Age - IMF Staff Discussion Note (2026). Official institutional publication. [Reliability: High]

  7. Generative AI in Human-AI Collaboration: Validation of the Collaborative AI Literacy and Collaborative AI Metacognition Scales for Effective Use - Schmidt, Dörner, & Bernholt. International Journal of Human-Computer Interaction (2025). Scale development and validation study, peer-reviewed. [Reliability: High]

  8. Unlocking human potential in the AI Age: how employee-AI collaboration transforms work engagement through dual psychological pathways - Sun, Hu, & Su. Frontiers in Psychology (2026). Peer-reviewed. [Reliability: High]

This post is licensed under CC BY 4.0 by the author.