AI Boosts Individual Creativity but Kills Collective Diversity — The Paradox of Creative Homogenization
This article was generated by AI. The accuracy of the content is not guaranteed, and we accept no responsibility for any damages resulting from use of this article. By continuing to read, you agree to the Terms of Use.
- Target audience: Software engineers who use AI tools on a daily basis
- Prerequisites: Basic experience with AI tools such as GitHub Copilot, ChatGPT, Claude, etc.
- Reading time: 20 minutes
Overview
This article examines the paradox that generative AI “enhances individual creativity while reducing collective diversity,” drawing on multiple peer-reviewed studies. In an experiment published in Science Advances, Doshi & Hauser (2024) found that while participants who received AI assistance produced individually higher-rated work, the similarity between their works increased. Furthermore, a phenomenon called “creative scar” has been reported, where creativity fails to recover to its original level even after AI is removed. This article explores the challenge of maintaining diversity in human-AI co-creation, examining these research findings through the lens of software development.
It is worth noting that this article itself was generated by AI. Reading it with the awareness that it is a product of the very “AI-driven homogenization” it describes may deepen your sense of the paradox.
The Problem: AI-Improved Writing That All Looks the Same
The spread of generative AI has ushered in an era where “anyone can produce reasonably high-quality text and code.” GitHub Copilot suggests completions, ChatGPT generates drafts, and Claude Code proposes implementations informed by the context of your codebase.
However, while individual quality improves, multiple independent research groups have reported that overall diversity is being lost. This is counterintuitive. Why do “everyone getting better” and “everyone producing similar things” happen simultaneously?
The Science Advances Study — Individuals Improve, Collectives Homogenize
Study Design
Doshi & Hauser (2024) directly tested this question in an experimental study published in Science Advances [1].
Experiment overview:
| Item | Details |
|---|---|
| Participants | 293 (selected from 500) |
| Evaluators | 600 (3,519 total evaluations) |
| Task | Writing an 8-sentence short story |
| Themes | “The open ocean,” “The jungle,” “A different planet” |
| Target readers | Teenagers / Young adults |
Participants were randomly assigned to one of three groups:
- Human only (control group)
- One AI idea available for reference
- Up to five AI ideas available for reference
Key Findings
The results revealed a clear paradox:
Individual-level improvement:
- In the group with access to five AI ideas, novelty increased by 8.1% and usefulness by 9.0%
- The effect was larger for participants with lower baseline creativity: novelty up 10.7%, usefulness up 11.5%
- Writing quality, enjoyment, and plot twists also improved
Collective-level homogenization:
- Works in the AI-assisted groups were more similar to each other compared to the human-only group
- Works in the AI-assisted groups showed 5.2% higher similarity to the original AI-generated ideas
flowchart TB
AI["Generative AI<br>Idea Provision"]
IND["Individual Creativity<br>Novelty +8.1%<br>Usefulness +9.0%"]
COL["Collective Diversity<br>Inter-work similarity ↑<br>Similarity to AI ideas +5.2%"]
AI --> IND
AI --> COL
classDef upStyle stroke:#2ea44f,stroke-width:3px
classDef downStyle stroke:#d29922,stroke-width:3px
class IND upStyle
class COL downStyle
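To make “inter-work similarity” concrete, here is a minimal sketch of one way to measure it: the mean cosine similarity over all pairs of texts in a group. This is not the study's actual pipeline. The bag-of-words vectors are a crude stand-in for a real text embedding, and the function names and toy stories are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts; a crude stand-in for a real text embedding."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_pairwise_similarity(texts: list[str]) -> float:
    """Average similarity over every unordered pair of texts in a group."""
    vectors = [bag_of_words(t) for t in texts]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy data, not taken from the study.
human_only = [
    "a diver befriends a lonely whale",
    "pirates argue over a broken compass",
    "the lighthouse keeper counts storms",
]
ai_assisted = [
    "a young sailor finds a hidden island",
    "a young sailor discovers a hidden map",
    "a brave sailor finds a hidden cave",
]

print("human only :", round(mean_pairwise_similarity(human_only), 3))
print("AI assisted:", round(mean_pairwise_similarity(ai_assisted), 3))
```

A higher group-level score means the works resemble each other more, even when each individual work is rated well on its own.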
Limitations of This Study
This study has several important limitations. First, the task was limited to “8-sentence short stories,” and generalizing to complex creative activities such as long-form writing or software development requires caution. Second, participants merely “referenced” AI ideas, which differs from real-time AI interaction (the way tools like Copilot or Cursor are used today). Additionally, the effects of monetary incentives and long-term impacts were not examined [1].
The Mechanism of Homogenization — Why AI Produces “Similar Things”
The Structural Tendency of LLMs
The root cause of homogenization lies in how LLMs generate text. An LLM samples each next token from a probability distribution learned from its training data, and typical decoding settings favor high-probability continuations, so the same prompt tends to produce similar outputs.
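The pull toward high-probability continuations can be illustrated with a toy next-token distribution. The sketch below applies a temperature-scaled softmax to four hypothetical token scores and reports how concentrated the result is. The scores and temperatures are assumptions for illustration, and real decoders typically add further steps (such as top-k or top-p filtering) that are not modeled here.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities; lower temperature sharpens the peak."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy_bits(probs: list[float]) -> float:
    """Shannon entropy in bits; lower values mean less varied sampling."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical scores for four candidate next tokens.
logits = [2.0, 1.0, 0.5, 0.1]

for temperature in (0.5, 1.0, 1.5):
    probs = softmax_with_temperature(logits, temperature)
    print(
        f"T={temperature}: top-token probability={max(probs):.3f}, "
        f"entropy={entropy_bits(probs):.3f} bits"
    )
```

As the temperature drops, more probability mass lands on the single most likely token and the entropy falls, which is one simple reason repeated runs of the same prompt can look alike.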
A study by Microsoft Research, published in PNAS, quantitatively demonstrated this phenomenon [2]. When GPT-4 and LLaMA-3 were asked to generate 100 short stories, the same plot elements recurred across the generated stories.
The specific examples are striking. When asked to generate continuations of Franz Kafka’s “Give It Up!” 100 times:
- In 50 out of 100 cases, a police officer instructed the protagonist to “turn left at the second corner”
- In 18 out of 100 cases, the instruction was to “turn right at the second corner”
- In 16 out of 100 cases, a “bakery” was mentioned as a landmark
If 100 human authors tackled the same prompt, this degree of convergence would be unlikely. The research team proposed the “Sui Generis Score” to measure this lack of diversity, showing that LLM outputs exhibit significantly lower uniqueness than human creative work [2].
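The kind of counting behind figures like “50 out of 100” can be sketched in a few lines. The example below is not the Sui Generis Score, whose definition is given in the paper. It only computes, for hand-tagged plot elements, the fraction of stories in which each element appears. The function name, the tags, and the ten toy stories are all hypothetical.

```python
from collections import Counter


def element_recurrence(stories: list[set[str]]) -> dict[str, float]:
    """Fraction of stories in which each tagged plot element appears."""
    if not stories:
        return {}
    counts = Counter(element for story in stories for element in story)
    return {element: n / len(stories) for element, n in counts.items()}


# Hypothetical toy data: each story is the set of plot elements tagged in it.
stories = (
    [{"turn_left"}] * 5
    + [{"turn_right", "bakery"}] * 2
    + [set()] * 3  # stories with none of the tagged elements
)

for element, share in sorted(element_recurrence(stories).items()):
    print(f"{element}: appears in {share:.0%} of stories")
```

A diverse set of stories would spread its elements thinly, while a homogenized set shows a few elements with very high shares.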
Propagation Through the Anchoring Effect
The homogeneity of LLMs propagates throughout society via human AI usage. The anchoring effect, well-established in cognitive science, serves as the mediating mechanism.
Humans tend to be pulled toward the first piece of information they encounter. When AI presents an “80-point answer” first, humans use it as a starting point for their thinking. The result:
- AI generates output that converges probabilistically near the “optimal solution”
- Humans see this output and think within its neighborhood (anchoring)
- Human modifications don’t stray far from the original AI output
- Different users employing the same AI produce similar outputs at scale
Research presented at Creativity & Cognition 2024 also reported that exposure to LLMs homogenizes human creative idea generation [3].
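The logic of the four steps above can be expressed as a toy simulation. It is an illustration of the argument, not evidence for it. Ideas are modeled as points on a line, independent thinkers draw from a wide distribution, and anchored thinkers start from an AI output that clusters tightly and then adjust only slightly. Every number here (the means and standard deviations) is an assumption picked for illustration.

```python
import random
import statistics


def simulate_idea_spread(n_users: int = 1000, anchored: bool = False, seed: int = 0) -> float:
    """Return the spread (population standard deviation) of final ideas."""
    rng = random.Random(seed)
    ideas = []
    for _ in range(n_users):
        if anchored:
            # Assumed: AI outputs cluster tightly around one "80-point" answer.
            ai_output = rng.gauss(0.8, 0.1)
            # Assumed: human edits stay close to the anchor.
            ideas.append(ai_output + rng.gauss(0.0, 0.3))
        else:
            # Assumed: unaided ideas vary widely around a neutral point.
            ideas.append(rng.gauss(0.0, 1.0))
    return statistics.pstdev(ideas)


independent_spread = simulate_idea_spread(anchored=False)
anchored_spread = simulate_idea_spread(anchored=True)

print(f"spread without AI anchor: {independent_spread:.2f}")
print(f"spread with AI anchor:    {anchored_spread:.2f}")
```

Under these assumptions, the anchored group ends up with a much narrower spread because both the starting points and the adjustments are small. How closely real users match these assumptions is exactly what the cited studies try to measure.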
Concrete Examples in Software Development
Engineers may recognize patterns like these:
- Code patterns suggested by GitHub Copilot look similar across different projects
- Asking AI to “design this API” returns the “standard pattern” of RESTful + CRUD + JWT authentication
- Multiple teams independently using AI for design converge on similar architectures
These are not necessarily bad outcomes. Convergence on standard patterns is a plus for maintainability and inter-team consistency. However, the cost is that innovative designs become less likely to emerge.
“Creative Scar” — Creativity That Doesn’t Recover After Removing AI
What Is a “Scar”?
An even more concerning finding than homogenization was reported by Zhou, Liu, Huang, & Li (2025): the phenomenon of “creative scar” [4].
A “scar” refers to the mark left after a wound heals. What this research demonstrated is that even after AI is removed following a period of use, creativity fails to return to its original level.
Study Design
This research consisted of two studies:
| Item | Study 1 (Natural Experiment) | Study 2 (Controlled Experiment) |
|---|---|---|
| Design | Natural experiment | 7-day controlled experiment + follow-up |
| Participants | Large scale | 61 university students |
| Follow-up | None | 30 days and 60 days later |
| Ideas generated | — | 3,593 original ideas, 427 solutions (18 creative tasks) |
In Study 2, half the participants used ChatGPT-4 for 7 days, then performed creative tasks without AI.
Key Findings
The results showed three important patterns:
- During AI use: Creative performance improved (individual-level enhancement)
- Immediately after AI removal: Creativity declined significantly and did not recover to its original level
- 30 and 60 days later: Homogeneity (inter-work similarity) continued to increase even after AI removal
The research team called this the “creativity illusion.” AI temporarily boosts creative performance, but it does not develop humans’ creative ability. Instead, it forms AI dependency, and after removal, creativity may actually decline below pre-AI-use levels [4].
Limitations of This Study
While this finding is significant, several limitations must be acknowledged. The sample size for Study 2 was small at 61 participants and limited to university students. Whether 7 days of AI use produces a lasting “scar” remains uncertain — despite the 60-day follow-up, verification over months or years has not been conducted. It is also unclear to what extent the types of creative tasks used (idea generation exercises) generalize to complex creative activities like software design.
The Muscle Atrophy Analogy and Its Limits
“Creative scar” is similar to muscle loss after stopping exercise (disuse atrophy). Just as unused muscles weaken, delegating creative thinking to AI may cause one’s ability to generate ideas independently to atrophy.
However, this analogy has limits. Muscle strength recovers with retraining, but research on the recoverability of creative skills is still lacking. The fact that homogeneity continued to increase even 60 days after AI removal may suggest structural changes beyond simple “atrophy from disuse.”
Corroborating Evidence — A Consistent Pattern Across Multiple Studies
The problems of homogenization and creative scarring are not limited to the studies above; similar trends have been reported across multiple independent studies.
Medeiros et al. (2025) conducted an experiment where participants performed a Divergent Association Task after being exposed to ChatGPT output [5]. The results showed no evidence that AI output improves human divergent thinking through priming effects. In fact, the group exposed to low-creativity AI output showed decreased scores. This finding cautions against assuming that AI automatically elevates human creativity.
Additionally, a 2025 review by Zhang, Shao, Yuan, & Shen in PsyCh Journal (indexed in PMC) noted that generative AI is reshaping creativity along multiple dimensions, while also flagging homogenization risks [6].
Synthesizing these findings, a consistent pattern emerges: “AI helps individuals but homogenizes collectives, and moreover, this impact persists even after AI is removed.” However, each study was conducted under different conditions, with different tasks and samples, and there is variation across studies in effect sizes and duration. This field is still developing, and additional research is needed before drawing definitive conclusions.
Implications for Software Development — Is Code Homogenization Happening?
An Important Caveat
Most of the studies presented so far dealt with tasks such as short story writing and idea generation. Software development differs qualitatively from these tasks in several ways:
- Code has patterns that are close to “correct answers” (especially for boilerplate and CRUD operations)
- A degree of standardization is actually desirable for reusability and maintainability
- Situations that demand creativity (architecture design, problem decomposition, algorithm selection) are limited
Therefore, the following discussion is a speculative application to software development and is not based on direct evidence.
Where Homogenization Is and Isn’t a Problem
Not all homogenization in software development is problematic:
Where homogenization is desirable:
- Boilerplate code generation
- Test case patterns
- Error handling best practices
- Log output formatting
Where homogenization may be problematic:
- System architecture design
- Exploring alternatives during technology selection
- Performance optimization approaches
- Interaction design affecting user experience
For the former, “correct answers” are relatively clear, and AI-driven standardization functions as a quality floor. For the latter, creativity that redefines the problem structure itself is required, and adopting AI suggestions wholesale risks falling into local optima.
The “80-Point Architecture” Trap
Ask AI to “design a microservices architecture,” and you’ll almost certainly get a “standard” answer. API Gateway, service mesh, event-driven design, CQRS pattern — these combinations function as an “80-point architecture” that works adequately in many scenarios.
The problem is that this “80 points” becomes everyone’s starting point. As the Doshi & Hauser study showed, when AI ideas serve as the starting point, deviation from them decreases [1]. When everyone starts from the same “80 points,” the final architectures tend to resemble each other.
For some projects, that’s perfectly adequate. But when competitive advantage depends on design uniqueness — for example, when you need to push the boundaries of scalability, or creatively solve domain-specific constraints — the “standard 80 points” may not be an appropriate starting point.
Discussion — How to Navigate the Trade-Off Between Leveling Up and Homogenization
The Value of a World Where “80-Point Writing” Is Mass-Produced
Homogenization cannot simply be condemned as “bad.” In many contexts, 80-point quality is perfectly functional.
When the quality of business email drafts, routine documentation, and boilerplate code rises, overall productivity improves. AI assistance is especially beneficial for people with lower baseline creativity scores, as the Doshi & Hauser study demonstrates [1].
The problem arises when the same approach is applied to all creative activities. The cost of homogenization becomes apparent in activities where uniqueness has value — conceiving new products, designing innovative architectures, generating research ideas.
A Shift in the Value Axis of Creativity
The research suggests that the spread of AI may change how creativity is valued.
In a world where AI can mass-produce “high-quality standard output,” “quality” alone will no longer be a differentiator. Instead, the value of “uniqueness” — approaches that no one else (including AI) would think of — will rise in relative terms.
This has practical implications for software engineers. In an era where AI can instantly deliver an “80-point implementation,” an engineer’s value will shift from “being able to write 80-point code quickly” to “being able to identify the direction toward 100 points that AI cannot reach.”
Rethinking How We Use AI — Three Approaches
Based on research findings, several practical approaches to countering homogenization risk can be considered. Note, however, that the effectiveness of these approaches has not been rigorously verified; they are proposals based on inference from the research.
1. Use AI as a “critic” rather than a “first-draft generator”
Homogenization arises when AI output serves as the starting point. Conversely, in a workflow where humans first generate their own ideas and then have AI critique them, human uniqueness is preserved as the starting point.
[Workflow with high homogenization risk]
Ask AI to "design it" → Modify AI output → Final product
[Workflow with low homogenization risk]
Human designs first → Ask AI "What are the problems with this design?" → Human decides on modifications
This approach is consistent with research findings that constraints can enhance creativity. The “constraint” of thinking for oneself first may help activate creative thinking [7].
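The low-risk workflow above can be sketched as a small prompt builder that refuses to run until a human-written design exists. This is a hypothetical helper, not part of any AI tool's API. The function name, the prompt wording, and the sample design are assumptions, and sending the prompt to a model is deliberately left out.

```python
def build_critique_prompt(human_design: str, focus_areas: list[str]) -> str:
    """Build a critique-only prompt around a design the human wrote first."""
    if not human_design.strip():
        # Enforce the ordering: no human draft, no AI involvement yet.
        raise ValueError("write your own design first; the critique step needs it")
    focus = "\n".join(f"- {area}" for area in focus_areas)
    return (
        "Below is a design I wrote myself. Do not rewrite it or propose a "
        "replacement design.\n"
        "List concrete problems, risks, and unstated assumptions only.\n\n"
        f"Focus areas:\n{focus}\n\n"
        f"Design:\n{human_design.strip()}\n"
    )


# Hypothetical usage with a made-up design note.
prompt = build_critique_prompt(
    "Ingest events through a queue; one worker per tenant writes to the store.",
    ["scalability", "failure modes"],
)
print(prompt)
```

The design choice is that the AI only ever sees, and reacts to, a starting point that came from a person, so the anchor is the human's idea rather than the model's.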
2. Deliberately reject AI suggestions
As research on the anchoring effect suggests, AI’s initial suggestion can narrow the scope of thinking. To counter this, deliberately setting aside AI’s first suggestion and exploring different directions may be effective.
For example, in architecture design:
- Have AI generate an initial proposal
- Explicitly designate that proposal as the “option not to use”
- Ask AI to “suggest three approaches completely different from this one”
- Have a human compare all four options (including the original) and decide
This approach is inspired by the finding from George & Wiley (2020) that presenting “examples to avoid” improves originality [7].
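The four steps above could be wired together roughly as follows. This is a sketch with hypothetical names (`DesignOption`, `build_alternatives_prompt`, `options_for_human_review`) and assumed prompt wording. It does not call any model and only prepares the follow-up prompt and the list of options a human would compare.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignOption:
    label: str
    summary: str
    origin: str  # "ai-initial" or "ai-alternative"


def build_alternatives_prompt(rejected_proposal: str, n_alternatives: int = 3) -> str:
    """Ask for approaches that differ from a proposal marked as off-limits."""
    if n_alternatives < 1:
        raise ValueError("n_alternatives must be at least 1")
    return (
        "The following proposal is the option NOT to use:\n"
        f"{rejected_proposal.strip()}\n\n"
        f"Suggest {n_alternatives} approaches that are completely different "
        "from it. For each, state how it differs and what it trades off.\n"
    )


def options_for_human_review(initial: str, alternatives: list[str]) -> list[DesignOption]:
    """Collect the original and the alternatives so a human compares all of them."""
    options = [DesignOption("A", initial, "ai-initial")]
    for index, alternative in enumerate(alternatives):
        label = chr(ord("B") + index)
        options.append(DesignOption(label, alternative, "ai-alternative"))
    return options


# Hypothetical usage with made-up proposals.
initial = "API gateway in front of REST microservices"
print(build_alternatives_prompt(initial))
for option in options_for_human_review(
    initial, ["modular monolith", "event log with stream processors", "edge functions"]
):
    print(option.label, option.origin, option.summary)
```

Keeping the original proposal in the comparison set matters: the goal is to widen the options a person chooses from, not to ban the standard answer.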
3. Establish a “my ideas first” rule
The creative scar research suggests that AI dependency risks disuse atrophy of creative skills [4]. The key to addressing this risk is forming your own ideas before being exposed to AI output.
In an engineering context:
- Don’t use AI during the initial phase of design reviews; brainstorm with humans only
- When encountering a new problem, spend 10 minutes writing down ideas before asking AI
- Establish a team rule that “everyone brings their own proposals before seeing AI suggestions”
This preserves diversity in starting points and avoids AI’s anchoring effect. However, no study has yet directly verified the effectiveness of this method.
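For teams that want to encode such a rule in tooling, one possible shape is a small gate that withholds the AI suggestion until a minimum number of human ideas has been recorded. The class below is a hypothetical sketch. The name `IdeasFirstGate`, the default threshold of three ideas, and the use of `PermissionError` are all assumptions, and, like the rule itself, it has not been validated by any study.

```python
class IdeasFirstGate:
    """Withhold an AI suggestion until enough human ideas are written down."""

    def __init__(self, min_ideas: int = 3) -> None:
        self.min_ideas = min_ideas
        self._ideas: list[str] = []

    def add_idea(self, idea: str) -> None:
        """Record a human idea; blank entries do not count."""
        idea = idea.strip()
        if idea:
            self._ideas.append(idea)

    @property
    def unlocked(self) -> bool:
        return len(self._ideas) >= self.min_ideas

    def reveal(self, ai_suggestion: str) -> str:
        """Return the AI suggestion only after the threshold is met."""
        if not self.unlocked:
            remaining = self.min_ideas - len(self._ideas)
            raise PermissionError(f"record {remaining} more idea(s) before viewing AI output")
        return ai_suggestion


# Hypothetical usage.
gate = IdeasFirstGate(min_ideas=2)
gate.add_idea("cache results at the edge")
gate.add_idea("batch writes nightly")
print(gate.reveal("AI proposal: add a read replica"))
```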
Conclusion
Multiple independent studies paint a consistent paradox surrounding AI and creativity.
Findings that are becoming established:
- AI improves individual creative performance (with larger effects for those with lower baselines) [1]
- AI-assisted outputs become more similar to each other (collective homogenization) [1][2][3]
- LLMs have a structural tendency to generate less diverse output [2]
Suggestive findings that require further verification:
- Creativity may not recover to its original level after AI is removed (creative scar) [4]
- Homogeneity may continue to increase even after AI removal [4]
Unanswered questions:
- Does the same pattern hold for complex creative activities like software development?
- Is creative scarring recoverable, and if so, under what conditions?
- Can homogenization be prevented through different ways of using AI?
This article itself may be one sample of “AI-driven homogenization.” If AI-generated articles all follow a similar structure and reach similar conclusions, that is precisely an example of the paradox this article describes. I encourage you not to take this article at face value, but to evaluate it critically against your own experience.
The question is not “whether to use AI.” It is how to preserve your own voice while benefiting from AI. That requires the conscious effort to resist settling for AI’s “80-point optimal solution” and to pursue the perspective that only you can offer.
Related Articles
For more on this topic, see these related articles:
- AI Only Boosts Creativity for Those with High Metacognition - The relationship between metacognition and the AI creativity gap. A deep dive into the “individual differences” issue that complements this article’s “homogenization” theme
- Constraints Enhance Creativity: The Science of Idea Generation in Software Development - The theoretical foundation for the “reject AI suggestions” approach proposed in this article
- The Truth Behind Experts Who Seem to “Delegate Everything to AI” - Meta-knowledge as the reason experts can avoid AI homogenization
- The Expert Who Doesn’t Write Prompts — Meta-Prompting - A meta-level approach to critically evaluating AI output
- The True Value of AI: Multifaceted Evaluation Beyond Time Savings - An analysis of the same Doshi & Hauser study from the perspective of “AI value assessment.” Provides a complementary understanding of the relationship between homogenization and value creation
References
References corresponding to the citation numbers in the text are listed below in numerical order.
1. Generative AI enhances individual creativity but reduces the collective diversity of novel content - Doshi, A. R., & Hauser, O. P. Science Advances, 10(28) (2024). n=293 (writers) + 600 (evaluators), randomized controlled experiment, peer-reviewed. [Reliability: High]
2. Echoes in AI: Quantifying lack of plot diversity in LLM outputs - Microsoft Research. Proceedings of the National Academy of Sciences (PNAS) (2025). Quantitative analysis of 100 stories generated by GPT-4 and LLaMA-3, peer-reviewed. [Reliability: High]
3. Homogenization Effects of Large Language Models on Human Creative Ideation - Proceedings of the 16th Conference on Creativity & Cognition (2024). Examines the homogenization effect of LLM exposure on human creative idea generation, peer-reviewed conference paper. [Reliability: High]
4. Creative scar without generative AI: Individual creativity fails to sustain while homogeneity keeps climbing - Zhou, Y., Liu, Q., Huang, J., & Li, G. Technology in Society (2025). Study 1: natural experiment, Study 2: n=61, 7-day controlled experiment + 30-day and 60-day follow-ups, peer-reviewed. Note: small sample size limited to university students. [Reliability: Medium-High]
5. Human-AI Co-Creativity: Does ChatGPT Make Us More Creative? - Medeiros, K. E. et al. The Journal of Creative Behavior (2025). Examines the relationship between ChatGPT priming effects and divergent thinking, peer-reviewed. [Reliability: High]
6. Artificial Intelligence Reshapes Creativity: A Multidimensional Evaluation - Zhang, Shao, Yuan, & Shen. PsyCh Journal, 14(6), 831-840 (2025). Review study on how AI is multidimensionally reshaping creativity, peer-reviewed. [Reliability: High]
7. Need something different? Here’s what’s been done: Effects of examples and task instructions on creative idea generation - George, T., & Wiley, J. Memory & Cognition, 48(2), 226-243 (2020). Experimentally verified that presenting “examples to avoid” improves originality, peer-reviewed. [Reliability: High]
Additional References (not cited by number in the text)
- The paradox of creativity in generative AI - PMC (2025). Review on the creativity paradox of generative AI. [Reliability: High]
- Homogenizing effect of large language models (LLMs) on creative diversity: An empirical comparison of human and ChatGPT writing - Empirical study on the homogenizing effect of LLMs on creative diversity. [Reliability: High]
- Best humans still outperform artificial intelligence in a creative divergent thinking task - Koivisto, M. & Grassini, S. Scientific Reports (2023). Comparative study of human and AI divergent creativity. [Reliability: High]