The Tight Coupling Trap of AI Pair Programming: Understanding Technical Debt Through Coupling Balance

Posted Nov 6, 2025

AI-Generated Content

This article was generated by AI. The accuracy of the content is not guaranteed, and we accept no responsibility for any damages resulting from use of this article. By continuing to read, you agree to the Terms of Use.

Target Audience: Software engineers, architects, developers using AI tools (GitHub Copilot, Cursor, Claude, etc.) in their work
Prerequisites: Object-oriented programming, basic software design concepts
Reading Time: 20 minutes

Overview

AI pair programming tools like GitHub Copilot, Cursor, and ChatGPT dramatically improve development speed, but they can also cause unexpected technical debt to accumulate. A 2025 GitClear study analyzing 211 million lines of code changes revealed that after AI adoption, duplicated code increased 4-fold and refactoring decreased by 60%¹.

This article uses the theoretical framework from Vlad Khononov’s “Balancing Coupling in Software Design”² to analyze the “tight coupling” problem in AI-generated code. Furthermore, we use Cynefin theory to clarify areas that should be delegated to AI versus areas requiring human judgment, and propose practical prompt design techniques.

By reading this article, you can systematically understand quality issues in AI-generated code and learn practical methods for sustainable AI-collaborative development.

Note:

The code examples and prompt templates in this article are illustrative examples for explanation purposes and have not been executed to verify their operation. When using them in actual projects, please conduct appropriate testing and verification, and customize according to project characteristics.

The Reality of AI-Generated Code in Data

GitClear 2025 Report: Shocking Numbers

GitClear published a large-scale study in January 2025 investigating the impact of AI assistants on code quality¹. This study analyzed 211 million lines of code changes over 5 years from 2020 to 2024 from repositories owned by Google, Microsoft, Meta, and multiple large enterprises.

Key Findings:

Surge in Duplicated Code
- The percentage of copy/pasted code lines increased from 8.3% to 12.3% (approximately 50% increase)
- Code blocks with 5+ duplicate lines increased 4-fold
Significant Decrease in Refactoring
- Code changes classified as “refactoring” dropped from 25% in 2021 to less than 10% in 2024 (60% decrease)
- Notable increase in DRY (Don’t Repeat Yourself) principle violations
Trade-off Between Development Speed and Quality
- 63% of developers use AI assistants in their work¹
- However, long-term maintainability and reusability are declining
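The relative changes quoted above follow from the reported shares. As a quick check, the sketch below recomputes them using only the figures in this section. The helper name `relative_change` is illustrative, and because the 2024 refactoring share is given as "less than 10%", the computed drop is a lower bound.

```python
def relative_change(before: float, after: float) -> float:
    """Return the change from `before` to `after` as a percentage of `before`."""
    return (after - before) / before * 100


# Copy/pasted lines: 8.3% -> 12.3% of changed lines
copy_paste_change = relative_change(8.3, 12.3)

# Refactoring share: 25% -> 10% (upper bound for 2024)
refactoring_change = relative_change(25.0, 10.0)

print(f"copy/paste:  {copy_paste_change:+.1f}%")   # prints +48.2%
print(f"refactoring: {refactoring_change:+.1f}%")  # prints -60.0%
```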

Google DORA 2024 Report: Impact on Stability

Google’s 2024 DevOps Research and Assessment (DORA) report described a correlation between AI adoption and delivery stability³.

Key Findings:

For every 25% increase in AI adoption rate, delivery stability decreases by 7.2%
Code review speed improves, but defect rate increases

This result suggests that while AI excels at quickly generating code that works, it faces challenges on long-term quality metrics.

Academic Research: Collapse of Partition Quality in Large-Scale Generation

A research paper submitted to arXiv⁴ analyzed coupling and cohesion in AI-generated code.

Key Findings:

Small code snippets (function level): Maintainability equivalent to human-written code (high cohesion, low coupling)
Large-scale code generation (entire applications, large modules): Partition quality significantly degrades, maintainability severely worsens
AI tends to propose inappropriate solutions when facing large, complex problems

This research result shows that AI excels at “local optimization” but has limitations in “global architecture design.”

TiMi Studio Case Study

A study published in ACM Digital Library⁵ reported a case study of AI pair programming adoption in a game development team at TiMi Studio.

Positive Aspects:

Reduction in cyclomatic complexity
Improved code coverage
Reduction in code smells

Negative Aspects:

Reliability-questioning
Explainability-questioning
Trust-lacking
Autonomy-losing

This case study shows that while AI pair programming may improve quality metrics, it introduces new challenges to developers’ cognitive processes and decision-making.

Why AI Generates Tightly Coupled Code

1. “Working Code” First Design Philosophy

AI language models learn from large amounts of code in their training data, but their learning goal is generating “syntactically correct, executable code”⁶. Abstract design principles like long-term maintainability, extensibility, and modularity are not direct learning goals.

An article published in LeadDev⁷ lists the following as typical problems with AI-generated code:

Highly coupled code
God Objects (objects with excessively concentrated responsibilities)
Overly structured solutions

These patterns occur because AI prioritizes “code that works now” and doesn’t consider design intent or long-term impact.

2. Limitations in Context Understanding

AI optimizes within the presented context (prompts, surrounding code) but cannot understand “tacit knowledge” like overall project architecture, existing module structure, and team coding conventions⁸.

This limitation causes the following problems:

Ignoring Existing Modules
- Generates new code even when similar functionality already exists
- Results in duplicated code and inconsistent implementations
Direct Dependency References
- Directly depends on specific implementations without going through abstraction layers or Dependency Injection
- Testability and module independence decrease
Boundary Blurring
- Generates coupling across domain boundaries and layer boundaries
- Violates design principles like Clean Architecture and Hexagonal Architecture

3. Difficulty Understanding Abstract Concepts

A 27-day AI experiment article published on Medium⁸ reports that AI agent tools “struggle with abstract concepts like design principles, user experience, and code maintainability.”

Concrete Example:

When asking AI to “implement a RESTful API,” it tends to generate tightly coupled code like this:

  
# Typical AI-generated pattern (tight coupling)
class UserAPI:
    def __init__(self):
        self.db = MySQLDatabase("localhost", "user", "pass", "db")  # Direct dependency
        self.logger = FileLogger("/var/log/app.log")  # Direct dependency
        self.cache = RedisCache("localhost:6379")  # Direct dependency

    def get_user(self, user_id):
        # Business logic, data access, logging, caching mixed together
        self.logger.log(f"Fetching user {user_id}")
        cached = self.cache.get(f"user:{user_id}")
        if cached:
            return cached
        user = self.db.query(f"SELECT * FROM users WHERE id = {user_id}")
        self.cache.set(f"user:{user_id}", user)
        return user

Problems with this code:

Direct dependency on database implementation (MySQL)
Direct dependency on logging implementation (FileLogger)
Direct dependency on cache implementation (Redis)
Requires actual database, file system, and Redis for testing
Difficult to switch databases or caches

Meanwhile, human designers introduce abstractions:

  
# Loosely coupled version designed by humans
class UserAPI:
    def __init__(
        self,
        repository: UserRepository,  # Abstraction
        logger: Logger,  # Abstraction
        cache: Cache  # Abstraction
    ):
        self.repository = repository
        self.logger = logger
        self.cache = cache

    def get_user(self, user_id):
        self.logger.log(f"Fetching user {user_id}")
        cached = self.cache.get(f"user:{user_id}")
        if cached:
            return cached
        user = self.repository.find_by_id(user_id)
        self.cache.set(f"user:{user_id}", user)
        return user
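The loosely coupled version refers to `UserRepository`, `Logger`, and `Cache` without defining them. One possible way to express them, shown here as a minimal sketch, is with `typing.Protocol`. The in-memory classes (`InMemoryUserRepository`, `ListLogger`, `DictCache`) and the use of plain dictionaries for users are assumptions made for illustration. They show that the class can now be exercised without MySQL, a log file, or Redis.

```python
from typing import Optional, Protocol


class UserRepository(Protocol):
    def find_by_id(self, user_id: int) -> Optional[dict]: ...


class Logger(Protocol):
    def log(self, message: str) -> None: ...


class Cache(Protocol):
    def get(self, key: str) -> Optional[dict]: ...
    def set(self, key: str, value: Optional[dict]) -> None: ...


class UserAPI:
    """Same logic as the loosely coupled class above."""

    def __init__(self, repository: UserRepository, logger: Logger, cache: Cache):
        self.repository = repository
        self.logger = logger
        self.cache = cache

    def get_user(self, user_id: int) -> Optional[dict]:
        self.logger.log(f"Fetching user {user_id}")
        cached = self.cache.get(f"user:{user_id}")
        if cached:
            return cached
        user = self.repository.find_by_id(user_id)
        self.cache.set(f"user:{user_id}", user)
        return user


# Hypothetical in-memory test doubles; they satisfy the protocols structurally.
class InMemoryUserRepository:
    def __init__(self, users: dict):
        self.users = users
        self.calls = 0  # counts how often the "database" is hit

    def find_by_id(self, user_id: int) -> Optional[dict]:
        self.calls += 1
        return self.users.get(user_id)


class ListLogger:
    def __init__(self) -> None:
        self.messages: list = []

    def log(self, message: str) -> None:
        self.messages.append(message)


class DictCache:
    def __init__(self) -> None:
        self.store: dict = {}

    def get(self, key: str) -> Optional[dict]:
        return self.store.get(key)

    def set(self, key: str, value: Optional[dict]) -> None:
        self.store[key] = value


repo = InMemoryUserRepository({1: {"id": 1, "name": "Ada"}})
logger = ListLogger()
api = UserAPI(repo, logger, DictCache())

first = api.get_user(1)   # served by the repository, then cached
second = api.get_user(1)  # served by the cache
```

Swapping the database or cache then means writing another class with the same methods. `UserAPI` itself does not change.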

4. Local Optimization Bias

As academic research⁴ shows, AI achieves good quality in small code snippets (function level) but quality degrades in large-scale code generation.

This is because AI performs the following “local optimizations”:

Prioritizes Completion Within Current Scope
- Implements all necessary processing within a function
- Doesn’t consider delegation to other modules or functions
Selects Immediately Available Dependencies
- Selects directly accessible concrete implementations over more appropriate but slightly distant abstractions
- Example: Directly depends on existing classes rather than defining interfaces
Prioritizes Short-Term Implementation Ease
- Generates code that works now rather than considering long-term maintenance costs

Diagnosing with the Three Dimensions of Coupling Balance

Vlad Khononov, in his book “Balancing Coupling in Software Design”², proposes a new approach beyond traditional “loose coupling supremacy”: Balancing Coupling.

The Three Dimensions of Coupling

Khononov presents a framework for evaluating coupling in three dimensions², ⁹:

1. Strength (Integration Strength)

Represents the density of coupling and degree of dependency.

Strength Levels (weak → strong):

Data Coupling
- Passing simple data types (int, string, etc.) as arguments
- Example: calculate_total(price: float, quantity: int)
Stamp Coupling
- Passing structures or objects but using only some fields
- Example: process_order(order: Order) (only using Order object’s id field)
Control Coupling
- Passing flags or control information to control callee behavior
- Example: send_notification(user: User, notification_type: str)
Common Coupling
- Dependency on global variables or shared data structures
- Example: Multiple modules referencing the same global config object
Content Coupling
- Direct dependency on internal implementation of other modules
- Example: user._internal_cache.clear() (direct access to private fields)
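The five levels can be placed side by side in code. The following is an illustrative sketch only. The function bodies, the `Order` fields, and the `CONFIG` and `UserProfile` names are assumptions, and `send_notification` takes a plain string instead of a `User` object to keep the example short.

```python
from dataclasses import dataclass, field


# Data coupling: only primitive values cross the boundary.
def calculate_total(price: float, quantity: int) -> float:
    return price * quantity


@dataclass
class Order:
    id: int
    customer_email: str
    items: list = field(default_factory=list)


# Stamp coupling: the whole Order is passed, but only `id` is used.
def process_order(order: Order) -> str:
    return f"order-{order.id}"


# Control coupling: the caller steers the callee's branch with a flag.
def send_notification(user: str, notification_type: str) -> str:
    if notification_type == "email":
        return f"email to {user}"
    return f"sms to {user}"


# Common coupling: behaviour depends on shared module-level state.
CONFIG = {"currency": "USD"}


def format_price(amount: float) -> str:
    return f"{amount:.2f} {CONFIG['currency']}"


class UserProfile:
    def __init__(self) -> None:
        self._internal_cache = {"theme": "dark"}


# Content coupling: the caller reaches into another object's private state.
def reset_profile(profile: UserProfile) -> None:
    profile._internal_cache.clear()
```

Moving down the list, a change inside the callee or the shared state is increasingly likely to force a change in the caller.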

Typical Problems with AI-Generated Code:

Generates common or content coupling where data or stamp coupling should be used
Many dependencies on global variables
Direct access to private fields

2. Distance (Locality)

Represents the physical and logical separation between modules.

Distance Levels (close → far):

Within a class
Within a package/module
Within an application
Across services

Khononov’s Principle⁹:

“High strength coupling should have shortened distance”

Typical Problems with AI-Generated Code:

Creates strong coupling between modules at far distances
Example: Frontend depending on backend’s specific database schema

3. Volatility

Represents changeability and scope of impact.

Volatility Levels (stable → unstable):

Standard library: Extremely low change frequency
Third-party library: Stable between major versions
Internal shared library: Shared across projects, periodically updated
Application-specific code: Frequently changed

Khononov’s Principle⁹:

“Low volatility can tolerate high strength”

Typical Problems with AI-Generated Code:

Many modules strongly depend on highly volatile business logic
Wide-ranging modifications needed when business rules change

Connascence: A More Refined Measure of Coupling

Connascence proposed by Meilir Page-Jones in 1992¹⁰ is a more refined metric for measuring coupling. It’s also explained in detail in Chapter 6 of Khononov’s “Balancing Coupling”¹¹.

Definition of Connascence:

Connascence between two software elements A and B exists when A requires a change (or careful checking) due to a change in B, or when both A and B need to be changed simultaneously.

Three Dimensions of Connascence¹¹:

Strength: Difficulty and cost of change
Degree: Number of couplings
Locality: Proximity between related elements

Types of Connascence (weak → strong):

Connascence of Name (CoN)
- Needs to reference the same name
- Example: Function names, variable names
Connascence of Type (CoT)
- Needs to use the same type
- Example: Function argument and return types
Connascence of Meaning (CoM)
- Specific values have specific meanings
- Example: Magic numbers (if status == 1 where 1 means “active”)
Connascence of Position (CoP)
- Order of elements matters
- Example: Positional arguments create_user("John", "Doe", 30)
Connascence of Algorithm (CoA)
- Needs to use the same algorithm
- Example: Encryption and decryption

Connascence Diagnosis of AI-Generated Code:

AI-generated code often contains strong connascence:

Connascence of Meaning: Many magic numbers, magic strings
Connascence of Position: Heavy use of positional arguments, not using named arguments
Connascence of Algorithm: Same logic duplicated in multiple places
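Two of these can be weakened with small changes. The sketch below is a hypothetical illustration: keyword-only arguments turn Connascence of Position into Connascence of Name, and a single shared function keeps Connascence of Algorithm in one place. All function names other than `create_user` are assumptions.

```python
# Connascence of Position: callers must remember the argument order.
def create_user_positional(first_name, last_name, age):
    return {"first_name": first_name, "last_name": last_name, "age": age}


# Connascence of Name: the bare `*` makes every argument keyword-only,
# so callers are coupled to parameter names instead of their order.
def create_user(*, first_name: str, last_name: str, age: int) -> dict:
    return {"first_name": first_name, "last_name": last_name, "age": age}


# Connascence of Algorithm: registration and lookup must normalize emails
# identically, so the rule lives in exactly one function.
def normalize_email(email: str) -> str:
    return email.strip().lower()


def register(users: set, email: str) -> None:
    users.add(normalize_email(email))


def is_registered(users: set, email: str) -> bool:
    return normalize_email(email) in users


# Swapped positional arguments go unnoticed...
swapped = create_user_positional("Doe", "John", 30)
# ...while keyword arguments can be given in any order.
user = create_user(last_name="Doe", first_name="John", age=30)

users: set = set()
register(users, "  Ada@Example.com ")
```

Calling `create_user("John", "Doe", 30)` raises a `TypeError`, which surfaces the ordering mistake at the call site instead of in stored data.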

Practicing the Coupling Balance Model

Khononov presents the following practical judgment criteria based on 3D coupling evaluation⁹:

Acceptable Coupling:

Low strength and low volatility (e.g., standard library dependency)
High strength but close distance (e.g., inter-method dependency within same class)

Problematic Coupling:

High strength, far distance, and high volatility
Example: Dependency on specific database schema between microservices
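The criteria above can be written down as a small decision function. This is a simplified sketch, not Khononov's own formulation. Treating each dimension as a binary low/high value is an assumption, the names `Level` and `assess_coupling` are hypothetical, and combinations the list above does not mention are reported as "needs review".

```python
from enum import Enum


class Level(Enum):
    LOW = "low"    # weak strength, close distance, or stable
    HIGH = "high"  # strong, far apart, or volatile


def assess_coupling(strength: Level, distance: Level, volatility: Level) -> str:
    """Apply the acceptable/problematic criteria listed above to one dependency."""
    if strength is Level.LOW and volatility is Level.LOW:
        return "acceptable"
    if strength is Level.HIGH and distance is Level.LOW:
        return "acceptable"
    if Level.LOW not in (strength, distance, volatility):
        return "problematic"
    return "needs review"


# Standard library dependency: weak, far away, stable
stdlib_dependency = assess_coupling(Level.LOW, Level.HIGH, Level.LOW)

# Methods of one class calling each other: strong, but as close as it gets
same_class = assess_coupling(Level.HIGH, Level.LOW, Level.HIGH)

# One microservice reading another's database schema
shared_schema = assess_coupling(Level.HIGH, Level.HIGH, Level.HIGH)
```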

Refactoring Guidelines for AI-Generated Code:

Connascence of Meaning → Connascence of Name

  
# Before (Connascence of Meaning)
if user.status == 1:
    send_email(user)

# After (Connascence of Name)
if user.status == UserStatus.ACTIVE:
    send_email(user)
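The "After" snippet assumes a `UserStatus` type exists. A minimal sketch of one possible definition follows. The `INACTIVE` member, its value, and the `should_email` helper are assumptions, since the text only establishes that 1 means "active".

```python
from dataclasses import dataclass
from enum import IntEnum


class UserStatus(IntEnum):
    INACTIVE = 0  # assumed value, not given in the article
    ACTIVE = 1    # matches the magic number in the "Before" snippet


@dataclass
class User:
    email: str
    status: UserStatus


def should_email(user: User) -> bool:
    return user.status == UserStatus.ACTIVE


# IntEnum members compare equal to plain integers, so values already
# stored as 0/1 can be converted without a data migration.
user = User(email="ada@example.com", status=UserStatus(1))
```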

Common Coupling → Data Coupling

  
# Before (Common Coupling)
global_config = {...}  # contents elided

def process_data():
    timeout = global_config['timeout']  # Dependency on global variable

# After (Data Coupling)
def process_data(timeout: int):
    # Explicitly passed as argument
    ...

Strong Coupling × Far Distance → Introduce Abstraction

  
# Before (Service A depends on Service B's concrete implementation)
from service_b.mysql_repository import MySQLUserRepository

class ServiceA:
    def __init__(self):
        self.user_repo = MySQLUserRepository()  # Dependency on concrete implementation

# After (Dependency on interface)
from service_b.interfaces import UserRepository

class ServiceA:
    def __init__(self, user_repo: UserRepository):
        self.user_repo = user_repo  # Dependency on abstraction

Practice: Prompt Design to Prevent Tight Coupling

To mitigate tight coupling problems in AI-generated code, prompt engineering is important. Recent research shows the effectiveness of modular prompting techniques.

MoT (Modularization of Thought): Modular Prompting

Research submitted to arXiv¹² proposes a new prompting technique called MoT (Modularization of Thought).

MoT Principles:

Decompose complex programming problems into small, independent reasoning steps
More structured and interpretable problem-solving process
Achieved Pass@1 scores of 58.1%-95.1% in experiments with GPT-4o-mini and DeepSeek-R1 on 6 datasets¹²

MoT Benefits:

Improved flexibility and generalizability
Error isolation
Easier integration of information retrieval, arithmetic operations, and external APIs

Prompts That Clarify Boundaries and Modules

Research¹³ points out the importance of separating “boundary (role/tone) prompts” from “adaptive control schemas.”

Recommended Prompt Structure:

  
## Context
[Project overview, tech stack, existing architecture]

## Constraints
- Use dependency injection
- Depend on interfaces, not concrete implementations
- Each class follows Single Responsibility Principle (SRP)
- Make design testable

## Module Boundaries
- Data access layer: repositories package
- Business logic layer: services package
- API layer: controllers package
- Each layer depends only on abstractions (interfaces) of adjacent layers, never on their concrete implementations (Dependency Inversion Principle)

## Task
[Specific implementation task]
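If prompts are assembled programmatically, the four sections above can be generated from structured inputs so that every request carries the same constraints. The helper below is a hypothetical sketch. `build_prompt` and its parameters are not part of any tool's API.

```python
def build_prompt(
    context: str,
    constraints: list[str],
    boundaries: list[str],
    task: str,
) -> str:
    """Render the Context / Constraints / Module Boundaries / Task structure."""

    def bullets(items: list[str]) -> str:
        return "\n".join(f"- {item}" for item in items)

    sections = [
        ("Context", context),
        ("Constraints", bullets(constraints)),
        ("Module Boundaries", bullets(boundaries)),
        ("Task", task),
    ]
    return "\n\n".join(f"## {title}\n{body}" for title, body in sections)


prompt = build_prompt(
    context="Python REST API project using FastAPI.",
    constraints=[
        "Use dependency injection",
        "Depend on interfaces, not concrete implementations",
    ],
    boundaries=[
        "Data access layer: repositories package",
        "Business logic layer: services package",
    ],
    task="Implement GET /users/{user_id}.",
)
print(prompt)
```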

Practical Example: RESTful API Implementation

❌ Bad Prompt (likely to generate tight coupling):

Implement a REST API in Python that retrieves user information.

With this prompt, AI is likely to generate tightly coupled code like:

Database connection written directly in API class
Business logic and data access mixed
Difficult to test

✅ Good Prompt (promotes loose coupling):

  
## Context
This is a Python REST API project using FastAPI.
Following Clean Architecture, we emphasize layer separation.

## Architecture Structure
- domain/: Business logic and entities (framework-independent)
- application/: Use cases (interface definitions)
- infrastructure/: External dependencies (DB, APIs, etc.)
- presentation/: API layer (FastAPI)

## Constraints
- Use dependency injection (leverage FastAPI's Depends)
- Use repository pattern to abstract data access
- Each layer depends only on abstractions (interfaces) of lower layers
- Business logic concentrated in domain layer
- All classes follow Single Responsibility Principle
- Testable design (easy mocks and stubs)

## Task
Implement a REST API endpoint (GET /users/{user_id}) to retrieve user information.
Implement with the following file structure:

1. domain/entities/user.py: User entity
2. application/interfaces/user_repository.py: UserRepository abstract class
3. infrastructure/repositories/user_repository_impl.py: UserRepository implementation
4. application/services/user_service.py: User retrieval logic
5. presentation/api/user_controller.py: FastAPI endpoint

Dependency diagram:
Controller → Service → Repository(interface) ← RepositoryImpl

Clarify each file's role and thoroughly separate responsibilities between layers.

Using this prompt increases the likelihood that AI will generate loosely coupled code like:

  
# domain/entities/user.py
from dataclasses import dataclass

@dataclass
class User:
    id: int
    name: str
    email: str

# application/interfaces/user_repository.py
from abc import ABC, abstractmethod
from typing import Optional
from domain.entities.user import User

class UserRepository(ABC):
    @abstractmethod
    async def find_by_id(self, user_id: int) -> Optional[User]:
        pass

# application/services/user_service.py
from typing import Optional
from application.interfaces.user_repository import UserRepository
from domain.entities.user import User

class UserService:
    def __init__(self, user_repository: UserRepository):
        self.user_repository = user_repository

    async def get_user(self, user_id: int) -> Optional[User]:
        return await self.user_repository.find_by_id(user_id)

# infrastructure/repositories/user_repository_impl.py
from typing import Optional
from application.interfaces.user_repository import UserRepository
from domain.entities.user import User
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy import select
from infrastructure.models.user_model import UserModel

class UserRepositoryImpl(UserRepository):
    def __init__(self, db_session: AsyncSession):
        self.db_session = db_session

    async def find_by_id(self, user_id: int) -> Optional[User]:
        result = await self.db_session.execute(
            select(UserModel).filter(UserModel.id == user_id)
        )
        user_model = result.scalar_one_or_none()
        if user_model is None:
            return None
        return User(
            id=user_model.id,
            name=user_model.name,
            email=user_model.email
        )

# presentation/api/user_controller.py
from fastapi import APIRouter, Depends, HTTPException
from application.services.user_service import UserService
from presentation.dependencies import get_user_service

router = APIRouter()

@router.get("/users/{user_id}")
async def get_user(
    user_id: int,
    user_service: UserService = Depends(get_user_service)
):
    user = await user_service.get_user(user_id)
    if user is None:
        raise HTTPException(status_code=404, detail="User not found")
    return user
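One practical payoff of this layering is that `UserService` can be tested without SQLAlchemy or a running database. The sketch below repeats the entity, interface, and service so that it runs on its own, then substitutes a hypothetical `InMemoryUserRepository` for `UserRepositoryImpl`.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    id: int
    name: str
    email: str


class UserRepository(ABC):
    @abstractmethod
    async def find_by_id(self, user_id: int) -> Optional[User]:
        ...


class UserService:
    def __init__(self, user_repository: UserRepository):
        self.user_repository = user_repository

    async def get_user(self, user_id: int) -> Optional[User]:
        return await self.user_repository.find_by_id(user_id)


class InMemoryUserRepository(UserRepository):
    """Hypothetical test double that stands in for UserRepositoryImpl."""

    def __init__(self, users: list[User]):
        self._users = {user.id: user for user in users}

    async def find_by_id(self, user_id: int) -> Optional[User]:
        return self._users.get(user_id)


service = UserService(InMemoryUserRepository([User(1, "Ada", "ada@example.com")]))

found = asyncio.run(service.get_user(1))    # existing user
missing = asyncio.run(service.get_user(2))  # unknown id, the 404 path in the controller
```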

Organizational Management of Prompt Templates

An article from Thoughtworks¹⁴ points out the importance of Test-Driven Development (TDD) and pair programming in AI pair programming. Similarly, organizational management of prompt templates is recommended.

Recommended Practices:

Utilize CLAUDE.md or Cursor Rules
- Document design principles at project root
- Configure AI to automatically reference them
Configure Custom Instructions
- ChatGPT Custom Instructions
- GitHub Copilot workspace settings
Build Prompt Library
- Document frequently used prompts
- Share within team

CLAUDE.md Example:

  
# Project Design Principles

## Architecture
This project follows Clean Architecture.

## Coupling Principles
1. Depend on abstractions (interfaces, abstract classes), not concrete implementations
2. Use dependency injection
3. Each class follows Single Responsibility Principle (SRP)
4. Minimize dependencies on highly volatile business logic

## Code Generation Notes
- Separate data access and business logic
- No global variables
- No magic numbers/magic strings (use constants or enums)
- Always consider testability

Review Perspectives: Post-Generation Refactoring Points

Use the following checklists when reviewing AI-generated code:

Coupling Checklist:

Strength
- No dependencies on global variables
- No direct access to private fields
- Data coupling appropriately used
Distance
- No strong coupling between modules at far distances
- Abstraction layers appropriately placed
Volatility
- Dependencies on highly volatile business logic minimized
- Depends on stable abstractions (interfaces)

Connascence Checklist:

No magic numbers/magic strings → Replace with constants/enums
Not overusing positional arguments → Replace with named arguments or data classes
No duplicated algorithms → Consider consolidation

Architecture Checklist:

No Single Responsibility Principle (SRP) violations
No Dependency Inversion Principle (DIP) violations
Layer boundaries appropriately maintained
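A few of these checklist items can be approximated mechanically. The following sketch uses Python's standard `ast` module to flag `global` statements, access to `_private` attributes on objects other than `self`/`cls`, and equality comparisons against numeric literals. The class and function names are hypothetical, and the checks are coarse heuristics (for example, `len(x) == 0` is also flagged), so the output is a prompt for review, not a verdict.

```python
import ast


class CouplingSmellVisitor(ast.NodeVisitor):
    """Collect (line, message) pairs for three checklist items."""

    def __init__(self) -> None:
        self.findings: list[tuple[int, str]] = []

    def visit_Global(self, node: ast.Global) -> None:
        # Common coupling: a function rebinding module-level state.
        names = ", ".join(node.names)
        self.findings.append((node.lineno, f"global statement: {names}"))

    def visit_Attribute(self, node: ast.Attribute) -> None:
        # Content coupling: `_private` attribute on something other than self/cls.
        is_private = node.attr.startswith("_") and not node.attr.startswith("__")
        on_self = isinstance(node.value, ast.Name) and node.value.id in {"self", "cls"}
        if is_private and not on_self:
            self.findings.append((node.lineno, f"private attribute access: {node.attr}"))
        self.generic_visit(node)

    def visit_Compare(self, node: ast.Compare) -> None:
        # Connascence of Meaning: == or != against a bare numeric literal.
        for op, right in zip(node.ops, node.comparators):
            if (
                isinstance(op, (ast.Eq, ast.NotEq))
                and isinstance(right, ast.Constant)
                and isinstance(right.value, (int, float))
                and not isinstance(right.value, bool)
            ):
                self.findings.append(
                    (node.lineno, f"magic number in comparison: {right.value}")
                )
        self.generic_visit(node)


def find_coupling_smells(source: str) -> list[tuple[int, str]]:
    visitor = CouplingSmellVisitor()
    visitor.visit(ast.parse(source))
    return sorted(visitor.findings)


SAMPLE = '''
def deactivate(user):
    global counter
    if user.status == 1:
        user._internal_cache.clear()
'''

for line, message in find_coupling_smells(SAMPLE):
    print(f"line {line}: {message}")
```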

Using Cynefin Theory to Decide: Areas for AI vs. Areas for Human Judgment

Chapter 2 of Vlad Khononov’s “Balancing Coupling”¹⁵ explains how to use the Cynefin framework to understand complexity and make appropriate design decisions.

What Is the Cynefin Framework?

The Cynefin framework¹⁶ is a decision-making framework developed by management consultant and complexity science researcher David J. Snowden in the late 1990s. It classifies problems into five domains. This article uses the following four; the fifth, Disorder, covers situations where it is not yet clear which domain applies:

Simple (called Clear in later versions of the framework): Clear cause-and-effect, best practices exist
Complicated: Cause-and-effect determined through analysis, expertise required
Complex: Cause-and-effect understood only in retrospect, exploration and experimentation required
Chaotic: No discernible cause-and-effect, immediate action required

Cynefin theory is often used to explain why software development should adopt agile development and Scrum¹⁶. In highly complex problem domains, approaches like agile development are more suitable than waterfall.

AI’s Strengths and Weaknesses

Using the Cynefin framework, we can classify areas for applying AI pair programming:

1. Simple Domain: Should Delegate to AI

Characteristics:

Clear input/output specifications
Established patterns
Low volatility

Examples:

Standard CRUD operation implementation
Data validation
Implementation of well-known algorithms (sorting, searching, etc.)
Boilerplate test code generation

Recommended Approach:

Fully delegate to AI
Mechanical checks (linters, type checkers) sufficient for review

2. Complicated Domain: Human-AI Collaboration

Characteristics:

Cause-and-effect determined through analysis
Expertise required
Multiple correct answers possible

Examples:

Performance optimization
Security implementation (authentication, authorization)
Database schema design
Adding features to existing systems

Recommended Approach:

Have AI generate initial implementation
Human reviews and improves based on expertise
Scrutinize from coupling, security, performance perspectives

3. Complex Domain: Human-Led, AI Assists

Characteristics:

Cause-and-effect understood only in retrospect
Exploration and experimentation required
Emergent solutions required

Examples:

System architecture design
Domain modeling (Domain-Driven Design)
Determining microservice boundaries
Coupling balance judgments

Recommended Approach:

Humans lead design decisions
AI assists with partial implementation or prototype generation
Humans evaluate the three dimensions of coupling (strength, distance, volatility)

Why AI Has Limitations in the Complex Domain:

As research⁴ shows, AI proposes inappropriate solutions when facing large, complex problems. This is due to:

Lack of Big Picture: AI judges only within presented context
Difficulty Evaluating Trade-offs: Cannot appropriately evaluate trade-offs between short-term implementation ease and long-term maintainability
Lack of Business Context Understanding: Cannot consider factors like business requirement changes, organizational constraints, team capabilities

4. Chaotic Domain: Humans Only

Characteristics:

Unclear cause-and-effect
Immediate action and judgment required
Rapidly changing situation

Examples:

Production incident response
Security incident response
Emergency hotfixes

Recommended Approach:

Humans handle directly
AI limited to reference information searching

Practical Decision Criteria

Use the following criteria to determine the scope to delegate to AI:

Factor	Delegate to AI (Simple)	Collaborate (Complicated)	Human-Led (Complex)
Scope	Single function	Single module	Entire system
Volatility	Low (standard library)	Medium (shared library)	High (business logic)
Coupling Impact Range	Local	Moderate	Global
Requirement Clarity	Clear	Can be clarified through analysis	Ambiguous, exploration needed
Testability	Easy	Possible	Design-dependent

Example: Decision Flowchart

graph TD
    A[Task Start] --> B{Are requirements clear?}
    B -->|Yes| C{Is scope single function?}
    B -->|No| D[Human performs domain analysis]
    C -->|Yes| E{Is volatility low?}
    C -->|No| F{Impact on existing architecture?}
    E -->|Yes| G[Fully delegate to AI]
    E -->|No| F
    F -->|Local| H[Human-AI collaboration]
    F -->|Wide-ranging| I[Human-led, AI assists]
    D --> I
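The flowchart translates directly into a function, which can be useful as a shared team convention or a checklist in tooling. This is a sketch of the diagram above and nothing more. The names `Impact` and `decide_delegation` are hypothetical, and the inputs still require human judgment.

```python
from enum import Enum


class Impact(Enum):
    LOCAL = "local"
    WIDE_RANGING = "wide-ranging"


def decide_delegation(
    requirements_clear: bool,
    single_function: bool,
    low_volatility: bool,
    impact: Impact,
) -> str:
    """Follow the decision flowchart from 'Task Start' to one of its three outcomes."""
    if not requirements_clear:
        # Human performs domain analysis first, then leads the work.
        return "Human-led, AI assists"
    if single_function and low_volatility:
        return "Fully delegate to AI"
    # Larger scope or high volatility: decide by impact on existing architecture.
    if impact is Impact.LOCAL:
        return "Human-AI collaboration"
    return "Human-led, AI assists"


# A well-specified, stable helper function
helper = decide_delegation(True, True, True, Impact.LOCAL)

# A clear requirement that spans a module but stays local
feature = decide_delegation(True, False, True, Impact.LOCAL)

# Ambiguous requirements, regardless of the other answers
unclear = decide_delegation(False, True, True, Impact.LOCAL)
```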

Case Study: Determining Microservice Boundaries

Scenario: When converting an e-commerce site’s monolithic application to microservices, how should service boundaries be determined?

Cynefin Classification:

Complex domain: Cause-and-effect understood only in retrospect, exploration and experimentation required

Approach:

Human’s Role:
- Identify Bounded Contexts based on Domain-Driven Design (DDD)
- Analyze business capabilities
- Evaluate the three dimensions of coupling (strength, distance, volatility)
- Make trade-off judgments (e.g., network latency vs. independence)
AI’s Role:
- Dependency analysis of existing codebase (static analysis)
- Visualization of potential service boundaries
- Organize pros/cons of each boundary candidate
- Draft migration plans
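The dependency analysis in the AI's role can also be done, or cross-checked, with a small script. The sketch below counts import statements that cross top-level package boundaries using the standard `ast` module. The module names in `SOURCES` are hypothetical stand-ins for the domains in this scenario, and import counts are only a rough proxy for coupling strength.

```python
import ast
from collections import Counter


def top_level_package(module_name: str) -> str:
    return module_name.split(".")[0]


def cross_package_imports(sources: dict[str, str]) -> Counter:
    """Count imports between top-level packages.

    `sources` maps a dotted module name (e.g. "orders.services") to its source code.
    Imports of packages not present in `sources` (stdlib, third party) are ignored.
    """
    known = {top_level_package(name) for name in sources}
    edges: Counter = Counter()
    for module_name, source in sources.items():
        importer = top_level_package(module_name)
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.Import):
                targets = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                targets = [node.module]
            else:
                continue
            for target in targets:
                imported = top_level_package(target)
                if imported in known and imported != importer:
                    edges[(importer, imported)] += 1
    return edges


SOURCES = {
    "orders.services": (
        "from inventory.models import Stock\n"
        "from payments.gateway import charge\n"
        "import inventory.events\n"
    ),
    "inventory.models": "import json\n",
    "payments.gateway": "from orders.models import Order\n",
}

edges = cross_package_imports(SOURCES)
for (importer, imported), count in sorted(edges.items()):
    print(f"{importer} -> {imported}: {count}")
```

Here `orders` and `payments` import each other, which is the kind of cycle worth examining before drawing a service boundary between them.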

Recommended Prompt:

  
## Context
We're converting an e-commerce site monolithic application (Python/Django) to microservices.
The following main domain areas exist:
- Product Management
- Inventory Management
- Order Management
- Customer Management
- Payment Processing

## Task
Analyze the existing codebase dependencies and propose 3 potential service boundary candidates.
For each candidate, evaluate the following:

1. Coupling strength (how tightly coupled to other domains)
2. Data consistency requirements (transaction boundaries)
3. Change frequency (volatility)
4. Fit with team structure

Organize candidates in table format and clearly state pros/cons.
Since humans will make the final decision, output as "option presentation" not "recommendation."

While referencing AI’s output, humans make final decisions considering additional factors:

Business strategy (which areas to invest in)
Team expertise and size
Infrastructure costs
Feasibility of phased migration

Summary

AI pair programming tools significantly improve development speed while easily generating tightly coupled code and risking technical debt accumulation. As the GitClear 2025 report¹ shows, the 4-fold increase in duplicated code and 60% decrease in refactoring suggest increased long-term maintenance costs.

This article discussed the following points:

1. Tight Coupling Problems in AI-Generated Code

Empirical research¹, ³, ⁴ revealed that AI has the following tendencies:

Good quality in small code snippets
Significantly degraded partition quality in large-scale code generation
Prioritizes “working code,” postpones long-term maintainability
Ignores existing architecture due to context understanding limitations

2. Three-Dimensional Coupling Evaluation

Using the framework from Vlad Khononov’s “Balancing Coupling”², we showed methods for systematically evaluating coupling:

Strength: Degree of dependency
Distance: Separation between modules
Volatility: Changeability

Furthermore, we introduced the concept of Connascence¹⁰, ¹¹ to enable more refined coupling evaluation.

3. Practical Prompt Design

We proposed specific prompt design techniques to prevent tight coupling:

MoT (Modularization of Thought)¹²: Modular prompting
Prompts that clarify boundaries and modules
Organizational management using CLAUDE.md and Cursor Rules
Review perspectives and checklists for post-generation code

4. Application Domain Judgment Based on Cynefin Theory

Using the Cynefin framework¹⁵, ¹⁶, we clarified areas to delegate to AI versus areas requiring human judgment:

Simple: Fully delegate to AI
Complicated: Human-AI collaboration
Complex: Human-led, AI assists
Chaotic: Humans only

Toward Sustainable AI-Collaborative Development

To address quality issues in AI-generated code, the following approaches are effective:

Clarify Design Principles
- Place CLAUDE.md or Cursor Rules at project root
- Document coupling, cohesion, and layer separation principles
Organizational Prompt Engineering Efforts
- Build prompt template library
- Share and improve within team
Strengthen Review Process
- Utilize coupling checklists
- Introduce Connascence evaluation
- Use static analysis tools (e.g., dependency analysis, circular dependency detection)
Continuous Refactoring
- Treat AI-generated code as “initial draft”
- Regular architecture reviews
- Visualize and manage technical debt
Appropriate Responsibility Allocation
- Establish decision criteria based on Cynefin theory
- Humans lead design decisions, use AI as assistant

AI pair programming is a powerful tool that, when properly used, can significantly improve development productivity. Sustaining that gain requires understanding its risks and managing them systematically from a coupling balance perspective.

As Vlad Khononov states², what’s important is not “loose coupling supremacy” but appropriate balance according to the situation. This principle remains unchanged even in the AI era.

Balancing Coupling in Software Design: Understanding Vlad Khononov’s Coupling Strategy - Detailed explanation of the coupling balance concepts that form the theoretical foundation of this article

References

Other References (Not Numbered in Text)

Resources consulted during article creation but not directly cited in the text.

Structured Design - Stevens, W. P., Myers, G. J., Constantine, L. L., IBM Systems Journal (1974). [Reliability: High] Classic paper that first proposed the concepts of coupling and cohesion.
Connascence: Coupling, Cohesion & Connascence - Khalil Stemmler. [Reliability: Medium] Practical explanation of connascence.
A Pair Programming Framework for Code Generation via Multi-Plan Exploration and Feedback-Driven Refinement - arXiv (2024). [Reliability: Medium-High] PairCoder framework. Navigator agent and Driver agent collaboration.
Practices and Challenges of Using GitHub Copilot: An Empirical Study - arXiv (2023). [Reliability: Medium-High] GitHub Copilot usage survey analyzing 169 posts and 655 discussions from Stack Overflow and GitHub Discussions.
Security Weaknesses of Copilot-Generated Code in GitHub Projects: An Empirical Study - ACM TOSEM (2024). [Reliability: High] Analyzes security vulnerabilities in AI-generated code. Python 29.5%, JavaScript 24.2% of snippets have vulnerabilities.

On Citation Accuracy:

The research cited in this article has been verified through the following methods:

Confirmation in academic databases (Google Scholar, arXiv, ACM Digital Library, IEEE Xplore, etc.)
Verification of paper information on official journal websites
Cross-verification through multiple independent sources (academic media, official research institution announcements, etc.)

Full PDF access may be restricted for some papers, but abstracts, DOIs, author information, and key findings have been confirmed through official academic databases and reliable secondary sources.

1. Coding on Copilot: AI Code Quality Research 2025 - GitClear (2025). [Reliability: High] Large-scale study analyzing 211 million lines of code changes. Reports 4-fold increase in duplicated code, 60% decrease in refactoring.
2. Balancing Coupling in Software Design: Universal Design Principles for Architecting Modular Software Systems - Vlad Khononov (2024). Addison-Wesley. [Reliability: High] Presents the three dimensions of coupling (strength, distance, volatility) and proposes the coupling balancing approach.
3. DORA Report 2024 - Google DevOps Research and Assessment (2024). [Reliability: High] Reports 7.2% decrease in delivery stability for every 25% increase in AI adoption.
4. The Impact of AI-Generated Solutions on Software Architecture and Productivity: Results from a Survey Study - arXiv (2024). [Reliability: Medium-High] Analyzes coupling and cohesion in AI-generated code. Reports significant degradation in partition quality during large-scale generation.
5. The Impact of AI-Pair Programmers on Code Quality and Developer Satisfaction: Evidence from TiMi studio - ACM Digital Library (2024). [Reliability: High] TiMi Studio case study. Reports both positive and negative aspects of AI pair programming.
6. Evaluating the Code Quality of AI-Assisted Code Generation Tools: An Empirical Study on GitHub Copilot, Amazon CodeWhisperer, and ChatGPT - arXiv (2023). [Reliability: Medium-High] Comparative analysis of code quality from GitHub Copilot, Amazon CodeWhisperer, and ChatGPT.
7. How AI generated code compounds technical debt - LeadDev (2025). [Reliability: Medium-High] Points out problems with highly coupled code, God Objects, and overly structured solutions generated by AI.
8. Zero Human Code - What I learned from forcing AI to build (and fix) its own code for 27 straight days - Daniel Bentes, Medium (2024). [Reliability: Medium] 27-day AI experiment report. Reports AI struggling with abstract concepts (design principles, code maintainability, etc.).
9. Balancing Coupling in Software Design: Core Concepts - Vlad Khononov (2024). [Reliability: High] Detailed explanation of the three dimensions of coupling (strength, distance, volatility). Official site.
10. Connascence - Wikipedia. [Reliability: Medium-High] Explains the concept of connascence proposed by Meilir Page-Jones in 1992.
11. Book review reference (details of Chapter 6 on Connascence)
12. Modularization is Better: Effective Code Generation with Modular Prompting - arXiv (2025). [Reliability: Medium-High] Proposes MoT (Modularization of Thought) prompting technique. Achieved Pass@1 scores of 58.1%-95.1%.
13. Prompting Robotic Modalities (PRM): A structured architecture for centralizing language models in complex systems - ScienceDirect (2025). [Reliability: High] Points out the importance of separating boundary prompts from adaptive control schemas.
14. Why test-driven development and pair programming are perfect companions for GitHub Copilot - Thoughtworks (2024). [Reliability: Medium-High] Explains the importance of TDD and pair programming in AI pair programming.
15. Balancing Coupling in Software Design - Chapter 2: Coupling and Complexity: Cynefin - Vlad Khononov (2024). [Reliability: High] Book Chapter 2. Explains Cynefin theory and complexity.
16. Cynefin Framework - Wikipedia. [Reliability: Medium] Explains application of Cynefin theory to software development.

This post is licensed under CC BY 4.0 by the author.
