Writing Markdown Documentation for AI Efficiency: A Practical Guide to Reducing Context Size
This article was generated by AI. The accuracy of the content is not guaranteed, and we accept no responsibility for any damages resulting from use of this article. By continuing to read, you agree to the Terms of Use.
- Target Audience: Python/JavaScript developers, engineers using AI tools (Claude, ChatGPT, Cursor, etc.) in their work
- Prerequisites: Markdown basics, basic experience with AI development tools
- Reading Time: 15 minutes
Overview
When leveraging large language models (LLMs) like Claude, ChatGPT, and Cursor in development, how you write project documentation (CLAUDE.md, README.md, prompt templates, etc.) directly impacts AI response quality and cost efficiency. This article explains Markdown writing techniques that minimize context size while maximizing information density, based on 2024-2025 research and best practices[1][2][3].
Note: The recommendations in this article are optimization techniques that prioritize readability for AI (LLMs).
Why You Should Be Mindful of Context Size
Current State of Context Windows (As of November 2025)
Major LLM context windows are as follows[4][5]:
Claude:
- Paid plan: 200K tokens (~500,000 characters, equivalent to 500 pages)
- Enterprise: 500K tokens (Claude Sonnet 4.5)
- API Beta: 1M tokens (Claude Sonnet 4, Tier 4 and above)
ChatGPT/GPT-4o:
- Free: 8K tokens
- Plus: 32K tokens
- Pro/Enterprise: 128K tokens
- API: 128K tokens
Cost Impact
Claude applies premium pricing to requests whose input exceeds 200K tokens (only possible with the 1M-token context window)[4]:
- Input tokens: 2x
- Output tokens: 1.5x
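To see how the premium affects a bill, here is a minimal sketch. It assumes, as we understand the long-context pricing, that once input exceeds 200K tokens the multipliers above apply to every input and output token in that request. The base prices are parameters for you to fill in from the current pricing page. They are not values stated in this article, and `request_cost` is a hypothetical helper.

```python
LONG_CONTEXT_THRESHOLD = 200_000  # input tokens above which premium pricing applies
INPUT_MULTIPLIER = 2.0            # premium multiplier for input tokens
OUTPUT_MULTIPLIER = 1.5           # premium multiplier for output tokens


def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_mtok: float, output_price_per_mtok: float) -> float:
    """Estimated request cost, in the currency of the per-million-token prices."""
    # Assumption: the premium covers the whole request, not only the excess tokens.
    premium = input_tokens > LONG_CONTEXT_THRESHOLD
    in_rate = input_price_per_mtok * (INPUT_MULTIPLIER if premium else 1.0)
    out_rate = output_price_per_mtok * (OUTPUT_MULTIPLIER if premium else 1.0)
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
```

Take hypothetical base prices of 3.0 (input) and 15.0 (output) per million tokens. Raising input from 150K to 250K tokens then moves `request_cost` from 0.48 to 1.545, more than three times as much. This is why trimming documentation matters most near the threshold.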
Performance Impact
Inputting large amounts of context causes the following problems:
- Slower response times
- Increased noise from less relevant information
- Reduced accuracy (especially in RAG systems)[1]
Why Markdown Is Chosen
1. Token Efficiency
Markdown enables 20-30% token reduction compared to HTML, XML, and JSON[2].
# ❌ HTML (verbose)
<h1>Title</h1>
<ul>
<li>Item 1</li>
<li>Item 2</li>
</ul>
# ✅ Markdown (concise)
# Title
- Item 1
- Item 2
Reasons:
- Markdown symbols (`#`, `*`, `-`, `|`) are often encoded as single tokens[7]
- No closing tags needed
- No attribute description overhead
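You can check this comparison on your own content. The sketch below uses tiktoken's `o200k_base` encoding (the one used for GPT-4o) when the library is installed. Otherwise it falls back to a crude regex count of words and punctuation marks. Both counts are only proxies for Claude, whose tokenizer differs. The fallback is meaningful only for comparing variants against each other.

```python
import re


def count_tokens(text: str) -> int:
    """Token count via tiktoken if available, else a rough regex approximation."""
    try:
        import tiktoken  # pip install tiktoken
        enc = tiktoken.get_encoding("o200k_base")
    except Exception:  # tiktoken missing, too old, or encoding file not downloadable
        # Fallback: each word or individual punctuation mark counts as one "token".
        return len(re.findall(r"\w+|[^\w\s]", text))
    return len(enc.encode(text))


html = "<h1>Title</h1>\n<ul>\n<li>Item 1</li>\n<li>Item 2</li>\n</ul>"
markdown = "# Title\n- Item 1\n- Item 2"
print(count_tokens(html), count_tokens(markdown))
```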
2. LLM Tokenization Process
OpenAI’s tiktoken and Claude’s tokenizer use Byte Pair Encoding (BPE) to split text into tokens[7][8]:
- Convert to byte sequence using UTF-8 encoding
- Pre-tokenize with predefined regex patterns (split at word boundaries)
- Merge frequent byte sequences using BPE algorithm
Markdown structural symbols are learned as frequent patterns, enabling efficient tokenization.
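To make the BPE steps above concrete, here is a toy training loop. It is not OpenAI's or Anthropic's tokenizer. It skips the regex pre-tokenization step and learns merges from a single string instead of a large corpus. It still shows the core mechanic: start from UTF-8 bytes, then repeatedly merge the most frequent adjacent pair into a new id.

```python
from collections import Counter


def most_frequent_pair(ids):
    """Return the most common adjacent pair of ids, or None if there is none."""
    pairs = Counter(zip(ids, ids[1:]))
    return max(pairs, key=pairs.get) if pairs else None


def merge_pair(ids, pair, new_id):
    """Replace non-overlapping occurrences of `pair` (left to right) with `new_id`."""
    merged, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            merged.append(new_id)
            i += 2
        else:
            merged.append(ids[i])
            i += 1
    return merged


def train_toy_bpe(text, num_merges):
    ids = list(text.encode("utf-8"))  # UTF-8 bytes are the initial ids (0-255)
    merges = {}
    for new_id in range(256, 256 + num_merges):
        pair = most_frequent_pair(ids)
        if pair is None:
            break
        ids = merge_pair(ids, pair, new_id)
        merges[pair] = new_id
    return ids, merges


ids, merges = train_toy_bpe("- a\n- b\n- c\n", num_merges=1)
print(merges)  # the list marker "- " (bytes 45, 32) became one id
```

On a list-heavy input, the first merge turns the `- ` marker into a single id. That is the intuition behind frequent Markdown symbols being cheap in trained tokenizers.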
3. RAG Search Accuracy Improvement
Clean Markdown has been reported to improve RAG search accuracy by up to 35% and reduce token usage by 20-30%[2].
Context-Efficient Markdown Writing
1. Heading Hierarchy Optimization
Principle: A clear heading hierarchy helps LLMs grasp how a document is organized[2].
# ❌ Bad example: Flat structure
## Overview
This is the project overview.
## Installation
Installation instructions.
## Usage
How to use.
## API Reference
API description.
# ✅ Good example: Hierarchical structure
## Overview
Concise project description
## Quick Start
### Installation
npm install project-name
### Basic Usage
const app = new App()
## API Reference
### Class: App
#### Constructor
#### Methods
Effects:
- LLMs more easily understand relationships between sections
- RAG system chunk splitting is optimized
- Faster navigation to needed information
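A heading hierarchy is easy to check mechanically. This sketch is a hypothetical helper, not part of any tool cited here. It extracts ATX headings (`#` to `######`) while ignoring fenced code blocks, where `#` is often a shell comment. It then reports headings that skip a level, such as an `##` followed directly by `####`.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def heading_outline(markdown):
    """Return (level, title) for each ATX heading, ignoring fenced code blocks."""
    outline, in_fence = [], False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        elif not in_fence:
            match = HEADING.match(line)
            if match:
                outline.append((len(match.group(1)), match.group(2)))
    return outline


def level_jumps(outline):
    """Describe headings that skip a level (e.g. h2 followed directly by h4)."""
    problems, previous = [], None
    for level, title in outline:
        if previous is not None and level > previous + 1:
            problems.append(f"{title!r}: h{previous} -> h{level}")
        previous = level
    return problems


doc = "# A\n## B\n#### C\n```bash\n# not a heading\n```\n## D"
print(level_jumps(heading_outline(doc)))
```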
2. Eliminating Redundancy
Principle: Concise prompts can achieve 30-50% token reduction[11][12].
# ❌ Verbose expression (estimated 200 tokens)
This project is a modern web application framework designed to
be very convenient and easy to use for users, while having
extremely powerful features. By using this framework, developers
can rapidly build applications.
# ✅ Concise expression (estimated 80 tokens)
Modern web application framework.
Simple API enables rapid development.
Reduction techniques:
- Reduce modifiers (“very”, “extremely”, etc.)
- Eliminate duplicate expressions
- Passive voice → Active voice
- Reduce verbose conjunctions
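Some of these reductions can be assisted by a simple lint pass. The sketch below flags filler phrases for a human to review rather than deleting them, because blind removal can change meaning. The `FILLERS` list is a hypothetical starting point; extend it with your team's own habits.

```python
import re

# Hypothetical starter list of modifiers and verbose phrases.
FILLERS = ("very", "extremely", "really", "basically", "in order to", "please note that")


def find_fillers(text, fillers=FILLERS):
    """Return (line_number, phrase) pairs for each filler occurrence."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(f) for f in fillers) + r")\b", re.IGNORECASE
    )
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in pattern.finditer(line):
            hits.append((lineno, match.group(1).lower()))
    return hits


print(find_fillers("This framework is very convenient\nand has extremely powerful features."))
```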
3. Using Lists and Bullet Points
Principle: Structured lists are more token-efficient than prose[11].
# ❌ Prose format (estimated 150 tokens)
The main features of this library include, first of all,
high-speed processing. Next, type safety is guaranteed, which
is also important. Furthermore, it has extensibility through
plugins.
# ✅ List format (estimated 60 tokens)
Key features:
- High-speed processing
- Type safety
- Plugin extensibility
Details: [See documentation](./docs/features.md)
4. Code Block Optimization
Principle: Include only minimum necessary code examples, reference external files for details[13].
# ❌ Verbose code example (estimated 300 tokens)
import { App } from 'framework'
import { Logger } from 'logger'
import { Config } from 'config'
const logger = new Logger()
const config = new Config({
port: 3000,
host: 'localhost',
debug: true
})
const app = new App(config, logger)
app.use(middleware1)
app.use(middleware2)
app.use(middleware3)
app.listen()
# ✅ Concise example (estimated 100 tokens)
import { App } from 'framework'
const app = new App({ port: 3000 })
app.listen()
// Full example: examples/basic-setup.ts
5. Table Token Optimization
Markdown tables consume significant tokens, so optimization is important[14].
# ❌ Verbose table (estimated 200 tokens)
| Parameter Name | Data Type | Default Value | Description |
|----------------|-----------|---------------|-------------|
| port | number | 3000 | Port number for server to listen on |
| host | string | localhost | Server hostname |
| debug | boolean | false | Whether to enable debug mode |
# ✅ Concise table (estimated 120 tokens)
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| port | number | 3000 | Listen port |
| host | string | localhost | Hostname |
| debug | boolean | false | Debug mode |
# Or reference external file for complex tables
Complete list of config parameters: [config.md](./docs/config.md)
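Tables generated from code (for example, from a config schema) are often padded so columns line up. That padding typically adds characters, and often tokens, without changing the rendered result. Here is a minimal sketch of an unpadded GitHub-style table renderer; `compact_table` is a hypothetical name.

```python
def compact_table(headers, rows):
    """Render a GitHub-style Markdown table without alignment padding."""
    def cell(value):
        return str(value).replace("|", "\\|")  # escape pipes inside cells

    lines = [
        "| " + " | ".join(cell(h) for h in headers) + " |",
        "|" + "|".join("---" for _ in headers) + "|",
    ]
    for row in rows:
        lines.append("| " + " | ".join(cell(v) for v in row) + " |")
    return "\n".join(lines)


print(compact_table(
    ["Parameter", "Type", "Default", "Description"],
    [["port", "number", 3000, "Listen port"], ["host", "string", "localhost", "Hostname"]],
))
```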
6. Directory Structure Representation
Principle: Tree line characters are for humans; for AI, use indentation or list format.
Tree line characters (├──, │, └──) take 3 bytes each in UTF-8, and BPE tokenizers often split them into multiple tokens, making them inefficient.
# ❌ Tree lines (estimated 120-150 tokens, visual for humans)
project/
├── src/
│ ├── components/
│ ├── api/
│ └── utils/
├── tests/
└── docs/
# ✅ Indentation (estimated 70-90 tokens, 40% reduction)
project/
  src/
    components/
    api/
    utils/
  tests/
  docs/
# ✅ List format (estimated 60-80 tokens, 50% reduction)
- project/
  - src/
    - components/
    - api/
    - utils/
  - tests/
  - docs/
# ✅ Path notation (estimated 40-50 tokens, 67% reduction, compact)
project/{src/{components,api,utils},tests,docs}
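If your directory listings come from the `tree` command, converting them to the indentation format can be automated. The sketch assumes standard `tree` output, where each nesting level is a 4-character prefix (`│   ` or four spaces) followed by `├── ` or `└── `. Hand-edited trees with irregular spacing will not match. `tree_to_indent` is a hypothetical helper.

```python
import re

# `tree` draws each nesting level as "│   " or "    ", then "├── " or "└── ".
ENTRY = re.compile(r"^((?:│   |    )*)(?:├── |└── )(.*)$")


def tree_to_indent(tree_text: str, indent: str = "  ") -> str:
    out = []
    for raw in tree_text.splitlines():
        line = raw.replace("\u00a0", " ")  # some tree versions emit non-breaking spaces
        match = ENTRY.match(line)
        if match:
            depth = len(match.group(1)) // 4 + 1
            out.append(indent * depth + match.group(2))
        elif line.strip():
            out.append(line.strip())  # root line such as "project/"
    return "\n".join(out)


tree = "project/\n├── src/\n│   ├── components/\n│   └── utils/\n└── docs/"
print(tree_to_indent(tree))
```

Trailing `# comments` on an entry are kept, because everything after the branch marker is copied as-is.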
With descriptive comments:
In practice, directory listings often include a short description per entry, and the choice of format affects token count in that case as well.
# Tree lines + descriptions (estimated 180-200 tokens)
project/
├── src/
│ ├── components/ # React components
│ ├── api/ # API endpoints
│ └── utils/ # Utility functions
├── tests/ # Test files
└── docs/ # Documentation
# Indentation + descriptions (estimated 110-130 tokens, 40% reduction)
project/
  src/
    components/ # React components
    api/ # API endpoints
    utils/ # Utility functions
  tests/ # Test files
  docs/ # Documentation
# List + descriptions (estimated 100-120 tokens, 45% reduction)
- project/
  - src/
    - components/ - React components
    - api/ - API endpoints
    - utils/ - Utility functions
  - tests/ - Test files
  - docs/ - Documentation
# Table format (estimated 90-110 tokens, 50% reduction, highest information density)
| Path | Description |
|------|-------------|
| src/components/ | React components |
| src/api/ | API endpoints |
| src/utils/ | Utility functions |
| tests/ | Test files |
| docs/ | Documentation |
Usage Guidelines:
- CLAUDE.md and AI-focused files: Indentation or list format
- README.md (humans + AI): Indentation (balances readability and token efficiency)
- Technical specifications: Path notation (most compact)
About Token Reduction Rates:
The token reduction rates in this section (“estimated 120-150 tokens”, “40% reduction”, etc.) are theoretical estimates based on typical directory structures. Actual effects vary by structure complexity, description length, and tokenizer used. Measuring effects in your own project before adoption is recommended.
Note: This article uses tree line format for human reader readability, but indentation format is recommended for AI-focused documentation.
7. Appropriate Use of Text Formatting
Principle: Use only formatting with semantic meaning, avoid excessive decoration[22][23].
Text formatting (**bold**, *italic*, `code`) helps LLM understanding, but formatting symbols themselves consume tokens. While Markdown symbols are often converted to single tokens[7][8], excessive use should be avoided[24].
# ❌ Excessive formatting (estimated 150 tokens)
**Important:** This **very important** feature must be used
***extremely*** carefully. You **absolutely must not** forget.
# ✅ Appropriate formatting (estimated 80 tokens)
**Important:** Use this feature carefully.
# ❌ Unnecessary formatting
This project is a *modern* *web* application *framework*.
# ✅ No formatting (meaning is clear)
This project is a modern web application framework.
When to use formatting:
- Important warnings or notes (`**Warning:**`)
- Commands, filenames, variable names, and other technical elements (as inline `code`)
- Emphasized technical terms (first occurrence only)
When to avoid formatting:
- Purely for visual improvement
- Formatting every sentence
- Multiple emphases in the same paragraph
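The last guideline ("multiple emphases in the same paragraph") can be checked automatically. This sketch counts asterisk-style emphasis (`*…*`, `**…**`, `***…***`) per blank-line-separated paragraph. Underscore emphasis is deliberately ignored so that `snake_case` identifiers are not miscounted. The one-per-paragraph threshold is our assumption.

```python
import re

# *italic*, **bold**, ***bold italic*** (asterisk style only).
EMPHASIS = re.compile(r"(\*{1,3})(?=\S)(.+?)(?<=\S)\1")


def emphasis_per_paragraph(markdown):
    """Return (emphasis_count, paragraph) for each blank-line-separated paragraph."""
    paragraphs = [p for p in re.split(r"\n\s*\n", markdown) if p.strip()]
    return [(len(EMPHASIS.findall(p)), p) for p in paragraphs]


def over_formatted(markdown, max_per_paragraph=1):
    """Paragraphs that use more emphasis spans than allowed."""
    return [p for count, p in emphasis_per_paragraph(markdown) if count > max_per_paragraph]


text = ("**Important:** This **very important** feature must be used\n"
        "***extremely*** carefully.\n\n"
        "**Important:** Use this feature carefully.")
print(over_formatted(text))
```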
Comparing Markdown and JSON: in one measurement, Markdown used about 16% fewer tokens than JSON (2,257 tokens fewer)[25]:
- JSON: 13,869 tokens
- Markdown: 11,612 tokens
However, excessive formatting within Markdown reduces this advantage.
8. Using Section References
Principle: Don’t repeat explanations; define once and reference.
# ❌ Information duplication (estimated 400 tokens)
## Installation
npm install framework
Dependencies:
- Node.js 18+
- npm 9+
- TypeScript 5+
## Development Environment Setup
To set up development environment, you need:
- Node.js 18+
- npm 9+
- TypeScript 5+
# ✅ Deduplication through references (estimated 200 tokens)
## Requirements
- Node.js 18+
- npm 9+
- TypeScript 5+
## Installation
npm install framework
## Development Environment
See Requirements section above
9. Key-Value Pair and Labeled List Optimization
Principle: Maintain semantics while reducing unnecessary formatting[26].
The **Title**: pattern is frequently seen in technical documentation, but formatting is often unnecessary.
# ❌ Unnecessary formatting (estimated 60 tokens)
**Prerequisites**:
- Node.js 18+
- Docker environment
**Installation Steps**:
1. Clone repository
2. Install dependencies
**Notes**:
- Do not use in production
# ✅ Improvement 1: Use headings (estimated 40 tokens, 33% reduction)
### Prerequisites
- Node.js 18+
- Docker environment
### Installation Steps
1. Clone repository
2. Install dependencies
### Notes
- Do not use in production
# ✅ Improvement 2: No formatting (estimated 35 tokens, 42% reduction)
Prerequisites:
- Node.js 18+
- Docker environment
Installation Steps:
1. Clone repository
2. Install dependencies
Notes:
- Do not use in production
Usage Guidelines:
| Pattern | Use Case | Token Efficiency |
|---|---|---|
| `### Title` | Main document sections | High (high semantics) |
| `Title:` | Inline labels, short lists | Highest |
| `**Title**:` | Truly emphasized warnings/notes | Low (minimize use) |
CLAUDE.md Practical Example:
# ❌ Excessive formatting
**Project Name**: MyApp
**Language**: TypeScript
**Framework**: React
**Database**: PostgreSQL
# ✅ Headings and lists
## Tech Stack
- Language: TypeScript
- Framework: React
- DB: PostgreSQL
# ✅ Compact inline (for short items)
Project: MyApp | Language: TypeScript | DB: PostgreSQL
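The mechanical part of this cleanup, turning `**Label**:` into `Label:`, can be scripted. The sketch only touches bold labels at the start of a line. It leaves labels listed in `keep` alone, so genuine warnings stay emphasized. The default `keep` values are a hypothetical choice.

```python
import re

# `**Label**:` at the start of a line, optionally followed by inline text.
BOLD_LABEL = re.compile(r"^\*\*([^*\n]+)\*\*:", re.MULTILINE)


def unbold_labels(markdown, keep=("Warning", "Important")):
    """Turn `**Label**:` into `Label:` unless the label is in `keep`."""
    def replace(match):
        label = match.group(1)
        return match.group(0) if label in keep else f"{label}:"
    return BOLD_LABEL.sub(replace, markdown)


source = "**Project Name**: MyApp\n**Language**: TypeScript\n**Warning**: Do not use in production"
print(unbold_labels(source))
```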
Using Definition Lists (Extended Syntax):
Some Markdown parsers (Pandoc, Jekyll, etc.) support Definition Lists[26]:
# Definition list syntax (if supported)
Title
: Description
API Key
: Secret key used for application authentication
: Set in environment variable `API_KEY`
# Rendered result (HTML)
<dl>
<dt>API Key</dt>
<dd>Secret key used for application authentication</dd>
<dd>Set in environment variable <code>API_KEY</code></dd>
</dl>
However, this is not supported in standard Markdown, so use headings or lists when prioritizing compatibility.
10. Related File References and SSOT Principle
Principle: Maintain Single Source of Truth (SSOT) and include only minimum necessary cross-references[27][28].
References to related files help navigation but consume tokens and increase cognitive load.
SSOT (Single Source of Truth) Principle
Definition: Define and manage each information element in one place only, using only references elsewhere[27][28].
# ❌ SSOT violation: Information duplication (estimated 500 tokens)
<!-- README.md -->
## Prerequisites
- Node.js 18+
- Docker 20+
- PostgreSQL 14+
<!-- CONTRIBUTING.md -->
## Development Environment
Development requires:
- Node.js 18+
- Docker 20+
- PostgreSQL 14+
<!-- docs/setup.md -->
## Setup
Please install the following:
- Node.js 18+
- Docker 20+
- PostgreSQL 14+
# ✅ SSOT compliant (estimated 200 tokens, 60% reduction)
<!-- README.md -->
## Prerequisites
- Node.js 18+
- Docker 20+
- PostgreSQL 14+
Details: [docs/setup.md](docs/setup.md)
<!-- CONTRIBUTING.md -->
## Development Environment
Prerequisites: See [README.md](README.md#prerequisites)
<!-- docs/setup.md -->
## Setup
Ensure prerequisites are met: [README.md](../README.md#prerequisites)
Installation steps...
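SSOT violations like the one above tend to creep back in over time. A small check can report lines that appear in more than one file. In this minimal sketch, `files` maps a file name to its Markdown text. Headings and very short lines are skipped to reduce noise, and the `min_length` cutoff is an arbitrary assumption.

```python
from collections import defaultdict


def find_duplicated_lines(files, min_length=8):
    """Map each non-heading line found in 2+ files to the sorted list of those files."""
    locations = defaultdict(set)
    for name, text in files.items():
        for line in text.splitlines():
            key = line.strip()
            if len(key) >= min_length and not key.startswith("#"):
                locations[key].add(name)
    return {line: sorted(names) for line, names in locations.items() if len(names) > 1}


files = {
    "README.md": "## Prerequisites\n- Node.js 18+\n- Docker 20+\n",
    "CONTRIBUTING.md": "## Development Environment\nDevelopment requires:\n- Node.js 18+\n- Docker 20+\n",
}
print(find_duplicated_lines(files))
```

In a real repository you might build `files` with `{str(p): p.read_text(encoding="utf-8") for p in Path(".").rglob("*.md")}` using `pathlib.Path`.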
Related File Reference Best Practices
1. Selective References
Don’t list all related files; include only truly necessary references[29]:
# ❌ Excessive cross-references (estimated 300 tokens)
## Related Documentation
- [Project Overview](./docs/overview.md)
- [Architecture](./docs/architecture.md)
- [API Specification](./docs/api.md)
- [Database Schema](./docs/schema.md)
- [Deployment Guide](./docs/deploy.md)
- [Troubleshooting](./docs/troubleshooting.md)
- [FAQ](./docs/faq.md)
- [Changelog](./CHANGELOG.md)
- [License](./LICENSE)
- [Code of Conduct](./CODE_OF_CONDUCT.md)
## See Also
- [Getting Started](./docs/getting-started.md)
- [Advanced Usage](./docs/advanced.md)
- [Examples](./examples/README.md)
# ✅ Minimal references (estimated 100 tokens, 67% reduction)
## Next Steps
- Quick start: [docs/getting-started.md](docs/getting-started.md)
- API spec: [docs/api.md](docs/api.md)
Other: See [docs/](docs/)
2. Concise Link Text
Keep link text short with important words first[29]:
# ❌ Verbose link text
For details, please refer to [comprehensive documentation about project architecture](docs/architecture.md).
# ✅ Concise link text
Details: [Architecture documentation](docs/architecture.md)
3. Avoid Duplicate References on Same Page
Make hyperlinks only for the first occurrence of links to the same destination on a page[29]:
# ❌ Duplicate references
The [API spec](docs/api.md) describes all endpoints.
For authentication, see the auth section of the [API spec](docs/api.md).
Error handling is detailed in the [API spec](docs/api.md).
# ✅ First occurrence only
The [API spec](docs/api.md) describes all endpoints.
For authentication, see the auth section of the API spec.
Error handling is detailed in the API spec.
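The "first occurrence only" rule can be applied with a regex pass. This sketch keeps the first `[text](target)` link to each target and turns later links to the same target into plain text. Image syntax (`![alt](src)`) is left untouched. Reference-style links are not handled. `link_first_occurrence_only` is a hypothetical helper.

```python
import re

# [text](target) not preceded by "!", so images are skipped.
LINK = re.compile(r"(?<!!)\[([^\]]+)\]\(([^)\s]+)\)")


def link_first_occurrence_only(markdown):
    """Keep the first link to each target; replace later ones with their text."""
    seen = set()

    def replace(match):
        text, target = match.group(1), match.group(2)
        if target in seen:
            return text
        seen.add(target)
        return match.group(0)

    return LINK.sub(replace, markdown)


before = ("The [API spec](docs/api.md) describes all endpoints.\n"
          "For authentication, see the auth section of the [API spec](docs/api.md).\n"
          "Error handling is detailed in the [API spec](docs/api.md).")
print(link_first_occurrence_only(before))
```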
4. Implicit Relationships Through Directory Structure
Show relationships through directory structure rather than explicit links:
# ❌ List all files
## Documentation
- [Overview](docs/overview.md)
- [Installation](docs/installation.md)
- [Configuration](docs/configuration.md)
- [Usage](docs/usage.md)
- [API](docs/api.md)
- [CLI](docs/cli.md)
- [Troubleshooting](docs/troubleshooting.md)
# ✅ Directory reference
## Documentation
Quick start: [docs/getting-started.md](docs/getting-started.md)
Other documentation: See [docs/](docs/)
docs/
  getting-started.md # Quick start
  installation.md # Installation
  configuration.md # Configuration
  usage.md # Usage
  api.md # API specification
  troubleshooting.md # Troubleshooting
CLAUDE.md Practical Example
# ❌ Excessive file listing (estimated 400 tokens)
## Related Documentation
### Architecture
- [Overall Architecture](docs/architecture/overview.md)
- [Frontend Design](docs/architecture/frontend.md)
- [Backend Design](docs/architecture/backend.md)
- [Database Design](docs/architecture/database.md)
### Development Guide
- [Environment Setup](docs/dev/setup.md)
- [Coding Standards](docs/dev/coding-style.md)
- [Testing Strategy](docs/dev/testing.md)
- [CI/CD](docs/dev/cicd.md)
### API
- [REST API](docs/api/rest.md)
- [GraphQL API](docs/api/graphql.md)
- [WebSocket API](docs/api/websocket.md)
# ✅ Minimal references (estimated 120 tokens, 70% reduction)
## Guidance for Claude
**When generating code:**
- Prioritize type safety (coding standards: [CONTRIBUTING.md](CONTRIBUTING.md))
- API design patterns: [docs/api/](docs/api/)
- Testing: [docs/dev/testing.md](docs/dev/testing.md)
**Detailed documentation:**
See [docs/](docs/)
Optimizing References in AI Instructions
When having AI read documentation, specify only necessary files:
# ❌ List all documentation
Please reference the following documentation:
- README.md
- CONTRIBUTING.md
- docs/architecture.md
- docs/api.md
- docs/setup.md
- docs/coding-style.md
- docs/testing.md
(etc.)
# ✅ Specify only necessary files
Please reference the following:
- Coding standards: CONTRIBUTING.md
- Architecture: docs/architecture.md
Other: Reference files in docs/ as needed
11. File Integration vs Separation Trade-offs
Principle: Choose between integration and separation based on AI system type[30][31][32].
“Consolidating multiple small files into one” can significantly improve or worsen token efficiency depending on the situation.
Recommended Approaches by Use Case
| Use Case | Recommendation | Reason | Token Efficiency |
|---|---|---|---|
| Claude Code (integrated AI) | Integrate | Load full context at once[32] | High (reference reduction) |
| RAG Systems | Separate | Retrieve only needed parts via semantic search[31] | High (exclude unnecessary info) |
| ChatGPT Custom Instructions | Integrate | Full content loaded each time | High (reference reduction) |
| AI Agents (selective loading) | Separate | Dynamically read only needed files | High (minimum necessary) |
| Documentation Sites | Separate | Users browse by topic | Medium (human-focused) |
Pros and Cons of Integration
Pros:
# Before: 3 separate files (total 600 tokens, example calculation)
<!-- overview.md -->
# Project Overview
...
Details: [setup.md](setup.md), [usage.md](usage.md)
<!-- setup.md -->
# Setup
Overview: See [overview.md](overview.md)
Usage: See [usage.md](usage.md)
<!-- usage.md -->
# Usage
Overview: See [overview.md](overview.md)
Setup: See [setup.md](setup.md)
# After: 1 integrated file (400 tokens, 33% reduction, example calculation)
# Project Overview
...
## Setup
...
## Usage
...
Elements That Can Be Reduced:
- Cross-reference links between files
- Duplicate headers/footers
- Repeated prerequisite explanations
- Navigation sections
Claude Code Recommendation: Keep it concise[32]
# CLAUDE.md (recommended: concise)
## Project Overview
[Concise]
## Tech Stack
[List format]
## Coding Standards
[Key points only]
## Guidance for Claude
[Specific instructions]
# Add with @import as needed
@.claude/advanced-config.md
Cons:
- File gets larger (avoid 1,000+ lines)
- Separation of concerns becomes difficult
- Increased conflicts during team editing
Pros and Cons of Separation
Pros:
# Effect in RAG Systems
<!-- Integrated file: 5,000 tokens -->
README.md (5,000 tokens) vectorized
→ User question: "How does authentication work?"
→ Search entire 5,000 tokens
→ May include irrelevant information
<!-- Separated files: 500 tokens each -->
- overview.md (500 tokens)
- authentication.md (500 tokens) ← Match!
- deployment.md (500 tokens)
...
→ User question: "How does authentication work?"
→ Retrieve only authentication.md (500 tokens)
→ 90% token reduction
RAG Chunking Best Practices[31]:
- Semantic chunking: Split at logical boundaries (paragraphs, sections)
- Fixed-size chunking: 100-300 tokens/chunk (smaller = faster but less accurate)
- Document-based chunking: Split based on Markdown structure
- Overlap: 20-50 token overlap with adjacent chunks
Cons:
- Need reference descriptions between files
- Inefficient when reading everything
- Management becomes complex
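The "document-based chunking" idea from the RAG list above can be sketched in a few lines. The sketch emits one chunk per Markdown section and records each chunk's heading path (e.g. `Guide > Install`) so the retriever has context. Fenced code blocks are never split on `#` comments. Fixed-size limits and overlap are not implemented here. `chunk_by_headings` is a hypothetical helper, not a specific RAG library's API.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def chunk_by_headings(markdown):
    """One chunk per section; each chunk records its heading path for context."""
    chunks, path, body = [], [], []
    in_fence = False

    def flush():
        text = "\n".join(body).strip()
        if text:
            chunks.append({"path": " > ".join(path), "text": text})
        body.clear()

    for line in markdown.splitlines():
        is_fence = line.lstrip().startswith("```")
        if is_fence:
            in_fence = not in_fence
        match = None if (in_fence or is_fence) else HEADING.match(line)
        if match:
            flush()                       # close the previous section
            level = len(match.group(1))
            del path[level - 1:]          # drop headings at this level or deeper
            path.append(match.group(2))
        else:
            body.append(line)
    flush()
    return chunks


md = ("# Guide\nIntro.\n## Install\nRun npm install.\n"
      "```bash\n# not a heading\n```\n## Usage\nCall app.listen().\n")
for chunk in chunk_by_headings(md):
    print(chunk["path"], "|", chunk["text"].splitlines()[0])
```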
Practical Decision Criteria
When to Integrate:
✅ Consider integration when ALL of the following apply
- AI reads everything each time (Claude Code, Custom Instructions, etc.)
- Total files under 1,000 tokens
- Topics are closely related
- Many cross-references (3+ locations)
Examples: CLAUDE.md, prompt templates, small project READMEs
When to Keep Separated:
✅ Keep separated when ANY of the following apply
- Used in RAG systems
- Total files over 2,000 tokens
- Topics are independent
- Selective loading is possible (AI agents, etc.)
Examples: Large documentation sites, API specifications, technical manuals
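The two checklists above can be encoded directly as a decision function. One assumption is ours: the criteria do not say what to do when both lists match, so here "keep separated" (ANY condition) takes precedence. Cases the criteria leave open, such as 1,000 to 2,000 tokens, return "judgment call". `choose_layout` is a hypothetical name.

```python
def choose_layout(reads_everything: bool, total_tokens: int, topics_related: bool,
                  cross_refs: int, uses_rag: bool, selective_loading: bool) -> str:
    # "Keep separated" if ANY of its conditions holds ("topics are independent"
    # is modeled as not topics_related).
    if uses_rag or total_tokens > 2_000 or not topics_related or selective_loading:
        return "separate"
    # "Integrate" only if ALL of its conditions hold.
    if reads_everything and total_tokens < 1_000 and cross_refs >= 3:
        return "integrate"
    return "judgment call"


print(choose_layout(reads_everything=True, total_tokens=800, topics_related=True,
                    cross_refs=4, uses_rag=False, selective_loading=False))
```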
Hybrid Approach: Hierarchical CLAUDE.md
Claude Code supports hierarchical CLAUDE.md[32]:
project/
├── CLAUDE.md # Top level
├── frontend/
│ ├── CLAUDE.md # Frontend specific
│ └── src/
└── backend/
├── CLAUDE.md # Backend specific
└── src/
# Top level CLAUDE.md
## Project Overview
Monorepo structure. Frontend (React), Backend (Node.js)
## Global Rules
- TypeScript required
- ESLint compliant
See each directory's CLAUDE.md for details
# frontend/CLAUDE.md
## Frontend Specific Rules
- React 18
- Styled Components used
# backend/CLAUDE.md
## Backend Specific Rules
- Express 4
- Prisma used
Benefits:
- Separation of concerns
- Keep each file small (under 100 lines)
- Claude Code auto-loads as needed
- Token efficient (loads only working directory)
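To reason about which rule files are in play for a task, here is a simplified model: every `CLAUDE.md` on the path from the project root down to the working directory, outermost first. This is an illustrative assumption, not Claude Code's actual loader. Its real behavior (user-level memory, on-demand loading of files in subdirectories, and more) is described in its memory documentation. `applicable_claude_md` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def applicable_claude_md(project_root: Path, working_dir: Path) -> list:
    """CLAUDE.md files from project_root down to working_dir, outermost first."""
    root = project_root.resolve()
    rel = working_dir.resolve().relative_to(root)  # ValueError if outside root
    dirs = [root.joinpath(*rel.parts[:i]) for i in range(len(rel.parts) + 1)]
    return [d / "CLAUDE.md" for d in dirs if (d / "CLAUDE.md").is_file()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "frontend" / "src").mkdir(parents=True)
        for d in (root, root / "frontend"):
            (d / "CLAUDE.md").write_text("# rules\n", encoding="utf-8")
        for path in applicable_claude_md(root, root / "frontend" / "src"):
            print(path.relative_to(root.resolve()))  # CLAUDE.md, frontend/CLAUDE.md
```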
Token Reduction Examples
Case Study: Project Documentation Integration[30][32]
# Before: 5 separate files (total 2,500 tokens)
README.md: 800 tokens
- Overview
- Link list: INSTALL.md, USAGE.md, API.md, CONTRIBUTING.md
INSTALL.md: 400 tokens
- Installation instructions
- Links: README.md, USAGE.md
USAGE.md: 500 tokens
- How to use
- Links: README.md, API.md
API.md: 600 tokens
- API specification
- Links: README.md, USAGE.md
CONTRIBUTING.md: 200 tokens
- Contribution guide
- Links: README.md
# After: Integrated for Claude Code (1,800 tokens, 28% reduction)
CLAUDE.md: 1,800 tokens
- Overview (simplified)
- Installation instructions
- Basic usage
- Main APIs
- Contribution guide (key points only)
Detailed API spec: docs/api.md (reference only when needed)
# Reduction breakdown
- Cross-reference links: -300 tokens
- Duplicate overview explanations: -200 tokens
- Navigation sections: -150 tokens
- Verbose preambles: -50 tokens
Total reduction: -700 tokens (28%)
Important Notes:
- Keep human-facing documentation sites separated (prioritize usability)
- Consider integration for AI context files (CLAUDE.md, etc.)
- Choose based on purpose
12. Dynamic Optimization and Automatic Integration: Next-Generation Approach
Principle: Manage source files granularly, dynamically optimize and integrate when providing to AI[33][34][35].
“Granular management normally → auto-integrate as needed” is a new approach that goes beyond the binary choice of static integration vs separation.
Approach Overview
Source Management (Git, etc.) AI Delivery
───────────────────────────────── ─────────────
Granularly separated files → Dynamic optimization/integration
├── overview.md ↓
├── installation.md Auto-processing pipeline
├── api-auth.md ↓
├── api-users.md Integrated, optimized
└── deployment.md context
Key Technologies and Tools
1. GraphRAG (Knowledge Graph-based RAG)[33]
Traditional RAG retrieves similar text via vector search, but GraphRAG leverages knowledge graph relationship structures.
# Traditional RAG
User question: "What's the error handling for user authentication?"
↓
Vector search: Retrieve only "authentication.md" (500 tokens)
↓
Problem: Error handling details are in separate file (error-handling.md)
# GraphRAG
User question: "What's the error handling for user authentication?"
↓
Knowledge graph search:
- authentication.md (500 tokens)
- Related: error-handling.md (300 tokens) ← Auto-detected by graph
- Related: api-response-codes.md (200 tokens) ← Auto-detected by graph
↓
Integrated context: 1,000 tokens (includes all related info)
GraphRAG Benefits[33]:
- Strong for multi-hop questions (“What’s the relationship between A and B?”, etc.)
- Automatically aggregates related entities
- Improved context relevance
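The graph-expansion step can be illustrated without a full GraphRAG stack. To be clear, this is not GraphRAG, which builds its graph by extracting entities and relationships. As a lightweight stand-in, the sketch treats explicit Markdown links between files as edges. It then expands a search hit to neighbors within `max_hops` via breadth-first search. It assumes all link targets are written relative to one docs root. `link_graph` and `expand` are hypothetical helpers.

```python
import re
from collections import deque

# Capture the link target, dropping any "#anchor"; skip image links.
LINK_TARGET = re.compile(r"(?<!!)\[[^\]]*\]\(([^)\s#]+)(?:#[^)]*)?\)")


def link_graph(files):
    """Build {file: set of linked files} from links that point at known files."""
    return {name: {t for t in LINK_TARGET.findall(text) if t in files and t != name}
            for name, text in files.items()}


def expand(seeds, graph, max_hops=1):
    """Seed files plus files reachable within max_hops links (BFS order)."""
    hops = dict.fromkeys(seeds, 0)  # dict keeps insertion order
    queue = deque(seeds)
    while queue:
        current = queue.popleft()
        if hops[current] == max_hops:
            continue
        for neighbor in sorted(graph.get(current, ())):
            if neighbor not in hops:
                hops[neighbor] = hops[current] + 1
                queue.append(neighbor)
    return list(hops)


files = {
    "authentication.md": "Errors: [handling](error-handling.md)",
    "error-handling.md": "Codes: [codes](api-response-codes.md#auth)",
    "api-response-codes.md": "401, 403",
    "calendar.md": "Unrelated",
}
print(expand(["authentication.md"], link_graph(files), max_hops=2))
```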
2. Dynamic Context Switching[34]
Build context dynamically per request for LLM delivery.
# Pseudocode: Dynamic context building
def build_context(user_query, document_pool):
    # Step 1: Identify relevant documents
    relevant_docs = semantic_search(user_query, document_pool)
    # Step 2: Expand with knowledge graph
    expanded_docs = knowledge_graph.expand(relevant_docs)
    # Step 3: Optimize within token limit
    optimized_context = pack_documents(
        expanded_docs,
        max_tokens=4000,
        strategy="best_fit_packing"  # Eliminate wasteful truncation
    )
    return optimized_context
# Example:
# User query A: Calendar related → calendar.md + user-prefs.md
# User query B: Auth related → auth.md + error-codes.md + api-docs.md
Benefits:
- Optimal context per query
- Eliminate unnecessary information (token reduction)
- High flexibility
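`pack_documents` in the pseudocode is not a specific library function. Below is one simple, runnable interpretation of its "eliminate wasteful truncation" goal. It takes whole documents in relevance order and skips any document that would overflow the budget instead of cutting it off. This is a greedy heuristic, not a true best-fit bin-packing algorithm. The `Doc` fields and the `min_score` filter are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    name: str
    tokens: int    # pre-computed token count
    score: float   # relevance score from your retriever


def pack_documents(docs, max_tokens, min_score=0.0):
    """Greedily add whole documents by descending score while they fit."""
    packed, used = [], 0
    for doc in sorted(docs, key=lambda d: d.score, reverse=True):
        if doc.score >= min_score and used + doc.tokens <= max_tokens:
            packed.append(doc)
            used += doc.tokens
    return packed


docs = [Doc("auth.md", 500, 0.9), Doc("error-codes.md", 300, 0.8),
        Doc("api-docs.md", 3500, 0.7), Doc("calendar.md", 200, 0.1)]
print([d.name for d in pack_documents(docs, max_tokens=1000)])
```

Without `min_score`, the leftover budget is filled by the barely relevant `calendar.md`. Setting a floor such as `min_score=0.5` keeps only `auth.md` and `error-codes.md`.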
3. llm-docs-builder (Auto-optimization Tool)[35]
A tool that automatically optimizes Markdown documentation for AI.
Features:
- HTML noise removal (navigation bars, footers, JavaScript, etc.)
- Token reduction: 85-95% reduction
- Automatic llms.txt generation
- Provides Markdown optimized for AI crawlers
Usage Example:
# llm-docs-builder configuration
input_dir: ./docs
output_dir: ./docs-optimized
transformations:
- remove_frontmatter: true
- remove_html_comments: true
- remove_badges: true
- normalize_links: true
- optimize_headings: true
- add_hierarchical_context: true # Add heading context
# Result:
# Original HTML document: 5,000 tokens (including navigation, CSS, etc.)
# After optimization: 1,500 tokens (70% reduction)
Workflow Integration:
# Auto-optimize at build time
npm run build
→ llm-docs-builder transform
→ Generate AI-optimized docs (docs-ai/)
→ Maintain human-facing docs (docs/)
# Web server configuration
if user_agent == "AI-Crawler":
    serve docs-ai/  # Optimized version
else:
    serve docs/  # Normal version
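We don't know llm-docs-builder's internals. To illustrate what "HTML noise removal" means in principle, here is a stdlib-only sketch built on `html.parser`. It drops `nav`, `header`, `footer`, `aside`, `script`, and `style` elements (our assumption of what counts as noise; some sites put real titles in `header`). It then emits minimal Markdown-like text for `h1` to `h3` and list items. It is not a full HTML-to-Markdown converter: links, tables, code blocks, and `h4` to `h6` are not handled.

```python
from html.parser import HTMLParser

NOISE_TAGS = {"nav", "header", "footer", "aside", "script", "style"}
PREFIX = {"h1": "# ", "h2": "## ", "h3": "### ", "li": "- "}
BLOCK_TAGS = {"p", "div", "ul", "ol", "section", "article", "main"} | set(PREFIX)


class NoiseStripper(HTMLParser):
    """Collect text outside noise elements, one line per block element."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a noise element
        self.lines = []
        self.prefix = ""
        self.buffer = ""

    def flush(self):
        text = " ".join(self.buffer.split())  # collapse whitespace
        if text:
            self.lines.append(self.prefix + text)
        self.prefix, self.buffer = "", ""

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in BLOCK_TAGS:
            self.flush()
            self.prefix = PREFIX.get(tag, "")

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.buffer += data


def strip_html_noise(html: str) -> str:
    parser = NoiseStripper()
    parser.feed(html)
    parser.close()
    parser.flush()
    return "\n".join(parser.lines)


page = ("<html><head><style>p{}</style></head><body><nav>Home | Docs</nav>"
        "<h1>Title</h1><p>Hello</p><ul><li>One</li></ul>"
        "<footer>© 2025</footer><script>x()</script></body></html>")
print(strip_html_noise(page))
```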
Practical Strategies
Strategy 1: Hybrid Management
Source Management (detailed separation)
├── authentication/
│ ├── overview.md
│ ├── oauth.md
│ ├── jwt.md
│ └── sessions.md
├── api/
│ ├── users.md
│ ├── posts.md
│ └── comments.md
└── errors/
├── codes.md
└── handling.md
↓ Dynamic integration at build time/request time
AI-focused integrated documents
├── authentication-full.md # authentication/* integrated
├── api-full.md # api/* integrated
└── errors-full.md # errors/* integrated
OR
RAG vector DB
└── Each file + knowledge graph maintains relationships
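The build-time integration step in Strategy 1 can be sketched as a small script. It concatenates each source directory's Markdown files into `<dir>-full.md`. Headings are demoted one level so every file nests under a single top-level title, and `#` lines inside code fences are left alone. An HTML comment records each source file. File ordering (alphabetical) and the output naming are assumptions. Write the output outside the source directory, or a re-run will pick up the previous output.

```python
import re
import tempfile
from pathlib import Path

HEADING = re.compile(r"^#{1,5}\s")


def demote_headings(markdown: str) -> str:
    """Push each heading down one level, ignoring lines inside code fences."""
    out, in_fence = [], False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        elif not in_fence and HEADING.match(line):
            line = "#" + line
        out.append(line)
    return "\n".join(out)


def integrate_directory(src_dir: Path, out_dir: Path) -> Path:
    """Concatenate src_dir/*.md into out_dir/<name>-full.md."""
    parts = [f"# {src_dir.name} (integrated)"]
    for md_file in sorted(src_dir.glob("*.md")):
        body = demote_headings(md_file.read_text(encoding="utf-8")).strip()
        parts.append(f"<!-- from {src_dir.name}/{md_file.name} -->\n{body}")
    out_path = out_dir / f"{src_dir.name}-full.md"
    out_path.write_text("\n\n".join(parts) + "\n", encoding="utf-8")
    return out_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "authentication")
        src.mkdir()
        (src / "oauth.md").write_text("# OAuth\nUse PKCE.\n", encoding="utf-8")
        (src / "jwt.md").write_text("# JWT\nShort-lived tokens.\n", encoding="utf-8")
        print(integrate_directory(src, Path(tmp)).read_text(encoding="utf-8"))
```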
Strategy 2: On-demand Integration
# Dynamic processing at AI request time
1. Receive user query: "What's the error handling for OAuth authentication?"
2. Detect related files (GraphRAG):
- authentication/oauth.md
- errors/handling.md
- errors/codes.md (401, 403 related)
3. Dynamic integration:
   # OAuth Authentication and Error Handling (Integrated)
   ## OAuth Authentication (from authentication/oauth.md)
   [content]
   ## Error Handling (from errors/handling.md)
   [OAuth-related parts only extracted]
   ## Related Error Codes (from errors/codes.md)
   - 401 Unauthorized
   - 403 Forbidden
4. Provide to LLM (optimized)
Pros and Cons
✅ Pros:
- Source maintainability: Granular separation for easy management
- AI optimization: Automatic integration/optimization
- Flexibility: Dynamic optimization per query
- Token efficiency: Integrate only needed parts (up to 95% reduction possible)
- Human/AI compatibility: Optimize separately for humans and AI
❌ Cons:
- Complexity: Pipeline construction required
- Cost: Automation tool introduction/operation
- Initial investment: Setup takes time
- Overhead: Real-time integration increases processing time
Recommended Implementation Levels
| Project Scale | Recommended Approach | Reason |
|---|---|---|
| Small (<10 files) | Manual integration | Automation cost not justified |
| Medium (10-50 files) | Tools like llm-docs-builder | Significant automation benefits |
| Large (50+ files) | GraphRAG + dynamic integration | Complex relationship management required |
| Enterprise | Full pipeline | High ROI |
Summary: Choosing the Optimal Approach
# Decision Flowchart
Project scale?
├── Small (<1,000 tokens total)
│ → Manually integrate into single file
│
├── Medium (1,000-10,000 tokens)
│ → Auto-optimize with llm-docs-builder etc.
│ → If RAG system, keep separated
│
└── Large (10,000+ tokens)
├── Using RAG
│ → GraphRAG + dynamic integration
│
└── Integrated AI like Claude Code
→ Hierarchical CLAUDE.md + @import
Key Points:
- Source management: Always separate granularly (prioritize maintainability)
- AI delivery: Dynamically optimize based on use case
- Automation: Consider tool introduction based on scale
- Measurement: Quantitatively evaluate token reduction effects
Practical Example: CLAUDE.md Before/After
Before: Verbose CLAUDE.md (estimated 1,500 tokens)
# Project Overview
This project is a web application with very convenient and powerful features.
This application is designed to be easy for users to use, and adopts a
modern technology stack.
## Tech Stack
In this project, we use the following technologies:
- Frontend: We use React 18
- Backend: We use Node.js 20 and Express 4
- Database: We use PostgreSQL 15
- Authentication: We use JWT (JSON Web Tokens)
## Directory Structure
The project directory structure is as follows:
- src/ directory: Contains source code
- components/ directory: Contains React components
- api/ directory: Contains API endpoints
- utils/ directory: Contains utility functions
- tests/ directory: Contains test files
- docs/ directory: Contains documentation
## Coding Standards
Please follow these coding standards for this project:
- Write all code in TypeScript
- Format code according to ESLint settings
- Add type definitions to all functions
- Add Props type definitions to all components
Estimated token count: ~1,500 tokens
After: Optimized CLAUDE.md (estimated 400 tokens)
# Project Overview
Modern web application (React + Node.js)
## Tech Stack
- Frontend: React 18, TypeScript 5
- Backend: Node.js 20, Express 4
- DB: PostgreSQL 15
- Auth: JWT
## Directory Structure
src/
  components/  # React components
  api/         # API endpoints
  utils/       # Utility functions
tests/         # Test files
docs/          # Documentation
## Coding Standards
- TypeScript required (type definitions required)
- ESLint compliant
- Details: [CONTRIBUTING.md](./CONTRIBUTING.md)
## Guidance for Claude
### When generating code
- Prioritize type safety (`any` prohibited)
- Consider security best practices
- Generate test code as needed
### API implementation
- RESTful design
- Error handling required
- OpenAPI specification compliant
Estimated token count: ~400 tokens
Reduction rate: 73% reduction (1,500 → 400 tokens)
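The reduction rate is simply 1 − after / before. The sketch below wraps that arithmetic together with the same rough ~4-characters-per-token heuristic. The heuristic is an assumption; for real numbers, use your model's tokenizer or token-counting API. With it, you can check estimates like the ones above against your own files. `reduction_rate` and `estimate_tokens` are hypothetical helpers.

```python
def reduction_rate(before_tokens: int, after_tokens: int) -> float:
    """Fraction of tokens removed: 1 - after / before."""
    if before_tokens <= 0:
        raise ValueError("before_tokens must be positive")
    return 1 - after_tokens / before_tokens


def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per English token (not a real tokenizer)."""
    return (len(text) + 3) // 4


# The figures quoted in this article
print(f"CLAUDE.md: {reduction_rate(1500, 400):.0%}")  # CLAUDE.md: 73%
print(f"Prompt: {reduction_rate(500, 150):.0%}")      # Prompt: 70%

# Measuring your own files (paths are placeholders):
# before = estimate_tokens(open("CLAUDE.md.orig", encoding="utf-8").read())
# after = estimate_tokens(open("CLAUDE.md", encoding="utf-8").read())
# print(f"{reduction_rate(before, after):.0%}")
```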
Optimization Points
- Reduce verbose explanations: “This project is…” → “Modern web application”
- Use list format: Prose → bullet points
- Use tree structure: Visually represent directory structure
- External references: Details in separate files
- Structured sections: Clear instructions with “Guidance for Claude”
Practical Example: Prompt Template Optimization
Before: Verbose Prompt (estimated 500 tokens)
You are a professional software engineer.
Please review the code I provide and point out any issues.
Please pay particular attention to the following when reviewing:
1. Please check the code quality
2. Please check if there are any security issues
3. Please check if there are any performance issues
4. Please check if it follows best practices
5. Please check if tests are sufficiently written
Please output the review results in the following format:
- First, state the overall evaluation
- Next, list the issues
- Finally, state improvement suggestions
After: Optimized Prompt (estimated 150 tokens)
## Role
Senior software engineer
## Task
Code review
## Focus Areas
- Code quality
- Security vulnerabilities
- Performance bottlenecks
- Best practices
- Test coverage
## Output Format
1. Overall assessment
2. Issues (prioritized)
3. Recommendations
Reduction rate: 70% reduction (500 → 150 tokens)
Optimization Techniques
- Markdown structuring: ## headings for clear sections
- Bullet points: “Please check…” → list items
- Remove verbose connectives: “First, … Next, … Finally, …” → “1. 2. 3.”
- English keywords: Technical terms may be more token-efficient in English
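If you maintain many prompts, you can generate the structured layout above from data instead of writing it by hand. This is a minimal sketch. `build_prompt` and its section names simply mirror the optimized prompt above and are not part of any tool's API.

```python
def build_prompt(role: str, task: str, focus: list[str], output: list[str]) -> str:
    """Render a compact Markdown prompt with Role / Task / Focus Areas / Output Format sections."""
    lines = ["## Role", role, "## Task", task, "## Focus Areas"]
    lines += [f"- {item}" for item in focus]  # unordered list: no priority implied
    lines.append("## Output Format")
    lines += [f"{i}. {step}" for i, step in enumerate(output, start=1)]  # ordered steps
    return "\n".join(lines)


prompt = build_prompt(
    role="Senior software engineer",
    task="Code review",
    focus=["Code quality", "Security vulnerabilities", "Performance bottlenecks",
           "Best practices", "Test coverage"],
    output=["Overall assessment", "Issues (prioritized)", "Recommendations"],
)
print(prompt)
```

Keeping prompts as data makes it easy to reuse the same compact structure across tasks. Only the field values change.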
How to Have AI Write Context-Efficient Documentation
1. Give Clear Instructions
Include instructions like the following when having AI generate documentation:
## Instructions Example
Please create README.md with the following requirements:
### Requirements
- Prioritize token efficiency
- Avoid verbose expressions
- Use bullet points and code blocks
- Separate detailed explanations to external files (docs/)
- Target token count: Under 500 tokens
### Sections to Include
- Project overview (2-3 sentences)
- Quick start (installation and basic usage only)
- Directory structure (tree format)
- Development guide (external reference)
### Do NOT Include
- Verbose preambles ("This project is...", etc.)
- Detailed API descriptions (separate to docs/api.md)
- Full license text (LICENSE reference is sufficient)
2. Provide Templates
Present the desired structure as a template for AI:
## Template Example
Please generate documentation following this template:
```markdown
# [Project Name]
[Single sentence description]
## Quick Start
### Installation
[Single command]
### Basic Usage
[Minimal code example]
## Directory Structure
[Tree format, with comments]
## Development
Details: [CONTRIBUTING.md](./CONTRIBUTING.md)
## License
[LICENSE](./LICENSE)
```
3. Explicitly Constrain Token Count
## Instructions Example
Please generate CLAUDE.md.
### Constraints
- Maximum token count: 300 tokens
- If exceeded, remove lower priority information
- Replace removed info with a reference to a separate file (docs/project-details.md)
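The rule “remove lower-priority information, then point to a separate file” can also be applied mechanically before a file is handed to the AI. This sketch makes three assumptions: each section carries a hypothetical priority number (1 = must keep, higher = drop first), token counts use the same rough ~4-characters-per-token estimate, and the docs/project-details.md path is taken from the example above. `Section` and `fit_budget` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Section:
    title: str
    body: str
    priority: int  # hypothetical: 1 = must keep, higher = drop first


def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token (not a real tokenizer)."""
    return (len(text) + 3) // 4


def render(sections: list[Section], dropped: list[str]) -> str:
    parts = [f"## {s.title}\n{s.body}" for s in sections]
    if dropped:
        # Replace removed sections with a single pointer to the details file
        parts.append("## More\nSee docs/project-details.md: " + ", ".join(dropped))
    return "\n".join(parts)


def fit_budget(sections: list[Section], max_tokens: int) -> str:
    """Drop lowest-priority sections (keeping order) until the output fits max_tokens."""
    kept = list(sections)
    dropped: list[str] = []
    while estimate_tokens(render(kept, dropped)) > max_tokens and len(kept) > 1:
        worst = max(kept, key=lambda s: s.priority)
        kept.remove(worst)
        dropped.append(worst.title)
    return render(kept, dropped)


doc = [
    Section("Project Overview", "Modern web application (React + Node.js)", 1),
    Section("Coding Standards", "- TypeScript required\n- ESLint compliant", 2),
    Section("History", "Long background story. " * 80, 3),
]
print(fit_budget(doc, max_tokens=300))
```

The pointer section is re-measured on every pass. The loop therefore accounts for the tokens the reference itself adds.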
4. Build Feedback Loops
## Prompt Example
Please reduce the token count of the following documentation:
[Document content]
### Goal
- Reduce to 50% of current token count
- Preserve important information
- Explain reductions made and reasons
### Optimization Methods
1. Reduce verbose expressions
2. Convert to bullet points
3. Separate to external references
4. Simplify code examples
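The feedback loop boils down to three steps: measure, ask for a shorter version, and measure again until the goal is met. This is a minimal sketch, assuming the rough ~4-characters-per-token estimate and a hypothetical `ask_model` callback that you would wire to your own LLM client. No real API is called here. The default target of 50% matches the goal in the prompt above.

```python
from typing import Callable


def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token (not a real tokenizer)."""
    return (len(text) + 3) // 4


def optimize_until(doc: str, ask_model: Callable[[str], str],
                   target_ratio: float = 0.5, max_rounds: int = 3) -> str:
    """Ask a model to shorten doc, re-measuring after each round (feedback loop)."""
    target = int(estimate_tokens(doc) * target_ratio)
    current = doc
    for _ in range(max_rounds):
        size = estimate_tokens(current)
        if size <= target:
            break  # goal reached, stop iterating
        prompt = (
            "Please reduce the token count of the following documentation:\n"
            f"{current}\n"
            "### Goal\n"
            f"- Current size: ~{size} tokens; target: <= {target} tokens\n"
            "- Preserve important information\n"
            "- Return only the optimized documentation"
        )
        current = ask_model(prompt)
    return current


# Stand-in for a real LLM call (hypothetical; replace with your own client)
def fake_model(prompt: str) -> str:
    return "Short doc."


result = optimize_until("A very long and verbose document. " * 10, fake_model)
print(result)
```

Capping the number of rounds with `max_rounds` keeps a model that never reaches the target from looping indefinitely.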
5. Practical Prompt Examples
Save reusable prompt templates for Claude Code in .claude/commands/:
<!-- .claude/commands/optimize-docs.md -->
Please optimize the following documentation for token efficiency:
### Optimization Criteria
- Minimize token count while maintaining information density
- Markdown structuring (headings, lists, code blocks)
- Remove verbose expressions
- Separate details to external files with references
### Output
1. Optimized documentation
2. Token reduction rate
3. Explanation of main changes
---
[Paste document here]
Usage:
# In Claude Code
/optimize-docs
2024-2025 Trends: AI-Native Documentation
llms.txt Standard
The llms.txt standard proposed by Jeremy Howard, co-founder of Answer.AI, in September 2024 is rapidly gaining adoption[15][16][17].
Overview:
- A Markdown-formatted file placed in the website's root directory
- Provides structured information to LLM crawlers, similar in spirit to robots.txt
- Adopted by thousands of documentation sites hosted on Mintlify, including Anthropic's and Cursor's
File Structure:
<!-- https://example.com/llms.txt -->
# Project Name
> Brief project summary (1-2 sentences)
## Documentation
- [Getting Started](https://example.com/docs/getting-started)
- [API Reference](https://example.com/docs/api)
- [Examples](https://example.com/docs/examples)
## Optional: Full Documentation
See [llms-full.txt](https://example.com/llms-full.txt) for complete documentation.
Anthropic Implementation Examples:
- https://docs.anthropic.com/llms.txt
- https://docs.anthropic.com/llms-full.txt
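An llms.txt index is plain Markdown, so it is easy to generate from the same data that drives your docs navigation. This is a minimal sketch that follows the structure of the example above: an H1 name, a blockquote summary, and H2 sections of link lists. `build_llms_txt` is a hypothetical helper and the URLs are the placeholder ones from the example.

```python
def build_llms_txt(name: str, summary: str,
                   sections: dict[str, list[tuple[str, str]]]) -> str:
    """Render an llms.txt-style index: H1 name, blockquote summary, H2 link lists."""
    lines = [f"# {name}", f"> {summary}"]
    for heading, links in sections.items():  # dicts keep insertion order
        lines.append(f"## {heading}")
        lines += [f"- [{title}]({url})" for title, url in links]
    return "\n".join(lines) + "\n"


text = build_llms_txt(
    "Project Name",
    "Brief project summary (1-2 sentences)",
    {
        "Documentation": [
            ("Getting Started", "https://example.com/docs/getting-started"),
            ("API Reference", "https://example.com/docs/api"),
            ("Examples", "https://example.com/docs/examples"),
        ],
        "Optional": [("Full Documentation", "https://example.com/llms-full.txt")],
    },
)
print(text)
# To publish: write `text` to llms.txt in your site's web root.
```

In the llms.txt proposal, a section titled “Optional” marks links that a client may skip when it needs a shorter context. That makes it a natural place to put the pointer to llms-full.txt.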
llm-docs-builder Tool
Tools for automatically optimizing documentation have emerged[18]:
Features:
- 85-95% noise reduction from HTML documents
- Conversion to Markdown
- Automatic llms.txt index generation
- Documentation optimized for AI crawlers
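To make “noise reduction” concrete, here is a deliberately small sketch using only Python's standard library. It keeps headings, paragraphs, and list items, and drops script, style, nav, header, footer, and aside content. It is not llm-docs-builder's implementation, whose internals are not described here. The set of skipped tags and the `html_to_markdown` name are assumptions for illustration.

```python
from html.parser import HTMLParser

SKIP = {"script", "style", "nav", "footer", "header", "aside"}  # assumed noise tags
PREFIX = {"h1": "# ", "h2": "## ", "h3": "### ", "li": "- ", "p": ""}


class MarkdownExtractor(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0          # > 0 while inside a noise element
        self.current = None          # block tag currently being collected
        self.buf: list[str] = []     # text pieces of the current block
        self.lines: list[str] = []   # finished Markdown lines

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1
        elif tag in PREFIX and self.skip_depth == 0:
            self.current, self.buf = tag, []

    def handle_endtag(self, tag):
        if tag in SKIP and self.skip_depth:
            self.skip_depth -= 1
        elif tag == self.current:
            text = " ".join("".join(self.buf).split())  # collapse whitespace
            if text:
                self.lines.append(PREFIX[tag] + text)
            self.current = None

    def handle_data(self, data):
        # Inline tags (<b>, <a>, <code>) are ignored, but their text is kept
        if self.current and self.skip_depth == 0:
            self.buf.append(data)


def html_to_markdown(html: str) -> str:
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)


page = "<nav><li>Home</li></nav><h1>Install</h1><p>Run <code>pip install x</code>.</p>"
print(html_to_markdown(page))
```

A real converter also has to handle nested lists, tables, links, and code blocks. This sketch only shows why stripping layout markup shrinks the input so much.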
CLAUDE.md Best Practices (2025)
Anthropic official best practices[19][20]:
- Conciseness: Keep CLAUDE.md concise
- Hierarchical structure: Use nested CLAUDE.md files for hierarchical context
- Prompt templates: Save reusable prompts in .claude/commands/
- Git management: Commit CLAUDE.md to share it with the team
Recommended Sections:
# CLAUDE.md
## Project Overview
[2-3 sentences]
## Tech Stack
[List format]
## Directory Structure
[Tree format, important directories only]
## Coding Standards
[Key points only, details in external reference]
## Guidance for Claude
[Specific instructions for code generation]
## External Resources
- [Detailed Architecture](./docs/architecture.md)
- [API Specification](./docs/api.md)
- [Contribution Guide](./CONTRIBUTING.md)
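If a team shares CLAUDE.md through Git, a tiny check in CI can flag files that drift from the recommended layout. This is a minimal sketch. The section list is copied from the template above. `missing_sections` is a hypothetical lint helper, not something Claude Code itself requires.

```python
import re

RECOMMENDED = (
    "Project Overview", "Tech Stack", "Directory Structure",
    "Coding Standards", "Guidance for Claude", "External Resources",
)


def missing_sections(markdown: str, required: tuple[str, ...] = RECOMMENDED) -> list[str]:
    """Return recommended H2 sections that are absent from a CLAUDE.md text."""
    # Matches '## Heading' lines only; '### Sub' does not match because '#' is not whitespace
    headings = set(re.findall(r"^##\s+(.+?)\s*$", markdown, flags=re.MULTILINE))
    return [name for name in required if name not in headings]


sample = "# CLAUDE.md\n## Project Overview\nWeb app\n## Tech Stack\n- React 18\n### Notes\n"
print(missing_sections(sample))
```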
Token Reduction Examples
Token reductions in real projects have been reported[21]:
| Use Case | Before | After | Reduction Rate |
|---|---|---|---|
| Vercel deploy monitoring | 10,100 tokens | 300-500 tokens | 95-97% |
| Render log analysis | Data dependent | Concise summary | 98% |
| Supabase project filtering | Large JSON | Target projects only | 97% |
Reduction Methods:
- JSON filtering with jq
- Removal of unnecessary metadata
- Structured summary generation
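The JSON filtering step amounts to keeping only the fields the model actually needs and emitting compact JSON. With jq this is roughly `jq -c '[.[] | {id, status, url}]'`. The Python sketch below does the same thing. The record shape and field names (`id`, `status`, `url`, `meta`, `buildLogs`) are invented for illustration and do not reflect any particular service's API.

```python
import json

KEEP = ("id", "status", "url")  # hypothetical fields the model actually needs


def slim(records: list[dict], keep: tuple[str, ...] = KEEP) -> str:
    """Drop unneeded metadata and emit compact JSON (no indentation or spaces)."""
    filtered = [{k: r[k] for k in keep if k in r} for r in records]
    return json.dumps(filtered, separators=(",", ":"))


raw = [
    {"id": "d1", "status": "READY", "url": "app-1.example.com",
     "meta": {"commit": "abc123", "author": "dev"}, "buildLogs": ["..."] * 50},
    {"id": "d2", "status": "ERROR", "url": "app-2.example.com",
     "meta": {"commit": "def456", "author": "dev"}, "buildLogs": ["..."] * 50},
]
before = json.dumps(raw, indent=2)
after = slim(raw)
print(len(before), "->", len(after), "characters")
```

Unlike the jq version, which emits `null` for missing keys, this sketch simply omits absent fields.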
Summary
Key points for writing Markdown documentation that AI can read efficiently:
Writing Principles
- Clarify hierarchical structure: Use appropriate heading levels
- Eliminate redundancy: Reduce modifiers, duplicate expressions (30-50% reduction possible)
- Use bullet points: List format over prose
- Use external references: Separate details to other files
- Minimize code examples: Only minimum necessary examples
- Optimize tables: Concise column names, consider external references
- Directory structures: Avoid tree line characters, use indentation or list format (40-67% reduction possible)
- Moderate text formatting: Use only formatting with semantic meaning, avoid excessive emphasis
- Simplify key-value pairs: “**Title**:” → “### Title” or “Title:” (33-42% reduction possible)
- Apply SSOT principle: Define information in one place only, minimize related file references (60-70% reduction possible)
- File integration vs separation: Choose based on AI system
  - Claude Code/Custom Instructions: Integration recommended (28-33% reduction possible)
  - RAG systems: Separation recommended (up to 90% reduction possible)
- Dynamic optimization and automatic integration (next-generation approach):
  - Manage source files granularly
  - Dynamically integrate/optimize for AI delivery (GraphRAG, llm-docs-builder, etc.)
  - Choose implementation level based on project scale
Instructions When Having AI Write Documentation
- Constrain token count: Explicitly state a limit such as “300 tokens maximum”
- Provide templates: Show desired structure as example
- Present optimization criteria: “Avoid verbose expressions”, “Prioritize bullet points”, etc.
- Feedback loops: Measure token count, iteratively optimize
2024-2025 Trends
- llms.txt standard: New standard for AI-native documentation
- Auto-optimization tools: 85-95% noise reduction possible
- Hierarchical CLAUDE.md: Context management according to project structure
Practical Effects
- Clean Markdown: 35% RAG search accuracy improvement, 20-30% token reduction[2]
- Claude Code token optimization: 70-73% reduction in this article's examples, 95-98% in reported workflow cases[21]
- Cost reduction: Avoid premium pricing (2x input, 1.5x output) on requests exceeding 200K tokens[4]
By utilizing these techniques, you can significantly reduce context size and costs while maintaining AI response quality.
Caveats:
The token reduction rates shown in this article are based on calculation examples or reported cases under specific conditions. Actual reduction effects may vary depending on project structure, documentation content, and AI system used. Measuring effects in your own project before adoption is recommended.
References
References corresponding to citation numbers [1]-[35] in the main text are listed in numerical order.
[1] Boosting AI Performance: The Power of LLM-Friendly Content in Markdown - Webex Developers Blog https://developer.webex.com/blog/boosting-ai-performance-the-power-of-llm-friendly-content-in-markdown [Reliability: High] Explains general benefits of LLM-friendly Markdown
[2] Why Your LLM Needs Clean Markdown: A Deep Dive - AnythingMD https://anythingmd.com/blog/why-llms-need-clean-markdown [Reliability: Medium-High] Data on 35% RAG accuracy improvement, 20-30% token reduction
[3] Why Markdown is the best format for LLMs - Wetrocloud, Medium (2024) https://medium.com/@wetrocloud/why-markdown-is-the-best-format-for-llms-aa0514a409a7 [Reliability: Medium]
[4] Context windows - Claude Docs - Anthropic Official Documentation https://docs.claude.com/en/docs/build-with-claude/context-windows [Reliability: High] Context windows and premium pricing
[5] ChatGPT Context Window and Token Limit - 16x Prompt (2024) https://prompt.16x.engineer/blog/chatgpt-context-window-token-limit [Reliability: Medium-High]
[6] Markdown Prompting In AI Prompt Engineering Explained - Applied AI Tools https://appliedai.tools/prompt-engineering/markdown-prompting-in-ai-prompt-engineering-explained-examples-tips/ [Reliability: Medium-High]
[7] Let’s Build the GPT Tokenizer: A Complete Guide to Tokenization in LLMs - fast.ai (2024) https://www.fast.ai/posts/2025-10-16-karpathy-tokenizers.html [Reliability: High] Explanation by Andrej Karpathy
[8] Complete Guide to LLM Tokenization - LLM Calculator (2024) https://llm-calculator.com/blog/complete-guide-to-tokenization/ [Reliability: Medium-High]
[9] Cutting Cost and Enhancing Performance: Minifying Markdown Tables - Budi Syahiddin, Government Digital Products Singapore (2024) https://medium.com/singapore-gds/cutting-cost-and-enhancing-performance-minifying-markdown-tables-to-improve-token-efficiency-in-af488a784fd5 [Reliability: High]
[10] How to Optimize Token Efficiency When Prompting - Portkey.ai https://portkey.ai/blog/optimize-token-efficiency-in-prompts/ [Reliability: Medium-High]
[11] LLM prompt optimization: Reducing tokens usage - Saulius Šaulys, Medium (2024) https://medium.com/@sauliusaulys/llm-prompt-optimization-reducing-tokens-usage-343f5de178a5 [Reliability: Medium] 30-50% reduction data
[12] Token optimization: The backbone of effective prompt engineering - IBM Developer https://developer.ibm.com/articles/awb-token-optimization-backbone-of-effective-prompt-engineering/ [Reliability: High]
[13] Claude Code Best Practices - Anthropic Official https://www.anthropic.com/engineering/claude-code-best-practices [Reliability: High]
[14] Cutting Cost and Enhancing Performance: Minifying Markdown Tables to Improve Token Efficiency in RAG - Government Digital Products Singapore (2024) https://medium.com/singapore-gds/cutting-cost-and-enhancing-performance-minifying-markdown-tables-to-improve-token-efficiency-in-af488a784fd5 [Reliability: High]
[15] What is llms.txt? Breaking down the skepticism - Mintlify Blog (2024) https://www.mintlify.com/blog/what-is-llms-txt [Reliability: High]
[16] LLMs.txt Explained - TDS Archive, Medium (2024) https://medium.com/data-science/llms-txt-explained-414d5121bcb3 [Reliability: Medium-High]
[17] Simplifying docs for AI with /llms.txt - Mintlify Blog (2024) https://www.mintlify.com/blog/simplifying-docs-with-llms-txt [Reliability: High]
[18] Announcing llm-docs-builder: An Open Source Tool for Making Documentation AI-Friendly - Maciej Mensfeld (2025) https://mensfeld.pl/2025/10/llm-docs-builder/ [Reliability: Medium-High] 85-95% noise reduction data
[19] Claude Code Best Practices - Anthropic Official (2025) https://www.anthropic.com/engineering/claude-code-best-practices [Reliability: High]
[20] My 7 essential Claude Code best practices for production-ready AI in 2025 - eesel AI (2025) https://www.eesel.ai/blog/claude-code-best-practices [Reliability: Medium-High]
[21] Optimizing Token Efficiency in Claude Code Workflows - Pierre-Emmanuel Féga, Medium (2025) https://medium.com/@pierreyohann16/optimizing-token-efficiency-in-claude-code-workflows-managing-large-model-context-protocol-f41eafdab423 [Reliability: Medium] 95-98% reduction examples
[22] Marking Up the Prompt: How Markdown Formatting Influences LLM Responses - Neural Buddies (2024) https://www.neuralbuddies.com/p/marking-up-the-prompt-how-markdown-formatting-influences-llm-responses [Reliability: Medium-High] Analysis of Markdown formatting’s influence on LLM responses
[23] Markdown Best Practices for Technical Writers - Markdown Toolbox https://www.markdowntoolbox.com/blog/markdown-best-practices-for-technical-writers/ [Reliability: Medium-High] Best practices for avoiding excessive formatting
[24] A Guide to Markdown Styles in LLM Responses - DreamDrafts, Medium (2024) https://medium.com/@sketch.paintings/a-guide-to-markdown-styles-in-llm-responses-ed9a6e869cf4 [Reliability: Medium] Effective use of Markdown styles
[25] Markdown is 15% more token efficient than JSON - OpenAI Developer Community (2024) https://community.openai.com/t/markdown-is-15-more-token-efficient-than-json/841742 [Reliability: High] Token efficiency comparison with measured data
[26] How to list key/value pairs in a markdown - Stack Overflow https://stackoverflow.com/questions/28429750/how-to-list-key-value-pairs-in-a-markdown [Reliability: Medium-High] Practical discussion of key-value pair expression in Markdown
[27] Single source of truth - Wikipedia https://en.wikipedia.org/wiki/Single_source_of_truth [Reliability: High] Definition and background of SSOT principle
[28] About the Single Source of Truth (SSOT) and Don’t Repeat Yourself (DRY) principles - Webel IT Australia https://www.webel.com.au/node/889 [Reliability: Medium-High] Explanation of relationship between SSOT and DRY principles
[29] Cross-references and linking - Google developer documentation style guide - Google for Developers https://developers.google.com/style/cross-references [Reliability: High] Best practices for cross-references (link text simplification, cognitive load reduction, etc.)
[30] Breaking the LLM’s Token Limit: Introducing the Modular AI Systems Architecture - Amir Ghasemi, Medium (2024) https://medium.com/@amir.ghm/breaking-the-llms-16k-token-limit-introducing-the-modular-ai-systems-architecture-5a23b37139ac [Reliability: Medium] Overcoming token limits with modular AI systems architecture
[31] Chunking for RAG: best practices - Unstructured (2024) https://unstructured.io/blog/chunking-for-rag-best-practices [Reliability: High] Chunking strategies for RAG systems (semantic, fixed-size, document-based, etc.)
[32] Claude Code Best Practices - Anthropic Official (2025) https://www.anthropic.com/engineering/claude-code-best-practices [Reliability: High] CLAUDE.md hierarchical structure, @import syntax, etc.
[33] Graph Retrieval-Augmented Generation: A Survey - arXiv (2024) https://arxiv.org/abs/2408.08921 [Reliability: Medium-High] Comprehensive survey paper on GraphRAG. Methods using knowledge graphs for RAG, effectiveness for multi-hop questions, etc. Note: arXiv paper (pre-print before peer review). Cited for technical overview of GraphRAG
[34] Level Up Your LLMs: Dynamic Context Switching for Smarter, Faster Inference - Yair Stern, Medium (2024) https://medium.com/@yairms.il/level-up-your-llms-dynamic-context-switching-for-smarter-faster-inference-4986a49269d1 [Reliability: Medium] Optimizing LLM inference through dynamic context switching
[35] Announcing llm-docs-builder: An Open Source Tool for Making Documentation AI-Friendly - Maciej Mensfeld (2025) https://mensfeld.pl/2025/10/llm-docs-builder/ [Reliability: Medium-High] 85-95% token reduction, automatic llms.txt generation, HTML noise removal, etc.
About Medium Articles:
Medium articles ([11][21][24][30][34]) are cited as author case studies, but rated [Reliability: Medium] as they haven’t gone through peer review. For token reduction rates and optimization methods shown in this article, measuring effects in your own environment before adoption is recommended.
Other References (Not Numbered in Text)
Resources consulted during article creation but not directly cited in the text.
Cognitive Load Theory: Methods to Manage Working Memory Load - Fred Paas, Jeroen J. G. van Merriënboer (2020) https://journals.sagepub.com/doi/10.1177/0963721420922183 [Reliability: High] Academic paper on cognitive load theory
Creating the information architecture for your documentation - KnowledgeOwl Blog https://blog.knowledgeowl.com/blog/posts/information-architecture/ [Reliability: Medium-High]
awesome-claude-code: A curated list - GitHub https://github.com/hesreallyhim/awesome-claude-code [Reliability: Medium] Community resource
Notes:
On Citation Accuracy: The information cited in this article has been verified through the following methods:
- Confirmation of official documentation (Anthropic, OpenAI, etc.)
- Cross-verification through multiple independent sources (technical blogs, specialist media)
- Priority given to the latest (2024-2025) information
Some technical blogs and Medium articles have been cited after confirming author expertise and data backing, but are rated “Medium” or “Medium-High” reliability compared to official documentation and academic papers.