The True Value of AI: Multidimensional Assessment Beyond Time Savings

Posted Nov 21, 2025

AI-Generated Content

This article was generated by AI. The accuracy of the content is not guaranteed, and we accept no responsibility for any damages resulting from use of this article. By continuing to read, you agree to the Terms of Use.

Target Audience: Software Engineers, DevOps Engineers, Development Managers
Prerequisites: Basic experience with AI development tools (GitHub Copilot, Cursor, ChatGPT, etc.)
Reading Time: 18 minutes

Overview

“50% time reduction thanks to AI!” “Productivity doubled!” Claims like these are frequently cited as the headline measures of AI adoption effectiveness. However, several studies and industry reports from 2023-2025 reveal significant limitations in measuring AI’s value by time savings alone.

This article examines the multidimensional value AI provides (quality improvement, creativity changes, learning effects, and system-wide impacts) based on evidence from the Harvard Business School and Boston Consulting Group experiment with more than 700 consultants¹, creativity research published in Science Advances², and the 2024 DORA report³. It then proposes a comprehensive evaluation framework that IT engineers can implement in practice.

1. The Prevalence and Limitations of Time-Saving Metrics

1.1 Why Time Savings Gets Chosen

Time savings is the most easily measured and compelling metric. The reasons are clear:

Ease of measurement: Task completion time can be measured with a stopwatch
Intuitive understanding: Anyone understands “30 minutes became 15 minutes”
Financial convertibility: Hourly rate × time saved = cost reduction, calculated instantly
Immediate visibility: Effects can be shown right after adoption

According to industry reporting, many engineering leaders cite “lack of clear measurement metrics” as a major challenge⁴. In this situation, time savings becomes the most convenient “success metric.”

1.2 However, Time Savings Alone Is Insufficient

The problem is that time savings captures only one aspect of AI’s impact. As GitLab points out, “simple metrics like lines of code or AI suggestion acceptance rate cannot capture downstream costs”⁵.
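To make the gap concrete, the sketch below contrasts the "hourly rate × time saved" conversion from section 1.1 with a version that subtracts downstream review and rework hours. The function names and every number are illustrative assumptions, not figures from GitLab or any study cited here.

```python
def naive_savings(hourly_rate: float, hours_saved: float) -> float:
    """Cost reduction as usually reported: hourly rate x time saved."""
    return hourly_rate * hours_saved


def net_savings(
    hourly_rate: float,
    hours_saved: float,
    extra_review_hours: float,
    extra_rework_hours: float,
) -> float:
    """Same conversion, minus hours spent downstream on review and rework."""
    downstream_hours = extra_review_hours + extra_rework_hours
    return hourly_rate * (hours_saved - downstream_hours)


# Hypothetical month for one developer (illustrative values only).
print(naive_savings(80.0, 10.0))          # 800.0
print(net_savings(80.0, 10.0, 3.0, 4.0))  # 240.0
```

With these made-up inputs, most of the headline saving disappears once downstream effort is counted, and the net figure can turn negative when rework exceeds the time saved.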

The Harvard/BCG large-scale study demonstrated that AI’s effects vary dramatically with the nature of the task¹. In an experiment with over 700 consultants, tasks within AI’s capability boundary showed a 38-42.5% improvement in output quality, while tasks outside the boundary showed a 13-24 percentage point decline in performance.

This “jagged technological frontier” shows the danger of measuring time alone. A task finishing faster does not mean it was done at an appropriate level of quality, or that it addressed the right problem.

2. Value That Time Savings Can’t Capture

2.1 Output Quality Improvement

The most important finding from the Harvard/BCG study was significant quality improvement through AI¹.

Study Details:

Sample size: 758 consultants
Institutions: Harvard Business School, MIT, Wharton School
Published: 2023, Working Paper

Key Findings:

Groups using AI produced deliverables rated more than 40% higher in quality than the control group’s
The improvement was particularly notable for lower-skill participants (43% improvement)
Top performers also improved in quality, but by a smaller margin

This quality improvement holds fundamentally different value from simply “finishing faster.” In knowledge work, high-quality output means:

Downstream cost reduction: Reduced review and revision effort
Decision quality: Judgments based on better insights
Customer satisfaction: Value delivery to end users

However, note that this quality improvement is not guaranteed for all tasks. For tasks outside AI’s capability boundary, performance may decline significantly¹.

2.2 Complex Impact on Creativity

Research published in Science Advances in July 2024 revealed the complexity of AI’s impact on creativity².

Study Details:

Authors: Anil Doshi (University College London), Oliver Hauser (University of Exeter, UK)
Sample size: Approximately 300 non-professional writers, 600 evaluators
DOI: 10.1126/sciadv.adn5290
Published: July 12, 2024, Science Advances [Peer-reviewed]

Key Findings:

Individual level: Writers with access to 5 AI ideas produced works approximately 8% more original and 9% more useful
Lower-creativity writers: Those with the lowest baseline creativity benefited most
Collective level: However, overall diversity decreased

The study concerns writing rather than code, but the “creativity paradox” it reveals plausibly carries over to engineering teams. Even if individual engineers write better code with AI, there is a risk that the whole team converges on similar approaches.

Implications for IT Engineers:

Watch for homogenization of “AI-like” patterns in code reviews
Use multiple AI tools to maintain diversity
Consciously encourage different approaches within teams

2.3 Learning Effects and Skill Development

Conflicting research results have been reported regarding AI assistance’s impact on learning.

Positive Evidence (February 2025):

A pre-registered study published on arXiv⁶ suggested AI may promote learning.

Authors: Benjamin Lira, Todd Rogers, Daniel G. Goldstein, Lyle Ungar, Angela L. Duckworth
Sample size: 2,238 in Study 2, 2,003 in Study 3
Published: February 5, 2025, arXiv [Preprint]

Key Findings:

Participants who practiced with AI tools produced higher quality writing than those who practiced without tools (effect size d=0.40)
This effect persisted in tests one day later
Learning outcomes improved even though participants typed 26% fewer keystrokes
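For readers unfamiliar with effect sizes, d=0.40 means the two group means differ by 0.4 pooled standard deviations. The following is a minimal sketch of the standard pooled-variance form of Cohen's d. That the paper computed its effect size in exactly this way is an assumption, and the scores below are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(treatment: list[float], control: list[float]) -> float:
    """Standardized mean difference using the pooled sample variance."""
    n1, n2 = len(treatment), len(control)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    pooled_var = (
        (n1 - 1) * variance(treatment) + (n2 - 1) * variance(control)
    ) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)


# Hypothetical writing-quality scores (not data from the study).
with_ai_practice = [2.0, 4.0, 6.0]
without_ai_practice = [1.0, 3.0, 5.0]
print(cohens_d(with_ai_practice, without_ai_practice))  # 0.5
```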

Negative Concerns (July 2024):

Meanwhile, a theoretical perspective paper published in Cognitive Research: Principles and Implications⁷ pointed out AI assistance may accelerate skill decline.

Authors: Brooke N. Macnamara et al. (Case Western Reserve University, etc.)
DOI: 10.1186/s41235-024-00572-8
Published: July 12, 2024 [Peer-reviewed, theoretical paper]

Key Concerns:

While task performance is maintained with AI assistance, independent cognitive abilities may decline
Risk that users won’t notice this decline (illusory sense of competence)
Impact may be more serious because AI mimics cognitive skills rather than just automating

Note: This paper is a theoretical perspective and doesn’t contain direct empirical data.

Current Conclusion:

AI’s impact on learning depends on usage
Using AI output as examples to learn from may promote learning
Passive dependency risks damaging cognitive abilities

2.4 System-Wide Impact: DORA Report Insights

The 2024 DORA (DevOps Research and Assessment) report³ revealed the complex impact of AI adoption across the entire software development process.

Key Findings:

Productivity Metrics Improvement:

With a 25% increase in AI adoption:
- Overall productivity: +2.1%
- Code review speed: +3.1%
- Code quality: +3.4%
- Documentation quality: +7.5%
- Developer satisfaction: +2.2%
- Flow state: +2.6%

However, Delivery Metrics Decline:

Delivery throughput: -1.5%
Delivery stability: -7.2%

This seemingly contradictory result is attributed not to a quality-speed tradeoff but to the temptation to make larger code changes³,⁸. Because AI can generate code quickly, developers tend to abandon the small-batch principle (a core practice of high-performing delivery) and ship larger, riskier changes.

Key Insight: AI Is an Amplifier

The most important insight from the DORA report is that AI amplifies an organization’s existing strengths and weaknesses³. Organizations with mature development processes are further strengthened by AI, while immature organizations see problems amplified.

3. Understanding Tradeoffs and Complexity

3.1 Speed vs. Stability

As the DORA report showed, AI tools accelerate individual work while potentially compromising overall system stability³. This demonstrates the danger of pursuing time-saving metrics alone.

Practical Implications:

Consciously keep Pull Request sizes small
Strengthen review processes for AI-generated code
Continuously monitor Change Failure Rate
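As a rough illustration of the first and third points, the sketch below computes Change Failure Rate as the share of deployments that caused a production incident, and flags changes above a size threshold. The `Deployment` record, the field names, and the 400-line threshold are hypothetical choices for this example, not values recommended by DORA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    id: str
    lines_changed: int
    caused_incident: bool  # True if the change needed remediation in production


def change_failure_rate(deployments: list[Deployment]) -> float:
    """Fraction of deployments that led to a production failure."""
    if not deployments:
        return 0.0
    failures = sum(1 for d in deployments if d.caused_incident)
    return failures / len(deployments)


def oversized_changes(deployments: list[Deployment], max_lines: int = 400) -> list[str]:
    """IDs of deployments larger than the (assumed) small-batch threshold."""
    return [d.id for d in deployments if d.lines_changed > max_lines]


history = [
    Deployment("d1", 120, False),
    Deployment("d2", 950, True),
    Deployment("d3", 80, False),
    Deployment("d4", 300, False),
]
print(change_failure_rate(history))  # 0.25
print(oversized_changes(history))    # ['d2']
```

Tracking both numbers side by side makes it easier to see whether a rising failure rate coincides with larger changes.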

3.2 Individual vs. Collective

The “individual improvement vs. collective diversity reduction” revealed by creativity research² suggests the importance of team-level evaluation.

Practical Implications:

Monitor pattern diversity across the codebase
Build a code review culture that encourages different approaches
Avoid uniform use of AI tools

3.3 Short-term vs. Long-term

As the learning effect studies⁶,⁷ show, short-term efficiency gains have a complex effect on long-term skill development.

Practical Implications:

Set aside regular time for coding without AI
Consider gradual AI introduction for junior engineers
Incorporate appropriate AI usage into mentoring processes

4. Proposing a Multidimensional Evaluation Framework

4.1 Four-Dimensional Evaluation Model

Comprehensive AI value evaluation beyond time savings requires the following four dimensions:

Dimension 1: Efficiency

Time savings is included here but is not the only metric.

Metrics:

Cycle time reduction rate
Review time reduction rate
Task completion speed

Tools:

GitLab Value Stream Analytics
GitHub Insights
Jira cycle time analysis
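A cycle time reduction rate can be computed from exported cycle times regardless of which tool produced them. The sketch below assumes you have two lists of cycle times in hours (before and after adoption). Using the median rather than the mean is a design choice made here to limit the influence of outliers, and the sample values are invented.

```python
from statistics import median


def cycle_time_reduction_rate(
    baseline_hours: list[float], current_hours: list[float]
) -> float:
    """Relative reduction in median cycle time (0.2 means 20% faster)."""
    if not baseline_hours or not current_hours:
        raise ValueError("both periods need at least one data point")
    base = median(baseline_hours)
    if base == 0:
        raise ValueError("baseline median must be positive")
    return (base - median(current_hours)) / base


before = [10.0, 20.0, 30.0]  # hypothetical pre-adoption cycle times
after = [8.0, 16.0, 24.0]    # hypothetical post-adoption cycle times
print(round(cycle_time_reduction_rate(before, after), 3))  # 0.2
```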

Dimension 2: Quality

Evaluate output quality from multiple angles.

Metrics:

Revision request rate in code reviews
Bug density (production environment)
Documentation quality score
User satisfaction

Tools:

SonarQube (code quality)
DORA metrics (Change Failure Rate)
Customer feedback systems
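Two of the quality metrics above reduce to simple ratios. The following sketch assumes you can count reviews, reviews with changes requested, production bugs, and thousands of lines of code (KLOC) for a given period. The function names and the per-KLOC normalization are assumptions for illustration.

```python
def revision_request_rate(total_reviews: int, reviews_with_changes_requested: int) -> float:
    """Share of code reviews in which the reviewer requested changes."""
    if total_reviews <= 0:
        raise ValueError("total_reviews must be positive")
    return reviews_with_changes_requested / total_reviews


def bug_density(production_bugs: int, kloc: float) -> float:
    """Production bugs per thousand lines of code."""
    if kloc <= 0:
        raise ValueError("kloc must be positive")
    return production_bugs / kloc


# Hypothetical sprint figures (illustrative only).
print(revision_request_rate(50, 10))  # 0.2
print(bug_density(6, 12.0))           # 0.5
```

Comparing these ratios between AI-assisted and unassisted changes, where that distinction is recorded, is one way to check whether faster output is also better output.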

Dimension 3: Creativity & Diversity

Evaluate individual and team-wide creativity.

Metrics:

Architecture pattern diversity
Range of problem-solving approaches
Innovation experiment frequency
Technical debt management status

Measurement Methods:

Codebase analysis (pattern detection)
Qualitative retrospectives
Technical decision recording and review
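One possible way to quantify pattern diversity is Shannon entropy over pattern labels. This is a sketch under the assumption that each module or pull request has already been tagged with the approach it uses, whether by hand or by a detector. The labels below are hypothetical, and entropy is only one of several reasonable diversity measures.

```python
from collections import Counter
from math import log2


def pattern_entropy(labels: list[str]) -> float:
    """Shannon entropy in bits; 0.0 means every item uses the same pattern."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((c / total) * log2(total / c) for c in counts.values())


# Hypothetical labels for how services implement retries.
last_year = ["decorator", "middleware", "queue", "inline"]
this_year = ["decorator", "decorator", "decorator", "decorator"]
print(pattern_entropy(last_year))  # 2.0
print(pattern_entropy(this_year))  # 0.0
```

A falling value over successive quarters would be a prompt for a qualitative look at whether the team is converging by choice or by default.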

Dimension 4: Learning & Growth

Track long-term skill development of individuals and teams.

Metrics:

New technology adoption speed
Ability to perform tasks without AI assistance
Mentoring effectiveness (junior growth rate)
Technical autonomy improvement

Measurement Methods:

Regular skill assessments (with and without AI)
Self-assessment in 1-on-1s
Observation during pair programming
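Assessments taken with and without AI can be compared directly. The sketch below assumes each assessment yields two scores on the same scale. The `SkillAssessment` record and both helper functions are hypothetical, and the scores are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillAssessment:
    period: str
    score_with_ai: float
    score_without_ai: float


def assistance_gap(a: SkillAssessment) -> float:
    """How much higher the aided score is than the unaided one."""
    return a.score_with_ai - a.score_without_ai


def unaided_decline(history: list[SkillAssessment], tolerance: float = 0.0) -> bool:
    """True if the unaided score fell between the first and last assessment."""
    if len(history) < 2:
        return False
    return history[-1].score_without_ai < history[0].score_without_ai - tolerance


history = [
    SkillAssessment("Q1", score_with_ai=85.0, score_without_ai=70.0),
    SkillAssessment("Q2", score_with_ai=88.0, score_without_ai=62.0),
]
print([assistance_gap(a) for a in history])  # [15.0, 26.0]
print(unaided_decline(history))              # True
```

A widening gap combined with a falling unaided score is the pattern the skill-decay concern in section 2.3 describes: aided performance holds up while independent ability slips.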

4.2 Implementation Steps

Step 1: Baseline Measurement (1-2 weeks)

Collect baseline data for all four dimensions before AI adoption, or in the current state if AI is already in use.
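A baseline is easier to compare against later if it is stored as one record covering all four dimensions. The following is a minimal sketch: the `Snapshot` fields are one hypothetical metric per dimension, chosen for illustration, and the values are made up.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Snapshot:
    median_cycle_time_hours: float  # Efficiency
    change_failure_rate: float      # Quality
    pattern_entropy_bits: float     # Creativity & Diversity
    unaided_skill_score: float      # Learning & Growth


def relative_change(baseline: Snapshot, current: Snapshot) -> dict[str, float]:
    """Per-metric change relative to baseline (0.5 means +50%)."""
    base, cur = asdict(baseline), asdict(current)
    # Metrics with a zero baseline are skipped to avoid division by zero.
    return {k: (cur[k] - base[k]) / base[k] for k in base if base[k] != 0}


baseline = Snapshot(20.0, 0.10, 2.0, 80.0)
current = Snapshot(16.0, 0.15, 1.5, 80.0)
for metric, change in relative_change(baseline, current).items():
    print(f"{metric}: {change:+.0%}")
```

In this invented example, cycle time improves while failure rate rises and diversity falls, which is the kind of mixed picture a single time-savings number would hide.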

Step 2: Combine Quantitative + Qualitative (Continuous)

Quantitative data: CI/CD pipelines, GitHub/GitLab metrics
Qualitative data: Retrospectives, developer interviews

As GitLab recommends⁵, “only by combining quantitative data with developers’ qualitative feedback can you get an accurate complete picture of productivity improvement.”

Step 3: Link to Business Outcomes (Monthly)

Ultimately, what matters is business outcomes rather than technical metrics⁵.

Deploy frequency → Time to market
Production defect count → Customer satisfaction
Innovation experiment count → New feature success rate
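One simple way to check whether a technical metric actually moves with its paired business outcome is a Pearson correlation over monthly values. This is a sketch only: the data is invented, a handful of monthly points is too few for firm conclusions, and correlation does not establish that one metric drives the other.

```python
from math import sqrt
from statistics import mean


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (sx * sy)


# Hypothetical monthly values: deploys per week vs. days from idea to release.
deploys_per_week = [2.0, 4.0, 6.0, 8.0]
time_to_market_days = [30.0, 24.0, 20.0, 12.0]
print(round(pearson(deploys_per_week, time_to_market_days), 2))
```

A strongly negative value here would be consistent with more frequent deploys going together with shorter time to market, and would justify a closer look rather than serve as proof.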

Step 4: Continuous Adjustment (Quarterly)

Continuously review AI usage, processes, and evaluation metrics themselves.

4.3 Implementation Example: Custom Instructions Optimization

In AI tools like ChatGPT, Cursor, and Claude, you can improve output quality by configuring Custom Instructions. Here’s an example configuration considering quality, learning, and creativity beyond just time savings:

  
# Custom Instructions (For Python Developer)

## What I want AI to know about me
I'm a Python developer, mainly handling backend API development.
- Frameworks: FastAPI, Django
- Database: PostgreSQL
- Infrastructure: Docker, Kubernetes
- Coding standards: PEP 8, type hints required
- Test framework: pytest

## Response approach

### Quality-focused
- Code must include type hints and docstrings
- Consider security best practices (OWASP Top 10)
- Mention performance considerations if relevant
- Provide test code when appropriate

### Learning promotion
- Ask questions that give me time to think before presenting solutions
- Include "why this approach?" explanations
- Show alternatives and tradeoffs

### Diversity maintenance
- When multiple implementations exist, show alternatives beyond just the most common
- Consider new approaches without being fixed on project-specific patterns

Expected Effects:

Efficiency: Task completion speed improvement
Quality: Type-safe, secure code generation
Creativity: Diverse approaches through alternative suggestions
Learning: Maintain thought processes through “thinking questions”

Notes:

Project-specific naming conventions need separate specification
Recommended methods may differ by framework version
Regularly review settings and adjust as project evolves

5. Summary

5.1 Key Conclusions

The clear conclusion from 2023-2025 research is that measuring AI’s value by time savings alone misses important impacts.

Evidence Summary:

Harvard/BCG study: AI brings 40%+ quality improvement on tasks within its capability boundary¹
Science Advances study: Individual creativity improvement (8-9%) and collective diversity reduction occur simultaneously²
Learning effect studies: Appropriate use promotes learning (d=0.40), inappropriate use risks skill decline⁶,⁷
DORA report: Code quality improvement (+3.4%) and delivery stability decline (-7.2%) occur simultaneously³

5.2 Practical Recommendations

1. Introduce Multidimensional Measurement

Evaluate efficiency, quality, creativity, and learning dimensions holistically
Combine quantitative data with qualitative feedback

2. Focus on Business Outcomes

Link technical metrics to ultimate business value
Define organization-specific success metrics based on DORA metrics

3. Understand AI’s “Amplifier” Characteristic

AI amplifies organizational strengths and weaknesses³
AI’s true value is realized only with mature processes

4. Continuous Learning and Adjustment

Continuously improve AI usage and evaluation methods
Share insights within teams and evolve best practices

5.3 Final Thoughts

Time savings is an important value that should not be ignored. However, it is only the beginning of the value AI provides.

AI’s true value becomes visible only when quality improvement, creativity changes, learning impacts, and system-wide effects are evaluated together. And to maximize that value, organizations must continue evolving their processes, culture, and measurement methods.

AI is not just a tool, but an opportunity to reconsider the essence of knowledge work. Only organizations that sincerely engage with that question will enjoy the true benefits of the AI era.

References

Reference materials corresponding to in-text citation numbers, listed in order.

1. Dell’Acqua, F., McFowland, E., Mollick, E. R., Lifshitz-Assaf, H., Kellogg, K., Rajendran, S., Krayer, L., Candelon, F., & Lakhani, K. R. (2023). Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality. Harvard Business School Working Paper. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4573321 [Reliability: High - Large-scale field experiment, prestigious research institutions]
2. Doshi, A. R., & Hauser, O. P. (2024). Generative AI enhances individual creativity but reduces the collective diversity of novel content. Science Advances, 10(28). https://doi.org/10.1126/sciadv.adn5290 [Reliability: High - Peer-reviewed, premier academic journal]
3. DORA (DevOps Research and Assessment). (2024). 2024 State of DevOps Report. Google Cloud. https://dora.dev/ [Reliability: High - Industry standard report, large-scale data]
4. Paradiso Solutions. (2025). How to Measure AI Productivity Gains in 2025: Key Metrics That Matter. https://www.paradisosolutions.com/blog/measure-ai-productivity-gains-metrics/ [Reliability: Medium - Industry report citation]
5. GitLab. (2024, February 20). Measuring AI effectiveness beyond developer productivity metrics. https://about.gitlab.com/blog/2024/02/20/measuring-ai-effectiveness-beyond-developer-productivity-metrics/ [Reliability: Medium-High - Industry leader tech blog]
6. Lira, B., Rogers, T., Goldstein, D. G., Ungar, L., & Duckworth, A. L. (2025, February 5). Learning from examples: AI assistance can enhance rather than hinder skill development. arXiv preprint. https://arxiv.org/html/2502.02880v1 [Reliability: Medium-High - Preprint, pre-registered study]
7. Macnamara, B. N., Berber, I., Çavuşoğlu, M. C., Krupinski, E. A., Nallapareddy, N., Nelson, N. E., Smith, P. J., Wilson-Delfosse, A. L., & Ray, S. (2024). Does using artificial intelligence assistance accelerate skill decay and hinder skill development without performers’ awareness? Cognitive Research: Principles and Implications, 9(1), 46. https://doi.org/10.1186/s41235-024-00572-8 [Reliability: Medium-High - Peer-reviewed, theoretical perspective paper]
8. Medium. (2024). AI Dev: The 2024 DORA Report Reviewed. https://medium.com/@julian.burns50/ai-dev-the-2024-dora-report-reviewed-efbcbecc3202 [Reliability: Medium - DORA report analysis article]

Additional References (Not Numbered in Text)

Materials referenced during article creation but not directly cited in text.

InfoQ. (2025, September). DORA Report Finds AI Is an Amplifier in Software Development, But Trust Remains Low. https://www.infoq.com/news/2025/09/dora-state-of-ai-in-dev-2025/ [Reliability: Medium-High - Tech news media]
St. Louis Fed. (2025, February). The Impact of Generative AI on Work Productivity. https://www.stlouisfed.org/on-the-economy/2025/feb/impact-generative-ai-work-productivity [Reliability: High - Federal Reserve Bank economic analysis]
NPR. (2024, July 12). Research shows AI can boost creativity for some, but at a cost. https://www.npr.org/2024/07/12/nx-s1-5033988/research-ai-chatbots-creativity-writing [Reliability: Medium-High - Reputable media, research coverage]

About Citation Accuracy: Research cited in this article was verified using the following methods:

Confirmation in academic databases (Google Scholar, arXiv, PubMed, etc.)
Verification of paper information on official journal websites
Cross-verification through multiple independent sources (academic media, official announcements from research institutions, etc.)

For some papers, direct access to full-text PDFs may be restricted, but abstracts, DOIs, author information, and key findings have been confirmed through official academic databases and reliable secondary sources.

This post is licensed under CC BY 4.0 by the author.
