The True Value of AI: Multidimensional Assessment Beyond Time Savings
This article was generated by AI. The accuracy of the content is not guaranteed, and we accept no responsibility for any damages resulting from use of this article. By continuing to read, you agree to the Terms of Use.
- Target Audience: Software Engineers, DevOps Engineers, Development Managers
- Prerequisites: Basic experience with AI development tools (GitHub Copilot, Cursor, ChatGPT, etc.)
- Reading Time: 18 minutes
Overview
“50% time reduction thanks to AI!” “Productivity doubled!” Figures like these are frequently cited as representative measures of AI adoption effectiveness. However, multiple studies from 2023-2025 reveal significant limitations in measuring AI’s value by time savings alone.
This article examines the multidimensional value AI provides (quality improvement, creativity changes, learning effects, and system-wide impacts), based on evidence from the Harvard Business School and Boston Consulting Group experiment with more than 700 consultants [1], creativity research published in Science Advances [2], and the 2024 DORA report [3]. We then propose a comprehensive evaluation framework that IT engineers can implement in practice.
1. The Prevalence and Limitations of Time-Saving Metrics
1.1 Why Time Savings Gets Chosen
Time savings is the most easily measured and compelling metric. The reasons are clear:
- Ease of measurement: Task completion time can be measured with a stopwatch
- Intuitive understanding: Anyone understands “30 minutes became 15 minutes”
- Financial convertibility: Hourly rate × time saved = cost reduction, calculated instantly
- Immediate visibility: Effects can be shown right after adoption
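The "hourly rate × time saved" calculation above can be sketched in a few lines. This is a minimal illustration, and the hourly rate and task counts are hypothetical figures chosen for the example, not data from any cited study.

```python
def estimated_cost_reduction(hourly_rate: float, hours_saved: float) -> float:
    """Naive estimate: hourly rate x time saved."""
    if hourly_rate < 0 or hours_saved < 0:
        raise ValueError("hourly_rate and hours_saved must be non-negative")
    return hourly_rate * hours_saved


# Hypothetical scenario: a task drops from 30 to 15 minutes, 20 times a month.
minutes_saved_per_task = 30 - 15
monthly_hours_saved = minutes_saved_per_task * 20 / 60  # 5.0 hours
monthly_saving = estimated_cost_reduction(hourly_rate=80.0, hours_saved=monthly_hours_saved)
print(monthly_saving)  # 400.0
```

The calculation is trivial, which is why the metric is attractive, but it says nothing about the quality of the output or about the cost of any downstream rework.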
According to multiple reports, many engineering leaders cite “lack of clear measurement metrics” as a major challenge [4]. In this situation, time savings functions as the most convenient “success metric.”
1.2 However, Time Savings Alone Is Insufficient
The problem is that time savings captures only one aspect of AI’s impact. As GitLab points out, “simple metrics like lines of code or AI suggestion acceptance rate cannot capture downstream costs” [5].
The Harvard/BCG large-scale study demonstrated that AI’s effects vary dramatically with the nature of the task [1]. In experiments with over 700 consultants, tasks within AI’s capability boundary showed a 38-42.5% productivity improvement, while tasks outside the boundary showed a 13-24 percentage point performance decline.
This “jagged technological frontier” demonstrates the danger of measuring time alone. A task that completes faster is not necessarily done with appropriate quality, or even aimed at the right problem.
2. Value That Time Savings Can’t Capture
2.1 Output Quality Improvement
The most important finding from the Harvard/BCG study was significant quality improvement through AI [1].
Study Details:
- Sample size: 758 consultants
- Institutions: Harvard Business School, MIT, Wharton School
- Published: 2023, Working Paper
Key Findings:
- Groups using AI produced deliverables of about 40% higher quality than control groups
- Improvement was particularly notable for lower-skill participants (43% improvement)
- Quality improvement was also observed in top performers, but the margin of improvement was smaller
This quality improvement holds fundamentally different value from simply “finishing faster.” In knowledge work, high-quality output means:
- Downstream cost reduction: Reduced review and revision effort
- Decision quality: Judgments based on better insights
- Customer satisfaction: Value delivery to end users
However, note that this quality improvement is not guaranteed for all tasks. For tasks outside AI’s capability boundary, performance may decline significantly [1].
2.2 Complex Impact on Creativity
Research published in Science Advances in July 2024 revealed the complexity of AI’s impact on creativity [2].
Study Details:
- Authors: Oliver Hauser (University of Exeter, UK), Anil Doshi (University College London)
- Sample size: Approximately 300 non-professional writers, 600 evaluators
- DOI: 10.1126/sciadv.adn5290
- Published: July 12, 2024, Science Advances [Peer-reviewed]
Key Findings:
- Individual level: Writers with access to 5 AI ideas produced works approximately 8% more original and 9% more useful
- Low-skill participants in particular: Those with the lowest baseline creativity benefited most
- Collective level: However, overall diversity decreased
The “creativity paradox” this study reveals is important. Even if individual engineers can write better code using AI, there is a risk that the whole team converges on similar approaches.
Implications for IT Engineers:
- Watch for homogenization of “AI-like” patterns in code reviews
- Use multiple AI tools to maintain diversity
- Consciously encourage different approaches within teams
2.3 Learning Effects and Skill Development
Conflicting research results have been reported regarding AI assistance’s impact on learning.
Positive Evidence (February 2025):
A pre-registered study published on arXiv [6] suggested AI may promote learning.
- Authors: Benjamin Lira, Todd Rogers, Daniel G. Goldstein, Lyle Ungar, Angela L. Duckworth
- Sample size: 2,238 in Study 2, 2,003 in Study 3
- Published: February 5, 2025, arXiv [Preprint]
Key Findings:
- Participants who practiced with AI tools produced higher quality writing than those who practiced without tools (effect size d=0.40)
- This effect persisted in tests one day later
- Achieved learning outcomes improvement while reducing keystrokes by 26%
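For readers unfamiliar with the effect size notation, the sketch below computes Cohen's d with the standard pooled standard deviation. The source does not describe how the study itself computed d, so this is only the textbook definition, and the scores in the example are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(treatment: list[float], control: list[float]) -> float:
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(treatment), len(control)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    s1, s2 = stdev(treatment), stdev(control)
    pooled = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    if pooled == 0:
        raise ValueError("pooled standard deviation is zero")
    return (mean(treatment) - mean(control)) / pooled


# Hypothetical writing-quality scores, not data from the cited study.
with_ai_practice = [2.0, 4.0, 6.0]
without_ai_practice = [1.0, 3.0, 5.0]
print(cohens_d(with_ai_practice, without_ai_practice))  # 0.5
```

By the common rule of thumb, d around 0.2 is considered small and d around 0.5 medium, so the reported d=0.40 sits between the two.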
Negative Concerns (July 2024):
Meanwhile, a theoretical perspective paper published in Cognitive Research: Principles and Implications [7] argued that AI assistance may accelerate skill decline.
- Authors: Brooke N. Macnamara et al. (Case Western Reserve University, etc.)
- DOI: 10.1186/s41235-024-00572-8
- Published: July 12, 2024 [Peer-reviewed, theoretical paper]
Key Concerns:
- While task performance is maintained with AI assistance, independent cognitive abilities may decline
- Risk that users won’t notice this decline (illusory sense of competence)
- Impact may be more serious because AI mimics cognitive skills rather than just automating
Note: This paper is a theoretical perspective and doesn’t contain direct empirical data.
Current Conclusion:
- AI’s impact on learning depends on usage
- Using AI output as examples to learn from (the “learning from examples” model) may promote learning
- Passive dependency risks damaging cognitive abilities
2.4 System-Wide Impact: DORA Report Insights
The 2024 DORA (DevOps Research and Assessment) report [3] revealed the complex impact of AI adoption across the entire software development process.
Key Findings:
Productivity Metrics Improvement:
- With a 25% increase in AI adoption:
  - Overall productivity: +2.1%
  - Code review speed: +3.1%
  - Code quality: +3.4%
  - Documentation quality: +7.5%
  - Developer satisfaction: +2.2%
  - Flow state: +2.6%
However, Delivery Metrics Decline:
- Delivery throughput: -1.5%
- Delivery stability: -7.2%
This seemingly contradictory result is attributed not to a quality-speed tradeoff but to the temptation to make larger code changes [3][8]. Because AI can generate code quickly, developers tend to abandon the small-batch principle, which is central to high-performing delivery, and ship larger, riskier changes.
Key Insight: AI Is an Amplifier
The most important insight from the DORA report is that AI amplifies an organization’s existing strengths and weaknesses [3]. Organizations with mature development processes are further strengthened by AI, while immature organizations see problems amplified.
3. Understanding Tradeoffs and Complexity
3.1 Speed vs. Stability
As the DORA report showed, AI tools accelerate individual work while potentially compromising overall system stability [3]. This demonstrates the danger of pursuing time-saving metrics alone.
Practical Implications:
- Consciously keep Pull Request sizes small
- Strengthen review processes for AI-generated code
- Continuously monitor Change Failure Rate
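As one possible way to act on these implications, the sketch below computes Change Failure Rate from a deployment history and flags oversized changes. The `Deployment` record and the 400-line threshold are assumptions made for illustration; they are not a DORA schema or a DORA recommendation, and each team would pick its own limit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """One production deployment (hypothetical fields, not a DORA schema)."""
    pr_lines_changed: int
    caused_failure: bool


def change_failure_rate(deployments: list[Deployment]) -> float:
    """Share of deployments that led to a failure in production."""
    if not deployments:
        raise ValueError("no deployments to evaluate")
    return sum(d.caused_failure for d in deployments) / len(deployments)


def oversized_changes(deployments: list[Deployment], max_lines: int = 400) -> list[Deployment]:
    """Deployments whose change size exceeds the chosen limit (400 is arbitrary)."""
    return [d for d in deployments if d.pr_lines_changed > max_lines]


history = [
    Deployment(120, False),
    Deployment(1500, True),
    Deployment(80, False),
    Deployment(300, False),
]
print(change_failure_rate(history))     # 0.25
print(len(oversized_changes(history)))  # 1
```

Tracking both numbers together makes it easier to see whether a rise in failures coincides with a drift toward larger AI-assisted changes.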
3.2 Individual vs. Collective
The “individual improvement vs. collective diversity reduction” revealed by creativity research [2] suggests the importance of team-level evaluation.
Practical Implications:
- Monitor pattern diversity across the codebase
- Build a code review culture that encourages different approaches
- Avoid uniform use of AI tools
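One rough way to monitor pattern diversity is to label each module or pull request with the design approach it uses and compute the Shannon entropy of those labels. This is an illustrative proxy only: the labels below are hypothetical, how they are produced (manual tagging, static analysis) is left open, and the cited study measured diversity differently.

```python
from collections import Counter
from math import log2


def pattern_diversity_bits(labels: list[str]) -> float:
    """Shannon entropy (in bits) of the distribution of pattern labels.

    0.0 means every item uses the same pattern; higher values mean the
    codebase spreads more evenly across more patterns.
    """
    if not labels:
        raise ValueError("no labels to evaluate")
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())


# Hypothetical labels assigned during code review.
last_quarter = ["repository", "active_record", "repository", "active_record"]
this_quarter = ["repository", "repository", "repository", "repository"]
print(pattern_diversity_bits(last_quarter))  # 1.0
print(pattern_diversity_bits(this_quarter) == 0.0)  # True
```

A falling value over time does not prove that AI is the cause, but it is a cheap signal that the team may be converging on one approach and that a retrospective is worth having.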
3.3 Short-term vs. Long-term
As learning effect studies [6][7] show, the impact of short-term efficiency gains on long-term skill development is complex.
Practical Implications:
- Set aside regular time for coding without AI
- Consider gradual AI introduction for junior engineers
- Incorporate appropriate AI usage into mentoring processes
4. Proposing a Multidimensional Evaluation Framework
4.1 Four-Dimensional Evaluation Model
Comprehensive AI value evaluation beyond time savings requires the following four dimensions:
Dimension 1: Efficiency
Time savings is included here but is not the only metric.
Metrics:
- Cycle time reduction rate
- Review time reduction rate
- Task completion speed
Tools:
- GitLab Value Stream Analytics
- GitHub Insights
- Jira cycle time analysis
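A cycle time reduction rate can be derived from exported cycle times, whichever tool they come from. The sketch below assumes the data is already available as plain lists of hours; it does not call any GitLab, GitHub, or Jira API, and the numbers are hypothetical.

```python
from statistics import median


def cycle_time_reduction_rate(baseline_hours: list[float], current_hours: list[float]) -> float:
    """Relative reduction in median cycle time; positive means faster than baseline."""
    if not baseline_hours or not current_hours:
        raise ValueError("both periods need at least one measurement")
    baseline = median(baseline_hours)
    if baseline <= 0:
        raise ValueError("baseline median must be positive")
    return (baseline - median(current_hours)) / baseline


# Hypothetical cycle times (hours from first commit to merge).
before_ai = [10.0, 20.0, 30.0]
after_ai = [8.0, 15.0, 40.0]
print(cycle_time_reduction_rate(before_ai, after_ai))  # 0.25
```

The median is used here instead of the mean because cycle times tend to be skewed by a few long-running changes; in the example the slowest change actually got slower even though the median improved.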
Dimension 2: Quality
Evaluate output quality from multiple angles.
Metrics:
- Revision request rate in code reviews
- Bug density (production environment)
- Documentation quality score
- User satisfaction
Tools:
- SonarQube (code quality)
- DORA metrics (Change Failure Rate)
- Customer feedback systems
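Two of the quality metrics above reduce to simple ratios. The following sketch shows one way to compute them; the function names and the per-1,000-lines normalization are choices made for this example, not definitions taken from SonarQube or DORA.

```python
def revision_request_rate(reviews_total: int, reviews_with_changes_requested: int) -> float:
    """Share of code reviews in which the reviewer requested changes."""
    if reviews_total <= 0:
        raise ValueError("reviews_total must be positive")
    if not 0 <= reviews_with_changes_requested <= reviews_total:
        raise ValueError("reviews_with_changes_requested out of range")
    return reviews_with_changes_requested / reviews_total


def bug_density_per_kloc(production_bugs: int, lines_of_code: int) -> float:
    """Production bugs per 1,000 lines of code."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    if production_bugs < 0:
        raise ValueError("production_bugs must be non-negative")
    return production_bugs * 1000 / lines_of_code


# Hypothetical monthly figures.
print(revision_request_rate(reviews_total=50, reviews_with_changes_requested=10))  # 0.2
print(bug_density_per_kloc(production_bugs=6, lines_of_code=12_000))               # 0.5
```

Comparing these ratios before and after AI adoption is more informative than either raw count, since AI-assisted teams often produce more code and more reviews overall.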
Dimension 3: Creativity & Diversity
Evaluate individual and team-wide creativity.
Metrics:
- Architecture pattern diversity
- Range of problem-solving approaches
- Innovation experiment frequency
- Technical debt management status
Measurement Methods:
- Codebase analysis (pattern detection)
- Qualitative retrospectives
- Technical decision recording and review
Dimension 4: Learning & Growth
Track long-term skill development of individuals and teams.
Metrics:
- New technology adoption speed
- Ability to perform tasks without AI assistance
- Mentoring effectiveness (junior growth rate)
- Technical autonomy improvement
Measurement Methods:
- Regular skill assessments (with and without AI)
- Self-assessment in 1-on-1s
- Observation during pair programming
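The "with and without AI" skill assessment can be summarized as a gap between the two scores. The sketch below is one hypothetical way to record and flag that gap; the 0-100 scale, the field names, and the 20-point threshold are all assumptions, not values from the cited research.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillAssessment:
    """Scores for comparable tasks, on a hypothetical 0-100 scale."""
    engineer: str
    score_with_ai: float
    score_without_ai: float


def reliance_gap(assessment: SkillAssessment) -> float:
    """How much higher the engineer scores with AI than without it."""
    return assessment.score_with_ai - assessment.score_without_ai


def needs_follow_up(assessments: list[SkillAssessment], max_gap: float = 20.0) -> list[str]:
    """Engineers whose gap exceeds the chosen threshold (20 points is arbitrary)."""
    return [a.engineer for a in assessments if reliance_gap(a) > max_gap]


results = [
    SkillAssessment("engineer_a", score_with_ai=85.0, score_without_ai=80.0),
    SkillAssessment("engineer_b", score_with_ai=90.0, score_without_ai=55.0),
]
print(needs_follow_up(results))  # ['engineer_b']
```

A large gap is a prompt for a mentoring conversation rather than a verdict, and tracking how the gap changes between assessments is likely more useful than any single value.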
4.2 Implementation Steps
Step 1: Baseline Measurement (1-2 weeks)
Collect baseline data for all four dimensions before AI adoption or in the current state.
Step 2: Combine Quantitative + Qualitative (Continuous)
- Quantitative data: CI/CD pipelines, GitHub/GitLab metrics
- Qualitative data: Retrospectives, developer interviews
As GitLab recommends [5], “only by combining quantitative data with developers’ qualitative feedback can you get an accurate complete picture of productivity improvement.”
Step 3: Link to Business Outcomes (Monthly)
Ultimately what matters is not technical metrics but business outcomes [5].
- Deploy frequency → Time to market
- Production defect count → Customer satisfaction
- Innovation experiment count → New feature success rate
Step 4: Continuous Adjustment (Quarterly)
Continuously review AI usage, processes, and evaluation metrics themselves.
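The steps above can be tied together with a small scorecard that compares each metric against its baseline and averages the result per dimension. This is a sketch under stated assumptions: the `Metric` structure, the metric names, and the unweighted average are illustrative choices, and the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    dimension: str          # e.g. "Efficiency", "Quality"
    name: str
    baseline: float         # value from Step 1
    current: float          # latest value
    higher_is_better: bool

    def improvement(self) -> float:
        """Relative change vs. baseline, signed so positive always means better."""
        if self.baseline == 0:
            raise ValueError(f"baseline for {self.name} must be non-zero")
        change = (self.current - self.baseline) / abs(self.baseline)
        return change if self.higher_is_better else -change


def summarize(metrics: list[Metric]) -> dict[str, float]:
    """Unweighted average improvement per dimension."""
    by_dimension: dict[str, list[float]] = {}
    for metric in metrics:
        by_dimension.setdefault(metric.dimension, []).append(metric.improvement())
    return {dim: sum(values) / len(values) for dim, values in by_dimension.items()}


scorecard = [
    Metric("Efficiency", "median_cycle_time_hours", 20.0, 15.0, higher_is_better=False),
    Metric("Quality", "failed_deployments_per_month", 4.0, 5.0, higher_is_better=False),
]
print(summarize(scorecard))  # {'Efficiency': 0.25, 'Quality': -0.25}
```

In this made-up example efficiency improved while quality regressed, which is the kind of tradeoff a time-savings metric alone would hide.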
4.3 Implementation Example: Custom Instructions Optimization
In AI tools like ChatGPT, Cursor, and Claude, you can improve output quality by configuring Custom Instructions. Here’s an example configuration considering quality, learning, and creativity beyond just time savings:
# Custom Instructions (For Python Developer)
## What I want AI to know about me
I'm a Python developer, mainly handling backend API development.
- Frameworks: FastAPI, Django
- Database: PostgreSQL
- Infrastructure: Docker, Kubernetes
- Coding standards: PEP 8, type hints required
- Test framework: pytest
## Response approach
### Quality-focused
- Code must include type hints and docstrings
- Consider security best practices (OWASP Top 10)
- Mention performance considerations if relevant
- Provide test code when appropriate
### Learning promotion
- Ask questions that give me time to think before presenting solutions
- Include "why this approach?" explanations
- Show alternatives and tradeoffs
### Diversity maintenance
- When multiple implementations exist, show alternatives beyond just the most common
- Consider new approaches without being fixed on project-specific patterns
Expected Effects:
- Efficiency: Task completion speed improvement
- Quality: Type-safe, secure code generation
- Creativity: Diverse approaches through alternative suggestions
- Learning: Maintain thought processes through “thinking questions”
Notes:
- Project-specific naming conventions need separate specification
- Recommended methods may differ by framework version
- Regularly review settings and adjust as project evolves
5. Summary
5.1 Key Conclusions
The clear conclusion from 2023-2025 research is that measuring AI’s value by time savings alone misses important impacts.
Evidence Summary:
- Harvard/BCG study: AI brings 40%+ quality improvement for tasks within its capability boundary [1]
- Science Advances study: Individual creativity improvement (8-9%) and collective diversity reduction occur simultaneously [2]
- Learning effect studies: Appropriate use promotes learning (d=0.40), inappropriate use risks skill decline [6][7]
- DORA report: Code quality improvement (+3.4%) and delivery stability decline (-7.2%) occur simultaneously [3]
5.2 Practical Recommendations
1. Introduce Multidimensional Measurement
- Evaluate efficiency, quality, creativity, and learning dimensions holistically
- Combine quantitative data with qualitative feedback
2. Focus on Business Outcomes
- Link technical metrics to ultimate business value
- Define organization-specific success metrics based on DORA metrics
3. Understand AI’s “Amplifier” Characteristic
- AI amplifies organizational strengths and weaknesses [3]
- AI’s true value is realized only with mature processes
4. Continuous Learning and Adjustment
- Continuously improve AI usage and evaluation methods
- Share insights within teams and evolve best practices
5.3 Final Thoughts
Time savings is an important value that should not be ignored. However, it is only the beginning of the value AI provides.
AI’s true value becomes visible only when quality improvement, creativity changes, learning impacts, and system-wide effects are all evaluated together. And to maximize that value, organizations must continue evolving their processes, culture, and measurement methods.
AI is not just a tool, but an opportunity to reconsider the essence of knowledge work. Only organizations that sincerely engage with that question will enjoy the true benefits of the AI era.
References
Reference materials corresponding to in-text citation numbers, listed in order.
[1] Dell’Acqua, F., McFowland, E., Mollick, E. R., Lifshitz-Assaf, H., Kellogg, K., Rajendran, S., Krayer, L., Candelon, F., & Lakhani, K. R. (2023). Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality. Harvard Business School Working Paper. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4573321 [Reliability: High - Large-scale field experiment, prestigious research institutions]
[2] Doshi, A. R., & Hauser, O. P. (2024). Generative AI enhances individual creativity but reduces the collective diversity of novel content. Science Advances, 10(28). https://doi.org/10.1126/sciadv.adn5290 [Reliability: High - Peer-reviewed, premier academic journal]
[3] DORA (DevOps Research and Assessment). (2024). 2024 State of DevOps Report. Google Cloud. https://dora.dev/ [Reliability: High - Industry standard report, large-scale data]
[4] Paradiso Solutions. (2025). How to Measure AI Productivity Gains in 2025: Key Metrics That Matter. https://www.paradisosolutions.com/blog/measure-ai-productivity-gains-metrics/ [Reliability: Medium - Industry report citation]
[5] GitLab. (2024, February 20). Measuring AI effectiveness beyond developer productivity metrics. https://about.gitlab.com/blog/2024/02/20/measuring-ai-effectiveness-beyond-developer-productivity-metrics/ [Reliability: Medium-High - Industry leader tech blog]
[6] Lira, B., Rogers, T., Goldstein, D. G., Ungar, L., & Duckworth, A. L. (2025, February 5). Learning from examples: AI assistance can enhance rather than hinder skill development. arXiv preprint. https://arxiv.org/html/2502.02880v1 [Reliability: Medium-High - Preprint, pre-registered study]
[7] Macnamara, B. N., Berber, I., Çavuşoğlu, M. C., Krupinski, E. A., Nallapareddy, N., Nelson, N. E., Smith, P. J., Wilson-Delfosse, A. L., & Ray, S. (2024). Does using artificial intelligence assistance accelerate skill decay and hinder skill development without performers’ awareness? Cognitive Research: Principles and Implications, 9(1), 46. https://doi.org/10.1186/s41235-024-00572-8 [Reliability: Medium-High - Peer-reviewed, theoretical perspective paper]
[8] Medium. (2024). AI Dev: The 2024 DORA Report Reviewed. https://medium.com/@julian.burns50/ai-dev-the-2024-dora-report-reviewed-efbcbecc3202 [Reliability: Medium - DORA report analysis article]
Additional References (Not Numbered in Text)
Materials referenced during article creation but not directly cited in text.
- InfoQ. (2025, September). DORA Report Finds AI Is an Amplifier in Software Development, But Trust Remains Low. https://www.infoq.com/news/2025/09/dora-state-of-ai-in-dev-2025/ [Reliability: Medium-High - Tech news media]
- St. Louis Fed. (2025, February). The Impact of Generative AI on Work Productivity. https://www.stlouisfed.org/on-the-economy/2025/feb/impact-generative-ai-work-productivity [Reliability: High - Federal Reserve Bank economic analysis]
- NPR. (2024, July 12). Research shows AI can boost creativity for some, but at a cost. https://www.npr.org/2024/07/12/nx-s1-5033988/research-ai-chatbots-creativity-writing [Reliability: Medium-High - Reputable media, research coverage]
About Citation Accuracy: Research cited in this article was verified using the following methods:
- Confirmation in academic databases (Google Scholar, arXiv, PubMed, etc.)
- Verification of paper information on official journal websites
- Cross-verification through multiple independent sources (academic media, official announcements from research institutions, etc.)
For some papers, direct access to full-text PDFs may be restricted, but abstracts, DOIs, author information, and key findings have been confirmed through official academic databases and reliable secondary sources.