DeputyDev

AI-powered developer assistant

AUGUST 13, 2025

Breaking the Code Review Bottleneck: How We Built DeputyDev, Our AI-Powered Code Review Assistant

Posted by the DeputyDev Team

7 minute read

We're excited to share our research on DeputyDev, an AI-powered code review assistant that's transforming how we approach code reviews at TATA 1mg. Our paper, now published on arXiv, details how we reduced code review times by up to 47% while maintaining code quality.

Read the full paper on arXiv: Breaking the Code Review Bottleneck: How We Built DeputyDev, Our AI-Powered Code Review Assistant

The Problem: Code Reviews Were Killing Our Velocity

Like many engineering organizations, we faced a serious code review bottleneck. Our telemetry data revealed some eye-opening statistics:

But the real cost wasn't just time—it was the constant context switching. Research from UC Irvine shows that interruptions cause an average of 23 minutes of lost focus. For our developers, waiting days for feedback meant repeatedly losing and rebuilding context, impacting both productivity and wellbeing.


Our Hypothesis: AI as the First Line of Defense

We proposed a two-stage code review process where DeputyDev acts as an AI first-reviewer before human reviewers step in. The idea was simple but powerful:


The Technical Challenge: Context is Everything

Here's where it gets interesting. You can't just throw code at an LLM and expect meaningful reviews. The key insight was that context is everything.

Think about it: when a human reviews code, they don't just look at the changed lines. They understand:

We needed to give our AI the same contextual awareness.

Building the Optimized Context

DeputyDev creates what we call an "optimized context" by pulling together:

This is the most crucial piece. When you change a function like process_order, DeputyDev automatically identifies:

We use Abstract Syntax Trees (AST) and a combination of lexical and semantic search to find these relevant code chunks. The formula is elegantly simple:

Relevant_Chunks = Lexical_Search_Results ∪ Semantic_Search_Results
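The union above can be sketched in a few lines. Everything here is invented for illustration: the corpus, the chunk ids, and both "search engines" are toy stand-ins for DeputyDev's actual AST-derived chunking, keyword search, and embedding search.

```python
import re

# Toy illustration of Relevant_Chunks = Lexical_Search ∪ Semantic_Search.
CHUNKS = {
    "orders.py:process_order": "def process_order(order): validate(order); charge(order)",
    "orders.py:validate": "def validate(order): ...",
    "billing.py:charge": "def charge(order): ...",
    "utils.py:slugify": "def slugify(text): ...",
}

def tokens(text: str) -> set[str]:
    # Split identifiers on underscores so process_order ~ {process, order}.
    return set(re.findall(r"[a-z]+", text.lower().replace("_", " ")))

def lexical_search(query: str) -> set[str]:
    """Exact-substring match: a stand-in for keyword/BM25 search."""
    return {cid for cid, src in CHUNKS.items() if query in src}

def semantic_search(query: str) -> set[str]:
    """Token-overlap scoring: a stand-in for embedding similarity."""
    q = tokens(query)
    return {cid for cid, src in CHUNKS.items() if q & tokens(src)}

def relevant_chunks(query: str) -> set[str]:
    return lexical_search(query) | semantic_search(query)
```

Taking the union rather than the intersection means a chunk surfaced by either signal makes it into the context, which is the point: lexical search catches exact identifier references, semantic search catches conceptually related code.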

Why Not Just Send the Entire Codebase?

Great question! Three reasons:


The Architecture: Multi-Agent Workflow

Inspired by Andrew Ng's work on agentic AI, we built DeputyDev using a multi-agent architecture with reflection. Instead of one monolithic AI trying to review everything, we created specialized agents:

Our Six Specialized Agents
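As a rough illustration of how specialized agents fan out over the same diff, here is a minimal sketch. The agent names and prompts below are invented for the example; they are not DeputyDev's actual six agents, and `llm` stands in for any prompt-to-text completion callable.

```python
# Illustrative multi-agent fan-out: each agent reviews the same diff
# through its own narrowly scoped prompt.
AGENT_PROMPTS = {
    "security": "Review this diff for security issues only:\n{diff}",
    "performance": "Review this diff for performance issues only:\n{diff}",
    "style": "Review this diff for style issues only:\n{diff}",
}

def run_agents(llm, diff: str) -> dict[str, str]:
    """Return one review per specialized agent, keyed by agent name."""
    return {name: llm(tmpl.format(diff=diff))
            for name, tmpl in AGENT_PROMPTS.items()}
```

Scoping each prompt narrowly is what makes the approach work: an agent asked only about security misses fewer security issues than one asked about everything at once.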

The Reflection Pattern

Here's where it gets really clever. After each agent generates its initial review, we send the response back to the LLM asking it to reflect on its own output. This iterative refinement dramatically improves quality.

Andrew Ng's research showed that GPT-3.5 wrapped in an agentic loop reached 95.1% accuracy on the HumanEval coding benchmark, compared with just 48.1% in zero-shot mode. That's the power of reflection.
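The reflection step above can be sketched as a small loop. This is illustrative only: `llm` is a hypothetical prompt-to-text callable, and the prompts are invented for the example, not DeputyDev's actual ones.

```python
# Minimal reflection loop: draft -> self-critique -> revised draft.
def review_with_reflection(llm, diff: str, rounds: int = 1) -> str:
    review = llm(f"Review this diff and list concrete issues:\n{diff}")
    for _ in range(rounds):
        # Ask the model to find flaws in its own output...
        critique = llm(
            "Critique the review below: flag wrong, vague, or missing "
            f"comments.\n{review}"
        )
        # ...then rewrite the review with that critique in hand.
        review = llm(
            "Rewrite the review, addressing the critique.\n"
            f"Review:\n{review}\nCritique:\n{critique}"
        )
    return review
```

Each round costs two extra LLM calls, which is the latency/quality trade-off discussed below.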

The Blending Engine

All agent responses flow through our "blending engine" which:
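One operation a blending step plausibly needs is merging near-identical comments that several agents raise on the same line. A minimal sketch follows; the comment shape ({agent, file, line, text}) is invented for this example, not DeputyDev's actual data model.

```python
# De-duplicate overlapping agent comments, merging the agents that
# raised the same finding on the same file and line.
def blend(comments: list[dict]) -> list[dict]:
    merged: dict[tuple, dict] = {}
    for c in comments:
        # Normalize the text so trivially different copies collide.
        key = (c["file"], c["line"], c["text"].strip().lower())
        if key in merged:
            merged[key]["agents"].append(c["agent"])  # same finding, new agent
        else:
            merged[key] = {"file": c["file"], "line": c["line"],
                           "text": c["text"], "agents": [c["agent"]]}
    return list(merged.values())
```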


The Experiment: Rigorous A/B Testing

We weren't going to deploy this without solid data. We ran a 30-day double-controlled A/B experiment with:

The Results Speak for Themselves

| Metric | Control Set 1 | Control Set 2 | Test Set (DeputyDev) | Improvement |
|---|---|---|---|---|
| Avg Review Time | 239.57 hrs | 278.14 hrs | 197.97 hrs | -17% to -29% |
| Avg Time per LOC | 12.97 hrs | 12.29 hrs | 7.50 hrs | -38% to -42% |
| Median Review Time | 0.76 hrs | 0.78 hrs | 0.41 hrs | -46% to -48% |
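The improvement column is just the relative change of the test set against each control set; here it is checked against the Avg Review Time row:

```python
# Relative improvement of the test set vs. a control set, in percent.
def improvement_pct(control_hrs: float, test_hrs: float) -> float:
    return (test_hrs - control_hrs) / control_hrs * 100

print(round(improvement_pct(239.57, 197.97)))  # vs Control Set 1 → -17
print(round(improvement_pct(278.14, 197.97)))  # vs Control Set 2 → -29
```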

The statistically significant reductions across all metrics validated our hypothesis.

Where DeputyDev Shines Brightest

Interestingly, we found DeputyDev's impact varies by PR size:

The tool excels at smaller PRs because it eliminates the fixed overhead of context switching—the biggest time sink for small changes.


Beyond Code Review: Additional Features

Context-Aware Chat

Developers can ask DeputyDev questions by starting with #dd:

It's like having a senior developer available 24/7, with full context of your PR.

PR Summaries

DeputyDev automatically generates:

This helps reviewers quickly understand changes without diving into the diff immediately.


Technical Choices

LLM Selection:

Integration Points:


Key Learnings

1. Structured vs. Unstructured Output

We discovered that enforcing JSON schema restrictions during the initial LLM reasoning phase significantly reduces quality. Our solution: let the LLM reason freely, then structure the output in a separate step.
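The two-step pattern can be sketched as follows. Step 1 reasons with no schema pressure; step 2 only reshapes that answer into JSON. `llm` is a hypothetical prompt-to-text callable and the schema is invented for the example.

```python
import json

def review_then_structure(llm, diff: str) -> dict:
    # Step 1: unconstrained reasoning -- no schema in sight.
    free_review = llm(f"Review this diff; reason freely in prose:\n{diff}")
    # Step 2: a purely mechanical conversion into the schema we need.
    structured = llm(
        "Convert the review below into JSON with keys 'comments' "
        f"(list of strings) and 'verdict':\n{free_review}"
    )
    return json.loads(structured)
```

The second call is cheap and low-risk because it no longer has to think, only to transcribe.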

2. Agentic Design Trade-offs

Benefits:

Challenges:

The quality improvements justified the added complexity.

3. Weak Correlation Between LOC and Review Time

Surprisingly, we found very weak correlation (0.004-0.095) between lines of code changed and review time. This aligns with what experienced developers intuitively know: a 5-line change can be more complex than a 500-line one.
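For readers who want to see what such a weak correlation looks like, here is Pearson's r from first principles, run on invented (LOC, review-hours) pairs; the data is made up purely to mimic the finding that size barely predicts time.

```python
# Pearson correlation coefficient, computed from scratch.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

loc = [5, 40, 500, 120, 15]        # lines changed (made-up data)
hours = [3.0, 0.5, 0.8, 2.5, 0.4]  # review time (made-up data)
```

On this toy data the big 500-line change reviews quickly while a 5-line change drags, so |r| stays small, just as the telemetry showed.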


Real-World Impact

Since deployment:


The Future of Code Review

DeputyDev represents a fundamental shift in how we think about code reviews. Rather than replacing human reviewers, it augments them—handling the mechanical, time-consuming aspects while freeing humans to focus on architectural decisions, business logic, and nuanced judgments that require deep expertise.

The immediate feedback loop also fundamentally changes the developer experience. Instead of context-switching nightmares, developers get instant, actionable feedback, make corrections, and move forward—all while staying in flow.


Open Questions and Future Work

We're continuing to explore:


Try It Yourself

DeputyDev is available as a SaaS solution. If you're facing similar code review bottlenecks, we'd love to help your team achieve similar productivity gains.