AI-powered developer assistant
Posted By DeputyDev Team
We're excited to share our research on DeputyDev, an AI-powered code review assistant that's transforming how we approach code reviews at TATA 1mg. Our paper, now published on arXiv, details how we reduced code review times by up to 47% while maintaining code quality.
Read the full paper: Breaking the Code Review Bottleneck: How We Built DeputyDev, Our AI-Powered Code Review Assistant
Like many engineering organizations, we faced a serious code review bottleneck. Our telemetry data revealed some eye-opening statistics:
But the real cost wasn't just time; it was the constant context switching. Research from UC Irvine suggests that it takes about 23 minutes on average to regain focus after an interruption. For our developers, waiting days for feedback meant repeatedly losing and rebuilding context, which hurt both productivity and wellbeing.
We proposed a two-stage code review process where DeputyDev acts as an AI first-reviewer before human reviewers step in. The idea was simple but powerful:
Here's where it gets interesting. You can't just throw code at an LLM and expect meaningful reviews. The key insight was that context is everything.
Think about it: when a human reviews code, they don't just look at the changed lines. They understand:
We needed to give our AI the same contextual awareness.
DeputyDev creates what we call an "optimized context" by pulling together:
This is the most crucial piece. When you change a function process_order, DeputyDev automatically identifies:
We use Abstract Syntax Trees (ASTs) together with a combination of lexical and semantic search to find these relevant code chunks. The formula is elegantly simple:
Relevant_Chunks = Lexical_Search_Results ∪ Semantic_Search_Results
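To make the union concrete, here is a minimal sketch in Python. It is an illustration only, not DeputyDev's actual code: the function name `relevant_chunks`, the string chunk IDs, and the two toy retrievers are all assumptions made for the example. The point is simply that a chunk found by either search is kept, and a chunk found by both is kept once.

```python
from typing import Callable, Iterable, List, Set

# Assumption: a chunk is identified by a plain string such as "file::symbol".
Chunk = str


def relevant_chunks(
    query: str,
    lexical_search: Callable[[str], Iterable[Chunk]],
    semantic_search: Callable[[str], Iterable[Chunk]],
) -> List[Chunk]:
    """Union of both result sets, deduplicated, keeping first-seen order."""
    seen: Set[Chunk] = set()
    merged: List[Chunk] = []
    for chunk in list(lexical_search(query)) + list(semantic_search(query)):
        if chunk not in seen:
            seen.add(chunk)
            merged.append(chunk)
    return merged


# Hypothetical stand-ins for the two retrievers (hard-coded results).
def toy_lexical(query: str) -> List[Chunk]:
    return ["orders.py::process_order", "billing.py::charge"]


def toy_semantic(query: str) -> List[Chunk]:
    return ["billing.py::charge", "cart.py::checkout"]


result = relevant_chunks("process_order", toy_lexical, toy_semantic)
```

Here `billing.py::charge` is returned by both retrievers but appears only once in `result`.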
Great question! Three reasons:
Inspired by Andrew Ng's work on agentic AI, we built DeputyDev using a multi-agent architecture with reflection. Instead of one monolithic AI trying to review everything, we created specialized agents:
Here's where it gets really clever. After each agent generates its initial review, we send the response back to the LLM asking it to reflect on its own output. This iterative refinement dramatically improves quality.
Andrew Ng reported that, on the HumanEval coding benchmark, GPT-3.5 wrapped in an agent loop reached up to 95.1% accuracy, compared to just 48.1% in zero-shot mode. That's the power of reflection.
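The reflection step can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not our production implementation: `LLM` is a hypothetical "prompt in, text out" callable, the prompt wording is invented for the example, and `fake_llm` is a stand-in so the sketch runs without any model.

```python
from typing import Callable

# Assumption: the model is exposed as a simple prompt -> completion function.
LLM = Callable[[str], str]


def review_with_reflection(diff: str, llm: LLM, rounds: int = 1) -> str:
    """Generate a first-pass review, then ask the model to critique and refine it."""
    review = llm("Review this diff and list concrete issues:\n" + diff)
    for _ in range(rounds):
        review = llm(
            "Reflect on the review below. Drop false positives, fix mistakes, "
            "and return an improved review.\n\n"
            "Diff:\n" + diff + "\n\nReview:\n" + review
        )
    return review


# Stand-in model: returns a fixed string depending on which prompt it receives.
def fake_llm(prompt: str) -> str:
    if prompt.startswith("Reflect on the review"):
        return "refined review"
    return "draft review"


final_review = review_with_reflection("- a = 1\n+ a = 2", fake_llm)
```

With `rounds=0` the function returns the first draft unchanged, which makes it easy to compare output with and without reflection.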
All agent responses flow through our "blending engine", which:
We weren't going to deploy this without solid data. We ran a 30-day double-controlled A/B experiment with:
| Metric | Control Set 1 | Control Set 2 | Test Set (DeputyDev) | Change vs. Controls |
|---|---|---|---|---|
| Avg Review Time | 239.57 hrs | 278.14 hrs | 197.97 hrs | -17% to -29% |
| Avg Time per LOC | 12.97 hrs | 12.29 hrs | 7.50 hrs | -38% to -42% |
| Median Review Time | 0.76 hrs | 0.78 hrs | 0.41 hrs | -46% to -48% |
The statistically significant reductions across all metrics validated our hypothesis.
Interestingly, we found DeputyDev's impact varies by PR size:
The tool excels at smaller PRs because it eliminates the fixed overhead of context switching—the biggest time sink for small changes.
Developers can ask DeputyDev questions by starting with #dd:
It's like having a senior developer available 24/7, with full context of your PR.
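As a rough idea of how a bot might recognize such comments, here is a small sketch. Only the `#dd` prefix comes from this post; the function name `extract_dd_question`, the case-sensitive matching, and the sample comments are assumptions for illustration, not a description of how DeputyDev actually parses comments.

```python
from typing import Optional


def extract_dd_question(comment: str, prefix: str = "#dd") -> Optional[str]:
    """Return the question text if the comment starts with the prefix, else None."""
    text = comment.strip()
    if not text.startswith(prefix):
        return None
    rest = text[len(prefix):]
    # Require whitespace after the prefix so that "#ddx ..." is not a match.
    if rest and not rest[0].isspace():
        return None
    question = rest.strip()
    return question or None


# Hypothetical PR comments.
question = extract_dd_question("#dd why is this query inside the loop?")
ignored = extract_dd_question("LGTM, thanks!")
```

A comment that contains only the prefix and no question returns `None`, so the bot has nothing to answer.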
DeputyDev automatically generates:
This helps reviewers quickly understand changes without diving into the diff immediately.
We discovered that enforcing JSON schema restrictions during the initial LLM reasoning phase significantly reduces the quality of the review. Our solution: let the LLM reason freely, then structure the output in a separate step.
The quality improvements justified the added complexity.
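The two-step pattern looks roughly like this. Treat it as a sketch under assumptions rather than our actual pipeline: `LLM` is a hypothetical prompt-to-text callable, the JSON keys (`file`, `line`, `comment`) and prompt wording are invented for the example, and `fake_llm` replaces a real model so the code runs on its own.

```python
import json
from typing import Callable, Dict, List

# Assumption: the model is exposed as a simple prompt -> completion function.
LLM = Callable[[str], str]


def review_then_structure(diff: str, llm: LLM) -> List[Dict]:
    """Step 1: unconstrained reasoning. Step 2: convert that reasoning to JSON."""
    reasoning = llm("Review this diff. Reason freely in plain prose.\n" + diff)
    raw = llm(
        "Convert the review below into a JSON array of objects with keys "
        '"file", "line" and "comment". Output JSON only.\n\n' + reasoning
    )
    comments = json.loads(raw)  # raises ValueError on malformed JSON
    if not isinstance(comments, list):
        raise ValueError("expected a JSON array of review comments")
    return comments


# Stand-in model: prose for the first prompt, JSON for the second.
def fake_llm(prompt: str) -> str:
    if prompt.startswith("Convert the review"):
        return '[{"file": "orders.py", "line": 12, "comment": "Handle None."}]'
    return "In orders.py line 12 the value may be None and is not handled."


structured = review_then_structure("+ total = order.total", fake_llm)
```

The cost of this design is one extra model call per review; the schema is only enforced in the second, mechanical step.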
Surprisingly, we found a very weak correlation (coefficients between 0.004 and 0.095) between lines of code changed and review time. This aligns with what experienced developers intuitively know: a 5-line change can be more complex than a 500-line one.
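For readers who want to run the same kind of check on their own PR data, here is a minimal sketch of a correlation coefficient computed by hand. It assumes the Pearson coefficient, which this post does not specify, and the numbers in the demo are made up purely to exercise the function; they are not DeputyDev measurements.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (spread_x * spread_y)


# Made-up toy values (not real data): lines changed vs. review hours.
lines_changed = [1, 2, 3, 4]
review_hours = [3, 1, 1, 3]
r = pearson(lines_changed, review_hours)
```

A value near 0 means the two quantities have little linear relationship, while values near 1 or -1 indicate a strong one.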
Since deployment:
DeputyDev represents a fundamental shift in how we think about code reviews. Rather than replacing human reviewers, it augments them—handling the mechanical, time-consuming aspects while freeing humans to focus on architectural decisions, business logic, and nuanced judgments that require deep expertise.
The immediate feedback loop also fundamentally changes the developer experience. Instead of context-switching nightmares, developers get instant, actionable feedback, make corrections, and move forward—all while staying in flow.
We're continuing to explore:
DeputyDev is available as a SaaS solution. If you're facing similar code review bottlenecks, we'd love to help your team achieve similar productivity gains.