Written by 3:51 pm IAH Automation Roundup

Indian AI Startup Sarvam Outperforms Global Tech Giants on India-Specific Tasks

A Bengaluru-based artificial intelligence company has surpassed major international AI systems on benchmarks focused on Indian languages and document understanding. Sarvam AI’s specialized models outperformed well-known platforms from Google, OpenAI, and Anthropic in optical character recognition and complex document processing, areas that are particularly challenging in India’s multilingual landscape.

The company’s Sarvam Vision system, a three-billion-parameter model trained specifically for document intelligence, scored 84.3 percent accuracy on olmOCR-Bench, outpacing Gemini 3 Pro and significantly exceeding ChatGPT’s performance. This benchmark evaluates how effectively systems can read text, follow reading order, and interpret structured content such as tables and multi-column layouts—critical capabilities for processing government records, financial documents, and historical archives across multiple Indian scripts.

Unlike traditional OCR tools that simply output plain text, Sarvam Vision interprets layout, reading order, charts, and structured elements in scanned or photographed documents. The model handles messy layouts, mathematical formulas, and technical documents that typically challenge conventional systems.

Sarvam Vision draws from training data including textbooks, financial records, government documents, magazines, newspapers, and historical material across multiple Indian languages and English. This India-focused approach allows the system to navigate the unique challenges of non-Latin scripts and complex page structures that global models frequently struggle with.

The training methodology combines real and synthetic documents, creating a robust foundation for understanding the diverse formats and writing systems prevalent across India. By incorporating content from the 1800s to the present day, the system can handle varying scan qualities and diverse content types with improved accuracy.

Alongside the vision model, Sarvam AI launched Bulbul V3, a text-to-speech system supporting 35 natural-sounding voices across 22 official Indian languages. The model switches smoothly between languages, such as Tamil and English or Hindi and English, a critical feature for India’s multilingual population.

Third-party listening studies showed Bulbul V3 achieving the highest listener preference and low error rates across use cases including spoken numbers, technical content, and named entities. This performance positions the system competitively against global voice AI platforms while addressing India-specific pronunciation and accent requirements.
