"The model has been enhanced through a three-step process: Supervised Fine-Tuning (SFT), Reinforcement Learning with Verifiable Rewards (RLVR), and Inference Optimisations." (Sarvam AI Official Blog)

Key Facts
- Sarvam AI, an Indian startup, has developed Sarvam-M, a 24-billion-parameter open-weights hybrid reasoning language model built on Mistral Small.
- Sarvam-M underwent a rigorous three-step enhancement process including Supervised Fine-Tuning (SFT), Reinforcement Learning with Verifiable Rewards (RLVR), and Inference Optimisations.
- Sarvam-M has set new performance standards in mathematics, programming tasks, and Indian language understanding.
- On combined Indian language and math tasks such as the romanised GSM-8K benchmark, Sarvam-M demonstrated an impressive +86% improvement.
- Sarvam-M is now accessible via Sarvam's API and is available for download on Hugging Face for experimentation and integration.
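For quick experimentation, the open weights can be loaded with the Hugging Face `transformers` library. The snippet below is a minimal sketch, not an official recipe: the repository id `sarvamai/sarvam-m` and the presence of a chat template are assumptions, so verify both against the model card before use.

```python
# Minimal sketch: loading Sarvam-M from Hugging Face with transformers.
# Assumption: the repo id is "sarvamai/sarvam-m" and the tokenizer ships
# a chat template; confirm on the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sarvamai/sarvam-m"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # pick a dtype appropriate for your hardware
    device_map="auto",    # requires `accelerate`; shards the 24B weights across devices
)

# Build a chat-formatted prompt and generate a short completion.
messages = [{"role": "user", "content": "Solve step by step: 12 * 7 = ?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Note that a 24-billion-parameter model requires substantial GPU memory; `device_map="auto"` lets transformers spread the weights across whatever accelerators are available.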
Key Stats at a Glance
- Model size of Sarvam-M: 24 billion parameters
- Performance improvement on the romanised GSM-8K benchmark: +86%
