According to Sarvam AI's official blog, the model was enhanced through a three-step process: Supervised Fine-Tuning (SFT), Reinforcement Learning with Verifiable Rewards (RLVR), and inference optimisations.
Key Facts
- Sarvam AI, an Indian startup, has developed Sarvam-M, a 24-billion-parameter open-weights hybrid language model (supporting both "think" and "non-think" modes) built on Mistral Small.

- Sarvam-M underwent a rigorous three-step enhancement process comprising Supervised Fine-Tuning (SFT), Reinforcement Learning with Verifiable Rewards (RLVR), and inference optimisations (a minimal sketch of a verifiable reward appears after this list).

- Sarvam-M has set new performance standards in mathematics, programming tasks, and Indian language understanding.

- On combined Indian-language and maths tasks such as the romanised GSM-8K benchmark, Sarvam-M demonstrated a +86% improvement over the base model.

- Sarvam-M is now accessible via Sarvam's API and is available for download on Hugging Face for experimentation and integration (a minimal loading sketch closes this section).
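
In RLVR, the reward signal comes from a programmatic check rather than a learned reward model, which is why it suits domains like maths and code where correctness is verifiable. Below is a minimal Python sketch of a verifiable reward for GSM-8K-style problems; it is an illustration under assumptions, not Sarvam's published training code, and the answer-extraction regex is a simplification.

```python
import re

def verifiable_math_reward(completion: str, gold_answer: str) -> float:
    """Binary verifiable reward: 1.0 if the last number in the model's
    completion matches the reference answer, else 0.0. GSM-8K answers
    are numeric, so matching the final number is a common heuristic."""
    # Find all numbers (optionally signed, with commas or decimals).
    numbers = re.findall(r"-?\d[\d,]*(?:\.\d+)?", completion)
    if not numbers:
        return 0.0  # no numeric answer at all -> zero reward
    predicted = numbers[-1].replace(",", "")
    return 1.0 if predicted == gold_answer.replace(",", "") else 0.0

# A correct final answer earns reward 1.0; a wrong one earns 0.0.
print(verifiable_math_reward(
    "Each box holds 12 eggs, so 4 boxes hold 4 * 12 = 48 eggs. "
    "The answer is 48.", "48"))                           # -> 1.0
print(verifiable_math_reward("The answer is 50.", "48"))  # -> 0.0
```

During RL training, rewards like this score sampled completions, and the policy is updated to favour verifiably correct answers.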

Key Stats at a Glance
| Metric | Value |
| --- | --- |
| Model size of Sarvam-M | 24 billion parameters |
| Improvement on romanised GSM-8K | +86% |
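
For local experimentation, a minimal loading sketch using the Hugging Face transformers library is shown below. The repository id "sarvamai/sarvam-m" is an assumption (verify the exact id on the Hub), and a 24-billion-parameter model needs substantial GPU memory or quantisation to run.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sarvamai/sarvam-m"  # assumed repository id; verify on the Hub

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to fit 24B weights
    device_map="auto",           # spread layers across available GPUs
)

# Build a chat prompt with the tokenizer's built-in chat template.
messages = [{"role": "user", "content": "2 + 2 kitna hota hai?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate and decode only the newly produced tokens.
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:],
                       skip_special_tokens=True))
```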