Tech Mahindra is close to launching Project Indus, a Large Language Model (LLM) built specifically for Hindi and its 37 dialects. The announcement came from CP Gurnani, Tech Mahindra’s outgoing chief, who said he was proud that the company’s research team at Makers Lab had successfully completed the generative AI project challenge.
Looking back on his 19 years at Tech Mahindra, Gurnani highlighted the major milestones of his tenure. Project Indus, currently in internal beta testing, is a pure Hindi LLM with 539 million parameters, trained on 10 billion Hindi and dialect tokens. Gurnani said it is likely the world’s only model that covers all Hindi tokens and was trained entirely from the ground up. He sees it as a pivotal development that will shape the company’s deep tech capabilities for years to come.
As part of the succession plan, Gurnani handed over the reins to Mohit Joshi, Nikhil Malhotra, and the team, expressing confidence that they would take Project Indus further. When Tech Mahindra announced the project in October, it said the model would initially support 40 Hindi dialects and would be released by the end of December or early January. The company plans to add more languages and dialects in subsequent releases.
Over the past two months, the 15-member Project Indus team collected 1.2 terabytes of Hindi and dialect data for the model. Gurnani’s update comes as several Indian companies and startups unveil their own LLMs, including CoRover’s BharatGPT, Sarvam.ai’s OpenHathi, Kissan AI’s Dhenu (backed by Microsoft), and Ola’s Krutrim, a sign of the growing push to build natural language processing technology for the Indian market.