Google and Microsoft bet on a 27-year-old to propel AI advancements in India
PTC Web Desk: In the quiet village of Agara, three hours southwest of Bangalore, lives Preethi P., who usually spends her days sewing clothes, earning less than $1 a day. On this particular day, however, Preethi is reading sentences in her native Kannada language into a smartphone app. She is among 70 workers in Agara and nearby villages employed by a startup named Karya.
Their mission is to gather text, voice, and image data in India's vernacular languages. They are part of an unseen global workforce, spread across countries like India, Kenya, and the Philippines, that collects and labels the data AI chatbots and virtual assistants need to provide relevant responses. What sets Karya apart from other data vendors is its commitment to fair compensation, especially for women in rural communities. Its pay can reach up to 20 times the prevailing minimum wage, an approach intended to yield higher-quality Indian-language data.
Karya was established in 2021, and the recent surge in generative AI has amplified the tech industry's demand for data. India alone is projected to have nearly one million data annotation workers by 2030, as reported by Nasscom, the Indian tech industry's trade body.
Tech giants like Microsoft have engaged Karya to source local speech data for their AI products. The Bill & Melinda Gates Foundation collaborates with Karya to address gender biases in the data fed into large language models. Alphabet Inc.'s Google also relies on Karya and other local partners to gather speech data across 85 Indian districts. Google's ambitious plan is to expand into every district, covering the majority of languages and dialects spoken in India, and to build a generative AI model for 125 Indian languages.
AI services have largely been developed with English-language data, but as a growing number of non-English-speaking users in India turn to AI-powered technologies, there is a growing need to diversify the datasets. The challenge is compounded by the fact that non-English datasets are often of low quality and lack conversational data in languages like Hindi. Karya, a social impact startup based in Bangalore, addresses these issues by focusing on underserved workers in rural areas and providing an app that works without internet access and offers voice support for those with limited literacy.
For Karya's founder, Manu Chopra, the goal goes beyond improving data supply: he wants to combat poverty. Having grown up in an impoverished neighbourhood, he received a scholarship to study computer science at Stanford. However, he became disillusioned by the prevailing focus on profit and sought instead to use technology to alleviate poverty.
Silicon Valley's involvement in Karya represents a notable shift in the data industry's economics and its relationship with data providers. Karya is also looking to expand its platform to organisations in Africa and South America.
The impact of Karya extends beyond data collection; it has a transformative effect on the lives of thousands of women like Preethi and Shambhavi, who are seizing the opportunity to earn and support their families through meaningful work. These efforts not only enrich the data landscape but also help bridge the digital divide and empower marginalised communities.
- With inputs from agencies