Merge pull request #91 from AI4Bharat/publications
Info Update
Shanks0465 authored Oct 22, 2024
2 parents bb3a459 + a13b0f1 commit 5b08716
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion frontend/components/Dynamic/Area.tsx
@@ -33,7 +33,7 @@ Additionally, we introduced the Bharat Parallel Corpus Collection (BPCC), which
description: `At AI4Bharat, our dedication to building language models and datasets for all 22 constitutionally
recognized Indian languages is central to our mission. We employ a multifaceted approach, leveraging
large-scale data crawling, synthetic data creation, and human annotation/crowd collections to create
-comprehensive datasets. Our efforts have resulted in an extensive pretraining corpus of 251 million
+comprehensive datasets. Our efforts have resulted in an extensive pretraining corpus of 251 billion
tokens across 22 languages, complemented by 74.7 million prompt-response pairs in 20 Indian
languages. Tools like Setu play a crucial role in large-scale crawling and data cleaning, enabling
us to build state-of-the-art models such as Airavata, IndicBART, and IndicBERT. We also emphasize
