@dsfsi

Data Science for Social Impact Research Group @ University of Pretoria

We are the Data Science for Social Impact research group at the Computer Science Department, University of Pretoria.

Our work spans two complementary strands: Data Science for Society and Local Language Natural Language Processing. Our work in Data Science for Society has given us a more nuanced understanding of the systemic challenges that stand in the way of doing excellent science with local languages. It also requires us to understand how users are part of the process whenever data science research is carried out. We find that we need to adjust our research to address these challenges and to innovate in how we gather direct or alternative data.

For us, Data Science for Society means improving the approaches, methods and scientific tools of data science while enhancing the ways decision-makers can use the insights that come from these tools. Local Language Natural Language Processing focuses on developing new tools, data and methodology to improve the state of African languages.

DSFSI Vision, Mission and Values

Vision

To be a leading inclusive lab that creates and harnesses data and multidisciplinary scientific exploration for societal impact.

Mission

Data-driven collaborative innovation to empower society to tackle challenges and preserve our languages.

Values

  • Community and Collaboration
  • Shared responsibility
  • Inclusiveness
  • Integrity and openness
  • Agency
  • Generosity

Pinned

  1. PuoBERTa Public

    A RoBERTa-based language model designed specifically for Setswana, using the new PuoData dataset.

    Makefile 4

  2. vukuzenzele-nlp Public

    Forked from dsfsi/dsfsi-dataset-template

    The dataset contains editions of the South African government magazine Vuk'uzenzele. Data was scraped from PDFs that have been placed in the data/raw folder. The PDFs were obtained from the Vuk'u…

    Jupyter Notebook 6 4

  3. textaugment Public

    TextAugment: Text Augmentation Library

    Python 408 60

  4. covid19za Public

    Coronavirus COVID-19 (2019-nCoV) Data Repository and Dashboard for South Africa

    Jupyter Notebook 254 200

  5. gov-za-multilingual Public

    The dataset contains cabinet statements from the South African government. Data was scraped from the government's website: https://www.gov.za/cabinet-statements

    Jupyter Notebook 4

  6. masakhane-web Public

    Masakhane Web is a translation web application solely for African languages.

    Jupyter Notebook 36 15

Repositories

Showing 10 of 52 repositories
  • data-commons-data
    0 0 0 0 Updated Dec 19, 2024
  • za-mavito Public

    DSFSI South African Terminology Lists and Lexicon Project

    HTML 2 0 0 0 Updated Dec 15, 2024
  • zasca-sum Public
    Python 0 0 0 0 Updated Nov 21, 2024
  • deadlines Public Forked from vukosim/ai-ds-africa-deadlines

    ⏰ AI/ML/DS conference/workshop/event deadlines on the African continent

    HTML 18 999 0 1 Updated Nov 12, 2024
  • cos802 Public

    Defense against the dark text arts

    SCSS 0 MIT 0 0 0 Updated Nov 7, 2024
  • flores-fix-4-africa
    Python 0 0 0 0 Updated Oct 5, 2024
  • za-lid Public

    This repository contains datasets extracted from Vuk'uzenzele, prepared for training n-gram models, traditional ML models (Naive Bayes, SVM, and logistic regression), and large pretrained multilingual models for language identification.

    Jupyter Notebook 0 0 0 0 Updated Sep 29, 2024
  • Higher_Education_EDA Public

    An exploratory data analysis (EDA) repository for education researchers and practitioners

    Jupyter Notebook 3 Apache-2.0 1 0 0 Updated Sep 16, 2024
  • dsfsi-datasets Public

    Datasets made available for different small projects

    Jupyter Notebook 2 MIT 2 0 0 Updated Sep 1, 2024
  • absa-masterclass-hands-on
    Jupyter Notebook 0 1 0 0 Updated Jul 29, 2024
