An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: dsfsi-datasets

dsfsi/gov-za-multilingual

The dataset contains cabinet statements from the South African government. Data was scraped from the government's website: https://www.gov.za/cabinet-statements

Language: Jupyter Notebook - Size: 1.33 GB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 4 - Forks: 0

dsfsi/covid19za

Coronavirus COVID-19 (2019-nCoV) Data Repository and Dashboard for South Africa

Language: Jupyter Notebook - Size: 87.4 MB - Last synced at: 25 days ago - Pushed at: over 1 year ago - Stars: 254 - Forks: 199

dsfsi/za-marito

DSFSI South African Terminology Lists and Lexicon Project

Language: HTML - Size: 21.3 MB - Last synced at: 13 days ago - Pushed at: 3 months ago - Stars: 3 - Forks: 0

dsfsi/Higher_Education_EDA

This is an exploratory data analysis (EDA) repository for education researchers and practitioners.

Language: Jupyter Notebook - Size: 4.61 MB - Last synced at: 2 months ago - Pushed at: 9 months ago - Stars: 3 - Forks: 1

dsfsi/za-bank-risk

This repository is an initial pipeline for reading, processing, labelling and classifying unstructured annual reports of South African (SA) banks, with the aim of identifying financial risk. It builds on the Corporate Financial Information Environment–Final Report Structure Extractor (CFIE–FRSE) of El-Haj et al., which created a corpus of annual reports of United Kingdom (UK) companies.

Language: Jupyter Notebook - Size: 1.31 GB - Last synced at: about 5 hours ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

dsfsi/izindaba-zesizulu

Categorised isiZulu news. The source data is isiZulu news from SABC social media posts.

Size: 2.15 MB - Last synced at: 7 days ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

dsfsi/gov-za-sona-multilingual

Language: Python - Size: 14.5 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

dsfsi/PuoBERTa

A RoBERTa-based language model specially designed for Setswana, trained on the new PuoData dataset.

Language: Makefile - Size: 3.56 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 4 - Forks: 0

dsfsi/PuoData

Curated corpora for Setswana. Used to train PuoBERTa.

Size: 8.32 MB - Last synced at: about 10 hours ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

dsfsi/vukuzenzele-nlp (fork of dsfsi/dsfsi-dataset-template)

The dataset contains editions of the South African government magazine Vuk'uzenzele. Data was scraped from PDFs that have been placed in the data/raw folder. The PDFs were obtained from the Vuk'uzenzele website.

Language: Jupyter Notebook - Size: 5.28 GB - Last synced at: 7 months ago - Pushed at: over 1 year ago - Stars: 6 - Forks: 4

dsfsi/za-isizulu-siswati-news-2022

IsiZulu News (articles and headlines) and Siswati News (headlines) Corpora - za-isizulu-siswati-news-2022

Size: 292 KB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

dsfsi/edu-assessment-llm-prompt

Educational Assessment using LLMs

Language: Python - Size: 1.4 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 1

dsfsi/dlindaba-2019-uber

Uber rider rating data from the Deep Learning Indaba (DLIndaba) 2019

Size: 11.7 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

dsfsi/sa-parliament

South African Member Of Parliament Data

Language: Python - Size: 61.5 KB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 5

dsfsi/StatsSA-Language

StatsSA statistical language glossary in machine-readable format

Language: Jupyter Notebook - Size: 30.3 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 2

dsfsi/project-state-capture

Transcripts of the Zondo Commission (also known as the State Capture Commission)

Size: 50.3 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

dsfsi/healthfacilitymap

South African health facility map, created to aid the covid19za response.

Language: JavaScript - Size: 485 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 2

dsfsi/embedding-eval-data

Embedding Evaluation Data for South African Languages

Size: 8.79 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

dsfsi/za-fake-news-2020

Dataset of South African disinformation (fake news) website data collected in 2020

Size: 15.6 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 3