An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: farsi-datasets

M-Taghizadeh/Persian_Question_Answering_Voice2Voice_AI

This repository hosts BonyadAI, a Persian question-answering AI model. In the first phase, we developed a web crawler and scraper to gather the dataset. The second phase involved building a machine learning model based on word embeddings and NLP techniques. The model operates end-to-end, receiving the user's voice input and responding in spoken Persian.

Language: Jupyter Notebook - Size: 89.4 MB - Last synced at: 6 days ago - Pushed at: 10 months ago - Stars: 5 - Forks: 3

kargaranamir/Persian-Datasets

Persian Datasets including: Wikipedia, Twitter, Hamshahri, Hellokish, NSURL'19, Peyma, Text_mining.ir

Size: 1000 Bytes - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 9 - Forks: 1

RezaGooner/PerSent

Python library for analyzing Persian text. It currently supports analyzing customer opinions and their offer status, and detecting seven emotions in Persian sentences.

Language: Python - Size: 233 MB - Last synced at: 15 days ago - Pushed at: about 1 month ago - Stars: 3 - Forks: 1

farbodbj/iranian-surname-frequencies

Welcome to the Persian Last Names Dataset, a comprehensive collection of over 100,000 Persian surnames accompanied by their respective frequencies. This dataset is curated from a substantial real-world sample of more than 10 million records, ensuring reliable and representative data for various applications.

Size: 2.12 MB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 4 - Forks: 0

farbodbj/persian-gender-by-name

A comprehensive dataset for determining gender based on Persian names, enriched with English representations.

Size: 847 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 32 - Forks: 4

karim23657/Persian-tts-coqui

Persian/Farsi text-to-speech (TTS) training using Coqui TTS

Language: Jupyter Notebook - Size: 53.7 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 128 - Forks: 18

nabidam/persian-names

Persian names dataset

Language: Python - Size: 267 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 5 - Forks: 0

amirzenoozi/persian-news-crawler

Simple script to crawl data from Persian news agencies, including Fars and Mehr.

Language: Python - Size: 101 KB - Last synced at: 27 days ago - Pushed at: over 1 year ago - Stars: 5 - Forks: 0

sajjjadayobi/CLIPfa

CLIPfa: Connecting Farsi Text and Images

Language: Jupyter Notebook - Size: 6.7 MB - Last synced at: 6 months ago - Pushed at: over 2 years ago - Stars: 80 - Forks: 8

acdh-oeaw/pes_eng_dict-app

The Small Farsi-English Internet Dictionary is a bilingual dictionary that has been in development for many years as part of a language teaching programme at the Department of Oriental Studies at the University of Vienna.

Language: XSLT - Size: 67.4 KB - Last synced at: 2 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

Sarasadeghii/Sharif-Wav2vec2

This repo shows how to fine-tune the wav2vec 2.0 model, along with its prerequisites.

Language: Jupyter Notebook - Size: 297 KB - Last synced at: 11 months ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 0

mmahdibarghi/finglish-dataset

Persian-to-Finglish dataset with voice recordings of all sentences; a TTS dataset used to train Tacotron 2

Language: Python - Size: 91.9 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 5 - Forks: 2

mehrdad-dev/persis

Official GitHub repository for Persis: a Persian font recognition pipeline using convolutional neural networks.

Size: 1.14 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 0

FtmsdtHosseini/IDPL-PFOD

An Image Dataset of Printed Farsi Text for OCR Research

Size: 2.79 MB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 12 - Forks: 1

Sarasadeghii/Sharif-WavLM

In this repository, the WavLM model is applied to both high-quality and poor-quality data for the speaker verification task, and the PyCM library is used for evaluation.

Language: Jupyter Notebook - Size: 744 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

arm-on/PREDICT-Persian-Reverse-Dictionary (fork of AUT-Data-Group/PREDICT-Persian-Reverse-Dictionary)

The first intelligent Persian reverse dictionary

Language: Jupyter Notebook - Size: 171 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 0

mallahyari/Farsi-datasets

A collection of Farsi (Persian) datasets

Language: Python - Size: 381 KB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 21 - Forks: 6

mo-amininasab/FarsiDigits-Recognizer

This is a trained model to recognize Farsi digits.

Language: Jupyter Notebook - Size: 16 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

Zarharan/ParsFEVER

The first dataset for Farsi fact extraction and verification

Language: JavaScript - Size: 2.16 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 1

Related Keywords
farsi-datasets (19), farsi (10), dataset (9), persian (8), persian-dataset (7), nlp (6), machine-learning (4), python (4), persian-language (4), deep-learning (3), natural-language-processing (3), persian-nlp (3), ocr (2), crawler (2), tts (2), speech-to-text (2), text-to-speech (2), database (2), text-processing (2), research (1), optical-character-recognition (1), ieee (1), research-paper (1), vfr (1), visual-font-recognition (1), farsi-ocr (1), farsi-ocr-dataset (1), image-classification (1), font-recognition (1), convolutional-neural-networks (1), cnn (1), tts-dataset (1), finglish-dataset (1), xlsr (1), wer (1), wav2vec2 (1), speech-recognition (1), language-model (1), kenlm (1), tei-xml (1), tei-lex0 (1), persian-fact-checking-dataset (1), persian-fact-checking (1), fact-verification (1), fact-extraction (1), fact-checking-guideline (1), fact-checking (1), annotation-tool (1), svc (1), logistic-regression (1), hoda-dataset (1), gaussian-naive-bayes (1), farsi-nlp (1), reverse-dictionary (1), computational-linguistics (1), wavlm (1), speaker-verification (1), pycm (1), confusion-matrix (1), persian-ocr-dataset (1), persian-ocr (1), ocr-recognition (1), ocr-python (1), image-processing (1), image-generators (1), image-generation (1), speech (1), persian-tts (1), hifigan (1), glow-tts (1), coqui-tts (1), coqui-ai (1), coqui (1), text-classification (1), sentimental-analysis (1), sentiment-classification (1), sentiment-analysis (1), python-library (1), pip (1), opensource (1), library (1), farsi-language (1), emotion-analysis (1), word2vec (1), transformer-architecture (1), scraping-python (1), question-answering (1), large-language-models (1), corpus-linguistics (1), artificial-intelligence (1), dictionaries (1), zero-shot-learning (1), openai-clip (1), image-search (1), clip (1), tensorflow2 (1), tensorflow (1), sqlite3 (1), shargh-news (1), script (1)