Topic: "text-data"
asyml/texar
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/
Language: Python - Size: 13.6 MB - Last synced at: 15 days ago - Pushed at: over 3 years ago - Stars: 2,388 - Forks: 372

microsoft/DialoGPT
Large-scale pretraining for dialogue
Language: Python - Size: 43.6 MB - Last synced at: 14 days ago - Pushed at: over 2 years ago - Stars: 2,380 - Forks: 346

microsoft/GODEL
Large-scale pretrained models for goal-directed dialog
Language: Python - Size: 49.8 MB - Last synced at: 9 days ago - Pushed at: over 1 year ago - Stars: 865 - Forks: 112

asyml/texar-pytorch
Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/
Language: Python - Size: 3.08 MB - Last synced at: 20 days ago - Pushed at: about 3 years ago - Stars: 745 - Forks: 115

asyml/forte
Forte is a flexible and powerful ML workflow builder. This is part of the CASL project: http://casl-project.ai/
Language: Python - Size: 17.8 MB - Last synced at: 3 days ago - Pushed at: about 1 year ago - Stars: 244 - Forks: 60

thu-coai/cotk
Conversational Toolkit. An Open-Source Toolkit for Fast Development and Fair Evaluation of Text Generation
Language: Python - Size: 10.5 MB - Last synced at: 16 days ago - Pushed at: over 4 years ago - Stars: 127 - Forks: 26

LoLei/redditcleaner
Cleans Reddit Text Data 📜 🧹
Language: Python - Size: 41 KB - Last synced at: 27 days ago - Pushed at: about 5 years ago - Stars: 81 - Forks: 2

trinker/textreadr
Tools to uniformly read in text data including semi-structured transcripts
Language: R - Size: 1.78 MB - Last synced at: 27 days ago - Pushed at: about 2 years ago - Stars: 74 - Forks: 5

trinker/textshape
Tools for reshaping text data
Language: R - Size: 1.08 MB - Last synced at: 11 days ago - Pushed at: about 1 year ago - Stars: 50 - Forks: 2

PratikBarhate/question-classification
Question classification on the CogComp QC dataset (http://cogcomp.org/Data/QA/QC/).
Language: Python - Size: 57.2 MB - Last synced at: 10 months ago - Pushed at: over 4 years ago - Stars: 29 - Forks: 13

BALaka-18/rake_new2
A Python library that enables smooth keyword extraction from any text using the RAKE (Rapid Automatic Keyword Extraction) algorithm.
Language: Python - Size: 15.5 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 28 - Forks: 20

YaleDHLab/wordmap 📦
Visualize large text collections with WebGL
Language: JavaScript - Size: 7.02 MB - Last synced at: 13 days ago - Pushed at: 8 months ago - Stars: 25 - Forks: 5

carted/processing-text-data
Presents an optimized Apache Beam pipeline for generating sentence embeddings (runnable on Cloud Dataflow).
Language: Python - Size: 16.6 KB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 20 - Forks: 6

tylerjthomas9/ScrapeSEC.jl
Scrape EDGAR filings from https://www.sec.gov/
Language: Julia - Size: 199 KB - Last synced at: 6 days ago - Pushed at: about 2 months ago - Stars: 14 - Forks: 0

PedroBarcha/old-books-dataset
Old book pages (with groundtruth), formerly used for OCR studies. There are several versions of the set (concerning resolution and binarization). Noised and denoised sets (done by several methods) are eventually going to be uploaded.
Language: HTML - Size: 1.29 GB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 11 - Forks: 2

tayebiarasteh/retweet
How Will Your Tweet Be Received? Predicting the Sentiment Polarity of Tweet Replies
Language: Python - Size: 155 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 9 - Forks: 6

Hsankesara/The-Tweets-of-Wisdom
A dataset that contains 30k+ so-called "self-help" tweets from 100+ authors.
Language: Jupyter Notebook - Size: 5.35 MB - Last synced at: 26 days ago - Pushed at: over 5 years ago - Stars: 9 - Forks: 2

mrchypark/gomSubtitleData
Code for collecting GOM TV subtitle data
Language: R - Size: 145 MB - Last synced at: about 23 hours ago - Pushed at: about 8 years ago - Stars: 6 - Forks: 6

SignalN/parallelio
For reading from and writing to parallel data files in Python
Language: Python - Size: 10.7 KB - Last synced at: 11 days ago - Pushed at: over 7 years ago - Stars: 3 - Forks: 0

XMU-Kuangnan-Fang-Team/SpecificLDA
A Python package implementing the Directed LDA model for targeted extraction of specific topics from text data
Language: Python - Size: 510 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 2 - Forks: 1

jfjelstul/regular-expressions-tutorial
A tutorial on using regular expressions in R
Size: 1.27 MB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 0

ccubc/GlassdoorReviews
Classifying employee reviews on glassdoor.com
Language: Jupyter Notebook - Size: 711 KB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 1

DolbyUUU/Top-Economics-Journals-Publications-Dataset
Top Economics Journals Publications Dataset and Data Analysis: Top 5 English Journals and Top 3 Chinese Journals
Language: Python - Size: 20.4 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

DolbyUUU/Focus-Report-Dataset
Text Data and Data Analysis of Focus Report (焦点访谈), a Chinese Investigative TV Program, 2003-2023
Language: Python - Size: 3.5 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

FareedKhan-dev/NLP-1K-Stories-Dataset-Genres-100
This repository hosts a diverse NLP dataset comprising 1,000 stories spanning 100 genres for comprehensive language understanding tasks.
Size: 2.17 MB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

ptthanh02/VietNam-News-Crawler
Language: Jupyter Notebook - Size: 229 KB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

cauchi94/airbnb-customer-sentiment
Analysis of text data that extracts the main topics from an Airbnb dataset using Latent Dirichlet Allocation (LDA) and then uses linear regression to interpret the topics.
Language: Jupyter Notebook - Size: 584 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

bchryzal/Detecting-Generated-Scientific-Papers
Can you spot automatically generated scientific excerpts?
Language: Jupyter Notebook - Size: 514 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

Ankit152/StackOverflow-Tag-Prediction
A machine learning model that predicts tags for a given question and body.
Language: Jupyter Notebook - Size: 1.02 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

vraul92/NLP-on-Whatsapp-Group-Chat
Applying NLP techniques on WhatsApp text to gain insights.
Language: Jupyter Notebook - Size: 128 KB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 1

KlaraGtknst/text_topic
This repository implements a pipeline that stores various data fields from the files of a large unstructured dataset. These fields are used for topic modeling (wordclouds, based on low-dimensional versions of embedding vectors, Named Entity Clustering and document-topic incidences). The information is aggregated and visualised using FCA.
Language: Python - Size: 4.22 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

FinnishCancerRegistry/fwf
Read and write fixed-width format data.
Language: R - Size: 17.6 KB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 0 - Forks: 0

Infinitode/CRSD
A synthetic customer review sentiment dataset for sentiment analysis generated using different AI models.
Size: 83 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

Infinitode/DupliPy
DupliPy is a quick and easy-to-use package that can handle text formatting and data augmentation tasks for NLP in Python. It now offers support for image augmentation tasks as well.
Language: Python - Size: 65.4 KB - Last synced at: 6 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

DolbyUUU/Spring-Festival-Gala-Dataset
Text Data and Data Analysis of Chinese Spring Festival Gala Comedy Sketches Over 40 Years
Language: Python - Size: 0 Bytes - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

sevvalckc/Turkish-SAD
Python script that performs sentiment analysis on Turkish text data using multiple pre-trained transformer models, plus a list of Turkish sentiment analysis datasets from 2012 to 2022.
Language: Python - Size: 144 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

MHenderson/pages2df
Read morning pages into a data frame in R.
Language: R - Size: 8.79 KB - Last synced at: about 2 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

Nexdata-AI/28237-Intent-type-single-sentence-annotation-data
28237-Intent-type-single-sentence-annotation-data
Size: 264 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

Nexdata-AI/80000-sets-Multi-domain-Customer-Service-Dialogue-Text-Data
80000-sets-Multi-domain-Customer-Service-Dialogue-Text-Data
Size: 304 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

Nexdata-AI/13000000-Groups-Man-Machine-Conversation-Interactive-Text-Data
13000000-Groups-Man-Machine-Conversation-Interactive-Text-Data
Size: 1.39 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

Nexdata-AI/8178-Chinese-Social-Comments-Events-Annotation-Data
8178-Chinese-Social-Comments-Events-Annotation-Data
Size: 711 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

Nexdata-AI/13-Modules-Entity-Name-Single-sentence-Annotation-Data
13-Modules-Entity-Name-Single-sentence-Annotation-Data
Size: 3.75 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

putuwaw/slr-emotion-classification
Systematic Literature Review: Machine Learning Methods in Emotion Classification in Textual Data
Language: Jupyter Notebook - Size: 1.86 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 1

sugatagh/Natural-Language-Processing-with-Disaster-Tweets
The objective of the project is to predict whether a particular tweet, of which the text (and occasionally the keyword and the location as well) is provided, indicates a real disaster or not. We use various NLP techniques and classification models for this purpose and objectively compare these models by means of an appropriate evaluation metric.
Language: Jupyter Notebook - Size: 4.24 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

Xin-Bu/Coffee_review_text_QA_LLMs
Connects to OpenAI, applies Large Language Models (LLMs) & LangChain, and builds a platform to chat with coffee customer review text data using Python. Visualizes text data with R
Language: HTML - Size: 907 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

TZNcse209/Text-Data-Sentiment-Analysis
Text Data: Sentiment Analysis
Language: Jupyter Notebook - Size: 47.9 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

PriyankaSett/predicting_instagram_likes
The aim of this work is to predict the number of Instagram likes. The text vectorization is done using a TF-IDF vectorizer.
Language: Jupyter Notebook - Size: 641 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

intro-to-data-science-22-workshop/10-Text-analysis-with-quanteda-roa-fonseca-kraess
Welcome to the amazing world of quanteda. Text analysis, allocations, sentiment analysis and more. Welcome!
Language: HTML - Size: 50.2 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 1

k-loki/Extract-tech-skills
Extract technical skills from data of skills
Language: Jupyter Notebook - Size: 222 KB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

shoaebkiyani/magic_rules
This project is built using React. The data is fetched online as text and then split into arrays according to the Table of Contents. For example, main headings are put in one array, subheadings in another, and the content in a third. Finally, the data is displayed on the screen.
Language: JavaScript - Size: 381 KB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

AkashBangalkar/Amazon-Apparel-Recommendations-System
Machine Learning - Content Based Recommendation System
Language: Jupyter Notebook - Size: 8.51 MB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 1

mounaiban/bakdoh
Just a bunch of experiments with embedded graph databases
Language: Python - Size: 428 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

python-supply/strings-regular-expressions-and-text-data-analysis
While built-in string methods and regular expressions have limitations, they can be leveraged in creative ways to implement scalable workflows that process and analyze text data. This article explores these tools and introduces a few useful peripheral techniques within the context of a use case involving a large text data corpus.
Language: Jupyter Notebook - Size: 8.95 MB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

YugantM/yugantm.github.io
Language: HTML - Size: 6.8 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

roshankoirala/WSS_2019_Roshan-Koirala (fork of KyleKeane/WSS-Template)
An algorithm that generates word clouds for time-varying text data while minimizing the relative movement of words over time.
Language: Mathematica - Size: 53.1 MB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

chaitanyakasaraneni/nlp_pipeline
This repository contains examples of the stages in an NLP pipeline
Language: Jupyter Notebook - Size: 638 KB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 3

ibraaaa/news-credibility
Size: 142 MB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0
