An open API service providing repository metadata for many open source software ecosystems.

Topic: "text-data"

asyml/texar

Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/

Language: Python - Size: 13.6 MB - Last synced at: 15 days ago - Pushed at: over 3 years ago - Stars: 2,388 - Forks: 372

microsoft/DialoGPT

Large-scale pretraining for dialogue

Language: Python - Size: 43.6 MB - Last synced at: 14 days ago - Pushed at: over 2 years ago - Stars: 2,380 - Forks: 346

microsoft/GODEL

Large-scale pretrained models for goal-directed dialog

Language: Python - Size: 49.8 MB - Last synced at: 9 days ago - Pushed at: over 1 year ago - Stars: 865 - Forks: 112

asyml/texar-pytorch

Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/

Language: Python - Size: 3.08 MB - Last synced at: 20 days ago - Pushed at: about 3 years ago - Stars: 745 - Forks: 115

asyml/forte

Forte is a flexible and powerful ML workflow builder. This is part of the CASL project: http://casl-project.ai/

Language: Python - Size: 17.8 MB - Last synced at: 3 days ago - Pushed at: about 1 year ago - Stars: 244 - Forks: 60

thu-coai/cotk

Conversational Toolkit. An Open-Source Toolkit for Fast Development and Fair Evaluation of Text Generation

Language: Python - Size: 10.5 MB - Last synced at: 16 days ago - Pushed at: over 4 years ago - Stars: 127 - Forks: 26

LoLei/redditcleaner

Cleans Reddit Text Data 📜 🧹

Language: Python - Size: 41 KB - Last synced at: 27 days ago - Pushed at: about 5 years ago - Stars: 81 - Forks: 2

trinker/textreadr

Tools to uniformly read in text data including semi-structured transcripts

Language: R - Size: 1.78 MB - Last synced at: 27 days ago - Pushed at: about 2 years ago - Stars: 74 - Forks: 5

trinker/textshape

Tools for reshaping text data

Language: R - Size: 1.08 MB - Last synced at: 11 days ago - Pushed at: about 1 year ago - Stars: 50 - Forks: 2

PratikBarhate/question-classification

Question classification for the CogComp QC dataset (http://cogcomp.org/Data/QA/QC/).

Language: Python - Size: 57.2 MB - Last synced at: 10 months ago - Pushed at: over 4 years ago - Stars: 29 - Forks: 13

BALaka-18/rake_new2

A Python library that enables smooth keyword extraction from any text using the RAKE (Rapid Automatic Keyword Extraction) algorithm.

Language: Python - Size: 15.5 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 28 - Forks: 20

YaleDHLab/wordmap 📦

Visualize large text collections with WebGL

Language: JavaScript - Size: 7.02 MB - Last synced at: 13 days ago - Pushed at: 8 months ago - Stars: 25 - Forks: 5

carted/processing-text-data

Presents an optimized Apache Beam pipeline for generating sentence embeddings (runnable on Cloud Dataflow).

Language: Python - Size: 16.6 KB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 20 - Forks: 6

tylerjthomas9/ScrapeSEC.jl

Scrape EDGAR filings from https://www.sec.gov/

Language: Julia - Size: 199 KB - Last synced at: 6 days ago - Pushed at: about 2 months ago - Stars: 14 - Forks: 0

PedroBarcha/old-books-dataset

Old book pages (with groundtruth), formerly used for OCR studies. There are several versions of the set (concerning resolution and binarization). Noised and denoised sets (done by several methods) are eventually going to be uploaded.

Language: HTML - Size: 1.29 GB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 11 - Forks: 2

tayebiarasteh/retweet

How Will Your Tweet Be Received? Predicting the Sentiment Polarity of Tweet Replies

Language: Python - Size: 155 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 9 - Forks: 6

Hsankesara/The-Tweets-of-Wisdom

A dataset that contains 30k+ so-called "self-help" tweets from 100+ authors.

Language: Jupyter Notebook - Size: 5.35 MB - Last synced at: 26 days ago - Pushed at: over 5 years ago - Stars: 9 - Forks: 2

mrchypark/gomSubtitleData

Code for collecting GOM TV (곰tv) subtitle data

Language: R - Size: 145 MB - Last synced at: about 23 hours ago - Pushed at: about 8 years ago - Stars: 6 - Forks: 6

SignalN/parallelio

For reading from and writing to parallel data files in Python

Language: Python - Size: 10.7 KB - Last synced at: 11 days ago - Pushed at: over 7 years ago - Stars: 3 - Forks: 0

XMU-Kuangnan-Fang-Team/SpecificLDA

A Python package implementing the Directed LDA model for targeted extraction of specific topics from text data

Language: Python - Size: 510 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 2 - Forks: 1

jfjelstul/regular-expressions-tutorial

A tutorial on using regular expressions in R

Size: 1.27 MB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 0

ccubc/GlassdoorReviews

Classifying employee reviews on glassdoor.com

Language: Jupyter Notebook - Size: 711 KB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 1

DolbyUUU/Top-Economics-Journals-Publications-Dataset

Top Economics Journals Publications Dataset and Data Analysis: Top 5 English Journals and Top 3 Chinese Journals

Language: Python - Size: 20.4 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

DolbyUUU/Focus-Report-Dataset

Text Data and Data Analysis of Focus Report (焦点访谈), a Chinese Investigative TV Program, 2003-2023

Language: Python - Size: 3.5 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

FareedKhan-dev/NLP-1K-Stories-Dataset-Genres-100

This repository hosts a diverse NLP dataset comprising 1,000 stories spanning 100 genres for comprehensive language understanding tasks.

Size: 2.17 MB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

ptthanh02/VietNam-News-Crawler

Language: Jupyter Notebook - Size: 229 KB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

cauchi94/airbnb-customer-sentiment

Analyzes text data by extracting the main topics from an Airbnb dataset with Latent Dirichlet Allocation (LDA), then uses linear regression to interpret the topics.

Language: Jupyter Notebook - Size: 584 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

bchryzal/Detecting-Generated-Scientific-Papers

Can you spot automatically generated scientific excerpts?

Language: Jupyter Notebook - Size: 514 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

Ankit152/StackOverflow-Tag-Prediction

A machine learning model that predicts tags for a given question and body.

Language: Jupyter Notebook - Size: 1.02 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

vraul92/NLP-on-Whatsapp-Group-Chat

Applying NLP techniques to WhatsApp text to gain insights.

Language: Jupyter Notebook - Size: 128 KB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 1

KlaraGtknst/text_topic

This repository implements a pipeline that stores various fields extracted from the files of a large unstructured dataset. These fields are used for topic modeling (word clouds, based on low-dimensional versions of embedding vectors, named entity clustering, and document-topic incidences). The information is aggregated and visualised using FCA.

Language: Python - Size: 4.22 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

FinnishCancerRegistry/fwf

Read and write fixed-width format data.

Language: R - Size: 17.6 KB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 0 - Forks: 0

Infinitode/CRSD

A synthetic customer review sentiment dataset for sentiment analysis generated using different AI models.

Size: 83 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

Infinitode/DupliPy

DupliPy is a quick and easy-to-use package that can handle text formatting and data augmentation tasks for NLP in Python. It now offers support for image augmentation tasks as well.

Language: Python - Size: 65.4 KB - Last synced at: 6 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

DolbyUUU/Spring-Festival-Gala-Dataset

Text Data and Data Analysis of Chinese Spring Festival Gala Comedy Sketches Over 40 Years

Language: Python - Size: 0 Bytes - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

sevvalckc/Turkish-SAD

Python script for sentiment analysis of Turkish text data using multiple pre-trained transformer models, plus a list of Turkish sentiment analysis datasets from 2012 to 2022.

Language: Python - Size: 144 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

MHenderson/pages2df

Read morning pages into a data frame in R.

Language: R - Size: 8.79 KB - Last synced at: about 2 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

Nexdata-AI/28237-Intent-type-single-sentence-annotation-data

28237-Intent-type-single-sentence-annotation-data

Size: 264 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

Nexdata-AI/80000-sets-Multi-domain-Customer-Service-Dialogue-Text-Data

80000-sets-Multi-domain-Customer-Service-Dialogue-Text-Data

Size: 304 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

Nexdata-AI/13000000-Groups-Man-Machine-Conversation-Interactive-Text-Data

13000000-Groups-Man-Machine-Conversation-Interactive-Text-Data

Size: 1.39 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

Nexdata-AI/8178-Chinese-Social-Comments-Events-Annotation-Data

8178-Chinese-Social-Comments-Events-Annotation-Data

Size: 711 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

Nexdata-AI/13-Modules-Entity-Name-Single-sentence-Annotation-Data

13-Modules-Entity-Name-Single-sentence-Annotation-Data

Size: 3.75 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

putuwaw/slr-emotion-classification

Systematic Literature Review: Machine Learning Methods in Emotion Classification in Textual Data

Language: Jupyter Notebook - Size: 1.86 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 1

sugatagh/Natural-Language-Processing-with-Disaster-Tweets

The objective of the project is to predict whether a particular tweet, of which the text (and occasionally the keyword and the location as well) is provided, indicates a real disaster or not. We use various NLP techniques and classification models for this purpose and objectively compare these models by means of an appropriate evaluation metric.

Language: Jupyter Notebook - Size: 4.24 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

Xin-Bu/Coffee_review_text_QA_LLMs

Connects to OpenAI, applies Large Language Models (LLMs) & LangChain, and builds a platform to chat with coffee customer review text data using Python. Visualizes text data with R

Language: HTML - Size: 907 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

TZNcse209/Text-Data-Sentiment-Analysis

Text Data: Sentiment Analysis

Language: Jupyter Notebook - Size: 47.9 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

PriyankaSett/predicting_instagram_likes

The aim of this work is to predict the number of Instagram likes. The text vectorization is done using a TF-IDF vectorizer.

Language: Jupyter Notebook - Size: 641 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

intro-to-data-science-22-workshop/10-Text-analysis-with-quanteda-roa-fonseca-kraess

Welcome to the amazing world of quanteda. Text analysis, allocations, sentiment analysis and more. Welcome!

Language: HTML - Size: 50.2 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 1

k-loki/Extract-tech-skills

Extract technical skills from data of skills

Language: Jupyter Notebook - Size: 222 KB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

shoaebkiyani/magic_rules

This project is built using React. The data is fetched online as text and then split into arrays according to the Table of Contents. For example, main headings go in one array, subheadings in another, and the content in a third. Finally, the data is displayed on the screen.

Language: JavaScript - Size: 381 KB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

AkashBangalkar/Amazon-Apparel-Recommendations-System

Machine Learning - Content Based Recommendation System

Language: Jupyter Notebook - Size: 8.51 MB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 1

mounaiban/bakdoh

Just a bunch of experiments with embedded graph databases

Language: Python - Size: 428 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

python-supply/strings-regular-expressions-and-text-data-analysis

While built-in string methods and regular expressions have limitations, they can be leveraged in creative ways to implement scalable workflows that process and analyze text data. This article explores these tools and introduces a few useful peripheral techniques within the context of a use case involving a large text data corpus.

Language: Jupyter Notebook - Size: 8.95 MB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

YugantM/yugantm.github.io

Language: HTML - Size: 6.8 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

roshankoirala/WSS_2019_Roshan-Koirala (fork of KyleKeane/WSS-Template)

An algorithm that generates word clouds for time-varying text data while minimizing the relative movement of words over time.

Language: Mathematica - Size: 53.1 MB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

chaitanyakasaraneni/nlp_pipeline

This repository contains examples of the stages in an NLP pipeline.

Language: Jupyter Notebook - Size: 638 KB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 3

ibraaaa/news-credibility

Size: 142 MB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0