An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: text-as-data

MilaNLProc/contextualized-topic-models

A Python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021 (Bianchi et al.).

Language: Python - Size: 32 MB - Last synced at: about 1 hour ago - Pushed at: 4 months ago - Stars: 1,231 - Forks: 152

cjerzak/LinkOrgs-software

LinkOrgs: An R package for linking records on organizations using half a billion open-collaborated records from LinkedIn

Language: HTML - Size: 178 MB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 12 - Forks: 1

bgonzalezbustamante/TextClass-Benchmark

TextClass Benchmark Leaderboards

Language: Jupyter Notebook - Size: 148 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

jboynyc/textnets

Text analysis with networks.

Language: Python - Size: 2.92 MB - Last synced at: 8 days ago - Pushed at: 2 months ago - Stars: 285 - Forks: 25

varvarailyina/mds_thesis

All code and results for my MDS thesis at the Hertie School

Language: Jupyter Notebook - Size: 72.3 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 0 - Forks: 0

JasonKessler/scattertext

Beautiful visualizations of how language differs among document types.

Language: Python - Size: 39.4 MB - Last synced at: 19 days ago - Pushed at: about 1 month ago - Stars: 2,302 - Forks: 292

chkla/CSS-Events

Summer/winter schools, workshops, and conferences in computational social science 🫂

Size: 225 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 40 - Forks: 2

ichalkiad/datadescriptor_uselections2020

Code for collecting and cleaning speeches (text) of the US 2020 election campaign. Corresponding publication: "A text dataset of campaign speeches of the main tickets in the 2020 US presidential election", by Ioannis Chalkiadakis, Louise Anglès d’Auriac, Gareth W. Peters, and Divina Frau-Meigs

Language: Python - Size: 38.3 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

ryanjgallagher/shifterator

Interpretable data visualizations for understanding how texts differ at the word level

Language: Python - Size: 40.1 MB - Last synced at: 27 days ago - Pushed at: 4 months ago - Stars: 275 - Forks: 29

davidycliao/bisCrawler

An Automation Webcrawler for Extracting Central Bankers' Speeches

Language: Python - Size: 59.7 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 10 - Forks: 2

chkla/Populism-Text-Analysis

Literature 📄 and datasets 📚 on automatic populism detection

Size: 268 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 16 - Forks: 0

JasonKessler/Scattertext-PyData

Notebooks for the Seattle PyData 2017 talk on Scattertext

Language: HTML - Size: 20.7 MB - Last synced at: about 1 month ago - Pushed at: over 7 years ago - Stars: 142 - Forks: 53

marek-chadim/Empirical-Economics

Coding and Machine Learning for Economists PhD course

Language: HTML - Size: 326 MB - Last synced at: 22 days ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

fedenanni/Computational-Text-Analysis-2018-19

2018 Computational Text Analysis Notebooks, University of Mannheim

Language: Jupyter Notebook - Size: 25.3 MB - Last synced at: about 1 month ago - Pushed at: over 6 years ago - Stars: 13 - Forks: 7

Jszabo16/EU-sentiments_NRSR

Replication script for mining sentiments towards the EU from Parliamentary Speeches in the National Council of the Slovak Republic (1994-2023)

Language: R - Size: 93.8 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

umanlp/SemScale (fork of codogogo/topfish)

A tool for Semantic Scaling of Political Text (branch of Topfish, a suite of tools for Political Text Analysis)

Language: Python - Size: 17.5 MB - Last synced at: 6 months ago - Pushed at: over 1 year ago - Stars: 26 - Forks: 4

thieled/dictvectoR

'dictvectoR' measures the similarity between a concept dictionary and documents, using fastText word vectors. Implements the "Distributed-Dictionary-Representation" (Garten et al. 2018) method in R.

Language: R - Size: 5.6 MB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 9 - Forks: 2

WZBSocialScienceCenter/tm_corona

A small showcase for topic modeling with the tmtoolkit Python package. I use a corpus of articles from the German online news website Spiegel Online (SPON) to create a topic model for before and during the COVID-19 pandemic.

Language: Jupyter Notebook - Size: 51.5 MB - Last synced at: 3 months ago - Pushed at: over 4 years ago - Stars: 4 - Forks: 0

marcosfanton/stm_filobr

Use of structural topic modeling to analyze theses and dissertations from graduate programs in philosophy in Brazil.

Language: R - Size: 461 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

Sam-Gartenstein/Machine-Learning-for-the-Social-Sciences

Material from my Machine Learning for the Social Sciences course

Language: Jupyter Notebook - Size: 1.75 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

tweedmann/3x8emotions

Code and models for 3 different tools to measure appeals to 8 discrete emotions in German political text

Language: Jupyter Notebook - Size: 3.12 MB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 10 - Forks: 0

alexgatsby/Code-Samples---Alexsandra-Cavalcanti

A little sample of my recent work as a data analyst.

Language: HTML - Size: 11 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

aflueckiger/KED2022

The ABC of Computational Text Analysis. BA Seminar, Spring 2022, University of Lucerne

Language: HTML - Size: 187 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 0

BenjaminFReese/american_constitutional_praxis

This repository uses text-as-data methods alongside traditional primary source reading to analyze early American state constitutions. The R scripts create a function to scrape and clean the constitutional text, run sentiment analysis, calculate tf-idf, and perform LDA. This is a work-in-progress.

Language: HTML - Size: 2.78 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

CT-P/portuguese_open_data

Empirical framework applied to parliament discourses and Twitter data, with a Discourse Polarization Index.

Language: Jupyter Notebook - Size: 17.3 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

jfjelstul/regular-expressions-tutorial

A tutorial on using regular expressions in R

Size: 1.27 MB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 2 - Forks: 0

aflueckiger/KED2021

The ABC of Computational Text Analysis. BA Seminar, Spring 2021, University of Lucerne

Language: HTML - Size: 241 MB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

wesslen/summer2017-socialmedia

Summer 2017 Social Media Analytics Workshop Series

Language: HTML - Size: 22.5 MB - Last synced at: 2 months ago - Pushed at: about 7 years ago - Stars: 11 - Forks: 3

adamlauretig/gensim_in_R

Code for estimating word embeddings with gensim in R.

Size: 225 KB - Last synced at: over 2 years ago - Pushed at: over 6 years ago - Stars: 3 - Forks: 1

Related Keywords
text-as-data (29), computational-social-science (10), text-analysis (7), natural-language-processing (6), nlp (5), political-science (4), r (4), topic-modeling (4), machine-learning (3), sociology (3), sentiment-analysis (3), visualization (3), social-science (2), word2vec (2), teaching (2), word-embeddings (2), text-visualization (2), text-mining (2), webscraping (2), covid-19 (2), emotions (2), python (2), scraping (2), political-communication (2), word-vectors (1), corona (1), word-representations (1), scaling (1), ideology (1), news (1), dictionary (1), text-scaling (1), slovakia (1), parliamentary-debate (1), european-union (1), teaching-materials (1), workflow (1), version-control (1), replication (1), programming (1), gis (1), pydata (1), political-parties (1), gender (1), populism (1), literature-review (1), wordfish (1), constitution (1), latent-dirichlet-allocation (1), political-theory (1), tf-idf (1), discourse (1), gentzkow (1), political-polarization (1), regular-expressions (1), stringr (1), text-data (1), tidyverse (1), tutorial (1), facebook-api (1), geospatial (1), twitter-api (1), gensim (1), topicmodeling (1), capes (1), educacao (1), lda (1), philosophy (1), pos-graduacao (1), stm (1), neural-networks (1), supervised-machine-learning (1), unsupervised-machine-learning (1), data-visualization-python (1), elections (1), geobr (1), impeachment-of-brazilian-president (1), logistic-regression (1), python-functions (1), python-functions-examples (1), speeches (1), qwen2-5 (1), perspective-api (1), openai (1), ollama (1), nous-hermes (1), mistral (1), misinformation (1), llms-benchmarking (1), llm (1), llama (1), leaderboards (1), gpt-4o (1), gpt-4 (1), elo-rating (1), deepseek (1), transformer-architecture (1), record-linkage (1), organizational-units (1), jax (1)