Ecosyste.ms: Repos
An open API service providing repository metadata for many open source software ecosystems.
Package Usage: pypi: trafilatura
Python package and command-line tool designed to gather text on the Web. It includes discovery, extraction and text processing components. Its main applications are web crawling, downloads, scraping, and extraction of main texts, metadata and comments.
40 versions
Latest release: 5 months ago
62 dependent packages
190,592 downloads last month
View more package details: https://packages.ecosyste.ms/registries/pypi.org/packages/trafilatura
View more repository details: https://repos.ecosyste.ms/hosts/GitHub/repositories/adbar%2Ftrafilatura
Dependent Repos 1,140
admariner/obsei Fork of obsei/obsei
Obsei is a low code AI powered automation tool. It can be used in various business flows like social listening, AI based alerting, brand image analysis, comparative study and more.
- * binder/requirements.txt
- * sample-ui/requirements.txt
Size: 16.3 MB - Last synced: 5 days ago - Pushed: 18 days ago
noandrea/theNewsroom
what country is on the spot today
- 0.3.0 poetry.lock
- ^0.3.0 pyproject.toml
- ==0.3.0 requirements.txt
Size: 6.35 MB - Last synced: about 1 year ago - Pushed: over 1 year ago
Nanosplitter/DadBot
A feature-filled Discord bot
- * requirements.txt
Size: 1.99 MB - Last synced: about 2 months ago - Pushed: about 2 months ago
BeataStultica/WebDataScanner
- ==1.0.0 requirements.txt
Size: 2.04 MB - Last synced: about 1 year ago - Pushed: over 2 years ago
shaneluna/disCOVr
This project was created to analyze misinformation on Twitter regarding COVID-19. The objective is to create a Neo4j graph database with relevant data for querying and analysis.
- ==1.0.0 requirements.txt
Size: 127 KB - Last synced: about 1 year ago - Pushed: over 2 years ago
keshe4ka/analytic_web_organiser
Analytical web organizer of article bookmarks for studying a subject area
- ==1.2.2 requirements.txt
Size: 209 KB - Last synced: about 1 year ago - Pushed: almost 2 years ago
BeyondMachines/goya-core
- ==1.2.1 goya_core/requirements.txt
Size: 3.06 MB - Last synced: about 1 year ago - Pushed: over 1 year ago
adbar/trafilatura
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
- * docs/requirements.txt
- ==1.2.2 tests/eval-requirements.txt
Size: 23.1 MB - Last synced: 5 days ago - Pushed: 7 days ago
ELTE-DH/HTML2TEI
Map the HTML schema of portals to valid TEI XML with the tags and structures used in them using small manual portal-specific configurations
- 1.0.0 poetry.lock
- ^1.0.0 pyproject.toml
Size: 17.6 MB - Last synced: 23 days ago - Pushed: 12 months ago
ShrutiBiradarrr/test
- * openagent/knowledgebase/document_loaders/web/trafilatura_web/requirements.txt
- 1.6.1 poetry.lock
- ^1.6 pyproject.toml
- ==1.6.1 requirements.txt
Size: 5.97 MB - Last synced: 4 months ago - Pushed: 9 months ago
obsei/obsei
Obsei is a low code AI powered automation tool. It can be used in various business flows like social listening, AI based alerting, brand image analysis, comparative study and more.
- * sample-ui/requirements.txt
Size: 16.2 MB - Last synced: 3 days ago - Pushed: 3 days ago
kevinriste/podcast-transcribe
- * imap/Pipfile
- ==1.0.0 imap/Pipfile.lock
Size: 431 KB - Last synced: 2 months ago - Pushed: 2 months ago
dltj/km-tools
Personal Knowledge Management tools
- * Pipfile
- ==1.0.0 Pipfile.lock
Size: 215 KB - Last synced: about 1 month ago - Pushed: about 1 month ago
rudrajikadra/document-viewer-with-enhanced-reading-experience
- ==1.3.0 requirements.txt
Size: 127 MB - Last synced: 7 months ago - Pushed: almost 2 years ago
rmwkwok/crawler
Multi-process crawler which extracts main content and sustains itself by extracting more links to crawl.
- ==0.7.0 requirements.txt
Size: 85.9 KB - Last synced: 9 months ago - Pushed: about 3 years ago
GLAM-Workbench/web-archives
- * requirements-unpinned.txt
- * requirements.in
- ==1.2.0 requirements.txt
Size: 40.5 MB - Last synced: about 1 year ago - Pushed: about 1 year ago
christianvadillo/InfoVac
Final project repository created at Saturdays.ai, LATAM edition
- ==0.5.2 requirements.txt
Size: 52.6 MB - Last synced: about 1 year ago - Pushed: over 3 years ago
FutureMakers2022Team13/PolitiParser
- ==1.3.0 website/requirements.txt
Size: 6.42 MB - Last synced: 7 months ago - Pushed: almost 2 years ago
waser-technologies/data/nlu/en/web-search
Get summarized answers from the web.
- * requirements.txt
Last synced: about 1 year ago
internetarchive/sandcrawler
Backend, IA-specific tools for crawling and processing the scholarly web. Content ends up in https://fatcat.wiki
- >=1 python/Pipfile
- ==1.0.0 python/Pipfile.lock
Size: 2.55 MB - Last synced: about 1 month ago - Pushed: over 1 year ago
Leibniz-HBI/newsfeedback
Tool for extracting and saving news article metadata (and optionally content) at regular intervals.
- ^1.4.1 pyproject.toml
Size: 230 KB - Last synced: about 1 year ago - Pushed: about 1 year ago
waser-technologies/data/nlu/fr/web-search
Get summarized answers from the web.
- * requirements.txt
Last synced: over 1 year ago
varisha-025/fake-news-app
This website is an ML model created using Sklearn and NLP, integrated with Django, hosted on Heroku. It predicts whether the given news headline or news URL is fake or not. The dataset we used is the one available on Kaggle, with some Asian news web-scraped from the internet using the trafilatura library.
- ==0.9.1 requirements.txt
Size: 5.53 MB - Last synced: about 1 month ago - Pushed: almost 2 years ago
leonov-av/vulristics
Extensible framework for analyzing publicly available information about vulnerabilities
- * requirements.txt
Size: 1.67 MB - Last synced: 8 days ago - Pushed: 9 days ago
nilecui/keywords_en
- ==1.3.0 requirements_dev.txt
Size: 36.1 KB - Last synced: 14 days ago - Pushed: almost 2 years ago
Ben-Apps/Backend_Tigergraph_KnowledgeKeeper
- * requirements.txt
Size: 5.92 MB - Last synced: about 1 year ago - Pushed: about 2 years ago
moehmeni/ezweb
Easy to use web page analyzer
- * requirements.txt
- 1.4.0 poetry.lock
- ^1.4.0 pyproject.toml
Size: 533 KB - Last synced: 6 days ago - Pushed: over 1 year ago
EMU-Compsci-Discord/CompsciBot
A discord bot for a computer science discord server
- * requirements.txt
Size: 1.27 MB - Last synced: 27 days ago - Pushed: 27 days ago
Giriraj-Roy/Fake_News_Detector
- ==0.9.1 requirements.txt
Size: 6.07 MB - Last synced: about 1 year ago - Pushed: over 2 years ago
vsalamand/superscraper
- ==1.2.0 Pipfile.lock
Size: 1.1 MB - Last synced: about 1 year ago - Pushed: about 2 years ago
marcus-dislab/Data-for-Discourse-Analysis
- ==1.2.2 requirements.txt
Size: 2.85 MB - Last synced: about 1 year ago - Pushed: almost 2 years ago
canelhasmateus/roi
A collection of data engineering scripts for the Gnosis project.
- 1.2.2 poetry.lock
- ^1.2.2 pyproject.toml
Size: 28.4 MB - Last synced: about 1 year ago - Pushed: over 1 year ago
scrapinghub/article-extraction-benchmark
Article extraction benchmark: dataset and evaluation scripts
- ==0.5.1 requirements.txt
Size: 10.6 MB - Last synced: about 1 month ago - Pushed: about 1 month ago
nhatduy227/FinHelp
- ==1.2.0 Backend/requirements.txt
Size: 10.3 MB - Last synced: 10 months ago - Pushed: about 2 years ago
zeoagency/mobile-first-indexing-tool
Mobile First Indexing Tool
- * mfi-contents/requirements.txt
Size: 43.6 MB - Last synced: about 1 year ago - Pushed: over 1 year ago
hirofumi/docker-trafilatura
- ==0.8.2 requirements.txt
Size: 19.5 KB - Last synced: about 1 month ago - Pushed: about 1 month ago
nddbk/microservices
A set of regular microservices for practice and reference
- 0.9.3 article-parser-py/poetry.lock
- ^0.9.3 article-parser-py/pyproject.toml
Size: 248 KB - Last synced: 28 days ago - Pushed: about 1 year ago
KajaBraz/Mini-Projects
Some of the projects developed for learning reasons.
- ==0.8.2 requirements.txt
Size: 29 MB - Last synced: 4 months ago - Pushed: 4 months ago
oknowl/ldig
ldig - Link Digger - Finding sources online
- * requirements.txt
Size: 33.2 KB - Last synced: about 1 year ago - Pushed: almost 2 years ago
nakamichiworks/openai-bot
Slack bot for OpenAI API
- 1.4.1 backend/poetry.lock
- ^1.4.1 backend/pyproject.toml
Size: 548 KB - Last synced: 9 months ago - Pushed: over 1 year ago
leroyanders/article-generator
Crawl search results by Google Search API, unique, translate and manage using client side.
- ==1.2.1 requirements.txt
Size: 193 MB - Last synced: about 1 year ago - Pushed: over 1 year ago
asridharbaskari/langreader
- ==0.9.0 requirements.txt
Size: 50.8 MB - Last synced: about 1 year ago - Pushed: almost 3 years ago
Vitalijus0/Question_gen
- ==0.9.3 requirements2.txt
Size: 2.21 MB - Last synced: about 1 year ago - Pushed: over 1 year ago
dhchenx/opinionx
A toolkit to extract opinions and useful information from text
- * setup.py
Size: 34.2 KB - Last synced: 16 days ago - Pushed: almost 2 years ago
yigitcevk/otogalerim
- ==1.0.0 backend/requirements.txt
Size: 880 KB - Last synced: 6 months ago - Pushed: almost 2 years ago
ShaoXiangChien/STI
- * requirements.txt
Size: 3.62 MB - Last synced: about 1 year ago - Pushed: about 1 year ago
Nanosplitter/DadBotV2.0
A dad for any discord server
- * requirements.txt
Size: 680 KB - Last synced: about 1 year ago - Pushed: almost 2 years ago
kat-kel/mining-fake-news2
Practice using Minet's fetching/scraping features, and apply the extracted data in NLP contexts.
- ==1.2.2 requirements.txt
Size: 56.6 KB - Last synced: about 1 month ago - Pushed: over 1 year ago
South-IN/GPT-3.5-ON-STEROIDS Fork of programmingninjas/GPT-3.5-ON-STEROIDS
GPT-3.5-ON-STEROIDS combines GPT with Python tools, empowering dynamic web scraping, language processing, and data retrieval. Contribute to advancing text generation with AI. 🚀
- * requirements.txt
Size: 36.1 KB - Last synced: 7 months ago - Pushed: 7 months ago
sandeeptuluri/new
- ==0.9.1 requirement.txt
Size: 140 KB - Last synced: about 1 year ago - Pushed: over 1 year ago
mndhlovu/trafilatura
Web scraping library and command-line tool for text discovery and extraction (main content, metadata, comments)
- * docs/requirements.txt
- ==1.2.2 tests/eval-requirements.txt
Last synced: over 1 year ago
BieniekAlexander/ltt
- ==1.0.0 backend/requirements.txt
Size: 3.06 MB - Last synced: 9 months ago - Pushed: 9 months ago
inesgrd/Scraping-Reddit
First repository!
- ==1.2.2 requirements.txt
Size: 18.7 MB - Last synced: about 1 year ago - Pushed: about 1 year ago
anmol562002/Panda-site-gpt
chat with any website
- * requirements.txt
Size: 2.93 KB - Last synced: 7 months ago - Pushed: 7 months ago
dla-marbach/warc2graph
Warc2graph extracts a graph data structure from WARC files.
- ==1.0.0 requirements.txt
Size: 473 KB - Last synced: 21 days ago - Pushed: almost 2 years ago
RyanCShelley/internal_links
Find internal link opportunities
- * requirements.txt
Size: 159 KB - Last synced: 8 months ago - Pushed: over 1 year ago
Nootka-io/wee-benchmarking-tool
- ==1.4.0 freeze-requirements.txt
- ==1.4.0 requirements.txt
Size: 7.8 MB - Last synced: 6 months ago - Pushed: 7 months ago
medialab/minet
A webmining CLI tool & library for python.
- ==1.2.0 requirements.txt
- >=1.2.0,<1.3 setup.py
Size: 16 MB - Last synced: about 1 month ago - Pushed: about 2 months ago
gauthieret/online-fake-news-detection
online-fake-news-detection
- * requirements.txt
Size: 85 KB - Last synced: about 1 month ago - Pushed: over 1 year ago
stephenkfrey/whisper-toolkit
- ==1.4.0 requirements.txt
Size: 364 KB - Last synced: about 1 month ago - Pushed: over 1 year ago
mifkoff/cutterbot-oxforddigithon
- ==0.4.1 requirements.txt
Size: 26.4 KB - Last synced: 6 months ago - Pushed: over 1 year ago
lingyaoz-325/llama_index Fork of run-llama/llama_index
LlamaIndex (GPT Index) is a data framework for your LLM applications
- 1.6.2 poetry.lock
Size: 59.6 MB - Last synced: 7 months ago - Pushed: 7 months ago
Felliks/python-seo-analyzer Fork of sethblack/python-seo-analyzer
An SEO tool that analyzes the structure of a site, crawls the site, counts words in the body of the site and warns of any technical SEO issues.
- ==1.5.0 requirements.txt
Size: 164 KB - Last synced: 5 months ago - Pushed: 5 months ago
GirishPatel/obsei Fork of obsei/obsei
Obsei is intended to be a workflow automation tool for text segmentation needs. Docs: https://lalitpagaria.github.io/obsei/
- * conda/environment.yml
- * sample-ui/requirements.txt
Size: 16.2 MB - Last synced: about 1 year ago - Pushed: over 1 year ago
anosharahim/myna-voice-ai
- * backend/requirements.txt
Size: 17.2 MB - Last synced: 20 days ago - Pushed: 21 days ago
iit-Demokritos/clarin-el-annotation-tool
CLARIN-EL Web-based Annotation Tool
- * docker/conf/requirements.txt
Size: 27.6 MB - Last synced: about 1 month ago - Pushed: 6 months ago
immortal-autumn/trafilatura Fork of adbar/trafilatura
Web scraping library and command-line tool for text discovery and extraction (main content, metadata, comments)
- * docs/requirements.txt
- ==1.2.0 tests/eval-requirements.txt
Size: 36.7 MB - Last synced: about 1 year ago - Pushed: about 1 year ago
zanachka/trafilatura Fork of adbar/trafilatura
Web scraping library: downloads pages, extracts metadata, main text and comments, converts to TXT, CSV, XML & TEI
- * docs/requirements.txt
Size: 15.3 MB - Last synced: about 1 year ago - Pushed: about 1 year ago
vishalbelsare/trafilatura Fork of adbar/trafilatura
Web scraping library and command-line tool to download, extract (metadata, main text, comments), and convert the output
- * docs/requirements.txt
Size: 15.3 MB - Last synced: about 1 year ago - Pushed: about 1 year ago
admariner/minet Fork of medialab/minet
A webmining CLI tool & library for python.
- ==1.4.1 requirements.txt
- >=1.2.0,<1.3 setup.py
Size: 15.2 MB - Last synced: 5 days ago - Pushed: about 1 month ago
zanachka/minet Fork of medialab/minet
A webmining CLI tool & library for python.
- ==1.2.0 requirements.txt
- >=1.2.0,<1.3 setup.py
Size: 14.1 MB - Last synced: about 1 year ago - Pushed: about 1 year ago
seitzquest/square-core Fork of UKP-SQuARE/square-core 📦
SQuARE: Software for question answering research.
- ==1.4.0 datastore-api/requirements.txt
Size: 37.4 MB - Last synced: 4 months ago - Pushed: about 1 year ago
opensanctions/storyweb 📦
Extract networks of entities from journalistic reporting
- * setup.py
Size: 7.41 MB - Last synced: 2 months ago - Pushed: 10 months ago