Ecosyste.ms: Repos

An open API service providing repository metadata for many open source software ecosystems.

Package Usage: pypi: trafilatura

Python package and command-line tool designed to gather text on the Web. It includes discovery, extraction and text processing components. Its main applications are web crawling, downloads, scraping, and extraction of main texts, metadata and comments.
40 versions
Latest release: 5 months ago
62 dependent packages
190,592 downloads last month

View more package details: https://packages.ecosyste.ms/registries/pypi.org/packages/trafilatura

View more repository details: https://repos.ecosyste.ms/hosts/GitHub/repositories/adbar%2Ftrafilatura

Dependent Repos 1,140

admariner/obsei Fork of obsei/obsei
Obsei is a low-code, AI-powered automation tool. It can be used in various business flows such as social listening, AI-based alerting, brand image analysis, comparative study, and more.
  • * binder/requirements.txt
  • * sample-ui/requirements.txt

Size: 16.3 MB - Last synced: 5 days ago - Pushed: 18 days ago

noandrea/theNewsroom
what country is on the spot today
  • 0.3.0 poetry.lock
  • ^0.3.0 pyproject.toml
  • ==0.3.0 requirements.txt

Size: 6.35 MB - Last synced: about 1 year ago - Pushed: over 1 year ago

Nanosplitter/DadBot
A feature-filled Discord bot
  • * requirements.txt

Size: 1.99 MB - Last synced: about 2 months ago - Pushed: about 2 months ago

BeataStultica/WebDataScanner
  • ==1.0.0 requirements.txt

Size: 2.04 MB - Last synced: about 1 year ago - Pushed: over 2 years ago

shaneluna/disCOVr
This project was created to analyze misinformation on Twitter regarding COVID-19. The objective is to create a Neo4j graph database with relevant data for querying and analysis.
  • ==1.0.0 requirements.txt

Size: 127 KB - Last synced: about 1 year ago - Pushed: over 2 years ago

keshe4ka/analytic_web_organiser
Analytical web organizer for article bookmarks, for studying a subject area
  • ==1.2.2 requirements.txt

Size: 209 KB - Last synced: about 1 year ago - Pushed: almost 2 years ago

BeyondMachines/goya-core
  • ==1.2.1 goya_core/requirements.txt

Size: 3.06 MB - Last synced: about 1 year ago - Pushed: over 1 year ago

adbar/trafilatura
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
  • * docs/requirements.txt
  • ==1.2.2 tests/eval-requirements.txt

Size: 23.1 MB - Last synced: 5 days ago - Pushed: 7 days ago

ELTE-DH/HTML2TEI
Map the HTML schema of portals to valid TEI XML with the tags and structures used in them using small manual portal-specific configurations
  • 1.0.0 poetry.lock
  • ^1.0.0 pyproject.toml

Size: 17.6 MB - Last synced: 23 days ago - Pushed: 12 months ago

ShrutiBiradarrr/test
  • * openagent/knowledgebase/document_loaders/web/trafilatura_web/requirements.txt
  • 1.6.1 poetry.lock
  • ^1.6 pyproject.toml
  • ==1.6.1 requirements.txt

Size: 5.97 MB - Last synced: 4 months ago - Pushed: 9 months ago

obsei/obsei
Obsei is a low-code, AI-powered automation tool. It can be used in various business flows such as social listening, AI-based alerting, brand image analysis, comparative study, and more.
  • * sample-ui/requirements.txt

Size: 16.2 MB - Last synced: 3 days ago - Pushed: 3 days ago

kevinriste/podcast-transcribe
  • * imap/Pipfile
  • ==1.0.0 imap/Pipfile.lock

Size: 431 KB - Last synced: 2 months ago - Pushed: 2 months ago

dltj/km-tools
Personal Knowledge Management tools
  • * Pipfile
  • ==1.0.0 Pipfile.lock

Size: 215 KB - Last synced: about 1 month ago - Pushed: about 1 month ago

rudrajikadra/document-viewer-with-enhanced-reading-experience
  • ==1.3.0 requirements.txt

Size: 127 MB - Last synced: 7 months ago - Pushed: almost 2 years ago

rmwkwok/crawler
Multi-process crawler that extracts main content and sustains itself by extracting more links to crawl.
  • ==0.7.0 requirements.txt

Size: 85.9 KB - Last synced: 9 months ago - Pushed: about 3 years ago

GLAM-Workbench/web-archives
  • * requirements-unpinned.txt
  • * requirements.in
  • ==1.2.0 requirements.txt

Size: 40.5 MB - Last synced: about 1 year ago - Pushed: about 1 year ago

christianvadillo/InfoVac
Repository for the final project completed at Saturdays.ai, LATAM edition
  • ==0.5.2 requirements.txt

Size: 52.6 MB - Last synced: about 1 year ago - Pushed: over 3 years ago

FutureMakers2022Team13/PolitiParser
  • ==1.3.0 website/requirements.txt

Size: 6.42 MB - Last synced: 7 months ago - Pushed: almost 2 years ago

waser-technologies/data/nlu/en/web-search
Get summarized answers from the web.
  • * requirements.txt

Last synced: about 1 year ago

internetarchive/sandcrawler
Backend, IA-specific tools for crawling and processing the scholarly web. Content ends up in https://fatcat.wiki
  • >=1 python/Pipfile
  • ==1.0.0 python/Pipfile.lock

Size: 2.55 MB - Last synced: about 1 month ago - Pushed: over 1 year ago

Leibniz-HBI/newsfeedback
Tool for extracting and saving news article metadata (and optionally content) at regular intervals.
  • ^1.4.1 pyproject.toml

Size: 230 KB - Last synced: about 1 year ago - Pushed: about 1 year ago

waser-technologies/data/nlu/fr/web-search
Get summarized answers from the Web.
  • * requirements.txt

Last synced: over 1 year ago

varisha-025/fake-news-app
This website is an ML model created using Sklearn and NLP, integrated with Django, and hosted on Heroku. It predicts whether a given news headline or news URL is fake or not. The dataset we used is the one available on Kaggle, with some Asian news scraped from the web using the trafilatura library.
  • ==0.9.1 requirements.txt

Size: 5.53 MB - Last synced: about 1 month ago - Pushed: almost 2 years ago

leonov-av/vulristics
Extensible framework for analyzing publicly available information about vulnerabilities
  • * requirements.txt

Size: 1.67 MB - Last synced: 8 days ago - Pushed: 9 days ago

nilecui/keywords_en
  • ==1.3.0 requirements_dev.txt

Size: 36.1 KB - Last synced: 14 days ago - Pushed: almost 2 years ago

Ben-Apps/Backend_Tigergraph_KnowledgeKeeper
  • * requirements.txt

Size: 5.92 MB - Last synced: about 1 year ago - Pushed: about 2 years ago

moehmeni/ezweb
Easy to use web page analyzer
  • * requirements.txt
  • 1.4.0 poetry.lock
  • ^1.4.0 pyproject.toml

Size: 533 KB - Last synced: 6 days ago - Pushed: over 1 year ago

EMU-Compsci-Discord/CompsciBot
A discord bot for a computer science discord server
  • * requirements.txt

Size: 1.27 MB - Last synced: 27 days ago - Pushed: 27 days ago

Giriraj-Roy/Fake_News_Detector
  • ==0.9.1 requirements.txt

Size: 6.07 MB - Last synced: about 1 year ago - Pushed: over 2 years ago

vsalamand/superscraper
  • ==1.2.0 Pipfile.lock

Size: 1.1 MB - Last synced: about 1 year ago - Pushed: about 2 years ago

marcus-dislab/Data-for-Discourse-Analysis
  • ==1.2.2 requirements.txt

Size: 2.85 MB - Last synced: about 1 year ago - Pushed: almost 2 years ago

canelhasmateus/roi
A collection of data engineering scripts for the Gnosis project.
  • 1.2.2 poetry.lock
  • ^1.2.2 pyproject.toml

Size: 28.4 MB - Last synced: about 1 year ago - Pushed: over 1 year ago

scrapinghub/article-extraction-benchmark
Article extraction benchmark: dataset and evaluation scripts
  • ==0.5.1 requirements.txt

Size: 10.6 MB - Last synced: about 1 month ago - Pushed: about 1 month ago

nhatduy227/FinHelp
  • ==1.2.0 Backend/requirements.txt

Size: 10.3 MB - Last synced: 10 months ago - Pushed: about 2 years ago

zeoagency/mobile-first-indexing-tool
Mobile First Indexing Tool
  • * mfi-contents/requirements.txt

Size: 43.6 MB - Last synced: about 1 year ago - Pushed: over 1 year ago

hirofumi/docker-trafilatura
  • ==0.8.2 requirements.txt

Size: 19.5 KB - Last synced: about 1 month ago - Pushed: about 1 month ago

nddbk/microservices
A set of regular microservices for practice and reference
  • 0.9.3 article-parser-py/poetry.lock
  • ^0.9.3 article-parser-py/pyproject.toml

Size: 248 KB - Last synced: 28 days ago - Pushed: about 1 year ago

KajaBraz/Mini-Projects
Some of the projects developed for learning reasons.
  • ==0.8.2 requirements.txt

Size: 29 MB - Last synced: 4 months ago - Pushed: 4 months ago

oknowl/ldig
ldig - Link Digger - Finding sources online
  • * requirements.txt

Size: 33.2 KB - Last synced: about 1 year ago - Pushed: almost 2 years ago

nakamichiworks/openai-bot
Slack bot for OpenAI API
  • 1.4.1 backend/poetry.lock
  • ^1.4.1 backend/pyproject.toml

Size: 548 KB - Last synced: 9 months ago - Pushed: over 1 year ago

leroyanders/article-generator
Crawl search results by Google Search API, unique, translate and manage using client side.
  • ==1.2.1 requirements.txt

Size: 193 MB - Last synced: about 1 year ago - Pushed: over 1 year ago

asridharbaskari/langreader
  • ==0.9.0 requirements.txt

Size: 50.8 MB - Last synced: about 1 year ago - Pushed: almost 3 years ago

Vitalijus0/Question_gen
  • ==0.9.3 requirements2.txt

Size: 2.21 MB - Last synced: about 1 year ago - Pushed: over 1 year ago

dhchenx/opinionx
A toolkit to extract opinions and useful information from text
  • * setup.py

Size: 34.2 KB - Last synced: 16 days ago - Pushed: almost 2 years ago

yigitcevk/otogalerim
  • ==1.0.0 backend/requirements.txt

Size: 880 KB - Last synced: 6 months ago - Pushed: almost 2 years ago

ShaoXiangChien/STI
  • * requirements.txt

Size: 3.62 MB - Last synced: about 1 year ago - Pushed: about 1 year ago

Nanosplitter/DadBotV2.0
A dad for any discord server
  • * requirements.txt

Size: 680 KB - Last synced: about 1 year ago - Pushed: almost 2 years ago

kat-kel/mining-fake-news2
Practice using Minet's fetching/scraping features, and apply the extracted data in NLP contexts.
  • ==1.2.2 requirements.txt

Size: 56.6 KB - Last synced: about 1 month ago - Pushed: over 1 year ago

South-IN/GPT-3.5-ON-STEROIDS Fork of programmingninjas/GPT-3.5-ON-STEROIDS
GPT-3.5-ON-STEROIDS combines GPT with Python tools, empowering dynamic web scraping, language processing, and data retrieval. Contribute to advancing text generation with AI. 🚀
  • * requirements.txt

Size: 36.1 KB - Last synced: 7 months ago - Pushed: 7 months ago

h1788/ipn_search_engine
  • * backend/requirements.txt

Last synced: over 1 year ago

sandeeptuluri/new
  • ==0.9.1 requirement.txt

Size: 140 KB - Last synced: about 1 year ago - Pushed: over 1 year ago

opensorcerer19/ldig
  • * requirements.txt

Last synced: over 1 year ago

mndhlovu/trafilatura
Web scraping library and command-line tool for text discovery and extraction (main content, metadata, comments)
  • * docs/requirements.txt
  • ==1.2.2 tests/eval-requirements.txt

Last synced: over 1 year ago

BieniekAlexander/ltt
  • ==1.0.0 backend/requirements.txt

Size: 3.06 MB - Last synced: 9 months ago - Pushed: 9 months ago

inesgrd/Scraping-Reddit
First repository!
  • ==1.2.2 requirements.txt

Size: 18.7 MB - Last synced: about 1 year ago - Pushed: about 1 year ago

anmol562002/Panda-site-gpt
chat with any website
  • * requirements.txt

Size: 2.93 KB - Last synced: 7 months ago - Pushed: 7 months ago

dla-marbach/warc2graph
Warc2graph extracts a graph data structure from WARC files.
  • ==1.0.0 requirements.txt

Size: 473 KB - Last synced: 21 days ago - Pushed: almost 2 years ago

RyanCShelley/internal_links
Find internal link opportunities
  • * requirements.txt

Size: 159 KB - Last synced: 8 months ago - Pushed: over 1 year ago

Nootka-io/wee-benchmarking-tool
  • ==1.4.0 freeze-requirements.txt
  • ==1.4.0 requirements.txt

Size: 7.8 MB - Last synced: 6 months ago - Pushed: 7 months ago

medialab/minet
A webmining CLI tool & library for python.
  • ==1.2.0 requirements.txt
  • >=1.2.0,<1.3 setup.py

Size: 16 MB - Last synced: about 1 month ago - Pushed: about 2 months ago

gauthieret/online-fake-news-detection
online-fake-news-detection
  • * requirements.txt

Size: 85 KB - Last synced: about 1 month ago - Pushed: over 1 year ago

stephenkfrey/whisper-toolkit
  • ==1.4.0 requirements.txt

Size: 364 KB - Last synced: about 1 month ago - Pushed: over 1 year ago

mifkoff/cutterbot-oxforddigithon
  • ==0.4.1 requirements.txt

Size: 26.4 KB - Last synced: 6 months ago - Pushed: over 1 year ago

lingyaoz-325/llama_index Fork of run-llama/llama_index
LlamaIndex (GPT Index) is a data framework for your LLM applications
  • 1.6.2 poetry.lock

Size: 59.6 MB - Last synced: 7 months ago - Pushed: 7 months ago

Felliks/python-seo-analyzer Fork of sethblack/python-seo-analyzer
An SEO tool that analyzes the structure of a site, crawls the site, counts words in the body of the site, and warns of any technical SEO issues.
  • ==1.5.0 requirements.txt

Size: 164 KB - Last synced: 5 months ago - Pushed: 5 months ago

GirishPatel/obsei Fork of obsei/obsei
Obsei is intended to be a workflow automation tool for text segmentation needs. Docs: https://lalitpagaria.github.io/obsei/
  • * conda/environment.yml
  • * sample-ui/requirements.txt

Size: 16.2 MB - Last synced: about 1 year ago - Pushed: over 1 year ago

anosharahim/myna-voice-ai
  • * backend/requirements.txt

Size: 17.2 MB - Last synced: 20 days ago - Pushed: 21 days ago

iit-Demokritos/clarin-el-annotation-tool
CLARIN-EL Web-based Annotation Tool
  • * docker/conf/requirements.txt

Size: 27.6 MB - Last synced: about 1 month ago - Pushed: 6 months ago

immortal-autumn/trafilatura Fork of adbar/trafilatura
Web scraping library and command-line tool for text discovery and extraction (main content, metadata, comments)
  • * docs/requirements.txt
  • ==1.2.0 tests/eval-requirements.txt

Size: 36.7 MB - Last synced: about 1 year ago - Pushed: about 1 year ago

zanachka/trafilatura Fork of adbar/trafilatura
Web scraping library: downloads pages, extracts metadata, main text and comments, converts to TXT, CSV, XML & TEI
  • * docs/requirements.txt

Size: 15.3 MB - Last synced: about 1 year ago - Pushed: about 1 year ago

vishalbelsare/trafilatura Fork of adbar/trafilatura
Web scraping library and command-line tool to download, extract (metadata, main text, comments), and convert the output
  • * docs/requirements.txt

Size: 15.3 MB - Last synced: about 1 year ago - Pushed: about 1 year ago

admariner/minet Fork of medialab/minet
A webmining CLI tool & library for python.
  • ==1.4.1 requirements.txt
  • >=1.2.0,<1.3 setup.py

Size: 15.2 MB - Last synced: 5 days ago - Pushed: about 1 month ago

zanachka/minet Fork of medialab/minet
A webmining CLI tool & library for python.
  • ==1.2.0 requirements.txt
  • >=1.2.0,<1.3 setup.py

Size: 14.1 MB - Last synced: about 1 year ago - Pushed: about 1 year ago

seitzquest/square-core Fork of UKP-SQuARE/square-core 📦
SQuARE: Software for question answering research.
  • ==1.4.0 datastore-api/requirements.txt

Size: 37.4 MB - Last synced: 4 months ago - Pushed: about 1 year ago

opensanctions/storyweb 📦
Extract networks of entities from journalistic reporting
  • * setup.py

Size: 7.41 MB - Last synced: 2 months ago - Pushed: 10 months ago