Topic: "html2text"
adbar/trafilatura
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
Language: Python - Size: 33.8 MB - Last synced at: 8 days ago - Pushed at: about 2 months ago - Stars: 4,170 - Forks: 290

jaytaylor/html2text
Golang HTML to plaintext conversion library
Language: Go - Size: 56.6 KB - Last synced at: 26 days ago - Pushed at: over 1 year ago - Stars: 556 - Forks: 141

weblyzard/inscriptis
A Python-based HTML-to-text conversion library, command-line client, and web service.
Language: Python - Size: 2.22 MB - Last synced at: 3 days ago - Pushed at: about 1 month ago - Stars: 303 - Forks: 30

inaridiy/webforai
The best HTML to Markdown library: ESM-native, with useful utilities; simple, lightweight, and epic quality.
Language: TypeScript - Size: 3.45 MB - Last synced at: 6 months ago - Pushed at: 7 months ago - Stars: 51 - Forks: 5

voku/html2text Fork of mtibben/html2text
Html2Text - Convert HTML to formatted plain text, e.g. for text mails.
Language: PHP - Size: 333 KB - Last synced at: 6 days ago - Pushed at: 11 months ago - Stars: 37 - Forks: 8

ThatXliner/unmarkd
An extremely configurable markdown reverser for Python3.
Language: Python - Size: 2.17 MB - Last synced at: about 14 hours ago - Pushed at: about 1 year ago - Stars: 16 - Forks: 5

RxNLP/nlp-cloud-apis
RxNLP APIs for clustering sentences, extracting topics, counting words & n-grams, extracting text from html or URL, computing similarity between texts and more.
Size: 146 KB - Last synced at: about 1 month ago - Pushed at: over 5 years ago - Stars: 15 - Forks: 8

pH-7/Html2Text
A very simple (but efficient) "HTML to plain text" converter ✍️
Language: PHP - Size: 22.5 KB - Last synced at: 26 days ago - Pushed at: almost 2 years ago - Stars: 10 - Forks: 0

deedy5/html2text_rs
Python library for converting HTML to markup or plain text
Language: Rust - Size: 42 KB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 8 - Forks: 1

zacanger/html2txt 📦
html2text but in node
Language: JavaScript - Size: 1.17 MB - Last synced at: 8 days ago - Pushed at: 10 months ago - Stars: 8 - Forks: 0

AndyTheFactory/article-extraction-dataset
Article title, authors, date and body extraction dataset.
Language: HTML - Size: 31.9 MB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 6 - Forks: 1

x28/inscriptis-java
inscriptis - HTML to text conversion library for Java
Language: Java - Size: 111 KB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 3 - Forks: 2

kr1shnasomani/WebScrub
Python code that extracts HTML content, converts it to clean text, and pre-processes the text
Language: Python - Size: 558 KB - Last synced at: 28 days ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

gereoffy/deepspam2
DeepSpam milter v2
Language: Python - Size: 715 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

importcjj/go-readability Fork of go-shiori/go-readability
Go package that cleans a HTML page for better readability.
Language: HTML - Size: 95.7 KB - Last synced at: 11 months ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 0

dreipunktnull/twig-extensions 📦
A collection of useful, generic twig extensions.
Language: PHP - Size: 3.91 KB - Last synced at: over 1 year ago - Pushed at: almost 7 years ago - Stars: 2 - Forks: 0

BrenoFariasdaSilva/Python
My Python Codes.
Language: Python - Size: 15.9 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 1 - Forks: 0

susilthapa/knowledge-retrieval-with-imgs
AI chat app that responds with data in Markdown format, including text and images. Tutorial from: https://youtu.be/qKtM2AlDTs8
Language: Python - Size: 2.93 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 1

masroore/php-html2text
A PHP package to convert HTML into a plain text format
Language: PHP - Size: 10.7 KB - Last synced at: 13 days ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 0

puhoy/readability_cli
a cli tool to fetch webpages main content and print it as markdown
Language: Python - Size: 1.95 KB - Last synced at: about 2 months ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

hcq0618/html-files-to-markdown-files
batch convert html files to markdown files
Language: Python - Size: 2.93 KB - Last synced at: over 1 year ago - Pushed at: almost 6 years ago - Stars: 1 - Forks: 0

erayon/PubMed
This project involves building a robust classifier that determines, from its abstract, whether a document belongs to the cancer class.
Language: HTML - Size: 6.39 MB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 2

luminati-io/rag-chatbot
A Python-based RAG chatbot leveraging GPT-4o and Bright Data's SERP API to deliver contextually rich and up-to-date AI responses using real-time search engine data.
Size: 1.1 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

gemichelst/notesConverter
converts any .html file in a specified folder into a .txt file and combines all single .txt files into one big text file
Language: Shell - Size: 3.91 KB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

MattJeanLouis/scrap_web
A web scraping project that uses Streamlit, BeautifulSoup, and html2text to extract, convert to Markdown, and display the content of all pages linked from a given URL. It provides an interactive table of contents of the visited URLs and lets you display the extracted content in an easy-to-read format.
Language: Python - Size: 10.7 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

LukaszNiewinski/Microservice-for-retrieving-img-and-text
Microservice for text and images collection for data science purposes.
Language: Python - Size: 371 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

afeiship/next-html2text
Strip html to text for next.
Language: JavaScript - Size: 17.6 KB - Last synced at: 3 days ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

gsdefender/packtpub_telegram_bot
Receive Packt Publishing Ltd. Free Learning updates in Telegram every day
Language: Python - Size: 48.8 KB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 1

AbdellatifCHE/Collect_Store_Search
The goal is to create a solution that crawls articles from a news website (The Guardian), cleans the responses, stores them in a hosted Mongo database (MongoDB Atlas), then makes them searchable via an API.
Language: Python - Size: 52.6 MB - Last synced at: 5 months ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 1

cycloidio/docker-image-html2text
Dockerized html2text command-line tool
Language: Makefile - Size: 11.7 KB - Last synced at: 2 months ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0

cycloidio/docker-image-python-html2text
Dockerized Python html2text command-line tool
Language: Makefile - Size: 3.91 KB - Last synced at: 2 months ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0
