GitHub topics: token-classification

Repositories

mhaugestad/chisel

A library to help with common NLP pre-processing tasks.

Language: Python - Size: 29 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1 - Forks: 1

The MERIT Dataset is a fully synthetic, labeled dataset created for training and benchmarking LLMs on Visually Rich Document Understanding tasks. It is also designed to help detect biases and improve interpretability in LLMs, an area of ongoing work. This repository is actively maintained, and new features are continuously being added.

Language: Python - Size: 603 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 9 - Forks: 1

anudeepvanjavakam1/lit_or_not_on_reddit

This app searches Reddit posts and comments to determine whether a product or service has positive or negative sentiment, and predicts top product mentions using Named Entity Recognition.

Language: Jupyter Notebook - Size: 33.1 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 2 - Forks: 0

KRLabsOrg/LettuceDetect

LettuceDetect is a hallucination detection framework for RAG applications.

Language: Python - Size: 7.12 MB - Last synced at: 11 days ago - Pushed at: about 2 months ago - Stars: 458 - Forks: 30

ufal/factgenie

Lightweight self-hosted span annotation tool

Language: Python - Size: 31.3 MB - Last synced at: 10 days ago - Pushed at: about 1 month ago - Stars: 34 - Forks: 7

AadityaArunSingh/RoBERTa-Token-Classification-with-Additional-PLODv2-Data

This repo explores token classification for abbreviation and long-form detection using RoBERTa. We evaluate the impact of adding 50% of the PLODv2-filtered dataset, achieving improved F1 and recall. The repo includes methodology, evaluation using seqeval, and confusion matrix analysis.

Language: Python - Size: 11.7 KB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 0 - Forks: 0

tiansztiansz/python-data-science

Bilibili channel AI日日新: occasional updates on using Python frameworks for machine learning, deep learning, and data science tasks

Language: Jupyter Notebook - Size: 4.78 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 3 - Forks: 0

fido-ai/ua-datasets

A collection of datasets for Ukrainian language

Language: Python - Size: 2.08 MB - Last synced at: 19 days ago - Pushed at: 11 months ago - Stars: 57 - Forks: 2

modelscope/AdaSeq

AdaSeq: An All-in-One Library for Developing State-of-the-Art Sequence Understanding Models

Language: Python - Size: 5.03 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 437 - Forks: 41

4AI/LS-LLaMA

A Simple but Powerful SOTA NER Model | Official Code For Label Supervised LLaMA Finetuning

Language: Python - Size: 3.54 MB - Last synced at: 11 days ago - Pushed at: over 1 year ago - Stars: 155 - Forks: 24

viktor-shcherb/vive_la_ner

The default way to fine-tune BERT is wrong. Here is why

Language: Jupyter Notebook - Size: 107 KB - Last synced at: 15 days ago - Pushed at: 7 months ago - Stars: 4 - Forks: 0

nlp4se/RE-Miner-Dashboard

NLP interactive dashboard for users to interact with the RE-Miner Ecosystem for data analysis, visualization, and NLP-based insights.

Language: SCSS - Size: 5.48 MB - Last synced at: 6 days ago - Pushed at: 5 months ago - Stars: 2 - Forks: 1

AshutoshDongare/softskill-NER

Fine-tuning a 🤗 Transformers model for a soft-skill NER task

Language: Jupyter Notebook - Size: 65.4 KB - Last synced at: 30 days ago - Pushed at: about 3 years ago - Stars: 3 - Forks: 1

satya77/Transformer_Temporal_Tagger

Code and data from the paper BERT Got a Date: Introducing Transformers to Temporal Tagging

Language: Python - Size: 365 KB - Last synced at: about 1 month ago - Pushed at: about 3 years ago - Stars: 66 - Forks: 5

Kardbord/hfapigo

Unofficial (Golang) Go bindings for the Hugging Face Inference API

Language: Go - Size: 3.35 MB - Last synced at: 9 days ago - Pushed at: about 2 months ago - Stars: 62 - Forks: 5

TiagoSanti/LID-token-classification

Scraping, token classification, and model deployment for a selection process.

Language: Python - Size: 415 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

Antarlekhaka/code

Multi-task NLP Annotation Framework

Language: JavaScript - Size: 10.6 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 6 - Forks: 2

C-bianc/NER-task

Token classification for named entities

Language: Jupyter Notebook - Size: 3.37 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

arnabd64/spacy-ner-hf-space

A web app built using Gradio for demonstrating the capabilities of the spaCy NER pipeline.

Language: Python - Size: 15.6 KB - Last synced at: 3 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

Ahwar/NER-NLP-with-ONNX-Java

A Java NLP application that identifies names, organizations, and locations in text by utilizing Hugging Face's RoBERTa NER model through the ONNX runtime and the Deep Java Library.

Language: Java - Size: 218 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 4 - Forks: 2

nubebytes/Yoda-API

API for the Yoda-NER and Yoda-FITS models: NLP models for Google Feed product optimization

Language: Python - Size: 1.15 MB - Last synced at: 4 months ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

matteo-stat/transformers-nlp-ner-token-classification

This repo provides scripts for fine-tuning HuggingFace Transformers, setting up pipelines, and optimizing token classification models for inference. They are based on my experience developing a custom chatbot, and I’m sharing them in the hope they will help others quickly fine-tune and use models in their projects! 😊

Language: Python - Size: 22.5 KB - Last synced at: 4 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

luozhouyang/transformers-keras

Transformer-based models implemented in TensorFlow 2.x (using Keras).

Language: Python - Size: 696 KB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 75 - Forks: 13

1024-m/NAACL-2024-SemEval-TASK-8C

Code for the paper: Black-Box Word-Level Text Boundary Detection in Partially Machine Generated Texts

Language: Jupyter Notebook - Size: 28.2 MB - Last synced at: 4 months ago - Pushed at: 12 months ago - Stars: 2 - Forks: 0

JersonGB22/TokenClassification-TensorFlow

Language: Jupyter Notebook - Size: 586 KB - Last synced at: 4 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

aditeyabaral/maple-v2

MAPLEv2 - Multi-task Approach for generating blackout Poetry with Linguistic Evaluation

Language: Python - Size: 55 MB - Last synced at: 4 months ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 1

koshkidadanet/lilt-finetuning-piad-ya-ocr

Project for a final qualifying thesis (VKR) titled "Development of a software module for analyzing documents that confirm individual achievements"

Language: Jupyter Notebook - Size: 12.2 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

vedantMahangade/PII-Data-Detection

A reliable, automated LLM-based model for detecting PII in student writing

Language: Jupyter Notebook - Size: 650 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

Semihocakli/nlp-with-hugging-face

Language: Jupyter Notebook - Size: 247 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

TirendazAcademy/Multilingual-NER-App

Building a multilingual NER app with HuggingFace, Gradio and Comet

Language: Jupyter Notebook - Size: 23.7 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

MohammedAly22/ArabNizer

ArabiNizer is a state-of-the-art Arabic named entity recognizer (NER) leveraging the XLMR transformer model with an impressive testing accuracy of 95.00% and a remarkable testing F1-score of 88.00% on the PAN-X.AR subset from XTREME.

Language: Jupyter Notebook - Size: 147 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

MohammedAly22/Tasneef

A state-of-the-art Arabic part-of-speech tagger leveraging the XLMR transformer model, with an impressive testing accuracy of 97.49% and a remarkable testing F1-score of 96.44% on the Arabic UD Treebank.

Language: Jupyter Notebook - Size: 217 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

lucien1011/kaggle-coleridgeinitiative-show-us-the-data

Keyword extraction to automate the discovery of datasets in publications and public reports

Language: Python - Size: 504 KB - Last synced at: over 1 year ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

datnnt1997/VPhoBertTagger

Token classification using PhoBERT models for Vietnamese

Language: Python - Size: 17.7 MB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 9 - Forks: 3

awsaf49/pii-data-detection

The Learning Agency Lab - PII Data Detection || Develop automated techniques to detect and remove PII from educational data.

Language: Jupyter Notebook - Size: 39.1 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

WikKam/roberta-pos-finetuning

Part-of-speech tagging in Polish with a fine-tuned RoBERTa model

Language: Jupyter Notebook - Size: 43.9 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

frankl1/SCIA-MMF-POS (fork of NTeALan/Sangkak-Challenge-IA)

A 16M LLM for POS tagging in African languages

Language: Jupyter Notebook - Size: 6.87 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Related topics

token-classification (50), named-entity-recognition (17), nlp (16), ner (14), bert (13), transformers (12), pytorch (10), natural-language-processing (9), huggingface (9), huggingface-transformers (6), sequence-classification (5), part-of-speech-tagging (5), sequence-labeling (5), text-classification (5), fine-tuning (4), sentiment-analysis (4), deep-learning (4), tensorflow (4), question-answering (4), python (3), kaggle (3), transformer (3), bert-model (3), transfer-learning (3), nlp-machine-learning (3), ai (3), xlm-roberta (3), text-generation (2), distilbert (2), roberta (2), onnx (2), neural-network (2), machine-learning (2), keras (2), large-language-models (2), blackout-poetry (2), dataset (2), gradio (2), sequence-tagging (2), natural-language-understanding (2), part-of-speech-tagger (2), keyword-extraction (2), information-extraction (2), summarization (2), annotation-tool (2), roberta-model (2), data-annotation (2), llm (2), masked-language-models (2), simcse (2), finetuning (2), arabic-language (2), bert-fine-tuning (2), arabic-nlp (2), python3 (2), yandex-cloud (1), russian-language (1), ocr (1), lilt (1), data-validation (1), spacy-pipeline (1), classfication (1), deep-java-library (1), djl (1), java (1), google-feed (1), text-summarization (1), yoda (1), huggingface-pipelines (1), inference-optimization (1), onnxruntime (1), albert (1), tensorflow-keras (1), instruct-gpt (1), bert-large (1), hugging-face (1), plotly (1), sickit-learn (1), grammar-checker (1), perplexity (1), documents (1), bigbird (1), pretrained-models (1), argument-mining (1), distill-bert (1), artificial-intelligence (1), token-tagging (1), camembert (1), french (1), keywords (1), nlp-keywords-extraction (1), pytorch-lightning (1), sentence-similarity (1), gan (1), gan-bert (1), multi-label-classification (1), multiple-choice (1), semi-supervised-learning (1), absa (1), electra (1)