GitHub / shamspias / gpt3-data-preprocessing
This repository contains code for preprocessing text data from PDF and DOCX files for use with GPT-3. It includes steps such as tokenization, removal of stop words and punctuation, and formatting for GPT-3 input.
JSON API: http://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shamspias%2Fgpt3-data-preprocessing
Stars: 6
Forks: 1
Open issues: 0
License: None
Language: Python
Size: 11.7 KB
Dependencies parsed at: Pending
Created at: over 2 years ago
Updated at: 2 months ago
Pushed at: over 2 years ago
Last synced at: 27 days ago
Topics: artificial-intelligence, data-preprocessing, data-preprocessing-pipelines, data-science, gpt-3, machine-learning
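
The steps named in the description (tokenization, removal of stop words and punctuation, formatting for GPT-3 input) could look roughly like the sketch below. It is not taken from the repository: the function names `preprocess` and `to_chunks`, the tiny `STOP_WORDS` set, and the choice of fixed-size word chunks as the "GPT-3 input" format are all assumptions made for illustration. It uses only the standard library and starts from already-extracted plain text, so PDF and DOCX parsing is left out.

```python
import string

# Illustrative stop-word list (an assumption, not the repository's actual list).
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "for", "with"}


def preprocess(text: str) -> list[str]:
    """Lowercase, strip ASCII punctuation, split on whitespace, drop stop words."""
    no_punct = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in no_punct.split() if tok not in STOP_WORDS]


def to_chunks(tokens: list[str], max_words: int = 50) -> list[str]:
    """Join tokens into space-separated chunks of at most max_words words.

    Hypothetical formatting step: words are only a rough stand-in for the
    subword tokens GPT-3 actually counts.
    """
    return [
        " ".join(tokens[i:i + max_words])
        for i in range(0, len(tokens), max_words)
    ]


if __name__ == "__main__":
    tokens = preprocess("The cat, and the dog!")
    print(tokens)
    print(to_chunks(tokens, max_words=1))
```

Whitespace splitting keeps the sketch dependency-free. A real pipeline would more likely use a proper tokenizer and a fuller stop-word list.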