An open API service providing repository metadata for many open source software ecosystems.

Topic: "text-clustering"

jbesomi/texthero

Text preprocessing, representation and visualization from zero to hero.

Language: Python - Size: 22.1 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 2,904 - Forks: 240

xlang-ai/instructor-embedding

[ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings

Language: Python - Size: 170 MB - Last synced at: 23 days ago - Pushed at: 4 months ago - Stars: 1,933 - Forks: 146

murray-z/text_analysis_tools

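Chinese text analysis toolkit (including text classification, text clustering, text similarity, keyword extraction, key phrase extraction, sentiment analysis, text error correction, text summarization, topic keywords, synonyms and near-synonyms, and event triple extraction)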

Language: Python - Size: 9.98 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 533 - Forks: 114

RandyPen/TextCluster

Short text clustering preprocessing module

Language: Python - Size: 1.25 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 238 - Forks: 60

Edward1Chou/textClustering

Language: Jupyter Notebook - Size: 3.51 MB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 129 - Forks: 52

plkmo/NLP_Toolkit

Library of state-of-the-art models (PyTorch) for NLP tasks

Language: Python - Size: 4.3 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 108 - Forks: 27

sidphbot/Auto-Research

Generates a custom, detailed survey paper with topic-clustered sections and proper citations from a single query in under 30 minutes.

Language: Python - Size: 429 KB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 57 - Forks: 7

LMU-Seminar-LLMs/TopicGPT

TopicGPT integrates the benefits of LLMs into topic modelling

Language: Python - Size: 14 MB - Last synced at: 3 days ago - Pushed at: 11 months ago - Stars: 25 - Forks: 3

KeremZaman/semantic-sh

semantic-sh is a SimHash implementation that detects and groups similar texts by leveraging word vectors and transformer-based language models (BERT).

Language: Python - Size: 40 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 23 - Forks: 3

Tikquuss/meta_XLM

Cross-lingual Language Model (XLM) pretraining and Model-Agnostic Meta-Learning (MAML) for fast adaptation of deep networks

Language: Jupyter Notebook - Size: 32.7 MB - Last synced at: 4 days ago - Pushed at: about 4 years ago - Stars: 20 - Forks: 4

trinker/clustext

Easy, fast clustering of texts

Language: R - Size: 732 KB - Last synced at: about 1 month ago - Pushed at: about 8 years ago - Stars: 18 - Forks: 3

ArikReuter/TopicGPT

TopicGPT integrates the benefits of LLMs into topic modelling

Language: Python - Size: 15.3 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 14 - Forks: 3

JayKumarr/OSDM

Code for the ACL conference paper "An Online Semantic-enhanced Dirichlet Model for Short Text Stream Clustering"

Language: Python - Size: 831 KB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 14 - Forks: 4

ttavni/2D_Text_Clustering

Using word embeddings, TFIDF and text-hashing to cluster and visualise text documents

Language: Python - Size: 8.15 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 14 - Forks: 5

1997alireza/QA-Clustering

Implementation of some algorithms for text clustering

Language: Python - Size: 75.2 KB - Last synced at: 12 months ago - Pushed at: over 6 years ago - Stars: 14 - Forks: 1

Navy10021/SLS

SLS : Neural Information Retrieval(IR)-based Semantic Search model

Language: Jupyter Notebook - Size: 1.83 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 13 - Forks: 4

pemagrg1/sentence-clustering

Sentence Clustering and visualization. Created Date: 25 Apr 2018

Language: Python - Size: 85.9 KB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 13 - Forks: 9

chrisPiemonte/url2vec

Graph clustering and Node embeddings with word2vec

Language: Python - Size: 159 MB - Last synced at: about 2 years ago - Pushed at: about 6 years ago - Stars: 13 - Forks: 7

durgeshsamariya/awesome-clustering-resources

Clustering related books and research papers.

Size: 41 KB - Last synced at: 5 days ago - Pushed at: over 4 years ago - Stars: 12 - Forks: 1

alaradirik/TR-NLP-workshop

2020 Açık Seminer - Turkish NLP workshop

Language: Jupyter Notebook - Size: 8.43 MB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 12 - Forks: 3

ThorstenDoherr/searchengine

Heuristic matching of large databases by fuzzy criteria such as addresses

Language: xBase - Size: 87.9 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 11 - Forks: 1

SpringerNLP/Chapter3

Chapter 3: Text and Speech Basics

Language: Jupyter Notebook - Size: 6.88 MB - Last synced at: over 1 year ago - Pushed at: almost 6 years ago - Stars: 11 - Forks: 6

eigenfoo/reddit-clusters

Understanding hateful subreddits through text clustering

Language: Python - Size: 80.4 MB - Last synced at: 22 days ago - Pushed at: over 6 years ago - Stars: 11 - Forks: 2

VIDA-NYU/domain_discovery_API 📦

Domain Discovery Operations API formalizes the human domain discovery process by defining a set of operations that capture the essential tasks that lead to domain discovery on the Web, as we have discovered in interacting with Subject Matter Experts (SMEs).

Language: Python - Size: 1.23 MB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 8 - Forks: 4

Navy10021/Parallel_Clustering_based_TM

Parallel clustering-based Topic Modeling

Language: Python - Size: 3.28 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 7 - Forks: 1

sowmyagowri/Text-Clustering

Python Program for Text Clustering using Bisecting k-means

Language: Jupyter Notebook - Size: 2.88 MB - Last synced at: 12 months ago - Pushed at: over 7 years ago - Stars: 7 - Forks: 8

gulabpatel/Transformers

Language: Jupyter Notebook - Size: 931 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 6 - Forks: 2

pemagrg1/Magic-Of-TFIDF

TFIDF being the most basic and simple topic in NLP, there's a lot that can be done using TFIDF only! So, in this repo, I'll be adding the blog, TFIDF basics, wonders done using TFIDF, etc.

Language: Jupyter Notebook - Size: 465 KB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 6 - Forks: 2

MNoorFawi/text-kmeans-clustering-with-python

simple text clustering using kmeans algorithm

Language: Python - Size: 3.91 KB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 6 - Forks: 3

Narius2030/Find-Similar-Vietnamese-Texts

This project builds a classification model for news topics. The goal is to automatically assign a suitable topic (class) to an arbitrary article. Two architectures are implemented: LSTM and Hybrid models.

Language: Jupyter Notebook - Size: 264 MB - Last synced at: 26 days ago - Pushed at: 11 months ago - Stars: 5 - Forks: 0

alexiszamanidis/news_articles_text_mining

News Articles Text Classification and Clustering using Machine Learning in Python. Also, KNN implementation from scratch using max heap.

Language: Jupyter Notebook - Size: 30.9 MB - Last synced at: 4 days ago - Pushed at: over 4 years ago - Stars: 5 - Forks: 4

pokarats/gsdmm

Gibbs Sampling Dirichlet Multinomial Model (GSDMM) for Short-Text Clustering

Language: Python - Size: 4.51 MB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 5 - Forks: 1

scionoftech/TopicModeling_and_Text_Clustering

Topic Modeling and Text Cluster Analysis

Language: Jupyter Notebook - Size: 192 KB - Last synced at: 30 days ago - Pushed at: over 5 years ago - Stars: 5 - Forks: 1

Dennis1989/textClustPy

This is an implementation of the TextClust algorithm in Python 3.

Language: Python - Size: 165 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 4 - Forks: 2

sharmaroshan/Text-Clustering

A text clustering task: here I cluster 200 different texts related to games and sports into 2 or more clusters. A Zipf plot can also be used to determine how many useful clusters can be formed.

Language: Jupyter Notebook - Size: 495 KB - Last synced at: over 1 year ago - Pushed at: almost 6 years ago - Stars: 4 - Forks: 5

goodman1204/Discovering-Topic-Representative-Terms-for-Short-Text-Clustering

Source code for "Discovering Topic Representative Terms for Short Text Clustering (IEEE Access)"

Language: Python - Size: 3.86 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 3 - Forks: 0

borisfoko/Spark-Text-Clustering

Text clustering in spark with scala using LDA Model on a TF-IDF matrix

Language: Scala - Size: 52.8 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 3 - Forks: 0

tychen5/IR_TextMining

Information Retrieval project implementation

Language: Jupyter Notebook - Size: 8.96 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 3 - Forks: 0

AFAgarap/ag-news-ae-clustering

Using an Autoencoder to encode features for k-Means Clustering on the AG News Dataset

Language: Python - Size: 27.6 MB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 2 - Forks: 0

bhattbhavesh91/texthero-demo

Tutorial to demonstrate the power of Texthero which is a library used for Text preprocessing, representation and visualization from zero to hero.

Language: Jupyter Notebook - Size: 511 KB - Last synced at: about 2 months ago - Pushed at: almost 4 years ago - Stars: 2 - Forks: 0

souravs17031999/MicrosoftBing-search-query-prediction

Analysis and Visualizations for COVID-19 Bing search engine queries + Classifier pipeline for predicting country based on search query.

Language: HTML - Size: 11.8 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 0

tiansztiansz/python-data-science

Bilibili channel "AI日日新": occasional updates on using Python frameworks for machine learning, deep learning, and data science tasks

Language: Jupyter Notebook - Size: 3.42 MB - Last synced at: about 11 hours ago - Pushed at: about 12 hours ago - Stars: 1 - Forks: 0

till-tietz/gsdmm

GSDMM Short Text Clustering via Dirichlet Mixture Models

Language: C++ - Size: 1.28 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 1

DrKenReid/Generalized-Analysis-of-Text-Data

A comprehensive toolkit for analyzing text data using various AI and NLP techniques, including topic modeling, sentiment analysis, and text classification, demonstrated on the 20 Newsgroups dataset.

Language: Jupyter Notebook - Size: 1.45 MB - Last synced at: 3 months ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

jwchoi95/matsciexp

Official source codes for implementing "Quantitative Topic Analysis of Materials Science Literature Using Natural Language Processing"

Language: Python - Size: 3.13 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 1

binhetech/text_clustering

Text Clustering

Language: Python - Size: 16.6 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0

JayKumarr/OSGM

Code for the paper "An Online Semantic-enhanced Graphical Model for Evolving Short Text Stream Clustering"

Language: Python - Size: 838 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

Savaw/MIR-project

Implementation of an information retrieval system, an error correction system, a crawler, and some classification and clustering algorithms on text data

Language: Jupyter Notebook - Size: 27.9 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 1

mwritescode/text-categorization-with-WEKA

This repository contains the data used for our paper 'Text categorization with WEKA: a survey'.

Size: 216 KB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 0

kimdanny/Olfactory-NLP

Language: Jupyter Notebook - Size: 21.7 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 1

intellisol/amazon_nlp

Amazon Fine Foods Review

Language: Jupyter Notebook - Size: 1.18 MB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 1

aqstack/Document-Clustering

Document Clustering using bisecting K-Means algorithm.

Language: Jupyter Notebook - Size: 2.33 MB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 0

bluella/Text-clusterization-overview

This project was created to test different text vectorization techniques as a basis for subsequent clustering.

Language: Python - Size: 7.33 MB - Last synced at: over 1 year ago - Pushed at: about 6 years ago - Stars: 1 - Forks: 0

olha-kaminska/sentence_similarity

NLP tools for sentence similarity, text classification, text clusterization etc.

Language: Jupyter Notebook - Size: 107 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

DannyMerkx/Text_Clustering

Ant colony optimisation algorithm for text clustering

Language: Matlab - Size: 1.03 MB - Last synced at: over 1 year ago - Pushed at: almost 8 years ago - Stars: 1 - Forks: 1

SkywardAI/hackathon-leaderboard

Automated Leaderboard System for Hackathon Evaluation Using Large Language Models

Language: JavaScript - Size: 238 KB - Last synced at: 28 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

pngo1997/Document-Clustering-using-K-Means

Performs unsupervised clustering on text documents.

Language: Jupyter Notebook - Size: 1.36 MB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

michabirklbauer/hgb_dse_text_mining 📦

Contents for the practical part of the lecture Text Mining

Language: Jupyter Notebook - Size: 60.1 MB - Last synced at: 21 days ago - Pushed at: 6 months ago - Stars: 0 - Forks: 3

milican04/Data-Mining-Poetry-Dataset

Language: Jupyter Notebook - Size: 7.29 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

Mohana-Murugan/NLP

NLP

Language: Jupyter Notebook - Size: 4.44 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

vectorkoz/my-nlp

Unsupervised learning. My Natural Language Processing project on Topic Extraction and Text Clustering.

Language: Jupyter Notebook - Size: 12.2 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

saaadiqh/NLP-Learning_Analytics

Developing Natural Language Processing tools to enhance Learning Analytics. Creating an automated dashboard that diagnoses strengths and weaknesses from educational data.

Language: Python - Size: 3.07 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

Viper373/Chengdu-Emotion

Text clustering and sentiment analysis of comments on the song "Chengdu" (《成都》) on NetEase Cloud Music

Language: Jupyter Notebook - Size: 21.9 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

RimTouny/Enhancing-Gutenberg-Book-Clustering-using-Advanced-NLP-Techniques

Text clustering, an unsupervised ML technique in NLP, groups similar texts based on content. Techniques like hierarchical, k-means, or density-based clustering categorize unstructured data, unveiling insights and patterns in diverse datasets. This exploration was part of the NLP course in my University of Ottawa master's program in 2023.

Language: Jupyter Notebook - Size: 1.81 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

LinggarM/Movie-Synopsis-Text-Clustering

Movie Synopsis Text Clustering using K-Means Clustering and TF-IDF Vectorizer and deployment using framework Flask

Language: Jupyter Notebook - Size: 49.1 MB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 1

melliottgithub/text-analytics

Text analytics projects using NLTK, Scikit-learn and others

Language: Jupyter Notebook - Size: 1.42 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Purushothaman-natarajan/NLP-TEXT-PROCESSING

This project offers advanced techniques in text preprocessing, word embeddings, and text classification. Explore methods like Word2Vec and GloVe, and master Multinomial Naive Bayes for accurate predictions. Dive into the world of text clustering and conquer challenges like unbalanced data.

Language: Jupyter Notebook - Size: 7.38 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

michabirklbauer/hgb_dse_text_mining_solutions 📦

Solutions for the practical part of the lecture Text Mining

Language: HTML - Size: 38.1 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

juman-j/text_clustering_Word2Vec_FastText

Text clustering. KNN and hierarchical algorithms.

Language: Jupyter Notebook - Size: 2.04 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

lucoliv23/Data-Analysis-with-Human-Generated-Text

Clustering of human generated text

Language: Jupyter Notebook - Size: 940 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

imis-lab/inpoint-ai-backend

The AI backend server for the inPOINT project.

Language: Python - Size: 105 KB - Last synced at: 11 months ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 1

Kairixir/URL-domain-name-clustering

Bachelor's thesis project with potential applications in Computer security

Language: Jupyter Notebook - Size: 41.7 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

RodolfoLSS/wine_analysis

Data analysis of a wine dataset.

Language: Jupyter Notebook - Size: 270 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

elsdes3/text-clustering

Clustering stackexchange posts using text processing and unsupervised machine learning

Language: Jupyter Notebook - Size: 132 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

angel870326/Hierarchical-Clustering

hierarchical clustering for texts, scatter plots, and word clouds

Language: Jupyter Notebook - Size: 40.3 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

SaikatPhys/latent-space-text-clustering

Clustering sentences/text snippets by employing sentence embeddings (e.g., BERT, Universal Sentence Encoder) and various algorithms such as t-SNE and K-Means.

Language: Jupyter Notebook - Size: 62.5 KB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 1

97varun/restaurant-clustering

Cluster restaurants based on their menu items

Language: Jupyter Notebook - Size: 417 KB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

emeraldrains/tech_review Fork of BillyZhaohengLi/tech_review

Survey of Text Clustering Methods

Size: 244 KB - Last synced at: 5 months ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

agus2121/Clustering--PPKM-Tweet

Indonesian Text Clustering about PPKM(Pemberlakuan Pembatasan Kegiatan Masyarakat)

Language: Jupyter Notebook - Size: 1.86 MB - Last synced at: 4 months ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

CoreDotToday/CoreDotText

Python Text Mining Library

Language: Python - Size: 71.3 KB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 1

CrispenGari/RE-python

💎 Regular expression in python.

Language: Jupyter Notebook - Size: 69.3 KB - Last synced at: about 1 month ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

fatihkykc/Text_clustering-classification

Heuristic implementations of text clustering and classification with k-means and naive Bayes

Language: Jupyter Notebook - Size: 6.74 MB - Last synced at: about 1 year ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

shrutibhutaiya/text-clustering

Text Clustering by K-Means

Language: Jupyter Notebook - Size: 249 KB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

ocramz/rational-kernels

Extending kernel methods to variable-length sequences

Language: Haskell - Size: 10.7 KB - Last synced at: about 2 months ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 1

fukuchancat/internship-coding-tasks

https://github.com/da-recruiting/internship-coding-tasks

Size: 11.2 MB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

Related Topics
text-classification (31), nlp (25), python (19), clustering (16), text-mining (15), machine-learning (14), natural-language-processing (13), topic-modeling (11), text-processing (9), word-embeddings (6), text-similarity (6), deep-learning (5), nlp-machine-learning (5), text-summarization (5), unsupervised-learning (5), tf-idf (5), tfidf (4), sentiment-analysis (4), word2vec (4), classification (4), python3 (4), k-means-clustering (4), data-mining (4), text-preprocessing (4), kmeans-clustering (4), information-retrieval (4), pytorch (3), artificial-intelligence (3), clustering-algorithm (3), unsupervised-machine-learning (3), kmeans (3), text-visualization (3), tensorflow (3), spacy (3), sentence-embeddings (3), stream-clustering (2), scikit-learn (2), data-science (2), chatgpt (2), data-analysis (2), data-preprocessing (2), tfidf-vectorizer (2), gpt (2), openai-api (2), large-language-models (2), huggingface-transformers (2), gpt-3 (2), embeddings (2), ner (2), bachelor-thesis (2), crawler (2), graph-embedding (2), nltk (2), data-stream (2), dirichlet-process-mixtures (2), wordcloud (2), text-stream (2), hierarchical-clustering (2), lda (2), bert (2), text-analysis (2), machine-translation (2), educational (2), r (2), how-to (2), summarization (2), knn (2), keras (2), kmeans-algorithm (2), texthero (2), nlp-pipeline (2), text-representation (2), sklearn (1), pos (1), speech-to-text (1), tlm (1), text-to-speech (1), wordembedding (1), transformer-architecture (1), token-classification (1), amazon-food-reviews (1), learning-analytics (1), newsgroups (1), dependency-parser (1), image-classification (1), embedding (1), network-visualization (1), topic-extraction (1), exploratory-data-analysis (1), document-similarity (1), micro-cluster (1), stick-breaking (1), apache-spark (1), idf (1), intellij (1), java (1), lda-model (1), lda-topic-modeling (1), scala (1), spark (1)