	---
	tags:
	- bertopic
	library_name: bertopic
	pipeline_tag: text-classification
	---

	# transformers_issues_topics

	This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
	BERTopic is a flexible, modular topic modeling framework that generates easily interpretable topics from large datasets.

	## Usage

	To use this model, please install BERTopic:

	```
	pip install -U bertopic
	```

	You can use the model as follows:

	```python
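	from bertopic import BERTopic

	# Download the model from the Hugging Face Hub and load it
	topic_model = BERTopic.load("asoria/transformers_issues_topics")

	# Return a table with one row per topic, including its size and name
	topic_model.get_topic_info()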
	```
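
Once the model is loaded, individual topics can be inspected as well. The following is a minimal sketch, not part of the original card: `top_keywords` is a hypothetical helper name, and it assumes that `get_topic(topic_id)` returns a list of `(word, score)` pairs for a known topic and a falsy value for an unknown one, as in recent BERTopic releases.

```python
def top_keywords(topic_model, topic_id, n=5):
    """Return the n highest-ranked keywords of one topic as plain strings.

    `top_keywords` is a hypothetical helper, not a BERTopic function.
    """
    # Assumed: get_topic() yields [(word, score), ...] or a falsy value
    # when the topic ID does not exist in the model.
    words_and_scores = topic_model.get_topic(topic_id)
    if not words_and_scores:
        return []
    return [word for word, _score in words_and_scores[:n]]
```

With the model loaded as above, `top_keywords(topic_model, 1)` would return the leading keywords of topic 1; the exact words depend on the representation stored with the model.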

	## Topic overview

	* Number of topics: 30 (topics 0 to 28 plus the outlier topic -1)
	* Number of training documents: 9000

	<details>
	<summary>Click here for an overview of all topics.</summary>

	| Topic ID | Topic Keywords | Topic Frequency | Label |
	|----------|----------------|-----------------|-------|
	| -1 | pytorch - tensorflow - bert - tf - pretrained | 15 | -1_pytorch_tensorflow_bert_tf |
	| 0 | bert - bertforsequenceclassification - berttokenizer - bart - batchencodeplus | 2321 | 0_bert_bertforsequenceclassification_berttokenizer_bart |
	| 1 | cuda - memory - trainertrain - tensorflow - trainer | 1554 | 1_cuda_memory_trainertrain_tensorflow |
	| 2 | transformerscli - transformers - transformer - importerror - transformerxl | 882 | 2_transformerscli_transformers_transformer_importerror |
	| 3 | modelcard - modelcards - card - model - models | 490 | 3_modelcard_modelcards_card_model |
	| 4 | gpt2 - gpt2tokenizer - gpt2xl - gpt2tokenizerfast - gpt2model | 462 | 4_gpt2_gpt2tokenizer_gpt2xl_gpt2tokenizerfast |
	| 5 | attributeerror - typeerror - valueerror - runtimeerror - indexerror | 437 | 5_attributeerror_typeerror_valueerror_runtimeerror |
	| 6 | typos - typo - doc - docstring - fix | 336 | 6_typos_typo_doc_docstring |
	| 7 | t5 - t5model - t5base - tf - t5large | 298 | 7_t5_t5model_t5base_tf |
	| 8 | readmemd - readmetxt - readme - modelcard - file | 270 | 8_readmemd_readmetxt_readme_modelcard |
	| 9 | ci - testing - tests - test - speedup | 254 | 9_ci_testing_tests_test |
	| 10 | s2s - s2sdistill - s2t - s2strainer - exampless2s | 245 | 10_s2s_s2sdistill_s2t_s2strainer |
	| 11 | glue - gluepy - glueconvertexamplestofeatures - roberta - huggingfacetransformers | 214 | 11_glue_gluepy_glueconvertexamplestofeatures_roberta |
	| 12 | ner - pipeline - pipelines - nerpipeline - fillmaskpipeline | 158 | 12_ner_pipeline_pipelines_nerpipeline |
	| 13 | rag - ragtokenforgeneration - ragsequenceforgeneration - clean - tests | 153 | 13_rag_ragtokenforgeneration_ragsequenceforgeneration_clean |
	| 14 | questionansweringpipeline - questionanswering - answering - tfalbertforquestionanswering - questionasnwering | 143 | 14_questionansweringpipeline_questionanswering_answering_tfalbertforquestionanswering |
	| 15 | onnx - 04onnxexport - 04onnxexportipynb - aionnx - sphynx | 131 | 15_onnx_04onnxexport_04onnxexportipynb_aionnx |
	| 16 | longformer - longformers - longform - longformerlayer - longformermodel | 104 | 16_longformer_longformers_longform_longformerlayer |
	| 17 | labelsmoothednllloss - label - labelsmoothingfactor - labels - labelsmoothing | 76 | 17_labelsmoothednllloss_label_labelsmoothingfactor_labels |
	| 18 | benchmark - benchmarking - benchmarks - accuracy - evaluation | 73 | 18_benchmark_benchmarking_benchmarks_accuracy |
	| 19 | wav2vec2 - wav2vec - wav2vec20 - wav2vec2forctc - wav2vec2xlrswav2vec2 | 67 | 19_wav2vec2_wav2vec_wav2vec20_wav2vec2forctc |
	| 20 | flax - flaxelectraformaskedlm - flaxelectraforpretraining - flaxjax - flaxelectramodel | 51 | 20_flax_flaxelectraformaskedlm_flaxelectraforpretraining_flaxjax |
	| 21 | configpath - configs - config - configuration - modelconfigs | 49 | 21_configpath_configs_config_configuration |
	| 22 | logging - logs - log - logger - loghistory | 40 | 22_logging_logs_log_logger |
	| 23 | cachedir - cache - cachedpath - caching - cached | 38 | 23_cachedir_cache_cachedpath_caching |
	| 24 | wandbproject - wandb - sagemaker - sagemakertrainer - wandbcallback | 36 | 24_wandbproject_wandb_sagemaker_sagemakertrainer |
	| 25 | notebook - notebooks - community - colab - t5 | 33 | 25_notebook_notebooks_community_colab |
	| 26 | electra - electrapretrainedmodel - electraformaskedlm - electraformultiplechoice - electrafortokenclassification | 30 | 26_electra_electrapretrainedmodel_electraformaskedlm_electraformultiplechoice |
	| 27 | layoutlm - layout - layoutlmtokenizer - layoutlmbaseuncased - tf | 25 | 27_layoutlm_layout_layoutlmtokenizer_layoutlmbaseuncased |
	| 28 | pplm - pr - deprecated - variable - ppl | 15 | 28_pplm_pr_deprecated_variable |

	</details>
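
The frequencies in the table add up to the 9000 training documents, so each topic's share of the corpus follows directly. Below is a small sketch of that arithmetic for a few rows; the `topic_share` helper and the `FREQUENCIES` subset are illustrative and not part of the model.

```python
# Frequencies copied from the table for a few topics (topic ID -> document count).
FREQUENCIES = {-1: 15, 0: 2321, 1: 1554, 2: 882, 28: 15}
TOTAL_DOCUMENTS = 9000  # "Number of training documents" from the overview


def topic_share(topic_id, frequencies=FREQUENCIES, total=TOTAL_DOCUMENTS):
    """Return the percentage of training documents assigned to a topic."""
    return 100.0 * frequencies[topic_id] / total


for topic_id in sorted(FREQUENCIES):
    print(f"topic {topic_id:>2}: {topic_share(topic_id):5.2f}% of documents")
```

Topic 0 alone covers roughly a quarter of the corpus, while topic -1 and topic 28 hold only 15 documents each.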

	## Training hyperparameters

	* calculate_probabilities: False
	* language: english
	* low_memory: False
	* min_topic_size: 10
	* n_gram_range: (1, 1)
	* nr_topics: 30
	* seed_topic_list: None
	* top_n_words: 10
	* verbose: True
	* zeroshot_min_similarity: 0.7
	* zeroshot_topic_list: None
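
These values correspond to keyword arguments of the `BERTopic` constructor. The sketch below shows how an unfitted model with the same settings could be created. It assumes a BERTopic release recent enough to accept the zero-shot arguments, and `build_topic_model` is a hypothetical helper name. The embedding, UMAP, and HDBSCAN sub-models are not listed in this card, so BERTopic's defaults would apply and the resulting topics would not necessarily match this model.

```python
# Settings copied from the "Training hyperparameters" list above.
HYPERPARAMETERS = {
    "calculate_probabilities": False,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": 30,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
    "zeroshot_min_similarity": 0.7,
    "zeroshot_topic_list": None,
}


def build_topic_model():
    """Create an unfitted BERTopic instance with the settings listed above."""
    # Imported lazily so the settings can be inspected without bertopic installed.
    from bertopic import BERTopic

    return BERTopic(**HYPERPARAMETERS)
```

Calling `build_topic_model().fit_transform(docs)` on a list of strings would then train a new model on those documents.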

	## Framework versions

	* Numpy: 1.26.4
	* HDBSCAN: 0.8.38.post1
	* UMAP: 0.5.6
	* Pandas: 2.1.4
	* Scikit-Learn: 1.5.2
	* Sentence-transformers: 3.1.1
	* Transformers: 4.44.2
	* Numba: 0.60.0
	* Plotly: 5.24.1
	* Python: 3.10.12
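
To compare a local environment against the versions above, something like the following sketch may help. It uses only the standard library; the distribution names (for example `umap-learn` for UMAP) are the PyPI names as assumed here, and `version_mismatches` is a hypothetical helper. The Python version itself is not covered and can be checked separately with `sys.version_info`.

```python
from importlib import metadata

# Versions listed in this card, keyed by (assumed) PyPI distribution name.
EXPECTED_VERSIONS = {
    "numpy": "1.26.4",
    "hdbscan": "0.8.38.post1",
    "umap-learn": "0.5.6",
    "pandas": "2.1.4",
    "scikit-learn": "1.5.2",
    "sentence-transformers": "3.1.1",
    "transformers": "4.44.2",
    "numba": "0.60.0",
    "plotly": "5.24.1",
}


def installed_version(distribution):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(expected=EXPECTED_VERSIONS, lookup=installed_version):
    """Map each distribution whose version differs to (expected, installed)."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in version_mismatches().items():
        print(f"{name}: card lists {wanted}, installed is {found}")
```

A mismatch does not necessarily prevent the model from loading; the listing only shows where the environment differs from the one recorded here.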