bertopic_github_dataset_viewer_issues
This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.
Usage
To use this model, please install BERTopic:
pip install -U bertopic
You can use the model as follows:
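from bertopic import BERTopic
# Download the model from the Hugging Face Hub and load it
topic_model = BERTopic.load("asoria/bertopic_github_dataset_viewer_issues")
# Return a DataFrame with one row per topic
topic_model.get_topic_info()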
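Beyond the overview, two calls are often useful: `get_topic` to read one topic's keywords and `transform` to assign topics to new text. The following is a minimal sketch, assuming the loaded model can run `transform` in your environment (an embedding backend such as sentence-transformers may be needed). The helper names `top_keywords` and `assign_topics` are hypothetical and not part of BERTopic.

```python
def top_keywords(topic_model, topic_id, n=5):
    """Return up to n keywords for a topic, in the order the model stores them."""
    # BERTopic.get_topic returns a list of (word, weight) pairs,
    # or False when the topic ID does not exist.
    pairs = topic_model.get_topic(topic_id)
    if not pairs:
        return []
    return [word for word, _weight in pairs[:n]]


def assign_topics(topic_model, documents):
    """Map each document to the topic ID predicted for it."""
    # BERTopic.transform returns (topics, probabilities); only the
    # topic IDs are used here. An ID of -1 marks an outlier.
    topics, _probs = topic_model.transform(documents)
    return {doc: int(topic) for doc, topic in zip(documents, topics)}
```

With the model loaded as above, `top_keywords(topic_model, 5)` would list the leading keywords of topic 5, and `assign_topics(topic_model, ["Upgrade duckdb index job"])` would return the predicted topic ID for that made-up issue title.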
Topic overview
- Number of topics: 78
- Number of training documents: 3066
Overview of all topics (topic -1 collects outlier documents that were not assigned to any cluster):
Topic ID | Topic Keywords | Topic Frequency | Label |
---|---|---|---|
-1 | jobs - datasets - cache - fix - pandas | 11 | -1_jobs_datasets_cache_fix |
0 | issue - viewer - dataset - for - bigsciencep3 | 534 | 0_issue_viewer_dataset_for |
1 | parquet - files - metadata - parquetanddatasetinfo - configparquetandinfo | 144 | 1_parquet_files_metadata_parquetanddatasetinfo |
2 | vulnerability - cryptography - dependencies - 4106 - update | 132 | 2_vulnerability_cryptography_dependencies_4106 |
3 | docs - doc - page - add - md | 109 | 3_docs_doc_page_add |
4 | rows - firstrows - row - truncated - response | 90 | 4_rows_firstrows_row_truncated |
5 | duckdb - index - splitduckdbindex - fts - try | 78 | 5_duckdb_index_splitduckdbindex_fts |
6 | hub - hubcache - timeout - datasethubcache - tags | 75 | 6_hub_hubcache_timeout_datasethubcache |
7 | audio - opus - extension - torchaudio - torch | 59 | 7_audio_opus_extension_torchaudio |
8 | filter - endpoint - isvalid - column - parameters | 54 | 8_filter_endpoint_isvalid_column |
9 | datasets - update - upgrade - dependency - to | 54 | 9_datasets_update_upgrade_dependency |
10 | docker - images - build - image - compose | 53 | 10_docker_images_build_image |
11 | cache - refresh - entries - entry - warm | 51 | 11_cache_refresh_entries_entry |
12 | mongo - mongodb - indexes - atlas - index | 48 | 12_mongo_mongodb_indexes_atlas |
13 | image - images - modality - support - pdf2image | 47 | 13_image_images_modality_support |
14 | unblock - block - blocked - blocklist - datasets | 46 | 14_unblock_block_blocked_blocklist |
15 | error - expected - xerrorcode - messages - catch | 44 | 15_error_expected_xerrorcode_messages |
16 | backfill - cron - job - time - move | 44 | 16_backfill_cron_job_time |
17 | jobs - waiting - job - finishedat - started | 44 | 17_jobs_waiting_job_finishedat |
18 | env - config - configs - vars - default | 41 | 18_env_config_configs_vars |
19 | gitpython - 3137 - 3141 - github - builddepsdev | 41 | 19_gitpython_3137_3141_github |
20 | assets - s3 - cachedassets - cached - fsspec | 40 | 20_assets_s3_cachedassets_cached |
21 | splitnamesfromstreaming - split - streaming - rename - names | 39 | 21_splitnamesfromstreaming_split_streaming_rename |
22 | statistics - stats - descriptive - splitdescriptivestatistics - class | 38 | 22_statistics_stats_descriptive_splitdescriptivestatistics |
23 | private - gated - datasets - public - gatedauto | 35 | 23_private_gated_datasets_public |
24 | metrics - healthcheck - port - adminmetrics - admin | 33 | 24_metrics_healthcheck_port_adminmetrics |
25 | steps - processing - step - triggers - graph | 32 | 25_steps_processing_step_triggers |
26 | ci - codecov - pr - fork - invalid | 31 | 26_ci_codecov_pr_fork |
27 | splits - split - list - configs - returned | 31 | 27_splits_split_list_configs |
28 | openapi - openapijson - spec - publish - spectral | 31 | 28_openapi_openapijson_spec_publish |
29 | queue - incremental - based - field - jobs | 31 | 29_queue_incremental_based_field |
30 | error - datasetwithscriptnotsupportederror - exist - no - datasetgenerationerror | 31 | 30_error_datasetwithscriptnotsupportederror_exist_no |
31 | ram - 5gb - heavy - reduce - overcommitment | 31 | 31_ram_5gb_heavy_reduce |
32 | workers - number - reduce - increase - heavy | 30 | 32_workers_number_reduce_increase |
33 | admin - ui - app - difficulty - prefix | 30 | 33_admin_ui_app_difficulty |
34 | chart - fixchart - helm - alb - featchart | 28 | 34_chart_fixchart_helm_alb |
35 | aiohttp - 386 - bump - 392 - 391 | 27 | 35_aiohttp_386_bump_392 |
36 | e2e - tests - test - ci - testmetrics | 27 | 36_e2e_tests_test_ci |
37 | huggingfacehub - upgrade - 0151 - version - branch | 27 | 37_huggingfacehub_upgrade_0151_version |
38 | test - tests - unit - pytestmemray - fixtures | 26 | 38_test_tests_unit_pytestmemray |
39 | webhook - webhooks - payload - visibility - hub | 26 | 39_webhook_webhooks_payload_visibility |
40 | migration - migrations - database - scripts - databases | 26 | 40_migration_migrations_database_scripts |
41 | refactor - dead - code - remove - abstractions | 25 | 41_refactor_dead_code_remove |
42 | retry - retryable - codes - every - createcommiterror | 25 | 42_retry_retryable_codes_every |
43 | log - logs - debug - level - crashes | 25 | 43_log_logs_debug_level |
44 | croissant - jsonld - fields - either - recordset | 25 | 44_croissant_jsonld_fields_either |
45 | pods - pod - number - scale - reverseproxy | 24 | 45_pods_pod_number_scale |
46 | scan - urls - spawning - presidio - optinouturls | 24 | 46_scan_urls_spawning_presidio |
47 | resources - feat - reduce - increase - production | 22 | 47_resources_feat_reduce_increase |
48 | download - manual - require - enum - extracted | 21 | 48_download_manual_require_enum |
49 | comment - issues - close - fix - tag | 20 | 49_comment_issues_close_fix |
50 | cache - entries - clean - hf - blocked | 19 | 50_cache_entries_clean_hf |
51 | worker - generic - workerjobtypesblocked - treccartools - dependencies | 19 | 51_worker_generic_workerjobtypesblocked_treccartools |
52 | datasetviewer - rename - datasetsserver - domain - server | 18 | 52_datasetviewer_rename_datasetsserver_domain |
53 | across - group - pip - directories - bump | 18 | 53_across_group_pip_directories |
54 | runner - runners - validation - job - parent | 18 | 54_runner_runners_validation_job |
55 | upgrade - datasets - feat - 221 - 1162dev0 | 18 | 55_upgrade_datasets_feat_221 |
56 | jwt - array - authorization - cookies - bypass | 18 | 56_jwt_array_authorization_cookies |
57 | allow - script - scriptbased - scripts - redpajamadata1t | 17 | 57_allow_script_scriptbased_scripts |
58 | unique - metrics - metric - cache - cron | 16 | 58_unique_metrics_metric_cache |
59 | aiohttp - libslibcommon - libslibapi - 386 - 385 | 16 | 59_aiohttp_libslibcommon_libslibapi_386 |
60 | pillow - 1001 - 1020 - bump - from | 16 | 60_pillow_1001_1020_bump |
61 | storage - disk - storageclient - storageadmin - client | 15 | 61_storage_disk_storageclient_storageadmin |
62 | resources - increase - 108010 - reduce - 2468 | 15 | 62_resources_increase_108010_reduce |
63 | poetry - dependabot - align - version - 20 | 14 | 63_poetry_dependabot_align_version |
64 | upgrade - datasets - 188 - pufanyimimicit - meaning | 14 | 64_upgrade_datasets_188_pufanyimimicit |
65 | auth - authentication - asynchronous - authcheck - 307 | 14 | 65_auth_authentication_asynchronous_authcheck |
66 | lock - locks - finishing - release - ttl | 14 | 66_lock_locks_finishing_release |
67 | nginx - proxy - reverse - reverseproxy - 1253 | 14 | 67_nginx_proxy_reverse_reverseproxy |
68 | orjson - 3915 - 390 - bump - from | 13 | 68_orjson_3915_390_bump |
69 | gradio - 3340 - 4110 - frontadminui - upgrade | 13 | 69_gradio_3340_4110_frontadminui |
70 | starlette - 0280 - 0362 - bump - 0231 | 13 | 70_starlette_0280_0362_bump |
71 | secrets - fixs3 - correct - secret - name | 13 | 71_secrets_fixs3_correct_secret |
72 | search - elastic - functionality - times - currently | 13 | 72_search_elastic_functionality_times |
73 | token - hftoken - app - secret - hf | 12 | 73_token_hftoken_app_secret |
74 | efs - nfs - mount - parquetmetadata - storage | 12 | 74_efs_nfs_mount_parquetmetadata |
75 | ruff - vscode - 045 - settings - ruffcache | 12 | 75_ruff_vscode_045_settings |
76 | kubernetes - kube - infrastructure - pdb - disruption | 12 | 76_kubernetes_kube_infrastructure_pdb |
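Each label in the table is the topic ID followed by the topic's first four keywords, joined by underscores, and the frequencies add up to the 3066 training documents. The following is a minimal sketch of splitting a label and converting a frequency into a share of the corpus. `parse_label` and `share` are hypothetical helper names, and the sample values are copied from the table.

```python
TOTAL_DOCS = 3066  # number of training documents reported in this card


def parse_label(label):
    """Split a label such as '5_duckdb_index_splitduckdbindex_fts'."""
    topic_id, _, rest = label.partition("_")
    return int(topic_id), rest.split("_")


def share(frequency, total=TOTAL_DOCS):
    """Fraction of training documents assigned to a topic."""
    return frequency / total


topic_id, keywords = parse_label("-1_jobs_datasets_cache_fix")
print(topic_id, keywords)    # -1 ['jobs', 'datasets', 'cache', 'fix']
print(f"{share(534):.1%}")   # topic 0: 17.4%
print(f"{share(11):.1%}")    # outliers (topic -1): 0.4%
```

Topic 0 alone covers roughly a sixth of the corpus, while the outlier topic is very small, so nearly every training document was assigned to a named topic.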
Training hyperparameters
- calculate_probabilities: False
- language: english
- low_memory: False
- min_topic_size: 10
- n_gram_range: (1, 1)
- nr_topics: None
- seed_topic_list: None
- top_n_words: 10
- verbose: False
- zeroshot_min_similarity: 0.7
- zeroshot_topic_list: None
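The values above correspond to keyword arguments of the `BERTopic` constructor. The following sketch shows how an untrained model with the same settings might be configured. This card does not list the embedding model, the UMAP or HDBSCAN settings, or the training corpus, so the sketch does not reproduce the published model. `HYPERPARAMS` and `build_model` are hypothetical names.

```python
# Settings copied from the "Training hyperparameters" list.
HYPERPARAMS = {
    "calculate_probabilities": False,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
    "zeroshot_min_similarity": 0.7,
    "zeroshot_topic_list": None,
}


def build_model():
    """Create an untrained BERTopic instance with the listed settings."""
    # Imported lazily so the settings can be inspected without bertopic installed.
    from bertopic import BERTopic

    return BERTopic(**HYPERPARAMS)
```

Calling `build_model().fit_transform(docs)` on a list of issue texts would train a new model. Its topics will generally differ from the table, since the original corpus and embedding model are not specified here and UMAP is stochastic by default.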
Framework versions
- Numpy: 1.26.4
- HDBSCAN: 0.8.38.post1
- UMAP: 0.5.6
- Pandas: 2.1.4
- Scikit-Learn: 1.5.2
- Sentence-transformers: 3.1.1
- Transformers: 4.44.2
- Numba: 0.60.0
- Plotly: 5.24.1
- Python: 3.10.12
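Loading a saved model can be sensitive to library versions, so it may help to compare the local environment with the list above. The following is a minimal sketch using only the standard library. The PyPI distribution names (for example `umap-learn` for UMAP) are an assumed mapping from the names in the list, and `version_mismatches` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card, keyed by assumed PyPI distribution name.
EXPECTED = {
    "numpy": "1.26.4",
    "hdbscan": "0.8.38.post1",
    "umap-learn": "0.5.6",
    "pandas": "2.1.4",
    "scikit-learn": "1.5.2",
    "sentence-transformers": "3.1.1",
    "transformers": "4.44.2",
    "numba": "0.60.0",
    "plotly": "5.24.1",
}


def version_mismatches(expected=EXPECTED):
    """Return {package: (expected, installed)} for every difference."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None  # package is not installed
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


for name, (wanted, found) in version_mismatches().items():
    print(f"{name}: card lists {wanted}, installed {found}")
```

A mismatch does not necessarily mean the model will fail to load; the report is only a starting point when debugging.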