gbyuvd committed · Commit 5e82159 · verified · 1 Parent(s): 347aa5f

Update README.md

Files changed (1)
  1. README.md +8 -8
README.md CHANGED
@@ -11,18 +11,15 @@ model-index:
  - name: Accuracy
  type: Accuracy
  value: 0.6199
- - name: Macro F1
- type: F1
- value: 0.6127
- - name: Weighted F1
- type: F1
- value: 0.6127
- - name: Macro Precision
+ - name: Weighted Precision
  type: Precision
  value: 0.6142
- - name: Macro Recall
+ - name: Weighted Recall
  type: Recall
  value: 0.6199
+ - name: Weighted F1
+ type: F1
+ value: 0.6127
  license: cc-by-nc-sa-4.0
  metrics:
  - accuracy
@@ -40,6 +37,9 @@ tags:
 
  # ChemFIE-DTP (DrugTargetPrediction - 221 Classes)
 
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/667da868d653c0b02d6a2399/MrqaC51jl_8Qh4rVvkl2h.png)
+ [![ko-fi](https://ko-fi.com/img/githubbutton_sm.svg)](https://ko-fi.com/O4O710GFBZ)
+
  This model is a multiclass sequence classification for 221 human protein drug targets, based on [gbyuvd/chemselfies-base-bertmlm](https://huggingface.co/gbyuvd/chemselfies-base-bertmlm) fine-tuned on a dataset derived from ChemBL34 (Zdrazil et al. 2023). It predicts potential drug targets using chemical structures represented as SELFIES (Self-Referencing Embedded Strings). The model was trained on a selected and balanced dataset of around 154k examples covering 221 distinct human protein targets. Data selection criteria included specific activity types (IC50, Ki, EC50) with values ≤ 10 µM, assay confidence scores ≥ 7, and exact activity relations. Among all drug target classes found in ChemBL34, classes with at least 1000 examples are selected then capped at 1000 for those with more samples. Building upon the pre-trained base model's pre-existing knowledge of SELFIES, this model is originally intended to validate the capabilities of the light-weight base model to be fine-tuned for various tasks, and for this model case, it might be useful for tasks related to early-stage drug discovery and target prediction (e.g. compounds annotations) - though its performance and applicability should be carefully evaluated for specific use cases (see [Evaluation](#evaluation))
 
  - List of classes available in the "label_dict.json"
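The data selection criteria quoted in the README paragraph above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the author's actual preprocessing: the record field names (`activity_type`, `relation`, `value_um`, `confidence`, `target`) are hypothetical, the exact relation is assumed to be encoded as `"="`, and simple truncation stands in for whatever sampling was used to cap large classes.

```python
from collections import defaultdict

# Thresholds quoted in the README paragraph.
ALLOWED_TYPES = {"IC50", "Ki", "EC50"}
MAX_VALUE_UM = 10.0    # activity value <= 10 µM
MIN_CONFIDENCE = 7     # assay confidence score >= 7
MIN_PER_CLASS = 1000   # keep targets with at least this many examples
CAP_PER_CLASS = 1000   # cap each kept target at this many examples


def passes_filters(rec):
    """Row-level criteria. All field names here are hypothetical."""
    return (
        rec["activity_type"] in ALLOWED_TYPES
        and rec["relation"] == "="          # assumed encoding of "exact" relations
        and rec["value_um"] <= MAX_VALUE_UM
        and rec["confidence"] >= MIN_CONFIDENCE
    )


def select_and_balance(records, min_per_class=MIN_PER_CLASS, cap=CAP_PER_CLASS):
    """Group filtered records by target, drop small classes, cap large ones."""
    by_target = defaultdict(list)
    for rec in records:
        if passes_filters(rec):
            by_target[rec["target"]].append(rec)
    return {
        target: rows[:cap]  # truncation is a stand-in for the real sampling step
        for target, rows in by_target.items()
        if len(rows) >= min_per_class
    }
```

With the defaults, every surviving target ends up with exactly 1000 records, which matches the "balanced" description in the paragraph. Passing smaller values for `min_per_class` and `cap` makes the function easy to try on toy data.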