Update README.md
README.md
CHANGED
@@ -14,15 +14,15 @@ model-index:
       name: Text Classification
     metrics:
     - type: loss
-      value: 0.
     - type: mse
-      value: 0.
       name: Validation Mean Squared Error
     - type: r2
-      value: 0.
       name: Validation R-Squared
     - type: mae
-      value: 0.
       name: Validation Mean Absolute Error
 language:
 - en
@@ -33,57 +33,68 @@ This model utilizes the [Distilroberta base](https://huggingface.co/distilrobert

 ## Model description

-The model evaluates the query for completeness and grammatical correctness, providing a score between 0 and 1, where 1 indicates correctness.

 ## Usage

 Inference API has been disabled as this is a regression task, not a text classification task, and HuggingFace does not provide a pipeline for regression tasks.

 ```python
-
 sentences = [
-    "The cat and dog in the yard.",
-    "she don't like apples.",
-    "Is rain sunny days sometimes?",
-    "She enjoys reading books and playing chess.",
-    "How many planets are there in our solar system?"
 ]

-# Tokenizing the sentences
 inputs = tokenizer(sentences, truncation=True, padding=True, return_tensors='pt')

-
-
-model.eval() # Setting the model to evaluation mode
-predicted_ratings = model(
-    input_ids=inputs['input_ids'],
-    attention_mask=inputs['attention_mask']
-)

-
-
-
-# Printing the predicted ratings
-for i, rating in enumerate(predicted_ratings):
     print(f'Sentence: {sentences[i]}')
     print(f'Predicted Rating: {rating}\n')
 ```
 Output:
 ```
 Sentence: The cat and dog in the yard.
-Predicted Rating: 0.

 Sentence: she don't like apples.
-Predicted Rating: 0.

 Sentence: Is rain sunny days sometimes?
-Predicted Rating: 0.

 Sentence: She enjoys reading books and playing chess.
-Predicted Rating: 0.

 Sentence: How many planets are there in our solar system?
-Predicted Rating: 0.
 ```

 ## Training and evaluation data
@@ -101,7 +112,7 @@ The following hyperparameters were used during training:
 - seed: 42
 - optimizer: AdamW with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
-- lr_scheduler_warmup_steps:
 - num_epochs: 5

 ### Training results
@@ -109,10 +120,10 @@ The following hyperparameters were used during training:
 Metrics: Mean Squared Error, R-Squared, Mean Absolute Error

 ```
-'test_loss': 0.
-'test_mse': 0.
-'test_r2': 0.
-'test_mae': 0.
 ```

 ### Framework versions

       name: Text Classification
     metrics:
     - type: loss
+      value: 0.061837393790483475
     - type: mse
+      value: 0.061837393790483475
       name: Validation Mean Squared Error
     - type: r2
+      value: 0.5726782083511353
       name: Validation R-Squared
     - type: mae
+      value: 0.183049738407135
       name: Validation Mean Absolute Error
 language:
 - en

 ## Model description

+A regression head has been appended to the DistilRoBERTa model to tailor it for a regression task. This additional component is crucial and needs to be loaded alongside the base model during inference to ensure accurate predictions. The model evaluates the query for completeness and grammatical correctness, providing a score between 0 and 1, where 1 indicates correctness.

 ## Usage

 Inference API has been disabled as this is a regression task, not a text classification task, and HuggingFace does not provide a pipeline for regression tasks.

 ```python
+import torch
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+tokenizer = AutoTokenizer.from_pretrained("AdamCodd/distilroberta-query-wellformedness")
+
+class RegressionModel(torch.nn.Module):
+    def __init__(self):
+        super().__init__()
+        self.model = AutoModelForSequenceClassification.from_pretrained("AdamCodd/distilroberta-query-wellformedness")
+        self.regression_head = torch.nn.Linear(self.model.config.hidden_size, 1)
+
+    def forward(self, input_ids, attention_mask, **kwargs):
+        outputs = self.model.base_model(input_ids=input_ids, attention_mask=attention_mask)
+        rating = self.regression_head(outputs.last_hidden_state[:, 0, :])
+        rating = torch.sigmoid(rating)
+        return rating.squeeze()
+
+regression_model = RegressionModel()
+# Do not forget to set the correct path to load the regression head
+regression_model.regression_head.load_state_dict(torch.load(r"path_to_the_regression_head.pth"))
+regression_model.eval()
+# Examples
 sentences = [
+    "The cat and dog in the yard.",
+    "she don't like apples.",
+    "Is rain sunny days sometimes?",
+    "She enjoys reading books and playing chess.",
+    "How many planets are there in our solar system?"
 ]

 inputs = tokenizer(sentences, truncation=True, padding=True, return_tensors='pt')

+with torch.no_grad():
+    outputs = regression_model(input_ids=inputs['input_ids'], attention_mask=inputs['attention_mask'])

+predictions = outputs.tolist()
+for i, rating in enumerate(predictions):
     print(f'Sentence: {sentences[i]}')
     print(f'Predicted Rating: {rating}\n')
 ```
 Output:
 ```
 Sentence: The cat and dog in the yard.
+Predicted Rating: 0.20011138916015625

 Sentence: she don't like apples.
+Predicted Rating: 0.08289700001478195

 Sentence: Is rain sunny days sometimes?
+Predicted Rating: 0.20011138916015625

 Sentence: She enjoys reading books and playing chess.
+Predicted Rating: 0.8915354013442993

 Sentence: How many planets are there in our solar system?
+Predicted Rating: 0.974799394607544
 ```

 ## Training and evaluation data
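The updated model description says a regression head sits on top of DistilRoBERTa and produces a score between 0 and 1. In the usage code, `forward` takes the hidden state of the first token (`outputs.last_hidden_state[:, 0, :]`), passes it through `torch.nn.Linear(hidden_size, 1)`, and applies `torch.sigmoid`. Below is a dependency-free sketch of only that head, written in plain Python on toy vectors. The function names, the 4-dimensional vectors and the weights are hypothetical illustrations; they are not the trained head.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def regression_head(cls_vectors, weights, bias):
    """Score each sentence from the hidden state of its first token.

    cls_vectors: one vector per sentence (the rows of last_hidden_state[:, 0, :]).
    weights, bias: parameters of a linear layer with a single output.
    Always returns a list with one score per sentence, even for a batch of one.
    """
    scores = []
    for vec in cls_vectors:
        # Linear(hidden_size, 1): dot product with the weight row plus the bias
        logit = sum(w * h for w, h in zip(weights, vec)) + bias
        # Sigmoid squashes the raw output into (0, 1), as torch.sigmoid does in forward()
        scores.append(sigmoid(logit))
    return scores


# Hypothetical toy values: 4-dimensional "hidden states" and made-up head
# parameters (the real head uses hidden_size from the model config and trained weights).
batch = [[0.5, -1.0, 0.25, 2.0], [0.0, 0.0, 0.0, 0.0]]
w = [0.2, 0.1, -0.4, 0.3]
b = -0.1
print(regression_head(batch, w, b))
print(regression_head([[1.0, 1.0, 1.0, 1.0]], w, b))  # a batch of one still yields a list
```

The sketch always returns a list, which avoids a pitfall in the committed usage code. When a batch contains only one sentence, the head's output has shape `(1, 1)` and `rating.squeeze()` collapses it into a zero-dimensional tensor. Calling `.tolist()` on that tensor returns a plain float, so `enumerate(predictions)` raises a `TypeError`. Using `rating.squeeze(-1)` instead keeps a tensor of shape `(batch_size,)`.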

 - seed: 42
 - optimizer: AdamW with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
+- lr_scheduler_warmup_steps: 400
 - num_epochs: 5

 ### Training results
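The hyperparameters combine `lr_scheduler_type: linear` with 400 warmup steps. This sketch assumes the standard Hugging Face linear schedule, which is the behaviour of `transformers.get_linear_schedule_with_warmup`. Under that schedule, the learning rate rises linearly from 0 over the first 400 optimizer steps and then decays linearly back to 0 by the final step. The code computes that multiplier in plain Python. The total of 4,000 steps is hypothetical, because the actual number of training steps is not given.

```python
def linear_warmup_factor(step: int, warmup_steps: int, total_steps: int) -> float:
    """Multiplier applied to the base learning rate at a given optimizer step.

    Rises linearly from 0 to 1 during warmup, then falls linearly to 0 at total_steps.
    """
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


WARMUP_STEPS = 400   # lr_scheduler_warmup_steps from the hyperparameter list
TOTAL_STEPS = 4000   # hypothetical: the real total depends on dataset size, batch size and num_epochs

for step in (0, 200, 400, 2200, 4000):
    print(step, linear_warmup_factor(step, WARMUP_STEPS, TOTAL_STEPS))
```

To get the learning rate actually used at a given step, multiply this factor by the configured base learning rate.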

 Metrics: Mean Squared Error, R-Squared, Mean Absolute Error

 ```
+'test_loss': 0.061837393790483475,
+'test_mse': 0.061837393790483475,
+'test_r2': 0.5726782083511353,
+'test_mae': 0.183049738407135
 ```

 ### Framework versions
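The training results report Mean Squared Error, R-Squared and Mean Absolute Error. `test_loss` and `test_mse` are identical (0.061837393790483475), which suggests that the training loss is the mean squared error, although the card does not say so. The sketch below computes the three metrics from their standard definitions in plain Python. The function name and the toy ratings are hypothetical and are not taken from the evaluation set.

```python
def regression_metrics(y_true, y_pred):
    """Return (MSE, MAE, R^2) for paired lists of target scores and predictions."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)   # total sum of squares around the mean
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are equal")
    mse = ss_res / n
    mae = sum(abs(e) for e in errors) / n
    r2 = 1.0 - ss_res / ss_tot
    return mse, mae, r2


# Made-up targets and predictions in [0, 1], not taken from the evaluation set.
targets = [0.0, 0.5, 1.0, 1.0]
preds = [0.1, 0.4, 0.8, 1.0]
mse, mae, r2 = regression_metrics(targets, preds)
print(f"MSE={mse:.4f}  MAE={mae:.4f}  R^2={r2:.4f}")
```

R-Squared compares the residual error with the variance of the targets. A value of 1 means perfect predictions, and 0 means the model does no better than always predicting the mean score.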