Update README.md
README.md (CHANGED)
@@ -150,9 +150,8 @@ model (CPTR) using an encoder-decoder transformer [[1]](#1). The source image is
 to the transformer encoder in sequence patches. Hence, one can treat the image
 captioning problem as a machine translation task.
 
-<img
-  src="…"
-  width="80%" padding="100px 100px 100px 10px">
+
+![](…)
 
 Figure 1: Encoder Decoder Architecture
 
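The "sequence patches" wording in the context lines above refers to flattening the image into a token-like sequence before it enters the transformer encoder. As a rough illustration (not code from this repository; the function name, patch size, and image size are arbitrary assumptions), the reshaping looks like this:

```python
# Rough sketch of turning an (H, W, C) image into a sequence of flattened
# patches, i.e. the "source sentence" the transformer encoder consumes.
# Not the repository's implementation; patch/image sizes are illustrative.
import numpy as np

def image_to_patch_sequence(image: np.ndarray, patch_size: int = 16) -> np.ndarray:
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    # carve the image into a grid of patches, then flatten each patch
    grid = image.reshape(h // patch_size, patch_size, w // patch_size, patch_size, c)
    grid = grid.transpose(0, 2, 1, 3, 4)                  # (gh, gw, p, p, c)
    return grid.reshape(-1, patch_size * patch_size * c)  # (num_patches, p*p*c)

seq = image_to_patch_sequence(np.zeros((224, 224, 3)))
print(seq.shape)  # (196, 768): 196 "tokens", each a flattened 16x16x3 patch
```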
@@ -183,9 +182,8 @@ The encoder side deals solely with the image part, where it is beneficial to
 exploit the relative position of the features we have. Refer to Figure 2 for
 the model architecture.
 
-<img
-  src="…"
-  width="80%" padding="100px 100px 100px 10px">
+
+![](…)
 
 Figure 2: Model Architecture
 
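For the "relative position of the features" remark in the context above, the usual ingredient is a table of pairwise offsets between patch positions that a relative positional encoding indexes into. A tiny, generic sketch (not the CPTR model's actual code; names are illustrative):

```python
# Pairwise relative offsets between sequence positions; a relative positional
# encoding would use these offsets to look up learned bias terms.
# Generic illustration only, not the model's implementation.
import numpy as np

num_patches = 4
pos = np.arange(num_patches)
rel_offsets = pos[None, :] - pos[:, None]  # rel_offsets[i, j] = j - i
print(rel_offsets)
# [[ 0  1  2  3]
#  [-1  0  1  2]
#  [-2 -1  0  1]
#  [-3 -2 -1  0]]
```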
@@ -344,9 +342,7 @@ The reason for overfitting may be due to the following reasons:
 
 4. Unsuitable hyperparameters
 
-| … | … |
-| :--: | :--: |
-| Figure 3: Loss Curve | Figure 4: Bleu-4 score curve |
+![](…)
 
 ### Inference Output
 
@@ -359,9 +355,7 @@ distribution of the lengths is positively skewed. More specifically, the
 maximum caption length generated by the model (21 tokens) accounts for 98.66%
 of the lengths in the training set. See “code/experiment.ipynb Section 1.3”.
 
-<img
-  src="https://github.com/zarzouram/xformer_img_captnng/blob/main/images/report/lens.png"
-  padding="100px 100px 100px 10px">
+![](…)
 
 Figure 5: Generated caption's lengths distribution
 
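The 98.66% figure quoted in the context above is a coverage statistic over training-caption lengths. A minimal sketch of how such a number can be computed (this is not the notebook code from code/experiment.ipynb; the caption list below is a stand-in for the real training set):

```python
# Share of training captions no longer than the longest caption the model
# generated at inference (21 tokens). Illustrative only; `train_captions`
# is a placeholder for the real tokenized training captions.
train_captions = [
    ["a", "dog", "runs", "through", "the", "park"],
    ["two", "people", "ride", "bikes", "down", "a", "quiet", "street"],
]

max_generated_len = 21
lengths = [len(c) for c in train_captions]
coverage = 100 * sum(l <= max_generated_len for l in lengths) / len(lengths)
print(f"{coverage:.2f}% of training captions have <= {max_generated_len} tokens")
```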