premanth15 committed (verified)
Commit 612092f · 1 Parent(s): b2ca582

Update README.md

Files changed (1):
  1. README.md +6 -12
README.md CHANGED
@@ -150,9 +150,8 @@ model (CPTR) using an encoder-decoder transformer [[1]](#1). The source image is
 to the transformer encoder in sequence patches. Hence, one can treat the image
 captioning problem as a machine translation task.
 
-<img
-src="https://github.com/zarzouram/xformer_img_captnng/blob/main/images/report/Encoder-Decoder.png"
-width="80%" padding="100px 100px 100px 10px">
+
+![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/672cd2eafa7f9a2a4711d3bc/NBP0ONvIs02htFwzD39z7.jpeg)
 
 Figure 1: Encoder Decoder Architecture
 
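The hunk above keeps the README's claim that the source image is fed to the transformer encoder as a sequence of patches, so captioning can be treated like machine translation. A minimal sketch of that idea follows; the 16x16 patch size, tensor shapes, and 512-d projection are illustrative assumptions, not the repository's actual code:

```python
import torch

# Illustrative sketch of the "image as a sequence of patches" idea.
# Patch size and projection width are assumptions, not CPTR's real config.
batch, channels, height, width, patch = 1, 3, 224, 224, 16
image = torch.randn(batch, channels, height, width)

# Cut the image into non-overlapping 16x16 patches and flatten each patch,
# yielding a (batch, num_patches, patch_dim) sequence -- the visual
# analogue of a source-token sequence in machine translation.
patches = image.unfold(2, patch, patch).unfold(3, patch, patch)
patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(batch, -1, channels * patch * patch)
print(patches.shape)  # torch.Size([1, 196, 768])

# Each flattened patch is linearly embedded before entering the encoder;
# the decoder then generates caption tokens autoregressively.
to_embedding = torch.nn.Linear(channels * patch * patch, 512)
encoder_input = to_embedding(patches)  # (1, 196, 512)
```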
@@ -183,9 +182,8 @@ The encoder side deals solely with the image part, where it is beneficial to
 exploit the relative position of the features we have. Refer to Figure 2 for
 the model architecture.
 
-<img
-src="https://github.com/zarzouram/xformer_img_captnng/blob/main/images/report/Architectures.png"
-width="80%" padding="100px 100px 100px 10px">
+
+![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/672cd2eafa7f9a2a4711d3bc/CUSlU9R2oTeYCohHnzOuB.jpeg)
 
 Figure 2: Model Architecture
 
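The context above says the encoder benefits from exploiting the relative position of the visual features, but the diff does not show how the repository implements this. As a generic sketch only, one common approach is a learned bias indexed by the relative offset between two positions, added to the attention scores; all sizes here are assumed, and treating patch positions as a 1-D sequence is a simplification:

```python
import torch

# Generic sketch of a learned relative-position bias for self-attention.
# Not the repository's implementation; sizes are assumptions.
num_patches, dim = 196, 64
x = torch.randn(1, num_patches, dim)

scores = x @ x.transpose(-2, -1) / dim ** 0.5  # raw attention scores (1, N, N)

# One learnable scalar per relative offset i - j, which lies in [-(N-1), N-1].
rel_bias = torch.nn.Parameter(torch.zeros(2 * num_patches - 1))
idx = torch.arange(num_patches)
offsets = idx[:, None] - idx[None, :] + num_patches - 1  # shift into [0, 2N-2]
scores = scores + rel_bias[offsets]  # same bias wherever the offset repeats
attention = scores.softmax(dim=-1)   # (1, 196, 196)
```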
@@ -344,9 +342,7 @@ The reason for overfitting may be due to the following reasons:
 
 4. Unsuitable hyperparameters
 
-| <img src="https://github.com/zarzouram/xformer_img_captnng/blob/main/images/report/LossChart.png"/> | <img src="https://github.com/zarzouram/xformer_img_captnng/blob/main/images/report/Bleu4Chart.png"> |
-| :--: | :--: |
-| Figure 3: Loss Curve | Figure 4: Bleu-4 score curv |
+![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/672cd2eafa7f9a2a4711d3bc/VzxSQfSGDYlU5gY6mZ6nX.jpeg)
 
 ### Inference Output
 
@@ -359,9 +355,7 @@ distribution of the lengths is positively skewed. More specifically, the
 maximum caption length generated by the model (21 tokens) accounts for 98.66%
 of the lengths in the training set. See “code/experiment.ipynb Section 1.3”.
 
-<img
-src="https://github.com/zarzouram/xformer_img_captnng/blob/main/images/report/lens.png"
-padding="100px 100px 100px 10px">
+![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/672cd2eafa7f9a2a4711d3bc/2IBBqt-G1d2WlDZ1rXpCF.jpeg)
 
 Figure 5: Generated caption's lengths distribution
 
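For completeness, a small sketch of how a coverage figure like the 98.66% quoted in the hunk above can be recomputed. The length list here is a toy stand-in for the training caption lengths examined in code/experiment.ipynb Section 1.3:

```python
# How a figure like "21 tokens covers 98.66% of training caption lengths"
# can be checked. `train_caption_lengths` is a hypothetical stand-in.
train_caption_lengths = [9, 11, 12, 14, 17, 21, 30]  # toy data

max_generated_length = 21
covered = sum(length <= max_generated_length for length in train_caption_lengths)
print(f"{covered / len(train_caption_lengths):.2%} of captions are <= 21 tokens")
```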