alakxender committed (verified)
Commit 0d07585 · 1 Parent(s): 4a74736

Update README.md

Files changed (1)
  1. README.md +401 -185
README.md CHANGED
@@ -1,199 +1,415 @@
  ---
  library_name: transformers
- tags: []
  ---
  
- # Model Card for Model ID
  
- <!-- Provide a quick summary of what the model is/does. -->
  
  
  
- ## Model Details
-
- ### Model Description
-
- <!-- Provide a longer summary of what this model is. -->
-
- This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-
- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]
-
- ### Model Sources [optional]
-
- <!-- Provide the basic links for the model. -->
-
- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]
-
- ## Uses
-
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
- ### Direct Use
-
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
- [More Information Needed]
-
- ### Downstream Use [optional]
-
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
- [More Information Needed]
-
- ### Out-of-Scope Use
-
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
- [More Information Needed]
-
- ## Bias, Risks, and Limitations
-
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
- [More Information Needed]
-
- ### Recommendations
-
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-
- ## How to Get Started with the Model
-
- Use the code below to get started with the model.
-
- [More Information Needed]
-
- ## Training Details
-
- ### Training Data
-
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
- [More Information Needed]
-
- ### Training Procedure
-
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
- #### Preprocessing [optional]
-
- [More Information Needed]
-
-
- #### Training Hyperparameters
-
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-
- #### Speeds, Sizes, Times [optional]
-
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-
- [More Information Needed]
-
- ## Evaluation
-
- <!-- This section describes the evaluation protocols and provides the results. -->
-
- ### Testing Data, Factors & Metrics
-
- #### Testing Data
-
- <!-- This should link to a Dataset Card if possible. -->
-
- [More Information Needed]
-
- #### Factors
-
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
- [More Information Needed]
-
- #### Metrics
-
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
- [More Information Needed]
-
- ### Results
-
- [More Information Needed]
-
- #### Summary
-
-
-
- ## Model Examination [optional]
-
- <!-- Relevant interpretability work for the model goes here -->
-
- [More Information Needed]
-
- ## Environmental Impact
-
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-
- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]
-
- ## Technical Specifications [optional]
-
- ### Model Architecture and Objective
-
- [More Information Needed]
-
- ### Compute Infrastructure
-
- [More Information Needed]
-
- #### Hardware
-
- [More Information Needed]
-
- #### Software
-
- [More Information Needed]
-
- ## Citation [optional]
-
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
- **BibTeX:**
-
- [More Information Needed]
-
- **APA:**
-
- [More Information Needed]
-
- ## Glossary [optional]
-
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
- [More Information Needed]
  
- ## More Information [optional]
  
- [More Information Needed]
  
- ## Model Card Authors [optional]
  
- [More Information Needed]
  
- ## Model Card Contact
  
- [More Information Needed]
  ---
  library_name: transformers
+ tags:
+ - dhivehi
+ - thaana
+ - layout-analysis
+ license: apache-2.0
+ datasets:
+ - alakxender/dhivehi-layout-syn-b1-paligemma
+ language:
+ - dv
+ base_model:
+ - facebook/detr-resnet-50-dc5
  ---
  
+ # DETR ResNet-50 DC5 for Dhivehi Layout-Aware Document Parsing
+ 
+ A fine-tuned DETR (DEtection TRansformer) model based on `facebook/detr-resnet-50-dc5`, trained on a custom COCO-style dataset for layout-aware document understanding in Dhivehi and similar documents. The model can detect key structural elements such as headings, authorship, paragraphs, and text lines — with awareness of document reading direction (LTR/RTL).
+ 
+ ## Model Summary
+ 
+ - **Base Model:** facebook/detr-resnet-50-dc5
+ - **Dataset:** Custom COCO-format document layout dataset (`coco-dv-layout`)
+ - **Categories:**
+   - `layout-analysis-QvA6`, `author`, `caption`, `columns`, `date`, `footnote`, `heading`, `paragraph`, `picture`, `textline`
+ - **Reading Direction Support:** Left-to-Right (LTR) and Right-to-Left (RTL) documents
+ - **Backbone:** ResNet-50 DC5
+ 
+ ---
+ 
+ ## Usage
+ 
+ ### Inference Script
+ 
+ ```python
+ from transformers import pipeline
+ from PIL import Image
+ import torch
+
+ image = Image.open("ocr.png")
+
+ obj_detector = pipeline(
+     "object-detection",
+     model="alakxender/detr-resnet-50-dc5-dv-layout-sm1",
+     device=torch.device("cuda:0" if torch.cuda.is_available() else "cpu"),
+     use_fast=True
+ )
+
+ results = obj_detector(image)
+ print(results)
+ ```
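The pipeline returns a list of dicts with `label`, `score`, and `box` keys. If you need more control over post-processing than the pipeline gives, the same checkpoint should also work through the lower-level `transformers` object-detection API; a minimal sketch (not part of the original card, using the standard processor and model classes):

```python
from PIL import Image
import torch
from transformers import AutoImageProcessor, AutoModelForObjectDetection

model_id = "alakxender/detr-resnet-50-dc5-dv-layout-sm1"
processor = AutoImageProcessor.from_pretrained(model_id)
model = AutoModelForObjectDetection.from_pretrained(model_id)

image = Image.open("ocr.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Convert raw logits/boxes to absolute (x1, y1, x2, y2) detections above a score threshold
target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
detections = processor.post_process_object_detection(
    outputs, threshold=0.6, target_sizes=target_sizes
)[0]

for score, label, box in zip(detections["scores"], detections["labels"], detections["boxes"]):
    print(model.config.id2label[label.item()], f"{score.item():.2f}", [round(v, 1) for v in box.tolist()])
```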

+ ### Test Script
+ 
+ ```python
+ import requests
+ from transformers import pipeline
+ import numpy as np
+ from PIL import Image, ImageDraw, ImageFont
+ import torch
+ import argparse
+ import json
+ import re
+
+ parser = argparse.ArgumentParser()
+ parser.add_argument("--threshold", type=float, default=0.6)
+ parser.add_argument("--rtl", action="store_true", default=True, help="Process as right-to-left language document")
+ args = parser.parse_args()
+
+ threshold = args.threshold
+ is_rtl = args.rtl
+
+ # Set device
+ device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
+ print(f"Device set to use {device}")
+ print(f"Document direction: {'Right-to-Left' if is_rtl else 'Left-to-Right'}")
+
+ image = Image.open("ocr-bill.jpeg")
+
+ obj_detector = pipeline(
+     "object-detection",
+     model="alakxender/detr-resnet-50-dc5-dv-layout-sm1",
+     device=device,
+     use_fast=True  # Set use_fast=True to avoid slow processor warning
+ )
+
+ results = obj_detector(image)
+ print(results)
+
+ # Define colors for different labels
+ category_colors = {
+     "author": (0, 255, 0),       # Green
+     "caption": (0, 0, 255),      # Blue
+     "columns": (255, 255, 0),    # Yellow
+     "date": (255, 0, 255),       # Magenta
+     "footnote": (0, 255, 255),   # Cyan
+     "heading": (128, 0, 0),      # Dark Red
+     "paragraph": (0, 128, 0),    # Dark Green
+     "picture": (0, 0, 128),      # Dark Blue
+     "textline": (128, 128, 0)    # Olive
+ }
+
+ # Define document element hierarchy (lower value = higher priority)
+ element_priority = {
+     "heading": 1,
+     "author": 2,
+     "date": 3,
+     "columns": 4,
+     "paragraph": 5,
+     "textline": 6,
+     "picture": 7,
+     "caption": 8,
+     "footnote": 9
+ }
+
+ def detect_text_direction(results, threshold=0.6):
+     """
+     Attempt to automatically detect if the document is RTL based on detected text elements.
+     This is a heuristic approach - for production use, consider using language detection.
+     """
+     # Filter by confidence threshold
+     filtered_results = [r for r in results if r['score'] > threshold]
+
+     # Focus on text elements (textline, paragraph, heading)
+     text_elements = [r for r in filtered_results if r['label'] in ['textline', 'paragraph', 'heading']]
+
+     if not text_elements:
+         return False  # Default to LTR if no text elements
+
+     # Get coordinates
+     coordinates = []
+     for r in text_elements:
+         box = list(r['box'].values())
+         if len(box) == 4:
+             x1, y1, x2, y2 = box
+             width = x2 - x1
+             # Store element with its position info
+             coordinates.append({
+                 'xmin': x1,
+                 'xmax': x2,
+                 'width': width,
+                 'x_center': (x1 + x2) / 2
+             })
+
+     if not coordinates:
+         return False  # Default to LTR
+
+     # Analyze the horizontal distribution of elements
+     image_width = max([c['xmax'] for c in coordinates])
+
+     # Calculate the average center position relative to image width
+     avg_center_position = sum([c['x_center'] for c in coordinates]) / len(coordinates)
+     relative_position = avg_center_position / image_width
+
+     # If elements tend to be more on the right side, it might be RTL
+     # This is a simple heuristic - a more sophisticated approach would use OCR or language detection
+     is_rtl_detected = relative_position > 0.55  # Slight bias to right side suggests RTL
+
+     print(f"Auto-detected document direction: {'Right-to-Left' if is_rtl_detected else 'Left-to-Right'}")
+     print(f"Average element center position: {relative_position:.2f} of document width")
+
+     return is_rtl_detected
+
+ def get_reading_order(results, threshold=0.6, rtl=is_rtl):
+     """
+     Sort detection results in natural reading order for both LTR and RTL documents:
+     1. First by element priority (headings first)
+     2. Then by vertical position (top to bottom)
+     3. For elements with similar y-values, sort by horizontal position based on text direction
+     """
+     # Filter by confidence threshold
+     filtered_results = [r for r in results if r['score'] > threshold]
+
+     # If no manual RTL flag is set, try to auto-detect
+     if rtl is None:
+         rtl = detect_text_direction(results, threshold)
+
+     # Group text lines by their vertical position
+     # Text lines within ~20 pixels vertically are considered on the same line
+     y_tolerance = 20
+
+     # First check the structure of box to understand its keys
+     if filtered_results and 'box' in filtered_results[0]:
+         box_keys = filtered_results[0]['box'].keys()
+         print(f"Box structure keys: {box_keys}")
+
+         # Extract coordinates based on the box format
+         # Assuming box format is {'xmin', 'ymin', 'xmax', 'ymax'} or similar
+         if 'ymin' in box_keys:
+             y_key, height_key = 'ymin', None
+             x_key = 'xmin'
+         elif 'top' in box_keys:
+             y_key, height_key = 'top', 'height'
+             x_key = 'left'
+         else:
+             print("Unknown box format, defaulting to list unpacking")
+             # Default case using list unpacking method
+             y_key, x_key, height_key = None, None, None
+     else:
+         print("No box format detected, defaulting to list unpacking")
+         y_key, x_key, height_key = None, None, None
+
+     # Separate heading and non-heading elements
+     structural_elements = []
+     content_elements = []
+
+     for r in filtered_results:
+         if r['label'] in ["heading", "author", "date"]:
+             structural_elements.append(r)
+         else:
+             content_elements.append(r)
+
+     # Extract coordinate functions based on the format we have
+     def get_y(element):
+         if y_key:
+             return element['box'][y_key]
+         else:
+             # If we don't know the format, assume box values() returns [xmin, ymin, xmax, ymax]
+             return list(element['box'].values())[1]  # ymin is typically the second value
+
+     def get_x(element):
+         if x_key:
+             return element['box'][x_key]
+         else:
+             # If we don't know the format, assume box values() returns [xmin, ymin, xmax, ymax]
+             return list(element['box'].values())[0]  # xmin is typically the first value
+
+     def get_x_max(element):
+         box_values = list(element['box'].values())
+         if len(box_values) >= 4:
+             return box_values[2]  # xmax is typically the third value
+         return get_x(element)  # fallback
+
+     def get_y_center(element):
+         if y_key and height_key:
+             return element['box'][y_key] + (element['box'][height_key] / 2)
+         else:
+             # If using list format [xmin, ymin, xmax, ymax]
+             box_values = list(element['box'].values())
+             return (box_values[1] + box_values[3]) / 2  # (ymin + ymax) / 2
+
+     # Sort structural elements by priority first, then by y position
+     sorted_structural = sorted(
+         structural_elements,
+         key=lambda x: (
+             element_priority.get(x['label'], 999),
+             get_y(x)
+         )
+     )
+
+     # Group content elements that may be in the same row (similar y-coordinate)
+     rows = []
+     for element in content_elements:
+         y_center = get_y_center(element)
+
+         # Check if this element belongs to an existing row
+         found_row = False
+         for row in rows:
+             row_y_centers = [get_y_center(e) for e in row]
+             row_y_center = sum(row_y_centers) / len(row_y_centers)
+             if abs(y_center - row_y_center) < y_tolerance:
+                 row.append(element)
+                 found_row = True
+                 break
+
+         # If not found in any existing row, create a new row
+         if not found_row:
+             rows.append([element])
+
+     # Sort elements within each row according to reading direction (left-to-right or right-to-left)
+     for row in rows:
+         if rtl:
+             # For RTL, sort from right to left (descending x values)
+             row.sort(key=lambda x: get_x(x), reverse=True)
+         else:
+             # For LTR, sort from left to right (ascending x values)
+             row.sort(key=lambda x: get_x(x))
+
+     # Sort rows by y position (top to bottom)
+     rows.sort(key=lambda row: sum(get_y_center(e) for e in row) / len(row))
+
+     # Flatten the rows into a single list
+     sorted_content = [element for row in rows for element in row]
+
+     # Combine structural and content elements
+     return sorted_structural + sorted_content
+
+ def plot_results(image, results, threshold=threshold, save_path='output.jpg', rtl=is_rtl):
+     # Convert image to appropriate format if it's not already a PIL Image
+     if not isinstance(image, Image.Image):
+         image = Image.fromarray(np.uint8(image))
+
+     draw = ImageDraw.Draw(image)
+     width, height = image.size
+
+     # If rtl is None (not explicitly specified), try to auto-detect
+     if rtl is None:
+         rtl = detect_text_direction(results, threshold)
+
+     # Get results in reading order
+     ordered_results = get_reading_order(results, threshold, rtl)
+
+     # Create a list to store formatted results
+     formatted_results = []
+
+     # Add order number to visualize the detection sequence
+     for i, result in enumerate(ordered_results):
+         label = result['label']
+         box = list(result['box'].values())
+         score = result['score']
+
+         # Make sure box has exactly 4 values
+         if len(box) == 4:
+             x1, y1, x2, y2 = tuple(box)
+         else:
+             print(f"Warning: Unexpected box format for {label}: {box}")
+             continue
+
+         color = category_colors.get(label, (255, 255, 255))  # Default to white if label not found
+
+         # Draw bounding box and labels
+         draw.rectangle((x1, y1, x2, y2), outline=color, width=2)
+
+         # Add order number to visualize the reading sequence
+         draw.text((x1 + 5, y1 - 20), f'#{i+1}', fill=(255, 255, 255))
+
+         # For RTL languages, draw indicators differently
+         if rtl and label in ['textline', 'paragraph', 'heading']:
+             draw.text((x1 + 5, y1 - 10), f'{label} (RTL)', fill=color)
+             # Draw arrow showing reading direction (right to left)
+             arrow_y = y1 - 5
+             draw.line([(x2 - 20, arrow_y), (x1 + 20, arrow_y)], fill=color, width=1)
+             draw.polygon([(x1 + 20, arrow_y - 3), (x1 + 20, arrow_y + 3), (x1 + 15, arrow_y)], fill=color)
+         else:
+             draw.text((x1 + 5, y1 - 10), label, fill=color)
+
+         draw.text((x1 + 5, y1 + 10), f'{score:.2f}', fill='green' if score > 0.7 else 'red')
+
+         # Add result to formatted list with order index
+         formatted_results.append({
+             "order_index": i,
+             "label": label,
+             "is_rtl": rtl if label in ['textline', 'paragraph', 'heading'] else False,
+             "score": float(score),
+             "bbox": {
+                 "x1": float(x1),
+                 "y1": float(y1),
+                 "x2": float(x2),
+                 "y2": float(y2)
+             }
+         })
+
+     image.save(save_path)
+
+     # Save results to JSON file with RTL information
+     with open('results.json', 'w') as f:
+         json.dump({
+             "document_direction": "rtl" if rtl else "ltr",
+             "elements": formatted_results
+         }, f, indent=2)
+
+     return image
+
+ if len(results) > 0:  # Only plot if there are results
+     # If RTL flag not set, try to auto-detect
+     if not hasattr(args, 'rtl') or args.rtl is None:
+         is_rtl = detect_text_direction(results)
+
+     plot_results(image, results, rtl=is_rtl)
+     print(f"Processing complete. Document interpreted as {'RTL' if is_rtl else 'LTR'}")
+ else:
+     print("No objects detected in the image")
+ ```
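To try the full script, save it under any name, for example `layout_test.py` (a placeholder filename), and run it with the flags it defines, e.g. `python layout_test.py --threshold 0.6 --rtl`. It writes the annotated image to `output.jpg` and the ordered detections to `results.json`.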

+ ---
+ 
+ ## Output Example
+ 
+ - **Visual Output:** Bounding boxes with labels and reading-order numbers
+ - **JSON Output:**
+ ```json
+ {
+   "document_direction": "rtl",
+   "elements": [
+     {
+       "order_index": 0,
+       "label": "heading",
+       "is_rtl": true,
+       "score": 0.97,
+       "bbox": {
+         "x1": 120.5,
+         "y1": 65.2,
+         "x2": 620.4,
+         "y2": 120.7
+       }
+     }
+   ]
+ }
+ ```
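Because everything is serialized to `results.json`, downstream steps (for example, cropping regions for OCR) do not need to re-run detection. A small sketch of consuming the file, assuming it was produced by the test script above:

```python
import json

# Load the detections written by the test script
with open("results.json") as f:
    doc = json.load(f)

print("Reading direction:", doc["document_direction"])

# Elements were saved in reading order; sort by order_index defensively anyway
for el in sorted(doc["elements"], key=lambda e: e["order_index"]):
    b = el["bbox"]
    print(f"#{el['order_index'] + 1} {el['label']:<10} "
          f"score={el['score']:.2f} "
          f"box=({b['x1']:.0f}, {b['y1']:.0f}, {b['x2']:.0f}, {b['y2']:.0f})")
```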

+ ---
+ 
+ ## Training Summary
+ 
+ - **Training script:** Uses the Hugging Face `Trainer` API
+ - **Eval Strategy:** `steps`, with `MeanAveragePrecision` from `torchmetrics`
+ 
+ ---
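The card does not include the training or evaluation code itself; as an illustration of the metric named above, a minimal, self-contained `torchmetrics` sketch with dummy boxes (not real model output):

```python
import torch
from torchmetrics.detection import MeanAveragePrecision

# One dummy prediction/target pair in (xmin, ymin, xmax, ymax) pixel coordinates
preds = [{
    "boxes": torch.tensor([[120.5, 65.2, 620.4, 120.7]]),
    "scores": torch.tensor([0.97]),
    "labels": torch.tensor([1]),   # arbitrary class id, for illustration only
}]
targets = [{
    "boxes": torch.tensor([[118.0, 60.0, 625.0, 122.0]]),
    "labels": torch.tensor([1]),
}]

metric = MeanAveragePrecision(box_format="xyxy")
metric.update(preds, targets)
print(metric.compute())  # dict containing map, map_50, map_75, ...
```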