davanstrien HF Staff committed on
Commit
10aaf2c
·
1 Parent(s): 84944f5
Files changed (3)
  1. README.md +184 -4
  2. index.html +445 -19
  3. style.css +0 -28
README.md CHANGED
@@ -1,10 +1,190 @@
1
  ---
2
- title: Ocr Time Capsule
3
- emoji: πŸ‘
4
- colorFrom: purple
5
  colorTo: indigo
6
  sdk: static
7
  pinned: false
8
  ---
9
 
10
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
  ---
2
+ title: OCR Time Capsule
3
+ emoji: 📦
4
+ colorFrom: blue
5
  colorTo: indigo
6
  sdk: static
7
  pinned: false
8
  ---
9
 
10
+ # OCR Time Capsule 📦
11
+
12
+ A fast, modern web interface for exploring and comparing OCR text improvements in HuggingFace datasets. Browse through pre-processed OCR improvements to see how AI models enhance historical document transcriptions.
13
+
14
+ ![OCR Time Capsule](https://img.shields.io/badge/OCR-Time%20Capsule-blue)
15
+
16
+ ## Features
17
+
18
+ - **Fast Navigation**: Browse through large OCR datasets with keyboard shortcuts (J/K or arrow keys)
19
+ - **Side-by-Side Comparison**: View original OCR and improved text simultaneously
20
+ - **Advanced Diff Visualization**: Character, word, or line-level differences with color highlighting
21
+ - **No Backend Required**: Direct integration with HuggingFace Dataset Viewer API
22
+ - **Responsive Design**: Works seamlessly on desktop and mobile devices
23
+ - **Dark Mode**: Easy on the eyes for extended reading sessions
24
+ - **URL Sharing**: Share specific dataset samples with direct links
25
+
26
+ ## Quick Start
27
+
28
+ ### Option 1: Local Development
29
+
30
+ 1. Clone or download this directory
31
+ 2. Serve the files using any static web server:
32
+
33
+ ```bash
34
+ # Using Python
35
+ python -m http.server 8000
36
+
37
+ # Using Node.js
38
+ npx serve .
39
+
40
+ # Using PHP
41
+ php -S localhost:8000
42
+ ```
43
+
44
+ 3. Open http://localhost:8000 in your browser
45
+
46
+ ### Option 2: GitHub Pages
47
+
48
+ 1. Push this directory to a GitHub repository
49
+ 2. Enable GitHub Pages in repository settings
50
+ 3. Access via `https://[username].github.io/[repo-name]/`
51
+
52
+ ### Option 3: Direct File Access
53
+
54
+ Simply open `index.html` in a modern web browser. Note: Some features may be limited due to CORS restrictions.
55
+
56
+ ## Usage
57
+
58
+ ### Loading a Dataset
59
+
60
+ 1. Enter a HuggingFace dataset ID (e.g., `davanstrien/exams-ocr`)
61
+ 2. Click "Load" or press Enter
62
+ 3. The explorer will automatically detect text columns
63
+
64
+ ### Navigation
65
+
66
+ - **Next**: Press `J` or `→` arrow key
67
+ - **Previous**: Press `K` or `←` arrow key
68
+ - **Switch Views**: Press `1` (comparison), `2` (diff), or `3` (improved only)
69
+
70
+ ### Supported Column Names
71
+
72
+ The explorer automatically detects these column patterns:
73
+
74
+ **Original OCR**: `text`, `ocr`, `original_text`, `ground_truth`
75
+ **Improved OCR**: `markdown`, `new_ocr`, `corrected_text`, `vlm_ocr`
76
+
77
+ ## Technical Details
78
+
79
+ ### Architecture
80
+
81
+ ```
82
+ ┌─────────────────┐     ┌──────────────────────┐
83
+ │   Browser UI    │────▶│ HF Dataset Viewer API│
84
+ │   (Alpine.js)   │     │  (datasets-server)   │
85
+ └─────────────────┘     └──────────────────────┘
86
+          │
87
+          ▼
88
+ ┌─────────────────┐
89
+ │   Local Cache   │
90
+ │  (JavaScript)   │
91
+ └─────────────────┘
92
+ ```
93
+
94
+ ### API Integration
95
+
96
+ Uses the HuggingFace Dataset Viewer API:
97
+ - Base URL: `https://datasets-server.huggingface.co`
98
+ - No authentication required for public datasets
99
+ - Automatic handling of image URL expiration
100
+ - Smart batching for efficient data loading
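
As a minimal sketch of the integration described above, a request for one page of rows could be built as follows. The `/rows` endpoint and its `dataset`, `config`, `split`, `offset`, and `length` query parameters belong to the Dataset Viewer API; the helper name `buildRowsUrl` and the `default`/`train` fallbacks are assumptions for illustration and are not taken from this commit.

```javascript
// Hypothetical helper, not part of this repository.
const API_BASE = 'https://datasets-server.huggingface.co';
const MAX_ROWS_PER_REQUEST = 100; // limit stated in the README's "Limitations"

function buildRowsUrl(datasetId, offset = 0, length = MAX_ROWS_PER_REQUEST,
                      config = 'default', split = 'train') {
  const params = new URLSearchParams({
    dataset: datasetId,
    config,  // assumed default; real datasets may use another config name
    split,   // assumed default; real datasets may use another split name
    offset: String(offset),
    // Clamp so a caller can never ask for more than the API allows.
    length: String(Math.min(length, MAX_ROWS_PER_REQUEST)),
  });
  return `${API_BASE}/rows?${params}`;
}
```

Passing the result to `fetch` and reading the JSON response would be the natural next step; the response handling is left out here because this diff does not show it.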
101
+
102
+ ### Performance Optimizations
103
+
104
+ - **Batch Loading**: Fetches 100 rows at a time
105
+ - **Smart Caching**: Reduces API calls
106
+ - **Lazy Loading**: Only loads visible content
107
+ - **URL Refresh**: Automatically refreshes expired image URLs
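
The batch loading and caching listed above might look roughly like the sketch below. It assumes rows are fetched in aligned blocks of 100 and kept in a `Map` keyed by block offset; `batchOffsetFor`, `BatchCache`, and the `fetchBatch` callback are hypothetical names, since the actual cache in `js/dataset-api.js` is not shown in this commit.

```javascript
// Hypothetical sketch of batch-aligned caching, not the repository's code.
const BATCH_SIZE = 100; // rows fetched per request, per the README

// Map any row index to the offset of the batch that contains it.
function batchOffsetFor(index) {
  return Math.floor(index / BATCH_SIZE) * BATCH_SIZE;
}

class BatchCache {
  constructor(fetchBatch) {
    this.fetchBatch = fetchBatch; // (offset, length) => rows, or a Promise of rows
    this.batches = new Map();     // batch offset -> cached fetch result
  }

  // Return the cached batch for this row index, fetching it only once.
  getBatch(index) {
    const offset = batchOffsetFor(index);
    if (!this.batches.has(offset)) {
      this.batches.set(offset, this.fetchBatch(offset, BATCH_SIZE));
    }
    return this.batches.get(offset);
  }
}
```

Storing the fetch result itself (which may be a Promise) means two quick navigations into the same block share one in-flight request rather than issuing two.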
108
+
109
+ ## Customization
110
+
111
+ ### Adding New Column Patterns
112
+
113
+ Edit `js/dataset-api.js` and update the `detectColumns` method:
114
+
115
+ ```javascript
116
+ if (!originalTextColumn && ['your_column_name'].includes(name)) {
117
+ originalTextColumn = name;
118
+ }
119
+ ```
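
For context, here is a self-contained sketch of how such detection could work end to end. The column names come from the "Supported Column Names" list in this README; the function signature, the `extraOriginal`/`extraImproved` parameters, and the return shape are assumptions, because the real `detectColumns` body is not part of this diff.

```javascript
// Hypothetical reconstruction for illustration; the real method may differ.
const ORIGINAL_NAMES = ['text', 'ocr', 'original_text', 'ground_truth'];
const IMPROVED_NAMES = ['markdown', 'new_ocr', 'corrected_text', 'vlm_ocr'];

function detectColumns(columnNames, extraOriginal = [], extraImproved = []) {
  const originalCandidates = [...ORIGINAL_NAMES, ...extraOriginal];
  const improvedCandidates = [...IMPROVED_NAMES, ...extraImproved];
  let originalTextColumn = null;
  let improvedTextColumn = null;

  // The first matching column in dataset order wins for each role.
  for (const name of columnNames) {
    if (!originalTextColumn && originalCandidates.includes(name)) {
      originalTextColumn = name;
    } else if (!improvedTextColumn && improvedCandidates.includes(name)) {
      improvedTextColumn = name;
    }
  }
  return { originalTextColumn, improvedTextColumn };
}
```

Under these assumptions, `detectColumns(['image', 'raw_ocr', 'markdown'], ['raw_ocr'])` would pick `raw_ocr` as the original column and `markdown` as the improved one.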
120
+
121
+ ### Styling
122
+
123
+ The UI uses Tailwind CSS. Modify styles in:
124
+ - `css/styles.css` for custom styles
125
+ - Tailwind classes directly in `index.html`
126
+
127
+ ### Keyboard Shortcuts
128
+
129
+ Add new shortcuts in `js/app.js`:
130
+
131
+ ```javascript
132
+ case 'your_key':
133
+ // Your action here
134
+ break;
135
+ ```
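
One way to keep such a `switch` easy to extend is to separate the key-to-action mapping from the handler. The sketch below covers the shortcuts documented under "Navigation"; `actionForKey` and the action strings are hypothetical, while `ArrowRight` and `ArrowLeft` are the standard `KeyboardEvent.key` values for the arrow keys.

```javascript
// Hypothetical mapping for illustration; js/app.js is not shown in this commit.
function actionForKey(key) {
  switch (key) {
    case 'j':
    case 'J':
    case 'ArrowRight':
      return 'next';
    case 'k':
    case 'K':
    case 'ArrowLeft':
      return 'previous';
    case '1':
      return 'view:comparison';
    case '2':
      return 'view:diff';
    case '3':
      return 'view:improved';
    default:
      return null; // unhandled keys fall through to the browser
  }
}
```

A `keydown` listener could then call `actionForKey(event.key)` and dispatch on the result, so adding a shortcut only requires one new `case`.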
136
+
137
+ ## Browser Support
138
+
139
+ - Chrome/Edge: Full support
140
+ - Firefox: Full support
141
+ - Safari: Full support (14+)
142
+ - Mobile browsers: Full support with touch navigation
143
+
144
+ ## Limitations
145
+
146
+ - Maximum 100 rows per API request
147
+ - Image URLs expire after ~1 hour
148
+ - No authentication support for private datasets (yet)
149
+ - Read-only interface (no editing capabilities)
150
+
151
+ ## Future Enhancements
152
+
153
+ - [ ] Export functionality for improved texts
154
+ - [ ] Batch processing capabilities
155
+ - [ ] Search within dataset
156
+ - [ ] Bookmarking system
157
+ - [ ] Authentication for private datasets
158
+ - [ ] Confidence scores visualization
159
+ - [ ] Multi-dataset comparison
160
+
161
+ ## Troubleshooting
162
+
163
+ ### "Dataset viewer is not available"
164
+ - Check if the dataset exists on HuggingFace
165
+ - Ensure the dataset has viewer enabled
166
+ - Try a known working dataset like `davanstrien/exams-ocr`
167
+
168
+ ### Images not loading
169
+ - Image URLs expire after ~1 hour
170
+ - The app automatically refreshes URLs on error
171
+ - Check browser console for detailed errors
172
+
173
+ ### Slow loading
174
+ - Large datasets may take time for initial load
175
+ - Consider using datasets with pre-computed statistics
176
+ - Check your internet connection
177
+
178
+ ## Contributing
179
+
180
+ This is a standalone tool designed for OCR exploration. Feel free to fork and customize for your needs!
181
+
182
+ ## License
183
+
184
+ MIT License - Use freely for any purpose
185
+
186
+ ## Related Projects
187
+
188
+ - [OCR Time Machine](../app.py) - Interactive OCR improvement with VLMs
189
+ - [HuggingFace Datasets](https://huggingface.co/datasets) - Browse available datasets
190
+ - [Dataset Viewer Docs](https://huggingface.co/docs/dataset-viewer) - API documentation
index.html CHANGED
@@ -1,19 +1,445 @@
1
- <!doctype html>
2
- <html>
3
- <head>
4
- <meta charset="utf-8" />
5
- <meta name="viewport" content="width=device-width" />
6
- <title>My static Space</title>
7
- <link rel="stylesheet" href="style.css" />
8
- </head>
9
- <body>
10
- <div class="card">
11
- <h1>Welcome to your static Space!</h1>
12
- <p>You can modify this app directly by editing <i>index.html</i> in the Files and versions tab.</p>
13
- <p>
14
- Also don't forget to check the
15
- <a href="https://huggingface.co/docs/hub/spaces" target="_blank">Spaces documentation</a>.
16
- </p>
17
- </div>
18
- </body>
19
- </html>
1
+ <!DOCTYPE html>
2
+ <html lang="en" class="h-full">
3
+ <head>
4
+ <meta charset="UTF-8">
5
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
6
+ <title>OCR Text Explorer</title>
7
+
8
+ <!-- External Dependencies -->
9
+ <script src="https://unpkg.com/[email protected]"></script>
10
+ <script src="https://unpkg.com/[email protected]/dist/cdn.min.js" defer></script>
11
+ <script src="https://cdn.tailwindcss.com"></script>
12
+
13
+ <!-- Tailwind Config -->
14
+ <script>
15
+ tailwind.config = {
16
+ darkMode: 'class',
17
+ theme: {
18
+ extend: {
19
+ animation: {
20
+ 'fade-in': 'fadeIn 0.3s ease-in-out',
21
+ },
22
+ keyframes: {
23
+ fadeIn: {
24
+ '0%': { opacity: '0' },
25
+ '100%': { opacity: '1' },
26
+ }
27
+ }
28
+ }
29
+ }
30
+ }
31
+ </script>
32
+
33
+ <!-- Custom Styles -->
34
+ <link rel="stylesheet" href="css/styles.css">
35
+ </head>
36
+ <body class="h-full bg-gray-50 dark:bg-gray-900" x-data="ocrExplorer">
37
+ <!-- Header -->
38
+ <header class="bg-white dark:bg-gray-800 shadow-sm border-b border-gray-200 dark:border-gray-700">
39
+ <div class="px-4 sm:px-6 lg:px-8 py-4">
40
+ <div class="flex items-center justify-between">
41
+ <div class="flex-1">
42
+ <div class="flex items-center space-x-4">
43
+ <h1 class="text-xl font-semibold text-gray-900 dark:text-white">
44
+ 📚 OCR Text Explorer
45
+ </h1>
46
+ <span class="text-sm text-gray-600 dark:text-gray-400 hidden sm:inline">
47
+ Compare original and AI-improved OCR text from historical documents
48
+ </span>
49
+ </div>
50
+ <div class="flex items-center space-x-2 mt-3">
51
+ <input
52
+ type="text"
53
+ x-model="datasetId"
54
+ @keyup.enter="loadDataset()"
55
+ class="px-3 py-1.5 text-sm border border-gray-300 rounded-md focus:ring-2 focus:ring-blue-500 focus:border-transparent dark:bg-gray-700 dark:border-gray-600 dark:text-white"
56
+ placeholder="Dataset ID (e.g., username/dataset-name)"
57
+ style="width: 300px;"
58
+ >
59
+ <button
60
+ @click="loadDataset()"
61
+ class="px-4 py-1.5 text-sm bg-blue-600 text-white rounded-md hover:bg-blue-700 focus:outline-none focus:ring-2 focus:ring-blue-500"
62
+ >
63
+ Load
64
+ </button>
65
+ </div>
66
+ </div>
67
+
68
+ <!-- Settings -->
69
+ <div class="flex items-center space-x-4">
70
+ <button
71
+ @click="showAbout = !showAbout"
72
+ class="px-3 py-1.5 text-sm text-gray-600 hover:text-gray-900 dark:text-gray-400 dark:hover:text-gray-200"
73
+ title="About this tool"
74
+ >
75
+ <svg class="w-5 h-5 inline-block mr-1" fill="none" stroke="currentColor" viewBox="0 0 24 24">
76
+ <path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M13 16h-1v-4h-1m1-4h.01M21 12a9 9 0 11-18 0 9 9 0 0118 0z"></path>
77
+ </svg>
78
+ <span class="hidden sm:inline">About</span>
79
+ </button>
80
+
81
+ <button
82
+ @click="darkMode = !darkMode"
83
+ class="p-2 text-gray-500 hover:text-gray-700 dark:text-gray-400 dark:hover:text-gray-200"
84
+ title="Toggle dark mode"
85
+ >
86
+ <svg x-show="!darkMode" class="w-5 h-5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
87
+ <path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M20.354 15.354A9 9 0 018.646 3.646 9.003 9.003 0 0012 21a9.003 9.003 0 008.354-5.646z"></path>
88
+ </svg>
89
+ <svg x-show="darkMode" class="w-5 h-5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
90
+ <path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M12 3v1m0 16v1m9-9h-1M4 12H3m15.364 6.364l-.707-.707M6.343 6.343l-.707-.707m12.728 0l-.707.707M6.343 17.657l-.707.707M16 12a4 4 0 11-8 0 4 4 0 018 0z"></path>
91
+ </svg>
92
+ </button>
93
+
94
+ <select
95
+ x-model="diffMode"
96
+ class="px-3 py-1.5 text-sm border border-gray-300 rounded-md focus:ring-2 focus:ring-blue-500 dark:bg-gray-700 dark:border-gray-600 dark:text-white"
97
+ >
98
+ <option value="char">Character Diff</option>
99
+ <option value="word">Word Diff</option>
100
+ <option value="line">Line Diff</option>
101
+ </select>
102
+
103
+ <button
104
+ @click="exportComparison()"
105
+ class="p-2 text-gray-500 hover:text-gray-700 dark:text-gray-400 dark:hover:text-gray-200"
106
+ title="Export comparison"
107
+ >
108
+ <svg class="w-5 h-5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
109
+ <path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M12 10v6m0 0l-3-3m3 3l3-3m2 8H7a2 2 0 01-2-2V5a2 2 0 012-2h5.586a1 1 0 01.707.293l5.414 5.414a1 1 0 01.293.707V19a2 2 0 01-2 2z"></path>
110
+ </svg>
111
+ </button>
112
+ </div>
113
+ </div>
114
+ </div>
115
+ </header>
116
+
117
+ <!-- About Panel -->
118
+ <div x-show="showAbout" x-transition class="bg-blue-50 dark:bg-blue-950/30 border-b border-blue-200 dark:border-blue-800">
119
+ <div class="px-4 sm:px-6 lg:px-8 py-6">
120
+ <div class="max-w-4xl">
121
+ <h2 class="text-lg font-semibold text-gray-900 dark:text-white mb-3">About OCR Text Explorer</h2>
122
+ <div class="prose prose-sm dark:prose-invert text-gray-700 dark:text-gray-300">
123
+ <p class="mb-3">
124
+ OCR Text Explorer helps researchers and digital humanities professionals compare original OCR text
125
+ with AI-improved versions from historical documents. This tool is designed for browsing pre-processed
126
+ OCR improvements stored in HuggingFace datasets.
127
+ </p>
128
+ <div class="grid grid-cols-1 md:grid-cols-2 gap-4 mb-4">
129
+ <div>
130
+ <h3 class="font-medium mb-2">🎯 Use Cases</h3>
131
+ <ul class="list-disc list-inside space-y-1 text-sm">
132
+ <li>Review OCR corrections from historical newspapers</li>
133
+ <li>Quality assessment of digitization projects</li>
134
+ <li>Training data validation for OCR models</li>
135
+ <li>Accessibility improvements for scanned texts</li>
136
+ </ul>
137
+ </div>
138
+ <div>
139
+ <h3 class="font-medium mb-2">⚡ Key Features</h3>
140
+ <ul class="list-disc list-inside space-y-1 text-sm">
141
+ <li>Side-by-side text comparison</li>
142
+ <li>Character, word, and line-level diffs</li>
143
+ <li>Keyboard navigation (J/K or arrows)</li>
144
+ <li>Direct HuggingFace dataset integration</li>
145
+ </ul>
146
+ </div>
147
+ </div>
148
+ <p class="text-sm">
149
+ 💡 <strong>Tip:</strong> For live OCR processing with vision-language models, check out
150
+ <a href="../app.py" class="text-blue-600 dark:text-blue-400 underline">OCR Time Machine</a>.
151
+ This tool focuses on exploring already-processed datasets for faster navigation and analysis.
152
+ </p>
153
+ </div>
154
+ <button
155
+ @click="showAbout = false"
156
+ class="mt-4 text-sm text-blue-600 dark:text-blue-400 hover:underline"
157
+ >
158
+ Close
159
+ </button>
160
+ </div>
161
+ </div>
162
+ </div>
163
+
164
+ <!-- Main Content -->
165
+ <main class="h-full flex">
166
+ <!-- Loading State -->
167
+ <div x-show="loading" class="flex-1 flex items-center justify-center">
168
+ <div class="text-center">
169
+ <div class="animate-spin rounded-full h-12 w-12 border-b-2 border-blue-600 mx-auto"></div>
170
+ <p class="mt-4 text-gray-600 dark:text-gray-400">Loading dataset...</p>
171
+ </div>
172
+ </div>
173
+
174
+ <!-- Error State -->
175
+ <div x-show="error" class="flex-1 flex items-center justify-center p-8">
176
+ <div class="max-w-md w-full bg-red-50 dark:bg-red-900/20 border border-red-200 dark:border-red-800 rounded-lg p-6">
177
+ <h3 class="text-lg font-medium text-red-800 dark:text-red-400 mb-2">Error</h3>
178
+ <p class="text-red-600 dark:text-red-300" x-text="error"></p>
179
+ <button
180
+ @click="error = null; loadDataset()"
181
+ class="mt-4 px-4 py-2 bg-red-600 text-white rounded-md hover:bg-red-700"
182
+ >
183
+ Try Again
184
+ </button>
185
+ </div>
186
+ </div>
187
+
188
+ <!-- Content Area -->
189
+ <div x-show="!loading && !error && currentSample" class="flex-1 flex h-full">
190
+ <!-- Image Panel -->
191
+ <div class="w-1/3 bg-gray-100 dark:bg-gray-800 p-4 overflow-hidden border-r border-gray-200 dark:border-gray-700">
192
+ <div class="sticky top-0">
193
+ <div class="bg-white dark:bg-gray-700 rounded-lg shadow-sm overflow-hidden">
194
+ <img
195
+ :src="getImageSrc()"
196
+ :alt="`Page ${currentIndex + 1}`"
197
+ class="w-full h-auto"
198
+ @error="handleImageError"
199
+ >
200
+ <div class="p-3 border-t border-gray-200 dark:border-gray-600">
201
+ <p class="text-sm text-gray-600 dark:text-gray-400">
202
+ <span x-text="`Page ${currentIndex + 1} of ${totalSamples || '?'}`"></span>
203
+ <span x-show="getImageDimensions()" class="ml-2">
204
+ • <span x-text="getImageDimensions()"></span>
205
+ </span>
206
+ </p>
207
+ </div>
208
+ </div>
209
+
210
+ <!-- Statistics Panel -->
211
+ <div class="mt-4 bg-white dark:bg-gray-700 rounded-lg shadow-sm p-4">
212
+ <h3 class="text-sm font-medium text-gray-700 dark:text-gray-300 mb-3">OCR Quality Metrics</h3>
213
+ <div class="space-y-2">
214
+ <div class="flex justify-between items-center">
215
+ <span class="text-xs text-gray-600 dark:text-gray-400">Similarity</span>
216
+ <span class="text-xs font-medium text-gray-900 dark:text-gray-100" x-text="`${similarity}%`"></span>
217
+ </div>
218
+ <div class="w-full bg-gray-200 dark:bg-gray-600 rounded-full h-2">
219
+ <div class="bg-blue-600 h-2 rounded-full transition-all duration-300" :style="`width: ${similarity}%`"></div>
220
+ </div>
221
+
222
+ <div class="grid grid-cols-3 gap-2 mt-3">
223
+ <div class="text-center">
224
+ <div class="text-lg font-semibold text-gray-900 dark:text-gray-100" x-text="charStats.total || '-'"></div>
225
+ <div class="text-xs text-gray-600 dark:text-gray-400">Characters</div>
226
+ </div>
227
+ <div class="text-center">
228
+ <div class="text-lg font-semibold text-green-600 dark:text-green-400" x-text="charStats.added || '0'"></div>
229
+ <div class="text-xs text-gray-600 dark:text-gray-400">Added</div>
230
+ </div>
231
+ <div class="text-center">
232
+ <div class="text-lg font-semibold text-red-600 dark:text-red-400" x-text="charStats.removed || '0'"></div>
233
+ <div class="text-xs text-gray-600 dark:text-gray-400">Removed</div>
234
+ </div>
235
+ </div>
236
+
237
+ <div class="mt-3 pt-3 border-t border-gray-200 dark:border-gray-600">
238
+ <div class="flex justify-between text-xs">
239
+ <span class="text-gray-600 dark:text-gray-400">Words</span>
240
+ <span class="text-gray-900 dark:text-gray-100">
241
+ <span x-text="wordStats.original || '-'"></span> → <span x-text="wordStats.improved || '-'"></span>
242
+ </span>
243
+ </div>
244
+ </div>
245
+ </div>
246
+ </div>
247
+ </div>
248
+ </div>
249
+
250
+ <!-- Text Comparison Panel -->
251
+ <div class="flex-1 bg-white dark:bg-gray-900 overflow-hidden">
252
+ <!-- Tab Navigation -->
253
+ <div class="border-b border-gray-200 dark:border-gray-700">
254
+ <nav class="flex -mb-px">
255
+ <button
256
+ @click="activeTab = 'comparison'"
257
+ :class="activeTab === 'comparison' ? 'border-blue-500 text-blue-600 dark:text-blue-400' : 'border-transparent text-gray-500 hover:text-gray-700 dark:text-gray-400 dark:hover:text-gray-200'"
258
+ class="px-6 py-3 border-b-2 font-medium text-sm transition-colors"
259
+ >
260
+ Side by Side
261
+ </button>
262
+ <button
263
+ @click="activeTab = 'diff'"
264
+ :class="activeTab === 'diff' ? 'border-blue-500 text-blue-600 dark:text-blue-400' : 'border-transparent text-gray-500 hover:text-gray-700 dark:text-gray-400 dark:hover:text-gray-200'"
265
+ class="px-6 py-3 border-b-2 font-medium text-sm transition-colors"
266
+ >
267
+ Inline Diff
268
+ </button>
269
+ <button
270
+ @click="activeTab = 'improved'"
271
+ :class="activeTab === 'improved' ? 'border-blue-500 text-blue-600 dark:text-blue-400' : 'border-transparent text-gray-500 hover:text-gray-700 dark:text-gray-400 dark:hover:text-gray-200'"
272
+ class="px-6 py-3 border-b-2 font-medium text-sm transition-colors"
273
+ >
274
+ Improved Only
275
+ </button>
276
+ </nav>
277
+ </div>
278
+
279
+ <!-- Tab Content -->
280
+ <div class="p-6 overflow-y-auto" style="height: calc(100% - 49px);">
281
+ <!-- Side by Side Comparison -->
282
+ <div x-show="activeTab === 'comparison'" class="grid grid-cols-2 gap-6">
283
+ <div>
284
+ <h3 class="text-sm font-medium text-gray-700 dark:text-gray-300 mb-3">Original OCR</h3>
285
+ <div class="prose prose-sm dark:prose-invert max-w-none">
286
+ <pre class="whitespace-pre-wrap font-mono text-xs bg-gray-50 dark:bg-gray-800 text-gray-900 dark:text-gray-100 p-4 rounded-lg" x-text="getOriginalText()"></pre>
287
+ </div>
288
+ </div>
289
+ <div>
290
+ <h3 class="text-sm font-medium text-gray-700 dark:text-gray-300 mb-3">Improved OCR</h3>
291
+ <div class="prose prose-sm dark:prose-invert max-w-none">
292
+ <pre class="whitespace-pre-wrap font-mono text-xs bg-gray-50 dark:bg-gray-800 text-gray-900 dark:text-gray-100 p-4 rounded-lg" x-text="getImprovedText()"></pre>
293
+ </div>
294
+ </div>
295
+ </div>
296
+
297
+ <!-- Inline Diff -->
298
+ <div x-show="activeTab === 'diff'" class="prose prose-sm dark:prose-invert max-w-none">
299
+ <div x-html="diffHtml" class="diff-content"></div>
300
+ </div>
301
+
302
+ <!-- Improved Only -->
303
+ <div x-show="activeTab === 'improved'" class="prose prose-sm dark:prose-invert max-w-none">
304
+ <pre class="whitespace-pre-wrap font-mono text-xs bg-gray-50 dark:bg-gray-800 text-gray-900 dark:text-gray-100 p-4 rounded-lg" x-text="getImprovedText()"></pre>
305
+ </div>
306
+ </div>
307
+ </div>
308
+ </div>
309
+ </main>
310
+
311
+ <!-- Navigation Footer -->
312
+ <footer x-show="!loading && !error && currentSample" class="fixed bottom-0 left-0 right-0">
313
+ <!-- Enhanced Visual Page Browser -->
314
+ <div x-show="showDock"
315
+ x-transition:enter="transition ease-out duration-300"
316
+ x-transition:enter-start="transform translate-y-full opacity-0"
317
+ x-transition:enter-end="transform translate-y-0 opacity-100"
318
+ x-transition:leave="transition ease-in duration-200"
319
+ x-transition:leave-start="transform translate-y-0 opacity-100"
320
+ x-transition:leave-end="transform translate-y-full opacity-0"
321
+ @mouseenter="showDockPreview()"
322
+ @mouseleave="hideDockPreview()"
323
+ class="bg-white dark:bg-gray-800 border-t border-gray-200 dark:border-gray-700 shadow-lg">
324
+ <div class="relative px-16 py-4">
325
+ <!-- Left scroll button -->
326
+ <button @click="scrollDockLeft()"
327
+ :disabled="dockStartIndex <= 0"
328
+ class="absolute left-2 top-1/2 -translate-y-1/2 p-2 rounded-full bg-gray-100 dark:bg-gray-700 hover:bg-gray-200 dark:hover:bg-gray-600 disabled:opacity-50 disabled:cursor-not-allowed transition-all">
329
+ <svg class="w-5 h-5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
330
+ <path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M15 19l-7-7 7-7"></path>
331
+ </svg>
332
+ </button>
333
+
334
+ <!-- Thumbnails container -->
335
+ <div class="flex items-center justify-center space-x-3 overflow-hidden">
336
+ <template x-for="item in dockItems" :key="item.index">
337
+ <div @click="jumpToDockPage(item.index)"
338
+ class="relative cursor-pointer transition-all duration-200 hover:scale-105 flex-shrink-0"
339
+ :class="item.index === currentIndex ? 'ring-2 ring-blue-500 scale-105' : ''">
340
+ <img :src="item.imageSrc"
341
+ :alt="`Page ${item.index + 1}`"
342
+ class="w-32 h-44 object-cover rounded shadow-lg"
343
+ @error="handleFlowImageError($event, item.index)">
344
+ <div class="absolute bottom-0 inset-x-0 bg-gradient-to-t from-black/80 to-transparent p-2 rounded-b">
345
+ <p class="text-sm text-white font-medium text-center" x-text="`${item.index + 1}`"></p>
346
+ </div>
347
+ </div>
348
+ </template>
349
+ </div>
350
+
351
+ <!-- Right scroll button -->
352
+ <button @click="scrollDockRight()"
353
+ :disabled="dockStartIndex >= totalSamples - dockVisibleCount"
354
+ class="absolute right-2 top-1/2 -translate-y-1/2 p-2 rounded-full bg-gray-100 dark:bg-gray-700 hover:bg-gray-200 dark:hover:bg-gray-600 disabled:opacity-50 disabled:cursor-not-allowed transition-all">
355
+ <svg class="w-5 h-5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
356
+ <path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M9 5l7 7-7 7"></path>
357
+ </svg>
358
+ </button>
359
+
360
+ <!-- Position indicator -->
361
+ <div class="absolute bottom-1 left-1/2 -translate-x-1/2 flex items-center space-x-2">
362
+ <div class="w-32 h-1 bg-gray-200 dark:bg-gray-600 rounded-full overflow-hidden">
363
+ <div class="h-full bg-blue-500 transition-all duration-300"
364
+ :style="`width: ${((dockStartIndex + Math.floor(dockVisibleCount/2)) / (totalSamples - 1)) * 100}%`"></div>
365
+ </div>
366
+ <span class="text-xs text-gray-500 dark:text-gray-400"
367
+ x-text="`${dockStartIndex + 1}-${Math.min(dockStartIndex + dockVisibleCount, totalSamples)} of ${totalSamples}`"></span>
368
+ </div>
369
+ </div>
370
+ </div>
371
+
372
+ <!-- Navigation controls -->
373
+ <div class="bg-white dark:bg-gray-800 border-t border-gray-200 dark:border-gray-700 px-6 py-3">
374
+ <div class="flex items-center justify-between">
375
+ <div class="flex items-center space-x-4">
376
+ <button
377
+ @click="previousSample()"
378
+ :disabled="currentIndex <= 0"
379
+ class="px-4 py-2 bg-gray-200 dark:bg-gray-700 text-gray-700 dark:text-gray-300 rounded-md hover:bg-gray-300 dark:hover:bg-gray-600 disabled:opacity-50 disabled:cursor-not-allowed"
380
+ >
381
+ ← Previous
382
+ </button>
383
+
384
+ <!-- Interactive Page Counter with Dock Trigger -->
385
+ <div class="relative">
386
+ <div @mouseenter="showDockPreview()"
387
+ @mouseleave="hideDockPreview()"
388
+ class="flex items-center space-x-2 px-4 py-2 rounded-md cursor-pointer transition-all duration-200 hover:bg-gray-100 dark:hover:bg-gray-700">
389
+ <div class="flex items-center space-x-2">
390
+ <span class="text-sm font-medium text-gray-700 dark:text-gray-300">
391
+ Page <span x-text="currentIndex + 1"></span> of <span x-text="totalSamples || '?'"></span>
392
+ </span>
393
+ <svg class="w-4 h-4 text-gray-400 transition-transform duration-200"
394
+ :class="showDock ? 'rotate-180' : ''"
395
+ fill="none" stroke="currentColor" viewBox="0 0 24 24">
396
+ <path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M5 15l7-7 7 7"></path>
397
+ </svg>
398
+ </div>
399
+ <!-- Visual progress bar -->
400
+ <div class="w-32 h-1 bg-gray-200 dark:bg-gray-600 rounded-full overflow-hidden">
401
+ <div class="h-full bg-blue-500 transition-all duration-300"
402
+ :style="`width: ${(currentIndex / (totalSamples - 1)) * 100}%`"></div>
403
+ </div>
404
+ </div>
405
+ </div>
406
+
407
+ <div class="flex items-center space-x-1">
408
+ <input
409
+ type="number"
410
+ x-model.number="jumpToPage"
411
+ @keyup.enter="jumpToSample()"
412
+ :min="1"
413
+ :max="totalSamples"
414
+ class="w-20 px-2 py-1 text-sm border border-gray-300 rounded focus:ring-2 focus:ring-blue-500 focus:border-transparent dark:bg-gray-700 dark:border-gray-600 dark:text-white"
415
+ placeholder="Go to"
416
+ >
417
+ <button
418
+ @click="jumpToSample()"
419
+ class="px-2 py-1 text-sm bg-blue-600 text-white rounded hover:bg-blue-700 focus:outline-none focus:ring-2 focus:ring-blue-500"
420
+ >
421
+ Go
422
+ </button>
423
+ </div>
424
+
425
+ <button
426
+ @click="nextSample()"
427
+ :disabled="currentIndex >= (totalSamples - 1)"
428
+ class="px-4 py-2 bg-gray-200 dark:bg-gray-700 text-gray-700 dark:text-gray-300 rounded-md hover:bg-gray-300 dark:hover:bg-gray-600 disabled:opacity-50 disabled:cursor-not-allowed"
429
+ >
430
+ Next →
431
+ </button>
432
+ </div>
433
+
434
+ <div class="text-sm text-gray-500 dark:text-gray-400">
435
+ Press <kbd class="px-2 py-1 bg-gray-100 dark:bg-gray-700 rounded">J</kbd> / <kbd class="px-2 py-1 bg-gray-100 dark:bg-gray-700 rounded">K</kbd> or <kbd class="px-2 py-1 bg-gray-100 dark:bg-gray-700 rounded">←</kbd> / <kbd class="px-2 py-1 bg-gray-100 dark:bg-gray-700 rounded">→</kbd> to navigate | <kbd class="px-2 py-1 bg-gray-100 dark:bg-gray-700 rounded">V</kbd> for visual browser
436
+ </div>
437
+ </div>
438
+ </footer>
439
+
440
+ <!-- Local Scripts -->
441
+ <script src="js/diff-utils.js"></script>
442
+ <script src="js/dataset-api.js"></script>
443
+ <script src="js/app.js"></script>
444
+ </body>
445
+ </html>
style.css DELETED
@@ -1,28 +0,0 @@
1
- body {
2
- padding: 2rem;
3
- font-family: -apple-system, BlinkMacSystemFont, "Arial", sans-serif;
4
- }
5
-
6
- h1 {
7
- font-size: 16px;
8
- margin-top: 0;
9
- }
10
-
11
- p {
12
- color: rgb(107, 114, 128);
13
- font-size: 15px;
14
- margin-bottom: 10px;
15
- margin-top: 5px;
16
- }
17
-
18
- .card {
19
- max-width: 620px;
20
- margin: 0 auto;
21
- padding: 16px;
22
- border: 1px solid lightgray;
23
- border-radius: 16px;
24
- }
25
-
26
- .card p:last-child {
27
- margin-bottom: 0;
28
- }