Commit 87b8a8a
Parent(s): 1bf36cc

ct2 translator

- Framework.md +62 -56
- dataset/audio/metadata/test1_segments_20250506_141232.json +126 -0
- translator/README.md +165 -0
- translator/translator.py +64 -108
File: `Framework.md` (CHANGED), hunk `@@ -1,71 +1,77 @@`

Removed lines of the old version. The page captured only some of them, and several were cut off mid-line; they are shown as captured:

```diff
-A[Audio stream input] --> B[VAD]
-B --> C[Transcribe]
-C --> D[
-F --> G[Post-refinement backfill module]
-G --> E
-E --> H[Translation module]
-- **Responsibility**: Receive the user's microphone input or a remote voice stream (e.g. WebRTC
-- **Characteristics**: A continuously running listener that pushes PCM
-- **Output**: segment
```

New version of `Framework.md`:
# Pseudo-Streaming Audio Transcription + LLM Refinement: System Architecture

## 🌊 Overall Flowchart

```mermaid
graph TD
    A["Audio stream input"] --> B["VAD (20ms)"]
    B --> C["Transcribe (200ms)"]
    C --> D["Fast translation module (200ms)"]
    D --> E["Instant output module (unconfirmed state)"]

    C --> F["Translation confirmation module (optional refinement)"]
    F --> G["Refinement translation module (LLM or re-transcription) (500ms)"]
    G --> H["Async output module (confirmed state)"]
```

---
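## 🧱 Module Breakdown (Built Around Pseudo-Streaming)

### **Module A: Audio Stream Input**
- **Responsibility**: Receive the user's microphone input or a remote voice stream (e.g. WebRTC, WebSocket) and split the continuous audio into frames (e.g. 20 ms per frame).
- **Characteristics**: A continuously running listener that pushes PCM frames or NumPy arrays downstream.
- **Real-time guarantees**: Cap the frame-buffer length to avoid blocking; use asynchronous I/O (supports both local and web deployments).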
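A minimal sketch of Module A's framing step, assuming 16 kHz, 16-bit mono PCM that arrives as byte chunks of arbitrary size. The 20 ms frame length comes from the text above; the sample format, the class name `PCMFramer`, and the drop-oldest policy used to cap the buffer are illustrative assumptions, not the project's implementation.

```python
from collections import deque


class PCMFramer:
    """Accumulates raw PCM chunks and emits fixed-length frames (hypothetical helper).

    Assumes 16-bit mono PCM. `max_frames` caps the queue of pending frames; when it
    is full, the oldest frame is discarded so a slow consumer never blocks the input.
    """

    def __init__(self, sample_rate: int = 16_000, frame_ms: int = 20,
                 sample_width: int = 2, max_frames: int = 50):
        # 16000 samples/s * 20 ms = 320 samples = 640 bytes at 16 bits per sample
        self.frame_size = sample_rate * frame_ms // 1000 * sample_width
        self._partial = bytearray()                 # bytes not yet forming a full frame
        self.frames = deque(maxlen=max_frames)      # bounded queue for the VAD stage

    def push(self, chunk: bytes) -> int:
        """Add a chunk from the stream; return how many complete frames it produced."""
        self._partial.extend(chunk)
        produced = 0
        while len(self._partial) >= self.frame_size:
            self.frames.append(bytes(self._partial[:self.frame_size]))
            del self._partial[:self.frame_size]
            produced += 1
        return produced

    @property
    def pending_bytes(self) -> int:
        """Bytes buffered while waiting for the rest of the next frame."""
        return len(self._partial)
```

Dropping the oldest frames keeps the producer from ever blocking, at the cost of losing audio when the consumer falls behind; a blocking queue is the alternative if no audio may be lost.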
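
### **Module B: VAD Segmenter**
- **Responsibility**: Split the audio into speech segments using speech energy, silence detection, speech-boundary logic, and similar cues.
- **Output**: Segment audio chunks with timestamps.
- **Characteristics**: Sliding-window based, with overlapping frames; tuned for Whisper feature extraction.
- **Real-time guarantees**: Very low latency; each segment is pushed downstream as soon as it is produced.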
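The segmentation logic can be sketched with a plain RMS-energy threshold standing in for a real VAD such as WebRTC VAD or Silero VAD. The sketch assumes each frame is a list of float samples in [-1, 1]; the threshold, the 10-frame (200 ms) silence hangover, and the function names are hypothetical. The output dictionaries mirror the fields of `test1_segments_20250506_141232.json`.

```python
import math
from typing import Dict, List, Optional, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame)) if frame else 0.0


def _make_segment(index: int, first: int, last: int, frame_ms: int) -> Dict:
    start_time = first * frame_ms / 1000
    end_time = (last + 1) * frame_ms / 1000
    # Rounding keeps float noise (e.g. 0.6600000000000001) out of the metadata
    return {"index": index, "start_time": round(start_time, 2),
            "end_time": round(end_time, 2),
            "duration": round(end_time - start_time, 2), "is_speech": True}


def segment_frames(frames: Sequence[Sequence[float]], frame_ms: int = 20,
                   threshold: float = 0.02, min_silence_frames: int = 10) -> List[Dict]:
    """Group consecutive voiced frames into segments.

    A segment is closed once `min_silence_frames` silent frames follow its last
    voiced frame; a segment still open at the end of the input is closed too.
    """
    segments: List[Dict] = []
    start: Optional[int] = None        # first voiced frame of the open segment
    last_voiced: Optional[int] = None  # most recent voiced frame
    for i, frame in enumerate(frames):
        if rms(frame) >= threshold:
            if start is None:
                start = i
            last_voiced = i
        elif start is not None and i - last_voiced >= min_silence_frames:
            segments.append(_make_segment(len(segments), start, last_voiced, frame_ms))
            start = None
    if start is not None:
        segments.append(_make_segment(len(segments), start, last_voiced, frame_ms))
    return segments
```

The hangover (`min_silence_frames`) is what prevents short pauses inside a sentence from splitting it into several segments.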
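
### **Module C: Whisper Transcription Module**
- **Responsibility**: Run Whisper inference on each segment output by the VAD to produce transcribed text.
- **Output**: Raw text passages (with timestamps).
- **Characteristics**: Segments are processed in parallel as independent units; GPU acceleration is available.
- **Real-time guarantees**: Each segment is 1 to 5 seconds long; asynchronous workers transcribe segments in parallel.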
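One way to transcribe segments in parallel while still emitting results in segment order is a thread pool, whose `map` yields results in input order even when later segments finish first. `transcribe` is a placeholder for the real Whisper call (for example one model instance per worker); the function name and result fields are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def transcribe_segments(segments: List[Dict], transcribe: Callable[[Dict], str],
                        workers: int = 2) -> List[Dict]:
    """Transcribe segments concurrently and return results in segment order.

    `transcribe` is a hypothetical stand-in for Whisper inference on one segment.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map preserves input order regardless of completion order
        texts = list(pool.map(transcribe, segments))
    return [{"index": s["index"], "start_time": s["start_time"],
             "end_time": s["end_time"], "text": t}
            for s, t in zip(segments, texts)]
```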
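
### **Module D: Fast Translation Module**
- **Responsibility**: Machine-translate the text immediately after transcription (e.g. with CTranslate2 + an NLLB model).
- **Output**: Translated text (for immediate display).
- **Characteristics**: A lightweight translation module suited to real-time requirements.
- **Real-time guarantees**: Finish translation within 200 ms and pass the result to the display module.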
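
### **Module E: Instant Output Module (Unconfirmed State)**
- **Responsibility**: Receive the translation result and show it to the user immediately.
- **Characteristics**: No waiting and no confirmation; this is only the first-pass output.
- **Real-time guarantees**: The primary user-facing UI response path, kept at very low latency.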
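
### **Module F: Translation Confirmation Module (Controller)**
- **Responsibility**: Decide whether the current sentence needs LLM refinement or a deeper re-transcription.
- **Characteristics**: Analyzes content quality, punctuation, and contextual completeness to trigger the refinement flow.
- **Real-time guarantees**: The decision latency is bounded and does not block the main flow.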
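A hypothetical rule set for the controller, assuming it sees the transcribed sentence and, optionally, a recognizer confidence such as the `avg_logprob` that Whisper reports per segment. The thresholds (`min_chars=4`, `logprob_floor=-1.0`) and the function name are illustrative only.

```python
from typing import Optional

# Assumed set of sentence-final marks, full-width and half-width
TERMINAL_PUNCT = tuple("。！？.!?")


def needs_refinement(text: str, avg_logprob: Optional[float] = None,
                     min_chars: int = 4, logprob_floor: float = -1.0) -> bool:
    """Return True if the sentence should be sent down the slow (refinement) path.

    Flags text that is suspiciously short, looks incomplete (no sentence-final
    punctuation), or was recognized with low confidence.
    """
    stripped = text.strip()
    if len(stripped) < min_chars:
        return True
    if not stripped.endswith(TERMINAL_PUNCT):
        return True
    if avg_logprob is not None and avg_logprob < logprob_floor:
        return True
    return False
```

Because the checks are cheap string tests, this kind of rule keeps the controller's decision latency negligible compared with the transcription and translation steps.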
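
### **Module G: Refinement Translation Module (LLM or Re-transcription)**
- **Responsibility**: Improve sentence quality with an LLM or by re-transcribing; intended for more complex expressions, user-configured refinement, and similar cases.
- **Characteristics**: Runs asynchronously, with task queuing and timeout handling; produces high-quality output.
- **Real-time guarantees**: Stays off the main path; refined output is applied with a backfill strategy.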
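A sketch of the asynchronous queue with per-job timeouts, using `asyncio`. `refine` stands in for the LLM or re-transcription call; enforcing the flowchart's 500 ms budget with `asyncio.wait_for`, and simply keeping the provisional output when a job times out, are assumptions about how the module could behave. Function names are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, Tuple


async def refine_worker(queue: "asyncio.Queue", refine: Callable[[str], Awaitable[str]],
                        results: Dict[int, str], timeout_s: float) -> None:
    """Consume (index, text) jobs; store refined text, or skip the job on timeout."""
    while True:
        index, text = await queue.get()
        try:
            results[index] = await asyncio.wait_for(refine(text), timeout=timeout_s)
        except asyncio.TimeoutError:
            pass  # keep the provisional output: nothing to backfill for this segment
        finally:
            queue.task_done()


async def run_refinement(jobs: Iterable[Tuple[int, str]],
                         refine: Callable[[str], Awaitable[str]],
                         workers: int = 2, timeout_s: float = 0.5) -> Dict[int, str]:
    """Refine jobs with a small worker pool; returns {segment index: refined text}."""
    queue: asyncio.Queue = asyncio.Queue()
    results: Dict[int, str] = {}
    tasks = [asyncio.create_task(refine_worker(queue, refine, results, timeout_s))
             for _ in range(workers)]
    for job in jobs:
        queue.put_nowait(job)
    await queue.join()           # wait until every job is done or timed out
    for t in tasks:
        t.cancel()               # workers loop forever; stop them explicitly
    await asyncio.gather(*tasks, return_exceptions=True)
    return results
```

In a live system the workers would run for the whole session and each result would be handed to the output stage as it arrives; the batch-style `run_refinement` only keeps the sketch self-contained.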
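
### **Module H: Async Output Module (Confirmed State)**
- **Responsibility**: Replace the displayed result with the refined one, or apply a diff-based update, so the user can confirm or review it.
- **Characteristics**: Supports display strategies that distinguish the original and refined versions.
- **Real-time guarantees**: Updates asynchronously without disrupting the current interaction.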
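The interplay between the unconfirmed output (Module E) and the confirmed output (Module H) can be modeled as a per-segment store that prefers refined text when it exists. The class and method names are hypothetical; a real UI would push only the changed segment to the client rather than re-rendering the whole list.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SegmentText:
    provisional: str                  # fast-path translation (Module E)
    confirmed: Optional[str] = None   # refined translation (Module H), if any


@dataclass
class TranscriptView:
    """Per-segment display state: confirmed text wins over provisional text."""
    segments: Dict[int, SegmentText] = field(default_factory=dict)

    def show_provisional(self, index: int, text: str) -> None:
        # Updating the provisional text never hides a confirmed result,
        # because display_text() prefers `confirmed`.
        entry = self.segments.get(index)
        if entry is None:
            self.segments[index] = SegmentText(provisional=text)
        else:
            entry.provisional = text

    def backfill(self, index: int, text: str) -> bool:
        """Apply a refined result; return True if the displayed text changed."""
        entry = self.segments.get(index)
        if entry is None:
            return False  # nothing displayed yet for this segment; ignore
        before = self.display_text(index)
        entry.confirmed = text
        return self.display_text(index) != before

    def display_text(self, index: int) -> str:
        entry = self.segments[index]
        return entry.confirmed if entry.confirmed is not None else entry.provisional

    def render(self) -> List[str]:
        return [self.display_text(i) for i in sorted(self.segments)]
```

Returning whether the text changed lets the UI skip a redraw when the refinement agrees with the fast translation.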
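
---

### 🔧 Module Notes

Each module above can be deployed on its own as a microservice, or the modules can be combined into a local streaming inference program, to suit different devices and scenarios.

- The Whisper module can switch between CUDA and CPU;
- The translation module supports quantized NLLB models, keeping response times on the order of a hundred milliseconds;
- The VAD module can be swapped for WebRTC VAD, Silero VAD, or a similar solution.

Possible future extensions:
- Recognition of multi-party call streams (speaker separation);
- Automatic detection of cross-language dialogue and response generation;
- Pluggable, controllable LLM slots for personalized error correction, terminology refinement, and similar scenarios.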
File: `dataset/audio/metadata/test1_segments_20250506_141232.json` (ADDED), hunk `@@ -0,0 +1,126 @@`

```json
{
  "audio_file": "dataset/audio/test1.wav",
  "timestamp": "20250506_141232",
  "total_segments": 17,
  "segments": [
    {"index": 0, "start_time": 3.26, "end_time": 3.92, "duration": 0.6600000000000001, "is_speech": true},
    {"index": 1, "start_time": 4.34, "end_time": 5.56, "duration": 1.2199999999999998, "is_speech": true},
    {"index": 2, "start_time": 7.1, "end_time": 7.8, "duration": 0.7000000000000002, "is_speech": true},
    {"index": 3, "start_time": 8.8, "end_time": 12.44, "duration": 3.639999999999999, "is_speech": true},
    {"index": 4, "start_time": 12.8, "end_time": 16.74, "duration": 3.9399999999999977, "is_speech": true},
    {"index": 5, "start_time": 17.32, "end_time": 18.76, "duration": 1.4400000000000013, "is_speech": true},
    {"index": 6, "start_time": 19.76, "end_time": 21.1, "duration": 1.3399999999999999, "is_speech": true},
    {"index": 7, "start_time": 21.62, "end_time": 25.68, "duration": 4.059999999999999, "is_speech": true},
    {"index": 8, "start_time": 26.28, "end_time": 28.2, "duration": 1.9199999999999982, "is_speech": true},
    {"index": 9, "start_time": 28.56, "end_time": 31.6, "duration": 3.0400000000000027, "is_speech": true},
    {"index": 10, "start_time": 31.98, "end_time": 33.2, "duration": 1.2200000000000024, "is_speech": true},
    {"index": 11, "start_time": 33.54, "end_time": 36.52, "duration": 2.980000000000004, "is_speech": true},
    {"index": 12, "start_time": 37.82, "end_time": 38.94, "duration": 1.1199999999999974, "is_speech": true},
    {"index": 13, "start_time": 39.34, "end_time": 40.34, "duration": 1.0, "is_speech": true},
    {"index": 14, "start_time": 40.86, "end_time": 42.4, "duration": 1.5399999999999991, "is_speech": true},
    {"index": 15, "start_time": 43.04, "end_time": 46.6, "duration": 3.5600000000000023, "is_speech": true},
    {"index": 16, "start_time": 47.5, "end_time": 49.8, "duration": 2.299999999999997, "is_speech": true}
  ]
}
```
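A minimal sketch of how a consumer might load this metadata and summarize it, assuming only the schema shown above (`total_segments`, `segments`, `start_time`, `end_time`). It recomputes durations from the timestamps and rounds them, since the stored `duration` values carry floating-point subtraction noise, and reports the silence gaps between segments. The function name `summarize_segments` is hypothetical.

```python
import json
from typing import Dict


def summarize_segments(meta: Dict) -> Dict:
    """Summarize a VAD segment metadata dict with the schema shown above."""
    segments = meta["segments"]
    # Recompute and round durations (stored values look like 0.6600000000000001)
    durations = [round(s["end_time"] - s["start_time"], 3) for s in segments]
    # Silence between the end of one segment and the start of the next
    gaps = [round(b["start_time"] - a["end_time"], 3)
            for a, b in zip(segments, segments[1:])]
    return {
        "count": len(segments),
        "count_matches_header": len(segments) == meta["total_segments"],
        "speech_seconds": round(sum(durations), 3),
        "durations": durations,
        "gaps": gaps,
    }


def load_summary(path: str) -> Dict:
    """Read a metadata file from disk and summarize it."""
    with open(path, encoding="utf-8") as f:
        return summarize_segments(json.load(f))
```

Usage: `load_summary("dataset/audio/metadata/test1_segments_20250506_141232.json")`.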
File: `translator/README.md` (ADDED), hunk `@@ -0,0 +1,165 @@`

# Test Results

The log labels are printed in Chinese by `translator.py`: 使用设备 = device in use, 测试用例 = test case, 开始翻译 = translation started, [翻译原文] = source text, 源语言 / 目标语言 = source / target language, 输出分词 = output tokens, 翻译完成 = translation finished, 耗时 = elapsed time, [翻译结果] = translation result, 最终翻译结果 = final translation result, 总耗时 = total elapsed time.
```text
2025-05-07 20:23:02,565 - translator - DEBUG - 使用设备: cuda
2025-05-07 20:23:04,366 - translator - INFO -
==== 测试用例 1 ====
2025-05-07 20:23:04,367 - translator - DEBUG - 开始翻译
2025-05-07 20:23:04,367 - translator - INFO - [翻译原文] 请问这附近有地铁站吗?
2025-05-07 20:23:04,367 - translator - DEBUG - 源语言: zho_Hans, 目标语言: eng_Latn
2025-05-07 20:23:04,515 - translator - DEBUG - 输出分词: ['eng_Latn', '▁Please', '▁ask', ',', '▁is', '▁there', '▁a', '▁rail', 'way', '▁station', '▁near', 'by', '?']
2025-05-07 20:23:04,516 - translator - DEBUG - 翻译完成: zho_Hans -> eng_Latn, 耗时: 146.86ms
2025-05-07 20:23:04,516 - translator - INFO - [翻译结果] Please ask, is there a railway station nearby?
2025-05-07 20:23:04,516 - translator - INFO - 最终翻译结果: Please ask, is there a railway station nearby?
2025-05-07 20:23:04,516 - translator - INFO - 总耗时: 148.93ms
2025-05-07 20:23:04,516 - translator - INFO -
==== 测试用例 2 ====
2025-05-07 20:23:04,517 - translator - DEBUG - 开始翻译
2025-05-07 20:23:04,517 - translator - INFO - [翻译原文] 我们今天要讨论人工智能的发展趋势。
2025-05-07 20:23:04,517 - translator - DEBUG - 源语言: zho_Hans, 目标语言: eng_Latn
2025-05-07 20:23:04,628 - translator - DEBUG - 输出分词: ['eng_Latn', '▁We', '▁are', '▁going', '▁to', '▁discuss', '▁today', '▁the', '▁tr', 'ends', '▁in', '▁the', '▁development', '▁of', '▁artificial', '▁intelligence', '.']
2025-05-07 20:23:04,628 - translator - DEBUG - 翻译完成: zho_Hans -> eng_Latn, 耗时: 111.20ms
2025-05-07 20:23:04,628 - translator - INFO - [翻译结果] We are going to discuss today the trends in the development of artificial intelligence.
2025-05-07 20:23:04,628 - translator - INFO - 最终翻译结果: We are going to discuss today the trends in the development of artificial intelligence.
2025-05-07 20:23:04,628 - translator - INFO - 总耗时: 111.20ms
2025-05-07 20:23:04,628 - translator - INFO -
==== 测试用例 3 ====
2025-05-07 20:23:04,628 - translator - DEBUG - 开始翻译
2025-05-07 20:23:04,628 - translator - INFO - [翻译原文] 他的回答令人非常失望。
2025-05-07 20:23:04,628 - translator - DEBUG - 源语言: zho_Hans, 目标语言: eng_Latn
2025-05-07 20:23:04,684 - translator - DEBUG - 输出分词: ['eng_Latn', '▁His', '▁answer', '▁was', '▁very', '▁disappoint', 'ing', '.']
2025-05-07 20:23:04,684 - translator - DEBUG - 翻译完成: zho_Hans -> eng_Latn, 耗时: 55.06ms
2025-05-07 20:23:04,684 - translator - INFO - [翻译结果] His answer was very disappointing.
2025-05-07 20:23:04,684 - translator - INFO - 最终翻译结果: His answer was very disappointing.
2025-05-07 20:23:04,684 - translator - INFO - 总耗时: 56.07ms
2025-05-07 20:23:04,684 - translator - INFO -
==== 测试用例 4 ====
2025-05-07 20:23:04,684 - translator - DEBUG - 开始翻译
2025-05-07 20:23:04,684 - translator - INFO - [翻译原文] 这个项目已经进行了三个月,还需要更多资源支持。
2025-05-07 20:23:04,684 - translator - DEBUG - 源语言: zho_Hans, 目标语言: eng_Latn
2025-05-07 20:23:04,787 - translator - DEBUG - 输出分词: ['eng_Latn', '▁The', '▁project', '▁has', '▁been', '▁running', '▁for', '▁three', '▁months', '▁and', '▁requires', '▁more', '▁resources', '▁to', '▁support', '▁it', '.']
2025-05-07 20:23:04,788 - translator - DEBUG - 翻译完成: zho_Hans -> eng_Latn, 耗时: 102.36ms
2025-05-07 20:23:04,788 - translator - INFO - [翻译结果] The project has been running for three months and requires more resources to support it.
2025-05-07 20:23:04,788 - translator - INFO - 最终翻译结果: The project has been running for three months and requires more resources to support it.
2025-05-07 20:23:04,788 - translator - INFO - 总耗时: 104.35ms
2025-05-07 20:23:04,788 - translator - INFO -
==== 测试用例 5 ====
2025-05-07 20:23:04,788 - translator - DEBUG - 开始翻译
2025-05-07 20:23:04,788 - translator - INFO - [翻译原文] 天气预报说明天会有暴雨,请大家注意安全。
2025-05-07 20:23:04,788 - translator - DEBUG - 源语言: zho_Hans, 目标语言: eng_Latn
2025-05-07 20:23:04,898 - translator - DEBUG - 输出分词: ['eng_Latn', '▁Weather', '▁fore', 'cas', 'ts', '▁indicate', '▁that', '▁there', '▁will', '▁be', '▁heavy', '▁rain', ',', '▁please', '▁pay', '▁attention', '▁to', '▁safety', '.']
2025-05-07 20:23:04,898 - translator - DEBUG - 翻译完成: zho_Hans -> eng_Latn, 耗时: 109.08ms
2025-05-07 20:23:04,898 - translator - INFO - [翻译结果] Weather forecasts indicate that there will be heavy rain, please pay attention to safety.
2025-05-07 20:23:04,899 - translator - INFO - 最终翻译结果: Weather forecasts indicate that there will be heavy rain, please pay attention to safety.
2025-05-07 20:23:04,899 - translator - INFO - 总耗时: 110.14ms
2025-05-07 20:23:04,899 - translator - INFO -
==== 测试用例 6 ====
2025-05-07 20:23:04,899 - translator - DEBUG - 开始翻译
2025-05-07 20:23:04,899 - translator - INFO - [翻译原文] 是时候重新思考我们的计划了。
2025-05-07 20:23:04,899 - translator - DEBUG - 源语言: zho_Hans, 目标语言: eng_Latn
2025-05-07 20:23:04,976 - translator - DEBUG - 输出分词: ['eng_Latn', '▁It', "'", 's', '▁time', '▁to', '▁r', 'eth', 'ink', '▁our', '▁plans', '.']
2025-05-07 20:23:04,976 - translator - DEBUG - 翻译完成: zho_Hans -> eng_Latn, 耗时: 77.24ms
2025-05-07 20:23:04,976 - translator - INFO - [翻译结果] It's time to rethink our plans.
2025-05-07 20:23:04,976 - translator - INFO - 最终翻译结果: It's time to rethink our plans.
2025-05-07 20:23:04,976 - translator - INFO - 总耗时: 77.76ms
2025-05-07 20:23:04,976 - translator - INFO -
==== 测试用例 7 ====
2025-05-07 20:23:04,976 - translator - DEBUG - 开始翻译
2025-05-07 20:23:04,977 - translator - INFO - [翻译原文] 我对这个结果非常满意,感谢你的努力。
2025-05-07 20:23:04,977 - translator - DEBUG - 源语言: zho_Hans, 目标语言: eng_Latn
2025-05-07 20:23:05,076 - translator - DEBUG - 输出分词: ['eng_Latn', '▁I', "'", 'm', '▁very', '▁happy', '▁with', '▁this', '▁result', ',', '▁thank', '▁you', '▁for', '▁your', '▁efforts', '.']
2025-05-07 20:23:05,076 - translator - DEBUG - 翻译完成: zho_Hans -> eng_Latn, 耗时: 98.25ms
2025-05-07 20:23:05,076 - translator - INFO - [翻译结果] I'm very happy with this result, thank you for your efforts.
2025-05-07 20:23:05,076 - translator - INFO - 最终翻译结果: I'm very happy with this result, thank you for your efforts.
2025-05-07 20:23:05,076 - translator - INFO - 总耗时: 99.88ms
2025-05-07 20:23:05,076 - translator - INFO -
==== 测试用例 8 ====
2025-05-07 20:23:05,076 - translator - DEBUG - 开始翻译
2025-05-07 20:23:05,077 - translator - INFO - [翻译原文] 她穿着一件红色的连衣裙,在人群中格外显眼。
2025-05-07 20:23:05,077 - translator - DEBUG - 源语言: zho_Hans, 目标语言: eng_Latn
2025-05-07 20:23:05,178 - translator - DEBUG - 输出分词: ['eng_Latn', '▁She', '▁we', 'ars', '▁a', '▁red', '▁dress', ',', '▁which', '▁is', '▁very', '▁prom', 'inent', '▁among', '▁the', '▁crowd', '.']
2025-05-07 20:23:05,178 - translator - DEBUG - 翻译完成: zho_Hans -> eng_Latn, 耗时: 100.78ms
2025-05-07 20:23:05,178 - translator - INFO - [翻译结果] She wears a red dress, which is very prominent among the crowd.
2025-05-07 20:23:05,178 - translator - INFO - 最终翻译结果: She wears a red dress, which is very prominent among the crowd.
2025-05-07 20:23:05,179 - translator - INFO - 总耗时: 102.00ms
2025-05-07 20:23:05,179 - translator - INFO -
==== 测试用例 9 ====
2025-05-07 20:23:05,179 - translator - DEBUG - 开始翻译
2025-05-07 20:23:05,179 - translator - INFO - [翻译原文] Can you help me find the nearest bus station?
2025-05-07 20:23:05,179 - translator - DEBUG - 源语言: eng_Latn, 目标语言: zho_Hans
2025-05-07 20:23:05,271 - translator - DEBUG - 输出分词: ['zho_Hans', '▁你', '能', '帮', '我', '找到', '最近', '的', '公 共', '汽', '车', '站', '吗', '?']
2025-05-07 20:23:05,271 - translator - DEBUG - 翻译完成: eng_Latn -> zho_Hans, 耗时: 91.77ms
2025-05-07 20:23:05,271 - translator - INFO - [翻译结果] 你能帮我找到最近的公共汽车站吗?
2025-05-07 20:23:05,271 - translator - INFO - 最终翻译结果: 你能帮我找到最近的公共汽车站吗?
2025-05-07 20:23:05,272 - translator - INFO - 总耗时: 91.77ms
2025-05-07 20:23:05,272 - translator - INFO -
==== 测试用例 10 ====
2025-05-07 20:23:05,272 - translator - DEBUG - 开始翻译
2025-05-07 20:23:05,272 - translator - INFO - [翻译原文] The machine learning model achieved an accuracy of 95%.
2025-05-07 20:23:05,272 - translator - DEBUG - 源语言: eng_Latn, 目标语言: zho_Hans
2025-05-07 20:23:05,368 - translator - DEBUG - 输出分词: ['zho_Hans', '▁', '机', '器', '学习', '模型', '达到', '9', '5%', '的', '准', '确', '性', '.']
2025-05-07 20:23:05,368 - translator - DEBUG - 翻译完成: eng_Latn -> zho_Hans, 耗时: 95.58ms
2025-05-07 20:23:05,369 - translator - INFO - [翻译结果] 机器学习模型达到95%的准确性.
2025-05-07 20:23:05,369 - translator - INFO - 最终翻译结果: 机器学习模型达到95%的准确性.
2025-05-07 20:23:05,369 - translator - INFO - 总耗时: 96.62ms
2025-05-07 20:23:05,369 - translator - INFO -
==== 测试用例 11 ====
2025-05-07 20:23:05,370 - translator - DEBUG - 开始翻译
2025-05-07 20:23:05,370 - translator - INFO - [翻译原文] He was overwhelmed by the unexpected response from the audience.
2025-05-07 20:23:05,370 - translator - DEBUG - 源语言: eng_Latn, 目标语言: zho_Hans
2025-05-07 20:23:05,471 - translator - DEBUG - 输出分词: ['zho_Hans', '▁他', '被', '观', '众', '的', '意', '想', '不', ' 到', '的', '反应', '压', '倒', '了', '.']
2025-05-07 20:23:05,471 - translator - DEBUG - 翻译完成: eng_Latn -> zho_Hans, 耗时: 100.42ms
2025-05-07 20:23:05,472 - translator - INFO - [翻译结果] 他被观众的意想不到的反应压倒了.
2025-05-07 20:23:05,472 - translator - INFO - 最终翻译结果: 他被观众的意想不到的反应压倒了.
2025-05-07 20:23:05,472 - translator - INFO - 总耗时: 102.39ms
2025-05-07 20:23:05,472 - translator - INFO -
==== 测试用例 12 ====
2025-05-07 20:23:05,472 - translator - DEBUG - 开始翻译
2025-05-07 20:23:05,472 - translator - INFO - [翻译原文] It’s important to stay hydrated during hot summer days.
2025-05-07 20:23:05,473 - translator - DEBUG - 源语言: eng_Latn, 目标语言: zho_Hans
2025-05-07 20:23:05,557 - translator - DEBUG - 输出分词: ['zho_Hans', '▁在', '炎', '热', '的', '夏', '天', '保持', '水', '分', '很', '重要', '.']
2025-05-07 20:23:05,557 - translator - DEBUG - 翻译完成: eng_Latn -> zho_Hans, 耗时: 84.14ms
2025-05-07 20:23:05,557 - translator - INFO - [翻译结果] 在炎热的夏天保持水分很重要.
2025-05-07 20:23:05,557 - translator - INFO - 最终翻译结果: 在炎热的夏天保持水分很重要.
2025-05-07 20:23:05,557 - translator - INFO - 总耗时: 85.14ms
2025-05-07 20:23:05,557 - translator - INFO -
==== 测试用例 13 ====
2025-05-07 20:23:05,557 - translator - DEBUG - 开始翻译
2025-05-07 20:23:05,557 - translator - INFO - [翻译原文] Although she was tired, she continued working late into the night.
2025-05-07 20:23:05,557 - translator - DEBUG - 源语言: eng_Latn, 目标语言: zho_Hans
2025-05-07 20:23:05,649 - translator - DEBUG - 输出分词: ['zho_Hans', '▁', '虽然', '她', '很', '累', ',', '但', '她', '继续', '工作', '直到', '深', '夜', '.']
2025-05-07 20:23:05,650 - translator - DEBUG - 翻译完成: eng_Latn -> zho_Hans, 耗时: 92.03ms
2025-05-07 20:23:05,650 - translator - INFO - [翻译结果] 虽然她很累,但她继续工作直到深夜.
2025-05-07 20:23:05,650 - translator - INFO - 最终翻译结果: 虽然她很累,但她继续工作直到深夜.
2025-05-07 20:23:05,650 - translator - INFO - 总耗时: 93.03ms
2025-05-07 20:23:05,650 - translator - INFO -
==== 测试用例 14 ====
2025-05-07 20:23:05,650 - translator - DEBUG - 开始翻译
2025-05-07 20:23:05,650 - translator - INFO - [翻译原文] The concert was amazing, and the crowd was full of energy.
2025-05-07 20:23:05,650 - translator - DEBUG - 源语言: eng_Latn, 目标语言: zho_Hans
2025-05-07 20:23:05,747 - translator - DEBUG - 输出分词: ['zho_Hans', '▁', '音乐', '会', '是', '惊', '人的', ',', '群', '众', '充', '满', '了', '能量', '.']
2025-05-07 20:23:05,747 - translator - DEBUG - 翻译完成: eng_Latn -> zho_Hans, 耗时: 95.60ms
2025-05-07 20:23:05,747 - translator - INFO - [翻译结果] 音乐会是惊人的,群众充满了能量.
2025-05-07 20:23:05,748 - translator - INFO - 最终翻译结果: 音乐会是惊人的,群众充满了能量.
2025-05-07 20:23:05,748 - translator - INFO - 总耗时: 97.54ms
2025-05-07 20:23:05,748 - translator - INFO -
==== 测试用例 15 ====
2025-05-07 20:23:05,748 - translator - DEBUG - 开始翻译
2025-05-07 20:23:05,748 - translator - INFO - [翻译原文] Please make sure to submit your application before the deadline.
2025-05-07 20:23:05,748 - translator - DEBUG - 源语言: eng_Latn, 目标语言: zho_Hans
2025-05-07 20:23:05,817 - translator - DEBUG - 输出分词: ['zho_Hans', '▁请', '确保', '在', '截', '止', '日', '期', '之前', '提交', '申请', '.']
2025-05-07 20:23:05,817 - translator - DEBUG - 翻译完成: eng_Latn -> zho_Hans, 耗时: 69.40ms
2025-05-07 20:23:05,817 - translator - INFO - [翻译结果] 请确保在截止日期之前提交申请.
2025-05-07 20:23:05,817 - translator - INFO - 最终翻译结果: 请确保在截止日期之前提交申请.
2025-05-07 20:23:05,817 - translator - INFO - 总耗时: 69.40ms
2025-05-07 20:23:05,817 - translator - INFO -
==== 测试用例 16 ====
2025-05-07 20:23:05,817 - translator - DEBUG - 开始翻译
2025-05-07 20:23:05,817 - translator - INFO - [翻译原文] After months of preparation, the product was finally launched.
2025-05-07 20:23:05,817 - translator - DEBUG - 源语言: eng_Latn, 目标语言: zho_Hans
2025-05-07 20:23:05,920 - translator - DEBUG - 输出分词: ['zho_Hans', '▁', '经', '过', '数', '月', '的', '准', '备', ',', '该', '产', '品', '最终', '推', '出', '.']
2025-05-07 20:23:05,920 - translator - DEBUG - 翻译完成: eng_Latn -> zho_Hans, 耗时: 102.10ms
2025-05-07 20:23:05,921 - translator - INFO - [翻译结果] 经过数月的准备,该产品最终推出.
2025-05-07 20:23:05,921 - translator - INFO - 最终翻译结果: 经过数月的准备,该产品最终推出.
2025-05-07 20:23:05,921 - translator - INFO - 总耗时: 104.10ms
```
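To compare runs, the per-case totals can be pulled out of a log like the one above. This sketch assumes exactly the format printed by `translator.py` (a `==== 测试用例 N ====` line, followed later by a `总耗时: 148.93ms`-style line); the function name is hypothetical.

```python
import re
from typing import Dict

CASE_RE = re.compile(r"==== 测试用例 (\d+) ====")   # "test case N"
TOTAL_RE = re.compile(r"总耗时: ([\d.]+)ms")         # "total elapsed time"


def total_latencies(log_text: str) -> Dict[int, float]:
    """Map test-case number -> total latency in milliseconds."""
    latencies: Dict[int, float] = {}
    case = None
    for line in log_text.splitlines():
        m = CASE_RE.search(line)
        if m:
            case = int(m.group(1))
            continue
        m = TOTAL_RE.search(line)
        if m and case is not None:
            latencies[case] = float(m.group(1))
    return latencies
```

The resulting dict can be fed to `statistics.mean` or `max` to check the "hundreds of milliseconds" target stated in `Framework.md`.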
File: `translator/translator.py` (CHANGED)

Old version: removed (`-`) and context lines as captured by the page. Removed lines whose content was not captured appear as a bare `-`.

```diff
@@ -1,8 +1,9 @@
-
-Translation module - multilingual translation with the NLLB model
-
 
-from
 from langdetect import detect
 import torch
 import time
@@ -10,140 +11,95 @@ import logging
 
 # Configure logging
 def setup_logger(name, level=logging.INFO):
-    """Set up the logger"""
     logger = logging.getLogger(name)
-    # Clear any existing handlers to avoid duplicates
     if logger.handlers:
         logger.handlers.clear()
-
-    # Add a new handler
     handler = logging.StreamHandler()
     formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
     handler.setFormatter(formatter)
     logger.addHandler(handler)
     logger.setLevel(level)
-    # Do not propagate to the parent logger, to avoid duplicate log lines
     logger.propagate = False
     return logger
 
-# Create the logger
 logger = setup_logger("translator")
 
 class NLLBTranslator:
-    """
-
-    """
-
-    def __init__(self, model_name="facebook/nllb-200-distilled-600M", default_target="eng_Latn"):
-        """
-        Initialize the NLLB translator
-
-        :param model_name: model name
-        :param default_target: default target language code
-        """
-        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
         logger.debug(f"使用设备: {self.device}")
-
-        if self.device.type == "cuda":
-            logger.debug(f"GPU设备: {torch.cuda.get_device_name(0)}")
-            total_mem = torch.cuda.get_device_properties(0).total_memory / 1024**3
-            logger.debug(f"GPU显存: {total_mem:.1f} GB")
-
-        # Load the model and tokenizer
-        logger.debug(f"加载模型: {model_name}")
-        self.tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)
-        self.model = AutoModelForSeq2SeqLM.from_pretrained(
-            model_name,
-            torch_dtype=torch.float16 if self.device.type == "cuda" else torch.float32
-        ).to(self.device)
 
         self.default_target = default_target
-        logger.debug(f"翻译器初始化完成,默认目标语言: {default_target}")
-
-    def detect_lang_code(self, text: str) -> str:
-        """
-        Detect the language of the text and return its NLLB language code
-
-        :param text: text to inspect
-        :return: NLLB language code
-        """
-        try:
-            lang = detect(text)
-            logger.debug(f"检测到语言: {lang}")
-        except Exception:
-            logger.debug("语言检测失败,默认使用中文(zh)")
-            lang = "zh-cn"
-
-        # Language-code mapping
-        lang_map = {
-            "zh-cn": "zho_Hans", "zh": "zho_Hans", "en": "eng_Latn", "fr": "fra_Latn",
-            "de": "deu_Latn", "ja": "jpn_Jpan", "ko": "kor_Hang", "ar": "arb_Arab"
-        }
-
-        lang_code = lang_map.get(lang.lower(), "eng_Latn")
-        logger.debug(f"映射语言代码: {lang} -> {lang_code}")
-        return lang_code
 
-    def translate(self, text: str, target_lang_code: str = None) -> str:
-        """
-        Translate text into the target language
-
-        :param text: text to translate
-        :param target_lang_code: target language code; if None, the default target language is used
-        :return: translated text
-        """
         logger.debug("开始翻译")
-
-        # Log the source text (INFO level)
         logger.info(f"[翻译原文] {text}")
 
-
-        src_lang = self.detect_lang_code(text)
         tgt_lang = target_lang_code or self.default_target
-
-        # Prepare the inputs
-        self.tokenizer.src_lang = src_lang
-        inputs = self.tokenizer(text, return_tensors="pt", padding=True, truncation=True).to(self.device)
-        inputs["forced_bos_token_id"] = self.tokenizer.convert_tokens_to_ids(tgt_lang)
-
-        # Run the translation
-        start = time.time()
-        with torch.no_grad():
-            output = self.model.generate(**inputs, max_new_tokens=80)
 
-
-        result = self.tokenizer.decode(output[0], skip_special_tokens=True)
 
-        #
         duration = time.time() - start
-
 
-        #
         logger.info(f"[翻译结果] {result}")
-
-        return result
 
 
 if __name__ == "__main__":
-    # Set the log level to DEBUG to see detailed output
     logger.setLevel(logging.DEBUG)
-
-    # Create the translator
     translator = NLLBTranslator()
 
```

Old lines 134 to 149 were also removed; the page did not capture their content.
New version of `translator/translator.py`:

```python
"""
Translation module: multilingual translation with an NLLB model accelerated by CTranslate2.
"""

import logging
import time
from typing import Optional

import torch
from ctranslate2 import Translator
from transformers import AutoTokenizer


# Configure logging
def setup_logger(name, level=logging.INFO):
    logger = logging.getLogger(name)
    # Remove existing handlers so repeated setup does not duplicate output
    if logger.handlers:
        logger.handlers.clear()
    handler = logging.StreamHandler()
    formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
    handler.setFormatter(formatter)
    logger.addHandler(handler)
    logger.setLevel(level)
    # Do not propagate to parent loggers (avoids duplicate lines)
    logger.propagate = False
    return logger


logger = setup_logger("translator")


class NLLBTranslator:
    def __init__(self, model_dir="nllb-600m-ct2-int8-fp16", default_target="eng_Latn"):
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        logger.debug(f"使用设备: {self.device}")

        # Tokenizer from the original Hugging Face checkpoint; model_dir holds the
        # CTranslate2-converted model, which uses the same vocabulary
        self.tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-600M")
        self.translator = Translator(model_dir, device=self.device, compute_type="int8_float16")
        self.default_target = default_target

    def translate(self, text: str, source_lang_code: str, target_lang_code: Optional[str] = None) -> str:
        logger.debug("开始翻译")
        logger.info(f"[翻译原文] {text}")

        src_lang = source_lang_code
        tgt_lang = target_lang_code or self.default_target

        logger.debug(f"源语言: {src_lang}, 目标语言: {tgt_lang}")

        # NLLB input format: set the tokenizer's source language so encode() prepends
        # the matching language token; otherwise it keeps its default (eng_Latn)
        self.tokenizer.src_lang = src_lang
        source = self.tokenizer.convert_ids_to_tokens(self.tokenizer.encode(text))

        start = time.time()
        # Force the decoder to begin with the target-language token
        target_prefix = [tgt_lang]
        results = self.translator.translate_batch(
            [source],
            # beam_size=6,
            length_penalty=1.2,
            target_prefix=[target_prefix],
        )
        duration = time.time() - start

        output_tokens = results[0].hypotheses[0]
        logger.debug(f"输出分词: {output_tokens}")

        # Convert the output tokens to text: drop the leading target-language token,
        # then decode, skipping any remaining special tokens. This works for every
        # target language, not only a hard-coded list of language codes.
        target_ids = self.tokenizer.convert_tokens_to_ids(output_tokens[1:])
        result = self.tokenizer.decode(target_ids, skip_special_tokens=True).strip()

        logger.debug(f"翻译完成: {src_lang} -> {tgt_lang}, 耗时: {duration * 1000:.2f}ms")
        logger.info(f"[翻译结果] {result}")

        return result


if __name__ == "__main__":
    logger.setLevel(logging.DEBUG)
    translator = NLLBTranslator()

    test_cases = [
        # Chinese -> English
        ("请问这附近有地铁站吗?", "zho_Hans", "eng_Latn"),
        ("我们今天要讨论人工智能的发展趋势。", "zho_Hans", "eng_Latn"),
        ("他的回答令人非常失望。", "zho_Hans", "eng_Latn"),
        ("这个项目已经进行了三个月,还需要更多资源支持。", "zho_Hans", "eng_Latn"),
        ("天气预报说明天会有暴雨,请大家注意安全。", "zho_Hans", "eng_Latn"),
        ("是时候重新思考我们的计划了。", "zho_Hans", "eng_Latn"),
        ("我对这个结果非常满意,感谢你的努力。", "zho_Hans", "eng_Latn"),
        ("她穿着一件红色的连衣裙,在人群中格外显眼。", "zho_Hans", "eng_Latn"),

        # English -> Chinese
        ("Can you help me find the nearest bus station?", "eng_Latn", "zho_Hans"),
        ("The machine learning model achieved an accuracy of 95%.", "eng_Latn", "zho_Hans"),
        ("He was overwhelmed by the unexpected response from the audience.", "eng_Latn", "zho_Hans"),
        ("It’s important to stay hydrated during hot summer days.", "eng_Latn", "zho_Hans"),
        ("Although she was tired, she continued working late into the night.", "eng_Latn", "zho_Hans"),
        ("The concert was amazing, and the crowd was full of energy.", "eng_Latn", "zho_Hans"),
        ("Please make sure to submit your application before the deadline.", "eng_Latn", "zho_Hans"),
        ("After months of preparation, the product was finally launched.", "eng_Latn", "zho_Hans"),
    ]

    for i, (text, src_lang, tgt_lang) in enumerate(test_cases):
        logger.info(f"\n==== 测试用例 {i + 1} ====")
        start_total = time.time()
        result = translator.translate(text, source_lang_code=src_lang, target_lang_code=tgt_lang)
        end_total = time.time()
        logger.info(f"最终翻译结果: {result}")
        logger.info(f"总耗时: {(end_total - start_total) * 1000:.2f}ms")
```
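For readers without the tokenizer at hand, the post-processing step can be approximated without any dependencies: drop the leading target-language token of the CTranslate2 hypothesis, join the SentencePiece pieces, and turn each `▁` word-boundary marker into a space. This is only an approximation of `tokenizer.decode` (no byte-fallback or special-token handling), and the function name is hypothetical; the sample inputs are the `输出分词` (output tokens) lines from the README log.

```python
from typing import List

SP_SPACE = "\u2581"  # SentencePiece word-boundary marker "▁"


def pieces_to_text(output_tokens: List[str]) -> str:
    """Approximate NLLB detokenization of one CTranslate2 hypothesis.

    output_tokens[0] is the forced target-language code (e.g. 'eng_Latn'),
    so it is dropped before the pieces are joined.
    """
    return "".join(output_tokens[1:]).replace(SP_SPACE, " ").strip()
```

For example, `['eng_Latn', '▁His', '▁answer', '▁was', '▁very', '▁disappoint', 'ing', '.']` becomes `His answer was very disappointing.`, which matches the logged translation result.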