JacobLinCool committed on
Commit b5b6ac9 · verified · 1 Parent(s): 28ac369

Update README.md

Files changed (1):
  1. README.md +56 -5
README.md CHANGED
@@ -17,12 +17,63 @@ It has been trained using [TRL](https://github.com/huggingface/trl).
  ## Quick start

  ```python
- from transformers import pipeline

- question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
- generator = pipeline("text-generation", model="JacobLinCool/gemma-3n-E2B-transcribe-zh-tw-1", device="cuda")
- output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
- print(output["generated_text"])
+ import torch
+ from peft import PeftModel
+ from transformers import AutoModelForCausalLM, AutoProcessor
+
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+
+ # Load the base Gemma 3n model and attach the fine-tuned LoRA adapter.
+ processor = AutoProcessor.from_pretrained("google/gemma-3n-E2B-it")
+ base_model = AutoModelForCausalLM.from_pretrained("google/gemma-3n-E2B-it")
+ model = PeftModel.from_pretrained(
+     base_model, "JacobLinCool/gemma-3n-E2B-transcribe-zh-tw-1"
+ ).to(device)
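+
+ # Optional, not in the original snippet: PEFT's merge_and_unload() folds the
+ # LoRA weights into the base model, removing adapter overhead at inference:
+ # model = model.merge_and_unload()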
+
+
+ def transcribe(model, processor, audio):
+     # A system instruction plus a user turn carrying the audio and the request.
+     messages = [
+         {
+             "role": "system",
+             "content": [
+                 {
+                     "type": "text",
+                     "text": "You are an assistant that transcribes speech accurately.",
+                 }
+             ],
+         },
+         {
+             "role": "user",
+             "content": [
+                 {"type": "audio", "audio": audio},
+                 {"type": "text", "text": "Transcribe this audio."},
+             ],
+         },
+     ]
+
+     inputs = processor.apply_chat_template(
+         messages,
+         add_generation_prompt=True,
+         tokenize=True,
+         return_dict=True,
+         return_tensors="pt",
+     )
+     # BatchFeature.to() casts only floating-point tensors (the audio features);
+     # integer token ids keep their dtype.
+     inputs = inputs.to(device, dtype=model.dtype)
+
+     model.eval()
+     with torch.no_grad():
+         outputs = model.generate(**inputs, max_new_tokens=128)
+
+     prediction = processor.batch_decode(
+         outputs, skip_special_tokens=True, clean_up_tokenization_spaces=False
+     )[0]
+     # Keep only the model's turn from the decoded chat transcript.
+     prediction = prediction.split("\nmodel\n")[-1].strip()
+     return prediction
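+
+ # Assumption, not from the original snippet: recent transformers releases let
+ # the chat template's "audio" entry be a local file path, a URL, or a numpy
+ # waveform array, so transcribe() accepts any of these.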
+
+
+ if __name__ == "__main__":
+     prediction = transcribe(model, processor, "/workspace/audio.mp3")
+     print(prediction)
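+     # The adapter is tuned for Traditional Chinese (zh-TW) speech, so this
+     # prints the zh-TW transcript of the audio file.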
  ```

  ## Training procedure