bigmoyan committed
Commit 9e6699f · verified · 1 Parent(s): 61a01cb

Update README.md

Files changed (1)
  1. README.md +15 -5
README.md CHANGED
@@ -21,12 +21,12 @@ tags:
  <p>
 
  <p align="center">
- Kimi-Audio-7B <a href="https://huggingface.co/moonshotai/Kimi-Audio-7B">🤗</a>&nbsp; | Kimi-Audio-7B-Instruct <a href="https://huggingface.co/moonshotai/Kimi-Audio-7B-Instruct">🤗</a>&nbsp; | 📑 <a href="https://raw.githubusercontent.com/MoonshotAI/Kimi-Audio/main/assets/kimia_report.pdf">Paper</a>
+ Kimi-Audio-7B-Instruct <a href="https://huggingface.co/moonshotai/Kimi-Audio-7B-Instruct">🤗</a>&nbsp; | 📑 <a href="https://raw.githubusercontent.com/MoonshotAI/Kimi-Audio/main/assets/kimia_report.pdf">Paper</a>
  </p>
 
  ## Introduction
 
- We present Kimi-Audio, an open-source audio foundation model excelling in **audio understanding, generation, and conversation**. This repository hosts the model checkpoints for Kimi-Audio-7B and Kimi-Audio-7B-Instruct.
+ We present Kimi-Audio, an open-source audio foundation model excelling in **audio understanding, generation, and conversation**. This repository hosts the model checkpoints for Kimi-Audio-7B-Instruct.
 
  Kimi-Audio is designed as a universal audio foundation model capable of handling a wide variety of audio processing tasks within a single unified framework. Key features include:
 
@@ -41,14 +41,24 @@ For more details, please refer to our [GitHub Repository](https://github.com/Moo
 
  ## Requirements
 
- To run the inference code, you need to install the necessary dependencies. It's recommended to clone the main Kimi-Audio repository and install the `kimia_infer` package from there.
-
+ We recommend building a Docker image to run the inference. After cloning the inference code, build the image with the `docker build` command.
  ```bash
  git clone https://github.com/MoonshotAI/Kimi-Audio
  cd Kimi-Audio
- pip install -e . # install the package for inference
+ docker build -t kimi-audio:v0.1 .
+ ```
+ Alternatively, you can use our pre-built image:
+ ```bash
+ docker pull moonshotai/kimi-audio:v0.1
  ```
 
+ Or, you can install the requirements with:
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ You may refer to the Dockerfile in case of any environment issues.
+
  ## Quickstart
 
  This example demonstrates basic usage for generating text from audio (ASR) and generating both text and speech in a conversational turn using the `Kimi-Audio-7B-Instruct` model.
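The ASR quickstart referenced above can be sketched as follows. This is a minimal illustration only: the `kimia_infer.api.kimia` import path, the `KimiAudio` class, and the message schema are assumptions drawn from the Kimi-Audio GitHub repository, not guaranteed by this commit, and the audio path is a placeholder.

```python
# Hypothetical sketch of the ASR quickstart; class name, import path, and
# message schema are assumptions based on the Kimi-Audio GitHub repository.

def build_asr_messages(audio_path: str,
                       prompt: str = "Please transcribe the audio.") -> list[dict]:
    """Build a single conversational turn asking the model to transcribe audio."""
    return [
        {"role": "user", "message_type": "text", "content": prompt},
        {"role": "user", "message_type": "audio", "content": audio_path},
    ]

if __name__ == "__main__":
    try:
        # Assumed import path from the Kimi-Audio repository.
        from kimia_infer.api.kimia import KimiAudio
    except ImportError:
        print("kimia_infer not installed; see the Requirements section above.")
    else:
        model = KimiAudio(model_path="moonshotai/Kimi-Audio-7B-Instruct")
        # output_type="text" requests a transcription rather than speech output.
        _, text = model.generate(build_asr_messages("asr_example.wav"),
                                 output_type="text")
        print(text)
```

The message-building helper is plain Python and independent of the model; the guarded block only runs when the `kimia_infer` package is actually installed.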