baidu/ERNIE-4.5-300B-A47B-PT

@@ -32,6 +32,10 @@ library_name: transformers
 # ERNIE-4.5-300B-A47B
 ## ERNIE 4.5 Highlights
 The advanced capabilities of the ERNIE 4.5 models, particularly the MoE-based A47B and A3B series, are underpinned by several key technical innovations:
@@ -59,92 +63,6 @@ ERNIE-4.5-300B-A47B is a text MoE Post-trained model, with 300B total parameters
 ## Quickstart
-### Model Finetuning with ERNIEKit
-[ERNIEKit](https://github.com/PaddlePaddle/ERNIE) is a training toolkit based on PaddlePaddle, specifically designed for the ERNIE series of open-source large models. It provides comprehensive support for scenarios such as instruction fine-tuning (SFT, LoRA) and alignment training (DPO), ensuring optimal performance.
-Usage Examples:
-```bash
-# Download model
-huggingface-cli download baidu/ERNIE-4.5-300B-A47B-Paddle --local-dir baidu/ERNIE-4.5-300B-A47B-Paddle
-# SFT
-erniekit train examples/configs/ERNIE-4.5-300B-A47B/sft/run_sft_wint8mix_lora_8k.yaml
-# DPO
-erniekit train examples/configs/ERNIE-4.5-300B-A47B/dpo/run_dpo_wint8mix_lora_8k.yaml
-```
-For more detailed examples, including SFT with LoRA, multi-GPU configurations, and advanced scripts, please refer to the examples folder within the [ERNIEKit](https://github.com/PaddlePaddle/ERNIE) repository.
-### Using FastDeploy
-A service can be deployed quickly with FastDeploy using the following command. For more detailed usage instructions, please refer to the [FastDeploy Repository](https://github.com/PaddlePaddle/FastDeploy).
-**Note**: To deploy on a configuration with 4 GPUs, each having at least 80G of memory, specify `--quantization wint4`. If you specify `--quantization wint8`, then resources for 8 GPUs are required.
-```bash
-python -m fastdeploy.entrypoints.openai.api_server \
-       --model baidu/ERNIE-4.5-300B-A47B-Paddle \
-       --port 8180 \
-       --metrics-port 8181 \
-       --quantization wint4 \
-       --tensor-parallel-size 8 \
-       --engine-worker-queue-port 8182 \
-       --max-model-len 32768 \
-       --max-num-seqs 32
-```
-To deploy the W4A8C8 quantized version using FastDeploy, you can run the following command.
-```bash
-python -m fastdeploy.entrypoints.openai.api_server \
-       --model baidu/ERNIE-4.5-300B-A47B-W4A8C8-TP4-Paddle \
-       --port 8180 \
-       --metrics-port 8181 \
-       --engine-worker-queue-port 8182 \
-       --tensor-parallel-size 4 \
-       --max-model-len 32768 \
-       --max-num-seqs 32
-```
-To deploy the WINT2 quantized version using FastDeploy on a single 141G GPU, you can run the following command.
-```bash
-python -m fastdeploy.entrypoints.openai.api_server \
-       --model "baidu/ERNIE-4.5-300B-A47B-2Bits-Paddle" \
-       --port 8180 \
-       --metrics-port 8181 \
-       --engine-worker-queue-port 8182 \
-       --tensor-parallel-size 1 \
-       --max-model-len 32768 \
-       --max-num-seqs 128
-```
-The following code snippet illustrates how to use ERNIE-4.5-300B-A47B-FP8 to generate content based on given inputs.
-```python
-from fastdeploy import LLM, SamplingParams
-prompts = [
-    "Hello, my name is",
-]
-sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)
-model = "baidu/ERNIE-4.5-300B-A47B-FP8-Paddle"
-llm = LLM(model=model, tensor_parallel_size=8, max_model_len=8192, num_gpu_blocks_override=1024, engine_worker_queue_port=9981)
-outputs = llm.generate(prompts, sampling_params)
-for output in outputs:
-    prompt = output.prompt
-    generated_text = output.outputs.text
-    print("generated_text", generated_text)
-```
 ### Using `transformers` library
 **Note**: Before using the model, please ensure you have the `transformers` library installed (version 4.50.0 or higher).
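
The version requirement in the note can be checked before loading the model. The following is a minimal sketch using only the standard library; the helper names `parse_version` and `meets_minimum` are hypothetical and not part of `transformers`, and pre-release suffixes such as `.dev0` or `rc1` are ignored for simplicity.

```python
from importlib import metadata

# Minimum version stated in the note above.
MIN_TRANSFORMERS = (4, 50, 0)


def parse_version(text):
    """Turn '4.50.0' or '4.51.0.dev0' into a 3-tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(digits))
    # Pad short versions such as '4.50' so tuple comparison behaves as expected.
    parts += [0] * (3 - len(parts))
    return tuple(parts[:3])


def meets_minimum(installed, minimum=MIN_TRANSFORMERS):
    """Return True if the installed version string is at least `minimum`."""
    return parse_version(installed) >= minimum


if __name__ == "__main__":
    try:
        installed = metadata.version("transformers")
        status = "ok" if meets_minimum(installed) else "too old"
        print(f"transformers {installed}: {status}")
    except metadata.PackageNotFoundError:
        print("transformers is not installed")
```

If the check reports an older version, upgrading the package (for example with `pip install -U transformers`) should satisfy the requirement.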

 # ERNIE-4.5-300B-A47B
+> [!NOTE]
+> "**-Paddle**" models use [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) weights, while "**-PT**" models use Transformer-style PyTorch weights.
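
The naming convention in the note can be expressed as a small lookup. This is an illustrative sketch only: `weights_format` is a hypothetical helper, and it assumes the suffix is the last component of the repository name, as in the model names that appear in this card.

```python
def weights_format(repo_id):
    """Map a repository name to its weight format based on its suffix.

    Returns None when the name carries neither suffix.
    """
    if repo_id.endswith("-Paddle"):
        return "PaddlePaddle"
    if repo_id.endswith("-PT"):
        return "PyTorch"
    return None


if __name__ == "__main__":
    for name in (
        "baidu/ERNIE-4.5-300B-A47B-PT",
        "baidu/ERNIE-4.5-300B-A47B-Paddle",
    ):
        print(name, "->", weights_format(name))
```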