Brief Introduction

Ziya-Writing-13B-v2 is a 13-billion-parameter instruction fine-tuned model based on LLaMA-2, with enhanced capabilities for writing tasks. It is a large model focused on writing and can handle many kinds of writing tasks, including official documents and reports, speeches and letters, and creative copywriting.

Software Dependencies

pip install torch==1.12.1 tokenizers==0.13.3 git+https://github.com/huggingface/transformers

Model Taxonomy

| Demand  | Task      | Series | Model  | Parameter | Extra             |
| ------- | --------- | ------ | ------ | --------- | ----------------- |
| Writing | AGI model | Ziya   | LLaMA2 | 13B       | English & Chinese |

Model Information

Supervised Fine-tuning

We collected and cleaned a large amount of authentic human-written text from the internet, used GPT-3.5 to generate the corresponding writing instructions, and verified them through rigorous manual review.

We also trained an Answer-to-Instruction model that generates high-quality augmented writing instructions from unsupervised writing data, further improving the quality of our data.

On this basis, we used a reward model and a set of cleaning rules to select the more difficult writing instructions, discard the simple ones, and preserve instruction diversity.

Finally, we used the evol-instruct method to generate about 300,000 high-quality general instruction samples. We mixed the general instruction data with the writing instruction data, so that Ziya-Writing-13B-v2 not only understands user intent well but also produces excellent responses.

Alignment Training

We used strong dialogue models such as GPT-4, Minimax, Baichuan2, and Qwen-14B to generate different responses to the same instruction, then ranked those responses with a reward model to form preference data.

We performed alignment training with an SFT-like alignment method, implementing the training pipeline on our in-house framework. Training used an 8k context window and about 20,000 preference samples in total.
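As a rough illustration of how reward-ranked responses can be turned into preference data, here is a minimal Python sketch. The card does not say how preference records are actually built, so the pairwise scheme, the function name `build_preference_pairs`, the `min_margin` parameter, and the example scores are all assumptions made for illustration.

```python
from itertools import combinations


def build_preference_pairs(instruction, scored_responses, min_margin=0.0):
    """Turn reward-scored responses into (chosen, rejected) records.

    scored_responses: list of (response_text, reward_score) tuples.
    min_margin: hypothetical threshold; pairs whose score gap is not
    larger than this are skipped as too close to call.
    """
    # Rank responses from highest to lowest reward score.
    ranked = sorted(scored_responses, key=lambda item: item[1], reverse=True)
    pairs = []
    # combinations() keeps the ranked order, so the first element of each
    # pair always has the higher (or equal) score.
    for (better, s_hi), (worse, s_lo) in combinations(ranked, 2):
        if s_hi - s_lo > min_margin:
            pairs.append(
                {"instruction": instruction, "chosen": better, "rejected": worse}
            )
    return pairs


# Placeholder responses and scores, for illustration only.
candidates = [("response A", 0.9), ("response B", 0.4), ("response C", 0.7)]
pairs = build_preference_pairs("Write a welcome speech.", candidates)
for pair in pairs:
    print(pair["chosen"], ">", pair["rejected"])
```

Raising `min_margin` would drop pairs whose reward scores are nearly tied, which is one possible way to keep only clear preferences.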

Performance

Judging the quality of written copy is fairly subjective and hard to capture with a single accuracy or satisfaction score. We therefore used an anonymous, multi-annotator side-by-side evaluation and collected 170 writing instructions of varying difficulty for it. We will also release this evaluation set later.

We use win rate as the metric for comparing models. A model's win rate is calculated as:

Win Rate = (Number of wins for the model + Number of draws / 2) / Total number of annotations

Because most language models generate responses by sampling, a win rate above 55% generally indicates that the model significantly outperforms the other model, a win rate below 45% indicates that it clearly lags behind, and a win rate between 45% and 55% indicates that the two models are roughly on par.
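The win-rate formula and the 45%/55% thresholds above can be sketched in a few lines of Python. The function names `win_rate` and `verdict` and the example counts are illustrative assumptions, not part of any released evaluation code.

```python
def win_rate(wins: int, draws: int, total: int) -> float:
    """Win rate = (wins + draws / 2) / total number of annotations."""
    if total <= 0:
        raise ValueError("total must be positive")
    if wins < 0 or draws < 0 or wins + draws > total:
        raise ValueError("wins and draws must be non-negative and sum to at most total")
    return (wins + draws / 2) / total


def verdict(rate: float) -> str:
    """Map a win rate to the three bands described in the text."""
    if rate > 0.55:
        return "significantly better"
    if rate < 0.45:
        return "clearly behind"
    return "on par"


# Hypothetical counts, for illustration only (not the real annotation data).
rate = win_rate(wins=60, draws=20, total=100)
print(f"{rate:.1%} -> {verdict(rate)}")
```

Counting each draw as half a win keeps the two models' win rates summing to 100%, so a rate of 50% means neither model is preferred.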

| Ziya-Writing-13B-v2          | Win rate (%) |
| ---------------------------- | ------------ |
| vs Ziya-Writing-LLaMa-13B-v1 | 72.5         |

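Usage

Due to the license restrictions on the LLaMA weights, this model cannot be used for commercial purposes. Please strictly comply with the LLaMA usage policy.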

import torch
from transformers import AutoTokenizer, LlamaForCausalLM

model_name = "IDEA-CCNL/Ziya-Writing-13B-v2"
query = "帮我写一份去西安的旅游计划"  # "Help me write a travel plan for a trip to Xi'an"

# Load the weights in half precision; device_map="auto" places them on the available device(s).
model = LlamaForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)

# Wrap the query in the prompt format "<human>:{query}\n<bot>:".
inputs = "<human>:" + query.strip() + "\n<bot>:"

# Move the input ids to the device that holds the model.
input_ids = tokenizer(inputs, return_tensors="pt").input_ids.to(model.device)
generate_ids = model.generate(
    input_ids,
    max_new_tokens=4096,
    do_sample=True,
    top_p=0.85,
    temperature=0.85,
    repetition_penalty=1.0,
    eos_token_id=2,
    bos_token_id=1,
    pad_token_id=0,
)
# The decoded text contains the prompt followed by the generated answer.
output = tokenizer.batch_decode(generate_ids)[0]
print(output)
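Because the decoded output contains the prompt as well as the answer, it can help to wrap the prompt format in small helpers. The sketch below is one possible way to do that; the helper names `build_prompt` and `extract_reply` are hypothetical, and treating `</s>` as the end-of-sequence string is an assumption based on the LLaMA tokenizer rather than something stated in this card.

```python
HUMAN_TAG = "<human>:"
BOT_TAG = "<bot>:"
EOS_TEXT = "</s>"  # assumed end-of-sequence string of the LLaMA tokenizer


def build_prompt(query: str) -> str:
    """Wrap a user query in the '<human>:...\\n<bot>:' format used above."""
    return HUMAN_TAG + query.strip() + "\n" + BOT_TAG


def extract_reply(decoded: str) -> str:
    """Return the text after the last '<bot>:' marker, without the EOS string."""
    reply = decoded.rsplit(BOT_TAG, 1)[-1]
    return reply.replace(EOS_TEXT, "").strip()


# Example with a hand-written string standing in for real model output.
prompt = build_prompt("  Write a short welcome speech.  ")
fake_output = "<s>" + prompt + " Dear guests, welcome!" + EOS_TEXT
print(extract_reply(fake_output))
```

With the usage snippet above, `extract_reply(output)` would return only the generated answer, assuming the model does not itself emit another `<bot>:` marker.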

Citation

If you use our model in your work, please cite our paper:

@article{fengshenbang,
  author    = {Jiaxing Zhang and Ruyi Gan and Junjie Wang and Yuxiang Zhang and Lin Zhang and Ping Yang and Xinyu Gao and Ziwei Wu and Xiaoqun Dong and Junqing He and Jianheng Zhuo and Qi Yang and Yongfeng Huang and Xiayu Li and Yanghan Wu and Junyu Lu and Xinyu Zhu and Weifeng Chen and Ting Han and Kunhao Pan and Rui Wang and Hao Wang and Xiaojun Wu and Zhongshen Zeng and Chongpei Chen},
  title     = {Fengshenbang 1.0: Being the Foundation of Chinese Cognitive Intelligence},
  journal   = {CoRR},
  volume    = {abs/2209.02970},
  year      = {2022}
}

You can also cite our website:

@misc{Fengshenbang-LM,
  title={Fengshenbang-LM},
  author={IDEA-CCNL},
  year={2021},
  howpublished={\url{https://github.com/IDEA-CCNL/Fengshenbang-LM}},
}