# LLaMA3-8B-GSLoRA

## Model Description

MF-GSLoRA (Multi-Task, Few-Shot, Grouped Sensitivity Low-Rank Adaptation) is a low-rank adaptation optimization approach that combines multi-task fine-tuning, few-shot fine-tuning, and a dynamic quantization strategy driven by grouped sensitivity vector data. Its goal is efficient resource utilization and high-performance adaptation, particularly for intelligent question-answering systems in complex industrial scenarios. Building on standard LoRA, MF-GSLoRA introduces several extensions: selection of grouped sensitivity vector data in 8-bit NormalFloat (NF8) or 4-bit NormalFloat (NF4) data types, grouped quantization, dynamic task weight decomposition, and few-shot weight initialization.

The central idea is "grouped sensitivity vector data": model weights or embedding vectors are grouped according to their sensitivity to task performance, and each group's sensitivity determines which quantization level (4-bit or 8-bit) its weights receive. Before quantization, groups are formed by computing the sensitivity of the weights or vectors to specific tasks. Highly sensitive weights are quantized at higher precision with NF8, while less sensitive weights are quantized at lower precision with NF4. In industrial equipment fault diagnosis, for example, highly sensitive weights may correspond to embedding vectors that describe the characteristics or failure modes of specific equipment; these are kept at 8-bit precision to preserve diagnostic accuracy, while weights associated with routine operating procedures or background information are quantized at 4-bit precision to reduce resource consumption.
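The grouped sensitivity idea can be illustrated with a minimal sketch. The code below assumes sensitivity is approximated by the per-group mean of |weight × gradient| on a few task examples and uses a simple uniform fake-quantizer as a stand-in for the actual NF4/NF8 codebooks; the function and parameter names (`fake_quantize`, `grouped_sensitivity_quantize`, `group_size`, `keep_ratio`) are illustrative and not taken from the released code.

```python
# Minimal sketch of grouped-sensitivity quantization (illustrative only).
import torch

def fake_quantize(x: torch.Tensor, n_bits: int) -> torch.Tensor:
    """Stand-in for NF4/NF8: symmetric uniform fake-quantization to n_bits."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    return torch.round(x / scale).clamp(-qmax - 1, qmax) * scale

def grouped_sensitivity_quantize(weight, grad, group_size=64, keep_ratio=0.25):
    """Quantize each group of `group_size` weights at 8 or 4 bits.

    Groups whose mean |w * g| sensitivity falls in the top `keep_ratio`
    fraction get 8-bit precision; the remaining groups get 4-bit precision.
    """
    w = weight.flatten()
    g = grad.flatten()
    n_groups = w.numel() // group_size
    w = w[: n_groups * group_size].view(n_groups, group_size)
    g = g[: n_groups * group_size].view(n_groups, group_size)

    sensitivity = (w * g).abs().mean(dim=1)               # one score per group
    threshold = torch.quantile(sensitivity, 1.0 - keep_ratio)

    out = torch.empty_like(w)
    for i in range(n_groups):
        bits = 8 if sensitivity[i] >= threshold else 4     # NF8 vs NF4 decision
        out[i] = fake_quantize(w[i], bits)
    return out.view(-1)

# Toy usage: weights and task gradients of a single linear layer.
w = torch.randn(4096)
g = torch.randn(4096) * 0.01
w_q = grouped_sensitivity_quantize(w, g)
print((w - w_q).abs().mean())
```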

## Usage

```python
from transformers import pipeline

pipe = pipeline("text-generation", model="CrazyYiYi/LLaMA3-8B-GSLoRA")
print(pipe("Hello!"))
```
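For memory-constrained deployment, the model can also be loaded with an explicit 4-bit NF4 quantization config via bitsandbytes. This is the generic `transformers` loading pattern, shown here as an assumption; the card does not state that this checkpoint requires or ships with a specific quantization config.

```python
# Hedged example: generic transformers + bitsandbytes 4-bit (NF4) loading.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # NormalFloat-4 data type
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained("CrazyYiYi/LLaMA3-8B-GSLoRA")
model = AutoModelForCausalLM.from_pretrained(
    "CrazyYiYi/LLaMA3-8B-GSLoRA",
    quantization_config=bnb_config,
    device_map="auto",
)

inputs = tokenizer("What causes abnormal bearing wear?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```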

## Deployment results

To verify the practical effectiveness of the proposed industrial defect and maintenance question-answering system, the quantization-optimized model was deployed and tested. The following demonstrates the model's output in a specific industrial maintenance scenario:

![Deployment demo: model responses to bearing wear and hydraulic maintenance questions](eed38cf18ceac4c8e21874f84a35bc2.png)

As shown in the figure, the model loads successfully and generates detailed, logically coherent answers to the user's questions about bearing wear and hydraulic maintenance, demonstrating its semantic understanding of this professional domain as well as its efficiency and practicality in industrial maintenance scenarios.
