DeepSeek-R1-Distill-Qwen-7B-R

The DeepSeek-R1-Distill-Qwen-7B model has been fine-tuned to predict hyperparameters for neural network models. Leveraging the power of large language models (LLMs), the fine-tuned model analyzes a neural network architecture and generates an optimal hyperparameter configuration (learning rate, batch size, dropout, momentum, and so on) for a given task. This approach offers a competitive alternative to traditional optimization methods such as the Optuna framework.
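For context, the snippet below is a minimal sketch of the kind of Optuna search this model is positioned against. The search space, trial count, and the train_and_evaluate helper are illustrative placeholders, not part of this repository.

import optuna

def train_and_evaluate(lr, batch_size, dropout):
    # Hypothetical stand-in: a real version would train the target network
    # with these hyperparameters and return its validation accuracy.
    return 0.0

def objective(trial):
    # Illustrative search space; a real space depends on the target model.
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-1, log=True)
    batch_size = trial.suggest_categorical("batch_size", [16, 32, 64, 128])
    dropout = trial.suggest_float("dropout", 0.0, 0.5)
    return train_and_evaluate(lr, batch_size, dropout)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)  # each trial is a full training run
print(study.best_params)

Each Optuna trial requires a complete training run, whereas this model predicts a configuration from a single prompt.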

This model is used in the NNGPT project to generate training hyperparameters for neural networks from the LEMUR NN Dataset.

How to Use

This repository provides a fine-tuned version of deepseek-ai/DeepSeek-R1-Distill-Qwen-7B using the PEFT library with LoRA. The final model is merged so it can be loaded in one step via:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged checkpoint directly from the Hugging Face Hub.
model_path = "ABrain/HPGPT-DeepSeek-R1-Distill-Qwen-7B-R"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)
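For reference, a merged checkpoint like this one is typically produced by folding the LoRA adapter weights back into the base model with the PEFT library. The sketch below assumes a local adapter directory and an output path; both names are placeholders.

from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the original base model.
base = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")
# "path/to/lora-adapter" is a hypothetical adapter location.
peft_model = PeftModel.from_pretrained(base, "path/to/lora-adapter")
# Fold the LoRA weights into the base weights and save a standalone checkpoint.
merged = peft_model.merge_and_unload()
merged.save_pretrained("merged-model")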

Prompt Example

"""
Generate only the values (do not provide any explanation) of the hyperparameters ({prm_names}) of a given model:
{entry['metric']} for the task: {entry['task']} on dataset: {entry['dataset']}, with transformation: {entry['transform_code']},
so that the model achieves the HIGHEST accuracy with number of training epochs = {entry['epoch']}.
Code of that model: {entry['nn_code']}
"""

Replace the placeholders ({prm_names}, {entry['metric']}, {entry['task']}, {entry['dataset']}, {entry['transform_code']}, {entry['epoch']}, {entry['nn_code']}) with your actual values.
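Putting the pieces together, here is a minimal end-to-end sketch reusing the tokenizer and model loaded above. The entry values and decoding parameters are illustrative, and parsing the response depends on how the model formats its answer.

# Illustrative values; fill in your own model code, dataset, and settings.
entry = {
    "metric": "accuracy",
    "task": "image classification",
    "dataset": "CIFAR-10",
    "transform_code": "transforms.Compose([transforms.ToTensor()])",
    "epoch": 10,
    "nn_code": "<your model definition here>",
}
prm_names = "learning rate, batch size, dropout, momentum"

prompt = (
    f"Generate only the values (do not provide any explanation) of the "
    f"hyperparameters ({prm_names}) of a given model:\n"
    f"{entry['metric']} for the task: {entry['task']} on dataset: {entry['dataset']}, "
    f"with transformation: {entry['transform_code']},\n"
    f"so that the model achieves the HIGHEST accuracy with number of training "
    f"epochs = {entry['epoch']}.\n"
    f"Code of that model: {entry['nn_code']}"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)  # decoding settings are illustrative
print(tokenizer.decode(outputs[0], skip_special_tokens=True))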

Model Details

  • Developed by: Roman Kochnev / ABrain
  • Finetuned from model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
  • Model type: Causal Language Model (Transformer-based)
  • Language(s) (NLP): Primarily English
  • Model size: 7.62B parameters (Safetensors, FP16)
  • License: MIT

Model Sources

Repository: ABrain/HPGPT-DeepSeek-R1-Distill-Qwen-7B-R
