arxiv:2312.15166

SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling

Published on Dec 23, 2023
· Submitted by akhaliq on Dec 27, 2023
#1 Paper of the day

Abstract

We introduce depth up-scaling (DUS), a simple yet effective technique for efficiently scaling up base LLMs. In contrast to mixture-of-experts (MoE), DUS requires no complex changes for training or inference. Using DUS, we build SOLAR 10.7B, a large language model (LLM) with 10.7 billion parameters that demonstrates superior performance on various natural language processing (NLP) tasks. Comparative evaluations show that SOLAR 10.7B outperforms existing open-source pretrained LLMs such as Llama 2 and Mistral 7B. We additionally present SOLAR 10.7B-Instruct, a variant fine-tuned for instruction-following capabilities, which surpasses Mixtral-8x7B. SOLAR 10.7B is publicly available under the Apache 2.0 license, promoting broad access and application in the LLM field.
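
For intuition, the depth up-scaling step can be pictured with the minimal sketch below (not the authors' implementation). It assumes a Hugging Face Llama/Mistral-style checkpoint whose decoder blocks live in `model.model.layers`; with a 32-layer base and m = 8 (the setting quoted in the discussion below), the result is a 2*(32-8) = 48-layer model, and the subsequent continued pretraining is omitted.

```python
# Minimal sketch of depth up-scaling (DUS), not the authors' code:
# duplicate the base model, drop the last m layers from one copy and the
# first m layers from the other, then stack the two halves.
import copy
import torch.nn as nn

def depth_upscale(base, m: int = 8):
    donor = copy.deepcopy(base)
    layers = list(base.model.layers)        # e.g. 32 decoder blocks
    n = len(layers)

    top = layers[: n - m]                   # layers 0 .. n-m-1 of the base
    bottom = list(donor.model.layers)[m:]   # layers m .. n-1 of the copy

    base.model.layers = nn.ModuleList(top + bottom)  # 2*(n-m) blocks
    base.config.num_hidden_layers = 2 * (n - m)
    return base                             # continued pretraining not shown
```

Note that the middle n - 2m layers of the base appear twice in the up-scaled network; continued pretraining is what recovers performance across the resulting seam.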

Community

can you upload to ollama.ai?

Already there. Just run: ollama run solar.

Hi! Coming from this discussion post - what data was used for continued pre-training, and was any of the continued pre-training data synthetically generated via OpenAI models (or any other source with similarly restrictive terms of use)?

@pszemraj Details of Data Sets and Training Techniques: Thank you for your interest! Unfortunately, due to the high level of competition in this field, we are unable to share detailed information about the training techniques and datasets used. We appreciate your understanding. However, we have released a list of fine-tuning datasets.

@hunkim Thanks! Understood. I'm primarily interested in the checkpoint upstage/SOLAR-10.7B-v1.0, as it is Apache-2.0 licensed - based on your response it seems you have done your homework. I assume there is no issue with using upstage/SOLAR-10.7B-v1.0 to the fullest extent of its Apache-2.0 license, including synthetic data generation, commercial use, etc. Please advise if my interpretation is incorrect, and thanks again (sorry for the duplicate response vs. the original thread).

What format should the fine-tuning data use?

Why does the paper compare itself so heavily against MoE without mentioning or comparing against prior NLP work, e.g. Progressively Stacking 2.0 [arXiv:2011.13635], which works in a similar fashion? That approach also somewhat alleviates the problem of "we removed m = 8 layers from both ends of our base model, primarily due to hardware limitations" by freezing the original part of the copied network until the very end, when the whole model is trained on part of the dataset (and that work predates LoRA; for all we know, instead of reserving part of the dataset for the fully assembled network after training it in parts, training a LoRA works just as well while supporting a bigger model).
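
Concretely, the freezing schedule described above might look something like this hypothetical sketch (not code from either paper; layer indices assume a 48-layer depth-up-scaled model with the newly stacked copy at positions 24-47):

```python
# Hypothetical sketch of a progressive-stacking-style schedule: first train
# only the newly stacked layers while the inherited layers stay frozen,
# then unfreeze everything for a final pass on the remaining data.
def set_trainable(model, new_layers_only: bool, n_kept: int = 24):
    for i, layer in enumerate(model.model.layers):
        trainable = (i >= n_kept) or not new_layers_only
        for p in layer.parameters():
            p.requires_grad = trainable

# Phase 1: freeze the first n_kept layers, train only the stacked copy.
# set_trainable(model, new_layers_only=True)
# Phase 2: unfreeze everything for a short full-model pass.
# set_trainable(model, new_layers_only=False)
```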

@Maykeye Thank you for your comments. We'll take them into account for our next paper revision.

Models citing this paper 109

Datasets citing this paper 1

Spaces citing this paper 67

Collections including this paper 30