MobileLLM-350M-LS: Unused and Uninitialized Weights, Repeated Output During Generation

#2
by Boxp - opened

Description

When using MobileLLM-350M-LS with the transformers library, I encountered the following issues:

  1. Warnings:

    • Unused weights: lm_head.weight.
    • Newly initialized weights: model.embed_tokens.weight.
  2. Output issues:

    • The generated text contains excessive repetition. For example:
      Hello! Who are you? Below are some of the questions we ask new members.
      What is your name? What is your birthday? What is your birthday? ...
      

Environment

  • Transformers version: 4.44.0
  • Python version: 3.11

Code

from transformers import AutoTokenizer, AutoModelForCausalLM

def infer_mobilellm():
    model_dir = "/nfs/300-MT-Pro/model/huggingface/MobileLLM-350M-LS"
    # Load the slow tokenizer and the model from the local checkpoint directory
    tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=False)
    model = AutoModelForCausalLM.from_pretrained(model_dir, trust_remote_code=True)

    input_text = "Hello! Who are you?"
    inputs = tokenizer(input_text, return_tensors="pt")
    # max_length counts the prompt tokens as well as the generated ones
    outputs = model.generate(**inputs, max_length=100)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

infer_mobilellm()

Questions

What is the correct way to initialize and perform inference with MobileLLM-350M-LS?

AI at Meta org

Hi, thank you for raising this issue! The lm_head.weight is tied to model.embed_tokens.weight (the model shares one matrix between the input embeddings and the output projection), so loading either of them is fine and the warnings are harmless.
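A conceptual sketch of what weight tying means (plain Python with toy values, not the actual transformers internals): the output head does not get its own copy of the matrix, it reuses the very same object as the embedding table, so a checkpoint only needs to store one of the two names.

```python
# Toy embedding table: 2 tokens, 2 dimensions (illustrative values only)
embed_tokens_weight = [[0.1, 0.2], [0.3, 0.4]]

# Tied output head: the SAME object, not a copy
lm_head_weight = embed_tokens_weight

# Any update to one is visible through the other
lm_head_weight[0][0] = 9.9
print(embed_tokens_weight[0][0])                  # 9.9
print(lm_head_weight is embed_tokens_weight)      # True
```

Because only one tensor is saved in the checkpoint, the loader reports the other name as "unused" or "newly initialized" even though nothing is actually missing.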

MobileLLM-350M-LS is a pre-trained base model, not a chat-finetuned model, so it completes text rather than answering questions. Does that make sense?

For the repeated sentences, you can try setting repetition_penalty=1.5 in model.generate(); this will help avoid generating repeated outputs.
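To see why this helps, here is a sketch of the logic behind a repetition penalty (the same idea as transformers' RepetitionPenaltyLogitsProcessor; the function below is a hypothetical simplified version, not the library code): logits of tokens that have already been generated are scaled down, so re-emitting them becomes less likely at each step.

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.5):
    """Penalize logits of tokens that already appeared in the output."""
    out = list(logits)
    for tok in set(generated_ids):
        if out[tok] > 0:
            out[tok] /= penalty   # shrink positive logits toward zero
        else:
            out[tok] *= penalty   # push negative logits further down
    return out

# Toy 4-token vocabulary; tokens 0 and 1 were already generated
logits = [2.0, -1.0, 0.5, 3.0]
penalized = apply_repetition_penalty(logits, generated_ids=[0, 1], penalty=1.5)
print(penalized)  # [1.3333..., -1.5, 0.5, 3.0]
```

In practice you would simply pass the knob through, e.g. model.generate(**inputs, max_length=100, repetition_penalty=1.5); a penalty of 1.0 means no penalty, and larger values suppress repetition more aggressively.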

zechunliu changed discussion status to closed
