News Reporter 3B LLM

Model Description

News Reporter 3B LLM is based on Phi-3 Mini-4K Instruct, a dense decoder-only Transformer model designed to generate high-quality text from user prompts. The model has 3.8 billion parameters and is fine-tuned with Supervised Fine-Tuning (SFT) on question-answer pairs to align its responses with human preferences.

Base Model

We evaluated multiple off-the-shelf models, including Gemma-7B, Gemma-2B, Llama-3-8B, and Phi-3-mini-4K, and found that Phi-3-mini-4K performed best overall on our evaluation set. The model handles multilingual query understanding and response generation well, with 3.8 billion parameters and a 4,096-token context window. Trained on over 3.3 trillion tokens, Phi-3-mini-4K can also be quantized to 4 bits, which reduces its memory footprint to around 1.8 GB. It generates 8 to 12 tokens per second on a single T4 GPU and requires just 3-4 GB of VRAM for inference.
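The 4-bit memory figure can be sanity-checked with simple arithmetic. The sketch below is only an estimate for the weights alone; the helper name `weight_memory_bytes` is hypothetical, and activations, the KV cache, and quantization metadata are assumed to account for the remaining VRAM.

```python
def weight_memory_bytes(num_params: float, bits_per_param: int) -> float:
    """Approximate storage for the model weights alone, in bytes."""
    return num_params * bits_per_param / 8


PARAMS = 3.8e9  # parameter count stated in this model card

for bits in (16, 8, 4):
    size = weight_memory_bytes(PARAMS, bits)
    # Report both decimal gigabytes (10^9 bytes) and binary gibibytes (2^30 bytes).
    print(f"{bits:>2}-bit: {size / 1e9:.2f} GB ({size / 2**30:.2f} GiB)")
```

At 4 bits per parameter the weights come to roughly 1.9 GB (about 1.77 GiB), which is in line with the figure of around 1.8 GB quoted above.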

Key Features:

  • Parameter Count: 3.8 billion.
  • Architecture: Dense decoder-only Transformer.
  • Context Length: Supports up to 4,096 tokens (4K).
  • Training Data: 43.5K+ question-and-answer pairs curated from various news channels.

Inference

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, set_seed

model_name = "RedHenLabs/news-reporter-3b"

# Load the tokenizer and the model weights onto the GPU.
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, torch_dtype="auto", device_map="cuda")

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

def test_inference(prompt):
    # Instruction prefix that is prepended to every question.
    prefix = "Generate a concise and accurate news summary based on the following question.\n Input:"
    # Wrap the request in the model's chat template and append the assistant turn marker.
    prompt = pipe.tokenizer.apply_chat_template([{"role": "user", "content": prefix + prompt}], tokenize=False, add_generation_prompt=True)
    # Low-temperature sampling, capped at 512 new tokens and 180 seconds.
    outputs = pipe(prompt, max_new_tokens=512, do_sample=True, num_beams=1, temperature=0.1, top_k=50, top_p=0.95,
                   max_time=180)
    # The pipeline returns the prompt plus the completion; keep only the completion.
    return outputs[0]['generated_text'][len(prompt):].strip()

res = test_inference(" What is the status of the evacuations and the condition of those injured?")
print(res)
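As a rough check on the generation settings above, the sketch below works out how many tokens remain for the prompt and how long a full-length answer might take. It assumes the 4,096-token context window and the 8 to 12 tokens per second on a T4 quoted in this card; the helper names are hypothetical, and the prompt budget has to cover the instruction prefix and chat-template tokens as well as the question.

```python
CONTEXT_WINDOW = 4096  # tokens, from the base model description
MAX_NEW_TOKENS = 512   # max_new_tokens used in the inference snippet
MAX_TIME_S = 180       # max_time used in the inference snippet


def max_prompt_tokens(context_window: int = CONTEXT_WINDOW,
                      max_new_tokens: int = MAX_NEW_TOKENS) -> int:
    """Tokens left for the prompt once the generation budget is reserved."""
    return context_window - max_new_tokens


def generation_time_s(num_tokens: int, tokens_per_second: float) -> float:
    """Estimated wall-clock seconds to generate num_tokens at a steady rate."""
    return num_tokens / tokens_per_second


slowest = generation_time_s(MAX_NEW_TOKENS, 8)   # lower end of the quoted T4 throughput
fastest = generation_time_s(MAX_NEW_TOKENS, 12)  # upper end of the quoted T4 throughput

print(f"Prompt budget: {max_prompt_tokens()} tokens")
print(f"Estimated time for {MAX_NEW_TOKENS} tokens: {fastest:.0f}-{slowest:.0f} s (limit {MAX_TIME_S} s)")
```

Under these assumptions a full 512-token answer should finish well inside the 180-second limit, so `max_time` acts mainly as a safety cap.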

Model Benchmark

| Benchmark (0-shot) | News-reporter-3b | Phi-3-mini-4k | Gemma-7b-it | Llama-2-7B | Mistral-7B-Instruct-v0.2 |
|---|---|---|---|---|---|
| MMLU | 69.49 | 69.90 | 64.3 | 45.3 | 59.02 |
| ARC_C | 56.40 | 56.14 | 53.2 | 45.9 | 55.89 |
| Winogrande | 74.19 | 73.24 | 68.03 | 69.5 | 73.72 |
| TruthfulQA | 50.43 | 66.46 | 44.18 | 57.4 | 53.00 |
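To see how fine-tuning shifted the scores relative to the base model, the differences between the News-reporter-3b and Phi-3-mini-4k columns can be computed directly. This is only a small sketch over the numbers reported above; the variable names are illustrative.

```python
# (News-reporter-3b, Phi-3-mini-4k) zero-shot scores copied from the benchmark table.
scores = {
    "MMLU": (69.49, 69.90),
    "ARC_C": (56.40, 56.14),
    "Winogrande": (74.19, 73.24),
    "TruthfulQA": (50.43, 66.46),
}

# Positive values mean the fine-tuned model scores higher than its base model.
deltas = {name: round(tuned - base, 2) for name, (tuned, base) in scores.items()}

for name, delta in deltas.items():
    print(f"{name:<11} {delta:+.2f}")
```

The fine-tuned model stays within about one point of its base on MMLU, ARC_C, and Winogrande, while its TruthfulQA score is about 16 points lower.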

Citation

@misc {lucifertrj,
   author       = { {Tarun Jain} },
   title        = { News Reporter 3B by Red Hen Lab part of Google Summer of Code 2024},
   year         = 2024,
   url          = { https://huggingface.co/RedHenLabs/news-reporter-3b },
   publisher    = { Hugging Face }
}

arxiv.org/abs/2410.07520
