Mistral-7B-Reviews-Insight-LoRA

Fine-tuned Mistral-7B for Insight Extraction from Product Reviews

License: MIT · Python 3.8+


Model Description

This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.2 using PEFT (Parameter-Efficient Fine-Tuning) with LoRA adapters. The model is specialized for extracting structured insights (pros and cons) from product reviews, outputting results in a standardized JSON format.

Fine-Tuning Objective

  • Task: Extract pros and cons from product reviews.
  • Output Format: Structured JSON with pros and cons lists.
  • Use Case: Automated review analysis, sentiment mining, and product feedback summarization.

Intended Uses & Limitations

Intended Uses

  • Product Review Analysis: Automatically extract pros and cons from customer reviews.
  • Market Research: Summarize large volumes of feedback for product improvement.
  • E-commerce Platforms: Enhance review sections with structured insights.

Limitations

  • Inherited Limitations: Hallucinations, biases, and factual inaccuracies from the base model.
  • Output Format: May occasionally generate malformed JSON; a defensive parsing sketch follows this list.
  • Domain Restrictions: Not suitable for sensitive domains (medical, legal, financial).
  • Language: Optimized for English; performance may vary for other languages.
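
Since malformed JSON is the most common failure mode, downstream code should parse defensively. A minimal sketch (the regex fallback is an illustrative choice, not something the model guarantees):

import json
import re

def parse_insights(raw: str):
    """Best-effort parse of model output into a pros/cons dict."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Fallback: grab the first {...} span in case the model wrapped
        # the JSON in extra text
        match = re.search(r"\{.*\}", raw, re.DOTALL)
        if match:
            try:
                return json.loads(match.group(0))
            except json.JSONDecodeError:
                pass
    return None  # caller decides how to handle unparseable output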

Dataset

Dataset Card

  • Repository: sdelowar2/product_reviews_insight_10k
  • Size: ~10,000 product reviews
  • Format: JSON with instruction, input (review_list), and answer fields, where answer is itself a JSON object with pros and cons keys.

Example Row

{
  "instruction": "Generate pros and cons from the following product reviews.",
  "input": [
    "I bought this for my camera a year ago. My camera works only with SD memory so I am using the",
    "It either works or it doesn't. I use it with my Zoom and it sill works. I mean seriously",
    "No data loss. Stayed strong over the last few years and have re-formatted it a few times",
    "Verizon Wireless only guaranteed this brand for my LG phone. I've since upgraded to a newer phone with a",
    "This works great. Good luck finding them at retail any longer! No one in my area carried them, but",
    "Great price even with the high shipping cost. No problem with the shipment, it arrived in 3 days. Product"
  ],
  "answer": {
    "pros": [
      "No data loss over time",
      "Works well with various devices",
      "Great price despite shipping costs",
      "Fast shipping, arrived in 3 days"
    ],
    "cons": [
      "Availability issues in local stores",
      "Inconsistent performance at times"
    ]
  }
}
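
Assuming the standard Hub layout, the dataset should load directly with the datasets library (the split name is an assumption):

from datasets import load_dataset

ds = load_dataset("sdelowar2/product_reviews_insight_10k", split="train")

example = ds[0]
print(example["instruction"])
print(example["input"][:2])  # first two reviews in the batch
print(example["answer"])     # may be a dict or a JSON string, depending on the schema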

Training Details

Hyperparameters

  • Epochs: 3
  • Optimizer: AdamW
  • Learning Rate: 3e-4
  • Batch Size: 2 (with gradient accumulation of 8)
  • Precision: BF16 (fallback to FP16)
  • Max Length: 512
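
For reference, these settings map roughly onto transformers TrainingArguments as follows (output_dir and the BF16 probe are illustrative; unlisted options such as the LR scheduler are left at their defaults):

import torch
from transformers import TrainingArguments

use_bf16 = torch.cuda.is_available() and torch.cuda.is_bf16_supported()

training_args = TrainingArguments(
    output_dir="mistral-7b-reviews-insight-lora",  # illustrative path
    num_train_epochs=3,
    learning_rate=3e-4,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,  # effective batch size of 16
    optim="adamw_torch",
    bf16=use_bf16,
    fp16=not use_bf16,              # FP16 fallback, as noted above
)

Note that Max Length (512) is applied when tokenizing, not via TrainingArguments.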

LoRA Configuration

  • Rank (r): 16
  • Alpha: 32
  • Dropout: 0.05
  • Target Modules: [q_proj, k_proj, v_proj, o_proj]
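
In peft terms this corresponds to the following LoraConfig (task_type is assumed from the causal-LM setup):

from peft import LoraConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)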

Evaluation Metrics

  • JSON Validity Rate: % of outputs with valid JSON structure.
  • Semantic F1 (Pros/Cons): Strict and loose F1 scores for semantic accuracy (using bert-score).
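
The semantic F1 computation depends on matching details not given here, but the JSON validity rate is straightforward to reproduce (a minimal sketch):

import json

def json_validity_rate(raw_outputs):
    """Fraction of generations that parse as JSON with list-valued pros/cons."""
    valid = 0
    for raw in raw_outputs:
        try:
            parsed = json.loads(raw)
            if isinstance(parsed.get("pros"), list) and isinstance(parsed.get("cons"), list):
                valid += 1
        except (json.JSONDecodeError, AttributeError):
            continue
    return valid / len(raw_outputs) if raw_outputs else 0.0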

How to Use

Installation

pip install -q -U bitsandbytes==0.42.0
pip install -q -U peft==0.8.2
pip install -q -U accelerate==0.27.0
pip install -q -U transformers==4.38.0

Load Model & Adapter

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

model_id = "mistralai/Mistral-7B-Instruct-v0.2"
adapter_id = "sdelowar2/mistral-7B-reviews-insight-lora"

# Load base model in 4-bit
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

# Load LoRA adapter
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()  # inference only; disables dropout

Inference Example

import json

reviews = [
  "I am using this with a Nook HD+. It works as described. The HD picture on my Samsung 52",
  "The price is completely unfair and only works with the Nook HD and HD+. The cable is very wobb",
  "This adaptor is real easy to setup and use right out of the box. I had not problem with it",
  "This adapter easily connects my Nook HD 7" to my HDTV through the HDMI cable.",
  "Gave it five stars because it really is nice to extend the screen and use your Nook as a streaming"
]

SYSTEM_PROMPT = "You are an assistant that extracts pros and cons from product reviews. Return only valid JSON with keys pros and cons."

user_content = "Instruction: Extract pros and cons from the following reviews.\nReviews:\n" + "\n".join(f"- {r}" for r in reviews)

# Mistral-Instruct's chat template does not accept a system role,
# so fold the system prompt into the user turn
messages = [
    {"role": "user", "content": f"{SYSTEM_PROMPT}\n\n{user_content}"},
]

# Apply chat template
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    return_tensors="pt",
    add_generation_prompt=True 
).to(model.device)

# Generate (greedy decoding; temperature is ignored when do_sample=False)
with torch.no_grad():
    outputs = model.generate(
        input_ids=input_ids,
        max_new_tokens=300,
        do_sample=False,
    )

# Decode
response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)

try:
    parsed = json.loads(response)
    print(f"result: {parsed}")
except Exception as e:
    print("\nCould not parse JSON:", e)
    print(f"raw output: {response}")

Example Output

{
  "pros": [
    "Works as described",
    "Easy to set up and use",
    "Connects Nook to HDTV",
    "Extends screen for streaming"
  ],
  "cons": [
    "Unfair price",
    "Cable feels wobbly"
  ]
}

Evaluation Results

Metric              Score
JSON Validity Rate  78.7%
Semantic F1 (Pros)  0.89
Semantic F1 (Cons)  0.87

Limitations & Bias

  • Hallucinations: May generate incorrect or nonsensical pros/cons.
  • Bias: Inherits biases from the base model and training data.
  • Domain: Optimized for product reviews; avoid use in sensitive domains.
  • Robustness: Performance may degrade for noisy or non-standard reviews.

Citation

@misc{sdelowar2_mistral7b_reviews_insight_lora,
  author = {Md Sayed Delowar},
  title = {Mistral-7B-Reviews-Insight-LoRA: Fine-tuned Mistral-7B for Product Review Insight Extraction},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face Hub},
  howpublished = {\url{https://huggingface.co/sdelowar2/mistral-7B-reviews-insight-lora}}
}
