Method Card β€” Football Sentiment Prompting (0/1/5-shot)

TL;DR

We compare zero-shot, adaptive one-shot, and adaptive 5-shot prompting for binary sentiment classification on football news. We use the same train/val/test splits as the fine-tuning baseline and report metrics, confusion matrices, and a quality/latency/cost discussion.

Data

  • Dataset: james-kramer/football_news (Hugging Face)
  • Task: Binary sentiment (0=negative, 1=positive)
  • Splits: Stratified 80/10/10
  • Cleaning: strip text; drop empty/NA
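The cleaning and split steps above can be sketched as follows. This is a minimal illustration, assuming the dataset loads into a DataFrame with `text` and `label` columns (column names are not confirmed by this card):

```python
# Sketch of the cleaning + stratified 80/10/10 split described above.
# Assumes a DataFrame with "text" and "label" columns.
import pandas as pd
from sklearn.model_selection import train_test_split

def clean_and_split(df, seed=42):
    df = df.copy()
    df["text"] = df["text"].str.strip()          # strip whitespace
    df = df.dropna(subset=["text", "label"])     # drop NA
    df = df[df["text"] != ""]                    # drop empty strings
    # Carve out 80% train, then split the remainder evenly into val/test,
    # stratifying on the label at both steps.
    train, rest = train_test_split(
        df, test_size=0.2, stratify=df["label"], random_state=seed)
    val, test = train_test_split(
        rest, test_size=0.5, stratify=rest["label"], random_state=seed)
    return train, val, test
```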

Models / APIs

  • LLM used: gpt-4o-mini (OpenAI API, September 2025 snapshot)
  • Similarity backend: sklearn TF-IDF + cosine similarity

Prompting Strategy

  • Zero-shot: instruction + output constraint (answer with a single word only).
  • Adaptive one-shot: retrieve most similar train example and include it as exemplar.
  • Adaptive 5-shot: retrieve top-5 similar exemplars.
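The adaptive exemplar retrieval can be sketched with the stated TF-IDF + cosine-similarity backend. A minimal version, fitting the vectorizer on train texts only (class and method names are illustrative, not from the repo):

```python
# Retrieve the top-k most similar train examples for a query sentence.
# TF-IDF is fit on the train split only, so no test data leaks in.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

class ExemplarRetriever:
    def __init__(self, train_texts, train_labels):
        self.vectorizer = TfidfVectorizer()
        self.train_matrix = self.vectorizer.fit_transform(train_texts)
        self.texts = list(train_texts)
        self.labels = list(train_labels)

    def top_k(self, query, k=5):
        # Cosine similarity between the query vector and every train vector.
        sims = cosine_similarity(
            self.vectorizer.transform([query]), self.train_matrix)[0]
        idx = sims.argsort()[::-1][:k]           # most similar first
        return [(self.texts[i], self.labels[i]) for i in idx]
```

For one-shot, call `top_k(query, k=1)`; for 5-shot, `k=5`.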

Prompt Templates

Zero-shot You are a concise sentiment classifier. Decide if the following football-related sentence is positive or negative. Only answer with a single word: "positive" or "negative".

Sentence: "{text}", Answer:

Adaptive One-shot You are a concise sentiment classifier for football news. Decide if each sentence is positive or negative. Only answer with one word.

Example: Sentence: "{ex_text}", Label: "{ex_label}"

Now classify the target sentence. Sentence: "{text}", Answer:

Adaptive K-shot (e.g., K=5) You are a concise sentiment classifier for football news. Decide if the sentence is positive or negative. Only answer with one word.

Examples: Sentence: "{ex_text}", Label: "{ex_label}" (repeated for each of the K retrieved exemplars)

Sentence: "{text}", Answer:
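Prompt assembly for the K-shot case can be sketched as below; the wording follows the templates above, while the function name and placeholder handling are my own assumptions:

```python
# Build a K-shot prompt from (text, label) exemplar pairs.
# With an empty exemplar list this degrades to the zero-shot template.
def build_prompt(text, exemplars=()):
    header = ("You are a concise sentiment classifier for football news. "
              "Decide if the sentence is positive or negative. "
              "Only answer with one word.")
    lines = [header]
    for ex_text, ex_label in exemplars:
        lines.append(f'Sentence: "{ex_text}", Label: "{ex_label}"')
    lines.append(f'Now classify the target sentence. '
                 f'Sentence: "{text}", Answer:')
    return "\n".join(lines)
```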

Evaluation Protocol

  • Metrics: accuracy, precision, recall, F1; confusion matrix (rows = true class, columns = predicted class)
  • Latency: avg wall-clock per example
  • Seed: 42
  • Reproducibility: prompts/selection/eval code in this repo

Results (Val/Test)

  • Val:
    • Zero-shot: acc 0.800, f1 0.750, cm [[5, 0], [2, 3]], ~0.416s/ex
    • One-shot: acc 0.500, f1 0.286, cm [[4, 1], [4, 1]], ~0.304s/ex
    • 5-shot: acc 0.800, f1 0.750, cm [[5, 0], [2, 3]], ~0.451s/ex
  • Test:
    • Zero-shot: acc 0.700, f1 0.727, cm [[3, 2], [1, 4]], ~0.282s/ex
    • One-shot: acc 0.700, f1 0.727, cm [[3, 2], [1, 4]], ~0.354s/ex
    • 5-shot: acc 0.700, f1 0.571, cm [[5, 0], [3, 2]], ~0.449s/ex

Tradeoffs

  • Quality: zero-shot matches or beats few-shot here (zero-shot β‰ˆ 5-shot > one-shot on val; zero-shot β‰ˆ one-shot > 5-shot on test). With only ~10 examples per evaluation split, the ranking is unstable.
  • Latency: increases with K (see Results section; ~0.28s/ex for zero-shot β†’ ~0.45s/ex for 5-shot).
  • Cost: scales roughly linearly with prompt length (token count). For this dataset (~20 examples), 5-shot prompts were ~3Γ— the token usage of zero-shot.

Limits & Risks

  • Leakage control: exemplars are retrieved from the train split only.
  • Bias: sports phrasing may sway sentiment; small data β†’ instability.

Reproducibility

  • Code: prompts/, selection.py, evaluate_prompting.py
  • Seed: 42
  • Python β‰₯ 3.10

Usage Disclosure

This card and pipeline were organized with GenAI assistance; experiments and results were implemented and verified by the author.
