metadata
title: Ko-FreshQA Leaderboard
emoji: 🚀
colorFrom: blue
colorTo: green
sdk: gradio
app_file: app.py
pinned: false
license: apache-2.0
hf_oauth: true

Ko-FreshQA Leaderboard

An automatic evaluation and leaderboard system based on the Korean FreshQA benchmark. It matches the model_response column of a participant's uploaded CSV against the reference data, runs Relaxed and Strict evaluation with the Upstage Solar model, and records the results on the leaderboard. The system runs as a Gradio UI.

Key features

  • Dataset distribution: a tab for downloading the DEV/TEST CSV files
  • Submission and automatic evaluation: merge the uploaded CSV → evaluate → aggregate metrics → update the leaderboard
  • Detailed metrics: accuracy by fact type, premise validity (vp/fp), hop (one/multi), year (old/new), and domain
  • Submission limit (optional): caps each user at 3 submissions per day (backed by a Hugging Face repository)
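
The detailed metrics above amount to grouping graded rows by a category column and taking the fraction judged correct. The following is a minimal sketch of that idea, not the code in freshqa/freshqa_acc.py: the helper `accuracy_by`, the `correct` field, and the sample category values are all assumptions made for illustration.

```python
from collections import defaultdict


def accuracy_by(rows, key):
    """Return {category: accuracy} for one category column.

    `rows` is a list of dicts; each must carry the category column `key`
    and a boolean `correct` field (a hypothetical name for the verdict).
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for row in rows:
        category = row[key]
        totals[category] += 1
        if row["correct"]:
            hits[category] += 1
    return {category: hits[category] / totals[category] for category in totals}


# Illustrative rows only; real category values come from the dataset.
rows = [
    {"fact_type": "fast-changing", "domain": "sports", "correct": True},
    {"fact_type": "fast-changing", "domain": "politics", "correct": False},
    {"fact_type": "never-changing", "domain": "sports", "correct": True},
]
print(accuracy_by(rows, "fact_type"))
print(accuracy_by(rows, "domain"))
```

Running the same helper once per column (fact type, premise validity, hop, year, domain) and once per evaluation mode would give the full breakdown.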

Directory structure

  • app.py: initializes the Gradio app and lays out the tabs
  • config.py: loads environment variables and validates required settings
  • freshqa/
    • fresheval.py: evaluation logic for a single sample
    • fresheval_parallel.py: wrapper that evaluates a DataFrame in parallel
    • freshqa_acc.py: aggregates evaluation results (accuracy calculation and per-domain statistics)
    • merge_csv_with_model_response.py: merges the reference data with the user's CSV
  • src/
    • submission_handler.py: orchestrates the full flow from submission to leaderboard update
    • submission_tracker.py: tracks submission history (HF repo based, optional)
    • leaderboard_manager.py: loads and saves the leaderboard CSV and cleans it up for display
    • quick_csv_loader.py, hf_private_csv_loader.py: utilities for loading CSVs from an HF private repo
    • api_key_rotator.py, utils.py: utilities
  • ui/
    • leaderboard_tab.py, submission_tab.py, dataset_tab.py, styles.css
  • data/leaderboard_results.csv: accumulated leaderboard data

Requirements

  • Python 3.10
  • Upstage API key (one or several)
  • Hugging Face token (for access to the HF private repo)
  • Hugging Face Dataset repo
    • Reference data: FRESHQA_DATA_REPO_ID / FRESHQA_DATA_FILENAME
    • (Optional) Submission tracking repository: SUBMISSION_TRACKER_REPO_ID
    • (Optional) To back up the leaderboard to a Hugging Face dataset, set UPLOAD_LEADERBOARD_TO_HF=true

Installation:

python -m venv venv && source venv/bin/activate
pip install -r requirements.txt

Or with Conda:

conda env create -f environment.yml
conda activate freshqa-leaderboard

Environment variables (.env)

Copy env.example to .env and fill in the values:

cp env.example .env

Required and main variables

  • HF_TOKEN
  • FRESHQA_DATA_REPO_ID
  • FRESHQA_DATA_FILENAME (default: ko-freshqa_2025_total.csv)
  • UPSTAGE_API_KEY or UPSTAGE_API_KEYS (comma-separated)
  • ENABLE_SUBMISSION_LIMIT (default: true)
  • SUBMISSION_TRACKER_REPO_ID (required when the submission limit is enabled)
  • UPLOAD_LEADERBOARD_TO_HF
    • true: also back up the leaderboard to the HF private dataset (recommended for production)
    • false: store only in the local CSV (recommended for local development)
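
Because UPSTAGE_API_KEYS accepts several comma-separated keys, it may help to see one way such a value could be parsed and rotated. This is a sketch under assumptions: the helper names are hypothetical, and simple round-robin rotation is assumed here rather than taken from src/api_key_rotator.py, which may use a different strategy.

```python
import itertools
import os


def load_upstage_keys(env=os.environ):
    """Collect keys from UPSTAGE_API_KEYS (comma-separated) or UPSTAGE_API_KEY."""
    raw = env.get("UPSTAGE_API_KEYS") or env.get("UPSTAGE_API_KEY") or ""
    # Drop surrounding whitespace and empty entries such as a trailing comma.
    return [key.strip() for key in raw.split(",") if key.strip()]


def key_cycle(keys):
    """Return an endless round-robin iterator over the keys (assumed strategy)."""
    if not keys:
        raise ValueError("no Upstage API key configured")
    return itertools.cycle(keys)


# Example with an explicit mapping instead of the real environment.
keys = load_upstage_keys({"UPSTAGE_API_KEYS": "key-a, key-b,,key-c"})
rotation = key_cycle(keys)
print([next(rotation) for _ in range(4)])
```

With a single UPSTAGE_API_KEY the same code yields a one-element cycle, so callers do not need a separate code path.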

Validation: when the app starts, Config.validate_required_configs() checks for missing required settings.
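
For readers who want to check a .env file before launching, the kind of check described above can be approximated as follows. This is a hedged sketch, not the body of Config.validate_required_configs(): the function `missing_required_configs` is hypothetical, and the set of required names is inferred from the variable list and the troubleshooting notes in this README.

```python
import os

# Names assumed to be always required, based on this README.
ALWAYS_REQUIRED = ("HF_TOKEN", "FRESHQA_DATA_REPO_ID")


def missing_required_configs(env=os.environ):
    """Return the names of required settings that are unset or empty."""
    missing = [name for name in ALWAYS_REQUIRED if not env.get(name)]
    # Either the single-key or the multi-key variable is enough.
    if not (env.get("UPSTAGE_API_KEY") or env.get("UPSTAGE_API_KEYS")):
        missing.append("UPSTAGE_API_KEY or UPSTAGE_API_KEYS")
    # The tracker repo only matters when the submission limit is on (default: true).
    limit_enabled = env.get("ENABLE_SUBMISSION_LIMIT", "true").lower() == "true"
    if limit_enabled and not env.get("SUBMISSION_TRACKER_REPO_ID"):
        missing.append("SUBMISSION_TRACKER_REPO_ID")
    return missing


example_env = {
    "HF_TOKEN": "hf_xxx",
    "FRESHQA_DATA_REPO_ID": "org/reference-data",
    "UPSTAGE_API_KEY": "up_xxx",
    "ENABLE_SUBMISSION_LIMIT": "false",
}
print(missing_required_configs(example_env))
```

The values in `example_env` are placeholders, not real identifiers.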


Running

Local:

python app.py

Default port: 7860

Hugging Face Spaces:

  • If the SPACE_ID environment variable is present, the app runs in Spaces mode.
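
The Spaces check described above reduces to testing for one environment variable. A minimal sketch, in which the function name `is_running_on_spaces` is hypothetical and what the app changes in Spaces mode is not shown because this README does not specify it:

```python
import os


def is_running_on_spaces(env=os.environ):
    """True when SPACE_ID is present, which this README treats as Spaces mode."""
    return "SPACE_ID" in env


print(is_running_on_spaces({"SPACE_ID": "user/space-name"}))
print(is_running_on_spaces({}))
```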

Docker (optional):

  • Dockerfile and docker-compose.yml are provided (adjust them to your setup as needed)

Usage (Gradio UI)

  1. Dataset tab
  • Download the DEV/TEST CSV files
  2. Submission and evaluation tab
  • Upload: the TEST CSV with the model_response column filled in
  • Input: submitter name, model used, description
  • Evaluation: Relaxed and Strict evaluation run together with the Upstage Solar model
  • Output: overall and detailed metrics are computed and added to the leaderboard
  3. Leaderboard tab
  • Submission results accumulate in data/leaderboard_results.csv
    • (Optional) When UPLOAD_LEADERBOARD_TO_HF=true, the file is also uploaded automatically
      to the Hugging Face Dataset as leaderboard_results.csv.
  • Search and refresh are available
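
Preparing the upload for the submission tab means adding a model_response value to every row of the downloaded TEST CSV. The sketch below shows one way to do that with the standard library. It assumes the TEST CSV has a `question` column (the column name is not confirmed by this README), and `answer_fn` is a placeholder for whatever function calls your model.

```python
import csv


def fill_model_responses(test_csv, out_csv, answer_fn):
    """Copy test_csv to out_csv with model_response filled by answer_fn.

    answer_fn takes the question text and returns the model's answer.
    Returns the number of rows written.
    """
    with open(test_csv, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        fieldnames = list(reader.fieldnames or [])
        rows = list(reader)

    # Keep all original columns; add model_response only if it is absent.
    if "model_response" not in fieldnames:
        fieldnames.append("model_response")
    for row in rows:
        row["model_response"] = answer_fn(row["question"])

    with open(out_csv, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

A call such as `fill_model_responses("test.csv", "submission.csv", my_model)` would then produce the file to upload; both file names and `my_model` are illustrative.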

Internal flow

  1. Receive the submission: src/submission_handler.py::process_submission
  2. Load the user's CSV → merge it with the reference data:
    • freshqa/merge_csv_with_model_response.py::merge_dataframe_with_model_response_df
  3. Evaluate:
    • freshqa/fresheval_parallel.py::evaluate_dataframe → freshqa/fresheval.py::FreshEval
  4. Aggregate accuracy:
    • freshqa/freshqa_acc.py::calculate_accuracy, process_freshqa_dataframe
  5. Save:
    • Leaderboard: src/leaderboard_manager.py::append_to_leaderboard_data
      • (Optional) Leaderboard backup to the HF repository: only when UPLOAD_LEADERBOARD_TO_HF=true
    • (Optional) Submission history: src/submission_tracker.py (only when ENABLE_SUBMISSION_LIMIT=true)
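
Steps 2 and 3 of the flow above can be pictured with a small stand-alone sketch. It is not the project's implementation: the join key `question`, the `answer` column, and all function names are assumptions, and `exact_match_judge` is a deliberately simple stand-in for the Solar-based Relaxed/Strict grading.

```python
from concurrent.futures import ThreadPoolExecutor


def merge_responses(reference_rows, submitted_rows, key="question"):
    """Attach each submitted model_response to its reference row.

    Reference rows with no matching submission get an empty response.
    """
    responses = {row[key]: row.get("model_response", "") for row in submitted_rows}
    return [
        {**ref, "model_response": responses.get(ref[key], "")}
        for ref in reference_rows
    ]


def evaluate_rows(rows, judge, max_workers=4):
    """Grade rows concurrently; pool.map keeps results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        verdicts = list(pool.map(judge, rows))
    return [{**row, "correct": verdict} for row, verdict in zip(rows, verdicts)]


def exact_match_judge(row):
    """Placeholder judge: the real system asks an LLM instead."""
    return row["model_response"].strip() == row["answer"].strip()


reference = [
    {"question": "q1", "answer": "a"},
    {"question": "q2", "answer": "b"},
]
submitted = [{"question": "q1", "model_response": " a "}]
graded = evaluate_rows(merge_responses(reference, submitted), exact_match_judge)
print([(row["question"], row["correct"]) for row in graded])
```

Threads suit this step because each judgment is an I/O-bound API call; the worker count of 4 is arbitrary.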

Note: when ENABLE_SUBMISSION_LIMIT=false, the code does not attempt to access the Hugging Face repository used for submission history tracking.


Submission limit (optional)

  • Setting: ENABLE_SUBMISSION_LIMIT=true (default)
  • Storage: user_submissions.json is maintained in SUBMISSION_TRACKER_REPO_ID
  • Logic:
    • Up to 3 successful submissions are counted per user per day
    • Counts are kept per calendar day, with the day rolling over at 00:00 Korea Standard Time
    • When disabled (no HF repository access): SubmissionHandler does not create the tracker
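
The counting rule above (3 successful submissions per user per KST calendar day) can be sketched as follows. The layout of the history mapping, `{user: {"YYYY-MM-DD": count}}`, is an assumption and may not match the actual user_submissions.json schema; the function names are hypothetical as well.

```python
from datetime import datetime, timedelta, timezone

# Korea Standard Time is a fixed UTC+9 offset with no daylight saving.
KST = timezone(timedelta(hours=9), "KST")
DAILY_LIMIT = 3


def kst_day(now=None):
    """Return the KST calendar date of `now` as an ISO string."""
    now = now or datetime.now(KST)
    return now.astimezone(KST).date().isoformat()


def can_submit(history, user, now=None):
    """True while the user has fewer than DAILY_LIMIT successes today (KST)."""
    return history.get(user, {}).get(kst_day(now), 0) < DAILY_LIMIT


def record_success(history, user, now=None):
    """Count one successful submission for the user on the current KST day."""
    per_day = history.setdefault(user, {})
    day = kst_day(now)
    per_day[day] = per_day.get(day, 0) + 1


history = {}
late = datetime(2025, 1, 1, 14, 59, tzinfo=timezone.utc)  # 23:59 KST on Jan 1
for _ in range(3):
    record_success(history, "alice", late)
print(can_submit(history, "alice", late))                         # False
print(can_submit(history, "alice", late + timedelta(minutes=1)))  # True
```

The second check succeeds because one minute later it is 00:00 KST on the next day, which starts a fresh count.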

Troubleshooting

  • "Missing required settings" error at startup
    • Check UPSTAGE_API_KEY (or UPSTAGE_API_KEYS), HF_TOKEN, and FRESHQA_DATA_REPO_ID in .env
  • An HF 404 warning appears even though the submission limit is disabled
    • The current version no longer initializes the submission tracker when ENABLE_SUBMISSION_LIMIT=false
  • HF 404 (submission limit enabled)
    • If user_submissions.json does not exist in the SUBMISSION_TRACKER_REPO_ID repository, the first access can return a 404. Create the file in advance with an empty JSON object, {}.
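
One way to create that empty file without using the web UI is the huggingface_hub client. The sketch below assumes the tracker repository is a dataset repo (as the requirements section suggests) and that the token has write access; the helper names are hypothetical.

```python
import json

TRACKER_FILENAME = "user_submissions.json"


def empty_tracker_payload():
    """Return the bytes of an empty JSON object."""
    return json.dumps({}).encode("utf-8")


def create_empty_tracker_file(repo_id, token):
    """Upload an empty user_submissions.json to the tracker repository."""
    # Imported here so the payload helper works without huggingface_hub installed.
    from huggingface_hub import HfApi

    HfApi(token=token).upload_file(
        path_or_fileobj=empty_tracker_payload(),
        path_in_repo=TRACKER_FILENAME,
        repo_id=repo_id,
        repo_type="dataset",  # assumption: the tracker repo is a dataset repo
    )
```

Calling `create_empty_tracker_file` with the value of SUBMISSION_TRACKER_REPO_ID and your HF_TOKEN performs the upload once; later submissions should then find the file.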

License and credits

  • This leaderboard was inspired by FreshQA.

For questions, please open an issue.