---
title: InferenceProxy
emoji: 💾
colorFrom: blue
colorTo: pink
sdk: docker
pinned: false
app_port: 4040
---

# inference-proxy

Lightweight proxy to store LLM traces in a Hugging Face Dataset.

## How it works

This API acts as a proxy for OpenAI-compatible inference endpoints. You can tune its batching behavior with two environment variables:

- `BATCH_SIZE_LIMIT` - the maximum number of buffered traces before a push to the dataset
- `BATCH_TIME_LIMIT` - the maximum time to wait before pushing buffered traces to the dataset
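For instance, to push roughly every 100 traces or every 60 seconds, you might set something like the following. The values are illustrative, and the unit of `BATCH_TIME_LIMIT` (seconds vs. milliseconds) is an assumption; check the proxy source for the exact semantics:

```bash
# Illustrative values only; the unit of BATCH_TIME_LIMIT is an assumption.
export BATCH_SIZE_LIMIT=100
export BATCH_TIME_LIMIT=60
```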

## Required Environment Variables

- `HF_ACCESS_TOKEN` - a Hugging Face access token (used to push traces to the dataset)
- `USER_NAME` - used to ensure the proxy only processes requests from this user
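A minimal sketch of running the proxy locally with Docker, assuming the Space's Dockerfile is built as an image named `inference-proxy` (the image name, token, and username below are placeholders):

```bash
# Build the Space's Dockerfile, then run it with the required variables.
docker build -t inference-proxy .
docker run -p 4040:4040 \
  -e HF_ACCESS_TOKEN=hf_xxx \
  -e USER_NAME=your-hf-username \
  inference-proxy
```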

## Example

```js
import { OpenAI } from "openai";

// Point the OpenAI client at the proxy instead of the provider directly.
const client = new OpenAI({
    baseURL: "http://localhost:4040/fireworks-ai/inference/v1",
    apiKey: process.env.HF_API_KEY,
});

let out = "";

const stream = await client.chat.completions.create({
    model: "accounts/fireworks/models/deepseek-v3",
    messages: [
        {
            role: "user",
            content: "What is the capital of France?",
        },
    ],
    stream: true,
    max_tokens: 500,
});

for await (const chunk of stream) {
    // Some chunks (e.g. the final one) carry no content delta.
    const newContent = chunk.choices[0]?.delta?.content ?? "";
    out += newContent;
    process.stdout.write(newContent);
}
```
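Each request made this way is forwarded to the upstream provider (Fireworks AI in this example), and its trace is buffered by the proxy; once `BATCH_SIZE_LIMIT` or `BATCH_TIME_LIMIT` is reached, the buffered traces are pushed to the Hugging Face Dataset.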