|
--- |
|
library_name: llmpromptkit |
|
title: LLMPromptKit |
|
emoji: 🚀
|
tags: |
|
- prompt-engineering |
|
- llm |
|
- nlp |
|
- prompt-management |
|
- huggingface |
|
- version-control |
|
- ab-testing |
|
- evaluation |
|
language:
|
- python |
|
license: mit |
|
pipeline_tag: text-generation |
|
|
|
|
--- |
|
|
|
# LLMPromptKit: LLM Prompt Management System |
|
|
|
LLMPromptKit is a comprehensive library for managing, versioning, testing, and evaluating prompts for Large Language Models (LLMs). It provides a structured framework to help data scientists and developers create, optimize, and maintain high-quality prompts. |
|
|
|
## Features |
|
|
|
- **Prompt Management**: Create, update, and organize prompts with metadata and tags |
|
- **Version Control**: Track prompt changes over time with full version history |
|
- **A/B Testing**: Compare different prompt variations to find the most effective one |
|
- **Evaluation Framework**: Measure prompt quality with customizable metrics |
|
- **Advanced Templating**: Create dynamic prompts with variables, conditionals, and loops |
|
- **Command-line Interface**: Easily integrate into your workflow |
|
- **Hugging Face Integration**: Seamlessly test prompts with thousands of open-source models |
|
|
|
## Hugging Face Integration |
|
|
|
LLMPromptKit includes a powerful integration with Hugging Face models, allowing you to: |
|
|
|
- Test prompts with thousands of open-source models |
|
- Run evaluations with models like FLAN-T5, GPT-2, and others |
|
- Compare prompt performance across different model architectures |
|
- Access specialized models for tasks like translation, summarization, and question answering |
|
|
|
```python |
|
from llmpromptkit import PromptManager, PromptTesting |
|
from llmpromptkit.integrations.huggingface import get_huggingface_callback |
|
|
|
# Initialize components |
|
prompt_manager = PromptManager() |
|
testing = PromptTesting(prompt_manager) |
|
|
|
# Get a HuggingFace callback |
|
hf_callback = get_huggingface_callback( |
|
model_name="google/flan-t5-base", |
|
task="text2text-generation" |
|
) |
|
|
|
# Run tests with the model (asyncio.run drives the async call at top level)
import asyncio

test_results = asyncio.run(
    testing.run_test_cases(prompt_id="your_prompt_id", llm_callback=hf_callback)
)
|
``` |
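Because the callback is just an async function, the same test suite can be swept across several checkpoints to compare architectures. A minimal sketch reusing only the calls shown above; the model names and `your_prompt_id` are placeholders:

```python
import asyncio

from llmpromptkit import PromptManager, PromptTesting
from llmpromptkit.integrations.huggingface import get_huggingface_callback

prompt_manager = PromptManager()
testing = PromptTesting(prompt_manager)

# Example checkpoints to compare; any models matching the task should work
models = ["google/flan-t5-base", "google/flan-t5-small"]

for model_name in models:
    callback = get_huggingface_callback(
        model_name=model_name,
        task="text2text-generation",
    )
    # Same test suite, different model: results are comparable side by side
    results = asyncio.run(
        testing.run_test_cases(prompt_id="your_prompt_id", llm_callback=callback)
    )
    print(model_name, results)
```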
|
|
|
## Documentation |
|
|
|
For detailed documentation, see the [docs](./docs) directory: |
|
|
|
- [Getting Started](./docs/getting_started.md) |
|
- [API Reference](./docs/api_reference.md) |
|
- [CLI Usage](./docs/cli_usage.md) |
|
- [Advanced Features](./docs/advanced_features.md) |
|
- [Integration Examples](./docs/integration_examples.md) |
|
|
|
|
## Installation |
|
|
|
```bash |
|
pip install llmpromptkit |
|
|
|
```

## Quick Start

```python
|
|
|
from llmpromptkit import PromptManager, VersionControl, PromptTesting, Evaluator |
|
|
|
# Initialize components |
|
prompt_manager = PromptManager() |
|
version_control = VersionControl(prompt_manager) |
|
testing = PromptTesting(prompt_manager) |
|
evaluator = Evaluator(prompt_manager) |
|
|
|
# Create a prompt |
|
prompt = prompt_manager.create( |
|
content="Summarize the following text: {text}", |
|
name="Simple Summarization", |
|
description="A simple prompt for text summarization", |
|
tags=["summarization", "basic"] |
|
) |
|
|
|
# Create a new version |
|
version_control.commit( |
|
prompt_id=prompt.id, |
|
commit_message="Initial version" |
|
) |
|
|
|
# Update the prompt |
|
prompt_manager.update( |
|
prompt.id, |
|
content="Please provide a concise summary of the following text in 2-3 sentences: {text}" |
|
) |
|
|
|
# Commit the updated version |
|
version_control.commit( |
|
prompt_id=prompt.id, |
|
commit_message="Improved prompt with length guidance" |
|
) |
|
|
|
# Create a test case |
|
test_case = testing.create_test_case( |
|
prompt_id=prompt.id, |
|
input_vars={"text": "Lorem ipsum dolor sit amet..."}, |
|
expected_output="This is a summary of the text." |
|
) |
|
|
|
# Define an LLM callback for testing |
|
async def llm_callback(prompt, vars): |
|
# In a real scenario, this would call an actual LLM API |
|
return "This is a summary of the text." |
|
|
|
# Run the test case |
|
import asyncio |
|
test_result = asyncio.run(testing.run_test_case( |
|
test_case_id=test_case.id, |
|
llm_callback=llm_callback |
|
)) |
|
|
|
# Evaluate a prompt with multiple inputs |
|
evaluation_result = asyncio.run(evaluator.evaluate_prompt( |
|
prompt_id=prompt.id, |
|
inputs=[{"text": "Sample text 1"}, {"text": "Sample text 2"}], |
|
llm_callback=llm_callback |
|
)) |
|
|
|
print(f"Evaluation metrics: {evaluation_result['aggregated_metrics']}") |
|
|
|
## Command-line Interface
|
LLMPromptKit comes with a powerful CLI for managing prompts: |
|
|
|
```bash
# Create a prompt
|
llmpromptkit prompt create "Summarization" --content "Summarize: {text}" --tags "summarization,basic" |
|
|
|
# List all prompts |
|
llmpromptkit prompt list |
|
|
|
# Create a new version |
|
llmpromptkit version commit <prompt_id> --message "Updated prompt" |
|
|
|
# Run tests |
|
llmpromptkit test run-all <prompt_id> --llm openai
```
|
|
|
## Advanced Usage

### Advanced Templating

LLMPromptKit supports advanced templating with conditionals and loops:
|
|
|
```python
from llmpromptkit import PromptTemplate
|
|
|
template = PromptTemplate(""" |
|
{system_message} |
|
|
|
{for example in examples} |
|
Input: {example.input} |
|
Output: {example.output} |
|
{endfor} |
|
|
|
Input: {input} |
|
Output: |
|
""") |
|
|
|
rendered = template.render( |
|
system_message="You are a helpful assistant.", |
|
examples=[ |
|
{"input": "Hello", "output": "Hi there!"}, |
|
{"input": "How are you?", "output": "I'm doing well, thanks!"} |
|
], |
|
input="What's the weather like?" |
|
)
```
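Loops are shown above; the features list also mentions conditionals. As a sketch, assuming an `{if}`/`{endif}` form analogous to the `{for}`/`{endfor}` loop (see [Advanced Features](./docs/advanced_features.md) for the authoritative syntax):

```python
# Assumed {if}/{endif} syntax, mirroring the {for}/{endfor} form above;
# verify against the advanced features documentation before relying on it.
conditional_template = PromptTemplate("""
{if system_message}
{system_message}
{endif}
Input: {input}
Output:
""")

print(conditional_template.render(system_message="Be concise.", input="Hello"))
```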
|
|
|
### Custom Evaluation Metrics

Create custom metrics to evaluate prompt performance:

```python
from llmpromptkit import EvaluationMetric, Evaluator
|
|
|
class CustomMetric(EvaluationMetric):
    def __init__(self):
        super().__init__("custom_metric", "My custom evaluation metric")

    def compute(self, generated_output, expected_output=None, **kwargs):
        # Example scoring logic: word-overlap ratio between generated and
        # expected output, yielding a float between 0 and 1
        expected_words = set((expected_output or "").lower().split())
        if not expected_words:
            return 0.0
        generated_words = set(generated_output.lower().split())
        return len(generated_words & expected_words) / len(expected_words)
|
|
|
# Register the custom metric |
|
evaluator = Evaluator(prompt_manager) |
|
evaluator.register_metric(CustomMetric())
```
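### A/B Testing

The same `Evaluator` can back a simple A/B comparison: evaluate two prompt variants on identical inputs and compare their aggregated metrics. A sketch reusing `prompt_manager`, `evaluator`, and the `llm_callback` from the Quick Start (dedicated A/B helpers may also exist on `PromptTesting`; see the API reference):

```python
import asyncio

# Two variants of the same task, created with the documented create() call
prompt_a = prompt_manager.create(
    content="Summarize: {text}",
    name="Summarize A",
    description="Terse variant",
    tags=["summarization", "ab-test"],
)
prompt_b = prompt_manager.create(
    content="Summarize the following text in one sentence: {text}",
    name="Summarize B",
    description="Length-constrained variant",
    tags=["summarization", "ab-test"],
)

inputs = [{"text": "Sample text 1"}, {"text": "Sample text 2"}]

# Evaluate each variant on the same inputs and compare metrics side by side
for variant in (prompt_a, prompt_b):
    result = asyncio.run(evaluator.evaluate_prompt(
        prompt_id=variant.id,
        inputs=inputs,
        llm_callback=llm_callback,
    ))
    print(variant.name, result["aggregated_metrics"])
```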
|
|
|
## Use Cases

- **Prompt Development**: Iteratively develop and refine prompts with version control
- **Prompt Optimization**: A/B test different prompt variations to find the most effective approach
- **Quality Assurance**: Ensure prompt quality with automated testing and evaluation
- **Team Collaboration**: Share and collaborate on prompts with a centralized management system
- **Production Deployment**: Maintain consistent prompt quality in production applications
|
|
|
## License

MIT License
|
|
|
## Contributing |
|
Contributions are welcome! Please feel free to submit a Pull Request. |
|
|
|
## Author |
|
Biswanath Roul - [GitHub](https://github.com/biswanathroul) |
|
|
|
|