---
library_name: llmpromptkit
title: LLMPromptKit
emoji: ๐
tags:
- prompt-engineering
- llm
- nlp
- prompt-management
- huggingface
- version-control
- ab-testing
- evaluation
languages:
- python
license: mit
pipeline_tag: text-generation
datasets:
- none
---
# LLMPromptKit: LLM Prompt Management System
LLMPromptKit is a comprehensive library for managing, versioning, testing, and evaluating prompts for Large Language Models (LLMs). It provides a structured framework to help data scientists and developers create, optimize, and maintain high-quality prompts.
## Features
- **Prompt Management**: Create, update, and organize prompts with metadata and tags
- **Version Control**: Track prompt changes over time with full version history
- **A/B Testing**: Compare different prompt variations to find the most effective one (see the sketch after this list)
- **Evaluation Framework**: Measure prompt quality with customizable metrics
- **Advanced Templating**: Create dynamic prompts with variables, conditionals, and loops
- **Command-line Interface**: Easily integrate into your workflow
- **Hugging Face Integration**: Seamlessly test prompts with thousands of open-source models
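The A/B testing workflow can be approximated with nothing more than the calls shown later in this README: create two prompt variations, evaluate both on the same inputs, and compare the aggregated metrics. The sketch below does exactly that; `PromptTesting` may also expose a dedicated A/B testing API (see the API reference), and `llm_callback` here is a placeholder.
```python
import asyncio
from llmpromptkit import PromptManager, Evaluator

prompt_manager = PromptManager()
evaluator = Evaluator(prompt_manager)

# Two variations of the same summarization prompt
variant_a = prompt_manager.create(
    content="Summarize: {text}",
    name="Summarization A",
    description="Terse variant",
    tags=["summarization", "ab-test"]
)
variant_b = prompt_manager.create(
    content="Summarize the following text in one sentence: {text}",
    name="Summarization B",
    description="Variant with explicit length guidance",
    tags=["summarization", "ab-test"]
)

# Placeholder callback; swap in a real LLM API call
async def llm_callback(prompt, vars):
    return "A short summary."

inputs = [{"text": "Sample text 1"}, {"text": "Sample text 2"}]

# Evaluate both variants on the same inputs and compare their aggregated metrics
for variant in (variant_a, variant_b):
    result = asyncio.run(evaluator.evaluate_prompt(
        prompt_id=variant.id,
        inputs=inputs,
        llm_callback=llm_callback
    ))
    print(variant.id, result["aggregated_metrics"])
```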
## Hugging Face Integration
LLMPromptKit includes a powerful integration with Hugging Face models, allowing you to:
- Test prompts with thousands of open-source models
- Run evaluations with models like FLAN-T5, GPT-2, and others
- Compare prompt performance across different model architectures
- Access specialized models for tasks like translation, summarization, and question answering
```python
from llmpromptkit import PromptManager, PromptTesting
from llmpromptkit.integrations.huggingface import get_huggingface_callback
# Initialize components
prompt_manager = PromptManager()
testing = PromptTesting(prompt_manager)
# Get a HuggingFace callback
hf_callback = get_huggingface_callback(
    model_name="google/flan-t5-base",
    task="text2text-generation"
)
# Run tests with the model (run_test_cases is async)
import asyncio
test_results = asyncio.run(testing.run_test_cases(
    prompt_id="your_prompt_id",
    llm_callback=hf_callback
))
```
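To compare the same prompt across model architectures (the last bullet above), one option is to build one callback per model and run the same test cases against each. This sketch reuses only the calls from the snippet above; the model names are just examples and `prompt_id` is a placeholder.
```python
import asyncio

# Example models; any Hugging Face model compatible with the task should work
models = {
    "google/flan-t5-base": "text2text-generation",
    "google/flan-t5-small": "text2text-generation",
}

for model_name, task in models.items():
    hf_callback = get_huggingface_callback(model_name=model_name, task=task)
    results = asyncio.run(testing.run_test_cases(
        prompt_id="your_prompt_id",
        llm_callback=hf_callback
    ))
    print(model_name, results)
```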
## Documentation
For detailed documentation, see the [docs](./docs) directory:
- [Getting Started](./docs/getting_started.md)
- [API Reference](./docs/api_reference.md)
- [CLI Usage](./docs/cli_usage.md)
- [Advanced Features](./docs/advanced_features.md)
- [Integration Examples](./docs/integration_examples.md)
## Installation
```bash
pip install llmpromptkit
```
## Quick Start
```python
from llmpromptkit import PromptManager, VersionControl, PromptTesting, Evaluator
# Initialize components
prompt_manager = PromptManager()
version_control = VersionControl(prompt_manager)
testing = PromptTesting(prompt_manager)
evaluator = Evaluator(prompt_manager)
# Create a prompt
prompt = prompt_manager.create(
    content="Summarize the following text: {text}",
    name="Simple Summarization",
    description="A simple prompt for text summarization",
    tags=["summarization", "basic"]
)
# Create a new version
version_control.commit(
    prompt_id=prompt.id,
    commit_message="Initial version"
)
# Update the prompt
prompt_manager.update(
    prompt.id,
    content="Please provide a concise summary of the following text in 2-3 sentences: {text}"
)
# Commit the updated version
version_control.commit(
    prompt_id=prompt.id,
    commit_message="Improved prompt with length guidance"
)
# Create a test case
test_case = testing.create_test_case(
    prompt_id=prompt.id,
    input_vars={"text": "Lorem ipsum dolor sit amet..."},
    expected_output="This is a summary of the text."
)
# Define an LLM callback for testing
async def llm_callback(prompt, vars):
    # In a real scenario, this would call an actual LLM API
    return "This is a summary of the text."
# Run the test case
import asyncio
test_result = asyncio.run(testing.run_test_case(
    test_case_id=test_case.id,
    llm_callback=llm_callback
))
# Evaluate a prompt with multiple inputs
evaluation_result = asyncio.run(evaluator.evaluate_prompt(
    prompt_id=prompt.id,
    inputs=[{"text": "Sample text 1"}, {"text": "Sample text 2"}],
    llm_callback=llm_callback
))
print(f"Evaluation metrics: {evaluation_result['aggregated_metrics']}")
```
## Command-line Interface
LLMPromptKit comes with a powerful CLI for managing prompts:
```bash
# Create a prompt
llmpromptkit prompt create "Summarization" --content "Summarize: {text}" --tags "summarization,basic"
# List all prompts
llmpromptkit prompt list
# Create a new version
llmpromptkit version commit <prompt_id> --message "Updated prompt"
# Run tests
llmpromptkit test run-all <prompt_id> --llm openai
```
## Advanced Usage
### Advanced Templating
LLMPromptKit supports advanced templating with conditionals and loops:
```python
from llmpromptkit import PromptTemplate
template = PromptTemplate("""
{system_message}
{for example in examples}
Input: {example.input}
Output: {example.output}
{endfor}
Input: {input}
Output:
""")
rendered = template.render(
    system_message="You are a helpful assistant.",
    examples=[
        {"input": "Hello", "output": "Hi there!"},
        {"input": "How are you?", "output": "I'm doing well, thanks!"}
    ],
    input="What's the weather like?"
)
```
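The example above only exercises the loop syntax. Conditionals are also supported according to the feature list; the `{if ...}`/`{endif}` tags below are an assumed syntax for illustration, so check the advanced features documentation for the exact form.
```python
# Assumed conditional syntax -- verify the exact tags in docs/advanced_features.md
template = PromptTemplate("""
{if system_message}
{system_message}
{endif}
Input: {input}
Output:
""")

rendered = template.render(system_message="You are a helpful assistant.", input="Hello")
print(rendered)
```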
### Custom Evaluation Metrics
Create custom metrics to evaluate prompt performance:
```python
from llmpromptkit import EvaluationMetric, Evaluator
class CustomMetric(EvaluationMetric):
    def __init__(self):
        super().__init__("custom_metric", "My custom evaluation metric")

    def compute(self, generated_output, expected_output=None, **kwargs):
        # Custom logic to score the output; must return a float between 0 and 1.
        # Toy example: exact match against the expected output.
        return 1.0 if generated_output == expected_output else 0.0
# Register the custom metric
evaluator = Evaluator(prompt_manager)
evaluator.register_metric(CustomMetric())
```
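Once registered, the custom metric is computed alongside the built-in metrics whenever the evaluator runs. A minimal sketch reusing the `evaluate_prompt` call from the Quick Start (it assumes the metric appears in `aggregated_metrics` under the name passed to `super().__init__`):
```python
import asyncio

# Reuses prompt, prompt_manager, and llm_callback from the Quick Start above
result = asyncio.run(evaluator.evaluate_prompt(
    prompt_id=prompt.id,
    inputs=[{"text": "Sample text 1"}],
    llm_callback=llm_callback
))
print(result["aggregated_metrics"].get("custom_metric"))
```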
## Use Cases
- **Prompt Development**: Iteratively develop and refine prompts with version control
- **Prompt Optimization**: A/B test different prompt variations to find the most effective approach
- **Quality Assurance**: Ensure prompt quality with automated testing and evaluation
- **Team Collaboration**: Share and collaborate on prompts with a centralized management system
- **Production Deployment**: Maintain consistent prompt quality in production applications
## License
MIT License
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## Author
Biswanath Roul - [GitHub](https://github.com/biswanathroul)