---
library_name: llmpromptkit
title: LLMPromptKit
emoji: 🚀
tags:
- prompt-engineering
- llm
- nlp
- prompt-management
- huggingface
- version-control
- ab-testing
- evaluation
languages:
- python
license: mit
pipeline_tag: text-generation
datasets:
- none

---

# LLMPromptKit: LLM Prompt Management System

LLMPromptKit is a comprehensive library for managing, versioning, testing, and evaluating prompts for Large Language Models (LLMs). It provides a structured framework to help data scientists and developers create, optimize, and maintain high-quality prompts.

## Features

- **Prompt Management**: Create, update, and organize prompts with metadata and tags
- **Version Control**: Track prompt changes over time with full version history
- **A/B Testing**: Compare different prompt variations to find the most effective one
- **Evaluation Framework**: Measure prompt quality with customizable metrics
- **Advanced Templating**: Create dynamic prompts with variables, conditionals, and loops
- **Command-line Interface**: Easily integrate into your workflow
- **Hugging Face Integration**: Seamlessly test prompts with thousands of open-source models

## Hugging Face Integration

LLMPromptKit includes a powerful integration with Hugging Face models, allowing you to:

- Test prompts with thousands of open-source models
- Run evaluations with models like FLAN-T5, GPT-2, and others
- Compare prompt performance across different model architectures
- Access specialized models for tasks like translation, summarization, and question answering

```python
from llmpromptkit import PromptManager, PromptTesting
from llmpromptkit.integrations.huggingface import get_huggingface_callback

# Initialize components
prompt_manager = PromptManager()
testing = PromptTesting(prompt_manager)

# Get a HuggingFace callback
hf_callback = get_huggingface_callback(
    model_name="google/flan-t5-base", 
    task="text2text-generation"
)

# Run tests with the model (run_test_cases is async, so drive it with asyncio)
import asyncio

test_results = asyncio.run(testing.run_test_cases(prompt_id="your_prompt_id", llm_callback=hf_callback))
```
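If you prefer to wire up a Hugging Face model yourself, the LLM callback used throughout this README is just an async function that receives the rendered prompt (plus its variables) and returns the model's text. Below is a minimal sketch built directly on the standard `transformers` pipeline API (the pipeline usage is plain `transformers`, not part of LLMPromptKit, and the callback name is hypothetical):

```python
from transformers import pipeline

# Standard transformers pipeline; any text2text-generation model works here
generator = pipeline("text2text-generation", model="google/flan-t5-base")

# Hypothetical hand-rolled callback matching the (prompt, vars) signature
# used by the llm_callback examples in the Quick Start below
async def my_hf_callback(prompt, vars):
    result = generator(prompt, max_new_tokens=128)
    return result[0]["generated_text"]
```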

## Documentation

For detailed documentation, see the [docs](./docs) directory:

- [Getting Started](./docs/getting_started.md)
- [API Reference](./docs/api_reference.md)
- [CLI Usage](./docs/cli_usage.md)
- [Advanced Features](./docs/advanced_features.md)
- [Integration Examples](./docs/integration_examples.md)

## Installation

```bash
pip install llmpromptkit
```
## Quick Start

```python
from llmpromptkit import PromptManager, VersionControl, PromptTesting, Evaluator

# Initialize components
prompt_manager = PromptManager()
version_control = VersionControl(prompt_manager)
testing = PromptTesting(prompt_manager)
evaluator = Evaluator(prompt_manager)

# Create a prompt
prompt = prompt_manager.create(
    content="Summarize the following text: {text}",
    name="Simple Summarization",
    description="A simple prompt for text summarization",
    tags=["summarization", "basic"]
)

# Create a new version
version_control.commit(
    prompt_id=prompt.id,
    commit_message="Initial version"
)

# Update the prompt
prompt_manager.update(
    prompt.id,
    content="Please provide a concise summary of the following text in 2-3 sentences: {text}"
)

# Commit the updated version
version_control.commit(
    prompt_id=prompt.id,
    commit_message="Improved prompt with length guidance"
)

# Create a test case
test_case = testing.create_test_case(
    prompt_id=prompt.id,
    input_vars={"text": "Lorem ipsum dolor sit amet..."},
    expected_output="This is a summary of the text."
)

# Define an LLM callback for testing
async def llm_callback(prompt, vars):
    # In a real scenario, this would call an actual LLM API
    return "This is a summary of the text."

# Run the test case
import asyncio
test_result = asyncio.run(testing.run_test_case(
    test_case_id=test_case.id,
    llm_callback=llm_callback
))

# Evaluate a prompt with multiple inputs
evaluation_result = asyncio.run(evaluator.evaluate_prompt(
    prompt_id=prompt.id,
    inputs=[{"text": "Sample text 1"}, {"text": "Sample text 2"}],
    llm_callback=llm_callback
))

print(f"Evaluation metrics: {evaluation_result['aggregated_metrics']}")

```

## Command-line Interface

LLMPromptKit comes with a powerful CLI for managing prompts:

```bash
# Create a prompt
llmpromptkit prompt create "Summarization" --content "Summarize: {text}" --tags "summarization,basic"

# List all prompts
llmpromptkit prompt list

# Create a new version
llmpromptkit version commit <prompt_id> --message "Updated prompt"

# Run tests
llmpromptkit test run-all <prompt_id> --llm openai

```

## Advanced Usage

### Advanced Templating

LLMPromptKit supports advanced templating with conditionals and loops:

```python
from llmpromptkit import PromptTemplate

template = PromptTemplate("""
{system_message}

{for example in examples}
Input: {example.input}
Output: {example.output}
{endfor}

Input: {input}
Output:
""")

rendered = template.render(
    system_message="You are a helpful assistant.",
    examples=[
        {"input": "Hello", "output": "Hi there!"},
        {"input": "How are you?", "output": "I'm doing well, thanks!"}
    ],
    input="What's the weather like?"
)
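
# Assuming the {for}/{endfor} loop expands each example in order, the rendered
# prompt should look roughly like:
#
#   You are a helpful assistant.
#
#   Input: Hello
#   Output: Hi there!
#   Input: How are you?
#   Output: I'm doing well, thanks!
#
#   Input: What's the weather like?
#   Output:
print(rendered)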

```

### Custom Evaluation Metrics

Create custom metrics to evaluate prompt performance:

```python
from llmpromptkit import EvaluationMetric, Evaluator

class CustomMetric(EvaluationMetric):
    def __init__(self):
        super().__init__("custom_metric", "My custom evaluation metric")
    
    def compute(self, generated_output, expected_output=None, **kwargs):
        # Custom logic to score the output; this placeholder scores exact matches
        score = 1.0 if generated_output == expected_output else 0.0
        return score  # A float between 0 and 1

# Register the custom metric
evaluator = Evaluator(prompt_manager)
evaluator.register_metric(CustomMetric())
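
# A more concrete sketch reusing only the EvaluationMetric interface shown
# above (the "brevity" metric itself is hypothetical, not built in): reward
# outputs of 50 words or fewer, decaying toward 0 for longer ones.
class BrevityMetric(EvaluationMetric):
    def __init__(self):
        super().__init__("brevity", "Rewards concise outputs")

    def compute(self, generated_output, expected_output=None, **kwargs):
        word_count = len(generated_output.split())
        # Clamp to the [0, 1] range the evaluator expects
        return max(0.0, min(1.0, 50 / max(word_count, 1)))

evaluator.register_metric(BrevityMetric())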

```

## Use Cases

- **Prompt Development**: Iteratively develop and refine prompts with version control
- **Prompt Optimization**: A/B test different prompt variations to find the most effective approach
- **Quality Assurance**: Ensure prompt quality with automated testing and evaluation
- **Team Collaboration**: Share and collaborate on prompts with a centralized management system
- **Production Deployment**: Maintain consistent prompt quality in production applications

## License

MIT License

## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.

## Author
Biswanath Roul - [GitHub](https://github.com/biswanathroul)