|
--- |
|
library_name: llmpromptkit |
|
title: LLMPromptKit |
|
emoji: 🚀
|
tags: |
|
- prompt-engineering |
|
- llm |
|
- nlp |
|
- prompt-management |
|
- huggingface |
|
- version-control |
|
- ab-testing |
|
- evaluation |
|
language:
|
- python |
|
license: mit |
|
pipeline_tag: text-generation |
|
|
|
|
--- |
|
|
|
# LLMPromptKit: LLM Prompt Management System |
|
|
|
LLMPromptKit is a comprehensive library for managing, versioning, testing, and evaluating prompts for Large Language Models (LLMs). It provides a structured framework to help data scientists and developers create, optimize, and maintain high-quality prompts. |
|
|
|
## Features |
|
|
|
- **Prompt Management**: Create, update, and organize prompts with metadata and tags |
|
- **Version Control**: Track prompt changes over time with full version history |
|
- **A/B Testing**: Compare different prompt variations to find the most effective one |
|
- **Evaluation Framework**: Measure prompt quality with customizable metrics |
|
- **Advanced Templating**: Create dynamic prompts with variables, conditionals, and loops |
|
- **Command-line Interface**: Easily integrate into your workflow |
|
- **Hugging Face Integration**: Seamlessly test prompts with thousands of open-source models |
|
|
|
## Hugging Face Integration |
|
|
|
LLMPromptKit includes a powerful integration with Hugging Face models, allowing you to: |
|
|
|
- Test prompts with thousands of open-source models |
|
- Run evaluations with models like FLAN-T5, GPT-2, and others |
|
- Compare prompt performance across different model architectures |
|
- Access specialized models for tasks like translation, summarization, and question answering |
|
|
|
```python |
|
from llmpromptkit import PromptManager, PromptTesting |
|
from llmpromptkit.integrations.huggingface import get_huggingface_callback |
|
|
|
# Initialize components |
|
prompt_manager = PromptManager() |
|
testing = PromptTesting(prompt_manager) |
|
|
|
# Get a HuggingFace callback |
|
hf_callback = get_huggingface_callback( |
|
model_name="google/flan-t5-base", |
|
task="text2text-generation" |
|
) |
|
|
|
# Run tests with the model (asyncio.run drives the async call at top level)
import asyncio

test_results = asyncio.run(
    testing.run_test_cases(prompt_id="your_prompt_id", llm_callback=hf_callback)
)
|
``` |
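Because the callback is just an async function, the same test suite can be swept across several checkpoints to compare architectures. A minimal sketch reusing only the calls shown above; the model names and `your_prompt_id` are placeholders:

```python
import asyncio

from llmpromptkit import PromptManager, PromptTesting
from llmpromptkit.integrations.huggingface import get_huggingface_callback

prompt_manager = PromptManager()
testing = PromptTesting(prompt_manager)

# Example checkpoints to compare; any models matching the task should work
models = ["google/flan-t5-base", "google/flan-t5-small"]

for model_name in models:
    callback = get_huggingface_callback(
        model_name=model_name,
        task="text2text-generation",
    )
    # Same test suite, different model: results are comparable side by side
    results = asyncio.run(
        testing.run_test_cases(prompt_id="your_prompt_id", llm_callback=callback)
    )
    print(model_name, results)
```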
|
|
|
## Documentation |
|
|
|
For detailed documentation, see the [docs](./docs) directory: |
|
|
|
- [Getting Started](./docs/getting_started.md) |
|
- [API Reference](./docs/api_reference.md) |
|
- [CLI Usage](./docs/cli_usage.md) |
|
- [Advanced Features](./docs/advanced_features.md) |
|
- [Integration Examples](./docs/integration_examples.md) |
|
|
|
|
## Installation |
|
|
|
```bash |
|
pip install llmpromptkit |
|
|
|
```

## Quick Start

```python
|
|
|
from llmpromptkit import PromptManager, VersionControl, PromptTesting, Evaluator |
|
|
|
# Initialize components |
|
prompt_manager = PromptManager() |
|
version_control = VersionControl(prompt_manager) |
|
testing = PromptTesting(prompt_manager) |
|
evaluator = Evaluator(prompt_manager) |
|
|
|
# Create a prompt |
|
prompt = prompt_manager.create( |
|
content="Summarize the following text: {text}", |
|
name="Simple Summarization", |
|
description="A simple prompt for text summarization", |
|
tags=["summarization", "basic"] |
|
) |
|
|
|
# Create a new version |
|
version_control.commit( |
|
prompt_id=prompt.id, |
|
commit_message="Initial version" |
|
) |
|
|
|
# Update the prompt |
|
prompt_manager.update( |
|
prompt.id, |
|
content="Please provide a concise summary of the following text in 2-3 sentences: {text}" |
|
) |
|
|
|
# Commit the updated version |
|
version_control.commit( |
|
prompt_id=prompt.id, |
|
commit_message="Improved prompt with length guidance" |
|
) |
|
|
|
# Create a test case |
|
test_case = testing.create_test_case( |
|
prompt_id=prompt.id, |
|
input_vars={"text": "Lorem ipsum dolor sit amet..."}, |
|
expected_output="This is a summary of the text." |
|
) |
|
|
|
# Define an LLM callback for testing |
|
async def llm_callback(prompt, vars): |
|
# In a real scenario, this would call an actual LLM API |
|
return "This is a summary of the text." |
|
|
|
# Run the test case |
|
import asyncio |
|
test_result = asyncio.run(testing.run_test_case( |
|
test_case_id=test_case.id, |
|
llm_callback=llm_callback |
|
)) |
|
|
|
# Evaluate a prompt with multiple inputs |
|
evaluation_result = asyncio.run(evaluator.evaluate_prompt( |
|
prompt_id=prompt.id, |
|
inputs=[{"text": "Sample text 1"}, {"text": "Sample text 2"}], |
|
llm_callback=llm_callback |
|
)) |
|
|
|
print(f"Evaluation metrics: {evaluation_result['aggregated_metrics']}") |
|
|
|
## Command-line Interface
|
LLMPromptKit comes with a powerful CLI for managing prompts: |
|
|
|
```bash
# Create a prompt
|
llmpromptkit prompt create "Summarization" --content "Summarize: {text}" --tags "summarization,basic" |
|
|
|
# List all prompts |
|
llmpromptkit prompt list |
|
|
|
# Create a new version |
|
llmpromptkit version commit <prompt_id> --message "Updated prompt" |
|
|
|
# Run tests |
|
llmpromptkit test run-all <prompt_id> --llm openai
```
|
|
|
## Advanced Usage

### Advanced Templating

LLMPromptKit supports advanced templating with conditionals and loops:
|
|
|
```python
from llmpromptkit import PromptTemplate
|
|
|
template = PromptTemplate(""" |
|
{system_message} |
|
|
|
{for example in examples} |
|
Input: {example.input} |
|
Output: {example.output} |
|
{endfor} |
|
|
|
Input: {input} |
|
Output: |
|
""") |
|
|
|
rendered = template.render( |
|
system_message="You are a helpful assistant.", |
|
examples=[ |
|
{"input": "Hello", "output": "Hi there!"}, |
|
{"input": "How are you?", "output": "I'm doing well, thanks!"} |
|
], |
|
input="What's the weather like?" |
|
)
```
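Loops are shown above; the features list also mentions conditionals. As a sketch, assuming an `{if}`/`{endif}` form analogous to the `{for}`/`{endfor}` loop (see [Advanced Features](./docs/advanced_features.md) for the authoritative syntax):

```python
# Assumed {if}/{endif} syntax, mirroring the {for}/{endfor} form above;
# verify against the advanced features documentation before relying on it.
conditional_template = PromptTemplate("""
{if system_message}
{system_message}
{endif}
Input: {input}
Output:
""")

print(conditional_template.render(system_message="Be concise.", input="Hello"))
```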
|
|
|
### Custom Evaluation Metrics

Create custom metrics to evaluate prompt performance:

```python
from llmpromptkit import EvaluationMetric, Evaluator
|
|
|
class CustomMetric(EvaluationMetric):
    def __init__(self):
        super().__init__("custom_metric", "My custom evaluation metric")

    def compute(self, generated_output, expected_output=None, **kwargs):
        # Example scoring logic: word-overlap ratio between generated and
        # expected output, yielding a float between 0 and 1
        expected_words = set((expected_output or "").lower().split())
        if not expected_words:
            return 0.0
        generated_words = set(generated_output.lower().split())
        return len(generated_words & expected_words) / len(expected_words)
|
|
|
# Register the custom metric |
|
evaluator = Evaluator(prompt_manager) |
|
evaluator.register_metric(CustomMetric())
```
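### A/B Testing

The same `Evaluator` can back a simple A/B comparison: evaluate two prompt variants on identical inputs and compare their aggregated metrics. A sketch reusing `prompt_manager`, `evaluator`, and the `llm_callback` from the Quick Start (dedicated A/B helpers may also exist on `PromptTesting`; see the API reference):

```python
import asyncio

# Two variants of the same task, created with the documented create() call
prompt_a = prompt_manager.create(
    content="Summarize: {text}",
    name="Summarize A",
    description="Terse variant",
    tags=["summarization", "ab-test"],
)
prompt_b = prompt_manager.create(
    content="Summarize the following text in one sentence: {text}",
    name="Summarize B",
    description="Length-constrained variant",
    tags=["summarization", "ab-test"],
)

inputs = [{"text": "Sample text 1"}, {"text": "Sample text 2"}]

# Evaluate each variant on the same inputs and compare metrics side by side
for variant in (prompt_a, prompt_b):
    result = asyncio.run(evaluator.evaluate_prompt(
        prompt_id=variant.id,
        inputs=inputs,
        llm_callback=llm_callback,
    ))
    print(variant.name, result["aggregated_metrics"])
```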
|
|
|
## Use Cases

- **Prompt Development**: Iteratively develop and refine prompts with version control
- **Prompt Optimization**: A/B test different prompt variations to find the most effective approach
- **Quality Assurance**: Ensure prompt quality with automated testing and evaluation
- **Team Collaboration**: Share and collaborate on prompts with a centralized management system
- **Production Deployment**: Maintain consistent prompt quality in production applications
|
|
|
## License

MIT License
|
|
|
## Contributing |
|
Contributions are welcome! Please feel free to submit a Pull Request. |
|
|
|
## Author |
|
Biswanath Roul - [GitHub](https://github.com/biswanathroul) |
|
|
|
|