Abstract
A framework for evaluating and optimizing natural language prompts in large language models is proposed, revealing correlations between prompt properties and their impact on reasoning tasks.
As large language models (LLMs) have progressed towards more human-like communication and human-AI interaction has become prevalent, prompting has emerged as a decisive component. However, there is limited conceptual consensus on how to quantify natural language prompts. We attempt to address this question by conducting a meta-analysis surveying more than 150 prompting-related papers from leading NLP and AI conferences published between 2022 and 2025, as well as blog posts. We propose a property- and human-centric framework for evaluating prompt quality, encompassing 21 properties categorized into six dimensions. We then examine how existing studies assess their impact on LLMs, revealing imbalanced support across models and tasks, and substantial research gaps. Further, we analyze correlations among properties in high-quality natural language prompts, deriving prompting recommendations. We then empirically explore multi-property prompt enhancements in reasoning tasks, observing that single-property enhancements often have the greatest impact. Finally, we discover that instruction-tuning on property-enhanced prompts can result in better reasoning models. Our findings establish a foundation for property-centric prompt evaluation and optimization, bridging gaps in human-AI communication and opening new prompting research directions.
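For illustration only, here is a minimal sketch of the kind of property-correlation analysis the abstract describes: scoring prompts along a few properties and computing pairwise correlations. The property names, scores, and code are hypothetical placeholders, not the paper's framework or its 21 properties.

```python
# Hypothetical sketch (not the authors' code): correlate per-prompt property
# scores across a small set of prompts, mirroring the idea of analyzing
# correlations among prompt properties.
import pandas as pd

# Toy scores in [0, 1] for five prompts on three illustrative properties.
scores = pd.DataFrame(
    {
        "clarity":     [0.9, 0.7, 0.8, 0.4, 0.6],
        "specificity": [0.8, 0.6, 0.9, 0.3, 0.5],
        "conciseness": [0.5, 0.9, 0.4, 0.8, 0.7],
    },
    index=[f"prompt_{i}" for i in range(5)],
)

# Pairwise Pearson correlations between property scores across prompts.
corr = scores.corr(method="pearson")
print(corr.round(2))
```

In the paper's setting, the scores would come from human or model judgments over many prompts and properties; this toy table only shows the shape of the computation.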
Community
We are excited to share our latest prompting analysis paper, appearing at ACL 2025!
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG Evaluation Prompts (2025)
- MODP: Multi Objective Directional Prompting (2025)
- PMPO: Probabilistic Metric Prompt Optimization for Small and Large Language Models (2025)
- An Empirical Study on Prompt Compression for Large Language Models (2025)
- Dissecting Logical Reasoning in LLMs: A Fine-Grained Evaluation and Supervision Study (2025)
- PromptPrism: A Linguistically-Inspired Taxonomy for Prompts (2025)
- BELL: Benchmarking the Explainability of Large Language Models (2025)