arxiv:2505.11988

TechniqueRAG: Retrieval Augmented Generation for Adversarial Technique Annotation in Cyber Threat Intelligence Text

Published on May 17 · Submitted by lekssays on May 20


Authors: Ahmed Lekssays, Utsav Shukla, Husrev Taha Sencar, Md Rizwan Parvez

Abstract

Accurately identifying adversarial techniques in security texts is critical for effective cyber defense. However, existing methods face a fundamental trade-off: they either rely on generic models with limited domain precision or require resource-intensive pipelines that depend on large labeled datasets and task-specific optimizations, such as custom hard-negative mining and denoising, resources rarely available in specialized domains. We propose TechniqueRAG, a domain-specific retrieval-augmented generation (RAG) framework that bridges this gap by integrating off-the-shelf retrievers, instruction-tuned LLMs, and minimal text-technique pairs. Our approach addresses data scarcity by fine-tuning only the generation component on limited in-domain examples, circumventing the need for resource-intensive retrieval training. While conventional RAG mitigates hallucination by coupling retrieval and generation, its reliance on generic retrievers often introduces noisy candidates, limiting domain-specific precision. To address this, we enhance retrieval quality and domain specificity through zero-shot LLM re-ranking, which explicitly aligns retrieved candidates with adversarial techniques. Experiments on multiple security benchmarks demonstrate that TechniqueRAG achieves state-of-the-art performance without extensive task-specific optimizations or labeled data, while comprehensive analysis provides further insights.
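The retrieve, re-rank, generate flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The bag-of-words retriever stands in for an off-the-shelf retriever, and `fake_llm_score` is a hypothetical placeholder for a zero-shot LLM relevance judgment. The three technique entries use MITRE ATT&CK-style IDs, which is an assumption here because the abstract does not name a taxonomy. The fine-tuned generator is not implemented; the sketch stops at the prompt that would be passed to it.

```python
import math
import re
from collections import Counter

# Illustrative technique catalog (assumption: ATT&CK-style IDs and paraphrased descriptions).
TECHNIQUES = {
    "T1566": "Phishing: adversary sends email with malicious attachment or link",
    "T1059": "Command and Scripting Interpreter: adversary executes commands or scripts such as PowerShell",
    "T1003": "OS Credential Dumping: adversary dumps credentials and password hashes from memory",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, k=2):
    """Stand-in for an off-the-shelf retriever: top-k techniques by bag-of-words cosine."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(desc))), tid) for tid, desc in TECHNIQUES.items()]
    scored.sort(reverse=True)
    return [tid for _, tid in scored[:k]]


def rerank(query, candidates, llm_score):
    """Reorder candidates by a relevance score; llm_score is any callable(query, description)."""
    return sorted(candidates, key=lambda tid: llm_score(query, TECHNIQUES[tid]), reverse=True)


def build_prompt(query, ranked):
    """Assemble the context that a (fine-tuned) generator would receive."""
    lines = [f"- {tid}: {TECHNIQUES[tid]}" for tid in ranked]
    return "Candidate techniques:\n" + "\n".join(lines) + f"\n\nText: {query}\nTechniques:"


def fake_llm_score(query, description):
    """Hypothetical placeholder for a zero-shot LLM judgment: counts shared tokens."""
    return len(set(tokenize(query)) & set(tokenize(description)))


if __name__ == "__main__":
    text = "The attacker sent a phishing email with a malicious attachment"
    ranked = rerank(text, retrieve(text), fake_llm_score)
    print(build_prompt(text, ranked))
```

In a real system, `llm_score` would wrap a call to an instruction-tuned model, and the prompt would go to the generator fine-tuned on the limited text-technique pairs. Keeping the re-ranker as a plain callable means it can be swapped without touching retrieval.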


Community

lekssays (paper author, paper submitter) · about 23 hours ago, edited about 23 hours ago

Summary:
TechniqueRAG is a domain-specific retrieval-augmented generation (RAG) framework for identifying adversarial techniques in cybersecurity texts. It avoids the limitations of generic models and resource-heavy pipelines by fine-tuning only the generation component using minimal data. To improve precision, it uses zero-shot LLM re-ranking to refine retrieved results. TechniqueRAG outperforms existing methods on security benchmarks without needing extensive labeled data or custom optimizations.



Models citing this paper 3

Datasets citing this paper 0

Spaces citing this paper 0
