LLMalMorph: On The Feasibility of Generating Variant Malware using Large-Language-Models
Abstract
LLMalMorph, a semi-automated framework using LLMs, generates malware variants through semantic and syntactic comprehension of source code, reducing antivirus detection rates and achieving attack success against an ML-based classifier.
Large Language Models (LLMs) have transformed software development and automated code generation. Motivated by these advancements, this paper explores the feasibility of using LLMs to modify malware source code and generate variants. We introduce LLMalMorph, a semi-automated framework that leverages an LLM's semantic and syntactic code comprehension to generate new malware variants. LLMalMorph extracts function-level information from the malware source code and employs custom-engineered prompts, coupled with strategically defined code transformations, to guide the LLM in generating variants without resource-intensive fine-tuning. To evaluate LLMalMorph, we collected 10 diverse Windows malware samples of varying types, complexity and functionality and generated 618 variants. Our thorough experiments demonstrate that the detection rates of antivirus engines for these malware variants can be reduced to some extent while preserving malware functionality. In addition, despite not optimizing against any Machine Learning (ML)-based malware detectors, several variants also achieved notable attack success rates against an ML-based malware classifier. We also discuss the limitations of current LLM capabilities in generating malware variants from source code and assess where this emerging technology stands in the broader context of malware variant generation.
Community
🔍 Why it matters: Modern static detectors can be bypassed with tiny semantic-preserving changes. We ask: can LLMs automatically generate source-level malware variants?
🛠️ Our approach: We present LLMalMorph, a semi-automated pipeline:
- Extracts function‑level code from Windows malware
- Uses prompt‑engineered LLMs (no fine‑tuning) to apply transformations
- Validates compilation and functional equivalence
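The steps above can be sketched as a simple mutation loop. This is a minimal, hypothetical illustration, not the paper's implementation: `extract_functions`, `build_prompt`, and the `TRANSFORMS` list are illustrative stand-ins, and the paper's actual prompts and transformation strategies are not reproduced here.

```python
import re
import subprocess
import tempfile
from pathlib import Path

# Illustrative stand-ins for "strategically defined code transformations"
# that the prompt asks the LLM to apply (not the paper's actual set).
TRANSFORMS = [
    "rename variables and functions to neutral identifiers",
    "rewrite loops (for <-> while) without changing behavior",
    "reorder independent statements and split complex expressions",
]

def extract_functions(source: str) -> list[str]:
    """Very rough function-level splitter for C-like source (illustration only;
    a real pipeline would use a proper parser such as a clang AST)."""
    pattern = re.compile(r"\w[\w\s\*]*\([^;{}]*\)\s*\{")
    funcs = []
    for m in pattern.finditer(source):
        depth, start = 0, source.index("{", m.start())
        for j in range(start, len(source)):
            depth += source[j] == "{"
            depth -= source[j] == "}"
            if depth == 0:
                funcs.append(source[m.start():j + 1])
                break
    return funcs

def build_prompt(func: str, transform: str) -> str:
    """Couple one extracted function with one transformation instruction."""
    return (
        "Rewrite the following C function so it compiles and behaves "
        f"identically, applying this transformation: {transform}.\n\n{func}"
    )

def compiles(source: str, compiler: str = "gcc") -> bool:
    """Validation step: attempt a syntax-only compile of a candidate variant."""
    with tempfile.NamedTemporaryFile(suffix=".c", mode="w", delete=False) as f:
        f.write(source)
    result = subprocess.run([compiler, "-fsyntax-only", f.name],
                            capture_output=True)
    Path(f.name).unlink()
    return result.returncode == 0
```

A driver would iterate: extract each function, query the LLM with each transformation prompt, keep only candidates that pass the compilation check, and finally verify functional equivalence by running the rebuilt sample.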
📊 Key results: Across 10 malware samples, we generated 618 variants. Detection rates by commercial AV engines dropped, and an ML-based detector was also evaded in several cases.
🚀 Our Contributions:
- First LLM-based source‑level malware mutation pipeline
- Fully functionality‑preserving, no model fine‑tuning
- Analysis of LLM limitations in semantic/malicious code generation
❓ Discussion:
- Should LLM‑driven mutation be part of detector robustness benchmarks?
- How do defenders anticipate this emerging threat?
- Where did LLMs struggle (control flow, obfuscation)? Suggestions are welcome!