arxiv:2507.09411

LLMalMorph: On The Feasibility of Generating Variant Malware using Large-Language-Models

Published on Jul 12
· Submitted by Ajwad on Jul 16

Abstract

LLMalMorph, a semi-automated framework using LLMs, generates malware variants by semantically and syntactically comprehending source code, reducing detection rates and achieving attack success against ML-based classifiers.

AI-generated summary

Large Language Models (LLMs) have transformed software development and automated code generation. Motivated by these advancements, this paper explores the feasibility of using LLMs to modify malware source code and generate variants. We introduce LLMalMorph, a semi-automated framework that leverages semantic and syntactic code comprehension by LLMs to generate new malware variants. LLMalMorph extracts function-level information from the malware source code and employs custom-engineered prompts coupled with strategically defined code transformations to guide the LLM in generating variants without resource-intensive fine-tuning. To evaluate LLMalMorph, we collected 10 diverse Windows malware samples of varying types, complexity, and functionality, and generated 618 variants. Our thorough experiments demonstrate that it is possible to reduce the antivirus detection rates of these malware variants to some extent while preserving malware functionality. In addition, despite not optimizing against any Machine Learning (ML)-based malware detectors, several variants also achieved notable attack success rates against an ML-based malware classifier. We also discuss the limitations of current LLM capabilities in generating malware variants from source code and assess where this emerging technology stands in the broader context of malware variant generation.

Community

Paper author Paper submitter

🔍 Why it matters: Modern static detectors can be bypassed with tiny semantic-preserving changes. We ask: can LLMs automatically generate source-level malware variants?

🛠️ Our approach: We present LLMalMorph, a semi-automated pipeline:

  1. Extracts function‑level code from Windows malware
  2. Uses prompt‑engineered LLMs (no fine‑tuning) to apply transformations
  3. Validates compilation and functional equivalence
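
The three pipeline stages above can be sketched as a driver loop. This is a minimal illustration, not the paper's implementation: the function names, the example transformation prompts, and the C-like function extraction regex are all hypothetical, and the LLM call and compilation check are stubbed out.

```python
import re

# Illustrative transformation instructions (hypothetical; the paper uses
# its own strategically defined set of code transformations).
TRANSFORMATIONS = [
    "Rename variables and functions to semantically neutral names.",
    "Restructure control flow while preserving behavior.",
    "Insert benign, unreachable code paths.",
]


def extract_functions(source: str) -> list[str]:
    """Stage 1: very rough function-level extraction for C-like source.
    A real pipeline would use a proper parser, not a regex."""
    return re.findall(r"\w[\w\s\*]*\([^)]*\)\s*\{[^}]*\}", source, re.DOTALL)


def llm_rewrite(function_src: str, transformation: str) -> str:
    """Stage 2: placeholder for a prompt-engineered LLM call (no fine-tuning).
    Stubbed here: it only tags the function with the requested transformation."""
    return f"/* transform: {transformation} */\n{function_src}"


def compiles_and_behaves(candidate: str) -> bool:
    """Stage 3: placeholder for compiling the candidate and checking
    functional equivalence. Always True in this sketch."""
    return True


def generate_variants(source: str) -> list[str]:
    """Apply each transformation to each extracted function, keeping only
    candidates that pass the (stubbed) validation stage."""
    variants = []
    for func in extract_functions(source):
        for transform in TRANSFORMATIONS:
            candidate = source.replace(func, llm_rewrite(func, transform))
            if compiles_and_behaves(candidate):
                variants.append(candidate)
    return variants
```

The point of the sketch is the shape of the search space: with F extracted functions and T transformations, a single pass yields up to F x T candidate variants, which is how 10 samples can fan out into hundreds of variants before validation filters them.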

📊 Key results: Across 10 malware samples, we generated 618 variants. Detection rates by commercial AV engines dropped measurably, and an ML-based detector was also evaded in several cases.

🚀 Our Contributions:

  • First LLM-based source‑level malware mutation pipeline
  • Fully functionality‑preserving, no model fine‑tuning
  • Analysis of LLM limitations in semantic/malicious code generation

❓ Discussion:

  • Should LLM‑driven mutation be part of detector robustness benchmarks?
  • How do defenders anticipate this emerging threat?
  • Where did LLMs struggle (control flow, obfuscation)? Suggestions are welcome!

