SmaliLLM
Our Large Language Model to Decompile Smali code to Java code.
SmaliLLM is a large language model designed to decompile Smali code into Java code. Reconstructing Smali language representations into high-level languages such as Java holds significant practical engineering value. This transformation not only lowers the technical barrier for reverse engineering but also provides the necessary semantic foundation for subsequent tasks such as static analysis and vulnerability detection.
SmaliLLM is a series of models fine-tuned from Qwen3, Qwen2.5-Coder, and Gemma3 on nearly 1,000 "Smali2Java" samples, with the following features:
The following code snippet illustrates how to use the model to generate content based on given inputs.
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "MoxStone/SmaliLLM-Qwen3-8B-Finetuned"
# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
# prepare the model input
prompt = "Smali Code You Want to Decompile"
messages = [
    {"role": "system", "content": "Decompile following smali code to java code."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False  # In the Qwen3 base model, we use the non-thinking mode to decompile Smali code.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=4096
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
# parse the thinking content, if any
try:
    # rindex: find the last occurrence of token 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0
thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")
print("Java code:", content)
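
The message construction and the `</think>` split in the snippet above can also be written as two small pure functions, which makes them easy to check without loading a model. This is only a sketch: the names `build_messages`, `split_thinking`, `SYSTEM_PROMPT`, and `END_THINK_TOKEN_ID` are hypothetical and not part of the model or of `transformers`, and the token id 151668 is simply the value the snippet above uses for `</think>`.

from typing import Dict, List, Tuple

# Assumption: id of "</think>", taken from the snippet above.
END_THINK_TOKEN_ID = 151668

# System prompt copied verbatim from the snippet above.
SYSTEM_PROMPT = "Decompile following smali code to java code."


def build_messages(smali_code: str) -> List[Dict[str, str]]:
    """Wrap Smali source in the chat message format used above."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": smali_code},
    ]


def split_thinking(
    output_ids: List[int], end_id: int = END_THINK_TOKEN_ID
) -> Tuple[List[int], List[int]]:
    """Split generated ids just after the last occurrence of end_id.

    Returns (thinking_ids, answer_ids). The thinking part includes the
    end token itself; it is empty when end_id does not occur.
    """
    try:
        # Searching the reversed list locates the last occurrence.
        index = len(output_ids) - output_ids[::-1].index(end_id)
    except ValueError:
        index = 0
    return output_ids[:index], output_ids[index:]

With these helpers, `messages = build_messages(prompt)` could replace the literal list, and `thinking_ids, answer_ids = split_thinking(output_ids)` could replace the `try`/`except` block before the two `tokenizer.decode` calls.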