Post 143
An open-source AI agent is achieving state-of-the-art results on two different Android AI agent benchmarks.
Yesterday, I finished evaluating my Android agent model, deki, on two separate benchmarks: Android Control and Android World. For both benchmarks I used a subset of the dataset without fine-tuning. The results show that image description models like deki enable large LLMs (such as GPT-4o, GPT-4.1, and Gemini 2.5) to reach state-of-the-art performance on Android AI agent benchmarks using only vision capabilities, without relying on Accessibility Trees, on both single-step and multi-step tasks.
All the information is available on GitHub: https://github.com/RasulOs/deki
I have also uploaded the model to Hugging Face:
Space: orasul/deki
(Check the analyze-and-get-yolo endpoint; a quick call sketch is below)
Model: orasul/deki-yolo
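
If you want to try the Space programmatically, here is a minimal sketch using gradio_client. The exact input/output signature of the analyze-and-get-yolo endpoint is an assumption on my part, so check the Space's API tab for the real parameter names:

```python
# Minimal sketch (not an official example): calling the deki Space's
# analyze-and-get-yolo endpoint via gradio_client. The argument list is
# assumed; consult the Space's API page for the exact signature.
from gradio_client import Client, handle_file

client = Client("orasul/deki")  # the Hugging Face Space mentioned above

# Assumed input: a single Android screenshot; assumed output: the image
# description / YOLO detections the agent consumes.
result = client.predict(
    handle_file("screenshot.png"),  # path to an Android screenshot
    api_name="/analyze-and-get-yolo",
)
print(result)
```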