---
title: ai-q-learning-vacuum-robot-sm
emoji: 🏠⚙️🤖
colorFrom: pink
colorTo: green
sdk: gradio
sdk_version: 4.12.0
app_file: app.py
pinned: false
---

AI-Q-Learning-Vacuum-Robot-Cleaner-Simulation

This project is an experimental application v2.0 ...

Project Overview

This application allows users to train a robot vacuum cleaner to explore a grid environment and clean up dirt using Q-learning.

Technical Details

The project utilizes the following technologies:

  • Q-Learning: Reinforcement learning algorithm for training the robot (a sketch of the update rule follows this list).
  • Gradio: Provides an interactive web interface for editing the environment grid and adjusting training parameters.
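
At its core, training applies the standard Q-learning update after every move. A minimal sketch of that update is below; the grid size, variable names, and hyperparameter values are illustrative assumptions, not the exact code in app.py:

```python
import numpy as np

# Hypothetical illustration of the Q-learning update; the real app.py may use
# different names, grid size, and hyperparameters.
n_states, n_actions = 25, 4           # e.g. a 5x5 grid, actions: up/down/left/right
Q = np.zeros((n_states, n_actions))   # Q-table: expected return per (state, action)

alpha, gamma = 0.1, 0.9               # learning rate and discount factor

def q_update(state, action, reward, next_state):
    """Standard off-policy temporal-difference (Q-learning) update."""
    best_next = np.max(Q[next_state])   # value of the greedy next action
    Q[state, action] += alpha * (reward + gamma * best_next - Q[state, action])
```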

Instructions

1- Set up the environment:

  • Edit the grid: 0 = Empty, 1 = Dirt, 2 = Wall, 3 = Vacuum, then click Generate Environment (an example encoding is sketched below).
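
For example, a hand-edited environment could look like the grid below. The layout itself is only an illustration; the app expects the same 0/1/2/3 cell codes:

```python
import numpy as np

# Illustrative 5x5 environment using the interface's cell codes:
# 0 = Empty, 1 = Dirt, 2 = Wall, 3 = Vacuum (starting position)
grid = np.array([
    [3, 0, 0, 1, 0],
    [0, 2, 0, 2, 0],
    [0, 0, 1, 0, 0],
    [1, 2, 0, 2, 0],
    [0, 0, 0, 0, 1],
])

# The environment should contain exactly one vacuum start cell.
assert np.count_nonzero(grid == 3) == 1
```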

2- Train the robot vacuum cleaner:

  • Reinforcement learning: Q-learning is used to train the robot vacuum cleaner (a training-loop sketch follows this list).
  • Start Position Check: Ensure that the robot does not start on a dirt or wall cell.
  • Dirt Cleaning: After finding dirt, the robot cleans it, setting that cell back to 0.
  • Reduce Epsilon Decay Rate: A slower epsilon decay lets the robot explore for longer before it shifts toward exploitation.
  • Reset the Home State Periodically: To ensure that dirt reappears and the robot has new opportunities to learn.
  • Check that the Robot is Not Stuck: A mechanism was added to ensure that the robot does not get trapped in a cycle of invalid states.
  • Epsilon Decay: A slower decay rate (reduced to 0.999) allows for more exploration.
  • House State Reset: The house is reset every episode to ensure that dirt is present in each new episode.
  • Increase the learning rate: Set alpha higher (e.g. 0.2) to see if the robot learns faster.
  • Increase the discount factor: Set gamma higher (e.g. 0.95) to give more value to future rewards.
  • Add more randomness to the choice of initial state: This can help to vary training experiences more.
  • Reduce the reward when encountering dirt: A smaller immediate reward can push the robot to keep learning about other parts of the environment.
  • Add penalties for movement: Adding a small penalty for each movement can encourage the robot to find dirt more efficiently.
  • Increase the variation of initial states: Starting from a greater variety of initial positions can help the robot explore more of the environment.
  • Change the learning rate (alpha): If you notice that the robot is converging too slowly or too quickly, adjusting the learning rate can help.
  • Add more dirt or obstacles: Adding more elements to the environment can make the problem more challenging and interesting for the robot.
  • Test different exploration-exploitation (epsilon) policies: Experiment with different epsilon decay strategies to find a good balance between exploration and exploitation.
  • Increase the number of episodes: In some cases, training for more episodes can help to further improve the robot's performance.
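
The tips above map onto a single training loop. The sketch below shows one way the pieces (random start on a free cell, per-episode house reset, epsilon-greedy action choice with slow decay, movement penalty, dirt reward, and the Q-update with alpha and gamma) could fit together; the reward values, grid size, and names are assumptions for illustration rather than the exact settings in app.py:

```python
import numpy as np
import random

# Hypothetical training loop combining the tips above.
ROWS, COLS, ACTIONS = 5, 5, 4                   # actions: up, down, left, right
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]

alpha, gamma = 0.2, 0.95                        # learning rate, discount factor
epsilon, eps_decay, eps_min = 1.0, 0.999, 0.05  # slow decay -> longer exploration

Q = np.zeros((ROWS * COLS, ACTIONS))

def train(initial_grid, episodes=2000, max_steps=200):
    global epsilon
    for _ in range(episodes):
        grid = initial_grid.copy()              # reset the house so dirt reappears
        # random start on a cell that is neither dirt nor wall
        free = [(r, c) for r in range(ROWS) for c in range(COLS) if grid[r, c] in (0, 3)]
        r, c = random.choice(free)
        for _ in range(max_steps):              # step cap avoids getting stuck in cycles
            state = r * COLS + c
            if random.random() < epsilon:
                action = random.randrange(ACTIONS)   # explore
            else:
                action = int(np.argmax(Q[state]))    # exploit
            dr, dc = MOVES[action]
            nr, nc = r + dr, c + dc
            if not (0 <= nr < ROWS and 0 <= nc < COLS) or grid[nr, nc] == 2:
                reward, nr, nc = -5, r, c       # bumped a wall or the boundary
            elif grid[nr, nc] == 1:
                reward = 10                     # found dirt
                grid[nr, nc] = 0                # clean it: the cell becomes empty
            else:
                reward = -1                     # small penalty for each movement
            next_state = nr * COLS + nc
            Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
            r, c = nr, nc
            if not (grid == 1).any():           # all dirt cleaned: end the episode
                break
        epsilon = max(eps_min, epsilon * eps_decay)
    return Q
```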

3- Simulate:

  • New Simulation Grid: edit the grid (0 = Empty, 1 = Dirt, 2 = Wall, 3 = Vacuum), set the number of iterations (episodes/epochs), and simulate the robot (a greedy-rollout sketch follows below).
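
Once training has finished, the simulation step is essentially a greedy rollout of the learned Q-table. This sketch reuses the illustrative ROWS, COLS, MOVES, and trained Q from the training sketch above:

```python
import numpy as np

def simulate(grid, Q, max_steps=100):
    """Greedy rollout of a trained Q-table; returns the visited cells.
    Reuses ROWS, COLS, and MOVES from the training sketch above."""
    grid = grid.copy()
    r, c = map(int, np.argwhere(grid == 3)[0])   # start at the vacuum cell
    path = [(r, c)]
    for _ in range(max_steps):
        state = r * COLS + c
        action = int(np.argmax(Q[state]))        # always exploit during simulation
        dr, dc = MOVES[action]
        nr, nc = r + dr, c + dc
        if 0 <= nr < ROWS and 0 <= nc < COLS and grid[nr, nc] != 2:
            r, c = nr, nc                        # legal move: step into the cell
            if grid[r, c] == 1:
                grid[r, c] = 0                   # clean the dirt cell
        path.append((r, c))
        if not (grid == 1).any():                # stop once everything is clean
            break
    return path
```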

License

ECL

Developer Information

Developed by Ramon Mayor Martins, Ph.D. (2024)

Acknowledgements

Special thanks to Instituto Federal de Santa Catarina (Federal Institute of Santa Catarina) IFSC-São José-Brazil.

Contact

For any queries or suggestions, please contact the developer using the information provided above.