rmayormartins committed on
Commit eea9cd6 · 1 Parent(s): e1d5340

Uploading files

Files changed (3):
  1. README.md +65 -6
  2. app.py +238 -0
  3. requirements.txt +4 -0
README.md CHANGED
@@ -1,13 +1,72 @@
  ---
- title: Ai Q Learning Vacuum Robot Sim
- emoji: 🐨
- colorFrom: purple
  colorTo: green
  sdk: gradio
- sdk_version: 4.37.2
  app_file: app.py
  pinned: false
- license: ecl-2.0
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
+ title: ai-q-learning-vacuum-robot-sm
+ emoji: 🏠⚙️🤖
+ colorFrom: pink
  colorTo: green
  sdk: gradio
+ sdk_version: "4.12.0"
  app_file: app.py
  pinned: false
  ---

+ # AI-Q-Learning-Vacuum-Robot-Cleaner-Simulation
+
+ This project is an experimental application (v2.0) ...
+
+ ## Project Overview
+
+ This application lets users train a robot vacuum cleaner, via Q-learning, to navigate an environment and clean up dirt.
+
+ ## Technical Details
+
+ The project uses the following technologies:
+ - **Q-Learning**: reinforcement learning algorithm used to train the robot.
+ - **Gradio**: provides an interactive web interface for configuring the environment and adjusting training parameters.
+
+ ## Instructions
+
+ **1- Set up the environment**:
+ - Edit the grid (0 = Empty, 1 = Dirt, 2 = Wall, 3 = Vacuum) and click Generate Environment.
+
+ **2- Train the robot vacuum cleaner**:
+ - Reinforcement learning: Q-learning is used to train the robot vacuum cleaner.
+ - Start position verification: ensures the robot does not start on a dirt or wall cell.
+ - Dirt cleaning: after finding dirt, the robot cleans it, setting the cell back to 0.
+ - Reduce the epsilon decay rate: lets the robot explore for longer before it starts exploiting more.
+ - Reset the house state periodically: ensures that dirt reappears and the robot has new opportunities to learn.
+ - Check that the robot is not stuck: a mechanism was added to ensure the robot does not cycle through invalid states.
+ - Epsilon decay: the decay rate (reduced to 0.999) allows for more exploration.
+ - House state reset: the house is reset every episode so that dirt is present in each new episode.
+ - Increase the learning rate: set alpha higher (e.g. 0.2) to see if the robot learns faster.
+ - Increase the discount factor: set gamma higher (e.g. 0.95) to give more weight to future rewards.
+ - Add more randomness to the choice of initial state: this helps vary training experiences.
+ - Reduce the reward for encountering dirt: a smaller direct reward can push the robot to learn other parts of the environment.
+ - Add penalties for movement: a small penalty per move encourages the robot to find dirt more efficiently.
+ - Increase the variation of initial states: starting from a greater variety of positions helps the robot explore more of the environment.
+ - Change the learning rate (alpha): if the robot converges too slowly or too quickly, adjusting the learning rate can help.
+ - Add more dirt or obstacles: more elements make the problem more challenging and interesting for the robot.
+ - Test different exploration-exploitation (epsilon) policies: experiment with different epsilon decay strategies to balance exploration and exploitation.
+ - Increase the number of episodes: training for more episodes can further improve the robot's performance.
+
+ **3- Simulate**:
+ - Edit the new simulation grid (0 = Empty, 1 = Dirt, 2 = Wall, 3 = Vacuum), set the number of iterations, and simulate the robot.
+
+ ## License
+
+ ECL
+
+ ## Developer Information
+
+ Developed by Ramon Mayor Martins, Ph.D. (2024)
+ - Email: [email protected]
+ - Homepage: [https://rmayormartins.github.io/](https://rmayormartins.github.io/)
+ - Twitter: @rmayormartins
+ - GitHub: [https://github.com/rmayormartins](https://github.com/rmayormartins)
+
+ ## Acknowledgements
+
+ Special thanks to Instituto Federal de Santa Catarina (Federal Institute of Santa Catarina), IFSC-São José, Brazil.
+
+ ## Contact
+
+ For any queries or suggestions, please contact the developer using the information provided above.
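
The training tips above all act on the same two pieces of machinery: the tabular Q-learning update Q(s,a) ← Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)) and an epsilon-greedy policy with decaying epsilon. The following is a minimal, hedged sketch of that loop for orientation only; the parameter names (alpha, gamma, epsilon, epsilon_decay, move_penalty) mirror the app's training inputs, while the 4x4 grid, the +1 dirt reward, and the episode/step counts are illustrative assumptions rather than the app's actual defaults (the real training loop is in app.py below).

```python
import random
import numpy as np

# Illustrative 4x4 environment: 0 = empty, 1 = dirt, 2 = wall (made-up layout)
grid = np.array([[0, 1, 0, 0],
                 [0, 2, 0, 1],
                 [0, 0, 0, 0],
                 [1, 0, 2, 0]])
n_rows, n_cols = grid.shape
q_table = np.zeros((n_rows, n_cols, 4))      # one Q-value per (cell, action)

alpha, gamma = 0.2, 0.95                     # learning rate, discount factor
epsilon, epsilon_decay, epsilon_min = 1.0, 0.999, 0.1
move_penalty = -0.1                          # small cost per move

moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right

for episode in range(500):
    house = grid.copy()                      # reset so dirt reappears each episode
    state = (0, 0)
    for step in range(50):
        # Epsilon-greedy: explore with probability epsilon, otherwise exploit the Q-table.
        if random.random() < epsilon:
            action = random.randint(0, 3)
        else:
            action = int(np.argmax(q_table[state]))
        dr, dc = moves[action]
        nxt = (min(max(state[0] + dr, 0), n_rows - 1),
               min(max(state[1] + dc, 0), n_cols - 1))
        if house[nxt] == 2:                  # walls block movement
            nxt = state
        reward = 1.0 if house[nxt] == 1 else move_penalty
        # Tabular Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        q_table[state][action] += alpha * (reward + gamma * np.max(q_table[nxt]) - q_table[state][action])
        if house[nxt] == 1:
            house[nxt] = 0                   # the dirt is cleaned
        state = nxt
    epsilon = max(epsilon_min, epsilon * epsilon_decay)  # decay exploration between episodes
```

Raising alpha moves each Q-value further toward its new estimate, raising gamma weights future rewards more heavily, and a slower epsilon decay keeps the robot exploring for more episodes; those are exactly the trade-offs the tuning suggestions above describe.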
app.py ADDED
@@ -0,0 +1,238 @@
+ import gradio as gr
+ import numpy as np
+ import matplotlib.pyplot as plt
+ import matplotlib.patches as patches
+ import random
+ import time
+ import imageio
+ import os
+
+ # Global environment state shared by the Gradio callbacks
+ grid_size = (10, 10)
+ house = np.zeros(grid_size, dtype=int)
+ vacuum_pos = None
+ q_table = np.zeros((grid_size[0], grid_size[1], 4))
+ initial_house = None
+
+ def draw_grid(house, vacuum_pos, iteration=None):
+     fig, ax = plt.subplots()
+     block_size = 1
+     for x in range(grid_size[0]):
+         for y in range(grid_size[1]):
+             if house[x, y] == 1:
+                 rect = patches.Rectangle((y, x), block_size, block_size, linewidth=1, edgecolor='black', facecolor='green')
+             elif house[x, y] == 2:
+                 rect = patches.Rectangle((y, x), block_size, block_size, linewidth=1, edgecolor='black', facecolor='red')
+             elif house[x, y] == 3:
+                 rect = patches.Rectangle((y, x), block_size, block_size, linewidth=1, edgecolor='black', facecolor='blue')
+             else:
+                 rect = patches.Rectangle((y, x), block_size, block_size, linewidth=1, edgecolor='black', facecolor='white')
+             ax.add_patch(rect)
+     if vacuum_pos:
+         robot = patches.Circle((vacuum_pos[1] + 0.5, vacuum_pos[0] + 0.5), 0.3, linewidth=1, edgecolor='blue', facecolor='blue')
+         ax.add_patch(robot)
+     ax.set_xlim(0, grid_size[1])
+     ax.set_ylim(0, grid_size[0])
+     ax.set_aspect('equal')
+     plt.gca().invert_yaxis()
+     plt.title('Environment Configuration' if iteration is None else f'Iteration: {iteration}')
+     plt.legend(handles=[
+         patches.Patch(color='green', label='Dirt'),
+         patches.Patch(color='red', label='Wall'),
+         patches.Patch(color='white', label='Empty'),
+         patches.Patch(color='blue', label='Vacuum')
+     ], bbox_to_anchor=(1.05, 1), loc='upper left')
+     if iteration is not None:
+         plt.savefig(f"iteration_{iteration}.png")
+     else:
+         plt.savefig("grid.png")
+     plt.close()
+     return f"iteration_{iteration}.png" if iteration is not None else "grid.png"
+
+ def update_grid(grid):
+     global house, vacuum_pos, initial_house
+     house = np.array(grid, dtype=int)
+     initial_house = house.copy()  # keep the original layout so it can be restored each episode
+     vacuum_pos = tuple(np.argwhere(house == 3)[0]) if 3 in house else None
+     return draw_grid(house, vacuum_pos)
+
+ def reset_house():
+     global house
+     house = initial_house.copy()  # restore the original layout (dirt reappears)
+
+ def choose_action(state, epsilon):
+     if random.uniform(0, 1) < epsilon:
+         action = random.randint(0, 3)  # Explore
+     else:
+         action = np.argmax(q_table[state[0], state[1]])  # Exploit
+     return action
+
+ def get_next_state(state, action):
+     if action == 0:  # up
+         next_state = (max(state[0] - 1, 0), state[1])
+     elif action == 1:  # down
+         next_state = (min(state[0] + 1, grid_size[0] - 1), state[1])
+     elif action == 2:  # left
+         next_state = (state[0], max(state[1] - 1, 0))
+     else:  # right
+         next_state = (state[0], min(state[1] + 1, grid_size[1] - 1))
+     return next_state
+
+ def is_valid_state(state):
+     return house[state] != 2  # walls are encoded as 2 and cannot be entered
+
+ def train_robot(episodes, alpha, gamma, epsilon, epsilon_decay, epsilon_min, move_penalty):
+     global house, vacuum_pos, q_table, max_steps_per_episode
+     episodes = int(episodes)  # Gradio Number inputs arrive as floats
+     rewards_per_episode = []
+     max_steps_per_episode = 200  # cap steps so an episode cannot run forever
+     episode_log = []
+
+     for episode in range(episodes):
+         reset_house()  # reset the layout so dirt is present at the start of every episode
+         state = (random.randint(0, grid_size[0] - 1), random.randint(0, grid_size[1] - 1))
+         while not is_valid_state(state) or house[state] == 1:  # do not start on a wall or dirt cell
+             state = (random.randint(0, grid_size[0] - 1), random.randint(0, grid_size[1] - 1))
+
+         steps = 0
+         total_reward = 0
+         episode_info = []
+
+         while steps < max_steps_per_episode:
+             action = choose_action(state, epsilon)
+             next_state = get_next_state(state, action)
+             if is_valid_state(next_state):
+                 reward = 1 if house[next_state] == 1 else move_penalty  # +1 for dirt, small penalty per move
+                 q_table[state[0], state[1], action] = q_table[state[0], state[1], action] + \
+                     alpha * (reward + gamma * np.max(q_table[next_state[0], next_state[1]]) - q_table[state[0], state[1], action])
+                 state = next_state
+                 total_reward += reward
+                 if reward == 1:
+                     house[next_state] = 0  # the dirt is cleaned
+                     episode_info.append(f"Episode {episode}, Step {steps}: Robot found dirt at position {state}!")
+             steps += 1
+
+         rewards_per_episode.append(total_reward)
+         epsilon = max(epsilon_min, epsilon * epsilon_decay)
+         episode_log.append(f"Episode {episode} completed with total reward: {total_reward}")
+         episode_log.extend(episode_info)
+
+     fig, ax = plt.subplots()
+     ax.plot(rewards_per_episode)
+     ax.set_xlabel('Episode')
+     ax.set_ylabel('Total Reward')
+     ax.set_title('Total Reward per Episode during Training')
+     plt.savefig("training_rewards.png")
+     plt.close()
+
+     return "training_rewards.png", "\n".join(episode_log)
+
+ def simulate_robot(simulation_grid, iterations):
+     global house, vacuum_pos
+     iterations = int(iterations)  # Gradio Number inputs arrive as floats
+     house = np.array(simulation_grid, dtype=int)
+     vacuum_pos = tuple(np.argwhere(house == 3)[0]) if 3 in house else None
+     filenames = []
+
+     state = vacuum_pos
+
+     dirt_cleaned = 0
+     start_time = time.time()
+
+     for iteration in range(iterations):
+         action = choose_action(state, epsilon=0)  # pure exploitation: always follow the learned Q-table
+         next_state = get_next_state(state, action)
+         if is_valid_state(next_state):
+             state = next_state
+         # clean the cell if the robot is standing on dirt
+         if house[state[0], state[1]] == 1:
+             house[state[0], state[1]] = 0  # mark the cell as clean
+             dirt_cleaned += 1
+         draw_grid(house, state, iteration)
+         filenames.append(f'iteration_{iteration}.png')
+         time.sleep(0.1)
+
+     end_time = time.time()
+     total_time = end_time - start_time
+
+     # assemble the saved frames into an animated GIF
+     images = []
+     for filename in filenames:
+         images.append(imageio.imread(filename))
+     imageio.mimsave('simulation.gif', images, duration=0.5)
+
+     # remove the intermediate frame files
+     for filename in filenames:
+         os.remove(filename)
+
+     metrics = f'Total dirt cleaned: {dirt_cleaned}\n'
+     metrics += f'Total simulation time: {total_time:.2f} seconds\n'
+     metrics += f'Average dirt cleaned per iteration: {dirt_cleaned / iterations:.2f}'
+
+     return 'simulation.gif', metrics
+
+ with gr.Blocks() as gui:
+     gr.Markdown("# Vacuum Cleaner Robot Simulation\n**Created by Prof. Ramon Mayor Martins, Ph.D. [version 2.0 07/07/2024]**\n\n0 - Read the instructions, 1 - Set up the environment, 2 - Train the robot vacuum cleaner, 3 - Simulate.")
+
+     with gr.Accordion("📋 Instructions", open=False):
+         gr.Markdown("""
+ **1- Set up the environment**:
+ - Edit the grid (0 = Empty, 1 = Dirt, 2 = Wall, 3 = Vacuum) and click Generate Environment.
+
+ **2- Train the robot vacuum cleaner**:
+ - Reinforcement learning: Q-learning is used to train the robot vacuum cleaner.
+ - Start position verification: ensures the robot does not start on a dirt or wall cell.
+ - Dirt cleaning: after finding dirt, the robot cleans it, setting the cell back to 0.
+ - Reduce the epsilon decay rate: lets the robot explore for longer before it starts exploiting more.
+ - Reset the house state periodically: ensures that dirt reappears and the robot has new opportunities to learn.
+ - Check that the robot is not stuck: a mechanism was added to ensure the robot does not cycle through invalid states.
+ - Epsilon decay: the decay rate (reduced to 0.999) allows for more exploration.
+ - House state reset: the house is reset every episode so that dirt is present in each new episode.
+ - Increase the learning rate: set alpha higher (e.g. 0.2) to see if the robot learns faster.
+ - Increase the discount factor: set gamma higher (e.g. 0.95) to give more weight to future rewards.
+ - Add more randomness to the choice of initial state: this helps vary training experiences.
+ - Reduce the reward for encountering dirt: a smaller direct reward can push the robot to learn other parts of the environment.
+ - Add penalties for movement: a small penalty per move encourages the robot to find dirt more efficiently.
+ - Increase the variation of initial states: starting from a greater variety of positions helps the robot explore more of the environment.
+ - Change the learning rate (alpha): if the robot converges too slowly or too quickly, adjusting the learning rate can help.
+ - Add more dirt or obstacles: more elements make the problem more challenging and interesting for the robot.
+ - Test different exploration-exploitation (epsilon) policies: experiment with different epsilon decay strategies to balance exploration and exploitation.
+ - Increase the number of episodes: training for more episodes can further improve the robot's performance.
+
+ **3- Simulate**:
+ - Edit the new simulation grid (0 = Empty, 1 = Dirt, 2 = Wall, 3 = Vacuum), set the number of iterations, and simulate the robot.
+ """)
+
+     with gr.Accordion("🏠⚙️ Environment Configuration", open=False):
+         with gr.Row():
+             with gr.Column():
+                 env_grid = gr.DataFrame(value=house.tolist(), headers=[str(i) for i in range(grid_size[1])], type="array", label="Edit the grid: 0 = Empty, 1 = Dirt, 2 = Wall, 3 = Vacuum")
+                 generate_button = gr.Button("Generate Environment")
+             with gr.Column():
+                 env_img = gr.Image(interactive=False)
+         generate_button.click(fn=update_grid, inputs=env_grid, outputs=env_img)
+
+     with gr.Accordion("🤖🔧 Vacuum Robot Training", open=False):
+         with gr.Row():
+             episodes = gr.Number(label="Episodes", value=2000)
+             alpha = gr.Number(label="Alpha (Learning Rate)", value=0.2)
+             gamma = gr.Number(label="Gamma (Discount Factor)", value=0.95)
+             epsilon = gr.Number(label="Epsilon (Exploration Rate)", value=1.0)
+             epsilon_decay = gr.Number(label="Epsilon Decay", value=0.999)
+             epsilon_min = gr.Number(label="Epsilon Min", value=0.1)
+             move_penalty = gr.Number(label="Move Penalty", value=-0.1)
+             train_button = gr.Button("Train Robot")
+         with gr.Row():
+             training_img = gr.Image(interactive=False)
+             episode_log_output = gr.Textbox(label="Episode Log", lines=20, interactive=False)
+         train_button.click(fn=train_robot, inputs=[episodes, alpha, gamma, epsilon, epsilon_decay, epsilon_min, move_penalty], outputs=[training_img, episode_log_output])
+
+     with gr.Accordion("🤖📊 Robot Simulation", open=False):
+         with gr.Row():
+             new_simulation_grid = gr.DataFrame(value=house.tolist(), headers=[str(i) for i in range(grid_size[1])], type="array", label="New Simulation Grid: 0 = Empty, 1 = Dirt, 2 = Wall, 3 = Vacuum")
+             iterations = gr.Number(label="Iterations", value=50)
+             simulate_button = gr.Button("Simulate Robot")
+         with gr.Row():
+             simulation_img = gr.Image(interactive=False)
+             metrics_output = gr.Textbox(label="Simulation Metrics", lines=10, interactive=False)
+         simulate_button.click(fn=simulate_robot, inputs=[new_simulation_grid, iterations], outputs=[simulation_img, metrics_output])
+
+ gui.launch(debug=True)
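
For reference, the two gr.DataFrame inputs above expect a plain 10x10 list of integer lists; update_grid and simulate_robot simply convert it with np.array(..., dtype=int). Below is a hedged, illustrative layout one could paste into the environment editor; the specific arrangement of dirt and walls is made up, and only the 0/1/2/3 encoding and the single vacuum cell matter.

```python
# Illustrative only: one possible 10x10 environment for the grid editor.
# 0 = Empty, 1 = Dirt, 2 = Wall, 3 = Vacuum (exactly one vacuum cell).
example_grid = [
    [3, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    [0, 2, 2, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 2, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 2, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 2, 2, 0, 0],
    [0, 0, 2, 0, 1, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 2, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 2, 0],
    [0, 1, 0, 0, 0, 2, 0, 0, 0, 0],
]

# This list-of-lists shape is what the Gradio DataFrame components hand to
# update_grid() and simulate_robot() in app.py above.
```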
requirements.txt ADDED
@@ -0,0 +1,4 @@
+ gradio
+ numpy
+ matplotlib
+ imageio