2011 Tsunami, Japan. Photograph by Koji Ueda
When a natural disaster strikes, rescue personnel spend weeks searching through dangerous rubble to locate survivors.
I'm Arya, and with my research partner Alan, I'm working in the TJ Computer Systems Lab to create a future in which drones can do the dangerous work. I want to teach drones how to swarm so they can search the rubble for us.
A swarm of drones would be cheaper, faster, safer, and more adaptable than human personnel, making it ideal for disaster relief situations.
We apply reinforcement learning algorithms to train the drone swarms. For each characteristic we'd like the drones to learn, such as flying from their current position to an arbitrary point B, we create a reward function. This lets us formally define the characteristics of a swarm.
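To make this pattern concrete, here is a minimal sketch using the off-the-shelf stable-baselines3 PPO implementation and a stand-in Gymnasium environment; our actual environments, policies, and hyperparameters differ.

```python
import gymnasium as gym
from stable_baselines3 import PPO

# Stand-in task: in practice, each characteristic we want the drones to
# learn (hover, fly to point B, avoid obstacles, ...) becomes its own
# environment with its own reward function.
env = gym.make("Pendulum-v1")

# stable-baselines3's PPO uses the clipped surrogate objective by default.
model = PPO("MlpPolicy", env, verbose=0)

# The agent improves by maximizing the environment's reward signal.
model.learn(total_timesteps=50_000)
```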
We apply control theory to guarantee stable behaviour. This lets us model the system mathematically and prove that the algorithms will act consistently and reproducibly.
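As an illustrative sketch (not our specific proof): a standard way to certify stability is to exhibit a Lyapunov function V for the closed-loop dynamics, satisfying

```latex
\dot{x} = f(x), \qquad V(x) > 0 \;\; \forall x \neq 0, \qquad V(0) = 0, \qquad
\dot{V}(x) = \nabla V(x)^{\top} f(x) < 0 \;\; \forall x \neq 0.
```

If such a V exists, the equilibrium at the origin is asymptotically stable, so the controller provably returns the drone to its set point after small disturbances.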
We have set regular milestones, starting with basic goals (e.g., a drone hovering) and building up to swarms of drones that can avoid moving obstacles.
We began by building two physics simulators to test which control structure worked better with our PPO-Clip scripts. We then tested each simulator by training a drone to hover at a specified height. Only one simulator (CFfirm, right) was able to train the drone successfully.
Gazebo simulator attempting to train the hover PPO script
CFfirm Simulator demonstrating a successfully trained hover PPO script
With the simulation infrastructure in place, we moved on to more complicated tasks, like moving from A to B:
In this task, we trained a drone to travel from its current position to an arbitrary point B.
We began by designing a reward function that would tell the PPO script how well the drone was performing the task. We built a function that increases the reward as the drone gets closer to the target, and penalizes movement to encourage the drone to stabilize at the target point.
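A minimal sketch of the shape of such a reward function (the constants and exact terms here are illustrative, not our tuned values):

```python
import numpy as np

def a_to_b_reward(position, target, velocity, movement_penalty=0.05):
    """Illustrative reward for the A-to-B task."""
    # Reward grows (toward zero) as the drone closes in on the target.
    reward = -np.linalg.norm(target - position)
    # Penalizing speed encourages the drone to settle at the target
    # instead of overshooting or orbiting it.
    reward -= movement_penalty * np.linalg.norm(velocity)
    return reward
```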
While working on this goal, we encountered several obstacles. For example, PPO was not converging stably, so we switched to PPO-Clip. We also multithreaded our scripts for speed, and refactored our code base (link).
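For reference, PPO-Clip (Schulman et al., 2017) replaces the KL-penalty term of vanilla PPO with a clipped surrogate objective, which hard-caps how far any single update can move the policy and is typically more stable in practice:

```latex
L^{\mathrm{CLIP}}(\theta)
  = \hat{\mathbb{E}}_t\!\left[\min\!\left(r_t(\theta)\,\hat{A}_t,\;
    \mathrm{clip}\big(r_t(\theta),\,1-\epsilon,\,1+\epsilon\big)\,\hat{A}_t\right)\right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}.
```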
Drone successfully completing A (blue) to B (red) task in simulation.
Drone successfully completing DynamicObstacle task in simulation.
We then added obstacles, creating the DynamicObstacle task.
The largest challenge we faced here was how to model the obstacles. We wanted the drones to be able to handle any number of objects, in any possible orientation. We ended up using a Particle Field Controller, or PFC. A PFC models each obstacle as a positively charged conductor that creates a repulsive field; the drone uses these fields to determine which obstacles to avoid, and when.
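A minimal sketch of the repulsion computation, assuming an inverse-square falloff like the field around a point charge (our field shaping and constants may differ):

```python
import numpy as np

def repulsive_field(drone_pos, obstacles, strength=1.0, influence_radius=2.0):
    """Illustrative potential-field repulsion from a set of obstacles."""
    force = np.zeros(3)
    for obs in obstacles:              # works for any number of obstacles
        offset = drone_pos - obs       # vector pointing away from the obstacle
        dist = np.linalg.norm(offset)
        if 1e-6 < dist < influence_radius:
            # Inverse-square falloff: the push grows sharply up close
            # and vanishes outside the obstacle's radius of influence.
            force += strength * offset / dist**3
    return force
```

The drone can then add this force to its goal-seeking command, steering around whatever the field pushes it away from.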
While working on this goal, we found that we could drastically simplify our reward function by varying training more actively: there is a tradeoff between finely tuned rewards and varied training. Later, we realized this was consistent with Google DeepMind's results (link).
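Concretely, "varying training" means randomizing each episode so the policy has to generalize rather than rely on a finely tuned reward. A sketch of the idea (the ranges here are illustrative):

```python
import numpy as np

rng = np.random.default_rng()

def randomized_reset():
    """Illustrative per-episode randomization."""
    start = rng.uniform(-5.0, 5.0, size=3)         # random start position
    target = rng.uniform(-5.0, 5.0, size=3)        # random goal
    n_obstacles = int(rng.integers(0, 6))          # 0-5 obstacles
    obstacles = rng.uniform(-5.0, 5.0, size=(n_obstacles, 3))
    return start, target, obstacles
```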
Once we'd achieved everything we wanted with a single drone, we set our sights on working with swarms:
We tried to keep the first task relatively simple, with the modest goal of "don't crash". Unsurprisingly, this proved to be quite boring. Drones would simply hover in place, and never move.
So we changed the goal. Starting each drone in a random location, we asked the drones to move towards the origin as quickly as possible, without crashing into each other.
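A minimal sketch of that objective as a per-drone reward (names and constants are hypothetical):

```python
import numpy as np

def static_swarm_reward(positions, collision_radius=0.3, collision_penalty=10.0):
    """Illustrative reward for the StaticSwarm task."""
    # Each drone is rewarded for closing in on the origin...
    rewards = -np.linalg.norm(positions, axis=1)
    # ...and heavily penalized whenever two drones get dangerously close.
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if np.linalg.norm(positions[i] - positions[j]) < collision_radius:
                rewards[i] -= collision_penalty
                rewards[j] -= collision_penalty
    return rewards
```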
Drone completing StaticSwarm task in simulation.
A hundred drones completing the SwarmSwap task.
Once StaticSwarm trained successfully, we breathed a sigh of relief: the tricks we'd learnt with one drone, like diversified training and movement penalties in the reward function, all transferred to a swarm. We worked our way through each task, from StaticObstacleSwarm to DynamicObstacleSwarm.
What we were looking for, though, was a more powerful demonstration, and we found it in SwarmSwap. In this task, two swarms of drones start in opposite locations, with obstacles between them. They then swap places, all without crashing into each other or the obstacles.
The Computer Systems Lab supports studies in applied computational science, computer architecture, artificial intelligence, and supercomputing. Working in a UNIX environment with full Internet access, students are able to investigate a broad range of research topics which emphasize high performance computing and graphics visualization techniques.
Alan and I are working under the supervision of TJCSL lab directors Mr. Patrick White and Dr. Peter Gabor, respectively.