This is a variation of the MADDPG algorithm, enhanced with graph attention layers in the critic networks.
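The source does not include the critic code; below is a minimal NumPy sketch of a single-head graph attention layer of the kind that could aggregate per-agent features inside a centralized critic. The class name `GraphAttentionLayer` and all dimensions are illustrative assumptions, not taken from the original implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class GraphAttentionLayer:
    """Single-head graph attention layer (hypothetical sketch).

    Treats the N agents as nodes of a graph; each node's features are
    re-weighted by learned attention over its neighbors before being
    fed to the rest of the critic.
    """
    def __init__(self, in_dim, out_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(in_dim, out_dim))
        self.a = rng.normal(scale=0.1, size=(2 * out_dim,))

    def __call__(self, h, adj):
        # h: (N, in_dim) node features; adj: (N, N) adjacency (1 = edge)
        Wh = h @ self.W                          # (N, out_dim) projected features
        d = Wh.shape[1]
        # attention logits e_ij = LeakyReLU(a^T [Wh_i || Wh_j])
        src = Wh @ self.a[:d]                    # (N,) contribution of node i
        dst = Wh @ self.a[d:]                    # (N,) contribution of node j
        e = src[:, None] + dst[None, :]
        e = np.where(e > 0, e, 0.2 * e)          # LeakyReLU
        e = np.where(adj > 0, e, -1e9)           # mask non-edges before softmax
        alpha = softmax(e, axis=1)               # attention weights, rows sum to 1
        return np.tanh(alpha @ Wh)               # attention-weighted aggregation
```

In a MADDPG-style setup, the adjacency would typically be fully connected (every critic attends over all agents' observations and actions).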
The algorithm has been used to solve the Simple Spread (cooperative navigation) environment from OpenAI's Multi-Agent Particle Environments. There are N agents and N landmarks. Agents are rewarded based on how far the closest agent is from each landmark and are penalized if they collide with other agents, so they must learn to cover all the landmarks while avoiding collisions. I also modified part of the reward function to improve training performance: agents receive an extra +10 when they are near a landmark.
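The modified reward is not shown in the source; the sketch below illustrates what a Simple Spread-style reward with the +10 shaping bonus could look like. The function name `shaped_reward` and all radii and penalty magnitudes are assumptions, not values from the original code.

```python
import numpy as np

def shaped_reward(agent_pos, other_pos, landmarks,
                  near_radius=0.1, near_bonus=10.0,
                  collision_radius=0.15, collision_penalty=1.0):
    """Simple Spread-style reward with a +10 'near a landmark' bonus.

    agent_pos: (2,) position of this agent; other_pos: (M, 2) positions
    of the other agents; landmarks: list of (2,) landmark positions.
    Radii, penalty, and bonus values are illustrative assumptions.
    """
    # base term: negative distance from the closest agent to each landmark
    agents = np.vstack([agent_pos[None, :], other_pos])
    reward = 0.0
    for lm in landmarks:
        reward -= np.min(np.linalg.norm(agents - lm, axis=1))
    # collision penalty against every other agent
    for other in other_pos:
        if np.linalg.norm(agent_pos - other) < collision_radius:
            reward -= collision_penalty
    # shaping bonus: +10 when this agent sits near any landmark
    if any(np.linalg.norm(agent_pos - lm) < near_radius for lm in landmarks):
        reward += near_bonus
    return reward
```

With a dense bonus like this, agents get an immediate positive signal for reaching a landmark instead of relying only on the negative-distance term.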
