The list of games humanity has lost to artificial intelligence (AI) is substantial and growing: chess, Go, Ms. Pac-Man and, most recently, StarCraft II.
Until now, however, one game AI has struggled with is the 1980s Atari game Montezuma’s Revenge.
In Montezuma’s Revenge, which is delightfully named after travellers' diarrhea (no really), players guide Panama Joe through rooms in a labyrinthine Aztec pyramid. Due to its layout and 2D side-scrolling nature, the game is part of the “Metroidvania” sub-genre (named after the widely influential Metroid and Castlevania).
The last major AI attempt to beat the game was in 2015 when Google’s DeepMind failed to learn a path to the first key after more than a month of experience.
But now a new method developed at RMIT University in Melbourne, Australia, has smashed DeepMind’s result, identifying sub-goals roughly 10 times faster and going on to finish the game.
Associate Professor Fabio Zambetta from RMIT University unveiled the new approach last Friday at the 33rd AAAI Conference on Artificial Intelligence in the United States.
He developed the method along with Professor John Thangarajah and PhD candidate Michael Dann.
According to their paper, DeepMind struggled in adventure games because, prior to finding some reward, the AI sees no incentive to favour one course of action over another.
In response, the AI will choose actions at random. While this may work fine in a game with a large number of rewards such as Video Pinball – a game DeepMind did well at in 2015 – it fares worse in a “sparse reward” environment.
“[T]he agent can become stuck in a ‘chicken-egg’ scenario, where it cannot improve its policy until it finds some reward, but it cannot discover any rewards until it improves its policy,” the paper said.
RMIT’s method, which they call “pellet rewards”, works by combining “carrot-and-stick” reinforcement learning with an intrinsic motivation approach that rewards the AI for being curious.
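The general idea of rewarding curiosity can be sketched in a few lines. The snippet below is a hypothetical illustration, not the RMIT team’s actual “pellet rewards” algorithm: it uses a simple count-based novelty bonus, one common way intrinsic motivation is added to reinforcement learning, and all names and constants are invented for the example.

```python
from collections import defaultdict

# Hypothetical sketch: combine the extrinsic ("carrot-and-stick") game
# reward with a count-based curiosity bonus. Rarely visited states pay
# a larger bonus, so the agent has an incentive to explore even before
# it has found any game reward.

visit_counts = defaultdict(int)  # how often each state has been seen
BETA = 0.1                       # weight of the curiosity bonus (assumed)

def combined_reward(state, extrinsic_reward):
    """Total reward = game score change + decaying novelty bonus."""
    visit_counts[state] += 1
    intrinsic = BETA / (visit_counts[state] ** 0.5)
    return extrinsic_reward + intrinsic

# In a sparse-reward game the extrinsic reward is almost always zero,
# but the intrinsic term still differentiates novel states.
r_first = combined_reward("room1_ladder_top", 0.0)  # novel state
r_again = combined_reward("room1_ladder_top", 0.0)  # seen before
print(r_first > r_again)  # prints True: the bonus decays on revisits
```

This decaying bonus is why a curious agent eventually stops circling familiar ground and pushes into unexplored rooms, which is exactly the behaviour a sparse-reward game like Montezuma’s Revenge demands.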
“Not only did our algorithms autonomously identify relevant tasks roughly 10 times faster than Google DeepMind while playing Montezuma’s Revenge, they also exhibited relatively human-like behaviour while doing so,” Zambetta says.
Zambetta gives the example of the first screen of the game, which requires players to execute “sub-tasks" such as climbing ladders, jumping an enemy and picking up a key before continuing to the next room.
“This would eventually happen randomly after a huge amount of time but to happen so naturally in our testing shows some sort of intent,” he said. “This makes ours the first fully autonomous sub-goal-oriented agent to be truly competitive with state-of-the-art agents on these games.”
So who cares about AIs beating each other in video games?
Zambetta says that equipping AI with the ability to cope with ambiguity while choosing from an arbitrary number of possible actions is an important step in the AI’s evolution. In addition, the pellet reward system will work outside of video games in a wide range of tasks if provided the appropriate data.
“It means that, with time, this technology will be valuable to achieve goals in the real world, whether in self-driving cars or as useful robotic assistants with natural language recognition,” he says.
AI can be loosely lumped into three broad categories – Narrow, which excels in specific tasks, General, which will have human-level intelligence, and Super, which will surpass us in ways we can’t even imagine.
At present – and some may consider this a good thing – we are still very much at the point of creating narrow AI (ANI). But many data scientists believe that the path towards artificial general intelligence (AGI) will be forged through research into “reinforcement learning”, which is the area Google and RMIT explore in research such as this.
In one famous thought experiment, an AGI designed to collect paperclips inadvertently destroys humanity as a result of its prime directive.
“The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else,” Eliezer Yudkowsky of the Machine Intelligence Research Institute said in a paper that contemplated AI as a positive or negative factor in global risk.
The point here is not to fear monger but to emphasise that we are still very early in our AI journey. The gap between ANI and AGI could be 10 or even 20 years.
Looking further ahead, we don’t know what will happen when we get to artificial superintelligence (ASI), and some believe we may never see it at all: experts’ estimates range from “within 25 years” to “never”.
Regardless, AI is shaping up to permeate every area of our lives, including customer experience (CX) as chatbots become smarter. Fifth Quadrant has identified AI as a key CX trend moving into 2019 and recently spoke with experts to get their opinions.