- AI Something: The no BS everyday AI Newsletter
Text-To-Games AI Agent - Is It Possible?
Discover the AI model from Google DeepMind that can make games from text, sketches, and static images.
I rarely introduce AI tools that have yet to be released to the public, as they tend to be overhyped. However, I came across a research paper from Google that is genuinely intriguing and exciting, so I can’t help but share this one.
Can you imagine being able to create video games from text? The potential to bring your creativity to life is mind-blowing and feels unrealistic. But it might be a reality soon with Genie, Google DeepMind’s new generative model that can create Super Mario–like games from scratch.
In today’s rundown:
Targeted Audience: Video Game Creators, Artists.
Genie.
Read time: 1.5 minutes.
Genie
Overview: If Sora is text to video, Genie is text to video game. Google DeepMind's new AI model, Genie, can create playable 2D game worlds—all from a single image prompt, sketch, or text description.
Capabilities: Genie can take a short description, a hand-drawn sketch, or a photo and turn it into a playable video game in the style of classic 2D platformers like Super Mario Bros.
How does Genie work?
Initial Data Processing: Genie begins by analyzing a vast dataset comprising 200,000 hours of publicly available internet gaming videos. This dataset is filtered down to 30,000 hours of footage from hundreds of different 2D games, providing a diverse range of game mechanics and visuals for Genie to learn from.
Tokenization and Model Training: The individual frames from these selected videos are then tokenized, breaking down the complex visuals into a format that a machine-learning algorithm can understand and work with. This process results in a model with approximately 200 million parameters.
Latent Action Model Creation: Using this tokenized data, Genie develops a "latent action model" that predicts the types of interactive actions (such as button presses) that could produce the frame-by-frame changes observed across the game footage. This model constrains itself to eight possible inputs to ensure the generated game environments are human-playable.
Dynamics Model Generation: Armed with the latent action model, Genie constructs a "dynamics model." This model can take any number of arbitrary frames and latent actions to predict what the next frame should look like based on any given input. This extensive model consists of 10.7 billion parameters trained on 942 billion tokens.
Interactive Environment Synthesis: Genie can generate new, interactive game environments from either static starting images or text prompts by utilizing the latent action model and the dynamics model. These environments are visually coherent and respond to simulated player inputs, allowing for the exploration and interaction within these AI-generated worlds.
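For a rough mental model of how the three pieces above fit together, here is a toy Python sketch. This is not the real architecture—every function, size, and rule below is made up purely for illustration—but it mirrors the flow the paper describes: tokenize frames, explain frame-to-frame changes with one of eight latent actions, and roll out new frames from a starting image plus simulated player inputs.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_ACTIONS = 8        # Genie limits itself to 8 latent actions (per the paper)
TOKENS_PER_FRAME = 16  # toy value; the real tokenizer emits far more tokens
VOCAB = 64             # toy codebook size

def tokenize(frame):
    """Stand-in for the video tokenizer: map a frame to discrete tokens."""
    flat = frame.reshape(-1)[:TOKENS_PER_FRAME]
    return (flat * VOCAB).astype(int) % VOCAB

def infer_latent_action(prev_tokens, next_tokens):
    """Stand-in for the latent action model: explain the change between two
    consecutive frames as one of NUM_ACTIONS discrete action codes."""
    return int(np.sum(next_tokens - prev_tokens) % NUM_ACTIONS)

def dynamics_model(token_history, action):
    """Stand-in for the dynamics model: predict the next frame's tokens
    from the token history plus a latent action."""
    return (token_history[-1] + action + 1) % VOCAB

# Roll out an interactive environment from a single starting frame.
frame0 = rng.random((8, 8))            # the prompt image
tokens = [tokenize(frame0)]
for step in range(3):
    action = step % NUM_ACTIONS        # simulated player input
    tokens.append(dynamics_model(tokens, action))

print(len(tokens))  # 4 frames: the prompt image plus 3 generated steps
```

During training, only the first and third functions have labels to learn from; the latent actions are inferred unsupervised from video alone, which is what lets Genie train on raw internet footage without any recorded controller inputs.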
Limitations
The developers of Genie have noted that the current version has a few limitations. For instance, it generates environments at only one frame per second, which is too slow to be playable in real time. Moreover, Genie can sometimes produce unrealistic scenarios in its generated game environments, a common issue among AI models known as "hallucination."
Status
The model is not open to the public yet, but keep an eye on it, as this could be another AI that breaks the internet.
Check out the full research paper here, including different demonstrations 👉️ link.
Read below for more demonstrations of Genie’s capabilities.
Genie 2D Game Demo
Genie generating a game from a static image
Input: static image.
Genie-generated output
Input 2: static image
Output 2
Let me know what other tool you want to be reviewed 🧑‍💻 at [email protected]
Do you find this review helpful? |
That’s a wrap! 🌯
Thank you for reading ❤️ I hope you find these insights useful. Please contact us at [email protected] if you have suggestions, feedback, or anything else!