Combining planners and LLMs to power groups of characters
Welcome to our Substack. This is our first post. We will be using this space to share our journey and thinking as we build Bitpart.ai.
Our team at Bitpart is creating character technology to bring game worlds to life. More specifically, we are building a multi-agent planning AI engine for games. Our AI engine controls Actions for groups of characters, where actions are a fluid mix of dialogue and physical actions. Below we share a couple of clips of a prototype we built called Prompt-to-Action, and this Substack post gives a bit of background on the prototype.
Our hypothesis is that supporting open-ended, adaptive behavior is a problem that can be solved with scale — by scaling up the variety of dialogue, actions, and action sequences by 10X (or more). Supporting behavior at that scale requires changes to the runtime, but it is also a significant content-authoring problem.
At GDC 2024, I gave a short 7-minute overview of how we create planners from example narratives, as a segment within the Simplest Trick in the Book session at the AI Summit. The trick is to collect enough example narratives. In my PhD research, I collected narratives by recording humans role-playing in multiplayer games. Today in 2024, LLMs are an alternative means of creating a lot of example content.
We use LLMs at design time. Why don’t we use LLMs directly at runtime? That’s a Substack post for another day. Let’s focus here on how LLMs can assist designers in creating lots of content.
It’s easy to prompt LLMs to generate dialogue that can be executed by characters in a game. Why is it harder to generate sequences that mix actions and dialogue that can be executed coherently in a game? Because actions are hard.
There is a lot to consider to execute an action correctly in a game. Character position, facing, IK, head look, and animation. Shape, scale, orientation, and state changes of an object being acted upon. Who is allowed to take which actions? Etc, etc. Spoiler alert – solving all of these problems is not going to happen soon. Doing so would require imbuing characters with human-level common sense reasoning — a worthy long-term goal, one we are pursuing, but one that will not be fully achieved near-term. Thus, if we want to generate narratives that include physical actions, the choice is either to describe actions in text narration that overlays the game, or to constrain actions to a pre-agreed repertoire.
Over the past year, I’ve seen developers try different approaches to include actions in generated gameplay. Below are a couple of them. In some cases, actions are executed via traditional game AI approaches, which provide context to an LLM that generates only the dialogue. In other cases, an LLM is used to generate everything, accepting the reality that not everything can be expressed with body language.
Our approach at Bitpart does let the LLM generate both actions and dialogue. However, we encourage the LLM to stay within a pre-agreed repertoire. This contract is communicated via an HTN planning domain. The planning domain describes the roles, props, actions, and state changes that the generated narrative should abide by.
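To make the idea of the planning domain as a contract concrete, here is a minimal sketch in Python of the kind of information such a contract might carry. This is not Bitpart's actual domain format; the role, prop, and action names are invented purely for illustration.

```python
# A minimal, hypothetical sketch of an HTN-style domain contract:
# which roles and props exist, which primitive actions are allowed,
# and what state changes each action may make.
from dataclasses import dataclass, field

@dataclass
class Action:
    name: str                 # e.g. "sit_on", "pick_up", "say"
    roles: list[str]          # which character roles may perform it
    props: list[str]          # which props it may target, if any
    preconditions: list[str]  # world-state facts that must hold first
    effects: list[str]        # world-state facts the action changes

@dataclass
class PlanningDomain:
    roles: list[str]
    props: list[str]
    actions: dict[str, Action] = field(default_factory=dict)

# A toy tavern domain: two roles, two props, and a tiny action repertoire.
domain = PlanningDomain(roles=["bartender", "patron"], props=["mug", "stool"])
domain.actions["pick_up"] = Action(
    name="pick_up", roles=["bartender", "patron"], props=["mug"],
    preconditions=["mug is reachable"], effects=["actor holds mug"],
)
domain.actions["sit_on"] = Action(
    name="sit_on", roles=["patron"], props=["stool"],
    preconditions=["stool is free"], effects=["actor is seated"],
)
```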
Long-term, the generated narratives are part of a feedback loop: we use an HTN planning domain to constrain the LLM-generated output, and these generated example narratives in turn provide content to expand the HTN planning domain with additional Tasks and Methods.
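As a sketch of what that loop might look like mechanically (again hypothetical, continuing the toy domain above): one direction filters generated steps down to what the domain already allows, and the other direction mines accepted narratives for recurring action sequences that a designer could promote into new Methods.

```python
# Hypothetical feedback-loop helpers; function and field names are invented.
from collections import Counter

def validate_narrative(domain, narrative):
    """Keep only steps whose action, role, and prop the domain allows."""
    accepted = []
    for step in narrative:  # each step: {"role": ..., "action": ..., "prop": ...}
        action = domain.actions.get(step["action"])
        if action is None:
            continue
        if step["role"] not in action.roles:
            continue
        if step.get("prop") and step["prop"] not in action.props:
            continue
        accepted.append(step)
    return accepted

def candidate_methods(narratives, min_count=3, length=3):
    """Find action subsequences that recur across accepted narratives."""
    counts = Counter()
    for narrative in narratives:
        names = [step["action"] for step in narrative]
        for i in range(len(names) - length + 1):
            counts[tuple(names[i:i + length])] += 1
    return [seq for seq, n in counts.items() if n >= min_count]
```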
We prototyped a system called Prompt-to-Action that generates, from prompts, narratives that can be executed in a Unity 3D environment. When we use the term Action, we mean two things: First, we mean a performance, like “Lights, Camera, Action!” Second, we mean physical action in the environment — characters walking around, sitting on furniture, picking up props and using them in different ways, and using body language to interact with each other.
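Purely for illustration, a generated narrative handed to a game runtime might look something like the structured sequence below, assuming a repertoire slightly larger than the toy domain above. The schema and values are invented here, not taken from the actual prototype.

```python
# A made-up example of a short generated narrative mixing physical actions
# and dialogue, in the kind of structured form a game runtime could execute.
narrative = [
    {"role": "patron",    "action": "walk_to", "prop": "stool"},
    {"role": "patron",    "action": "sit_on",  "prop": "stool"},
    {"role": "patron",    "action": "say",     "line": "Quiet night, huh?"},
    {"role": "bartender", "action": "pick_up", "prop": "mug"},
    {"role": "bartender", "action": "say",     "line": "It was, until you walked in."},
]
```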
We are LLM-agnostic, but this particular prototype uses GPT-4. We also use ElevenLabs for AI voices. While the ultimate goal is to use these example narratives to expand an HTN planning domain, it turns out that viewing these generated narratives directly can be quite entertaining. While we continue to mull over a 24/7 Twitch stream of LLM improv, for now we’ll leave you with a couple of sample runs of our Prompt-to-Action prototype.
Take a look at some Prompt-to-Action results on YouTube.