Iron Galaxy Interview
In addition to explaining how delay-based and rollback solutions work, I wanted to get opinions and perspective from game developers who have worked with rollback on multiple projects for many years. I’m very fortunate to be joined by Ramón "krazhier" Franco, Iron Galaxy software engineer and networking mastermind, and Adam "Keits" Heart, Iron Galaxy designer. They’ve generously taken time out of their schedules to let me ask a few questions about rollback and generally discuss the landscape of netcode a bit. Thanks a lot for your time!
Keits: Thanks for having us! We’re big believers in rollback so we want to do what we can to help.
krazhier: Yeah, agreed. I could talk about this stuff all day if you let me.
Most people will know Keits from his work as combat designer on S2 and S3 of Killer Instinct, but others may not know that Ramón did hugely important work too, working on and improving KI’s implementation of rollback behind the scenes. But maybe it’s best to get a bit of history first. What was your first experience with rollback in online fighting games?
krazhier: As an online engineer on all of Iron Galaxy’s fighting games, I first experienced rollback in Third Strike Online Edition. I didn’t really know it was a thing until then. Coincidentally, that’s when I started playing fighting games in earnest. Before that, working on fighting games was just something you’d do as a job, or to button mash with friends. I have to give credit to my co-workers at IG, particularly Floe, for showing me how to like fighting games.
Keits: For me, it was back in the old-school days of the #capcom IRC channel. Tony Cannon, founder of GGPO, was a co-owner of the channel and he offered some people the chance to try out the closed beta, using Alpha 2 as the test game. The experience was so good. After years of tolerating bad online experiences in the early days of Xbox Live, I thought for the first time that the FGC was saved and an online future was possible.
You guys went from playing rollback games to designing and programming them. There are so many different questions I could ask. Let’s start with perhaps the most obvious one. How would you assess how difficult it is to include rollback in fighting games?
krazhier: It’s definitely more work than delay-based, because to implement rollback, you have to solve the delay-based part first. I can take any rollback game I’ve ever worked on and turn off the prediction frames and rollback algorithm, and I now have a delay-based game. It’s also quite a bit more work if you choose to do it the way Mortal Kombat X did, retrofitting an existing game with rollback. Those guys did a fantastic job, but I imagine it was a big investment for them. If you plan for it in advance, it’s more manageable, but it’s still more work than delay-based.
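The relationship krazhier describes between the two approaches can be sketched in a few lines of Python. This is only an illustration, not GGPO's API or Iron Galaxy's code: `RollbackSession`, `step`, and `max_prediction` are hypothetical names, and the sketch assumes a deterministic `step` function, remote inputs that arrive in frame order, and a neutral input value of 0.

```python
class RollbackSession:
    """Toy rollback loop. All names are illustrative, not a real library API."""

    def __init__(self, step, initial_state, max_prediction):
        self.step = step                   # deterministic: (state, local, remote) -> state
        self.states = {0: initial_state}   # saved state at the start of each frame
        self.local = {}                    # frame -> local input
        self.remote = {}                   # frame -> confirmed remote input
        self.used = {}                     # frame -> remote input actually simulated with
        self.frame = 0
        self.max_prediction = max_prediction
        self.rollbacks = 0

    def _remote_for(self, frame):
        # Use the confirmed input if we have it; otherwise predict by
        # repeating the most recent confirmed input (0 if none yet).
        if frame in self.remote:
            return self.remote[frame]
        earlier = [f for f in self.remote if f < frame]
        return self.remote[max(earlier)] if earlier else 0

    def _simulate(self):
        f = self.frame
        self.used[f] = self._remote_for(f)
        self.states[f + 1] = self.step(self.states[f], self.local[f], self.used[f])
        self.frame = f + 1

    def advance(self, local_input):
        # Stall if we would run too far ahead of the last confirmed remote input.
        last_confirmed = max(self.remote, default=-1)
        if self.frame - last_confirmed > self.max_prediction:
            return False
        self.local[self.frame] = local_input
        self._simulate()
        return True

    def add_remote(self, frame, value):
        self.remote[frame] = value
        # If that frame was already simulated with a wrong guess,
        # rewind to its saved state and re-simulate up to the present.
        if frame < self.frame and self.used[frame] != value:
            self.rollbacks += 1
            target = self.frame
            self.frame = frame
            while self.frame < target:
                self._simulate()
```

With `max_prediction=0` the session never guesses: `advance` returns `False` until the remote input for the current frame has arrived, which is delay-based lockstep. Any larger value lets the game run ahead on predicted input and correct itself later.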
Also, any time a new element is added to the game, it has the possibility of affecting your game state and making it so rollbacks could cause a desync. There’s no getting around that. When Keits comes to me and says "hey, I want these projectiles to be random and sometimes four of them come out and they loop around in odd patterns," these are now elements I have to make sure can be rolled back, including adding new elements to the code. I can train programmers to make sure their variables are registered with our rollback system, but it’s unavoidably more work. Even non-programmers like the QA team now need to incorporate rollbacks into their testing. For example, with Killer Instinct, our QA team tested with 20% packet loss and 150ms of latency always turned on. You should throw your router into the nearest garbage if you’re having that kind of packet loss, but we made sure KI could play acceptably well at that rate.
Rollbacks can also add some small handcuffs to the programming team and you have to be creative in how you work around them. For instance, let’s say your game has projectiles that can be fired and then disappear a short distance in front of you. If you need to roll back to a point when those projectiles existed, getting rid of them will add a huge performance debt. So instead, you should probably save the projectiles in a clever way and not actually destroy them until far enough in the future. Performance of these things (and many others) is a big deal. That thing you could have done in 16ms without even thinking twice now needs to fit into 1ms if you want to simulate it 10 times in a single frame.
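One way to picture the "don't actually destroy them yet" idea is a pool that only frees a projectile once it has been expired for longer than the game could ever roll back. The Python sketch below is an assumption-laden illustration, not how Killer Instinct does it: `ProjectilePool`, `MAX_ROLLBACK_FRAMES`, and the straight-line movement are all invented for the example.

```python
from dataclasses import dataclass

MAX_ROLLBACK_FRAMES = 8  # assumed size of the rollback window


@dataclass
class Projectile:
    spawn_frame: int
    spawn_x: int
    speed: int      # integer units per frame
    lifetime: int   # frames before the projectile disappears

    @property
    def end_frame(self):
        return self.spawn_frame + self.lifetime

    def is_active(self, frame):
        # Visibility is derived from the frame number, so rewinding the
        # frame counter brings an expired projectile back for free.
        return self.spawn_frame <= frame < self.end_frame

    def x_at(self, frame):
        return self.spawn_x + self.speed * (frame - self.spawn_frame)


class ProjectilePool:
    def __init__(self):
        self.items = []

    def spawn(self, frame, x, speed, lifetime):
        projectile = Projectile(frame, x, speed, lifetime)
        self.items.append(projectile)
        return projectile

    def active(self, frame):
        return [p for p in self.items if p.is_active(frame)]

    def rollback_to(self, frame):
        # Spawns at or after the rollback point will be redone by
        # re-simulation, so forget them here.
        self.items = [p for p in self.items if p.spawn_frame < frame]

    def collect_garbage(self, current_frame):
        # Only free projectiles that expired before any frame we could
        # still roll back to.
        oldest_reachable = current_frame - MAX_ROLLBACK_FRAMES
        self.items = [p for p in self.items if p.end_frame > oldest_reachable]
```

The trade-off is a little extra memory in exchange for never having to rebuild an object during a rollback.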
I’m not trying to make it sound like the end of the world, though. Despite it being more work, all these things have solutions and good rollback games do them. My opinion is that it’s a super worthwhile investment.
There are some games that let you choose the input delay before rollbacks kick in, while other games hide that option from the user and always choose the same delay for each player. What’s your take on these two options? Is one better than the other?
krazhier: Killer Instinct is one of those games that doesn’t let the user choose, it always adds 3 frames of delay. It gives us 45ms for the input to get to the other side. The reason this is nice is because it means pings can be 90ms before you start having any rollbacks occurring at all. And even if you have 1 or 2 frames into the rollback threshold, it’s pretty invisible to players. I personally can’t tell when 1 frame of rollback is happening. So that’s why I personally like the 3 frame delay approach.
In some of the early rollback games Iron Galaxy did, we let the users set the delay. And this would of course mean that people would automatically set the delay to 0, because 0 is a nice number and there’s no reason their supposedly good internet couldn’t handle it, right? But then that means every time they pressed a button, the other user would experience a rollback, because, at 0 frames with a proper timesync, when the other user gets your input, it will always be in the past. That said, I know people who preferred the Darkstalkers Resurrection version we did, which let the user set the input delay, because they always just played in lobbies with local friends to practice for tournaments. And that’s a fine preference to have. But as a whole, the studio has to think about the connection quality for the average player. Three frames of rollback every time you press a button can be a little challenging for the non-rollback aware consumer.
Keits: I agree that letting users set it manually is difficult, because they will always choose 0 and things will look worse, especially since most players do not understand what the setting really means. A delay of 3 frames gives you 90ms with basically nothing noticeable, which means for players in North America, most of the continent is a possible opponent. 150ms ping is then possible with just a couple frames of rollback occurring, which increases the range to a pretty large chunk of the world. That same 150ms ping connection would be hilariously unplayable in delay-based netcode.
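The numbers in the two answers above can be checked with some rough arithmetic. The Python sketch below rests on simplifying assumptions that are not from the interview: the game runs at 60 frames per second, one-way latency is exactly half the ping, and there is no jitter or packet loss. `rollback_frames` is a hypothetical helper name.

```python
def rollback_frames(ping_ms, input_delay_frames, fps=60):
    """Estimate how many frames must be rolled back per remote input.

    Assumes one-way latency is half the ping and ignores jitter and loss.
    """
    # One-way latency in frames, rounded up, using integer math:
    # ceil((ping_ms / 2) / (1000 / fps)) == ceil(ping_ms * fps / 2000)
    one_way_frames = (ping_ms * fps + 1999) // 2000
    return max(0, one_way_frames - input_delay_frames)


print(rollback_frames(90, 3))    # 0: a 90ms ping is fully hidden by 3 frames of delay
print(rollback_frames(150, 3))   # 2: "just a couple frames of rollback"
print(rollback_frames(90, 0))    # 3: zero delay means every remote input rolls back
```

Under these assumptions the figures line up with what both developers describe: three frames of delay absorb a 90ms ping, a 150ms ping costs about two frames of rollback, and a delay of 0 turns the same 90ms connection into three frames of rollback on every button press.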
If you’re going to not let the user choose the delay, would it ever be desirable or possible to get the game to detect the connection quality, then automatically lower the delay to 1 or 2 frames?
krazhier: Having something that decides 1 frame of delay at the beginning of the match means that, halfway through the match, you can all of a sudden be playing at a higher latency and the delay won’t adjust. Then you’d end up with significantly more rollbacks, and changing the input latency on the fly is what we’re trying to avoid from delay-based approaches. I also think it’s not entirely clear whether playing 10% of your matches at 2 frames of delay, but 90% of your matches at 3f of delay, is even that valuable. I think consistency across all games is possibly more important.
Keits: You can also do things to make the game quite a bit more playable even with 1 or 2 extra frames of delay, like increase the size of the buffers to make reversals easier and combos easier to link. I think the number of opponents you can play at 1-2 frames of delay and have it be steady is very tiny. Even people who live 30 minutes away from you often can’t maintain that good of a connection with you. Just think about how rarely the delay stays at a steady 1 frame when you play Guilty Gear online. (2022 Edit: This article was written before Guilty Gear Strive was released, and before Arc System Works retrofitted their older Guilty Gear titles with rollback netcode. - Infil) So I think the people who say "in delay-based games, I get to play at 1 frame of delay but I’ll never get below 3 in these rollback games" are being a bit dishonest with how things truly work in real conditions.
Seems like a good time to branch into how the designer can make smart design choices, knowing you are building for a rollback game. I’m sure you’ve thought about this quite a bit Keits.
Keits: Maybe the first point to talk about would be the startup of moves. Because rollbacks happen only around your opponent’s button presses, it means when a rollback occurs, it always cuts off some part of the startup. This means having moves with slower startup makes it easier to hide rollbacks. For example, 3 frames missing off the front of a 10 frame jab is way less noticeable than 3 frames missing off the front of a 3 frame jab. I think it’s a decent argument against having 3 frame normals in a modern fighting game, actually. Killer Instinct’s fastest normal hits on the 5th frame, and the speed of the game doesn’t suffer for it. The typical Tekken jab is 10 frame startup, which I think makes Tekken a great candidate for rollback.
This goes beyond just normals too. If you take a game like Marvel vs Capcom Infinite, which uses rollback, you’ve got characters like Chun-Li and Spider-Man using air dashes that travel extremely far forward during their first few frames, so when a rollback occurs, these characters just warp all over the place and the player’s frustration goes way up. You don’t have to sacrifice the speed of Marvel, either, you just have to consider how you animate these startup frames. What if the airdash had 2 or 3 frames of stationary or slower moving startup with the same total duration? Your game plays almost identically but the visual artifacts of rollback are way reduced.
Divekick is another example of this. We implemented GGPO very well in that game, but I didn’t know how to design the game with rollbacks in mind, so the startup of all jumps and all divekicks is instant. There’s no prejump or anything, so the teleporting can get a little wild in that game with the faster characters. If I had added just 2 or 3 frames to the startup of jumps and divekicks, the game would feel and play mostly the same but look vastly better online.
In general though, designers just need to think about how rollbacks will look in their game. One of the things people don’t like about rollback is when a move looks like it hit, but then rolls back into being blocked and you get yourself killed. Firstly, I should say that a game with extremely tight hit confirms like this would play extremely poorly in delay-based netcode, so the answer is not to avoid rollbacks. But you can design your game such that hit confirms right on the edge of offline human reaction into a -20 on block move aren’t required to play the game. This could manifest itself in all sorts of ways, like slightly bigger cancel windows, how safe certain special moves are, how much damage you can get off these confirms, and how good your defensive options are. If you design the game such that moments around rollback are pivotal every 5 seconds, you’re not playing smart with the system and you’re increasing player frustration.
Why do you think fighting games are always peer-to-peer and never a client-server architecture like so many other genres? Does rollback not play well with a server-based system? Should fighting games switch?
krazhier: Well first of all, there’s no reason a server can’t act as an intermediary just to send inputs to each player. I believe Brawlhalla uses such a system, and Street Fighter V will sometimes fall back on using a server between players if they can’t connect directly to each other. But I think the main two reasons are latency and cost. Direct peer-to-peer connections means your message gets to the other client directly rather than routing through a server, and you also don’t have to pay costs to keep a server up and running.
As for why we do lockstep as opposed to server-client topology, I think either is possible in theory. But it’s been my experience that lockstep works nicely for fighting games because the only variable is player input. So if you build a deterministic simulation, you can derive game state off input alone rather than needing to do a lot of work on client-side prediction the way that shooters do. Some older fighting games actually did host-client architecture instead of lockstep and those games had differences in online vs offline play.
Keits: The important thing about lockstep is that everybody is simulating the exact same thing. If one person’s PC runs the result of a hitbox collision slightly faster, or differently due to floating point error for example, then you aren’t lockstep and you’d need a server to track the "real" game state. In a server-client system, when you see my character, you are seeing a ghost of my character’s past, instead of waiting for inputs to be sure we are synced. I do something, then the server finds out and verifies, and then sends it to you and then you see it -- it’s not actually synced.
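A common way to check that both machines really are "simulating the exact same thing" is to hash the game state every frame and compare the hashes between peers. The Python sketch below is a toy under stated assumptions, not any shipped game's method: the state is just a frame counter and an x position, movement uses integers to sidestep the floating point differences Keits mentions, and `step`, `run`, and `first_desync` are hypothetical names.

```python
import struct
import zlib

SPEED = 3  # integer units per frame; integers behave identically on every machine


def step(state, direction):
    """Advance one frame. `direction` is -1, 0, or 1."""
    frame, x = state
    return (frame + 1, x + direction * SPEED)


def checksum(state):
    # Pack the state into a fixed byte layout, then hash it.
    return zlib.crc32(struct.pack("<ii", *state))


def run(inputs):
    """Simulate from a fixed start and return one checksum per frame."""
    state = (0, 0)
    sums = []
    for direction in inputs:
        state = step(state, direction)
        sums.append(checksum(state))
    return sums


def first_desync(sums_a, sums_b):
    """Return the first frame index where two peers disagree, or None."""
    for frame, (a, b) in enumerate(zip(sums_a, sums_b)):
        if a != b:
            return frame
    return None
```

Two peers fed the same inputs produce identical checksum lists. If one of them ever sees a different input, or computes a different result from the same input, `first_desync` points at the frame where the simulations split.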
In Overwatch for example, if I see you running towards a corner as I aim at you, I shoot you just before you make it around the corner. But actually on your side, you had already passed the corner. Now, should you take damage? My shot technically went to a place you were not, and you were behind a wall that should stop shots, but on my screen it was a clean hit. There are dozens of tricks games use to make that feel not as bad in shooters, but shooters don’t have the same type of hit reactions that fighting games do. If you take a bullet behind a corner for some small damage, you don’t lose agency or control over your character and it’s more tolerable. But if you take a Ryu heavy punch "behind a corner", you can clearly see it didn’t hit and then you are being comboed in a normally impossible situation. This would feel really frustrating to play. If you wanted to make server-client work for a fighting game, you’d have to do a lot of weird stuff to build the whole game around those limitations.
The prediction model in fighting games simply duplicates the last known frame until input is received. Should we try to improve our prediction model?
krazhier: I’ve thought about this a bunch. Fighting games in general are pretty great in that, more often than not, you are pressing (or not pressing) the same button you pressed the frame before. When you do a quarter circle forwards, you might press down for 2 frames, then down-right for 2 frames, then maybe right for around 4 frames. Now, there’s a world where we could maybe guess "hey, he’s going to do a quarter circle". But what about those times where we only sit on each input for 1 frame in a row? It just makes it so that predicting to this degree is not something that makes a ton of sense.
I imagine it’s considerably worse to predict he will throw a fireball, then have it rollback when your guess was wrong, instead of just waiting until the fireball is actually thrown and then rollback the startup by 1 or 2 frames?
krazhier: I would imagine you’d be wrong considerably more often than right if you went the approach of actually trying to guess the end result before it happens. And I think a solution that mispredicts with a rate of even 10%, and then has to rollback buttons that never happened and the player wasn’t even thinking of pressing, would feel so much worse than the current prediction model that it’s probably not worth it.
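The quarter circle example from earlier in this answer is easy to put numbers on. The Python sketch below counts how often "repeat the previous frame's input" guesses wrong. It is a simplification made up for illustration: it only predicts one frame ahead, and `count_mispredictions` and the input labels are hypothetical.

```python
def count_mispredictions(inputs):
    """Count frames where 'same as the previous frame' would be a wrong guess."""
    wrong = 0
    for previous, actual in zip(inputs, inputs[1:]):
        if previous != actual:
            wrong += 1
    return wrong


# The quarter circle from the interview: down x2, down-right x2, right x4.
slow_qcf = ["down"] * 2 + ["down-right"] * 2 + ["right"] * 4
# A very fast quarter circle that sits on each direction for a single frame.
fast_qcf = ["down", "down-right", "right"]

print(count_mispredictions(slow_qcf))  # 2 wrong guesses out of 7
print(count_mispredictions(fast_qcf))  # 2 wrong guesses out of 2
```

The simple model is only wrong at the moment an input changes. A predictor that tries to guess the whole motion in advance has to commit before those transitions happen, so any wrong guess produces a rollback for an input the player never made.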
How does rollback work with spectators or more than 2 players? Is the model robust to adding more players?
krazhier: Rollback can work fine with more than 2 players. Iron Galaxy actually shipped a port of a 4-player dungeon crawler named Dungeons & Dragons: Chronicles of Mystara that uses GGPO and it works fantastic. As you would expect, with more players you’ll have more opportunities to roll back and you’ll have to send more data across the network. Eventually you might hit a limit where there are too many players and the bandwidth hit is too big, but the limit is definitely not 2 players.
Keits: As for spectators, they don’t have to see the game truly "live", they can see it 10, 20, or 30 frames in the past and it’s still close enough to live. This means that their delay isn’t hyper sensitive and it doesn’t put the burden on the players to have to send their inputs not only to each other, but also to 8 spectators with the same type of urgency.
krazhier: You could, for example, bundle and then compress all the inputs for the last 30 frames and designate one player to send it to all the spectators. Because it only happens every 30 frames and the inputs can be compressed, the bandwidth hit isn’t that high. You could also designate a server to do this, if you’re willing to incur the cost. But in theory, spectators shouldn’t have to roll back. They are just running on a delay-based system with very large delay, since they should always have the input they need to run the simulation. And if a packet loss happens, the game doesn’t have to kill itself to get the inputs there immediately. Because you’re running 30 or more frames behind, you can just resend the packets 10 frames later or whatnot and it usually works out just fine.
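A rough Python sketch of the bundling idea follows. The packet layout is invented for this example and is not from any shipped game: each player's input is assumed to be a 16-bit button mask, and the bundle carries a starting frame number, a frame count, and then the inputs for both players, all compressed with zlib. `pack_bundle` and `unpack_bundle` are hypothetical names.

```python
import struct
import zlib


def pack_bundle(start_frame, frames):
    """Pack and compress a run of inputs for spectators.

    `frames` is a list of (p1_input, p2_input) pairs, each a 16-bit mask.
    """
    flat = [value for pair in frames for value in pair]
    # Header: 4-byte start frame, 2-byte frame count, then the inputs.
    raw = struct.pack(f"<IH{len(flat)}H", start_frame, len(frames), *flat)
    return zlib.compress(raw)


def unpack_bundle(blob):
    """Reverse pack_bundle; returns (start_frame, list of input pairs)."""
    raw = zlib.decompress(blob)
    start_frame, count = struct.unpack_from("<IH", raw, 0)
    flat = struct.unpack_from(f"<{count * 2}H", raw, 6)
    return start_frame, [(flat[i], flat[i + 1]) for i in range(0, len(flat), 2)]


# 30 frames where player 1 holds one button and player 2 is idle.
bundle = pack_bundle(600, [(0x0010, 0x0000)] * 30)
start, inputs = unpack_bundle(bundle)
```

Because players tend to hold the same input for several frames in a row, bundles like this are highly repetitive and compress well, which is part of why the bandwidth cost stays low.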
I think all of us could talk about these topics all day but perhaps it’s best to start summing up some final thoughts. Do you guys have any general thoughts on rollback and its use in games in general?
krazhier: I can tell you that if it wasn’t for GGPO, I would not be the proponent of rollback that I am today. It’s funny, because when I first saw rollbacks in action, my reaction was "duh, why isn’t everyone doing it??" And then when Iron Galaxy first did it on 3rd Strike, it was like "wow, this is just magic and it works". So I think GGPO is incredibly obvious in hindsight, and that’s maybe the coolest thing about it. I worked on lockstep games before coming to Iron Galaxy, like Blitz the League and NBA Ballers, and I would have never thought to do what GGPO does. That’s why I love GGPO and rollback solutions, not only as a gamer but also for what they’ve changed in my career.
We’ve been doing delay-based lockstep networking architecture for 25 years. Quake came out and we switched to a server-client architecture where the server tells everyone what they should be seeing and everyone interpolates it in order to make it look smooth. When this happened, lockstep networking kind of stopped evolving. It’s my opinion that rollback is the next step for lockstep networking and, just like every other discipline in video game development, it’s time for lockstep implementations to evolve. The time investment is just super worth it to me.
Keits: I don’t think there’s anything more valuable to online fighting games than good netcode, so to me it’s a no brainer that devs should prioritize rollback in their development. You need to change a bit about how your game is built, since you now need to manage your game state differently and be able to both load it and simulate it very fast, but the end result is being able to confidently tell your fans that they can enjoy your game coast to coast and even be able to play most matches overseas. It’s the confidence that says "you can run online tournaments and they won’t be a joke, the results will be valid."
Can rollback improve at all? What possible advancements in rollback should developers be experimenting with over the next decade?
krazhier: I think the main two ways to improve are implementing ways that can reduce the number of rollbacks needed, and improving the per-frame simulation time. There are still unexplored tricks for these two things. Exploring other genres besides fighting games is another possibility. It’s cool that games like For Honor, a third-person action game, have implemented rollbacks, but we can do more here. For example, I’d love to do a rollback sports game.
It would also be cool to see rollbacks go beyond the small scope and enter more of a hybrid approach where we can use frame-lock rollback simulation for "significant" actions (that is, actions that are close to you), and use more traditional server authority architecture for events further away from you. Can you imagine an MMO action game that used rollback deterministic solutions for close combat gameplay, but could use the advantage of having a server to do the heavy lifting for the rest of the world? These would be fun problems to explore.