64Digits

Multithreading Problems in Game Design

Posted by blackhole on May 24, 2012, 1:53 a.m.

Originally posted on my blog

…

A couple of years ago, when I first started designing a game engine to unify Box2D and my graphics engine, I thought this was a superb opportunity to join all the cool kids and multithread it. I mean, all the other game developers were talking about having a thread for graphics, a thread for physics, a thread for audio, etc. etc. etc. So I spent a lot of time teaching myself various lockless threading techniques and building quite a few iterations of various multithreading structures. Almost all of them failed spectacularly for various reasons, and in the end they were all too complicated.

I eventually settled on a single worker thread that was sent off to start working on the physics at the beginning of a frame render. Then at the beginning of each subsequent frame I would check to see if the physics were done, and if so sync the physics and graphics and start up another physics iteration. It was a very clean solution, but fundamentally flawed. For one, it introduces an inescapable frame of input lag.

Single Thread Low Load
  FRAME 1   +----+
            |    |
. Input1 -] |    |
            |[__]| Physics
            |[__]| Render
. FRAME 2   +----+ INPUT 1 ON BACKBUFFER
. Input2 -] |    |
. Process -]|    |
            |[__]| Physics
. Input3 -] |[__]| Render
. FRAME 3   +----+ INPUT 2 ON BACKBUFFER, INPUT 1 VISIBLE
.           |    |
.           |    |
. Process -]|[__]| Physics
            |[__]| Render
  FRAME 4   +----+ INPUT 3 ON BACKBUFFER, INPUT 2 VISIBLE

Multi Thread Low Load
  FRAME 1   +----+
            |    |
            |    |
. Input1 -] |    |
.           |[__]| Render/Physics START
. FRAME 2   +----+
. Input2 -] |____| Physics END
.           |    |
.           |    |
. Input3 -] |[__]| Render/Physics START
. FRAME 3   +----+ INPUT 1 ON BACKBUFFER
.           |____|
.           |    | PHYSICS END
.           |    |
            |____| Render/Physics START
  FRAME 4   +----+ INPUT 2 ON BACKBUFFER, INPUT 1 VISIBLE

The multithreading, by definition, results in any given physics update only being reflected in the next rendered frame, because the entire point of multithreading is to immediately start rendering the current frame as soon as you start calculating physics. This causes a number of latency issues, but in addition it requires that one introduce a separate "physics update" function to be executed only during the physics/graphics sync. This is a massive architectural complication, especially when you try to put in scripting or let other languages use your engine.
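To make the sync point concrete, here is a minimal C++ sketch of the single-worker scheme described above. It is an illustration under assumptions, not the engine's actual code: `PhysicsState`, `step_physics`, and `Engine` are hypothetical stand-ins (nothing here is Box2D's API), and `std::async` is swapped in for a hand-managed worker thread. Note how a step launched in one frame can only become visible in a later one.

```cpp
#include <cassert>
#include <chrono>
#include <future>

// Hypothetical stand-in for the real physics world; not Box2D's API.
struct PhysicsState { int step = 0; };

// Pretend physics step: takes the previous state by value, returns the next one.
PhysicsState step_physics(PhysicsState s) { s.step += 1; return s; }

struct Engine {
    PhysicsState visible;                // state the renderer draws from
    std::future<PhysicsState> pending;   // physics step in flight, if any

    // Called at the beginning of every frame.
    void begin_frame() {
        using namespace std::chrono_literals;
        if (pending.valid()) {
            // Poll instead of blocking: if physics is not done, keep drawing the old state.
            if (pending.wait_for(0s) != std::future_status::ready) return;
            visible = pending.get();     // the physics/graphics sync point
        }
        // Kick off the next step on a worker thread and return immediately.
        pending = std::async(std::launch::async, step_physics, visible);
    }

    // Block until the step in flight finishes (for example at shutdown).
    void finish() {
        if (!pending.valid()) return;
        visible = pending.get();
        assert(!pending.valid());        // get() consumes the future
    }
};
```

Anything that wants to touch physics objects safely has to run inside that sync branch of `begin_frame`, which is exactly the extra "physics update" hook that complicates the architecture.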

There is another, more subtle problem with dedicated threads for graphics/physics/audio/AI/anything. It doesn't scale. Let's say you have a platformer - AI will be contained inside the game logic, and the absolute vast majority of your CPU time will either be spent in graphics or physics, or possibly both. That means your game effectively only has two threads that are doing any meaningful amount of work. Modern processors have 8 cores, and the best one currently available has 12. You're using two of them. On that 12-core chip, you introduced all this complexity and input lag just so you could use 2 of 12 cores (16.7% of the processor) instead of 1 of 12 (8.3%).

Instead of trying to create a thread for each individual component, you need to go deeper. You have to parallelize each individual component separately, then tie them together in a single-threaded design. This has the added bonus of being vastly more friendly to single-core CPUs that can't run threads in parallel (like certain phones), because the parallelization goes on at a lower level and is invisible to the overall architecture of the library. So instead of having a graphics thread and a physics thread, you simply call the physics update, then call the graphics update, and inside each physics and graphics update you spawn enough worker threads to match the number of cores you have to work with and concurrently process as much stuff as possible. This eliminates the latency problems and the complicated library design, and it keeps scaling as core counts grow. Even if your initial implementation of concurrency won't handle 32 cores, because the concurrency is encapsulated inside the engine, you can just go back and change it later without ever having to modify any programs that use the graphics engine.
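A minimal C++ sketch of that idea follows, assuming the per-body work is independent. `Body`, `parallel_for`, and `physics_update` are hypothetical names made up for illustration. The caller sees an ordinary blocking function; the threads exist only inside it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical body type; a real engine stores far more per body.
struct Body { float x = 0.0f; float vx = 1.0f; };

// Runs fn(i) for every i in [0, count), split into contiguous chunks, one thread per chunk.
// Returns only after every chunk has finished, so callers see a plain blocking call.
template <typename Fn>
void parallel_for(std::size_t count, unsigned workers, Fn fn) {
    assert(workers > 0);
    std::size_t chunk = (count + workers - 1) / workers;   // ceiling division
    std::vector<std::thread> threads;
    for (std::size_t begin = 0; begin < count; begin += chunk) {
        std::size_t end = std::min(count, begin + chunk);
        threads.emplace_back([=] { for (std::size_t i = begin; i < end; ++i) fn(i); });
    }
    for (auto& t : threads) t.join();
}

// Single-threaded from the outside; the parallelism is an implementation detail.
void physics_update(std::vector<Body>& bodies, float dt) {
    // hardware_concurrency() may return 0 when the count is unknown, so clamp to 1.
    unsigned cores = std::max(1u, std::thread::hardware_concurrency());
    parallel_for(bodies.size(), cores, [&](std::size_t i) {
        bodies[i].x += bodies[i].vx * dt;   // each index is touched by exactly one thread
    });
}
```

The game loop then just calls `physics_update(bodies, dt)` followed by the graphics update, in order, every frame. This sketch starts fresh threads on each call to stay short; a real engine would reuse a pool of workers.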

Consequently, don't try to multithread your games. It isn't worth it. Separately parallelize each individual component instead and write your game engine single-threaded; only use additional threads for asynchronous activities like resource loading.
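For the asynchronous case, a sketch along these lines may help, assuming a loader that can run entirely off the main thread. `load_texture` and `TextureSlot` are hypothetical; a real loader would read and decode a file instead of faking pixel data.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Hypothetical loader: a real one would read and decode a file from disk.
std::vector<unsigned char> load_texture(const std::string& path) {
    return std::vector<unsigned char>(path.size() * 4, 0xFF);   // fake pixel data
}

struct TextureSlot {
    std::vector<unsigned char> pixels;                // empty until the load lands
    std::future<std::vector<unsigned char>> loading;  // load in flight, if any

    // Start loading on another thread; returns immediately.
    void request(std::string path) {
        assert(!loading.valid());                     // one load at a time in this sketch
        loading = std::async(std::launch::async, load_texture, std::move(path));
    }

    // Called once per frame from the main thread; never blocks.
    bool poll() {
        using namespace std::chrono_literals;
        if (loading.valid() && loading.wait_for(0s) == std::future_status::ready)
            pixels = loading.get();
        return !pixels.empty();
    }
};
```

The main loop keeps rendering (perhaps with a placeholder) until `poll()` returns true, so a slow load never stalls a frame.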

Comments

LAR Games 11 years, 11 months ago

This was an interesting read. Thanks.

Mordi 11 years, 11 months ago

Nice find. If I understand this correctly, this means that all threads are started and finished each frame, right?

JuurianChi 11 years, 11 months ago

LOL, "cool kids".

Multithreading might not work for actual gameplay, but what about ingame engine cutscenes? :P

svf 11 years, 11 months ago

You most certainly make the best posts.

I think the majority of people here (who use gamemaker) don't care/know about threads, though. :(

@JuurianChi, and loading resources!

blackhole 11 years, 11 months ago

There's actually a lot of non-game-logic related things that can be multithreaded quite easily (like resource loading and certain other deferred tasks that aren't realtime), but the underlying idea is what Mordi said - all concurrent processes are evaluated and completed each frame, one after another. Obviously in a realistic scenario you would work from a pool of worker threads to avoid the overhead of actually starting a thread.
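As a rough sketch of what such a pool could look like in C++ (the `WorkerPool` class and its method names are hypothetical, and a production pool would need more care around exceptions and task granularity): the threads are created once, and each frame submits its tasks and then waits for all of them before moving on.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal pool: threads are created once and reused every frame.
class WorkerPool {
public:
    explicit WorkerPool(unsigned count) {
        assert(count > 0);
        for (unsigned i = 0; i < count; ++i)
            workers_.emplace_back([this] { run(); });
    }
    ~WorkerPool() {
        { std::lock_guard<std::mutex> lock(m_); stopping_ = true; }
        wake_.notify_all();
        for (auto& w : workers_) w.join();
    }
    // Queue one task for any free worker.
    void submit(std::function<void()> task) {
        { std::lock_guard<std::mutex> lock(m_); tasks_.push(std::move(task)); ++unfinished_; }
        wake_.notify_one();
    }
    // Block until every submitted task has completed: the per-frame join point.
    void wait_idle() {
        std::unique_lock<std::mutex> lock(m_);
        idle_.wait(lock, [this] { return unfinished_ == 0; });
    }
private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_);
                wake_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;           // stopping and nothing left to do
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();                                   // run outside the lock
            std::lock_guard<std::mutex> lock(m_);
            if (--unfinished_ == 0) idle_.notify_all();
        }
    }
    std::mutex m_;
    std::condition_variable wake_, idle_;
    std::queue<std::function<void()>> tasks_;
    std::size_t unfinished_ = 0;
    bool stopping_ = false;
    std::vector<std::thread> workers_;                // declared last: starts after the rest exists
};
```

A frame would then look like: submit the physics tasks, `wait_idle()`, submit the graphics tasks, `wait_idle()`. Nothing is left running across the frame boundary.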

Mordi 11 years, 11 months ago

Nice. I can see why this is effective, and yet kept relatively simple, although I would imagine that a more complicated approach would be even more efficient. Depends what you need, I guess.

blackhole 11 years, 11 months ago

Well I've been repeatedly told off by people who insist that you should use encapsulated future objects in a job queue to enforce pseudo-concurrency, but I don't think they understand the absurd number of interdependencies going on in a game, which makes it almost completely impractical.

Fuck you chrome interdependencies is a word.

ludamad 11 years, 11 months ago

+1 on using threads for asynchronous actions.

1013Games 11 years, 11 months ago

The problem with the "deeper" solution is that it takes a significant amount of time to create a thread, and any performance gained by using worker threads would be lost. Better to create the threads at the beginning of the program. However, such a solution is only really appropriate for a game somewhere in the order of magnitude of RAGE (EDIT: or mobile gaming with multi-cores). For multithreading a game you will have to incorporate blocking, something like this, assuming physics is directly rendered to screen:

Multi Thread Medium Load
  FRAME 1   +----+
. Input1 -] |____|
.           |    | Physics STARTS
.           |    | Physics ENDS
.           |[__]| Render STARTS AND COMPLETES. Physics STARTS
.           |____|
. FRAME 2   +----+
. Input2 -] |____|
.           |    | Render and game loop enters WAIT (Physics Blocking)
.           |    | Physics ENDS.
.           |[__]| Render START AND COMPLETES BLOCKED. Physics STARTS

The benefit would come by having a second core running physics while other game logic code runs asynchronously.

blackhole 11 years, 11 months ago

Quote: blackhole

Obviously in a realistic scenario you would work from a pool of worker threads to avoid the overhead of actually starting a thread.