How to wait for multiple threads

Jjch · May 16, 2023

I have a bunch of independent tasks to process in separate threads. There is no data shared between threads, and each thread stashes its result in a dedicated variable. Once all threads are done, I would like to use the results to update the game state in a sequential manner, one by one.

Am I on the right track here?

extends Node

@onready var thread0: Thread = Thread.new()
@onready var thread1: Thread = Thread.new()

func _ready():
	var result0 = thread0.start(Callable(self, "_do_something").bind(some_data))
	var result1 = thread1.start(Callable(self, "_do_something_else").bind(more_data))
	
	# Wait for all threads to complete. All threads should be done 
	# computing before proceeding any further.
	while (thread0.is_alive() or thread1.is_alive()):
		pass
	
	# Do something with result0 and result1.
	update_state_one(result0)
	update_state_two(result1)

func _do_something(data):
	# Process data.
	call_deferred("_done_something", result)

func _do_something_else(data):
	# Process data.
	call_deferred("_done_something", result)

func _done_something(result):
	thread.wait_to_finish()
	return result

xyz · May 16, 2023

jch If you busy-wait in a while loop in _ready(), execution will freeze there and nothing else in the main thread will run until the loop exits. It's almost the same as not using threads at all: what you're doing is nearly equivalent to calling those two _do_something functions directly from _ready(). The one difference is that the threads themselves run concurrently, so the main thread is blocked only for as long as the slowest thread needs to finish its work.

After starting the threads, you want execution of the main thread to resume normally.

xyz · May 17, 2023

Here's an example. It starts 100 threads, each running for 1-5 seconds. When a thread finishes, it increments the counter. Once the counter reaches the total number of threads, we know we're done, and the last thread to finish schedules done() via call_deferred(). done() disposes of all thread objects and can be used to consolidate the generated data. Note that while all this is happening, the main thread is never blocked. For all nodes in the scene tree, the _process() callback runs normally each frame.

extends Node

var threads_finished = 0
var threads = []
var mutex = Mutex.new()

func _ready():
	for i in 100:
		threads.push_back(Thread.new())
		threads[-1].start(thread_func.bind(randf_range(1.0, 5.0)))
		
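func thread_func(workload):
	# Simulate blocking work. OS.delay_msec() blocks only the calling thread,
	# whereas an await here would return from the thread function right away.
	OS.delay_msec(int(workload * 1000.0))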
	mutex.lock()
	threads_finished += 1
	print("Thread ", threads_finished, " done")
	if threads_finished == threads.size():
		call_deferred("done")
	mutex.unlock()

func done():
	print("All finished, destroying thread objects")
	for t in threads:
		t.wait_to_finish()
	threads.clear()
	print("Using thread generated data")

DaveTheCoder · May 17, 2023

@"xyz"

Are you a core developer posting incognito?

Megalomaniak · May 17, 2023

DaveTheCoder Are you a core developer posting incognito?

Pretty sure this is fairly generic knowledge that any experienced developer should have, but who knows. Maybe?

Jjch · May 17, 2023

xyz Thanks! Let's see if I understand:

1. In each thread, call_deferred() is only called if the threads_finished counter is at max, which means done() is only called when all threads are done.
2. Whatever code I want to run on these combined results goes into done(). And I don't have to call done() anywhere in the main thread because it is already called inside ~~each~~ the thread that finishes last, via call_deferred().
3. Why exactly do I need the mutex there? No actual workload data is shared between the threads. Is it for the purpose of updating the counter? But then the call_deferred() is inside it as well. ~~This part of the code inside the mutex is confusing me. Could you go over it step by step please?~~

EDIT: Hm, actually done() would be called exactly once when whichever happens to be the last thread finishes. So I think I understand this now, but still, if you can sanity-check my rambling, would be much appreciated!

xyz · May 17, 2023

DaveTheCoder Are you a core developer posting incognito?

Lol, no, nope. I'd gladly disclose if that was the case. And while we're at it, I don't have any ambition to contribute. As @Megalomaniak pointed out, it's generic thread code, not even Godot specific for the most part. But I'm not gonna go on the record saying it's simple, as people these days tend to get offended by such statements. Paradoxically it also happens when you say something is not simple. So... this is some common thread stuff that just is

xyz · May 17, 2023

jch
I think you get it.

1. Yes. As you already concluded, done() is called only once, when the last thread is about to finish. It is a deferred call (instead of a normal call) so that the thread function can actually exit first. Calling done() could alternatively happen from outside the threads, but then you'd need to check the value of threads_finished every frame in _process() and use some flag to disable the check after executing done() once. Doing it from the thread function is a bit more elegant because it works like a callback rather than polling.
2. Yes (with "each" crossed out).
3. It doesn't need to be workload data. Any data that multiple threads could access at the same time should be protected by a mutex. In the example, the threads_finished counter and the threads array are both accessed by multiple threads, so technically they are shared data. But if each thread works only on its own dedicated data, that part of the code doesn't need a mutex lock.
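
For comparison, here is a rough sketch of the polling alternative mentioned in the first point. It is only an illustration, not something from the example above: the all_done flag and the count of 4 threads are made up for the sketch, and OS.delay_msec() stands in for real work.

```gdscript
extends Node

var threads_finished := 0
var threads: Array[Thread] = []
var mutex := Mutex.new()
var all_done := false  # Hypothetical flag so done() runs only once.

func _ready() -> void:
	for _i in 4:
		var t := Thread.new()
		threads.push_back(t)
		t.start(thread_func.bind(randf_range(1.0, 5.0)))

func thread_func(workload: float) -> void:
	OS.delay_msec(int(workload * 1000.0))  # Stand-in for real work.
	# The thread only bumps the counter; it never calls done() itself.
	mutex.lock()
	threads_finished += 1
	mutex.unlock()

func _process(_delta: float) -> void:
	if all_done:
		return
	# Read the shared counter under the same mutex the threads use.
	mutex.lock()
	var finished := threads_finished
	mutex.unlock()
	if finished == threads.size():
		all_done = true  # Disable the check after the first hit.
		done()

func done() -> void:
	for t in threads:
		t.wait_to_finish()
	threads.clear()
	print("Using thread generated data")
```

It works, but the check runs every frame until the threads finish, which is why the callback-style version reads a bit cleaner.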

The gist is: you don't want to explicitly wait for threads in your main code. It defeats the purpose of using threads in the first place. Let them do their job while your main thread is free to play funny animations. The threads themselves should take care of notifying the main code when they're finished doing their business.
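
Applied to the original two-task question, that could look roughly like the sketch below. Treat it as one possible arrangement under a few assumptions: the input arrays, the pending counter, and the _run_task and _all_done helpers are invented for illustration, and the two task bodies are placeholders. It relies on Thread.wait_to_finish() returning whatever the thread's Callable returned, so each result stays private to its thread until the end.

```gdscript
extends Node

var threads: Array[Thread] = []
var pending := 0  # Tasks still running; guarded by mutex.
var mutex := Mutex.new()

func _ready() -> void:
	# Hypothetical inputs standing in for some_data and more_data.
	var tasks: Array[Callable] = [
		_do_something.bind([1, 2, 3]),
		_do_something_else.bind([4, 5, 6]),
	]
	# Set the count before any thread starts, so a fast task
	# cannot reach zero while others are still being launched.
	pending = tasks.size()
	for task in tasks:
		var t := Thread.new()
		threads.push_back(t)
		t.start(_run_task.bind(task))
	# _ready() returns here; the main thread keeps running.

func _run_task(task: Callable):
	var result = task.call()
	mutex.lock()
	pending -= 1
	if pending == 0:
		# Deferred, so it runs later on the main thread.
		_all_done.call_deferred()
	mutex.unlock()
	return result

func _all_done() -> void:
	# wait_to_finish() joins each thread and hands back its return value.
	var result0 = threads[0].wait_to_finish()
	var result1 = threads[1].wait_to_finish()
	threads.clear()
	# Sequential game state updates would go here,
	# e.g. update_state_one(result0) and update_state_two(result1).
	print(result0, " ", result1)

func _do_something(data: Array) -> int:
	return data.size()  # Placeholder for real processing.

func _do_something_else(data: Array) -> int:
	return data.size() * 2  # Placeholder for real processing.
```

The only shared state is the pending counter, so that is the only thing behind the mutex.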