Why does this threaded task take longer than if done sequentially?

Jjch · May 21, 2023

I wrote a simple test to see whether there is a difference between sequential and threaded performance. On my machine (14× 12th Gen Intel® Core™ i7-12700H), both versions take about the same amount of time, and the threaded one is usually a tad slower. Why is that?

EDIT: Apparently, running the project from the editor with the debug window open has a profound effect on multithreading. This is probably obvious to anyone with Godot experience, but I am new to it and it did not occur to me. So I replaced the print statements in the code below with label updates, so the result can be monitored without the debug window showing (i.e. go to the project list and select Run; as has been pointed out below, this still runs in debug mode, but without the overhead from the editor).

The timings became sensible: Processing 40M iterations sequentially took 430ms on my machine. When distributed among 4 threads, it took 147ms. This is what I would expect.

Thanks everyone for the input!

extends Node

var threads_count: int = 4
var threads_finished: int = 0
var threads = []
var mutex = Mutex.new()
var iters: int = 10_000_000
var start
var finished

func _ready():
	# Test sequential processing.
	start = Time.get_unix_time_from_system()
	for i in threads_count:
		for j in iters:
			j += 1
	finished = Time.get_unix_time_from_system()
	print("Sequential: ", finished - start)
	
	# Test parallel processing.
	start = Time.get_unix_time_from_system()
	for i in threads_count:
		threads.push_back(Thread.new())
		threads[-1].start(thread_func)

func thread_func():
	for i in iters:
		i += 1
	mutex.lock()
	threads_finished += 1
	if threads_finished == threads_count:
		call_deferred("done")
	mutex.unlock()

func done():
	for t in threads:
		t.wait_to_finish()
	threads.clear()
	finished = Time.get_unix_time_from_system()
	print("Parallel: ", finished - start)
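
The label-based variant mentioned in the EDIT is not shown in the thread. A minimal sketch of what it might look like, assuming a Label child node named ResultLabel and a helper called report (both names are hypothetical, not from the original project):

```gdscript
extends Node

# Assumption: the scene has a Label child named "ResultLabel" (hypothetical name).
@onready var result_label: Label = $ResultLabel

# Hypothetical helper: appends one result line to the label instead of printing.
func report(title: String, seconds: float) -> void:
	result_label.text += "%s: %.3f s\n" % [title, seconds]
```

With this in place, print("Sequential: ", finished - start) would become report("Sequential", finished - start), and likewise for the parallel result. Since done() is reached through call_deferred, it runs on the main thread, where updating the label is safe.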

xyz · May 21, 2023

jch You're testing with too little work (and too few threads), so the threads may not have been scheduled onto different CPU cores, and/or the overhead probably ate up most of the benchmarked time. If I run your example with 10 threads for 100 million iterations on a 6-core CPU, threaded execution takes about 12 seconds with a reported 100% CPU utilization, while non-threaded execution takes 20 seconds at 14% utilization.
Note that you can expect speedups only on multi-core CPUs. But in Godot (and real-time apps in general), threads are useful even when there are no speed gains, because you can run long, heavy work without blocking the main game loop.
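
As a rough illustration of the non-blocking use case, here is a minimal Godot 4 sketch that runs a long loop on a worker thread while the main loop keeps ticking. The function names and the iteration count are made up for the example:

```gdscript
extends Node

var worker: Thread

func _ready() -> void:
	worker = Thread.new()
	# Pass the workload size as a bound argument (50M is an arbitrary value).
	worker.start(_heavy_work.bind(50_000_000))

# Runs on the worker thread; must not touch the scene tree directly.
func _heavy_work(iterations: int) -> void:
	var total := 0
	for i in iterations:
		total += i
	# Hand the result back to the main thread.
	_on_work_done.call_deferred(total)

# Runs on the main thread because it was called deferred.
func _on_work_done(total: int) -> void:
	worker.wait_to_finish()
	print("Background work finished: ", total)

func _process(_delta: float) -> void:
	# Still called every frame while the worker is busy.
	pass

func _exit_tree() -> void:
	# Join the thread if the node is removed before the work completes.
	if worker != null and worker.is_started():
		worker.wait_to_finish()
```

The total runtime of the loop is not reduced here. The point is only that _process keeps being called while the loop runs.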

cybereality · May 21, 2023

Spawning threads is heavy, much heavier than just calling a function. So if you use them, the workload needs to be sufficient to offset the overhead. In addition, you never get linear scaling based on the number of processors. Your app is always limited by the part of code that needs to be serial (which is a lot in games or a game engine). See Amdahl's Law.

The maximum speedup with n processors is usually n (linear speedup). Amdahl's Law caps it further: if f is the fraction of the computation that must run serially, then as n → ∞ the maximum speedup approaches 1/f. Example: with only 5% of the computation being serial (f = 0.05), the maximum speedup is 20, irrespective of the number of processors.
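
For reference, this is the usual statement of Amdahl's Law behind those numbers, with f as the serial fraction of the work and n as the number of processors:

```latex
S(n) = \frac{1}{f + \frac{1 - f}{n}}, \qquad \lim_{n \to \infty} S(n) = \frac{1}{f}

% Ceiling for f = 0.05:
\frac{1}{0.05} = 20

% Same f with n = 4 processors:
S(4) = \frac{1}{0.05 + \frac{0.95}{4}} = \frac{1}{0.2875} \approx 3.48
```

For comparison, the timings reported in the original post (430 ms sequential, 147 ms on 4 threads) correspond to a speedup of about 2.9.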

Jjch · May 22, 2023

For more accurate timings (at least where multi-threading is involved), run the project from the project list, rather than from inside the editor (see the EDIT part of the original question).

cybereality · May 22, 2023

Right, the editor itself uses processing and can slow things down. However, even running it from the project list, it's still in debug mode. You have to export the project as an exe file, then close the editor completely before running it, to see full performance.
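
One way to check which kind of build a test is actually running in is to query the engine at startup. This is only a sketch: OS.is_debug_build() and the "editor" feature tag exist in Godot 4, but how the result gets displayed is up to you.

```gdscript
extends Node

func _ready() -> void:
	# True for the editor and for debug export templates, false for release exports.
	print("Debug build: ", OS.is_debug_build())
	# True when the project is run with an editor binary rather than an export template.
	print("Editor binary: ", OS.has_feature("editor"))
```

In an exported release build both lines should report false. Since console output may not be visible there, the values could be written to a label instead.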

xyz · May 22, 2023

cybereality It doesn't really matter if the point is to measure the relative difference in performance.

Jjch · May 22, 2023

cybereality Ok good point. I included a note to say that this way of launching the project is actually still debug mode, just without the editor overhead.

I searched but could not find a section in the docs that explains, in simple terms for newbies, the different ways a project can be launched (editor, no editor, export) and what kind of overhead each one carries. Is this described anywhere?