Run compute shader multiple times with different uniforms in one compute list

Good afternoon! I'm learning how to use compute shaders and I've hit a bit of an issue. I'm trying a simple test with two textures and one shader that writes the value of one texture, plus a constant, to the other. To run this repeatedly, I started by just swapping the RIDs of the uniforms. However, the documentation for RenderingDevice.compute_list_begin() has an example which seems to show binding two sets of uniforms, then running a shader twice. Presumably this would speed things up significantly by reducing how much the CPU has to talk to the GPU, but regrettably I'm not entirely confident about how GPU scheduling works. I've attempted to implement this with no luck: I tried setting barriers, setting the pipeline specialization constants, and so on, to no avail. Here's the code that I'm attempting to use:

func ReadyForDispatch(x: int, y: int = 1, z: int = 1) -> void:
	var specC1 := RDPipelineSpecializationConstant.new()
	specC1.set_constant_id(0)
	var specC2 := RDPipelineSpecializationConstant.new()
	specC2.set_constant_id(1)
	var compute_path := _rd.compute_pipeline_create(_shader, [specC1, specC2])
	var compute_list := _rd.compute_list_begin()
	_rd.compute_list_bind_compute_pipeline(compute_list, compute_path)
	for i in range(_uniform_sets.size()):
		_rd.compute_list_bind_uniform_set(compute_list, _uniform_sets[i], i)
	for i in range(_uniform_sets.size()):
		_rd.compute_list_dispatch(compute_list, x, y, z)
	_rd.compute_list_add_barrier(compute_list)
	var mask: RenderingDevice.BarrierMask
	mask ^= RenderingDevice.BARRIER_MASK_ALL_BARRIERS
	_rd.compute_list_end(mask)
	_rd.free_rid(compute_path)
	pass

And honestly I have no idea what could be wrong. What appears to be happening is that it only runs the shader once, ignoring the multiple calls to RenderingDevice.compute_list_dispatch(), which isn't the behavior I'd expect. Each uniform set contains 3 independent uniforms, 2 for the textures and 1 for the buffer; the only difference is that the RIDs assigned to the 2 texture uniforms are swapped in the second set. Here's the shader code, if it helps:

#[compute]
#version 450

layout (binding=0, rgba32f) uniform readonly image2D Input;
layout (binding=1, rgba32f) uniform writeonly image2D Output;

layout (binding=2) uniform Test {
  float value;
  float value2;
  float value3;
  float value4;
};

// Invocations in the (x, y, z) dimension
layout(local_size_x = 1, local_size_y = 1, local_size_z = 1) in;


// The code we want to execute in each invocation
void main() {
  // gl_GlobalInvocationID.x uniquely identifies this invocation across all work groups
  ivec2 index =  ivec2(gl_GlobalInvocationID.xy);
  vec2 uv = vec2(index) / vec2(gl_NumWorkGroups.xy * gl_WorkGroupSize.xy);
  vec4 col = imageLoad(Input, index);
  imageStore(Output, index, vec4(1.0));
}

Thanks a bunch!
(Let me know if you need any extra code, I don't know if this is everything that's needed to figure out what's going on)
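
For reference, the two uniform sets described above could be built roughly like this. This is only a sketch of that setup, not the code from the project: `make_uniform` and `build_ping_pong_sets` are hypothetical helper names, and `rd`, `shader`, `tex_a`, `tex_b`, and `buffer` are assumed to be a valid RenderingDevice and RIDs created elsewhere (with the textures created with storage usage so they can be bound as images).

```gdscript
# Hypothetical helper: wraps one RID in an RDUniform at the given binding.
func make_uniform(type: RenderingDevice.UniformType, binding: int, rid: RID) -> RDUniform:
	var u := RDUniform.new()
	u.uniform_type = type
	u.binding = binding
	u.add_id(rid)
	return u

# Hypothetical helper: builds two uniform sets that differ only in which
# texture is the input (binding 0) and which is the output (binding 1).
func build_ping_pong_sets(rd: RenderingDevice, shader: RID, tex_a: RID, tex_b: RID, buffer: RID) -> Array[RID]:
	var sets: Array[RID] = []
	for pair in [[tex_a, tex_b], [tex_b, tex_a]]:
		var uniforms: Array[RDUniform] = [
			make_uniform(RenderingDevice.UNIFORM_TYPE_IMAGE, 0, pair[0]),
			make_uniform(RenderingDevice.UNIFORM_TYPE_IMAGE, 1, pair[1]),
			make_uniform(RenderingDevice.UNIFORM_TYPE_UNIFORM_BUFFER, 2, buffer),
		]
		# The last argument is the shader set index. The shader above has no
		# `set =` qualifier on its bindings, so they all live in set 0.
		sets.append(rd.uniform_set_create(uniforms, shader, 0))
	return sets
```

Both sets are created for shader set 0 here, since that is the only set the shader declares.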

5 days later

Figured it out! The code that actually works is

func ReadyForDispatch(x: int, y: int = 1, z: int = 1) -> void:
	var compute_path := _rd.compute_pipeline_create(_shader)
	var compute_list := _rd.compute_list_begin()
	_rd.compute_list_bind_compute_pipeline(compute_list, compute_path)
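	for i in range(_uniform_sets.size()):
		# The shader declares every binding in set 0, so each uniform set is bound at index 0.
		_rd.compute_list_bind_uniform_set(compute_list, _uniform_sets[i], 0)
		# Dispatch right away, before the next iteration rebinds set 0.
		_rd.compute_list_dispatch(compute_list, x, y, z)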
	_rd.compute_list_add_barrier(compute_list)
	var mask: RenderingDevice.BarrierMask = RenderingDevice.BARRIER_MASK_ALL_BARRIERS
	_rd.compute_list_end(mask)
	_rd.free_rid(compute_path)
	pass

It looks like, as opposed to the example the docs give, the only way for this to work is to bind each uniform set with a set_index of 0 and then call compute_list_dispatch immediately after. I'm not sure if there's a better way to do this, or even if this is a good way to do it or faster than just making two compute lists, but oh well.
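
A likely explanation, as far as I can tell: the docs example binds two different uniform sets at indices 0 and 1 because its shader uses both sets in the same dispatch, and it deliberately skips the barrier so the dispatches can overlap. The shader here puts every binding in set 0, so anything bound at index 1 is never read. Since each pass here reads what the previous pass wrote, it may also be worth putting the barrier between the dispatches rather than after the loop. A minimal sketch of that variation, assuming the pipeline is created once elsewhere and passed in (`dispatch_chain` is a hypothetical name):

```gdscript
# Sketch only: records one dispatch per uniform set into a single compute list,
# with a barrier between passes so each pass sees the previous pass's writes.
func dispatch_chain(rd: RenderingDevice, pipeline: RID, uniform_sets: Array[RID], x: int, y: int = 1, z: int = 1) -> void:
	var compute_list := rd.compute_list_begin()
	rd.compute_list_bind_compute_pipeline(compute_list, pipeline)
	for i in range(uniform_sets.size()):
		if i > 0:
			# Later passes depend on earlier output, so wait for it to finish.
			rd.compute_list_add_barrier(compute_list)
		# Every set was created for shader set 0, so rebind index 0 each pass.
		rd.compute_list_bind_uniform_set(compute_list, uniform_sets[i], 0)
		rd.compute_list_dispatch(compute_list, x, y, z)
	rd.compute_list_end()
```

Whether the explicit barrier is needed may depend on the Godot version, so treat this as something to test rather than a confirmed fix.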