Is there a way to read buffer faster?

kuligs2 · 1 Apr

Hello, I'm pretty new around here. I'm writing a byte parser, so I have to read files byte by byte. A while loop with file.get_buffer(<size>) works, but it's really slow.

For instance, it takes around a second to parse a 1MB file.

An 18MB file takes 18s.

What I did was:

func read_wav(wav_file_path:String):
	var start_time = Time.get_unix_time_from_system()
	var sfx_file = FileAccess.open(wav_file_path,FileAccess.READ)
	print("---START---")
	print("file_start: ",wav_file_path)
	while sfx_file.get_position() < sfx_file.get_length():
		# Read data
		var id_tag = sfx_file.get_buffer(4).get_string_from_ascii()
		match id_tag:
			"RIFF":
				var r_buff = sfx_file.get_buffer(8)
				var r_size = r_buff.slice(0,4).to_int32_array()
				var r_format = r_buff.slice(4,8).get_string_from_ascii()
				print("---")
				print(id_tag)
				print("size: ",r_size)
				print("format: ",r_format)

Then I changed it to:

func read_wav_new(wav_file_path:String):
	var start_time = Time.get_unix_time_from_system()
	var sfx_file = FileAccess.open(wav_file_path,FileAccess.READ)
	print("---START NEW---")
	print("file_start: ",wav_file_path)
	var buffer = sfx_file.get_buffer(sfx_file.get_length())
	
	for i in range(0,buffer.size()-1,4):
		# Read data
		if i+4 > buffer.size()-1: 
			break
		var id_tag = buffer.slice(i,i+4).get_string_from_ascii()

		if id_tag.contains("LI"):
			pass
		match id_tag:
			"RIFF":
				var r_buff = buffer.slice(i+4,i+4+8)
				var r_size = r_buff.slice(0,4).to_int32_array()
				var r_format = r_buff.slice(4,8).get_string_from_ascii()
				print("---")
				print(id_tag)
				print("size: ",r_size)
				print("format: ",r_format)

And the new method on the 18MB file took only 3s.

So maybe there is a better way to parse bytes using GDScript?

DaveTheCoder · 1 Apr

File I/O is slow.

Using a large buffer size is the conventional way of optimizing it.

Unless the FileAccess class or the platform does its own read-ahead or write-deferred buffering, you're stuck with doing it manually as in your read_wav_new function.
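A minimal sketch of the large-buffer approach, assuming Godot 4, where the static FileAccess.get_file_as_bytes() loads a whole file in one call. The helper name load_all_bytes is hypothetical and only for illustration.

```gdscript
# Hypothetical helper: load the whole file in one call instead of many small reads.
func load_all_bytes(path: String) -> PackedByteArray:
	var bytes: PackedByteArray = FileAccess.get_file_as_bytes(path)
	if bytes.is_empty():
		# An empty result can mean an empty file or a failed open, so check which.
		var err: Error = FileAccess.get_open_error()
		if err != OK:
			push_error("Could not read %s: %s" % [path, error_string(err)])
	return bytes
```

All parsing then works on the returned PackedByteArray, so the file is touched only once.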

kuligs2 · 1 Apr

DaveTheCoder I think I know the problem: I forgot that I'm running this in the process thread. I will try moving it into a separate thread so the while loop can parse faster.

Also, when I tried to read the file into an array of bytes and then parse the array, it didn't behave like the get_buffer method. I don't know why. I triple-checked the code, and it executes differently, as if it were reading the wrong bytes.

Maybe there is a limit on how much you can fit into an array, and 18MB of bytes is just too many?

DaveTheCoder · 1 Apr

Yes, putting a time-consuming operation in _process is bad.

Parsing bytes in a packed array should work. 18MB shouldn't be a problem. Something else is wrong.

This looks suspicious:
var r_size = r_buff.slice(0,4).to_int32_array()
If that's intended to get an int value, shouldn't it be:
var r_size = r_buff.decode_u32(0)

Depending on how the file was created, you may need to rearrange the byte order when converting to an int.
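The difference between the two calls, and the byte-order point, can be sketched as follows. This assumes Godot 4; the helper names are hypothetical. decode_u32() reads little-endian, which is what RIFF uses, so the big-endian helper is only relevant for formats that store the most significant byte first.

```gdscript
# decode_u32() returns a plain int, reading four bytes lowest byte first.
func size_as_int(raw: PackedByteArray) -> int:
	return raw.decode_u32(0)

# to_int32_array() reinterprets the bytes as a PackedInt32Array, not an int.
func size_as_array(raw: PackedByteArray) -> PackedInt32Array:
	return raw.to_int32_array()

# Hypothetical helper for a big-endian field: highest byte first.
func decode_u32_be(raw: PackedByteArray, at: int) -> int:
	return (raw[at] << 24) | (raw[at + 1] << 16) | (raw[at + 2] << 8) | raw[at + 3]
```

For the bytes 12 00 00 00 (hex), size_as_int() returns 18, while size_as_array() returns a one-element array, which is why a value printed from it shows up in brackets.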

I strongly recommend using static typing and declaring the type of each variable. That will often tell you about problems.

kuligs2 · 2 Apr

DaveTheCoder the thread only saved me half of the time. It still takes 9.7s to process the 18MB file.

As for the decode, I'm not sure what type it is. to_int32_array worked, so I used it. What's the difference?

Tested with decode, the time didn't improve.
Added types to the vars, that didn't improve anything either.

Any other suggestions?

func read_wav_decode(wav_file_path:String):
	var start_time = Time.get_unix_time_from_system()
	var sfx_file = FileAccess.open(wav_file_path,FileAccess.READ)
	print("---START---")
	print("file_start: ",wav_file_path)
	while sfx_file.get_position() < sfx_file.get_length():
		# Read data
		var id_tag:String = sfx_file.get_buffer(4).get_string_from_ascii()
		match id_tag:
			"RIFF":
				var r_buff:PackedByteArray = sfx_file.get_buffer(8)
				var r_size:int = r_buff.decode_u32(0)
				var r_format:String = r_buff.slice(4,8).get_string_from_ascii()
				print("---")
				print(id_tag)
				print("size: ",r_size)
				print("format: ",r_format)
			"fmt ":
				var f_buff:PackedByteArray = sfx_file.get_buffer(20)
				var f_size:int = f_buff.decode_u32(0)
				var f_format:int = f_buff.slice(4,6)[0]
				var f_num_channels:int = f_buff.slice(6,8)[0]
				var f_sample_rate:int = f_buff.slice(8,12).decode_u32(0)
				var f_byte_rate:int = f_buff.slice(12,16).decode_u32(0)
				var f_block_align:int = f_buff.slice(16,18)[0]
				var f_bits_per_sample:int = f_buff.slice(18,20)[0]
				print("---")
				print(id_tag)
				print("f_size: ",f_size)
				print("f_format: ",f_format)
				print("f_num_channels: ",f_num_channels)
				print("f_sample_rate: ",f_sample_rate)
				print("f_byte_rate: ",f_byte_rate)
				print("f_block_align: ",f_block_align)
				print("f_bits_per_sample: ",f_bits_per_sample)
			"data":
				var d_buff:PackedByteArray = sfx_file.get_buffer(4)
				var d_size:int = d_buff.decode_u32(0)
				print("---")
				print(id_tag)
				print("d_size: ",d_size)
			"bext":
				var b_buff:PackedByteArray = sfx_file.get_buffer(4)
				var b_size:int = b_buff.decode_u32(0)
				var b_data:String = sfx_file.get_buffer(4+b_size).get_string_from_ascii()
				print("---")
				print(id_tag)
				print("b_size: ",b_size)
				print("b_data: ",b_data)
			"iXML":
				var i_buff:PackedByteArray = sfx_file.get_buffer(4)
				var i_size:int = i_buff.decode_u32(0)
				var i_data:String = sfx_file.get_buffer(4+i_size).get_string_from_ascii()
				print("---")
				print(id_tag)
				print("i_size: ",i_size)
				print("i_data: ",i_data)
			"LIST":
				var l_buff:PackedByteArray = sfx_file.get_buffer(8)
				var l_size:int = l_buff.decode_u32(0)
				var l_type:String = l_buff.slice(4,8).get_string_from_ascii()
				var l_data_buff:PackedByteArray = sfx_file.get_buffer(l_size)
				print("---")
				print(id_tag)
				print("l_size: ",l_size)
				print("l_type: ",l_type)
				
				match l_type:
					"INFO":
						var inf_type:String = l_data_buff.slice(0,4).get_string_from_ascii()
						var inf_size:int = l_data_buff.slice(4,8).decode_u32(0)
						var inf_data:String = l_data_buff.slice(8,l_data_buff.size()).get_string_from_ascii()
						print("---SUB---")
						print("inf_type: ",inf_type)
						print("inf_size: ",inf_size)
						print("inf_data: ",inf_data)
	var end_time = Time.get_unix_time_from_system()
	var time_diff = end_time - start_time
	
	print("---END---")
	print("---Elapsed time---: ",time_diff,"s")

Full code with decode and types

TEST

Test with my old method without types and decode

---START---
file_start: res://Test_files/sfx/01 - Title Theme (Main Menu).wav
---
fmt 
f_size: [18]
f_format: [1, 0]
f_num_channels: [2, 0]
f_sample_rate: [44100]
f_byte_rate: [176400]
f_block_align: [4, 0]
f_bits_per_sample: [16, 0]
---
bext
b_size: [602]
b_data: 
---
LIST
l_size: [66]
l_type: INFO
---SUB---
inf_type: IART
inf_size: [11]
inf_data: David Wise
---END---
---Elapsed time---: 9.79999995231628s

The new decode method:

---START---
file_start: res://Test_files/sfx/01 - Title Theme (Main Menu).wav
---
fmt 
f_size: 18
f_format: 1
f_num_channels: 2
f_sample_rate: 44100
f_byte_rate: 176400
f_block_align: 4
f_bits_per_sample: 16
---
bext
b_size: 602
b_data: 
---
LIST
l_size: 66
l_type: INFO
---SUB---
inf_type: IART
inf_size: 11
inf_data: David Wise
---END---
---Elapsed time---: 9.76999998092651s

I run this from:

func thread_func():
	var t_wav = "res://Test_files/sfx/01 - Title Theme (Main Menu).wav"
	read_wav(t_wav)
	read_wav_decode(t_wav)

That gets started with button press:

	thread.start(thread_func)
	thread.wait_to_finish()

RIFF spec for reference
http://soundfile.sapp.org/doc/WaveFormat/

DaveTheCoder · 2 Apr

kuligs2 As for the decode, I'm not sure what type it is. to_int32_array worked, so I used it. What's the difference?

to_int32_array returns a PackedInt32Array, not an int.

kuligs2 Added types to the vars, that didn't improve anything either.

I didn't expect that to change the speed. It makes the code more reliable by warning you about problems.

DaveTheCoder · 2 Apr

I thought of something.

Measure how long it takes to read the file into memory, without doing any parsing or processing of the data.

Try both methods (reading a few bytes at a time, and using a large buffer) that you used above.

If it's the parsing, rather than the file I/O, that's consuming the time, then a significant speed-up could be achieved by writing a GDExtension in C or C++, or using .NET Godot with C#.
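The measurement suggested above could look something like this. It is a sketch assuming Godot 4; both helper names are hypothetical, and Time.get_ticks_usec() is used here only because it gives microsecond resolution.

```gdscript
# Hypothetical helper: time one big read, with no parsing at all.
func time_whole_read(path: String) -> int:
	var t0: int = Time.get_ticks_usec()
	var bytes: PackedByteArray = FileAccess.get_file_as_bytes(path)
	var elapsed: int = Time.get_ticks_usec() - t0
	print("whole file: ", bytes.size(), " bytes in ", elapsed, " usec")
	return elapsed

# Hypothetical helper: time many small reads of `step` bytes, again with no parsing.
func time_small_reads(path: String, step: int = 4) -> int:
	var f: FileAccess = FileAccess.open(path, FileAccess.READ)
	if f == null:
		return -1  # The file could not be opened.
	var length: int = f.get_length()
	var t0: int = Time.get_ticks_usec()
	while f.get_position() < length:
		f.get_buffer(step)
	var elapsed: int = Time.get_ticks_usec() - t0
	print("small reads: ", elapsed, " usec")
	return elapsed
```

If both numbers are small compared with the full parse time, the cost is in the parsing loop rather than in file I/O.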

By the way, what's the purpose of doing this? Maybe there's a better approach.

kuligs2 · 3 Apr

DaveTheCoder By the way, what's the purpose of doing this? Maybe there's a better approach.

An audio SFX browser app that searches via metadata, so I need to parse audio files for their metadata. Godot has a few PRs that deal with MP3 and Ogg, but they skipped WAV. So I'm trying to make a proof of concept, clean it up, and maybe suggest that they include it in the engine, or maybe not.

kuligs2 · 3 Apr

DaveTheCoder If it's the parsing, rather than the file I/O, that's consuming the time, then a significant speed-up could be achieved by writing a GDExtension in C or C++, or using .NET Godot with C#.

It's not I/O; the file gets read instantly. It's the for loop. It iterates over every byte, but I don't actually need that: once I have read a chunk's byte length and processed its data, the iterator should jump to the end of what was just read before the next iteration, and that position depends on the data size, which varies.

What I need is some way to set the cursor position to the byte index where the next iteration should start.
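One way to position the cursor, sketched under the assumption of Godot 4 and a standard RIFF layout (a 4-byte tag, a 4-byte little-endian size, then the payload, padded to an even length): FileAccess.seek() can jump straight past each payload. The function name list_chunks is hypothetical.

```gdscript
# Sketch: list the chunks of a RIFF/WAVE file without reading the payloads.
func list_chunks(path: String) -> Array[Dictionary]:
	var chunks: Array[Dictionary] = []
	var f: FileAccess = FileAccess.open(path, FileAccess.READ)
	if f == null:
		return chunks  # The file could not be opened.
	var length: int = f.get_length()
	f.seek(12)  # Skip "RIFF", the RIFF size and the "WAVE" form type.
	while f.get_position() + 8 <= length:
		var id: String = f.get_buffer(4).get_string_from_ascii()
		var size: int = f.get_32()  # FileAccess reads little-endian by default.
		chunks.append({"id": id, "size": size, "start": f.get_position()})
		# Jump past the payload; an odd-sized chunk is followed by one pad byte.
		f.seek(f.get_position() + size + (size & 1))
	return chunks
```

Each entry records where a payload starts, so a chunk such as bext or LIST can be read later with seek() and get_buffer() while the audio samples are never touched.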

So far, with the updated method, I get nice gains: from 18s to 4s.

Results:

Old method:

---START---read_wav_decode---
file_start: res://Test_files/sfx/01 - Title Theme (Main Menu).wav
---
RIFF
size: 22557324
format: WAVE
---
fmt 
f_size: 18
f_format: 1
f_num_channels: 2
f_sample_rate: 44100
f_byte_rate: 176400
f_block_align: 4
f_bits_per_sample: 16
---
bext
b_size: 602
b_data: 
---
LIST
l_size: 66
l_type: INFO
---SUB---
inf_type: IART
inf_size: 11
inf_data: David Wise
---END---
---Elapsed time---: 18.6229999065399s

Improved method:

---START---read_wav_decode_new---
file_start: res://Test_files/sfx/01 - Title Theme (Main Menu).wav
---
RIFF
size: 22557324
format: WAVE
---
fmt 
f_size: 18
f_format: 1
f_num_channels: 2
f_sample_rate: 44100
f_byte_rate: 176400
f_block_align: 4
f_bits_per_sample: 16
---
bext
b_size: 602
b_data: 
---
LIST
l_size: 66
l_type: INFO
---SUB---
inf_type: IART
inf_size: 11
inf_data: David Wise
---END---
---Elapsed time---: 4.02900004386902s

Code:

func read_wav_decode(wav_file_path:String):
	var start_time = Time.get_unix_time_from_system()
	var sfx_file = FileAccess.open(wav_file_path,FileAccess.READ)
	print("---START---read_wav_decode---")
	print("file_start: ",wav_file_path)
	while sfx_file.get_position() < sfx_file.get_length():
		# Read data
		
		var id_tag:String = sfx_file.get_buffer(4).get_string_from_ascii()
		match id_tag:
			"RIFF":
				var r_buff:PackedByteArray = sfx_file.get_buffer(8)
				var r_size:int = r_buff.slice(0,4).decode_u32(0)
				var r_format:String = r_buff.slice(4,8).get_string_from_ascii()
				print("---")
				print(id_tag)
				print("size: ",r_size)
				print("format: ",r_format)
			"fmt ":
				var f_buff:PackedByteArray = sfx_file.get_buffer(20)
				var f_size:int = f_buff.slice(0,4).decode_u32(0)
				var f_format:int = f_buff.slice(4,6)[0]
				var f_num_channels:int = f_buff.slice(6,8)[0]
				var f_sample_rate:int = f_buff.slice(8,12).decode_u32(0)
				var f_byte_rate:int = f_buff.slice(12,16).decode_u32(0)
				var f_block_align:int = f_buff.slice(16,18)[0]
				var f_bits_per_sample:int = f_buff.slice(18,20)[0]
				print("---")
				print(id_tag)
				print("f_size: ",f_size)
				print("f_format: ",f_format)
				print("f_num_channels: ",f_num_channels)
				print("f_sample_rate: ",f_sample_rate)
				print("f_byte_rate: ",f_byte_rate)
				print("f_block_align: ",f_block_align)
				print("f_bits_per_sample: ",f_bits_per_sample)
			"data":
				var d_buff:PackedByteArray = sfx_file.get_buffer(4)
				var d_size:int = d_buff.slice(0,4).decode_u32(0)
				print("---")
				print(id_tag)
				print("d_size: ",d_size)
			"bext":
				var b_buff:PackedByteArray = sfx_file.get_buffer(4)
				var b_size:int = b_buff.slice(0,4).decode_u32(0)
				var b_data:String = sfx_file.get_buffer(4+b_size).get_string_from_ascii()
				print("---")
				print(id_tag)
				print("b_size: ",b_size)
				print("b_data: ",b_data)
			"iXML":
				var i_buff:PackedByteArray = sfx_file.get_buffer(4)
				var i_size:int = i_buff.slice(0,4).decode_u32(0)
				var i_data:String = sfx_file.get_buffer(4+i_size).get_string_from_ascii()
				print("---")
				print(id_tag)
				print("i_size: ",i_size)
				print("i_data: ",i_data)
			"LIST":
				var l_buff:PackedByteArray = sfx_file.get_buffer(8)
				var l_size:int = l_buff.slice(0,4).decode_u32(0)
				var l_type:String = l_buff.slice(4,8).get_string_from_ascii()
				var l_data_buff:PackedByteArray = sfx_file.get_buffer(l_size)
				print("---")
				print(id_tag)
				print("l_size: ",l_size)
				print("l_type: ",l_type)
				
				match l_type:
					"INFO":
						var inf_type:String = l_data_buff.slice(0,4).get_string_from_ascii()
						var inf_size:int = l_data_buff.slice(4,8).decode_u32(0)
						var inf_data:String = l_data_buff.slice(8,l_data_buff.size()).get_string_from_ascii()
						print("---SUB---")
						print("inf_type: ",inf_type)
						print("inf_size: ",inf_size)
						print("inf_data: ",inf_data)
	var end_time = Time.get_unix_time_from_system()
	var time_diff = end_time - start_time
	
	print("---END---")
	print("---Elapsed time---: ",time_diff,"s")
	
func read_wav_decode_new(wav_file_path:String):
	var start_time = Time.get_unix_time_from_system()
	var sfx_file = FileAccess.open(wav_file_path,FileAccess.READ)
	var buff_file:PackedByteArray  = sfx_file.get_buffer(sfx_file.get_length())
	var offset:int = 0
	
	print("---START---read_wav_decode_new---")
	print("file_start: ",wav_file_path)
	for i in range(0,buff_file.size()):
		# Read data
		if i < offset:
			continue
		offset+=4
		var id_tag:String =buff_file.slice(i,offset).get_string_from_ascii()
		match id_tag:
			"RIFF":
				var r_buff:PackedByteArray = buff_file.slice(offset,offset+8)
				var r_size:int = r_buff.slice(0,4).decode_u32(0)
				var r_format:String = r_buff.slice(4,8).get_string_from_ascii()
				print("---")
				print(id_tag)
				print("size: ",r_size)
				print("format: ",r_format)
				offset+=8
			"fmt ":
				var f_buff:PackedByteArray = buff_file.slice(offset,offset+20)
				var f_size:int = f_buff.slice(0,4).decode_u32(0)
				var f_format:int = f_buff.slice(4,6)[0]
				var f_num_channels:int = f_buff.slice(6,8)[0]
				var f_sample_rate:int = f_buff.slice(8,12).decode_u32(0)
				var f_byte_rate:int = f_buff.slice(12,16).decode_u32(0)
				var f_block_align:int = f_buff.slice(16,18)[0]
				var f_bits_per_sample:int = f_buff.slice(18,20)[0]
				print("---")
				print(id_tag)
				print("f_size: ",f_size)
				print("f_format: ",f_format)
				print("f_num_channels: ",f_num_channels)
				print("f_sample_rate: ",f_sample_rate)
				print("f_byte_rate: ",f_byte_rate)
				print("f_block_align: ",f_block_align)
				print("f_bits_per_sample: ",f_bits_per_sample)
				offset+=20
			"data":
				var d_buff:PackedByteArray = buff_file.slice(offset,offset+4)
				var d_size:int = d_buff.decode_u32(0)
				print("---")
				print(id_tag)
				print("d_size: ",d_size)
				offset+=4
			"bext":
				var b_buff:PackedByteArray = buff_file.slice(offset,offset+4)
				var b_size:int = b_buff.decode_u32(0)
				var b_data:String = buff_file.slice(offset+4,offset+4+b_size).get_string_from_ascii()
				print("---")
				print(id_tag)
				print("b_size: ",b_size)
				print("b_data: ",b_data)
				offset+=4+b_size
			"iXML":
				var i_buff:PackedByteArray = buff_file.slice(offset,offset+4)
				var i_size:int = i_buff.decode_u32(0)
				var i_data:String = buff_file.slice(offset+4,offset+4+i_size).get_string_from_ascii()
				print("---")
				print(id_tag)
				print("i_size: ",i_size)
				print("i_data: ",i_data)
				offset+=4+i_size
			"LIST":
				var l_buff:PackedByteArray = buff_file.slice(offset,offset+8)
				var l_size:int = l_buff.slice(0,4).decode_u32(0)
				var l_type:String = l_buff.slice(4,8).get_string_from_ascii()
				var l_data_buff:PackedByteArray = buff_file.slice(offset+8,offset+8+l_size)
				print("---")
				print(id_tag)
				print("l_size: ",l_size)
				print("l_type: ",l_type)
				offset+=8+l_size
				match l_type:
					"INFO":
						var inf_type:String = l_data_buff.slice(0,4).get_string_from_ascii()
						var inf_size:int = l_data_buff.slice(4,8).decode_u32(0)
						var inf_data:String = l_data_buff.slice(8,l_data_buff.size()).get_string_from_ascii()
						print("---SUB---")
						print("inf_type: ",inf_type)
						print("inf_size: ",inf_size)
						print("inf_data: ",inf_data)
	var end_time = Time.get_unix_time_from_system()
	var time_diff = end_time - start_time
	
	print("---END---")
	print("---Elapsed time---: ",time_diff,"s")

Note here:
As you can see below, I wait for the iterator to catch up with the desired offset before continuing.

	for i in range(0,buff_file.size()):
		# Read data
		if i < offset:
			continue

I haven't come up with a better way to loop from a given position.

Zini · 3 Apr

For your second version you should use a while loop instead of the for loop. This allows you to manually control the loop variable (i) without the continue hack. Also, you may want to consider assigning the value of buff_file.size() to a variable outside of the loop and using that variable for the comparison instead. GDScript may be able to optimise out these function calls. Or it may not. Only testing will answer that question.

One thing I notice is that you have a lot of print statements inside the loop. I am not sure if that is the case for Godot, but on many platforms print-style output is very slow. I suggest commenting out all the print statements inside the loop to ensure that you are actually measuring the time of the algorithm and not the debug printing.
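A while loop with a manually controlled position might be sketched like this, assuming Godot 4 and the usual RIFF layout (tag, little-endian size, payload padded to an even length). The name walk_chunks is hypothetical. None of the outputs posted above print a data chunk, which suggests (an inference from the logs, not something verified here) that unmatched tags make the loop crawl through the audio samples four bytes at a time; using the size field to skip every chunk, known or not, avoids that.

```gdscript
# Sketch: walk an in-memory RIFF buffer chunk by chunk, returning tag -> payload size.
func walk_chunks(buf: PackedByteArray) -> Dictionary:
	var sizes: Dictionary = {}
	var buf_len: int = buf.size()  # Read once, outside the loop.
	var pos: int = 12              # Skip "RIFF", the RIFF size and "WAVE".
	while pos + 8 <= buf_len:      # A full chunk header must fit.
		var id: String = buf.slice(pos, pos + 4).get_string_from_ascii()
		var size: int = buf.decode_u32(pos + 4)
		sizes[id] = size
		# Advance past header and payload; odd sizes carry one pad byte.
		pos += 8 + size + (size & 1)
	return sizes
```

With this shape the number of iterations equals the number of chunks, not the number of bytes in the file.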

kuligs2 · 4 Apr

Zini One thing I notice is that you have a lot of print statements inside the loop. I am not sure if that is the case for Godot, but on many platforms print-style output is very slow. I suggest commenting out all the print statements inside the loop to ensure that you are actually measuring the time of the algorithm and not the debug printing.

Commenting out the print statements didn't help much, maybe 0.3s faster. I still think parsing a file at the byte level should be much faster.

Zini For your second version you should use a while loop instead of the for loop. This allows you to manually control the loop variable (i) without the continue hack. Also, you may want to consider assigning the value of buff_file.size() to a variable outside of the loop and using that variable for the comparison instead. GDScript may be able to optimise out these function calls. Or it may not. Only testing will answer that question.

Didn't do diddly squat. Negligible improvements.

I added this at the top:

	var i = 0 # <---
	while offset < buff_len: # <---

And at the bottom:

		i = offset # <-------------
	var end_time = Time.get_unix_time_from_system()
	var time_diff = end_time - start_time

func read_wav_decode_new(wav_file_path:String):
	var start_time = Time.get_unix_time_from_system()
	var sfx_file = FileAccess.open(wav_file_path,FileAccess.READ)
	var buff_file:PackedByteArray  = sfx_file.get_buffer(sfx_file.get_length())
	var offset:int = 0
	
	print("---START---read_wav_decode_new---")
	print("file_start: ",wav_file_path)
	var buff_len = buff_file.size()
	var i = 0
	#for i in range(0,buff_file.size()):
		## Read data
		#if i < offset:
			#continue
	while offset < buff_len:
		offset+=4
		var id_tag:String =buff_file.slice(i,offset).get_string_from_ascii()
		match id_tag:
			"RIFF":
				var r_buff:PackedByteArray = buff_file.slice(offset,offset+8)
				var r_size:int = r_buff.slice(0,4).decode_u32(0)
				var r_format:String = r_buff.slice(4,8).get_string_from_ascii()
				print("---")
				print(id_tag)
				print("size: ",r_size)
				print("format: ",r_format)
				offset+=8
			"fmt ":
				var f_buff:PackedByteArray = buff_file.slice(offset,offset+20)
				var f_size:int = f_buff.slice(0,4).decode_u32(0)
				var f_format:int = f_buff.slice(4,6)[0]
				var f_num_channels:int = f_buff.slice(6,8)[0]
				var f_sample_rate:int = f_buff.slice(8,12).decode_u32(0)
				var f_byte_rate:int = f_buff.slice(12,16).decode_u32(0)
				var f_block_align:int = f_buff.slice(16,18)[0]
				var f_bits_per_sample:int = f_buff.slice(18,20)[0]
				print("---")
				print(id_tag)
				print("f_size: ",f_size)
				print("f_format: ",f_format)
				print("f_num_channels: ",f_num_channels)
				print("f_sample_rate: ",f_sample_rate)
				print("f_byte_rate: ",f_byte_rate)
				print("f_block_align: ",f_block_align)
				print("f_bits_per_sample: ",f_bits_per_sample)
				offset+=20
			"data":
				var d_buff:PackedByteArray = buff_file.slice(offset,offset+4)
				var d_size:int = d_buff.decode_u32(0)
				print("---")
				print(id_tag)
				print("d_size: ",d_size)
				offset+=4
			"bext":
				var b_buff:PackedByteArray = buff_file.slice(offset,offset+4)
				var b_size:int = b_buff.decode_u32(0)
				var b_data:String = buff_file.slice(offset+4,offset+4+b_size).get_string_from_ascii()
				print("---")
				print(id_tag)
				print("b_size: ",b_size)
				print("b_data: ",b_data)
				offset+=4+b_size
			"iXML":
				var i_buff:PackedByteArray = buff_file.slice(offset,offset+4)
				var i_size:int = i_buff.decode_u32(0)
				var i_data:String = buff_file.slice(offset+4,offset+4+i_size).get_string_from_ascii()
				print("---")
				print(id_tag)
				print("i_size: ",i_size)
				print("i_data: ",i_data)
				offset+=4+i_size
			"LIST":
				var l_buff:PackedByteArray = buff_file.slice(offset,offset+8)
				var l_size:int = l_buff.slice(0,4).decode_u32(0)
				var l_type:String = l_buff.slice(4,8).get_string_from_ascii()
				var l_data_buff:PackedByteArray = buff_file.slice(offset+8,offset+8+l_size)
				print("---")
				print(id_tag)
				print("l_size: ",l_size)
				print("l_type: ",l_type)
				offset+=8+l_size
				match l_type:
					"INFO":
						var inf_type:String = l_data_buff.slice(0,4).get_string_from_ascii()
						var inf_size:int = l_data_buff.slice(4,8).decode_u32(0)
						var inf_data:String = l_data_buff.slice(8,l_data_buff.size()).get_string_from_ascii()
						print("---SUB---")
						print("inf_type: ",inf_type)
						print("inf_size: ",inf_size)
						print("inf_data: ",inf_data)
		i = offset
	var end_time = Time.get_unix_time_from_system()
	var time_diff = end_time - start_time
	
	print("---END---")
	print("---Elapsed time---: ",time_diff,"s")

Output

---Elapsed time---: 3.28799986839294s

Zini · 4 Apr

Another idea: maybe the slice function is responsible for some of the slowdown? It creates a new PackedByteArray every time, which might be expensive.

Potential improvements:

remove the initial slice for each tag and read each element of the tag from the main buffer directly
r_buff.slice(0,4).decode_u32(0): The slice here is clearly redundant because decode_u32 only grabs 4 bytes anyway
All the strings seem to be 4 characters long. You could try to read them as an integer (32 bit) instead and compare them to an int value that matches the bit pattern of the desired string. This would only work for strings that you want to compare against fixed values and that don't require further processing as strings.
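The first two suggestions can be sketched for the "fmt " chunk as follows, assuming Godot 4 and the field layout of the WAVE spec linked earlier in the thread. The name parse_fmt and the payload parameter (the index of the first byte after the chunk size field) are hypothetical.

```gdscript
# Sketch: decode the "fmt " fields straight from the main buffer, with no slices.
func parse_fmt(buf: PackedByteArray, payload: int) -> Dictionary:
	return {
		"format": buf.decode_u16(payload),               # 1 means PCM
		"num_channels": buf.decode_u16(payload + 2),
		"sample_rate": buf.decode_u32(payload + 4),
		"byte_rate": buf.decode_u32(payload + 8),
		"block_align": buf.decode_u16(payload + 12),
		"bits_per_sample": buf.decode_u16(payload + 14),
	}
```

Reading the 16-bit fields with decode_u16() also uses both bytes of each field, whereas indexing a slice with [0] only looks at the low byte.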

kuligs2 · 5 Apr

Zini well, maybe I got a 0.2s improvement.

This is what AI said too. It gave me broken code because of outdated Godot docs, but after fixing it manually I have this:

func read_wav_decode_new_ai(wav_file_path: String):
	var start_time = Time.get_unix_time_from_system()
	var sfx_file = FileAccess.open(wav_file_path, FileAccess.READ)
	var buff_file: PackedByteArray = sfx_file.get_buffer(sfx_file.get_length())
	var offset: int = 0
	print("---START---read_wav_decode_new---")
	print("file_start: ", wav_file_path)
	var buff_len = buff_file.size()

	while offset < buff_len:
		var id_tag: String = buff_file.slice(offset, offset + 4).get_string_from_ascii()
		offset += 4
		match id_tag:
			"RIFF":
				var r_size: int = buff_file.decode_u32(offset)
				offset += 4
				var r_format: String = buff_file.slice(offset, offset + 4).get_string_from_ascii()
				offset += 4
				print("---")
				print(id_tag)
				print("size: ", r_size)
				print("format: ", r_format)

			"fmt ":
				offset += 4  # Skip chunk size field
				var f_format: int = buff_file.decode_u16(offset)
				offset += 2
				var f_num_channels: int = buff_file.decode_u16(offset)
				offset += 2
				var f_sample_rate: int = buff_file.decode_u32(offset)
				offset += 4
				offset += 4  # Skip byte rate field
				var f_block_align: int = buff_file.decode_u16(offset)
				offset += 2
				var f_bits_per_sample: int = buff_file.decode_u16(offset)
				offset += 2
				print("---")
				print(id_tag)
				print("f_format: ", f_format)
				print("f_num_channels: ", f_num_channels)
				print("f_sample_rate: ", f_sample_rate)
				print("f_block_align: ", f_block_align)
				print("f_bits_per_sample: ", f_bits_per_sample)

			"data":
				var d_size: int = buff_file.decode_u32(offset)
				offset += 4+d_size
				print("---")
				print(id_tag)
				print("d_size: ", d_size)

			"bext":
				var b_size: int = buff_file.decode_u32(offset)
				offset += 4
				var b_data: String = buff_file.slice(offset, offset + b_size).get_string_from_ascii()
				offset += b_size
				print("---")
				print(id_tag)
				print("b_size: ", b_size)
				print("b_data: ", b_data)

			"iXML":
				var i_size: int = buff_file.decode_u32(offset)
				offset += 4
				var i_data: String = buff_file.slice(offset, offset + i_size).get_string_from_ascii()
				offset += i_size
				print("---")
				print(id_tag)
				print("i_size: ", i_size)
				print("i_data: ", i_data)

			"LIST":
				var l_size: int = buff_file.decode_u32(offset)
				offset += 4
				var l_type: String = buff_file.slice(offset, offset + 4).get_string_from_ascii()
				offset += 4
				print("---")
				print(id_tag)
				print("l_size: ", l_size)
				print("l_type: ", l_type)

				match l_type:
					"INFO":
						var inf_type: String = buff_file.slice(offset, offset + 4).get_string_from_ascii()
						offset += 4
						var inf_size: int = buff_file.decode_u32(offset)
						offset += 4
						var inf_data: String = buff_file.slice(offset, offset + inf_size).get_string_from_ascii()
						offset += inf_size
						print("---SUB---")
						print("inf_type: ", inf_type)
						print("inf_size: ", inf_size)
						print("inf_data: ", inf_data)

	var end_time = Time.get_unix_time_from_system()
	var time_diff = end_time - start_time
	print("---END---")
	print("---Elapsed Time: ", time_diff, " seconds---")

Output:

---START---read_wav_decode_new---
file_start: res://Test_files/sfx/01 - Title Theme (Main Menu).wav
---
RIFF
size: 22557324
format: WAVE
---
fmt 
f_format: 1
f_num_channels: 2
f_sample_rate: 44100
f_block_align: 4
f_bits_per_sample: 16
---
bext
b_size: 602
b_data: 
---
LIST
l_size: 66
l_type: INFO
---SUB---
inf_type: IART
inf_size: 11
inf_data: David Wise
---END---
---Elapsed Time: 3.07799983024597 seconds---

I don't understand how else to improve it.

kuligs2 · 5 Apr

Zini All the strings seem to be 4 characters long. You could try to read them as an integer (32 bit) instead and compare them to an int value that matches the bit pattern of the desired string. This would only work for strings that you want to compare against fixed values and that don't require further processing as strings.

Hmm, maybe I should try this. I need to see how to encode a string into an int.

EDIT:

It seems I don't understand how to make an int from either a string or a buffer.

--TEST--
r_s: 1380533830
r_d: 1179011410

As you can see, the values don't match.

var r_s = string_to_int(id_tag)
var r_d = buff_file.slice(offset, offset + 4).decode_u32(0)

where

func string_to_int(s: String) -> int:
	var result: int = 0
	for i in range(4):
		if i >= s.length():
			break
		var char_value: int = s[i].to_ascii_buffer()[0]
		result |= (char_value << ((3 - i) * 8))
	return result

EDIT2:
Ah, the joys of byte order: decode_u32 reads little-endian, while my loop put the first character into the highest byte.

func string_to_int(s: String) -> int:
	var result: int = 0
	#for i in range(4):
		#if i >= s.length():
			#break
		#var char_value: int = s[i].to_ascii_buffer()[0]
		#result |= (char_value << ((3 - i) * 8))
	result = s.to_ascii_buffer().decode_u32(0)
	return result

output:

--TEST--
r_s: 1179011410
r_d: 1179011410
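The two r_s values in the tests above are the same four bytes packed in opposite orders. A small sketch, assuming Godot 4, that reproduces both numbers and then compares a tag as an int without creating a String or a slice; the helper names and the TAG_RIFF constant are hypothetical.

```gdscript
# "RIFF" read little-endian, the order decode_u32() uses.
const TAG_RIFF: int = 0x46464952  # 1179011410

# First character in the lowest byte (matches decode_u32 on the file buffer).
func tag_le(tag: String) -> int:
	return tag.to_ascii_buffer().decode_u32(0)

# First character in the highest byte (what the shift loop produced).
func tag_be(tag: String) -> int:
	var b: PackedByteArray = tag.to_ascii_buffer()
	return (b[0] << 24) | (b[1] << 16) | (b[2] << 8) | b[3]

# Compare the tag at `pos` against the constant with a single integer read.
func is_riff(buf: PackedByteArray, pos: int) -> bool:
	return buf.decode_u32(pos) == TAG_RIFF
```

Here tag_le("RIFF") gives 1179011410 and tag_be("RIFF") gives 1380533830, so constants for matching against decode_u32() need the little-endian form.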

Now I need to make this into a whole function.

EDIT3:
Hardcoded ints:

RIFF 1179011410
fmt  544501094
data 1635017060
bext 1954047330
iXML 1280137321
LIST 1414744396
INFO 1330007625

EDIT4:

Well, I got maybe a 1s improvement, but it still takes more than 2s to parse. It also throws this error:

E 0:00:03:354   file_ops.gd:749 @ read_wav_decode_new_2(): Condition "p_offset < 0 || p_offset > (int64_t(size) - 4)" is true. Returning: 0
  <C++ Source>  core\variant\variant_call.cpp:824 @ func_PackedByteArray_decode_u32()
  <Stack Trace> file_ops.gd:749 @ read_wav_decode_new_2()
                file_ops.gd:931 @ thread_func()

---START---read_wav_decode_new---
file_start: res://Test_files/sfx/01 - Title Theme (Main Menu).wav
---

size: 22557324
format: WAVE
---

f_format: 1
f_num_channels: 2
f_sample_rate: 44100
f_block_align: 4
f_bits_per_sample: 16
---

b_size: 602
b_data: 
---

l_size: 66
l_type: INFO
---END---
---Elapsed Time: 2.19700002670288 seconds---
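The condition in the error message means decode_u32 was called at an offset with fewer than four bytes left in the array, which can happen near the end of the buffer when the loop only checks offset < buff_len. A guard along these lines would avoid it; this is a sketch assuming Godot 4, and the helper name and the -1 sentinel are hypothetical choices.

```gdscript
# Hypothetical guard: return -1 instead of triggering the engine error
# when fewer than four bytes remain at `pos`.
func read_u32_checked(buf: PackedByteArray, pos: int) -> int:
	if pos < 0 or pos + 4 > buf.size():
		return -1
	return buf.decode_u32(pos)
```

A caller can stop parsing as soon as it sees -1, since a valid unsigned 32-bit value is never negative.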