I'm trying to render thousands of meshes (as many as possible), but the usual optimizations don't apply to them.
I'm making a brush-based 3D level editor, akin to Valve Hammer, TrenchBroom, DarkRadiant and others. One problem I'd like to solve from the start is how to render many brushes with good performance.
As of now, if I have 1000 brushes on the screen, my frame rate drops to 40 fps. That's not a lot of brushes to be capped at, so I need to improve this. My system is old and slow, but that shouldn't be the limiting factor: DarkRadiant can render upwards of 18,000 brushes at 60 fps, no sweat, on the same system (I got sick of adding more brushes at some point).
The main problem I have is that brushes can't be optimized in the obvious ways. For example, most brushes in a map are going to be different from all other brushes, so they must be unique meshes, and that means I can't use MultiMeshes. On top of that, each face of a brush can have an arbitrary material, so there's a high likelihood of there being multiple surfaces per brush mesh. But then each surface in a mesh is a separate draw call.
Right now my tests involve simple cuboid brushes with 6 surfaces each, so rendering 1000 of those brushes results in 6000 draw calls. I'm testing the worst-case scenario, where each face of every brush has a different material.

So I'm trying to figure out how to optimize this. I tried using the RenderingServer, but it didn't improve performance at all (at least not with just 1000 brushes; it may be useful down the line, but not now, it seems).
I tried occlusion culling too. The results seemed promising, but there is a problem: when you fly into a brush, its occluder culls brushes you should be able to see from inside it. I tried turning off the occluder of the brush the camera is in, but with too many brushes in the scene the occlusion algorithm takes too long to update, and you see brushes disappear momentarily.
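The "which brush is the camera in" check could be sketched like this. It is plain Python rather than engine code, and it assumes each brush is stored as a convex set of planes with outward-facing normals; the names point_in_brush and planes are made up for illustration.

```python
def point_in_brush(point, planes, epsilon=1e-6):
    """Return True if `point` lies inside (or on the surface of) a convex brush.

    Assumption: `planes` is a list of (normal, distance) pairs whose normals
    point outward, so a point is inside when dot(normal, point) - distance <= 0
    for every plane.
    """
    px, py, pz = point
    for (nx, ny, nz), d in planes:
        if nx * px + ny * py + nz * pz - d > epsilon:
            return False  # the point is in front of this face, so it is outside
    return True


# A cube spanning -1..1 on every axis, expressed as six outward-facing planes.
cube = [
    ((1, 0, 0), 1), ((-1, 0, 0), 1),
    ((0, 1, 0), 1), ((0, -1, 0), 1),
    ((0, 0, 1), 1), ((0, 0, -1), 1),
]

camera_inside = point_in_brush((0.0, 0.5, 0.0), cube)
camera_outside = point_in_brush((2.0, 0.0, 0.0), cube)
```

This only answers which occluder to disable. It does nothing about the update latency described above.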
One idea I had was to merge brush meshes, or face meshes, that share the same material. The problem is that doing this blindly might merge meshes whose brushes are far apart, and then many of them would lose frustum culling. I presume this method would need to be chunked with a quadtree, an octree or something similar. But chunking brushes doesn't sound as intuitive as chunking voxels, because brushes have arbitrary positions and sizes, and any leaf in a tree might be densely packed or nearly empty.
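To make the chunked merge concrete, here is a minimal sketch that uses a uniform grid instead of a tree. It groups faces by (grid cell, material), and each group would become one merged surface. This is plain Python, not engine code; CELL_SIZE, cell_of and batch_faces are hypothetical names, and assigning a face to a cell by its center point is an assumption made for simplicity.

```python
from collections import defaultdict
from math import floor

CELL_SIZE = 16.0  # hypothetical chunk edge length in world units


def cell_of(point, cell_size=CELL_SIZE):
    """Map a world-space point to integer grid-cell coordinates."""
    return tuple(floor(c / cell_size) for c in point)


def batch_faces(faces, cell_size=CELL_SIZE):
    """Group faces by (cell, material).

    `faces` is an iterable of (center, material, face_id) tuples. Each
    resulting group would be merged into a single surface (one draw call).
    """
    batches = defaultdict(list)
    for center, material, face_id in faces:
        batches[(cell_of(center, cell_size), material)].append(face_id)
    return dict(batches)


# Two nearby "brick" faces merge; the distant one keeps its own batch,
# so it can still be frustum-culled independently.
faces = [
    ((1.0, 2.0, 3.0), "brick", 0),
    ((5.0, 1.0, 2.0), "brick", 1),
    ((100.0, 0.0, 0.0), "brick", 2),
    ((2.0, 2.0, 2.0), "metal", 3),
]
batches = batch_faces(faces)
print(len(batches))  # number of merged surfaces for this toy scene
```

Because the grid lives in a dictionary, empty cells cost nothing. Densely packed cells simply produce bigger merged meshes, which may or may not be acceptable depending on how often brushes in that cell are edited.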
I thought about using an AtlasTexture, which would allow all brushes to have a single surface, but in 3D with big textures you'll hit the atlas texture size limits quickly (4096x4096). I'm also concerned that with an AtlasTexture I can't do texture tiling on the brush faces. Another option that I stumbled on was the Texture2DArray, but it seems to require that all textures have the same size, so that's no good.
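On the tiling concern: one common workaround is to wrap the UV manually in the shader, taking its fractional part and remapping it into the texture's sub-rectangle of the atlas. The sketch below shows only that arithmetic in Python. The function name atlas_uv and the (x, y, width, height) rect layout are assumptions for illustration, and a real shader would also have to deal with filtering and mipmap bleeding at the tile edges, usually by padding each tile.

```python
from math import floor


def atlas_uv(u, v, rect):
    """Wrap a tiling UV into an atlas sub-rectangle.

    Assumption: `rect` is (x, y, width, height) in normalized atlas
    coordinates (0..1). The fractional part mirrors GLSL's fract().
    """
    x, y, w, h = rect
    fu = u - floor(u)  # repeat horizontally inside the tile
    fv = v - floor(v)  # repeat vertically inside the tile
    return (x + fu * w, y + fv * h)


# A tile occupying a quarter-width, quarter-height region starting at x = 0.5.
tile = (0.5, 0.0, 0.25, 0.25)
wrapped = atlas_uv(2.5, -0.25, tile)
```

However far the face UVs extend, the result always lands inside the tile's rectangle, which is what makes a texture repeat across a large brush face.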
I'd be interested in hearing any other thoughts on this.