4

I have seen instanced rendering using a vertex attribute buffer. So in addition to your usual data such as 'position' and 'normal' you have an extra 'attribute' for the instanced data:

in vec3 position;
in vec3 normal;
in instanceData
{
    mat4 worldMatrix;
};

For each vertex, the pointers for 'position' and 'normal' are advanced by the stride of one vertex, but the pointer for 'in instanceData' is advanced by one stride only once per instance drawn. In OpenGL this is set with the glVertexAttribDivisor function, and in Direct3D I think with the InstanceDataStepRate field of the input layout element.
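To make that stepping rule concrete, here is a small CPU-side C++ model of how the element index for an attribute might be chosen. It assumes the OpenGL divisor rule (divisor 0 means per-vertex, a non-zero divisor means the attribute advances once every `divisor` instances) and ignores any base instance offset. The function name `attributeElementIndex` is hypothetical and is not an OpenGL call.

```cpp
#include <cstdint>

// Hypothetical CPU-side model of attribute fetching; not an OpenGL API.
// divisor == 0: per-vertex attribute, indexed by the vertex index.
// divisor  > 0: per-instance attribute, advanced once every `divisor` instances.
std::uint32_t attributeElementIndex(std::uint32_t vertexIndex,
                                    std::uint32_t instanceIndex,
                                    std::uint32_t divisor)
{
    if (divisor == 0) {
        return vertexIndex;
    }
    return instanceIndex / divisor;
}
```

With a divisor of 1, every vertex of instance 3 reads element 3 of the instance buffer, regardless of which vertex is being processed.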

I was wondering, however, whether instanced rendering could be done with the 'instanceData' buffer read from a uniform or constant buffer. In OpenGL this would look like this:

in vec3 position;
in vec3 normal;
uniform instanceData
{
    mat4 worldMatrix[4000]; // Drawing 4000 objects
};

void main()
{
    // gl_InstanceID is zero-based, so when drawing the 4th instance, for example,
    // the instance data is read from the uniform buffer at index 3
    mat4 world = worldMatrix[gl_InstanceID];
}

This way your instance data could be fed through a uniform buffer instead of being supplied as a vertex attribute, which requires changing the vertex input layout whenever you want to render with instancing. I know that in OpenGL uniform buffers are limited in size, so you would be restricted to fewer instances than with the vertex attribute method. But if you needed to draw that many instances, you could always use a Shader Storage Buffer Object, which allows for much bigger buffers (maybe unlimited), so SSBOs can sort of act like uniforms anyway.
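As a rough back-of-the-envelope check on those size limits, the sketch below assumes a mat4 occupies 64 bytes (16 floats of 4 bytes, which is also its std140 array stride) and uses 16384 bytes, the minimum value OpenGL guarantees for GL_MAX_UNIFORM_BLOCK_SIZE. Real drivers often report more, so the actual limit should be queried at runtime. All names here are hypothetical.

```cpp
#include <cstddef>

// Assumption: a mat4 is 16 floats of 4 bytes each (also its std140 array stride).
constexpr std::size_t kMat4Bytes = 16 * 4;

// Assumption: 16384 bytes is the spec-guaranteed minimum for
// GL_MAX_UNIFORM_BLOCK_SIZE; the real value should be queried with glGetIntegerv.
constexpr std::size_t kMinUniformBlockBytes = 16 * 1024;

// How many per-instance elements fit in one block of the given size.
constexpr std::size_t maxInstances(std::size_t blockBytes, std::size_t elementBytes)
{
    return blockBytes / elementBytes;
}

// Bytes needed to store `count` per-instance elements.
constexpr std::size_t bytesNeeded(std::size_t count, std::size_t elementBytes)
{
    return count * elementBytes;
}
```

Under these assumptions, `maxInstances(kMinUniformBlockBytes, kMat4Bytes)` is 256, while the 4000-matrix array needs `bytesNeeded(4000, kMat4Bytes)`, which is 256000 bytes, so it only fits on implementations that report a much larger block size.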

I'm just wondering:

  1. Whether using the uniform buffer method is doable/a good idea, and what are the tradeoffs between the two methods? For example, is the uniform buffer method slower? You'd be indexing into it at each shader call, for each vertex.

  2. Is using this method still instanced rendering? I've only seen examples of instanced rendering using the vertex attribute method.

2 Answers

1

Attribute-based instancing relies on vertex attributes. And therefore, it is subject to the limitations of vertex attributes.

Taking your example:

in instanceData
{
    mat4 worldMatrix;
};

First, this is incorrect; you cannot aggregate vertex shader inputs into interface blocks. So really this should be in mat4 worldMatrix;.

More importantly, the correct version burns four vertex attribute locations. A mat4, as a vertex attribute, takes up 4 attribute locations, one per column.

Many implementations don't allow more than 16 attribute locations. So there are non-trivial limitations on attribute-based per-instance data.
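To put a number on that, here is a tiny C++ sketch of the location budget. It assumes 16 locations (the minimum GL_MAX_VERTEX_ATTRIBS an implementation must support) and 4 locations per mat4; the names are hypothetical.

```cpp
// Assumption: 16 is the minimum GL_MAX_VERTEX_ATTRIBS an implementation must support.
constexpr int kMinAttribLocations = 16;

// A mat4 attribute consumes one location per column.
constexpr int kLocationsPerMat4 = 4;

// Number of mat4 attributes that still fit after other attributes are assigned.
constexpr int mat4sThatFit(int totalLocations, int locationsAlreadyUsed)
{
    return (totalLocations - locationsAlreadyUsed) / kLocationsPerMat4;
}
```

With position and normal taking 2 locations, `mat4sThatFit(kMinAttribLocations, 2)` is 3, so a shader on such an implementation could carry at most three matrices of per-instance data as attributes.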

By using gl_InstanceID and an explicit memory fetch, you are not limited on the amount of per-instance data that you can have.

The primary limitation of using explicit memory fetches is not UBO limits (you shouldn't be using UBOs for per-instance data) or even SSBO limits (because SSBOs in all practical implementations are of unlimited size). The primary limitation is that gl_InstanceID does not respect the base instance parameter of instanced draw calls. So if you need this functionality, you must either use vertex attributes or rely on the gl_BaseInstance value from GL 4.6/ARB_shader_draw_parameters.
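The base-instance issue can be illustrated with a small CPU-side C++ model. It assumes a draw with `instanceCount` instances and a `baseInstance` offset (as in glDrawArraysInstancedBaseInstance); the struct and function names are hypothetical. An instanced attribute with divisor 1 fetches element baseInstance + i for instance i, whereas gl_InstanceID is just i, so a manual fetch has to add the base itself.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model: the buffer indices each fetch path would read for one draw.
struct FetchIndices {
    std::vector<std::uint32_t> viaAttribute;   // divisor-1 attribute: base + i
    std::vector<std::uint32_t> viaInstanceId;  // buffer[gl_InstanceID]: i only
    std::vector<std::uint32_t> viaBasePlusId;  // buffer[gl_BaseInstance + gl_InstanceID]
};

FetchIndices modelDraw(std::uint32_t instanceCount, std::uint32_t baseInstance)
{
    FetchIndices out;
    for (std::uint32_t i = 0; i < instanceCount; ++i) {
        out.viaAttribute.push_back(baseInstance + i);
        out.viaInstanceId.push_back(i);
        out.viaBasePlusId.push_back(baseInstance + i);
    }
    return out;
}
```

For `modelDraw(3, 100)`, the attribute path reads elements 100, 101, 102, while indexing with gl_InstanceID alone reads 0, 1, 2; adding the base instance brings the manual fetch back in line with the attribute path.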

Note that Vulkan doesn't have this problem; gl_InstanceIndex always respects the base instance.

0

Yep this works and I'm currently doing it with my engine.

My concern is something you touched on, though: with the vertex array approach, you can have as much instance data as your GPU memory allows. Uniform buffer objects, on the other hand, have pretty low limits (up to the implementation, but the standard sets 16 KB as the guaranteed minimum). SSBOs are better, but the standard doesn't guarantee that you get to use the maximum memory for that data like it does when using a vertex array (I believe).

I think I will switch to using vertex array. It's also simpler to code up rather than tracking binding points, etc. What advantage are you looking for with uniform buffer objects?

Edit: Ah ok, I see now that you're trying to avoid having to change the input layout for the sake of instancing. Seems like it's not worth sacrificing all of that extra memory just for that convenience. Also, I wonder what the performance difference would be...

Right now, I'm using OpenGL ES 3.0 on iOS 16+. I'm rendering 10M polys with a combo of instancing and regular drawing (just 8 instances of high-poly objects, for example). I'm hitting a very heavy driver bottleneck (Metal is churning away trying to convert GL ES 3 uniform buffer object code into Metal on the fly). It doesn't matter what size or resolution I render at, I'm still getting 20fps.

1 Comment

please summarize
