Skip to content

Conversation

@robertosfield
Copy link
Collaborator

Changes also improve performance when using static ViewportState so everyone wins :-)

…dual sizing of the State::stateStacks and viewStateStacks.
@AnyOldName3
Copy link
Contributor

This seems to work with the application we've discussed that originally had problems with resized viewports and sparked the dynamic viewport work.

@AnyOldName3
Copy link
Contributor

However, the lazy state tracking does seem to have exacerbated the problem with disturbed descriptors. I didn't come up with a simple repro when I was investigating it earlier, and thought I'd found a typo in the mitigation attempt I mentioned in #1394 rather than it being a problem with this PR, but just realised that there's probably a new problem. I'll try and make a demo for it tomorrow.

@robertosfield
Copy link
Collaborator Author

@AnyOldName3 I am consider another approach for State::stateStack + viewStateStack, as I'm not happy with ViewportState being on slot 0 by default, as this might cause problems down the line if folks start putting ViewportState on the vsg::StateGroup.

The change I'm thinking of would be to have just State::stateStack, set the ViewportState::slot to something like 8, then have maxSlot number for standard subgraphs, a maxSlot for view releated state, and potentially another higher up for RenderGraph/CommandGraph level state. The later is just a possibility. The crux with this approach is whether I can keep the overhead down.

With the other bug(s) mentioned in #1394 is it worth me trying/merging out the vsgincompatiblepipelinelayouts example at this point?

@robertosfield
Copy link
Collaborator Author

I have checked in my refactor and it's even faster than this branch, and simpler, so all good :-)

I have generated a PR for these changes: #1433

I will do a code review and continue with other checks if that all goes smoothly I'll close this PR and focus on the new approach.

@AnyOldName3
Copy link
Contributor

stateStackPointer-validation-errors.zip

Here's a .vsgt that shows the problem with this branch. I'd expect it to affect viewMaxSlot, too.

It contains a group whose children are:

  • a node that uses the PBR shaderset (generated by loading a basic .glTF).
  • a text node.
  • the first node again.

The PBR shaderset uses more descriptor sets than the text shaderset, so for one of the slots, the PBR material is still the most recently bound state command, and when it's popped (after the first node's recorded) and repushed (so the third node can be recorded), that slot's marked as dirty on master, but on this branch, that mechanism's scrapped and it still passes the pointer equality test. Binding the text pipeline in between disturbed the descriptor, though, so it must be rebound like it is on master.

@robertosfield
Copy link
Collaborator Author

Thanks for the test model.

When I run it I see a blue cube, when I zoom in I see "some text", as far as I can tell there isn't visual glitches. Should I be expecting some?

When I run:

vsgviewer stateStackPointer-validation-errors.vsgt -d

I get lots of output in the form:

    Objects: 2
        [0] 0xcb1c7c000000001b, type: 19, name: NULL
        [1] 0xd10d270000000018, type: 17, name: NULL
VUID-vkCmdDrawIndexed-None-08600(ERROR / SPEC): msgNum: 941228658 - Validation Error: [ VUID-vkCmdDrawIndexed-None-08600 ] Object 0: handle = 0xcb1c7c000000001b, type = VK_OBJECT_TYPE_PIPELINE; Object 1: handle = 0xd10d270000000018, type = VK_OBJECT_TYPE_PIPELINE_LAYOUT; | MessageID = 0x381a0272 | vkCmdDrawIndexed():  The VkPipeline 0xcb1c7c000000001b[] (created with VkPipelineLayout 0xd10d270000000018[]) statically uses descriptor set 1 which is not compatible with the currently bound descriptor set's pipeline layout (VkPipelineLayout 0xd10d270000000018[])
The set (1) is out of bounds for the number of sets bound (1)
. The Vulkan spec states: For each set n that is statically used by a bound shader, a descriptor set must have been bound to n at the same pipeline bind point, with a VkPipelineLayout that is compatible for set n, with the VkPipelineLayout used to create the current VkPipeline or the VkDescriptorSetLayout array used to create the current VkShaderEXT , as described in Pipeline Layout Compatibility (https://vulkan.lunarg.com/doc/view/1.3.290.0/linux/1.3-extensions/vkspec.html#VUID-vkCmdDrawIndexed-None-08600)

This is with viewMaxSlot branch derived from this PR's branch. Both this and the viewMaxSlot branch use the pointer based lazy state updating so will assume the problem exists there as well in the same way.

From what you've written it sounds like VSG master works fine with this test model. Is that correct? Is this issue related to other one that you've been looking into?

@AnyOldName3
Copy link
Contributor

From the fact that we've not seen major problems in the client app we've discussed, where even with the master branch, descriptor-disturbance-related validation errors are emitted for most draw calls if you disable the ten-of-each-validation-error limit that Vulkan's validation layer has by default, I think it's a safe bet that both Nvidia's and AMD's desktop drivers are more leniant with this particular abuse of the API than the specification requires them to be. On a driver that was stricter, the duplicate cube (which doesn't contribute to any fragments as I didn't bother giving it its own transform) would potentially access uninitialised memory for the material descriptor, so could end up entirely the wrong colour or could even crash the driver and take down the whole system.

I don't know the prevalence of implementations that have symptoms, whether they're just minor visual glitches or major consequences like kernel crashes.

To confirm, the master branch generates no validation errors with that test data and this branch generates the ones you've seen.

The issue here is related to #1394 in that in both cases, we're running afoul of the descriptor set disturbance rules, but it's not entirely the same thing as this one's a regression caused by not rebinding things when we used to with obviously sensible scenegraphs, whereas the scenegraphs that trigger #1394 are structured in a way that looks more fragile once you know the disturbance rules, although it would be reasonable for someone who's got scenegraph experience (especially with the OSG) and hasn't been looking into this class of problem recently to expect them to work.

@AnyOldName3
Copy link
Contributor

As for the vsgincompatiblepipelinelayouts example, there are potential fixes for the regression from this branch which would also make it work, but they're invasive and would likely have more of a performance cost than you'd be happy with. For example, we could add pipeline layout tracking to State, automatically dirty descriptor set bindings when the pipeline layout changed to one incompatible for those bindings, and check descriptor set bindings were compatible with the current pipeline layout before recording them, but that would be extra state to track and more runtime checks, too. There might be a silver bullet solution I've not thought of, but I'm not going to hold my breath for one. If we want maximum performance, I think we're going to need a system that figures out which state is applicable to which pipeline upfront and blocks out incompatible state higher in the scenegraph from being inappropriately applied, and solving that won't solve the regression here with state from sibling nodes.

@robertosfield
Copy link
Collaborator Author

I will need to spend some time understanding the issue properly. As this lazy state updating results in a Vulkan validation error I will try to resolve the validation error being merging with VSG master.

FYI, the pointer based lazy state updating approach I've implemented can be reset for each slot or for all slots at any time by resetting the StateStack.stack[0] entry to nullptr. This essentially implements a dirty() do the next call with force a rebind.

So.. I guess when the text subgraph pops it'd need to selectively dirty the StateStack's that have been "disturbed." It's curious that NVidia and AMD drivers aren't been "distribued" by what is happening at present, but the Validation layer is. As you say the Validation layer is following specs, rather than going out of it's way to do something as sensible fallback like NVidia and AMD drivers are doing.

@AnyOldName3
Copy link
Contributor

The wording of the spec suggests they expect some drivers to have a stack that holds push constants at the top, then descriptor sets in numerical order, and when something gets bound, things have to be popped off the stack to make the thing that's being bound be at the top, so then everything in higher slots needs binding again. If a driver didn't use a stack and used something like a vector that allowed arbitrary access to things in the middle, then rebinding a low slot wouldn't discard anything bound to a higher slot.

@robertosfield
Copy link
Collaborator Author

I'm currently thinking of looking at dirtying/resetting the State::stateStacks last applied StateCommand when a GraphicsPipeline/ComputePipeline is bound. The lazy state updating should make sure that pipelines are only bounded when they change.

Potentially putting the last applied PipelineLayout into State might be a way of make it possible to use different GraphicsPipeline without dirtying the State::stateStacks.

@robertosfield
Copy link
Collaborator Author

@AnyOldName3 I have changes that fix the Vulkan validation error in the dataset you provided. There is a performance overhead on the models I'm testing, so I'll keep testing and reviewing the code to see if I can lower the overhead. Changes are not too intrusive as part of the functionality was already in place for tracking current pipline layout.

@robertosfield
Copy link
Collaborator Author

I have checked in my fix for the change of pipeline causing Vulkan validation errors: bc4ce60

I have got the performance overhead down enough that I'm happy to merge with VSG master. We are about 24% faster than VSG master, and between 20% and 67% faster than v1.1.10.

Static ViewportState is still faster for the model with lots of trees in the transparent bin, no matter what optimization I've applied I can't get rid of this overhead, we still have a 2.4% overhead. What it looks like to me is the BindGraphicsPipeline is forcing the driver to set up the pipeline each time and when using dynamic ViewportState there is a small overhead. Once a pipeline is bound there doesn't seem to be little or any overhead.

The irony is my trying to get rid of this 2.4% overhead didn't result in fixing it, but I ended up fixing other bottlenecks that affected the static and dynamic ViewportState usage to an extent that both are now significantly faster than v1.0.8 and v.1.10. Unfortunately, it's taken me an extra months work to do it... but hey 20-60% performance improvement has to be worth it.

@robertosfield robertosfield merged commit cefc978 into master Apr 3, 2025
16 checks passed
@AnyOldName3
Copy link
Contributor

That fix might throw away more performance than it needs to. I've not made a PR yet and it looks like it doesn't build in CI due to either a missing include or not fencing off a check for an extension feature behind conditional compilation to see if the machine's Vulkan SDK has that extension, but last night I implemented this function to determine how compatible two pipeline layouts are here master...AnyOldName3:VulkanSceneGraph:partial-compatibility.

If the bool in the return value is false, then the push constants need dirtying, and the unit32_t tells you the lowest descriptor set slot that needs dirtying (everything with a higher slot needs dirtying, too). Computing it every time pipelines are switched during the record traversal might turn out to be more expensive than rebinding all the descriptor sets like bc4ce60 does, but maybe if the results were cached or could be prepared upfront during the compile traversal so the compatibility level could be grabbed from a table for most pipeline switches, it would pay for itself.

@robertosfield
Copy link
Collaborator Author

@AnyOldName3 I've done my final code review and testing, so far everything looks like with my latest changes, and they resolve the Vulkan validation error in the stateStackPointer-validation-errors.vsgt model. As I'm happy I've gone ahead and merge the new branch with VSG master #1433.

This moves the need for testing on any applications that may be of interest. Fingers crossed the PipelineLayout change forcing a State::stateStack::dirty() will force all the state to rebuild that require it.

@robertosfield
Copy link
Collaborator Author

That fix might throw away more performance than it needs to. I've not made a PR yet and it looks like it doesn't build in CI due to either a missing include or not fencing off a check for an extension feature behind conditional compilation to see if the machine's Vulkan SDK has that extension, but last night I implemented this function to determine how compatible two pipeline layouts are here master...AnyOldName3:VulkanSceneGraph:partial-compatibility.

If the bool in the return value is false, then the push constants need dirtying, and the unit32_t tells you the lowest descriptor set slot that needs dirtying (everything with a higher slot needs dirtying, too). Computing it every time pipelines are switched during the record traversal might turn out to be more expensive than rebinding all the descriptor sets like bc4ce60 does, but maybe if the results were cached or could be prepared upfront during the compile traversal so the compatibility level could be grabbed from a table for most pipeline switches, it would pay for itself.

A compatibility function is potentially useful regardless of whether it's used at runtime. I have plenty of other work that's been held up by trying to fix the original performance regressions so now we are significantly ahead I want move on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants