Refacatored support for dynamic ViewportState to avoid the CPU overhead assocaited with using it. #1431

robertosfield · 2025-03-31T13:00:34Z

Changes also improve performance when using static ViewportState so everyone wins :-)

…ng dedicated view related state.

…dual sizing of the State::stateStacks and viewStateStacks.

…urceHints/ResourceRequirements.

…Handler

…cks.

AnyOldName3 · 2025-04-01T17:42:07Z

This seems to work with the application we've discussed that originally had problems with resized viewports and sparked the dynamic viewport work.

AnyOldName3 · 2025-04-01T21:54:16Z

However, the lazy state tracking does seem to have exacerbated the problem with disturbed descriptors. I didn't come up with a simple repro when I was investigating it earlier, and thought I'd found a typo in the mitigation attempt I mentioned in #1394 rather than it being a problem with this PR, but just realised that there's probably a new problem. I'll try and make a demo for it tomorrow.

robertosfield · 2025-04-02T09:03:50Z

@AnyOldName3 I am consider another approach for State::stateStack + viewStateStack, as I'm not happy with ViewportState being on slot 0 by default, as this might cause problems down the line if folks start putting ViewportState on the vsg::StateGroup.

The change I'm thinking of would be to have just State::stateStack, set the ViewportState::slot to something like 8, then have maxSlot number for standard subgraphs, a maxSlot for view releated state, and potentially another higher up for RenderGraph/CommandGraph level state. The later is just a possibility. The crux with this approach is whether I can keep the overhead down.

With the other bug(s) mentioned in #1394 is it worth me trying/merging out the vsgincompatiblepipelinelayouts example at this point?

…SceneGraph into stateStackPointer

robertosfield · 2025-04-02T11:17:08Z

I have checked in my refactor and it's even faster than this branch, and simpler, so all good :-)

I have generated a PR for these changes: #1433

I will do a code review and continue with other checks if that all goes smoothly I'll close this PR and focus on the new approach.

AnyOldName3 · 2025-04-02T15:30:35Z

stateStackPointer-validation-errors.zip

Here's a .vsgt that shows the problem with this branch. I'd expect it to affect viewMaxSlot, too.

It contains a group whose children are:

a node that uses the PBR shaderset (generated by loading a basic .glTF).
a text node.
the first node again.

The PBR shaderset uses more descriptor sets than the text shaderset, so for one of the slots, the PBR material is still the most recently bound state command, and when it's popped (after the first node's recorded) and repushed (so the third node can be recorded), that slot's marked as dirty on master, but on this branch, that mechanism's scrapped and it still passes the pointer equality test. Binding the text pipeline in between disturbed the descriptor, though, so it must be rebound like it is on master.

robertosfield · 2025-04-02T16:16:48Z

Thanks for the test model.

When I run it I see a blue cube, when I zoom in I see "some text", as far as I can tell there isn't visual glitches. Should I be expecting some?

When I run:

vsgviewer stateStackPointer-validation-errors.vsgt -d

I get lots of output in the form:

    Objects: 2
        [0] 0xcb1c7c000000001b, type: 19, name: NULL
        [1] 0xd10d270000000018, type: 17, name: NULL
VUID-vkCmdDrawIndexed-None-08600(ERROR / SPEC): msgNum: 941228658 - Validation Error: [ VUID-vkCmdDrawIndexed-None-08600 ] Object 0: handle = 0xcb1c7c000000001b, type = VK_OBJECT_TYPE_PIPELINE; Object 1: handle = 0xd10d270000000018, type = VK_OBJECT_TYPE_PIPELINE_LAYOUT; | MessageID = 0x381a0272 | vkCmdDrawIndexed():  The VkPipeline 0xcb1c7c000000001b[] (created with VkPipelineLayout 0xd10d270000000018[]) statically uses descriptor set 1 which is not compatible with the currently bound descriptor set's pipeline layout (VkPipelineLayout 0xd10d270000000018[])
The set (1) is out of bounds for the number of sets bound (1)
. The Vulkan spec states: For each set n that is statically used by a bound shader, a descriptor set must have been bound to n at the same pipeline bind point, with a VkPipelineLayout that is compatible for set n, with the VkPipelineLayout used to create the current VkPipeline or the VkDescriptorSetLayout array used to create the current VkShaderEXT , as described in Pipeline Layout Compatibility (https://vulkan.lunarg.com/doc/view/1.3.290.0/linux/1.3-extensions/vkspec.html#VUID-vkCmdDrawIndexed-None-08600)

This is with viewMaxSlot branch derived from this PR's branch. Both this and the viewMaxSlot branch use the pointer based lazy state updating so will assume the problem exists there as well in the same way.

From what you've written it sounds like VSG master works fine with this test model. Is that correct? Is this issue related to other one that you've been looking into?

AnyOldName3 · 2025-04-02T16:40:01Z

From the fact that we've not seen major problems in the client app we've discussed, where even with the master branch, descriptor-disturbance-related validation errors are emitted for most draw calls if you disable the ten-of-each-validation-error limit that Vulkan's validation layer has by default, I think it's a safe bet that both Nvidia's and AMD's desktop drivers are more leniant with this particular abuse of the API than the specification requires them to be. On a driver that was stricter, the duplicate cube (which doesn't contribute to any fragments as I didn't bother giving it its own transform) would potentially access uninitialised memory for the material descriptor, so could end up entirely the wrong colour or could even crash the driver and take down the whole system.

I don't know the prevalence of implementations that have symptoms, whether they're just minor visual glitches or major consequences like kernel crashes.

To confirm, the master branch generates no validation errors with that test data and this branch generates the ones you've seen.

The issue here is related to #1394 in that in both cases, we're running afoul of the descriptor set disturbance rules, but it's not entirely the same thing as this one's a regression caused by not rebinding things when we used to with obviously sensible scenegraphs, whereas the scenegraphs that trigger #1394 are structured in a way that looks more fragile once you know the disturbance rules, although it would be reasonable for someone who's got scenegraph experience (especially with the OSG) and hasn't been looking into this class of problem recently to expect them to work.

AnyOldName3 · 2025-04-02T16:53:52Z

As for the vsgincompatiblepipelinelayouts example, there are potential fixes for the regression from this branch which would also make it work, but they're invasive and would likely have more of a performance cost than you'd be happy with. For example, we could add pipeline layout tracking to State, automatically dirty descriptor set bindings when the pipeline layout changed to one incompatible for those bindings, and check descriptor set bindings were compatible with the current pipeline layout before recording them, but that would be extra state to track and more runtime checks, too. There might be a silver bullet solution I've not thought of, but I'm not going to hold my breath for one. If we want maximum performance, I think we're going to need a system that figures out which state is applicable to which pipeline upfront and blocks out incompatible state higher in the scenegraph from being inappropriately applied, and solving that won't solve the regression here with state from sibling nodes.

robertosfield · 2025-04-02T17:20:23Z

I will need to spend some time understanding the issue properly. As this lazy state updating results in a Vulkan validation error I will try to resolve the validation error being merging with VSG master.

FYI, the pointer based lazy state updating approach I've implemented can be reset for each slot or for all slots at any time by resetting the StateStack.stack[0] entry to nullptr. This essentially implements a dirty() do the next call with force a rebind.

So.. I guess when the text subgraph pops it'd need to selectively dirty the StateStack's that have been "disturbed." It's curious that NVidia and AMD drivers aren't been "distribued" by what is happening at present, but the Validation layer is. As you say the Validation layer is following specs, rather than going out of it's way to do something as sensible fallback like NVidia and AMD drivers are doing.

AnyOldName3 · 2025-04-02T17:31:49Z

The wording of the spec suggests they expect some drivers to have a stack that holds push constants at the top, then descriptor sets in numerical order, and when something gets bound, things have to be popped off the stack to make the thing that's being bound be at the top, so then everything in higher slots needs binding again. If a driver didn't use a stack and used something like a vector that allowed arbitrary access to things in the middle, then rebinding a low slot wouldn't discard anything bound to a higher slot.

robertosfield · 2025-04-03T08:06:24Z

I'm currently thinking of looking at dirtying/resetting the State::stateStacks last applied StateCommand when a GraphicsPipeline/ComputePipeline is bound. The lazy state updating should make sure that pipelines are only bounded when they change.

Potentially putting the last applied PipelineLayout into State might be a way of make it possible to use different GraphicsPipeline without dirtying the State::stateStacks.

robertosfield · 2025-04-03T11:06:23Z

@AnyOldName3 I have changes that fix the Vulkan validation error in the dataset you provided. There is a performance overhead on the models I'm testing, so I'll keep testing and reviewing the code to see if I can lower the overhead. Changes are not too intrusive as part of the functionality was already in place for tracking current pipline layout.

robertosfield · 2025-04-03T13:18:42Z

I have checked in my fix for the change of pipeline causing Vulkan validation errors: bc4ce60

I have got the performance overhead down enough that I'm happy to merge with VSG master. We are about 24% faster than VSG master, and between 20% and 67% faster than v1.1.10.

Static ViewportState is still faster for the model with lots of trees in the transparent bin, no matter what optimization I've applied I can't get rid of this overhead, we still have a 2.4% overhead. What it looks like to me is the BindGraphicsPipeline is forcing the driver to set up the pipeline each time and when using dynamic ViewportState there is a small overhead. Once a pipeline is bound there doesn't seem to be little or any overhead.

The irony is my trying to get rid of this 2.4% overhead didn't result in fixing it, but I ended up fixing other bottlenecks that affected the static and dynamic ViewportState usage to an extent that both are now significantly faster than v1.0.8 and v.1.10. Unfortunately, it's taken me an extra months work to do it... but hey 20-60% performance improvement has to be worth it.

AnyOldName3 · 2025-04-03T13:42:00Z

That fix might throw away more performance than it needs to. I've not made a PR yet and it looks like it doesn't build in CI due to either a missing include or not fencing off a check for an extension feature behind conditional compilation to see if the machine's Vulkan SDK has that extension, but last night I implemented this function to determine how compatible two pipeline layouts are here master...AnyOldName3:VulkanSceneGraph:partial-compatibility.

If the bool in the return value is false, then the push constants need dirtying, and the unit32_t tells you the lowest descriptor set slot that needs dirtying (everything with a higher slot needs dirtying, too). Computing it every time pipelines are switched during the record traversal might turn out to be more expensive than rebinding all the descriptor sets like bc4ce60 does, but maybe if the results were cached or could be prepared upfront during the compile traversal so the compatibility level could be grabbed from a table for most pipeline switches, it would pay for itself.

robertosfield · 2025-04-03T13:44:05Z

@AnyOldName3 I've done my final code review and testing, so far everything looks like with my latest changes, and they resolve the Vulkan validation error in the stateStackPointer-validation-errors.vsgt model. As I'm happy I've gone ahead and merge the new branch with VSG master #1433.

This moves the need for testing on any applications that may be of interest. Fingers crossed the PipelineLayout change forcing a State::stateStack::dirty() will force all the state to rebuild that require it.

robertosfield · 2025-04-03T14:26:37Z

That fix might throw away more performance than it needs to. I've not made a PR yet and it looks like it doesn't build in CI due to either a missing include or not fencing off a check for an extension feature behind conditional compilation to see if the machine's Vulkan SDK has that extension, but last night I implemented this function to determine how compatible two pipeline layouts are here master...AnyOldName3:VulkanSceneGraph:partial-compatibility.

If the bool in the return value is false, then the push constants need dirtying, and the unit32_t tells you the lowest descriptor set slot that needs dirtying (everything with a higher slot needs dirtying, too). Computing it every time pipelines are switched during the record traversal might turn out to be more expensive than rebinding all the descriptor sets like bc4ce60 does, but maybe if the results were cached or could be prepared upfront during the compile traversal so the compatibility level could be grabbed from a table for most pipeline switches, it would pay for itself.

A compatibility function is potentially useful regardless of whether it's used at runtime. I have plenty of other work that's been held up by trying to fix the original performance regressions so now we are significantly ahead I want move on.

robertosfield added 24 commits March 7, 2025 12:42

Added State::pushView/popView()/recordView() methods in prep for havi…

2dc286d

…ng dedicated view related state.

Wired up ddicated State::viewStateStack

bac0ae0

Added dedicated maxStateSlot and maxViewSlot numbers to enable indivi…

d86f256

…dual sizing of the State::stateStacks and viewStateStacks.

Removoed unnecessary defaults

7190513

Added back in default constructor to fix vsgExamples build

54a2bca

Removed unneccessary hardwired maxStateSlot setting

76375fc

Removed slot number setting from constructor.

2ae7aef

Implemented merge of DynamicState values.

a7491f0

Added support for selecting dynamic or static ViewportState, via Reso…

426f49c

…urceHints/ResourceRequirements.

Added assignment of ResourceRequirements when setting up WindowResize…

eb25f59

…Handler

Converted viewportStateHint into mask.

2539a31

Merge branch 'master' into viewStateStack

01afbd0

Merge branch 'master' into viewStateStack

61f332c

Added lazy state updating.

c1528ac

Changed to using const C pointers rather than ref_ptr<> for state sta…

3a9677b

…cks.

Implemented vsg::StateStack as an vsg::Array

805c62c

Fixed lazy state tracking.

df4f57c

Fixed popView()

da37c65

Merge branch 'ProfilerFix' into stateStackPointer

e6fad5b

Merge branch 'master' into stateStackPointer

b204e3d

Merge branch 'ProfilerFix' into stateStackPointer

492507a

Merge branch 'master' into stateStackPointer

d2cd118

Fixed copyright date.

b6f682f

Ran clang-format

47e65b6

Merge branch 'master' into stateStackPointer

38f703a

robertosfield added 2 commits April 2, 2025 10:09

Merge branch 'master' into stateStackPointer

33d58d1

Merge branch 'stateStackPointer' of https://github.com/vsg-dev/Vulkan…

cefc978

…SceneGraph into stateStackPointer

robertosfield merged commit cefc978 into master Apr 3, 2025
16 checks passed

AnyOldName3 mentioned this pull request Apr 3, 2025

Descriptor sets aren't rebound when pipeline layout change 'disturbs' them #1394

Open

robertosfield deleted the stateStackPointer branch April 11, 2025 09:48

theodoregoetz mentioned this pull request Jun 4, 2025

use rendergraph renderArea as scissor instead of viewport #1487

Open

9 tasks

Refacatored support for dynamic ViewportState to avoid the CPU overhead assocaited with using it. #1431

Refacatored support for dynamic ViewportState to avoid the CPU overhead assocaited with using it. #1431

Uh oh!

Conversation

robertosfield commented Mar 31, 2025

Uh oh!

AnyOldName3 commented Apr 1, 2025

Uh oh!

AnyOldName3 commented Apr 1, 2025

Uh oh!

robertosfield commented Apr 2, 2025

Uh oh!

robertosfield commented Apr 2, 2025

Uh oh!

AnyOldName3 commented Apr 2, 2025

Uh oh!

robertosfield commented Apr 2, 2025

Uh oh!

AnyOldName3 commented Apr 2, 2025

Uh oh!

AnyOldName3 commented Apr 2, 2025

Uh oh!

robertosfield commented Apr 2, 2025

Uh oh!

AnyOldName3 commented Apr 2, 2025

Uh oh!

robertosfield commented Apr 3, 2025

Uh oh!

robertosfield commented Apr 3, 2025

Uh oh!

robertosfield commented Apr 3, 2025

Uh oh!

Uh oh!

AnyOldName3 commented Apr 3, 2025

Uh oh!

robertosfield commented Apr 3, 2025

Uh oh!

robertosfield commented Apr 3, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants