-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add proposal for scalar layout for constant buffers #317
base: main
Are you sure you want to change the base?
Conversation
@microsoft-github-policy-service agree company="AMD" |
A few thoughts/questions, mostly around some extra details that would be useful. UsageI think you're suggesting we add a source-level attribute something like:
Is that correct? Should we also have a command line argument to change the default value? Expected AlignmentsCan you expand a bit on the expected alignments for different types? Most C & C++ compilers do support vector types with 16-byte alignment, so they wouldn't necessarily be aligned to their element component. How would the new packing rules apply for something like: struct MyConstants {
int X;
int64_t Y[2];
int16_t Z[3];
float __attribute__((vector_size(4))) V; // equivalent to float4
}; |
Yes that's what I'm thinking yep, and I agree a compiler flag to switch the default would be useful.
In the Vulkan version of this feature against SPIR-V, vector size doesn't change the alignment requirements; we basically treat them the same as arrays of floats. So the alignment required is that of a float - 4 bytes. I would expect the same here for float4 in HLSL. Granted this is not the same as C/C++ language extensions, but vectors aren't really treated the same way on modern GPUs as they are in CPU ISAs, which is why it works this way in Vulkan (and every supporting vendor's hardware). I could see an argument for matching C/C++ vector extensions, but it's going to depend on what developers want to see here. We could add in a way to switch between these two packing modes if we think it's helpful though. |
I've updated this with a first pass to a detailed design. I have the open question of if metadata will be required to determine if the constant buffer is in legacy or scalar layout. Currently when examining the DXIL output of a shader it can be impossible to determine if a constant buffer is in legacy or scalar layout. My (limited) understanding of DX reflection makes me think we do need some metadata to state when a constant buffer is in scalar layout so the runtime can act appropraitely, but I'd like to hear other opinions on this, and if we need to support this or not. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking on the detail sections of this proposal, looks good overall, just a few clarifying questions!
# Scalar constant buffer layout | ||
|
||
* Proposal: [0023](0023-scalar-constant-buffer-layout.md) | ||
* Author(s): [Tobias Hector](https://github.com/tobski) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Alexander-Johnston you should probably add yourself as author at this point!
``` | ||
|
||
As all values must be aligned to their scalar size, a scalar or element of a | ||
composite type may not lay across the 16 byte boundary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this supposed to say 4-byte? Or was this meant to refer to the legacy layout?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the language is probably a bit redundant. If scalar objects are aligned to their size for all scalar objects under 16-bytes, it would be impossible to cross a 16-byte boundary. HLSL doesn't have any larger than 8-bytes, but I don't think we need language about not crossing a 4-byte boundary because 64-bit types would be 8-byte aligned and will cross the 4-byte boundary between 8-byte alignments.
As all values must be aligned to their scalar size, a scalar or element of a | ||
composite type may not lay across the 16 byte boundary. | ||
In the following example the double is aligned identically in both legacy and | ||
scalar layout, to 8 bytes, demonstrating this point. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depending on what the prior sentence was supposed to say this might need some more explanation.
showing the cbuffer loads and extractvalue instructions needed to load each | ||
element of a 2x2 float matrix. | ||
Note that the legacy example requires two 16 byte loads from the cbuffer, while | ||
the scalar example requires only one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As someone not entirely familiar with DXIL, it would be useful to explain here why the legacy version needs an extra load, because it's not at all clear here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cbuffer storage for a matrix<T,2,2> is basically a vector<T,2>[2]. The packing rules for arrays in cbuffers are that each array index starts on a new "row" of the cbuffer. You can see this visualized here:
https://maraneshi.github.io/HLSL-ConstantBufferLayoutVisualizer/?visualizer=MYIwrgZhCmBOAE0AeBDAtgBwDbXgbwCh5j4IsB7FAFwCYkb41qBuAgX2aA
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah thanks, that does help quite a bit - would be good to get this explanation added to the proposal!
|
||
### SPIR-V Support | ||
|
||
SPIR-V already supports this scalar layout proposal through the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
SPIR-V already supports this scalar layout proposal through the | |
SPIR-V already supports this scalar layout proposal, as exercised by the |
(noting that SPIR-V can express this natively, the Vulkan extension just makes it valid to use with Vulkan)
### Interchange Format Additions | ||
|
||
This feature does not require any change to the DXIL language to represent | ||
scalar layouts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems questionable...
I'd really like @tex3d to think on this. I suspect we could use the existing cbuffer load instructions to emulate the scalar layout but that may have limitations and impacts on things like the reflection data which we would need to surface the layout to users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should attempt to implement scalar layout on top of the CBufferLoadLegacy
DXIL op.
While it should be possible to map arbitrary accesses of the scalar layout to the legacy cbuffer load op, it would be ugly, and there would be complications (especially around dynamic indexing). The reason the non-legacy cbuffer load (CBufferLoad
) DXIL op exists is to support this kind of scalar layout in a future compiler version. The intended mechanism was through a global switch, rather than a per-cbuffer (range) flag, however.
In any case, scalar layout is (as far as I can tell) the same as the layout we would use in structured buffer and what's described by the data layout, plus the default vector alignment modification DXC uses.
We already surface reflection information for structured buffers using the data layout rather than using the constant buffer offsets from the type annotation metadata. It shouldn't be a stretch to use this for scalar-layout constant buffer reflection, but we will need to retain a flag for the cbuffer to guide the reflection code.
There is also code in DXC for marking constants as used for reflection which would be impacted, though that code is problematic (such as for offset array indexing) and needs a new approach anyway.
Regarding the existing "CBufferLoad" DXIL op (non-legacy version), we intended to retroactively disallow it on existing shader models, since we know that it's not HLK tested or well supported by existing drivers, reserving it for a new shader model once we had the HLK testing and driver support lined up.
There is also a proposal for alignas
(#154), which would interact with this, requiring some solution for better-than-default alignment of fields and a corresponding reflection solution.
This proposal adds an attribute to specify that the layout of constant buffers uses C-style packing rules, according to the size of the scalar components of matrices and vectors, rather than the packing rules currently used for constant buffers when they are involved.