-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add proposal for scalar layout for constant buffers #317
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,300 @@ | ||||||
<!-- {% raw %} --> | ||||||
|
||||||
# Scalar constant buffer layout | ||||||
|
||||||
* Proposal: [0023](0023-scalar-constant-buffer-layout.md) | ||||||
* Author(s): [Tobias Hector](https://github.com/tobski) | ||||||
* Sponsor: Chris Bieneman | ||||||
* Status: **Under Consideration** | ||||||
|
||||||
|
||||||
## Introduction | ||||||
|
||||||
This proposal adds the ability to define constant buffers with alignment | ||||||
defined only by the scalar components of its structure members. | ||||||
This allows layout compatibility with native C/C++ structure layouts, where | ||||||
there is no native vector type. | ||||||
This functionality already exists in Vulkan via the | ||||||
[VK_EXT_scalar_block_layout](https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_EXT_scalar_block_layout.html) | ||||||
extension, but not yet in DirectX. | ||||||
|
||||||
|
||||||
## Motivation | ||||||
|
||||||
When generating data to be passed to a shader from the CPU programming | ||||||
language, applications need to be very deliberate with how that data is packed | ||||||
if they use vector and array types in HLSL. | ||||||
The packing rules for constant buffers are described here: | ||||||
https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-packing-rules. | ||||||
The gist is that most things are aligned to either 4 or 16 bytes, but members | ||||||
are packed down to a smaller alignment where possible, such that information | ||||||
about all elements is necessary to determine the final layout. | ||||||
|
||||||
|
||||||
## Proposed solution | ||||||
|
||||||
Given that scalar block layout is now widely supported across all recent | ||||||
hardware in Vulkan, this document proposes adding a new `bufferlayout` | ||||||
attribute that takes one of two values; either `legacy`, matching the current | ||||||
layout constraints, or the new `scalar` layout. | ||||||
The new `scalar` layout aligns structure members to the size of the scalar | ||||||
element of each type, with no additional alignment for arrays, vectors, or | ||||||
matrices; effectively matching standard C/C++ memory layouts. | ||||||
|
||||||
The new attribute would have a default value of `legacy` if not specified, | ||||||
but this could be changed with a command line option. | ||||||
A future version of HLSL may choose to make `scalar` the default, or even | ||||||
remove the `legacy` path altogether. | ||||||
|
||||||
Further, a compiler flag can be provided to switch the default layout to | ||||||
scalar, allowing for users to move to scalar layout without re-authoring large | ||||||
numbers of existing shaders. | ||||||
|
||||||
|
||||||
## Detailed Design | ||||||
|
||||||
### HLSL Additions | ||||||
|
||||||
A new `[bufferlayout("layout")]` attribute will be applicable to `cbuffer` and | ||||||
`ConstantBuffer<struct>` definitions. | ||||||
|
||||||
The `layout` may be either `legacy` or `scalar`. | ||||||
|
||||||
To avoid incompatibility with existing code, when the `bufferlayout` | ||||||
attribute is not applied to a constant buffer the layout will default to the | ||||||
`legacy` layout. | ||||||
The addition of this feature to HLSL will not have any effect on existing HLSL | ||||||
code as a result. | ||||||
|
||||||
|
||||||
#### Layouts | ||||||
|
||||||
`legacy` layout uses the current packing rules described here: | ||||||
https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-packing-rules. | ||||||
|
||||||
`scalar` layout modifies the alignment rules for constant buffers, requiring | ||||||
that each element is packed to align with it's scalar size. | ||||||
For composite types such as vectors and matrices this requires that each | ||||||
component be aligned individually to it's size, rather than the composite type | ||||||
be aligned to it's total size. | ||||||
The intention of these rules is to match the `scalarBlockLayout` in Vulkan, | ||||||
providing easy compatibility between DXIL and SPIR-V layouts. | ||||||
|
||||||
|
||||||
### Interchange Format Additions | ||||||
|
||||||
This feature does not require any change to the DXIL language to represent | ||||||
scalar layouts. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This seems questionable... I'd really like @tex3d to think on this. I suspect we could use the existing cbuffer load instructions to emulate the scalar layout but that may have limitations and impacts on things like the reflection data which we would need to surface the layout to users. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think we should attempt to implement scalar layout on top of the While it should be possible to map arbitrary accesses of the scalar layout to the legacy cbuffer load op, it would be ugly, and there would be complications (especially around dynamic indexing). The reason the non-legacy cbuffer load ( In any case, scalar layout is (as far as I can tell) the same as the layout we would use in structured buffer and what's described by the data layout, plus the default vector alignment modification DXC uses. We already surface reflection information for structured buffers using the data layout rather than using the constant buffer offsets from the type annotation metadata. It shouldn't be a stretch to use this for scalar-layout constant buffer reflection, but we will need to retain a flag for the cbuffer to guide the reflection code. There is also code in DXC for marking constants as used for reflection which would be impacted, though that code is problematic (such as for offset array indexing) and needs a new approach anyway. Regarding the existing "CBufferLoad" DXIL op (non-legacy version), we intended to retroactively disallow it on existing shader models, since we know that it's not HLK tested or well supported by existing drivers, reserving it for a new shader model once we had the HLK testing and driver support lined up. There is also a proposal for |
||||||
|
||||||
SPIR-V already has full support for scalar layouts. | ||||||
|
||||||
|
||||||
### Diagnostic Changes | ||||||
|
||||||
These changes introduce two new errors for the `bufferlayout` attribute. | ||||||
|
||||||
First if the `bufferlayout` attribute is applied to any definition other than a | ||||||
constant buffer an error must be reported. | ||||||
|
||||||
Second if the `bufferlayout` attribute is given an invalid layout name an error | ||||||
must be reported. | ||||||
|
||||||
|
||||||
### Validation Changes | ||||||
|
||||||
DXIL Validation must be updated to accept scalar layout constant buffers in | ||||||
addition to the existing layout constant buffers. | ||||||
|
||||||
|
||||||
## Examples | ||||||
|
||||||
The following example demonstrates legal uses of the attribute. | ||||||
|
||||||
``` | ||||||
[bufferlayout("legacy")] | ||||||
cbuffer MyBuffer ... | ||||||
|
||||||
[bufferlayout("scalar")] | ||||||
cbuffer MyBuffer ... | ||||||
|
||||||
[bufferlayout("legacy")] | ||||||
ConstantBuffer<MyStruct> MyBuffer ... | ||||||
|
||||||
[bufferlayout("scalar")] | ||||||
ConstantBuffer<MyStruct> MyBuffer ... | ||||||
``` | ||||||
|
||||||
The following example demonstrates the difference in buffer alignment and size | ||||||
between legacy and scalar layouts in DXIL. | ||||||
Note that in legacy layout the float3 is aligned such that it begins in the | ||||||
second 16 byte space, while in scalar layout it begins immediately after the | ||||||
float2, with the 3rd element starting at offset 16. | ||||||
|
||||||
``` | ||||||
[bufferlayout("legacy")] | ||||||
cbuffer MyBuffer { | ||||||
float2 Foo; | ||||||
float3 Bar; | ||||||
}; | ||||||
|
||||||
; cbuffer MyBuffer | ||||||
; { | ||||||
; | ||||||
; struct MyBuffer | ||||||
; { | ||||||
; | ||||||
; float2 Foo; ; Offset: 0 | ||||||
; float3 Bar; ; Offset: 16 | ||||||
; | ||||||
; } MyBuffer; ; Offset: 0 Size: 28 | ||||||
; | ||||||
; } | ||||||
|
||||||
[bufferlayout("scalar")] | ||||||
cbuffer MyBuffer { | ||||||
float2 Foo; | ||||||
float3 Bar; | ||||||
}; | ||||||
|
||||||
; cbuffer MyBuffer | ||||||
; { | ||||||
; | ||||||
; struct MyBuffer | ||||||
; { | ||||||
; | ||||||
; float2 Foo; ; Offset: 0 | ||||||
; float3 Bar; ; Offset: 8 | ||||||
; | ||||||
; } MyBuffer; ; Offset: 0 Size: 20 | ||||||
; | ||||||
; } | ||||||
``` | ||||||
|
||||||
As all values must be aligned to their scalar size, a scalar or element of a | ||||||
composite type may not lay across the 16 byte boundary. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is this supposed to say 4-byte? Or was this meant to refer to the legacy layout? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think the language is probably a bit redundant. If scalar objects are aligned to their size for all scalar objects under 16-bytes, it would be impossible to cross a 16-byte boundary. HLSL doesn't have any larger than 8-bytes, but I don't think we need language about not crossing a 4-byte boundary because 64-bit types would be 8-byte aligned and will cross the 4-byte boundary between 8-byte alignments. |
||||||
In the following example the double is aligned identically in both legacy and | ||||||
scalar layout, to 8 bytes, demonstrating this point. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Depending on what the prior sentence was supposed to say this might need some more explanation. |
||||||
|
||||||
``` | ||||||
[bufferlayout("legacy")] | ||||||
cbuffer MyBuffer { | ||||||
float3 Foo; | ||||||
double Bar; | ||||||
}; | ||||||
|
||||||
; cbuffer MyBuffer | ||||||
; { | ||||||
; | ||||||
; struct MyBuffer | ||||||
; { | ||||||
; | ||||||
; float3 Foo; ; Offset: 0 | ||||||
; double Bar; ; Offset: 16 | ||||||
; | ||||||
; } MyBuffer; ; Offset: 0 Size: 24 | ||||||
; | ||||||
; } | ||||||
|
||||||
[bufferlayout("scalar")] | ||||||
cbuffer MyBuffer { | ||||||
float3 Foo; | ||||||
double Bar; | ||||||
}; | ||||||
|
||||||
; cbuffer MyBuffer | ||||||
; { | ||||||
; | ||||||
; struct MyBuffer | ||||||
; { | ||||||
; | ||||||
; float3 Foo; ; Offset: 0 | ||||||
; double Bar; ; Offset: 16 | ||||||
; | ||||||
; } MyBuffer; ; Offset: 0 Size: 24 | ||||||
; | ||||||
; } | ||||||
``` | ||||||
|
||||||
Matrix types in legacy layout are significantly different to scalar layout. | ||||||
With legacy layout a matrix is laid out so each row is 16 byte aligned. | ||||||
The exception to this is a matrix of doubles, in which case a double3x and | ||||||
double4x matrix will have the rows aligned to 32 bytes. | ||||||
With scalar layouts, as with vectors, each element is aligned to it's own scalar | ||||||
size. | ||||||
The following example demonstrates the differences in matrix layout by | ||||||
showing the cbuffer loads and extractvalue instructions needed to load each | ||||||
element of a 2x2 float matrix. | ||||||
Note that the legacy example requires two 16 byte loads from the cbuffer, while | ||||||
the scalar example requires only one. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As someone not entirely familiar with DXIL, it would be useful to explain here why the legacy version needs an extra load, because it's not at all clear here. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The cbuffer storage for a matrix<T,2,2> is basically a vector<T,2>[2]. The packing rules for arrays in cbuffers are that each array index starts on a new "row" of the cbuffer. You can see this visualized here: There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah thanks, that does help quite a bit - would be good to get this explanation added to the proposal! |
||||||
|
||||||
|
||||||
``` | ||||||
[bufferlayout("legacy")] | ||||||
cbuffer MyBuffer { | ||||||
float2x2 Foo; | ||||||
}; | ||||||
|
||||||
%loadOne = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %cbufferHandle, i32 0) | ||||||
%elemOne = extractvalue %dx.types.CBufRet.f32 %loadOne, 0 | ||||||
%elemTwo = extractvalue %dx.types.CBufRet.f32 %loadOne, 1 | ||||||
%loadTwo = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %cbufferHandle, i32 1) | ||||||
%elemThree = extractvalue %dx.types.CBufRet.f32 %loadTwo, 0 | ||||||
%elemFour = extractvalue %dx.types.CBufRet.f32 %loadTwo, 1 | ||||||
|
||||||
|
||||||
[bufferlayout("scalar")] | ||||||
cbuffer MyBuffer { | ||||||
float2x2 Foo; | ||||||
}; | ||||||
|
||||||
%loadOne = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %cbufferHandle, i32 0) | ||||||
%elemOne = extractvalue %dx.types.CBufRet.f32 %loadOne, 0 | ||||||
%elemTwo = extractvalue %dx.types.CBufRet.f32 %loadOne, 1 | ||||||
%elemThree = extractvalue %dx.types.CBufRet.f32 %loadThree, 2 | ||||||
%elemFour = extractvalue %dx.types.CBufRet.f32 %loadFour, 3 | ||||||
|
||||||
``` | ||||||
|
||||||
### SPIR-V Support | ||||||
|
||||||
SPIR-V already supports this scalar layout proposal through the | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit:
Suggested change
(noting that SPIR-V can express this natively, the Vulkan extension just makes it valid to use with Vulkan) |
||||||
[VK_EXT_scalar_block_layout](https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_EXT_scalar_block_layout.html) | ||||||
extension. | ||||||
|
||||||
|
||||||
## Open Questions | ||||||
|
||||||
* Do we require an update to DXIL metadata to describe that scalar layout is in | ||||||
use per module or per buffer, if at all? | ||||||
|
||||||
|
||||||
## Transition Strategy for Breaking Changes (Optional) | ||||||
|
||||||
As the default layout of buffers is not changing without user intervention, | ||||||
there should be no breakages introduced by this change. | ||||||
What may break things is if the default layout is ever changed by a new HLSL | ||||||
version; but while that seems likely, whether and how that happens is outside | ||||||
of the scope of this proposal. | ||||||
|
||||||
If the new attribute is specified without targeting the new shader model, the | ||||||
compiler should raise an error directing the user to use a later shader model. | ||||||
|
||||||
|
||||||
## Alternatives considered (Optional) | ||||||
|
||||||
One alternative method to solve this issue would be to add something similar | ||||||
to the `#pragma pack` directive in C/C++, but this would need to be set on all | ||||||
members, and requires a more complex implementation to be fully supported, | ||||||
given the push/pop variants supported by compilers. | ||||||
It may still be worthwhile to support something like this for more control over | ||||||
layouts, but that can be considered as a separate feature. | ||||||
|
||||||
|
||||||
## Acknowledgments (Optional) | ||||||
|
||||||
- Chris Bieneman | ||||||
- Jesse Natalie | ||||||
- Contributors to | ||||||
[VK_EXT_scalar_block_layout](https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_EXT_scalar_block_layout.html) | ||||||
|
||||||
<!-- {% endraw %} --> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Alexander-Johnston you should probably add yourself as author at this point!