-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Long vector test plan #421
base: main
Are you sure you want to change the base?
Long vector test plan #421
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good so far.
We try and format these specs to 80-columns to make them easier to review in source form.
Are the tables of operations a list of all the things that'll need tests added or is that still being determined?
- For tests where we only add new test cases we do not need to update HLK test GUIDs. | ||
- If we want to add a new HLK requirement, then we will probably want to create new tests which consume that requirement. Individual test cases can not have individual HLK requirements. | ||
- The existing HLK reqs we have look pretty dated. "Device.Graphics.WDDM27.AdapterRender.D3D12.DXILCore.ShaderModel65.CoreRequirement". Add a requirement for SM 6.9? We're requiring Long Vectort support for SM 6.9 | ||
3. Do the work in two 'phases'? Add the tests. Then do the HLK side. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should build up a plan that involves multiple deliveries to the IHVs. If we can identify a set of high-priority tests (eg those needed by a particular demo) we could try and get those out first to support IHVs building drivers for the preview release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about this and if IHVs are OK with it I think the easiest way is to just share stand-alone test binaries/collateral. Can you help me sync up with the right people to figure out what works for IHVs? What have we done in the past?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, we can sort that out, but I'm more interested in what and when we'll deliver these rather than the mechanism used to do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, given that the DXC repo is public we should be able to just share build/run instructions for the tests once checked in?
Will update the formatting. The intrinsics in the table are all of the explicitly listed intrinsics in the hlsl long vector. I think some operators are missing from this table. I'll look to get everything added so we have an exhaustive list. |
|
||
# Vector Sizes to test | ||
|
||
I don't think there are any particularly interesting vector sizes to test. So I |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there are any particularly interesting vector sizes to test.
I think we might be able to come up with some.
For example, I think vector<half, 7>, vector<half, 8> and vector<half, 9> may be interesting because they cover the 128-bit boundary that has been in place most of the time before and could be interesting for load/stores as well as how they work in groupshared.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to Damyan's suggestions, what comes to mind are:
- vec5s - one above the previous vector limit. Also for the 128 boundary case in the case of 32 bit values that Damyan mentioned.
- vec16 - previously only appeared as matrices, which might introduce some complications
- vec17 - bigger than any vector previously possible
- double3, uint64_t3 - another way to just exceed the 128 boundary, but with the added complexity of 64-bit types.
- 1024 for float, half, double, and int64 - It's the maximum size after all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I've updated to reflect these test cases.
Can either of you help point me in a direction to better understand how this could be interesting when testing load/stores and how they work in groupshared?
'ExecTests' source code in the DXC repo. There is a script in the WinTools repo | ||
which generates and annotates the HLK tests. | ||
|
||
There are three test categories we are concerned with: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need special tests for groupshared?
Are there any other types of storage that might be interesting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably, I was thinking about this and I need to learn/explore more. I suspect I'm actually missing details on test case granularity. Does testing across data types matter? Do we care to verify the output of operations for correctness?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably be testing for correctness, yes. Sizes, especially around interesting alignment cases, are probably interesting. It might be interesting to build cases that could conceivably generate overruns and check for them. For example, most GPUs will operate on at least 32-bits at once, so if you have 16-bit values then what happens if you have an odd number of elements - could it accidentally overwrite the next variable if it is assuming alignment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And so the size of a type, and the element count, gets interesting. I expect you could come up with some interesting boundary cases and reuse that across all the types without having to think hard about each combination individually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Just trying to add detail and answer questions where I can.
- These are the test cases for the 'basic' HLSL operators listed in the | ||
HLSL operators table below. | ||
- We have added some test cases for these with the long vectors work, but | ||
we will want to make sure we add HLK coverage for all of these. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're saying that the HLK tests will cover the same operations as the added test cases, that's useful, but they don't add anything to HLK coverage directly as this seems to imply.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm probably confused here. I was assuming most, if not all, operators would map to some LLVM or DXIL operator? If that's true then wouldn't it make sense to have test cases in the HLK that use those operators with vector arguments?
|
||
# Vector Sizes to test | ||
|
||
I don't think there are any particularly interesting vector sizes to test. So I |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to Damyan's suggestions, what comes to mind are:
- vec5s - one above the previous vector limit. Also for the 128 boundary case in the case of 32 bit values that Damyan mentioned.
- vec16 - previously only appeared as matrices, which might introduce some complications
- vec17 - bigger than any vector previously possible
- double3, uint64_t3 - another way to just exceed the 128 boundary, but with the added complexity of 64-bit types.
- 1024 for float, half, double, and int64 - It's the maximum size after all.
| asint | Emulated | | | ||
| asint16 | Emulated | | | ||
| asuint | DXIL::OpCode::SplitDouble | | | ||
| asuint16 | Emulated | | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm mostly confused by the differing levels of detail in the "emulated" entries. Maybe they are just placeholders for eventual expansion. That's fine if so, but ultimately we want a list of DXOps and LLVM operators that we need to test. Showing how they map to the HLSL intrinsics is showing your work and that's fine, but most useful would be the list of DXIL::OpCode::* and LLVM operators.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'emulated' was essentially a placeholder. When I first started going through the list of HLSL intrinsics I was taking some notes on the LLVM/DXIL ops that I could see it distilled down to. I got stuck on several and gave up and went with 'emulated' for now.
I'll go back through everything that is marked as 'emulated' so we have an inventory of the operators used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good - there's still some open issues.
I'm also keen to see a prioritization of which tests will be implemented first.
These operations are good candidates for high-priority tests I think:
- Initializing a vector with another.
- Multiply all components of a vector with a scalar value
- Add all components of a vector with a scalar value
- Clamp all components of a vector to the range [c, t]
- Component wise minimum between 2 vectors
- Component wise maximum between 2 vectors
- Component wise multiply between 2 vectors
- Component wise add between 2 vectors
- Subscript access, vec[i] = c
their mapped DXIL OpCodes. We just need coverage for each DXIL OpCode. If | ||
the DXIL OpCode column lists 'emulated' then this means that there are | ||
multiple LLVM/DXIL Ops used to compute the intrinsic. We will need to do an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the DXIL OpCode column lists 'emulated' then this means that there are multiple LLVM/DXIL Ops used to compute the intrinsic.
I think you've changed how this appears in the document now?
vectors across buffer types. __TODO: Add details about which types of | ||
buffers to test and why.__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: Add details about which types of buffers to test and why
This comment here to remind us that there's a TODO. If we want to complete the PR without doing the TODO then we should get a follow-up issue filed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this include vector load/stores as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to fill in the details before completing this PR.
| atan2 | Atan | CreateFDiv, CreateFAdd, CreateFSub, CreateFCmpOLT, | ||
||| CreateFCpmOEQ, CreateFCmpOGE, CreateFCmpOLT, CreateAnd, CreateSelect | | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing that these "CreateXXX" function names came from the C++ code. The LLVM instructions are things like fdiv
, fadd
, fcmp
, and
, select
etc.
I also note that if I try atan2
in godbolt it doesn't actually generate fsub
. https://godbolt.org/z/7xqz6448P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, they're the C++ helper functions. I was intending to distill down further on a subsequent iteration. Wanted to sanity check that they do what I would expect (always return the llvm instruction in the name).
I'll have to peel a little more in TranslateAtan2, for this 'inventory' I went through the various Translate* functions in HLSLOperationLower and noted the possible operations and instructions. For TranslateAtan2 it looks like FSub is always used, so it must be optimized or simplified out along the way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Asked for some help. The FSub is optimized out to normalize to FAdd. If you pass '-O0' instead you'll see that the FSub is there.
But this makes me wonder about the right testing approach to ensure coverage. Given that the goal is to test that DXIL Ops and LLVM Instructions are functional with vectors, maybe we want to disable optimizations for the test cases?
- vector<TYPE, 16> : This size of 'vector' previously only appeared as matrices. | ||
- vector<TYPE, 17> : Larger than any vector previously possible. | ||
- vector<TYPE, 1024> : The new max size of a vector. Test for float, half, | ||
double, and int64. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
" Test for float, half, double, and int64.'" this is for all the above sizes right? What about packed types?
.