Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Long vector test plan #421

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

alsepkow
Copy link
Collaborator

@alsepkow alsepkow commented Mar 7, 2025

.

Copy link
Member

@damyanp damyanp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far.

We try and format these specs to 80-columns to make them easier to review in source form.

Are the tables of operations a list of all the things that'll need tests added or is that still being determined?

- For tests where we only add new test cases we do not need to update HLK test GUIDs.
- If we want to add a new HLK requirement, then we will probably want to create new tests which consume that requirement. Individual test cases can not have individual HLK requirements.
- The existing HLK reqs we have look pretty dated. "Device.Graphics.WDDM27.AdapterRender.D3D12.DXILCore.ShaderModel65.CoreRequirement". Add a requirement for SM 6.9? We're requiring Long Vectort support for SM 6.9
3. Do the work in two 'phases'? Add the tests. Then do the HLK side.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should build up a plan that involves multiple deliveries to the IHVs. If we can identify a set of high-priority tests (eg those needed by a particular demo) we could try and get those out first to support IHVs building drivers for the preview release.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about this and if IHVs are OK with it I think the easiest way is to just share stand-alone test binaries/collateral. Can you help me sync up with the right people to figure out what works for IHVs? What have we done in the past?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, we can sort that out, but I'm more interested in what and when we'll deliver these rather than the mechanism used to do it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, given that the DXC repo is public we should be able to just share build/run instructions for the tests once checked in?

@alsepkow
Copy link
Collaborator Author

Looks good so far.

We try and format these specs to 80-columns to make them easier to review in source form.

Are the tables of operations a list of all the things that'll need tests added or is that still being determined?

Will update the formatting. The intrinsics in the table are all of the explicitly listed intrinsics in the hlsl long vector. I think some operators are missing from this table. I'll look to get everything added so we have an exhaustive list.

@alsepkow alsepkow marked this pull request as ready for review March 12, 2025 17:49

# Vector Sizes to test

I don't think there are any particularly interesting vector sizes to test. So I
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there are any particularly interesting vector sizes to test.

I think we might be able to come up with some.

For example, I think vector<half, 7>, vector<half, 8> and vector<half, 9> may be interesting because they cover the 128-bit boundary that has been in place most of the time before and could be interesting for load/stores as well as how they work in groupshared.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to Damyan's suggestions, what comes to mind are:

  • vec5s - one above the previous vector limit. Also for the 128 boundary case in the case of 32 bit values that Damyan mentioned.
  • vec16 - previously only appeared as matrices, which might introduce some complications
  • vec17 - bigger than any vector previously possible
  • double3, uint64_t3 - another way to just exceed the 128 boundary, but with the added complexity of 64-bit types.
  • 1024 for float, half, double, and int64 - It's the maximum size after all.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I've updated to reflect these test cases.

Can either of you help point me in a direction to better understand how this could be interesting when testing load/stores and how they work in groupshared?

'ExecTests' source code in the DXC repo. There is a script in the WinTools repo
which generates and annotates the HLK tests.

There are three test categories we are concerned with:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need special tests for groupshared?

Are there any other types of storage that might be interesting?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, I was thinking about this and I need to learn/explore more. I suspect I'm actually missing details on test case granularity. Does testing across data types matter? Do we care to verify the output of operations for correctness?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably be testing for correctness, yes. Sizes, especially around interesting alignment cases, are probably interesting. It might be interesting to build cases that could conceivably generate overruns and check for them. For example, most GPUs will operate on at least 32-bits at once, so if you have 16-bit values then what happens if you have an odd number of elements - could it accidentally overwrite the next variable if it is assuming alignment?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And so the size of a type, and the element count, gets interesting. I expect you could come up with some interesting boundary cases and reuse that across all the types without having to think hard about each combination individually.

Copy link
Member

@pow2clk pow2clk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Just trying to add detail and answer questions where I can.

- These are the test cases for the 'basic' HLSL operators listed in the
HLSL operators table below.
- We have added some test cases for these with the long vectors work, but
we will want to make sure we add HLK coverage for all of these.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're saying that the HLK tests will cover the same operations as the added test cases, that's useful, but they don't add anything to HLK coverage directly as this seems to imply.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm probably confused here. I was assuming most, if not all, operators would map to some LLVM or DXIL operator? If that's true then wouldn't it make sense to have test cases in the HLK that use those operators with vector arguments?


# Vector Sizes to test

I don't think there are any particularly interesting vector sizes to test. So I
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to Damyan's suggestions, what comes to mind are:

  • vec5s - one above the previous vector limit. Also for the 128 boundary case in the case of 32 bit values that Damyan mentioned.
  • vec16 - previously only appeared as matrices, which might introduce some complications
  • vec17 - bigger than any vector previously possible
  • double3, uint64_t3 - another way to just exceed the 128 boundary, but with the added complexity of 64-bit types.
  • 1024 for float, half, double, and int64 - It's the maximum size after all.

| asint | Emulated | |
| asint16 | Emulated | |
| asuint | DXIL::OpCode::SplitDouble | |
| asuint16 | Emulated | |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm mostly confused by the differing levels of detail in the "emulated" entries. Maybe they are just placeholders for eventual expansion. That's fine if so, but ultimately we want a list of DXOps and LLVM operators that we need to test. Showing how they map to the HLSL intrinsics is showing your work and that's fine, but most useful would be the list of DXIL::OpCode::* and LLVM operators.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'emulated' was essentially a placeholder. When I first started going through the list of HLSL intrinsics I was taking some notes on the LLVM/DXIL ops that I could see it distilled down to. I got stuck on several and gave up and went with 'emulated' for now.

I'll go back through everything that is marked as 'emulated' so we have an inventory of the operators used.

Copy link
Member

@damyanp damyanp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good - there's still some open issues.

I'm also keen to see a prioritization of which tests will be implemented first.

These operations are good candidates for high-priority tests I think:

  • Initializing a vector with another.
  • Multiply all components of a vector with a scalar value
  • Add all components of a vector with a scalar value
  • Clamp all components of a vector to the range [c, t]
  • Component wise minimum between 2 vectors
  • Component wise maximum between 2 vectors
  • Component wise multiply between 2 vectors
  • Component wise add between 2 vectors
  • Subscript access, vec[i] = c

Comment on lines +12 to +14
their mapped DXIL OpCodes. We just need coverage for each DXIL OpCode. If
the DXIL OpCode column lists 'emulated' then this means that there are
multiple LLVM/DXIL Ops used to compute the intrinsic. We will need to do an
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the DXIL OpCode column lists 'emulated' then this means that there are multiple LLVM/DXIL Ops used to compute the intrinsic.

I think you've changed how this appears in the document now?

Comment on lines +25 to +26
vectors across buffer types. __TODO: Add details about which types of
buffers to test and why.__
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: Add details about which types of buffers to test and why

This comment here to remind us that there's a TODO. If we want to complete the PR without doing the TODO then we should get a follow-up issue filed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this include vector load/stores as well?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to fill in the details before completing this PR.

Comment on lines +133 to +134
| atan2 | Atan | CreateFDiv, CreateFAdd, CreateFSub, CreateFCmpOLT,
||| CreateFCpmOEQ, CreateFCmpOGE, CreateFCmpOLT, CreateAnd, CreateSelect | |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing that these "CreateXXX" function names came from the C++ code. The LLVM instructions are things like fdiv, fadd, fcmp, and, select etc.

I also note that if I try atan2 in godbolt it doesn't actually generate fsub. https://godbolt.org/z/7xqz6448P

Copy link
Collaborator Author

@alsepkow alsepkow Mar 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they're the C++ helper functions. I was intending to distill down further on a subsequent iteration. Wanted to sanity check that they do what I would expect (always return the llvm instruction in the name).

I'll have to peel a little more in TranslateAtan2, for this 'inventory' I went through the various Translate* functions in HLSLOperationLower and noted the possible operations and instructions. For TranslateAtan2 it looks like FSub is always used, so it must be optimized or simplified out along the way?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Asked for some help. The FSub is optimized out to normalize to FAdd. If you pass '-O0' instead you'll see that the FSub is there.

But this makes me wonder about the right testing approach to ensure coverage. Given that the goal is to test that DXIL Ops and LLVM Instructions are functional with vectors, maybe we want to disable optimizations for the test cases?

- vector<TYPE, 16> : This size of 'vector' previously only appeared as matrices.
- vector<TYPE, 17> : Larger than any vector previously possible.
- vector<TYPE, 1024> : The new max size of a vector. Test for float, half,
double, and int64.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

" Test for float, half, double, and int64.'" this is for all the above sizes right? What about packed types?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

4 participants