Performance gap in sgemm_wmma gpu #256

Open
xiefan46 opened this issue Mar 3, 2025 · 2 comments

Comments


xiefan46 commented Mar 3, 2025

Image

I found a big performance gap between the custom sgemm_wmma implementation and the cuBLAS implementation on an A100 GPU. I tried increasing the number of stages to 10, but it didn't seem to help.
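A gap like this is usually quantified by converting each kernel's measured time into achieved TFLOPS, using the standard 2·M·N·K floating-point operation count for a GEMM. The following is only a minimal sketch of that arithmetic; the function names, matrix sizes, and timings are hypothetical and are not taken from the screenshot or from the repository's benchmark script.

```python
def gemm_tflops(m: int, n: int, k: int, elapsed_ms: float) -> float:
    """Achieved TFLOPS for one M x N x K GEMM that took elapsed_ms milliseconds."""
    # Each of the M*N outputs needs K multiplies and K adds: 2*M*N*K FLOPs.
    flops = 2.0 * m * n * k
    return flops / (elapsed_ms * 1e-3) / 1e12


def relative_gap(custom_ms: float, reference_ms: float) -> float:
    """How many times slower the custom kernel is than the reference kernel."""
    return custom_ms / reference_ms


if __name__ == "__main__":
    # Hypothetical problem size and timings, for illustration only.
    m = n = k = 4096
    custom_ms, cublas_ms = 4.0, 2.0
    print(f"custom: {gemm_tflops(m, n, k, custom_ms):.2f} TFLOPS")
    print(f"cuBLAS: {gemm_tflops(m, n, k, cublas_ms):.2f} TFLOPS")
    print(f"gap:    {relative_gap(custom_ms, cublas_ms):.2f}x")
```

Reporting both numbers at the same M, N, K makes it easier to tell whether the gap is constant or grows with problem size.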

Member

DefTruth commented Mar 3, 2025

The performance of sgemm_wmma has not been fully optimized yet. You are welcome to submit a PR with optimizations.

Author

xiefan46 commented Mar 3, 2025

@DefTruth Sure, let me take a look. Any idea where the gap comes from?
