QKV Fine-grained Tiling
What's Changed
- [ELU] support ELU F32/F16 kernel✔️ by @southkarl in https://github.com/DefTruth/CUDA-Learn-Notes/pull/194
- [HARDSHRINK][FP16] support HARDSHRINK F32/FP16 kernel by @southkarl in https://github.com/DefTruth/CUDA-Learn-Notes/pull/195
- [swizzle] update smem swizzle layout tools✔️ by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/196
- [swizzle] update smem swizzle layout tools✔️ by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/197
- [swizzle] add padding -> swizzle layout tools🎉 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/198
- [HGEMM] HGEMM TN A&B SMEM Swizzle✔️ by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/199
- [HGEMM] HGEMM TN A&B SMEM Swizzle✔️ by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/200
- [FA2] shared-qkv + HMMA F32F16F16F32✔️ by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/201
- [HARDSWISH] HARDSWISH F32/F16 kernel✔️ by @southkarl in https://github.com/DefTruth/CUDA-Learn-Notes/pull/202
- [FA2] kOStorageAccFloat32 flag -> shared-qkv✔️ by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/203
- [FA2] kOStorageAccFloat32 -> share-qkv✔️ by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/204
- [FA2] kOStorageAccFloat32 -> share-qkv✔️ by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/205
- [FA2] share-kv + MMA F32F16F16F16F32✔️ by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/206
- [FA2] share-kv + MMA F32F16F16F16F32✔️ by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/207
- [FA2] tiling-kv + MMA F32F16F16F16F32✔️ by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/208
- [FA2] tiling-qkv + MMA F32F16F16F32✔️ by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/209
- [FA2] tiling-qkv + MMA F32F16F16F32✔️ by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/210
- [FA2] flash-attn-mma fully tiling-qkv🎉 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/211
- [FA2] flash-attn-mma fully tiling-qkv🎉 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/212
- [FA2] tiling-qkv F32/F16 + swizzle q/qk/qkv🎉 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/213
New Contributors
- @southkarl made their first contribution in https://github.com/DefTruth/CUDA-Learn-Notes/pull/194
Full Changelog: https://github.com/DefTruth/CUDA-Learn-Notes/compare/v2.6.13...v2.6.14