Add thread throttling profile for DGEMV on NEOVERSEV1 #5175
base: develop
Please help with review.
e311258 to d7a2b6b
interface/gemv.c
Outdated
@@ -89,6 +89,24 @@ static inline int get_gemv_optimal_nthreads_neoversev2(BLASLONG MN, int ncpu) {
}
#endif

//thread throttling for dgemv
#if defined(DYNAMIC_ARCH) || defined(NEOVERSEV1)
static inline int get_dgemv_optimal_nthreads_neoversev1(BLASLONG MN, int ncpu) {
Instead of defining a new function, I think it is cleaner to just use get_gemv_optimal_nthreads_<uarch>. Inside get_gemv_optimal_nthreads_<uarch>, we can #ifdef DOUBLE.
Yes, please keep this inside the existing function.
Done as suggested
interface/gemv.c
Outdated
@@ -98,6 +116,8 @@ static inline int get_gemv_optimal_nthreads(BLASLONG MN) {
#endif
#if defined(NEOVERSEV1) && !defined(COMPLEX) && !defined(DOUBLE) && !defined(BFLOAT16) |
- #if defined(NEOVERSEV1) && !defined(COMPLEX) && !defined(DOUBLE) && !defined(BFLOAT16)
+ #if defined(NEOVERSEV1) && !defined(COMPLEX) && !defined(BFLOAT16)
Done
interface/gemv.c
Outdated
@@ -98,6 +116,8 @@ static inline int get_gemv_optimal_nthreads(BLASLONG MN) {
#endif
#if defined(NEOVERSEV1) && !defined(COMPLEX) && !defined(DOUBLE) && !defined(BFLOAT16)
  return get_gemv_optimal_nthreads_neoversev1(MN, ncpu);
#elif defined(NEOVERSEV1) && !defined(COMPLEX) && defined(DOUBLE) && !defined(BFLOAT16)
  return get_dgemv_optimal_nthreads_neoversev1(MN, ncpu);
We can remove.
Removed as suggested
interface/gemv.c
Outdated
@@ -98,6 +116,8 @@ static inline int get_gemv_optimal_nthreads(BLASLONG MN) {
#endif
#if defined(NEOVERSEV1) && !defined(COMPLEX) && !defined(DOUBLE) && !defined(BFLOAT16)
  return get_gemv_optimal_nthreads_neoversev1(MN, ncpu);
#elif defined(NEOVERSEV1) && !defined(COMPLEX) && defined(DOUBLE) && !defined(BFLOAT16)
  return get_dgemv_optimal_nthreads_neoversev1(MN, ncpu);
#elif defined(NEOVERSEV2) && !defined(COMPLEX) && !defined(DOUBLE) && !defined(BFLOAT16)
  return get_gemv_optimal_nthreads_neoversev2(MN, ncpu);
#elif defined(DYNAMIC_ARCH) && !defined(COMPLEX) && !defined(DOUBLE) && !defined(BFLOAT16)
- #elif defined(DYNAMIC_ARCH) && !defined(COMPLEX) && !defined(DOUBLE) && !defined(BFLOAT16)
+ #elif defined(DYNAMIC_ARCH) && !defined(COMPLEX) && !defined(BFLOAT16)
Done as suggested
interface/gemv.c
Outdated
if (strcmp(gotoblas_corename(), "neoversev1") == 0) {
  return get_dgemv_optimal_nthreads_neoversev1(MN, ncpu);
}
We can remove.
Removed as suggested
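For readers unfamiliar with the run-time check quoted above, here is a minimal sketch of that dispatch pattern in a DYNAMIC_ARCH-style build. It is an illustration under assumptions, not the code merged in this PR: the core name is passed in as a parameter instead of calling gotoblas_corename(), and nthreads_neoversev1_stub and pick_nthreads are hypothetical names with a placeholder profile.

```c
#include <string.h>

/* Stand-in type so the sketch compiles on its own; OpenBLAS defines BLASLONG itself. */
typedef long BLASLONG;

/* Hypothetical placeholder profile: caps small problems at 24 threads. */
static int nthreads_neoversev1_stub(BLASLONG MN, int ncpu) {
  return MN < 435600L ? (ncpu < 24 ? ncpu : 24) : ncpu;
}

/* Hypothetical dispatcher. `core` plays the role of the string that
   gotoblas_corename() would return in a real DYNAMIC_ARCH build. */
static int pick_nthreads(const char *core, BLASLONG MN, int ncpu) {
  if (strcmp(core, "neoversev1") == 0) {
    return nthreads_neoversev1_stub(MN, ncpu);
  }
  /* Any other core: no throttling in this sketch. */
  return ncpu;
}
```

With this shape, one helper per microarchitecture is enough, and the precision-specific thresholds can live inside that helper.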
interface/gemv.c
Outdated
: MN < 435600L ? MIN(ncpu, 24)
: MN < 810000L ? MIN(ncpu, 32)
: MN < 1050625 ? MIN(ncpu, 40)
: ncpu; |
Could we please move this inside get_gemv_optimal_nthreads_neoversev1 and use #ifdef DOUBLE?
Done
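A rough sketch of what the requested layout might look like: one get_gemv_optimal_nthreads_neoversev1 with the double-precision tiers selected by #ifdef DOUBLE. Only the upper DGEMV tiers are quoted in this thread, so the handling of smaller MN and the whole single-precision branch are placeholders, and the BLASLONG typedef, MIN macro, and DOUBLE define are stand-ins added so the sketch compiles on its own.

```c
/* Stand-ins: OpenBLAS supplies BLASLONG and MIN, and its build system
   defines DOUBLE when compiling the double-precision kernels. */
typedef long BLASLONG;
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#define DOUBLE

static inline int get_gemv_optimal_nthreads_neoversev1(BLASLONG MN, int ncpu) {
#ifdef DOUBLE
  /* Upper tiers as quoted in the review. The tiers for smaller MN are not
     visible in this thread, so this sketch lumps them under the 24-thread cap. */
  return MN < 435600L  ? MIN(ncpu, 24)
       : MN < 810000L  ? MIN(ncpu, 32)
       : MN < 1050625L ? MIN(ncpu, 40)
       : ncpu;
#else
  /* Placeholder: the single-precision profile is not shown in this thread. */
  return ncpu;
#endif
}
```

Keeping both precisions in one function means the caller's preprocessor guard no longer needs a !defined(DOUBLE) condition, which matches the suggested changes above.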
8e289ec replaces d7a2b6b
@shubhamsvc Thank you for the changes. LGTM.
This PR introduces thread throttling for DGEMV on Neoverse V1.
Benchmarking results for matrix sizes [2,1024]:
Machine: AWS Graviton3 Processor
- dgemv_n: Geometric mean speedup of 2.2x

- dgemv_t: Geometric mean speedup of 2.7x
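For reference, a geometric mean speedup is conventionally computed as shown below. This assumes s_i is the ratio of baseline run time to new run time for the i-th matrix size and n is the number of sizes benchmarked; the PR does not state exactly how its figures were derived.

```latex
\[
\bar{s}_{\mathrm{geo}}
  = \left( \prod_{i=1}^{n} s_i \right)^{1/n}
  = \exp\!\left( \frac{1}{n} \sum_{i=1}^{n} \ln s_i \right)
\]
```

The geometric mean is the usual choice for averaging ratios because a 2x speedup and a 2x slowdown cancel to 1x.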
