[Tiered Cache] Using a single cache manager for all ehcache disk caches #17513
base: main
Conversation
Signed-off-by: Sagar Upadhyaya <[email protected]>
Signed-off-by: Sagar Upadhyaya <[email protected]>
Signed-off-by: Sagar <[email protected]>
❌ Gradle check result for 69bbc69: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Signed-off-by: Sagar <[email protected]>
❌ Gradle check result for c332f1b: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Signed-off-by: Sagar Upadhyaya <[email protected]>
❌ Gradle check result for c13dcfa: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
 * @param threadPoolAlias alias for disk thread pool
 * @return persistent cache manager
 */
public synchronized static PersistentCacheManager getCacheManager(
Why do we need the synchronized keyword together with computeIfAbsent?
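For illustration, a minimal sketch (placeholder types, not the PR's actual code) of why computeIfAbsent on a concurrent map usually needs no extra synchronized: the mapping function is applied atomically, at most once per key.

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class CacheManagerRegistrySketch {
    private static final ConcurrentMap<String, Object> MANAGERS = new ConcurrentHashMap<>();

    // Concurrent callers for the same key observe a single manager instance;
    // ConcurrentHashMap guarantees the mapping function runs atomically per key.
    static Object getCacheManager(String cacheType) {
        return MANAGERS.computeIfAbsent(cacheType, type -> createManager(type));
    }

    private static Object createManager(String cacheType) {
        return new Object(); // stand-in for building a PersistentCacheManager
    }
}

The synchronized keyword would still be needed if the method does more than the single computeIfAbsent call, for example if it also mutates other shared state.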
public class EhcacheDiskCacheManager {

    // Defines one cache manager per cache type.
    private static final Map<CacheType, Tuple<PersistentCacheManager, AtomicInteger>> cacheManagerMap = new HashMap<>();
Should the interface be ConcurrentMap and the implementation ConcurrentHashMap?
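As a rough sketch of the suggested declaration (with stand-in types for CacheType, PersistentCacheManager and Tuple, which come from the PR):

import java.util.AbstractMap.SimpleEntry;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

class EhcacheDiskCacheManagerSketch {
    // Declare against the ConcurrentMap interface and back it with a
    // ConcurrentHashMap so computeIfAbsent/compute are atomic per key.
    private static final ConcurrentMap<String, SimpleEntry<Object, AtomicInteger>> cacheManagerMap =
        new ConcurrentHashMap<>();
}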
int referenceCount = cacheManagerMap.get(cacheType).v2().decrementAndGet();
// All caches associated with this cache manager have been closed, so close the manager as well.
if (referenceCount == 0) {
    try {
        cacheManager.close();
Is it possible that we try to createCache between reading referenceCount and cacheManager.close? How are we avoiding that situation?
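One possible way to avoid that window, sketched here with placeholder types rather than the PR's code, is to do the reference-count update and the close inside compute() on the same key, so a concurrent createCache for that cache type cannot interleave:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class RefCountedManagersSketch {
    // value = { manager, referenceCount }; a real implementation would use a small holder class
    private static final ConcurrentMap<String, Object[]> MANAGERS = new ConcurrentHashMap<>();

    static Object acquire(String cacheType) {
        Object[] entry = MANAGERS.compute(cacheType, (type, current) -> {
            if (current == null) {
                return new Object[] { createManager(type), 1 }; // first cache for this type
            }
            current[1] = (int) current[1] + 1; // another cache reuses the existing manager
            return current;
        });
        return entry[0];
    }

    static void release(String cacheType) {
        MANAGERS.compute(cacheType, (type, current) -> {
            if (current == null) {
                return null; // nothing to release
            }
            int remaining = (int) current[1] - 1;
            if (remaining == 0) {
                closeManager(current[0]); // closed while still holding the per-key lock
                return null;              // mapping removed atomically with the close
            }
            current[1] = remaining;
            return current;
        });
    }

    private static Object createManager(String cacheType) {
        return new Object(); // stand-in for building a PersistentCacheManager
    }

    private static void closeManager(Object manager) {
        // stand-in for cacheManager.close()
    }
}

Alternatively, keeping the existing synchronized methods works as long as the create path and the close path take the same lock; the key point is that the decrement, the zero check, and the close happen as one atomic step relative to createCache.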
(key) -> Setting.intSetting(
    key,
    max(2, Runtime.getRuntime().availableProcessors() / 8),
    1,
    Runtime.getRuntime().availableProcessors(),
    NodeScope
)
Reducing the number of threads can cause queuing of disk write operations, unless there is some batching logic within Ehcache. IMO, it is desirable for tiered caching to write the computed results to disk as soon as possible. The number of threads is not worrisome as long as those threads are not compute intensive, which in this case they are not.
I am really curious whether we have observed any cases via hot_threads / flamegraphs that confirm the disk write threads being responsible for CPU spikes. These threads should be I/O bound, and I wouldn't really expect them to cause an observable CPU spike.
The default of 2 looks really low to me. Assuming …
Not yet. We don't have a performance test that can reproduce this scenario. We ran our OSB benchmark with and without the change, and both were pretty similar in terms of performance (latency p50, p90, etc.).
We can discuss the default and increase it further. But the main objective of this change is to have a way to increase/decrease the number of disk write threads when needed, irrespective of the number of N partitions we create within the tiered cache. Right now, each disk cache object has its own write thread pool, and when we create N (CPU * 1.5) segments/disk cache objects, we essentially create (N * 4) disk write threads, which seems unnecessary and can cause unknown problems, and it is not possible to configure this to <= (CPU * 1.5).
Description
Earlier, when creating N ehcache disk caches, we created them via N cache managers, each with its own disk write thread pool (so N pools in total). We create N disk caches based on the tiered cache setting, and N is derived from the number of CPU cores. So essentially we were creating
(CPU_CORE * 4)
disk write threads, which is a lot and can cause CPU spikes with the tiered cache enabled. This change creates a single cache manager, and all subsequent caches are created via this single manager. As a result there is only one disk write thread pool, which defaults to
max(2, CPU_CORES / 8)
threads and whose size can be tuned via the new setting (bounded between 1 and CPU_CORES, per the setting shown above).
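For illustration only, a rough sketch (not the PR diff) of how a single shared PersistentCacheManager with one pooled disk-write thread pool can be wired up using Ehcache 3's builder API; the alias, path, and thread-count arguments here are placeholders:

import java.io.File;

import org.ehcache.PersistentCacheManager;
import org.ehcache.config.builders.CacheManagerBuilder;
import org.ehcache.config.builders.PooledExecutionServiceConfigurationBuilder;

public class SharedDiskCacheManagerSketch {
    public static PersistentCacheManager build(String storagePath, String threadPoolAlias, int diskWriteThreads) {
        return CacheManagerBuilder.newCacheManagerBuilder()
            // One persistence root for all disk caches created through this manager.
            .with(CacheManagerBuilder.persistence(new File(storagePath)))
            // One shared pool for disk writes; caches created later reference the alias
            // instead of each bringing their own thread pool.
            .using(
                PooledExecutionServiceConfigurationBuilder.newPooledExecutionServiceConfigurationBuilder()
                    .defaultPool("default", 1, 1)
                    .pool(threadPoolAlias, diskWriteThreads, diskWriteThreads)
                    .build()
            )
            .build(true);
    }
}

Each disk cache created through this manager then points at threadPoolAlias for its disk writes, which is what keeps the total number of write threads fixed regardless of how many tiered-cache segments are created.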
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
Check List
[ ] Public documentation issue/PR created, if applicable.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.