Refactor/stealing #8271

Vesyrak · 2023-10-13T11:56:07Z

Refactors work stealing to facilitate external algorithms. Functionally it should be the same.

This Pull Request enables more readable code, reduces side-effects and further facilitates external algorithms. It allows me to re-define which workers should be considered thieves or victims, and when to steal from these without impacting too much code.

I do recommend a second refactor taking out the WorkStealing dependency from the WorkStealingBalancer.
I recommend going for a more functional design of the class, where it returns the (recommended) movements of the tasks rather than executing them whilst stealing.
However, subsequent "thefts" depend on the state of the cluster with the previously stolen tasks. This requires wrapping the occupancy/processing/... metric calculations so that the recommended-but-not-yet-stolen tasks' new locations are considered when stealing another task. I do not yet possess the knowledge of how this works on the scheduler itself, that is why it is missing from the current PR.

I am willing to add more tests, as soon as the pipelines for tests work properly.

[ 0 / ? ] Tests added / passed
- I get a timeout on test_blocks_until_full, with a side dish of Too many open files
[ x ] Passes pre-commit run --all-files

GPUtester · 2023-10-13T11:56:10Z

Can one of the admins verify this patch?

Admins can comment ok to test to allow this one PR to run or add to allowlist to allow all future PRs from the same author to run.

github-actions · 2023-10-13T14:13:45Z

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

27 files ± 0 27 suites ±0 9h 57m 58s ⏱️ + 4m 6s
3 995 tests ± 0 3 881 ✅ - 2 110 💤 ±0 4 ❌ +2
50 241 runs +18 47 939 ✅ +17 2 293 💤 - 6 9 ❌ +7

For more details on these failures, see this check.

Results for commit 297eb88. ± Comparison against base commit 1211e79.

♻️ This comment has been updated with latest results.

Vesyrak · 2024-02-14T14:39:31Z

Could I get some support regarding the test pipelines?
I'm not able to get the tests to succeed locally (neither on barebones OS level (macos) nor in an ubuntu container (arm64 architecture though).

Reproducing the github pipelines using act fails as well, and I can't even get the pipelines to work in my fork (https://github.com/Vesyrak/distributed/actions/runs/7902375846/job/21567935269).

I already merged the fix for the existing issue with the pipelines (#8504), so that's not the cause.

Thanks in advance!

Vesyrak requested a review from fjetter as a code owner October 13, 2023 11:56

Vesyrak mentioned this pull request Dec 28, 2023

Distributed core logic maintenance #8432

Closed

hendrikmakait self-requested a review January 9, 2024 16:57

hendrikmakait added the needs review Needs review from a contributor. label Jan 9, 2024

Vesyrak force-pushed the refactor/stealing branch 2 times, most recently from 49a52d1 to 9d80eab Compare February 14, 2024 09:45

Vesyrak added 2 commits February 20, 2024 09:30

refactor(work-stealing): separated victim/thief selection from balancing

48c8984

refactor(work-stealing): separated balancing algorithm

0d980c2

Vesyrak force-pushed the refactor/stealing branch from 9d80eab to 0d980c2 Compare February 20, 2024 08:30

Merge branch 'main' into refactor/stealing

297eb88

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor/stealing #8271

Refactor/stealing #8271

Vesyrak commented Oct 13, 2023 •

edited

Loading

GPUtester commented Oct 13, 2023

github-actions bot commented Oct 13, 2023 •

edited

Loading

Vesyrak commented Feb 14, 2024

Refactor/stealing #8271

Are you sure you want to change the base?

Refactor/stealing #8271

Conversation

Vesyrak commented Oct 13, 2023 • edited Loading

GPUtester commented Oct 13, 2023

github-actions bot commented Oct 13, 2023 • edited Loading

Unit Test Results

Vesyrak commented Feb 14, 2024

Vesyrak commented Oct 13, 2023 •

edited

Loading

github-actions bot commented Oct 13, 2023 •

edited

Loading