Use Expr instead of HLG #9008
Conversation
Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

27 files ±0   27 suites ±0   11h 30m 17s ⏱️ +38s

For more details on these failures and errors, see this check.

Results for commit 462aedb. ± Comparison against base commit 4ef21ad.

This pull request removes 4 and adds 7 tests. Note that renamed tests count towards both.
♻️ This comment has been updated with latest results.
x = da.ones(10, chunks=5)
assert len(x.dask) == 2

with dask.config.set(optimizations=[c._optimize_insert_futures]):
this is no longer possible
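For context, a minimal sketch of the old pattern this test relied on: an extra low-level graph optimization injected through dask's optimizations config key and applied while the client materialized the HLG before submission. The hook below is a hypothetical no-op stand-in for Client._optimize_insert_futures, not the real method; with expression-based submission the client no longer rewrites a materialized low-level graph this way.

import dask
import dask.array as da


def insert_futures_noop(dsk, keys, **kwargs):
    # Hypothetical stand-in for the removed hook: a low-level optimization
    # receives the task dict and the requested keys and returns a
    # (possibly rewritten) task dict.
    return dsk


x = da.ones(10, chunks=5)
# Extra low-level optimizations used to be injected via config and picked up
# when the graph was materialized for submission.
with dask.config.set(optimizations=[insert_futures_noop]):
    x.sum().compute()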
The docs build failure is due to the wrong version of dask being installed.
There are two test failures that feel related, but I don't anticipate any major changes. I think this is ready for review.
# *************************************
# BELOW THIS LINE HAS TO BE SYNCHRONOUS
#
# Everything that compares the submitted graph to the current state
# has to happen in the same event loop.
# *************************************

lost_keys = self._find_lost_dependencies(dsk, dependencies, keys)

if lost_keys:
    self.report(
        {
            "op": "cancelled-keys",
            "keys": lost_keys,
            "reason": "lost dependencies",
        },
        client=client,
    )
    self.client_releases_keys(
        keys=lost_keys, client=client, stimulus_id=stimulus_id
    )
This is a nasty one and I think I'll want this in another PR. The lost-keys check is currently performed between materialization and ordering (for no reason, just chance), both of which are offloaded to give the event loop a chance to breathe.
However, that also means that keys whose loss should cause the computation to be flagged as lost can be cancelled during the ordering step, after the check has already run. Instead of receiving a CancelledError, the user will see obscure state transition issues.
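To make the race concrete, here is an illustrative, self-contained sketch, not the actual scheduler code: the helpers are made up and asyncio.to_thread stands in for the scheduler's offloading. It shows why the lost-dependency check belongs in the synchronous block after the last await rather than between the two offloaded steps.

import asyncio


# Hypothetical stand-ins for the scheduler's offloaded steps.
def materialize(graph):
    return dict(graph)


def order(dsk):
    return {key: i for i, key in enumerate(dsk)}


def find_lost_dependencies(dsk, live_keys):
    # A dependency is "lost" if it is neither in the graph nor known to be live.
    return {
        dep
        for deps in dsk.values()
        for dep in deps
        if dep not in dsk and dep not in live_keys
    }


async def handle_graph(graph, live_keys):
    # Both heavy steps are offloaded so the event loop can keep breathing.
    dsk = await asyncio.to_thread(materialize, graph)
    # Checking for lost dependencies *here* would be racy: keys can be
    # cancelled or released while the ordering step below yields control
    # to other coroutines.
    priorities = await asyncio.to_thread(order, dsk)
    # Everything that compares the submitted graph to the current state has
    # to happen after the last await, in one synchronous block.
    lost = find_lost_dependencies(dsk, live_keys)
    if lost:
        return {"op": "cancelled-keys", "keys": lost}
    return {"op": "accepted", "priorities": priorities}


# Example: "y" depends on "missing", which is neither live nor in the graph.
print(asyncio.run(handle_graph({"x": (), "y": ("missing",)}, live_keys=set())))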
From what I can tell, the remaining test failures are unrelated.
.pre-commit-config.yaml
Outdated
@@ -64,7 +64,7 @@ repos:
- tornado
- pyarrow
- urllib3
- git+https://github.com/dask/dask
- revert before merge
Overall LGTM, thanks, @fjetter
distributed/tests/test_steal.py
Outdated
df.shuffle(
    "A",
    # If we don't have enough partitions, we'll fall back to a
    # simple shuffle
    max_branch=npart - 1,
)
# Block optimizer from killing the shuffle
.map_partitions(lambda x: len(x)).sum()
Suggested change:

df.shuffle(
    "A",
    # If we don't have enough partitions, we'll fall back to a
    # simple shuffle
    max_branch=npart - 1,
    # Block optimizer from killing the shuffle
    force=True,
)
.sum()
distributed/tests/test_scheduler.py
Outdated
df.shuffle(
    "A",
    # If we don't have enough partitions, we'll fall back to a
    # simple shuffle
    max_branch=npart - 1,
)
# Block optimizer from killing the shuffle
.map_partitions(lambda x: len(x)).sum()
Suggested change:

df.shuffle(
    "A",
    # If we don't have enough partitions, we'll fall back to a
    # simple shuffle
    max_branch=npart - 1,
    # Block optimizer from killing the shuffle
    force=True,
)
.sum()
Sibling to dask/dask#11736