
Use Expr instead of HLG #9008

Merged: 2 commits merged into dask:main on Mar 24, 2025

Conversation

@fjetter (Member) commented Feb 13, 2025

Sibling to dask/dask#11736
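
For readers of this thread, here is a minimal, hedged sketch (not code from this PR; the cluster setup is purely illustrative) of what the change means in practice: the user-facing API stays the same, but the client ships the collection's underlying expression instead of a HighLevelGraph, and the scheduler materializes and orders the graph itself.

    # Hedged illustration only; user-facing behaviour is unchanged.
    import dask.array as da
    from distributed import Client

    client = Client()                  # assumes a throwaway local cluster
    x = da.ones(10, chunks=5).sum()

    # Before: the client built and shipped a HighLevelGraph.
    # After: the expression behind `x` is serialized as-is, and the scheduler
    # performs materialization and ordering (see the discussion further down).
    result = client.compute(x).result()
    assert result == 10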

github-actions bot (Contributor) commented Feb 13, 2025

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

    27 files  ± 0      27 suites  ±0   11h 30m 17s ⏱️ +38s
 4 106 tests + 3   3 990 ✅ ± 0    112 💤 + 1  3 ❌ +1  1 🔥 +1 
51 479 runs  +38  49 166 ✅ +23  2 309 💤 +13  3 ❌ +1  1 🔥 +1 

For more details on these failures and errors, see this check.

Results for commit 462aedb. ± Comparison against base commit 4ef21ad.

This pull request removes 4 and adds 7 tests. Note that renamed tests count towards both.
distributed.tests.test_client ‑ test_auto_normalize_collection
distributed.tests.test_client ‑ test_auto_normalize_collection_sync
distributed.tests.test_client ‑ test_worker_clients_do_not_claim_ownership_of_serialize_futures[False]
distributed.tests.test_client ‑ test_worker_clients_do_not_claim_ownership_of_serialize_futures[True]
distributed.tests.test_client ‑ test_compute_no_collection_or_future
distributed.tests.test_client ‑ test_submit_persisted_collection_as_argument[False]
distributed.tests.test_client ‑ test_submit_persisted_collection_as_argument[True]
distributed.tests.test_client ‑ test_worker_clients_do_not_claim_ownership_of_serialize_futures[False-False]
distributed.tests.test_client ‑ test_worker_clients_do_not_claim_ownership_of_serialize_futures[False-True]
distributed.tests.test_client ‑ test_worker_clients_do_not_claim_ownership_of_serialize_futures[True-True]
distributed.tests.test_dask_collections ‑ test_persist

♻️ This comment has been updated with latest results.

x = da.ones(10, chunks=5)
assert len(x.dask) == 2

with dask.config.set(optimizations=[c._optimize_insert_futures]):
@fjetter (Member Author):

This is no longer possible.

@fjetter (Member Author) commented Mar 19, 2025

The docs build failure is due to the wrong version of dask being installed.

@fjetter marked this pull request as ready for review on March 20, 2025, 13:33
@fjetter (Member Author) commented Mar 20, 2025

There are two test failures that feel related, but I don't anticipate any major changes. I think this is ready for review.

@fjetter (Member Author) commented Mar 20, 2025

test_worker_clients_do_not_claim_ownership_of_serialize_futures is still not fixed. It's certainly a race condition, but the events apparently didn't help... 💢

Comment on lines +4887 to +4908
# *************************************
# BELOW THIS LINE HAS TO BE SYNCHRONOUS
#
# Everything that compares the submitted graph to the current state
# has to happen in the same event loop.
# *************************************

lost_keys = self._find_lost_dependencies(dsk, dependencies, keys)

if lost_keys:
self.report(
{
"op": "cancelled-keys",
"keys": lost_keys,
"reason": "lost dependencies",
},
client=client,
)
self.client_releases_keys(
keys=lost_keys, client=client, stimulus_id=stimulus_id
)

@fjetter (Member Author):

This is a nasty one and I think I'll want to handle it in another PR. The lost-keys check is currently performed between materialization and ordering (for no particular reason, just chance), both of which are offloaded to give the event loop a chance to breathe.
However, that also means tasks can be cancelled during the ordering step that would otherwise cause the computation to be flagged as lost. Instead of receiving a CancelledError, the user will see obscure state-transition issues.
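
For illustration only, a minimal asyncio sketch of the race described above; the names are hypothetical and this is not the actual scheduler code:

    import asyncio

    async def update_graph_sketch(dependencies, live_keys):
        # The lost-dependency check currently runs here, before ordering...
        lost = {k for k, deps in dependencies.items() if not deps <= live_keys}
        # ...but ordering is offloaded, so the event loop may run other
        # handlers (e.g. a client releasing keys) while we wait.
        await asyncio.sleep(0)  # stand-in for the offloaded ordering step
        # By the time the graph is applied to scheduler state, `lost` can be
        # stale: a key cancelled during ordering is never flagged as lost, so
        # the user sees odd state transitions instead of a CancelledError.
        return lost

    asyncio.run(update_graph_sketch({"y": {"x"}}, live_keys={"x"}))

As the comment in the quoted block says, everything that compares the submitted graph to current scheduler state has to happen in the same event-loop tick, i.e. after the last offloaded step.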

@fjetter (Member Author) commented Mar 21, 2025

From what I can tell, the remaining test failures are unrelated

@@ -64,7 +64,7 @@ repos:
- tornado
- pyarrow
- urllib3
- git+https://github.com/dask/dask
@fjetter (Member Author):

  • revert before merge

@hendrikmakait (Member) left a comment:

Overall LGTM, thanks, @fjetter

Comment on lines 1086 to 1093
df.shuffle(
"A",
# If we don't have enough partitions, we'll fall back to a
# simple shuffle
max_branch=npart - 1,
)
# Block optimizer from killing the shuffle
.map_partitions(lambda x: len(x)).sum()
@hendrikmakait (Member):

Suggested change: replace

    df.shuffle(
        "A",
        # If we don't have enough partitions, we'll fall back to a
        # simple shuffle
        max_branch=npart - 1,
    )
    # Block optimizer from killing the shuffle
    .map_partitions(lambda x: len(x)).sum()

with

    df.shuffle(
        "A",
        # If we don't have enough partitions, we'll fall back to a
        # simple shuffle
        max_branch=npart - 1,
        # Block optimizer from killing the shuffle
        force=True,
    )
    .sum()

Comment on lines 2827 to 2834
df.shuffle(
"A",
# If we don't have enough partitions, we'll fall back to a
# simple shuffle
max_branch=npart - 1,
)
# Block optimizer from killing the shuffle
.map_partitions(lambda x: len(x)).sum()
@hendrikmakait (Member):

Suggested change: replace

    df.shuffle(
        "A",
        # If we don't have enough partitions, we'll fall back to a
        # simple shuffle
        max_branch=npart - 1,
    )
    # Block optimizer from killing the shuffle
    .map_partitions(lambda x: len(x)).sum()

with

    df.shuffle(
        "A",
        # If we don't have enough partitions, we'll fall back to a
        # simple shuffle
        max_branch=npart - 1,
        # Block optimizer from killing the shuffle
        force=True,
    )
    .sum()

@fjetter merged commit c5ca1ff into dask:main on Mar 24, 2025
1 of 30 checks passed