distributed/diagnostics/tests/test_rmm_diagnostics.py::test_rmm_metrics
https://github.com/rapidsai/dask-upstream-testing/actions/runs/13142239873/job/36671965774#step:9:751 has an unexpected failure.
_______________________________ test_rmm_metrics _______________________________

c = <Client: No scheduler connected>
s = <Scheduler 'tcp://127.0.0.1:45841', workers: 0, cores: 0, tasks: 0>
workers = (<dask_cuda.cuda_worker.CUDAWorker object at 0x7fac03201e50>,)
w = <WorkerState 'tcp://127.0.0.1:39413', name: 0, status: closed, memory: 0, processing: 0>
@py_assert0 = 0, @py_assert4 = None

    @gen_cluster(
        client=True,
        nthreads=[("127.0.0.1", 1)],
        Worker=dask_cuda.CUDAWorker,
        worker_kwargs={
            "rmm_pool_size": parse_bytes("10MiB"),
            "rmm_track_allocations": True,
        },
    )
    async def test_rmm_metrics(c, s, *workers):
        w = list(s.workers.values())[0]
        assert "rmm" in w.metrics
        assert w.metrics["rmm"]["rmm-used"] == 0
        assert w.metrics["rmm"]["rmm-total"] == parse_bytes("10MiB")
        result = delayed(rmm.DeviceBuffer)(size=10)
        result = result.persist()
        await asyncio.sleep(1)
>       assert w.metrics["rmm"]["rmm-used"] != 0
E       assert 0 != 0

distributed/diagnostics/tests/test_rmm_diagnostics.py:36: AssertionError
----------------------------- Captured stderr call -----------------------------
2025-02-04 18:51:35,885 - distributed.scheduler - INFO - State start
2025-02-04 18:51:35,892 - distributed.scheduler - INFO - Scheduler at: tcp://127.0.0.1:45841
2025-02-04 18:51:35,892 - distributed.scheduler - INFO - dashboard at: http://127.0.0.1:40321/status
2025-02-04 18:51:35,893 - distributed.scheduler - INFO - Registering Worker plugin shuffle
2025-02-04 18:51:36,504 - distributed.nanny - INFO - Start Nanny at: 'tcp://127.0.0.1:37431'
2025-02-04 18:51:40,305 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
2025-02-04 18:51:40,305 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2025-02-04 18:51:40,316 - distributed.preloading - INFO - Run preload setup: dask_cuda.initialize
2025-02-04 18:51:40,317 - distributed.worker - INFO - Start worker at: tcp://127.0.0.1:39413
2025-02-04 18:51:40,317 - distributed.worker - INFO - Listening to: tcp://127.0.0.1:39413
2025-02-04 18:51:40,317 - distributed.worker - INFO - Worker name: 0
2025-02-04 18:51:40,317 - distributed.worker - INFO - dashboard at: 127.0.0.1:43515
2025-02-04 18:51:40,317 - distributed.worker - INFO - Waiting to connect to: tcp://127.0.0.1:45841
2025-02-04 18:51:40,317 - distributed.worker - INFO - -------------------------------------------------
2025-02-04 18:51:40,318 - distributed.worker - INFO - Threads: 1
2025-02-04 18:51:40,318 - distributed.worker - INFO - Memory: 503.77 GiB
2025-02-04 18:51:40,318 - distributed.worker - INFO - Local Directory: /tmp/dask-scratch-space/worker-9wj1ulk_
2025-02-04 18:51:40,318 - distributed.worker - INFO - Starting Worker plugin CPUAffinity-28ec392a-beeb-467a-9d56-56f39cd3dace
2025-02-04 18:51:40,318 - distributed.worker - INFO - Starting Worker plugin PreImport-eee38003-29e5-48a9-8217-0e310bad4f93
2025-02-04 18:51:40,318 - distributed.worker - INFO - Starting Worker plugin CUDFSetup-595987ee-b46c-4ccd-9fa6-d4e1601346c9
2025-02-04 18:51:42,750 - distributed.worker - INFO - Starting Worker plugin RMMSetup-9c57beab-dd4b-4095-8dfc-6032596d55fc
2025-02-04 18:51:43,087 - distributed.worker - INFO - -------------------------------------------------
2025-02-04 18:51:43,096 - distributed.scheduler - INFO - Register worker addr: tcp://127.0.0.1:39413 name: 0
2025-02-04 18:51:43,098 - distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:39413
2025-02-04 18:51:43,098 - distributed.core - INFO - Starting established connection to tcp://127.0.0.1:53618
2025-02-04 18:51:43,098 - distributed.worker - INFO - Starting Worker plugin shuffle
2025-02-04 18:51:43,099 - distributed.worker - INFO - Registered to: tcp://127.0.0.1:45841
2025-02-04 18:51:43,099 - distributed.worker - INFO - -------------------------------------------------
2025-02-04 18:51:43,100 - distributed.core - INFO - Starting established connection to tcp://127.0.0.1:45841
2025-02-04 18:51:43,141 - distributed.scheduler - INFO - Receive client connection: Client-13749a6f-e329-11ef-86b0-0242ac120002
2025-02-04 18:51:43,142 - distributed.core - INFO - Starting established connection to tcp://127.0.0.1:53620
2025-02-04 18:51:44,268 - distributed.scheduler - INFO - Remove client Client-13749a6f-e329-11ef-86b0-0242ac120002
2025-02-04 18:51:44,269 - distributed.core - INFO - Received 'close-stream' from tcp://127.0.0.1:53620; closing.
2025-02-04 18:51:44,269 - distributed.scheduler - INFO - Remove client Client-13749a6f-e329-11ef-86b0-0242ac120002
2025-02-04 18:51:44,270 - distributed.scheduler - INFO - Close client connection: Client-13749a6f-e329-11ef-86b0-0242ac120002
2025-02-04 18:51:44,271 - distributed.nanny - INFO - Closing Nanny at 'tcp://127.0.0.1:37431'. Reason: nanny-close
2025-02-04 18:51:44,272 - distributed.nanny - INFO - Nanny asking worker to close. Reason: nanny-close
2025-02-04 18:51:44,273 - distributed.worker - INFO - Stopping worker at tcp://127.0.0.1:39413. Reason: nanny-close
2025-02-04 18:51:44,273 - distributed.worker - INFO - Removing Worker plugin CPUAffinity-28ec392a-beeb-467a-9d56-56f39cd3dace
2025-02-04 18:51:44,273 - distributed.worker - INFO - Removing Worker plugin PreImport-eee38003-29e5-48a9-8217-0e310bad4f93
2025-02-04 18:51:44,273 - distributed.worker - INFO - Removing Worker plugin CUDFSetup-595987ee-b46c-4ccd-9fa6-d4e1601346c9
2025-02-04 18:51:44,273 - distributed.worker - INFO - Removing Worker plugin RMMSetup-9c57beab-dd4b-4095-8dfc-6032596d55fc
2025-02-04 18:51:44,273 - distributed.worker - INFO - Removing Worker plugin shuffle
2025-02-04 18:51:44,275 - distributed.core - INFO - Connection to tcp://127.0.0.1:45841 has been closed.
2025-02-04 18:51:44,276 - distributed.core - INFO - Received 'close-stream' from tcp://127.0.0.1:53618; closing.
2025-02-04 18:51:44,277 - distributed.scheduler - INFO - Remove worker addr: tcp://127.0.0.1:39413 name: 0 (stimulus_id='handle-worker-cleanup-1738695104.27697')
2025-02-04 18:51:44,277 - distributed.scheduler - INFO - Lost all workers
2025-02-04 18:51:44,282 - distributed.nanny - INFO - Worker closed
2025-02-04 18:51:44,973 - distributed.nanny - INFO - Nanny at 'tcp://127.0.0.1:37431' closed.
2025-02-04 18:51:44,973 - distributed.scheduler - INFO - Closing scheduler. Reason: unknown
2025-02-04 18:51:44,974 - distributed.scheduler - INFO - Scheduler closing all comms
It also failed on the cuda==11.8.0 test: https://github.com/rapidsai/dask-upstream-testing/actions/runs/13142239873/job/36671965377#step:9:773
Looking into it.
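The failing assertion looks racy: `w` here is the scheduler-side `WorkerState`, whose `metrics` dict is only refreshed when the worker heartbeats, and the `delayed(rmm.DeviceBuffer)(size=10)` task itself runs asynchronously after `persist()`. On a slow CI node the fixed `await asyncio.sleep(1)` can elapse before either of those has happened, leaving `rmm-used` at 0. A minimal sketch of a polling-based check (the helper name, timeout, and interval are made up for illustration; the real change is the one referenced below):

```python
import asyncio
import time


async def wait_for_rmm_used(worker_state, timeout=10.0, interval=0.1):
    """Poll the scheduler's view of the worker's RMM metrics until
    ``rmm-used`` becomes nonzero, failing after ``timeout`` seconds.

    Illustrative helper only, not the actual patch.
    """
    deadline = time.monotonic() + timeout
    while worker_state.metrics["rmm"]["rmm-used"] == 0:
        if time.monotonic() > deadline:
            raise AssertionError("rmm-used never became nonzero within the timeout")
        await asyncio.sleep(interval)
```

With something like this, the tail of the test becomes `await wait_for_rmm_used(w)` instead of a fixed sleep followed by a bare assert.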
Poll in test_rmm_metrics test (commit 02c1e43)
xref rapidsai/dask-upstream-testing#4
Closed by dask/distributed#9004. Passed in today's run.
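For reference, a sketch of what a polled version of the full test could look like. This assumes `distributed.utils_test.async_poll_for(predicate, timeout)` is available with that signature; it is not necessarily the exact code in 02c1e43 or dask/distributed#9004:

```python
import dask_cuda
import rmm
from dask import delayed
from dask.utils import parse_bytes
from distributed.utils_test import async_poll_for, gen_cluster


@gen_cluster(
    client=True,
    nthreads=[("127.0.0.1", 1)],
    Worker=dask_cuda.CUDAWorker,
    worker_kwargs={
        "rmm_pool_size": parse_bytes("10MiB"),
        "rmm_track_allocations": True,
    },
)
async def test_rmm_metrics(c, s, *workers):
    w = list(s.workers.values())[0]
    assert "rmm" in w.metrics
    assert w.metrics["rmm"]["rmm-used"] == 0
    assert w.metrics["rmm"]["rmm-total"] == parse_bytes("10MiB")
    result = delayed(rmm.DeviceBuffer)(size=10)
    result = result.persist()
    # Poll for the heartbeat-reported metric instead of sleeping a fixed amount of time.
    await async_poll_for(lambda: w.metrics["rmm"]["rmm-used"] != 0, timeout=10)
    assert w.metrics["rmm"]["rmm-used"] != 0
```

Either way the idea is the same: wait until the heartbeat-reported metric reflects the allocation rather than assuming a fixed delay is long enough.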