Coordinator exceeds memory limit during OPTIMIZE of an Iceberg (ORC) table #24794

Closed
alex-art-repos opened this issue Jan 24, 2025 · 3 comments
Labels: iceberg (Iceberg connector)

Comments

@alex-art-repos

After some hours of running the query below (approx. 1,761 files, 115 GB in total):

ALTER TABLE ice.v.t EXECUTE optimize
    where
    "timestamp" >= timestamp '2025-01-17 00:00:00'
    and "timestamp"  < timestamp '2025-01-20 00:00:00'

The trino-server process on the coordinator ate all available memory and was killed by the OOM killer.

Here is the strange memory consumption observed while processing the request:

Coordinator's memory: [image]

Other nodes' memory: [image]

Questions:

  1. Why does the coordinator process exceed the heap limit while the workers do not? Does it use libraries with large native (off-heap) memory consumption?
  2. Any suggestions for limiting memory, or other ways to regularly optimize this table? (One possible approach is sketched right after this list.)
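
A possible mitigation (a sketch only, not verified on this cluster): run OPTIMIZE over a narrower time range and/or restrict it to small files with the optimize procedure's file_size_threshold parameter, so each run rewrites less data at once. The one-day window and the 100MB threshold below are illustrative values, not a recommendation:

-- compact one day at a time, only rewriting files smaller than 100MB (illustrative values)
ALTER TABLE ice.v.t EXECUTE optimize(file_size_threshold => '100MB')
    where
    "timestamp" >= timestamp '2025-01-17 00:00:00'
    and "timestamp"  < timestamp '2025-01-18 00:00:00'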

Trino 468: 12 nodes, 1 standalone coordinator, 125 GB RAM on each node.

config.properties

Coordinator

coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=9080
discovery.uri=http://storage3:9080
query.max-memory=170GB
query.max-total-memory=180GB
query.max-memory-per-node=30GB
memory.heap-headroom-per-node=10GB
internal-communication.http2.enabled=false
exchange.http-client.connect-timeout=5m
exchange.http-client.request-timeout=10m

Workers

coordinator=false
node-scheduler.include-coordinator=false
http-server.http.port=9080
discovery.uri=http://storage3:9080
query.max-memory=170GB
query.max-total-memory=180GB
query.max-memory-per-node=30GB
memory.heap-headroom-per-node=10GB
internal-communication.http2.enabled=false
exchange.http-client.connect-timeout=5m
exchange.http-client.request-timeout=10m

JVM on all nodes

-server
-Xms80G
-Xmx80G
-XX:InitialRAMPercentage=80
-XX:MaxRAMPercentage=80
-XX:G1HeapRegionSize=32M
-XX:+ExplicitGCInvokesConcurrent
-XX:+ExitOnOutOfMemoryError
-XX:+HeapDumpOnOutOfMemoryError
-XX:-OmitStackTraceInFastThrow
-XX:ReservedCodeCacheSize=512M
-XX:PerMethodRecompilationCutoff=10000
-XX:PerBytecodeRecompilationCutoff=10000
-Djdk.attach.allowAttachSelf=true
-Djdk.nio.maxCachedBufferSize=2000000
-Dfile.encoding=UTF-8
# Allow loading dynamic agent used by JOL
-XX:+EnableDynamicAgentLoading

OOM killer report

Out of memory: Killed process 3007584 (trino-server) total-vm:181122636kB, anon-rss:116305736kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:253108kB oom_score_adj:0
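
For question 1, note that anon-rss here is about 116 GB (~111 GiB) against an 80 GiB heap (-Xmx80G), i.e. roughly 30 GiB resided outside the Java heap. One way to break down the JVM-tracked part of that off-heap usage (a sketch, assuming native memory tracking overhead is acceptable on the coordinator; <pid> is a placeholder):

# add to jvm.config and restart (NMT adds a small overhead and only covers JVM-tracked allocations)
-XX:NativeMemoryTracking=summary
# then, while the OPTIMIZE is running, on the coordinator:
jcmd <pid> VM.native_memory summary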

Table definition

CREATE TABLE ice.v.t (
   c_id bigint NOT NULL,
   "timestamp" timestamp(6) with time zone NOT NULL,
   s_m varchar,
   s_i varchar,
   s_im varchar,
   s_mc integer,
   s_mn integer,
   lc integer,
   cl bigint,
   tc integer,
   ci bigint,
   t_s integer,
   sw_id varchar NOT NULL,
   o_l integer
)
WITH (
   format = 'ORC',
   format_version = 2,
   location = 's3a://trino/v/t-b11111111111111a00c88de3509bb9c',
   orc_bloom_filter_columns = ARRAY['s_m','s_i','s_im'],
   orc_bloom_filter_fpp = 5E-2,
   partitioning = ARRAY['day(timestamp)'],
   sorted_by = ARRAY['timestamp ASC NULLS FIRST']
)
@alex-art-repos (Author)

I reproduced it with detailed per-node monitoring. The RSS leak appears to be related to a peak of availability errors (minio_node_drive_errors_availability) from the underlying S3 (MinIO) storage.

@findinpath added the iceberg (Iceberg connector) label on Jan 29, 2025
@alex-art-repos (Author)

I will check on 469; maybe the leak was fixed by #24572.

@alex-art-repos (Author)

It has worked stably for 16+ hours. Fixed.
