Replies: 3 comments
-
This results in this issue: #25192
-
I had a similar experience (also on Trino 470). I found that I needed to run several commands:

alter table my_catalog.my_schema.my_table execute optimize
alter table my_catalog.my_schema.my_table execute optimize_manifests
alter table my_catalog.my_schema.my_table execute expire_snapshots
alter table my_catalog.my_schema.my_table execute remove_orphan_files

This did not remove the DELETE files right away, but when I ran the commands again 7-8 days later, the number began to drop. I put them on a schedule using an orchestrator and eventually saw the total data in Iceberg drop by multiple orders of magnitude. The key isn't running them multiple times in a row, but running them on a schedule, such as once per week.

I can't fully explain this behavior, but I suspect it has to do with the default snapshot retention for Iceberg being 7 days. Running an optimize can't immediately remove anything because 7 days of snapshot data has to be kept.

We've seen total data storage across all tables drop by 50% since orchestrating a weekly optimization on Trino 470 about 3 weeks ago.
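For reference, a sketch of that weekly run with the retention window written out explicitly. It assumes the retention_threshold parameter that the Trino Iceberg connector accepts for expire_snapshots and remove_orphan_files; the '7d' value is an assumption chosen to match the 7-day retention mentioned above, and the table name is the same placeholder:

```sql
-- Rewrite (compact) the table's data files.
ALTER TABLE my_catalog.my_schema.my_table EXECUTE optimize;

-- Rewrite manifest files (available in the Trino version discussed here).
ALTER TABLE my_catalog.my_schema.my_table EXECUTE optimize_manifests;

-- Expire snapshots older than the retention window ('7d' is an assumed value).
ALTER TABLE my_catalog.my_schema.my_table
    EXECUTE expire_snapshots(retention_threshold => '7d');

-- Delete files that no remaining snapshot references ('7d' is an assumed value).
ALTER TABLE my_catalog.my_schema.my_table
    EXECUTE remove_orphan_files(retention_threshold => '7d');
```

Trino rejects a threshold shorter than the catalog's configured minimum (iceberg.expire-snapshots.min-retention and iceberg.remove-orphan-files.min-retention, both 7d by default). That is consistent with files only starting to disappear about a week after the first run.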
-
Rewrite the delete files in Spark; the count will shrink to 0.
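The reply does not name a specific command. Assuming it refers to Iceberg's rewrite_position_delete_files Spark procedure, a minimal sketch would be the following, with my_catalog and my_schema.my_table as placeholders:

```sql
-- Spark SQL (not Trino). Requires the Iceberg Spark runtime with its SQL
-- extensions enabled; catalog and table names are placeholders.
CALL my_catalog.system.rewrite_position_delete_files(table => 'my_schema.my_table');
```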
…On Fri, Mar 14, 2025 at 10:09 AM Riley McDowell ***@***.***> wrote:
I had a similar experience (also on Trino 470). I found that I needed to
run several commands:
alter table my_catalog.my_schema.my_table execute optimize
alter table my_catalog.my_schema.my_table execute optimize_manifests
alter table my_catalog.my_schema.my_table execute expire_snapshots
alter table my_catalog.my_schema.my_table execute remove_orphan_files
This did not remove the DELETE files, but when I ran it again 7-8 days
later, the number began to drop. I put it on a schedule using an
orchestrator and eventually saw the total data in iceberg drop by multiple
orders of magnitude. The key isn't running it multiple times, but running
it on a schedule such as once per week.
I can't fully explain this behavior, but I suspect it has to do with the
default snapshot retention settings for iceberg being set to 7 days.
Running an optimize can't immediately remove anything because it has to
keep 7 days of snapshot data.
We've seen total data storage across all tables drop by 50% since
orchestrating a weekly optimization on Trino 470 about 3 weeks ago.
-
I am running into an issue where DELETE files accumulate rapidly in all our Iceberg tables. Some partitions have thousands of DELETE files. I thought that running OPTIMIZE would fix the issue, but apparently it doesn't.
I found a related reference in this issue: #24086
However, I am only running a simple OPTIMIZE, without any filters, on tables updated via MERGE INTO.
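One way to track whether the number of DELETE files actually changes between maintenance runs is to query the table's $files metadata table in Trino. This is a sketch that assumes the content column of that metadata table (0 for data files, 1 for position delete files, 2 for equality delete files); the table name is a placeholder:

```sql
-- Count files and total bytes per content type for one Iceberg table.
-- content: 0 = data, 1 = position deletes, 2 = equality deletes.
SELECT
    content,
    count(*) AS file_count,
    sum(file_size_in_bytes) AS total_bytes
FROM my_catalog.my_schema."my_table$files"
GROUP BY content
ORDER BY content;
```

Running it before and after each OPTIMIZE shows whether the delete-file count for the current snapshot is dropping.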