-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement ZstdZarrCompressor #149
base: master
Are you sure you want to change the base?
Changes from all commits
f2a432f
2def22f
a04080e
96a93cc
7232211
5ab65d9
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,54 @@ | ||
module CodecZstdExt | ||
|
||
using Zarr: Zarr | ||
using JSON: JSON | ||
using CodecZstd: CodecZstd | ||
|
||
""" | ||
ZstdZarrCompressor(clevel::Int=0) | ||
ZstdZarrCompressor(c::CodecZstd.ZstdCompressor, [d::CodecZstd.ZstdDecompressor]) | ||
|
||
Zstandard compression for Zarr.jl. This is a `Zarr.Compressor` wrapper around | ||
`CodecZstd`. Construct with either the compression level, `clevel`, or by | ||
providing an instance of a `ZstdCompressor`. `ZstdFrameCompressor` is | ||
recommended. | ||
""" | ||
struct ZstdZarrCompressor <: Zarr.Compressor | ||
compressor::CodecZstd.ZstdCompressor | ||
decompressor::CodecZstd.ZstdDecompressor | ||
end | ||
# Use default ZstdDecompressor if only compressor is provided | ||
function ZstdZarrCompressor(compressor::CodecZstd.ZstdCompressor) | ||
return ZstdZarrCompressor( | ||
compressor, | ||
CodecZstd.ZstdDecompressor() | ||
) | ||
end | ||
function ZstdZarrCompressor(clevel::Int) | ||
return ZstdZarrCompressor( | ||
CodecZstd.ZstdFrameCompressor(; level = clevel) | ||
) | ||
end | ||
ZstdZarrCompressor(;clevel::Int=3) = ZstdZarrCompressor(clevel) | ||
|
||
function Zarr.getCompressor(::Type{ZstdZarrCompressor}, d::Dict) | ||
return ZstdZarrCompressor(d["level"]) | ||
end | ||
|
||
function Zarr.zuncompress(a, z::ZstdZarrCompressor, T) | ||
result = transcode(z.decompressor, a) | ||
return Zarr._reinterpret(Base.nonmissingtype(T), result) | ||
end | ||
|
||
function Zarr.zcompress(a, z::ZstdZarrCompressor) | ||
a_uint8 = Zarr._reinterpret(UInt8,a)[:] | ||
transcode(z.compressor, a_uint8) | ||
end | ||
|
||
JSON.lower(z::ZstdZarrCompressor) = Dict("id"=>"zstd", "level" => z.compressor.level) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This |
||
|
||
function __init__() | ||
Zarr.compressortypes["zstd"] = ZstdZarrCompressor | ||
end | ||
|
||
end # module CodecZstdExt |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -333,6 +333,14 @@ function zcreate(::Type{T},storage::AbstractStore, | |
attrs=Dict(), | ||
writeable=true, | ||
) where T | ||
|
||
if compressor isa AbstractString | ||
if haskey(compressortypes, String(compressor)) | ||
compressor = compressortypes[compressor]() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This would make it impossible to set custom compression levels for the compression algorithm. Do we need another keyword argument for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The idea here is that the simple option of just passing a string will give you default compression options. If you want to specify the compression level, you can use the compression constructor and pass the instatiated compressor instance. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. i get and error when trying to pass a compression constructor:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This will not work. Passing a TranscodingStream instance specifically is not going to work at the moment. The constructor you need is the ZstdZarrCompressor type I created in this PR. @nhz2 is also cooking up something here: Maybe we should invite him over to this side of the river to work on this? |
||
else | ||
throw(UnknownCompressorException(compressor)) | ||
end | ||
end | ||
|
||
length(dims) == length(chunks) || throw(DimensionMismatch("Dims must have the same length as chunks")) | ||
N = length(dims) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
using Zarr | ||
using Test | ||
|
||
@testset "Zarr Extension Packages" begin | ||
@test_throws Zarr.UnknownCompressorException("zstd") zzeros(UInt8, 512, compressor="zstd") | ||
@test_throws Zarr.UnknownCompressorException("asdf") zzeros(UInt8, 512, compressor="asdf") | ||
d = Dict("id" => "zstd") | ||
@test_throws Zarr.UnknownCompressorException("zstd") Zarr.getCompressor(d) | ||
|
||
iob = IOBuffer() | ||
show(iob, Zarr.UnknownCompressorException("zstd")) | ||
@test occursin("CodecZstd.jl", String(take!(iob))) | ||
|
||
iob = IOBuffer() | ||
show(iob, Zarr.UnknownCompressorException("asdf")) | ||
@test occursin("issue", String(take!(iob))) | ||
@test Zarr.getCompressor(nothing) == Zarr.NoCompressor() | ||
end | ||
|
||
using CodecZstd | ||
@testset "Zarr CodecZstd Extension" begin | ||
CodecZstdExt = Base.get_extension(Zarr, :CodecZstdExt) | ||
@test haskey(Zarr.compressortypes, "zstd") | ||
@test Zarr.compressortypes["zstd"] == CodecZstdExt.ZstdZarrCompressor | ||
td = tempname() | ||
zarray = zzeros(UInt16, 16, 16, compressor="zstd", path=td) | ||
zarray .= reshape(1:256,16,16) | ||
@test isa(zarray, ZArray{UInt16}) | ||
@test zarray.metadata.compressor isa CodecZstdExt.ZstdZarrCompressor | ||
zarray2 = zopen(td) | ||
@test isa(zarray2, ZArray{UInt16}) | ||
@test zarray2.metadata.compressor isa CodecZstdExt.ZstdZarrCompressor | ||
@test zarray2 == reshape(1:256,16,16) | ||
end |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -266,9 +266,7 @@ end | |
end | ||
|
||
include("storage.jl") | ||
|
||
|
||
|
||
include("python.jl") | ||
include("ext.jl") | ||
|
||
end # @testset "Zarr" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't support multithreaded use IIUC. I think this should be like
Zarr.jl/src/Compressors.jl
Lines 129 to 131 in f436713
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I initially wrote it like this, but then I was thinking about all the other potential parameters, even if they do not need to be serialized. I think what we should implement is the ability to copy a compessor.
Frankly, I'm somewhat confused about why one actually needs to serialize the compression level into the array metadata. You do not need that information to decompress the data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I don't understand this correctly, but what would happen in a scenario where a user opens an existing array and wants to add some new data? Of course one can set a different compression level for the new chunks, but for consistency of the dataset I think it is good to write all compression parameters to the metadata struct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this implementation respects all this and other compressors in Zarr.jl currently don't work multithreaded as well so ok from my side
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to the thread safety issues, this also leaks memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For other potential parameters what about something like:
https://github.com/nhz2/ChunkCodecs.jl/blob/799b154bd400633f0ae3bd1cf78d0cc95957f2cf/ChunkCodecLibZstd/src/encode.jl#L21-L25
Where the advanced parameters are set with
ZSTD_CCtx_setParameter
after the compression level and checksum options are set.