Releases: pola-rs/polars
Python Polars 1.25.2
🏆 Highlights
- Enable common subplan elimination across plans in `collect_all` (#21747)
- Add lazy sinks (#21733)
- Add `PartitionByKey` for new streaming sinks (#21689)
- Enable new streaming memory sinks by default (#21589)
🚀 Performance improvements
- Implement linear-time rolling_min/max (#21770)
- Improve InputIndependentSelect by delegating to InMemorySourceNode (#21767)
- Enable common subplan elimination across plans in `collect_all` (#21747)
- Allow elementwise functions in recursive lowering (#21653)
- Add primitive single-key hashtable to new-streaming join (#21712)
- Remove unnecessary black_boxes in Kahan summation (#21679)
- Box large enum variants (#21657)
- Improve join performance for new-streaming engine (#21620)
- Pre-fill caches (#21646)
- Optimize only a single cache input (#21644)
- Collect parquet statistics in one contiguous buffer (#21632)
- Update Cargo.lock (mainly for zstd 1.5.7) (#21612)
- Don't maintain order when maintain_order=False in new streaming sinks (#21586)
- Pre-sort groups in group-by-dynamic (#21569)
✨ Enhancements
- Add support for rolling_(sum/min/max) for booleans through casting (#21748)
- Support multi-column sort for all nested types and nested search-sorted (#21743)
- Add lazy sinks (#21733)
- Add `PartitionByKey` for new streaming sinks (#21689)
- Fix replace flags (#21731)
- Add `mkdir` flag to sinks (#21717)
- Enable joins on list/array dtypes (#21687)
- Add a config option to specify the default engine to attempt to use during lazyframe calls (#20717)
- Support all elementwise functions in IO plugin predicates (#21705)
- Stabilize Enum datatype (#21686)
- Support Polars int128 in from arrow (#21688)
- Use FFI to read dataframe instead of transmute (#21673)
- Enable new streaming memory sinks by default (#21589)
- Cloud support for new-streaming scans and sinks (#21621)
- Add len method to arr (#21618)
- Closeable files on unix (#21588)
- Add new `PartitionMaxSize` sink (#21573)
- Support engine callback for `LazyFrame.profile` (#21534)
- Dispatch new-streaming CSV negative slice to separate node (#21579)
- Add NDJSON source to new streaming engine (#21562)
- Support passing `token` in `storage_options` for GCP cloud (#21560)
🐞 Bug fixes
- Expose and document partitions (#21765)
- Fix lazy schema for truediv ops involving List/Array dtypes (#21764)
- Fix error due to race condition in file cache (#21753)
- Clear NaNs due to zero-weight division in rolling var/std (#21761)
- Allow init from BigQuery Arrow data containing ExtensionType cols with irrelevant metadata (#21492)
- Disallow cast from boolean to categorical/enum (#21714)
- Don't check sortedness in `join_asof` when 'by' groups supplied, but issue warning (#21724)
- Incorrect multithread path taken for aggregations (#21727)
- Disallow cast to empty Enum (#21715)
- Fix `list.mean` and `list.median` returning Float64 for temporal types (#21144)
- Incorrect (FixedSize)ListArrayBuilder gather implementation (#21716)
- Always fallback in SkipBatchPredicate (#21711)
- New streaming multiscan deadlock (#21694)
- Ensure new-streaming join BuildState is correct even if never fed morsels (#21708)
- IO plugin; support empty iterator (#21704)
- Support nulls in multi-column sort (#21702)
- Window function check length of groups state (#21697)
- Support 128 sum reduction on new streaming (#21691)
- IPC round-trip of list of empty view with non-empty bufferset (#21671)
- Variance can never be negative (#21678)
- Incorrect loop length in new-streaming group by (#21670)
- Right join on multiple columns not coalescing left_on columns (#21669)
- Casting Struct to String panics if n_chunks > 1 (#21656)
- Fix `Future attached to different loop` error on `read_database_uri` (#21641)
- Fix deadlock in cache + hconcat (#21640)
- Properly handle phase transitions in row-wise sinks (#21600)
- Enable new streaming memory sinks by default (#21589)
- Always use global registry for object (#21622)
- Check enum categories when reading csv (#21619)
- Unspecialized prefiltering on nullable arrays (#21611)
- Release the gil on explain (#21607)
- Take into account scalar/partitioned columns in DataFrame::split_chunks (#21606)
- Bad null handling in unordered row encoding (#21603)
- Fix deadlock in new streaming CSV / NDJSON sinks (#21598)
- Bad view index in BinaryViewBuilder (#21590)
- Fix CSV count with comment prefix skipped empty lines (#21577)
- New streaming IPC enum scan (#21570)
- Several aspects related to ParquetColumnExpr (#21563)
- Don't hit parquet::pre-filtered in case of pre-slice (#21565)
📖 Documentation
- Add skrub to ecosystem.md (#21760)
- Add example for percentile rank (#21746)
- Make python/rust getting-started consistent and clarify performance risk of infer_schema_length=None (#21734)
- Add expression composability to PySpark comparison (#21473)
- Document `read_().lazy()` antipattern (#21623)
- Update Polars Cloud interactive workflow examples (#21609)
- Add a Plotnine example to the visualization docs (#21597)
- Add cloud api reference to Ref guide (#21566)
🛠️ Other improvements
- Remove variance numerical stability hack (#21749)
- Only use chrono_tz timezones in hypothesis testing (#21721)
- Remove order check from flaky test (#21730)
- Add sinks into the DSL before optimization (#21713)
- Add missing test case for #21701 (#21709)
- Remove old-streaming from engine argument (#21667)
- Add as_phys_any to PrivateSeries for downcasting (#21696)
- Use FFI to read dataframe instead of transmute (#21673)
- Work around typos ignore bug (#21672)
- Add test for `datetime_range` nanosecond overflow (#21354)
- Update to edition 2024 (#21662)
- Update rustc (#21647)
- Support object from chunks (#21636)
- Push versioned docs on workflow dispatch (#21630)
- Fail docs early (#21629)
- Check major/minor in docs (#21626)
- Add docs workflow (#21624)
- Add test for 21581 (#21617)
- Remove even more parquet multiscan handling (#21601)
- Remove multiscan handling from new streaming parquet source (#21584)
- Prepare skeleton for partitioning sinks (#21536)
Thank you to all our contributors for making this release possible!
@GaelVaroquaux, @Kevin-Patyk, @MarcoGorelli, @Matt711, @NathanHu725, @alexander-beedie, @coastalwhite, @dependabot[bot], @jrycw, @kdn36, @lukemanley, @mcrumiller, @nameexhaustion, @orlp, @r-brink, @ritchie46 and @wence-
Python Polars 1.24.0
🚀 Performance improvements
- Provide a fallback skip batch predicate for constant batches (#21477)
- Parallelize the passing in new streaming multiscan (#21430)
✨ Enhancements
- Add lossy decoding to `read_csv` for non-utf8 encodings (#21433)
- Add `DataFrame.write_iceberg` (#15018)
- Add 'nulls_equal' parameter to `is_in` (#21426)
- Improve numeric stability of `rolling_{std, var, cov, corr}` (#21528)
- IR Serde cross-filter (#21488)
- Give priority to pycapsule interface in from_dataframe (#21377)
- Support writing `Time` type in json (#21454)
- Activate all optimizations in sinks (#21462)
- Add `AssertionError` variant to `PolarsError` in `polars-error` (#21460)
- Pass filter to inner readers in multiscan new streaming (#21436)
🐞 Bug fixes
- Categorical min/max panicking when string cache is enabled (#21552)
- Don't encode IPC record batch twice (#21525)
- Respect rewriting flag in Node rewriter (#21516)
- Correct skip batch predicate for partial statistics (#21502)
- Make the Parquet Sink properly phase aware (#21499)
- Don't divide by zero in partitioned group-by (#21498)
- Create new linearizer between rowwise new streaming sink phases (#21490)
- Don't drop rows in sinks between new streaming phases (#21489)
- Incorrect lazy schema for `Expr.list.diff` (#21484)
- Give priority to pycapsule interface in from_dataframe (#21377)
- Duration Series arithmetic operations (#21425)
- Fix unwrap None panic when filtering delta with missing columns (#21453)
- Use stable sort for rolling-groupby (#21444)
- Throw exception if dataframe is too large to be compatible with Excel (#20900)
- Address regression with `read_excel` not handling URL paths correctly (#21428)
📖 Documentation
- Fix typo (#21554)
- Correct typos and grammar in Python docstrings (#21524)
- Move llm page under misc (#21550)
- Polars Cloud docs (#21548)
- Add LazyFrame.remote docs entry (#21529)
- Specify that the key column must be sorted in ascending order in `merge_sorted` (#21501)
- Add Polars & LLMs page to the user guide (#21218)
- Mention that `statistics=True` doesn't enable all statistics in `sink_parquet()` (#21434)
🛠️ Other improvements
- Don't take ownership of IRplan in new streaming engine (#21551)
- Refactor code for re-use by streaming NDJSON source (#21520)
- Simplify the phase handling of new streaming sinks (#21530)
- Improve IPC sink node parallelism (#21505)
- Use tikv-jemallocator (#21486)
- Rename 'join_nulls' parameter to 'nulls_equal' in join functions (#21507)
- Move rolling to polars-compute (#21503)
- Remove Growable in favor of ArrayBuilder (#21500)
- Introduce a Sink Node trait in the new streaming engine (#21458)
- Add test for rolling stability sort (#21456)
- Add test for empty `.is_in` predicate filter (#21455)
- Test for unique length on multiple columns (#21418)
Thank you to all our contributors for making this release possible!
@Kevin-Patyk, @MarcoGorelli, @Matt711, @alexander-beedie, @banflam, @braaannigan, @coastalwhite, @dependabot[bot], @etiennebacher, @ghuls, @kevinjqliu, @lukemanley, @mcrumiller, @nameexhaustion, @orlp, @ritchie46, @stijnherfst and @thomasjpfan
Python Polars 1.23.0
🚀 Performance improvements
- Toggle projection pushdown for eager rolling (#21405)
- Fix pathological `rolling + group-by` performance and memory explosion (#21403)
- Add sampling to new-streaming equi join to decide between build/probe side (#21197)
✨ Enhancements
- Implement i128 -> str cast (#21411)
- Connect polars-cloud (#21387)
- Version DSL (#21383)
- Make user facing binary formats mostly self describing (#21380)
- Filter hive files using predicates in new streaming (#21372)
- Add negative slicing to new streaming multiscan (#21219)
- Allow iterable of frames as input to `align_frames` (#21209)
- Implement sorted flags for struct series (#21290)
- Support reading arrow Map type from Delta (#21330)
- Add a dedicated `remove` method for `DataFrame` and `LazyFrame` (#21259)
- Rename `credentials` parameter to `credential` in `CredentialProviderAzure` (#21295)
- Implement `merge_sorted` for struct (#21205)
- Add positive slice for new streaming MultiScan (#21191)
- Don't take in rewriting visitor (#21212)
- Add SQL support for the `DELETE` statement (#21190)
- Add row index to new streaming multiscan (#21169)
- Improve DataFrame fmt in explain (#21158)
🐞 Bug fixes
- Method `dt.ordinal_day` was returning UTC results as opposed to those on the local timestamp (#21410)
- Use Kahan summation for rolling sum kernels. Fix numerical stability issues (#21413)
- Add scalar checks for `n` and `fill_value` parameters in `shift` (#21292)
- Upcast small integer dtypes for rolling sum operations (#21397)
- Don't silently produce null values from invalid input to `pl.datetime` and `pl.date` (#21013)
- Allow duration multiplied w/ primitive to propagate in IR schema (#21394)
- Struct arithmetic broadcasting behavior (#21382)
- Prefiltered optional plain primitive kernel (#21381)
- Panic when projecting only row index from IPC file (#21361)
- Properly update groups after `gather` in aggregation context (#21369)
- Mark test as may_fail_auto_streaming (#21373)
- Properly set `fast_unique` in EnumBuilder (#21366)
- Rust test race condition (#21368)
- Fix unequal DataFrame column heights from parquet hive scan with filter (#21340)
- Fix ColumnNotFound error selecting `len()` after semi/anti join (#21355)
- Merge Parquet nested and flat decoders (#21342)
- Incorrect atomic ordering in Connector (#21341)
- Method `dt.offset_by` was discarding month and year info if day was included in offset for timezone-aware columns (#21291)
- Fix pickling `polars.col` on Python versions <3.11 (#21333)
- Fix duplicate column names after join if suffix already present (#21315)
- Skip Batches Expression for boolean literals (#21310)
- Fix performance regression for eager `join_where` (#21308)
- Fix incorrect predicate pushdown for predicates referring to right-join key columns (#21293)
- Panic in `to_physical` for series of arrays and lists (#21289)
- Resolve deadlock due to leaking in Connector recv drop (#21296)
- Incorrect result for merge_sorted with lexical categorical (#21278)
- Add `Int128` path for `join_asof` (#21282)
- Categorical min/max returning String dtype rather than Categorical (#21232)
- Checking overflow in Sliced function (#21207)
- Adding a struct field using a literal raises InvalidOperationError (#21254)
- Return nulls for `is_finite`, `is_infinite`, and `is_nan` when dtype is `pl.Null` (#21253)
- Account for minor change in new `connectorx` release (#21277)
- Properly implement and test Skip Batch Predicate (#21269)
- Infinite recursion when broadcasting into struct zip_outer_validity (#21268)
- Deadlock due to bad logic in new-streaming join sampling (#21265)
- Incorrect result for top_k/bottom_k when input is sorted (#21264)
- UTF-8 validation of nested string slice in Parquet (#21262)
- Raise instead of panicking when casting a Series to a Struct with the wrong number of fields (#21213)
- Defer credential provider resolution to take place at query collection instead of construction (#21225)
- Do not panic in `strptime()` if `format` ends with '%' (#21176)
- Raise error instead of panicking for unsupported SQL operations (#20789)
- Projection of only row index in new streaming IPC (#21167)
- Fix projection count query optimization (#21162)
📖 Documentation
- Fix doc for SQL Functions navigation (#21412)
- Fix initial selector example (#21321)
- Add pandas strictness API difference (#21312)
- Improve `Expr.name.map` docstring example (#21309)
- Add logo to Ask AI (#21261)
- Fix docs for Catalog (#21252)
- AI widget again (#21257)
- Revert plugin (#21250)
- Add kappa ask ai widget (#21243)
- Update social icons in API reference docs (#21214)
- Improve Arrow key feature description (#21171)
- Improve example in IO plugins user guide (#21146)
🛠️ Other improvements
- Move storage of hive partitions to DataFrame (#21364)
- Feature gate merge sorted in new streaming engine (#21338)
- Remove new streaming old multiscan (#21300)
- Add tests for fixed open issues (#21185)
- Try to mimic all steps (#21249)
- Require version for POLARS_VERSION (#21248)
- Fix docs (#21246)
- Avoid unnecessary `packaging` dependency (#21223)
- Remove unused file (#21240)
- Add use_field_init_shorthand = true to rustfmt (#21237)
- Don't mutate arena by default in Rewriting Visitor (#21234)
- Disable the TraceMalloc allocator (#21231)
- Add feature gate to old streaming deprecation warning (#21179)
- Install seaborn when running remote benchmark (#21168)
Thank you to all our contributors for making this release possible!
@GiovanniGiacometti, @JakubValtar, @MarcoGorelli, @Matt711, @Shoeboxam, @YichiZhang0613, @alexander-beedie, @bschoenmaeckers, @coastalwhite, @edwinvehmaanpera, @erikbrinkman, @etiennebacher, @hemanth94, @henryharbeck, @jqnatividad, @lukemanley, @mcrumiller, @nameexhaustion, @orlp, @r-brink, @ritchie46 and @ydagosto
Python Polars 1.22.0
🚀 Performance improvements
- Reduce sharing in stringview arrays in new-streaming equijoin (#21129)
- Implement native Expr.count() on new-streaming (#21126)
- Speed up list operations that use amortized_iter() (#20964)
- Use Cow as output for rechunk and add rechunk_mut (#21116)
- Reduce arrow slice mmap overhead (#21113)
- Reduce conversion cost in chunked string gather (#21112)
- Enable prefiltered by default for new streaming (#21109)
- Enable parquet column expressions for streaming (#21101)
- Deduplicate buffers again in stringview concat kernel (#21098)
- Add dedicated concatenate kernels (#21080)
- Rechunk only once during join probe gather (#21072)
- Micro-optimise internal `DataFrame` height and width checks (#21071)
- Speed up from_pandas when converting frame with multi-index columns (#21063)
- Change default memory prefetch to MADV_WILLNEED (#21056)
- Remove cast to boolean after comparison in optimizer (#21022)
- Split last rowgroup among all threads in new-streaming parquet reader (#21027)
- Recombine into larger morsels in new-streaming join (#21008)
- Improve `list.min` and `list.max` performance for logical types (#20972)
- Ensure count query selects minimal columns (#20923)
✨ Enhancements
- Add projection pushdown to new streaming multiscan (#21139)
- Implement join on struct dtype (#21093)
- Use unique temporary directory path per user and restrict permissions (#21125)
- Enable ingest of objects supporting the PyCapsule interface via `from_arrow` (#21128)
- Enable new streaming multiscan for CSV (#21124)
- Support the `POLARS_MAX_CONCURRENT_SCANS` environment variable in multiscan for new streaming (#21127)
- Ensure AWS credential provider sources AWS_PROFILE from environment after deserialization (#21121)
- Multi/Hive scans in new streaming engine (#21011)
- Add `linear_spaces` (#20941)
- IO plugins support lazy schema (#21079)
- Add `write_table()` function to Unity catalog client (#21089)
- Add `is_object` method to `PolarsDataType` class (#21074)
- Implement `merge_sorted` for binary (#21045)
- Hold string cache in new streaming engine and fix row-encoding (#21039)
- Add CredentialProviderAzure parameter to accept user-instantiated azure credential classes (#21047)
- Expose unity catalog dataclasses and type aliases (#21046)
- Support max/min method for Time dtype (#19815)
- Implement a streaming merge sorted node (#20960)
- Automatically use temporary credentials API for scanning Unity catalog tables (#21020)
- Add negative slice support to new-streaming engine (#21001)
- Allow for more RG skipping by rewriting expr in planner (#20828)
- Rename catalog `schema` to `namespace` (#20993)
- Add functionality to create and delete catalogs, tables and schemas to Unity catalog client (#20956)
- Allow custom JSONEncoder for the `json_normalize` function, minor speedup (#20966)
- Support passing `aws_profile` in `storage_options` (#20965)
- Improved support for KeyboardInterrupts (#20961)
- Make the available `concat` alignment strategies more generic (#20644)
- Extract timezone info from python datetimes (#20822)
- Add hint for `POLARS_AUTO_USE_AZURE_STORAGE_ACCOUNT_KEY` to error message (#20942)
- Filter Parquet pages with `ParquetColumnExpr` (#20714)
- Expose descending and nulls last in window order-by (#20919)
🐞 Bug fixes
- Fix `Expr.over` applying scale incorrectly for Decimal types (#21140)
- Fix IO plugin predicate with failed serialization (#21136)
- Ensure `lit` handles datetimes with tzinfo that represents a fixed offset from UTC (#21003)
- Correctly implement take_(opt_)chunked_unchecked for structs (#21134)
- Restore printing backtraces on panics (#21131)
- Use microseconds for Unity catalog datetime unit (#21122)
- Fix incorrect output height for SQL `SELECT COUNT(*) FROM` (#21108)
- Validate/coerce types for comparisons within join_where predicates (#21049)
- Do not auto-init credential providers if credential fetch returns error (#21090)
- Fix `join_where` incorrectly dropping transformations on RHS of equality expressions (#21067)
- Quadratic allocations when loading nested Parquet column metadata (#21050)
- Invalidate sortedness flag when sorting from pl.Categorical to pl.Categorical("lexical") (#21044)
- Calling `top_k` on list type panics (#21043)
- Fix rolling on empty DataFrame panicking (#21042)
- Fix `set_tbl_width_chars` panicking with negative width (#20906)
- Ensure `write_excel` recognises the Array dtype and writes it out as a string (#20994)
- Fix `merge_sorted` producing incorrect results or panicking for some logical types (#21018)
- Fix all-null list aggregations returning Null dtype (#20992)
- Ensure scalar-only with_columns are broadcasted on new-streaming (#20983)
- Improve SQL interface behaviour when `INTERVAL` is not a fixed duration (#20958)
- Address minor regression for one-column DataFrame passed to `is_in` expressions (#20948)
- Add Arrow Float16 conversion DataType (#20970)
- Revert length check of `patterns` in `str.extract_many()` (#20953)
- Add maintain order for flaky new-streaming test (#20954)
- Allow for respawning of new streaming sinks (#20934)
- Ensure Function name correctness in cse (#20929)
- Don't consume c_stream as iterable (#20899)
- Validate `pl.Array` shape argument types (#20915)
- Fix `from_numpy` returning Null dtype for empty 1D numpy array (#20907)
- Consider the original dtypes when selecting columns in `write_excel` function (#20909)
- Handle boolean comparisons in Iceberg predicate pushdown (#18199)
- Fix `map_elements` panicking with Decimal type (#20905)
📖 Documentation
- Replace pandas `where` with `mask` in Migrating -> Coming from Pandas (#21085)
- Correct Arrow misconception (#21053)
- Add example showing use of `write_delta` with `delta_lake.WriterProperties` (#20746)
- Add missing `shape` param to `Array` docstring (#20747)
- Add IO plugins to Python API reference (#21028)
- Document IO plugins (#20982)
- Ensure `set_sorted` description references single-column behavior (#20709)
📦 Build system
- Speed up CI by running a few more tests in parallel (#21057)
🛠️ Other improvements
- Add test for equality filters in Parquet (#21114)
- Add various tests for open issues (#21075)
- Upgrade packages and apply latest formatting (#21086)
- Move python dsl and builder_dsl code to dsl folder (#21077)
- Organize python related logics in polars-plan (#21070)
- Improve binary dispatch (#21061)
- Skip physical order test (#21060)
- Fix new ruff lints (#21040)
- Added test to check for the computation of list.len for null (#20938)
- Add make fix for running cargo clippy --fix (#21024)
- Add tests for resolved issues (#20999)
- Update code coverage workflow to use macos-latest runners (#20995)
- Remove unused arrow file (#20974)
- Deprecate the old streaming engine (#20949)
- Move `dt.replace` tests to dedicated file, add "typing :: typed" classifier, remove unused testing function (#20945)
- Extract merge sorted IR node (#20939)
- Update copyright year (#20764)
- Move Parquet deserialization to `BitmapBuilder` (#20896)
- Also publish polars-python (#20933)
- Remove verify_dict_indices_slice from main (#20928)
- Add tests for already resolved issues (#20921)
- Fix the `verify_dict_indices` codegen (#20920)
- Add ProjectionContext in projection pushdown opt (#20918)
Thank you to all our contributors for making this release possible!
@FBruzzesi, @MarcoGorelli, @aberres, @alexander-beedie, @arnabanimesh, @bschoenmaeckers, @coastalwhite, @deanm0000, @dependabot[bot], @dimfeld, @eitsupi, @etiennebacher, @henryharbeck, @itamarst, @lmmx, @lukemanley, @mcrumiller, @mullimanko, @nameexhaustion, @orlp, @petrosbar, @ritchie46, @siddharth-vi, @skritsotalakis and @taureandyernv
Rust Polars 0.46.0
🏆 Highlights
- Add new `Int128Type` (#20232)
💥 Breaking changes
- Support writing partitioned parquet to cloud (#20590)
🚀 Performance improvements
- Use BitmapBuilder in yet more places (#20868)
- Make an owned version of append (#20800)
- Use BitmapBuilder in a lot more places (#20776)
- Extend functionality on BitmapBuilder and use in Growables (#20754)
- Specialize first/last agg for simple types in new-streaming engine (#20728)
- Improve state caching and parallelism of window functions (#20689)
- Broadcast without materialization in `concat_arr` (#20681)
- Cache rolling groups (#20675)
- Use downcast_ref instead of dtype equality in `<dyn SeriesTrait as AsRef<ChunkedArray<T>>` (#20664)
- Fix performance regression for DataFrame serialization/pickling (#20641)
- Make Parquet `verify_dict_indices` SIMD (#20623)
- Move to `zlib-rs` by default and use `zstd::with_buffer` (#20614)
- Skip filter expansion in eager (#20586)
- Use AtomicWaker in async engine task joiner (#20604)
- Move morsel distribution to the computational async engine (#20600)
- Improve unique pred-pd (#20569)
- Collapse expanded filters in eager (#20493)
- Remove predicate from `IR::DataFrame` (#20492)
- Add proper distributor to new-streaming parquet reader (#20372)
- Use different binview dedup strategy depending on chunks ratio (#20451)
- Generalize the `arg_sort` fast path onto `Column` (#20437)
- Dedup binviews up front (#20449)
- Re-enable common subplan elim for new-streaming engine (#20443)
- Don't collect all LHS arrays in gather (#20441)
- Remove prepare_series for gather kernels (#20439)
- Don't always take all data buffers when gathering views (#20435)
- Order observability optimizations (#20396)
- Purge ChunkedArray Metadata (#20371)
- Drop probe tables in parallel in new-streaming equi-join (#20373)
- Explicit transpose in new-streaming equi-join finalize (#20363)
- Cache dtype on ExprIR (#20331)
✨ Enhancements
- Expose descending and nulls last in window order-by (#20919)
- Add `linear_space` (#20678)
- Implement df.unique() on new-streaming engine (#20875)
- Add unique operations for Decimal dtype (#20855)
- Add NDJson sink for the new streaming engine (#20805)
- Support nested keys in window functions (#20837)
- Add CSV sink for the new streaming engine (#20804)
- Periodically check python signals ('CTRL-C' handling) (#20826)
- Experimental unity catalog client (#20798)
- Support cumulative aggregations for `Decimal` dtype (#20802)
- Improve window function caching strategy (#20791)
- Allow different python versions for pickle (#20740)
- Add SQL support for the `NORMALIZE` string function (#20705)
- Add 'allow_exact_matches' to 'join_asof' (#20723)
- Add new-streaming first/last aggregations (#20716)
- Add Parquet Sink to new streaming engine (#20690)
- Expose IRBuilder (#20710)
- Make automatic use of Azure storage account keys opt-in (#20652)
- Improve `GroupsProxy/GroupsPosition` to be sliceable and cheaply cloneable (#20673)
- Add `str.normalize()` (#20483)
- Allow more group_by agg expressions in the new streaming engine (#20663)
- Support writing partitioned parquet to cloud (#20590)
- Add hint to error message for extra struct field in JSON (#20612)
- Add `index_of()` function to `Series` and `Expr` (#19894)
- Update `sqlparser-rs`, enabling "LEFT" keyword to be optional for anti/semi joins in SQL queries (#20576)
- Add `cat.starts_with`/`cat.ends_with` (#20257)
- Add `Int128` IO support for csv & ipc (#20535)
- Support arbitrary expressions in 'join_where' (#20525)
- Allow more join lossless casting (#20474)
- Always resolve dynamic types in schema (#20406)
- Order observability optimizations (#20396)
- Add FirstArgLossless supertype (#20394)
- Add `dt.replace` (#19708)
- Polars build for Pyodide (#20383)
- Add Azure credential provider using `DefaultAzureCredential()` (#20384)
- Add env var to ignore file cache allocate error (#20356)
- Enable joins between compatible differing numeric key columns (#20332)
- Cache dtype on ExprIR (#20331)
- Serialize DataFrame/Series using IPC in serde (#20266)
- Improve error message on SchemaError (#20326)
- Use better error messages when opening files (#20307)
- Add 'skip_lines' for CSV (#20301)
- Allow subtraction of time dtype columns (#20300)
- Add `bin.reinterpret` (#20263)
- Allow decoding of non-Polars arrow dictionaries in Arrow and Parquet (#20248)
- Add new `Int128Type` (#20232)
- IR formatting QoL improvements (#20246)
- Add `cat.len_chars` and `cat.len_bytes` (#20211)
- Expose AexprArena (#20230)
🐞 Bug fixes
- Fix `from_numpy` returning Null dtype for empty 1D numpy array (#20907)
- Fix `map_elements` panicking with Decimal type (#20905)
- Warn if asof keys not sorted (#20887)
- Avoid name collisions and panicking in object conversion (#20890)
- Incorrect scale used in `log` and `exp` for Decimal type (#20888)
- Don't deep clone `ManuallyDrop` in GroupsPosition (#20886)
- Fix DuplicateError when selecting columns after `join_where` or cross join + filter (#20865)
- Incorrect `Decimal` value for `fill_null(strategy="one")` (#20844)
- Fix one edge case (out of many) of int128 literals not working (#20830)
- Add height check to frame-level row indexing when key is int (#20778)
- Remove `assert` that panics on `group_by` followed by `head(n)`, where `n` is larger than the frame height (#20819)
- Fix panic `InvalidHeaderValue` scanning from S3 on Windows (#20820)
- Fix `clip` for `Decimal` returning wrong values (#20814)
- Incorrect height from slicing after projecting only the file path column (#20817)
- Shift mask when skipping Bitpacked values in Parquet (#20810)
- Error instead of truncate if length mismatch for several `str` functions (#20781)
- Support cumulative aggregations for `Decimal` dtype (#20802)
- Do not print sensitive information to output on `POLARS_VERBOSE` (#20797)
- Ignore file cache allocation error if `fallocate()` is not permitted (#20796)
- Incorrect logic in `assert_series_equal` for infinities (#20763)
- Avoid blocking on async runtime when resolving cloud scans (#20750)
- Fix `allow_invalid_certificates` being ignored in `storage_options` (#20744)
- Incorrect output type for `map_groups` returning all-NULL column (#20743)
- Fix `unique(maintain_order=True)` raising `InvalidOperationError` for null array (#20737)
- Don't collapse into a Nested Loop Join if the cross join maintains order (#20729)
- Don't serialize credentials provider (#20741)
- Fix `Series.n_unique` raising for list of struct (#20724)
- Fix incorrect top-k by sorted column, fix `head()` returning extra rows (#20722)
- Add outer validity to AnyValueBufferTrusted for structs (#20713)
- Don't partition group-by with non-scalar literals in agg (#20704)
- Incorrect view buffer dedup (#20691)
- Only verify Parquet ConvertedType if no LogicalType is given (#20682)
- Validate length of `schema_overrides` in `read_csv` (#20672)
- Fix `map_elements` ignoring `skip_nulls=True` for struct dtype (#20668)
- Check for MAP-GROUPS in cloud-eligible (#20662)
- Fix empty output of `to_arrow()` on filtered unit height DataFrame (#20656)
- Add `.default` to azure credential provider scope URL (#20651)
- Fix `join_asof` panicking for invalid `tolerance` input (#20643)
- Incorrect flag check on is_elementwise (#20646)
- Don't panic but set null type if type is unknown (#20647)
- Fix performance regression for DataFrame serialization/pickling (#20641)
- Fix `Int128` dtype serialization (#20629)
- Ensure that SQL `LIKE` and `ILIKE` operators support multi-line matches (#20613)
- Properly broadcast in sort_by (#20434)
- Properly load nested Parquet Statistics (#20610)
- AWS environment config was not loaded when credential provider was used (#20611)
- Fix order observability of group-by-dyn (#20615)
- Soundness when loading Parquet string statistics (#20585)
- Fix error filtering after `with_columns()` on unit height LazyFrame (#20584)
- Restore symbols on Apple by bumping nightly version (#20563)
- Fix variable name in error message for "unsupported data type" in rolling and upsampling operations (#20553)
- Output index type instead of u32 for `sum_horizontal` with boolean inputs (#20531)
- Fix more global categorical issues (#20547)
- Update eager join doctest on multiple columns (#20542)
- Revert categorical unique code (#20540)
- Add `unique` fast path for empty categoricals (#20536)
- Fix various `Int128` operations (#20515)
- Fix global cat unique (#20524)
- Fix union (#20523)
- Fix rolling aggregations for various integer types (#20512)
- Ensure `ignore_nulls` is respected in horizontal sum/mean (#20469)
- Fix incorrectly added sorted flag after append for lexically ordered categorical series (#20414)
- More `Int128` testing and related fixes (#20494)
- Validate column names in `unique()` for empty DataFrames (#20411)
- Implement `list.min` and `list.max` for `list[i128]` (#20488)
- Decimal from physical in horizontal min/max and shift (#20487)
- Don't remove sort if first/last strategy is set in unique (#20481)
- Fix join literal behavior (#20477)
- Validate asof join by args in IR resolving phase (#20473)
- Fix `align_frames` with single row panicking (#20466)
- Allow multiple column sort for Decimal (#20452)
- Fix mode panicking for String dtype (#20458)
- Return correct schema for `sum_horizontal` with boolean dtype (#20459)
- Properly handle `to_physical_repr` of nested types (#20413)
- Workaround for `mmap` crash under Emscripten (#20418)
- Fix using `new_columns` in `scan_csv` with compressed file (#20412)
- Fix decimal arithmetic schema (#20398)
- Raise on categorical search_sorted (#20395)
- Don't try to load non-existend List/FSL statistics (#20388)
- Propagate nulls for float methods on all numeric types (#20386)
- Add env var to ignore file cache allocate error (#20356)
- Flip o...
Python Polars 1.21.0
🚀 Performance improvements
- Use BitmapBuilder in yet more places (#20868)
- Make an owned version of append (#20800)
- Use BitmapBuilder in a lot more places (#20776)
✨ Enhancements
- Stabilize methods/functions (#20850)
- Add `linear_space` (#20678)
- Improve string → temporal parsing in `read_excel` and `read_ods` (#20845)
- Implement `df.unique()` on new-streaming engine (#20875)
- Experimental credential provider support for Delta read/scan/write (#20842)
- Allow column expressions in DataFrame `unnest` (#20846)
- Auto-initialize Python credential providers in more cases (#20843)
- Add unique operations for Decimal dtype (#20855)
- Add NDJson sink for the new streaming engine (#20805)
- Support nested keys in window functions (#20837)
- Add CSV sink for the new streaming engine (#20804)
- Periodically check python signals ('CTRL-C' handling) (#20826)
- Experimental unity catalog client (#20798)
- Support cumulative aggregations for `Decimal` dtype (#20802)
- Account for SurrealDB Python API updates (handle both `SurrealDB` and `AsyncSurrealDB` classes) in `read_database` (#20799)
- Drop `nest-asyncio` in favor of custom logic (#20793)
- Improve window function caching strategy (#20791)
- Support `lakefs://` URI for delta scanner (#20757)
- Additional support for loading `numpy.float16` values (as Float32) (#20769)
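For context on the `linear_space` entry above (#20678): its contract (evenly spaced values from start to end, endpoints included) can be sketched in plain Python. This is an illustrative stand-in with a made-up helper name, not the Polars implementation; see the PR for the actual expression API.

```python
def linear_space(start: float, end: float, num_samples: int) -> list[float]:
    """Evenly spaced values from `start` to `end`, endpoints included."""
    if num_samples == 1:
        return [start]
    step = (end - start) / (num_samples - 1)
    return [start + i * step for i in range(num_samples)]

print(linear_space(0.0, 1.0, 5))  # → [0.0, 0.25, 0.5, 0.75, 1.0]
```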
🐞 Bug fixes
- Warn if asof keys not sorted (#20887)
- Ensure explicit values given to `column_widths` override autofit in `write_excel` (#20893)
- Avoid name collisions and panicking in object conversion (#20890)
- Incorrect scale used in `log` and `exp` for Decimal type (#20888)
- Don't deep clone `ManuallyDrop` in GroupsPosition (#20886)
- Fix DuplicateError when selecting columns after `join_where` or cross join + filter (#20865)
- Incorrect `Decimal` value for `fill_null(strategy="one")` (#20844)
- Fix one edge case (out of many) of int128 literals not working (#20830)
- Add height check to frame-level row indexing when key is int (#20778)
- Remove `assert` that panics on `group_by` followed by `head(n)`, where `n` is larger than the frame height (#20819)
- Selectors should raise on `+` between themselves (#20825)
- Fix panic `InvalidHeaderValue` scanning from S3 on Windows (#20820)
- Fix `clip` for `Decimal` returning wrong values (#20814)
- Incorrect height from slicing after projecting only the file path column (#20817)
- Shift mask when skipping Bitpacked values in Parquet (#20810)
- Error instead of truncate if length mismatch for several `str` functions (#20781)
- Support cumulative aggregations for `Decimal` dtype (#20802)
- Allow `is_in` values to be given as custom `Collection` (#20801)
- Propagate null instead of panicking in `pl.repeat_by()` (#20787)
- Do not print sensitive information to output on `POLARS_VERBOSE` (#20797)
- Ignore file cache allocation error if `fallocate()` is not permitted (#20796)
- Incorrect logic in `assert_series_equal` for infinities (#20763)
📖 Documentation
- Update source URL for `legislators-historical.csv` (#20858)
- Update ML part of ecosystem user guide page (#20596)
🛠️ Other improvements
- Disable 'catalog' in build (#20897)
- Implement negative slice for new streaming IPC (#20866)
- Debloat Series bitops (#20873)
- Reduce python map bloat (#20871)
- Remove todo and test restriction for new-streaming (#20861)
- Dispatch to the in-memory engine for `AExpr::Gather` (#20862)
- Dispatch to the in-memory engine for multifile sources (#20860)
- Add tests for open issues (#20857)
- Mark 'register_startup' as unsafe (#20841)
- Reduce mode bloat (#20839)
- Rename `ContainsMany` to `ContainsAny` (#20785)
- Unpin NumPy in type checking workflow (#20792)
- Add various tests (#20768)
- Small drive-bys (#20772)
- Touch the upload probe for the remote benchmark (#20767)
Thank you to all our contributors for making this release possible!
@alexander-beedie, @arnabanimesh, @braaannigan, @burakemir, @coastalwhite, @etiennebacher, @ion-elgreco, @itamarst, @lukemanley, @mcrumiller, @nameexhaustion, @orlp, @ritchie46 and @stinodego
Python Polars 1.20.0
⚠️ Deprecations
- Make parameter of `str.to_decimal` keyword-only (#20570)
🚀 Performance improvements
- Extend functionality on BitmapBuilder and use in Growables (#20754)
- Specialize first/last agg for simple types in new-streaming engine (#20728)
- Use PyO3 to convert between Python and Rust datetimes (#20660)
- Improve state caching and parallelism of window functions (#20689)
- Broadcast without materialization in `concat_arr` (#20681)
- Cache rolling groups (#20675)
- Use `downcast_ref` instead of dtype equality in `<dyn SeriesTrait as AsRef<ChunkedArray<T>>` (#20664)
- Fix performance regression for DataFrame serialization/pickling (#20641)
- Make Parquet `verify_dict_indices` SIMD (#20623)
- Move to `zlib-rs` by default and use `zstd::with_buffer` (#20614)
- Skip filter expansion in eager (#20586)
- Improve unique pred-pd (#20569)
✨ Enhancements
- Allow different python versions for pickle (#20740)
- Add SQL support for the `NORMALIZE` string function (#20705)
- Add 'allow_exact_matches' to 'join_asof' (#20723)
- Add new-streaming first/last aggregations (#20716)
- Add Parquet Sink to new streaming engine (#20690)
- Make automatic use of Azure storage account keys opt-in (#20652)
- Reduce scan_csv() (and friends') memory usage when using BytesIO (#20649)
- Improve `GroupsProxy`/`GroupsPosition` to be sliceable and cheaply cloneable (#20673)
- Add `str.normalize()` (#20483)
- Allow more group_by agg expressions in the new streaming engine (#20663)
- Support loading Excel Table objects by name (#20654)
- Support writing to file objects from `write_excel` (#20638)
- Raise `DuplicateError` if given a pyarrow Table object with duplicate column names (#20624)
- Support writing partitioned parquet to cloud (#20590)
- Add hint to error message for extra struct field in JSON (#20612)
- Add `index_of()` function to `Series` and `Expr` (#19894)
- Update `sqlparser-rs`, enabling "LEFT" keyword to be optional for anti/semi joins in SQL queries (#20576)
- Add `cat.starts_with`/`cat.ends_with` (#20257)
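Two of the entries above add Unicode normalization (`str.normalize()` #20483 and the SQL `NORMALIZE` function #20705). What a normalization form does can be shown with Python's stdlib `unicodedata`; this snippet illustrates the concept only and does not use the Polars API.

```python
import unicodedata

# "café" composed (NFC) vs decomposed (NFD): same text, different code points
composed = "caf\u00e9"     # é as a single code point
decomposed = "cafe\u0301"  # e followed by a combining acute accent

assert composed != decomposed                              # raw strings differ
assert unicodedata.normalize("NFC", decomposed) == composed
assert unicodedata.normalize("NFD", composed) == decomposed
```

Normalizing before comparison is what makes such visually identical strings test equal.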
🐞 Bug fixes
- Avoid blocking on async runtime when resolving cloud scans (#20750)
- Fix `allow_invalid_certificates` being ignored in `storage_options` (#20744)
- Incorrect output type for `map_groups` returning all-NULL column (#20743)
- Fix `unique(maintain_order=True)` raising `InvalidOperationError` for null array (#20737)
- Don't collapse into a Nested Loop Join if the cross join maintains order (#20729)
- Don't serialize credentials provider (#20741)
- Fix `Series.n_unique` raising for list of struct (#20724)
- Fix incorrect top-k by sorted column, fix `head()` returning extra rows (#20722)
- Add outer validity to AnyValueBufferTrusted for structs (#20713)
- Don't partition group-by with non-scalar literals in agg (#20704)
- Fix xor operation of selector with Expr (#20702)
- Incorrect view buffer dedup (#20691)
- Only verify Parquet ConvertedType if no LogicalType is given (#20682)
- Validate length of `schema_overrides` in `read_csv` (#20672)
- Fix `map_elements` ignoring `skip_nulls=True` for struct dtype (#20668)
- Check for MAP-GROUPS in cloud-eligible (#20662)
- Fix empty output of `to_arrow()` on filtered unit height DataFrame (#20656)
- Add `.default` to azure credential provider scope URL (#20651)
- Fix `join_asof` panicking for invalid `tolerance` input (#20643)
- Incorrect flag check on is_elementwise (#20646)
- Don't panic but set null type if type is unknown (#20647)
- Fix performance regression for DataFrame serialization/pickling (#20641)
- Fix `Int128` dtype serialization (#20629)
- Ensure `read_excel` and `read_ods` support reading from raw `bytes` for all engines (#20636)
- Ensure that SQL `LIKE` and `ILIKE` operators support multi-line matches (#20613)
- Properly broadcast in sort_by (#20434)
- Properly load nested Parquet Statistics (#20610)
- AWS environment config was not loaded when credential provider was used (#20611)
- Fix order observability of group-by-dyn (#20615)
- Soundness when loading Parquet string statistics (#20585)
- Fix error filtering after `with_columns()` on unit height LazyFrame (#20584)
- Propagate `tenant_id` to `CredentialProviderAzure` if given (#20583)
- Restore symbols on Apple by bumping nightly version (#20563)
- Fix type annotation of `str.strip_chars_*` methods (#20565)
- Fix variable name in error message for "unsupported data type" in rolling and upsampling operations (#20553)
📖 Documentation
- Add more information for cross joins (#20753)
- Fix typo in sql functions (cosinus -> cosine) (#20676)
- Add links to `read_excel` "engine_options" and "read_options" docstring (#20661)
- Fix small typo in plugins (polars-dt -> polars-st) (#20657)
- Add polars-h3 and polars-st to plugin list (#20653)
- Add docs reference for `Field` (#20625)
- Update `DataFrame` join examples (#20587)
- Miscellaneous minor updates/fixes (#20573)
- Update "group_by_rolling" (deprecated) to "rolling" in user guide (#20548)
🛠️ Other improvements
- Fix remote benchmark script (#20755)
- Fix tests (#20745)
- Simplify hive predicate handling in `NEW_MULTIFILE` (#20730)
- Add tests for various open issues (#20720)
- Fix an Excel test following new `fastexcel` release (#20703)
- Add tests for various open issues that have been fixed (#20680)
- Don't include debug symbols in benchmark run (#20571)
- Implement CSV, IPC and NDJson in the `MultiScanExec` node (#20648)
- Don't rely on argument order of optimization_toggle (#20622)
- Fix Python deps installation in remote-benchmark workflow (#20619)
- Fix flaky categorical test (#20591)
- Bump multiversion from 0.7 to 0.8 (#20543)
- Remove unused nested function in `LazyFrame.fill_null` (#20558)
- Improve bin size info (#20551)
Thank you to all our contributors for making this release possible!
@Jesse-Bakker, @MarcoGorelli, @MoizesCBF, @SamuelAllain, @alexander-beedie, @bschoenmaeckers, @coastalwhite, @eitsupi, @etiennebacher, @itamarst, @jqnatividad, @lukemanley, @mcrumiller, @nameexhaustion, @orlp, @ritchie46 and @stinodego
Python Polars 1.19.0
🚀 Performance improvements
- Collapse expanded filters in eager (#20493)
- Remove predicate from `IR::DataFrame` (#20492)
- Use different binview dedup strategy depending on chunks ratio (#20451)
- Generalize the `arg_sort` fast path onto `Column` (#20437)
- Dedup binviews up front (#20449)
- Re-enable common subplan elim for new-streaming engine (#20443)
- Don't collect all LHS arrays in gather (#20441)
- Remove prepare_series for gather kernels (#20439)
- Don't always take all data buffers when gathering views (#20435)
✨ Enhancements
- Add `Int128` IO support for csv & ipc (#20535)
- Support arbitrary expressions in 'join_where' (#20525)
- Allow use of Python types in `cs.by_dtype` and `col` (#20491)
- Add an "include_file_paths" parameter to `read_excel` and `read_ods` (#20476)
- Allow more join lossless casting (#20474)
- Accept more generic `Iterable[bool]` in `Series.filter` (#20431)
- Allow loading data from multiple Excel/ODS workbooks and worksheets (#20465)
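On "Accept more generic `Iterable[bool]` in Series.filter" (#20431): the convenience is that any iterable of booleans, including a generator, can act as a mask. A minimal plain-Python sketch of the idea; the `filter_values` helper here is hypothetical, not a Polars function.

```python
from typing import Iterable

def filter_values(values: list[int], mask: Iterable[bool]) -> list[int]:
    """Keep values whose corresponding mask element is True.

    Accepting any Iterable[bool] means generators work too, not just lists.
    """
    return [v for v, keep in zip(values, mask) if keep]

values = [10, 20, 30, 40]
mask = (v > 15 for v in values)  # a generator, not a materialized list
print(filter_values(values, mask))  # → [20, 30, 40]
```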
🐞 Bug fixes
- Output index type instead of u32 for `sum_horizontal` with boolean inputs (#20531)
- Fix more global categorical issues (#20547)
- Update eager join doctest on multiple columns (#20542)
- Revert categorical unique code (#20540)
- Add `unique` fast path for empty categoricals (#20536)
- Fix various `Int128` operations (#20515)
- Fix global cat unique (#20524)
- Fix union (#20523)
- Fix rolling aggregations for various integer types (#20512)
- Ensure `ignore_nulls` is respected in horizontal sum/mean (#20469)
- Fix incorrectly added sorted flag after append for lexically ordered categorical series (#20414)
- More `Int128` testing and related fixes (#20494)
- Validate column names in `unique()` for empty DataFrames (#20411)
- Implement `list.min` and `list.max` for `list[i128]` (#20488)
- Decimal from physical in horizontal min/max and shift (#20487)
- Don't remove sort if first/last strategy is set in unique (#20481)
- Fix join literal behavior (#20477)
- Validate asof join `by` args in IR resolving phase (#20473)
- Fix `align_frames` with single row panicking (#20466)
- Allow multiple column sort for Decimal (#20452)
- Fix mode panicking for String dtype (#20458)
- Return correct schema for `sum_horizontal` with boolean dtype (#20459)
- Fix return type for `add_business_days`, `millennium`, `century` and `combine` methods in `Series.dt` namespace (#20436)
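The two `sum_horizontal` fixes above (#20531, #20459) both concern the output dtype for boolean inputs: a horizontal sum of booleans is a per-row count of `True` values, so it must come out as an integer (Polars' index type) rather than Boolean. The same principle in plain Python:

```python
rows = [(True, True, False), (False, True, False), (False, False, False)]

# Summing booleans counts the True values in each row; the natural result
# type is an integer count, not another boolean.
counts = [sum(row) for row in rows]
print(counts)  # → [2, 1, 0]
```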
📖 Documentation
- Fix typo in `DataFrame.cast` (#20532)
- Fix flaky doctests (#20516)
- Add examples for bitwise expressions (#20503)
- Clarify the join pre-condition of `join_asof` (#20509)
- Fix `Expr.all` description of Kleene logic (#20409)
🛠️ Other improvements
- Increase categorical test coverage (#20514)
- Report wheel sizes (#20541)
- Add tests for `floor/ceil` on integers (#20479)
- Expose and rewrite 'can_pre_agg' (#20450)
- Skip test on windows; kuzu import segfaults (#20463)
- Add a `TypeCheckRule` to the optimizer (#20425)
Thank you to all our contributors for making this release possible!
@Biswas-N, @IndexSeek, @Prathamesh-Ghatole, @Terrigible, @alexander-beedie, @brifitz, @coastalwhite, @dependabot, @dependabot[bot], @jqnatividad, @lukemanley, @mcrumiller, @orlp, @ritchie46 and @siddharth-vi
Python Polars 1.18.0
🏆 Highlights
- Add new `Int128Type` (#20232)
🚀 Performance improvements
- Order observability optimizations (#20396)
- Purge ChunkedArray Metadata (#20371)
- Explicit transpose in new-streaming equi-join finalize (#20363)
- Cache dtype on ExprIR (#20331)
- Lower overhead for `BytecodeParser` on introspection of incompatible UDFs (#20280)
✨ Enhancements
- Always resolve dynamic types in schema (#20406)
- Support loading data from multiple Excel/ODS workbooks (#20404)
- Add "drop_empty_cols" parameter for `read_excel` and `read_ods` (#20430)
- Order observability optimizations (#20396)
- Add FirstArgLossless supertype (#20394)
- Add `dt.replace` (#19708)
- Polars build for Pyodide (#20383)
- Add Azure credential provider using `DefaultAzureCredential()` (#20384)
- Add env var to ignore file cache allocate error (#20356)
- Enable joins between compatible differing numeric key columns (#20332)
- Cache dtype on ExprIR (#20331)
- Serialize DataFrame/Series using IPC in serde (#20266)
- Improve error message on SchemaError (#20326)
- Use better error messages when opening files (#20307)
- Add 'skip_lines' for CSV (#20301)
- Allow subtraction of time dtype columns (#20300)
- Add `bin.reinterpret` (#20263)
- Allow decoding of non-Polars arrow dictionaries in Arrow and Parquet (#20248)
- Streamline creation of empty frame from `Schema` (#20267)
- Add `cat.len_chars` and `cat.len_bytes` (#20211)
- Expose AexprArena (#20230)
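`dt.replace` (#19708) follows the same idea as Python's `datetime.replace`: swap selected date/time components while leaving the others untouched. The stdlib call below sketches the semantics; the Polars expression applies this column-wise.

```python
from datetime import datetime

ts = datetime(2024, 3, 15, 12, 30)

# Replace single components; everything not named is left untouched.
assert ts.replace(year=2025) == datetime(2025, 3, 15, 12, 30)
assert ts.replace(month=1, day=1) == datetime(2024, 1, 1, 12, 30)
print(ts.replace(year=2025).isoformat())  # → 2025-03-15T12:30:00
```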
🐞 Bug fixes
- Fix nullable object in map_elements (#20422)
- Properly handle `to_physical_repr` of nested types (#20413)
- Properly raise UDF errors (#20417)
- Workaround for `mmap` crash under Emscripten (#20418)
- Fix using `new_columns` in `scan_csv` with compressed file (#20412)
- Fix return type of `Series.dt.add_business_days` (#20402)
- Fix decimal series dispatch (#20400)
- Fix decimal arithmetic schema (#20398)
- Raise on categorical search_sorted (#20395)
- Fix plotting f-strings and docstrings (#20399)
- Don't try to load non-existent List/FSL statistics (#20388)
- Propagate nulls for float methods on all numeric types (#20386)
- Add env var to ignore file cache allocate error (#20356)
- Flip order on right join (#20358)
- Correctly parse special float values in `from_repr` (#20351)
- Fix incorrect object store caching for ADLS URI (#20357)
- Use the same encoding for nullable as non-nullable arrays (#20323)
- Improve error message on SchemaError (#20326)
- Boolean optional slice pushdown (#20315)
- Properly handle `from_physical` for List/Array (#20311)
- Ignore quotes in csv comments (#20306)
- Ensure pl.datetime returns empty column when input columns are empty (#20278)
- Ensure output height does not change on lazy projection pushdown with aggregations (#20223)
- Fix error writing on Windows to locations outside of C drive (#20245)
- Incorrect comparison in some cases with filtered list/array columns (#20243)
- Ensure height is maintained in SQL `SELECT 1 FROM` (#20241)
- Properly account for updated Categorical in .unique() kernel (#20235)
📖 Documentation
- Improve docstring clarity (#20416)
- Update GPU engine installation instructions to remove `--extra-index-url` from CUDA 12 packages (#20381)
- Remove Plugins overview page without information (#20348)
- Small fixes/clarifications in user guide (#20335)
- Improve docs about NaN (#20310)
- Fix substr function param definition (#19054)
- Include parquet options in BigQuery I/O write sample (#20292)
- Fix typo in `fork` warning (#20258)
📦 Build system
- Add `project.dynamic = ["version"]` to pyproject.toml (#20345)
- Update `pyo3` and `numpy` crates to version `0.23` (#20111)
- Build wheels for ARM Windows in Python release workflow (#20247)
🛠️ Other improvements
- Enable masked out list, struct and array elements in parametric tests (#20365)
- Move hive partitioning/multi-file handling outside of readers (#20203)
- Purge ChunkedArray Metadata (#20371)
- Correcting misspelled return value and unifying regional spelling (#20375)
- Add test for `select(len())` (#20343)
- Make parametric tests include `pl.List` and `pl.Array` by default (#20319)
- Use Column in Row Encoding (#20312)
- Don't warn on fork hook (#20309)
- Don't deconstruct `CsvParseOptions` (#20302)
- Allow decoding of non-Polars arrow dictionaries in Arrow and Parquet (#20248)
- Prepare test suite for Python 3.13 support (#20297)
- Add `FunctionCastOptions` and conservative IR-level cast type-checking (#20286)
- Add more descriptive error message for failure of vstack/extend (#20299)
- Clean up some remnants of Python 3.8 support (#20293)
- Add new `Int128Type` (#20232)
- Add test for BytesIO overwritten after scan (#20240)
- Expose AexprArena (#20230)
Thank you to all our contributors for making this release possible!
@Jesse-Bakker, @Terrigible, @ZemanOndrej, @alexander-beedie, @balbok0, @beckernick, @bschoenmaeckers, @coastalwhite, @georgestagg, @hamdanal, @haocheng6, @kszlim, @lukemanley, @mcrumiller, @nameexhaustion, @noexecstack, @orlp, @ptiza, @r-brink, @ritchie46, @rodrigogiraoserrao, @stijnherfst, @stinodego, @tswast and @zero-stroke
Python Polars 1.17.1
🐞 Bug fixes
- Fix incorrect lazy `select(len())` with some select orderings (#20222)
- Fix assertion panic on LazyFrame `scratch.is_empty()` (#20219)
Thank you to all our contributors for making this release possible!
@nameexhaustion and @ritchie46