Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-38915: [Java] Upgrade Arrow Java project to JPMS Java Platform Module System #38876

Closed
wants to merge 35 commits into from

Conversation

jduo
Copy link
Member

@jduo jduo commented Nov 24, 2023

Rationale for this change

This will allow for static linking, better support for newer JDKs, and integration with tools that require modules.

What changes are included in this PR?

  • Create and integrate a maven plugin to compile module-info.java files without using a newer release target. This allows compiling JARs that have Java 8 bytecode but contain module-info.class files for JPMS support.
  • Refactor arrow-memory-netty such that code written in Netty packages go into a new maven module arrow-memory-netty-buffer-patch. The code from this module must be patched using patch-module when launching Java.
  • add module-info.java files for vector, memory modules, and format.
  • Update several plugins to use module-enabled versions:
    • maven-compiler-plugin is updated to 3.11.0
    • maven-shade-plugin is updated to 3.2.4
    • maven-dependency-plugin is updated to 3.1.2
    • CycloneDX is updated to 2.7.10
  • Update grpc-java to 1.59 for module support
  • Ignored TestBaseAllocator#testRootAllocator_getEmpty() for now

Are these changes tested?

Yes, unit tests now run with modules when using JDK9+.

Are there any user-facing changes?

Yes. The Netty module has been moved around significantly for users that depend on that code. Since modules are now named, the add-opens calls needed to run Arrow is significantly more complicated.

Copy link

Thanks for opening a pull request!

If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

In the case of PARQUET issues on JIRA the title also supports:

PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

See also:

@jduo
Copy link
Member Author

jduo commented Nov 24, 2023

This is roughly comparable to where #13072 got to, minus the actual module-info.java files.
With the following additions:

  • Built on the current state of main
  • Updated gRPC to a module-friendly version (1.59).
  • Updated plugin dependencies to module-friendly versions.
  • Staged to compile the codebase with Java 8, with the exception of module-info files which will use a custom plugin (to be added. this is already implemented at https://github.com/Bit-Quill/maven-module-info-compiler-plugin and needs to get integrated to the Arrow build).

@davisusanibar @lidavidm @danepitkin

Next up will be adding the maven plugin, then adding the module-info files from #13072, then modularizing other modules such as algorithm, adapters, ...

@jduo
Copy link
Member Author

jduo commented Nov 24, 2023

This suggests that we need to install JDK 9+ on some of the build agents:
https://github.com/apache/arrow/actions/runs/6982068919/job/19000467907?pr=38876#step:6:791

@jduo
Copy link
Member Author

jduo commented Nov 24, 2023

This failure is happening for somewhat complicated reasons:
https://github.com/apache/arrow/actions/runs/6983421948/job/19004513065?pr=38876#step:6:2511

Before refactoring, the allocator used by this test would be the Netty module's DefaultAllocationManagerFactory since TestBaseAllocator lived in the Netty module. This one correctly returns a non-zero value when given a zero-allocation.

After refactoring, TestBaseAllocator lives in memory-core, and uses the dummy DefaultAllocationManagerFactory in memory-core's tests. This one however does return zero when given a zero-allocation. This might itself be a bug unrelated to the refactoring. This calls MemoryUtil.UNSAFE.allocateMemory(), which calls sun.misc.Unsafe.allocateMemory(), which is supposed to return a non-zero value.

I'm going to @ignore this test for now.

Should these allocation manager factorys return a non-zero address when the user asks for an empty allocation? I would assume so since C-allocators have this property.

@davisusanibar , did you see anything like this?
@lidavidm @danepitkin , any thoughts around this?

@jduo jduo requested a review from danepitkin November 24, 2023 19:21
@jduo jduo force-pushed the java-modules branch 2 times, most recently from 7344241 to c67a59b Compare November 26, 2023 14:39
@jduo
Copy link
Member Author

jduo commented Nov 26, 2023

I'm not sure what's happening in this CI failure after the Netty refactoring.
This test was in TestBaseAllocator before, but that test class got moved to memory-core. This test got separated into its own test class, moved back to Netty core (because it has Netty dependencies and wouldn't build in memory-core), and now throws a ConcurrentModificationException.

Error: org.apache.arrow.memory.netty.TestNettyAllocator.testMemoryUsage Time elapsed: 0.317 s <<< ERROR!
java.util.ConcurrentModificationException
at java.base/java.util.ArrayList$ArrayListSpliterator.tryAdvance(ArrayList.java:1604)
at java.base/java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:129)
at java.base/java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:527)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:513)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
at java.base/java.util.stream.MatchOps$MatchOp.evaluateSequential(MatchOps.java:230)
at java.base/java.util.stream.MatchOps$MatchOp.evaluateSequential(MatchOps.java:196)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.anyMatch(ReferencePipeline.java:632)
at org.apache.arrow.memory.netty.TestNettyAllocator.testMemoryUsage(TestNettyAllocator.java:56)

@jduo
Copy link
Member Author

jduo commented Nov 27, 2023

I'm not sure what's happening in this CI failure after the Netty refactoring.
This test was in TestBaseAllocator before, but that test class got moved to memory-core. This test got separated into its own test class, moved back to Netty core (because it has Netty dependencies and wouldn't build in memory-core), and now throws a ConcurrentModificationException.

Error: org.apache.arrow.memory.netty.TestNettyAllocator.testMemoryUsage Time elapsed: 0.317 s <<< ERROR!
java.util.ConcurrentModificationException
at java.base/java.util.ArrayList$ArrayListSpliterator.tryAdvance(ArrayList.java:1604)
at java.base/java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:129)
at java.base/java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:527)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:513)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
at java.base/java.util.stream.MatchOps$MatchOp.evaluateSequential(MatchOps.java:230)
at java.base/java.util.stream.MatchOps$MatchOp.evaluateSequential(MatchOps.java:196)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.anyMatch(ReferencePipeline.java:632)
at org.apache.arrow.memory.netty.TestNettyAllocator.testMemoryUsage(TestNettyAllocator.java:56)

This didn't happen in my last test run and all I did was change the module info computer plugin to inherit its version. So this is probably a threading issue. Maybe extracting this test from other TestBaseAllocator tests is causing them to run in parallel when they weren't before.

@davisusanibar
Copy link
Contributor

This failure is happening for somewhat complicated reasons: https://github.com/apache/arrow/actions/runs/6983421948/job/19004513065?pr=38876#step:6:2511

Before refactoring, the allocator used by this test would be the Netty module's DefaultAllocationManagerFactory since TestBaseAllocator lived in the Netty module. This one correctly returns a non-zero value when given a zero-allocation.

After refactoring, TestBaseAllocator lives in memory-core, and uses the dummy DefaultAllocationManagerFactory in memory-core's tests. This one however does return zero when given a zero-allocation. This might itself be a bug unrelated to the refactoring. This calls MemoryUtil.UNSAFE.allocateMemory(), which calls sun.misc.Unsafe.allocateMemory(), which is supposed to return a non-zero value.

I'm going to @ignore this test for now.

Should these allocation manager factorys return a non-zero address when the user asks for an empty allocation? I would assume so since C-allocators have this property.

@davisusanibar , did you see anything like this? @lidavidm @danepitkin , any thoughts around this?

Thanks for pushing this feature, James. I am observing changes. Does this problem persist?

@jduo
Copy link
Member Author

jduo commented Nov 27, 2023

This failure is happening for somewhat complicated reasons: https://github.com/apache/arrow/actions/runs/6983421948/job/19004513065?pr=38876#step:6:2511
Before refactoring, the allocator used by this test would be the Netty module's DefaultAllocationManagerFactory since TestBaseAllocator lived in the Netty module. This one correctly returns a non-zero value when given a zero-allocation.
After refactoring, TestBaseAllocator lives in memory-core, and uses the dummy DefaultAllocationManagerFactory in memory-core's tests. This one however does return zero when given a zero-allocation. This might itself be a bug unrelated to the refactoring. This calls MemoryUtil.UNSAFE.allocateMemory(), which calls sun.misc.Unsafe.allocateMemory(), which is supposed to return a non-zero value.
I'm going to @ignore this test for now.
Should these allocation manager factorys return a non-zero address when the user asks for an empty allocation? I would assume so since C-allocators have this property.
@davisusanibar , did you see anything like this? @lidavidm @danepitkin , any thoughts around this?

Thanks for pushing this feature, James. I am observing changes. Does this problem persist?

Yes this still happens. I have marked the test as ignored to get past it.
It's not clear if this is a new bug or if it is exposing a bug from the memory-core test version of DefaultAllocationManagerFactory.

@jduo
Copy link
Member Author

jduo commented Nov 28, 2023

I'm looking for some ideas on how to fix up a maven issue. I've rigged the general build configuration in the parent POM to continue to use JDK 8 and exclude module-info.java files. I've also shut off using the module path for dependencies.

In the memory-netty build, I need a profile (that runs on JDK9) that compiles using the module-path for dependencies (using the flag added to maven-compiler-plugin 3.11.0). It needs to run an extra command to patch code from another JAR into some of the packages that Netty modules export. I'd also like to compile module-info.java for this.

I likely need two executions for the Netty build. One that compiles everything in JDK8 except module-info.java, and one that uses JDK 9, compiles module-info.java and does the patch command. I don't want to compile anything else with JDK9 since I need to generate JDK8 bytecode for everything else (or I can compile everything in JDK9, then recompile everything except module-info.java in 8)

The problem I'm seeing is that setting the flag for setting the flag in the profile's configuration isn't taking effect. This problem happens when both when it's an execution-level configuration or a plugin-level configuration. It does take effect when set on the parent POM.

@jduo jduo changed the title Upgrade Arrow Java project to JPMS Java Platform Module System GH-38915: [Java] Upgrade Arrow Java project to JPMS Java Platform Module System Nov 28, 2023
Copy link

⚠️ GitHub issue apache/arrow-java#151 has been automatically assigned in GitHub to PR creator.

@jduo jduo marked this pull request as ready for review November 28, 2023 11:59
@jduo jduo requested a review from lidavidm as a code owner November 28, 2023 11:59
@jduo
Copy link
Member Author

jduo commented Nov 28, 2023

The test errors in memory-core have this cause:
Caused by: java.lang.reflect.InaccessibleObjectException: Unable to make field long java.nio.Buffer.address accessible: module java.base does not "opens java.nio" to module org.apache.arrow.memory.core

I've addressed this is in maven by adding --add-opens java.base/java.nio=org.apache.arrow.memory.core, but perhaps I haven't covered a profile that CI is using. If anyone has any ideas on what else might need to be edited that'd be really helpful.

@jduo
Copy link
Member Author

jduo commented Nov 28, 2023

This PR has alot of changes. Perhaps we handle other modules (such as Flight, algorithm, dataset) in a separate PR so that this one can close.

@jduo
Copy link
Member Author

jduo commented Nov 28, 2023

These changes are requiring alot of patching/add-opens calls to work (see the changes It is very unfriendly for the user.
https://github.com/apache/arrow/pull/38876/files#diff-d8225ebcfc11b480a6e4f54e183b67c3ead51635a167c106d928c2abf1f9ef66R892

The patch-module call is particularly difficult because we, and the user, do not necessarily know where their JARs are.

@lidavidm
Copy link
Member

I'm not going to be able to review this for a while.

FWIW, I would be fine deferring other modules for another PR, and I would also be fine saying that arrow-memory-netty is just not supported when using modules because of the patching it does.

@lidavidm
Copy link
Member

Given how new this is, I would even be OK saying that the only supported memory implementation is the Java 21+ one to begin with and we can see if there is even demand to use memory-unsafe or memory-netty.

Finally, if there's need to reorganize which package files live in, maybe we could split some of that into another PR.

@jduo jduo force-pushed the java-modules branch 5 times, most recently from 46ee8fb to 90a8ee3 Compare November 28, 2023 23:46
jduo added 24 commits November 29, 2023 14:31
This test fails because we no longer use Netty's allocation manager factory since
the test has moved into memory-core. To be fixed.
There are issues looking up gRPC in the flight-integration0tests
tests if flight-core gets shaded with a newer version up to
at least 3.5.1.
Note that some configuration from the arrow root POM
is copied to the maven plugins module because it cannot
be inherited without creating a cyclic depdendency.
Reports a used dependency as unused only on Windows 2022 CI runs
It is not correctly getting skipped.
Newer versions of the checkstyle plugin do support module-info.java files
but require rewriting the checkstyle rules file.
Change the project to not use the module path for most compilation
since most compilation will target JDK8.
Not sure if this is necessary if we just use the plugin to compile module-info.java
Needed for the module to get added to the module-path when testing
Move tests outside of io.netty.buffer package because that causes a conflict
with Netty's modules. Change functions marked as protected that are used in
tests to be public to facilitate this refactoring.
2.16.1 in particular had faulty module-info files.
Do not add tests in org.apache.arrow.util because
that is an exported package for arrow-memory-core
and causes module conflicts
Having an implicit dependency from arrow-memory-core
does not put immutables on the module-path when
running tests and causes module issues.
Export public package for arrow-vector module.
Allow jackson to use arrow-vector pojo classes
reflectively for serialization.
Log what message was reported if the message didn't contain
the correct information.
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Nov 29, 2023
@jduo
Copy link
Member Author

jduo commented Dec 1, 2023

This PR has been split into several smaller PRs and issues.
See apache/arrow-java#151 for details.

@jduo jduo closed this Dec 1, 2023
Copy link

⚠️ GitHub issue #38915 has no components, please add labels for components.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Java] Build Arrow as JPMS modules
3 participants