Commit 5559930

lidavidm and kou authored
GH-546: Migrate Sphinx documentation (apache#553)
Fixes apache#546. --------- Co-authored-by: Sutou Kouhei <[email protected]>
1 parent 3dd0311 commit 5559930

32 files changed: +5126 -4 lines changed

.github/workflows/rc.yml

+43 -4
@@ -421,8 +421,8 @@ jobs:
       - name: Prepare docs
         run: |
           mkdir -p docs
-          cp -a target/site/apidocs docs/reference
-          tar -cvzf docs.tar.gz docs
+          cp -a target/site/apidocs reference
+          tar -cvzf reference.tar.gz reference
       - name: Upload binaries
         uses: actions/upload-artifact@65c4c4a1ddee5b72f698fdd19549f0f0fb45cf08 # v4.6.0
         with:
@@ -431,8 +431,46 @@ jobs:
       - name: Upload docs
         uses: actions/upload-artifact@65c4c4a1ddee5b72f698fdd19549f0f0fb45cf08 # v4.6.0
         with:
-          name: release-docs
-          path: docs.tar.gz
+          name: reference
+          path: reference.tar.gz
+  docs:
+    name: Docs
+    needs:
+      - binaries
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      packages: write
+    steps:
+      - uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b # v5.3.0
+        with:
+          cache: 'pip'
+      - name: Download source archive
+        uses: actions/download-artifact@fa0a91b85d4f404e444e00e005971372dc801d16 # v4.1.8
+        with:
+          name: release-source
+      - name: Download Javadocs
+        uses: actions/download-artifact@fa0a91b85d4f404e444e00e005971372dc801d16 # v4.1.8
+        with:
+          name: reference
+      - name: Extract source archive
+        run: |
+          tar -xf apache-arrow-java-*.tar.gz --strip-components=1
+      - name: Build
+        run: |
+          cd docs
+          python -m venv venv
+          source venv/bin/activate
+          pip install -r requirements.txt
+          make html
+          tar -xf ../reference.tar.gz -C build/html
+      - name: Compress into single artifact to keep directory structure
+        run: tar -cvzf html.tar.gz -C docs/build html
+      - name: Upload artifacts
+        uses: actions/upload-artifact@65c4c4a1ddee5b72f698fdd19549f0f0fb45cf08 # v4.6.0
+        with:
+          name: release-html
+          path: html.tar.gz
   verify:
     name: Verify
     needs:
@@ -473,6 +511,7 @@ jobs:
     name: Upload
     if: github.ref_type == 'tag'
     needs:
+      - docs
       - verify
     runs-on: ubuntu-latest
     permissions:

.gitignore

+1
@@ -20,6 +20,7 @@
 /dev/release/apache-rat-0.16.1.jar
 /dev/release/filtered_rat.txt
 /dev/release/rat.xml
+/docs/build/
 CMakeCache.txt
 CMakeFiles/
 Makefile

dev/release/rat_exclude_files.txt

+1
@@ -17,3 +17,4 @@
 
 .gitmodules
 dataset/src/test/resources/data/student.csv
+docs/Makefile

docs/Makefile

+20
@@ -0,0 +1,20 @@
+# Minimal makefile for Sphinx documentation
+#
+
+# You can set these variables from the command line, and also
+# from the environment for the first two.
+SPHINXOPTS ?= -W
+SPHINXBUILD ?= sphinx-build
+SOURCEDIR = source
+BUILDDIR = build
+
+# Put it first so that "make" without argument is like "make help".
+help:
+	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+
+.PHONY: help Makefile
+
+# Catch-all target: route all unknown targets to Sphinx using the new
+# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
+%: Makefile
+	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

docs/README.md

+28
@@ -0,0 +1,28 @@
+<!---
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Documentation
+
+Build with Sphinx.
+
+```bash
+cd docs
+pip install -r requirements.txt
+make html
+```

docs/requirements.txt

+28
@@ -0,0 +1,28 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+furo==2024.8.6
+myst-parser==4.0.0
+Sphinx==8.1.3
+sphinx-autobuild==2024.10.3
+sphinx-basic-ng==1.0.0b2
+sphinxcontrib-applehelp==2.0.0
+sphinxcontrib-devhelp==2.0.0
+sphinxcontrib-htmlhelp==2.1.0
+sphinxcontrib-jsmath==1.0.1
+sphinxcontrib-qthelp==2.0.0
+sphinxcontrib-serializinghtml==2.0.0

docs/source/_static/.gitignore

Whitespace-only changes.

docs/source/algorithm.rst

+92
@@ -0,0 +1,92 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied. See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+Java Algorithms
+===============
+
+Arrow's Java library provides algorithms for some commonly used
+functionality. The algorithms are provided in the ``org.apache.arrow.algorithm``
+package of the ``algorithm`` module.
+
+Comparing Vector Elements
+-------------------------
+
+Comparing vector elements is the basis for many algorithms. Vector
+elements can be compared in one of two ways:
+
+1. **Equality comparison**: there are two possible results for this type of comparison: ``equal`` and ``unequal``.
+   Currently, this type of comparison is supported through the ``org.apache.arrow.vector.compare.VectorValueEqualizer``
+   interface.
+
+2. **Ordering comparison**: there are three possible results for this type of comparison: ``less than``, ``equal to``
+   and ``greater than``. This comparison is supported by the abstract class ``org.apache.arrow.algorithm.sort.VectorValueComparator``.
+
+We provide default implementations to compare vector elements. However, users can also define
+their own customized comparisons.
+
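Editorial sketch (not part of the commit): the two comparison styles described above can be illustrated with plain Java arrays standing in for Arrow vectors. The class and method names below are hypothetical and are not Arrow's API; the point is only the difference between a two-outcome equality check and a three-outcome ordering, plus what a customized ordering might look like.

```java
/** Plain-Java illustration of the two comparison styles; hypothetical names, not Arrow's API. */
final class ComparisonSketch {

  /** Equality comparison: only two outcomes, equal (true) or unequal (false). */
  static boolean valuesEqual(int[] left, int leftIndex, int[] right, int rightIndex) {
    return left[leftIndex] == right[rightIndex];
  }

  /** Ordering comparison: negative for less than, zero for equal to, positive for greater than. */
  static int compareValues(int[] left, int leftIndex, int[] right, int rightIndex) {
    return Integer.compare(left[leftIndex], right[rightIndex]);
  }

  /** A customized ordering: compare by absolute value (widened to long so MIN_VALUE is safe). */
  static int compareByAbsoluteValue(int[] left, int leftIndex, int[] right, int rightIndex) {
    long a = Math.abs((long) left[leftIndex]);
    long b = Math.abs((long) right[rightIndex]);
    return Long.compare(a, b);
  }

  public static void main(String[] args) {
    int[] first = {7, -5};
    int[] second = {3, 7};
    System.out.println(valuesEqual(first, 0, second, 1));                // true
    System.out.println(compareValues(first, 1, second, 0) < 0);          // true: -5 < 3
    System.out.println(compareByAbsoluteValue(first, 1, second, 0) > 0); // true: |-5| > |3|
  }
}
```

In Arrow itself, the equivalent roles are played by the `VectorValueEqualizer` interface and the `VectorValueComparator` abstract class named in the text; their exact method signatures are not reproduced here.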
+Vector Element Search
+---------------------
+
+A search algorithm tries to find a particular value in a vector. When successful, a vector index is
+returned; otherwise, ``-1`` is returned. The following search algorithms are provided:
+
+1. **Linear search**: this algorithm simply traverses the vector from the beginning until a match is
+   found or the end of the vector is reached. It therefore takes ``O(n)`` time, where ``n`` is the number of elements
+   in the vector. This algorithm is implemented in ``org.apache.arrow.algorithm.search.VectorSearcher#linearSearch``.
+
+2. **Binary search**: this is a more efficient search algorithm, as it runs in ``O(log(n))`` time.
+   However, it is only applicable to sorted vectors. To get a sorted vector,
+   one can use one of our sorting algorithms, which are discussed in the next section. This algorithm
+   is implemented in ``org.apache.arrow.algorithm.search.VectorSearcher#binarySearch``.
+
+3. **Parallel search**: when the vector is large, it takes a long time to traverse the elements to search
+   for a value. To make this process faster, one can split the vector into multiple partitions and perform the
+   search on each partition in parallel. This is supported by ``org.apache.arrow.algorithm.search.ParallelSearcher``.
+
+4. **Range search**: in many scenarios, there can be multiple matching values in the vector.
+   If the vector is sorted, the matching values reside in a contiguous region of the vector. The
+   range search algorithm finds the lower or upper bound of that region in ``O(log(n))`` time.
+   An implementation is provided in ``org.apache.arrow.algorithm.search.VectorRangeSearcher``.
+
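Editorial sketch (not part of the commit): the linear, binary, and range searches listed above can be shown on a plain `int[]` in place of an Arrow vector. The class and method names are hypothetical and do not mirror the signatures of `VectorSearcher` or `VectorRangeSearcher`; parallel search is left out to keep the example short. Each method returns `-1` when there is no match, as the text describes.

```java
/** Plain-Java search sketch on int arrays; hypothetical names, not Arrow's API. */
final class SearchSketch {

  /** Linear search: scan from the start, O(n). Works on unsorted input. */
  static int linearSearch(int[] values, int key) {
    for (int i = 0; i < values.length; i++) {
      if (values[i] == key) {
        return i;
      }
    }
    return -1;
  }

  /** Binary search: O(log n), requires sorted input. Returns any matching index. */
  static int binarySearch(int[] sorted, int key) {
    int low = 0;
    int high = sorted.length - 1;
    while (low <= high) {
      int mid = (low + high) >>> 1; // unsigned shift avoids overflow of low + high
      if (sorted[mid] < key) {
        low = mid + 1;
      } else if (sorted[mid] > key) {
        high = mid - 1;
      } else {
        return mid;
      }
    }
    return -1;
  }

  /** Range search, lower bound: index of the first element equal to key, O(log n). */
  static int firstMatch(int[] sorted, int key) {
    int low = 0;
    int high = sorted.length; // half-open interval [low, high)
    while (low < high) {
      int mid = (low + high) >>> 1;
      if (sorted[mid] < key) {
        low = mid + 1;
      } else {
        high = mid;
      }
    }
    return (low < sorted.length && sorted[low] == key) ? low : -1;
  }

  /** Range search, upper bound: index of the last element equal to key, O(log n). */
  static int lastMatch(int[] sorted, int key) {
    int low = 0;
    int high = sorted.length;
    while (low < high) {
      int mid = (low + high) >>> 1;
      if (sorted[mid] <= key) {
        low = mid + 1;
      } else {
        high = mid;
      }
    }
    // low is now the first index whose value is greater than key.
    return (low > 0 && sorted[low - 1] == key) ? low - 1 : -1;
  }

  public static void main(String[] args) {
    int[] sorted = {1, 3, 3, 3, 7};
    System.out.println(linearSearch(new int[] {5, 2, 9}, 9)); // 2
    System.out.println(binarySearch(sorted, 7));              // 4
    System.out.println(firstMatch(sorted, 3));                // 1
    System.out.println(lastMatch(sorted, 3));                 // 3
  }
}
```

The two range-search methods differ only in the comparison (`<` versus `<=`), which is what moves the result to the left or right edge of the run of equal values.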
+Vector Sorting
+--------------
+
+Given a vector, a sorting algorithm turns it into a sorted one. The sorting criteria must
+be specified by an ordering comparison operation. The sorting algorithms can be
+classified into the following categories:
+
+1. **In-place sorter**: an in-place sorter performs the sorting by manipulating the original
+   vector, without creating any new vector, so it simply returns the original vector after sorting.
+   Currently, we have ``org.apache.arrow.algorithm.sort.FixedWidthInPlaceVectorSorter`` for in-place
+   sorting in ``O(nlog(n))`` time. As the name suggests, it only supports fixed-width vectors.
+
+2. **Out-of-place sorter**: an out-of-place sorter does not mutate the original vector. Instead,
+   it copies vector elements to a new vector in sorted order, and returns the new vector.
+   We have ``org.apache.arrow.algorithm.sort.FixedWidthOutOfPlaceVectorSorter``
+   and ``org.apache.arrow.algorithm.sort.VariableWidthOutOfPlaceVectorSorter``
+   for fixed-width and variable-width vectors, respectively. Both algorithms run in ``O(nlog(n))`` time.
+
+3. **Index sorter**: this sorter does not actually sort the vector. Instead, it returns an integer
+   vector whose values are the indices of the vector elements in sorted order. With the index vector, one can
+   easily construct a sorted vector. In addition, some other tasks can be easily achieved, like finding the ``k``-th
+   smallest value in the vector. Index sorting is supported by ``org.apache.arrow.algorithm.sort.IndexSorter``,
+   which runs in ``O(nlog(n))`` time. It is applicable to vectors of any type.
+
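Editorial sketch (not part of the commit): the three sorter categories above differ mainly in what they mutate and what they return. The example below shows that contrast on a plain `int[]`, using `java.util.Arrays` rather than Arrow's sorters. All names are hypothetical, and natural integer order stands in for a user-supplied ordering comparison.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Plain-Java sketch of in-place, out-of-place, and index sorting; not Arrow's API. */
final class SortSketch {

  /** In-place: mutates the input and returns the same array. */
  static int[] sortInPlace(int[] values) {
    Arrays.sort(values);
    return values;
  }

  /** Out-of-place: leaves the input untouched and returns a sorted copy. */
  static int[] sortOutOfPlace(int[] values) {
    int[] copy = Arrays.copyOf(values, values.length);
    Arrays.sort(copy);
    return copy;
  }

  /** Index sort: returns the positions of the elements in sorted order; input is not modified. */
  static int[] sortedIndices(int[] values) {
    Integer[] boxed = new Integer[values.length];
    for (int i = 0; i < boxed.length; i++) {
      boxed[i] = i;
    }
    // Order the indices by the values they point at.
    Arrays.sort(boxed, Comparator.comparingInt(i -> values[i]));
    int[] indices = new int[boxed.length];
    for (int i = 0; i < indices.length; i++) {
      indices[i] = boxed[i];
    }
    return indices;
  }

  /** k-th smallest value (k is 1-based), read through the index array. */
  static int kthSmallest(int[] values, int k) {
    return values[sortedIndices(values)[k - 1]];
  }

  public static void main(String[] args) {
    int[] values = {30, 10, 20};
    System.out.println(Arrays.toString(sortedIndices(values)));  // [1, 2, 0]
    System.out.println(kthSmallest(values, 2));                  // 20
    System.out.println(Arrays.toString(sortOutOfPlace(values))); // [10, 20, 30]
    System.out.println(Arrays.toString(values));                 // [30, 10, 20], still unsorted
  }
}
```

The index-sort variant is the only one that works without moving the elements themselves, which is why the text can describe it as applicable to vectors of any type.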
+Other Algorithms
+----------------
+
+Other algorithms in the ``algorithm`` module include vector deduplication, dictionary encoding, and more.
