
Commit 5598d2f

trxcllnt authored and wesm committed
ARROW-2828: [JS] Refactor Data, Vectors, Visitor, Typings, build, tests, dependencies
It's the big one; The Great ArrowJS Refactor of 2018. Thanks for bearing with me through yet another huge PR. [Check out this sweet gif](https://user-images.githubusercontent.com/178183/50551046-19a94d00-0c30-11e9-80ed-74b9290e8c49.gif) of all the new features in action. With streaming getting to a good place, we've already started working on demos/integrations with other projects like [uber/deck.gl](https://github.com/Pessimistress/deck.gl/tree/a5940e20cb1659a44cba7839082b0803a997a12f/test/apps/arrow) 🎉

### The JIRAs

In addition to everything I detail below, this PR closes the following JIRAs:

* [ARROW-2828](https://issues.apache.org/jira/browse/ARROW-2828): Refactor Vector Data classes
* [ARROW-2839](https://issues.apache.org/jira/browse/ARROW-2839): Support whatwg/streams in IPC reader/writer
* [ARROW-2235](https://issues.apache.org/jira/browse/ARROW-2235): Add tests for IPC messages split across multiple buffers
* [ARROW-3337](https://issues.apache.org/jira/browse/ARROW-3337): IPC writer doesn't serialize the dictionary of nested Vectors
* [ARROW-3689](https://issues.apache.org/jira/browse/ARROW-3689): Upgrade to TS 3.1
* [ARROW-3560](https://issues.apache.org/jira/browse/ARROW-3560): Remove @std/esm
* [ARROW-3561](https://issues.apache.org/jira/browse/ARROW-3561): Update ts-jest
* [ARROW-2778](https://issues.apache.org/jira/browse/ARROW-2778): Add Utf8Vector.from
* [ARROW-2766](https://issues.apache.org/jira/browse/ARROW-2766): Add ability to construct a Table from a list of Arrays/TypedArrays

### The stats

The gulp scripts have been updated to parallelize as much as possible. These are the numbers from my Intel Core i7-8700K CPU @ 3.70GHz × 12 running Ubuntu 18.04 and node v11.6.0:

```sh
$ time npm run build
[22:11:04] Finished 'build' after 39 s

real    0m40.341s
user    4m55.428s
sys     0m5.559s
```

```sh
$ npm run test:coverage
=============================== Coverage summary ===============================
Statements   : 90.45% ( 4321/4777 )
Branches     : 76.7% ( 1570/2047 )
Functions    : 84.62% ( 1106/1307 )
Lines        : 91.5% ( 3777/4128 )
================================================================================

Test Suites: 21 passed, 21 total
Tests:       5644 passed, 5644 total
Snapshots:   0 total
Time:        16.023s
```

### The fixes

* `Vector#indexOf(value)` works for all DataTypes
* `Vector#set(i, value)` now works for all DataTypes
* Reading from node streams is now fully zero-copy
* The IPC writers now serialize dictionaries of nested Vectors correctly (ARROW-3337)
* DictionaryBatches marked as `isDelta` now correctly update the dictionaries for all Vectors that point to that dictionary, even if they were created before the delta batch arrived
* A few `arrow2csv` fixes:
  * Ignore `stdin` if it's a TTY
  * Now read all the Arrow formats from `stdin`
  * Always show the `help` text when we don't understand the input
  * Proper backpressure support to play nicely with other Unix utilities like `head` and `less`
* [Fixes an unfiled bug](trxcllnt@070ec98) we encountered last week where JS would throw an error creating RowProxies for a Table or Struct with duplicate column names

### The upgrades

* New zero-copy Message/RecordBatchReaders!
  * [`RecordBatchReader.from()`](https://github.com/trxcllnt/arrow/blob/b58e29bc83675583238bbb94fba2f3ebf8f1e4aa/js/test/unit/ipc/reader/from-inference-tests.ts#L37) will peek at the underlying bytes, and return the correct implementation based on whether the data is an Arrow File, Stream, or JSON
  * [`RecordBatchFileReader`](https://github.com/trxcllnt/arrow/blob/b58e29bc83675583238bbb94fba2f3ebf8f1e4aa/js/test/unit/ipc/reader/file-reader-tests.ts#L74) now supports random-access seek, enabling more efficient web-worker/multi-process workflows
  * [`RecordBatchStreamReader`](https://github.com/trxcllnt/arrow/blob/b58e29bc83675583238bbb94fba2f3ebf8f1e4aa/js/test/unit/ipc/reader/streams-dom-tests.ts#L119) can now read multiple tables from the same underlying socket
  * `MessageReader` now [guarantees/enforces](https://github.com/trxcllnt/arrow/blob/b58e29bc83675583238bbb94fba2f3ebf8f1e4aa/js/src/ipc/message.ts#L126) message body byte alignment (this one even surfaced bugs in [node core](nodejs/node#24817) and the [DOM streams polyfill](MattiasBuelens/web-streams-polyfill#3))
* New RecordBatchWriters
  * Adds RecordBatchJSONWriter, RecordBatchFileWriter and RecordBatchStreamWriter
  * Adds static `RecordBatchWriter.writeAll()` method to easily write a Table or stream of RecordBatches
  * Both sync and async flushes based on the WritableSink
* Full integration with platform I/O primitives
  * We can still synchronously read JSON, Buffers, `Iterable<Buffer>`, or `AsyncIterable<Buffer>`
  * In node, we can now read from any [`ReadableStream`](https://nodejs.org/docs/latest/api/stream.html#stream_class_stream_readable), [`fs.FileHandle`](https://nodejs.org/docs/latest/api/fs.html#fs_class_filehandle)
  * In the browser, we can read from any [`ReadableStream` or `ReadableByteStream`](https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream), or the [`Response`](https://developer.mozilla.org/en-US/docs/Web/API/Response) returned from the `fetch()` API. (Wrapping the [FileReader](https://developer.mozilla.org/en-US/docs/Web/API/FileReader) is still todo)
  * We also [accept Promises](https://github.com/Pessimistress/deck.gl/blob/a5940e20cb1659a44cba7839082b0803a997a12f/test/apps/arrow/loader.js#L20) of any of the above
  * New convenience methods for integrating with node or DOM streams
    * [`throughNode()`](https://github.com/trxcllnt/arrow/blob/b58e29bc83675583238bbb94fba2f3ebf8f1e4aa/js/test/unit/ipc/reader/streams-node-tests.ts#L54)/[`throughDOM()`](https://github.com/trxcllnt/arrow/blob/b58e29bc83675583238bbb94fba2f3ebf8f1e4aa/js/test/unit/ipc/reader/streams-dom-tests.ts#L50)
    * [`toReadableNodeStream()`](https://github.com/trxcllnt/arrow/blob/b58e29bc83675583238bbb94fba2f3ebf8f1e4aa/js/test/unit/ipc/reader/streams-node-tests.ts#L69)/[`toReadableDOMStream()`](https://github.com/trxcllnt/arrow/blob/b58e29bc83675583238bbb94fba2f3ebf8f1e4aa/js/test/unit/ipc/reader/streams-dom-tests.ts#L65)
    * [`pipe()`](https://github.com/trxcllnt/arrow/blob/b58e29bc83675583238bbb94fba2f3ebf8f1e4aa/js/test/unit/ipc/writer/streams-node-tests.ts#L91)/[`pipeTo()`/`pipeThrough()`](https://github.com/trxcllnt/arrow/blob/b58e29bc83675583238bbb94fba2f3ebf8f1e4aa/js/test/unit/ipc/writer/streams-dom-tests.ts#L92)
* Generic type parameters inherited from `DataType` now flow recursively
  ```js
  const table = Table.from<{ str: Utf8, i32: Int32, bools: List<Bool> }>(data);
  table.get(0); // will be of type { str: string, i32: number, bools: BoolVector }
  ```
* New simplified [`Data` class](https://github.com/trxcllnt/arrow/blob/b58e29bc83675583238bbb94fba2f3ebf8f1e4aa/js/src/data.ts)
* New simplified, faster `Visitor` class with support for optional, more narrow [`visitT` implementations](https://github.com/trxcllnt/arrow/blob/b58e29bc83675583238bbb94fba2f3ebf8f1e4aa/js/src/visitor.ts#L181)
  * New specialized Visitor implementations to enable runtime reflection (e.g. dynamically look up the Vector constructor for a given DataType)
* New abstract `Chunked` base class for the applicative (concat) operation
  * public `chunkedInst.chunks` field is the list of inner chunks
* New `Column` class extends `Chunked`, combines `Field` with the chunks (provides access to the field `name` from the Schema)
* `RecordBatch#concat(...batchesOrTables)` now returns a Table
* Table now extends `Chunked`, so it inherits:
  * `Table#slice(from, to)`
  * `Table#concat(...batchesOrTables)`
* `Table#getChildAt(i)` exists, alias of `getColumnAt(i)`
* `Table#getColumn[At]()` returns a Column

### The breaking changes

* All the old IPC functions are gone, but the new APIs will live for much longer
* `Table#batches` is now `Table#chunks`, which it inherits from `Chunked` (maybe controversial, open to aliasing)
* `Table#batchesUnion` is now just... the Table instance itself (also maybe controversial, open to aliasing)
* `DataType#TType` is now `DataType#typeId` -- it should have always been this, was a typo. Easy to alias if necessary.
* The complicated View classes are now gone, logic centralized as specialized [`Visitors`](https://github.com/trxcllnt/arrow/tree/b58e29bc83675583238bbb94fba2f3ebf8f1e4aa/js/src/visitor)

### The tests

* **Tests no longer rely on any C++ or Java generated integration files**
* Integration tests have been moved into `bin/integration.js`, and they finish much quicker
* The tsconfig files have been tweaked to speed up test run time and improve the async debugging experience
* A streaming `RecordBatchJSONWriter` has been implemented so we can easily debug and validate written output
  * The JSON results are also tested against the corresponding binary representation, similar to the integration tests
* A [suite of test-data helpers](https://github.com/trxcllnt/arrow/blob/d9970bb9a6a9d80bbe07b321dc6389bccf1b0835/js/test/generate-test-data.ts) has been added to auto-generate data for validation at runtime
  * They produce the underlying Arrow VectorData buffers, as well as the expected plain-JS-value representation [for verification](https://github.com/trxcllnt/arrow/blob/d9970bb9a6a9d80bbe07b321dc6389bccf1b0835/js/test/unit/generated-data-tests.ts#L23)
  * This allows us to test all possible type configuration combinations, e.g. [all types Dictionary-encode](https://github.com/trxcllnt/arrow/blob/d9970bb9a6a9d80bbe07b321dc6389bccf1b0835/js/test/data/tables.ts#L61), all types serialize when nested, etc.
* A [suite of IO test helpers](https://github.com/trxcllnt/arrow/blob/d9970bb9a6a9d80bbe07b321dc6389bccf1b0835/js/test/unit/ipc/helpers.ts#L36) has been added
  * We use [`memfs`](https://www.npmjs.com/package/memfs) to mock the file system, which contributes to test performance improvements
  * This enables us to [easily test](https://github.com/trxcllnt/arrow/blob/d9970bb9a6a9d80bbe07b321dc6389bccf1b0835/js/test/unit/ipc/reader/file-reader-tests.ts#L38) all the flavors of io primitives across node and browser environments
* A vscode debugging launch configuration has been added to ease the process of contributing more tests (and because I've been asked for mine so often)

### The build

* Faster
* Node 11+ (needs `Symbol.asyncIterator` enabled)
* Closure-compiler upgrades and build enhancements mean we can auto-generate the externs file during compilation, rather than maintaining it by hand

### Misc

* Added `arrow2csv` to `js/bin/arrow2csv`, so anybody with the JS project dependencies installed can easily view a CSV-ish thing (`cat foo.arrow | js/bin/arrow2csv.js`)

### Todos

* Docs/Recipes/Examples
* Highlight/write more tools (like `arrow2csv`)
* Flesh out the RecordBatchWriters a bit more
* Gather feedback on the new RecordBatchReader APIs

Author: ptaylor
Author: Paul Taylor

Closes apache#3290 from trxcllnt/js-data-refactor and squashes the following commits:

2ef150f <ptaylor> bind getByteWidth to the vector type
9acfaa3 <ptaylor> handle the case where collapsed Uint8Arrays fully overlap
6a97ee0 <ptaylor> perf: defer creating rowProxy on nested types, use Array instead of Object for creating Data instances
2cad760 <ptaylor> pipe directly to stdout to ensure backpressure is preserved
f006a26 <ptaylor> ensure schema and field always have a metadata map
8dc5d2c <ptaylor> fix Float64 Array typings
162c7d8 <ptaylor> fix arrow2csv left-pad measurement for new bignum/decimal output
64dc015 <ptaylor> teach closure about Symbol.toPrimitive
ca0db9e <ptaylor> fix lint
ec12cdd <ptaylor> add a small BigNum mixin to make working with Int64 and Decimal values a bit easier
62578b9 <ptaylor> fix bug where valueToString function would return undefined (JSON.stringify(undefined) === undefined)
4b58bde <ptaylor> fix visitor method overload type signatures
d165413 <ptaylor> don't print comma that includes system paths
708f1b4 <ptaylor> move stride to data, fix chunked slicing, remove intermediate binding and getters in favor of direct property accesses
78ecc4c <ptaylor> use the textencoders from the global instead of Buffer for perf testing
47f0677 <ptaylor> perf: use a closure instead of binding
380dbc7 <ptaylor> add a single-chunk column type
6bcaad6 <ptaylor> fix lint
f7d2b2e <ptaylor> add getters for the dictionary and indices of chunked dictionary vectors
aaf42c8 <Paul Taylor> Consolidated JS data handling refactor
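The commit message says `RecordBatchReader.from()` peeks at the underlying bytes to decide whether the input is an Arrow File, Stream, or JSON. A minimal sketch of that kind of sniffing is shown below. It is not taken from the Arrow JS source: `sniffArrowFormat` is a hypothetical helper, and the rule "anything that is neither a file nor JSON is treated as a stream" is an assumption made for illustration. The only format fact it relies on is that Arrow files begin with the magic string `ARROW1`.

```javascript
// Hypothetical helper for illustration; not the Arrow JS implementation.
// Bytes of the ASCII magic string "ARROW1" that opens an Arrow file.
const ARROW_FILE_MAGIC = [0x41, 0x52, 0x52, 0x4f, 0x57, 0x31];
// ASCII space, tab, line feed, carriage return.
const WHITESPACE = new Set([0x20, 0x09, 0x0a, 0x0d]);

function sniffArrowFormat(bytes) {
  // An Arrow file starts with the magic string.
  const startsWithMagic =
    bytes.length >= ARROW_FILE_MAGIC.length &&
    ARROW_FILE_MAGIC.every((byte, i) => bytes[i] === byte);
  if (startsWithMagic) return 'file';

  // Skip leading whitespace, then check for the opening brace of a JSON object.
  let i = 0;
  while (i < bytes.length && WHITESPACE.has(bytes[i])) i++;
  if (bytes[i] === 0x7b) return 'json'; // '{'

  // Assumption: everything else is handed to the streaming reader.
  return 'stream';
}

const encoder = new TextEncoder();
console.log(sniffArrowFormat(encoder.encode('ARROW1\0\0')));      // file
console.log(sniffArrowFormat(encoder.encode(' {"schema": {}}'))); // json
```

Only the first few bytes are inspected, so a check like this can run before the rest of the input has arrived, which is what lets a reader pick an implementation without buffering the whole source.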
1 parent be663c1 commit 5598d2f

162 files changed, +17451 -12440 lines changed


.travis.yml

+2 -2
@@ -230,7 +230,7 @@ matrix:
     - if [ $ARROW_CI_INTEGRATION_AFFECTED != "1" ]; then exit; fi
     - $TRAVIS_BUILD_DIR/ci/travis_install_linux.sh
     - $TRAVIS_BUILD_DIR/ci/travis_install_clang_tools.sh
-    - nvm install 10.1
+    - nvm install 11.6
     - $TRAVIS_BUILD_DIR/ci/travis_before_script_js.sh
     - $TRAVIS_BUILD_DIR/ci/travis_before_script_cpp.sh
     script:
@@ -240,7 +240,7 @@ matrix:
     language: node_js
     os: linux
     node_js:
-    - '10.1'
+    - '11.6'
     before_script:
     - if [ $ARROW_CI_JS_AFFECTED != "1" ]; then exit; fi
     - $TRAVIS_BUILD_DIR/ci/travis_install_linux.sh

ci/travis_script_integration.sh

+1 -1
@@ -36,7 +36,7 @@ pushd $ARROW_JS_DIR

 # lint and compile JS source
 npm run lint
-npm run build
+npm run build -- -t apache-arrow

 popd

ci/travis_script_js.sh

+3 -2
@@ -23,9 +23,10 @@ source $TRAVIS_BUILD_DIR/ci/travis_env_common.sh

 pushd $ARROW_JS_DIR

-npm run lint
+npm run lint:ci
 npm run build
-# run the non-snapshot unit tests
 npm test
+npm run test:coverage
+bash <(curl -s https://codecov.io/bash) || echo "Codecov did not collect coverage reports"

 popd

integration/integration_test.py

+1 -1
@@ -1122,7 +1122,7 @@ def _run(self, exe_cmd, arrow_path=None, json_path=None,
         if json_path is not None:
             cmd.extend(['-j', json_path])

-        cmd.extend(['--mode', command, '-t', 'es5', '-m', 'umd'])
+        cmd.extend(['--mode', command])

         if self.debug:
             print(' '.join(cmd))

js/.gitignore

+6 -2
@@ -23,7 +23,8 @@ npm-debug.log*
 yarn-debug.log*
 yarn-error.log*

-.vscode
+.vscode/**
+!.vscode/launch.json

 # Runtime data
 pids
@@ -78,10 +79,13 @@ yarn.lock
 .env

 # compilation targets
+doc
 dist
 targets

 # test data files
-test/data/
+test/data/**/*.json
+test/data/**/*.arrow
+
 # jest snapshots (too big)
 test/__snapshots__/

js/.vscode/launch.json

+169
@@ -0,0 +1,169 @@
+{
+    // Use IntelliSense to learn about possible attributes.
+    // Hover to view descriptions of existing attributes.
+    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
+    "version": "0.2.0",
+    "configurations": [
+        {
+            "type": "node",
+            "request": "launch",
+            "name": "Debug Gulp Build",
+            "program": "${workspaceFolder}/node_modules/gulp/bin/gulp.js",
+            "args": [
+                "build",
+                // Specify we want to debug the "src" target, which won't clean or build -- essentially a "dry-run" of the gulp build
+                "--target", "src"
+            ]
+        },
+        {
+            "type": "node",
+            "request": "launch",
+            "name": "Debug Unit Tests",
+            "cwd": "${workspaceRoot}",
+            "program": "${workspaceFolder}/node_modules/.bin/jest",
+            "skipFiles": [
+                "<node_internals>/**/*.js",
+                "${workspaceFolder}/node_modules/**/*.js"
+            ],
+            "env": {
+                "NODE_NO_WARNINGS": "1",
+                "READABLE_STREAM": "disable",
+                "TEST_DOM_STREAMS": "true",
+                "TEST_NODE_STREAMS": "true",
+                // Modify these environment variables to run tests on a specific compilation target + module format combo
+                "TEST_TS_SOURCE": "true",
+                // "TEST_TS_SOURCE": "false",
+                // "TEST_TARGET": "es5",
+                // "TEST_MODULE": "umd"
+            },
+            "args": [
+                // "-i",
+                "test/unit/",
+
+                // Uncomment any of these to run individual test suites
+                // "test/unit/int-tests.ts",
+                // "test/unit/table-tests.ts",
+                // "test/unit/generated-data-tests.ts",
+
+                // "test/unit/vector/vector-tests.ts",
+                // "test/unit/vector/bool-vector-tests.ts",
+                // "test/unit/vector/date-vector-tests.ts",
+                // "test/unit/vector/float16-vector-tests.ts",
+                // "test/unit/vector/numeric-vector-tests.ts",
+
+                // "test/unit/visitor-tests.ts",
+
+                // "test/unit/ipc/message-reader-tests.ts",
+                // "test/unit/ipc/reader/file-reader-tests.ts",
+                // "test/unit/ipc/reader/json-reader-tests.ts",
+                // "test/unit/ipc/reader/from-inference-tests.ts",
+                // "test/unit/ipc/reader/stream-reader-tests.ts",
+                // "test/unit/ipc/reader/streams-dom-tests.ts",
+                // "test/unit/ipc/reader/streams-node-tests.ts",
+                // "test/unit/ipc/writer/file-writer-tests.ts",
+                // "test/unit/ipc/writer/json-writer-tests.ts",
+                // "test/unit/ipc/writer/stream-writer-tests.ts",
+                // "test/unit/ipc/writer/streams-dom-tests.ts",
+                // "test/unit/ipc/writer/streams-node-tests.ts",
+            ]
+        },
+        {
+            "type": "node",
+            "request": "launch",
+            "name": "Debug Integration Tests",
+            "cwd": "${workspaceRoot}",
+            "program": "${workspaceFolder}/bin/integration.js",
+            "skipFiles": [
+                "<node_internals>/**/*.js",
+                "${workspaceFolder}/node_modules/**/*.js"
+            ],
+            "env": {
+                "NODE_NO_WARNINGS": "1",
+                "READABLE_STREAM": "disable"
+            },
+            "args": [
+                "--mode", "VALIDATE"
+            ]
+        },
+        {
+            "type": "node",
+            "request": "launch",
+            "name": "Debug bin/arrow2csv",
+            "env": { "ARROW_JS_DEBUG": "src", "TS_NODE_CACHE": "false" },
+            "runtimeArgs": ["-r", "ts-node/register"],
+            "console": "integratedTerminal",
+            "skipFiles": [
+                "<node_internals>/**/*.js",
+                "${workspaceFolder}/node_modules/**/*.js"
+            ],
+            "args": [
+                "${workspaceFolder}/src/bin/arrow2csv.ts",
+                "-f", "./test/data/cpp/stream/simple.arrow"
+            ]
+        },
+        {
+            "type": "node",
+            "request": "launch",
+            "name": "Debug bin/file-to-stream",
+            "env": { "ARROW_JS_DEBUG": "src", "TS_NODE_CACHE": "false" },
+            "runtimeArgs": ["-r", "ts-node/register"],
+            "skipFiles": [
+                "<node_internals>/**/*.js",
+                "${workspaceFolder}/node_modules/**/*.js"
+            ],
+            "args": [
+                "${workspaceFolder}/bin/file-to-stream.js",
+                "./test/data/cpp/file/struct_example.arrow",
+                "./struct_example-stream-out.arrow",
+            ]
+        },
+        {
+            "type": "node",
+            "request": "launch",
+            "name": "Debug bin/stream-to-file",
+            "env": { "ARROW_JS_DEBUG": "src", "TS_NODE_CACHE": "false" },
+            "runtimeArgs": ["-r", "ts-node/register"],
+            "skipFiles": [
+                "<node_internals>/**/*.js",
+                "${workspaceFolder}/node_modules/**/*.js"
+            ],
+            "args": [
+                "${workspaceFolder}/bin/stream-to-file.js",
+                "./test/data/cpp/stream/struct_example.arrow",
+                "./struct_example-file-out.arrow",
+            ]
+        },
+        {
+            "type": "node",
+            "request": "launch",
+            "name": "Debug bin/json-to-arrow",
+            "env": { "ARROW_JS_DEBUG": "src", "TS_NODE_CACHE": "false" },
+            "runtimeArgs": ["-r", "ts-node/register"],
+            "skipFiles": [
+                "<node_internals>/**/*.js",
+                "${workspaceFolder}/node_modules/**/*.js"
+            ],
+            "args": [
+                "${workspaceFolder}/bin/json-to-arrow.js",
+                "-j", "./test/data/json/struct_example.json",
+                "-a", "./struct_example-stream-out.arrow",
+                "-f", "stream"
+            ]
+        },
+        {
+            "type": "node",
+            "request": "launch",
+            "name": "Debug bin/print-buffer-alignment",
+            "env": { "ARROW_JS_DEBUG": "src", "TS_NODE_CACHE": "false" },
+            "runtimeArgs": ["-r", "ts-node/register"],
+            "skipFiles": [
+                "<node_internals>/**/*.js",
+                "${workspaceFolder}/node_modules/**/*.js"
+            ],
+            "args": [
+                "${workspaceFolder}/bin/print-buffer-alignment.js",
+                "./test/data/cpp/stream/struct_example.arrow"
+            ]
+        }
+    ]
+}

js/README.md

+26 -17
@@ -49,7 +49,7 @@ Check out our [API documentation][7] to learn more about how to use Apache Arrow

 ### Get a table from an Arrow file on disk (in IPC format)

-```es6
+```js
 import { readFileSync } from 'fs';
 import { Table } from 'apache-arrow';

@@ -70,7 +70,7 @@ null, null, null

 ### Create a Table when the Arrow file is split across buffers

-```es6
+```js
 import { readFileSync } from 'fs';
 import { Table } from 'apache-arrow';

@@ -93,33 +93,42 @@ console.log(table.toString());

 ### Create a Table from JavaScript arrays

-```es6
+```js
+import {
+  Table,
+  FloatVector,
+  DateVector
+} from 'apache-arrow';
+
 const LENGTH = 2000;
-const rainAmounts = Float32Array.from({length: LENGTH}, () => Number((Math.random() * 20).toFixed(1)));
-const rainDates = Array.from({length: LENGTH}, (_, i) => new Date(Date.now() - 1000 * 60 * 60 * 24 * i));

-const rainfall = arrow.Table.fromVectors(
+const rainAmounts = Float32Array.from(
+  { length: LENGTH },
+  () => Number((Math.random() * 20).toFixed(1)));
+
+const rainDates = Array.from(
+  { length: LENGTH },
+  (_, i) => new Date(Date.now() - 1000 * 60 * 60 * 24 * i));
+
+const rainfall = Table.fromVectors(
   [FloatVector.from(rainAmounts), DateVector.from(rainDates)],
   ['precipitation', 'date']
 );
 ```

 ### Load data with `fetch`

-```es6
+```js
 import { Table } from "apache-arrow";

-fetch(require("simple.arrow")).then(response => {
-  response.arrayBuffer().then(buffer => {
-    const table = Table.from(new Uint8Array(buffer));
-    console.log(table.toString());
-  });
-});
+const table = await Table.from(fetch(("/simple.arrow")));
+console.log(table.toString());
+
 ```

 ### Columns look like JS Arrays

-```es6
+```js
 import { readFileSync } from 'fs';
 import { Table } from 'apache-arrow';

@@ -131,7 +140,7 @@ const table = Table.from([
 const column = table.getColumn('origin_lat');

 // Copy the data into a TypedArray
-const typed = column.slice();
+const typed = column.toArray();
 assert(typed instanceof Float32Array);

 for (let i = -1, n = column.length; ++i < n;) {
@@ -141,7 +150,7 @@ for (let i = -1, n = column.length; ++i < n;) {

 ### Usage with MapD Core

-```es6
+```js
 import MapD from 'rxjs-mapd';
 import { Table } from 'apache-arrow';

@@ -164,7 +173,7 @@ MapD.open(host, port)
   )
   .map(([schema, records]) =>
     // Create Arrow Table from results
-    Table.from(schema, records))
+    Table.from([schema, records]))
   .map((table) =>
     // Stringify the table to CSV with row numbers
     table.toString({ index: true }))
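
The README hunk that replaces `column.slice()` with `column.toArray()` is commented "Copy the data into a TypedArray", and the commit message describes `Column` as extending `Chunked`, whose `chunks` field lists the inner chunks. One way to picture such a copy is concatenating per-chunk arrays into a single contiguous buffer. The sketch below assumes plain `Float32Array` chunks, and `concatFloat32Chunks` is a hypothetical name; it is not the Arrow JS implementation.

```javascript
// Hypothetical helper for illustration; not taken from the Arrow JS source.
// Copies a list of Float32Array chunks into one contiguous Float32Array.
function concatFloat32Chunks(chunks) {
  const totalLength = chunks.reduce((sum, chunk) => sum + chunk.length, 0);
  const out = new Float32Array(totalLength);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset); // copy this chunk's values at the running offset
    offset += chunk.length;
  }
  return out;
}

const typed = concatFloat32Chunks([
  Float32Array.of(1.5, 2.5),
  Float32Array.of(3.5),
]);
console.log(typed instanceof Float32Array, typed.length); // true 3
```

The result is a copy, so writing to it does not affect the source chunks.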

js/bin/arrow2csv.js

+27
@@ -0,0 +1,27 @@
+#! /usr/bin/env node
+
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+const Path = require(`path`);
+const here = Path.resolve(__dirname, '../');
+const tsnode = require.resolve(`ts-node/register`);
+const arrow2csv = Path.join(here, `src/bin/arrow2csv.ts`);
+
+require('child_process').spawn(`node`, [
+    `-r`, tsnode, arrow2csv, ...process.argv.slice(2)
+], { cwd: here, env: process.env, stdio: `inherit` });