Allow field name declaration in ROW literal #25261

Open
wants to merge 3 commits into
base: master
from

Conversation

dain
Member

@dain dain commented Mar 9, 2025

Description

Add support for row(a 1, b 2) instead of the much more complex cast(row(1, 2) as row(a integer, b integer)). The old syntax is particularly annoying because you have to repeat the types for the fields.

Release notes

(X) Release notes are required, with the following suggested text:

## General
* Allow field name declaration in ROW literal.  For example, `row(a 1, b 2)` is now legal. ({issue}`issuenumber`)

@dain dain requested a review from martint March 9, 2025 20:41
@cla-bot cla-bot bot added the cla-signed label Mar 9, 2025
@dain dain force-pushed the row-with-field-names-literal branch 3 times, most recently from 95e89ed to 6cbfff3 Compare March 10, 2025 05:09
@martint
Member

martint commented Mar 10, 2025

I like this, but I would use the following syntax to make it similar to how the implicit row in the SELECT clause is constructed:

row(<expr> (AS? <identifier>)?, ...)

That way, the only syntactic difference from the SELECT clause is that the fields are wrapped in row(...)

@martint
Member

martint commented Mar 10, 2025

As another example, the syntax I suggested above makes these two equivalent:

SELECT 1 AS x, 2 AS y
VALUES row(1 AS x, 2 AS y)

@dain
Member Author

dain commented Mar 11, 2025

As another example, the syntax I suggested above makes these two equivalent:

SELECT 1 AS x, 2 AS y
VALUES row(1 AS x, 2 AS y)

I'm working on the syntax change, but for VALUES the column names of the relation aren't sourced from the rows. Today you can name row fields with a cast in VALUES, but the names don't affect the VALUES column aliases:

values cast(row(1, 2) as row(foo bigint, bar bigint));
 _col0 | _col1 
-------+-------
     1 |     2 
(1 row)

I'm guessing that can be improved, but I have no idea how... work for follow up

@dain dain force-pushed the row-with-field-names-literal branch 2 times, most recently from 129dee6 to 85c7a67 Compare March 11, 2025 03:01
@dain dain force-pushed the row-with-field-names-literal branch 4 times, most recently from e9d221a to fb992f3 Compare March 13, 2025 03:56
dain added 3 commits March 12, 2025 20:58
Add support for `row(a 1, b 2)` instead of the much more complex
`cast(row(1, 2) as row(a integer, b integer))`.
@dain dain force-pushed the row-with-field-names-literal branch from fb992f3 to 652c533 Compare March 13, 2025 03:58
@wendigo
Contributor

wendigo commented Mar 13, 2025

Is this expected:

trino> select {'a': row('b' as b, 'c' as c, 1::varchar as d)};
        _col0
---------------------
 {a={B=b, C=c, D=1}}
(1 row)

field names are uppercased

@martint
Member

martint commented Mar 13, 2025

Yes, that's standard SQL identifier canonicalization behavior.

@martint
Member

martint commented Mar 13, 2025

Syntax looks good.

@wendigo
Contributor

wendigo commented Mar 13, 2025

@martint this is unfortunate if you want to use this new syntax to produce a JSON like:

trino> select row(orderkey as orderkey, partkey as partkey, {'returnflag': returnflag} as flag)::json from tpch.sf1.lineitem limit 10;
                           _col0
-----------------------------------------------------------
 {"ORDERKEY":1,"PARTKEY":155190,"FLAG":{"returnflag":"N"}}
 {"ORDERKEY":1,"PARTKEY":67310,"FLAG":{"returnflag":"N"}}
 {"ORDERKEY":1,"PARTKEY":63700,"FLAG":{"returnflag":"N"}}
 {"ORDERKEY":1,"PARTKEY":2132,"FLAG":{"returnflag":"N"}}
 {"ORDERKEY":1,"PARTKEY":24027,"FLAG":{"returnflag":"N"}}
 {"ORDERKEY":1,"PARTKEY":15635,"FLAG":{"returnflag":"N"}}
 {"ORDERKEY":2,"PARTKEY":106170,"FLAG":{"returnflag":"N"}}
 {"ORDERKEY":3,"PARTKEY":4297,"FLAG":{"returnflag":"R"}}
 {"ORDERKEY":3,"PARTKEY":19036,"FLAG":{"returnflag":"R"}}
 {"ORDERKEY":3,"PARTKEY":128449,"FLAG":{"returnflag":"A"}}
(10 rows)

@martint
Member

martint commented Mar 13, 2025

What if you quote the identifiers?

@wendigo
Contributor

wendigo commented Mar 13, 2025

@martint

trino> select row(orderkey as "orderkey", partkey as "key", {'returnflag': returnflag} as "flag")::json from tpch.sf1.lineitem limit 10;
                            _col0
-------------------------------------------------------------
 {"orderkey":2400001,"key":132304,"flag":{"returnflag":"N"}}
 {"orderkey":2400001,"key":24513,"flag":{"returnflag":"R"}}
 {"orderkey":2400001,"key":175232,"flag":{"returnflag":"N"}}
 {"orderkey":2400001,"key":119658,"flag":{"returnflag":"A"}}
 {"orderkey":2400001,"key":89532,"flag":{"returnflag":"A"}}
 {"orderkey":2400002,"key":188783,"flag":{"returnflag":"R"}}
 {"orderkey":2400002,"key":67505,"flag":{"returnflag":"R"}}
 {"orderkey":2400002,"key":142916,"flag":{"returnflag":"R"}}
 {"orderkey":2400002,"key":182905,"flag":{"returnflag":"A"}}
 {"orderkey":2400002,"key":80484,"flag":{"returnflag":"R"}}
(10 rows)

Query 20250313_170353_00044_angdu, FINISHED, 3 nodes
Splits: 39 total, 39 done (100.00%)
0.14 [1.27M rows, 0B] [8.92M rows/s, 0B/s]

that works but not super obvious (note " vs ' - ' doesn't work)

@wendigo
Contributor

wendigo commented Mar 13, 2025

trino> select row(orderkey as "orderkey", partkey as "key", {'returnflag': returnflag} as 'flag')::json from tpch.sf1.lineitem limit 10;
Query 20250313_170521_00047_angdu failed: line 1:84: mismatched input ''flag''. Expecting: <identifier>
select row(orderkey as "orderkey", partkey as "key", {'returnflag': returnflag} as 'flag')::json from tpch.sf1.lineitem limit 10

@martint
Member

martint commented Mar 13, 2025

that works but not super obvious (note " vs ' - ' doesn't work)

It becomes more obvious when you think about how identifiers work. The type of row(1 as x, 2 as "y") is row(X bigint, y bigint)
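
To make the rule concrete, here is a minimal Java sketch. It is not Trino's implementation: `RowFieldNameSketch`, `Alias`, `canonicalize`, and `rowType` are hypothetical names, and the only behavior assumed is what this thread shows, namely that unquoted aliases fold to upper case while double-quoted aliases keep their case. The `bigint` type names are copied from the comment above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative sketch only; these names are hypothetical and are not Trino classes.
final class RowFieldNameSketch
{
    // A field alias as written in the query: its text and whether it was double-quoted.
    record Alias(String text, boolean quoted) {}

    private RowFieldNameSketch() {}

    // Unquoted identifiers fold to upper case; quoted identifiers keep their case.
    static String canonicalize(Alias alias)
    {
        return alias.quoted() ? alias.text() : alias.text().toUpperCase(Locale.ENGLISH);
    }

    // Renders a row type such as "row(X bigint, y bigint)" from aliases and type names.
    static String rowType(List<Alias> aliases, List<String> typeNames)
    {
        List<String> fields = new ArrayList<>();
        for (int i = 0; i < aliases.size(); i++) {
            fields.add(canonicalize(aliases.get(i)) + " " + typeNames.get(i));
        }
        return "row(" + String.join(", ", fields) + ")";
    }

    public static void main(String[] args)
    {
        // Models row(1 as x, 2 as "y"); prints: row(X bigint, y bigint)
        System.out.println(rowType(
                List.of(new Alias("x", false), new Alias("y", true)),
                List.of("bigint", "bigint")));
    }
}
```

Under this model, a single-quoted `'flag'` never reaches `canonicalize` at all, because it is a string literal rather than an identifier, which matches the parse error reported earlier in the thread.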

@dain
Member Author

dain commented Mar 13, 2025

@wendigo I agree the changing of case really sucks, but it is an existing issue.


assertThat((boolean) typeOperators.getIndeterminateOperator(emptyRowType, simpleConvention(FAIL_ON_NULL, BLOCK_POSITION_NOT_NULL))
.invokeExact(singleEmptyRow, 0)).isFalse();
assertThat((Boolean) typeOperators.getEqualOperator(emptyRowType, simpleConvention(NULLABLE_RETURN, BLOCK_POSITION_NOT_NULL, BLOCK_POSITION_NOT_NULL))
Member

Why does this one have a different convention?

@@ -574,7 +574,7 @@ primaryExpression
| QUESTION_MARK #parameter
| POSITION '(' valueExpression IN valueExpression ')' #position
| '(' expression (',' expression)+ ')' #rowConstructor
-| ROW '(' expression (',' expression)* ')' #rowConstructor
+| ROW '(' fieldConstructor (',' fieldConstructor)* ')' #rowConstructor
Member

Fix alignment of the label

@@ -646,6 +646,10 @@ primaryExpression
')' #jsonArray
;

fieldConstructor
Member

Maybe name this rowField

return node.items().stream()
.map(child -> process(child, context))
.collect(joining(", ", "ROW (", ")"));
List<RowType.Field> fieldTypes = ((RowType) node.type()).getFields();
Member

It seems like the IR Row type should override type() to return RowType so that callers don't need to cast

@@ -33,10 +33,9 @@ public record Row(List<Expression> items)
items = ImmutableList.copyOf(items);
}

@Override
public Type type()
Member

Keep this and change signature to RowType
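
The suggestion relies on Java's covariant return types: an override may narrow the declared return type. A minimal sketch follows, using hypothetical stand-ins nested in `CovariantRowSketch` rather than the actual Trino IR classes; the field layout of `RowType` here is an assumption made only to keep the example self-contained.

```java
import java.util.List;

// Hypothetical stand-ins for the IR types; these are not the actual Trino classes.
final class CovariantRowSketch
{
    interface Type {}

    record BigintType() implements Type {}

    record RowType(List<Type> fieldTypes) implements Type {}

    interface Expression
    {
        Type type();
    }

    record Constant(Type type) implements Expression {}

    record Row(List<Expression> items) implements Expression
    {
        // Covariant return: narrowing Type to RowType still overrides Expression.type().
        @Override
        public RowType type()
        {
            return new RowType(items.stream().map(Expression::type).toList());
        }
    }

    private CovariantRowSketch() {}

    public static void main(String[] args)
    {
        Row row = new Row(List.of(new Constant(new BigintType())));
        // No cast needed because the static type of row.type() is RowType.
        RowType rowType = row.type();
        System.out.println(rowType.fieldTypes().size());
    }
}
```

Callers that only hold an `Expression` still see `Type`, so the narrowing affects only code that already knows it has a `Row`.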

@@ -141,7 +142,7 @@ public static Values values(Row... row)

public static Row row(Expression... values)
{
-return new Row(ImmutableList.copyOf(values));
+return new Row(Arrays.stream(values).map(Row.Field::new).collect(Collectors.toList()));
Member

Can use .toList()
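
For reference, a small sketch of the two idioms side by side. `ToListSketch` and its methods are hypothetical, and the string wrapping stands in for `Row.Field::new`. `Stream.toList()` exists since Java 16 and returns an unmodifiable list, whereas `Collectors.toList()` makes no guarantee about mutability.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper comparing the two ways of collecting a stream into a list.
final class ToListSketch
{
    private ToListSketch() {}

    // Older idiom: collect through Collectors.toList().
    static List<String> wrapWithCollector(String... values)
    {
        return Arrays.stream(values)
                .map(value -> "field(" + value + ")")
                .collect(Collectors.toList());
    }

    // Java 16+: Stream.toList() returns an unmodifiable list directly.
    static List<String> wrapWithToList(String... values)
    {
        return Arrays.stream(values)
                .map(value -> "field(" + value + ")")
                .toList();
    }

    public static void main(String[] args)
    {
        // Both calls produce lists with equal contents.
        System.out.println(wrapWithCollector("a", "b").equals(wrapWithToList("a", "b")));
    }
}
```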

List<Type> types = node.getItems().stream()
.map(child -> process(child, context))
List<RowType.Field> fields = node.getFields().stream()
.map(field -> new RowType.Field(field.getName().map(Identifier::getCanonicalValue), process(field.getExpression(), context)))
Member

Wrap arguments?

Comment on lines +546 to 548
if (node.getName().isPresent()) {
process(node.getName().get(), context);
}
Member

node.getName().ifPresent(name -> process(name, context));

Comment on lines +137 to +140
if (name.isPresent()) {
builder.append(name.get());
builder.append(" ");
}
Member

name.ifPresent(x -> builder.append(x).append(" "));
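
A self-contained sketch of the suggested `Optional.ifPresent` form. `OptionalNameSketch` and `formatField` are hypothetical names, and appending the second argument after the optional name is an assumption about the surrounding code, which is not shown in this thread.

```java
import java.util.Optional;

// Hypothetical formatter showing ifPresent in place of an isPresent()/get() pair.
final class OptionalNameSketch
{
    private OptionalNameSketch() {}

    // Prefixes the text with "<name> " only when a name is present.
    static String formatField(Optional<String> name, String text)
    {
        StringBuilder builder = new StringBuilder();
        name.ifPresent(value -> builder.append(value).append(" "));
        builder.append(text);
        return builder.toString();
    }

    public static void main(String[] args)
    {
        System.out.println(formatField(Optional.of("a"), "integer"));
        System.out.println(formatField(Optional.empty(), "integer"));
    }
}
```

The lambda form keeps the presence check and the use of the value in one expression, so there is no way to call `get()` on an empty `Optional` by mistake.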

4 participants