Commit 835e708

Change the dependency relation of list items to discourse instead of nummod, as described in UniversalDependencies/UD_English-EWT#518
1 parent 54aaf2c commit 835e708

1 file changed, +9 -9 lines changed

src/edu/stanford/nlp/trees/UniversalEnglishGrammaticalRelations.java

@@ -881,11 +881,6 @@ private UniversalEnglishGrammaticalRelations() {}
    * the meaning of the NP. Also, the enumeration of lists have
    * this relation to the head of the list item.
    * <br>
-   * Also, the enumeration of lists have this relation to the head of
-   * the list item. For that, we allow the list of constituents which
-   * have a list under them in any of the training data, as the parser
-   * will likely not produce anything else anyway.
-   * <br>
    * PTB: PP NP X S FRAG <br>
    * EWT: SQ SBARQ SINV SBAR NML VP <br>
    * Craft: PRN <br>
@@ -905,9 +900,7 @@ private UniversalEnglishGrammaticalRelations() {}
         // Note that the earlier tregexes are usually enough to cover those phrases, such as when
         // the QP is by itself in an ADJP or NP, but sometimes it can have other siblings such
         // as in the phrase "$ 100 million or more". In that case, this next expression is needed.
-        "QP < QP=target < /^[$]$/",
-        // Lists are treated as nummod in UD_English-EWT
-        "PP|NP|X|S|FRAG|SQ|SBARQ|SINV|SBAR|NML|VP|PRN|ADJP < LST=target");
+        "QP < QP=target < /^[$]$/");
 
 
   /**
/**
@@ -1019,12 +1012,19 @@ private UniversalEnglishGrammaticalRelations() {}
    * define this to include: interjections (oh, uh-huh, Welcome), fillers (um, ah),
    * and discourse markers (well, like, actually, but not: you know).
    * We also use it for emoticons.
+   * <br>
+   * Also, the enumeration of lists have this relation to the head of
+   * the list item. For that, we allow the list of constituents which
+   * have a list under them in any of the training data, as the parser
+   * will likely not produce anything else anyway.
    */
   public static final GrammaticalRelation DISCOURSE_ELEMENT =
     new GrammaticalRelation(Language.UniversalEnglish, "discourse", "discourse element",
         MODIFIER, ".*", tregexCompiler,
         "__ < (NFP=target [ < " + WESTERN_SMILEY + " | < " + ASIAN_SMILEY + " ] )",
-        "__ [ < INTJ=target | < (PRN=target <1 /^(?:,|-LRB-)$/ <2 INTJ [ !<3 __ | <3 /^(?:,|-RRB-)$/ ] ) ]");
+        "__ [ < INTJ=target | < (PRN=target <1 /^(?:,|-LRB-)$/ <2 INTJ [ !<3 __ | <3 /^(?:,|-RRB-)$/ ] ) ]",
+        // Lists are treated as discourse in UD_English-EWT as of 2.14
+        "PP|NP|X|S|FRAG|SQ|SBARQ|SINV|SBAR|NML|VP|PRN|ADJP < LST=target");
 
 
   /**
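
The effect of moving the `LST` pattern from the nummod relation to `DISCOURSE_ELEMENT` can be illustrated with a small, self-contained sketch. It does not use CoreNLP or tregex: `ListItemRelationSketch`, `Node`, `matchesListPattern`, `relationBefore`, and `relationAfter` are hypothetical names introduced only for illustration, and the check is a simplified stand-in for the tregex `... < LST=target` (a parent in the listed categories that immediately dominates an `LST` child). Only the parent category list and the two relation names come from the diff.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ListItemRelationSketch {

    // Parent categories copied from the tregex pattern in the diff.
    static final Set<String> LIST_PARENTS = new HashSet<>(Arrays.asList(
            "PP", "NP", "X", "S", "FRAG", "SQ", "SBARQ", "SINV",
            "SBAR", "NML", "VP", "PRN", "ADJP"));

    // Hypothetical minimal constituency node; not a CoreNLP class.
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Collections.unmodifiableList(Arrays.asList(children));
        }
    }

    // Simplified stand-in for "<parents> < LST=target":
    // the parent has a listed category and immediately dominates an LST child.
    static boolean matchesListPattern(Node parent, Node child) {
        return LIST_PARENTS.contains(parent.label)
                && parent.children.contains(child)
                && "LST".equals(child.label);
    }

    // Relation name assigned to a list item before this commit (null if no match).
    static String relationBefore(Node parent, Node child) {
        return matchesListPattern(parent, child) ? "nummod" : null;
    }

    // Relation name assigned to a list item after this commit (null if no match).
    static String relationAfter(Node parent, Node child) {
        return matchesListPattern(parent, child) ? "discourse" : null;
    }

    public static void main(String[] args) {
        Node lst = new Node("LST", new Node("LS"));
        Node np = new Node("NP", lst, new Node("NN"));
        // prints "nummod -> discourse"
        System.out.println(relationBefore(np, lst) + " -> " + relationAfter(np, lst));
    }
}
```

In the real class the pattern string itself is unchanged; only the `GrammaticalRelation` it is attached to differs, which is why the diff removes the line from one constructor call and adds it to another.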
