@@ -24,7 +24,8 @@ nodes: # list of child **Glossa
 Example **GlossaryNode**:
 
 ```yaml
-- name: Shipping # name of the node
+- name: "Shipping" # name of the node
+  id: "Shipping-Logistics" # (optional) custom identifier for the node
   description: Provides terms related to the shipping domain # description of the node
   owners: # (optional) owners contains 2 nested fields
     users: # (optional) a list of user IDs
@@ -43,7 +44,8 @@ Example **GlossaryNode**:
 Example **GlossaryTerm**:
 
 ```yaml
-- name: FullAddress # name of the term
+- name: "Full Address" # name of the term
+  id: "Full-Address-Details" # (optional) custom identifier for the term
   description: A collection of information to give the location of a building or plot of land. # description of the term
   owners: # (optional) owners contains 2 nested fields
     users: # (optional) a list of user IDs
@@ -67,10 +69,86 @@ Example **GlossaryTerm**:
   domain: "urn:li:domain:Logistics" # (optional) domain name or domain urn
 ```
 
-To see how these all work together, check out this comprehensive example business glossary file below:
+## ID Management and URL Generation
+
+The business glossary provides two primary ways to manage term and node identifiers:
+
+1. **Custom IDs**: You can explicitly specify an ID for any term or node using the `id` field. This is recommended for terms that need stable, predictable identifiers:
+   ```yaml
+   terms:
+     - name: "Response Time"
+       id: "support-response-time" # Explicit ID
+       description: "Target time to respond to customer inquiries"
+   ```
+
+2. **Automatic ID Generation**: When no ID is specified, the system will generate one based on the `enable_auto_id` setting:
+   - With `enable_auto_id: false` (default):
+     - Node and term names are converted to URL-friendly format
+     - Spaces within names are replaced with hyphens
+     - Special characters are removed (except hyphens)
+     - Case is preserved
+     - Multiple hyphens are collapsed to single ones
+     - Path components (node/term hierarchy) are joined with periods
+     - Example: Node "Customer Support" with term "Response Time" → "Customer-Support.Response-Time"
+
+   - With `enable_auto_id: true`:
+     - Generates GUID-based IDs
+     - Recommended for guaranteed uniqueness
+     - Required for terms with non-ASCII characters
+
+Here's how path-based ID generation works:
+```yaml
+nodes:
+  - name: "Customer Support" # Node ID: Customer-Support
+    terms:
+      - name: "Response Time" # Term ID: Customer-Support.Response-Time
+        description: "Response SLA"
+
+      - name: "First Reply" # Term ID: Customer-Support.First-Reply
+        description: "Initial response"
+
+  - name: "Product Feedback" # Node ID: Product-Feedback
+    terms:
+      - name: "Response Time" # Term ID: Product-Feedback.Response-Time
+        description: "Feedback response"
+```
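The path-based rules listed above can be sketched in a few lines of Python. This is an illustrative approximation, not the ingestion source's actual code: `clean_component` and `path_id` are hypothetical names, trimming of leading and trailing hyphens is an assumption, and the MD5 digest is only a stand-in for whatever GUID scheme the real implementation uses.

```python
import hashlib
import re
from typing import List


def clean_component(name: str) -> str:
    """Clean one node or term name following the documented rules."""
    cleaned = re.sub(r"\s+", "-", name.strip())      # spaces become hyphens
    cleaned = re.sub(r"[^A-Za-z0-9-]", "", cleaned)  # drop special characters, keep hyphens
    cleaned = re.sub(r"-{2,}", "-", cleaned)         # collapse repeated hyphens
    return cleaned.strip("-")                        # assumption: no leading/trailing hyphen


def path_id(components: List[str]) -> str:
    """Build an ID from the names on the path, outermost node first."""
    if any(not part.isascii() for part in components):
        # Assumption: a 32-character hex digest stands in for the GUID fallback.
        return hashlib.md5(".".join(components).encode("utf-8")).hexdigest()
    return ".".join(clean_component(part) for part in components)


print(path_id(["Customer Support", "Response Time"]))  # Customer-Support.Response-Time
print(path_id(["Data Services", "API @ Response"]))    # Data-Services.API-Response
```

Because each component is cleaned before joining, a period inside a name is dropped and never confused with the path separator.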
+
+**Important Notes**:
+- Periods (.) are used exclusively as path separators between nodes and terms
+- Periods in term or node names themselves will be removed
+- Each component of the path (node names, term names) is cleaned independently:
+  - Spaces to hyphens
+  - Special characters removed
+  - Case preserved
+- The cleaned components are then joined with periods to form the full path
+- Non-ASCII characters in any component trigger automatic GUID generation
+- Once an ID is created (either manually or automatically), it cannot be easily changed
+- All references to a term (in `inherits`, `contains`, etc.) must use its correct ID
+- Moving terms in the hierarchy does NOT update their IDs:
+  - The ID retains its original path components even after moving
+  - This can lead to IDs that don't match the current location
+  - Consider using `enable_auto_id: true` if you plan to reorganize your glossary
+- For terms that other terms will reference, consider using explicit IDs or enable auto_id
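Since references must match term IDs exactly, it can help to check a glossary for references that resolve to nothing before ingesting it. The sketch below works on an already-parsed glossary (plain dicts and lists) and is only an approximation: `walk_terms` and `dangling_references` are hypothetical helpers, the name cleaning is simplified, node-level custom `id` values are assumed not to affect term paths, and the term "Escalation" is invented for the demonstration.

```python
import re
from typing import Iterator, List, Set, Tuple

REFERENCE_FIELDS = ("inherits", "contains", "values", "related_terms")


def _clean(name: str) -> str:
    # Simplified cleaning: whitespace to hyphens, then drop other special characters.
    return re.sub(r"[^A-Za-z0-9-]", "", re.sub(r"\s+", "-", name.strip())).strip("-")


def walk_terms(nodes: List[dict], path: Tuple[str, ...] = ()) -> Iterator[Tuple[str, dict]]:
    """Yield (term_id, term) for every term, preferring an explicit `id`."""
    for node in nodes:
        node_path = path + (_clean(node["name"]),)
        for term in node.get("terms", []):
            term_id = term.get("id") or ".".join(node_path + (_clean(term["name"]),))
            yield term_id, term
        yield from walk_terms(node.get("nodes", []), node_path)


def dangling_references(nodes: List[dict]) -> List[Tuple[str, str, str]]:
    """Return (term_id, field, reference) for references matching no known term ID."""
    terms = list(walk_terms(nodes))
    known: Set[str] = {term_id for term_id, _ in terms}
    return [
        (term_id, field, ref)
        for term_id, term in terms
        for field in REFERENCE_FIELDS
        for ref in term.get(field, [])
        if ref not in known
    ]


glossary_nodes = [
    {"name": "Customer Support", "terms": [
        {"name": "Response Time"},
        {"name": "First Reply", "inherits": ["Customer-Support.Response-Time"]},
        {"name": "Escalation", "inherits": ["CustomerSupport.ResponseTime"]},
    ]},
]
print(dangling_references(glossary_nodes))
```

Only the last term is reported, because its reference still uses the unhyphenated form. References to terms defined outside the file, or written as full URNs, would also be flagged by this simplified check.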
+
+Example of how different names are handled:
+```yaml
+nodes:
+  - name: "Data Services" # Node ID: Data-Services
+    terms:
+      # Basic term name
+      - name: "Response Time" # Term ID: Data-Services.Response-Time
+        description: "SLA metrics"
+
+      # Term name with special characters
+      - name: "API @ Response" # Term ID: Data-Services.API-Response
+        description: "API metrics"
+
+      # Term with non-ASCII (triggers GUID)
+      - name: "パフォーマンス" # Term ID will be a 32-character GUID
+        description: "Performance"
+```
 
-<details>
-<summary>Example business glossary file</summary>
+To see how these all work together, check out this comprehensive example business glossary file below:
 
 ```yaml
 version: "1"
@@ -80,172 +158,108 @@ owners:
     - mjames
 url: "https://github.com/datahub-project/datahub/"
 nodes:
-  - name: Classification
+  - name: "Data Classification"
+    id: "Data-Classification" # Custom ID for stable references
     description: A set of terms related to Data Classification
     knowledge_links:
       - label: Wiki link for classification
         url: "https://en.wikipedia.org/wiki/Classification"
     terms:
-      - name: Sensitive
+      - name: "Sensitive Data" # Will generate: Data-Classification.Sensitive-Data
         description: Sensitive Data
         custom_properties:
           is_confidential: "false"
-      - name: Confidential
+      - name: "Confidential Information" # Will generate: Data-Classification.Confidential-Information
         description: Confidential Data
         custom_properties:
           is_confidential: "true"
-      - name: HighlyConfidential
+      - name: "Highly Confidential" # Will generate: Data-Classification.Highly-Confidential
         description: Highly Confidential Data
         custom_properties:
           is_confidential: "true"
         domain: Marketing
-  - name: PersonalInformation
+
+  - name: "Personal Information"
     description: All terms related to personal information
     owners:
       users:
         - mjames
     terms:
-      - name: Email
-        ## An example of using an id to pin a term to a specific guid
-        ## See "how to generate custom IDs for your terms" section below
-        # id: "urn:li:glossaryTerm:41516e310acbfd9076fffc2c98d2d1a3"
+      - name: "Email" # Will generate: Personal-Information.Email
         description: An individual's email address
         inherits:
-          - Classification.Confidential
+          - Data-Classification.Confidential-Information # References parent node path
         owners:
           groups:
             - Trust and Safety
-      - name: Address
+      - name: "Address" # Will generate: Personal-Information.Address
         description: A physical address
-      - name: Gender
+      - name: "Gender" # Will generate: Personal-Information.Gender
         description: The gender identity of the individual
         inherits:
-          - Classification.Sensitive
-  - name: Shipping
-    description: Provides terms related to the shipping domain
-    owners:
-      users:
-        - njones
-      groups:
-        - logistics
-    terms:
-      - name: FullAddress
-        description: A collection of information to give the location of a building or plot of land.
-        owners:
-          users:
-            - njones
-          groups:
-            - logistics
-        term_source: "EXTERNAL"
-        source_ref: FIBO
-        source_url: "https://www.google.com"
-        inherits:
-          - Privacy.PII
-        contains:
-          - Shipping.ZipCode
-          - Shipping.CountryCode
-          - Shipping.StreetAddress
-        related_terms:
-          - Housing.Kitchen.Cutlery
-        custom_properties:
-          - is_used_for_compliance_tracking: "true"
-        knowledge_links:
-          - url: "https://en.wikipedia.org/wiki/Address"
-            label: Wiki link
-        domain: "urn:li:domain:Logistics"
-    knowledge_links:
-      - label: Wiki link for shipping
-        url: "https://en.wikipedia.org/wiki/Freight_transport"
-  - name: ClientsAndAccounts
+          - Data-Classification.Sensitive-Data # References parent node path
+
+  - name: "Clients And Accounts"
     description: Provides basic concepts such as account, account holder, account provider, relationship manager that are commonly used by financial services providers to describe customers and to determine counterparty identities
     owners:
       groups:
         - finance
+      type: DATAOWNER
     terms:
-      - name: Account
+      - name: "Account" # Will generate: Clients-And-Accounts.Account
         description: Container for records associated with a business arrangement for regular transactions and services
         term_source: "EXTERNAL"
         source_ref: FIBO
         source_url: "https://spec.edmcouncil.org/fibo/ontology/FBC/ProductsAndServices/ClientsAndAccounts/Account"
         inherits:
-          - Classification.HighlyConfidential
+          - Data-Classification.Highly-Confidential # References parent node path
         contains:
-          - ClientsAndAccounts.Balance
-      - name: Balance
+          - Clients-And-Accounts.Balance # References term in same node
+      - name: "Balance" # Will generate: Clients-And-Accounts.Balance
         description: Amount of money available or owed
         term_source: "EXTERNAL"
         source_ref: FIBO
         source_url: "https://spec.edmcouncil.org/fibo/ontology/FBC/ProductsAndServices/ClientsAndAccounts/Balance"
-  - name: Housing
-    description: Provides terms related to the housing domain
-    owners:
-      users:
-        - mjames
-      groups:
-        - interior
-    nodes:
-      - name: Colors
-        description: "Colors that are used in Housing construction"
-        terms:
-          - name: Red
-            description: "red color"
-            term_source: "EXTERNAL"
-            source_ref: FIBO
-            source_url: "https://spec.edmcouncil.org/fibo/ontology/FBC/ProductsAndServices/ClientsAndAccounts/Account"
-
-          - name: Green
-            description: "green color"
-            term_source: "EXTERNAL"
-            source_ref: FIBO
-            source_url: "https://spec.edmcouncil.org/fibo/ontology/FBC/ProductsAndServices/ClientsAndAccounts/Account"
-
-          - name: Pink
-            description: pink color
-            term_source: "EXTERNAL"
-            source_ref: FIBO
-            source_url: "https://spec.edmcouncil.org/fibo/ontology/FBC/ProductsAndServices/ClientsAndAccounts/Account"
+
+  - name: "KPIs"
+    description: Common Business KPIs
     terms:
-      - name: WindowColor
-        description: Supported window colors
-        term_source: "EXTERNAL"
-        source_ref: FIBO
-        source_url: "https://spec.edmcouncil.org/fibo/ontology/FBC/ProductsAndServices/ClientsAndAccounts/Account"
-        values:
-          - Housing.Colors.Red
-          - Housing.Colors.Pink
+      - name: "CSAT %" # Will generate: KPIs.CSAT
+        description: Customer Satisfaction Score
+```
 
-      - name: Kitchen
-        description: a room or area where food is prepared and cooked.
-        term_source: "EXTERNAL"
-        source_ref: FIBO
-        source_url: "https://spec.edmcouncil.org/fibo/ontology/FBC/ProductsAndServices/ClientsAndAccounts/Account"
+## Custom ID Specification
 
-      - name: Spoon
-        description: an implement consisting of a small, shallow oval or round bowl on a long handle, used for eating, stirring, and serving food.
-        term_source: "EXTERNAL"
-        source_ref: FIBO
-        source_url: "https://spec.edmcouncil.org/fibo/ontology/FBC/ProductsAndServices/ClientsAndAccounts/Account"
-        related_terms:
-          - Housing.Kitchen
-        knowledge_links:
-          - url: "https://en.wikipedia.org/wiki/Spoon"
-            label: Wiki link
-```
-</details>
+Custom IDs can be specified in two ways, both of which are fully supported and acceptable:
 
-Source file linked [here](https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/bootstrap_data/business_glossary.yml).
+1. Just the ID portion (simpler approach):
+   ```yaml
+   terms:
+     - name: "Email"
+       id: "company-email" # Will become urn:li:glossaryTerm:company-email
+       description: "Company email address"
+   ```
 
-## Generating custom IDs for your terms
+2. Full URN format:
+   ```yaml
+   terms:
+     - name: "Email"
+       id: "urn:li:glossaryTerm:company-email"
+       description: "Company email address"
+   ```
 
-IDs are normally inferred from the glossary term/node's name, see the `enable_auto_id` config. But, if you need a stable
-identifier, you can generate a custom ID for your term. It should be unique across the entire Glossary.
+Both methods are valid and will work correctly. The system will automatically handle the URN prefix if you specify just the ID portion.
 
-Here's an example ID:
-`id: "urn:li:glossaryTerm:41516e310acbfd9076fffc2c98d2d1a3"`
+The same applies for nodes:
+```yaml
+nodes:
+  - name: "Communications"
+    id: "internal-comms" # Will become urn:li:glossaryNode:internal-comms
+    description: "Internal communication methods"
+```
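The prefix handling described above amounts to a small normalization step. The following is a minimal sketch of that idea, assuming a hypothetical helper named `to_urn`; it is not taken from the ingestion source and only mirrors the behaviour the text describes.

```python
def to_urn(custom_id: str, entity_type: str) -> str:
    """Return a full URN whether or not the ID already carries the prefix.

    entity_type is "glossaryTerm" or "glossaryNode" (the two kinds used above).
    """
    prefix = f"urn:li:{entity_type}:"
    return custom_id if custom_id.startswith(prefix) else prefix + custom_id


print(to_urn("company-email", "glossaryTerm"))                      # short form gets the prefix
print(to_urn("urn:li:glossaryTerm:company-email", "glossaryTerm"))  # full URN passes through
print(to_urn("internal-comms", "glossaryNode"))                     # same rule for nodes
```

Both spellings of the term ID end up as the same URN, which is why either form can be used in the glossary file.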
 
-A note of caution: once you select a custom ID, it cannot be easily changed.
+Note: Once you select a custom ID, it cannot be easily changed.
 
 ## Compatibility
 
-Compatible with version 1 of business glossary format.
-The source will be evolved as we publish newer versions of this format.
+Compatible with version 1 of business glossary format. The source will be evolved as newer versions of this format are published.