feat(txt-registry): deprecate legacy txt-format #5172

Open
wants to merge 22 commits into
base: master

Conversation

ivankatliarchuk
Contributor

@ivankatliarchuk ivankatliarchuk commented Mar 13, 2025

Description

TODO:

  • ✅ fix tests
  • ✅ test on real cluster and account
  • ✅ review docs
  • ✅ added TXT legacy record cleanup script

Checklist

  • Unit tests updated
  • End user documentation updated

Executed on real cluster with arguments

go run main.go \
    --provider=aws \
    --registry=txt \
    --source=fake \
    --aws-zone-type=private \
    --zone-id-filter=/hostedzone/XXXXXX \
    --log-level=debug \
    --policy=sync \
    --interval=60s \
    --fqdn-template=a1.ex.com

Without the change, records in both the current and the old format are created

aws route53 list-resource-record-sets --hosted-zone-id ${ZONE_UNDER_TEST} --query "ResourceRecordSets[?Type=='TXT'].{Name:Name, Value:ResourceRecords[0].Value}" --output table
-----------------------------------------------------------------------------
|                          ListResourceRecordSets                           |
+--------------------+------------------------------------------------------+
|        Name        |                        Value                         |
+--------------------+------------------------------------------------------+
|  a-fhbr.a1.ex.com. |  "heritage=external-dns,external-dns/owner=default"  |
|  a-gwbl.a1.ex.com. |  "heritage=external-dns,external-dns/owner=default"  |
|  a-hgmc.a1.ex.com. |  "heritage=external-dns,external-dns/owner=default"  |
|  a-hmqw.a1.ex.com. |  "heritage=external-dns,external-dns/owner=default"  |
|  a-hoxg.a1.ex.com. |  "heritage=external-dns,external-dns/owner=default"  |
|  a-ipfd.a1.ex.com. |  "heritage=external-dns,external-dns/owner=default"  |
|  a-melr.a1.ex.com. |  "heritage=external-dns,external-dns/owner=default"  |
|  a-mkde.a1.ex.com. |  "heritage=external-dns,external-dns/owner=default"  |
|  a-uknh.a1.ex.com. |  "heritage=external-dns,external-dns/owner=default"  |
|  a-xiqj.a1.ex.com. |  "heritage=external-dns,external-dns/owner=default"  |
|  fhbr.a1.ex.com.   |  "heritage=external-dns,external-dns/owner=default"  |
|  gwbl.a1.ex.com.   |  "heritage=external-dns,external-dns/owner=default"  |
|  hgmc.a1.ex.com.   |  "heritage=external-dns,external-dns/owner=default"  |
|  hmqw.a1.ex.com.   |  "heritage=external-dns,external-dns/owner=default"  |
|  hoxg.a1.ex.com.   |  "heritage=external-dns,external-dns/owner=default"  |
|  ipfd.a1.ex.com.   |  "heritage=external-dns,external-dns/owner=default"  |
|  melr.a1.ex.com.   |  "heritage=external-dns,external-dns/owner=default"  |
|  mkde.a1.ex.com.   |  "heritage=external-dns,external-dns/owner=default"  |
|  uknh.a1.ex.com.   |  "heritage=external-dns,external-dns/owner=default"  |
|  xiqj.a1.ex.com.   |  "heritage=external-dns,external-dns/owner=default"  |
+--------------------+------------------------------------------------------+

With the change: new-format records are created, legacy TXT records are left untouched

❯❯ aws route53 list-resource-record-sets --hosted-zone-id ${ZONE_UNDER_TEST} --query "ResourceRecordSets[?Type=='TXT'].{Name:Name, Value:ResourceRecords[0].Value}" --output table
-----------------------------------------------------------------------------
|                          ListResourceRecordSets                           |
+--------------------+------------------------------------------------------+
|        Name        |                        Value                         |
+--------------------+------------------------------------------------------+
|  a-dkmw.a1.ex.com. |  "heritage=external-dns,external-dns/owner=default"  |
|  a-ihkf.a1.ex.com. |  "heritage=external-dns,external-dns/owner=default"  |
|  a-iztb.a1.ex.com. |  "heritage=external-dns,external-dns/owner=default"  |
|  a-jlmh.a1.ex.com. |  "heritage=external-dns,external-dns/owner=default"  |
|  a-kmam.a1.ex.com. |  "heritage=external-dns,external-dns/owner=default"  |
|  a-lqpv.a1.ex.com. |  "heritage=external-dns,external-dns/owner=default"  |
|  a-rizx.a1.ex.com. |  "heritage=external-dns,external-dns/owner=default"  |
|  a-sjfa.a1.ex.com. |  "heritage=external-dns,external-dns/owner=default"  |
|  a-uvlq.a1.ex.com. |  "heritage=external-dns,external-dns/owner=default"  |
|  a-vgpc.a1.ex.com. |  "heritage=external-dns,external-dns/owner=default"  |
|  bwwo.a1.ex.com.   |  "heritage=external-dns,external-dns/owner=default"  |
|  ecvx.a1.ex.com.   |  "heritage=external-dns,external-dns/owner=default"  |
|  ilan.a1.ex.com.   |  "heritage=external-dns,external-dns/owner=default"  |
|  iyda.a1.ex.com.   |  "heritage=external-dns,external-dns/owner=default"  |
|  nhhy.a1.ex.com.   |  "heritage=external-dns,external-dns/owner=default"  |
|  pazy.a1.ex.com.   |  "heritage=external-dns,external-dns/owner=default"  |
|  sdeo.a1.ex.com.   |  "heritage=external-dns,external-dns/owner=default"  |
|  tosb.a1.ex.com.   |  "heritage=external-dns,external-dns/owner=default"  |
|  wrej.a1.ex.com.   |  "heritage=external-dns,external-dns/owner=default"  |
|  yymt.a1.ex.com.   |  "heritage=external-dns,external-dns/owner=default"  |
+--------------------+------------------------------------------------------+

@ivankatliarchuk ivankatliarchuk marked this pull request as draft March 13, 2025 09:42
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 13, 2025
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 13, 2025
@ivankatliarchuk
Contributor Author

/label tide/merge-method-squash

@k8s-ci-robot k8s-ci-robot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Mar 13, 2025
@ivankatliarchuk
Contributor Author

/kind cleanup

@k8s-ci-robot k8s-ci-robot added the kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. label Mar 13, 2025
@ivankatliarchuk ivankatliarchuk changed the title WIP: feat(txt-registry): deprecate legacy txt-format feat(txt-registry): deprecate legacy txt-format Mar 14, 2025
@ivankatliarchuk ivankatliarchuk marked this pull request as ready for review March 14, 2025 11:28
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 14, 2025
@ivankatliarchuk
Contributor Author

For example.com the old format creates a TXT record myprefix.example.com (Success), and the new format will constantly try to create e.g. myprefix.cname-example.com (Failure), which doesn't work as that domain is unlikely to be owned.

This looks like a bug to me when a custom prefix is specified. At least for apex domains it should behave differently. Could you share all your arguments and an example setup? Maybe there is a way to write a fix.

@Evesy
Contributor

Evesy commented Mar 17, 2025

I agree, the TXT registry has been a bit of a mess for a while now, ever since the new format was introduced to help support managing the same record with multiple record types. It solved that one problem but introduced a number of other compatibility problems for users along with it. In my opinion it should have been removed back then until a proper working solution was found, but instead it's been in a halfway house for years now.

I do think the newer registry format works for all use cases, including apex domains, when templating is used: for my example above, instead of --txt-prefix=myprefix. (note the trailing period) you would use --txt-prefix=myprefix-%{record_type}. (again with a trailing period). So perhaps what's really needed is a built-in mechanism in external-dns for transferring from one prefix to another. That way, anyone using prefixes/suffixes from before the new registry format was added would have a migration path between the old and new config.

@ivankatliarchuk
Contributor Author

So how does it currently work for apex domains? I have not found a flag to disable upsert for the new format. Does external-dns error out, crash, or swallow the error?

@Evesy
Contributor

Evesy commented Mar 18, 2025

Our setup has the below flags:

        - --source=ingress
        - --source=service
        - --source=crd
        - --provider=cloudflare
        - --policy=sync
        - --registry=txt
        - --txt-owner-id=dpp
        - --annotation-filter=external-dns.alpha.kubernetes.io/ignore notin (true)
        - --crd-source-apiversion=externaldns.k8s.io/v1alpha1
        - --txt-prefix=prefix.

What we see in the case of apex domains is external-dns constantly updating the records on every reconciliation loop:

time="2025-03-18T07:32:16Z" level=debug msg="Skipping record myprefix.cname-apex.co.uk because no hosted zone matching record DNS Name was detected"
time="2025-03-18T07:32:31Z" level=info msg="Changing record." action=UPDATE record=apex.co.uk ttl=1 type=CNAME zone=2db0e2d6e06b42e72e5479c062f3a3de
time="2025-03-18T07:32:31Z" level=info msg="Changing record." action=UPDATE record=myprefix.apex.co.uk ttl=1 type=TXT zone=2db0e2d6e06b42e72e5479c062f3a3de

My assumption, and I'm fairly sure I confirmed this a while ago, is that external-dns is reconciling every time because on each check it sees that the ${prefix}-${record_type}. TXT ownership record is missing. Fortunately, the DNS record for the apex is still created and managed by external-dns, because the legacy TXT format is successfully created and shows ownership.

However, if external-dns begins caring only about the ${prefix}-${record_type}. format, this does not work for apex domains with the above configuration: the record can't be created, so external-dns will be unable to manage the domain.
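
To make the naming problem concrete, here is a minimal Go sketch that reproduces the TXT names quoted in this thread. It is an assumption inferred from those examples, not the external-dns source: `legacyTXTName` and `newTXTName` are hypothetical helpers, and the rule "insert the lowercase record type plus a dash when the prefix has no %{record_type} placeholder" is only what the observed names suggest.

```go
package main

import "strings"

// recordTypeToken is the placeholder used in the --txt-prefix examples in this thread.
const recordTypeToken = "%{record_type}"

// legacyTXTName mirrors the old-format names quoted in the thread:
// the prefix is simply prepended to the DNS name.
func legacyTXTName(prefix, dnsName string) string {
	return prefix + dnsName
}

// newTXTName mirrors the new-format names quoted in the thread
// (hypothetical helper, inferred from the examples only).
// If the prefix contains the placeholder, the placeholder is replaced by the
// lowercase record type. Otherwise "<type>-" is inserted between the prefix
// and the DNS name.
func newTXTName(prefix, recordType, dnsName string) string {
	rt := strings.ToLower(recordType)
	if strings.Contains(prefix, recordTypeToken) {
		return strings.ReplaceAll(prefix, recordTypeToken, rt) + dnsName
	}
	return prefix + rt + "-" + dnsName
}
```

Under these assumptions, `newTXTName("myprefix.", "CNAME", "apex.co.uk")` gives `myprefix.cname-apex.co.uk`, which sits under `cname-apex.co.uk` rather than under the `apex.co.uk` zone, while `newTXTName("myprefix-%{record_type}.", "CNAME", "apex.co.uk")` gives `myprefix-cname.apex.co.uk`, which stays inside the zone.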

@ivankatliarchuk
Contributor Author

Got it. Will try to find a solution for apex records, before merging this to master

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 18, 2025
@ivankatliarchuk
Contributor Author

I think the question is how to reliably determine whether a name is a zone apex or a subdomain...

@Evesy
Contributor

Evesy commented Mar 18, 2025

I think the question is how to reliably determine whether a name is a zone apex or a subdomain...

The way we've handled a similar problem elsewhere (determining the authoritative zone for a domain) is to traverse through the address until an SOA record is returned, e.g.

$ dig SOA app.subdomain.tld.co.uk
// No SOA

$ dig SOA subdomain.tld.co.uk
// No SOA

$ dig SOA tld.co.uk
// SOA, so authoritative

The same logic could be applied; if the record being created resolves an SOA, it's probably an apex domain

Though I'm not sure external-dns should be doing this, as it'd be a lot of extra lookups and might not be foolproof in all scenarios, especially where split DNS is involved. I do think the better approach is to a) clearly document the pitfalls of using --txt-prefix without the interpolated %{record_type} (arguably you could enforce that %{record_type} is set within txt-prefix), and b) provide a migration for users to be able to change txt-prefix and have external-dns migrate the existing TXT entries automatically.
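
The SOA traversal described above could be sketched roughly as follows. This is only an illustration, not something external-dns does: `findZoneApex`, `isApex` and the `hasSOA` callback are hypothetical names, and the callback stands in for a real SOA query so that the lookup strategy (and its failure modes, such as split DNS) stays outside the sketch.

```go
package main

import "strings"

// findZoneApex strips labels from the left of name until hasSOA reports
// that the candidate answers with an SOA record. hasSOA is a hypothetical
// callback; a real one would perform the DNS query shown with dig above.
func findZoneApex(name string, hasSOA func(string) bool) (string, bool) {
	candidate := strings.TrimSuffix(name, ".")
	for candidate != "" {
		if hasSOA(candidate) {
			return candidate, true
		}
		i := strings.Index(candidate, ".")
		if i < 0 {
			break // single label left and no SOA found
		}
		candidate = candidate[i+1:]
	}
	return "", false
}

// isApex reports whether name itself is the first candidate with an SOA.
func isApex(name string, hasSOA func(string) bool) bool {
	apex, ok := findZoneApex(name, hasSOA)
	return ok && apex == strings.TrimSuffix(name, ".")
}
```

With a callback that returns true only for `tld.co.uk`, `findZoneApex("app.subdomain.tld.co.uk", ...)` walks through `subdomain.tld.co.uk` and stops at `tld.co.uk`. Each step is one extra lookup, which is the cost concern raised above.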

@ivankatliarchuk
Contributor Author

DNS lookups: I'm not sure, it's borderline. We all have different networking requirements and network segmentation; what if DNS lookups are not possible at all, etc.?

Maybe we need a new flag --apex-domains and/or --apex-domains-regex?

And add a flag --allow-dns-lookup? As we already do for service?

ips, err := net.LookupIP(lb.Hostname)

@Evesy
Contributor

Evesy commented Mar 18, 2025

I suppose even if you could reliably determine a domain is an apex, what would external-dns do to handle the domain differently?

You could suffix it with a period . (unless already the case) to ensure the TXT record is created as a subdomain of the apex. This is probably unlikely to have any compatibility issues that I can think of but as you say, people have very different setups.

Not everyone uses it, but --domain-filter could be used as a known list of apex zones if present (Admittedly haven't given much thought of the implications of this, so would need to think)

I'm still in favour of consistently handling records, rather than having edge cases for apex domains where external-dns exhibits different behaviour; and to achieve that I personally feel either mandating or clearly warning users about the impacts of using --txt-prefix when it comes to apex domains (i.e. If you want to manage apex domains, you should ensure the txt-prefix ultimately becomes a subdomain (the same problem surely applies with txt-suffix too)) is the best approach, with, ideally, a migration path for users
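
The --domain-filter idea mentioned above (treating the configured domains as a known list of apex zones) could look like the sketch below. `isKnownApex` is a hypothetical helper, and the assumption that every filter entry is a zone apex is exactly the part that would need more thought, since a filter may also name a subdomain.

```go
package main

import "strings"

// normalize lowercases a DNS name and drops a trailing dot so that
// "Ex.Com." and "ex.com" compare equal.
func normalize(name string) string {
	return strings.ToLower(strings.TrimSuffix(name, "."))
}

// isKnownApex reports whether name exactly matches one of the configured
// zones. Hypothetical helper: it assumes each entry is a zone apex.
func isKnownApex(name string, zones []string) bool {
	n := normalize(name)
	for _, z := range zones {
		if n == normalize(z) {
			return true
		}
	}
	return false
}
```

This avoids DNS lookups entirely, at the price of depending on a flag that not everyone sets.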

@ivankatliarchuk
Contributor Author

Got it. I'll submit a patch with a documentation update. Yeah, handling apex correctly is not a trivial thing. The DNS lookup could happen, for example for the domain test.one.example.co.uk, from right to left: look up uk, look up co.uk, look up example.co.uk, BINGO. But yeah, this is sort of over-engineering things.

Will update docs and flag description

@ivankatliarchuk
Contributor Author

So, the behaviour you described could be source- or provider-specific, as I was not able to reproduce it for the AWS provider. There is an open issue from 2018: #449

So these are my fixtures, where the apex zone is ex.com

---
apiVersion: v1
kind: Namespace
metadata:
  name: extdns
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-v1
  namespace: extdns
  annotations:
    dns.company.com/label: controllertest-v1
    dns.issue/type: issue-5172
    external-dns.alpha.kubernetes.io/hostname: nginx-v1.ex.com
spec:
  type: ClusterIP
  clusterIP: None
  ports:
    - port: 80
      name: http
      targetPort: 80
  selector:
    app: nginx
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-v2
  namespace: extdns
  annotations:
    dns.company.com/label: controllertest-v2
    dns.issue/type: issue-5172
    external-dns.alpha.kubernetes.io/ttl: "1m"
    external-dns.alpha.kubernetes.io/hostname: nginx-v2.ex.com
spec:
  type: LoadBalancer
  allocateLoadBalancerNodePorts: true
  ports:
    - port: 80
      name: http
      targetPort: 80
  selector:
    app: nginx
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-v3
  namespace: extdns
  annotations:
    dns.company.com/label: controllertest-v2
    dns.issue/type: issue-5172
    external-dns.alpha.kubernetes.io/ttl: "180"
    external-dns.alpha.kubernetes.io/hostname: nginx-v3.ex.com
spec:
  type: LoadBalancer
  allocateLoadBalancerNodePorts: true
  ports:
    - port: 80
      name: http
      targetPort: 80
  selector:
    app: nginx
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: extdns
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - image: nginx
          name: nginx
          ports:
            - containerPort: 80
              name: http

and the arguments for the latest external-dns

go run main.go \
    --provider=aws \
    --registry=txt \
    --source=service \
    --aws-zone-type=private \
    --zone-id-filter=/hostedzone/${ZONE} \
    --log-level=debug \
    --policy=sync \
    --interval=10s \
    --txt-owner-id="current" \
    --annotation-filter="dns.issue/type=issue-5172"

results

  ❯❯ aws route53 list-resource-record-sets \
        --hosted-zone-id ${ZONE} \
        --query "ResourceRecordSets[?Type=='TXT'].{Name:Name, Value:ResourceRecords[0].Value}" --output table
----------------------------------------------------------------------------------------------------------------------------
|                                                  ListResourceRecordSets                                                  |
+--------------------+-----------------------------------------------------------------------------------------------------+
|        Name        |                                                Value                                                |
+--------------------+-----------------------------------------------------------------------------------------------------+
|  a-nginx-v2.ex.com.|  "heritage=external-dns,external-dns/owner=current,external-dns/resource=service/extdns/nginx-v2"   |
|  a-nginx-v3.ex.com.|  "heritage=external-dns,external-dns/owner=current,external-dns/resource=service/extdns/nginx-v3"   |
+--------------------+-----------------------------------------------------------------------------------------------------+

and for arguments

go run main.go \
    --provider=aws \
    --registry=txt \
    --source=service \
    --aws-zone-type=private \
    --zone-id-filter=/hostedzone/${ZONE} \
    --log-level=debug \
    --policy=sync \
    --interval=10s \
    --txt-owner-id="current" \
    --annotation-filter="dns.issue/type=issue-5172" \
    --txt-prefix="%{record_type}-prefix-"

and results

❯ aws route53 list-resource-record-sets --hosted-zone-id ${ZONE} --query "ResourceRecordSets[?Type=='TXT'].{Name:Name, Value:ResourceRecords[0].Value}" --output table
-----------------------------------------------------------------------------------------------------------------------------------
|                                                     ListResourceRecordSets                                                      |
+---------------------------+-----------------------------------------------------------------------------------------------------+
|           Name            |                                                Value                                                |
+---------------------------+-----------------------------------------------------------------------------------------------------+
|  a-prefix-nginx-v2.ex.com.|  "heritage=external-dns,external-dns/owner=current,external-dns/resource=service/extdns/nginx-v2"   |
|  a-prefix-nginx-v3.ex.com.|  "heritage=external-dns,external-dns/owner=current,external-dns/resource=service/extdns/nginx-v3"   |
+---------------------------+-----------------------------------------------------------------------------------------------------+

@k8s-ci-robot
Contributor

New changes are detected. LGTM label has been removed.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 19, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from mloiseleur. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ivankatliarchuk
Contributor Author

I've updated the documentation. Regarding external-dns providing a migration path, there is a note that was already there. I tried to implement a migration, but there is no easy way to do that. With the current behaviour, external-dns creates the new TXT records and leaves legacy records untouched for cleanup. The script for targeted deletion of TXT records is already added; not ideal, but it should help a bit.

Note: external-dns will not automatically remove legacy format records when switching to new-format-only mode. You'll need to clean up the old records manually if desired.

If you are OK with these changes, submit an LGTM or share what is missing so I can un-hold the pull request.

@Evesy
Contributor

Evesy commented Mar 19, 2025

So, the behaviour you described could be source- or provider-specific, as I was not able to reproduce it for the AWS provider.

The problem exists when a user is specifying a txt-prefix that does not include the %{record_type} substitution, and the prefix ends in a subdomain, i.e. myprefix.

So when trying to create a record for e.g. tld.co.uk, the old TXT record format is created correctly (TXT myprefix.tld.co.uk); however, the new TXT record format (TXT myprefix.a-tld.co.uk) is not, since it's trying to create a record for an entirely different domain.

@ivankatliarchuk
Contributor Author

I just shared my configuration, and I did not find any problem. I'm running the latest external-dns. The only difference is AWS Route53 instead of Cloudflare.

@ivankatliarchuk
Contributor Author

Have a look at #5172 (comment). For the apex domain, even without a prefix, the new TXT record is a-nginx-v2.ex.com. There is no subdomain created.

@ivankatliarchuk
Contributor Author

I updated the docs. Personally, I'm not sure the docs update is even required, as I'm not able to reproduce the behaviour in my setup, but there is no harm in having it there.

@ivankatliarchuk
Contributor Author

@mloiseleur I could be missing something obvious for apex domains. wdyt?

@Evesy
Contributor

Evesy commented Mar 19, 2025

@ivankatliarchuk If you try with --txt-prefix=foo. you should be able to see the issue I would imagine

@ivankatliarchuk
Contributor Author

ivankatliarchuk commented Mar 19, 2025

Understood. But this is a non-default configuration, so one could just break things. For example, if the prefix contains an IDNA character, this will most likely cripple/break records as well. And most of the providers do not even support IDNA.

Here is what I'm unsure about. The expectation is that by default the TXT records are going to be updated; at least they should be. There are dozens of combinations that could break any TXT record format, but this should be on users.

Would you mind sharing the message you'd like to see in the documentation, or even creating a pull request with examples and mitigations? This would certainly help to progress this PR as well.

@Evesy
Contributor

Evesy commented Mar 19, 2025

The README already contains this:

If using a txt registry and attempting to use a CNAME the --txt-prefix must be set to avoid conflicts. Changing --txt-prefix will result in lost ownership over previously created records.

I would suggest adding something like the below?

If you plan to manage apex domains with external-dns whilst using a txt registry, you should ensure when using --txt-prefix that you specify the record type substitution and that it ends in a period ., to ensure the record is created under the same domain as the apex record being managed, i.e. --txt-prefix=someprefix-%{record_type}.
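
The two conditions in the suggested note (the prefix contains the record type substitution and ends in a period) can be expressed as a small check. This is a hedged sketch only: `checkTXTPrefix` is a hypothetical helper, not an existing external-dns validation, and the warning texts are made up for illustration.

```go
package main

import "strings"

// checkTXTPrefix returns warnings for a --txt-prefix value that may break
// ownership records for apex domains, following the two conditions in the
// suggested documentation note. Hypothetical helper, not external-dns code.
func checkTXTPrefix(prefix string) []string {
	if prefix == "" {
		return nil // no prefix configured, nothing to check
	}
	var warnings []string
	if !strings.Contains(prefix, "%{record_type}") {
		warnings = append(warnings, "prefix does not contain %{record_type}")
	}
	if !strings.HasSuffix(prefix, ".") {
		warnings = append(warnings, "prefix does not end in a period")
	}
	return warnings
}
```

For `someprefix-%{record_type}.` the check returns no warnings, while `myprefix.` is flagged for the missing substitution and `%{record_type}-prefix-` for the missing trailing period.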

@ivankatliarchuk
Contributor Author

Nice one. Added to docs/registry/txt.md

Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.

4 participants