Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(cyclonedx): Sanitize copyrights for the CycloneDX XML report #9303

Conversation

nnobelis
Copy link
Member

Some characters in copyrights cannot be outputted to XML. Therefore, sanitize the copyrights content for XML.

This fixes the following exception:

org.cyclonedx.exception.GeneratorException: com.fasterxml.jackson.databind.JsonMappingException: Invalid white space character (0x1) in text to output (in xml 1.1, could output as a character entity) (through reference chain: org.cyclonedx.model.Bom["component"]->java.util.ArrayList[164]->org.cyclonedx.model.Component["copyright"]) at org.ossreviewtoolkit.plugins.reporters.cyclonedx.CycloneDxReporter.writeBom(CycloneDxReporter.kt:369) at org.ossreviewtoolkit.plugins.reporters.cyclonedx.CycloneDxReporterKt.access$generateBom(CycloneDxReporter.kt:1) at org.ossreviewtoolkit.plugins.reporters.cyclonedx.CycloneDxReporterKt.generateBom(CycloneDxReporter.kt:394) at org.cyclonedx.generators.xml.BomXmlGenerator.toXmlString(BomXmlGenerator.java:131) at org.cyclonedx.generators.xml.BomXmlGenerator.toXML(BomXmlGenerator.java:116)

Some characters in copyrights cannot be outputted to XML. Therefore,
sanitize the copyrights content for XML.

Signed-off-by: Nicolas Nobelis <nicolas.nobelis@bosch.com>
@nnobelis nnobelis requested a review from a team as a code owner October 18, 2024 09:57
@@ -90,6 +91,7 @@ asciidoctorj-pdf = { module = "org.asciidoctor:asciidoctorj-pdf", version.ref =
awsS3 = { module = "software.amazon.awssdk:s3", version.ref = "s3" }
clikt = { module = "com.github.ajalt.clikt:clikt", version.ref = "clikt" }
commonsCompress = { module = "org.apache.commons:commons-compress", version.ref = "commonsCompress" }
commonsText = { module = "org.apache.commons:commons-text", version.ref = "commonsText" }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not super happy about introducing a dependency on commons-text just for escaping XML... can we find a more lightweight solution?

Copy link
Member

@sschuberth sschuberth Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, we once had #2878, so maybe that idea could be recycled in an extended form?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fell free to suggest one!

I didn't find anything by googling. The Cyclone DX lib delegates to Jackson for the XML generation that delegates to StAX, but I found nothing there too 😞

Copy link
Member

@sschuberth sschuberth Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I remember that somewhere we had a discussion about wrapping Copyright texts in <![CDATA[...]]>, but I currently cannot find it anymore. Would that also solve the issue?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhat related, though targeting the description: #2814

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I remember that somewhere we had a discussion about wrapping Copyright texts in <![CDATA[...]]>, but I currently cannot find it anymore. Would that also solve the issue?

Assuming that this would help, I'm wondering why our Copyright's aren't already wrapped in <![CDATA[...]]>. Also this test from upstream shows that the text is nested, which does not seem to be the case for our reports. So maybe that's the root cause: For some reason we're not using the <text> tag, and that's also why data does not get wrapped in <![CDATA[...]]>. Could you investigate into that direction, @nnobelis?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the test you linked, the copyright is under an evidence element:

<component type="application">
            <evidence>
                <copyright>

And Component.evidence.copyright has a special class in the model, with this text element:
https://github.com/CycloneDX/cyclonedx-core-java/blob/6cfef328eb0c6c73f83dff13f9b2fe1934c271b9/src/main/java/org/cyclonedx/model/Copyright.java#L30

If you use Component.copyright instead, this is just a string:
https://github.com/CycloneDX/cyclonedx-core-java/blob/6cfef328eb0c6c73f83dff13f9b2fe1934c271b9/src/main/java/org/cyclonedx/model/Component.java#L174

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should open an upstream ticket.

I think this is not nice that we escape it ourselves: with the code in the PR, we are escaping the BOM for XML even if we generate only some JSON.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, it seem odd that Copyrights are wrapped in CDATA if part of <evidence>, but not otherwise.

Copy link

codecov bot commented Oct 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 67.28%. Comparing base (ed4bccf) to head (b146bc3).

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #9303      +/-   ##
============================================
+ Coverage     66.29%   67.28%   +0.99%     
  Complexity     1201     1201              
============================================
  Files           239      239              
  Lines          8446     8446              
  Branches        905      905              
============================================
+ Hits           5599     5683      +84     
+ Misses         2478     2394      -84     
  Partials        369      369              
Flag Coverage Δ
funTest-docker 60.07% <ø> (ø)
funTest-non-docker 33.63% <ø> (-1.08%) ⬇️
test 37.36% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@nnobelis
Copy link
Member Author

nnobelis commented Oct 18, 2024

@sschuberth Funnily enough, I found out you had an escapeXML function in the past:
https://github.com/oss-review-toolkit/ort/pull/2820/files

@sschuberth
Copy link
Member

@sschuberth Funnily enough, I found out you had an escapeXML function in the past: https://github.com/oss-review-toolkit/ort/pull/2820/files

Yes, that's what I wrote over here an hour ago 😉

@nnobelis
Copy link
Member Author

nnobelis commented Oct 18, 2024

Closed in favor of an upstream issue: CycloneDX/cyclonedx-core-java#538.

@nnobelis nnobelis closed this Oct 18, 2024
@sschuberth
Copy link
Member

Closed in favor of an upstream fix: CycloneDX/cyclonedx-core-java#538.

I was hoping for a fix, but it's in fact an issue 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants