Continuing the Adventure with the CycloneDX Maven Plugin
My investigation into the CycloneDX Maven Plugin began back in November/December 2022 with the intent of integrating the plugin into the Quarkus build process to generate a Software Bill of Materials (SBOM) for the project. I quickly discovered issues in the plugin and raised these with the maintainer early in December, writing a blog post (An Adventure with the CycloneDX Maven Plugin) to help clarify each issue. I finally opened a pull request in early January to move the conversation forward, and this is where our story continues...
Two weeks ago I received some feedback on the pull request from Steve Springett, who had run my version of the CycloneDX plugin and hit some problems. Steve was running the plugin against the WebGoat 8.0.0 codebase and noticed some dependencies were not present in the components section! This was intriguing, as I had been running the plugin against a complex codebase (Quarkus) without seeing the issue, and had also included a BOM validation step in my pull request which would emit WARNING log messages if this situation occurred. I took a look at the WebGoat codebase and could not get this specific version to build; however, a build of a different version did succeed without displaying the problem. Curiouser and curiouser...
We now jump forward to this Monday (4 days ago), when I was trying to arrange a call with Steve to discuss the differences in our environments and help move this forward. Steve suggested we include Hervé Boutemy, the new maintainer of the upstream codebase, in the call; however, Hervé offered instead to review my pull request as-is. It was at this point I realised the pull request now had conflicts with the base branch, so I quickly rebased and fixed the conflicts. I also decided to give the WebGoat codebase another try.
I spent time investigating the failures I had seen with the WebGoat build and eventually realised I needed to be running on an older version of Java; I needed to install JDK 8 in order to make progress. I was now able to build the same version of the code Steve had been using, although with errors, and could now see the missing components. Even better, I could also see the expected WARNING messages were present!
[WARNING] CycloneDX: Dependency missing component entry: pkg:maven/org.webjars/jquery@1.11.1?type=jar
[WARNING] CycloneDX: Dependency missing component entry: pkg:maven/commons-io/commons-io@LATEST?type=jar
[WARNING] CycloneDX: Dependency missing component entry: pkg:maven/com.google.guava/guava@18.0?type=jar
[WARNING] CycloneDX: Dependency missing component entry: pkg:maven/org.apache.commons/commons-lang3@3.4?type=jar
This was great; I now had something to work with.
Comparing Upstream Output with my pull request
Before investigating I decided to first understand the differences in output between the current upstream codebase and what was being generated by my pull request. This may provide some insight into the new issue and could possibly hint at a direction to follow.
With regard to components, I discovered three were missing from my version of the bom; however, in each case the component was never referenced in the dependencies section. These components were:
- pkg:maven/com.google.guava/guava@20.0?type=jar
- pkg:maven/commons-io/commons-io@2.11.0?type=jar
- pkg:maven/org.apache.commons/commons-lang3@3.6?type=jar
These are three of the components we were warned about, but suspiciously each has a different version.
I also found we were now including two additional components:
- pkg:maven/junit/junit@4.12?type=jar
- pkg:maven/org.hamcrest/hamcrest-core@1.3?type=jar
With regard to dependencies we were expanding the dependency tree to include transitive dependencies for the following
- pkg:maven/com.fasterxml.jackson.core/jackson-databind@2.8.11.1?type=jar
- pkg:maven/org.springframework.boot/spring-boot-autoconfigure@1.5.12.RELEASE?type=jar
- pkg:maven/org.springframework.boot/spring-boot@1.5.12.RELEASE?type=jar
- pkg:maven/org.springframework.security/spring-security-core@4.2.5.RELEASE?type=jar
- pkg:maven/org.springframework.security/spring-security-web@4.2.5.RELEASE?type=jar
- pkg:maven/org.springframework/spring-aop@4.3.16.RELEASE?type=jar
- pkg:maven/org.springframework/spring-beans@4.3.16.RELEASE?type=jar
- pkg:maven/org.springframework/spring-context@4.3.16.RELEASE?type=jar
- pkg:maven/org.springframework/spring-expression@4.3.16.RELEASE?type=jar
- pkg:maven/org.springframework/spring-test@4.3.16.RELEASE?type=jar
- pkg:maven/org.springframework/spring-web@4.3.16.RELEASE?type=jar
- pkg:maven/org.webjars/bootstrap@3.3.7?type=jar
as well as adding new dependencies into the tree
- pkg:maven/aopalliance/aopalliance@1.0?type=jar
- pkg:maven/org.hamcrest/hamcrest-core@1.3?type=jar
- pkg:maven/junit/junit@4.12?type=jar
however, we are also seeing the following dependencies without any mention in the component section
- pkg:maven/com.google.guava/guava@18.0?type=jar
- pkg:maven/commons-io/commons-io@LATEST?type=jar
- pkg:maven/org.apache.commons/commons-lang3@3.4?type=jar
- pkg:maven/org.webjars/jquery@1.11.1?type=jar
These match the list of dependencies reported as WARNINGs in the log, confirming the issue.
We now know the pull request codebase is having a beneficial effect and providing a more detailed dependency graph. What is left to work out is why we are seeing these four dependencies in the tree with no associated component.
Identity, Does it Matter?
Before we take a look at each of the problematic dependencies let us quickly cover how components are identified in the upstream CycloneDX codebase and in my pull request.
The upstream codebase discovers its components by asking maven for those artifacts which it has resolved to be the definitive set for the build. These artifacts are then filtered based on their scope and used to create the set of components included in the bom file; however, as we discovered in the previous post, this filtering does not follow the transitive scoping rules applied by maven. It is also important to realise that, when resolving the dependency tree, the upstream codebase will not include any dependencies which do not exist in the set of known components. No components will be removed, even if they do not take part in the dependency tree.
In my pull request we take a slightly different approach. To discover the set of possible components we still ask maven for the definitive set of artifacts, but rely instead on maven to handle the filtering when collecting the dependency graph. At the end of the process we check the set of components and remove any which do not appear in the dependency tree. No dependencies will be removed, even if they do not have an associated component, however a warning is emitted on the console. This is the warning we are now seeing.
These approaches are, essentially, tackling the discovery from opposite directions.
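To make the contrast concrete, here is a minimal sketch of the two strategies working on plain strings. It is a toy model rather than the plugin's actual code: the class name, method names and the shortened refs are all hypothetical, and the graph is simply a map from each node to the refs it depends on.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of the two discovery strategies (hypothetical names, not plugin code). */
final class DiscoverySketch {

    /** Upstream style: keep every component, drop edges whose target is not a known component. */
    static Map<String, Set<String>> filterEdgesByComponents(
            Set<String> components, Map<String, List<String>> edges) {
        Map<String, Set<String>> kept = new LinkedHashMap<>();
        edges.forEach((from, targets) -> {
            Set<String> known = new LinkedHashSet<>(targets);
            known.retainAll(components);
            kept.put(from, known);
        });
        return kept;
    }

    /** Pull request style: keep every edge, drop components no edge points at. */
    static Set<String> pruneComponentsByEdges(
            Set<String> components, Map<String, List<String>> edges) {
        Set<String> kept = new LinkedHashSet<>(components);
        kept.retainAll(referencedTargets(edges));
        return kept;
    }

    /** Validation step: refs used as edge targets that have no component entry. */
    static Set<String> missingComponents(
            Set<String> components, Map<String, List<String>> edges) {
        Set<String> missing = referencedTargets(edges);
        missing.removeAll(components);
        return missing;
    }

    /** Collects every ref that appears as the target of at least one edge. */
    private static Set<String> referencedTargets(Map<String, List<String>> edges) {
        Set<String> refs = new LinkedHashSet<>();
        edges.values().forEach(refs::addAll);
        return refs;
    }
}
```

Feeding this a component set containing guava@20.0 and an edge from webgoat-container to guava@18.0 reproduces both behaviours described above: the first method returns an empty edge set for webgoat-container, while the second drops guava@20.0 and the third reports guava@18.0 as missing.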
With the above in mind let us now take a look at the problematic artifacts and return to our trusty dependency tree graph. We can see from the WARNINGs that we should focus on two of the projects
- xxe for guava and commons-lang3
- webwolf for jquery and commons-io
A look at Guava
The parts of the xxe dependency tree which are of interest are
org.owasp.webgoat.lesson:xxe:jar:v8.0.0.M15
+- com.github.tomakehurst:wiremock:jar:2.8.0:test
| +- com.google.guava:guava:jar:20.0:provided
| +- com.flipkart.zjsonpatch:zjsonpatch:jar:0.3.0:test
| | +- (com.google.guava:guava:jar:18.0:test - omitted for conflict with 20.0)
+- org.owasp.webgoat:webgoat-container:jar:v8.0.0.M15:provided
| +- (com.google.guava:guava:jar:18.0:provided - omitted for conflict with 20.0)
+- org.owasp.webgoat:webgoat-container:jar:tests:v8.0.0.M15:test
| +- (com.google.guava:guava:jar:18.0:test - omitted for conflict with 20.0)
From this we can see guava:20.0 has been resolved as the winner by maven, however the winning artifact is hidden beneath a test scoped artifact (we saw this in our previous issues). We can also see the artifact discovered through the transitive compile scope is being reported as guava:18.0, so while version 20.0 has been declared the winner we are still seeing the marker nodes report the original version of 18.0. How does each version of the plugin handle this scenario?
The upstream code discovers guava:20.0 in the set of resolved artifacts, including it in its set of known components. When creating the dependency tree it discovers guava:18.0, however decides not to include it as this version is not in the set of known components. This results in a bom which includes the guava:20.0 component and a dependency graph which does not reference the guava dependency, losing the dependency relationship between webgoat-container and guava. The bom looks as follows
<component type="library" bom-ref="pkg:maven/com.google.guava/guava@20.0?type=jar">
In my pull request we discover guava:20.0 in the set of resolved artifacts, including it as a known component. When creating the dependency tree we discover the guava:18.0 dependency and include it in the tree. At the end of the process we drop components which are not mentioned in the dependency tree, in this instance the guava:20.0 component, but keep the dependency relationship between webgoat-container and guava:18.0, which is a missing component. The bom looks as follows
<dependency ref="pkg:maven/org.owasp.webgoat/webgoat-container@v8.0.0.M15?type=jar">
<dependency ref="pkg:maven/com.google.guava/guava@18.0?type=jar"/>
</dependency>
<dependency ref="pkg:maven/com.google.guava/guava@18.0?type=jar"/>
A look at commons-lang3
The parts of the xxe dependency tree which are of interest are
org.owasp.webgoat.lesson:xxe:jar:v8.0.0.M15
+- com.github.tomakehurst:wiremock:jar:2.8.0:test
| +- org.apache.commons:commons-lang3:jar:3.6:provided
| \- com.github.jknack:handlebars:jar:4.0.6:test
| +- (org.apache.commons:commons-lang3:jar:3.1:test - omitted for conflict with 3.6)
+- org.owasp.webgoat:webgoat-container:jar:v8.0.0.M15:provided
| +- (org.apache.commons:commons-lang3:jar:3.4:provided - omitted for conflict with 3.6)
+- org.owasp.webgoat:webgoat-container:jar:tests:v8.0.0.M15:test
| +- (org.apache.commons:commons-lang3:jar:3.4:test - omitted for conflict with 3.6)
We can see from the above that the commons-lang3 artifact suffers from the same problem as the guava artifact, with the artifact identified through the transitive compile scope having a version of 3.4 while the resolved winner has a version of 3.6 but is hidden beneath a test scoped artifact. We can also see there is a third version being referenced beneath the test scoped artifact, commons-lang3:3.1.
The upstream bom looks as follows
<component type="library" bom-ref="pkg:maven/org.apache.commons/commons-lang3@3.6?type=jar">
The bom from my pull request looks as follows
<dependency ref="pkg:maven/org.owasp.webgoat/webgoat-container@v8.0.0.M15?type=jar">
<dependency ref="pkg:maven/org.apache.commons/commons-lang3@3.4?type=jar"/>
</dependency>
<dependency ref="pkg:maven/org.apache.commons/commons-lang3@3.4?type=jar"/>
A look at jquery
The parts of the webwolf dependency tree which are of interest are
org.owasp.webgoat:webwolf:jar:v8.0.0.M15
+- org.webjars:bootstrap:jar:3.3.7:compile
| \- (org.webjars:jquery:jar:1.11.1:compile - omitted for conflict with 3.2.1)
+- org.webjars:jquery:jar:3.2.1:compile
This scenario is slightly different from the previous ones in that the resolved component is not hidden behind a test scoped artifact. We can see from the above that we have two artifacts being discovered within the transitive compile scope, jquery:3.2.1 and jquery:1.11.1. Version 3.2.1 is the resolved winner and version 1.11.1 is the marker node for an artifact which lost the resolution process. How does each version of the plugin handle this scenario?
The upstream code discovers jquery:3.2.1 in the set of resolved artifacts, including it in its set of known components. When creating the dependency tree it discovers both jquery:3.2.1 and jquery:1.11.1, including 3.2.1 in the tree but deciding not to include 1.11.1 as this does not match a known component. This results in a bom which includes the jquery:3.2.1 component and the dependency relationship between webwolf and jquery but loses the dependency relationship between bootstrap and jquery. The bom looks as follows
<component type="library" bom-ref="pkg:maven/org.webjars/jquery@3.2.1?type=jar">
<dependency ref="pkg:maven/org.owasp.webgoat/webwolf@v8.0.0.M15?type=jar">
<dependency ref="pkg:maven/org.webjars/jquery@3.2.1?type=jar"/>
</dependency>
<dependency ref="pkg:maven/org.webjars/jquery@3.2.1?type=jar"/>
In my pull request we discover jquery:3.2.1 in the set of resolved artifacts, including it as a known component. When creating the dependency tree we discover both jquery:3.2.1 and jquery:1.11.1, including both in the tree. This results in a bom which includes the jquery:3.2.1 component and the dependency relationship between webwolf and jquery. The bom also keeps the dependency relationship between bootstrap and jquery:1.11.1, which is a missing component. The bom looks as follows
<component type="library" bom-ref="pkg:maven/org.webjars/jquery@3.2.1?type=jar">
<dependency ref="pkg:maven/org.owasp.webgoat/webwolf@v8.0.0.M15?type=jar">
<dependency ref="pkg:maven/org.webjars/jquery@3.2.1?type=jar"/>
</dependency>
<dependency ref="pkg:maven/org.webjars/bootstrap@3.3.7?type=jar">
<dependency ref="pkg:maven/org.webjars/jquery@1.11.1?type=jar"/>
</dependency>
<dependency ref="pkg:maven/org.webjars/jquery@1.11.1?type=jar"/>
<dependency ref="pkg:maven/org.webjars/jquery@3.2.1?type=jar"/>
A look at commons-io
The parts of the webwolf dependency tree which are of interest are
org.owasp.webgoat:webwolf:jar:v8.0.0.M15
+- commons-io:commons-io:jar:LATEST:compile
Now this scenario is very different from the previous ones. In each of the previous scenarios the dependency tree included marker nodes with versions which did not match the version resolved by maven, the first two with the resolved artifact hidden behind a test scoped artifact and the third with both artifacts discovered through the transitive compile scope. So what is going on here? It's time for a quick dive under the covers of maven!
Maven includes support for two metaversions which can be used when specifying the version of an artifact: RELEASE and LATEST. These metaversions have specific meanings when resolving artifacts within a pom:
- RELEASE: represents the latest non-snapshot version of the artifact within a repository
- LATEST: represents the latest version of the artifact within a repository, which includes both released and snapshot versions
Note: Using either RELEASE or LATEST in a build breaks reproducibility. Thankfully maven is now issuing the following deprecation WARNING when encountering these metaversions, which means support for these versions should be removed at some point in the future.
[WARNING] 'dependencies.dependency.version' for commons-io:commons-io:jar is either LATEST or RELEASE (both of them are being deprecated)
When maven encounters either of these metaversions it will resolve the artifact to a specific version based on the above meanings. In our case, at least as of today, maven will resolve commons-io:LATEST to commons-io:2.11.0. How does each version of the plugin handle this scenario?
The upstream code discovers commons-io:2.11.0 in the set of resolved artifacts, including it in its set of known components. When creating the dependency tree it discovers commons-io:LATEST, however decides not to include it as this version is not in the set of known components. This results in a bom which includes the commons-io:2.11.0 component and a dependency graph which does not reference the commons-io dependency, losing the dependency relationship between webwolf and commons-io. The bom looks as follows
<component type="library" bom-ref="pkg:maven/commons-io/commons-io@2.11.0?type=jar">
In my pull request we discover commons-io:2.11.0 in the set of resolved artifacts, including it as a known component. When creating the dependency tree we discover the commons-io:LATEST dependency and include it in the tree. At the end of the process we drop components which are not mentioned in the dependency tree, in this instance the commons-io:2.11.0 component, but keep the dependency relationship between webwolf and commons-io:LATEST, which is a missing component. The bom looks as follows
<dependency ref="pkg:maven/org.owasp.webgoat/webwolf@v8.0.0.M15?type=jar">
<dependency ref="pkg:maven/commons-io/commons-io@LATEST?type=jar"/>
</dependency>
<dependency ref="pkg:maven/commons-io/commons-io@LATEST?type=jar"/>
Summarising the issues
We have three different scenarios here, however each has the same root cause. Maven is returning a dependency graph which includes marker nodes referencing the original artifact versions and not the versions resolved within the context of the build. These marker nodes have a different identity to the resolved dependencies and are, therefore, treated separately. As we would expect, identity does matter!
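The identity problem can be shown in a few lines. The sketch below assumes the simplified package URL layout seen in the bom excerpts above (pkg:maven/group/artifact@version?type=jar) and ignores classifiers and the encoding rules of the full purl specification; the class and method names are hypothetical.

```java
/** Toy illustration of purl identity (hypothetical helper, simplified purl layout). */
final class PurlIdentity {

    /** Builds a purl in the same shape as the bom-ref values shown in this post. */
    static String purl(String groupId, String artifactId, String version, String type) {
        return "pkg:maven/" + groupId + "/" + artifactId + "@" + version + "?type=" + type;
    }

    /** Strips the "@version" part so two purls can be compared as the same artifact. */
    static String withoutVersion(String purl) {
        return purl.replaceFirst("@[^?]*", "");
    }

    /** True when both purls name the same artifact, regardless of version. */
    static boolean sameArtifact(String purlA, String purlB) {
        return withoutVersion(purlA).equals(withoutVersion(purlB));
    }
}
```

A marker node for guava 18.0 and the resolved guava 20.0 are the same artifact by this measure, yet their full purls differ, so any lookup keyed on the full purl treats them as unrelated entries.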
With the upstream codebase we see the resolved components being included in the bom, but with certain dependency relationships missing from the dependency tree.
With my pull request we see some missing components from the bom, but with all dependency relationships included in the dependency tree. The problem is that some of these relationships reference dependencies with their original version and not the version resolved by maven.
Note: The issues we are seeing do not happen with dependencies which have their version managed. If we look at the node for a managed dependency we can see the version of the marker has been updated
org.slf4j:slf4j-api:1.7.25:compile (org.slf4j:slf4j-api:jar:1.7.25:compile - version managed from 1.6.6; omitted for duplicate)
In this case the version of the marker node has been updated from 1.6.6 to 1.7.25.
Unfortunately, this additional information is only available to us through the toNodeString method on the VerboseDependencyNode class, unless we delve under the covers and work on the internal aether dependency tree, which does contain a data map including this information.
Now for the solution
Now that we have identified a root cause there is an obvious solution. We know maven is not updating the versions for some marker nodes, leaving them with their original version, so we need to handle this aspect. We need to track the versions of the resolved artifacts and, when creating the dependency graph, ensure all dependency versions reference the resolved version of the artifact. Thankfully this is a straightforward update to the codebase.
Now that we have a working solution, how does this look for each component?
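The idea can be sketched as follows. This is an illustration of the approach and not the code from the pull request: it assumes coordinates written as groupId:artifactId:type[:classifier]:version (the tree output above minus the trailing scope), and the class and method names are hypothetical.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

/** Toy sketch of pinning graph nodes to resolved versions (hypothetical names). */
final class ResolvedVersionSketch {

    /** Everything before the last colon identifies the artifact without its version. */
    static String versionlessKey(String coordinates) {
        return coordinates.substring(0, coordinates.lastIndexOf(':'));
    }

    /** Indexes the resolved artifacts by versionless key, mapping each to its resolved version. */
    static Map<String, String> indexResolvedVersions(Collection<String> resolvedArtifacts) {
        Map<String, String> versions = new HashMap<>();
        for (String coordinates : resolvedArtifacts) {
            String version = coordinates.substring(coordinates.lastIndexOf(':') + 1);
            versions.put(versionlessKey(coordinates), version);
        }
        return versions;
    }

    /** Rewrites a graph node to the resolved version; unknown artifacts are left untouched. */
    static String pinToResolved(String nodeCoordinates, Map<String, String> resolvedVersions) {
        String key = versionlessKey(nodeCoordinates);
        String resolved = resolvedVersions.get(key);
        return resolved == null ? nodeCoordinates : key + ":" + resolved;
    }
}
```

Applied while the dependency graph is being built, a marker node such as com.google.guava:guava:jar:18.0 or commons-io:commons-io:jar:LATEST would be rewritten to the version found in the resolved artifact set, so its purl then matches the component entry.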
Recap and Solution for guava
From our earlier discussion we saw the upstream plugin had identified the correct guava version for the component, but had lost all dependency relationships, and my pull request had kept the dependency relationships but had lost the component as it was referring to the original version of the artifact. What do we see now in the bom file?
<component type="library" bom-ref="pkg:maven/com.google.guava/guava@20.0?type=jar">
<dependency ref="pkg:maven/org.owasp.webgoat/webgoat-container@v8.0.0.M15?type=jar">
<dependency ref="pkg:maven/com.google.guava/guava@20.0?type=jar"/>
</dependency>
<dependency ref="pkg:maven/com.google.guava/guava@20.0?type=jar"/>
Fantastic, we now see a component with the version resolved by maven and all the dependency relationships we were expecting!
Recap and Solution for commons-lang3
From our earlier discussion we saw a similar issue with commons-lang3. The upstream plugin had identified the correct commons-lang3 version for the component, but had lost all dependency relationships, and my pull request had kept the dependency relationships but had lost the component as it was using the original version of the artifact. What do we see now in the bom file?
<component type="library" bom-ref="pkg:maven/org.apache.commons/commons-lang3@3.6?type=jar">
<dependency ref="pkg:maven/org.owasp.webgoat/webgoat-container@v8.0.0.M15?type=jar">
<dependency ref="pkg:maven/org.apache.commons/commons-lang3@3.6?type=jar"/>
</dependency>
<dependency ref="pkg:maven/org.apache.commons/commons-lang3@3.6?type=jar"/>
We are now two for two: we again see the component with the resolved version and also see all the dependency relationships!
Recap and Solution for jquery
In our earlier discussion we had identified a slightly different scenario with jquery, as the compile scoped artifacts included both the resolved version (3.2.1) and an older version (1.11.1). Both plugins had identified the component and included the dependency relationship between webwolf and jquery, however the upstream plugin had lost the dependency relationship between bootstrap and jquery whereas my pull request included the dependency but referenced the original version. What do we now see in the bom file?
<component type="library" bom-ref="pkg:maven/org.webjars/jquery@3.2.1?type=jar">
<dependency ref="pkg:maven/org.owasp.webgoat/webwolf@v8.0.0.M15?type=jar">
<dependency ref="pkg:maven/org.webjars/jquery@3.2.1?type=jar"/>
</dependency>
<dependency ref="pkg:maven/org.webjars/bootstrap@3.3.7?type=jar">
<dependency ref="pkg:maven/org.webjars/jquery@3.2.1?type=jar"/>
</dependency>
<dependency ref="pkg:maven/org.webjars/jquery@3.2.1?type=jar"/>
We are on a roll, and are now three for three. We can see all the expected dependency relationships are present with all the relationships referencing the resolved version!
Recap and Solution for commons-io
Now we come to our final scenario and the use of metaversions. Can we make it four for four?
In our earlier discussion we covered the use and meaning of metaversions within maven dependencies and saw the upstream plugin had correctly identified the resolved component, but had no dependency relationships, whereas my pull request identified the dependency relationships using the LATEST metaversion but did not identify the component. What do we now see in the bom file?
<component type="library" bom-ref="pkg:maven/commons-io/commons-io@2.11.0?type=jar">
<dependency ref="pkg:maven/org.owasp.webgoat/webwolf@v8.0.0.M15?type=jar">
<dependency ref="pkg:maven/commons-io/commons-io@2.11.0?type=jar"/>
</dependency>
<dependency ref="pkg:maven/commons-io/commons-io@2.11.0?type=jar"/>
Brilliant, the component and all expected dependency relationships are present, with each referencing the resolved version and not the metaversion!
We have done it, we are now four for four!
Conclusions
With this latest issue now resolved I feel we have a much better solution for generating SBOMs for maven projects. We know these bom files will contain all dependency relationships returned via maven and, now that this version mismatch issue has been addressed, we can be confident we will only include entries for resolved artifacts.
My original pull request has been updated to include the fix for these issues, in addition to the issues covered in the previous post (An Adventure with the CycloneDX Maven Plugin), and has now been merged into the upstream codebase with help from Hervé. I'm looking forward to having this released in the next CycloneDX Maven Plugin release and being able to use this in earnest as part of our effort to secure our Software Supply Chain. With any luck this will also be of benefit to your own efforts.