Release 2.0.332 #60

Merged 4 commits on Jun 19, 2024

2 changes: 1 addition & 1 deletion deps.edn
@@ -28,7 +28,7 @@
miikka/clj-base62 {:mvn/version "0.1.1"}
com.github.pmonks/clj-spdx {:mvn/version "1.0.176"}
com.github.pmonks/rencg {:mvn/version "1.0.51"}
-com.github.pmonks/embroidery {:mvn/version "1.0.41"}}
+com.github.pmonks/embroidery {:mvn/version "1.0.44"}}
:aliases
{:build {:deps {com.github.pmonks/pbr {:mvn/version "RELEASE"}}
:ns-default pbr.build}}}
66 changes: 44 additions & 22 deletions doc/overview.md
@@ -24,6 +24,8 @@ Each expression-info map in the sequence of values has this structure:
Whether this identifier was unambiguously declared within the input or was instead concluded by lice-comb (see [the SPDX FAQ](https://wiki.spdx.org/view/SPDX_FAQ) for more detail on the definition of these two terms).
* `:confidence` (one of: `:high`, `:medium`, `:low`, only provided when `:type` = `:concluded`):
Indicates the approximate confidence lice-comb has in its conclusions for this particular SPDX identifier.
+* `:confidence-explanations` (a set of keywords, optional):
+Describes why the associated `:confidence` was not `:high`.
* `:strategy` (a keyword, mandatory):
The strategy lice-comb used to determine this particular SPDX identifier. See [[lice-comb.utils/strategy->string]] for an up-to-date list of all possible values.
* `:source` (a sequence of `String`s):
@@ -41,34 +43,54 @@ For example, this code:
results in this expressions-info map (pretty printed for clarity):

```clojure
-{"GPL-2.0-or-later"
-({:id "GPL-2.0-or-later",
-:type :concluded,
-:confidence :medium,
-:strategy :regex-matching,
-:source ("https://repo1.maven.org/maven2/javax/mail/javax.mail-api/1.6.2/javax.mail-api-1.6.2.pom"
-"https://repo1.maven.org/maven2/com/sun/mail/all/1.6.2/all-1.6.2.pom"
-"<licenses><license><name>"
-"CDDL/GPLv2+CE"
-"GPLv2+")}),
-"CDDL-1.1"
-({:id "CDDL-1.1",
-:type :concluded,
-:confidence :low,
-:strategy :regex-matching,
-:source ("https://repo1.maven.org/maven2/javax/mail/javax.mail-api/1.6.2/javax.mail-api-1.6.2.pom"
-"https://repo1.maven.org/maven2/com/sun/mail/all/1.6.2/all-1.6.2.pom"
-"<licenses><license><name>"
-"CDDL/GPLv2+CE"
-"CDDL")})}
+{"CDDL-1.1 OR GPL-2.0-only WITH Classpath-exception-2.0"
+({:type :concluded
+:confidence :low
+:strategy :maven-pom-multi-license-rule
+:source ("javax.mail/javax.mail-api@1.6.2"
+"https://repo1.maven.org/maven2/javax/mail/javax.mail-api/1.6.2/javax.mail-api-1.6.2.pom"
+"https://repo1.maven.org/maven2/com/sun/mail/all/1.6.2/all-1.6.2.pom")}
+{:id "GPL-2.0-only"
+:type :concluded
+:confidence :high
+:strategy :manual-verification
+:source ("javax.mail/javax.mail-api@1.6.2"
+"https://repo1.maven.org/maven2/javax/mail/javax.mail-api/1.6.2/javax.mail-api-1.6.2.pom"
+"https://repo1.maven.org/maven2/com/sun/mail/all/1.6.2/all-1.6.2.pom"
+"<licenses><license><name>"
+"CDDL/GPLv2+CE"
+"GPLv2+CE"
+"GPLv2")}
+{:id "Classpath-exception-2.0"
+:type :concluded
+:confidence :high
+:strategy :manual-verification
+:source ("javax.mail/javax.mail-api@1.6.2"
+"https://repo1.maven.org/maven2/javax/mail/javax.mail-api/1.6.2/javax.mail-api-1.6.2.pom"
+"https://repo1.maven.org/maven2/com/sun/mail/all/1.6.2/all-1.6.2.pom"
+"<licenses><license><name>"
+"CDDL/GPLv2+CE"
+"GPLv2+CE"
+"CE")}
+{:id "CDDL-1.1"
+:type :concluded
+:confidence :low
+:confidence-explanations #{:missing-version}
+:strategy :regex-matching
+:source ("javax.mail/javax.mail-api@1.6.2"
+"https://repo1.maven.org/maven2/javax/mail/javax.mail-api/1.6.2/javax.mail-api-1.6.2.pom"
+"https://repo1.maven.org/maven2/com/sun/mail/all/1.6.2/all-1.6.2.pom"
+"<licenses><license><name>"
+"CDDL/GPLv2+CE"
+"CDDL")})}
```

-A key insight that the expressions-info map tells us in this case is that the `javax.mail/javax.mail-api@1.6.2` artifact doesn't declare which version of the CDDL it uses, and lice-comb has _inferred_ the latest (`CDDL-1.1`), and in doing so reduced its confidence to "low". This important insight is not apparent when the `simple` variant of the function is used instead:
+A key insight that the expressions-info map tells us in this case is that the `javax.mail/javax.mail-api@1.6.2` artifact doesn't declare which version of the CDDL it uses, and lice-comb has _inferred_ the latest (`CDDL-1.1`), and in doing so reduced its confidence to "low" (while also providing a helpful confidence explanation). This important insight is not apparent when the `simple` variant of the function is used instead:

```clojure
(lcmvn/gav->expressions "javax.mail" "javax.mail-api" "1.6.2")

-#{"CDDL-1.1" "GPL-2.0-or-later"}
+#{"CDDL-1.1 OR GPL-2.0-only WITH Classpath-exception-2.0"}
```
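
As a further illustration, the per-identifier entries can be filtered to surface only the conclusions lice-comb was less than fully confident in. The following is a minimal sketch, not part of lice-comb: `low-confidence-ids` is a hypothetical helper, and the data is a trimmed copy of the new expressions-info map shown above (`:source` omitted, and a vector standing in for the sequence).

```clojure
;; Trimmed copy of the expressions-info map from the example above.
;; :source entries are omitted for brevity (an assumption for illustration).
(def expressions-info
  {"CDDL-1.1 OR GPL-2.0-only WITH Classpath-exception-2.0"
   [{:type :concluded :confidence :low :strategy :maven-pom-multi-license-rule}
    {:id "GPL-2.0-only" :type :concluded :confidence :high :strategy :manual-verification}
    {:id "Classpath-exception-2.0" :type :concluded :confidence :high :strategy :manual-verification}
    {:id "CDDL-1.1" :type :concluded :confidence :low
     :confidence-explanations #{:missing-version} :strategy :regex-matching}]})

;; Hypothetical helper (not a lice-comb function): maps each concluded SPDX
;; identifier whose confidence is not :high to its confidence explanations.
(defn low-confidence-ids
  [expressions-info]
  (into {}
        (for [[_expression infos] expressions-info
              info                infos
              :when (and (:id info)
                         (= :concluded (:type info))
                         (not= :high (:confidence info)))]
          [(:id info) (:confidence-explanations info)])))

(low-confidence-ids expressions-info)
;; => {"CDDL-1.1" #{:missing-version}}
```

Entries without an `:id` (such as the multi-license rule entry) are skipped here, since they describe the expression as a whole rather than a single identifier.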

[Back to GitHub](https://github.com/pmonks/lice-comb)
5 changes: 2 additions & 3 deletions resources/lice_comb/names.edn
@@ -1,8 +1,7 @@
; Map of name values seen in the wild that are too ambiguous / cursed to support any reasonable form of automated parsing
{
-; Seen in https://repo.maven.apache.org/maven2/com/sun/mail/all/1.4.7/all-1.4.7.pom
+; Seen in https://repo.maven.apache.org/maven2/com/sun/mail/all/1.4.7/all-1.4.7.pom and other javax.mail/javax.mail-api artifacts
"GPLv2+CE" {"GPL-2.0-only WITH Classpath-exception-2.0"
-({:type :concluded :confidence :high :strategy :manual-verification :source ("GPLv2+CE")}
-{:id "GPL-2.0-only" :type :concluded :confidence :high :strategy :manual-verification :source ("GPLv2+CE" "GPLv2")}
+({:id "GPL-2.0-only" :type :concluded :confidence :high :strategy :manual-verification :source ("GPLv2+CE" "GPLv2")}
{:id "Classpath-exception-2.0" :type :concluded :confidence :high :strategy :manual-verification :source ("GPLv2+CE" "CE")})}
}
24 changes: 13 additions & 11 deletions src/lice_comb/deps.clj
@@ -21,10 +21,10 @@
license information."
(:require [clojure.string :as s]
[clojure.tools.logging :as log]
-[embroidery.api :as e]
[lice-comb.maven :as lcmvn]
[lice-comb.files :as lcf]
-[lice-comb.impl.expressions-info :as lciei]))
+[lice-comb.impl.expressions-info :as lciei]
+[lice-comb.impl.utils :as lciu]))

(defn- normalise-dep
"Normalises a dep, by removing any classifier suffixes from the artifact-id
@@ -60,15 +60,17 @@
nil))]
gav-expressions
; If we didn't find any licenses in the dep's POM, check the dep's JAR(s)
-(into {} (filter identity (e/pmap* #(try
-(lcf/zip->expressions-info %)
-(catch javax.xml.stream.XMLStreamException xse
-(log/warn (str "Failed to parse pom inside " % " - ignoring") xse)
-nil)
-(catch java.util.zip.ZipException ze
-(log/warn (str "Failed to unzip " % " - ignoring") ze)
-nil))
-(:paths info))))))))
+(into {} (filter identity
+(lciu/file-handle-bounded-pmap
+#(try
+(lcf/zip->expressions-info %)
+(catch javax.xml.stream.XMLStreamException xse
+(log/warn (str "Failed to parse pom inside " % " - ignoring") xse)
+nil)
+(catch java.util.zip.ZipException ze
+(log/warn (str "Failed to unzip " % " - ignoring") ze)
+nil))
+(:paths info))))))))

(defmulti dep->expressions-info
"Returns an expressions-info map for `dep` (a `MapEntry` or two-element vector
34 changes: 19 additions & 15 deletions src/lice_comb/files.clj
@@ -22,7 +22,6 @@
(:require [clojure.string :as s]
[clojure.java.io :as io]
[clojure.tools.logging :as log]
-[embroidery.api :as e]
[lice-comb.matching :as lcm]
[lice-comb.maven :as lcmvn]
[lice-comb.impl.expressions-info :as lciei]
@@ -52,7 +51,7 @@
directories (as defined by `java.io.File.isHidden()`) are included in the
search or not."
([dir] (probable-license-files dir nil))
-([dir {:keys [include-hidden-dirs?] :or {include-hidden-dirs? false} :as opts}]
+([dir {:keys [include-hidden-dirs?] :or {include-hidden-dirs? false}}]
(when (lciu/readable-dir? dir)
(some-> (lciu/filter-file-only-seq (io/file dir)
(fn [^java.io.File d] (and (not= (.getCanonicalFile d) (.getCanonicalFile (io/file (lcmvn/local-maven-repo)))) ; Make sure to exclude the Maven local repo, just in case it happens to be nested within dir
@@ -135,7 +134,7 @@
directories (as defined by `java.io.File.isHidden()`) are included in the
search or not."
([dir] (zip-compressed-files dir nil))
-([dir {:keys [include-hidden-dirs?] :or {include-hidden-dirs? false} :as opts}]
+([dir {:keys [include-hidden-dirs?] :or {include-hidden-dirs? false}}]
(when (lciu/readable-dir? dir)
(some-> (lciu/filter-file-only-seq (io/file dir)
(fn [^java.io.File d] (or include-hidden-dirs? (not (.isHidden d))))
@@ -145,6 +144,7 @@
(s/ends-with? lname ".jar")))))
set))))

+#_{:clj-kondo/ignore [:unused-binding]}
(defn dir->expressions-info
"Returns an expressions-info map for `dir` (a `String` or `File`, which must
refer to a readable directory), or `nil` if no expressions were found.
@@ -160,19 +160,23 @@
([dir] (dir->expressions-info dir nil))
([dir {:keys [include-hidden-dirs? include-zips?] :or {include-hidden-dirs? false include-zips? false} :as opts}]
(when (lciu/readable-dir? dir)
-(let [file-expressions (into {} (filter identity (e/pmap* #(try
-(file->expressions-info %)
-(catch Exception e
-(log/warn (str "Unexpected exception while processing " % " - ignoring") e)
-nil))
-(probable-license-files dir opts))))]
+(let [file-expressions (into {} (filter identity
+(lciu/file-handle-bounded-pmap
+#(try
+(file->expressions-info %)
+(catch Exception e
+(log/warn (str "Unexpected exception while processing " % " - ignoring") e)
+nil))
+(probable-license-files dir opts))))]
(if include-zips?
-(let [zip-expressions (into {} (filter identity (e/pmap* #(try
-(zip->expressions-info %)
-(catch Exception e
-(log/warn (str "Unexpected exception while processing " % " - ignoring") e)
-nil))
-(zip-compressed-files dir opts))))]
+(let [zip-expressions (into {} (filter identity
+(lciu/file-handle-bounded-pmap
+#(try
+(zip->expressions-info %)
+(catch Exception e
+(log/warn (str "Unexpected exception while processing " % " - ignoring") e)
+nil))
+(zip-compressed-files dir opts))))]
(lciei/prepend-source (lciu/filepath dir) (merge file-expressions zip-expressions)))
(lciei/prepend-source (lciu/filepath dir) file-expressions))))))

14 changes: 13 additions & 1 deletion src/lice_comb/impl/utils.clj
@@ -22,7 +22,8 @@
lice-comb and may change without notice."
(:require [clojure.string :as s]
[clojure.java.io :as io]
-[clj-base62.core :as base62]))
+[clj-base62.core :as base62]
+[embroidery.api :as e]))

(defn mapfonk
"Returns a new map where f has been applied to all of the keys of m."
@@ -326,3 +327,14 @@
(if-not (s/blank? val)
val
default))))
+
+; Note: we could use OSHI to determine the actual number of possible open file
+; handles on the runtime environment, but it seems like overkill to bring in
+; such a large dependency for this one feature, especially when lice-comb
+; typically won't get close to opening this many files.
+(defn file-handle-bounded-pmap
+"bounded-pmap* hardcoded to no more than 8192 virtual threads. This size is
+determined conservatively from macOS, since it's the least common denominator
+of the major OSes in terms of number of possible open file handles."
+[f coll]
+(e/bounded-pmap* 8192 f coll))
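
The bound matters because each in-flight task may hold an open file handle. The sketch below illustrates the general idea of a bounded parallel map using a `java.util.concurrent.Semaphore` and ordinary Clojure futures. It is an illustration under stated assumptions only: it is not how embroidery's `bounded-pmap*` is implemented (which, per the docstring above, runs on virtual threads), and `bounded-pmap-sketch` is a hypothetical name.

```clojure
(import '(java.util.concurrent Semaphore))

;; Hypothetical illustration of a bounded parallel map; NOT embroidery's
;; implementation. At most n applications of f are in flight at any time.
(defn bounded-pmap-sketch
  [n f coll]
  (let [^Semaphore sem (Semaphore. n)
        futures (mapv (fn [x]
                        ;; Block the submitting thread until a permit is free,
                        ;; so no more than n tasks ever run concurrently.
                        (.acquire sem)
                        (future
                          (try
                            (f x)
                            (finally
                              ;; Always return the permit, even if f throws.
                              (.release sem)))))
                      coll)]
    ;; Wait for every task and return the results in input order.
    (mapv deref futures)))

(bounded-pmap-sketch 2 inc [1 2 3])
;; => [2 3 4]
```

With a file-reading `f`, the permit count plays the role of the 8192 limit: it caps how many files can be open at once regardless of the size of `coll`.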
10 changes: 5 additions & 5 deletions src/lice_comb/lein.clj
@@ -19,9 +19,9 @@
(ns lice-comb.lein
"Functionality related to combing Leiningen dependency sequences for license
information."
-(:require [embroidery.api :as e]
-[lice-comb.deps :as lcd]
-[lice-comb.impl.expressions-info :as lciei]))
+(:require [lice-comb.deps :as lcd]
+[lice-comb.impl.expressions-info :as lciei]
+[lice-comb.impl.utils :as lciu]))

(defn- lein-dep->toolsdeps-dep
"Converts a leiningen style dependency vector into a (partial) tools.deps style
@@ -54,14 +54,14 @@
expressions were found)."
[deps]
(when deps
-(into {} (e/pmap* #(vec [% (dep->expressions-info %)]) deps))))
+(into {} (lciu/file-handle-bounded-pmap #(vec [% (dep->expressions-info %)]) deps))))

(defn deps->expressions
"Returns a map of sets of SPDX expressions (`String`s) for each Leiningen
style dep in `deps`. See [[deps->expressions-info]] for details."
[deps]
(when deps
-(into {} (e/pmap* #(vec [% (dep->expressions %)]) deps))))
+(into {} (lciu/file-handle-bounded-pmap #(vec [% (dep->expressions %)]) deps))))

(defn init!
"Initialises this namespace upon first call (and does nothing on subsequent
2 changes: 2 additions & 0 deletions src/lice_comb/matching.clj
@@ -49,6 +49,8 @@
`:type` = `:concluded`):
Indicates the approximate confidence lice-comb has in its conclusions for
this particular SPDX identifier.
+* `:confidence-explanations` (a set of keywords, optional):
+Describes why the associated `:confidence` was not `:high`.
* `:strategy` (a keyword, mandatory):
The strategy lice-comb used to determine this particular SPDX identifier.
See [[lice-comb.utils/strategy->string]] for an up-to-date list of all