Add noticefile generator and update NOTICE.txt #20

Merged
merged 6 commits into from
May 21, 2024
Changes from 3 commits
5 changes: 4 additions & 1 deletion Makefile
@@ -1,4 +1,4 @@
-.phony: test lint autocorrect install install-ci install-gems install-jars clean build-docker-ci
+.phony: test lint autocorrect install install-ci install-gems install-jars clean notice build-docker-ci

 test:
 	script/rspec $(file)
@@ -27,5 +27,8 @@ install-jars:
 clean:
 	rm -rf Jars.lock vendor/jars

+notice:
+	script/licenses/generate_notice.rb
+
 build-docker-ci:
 	docker build -t crawler-ci .buildkite
4,701 changes: 4,697 additions & 4 deletions NOTICE.txt

Large diffs are not rendered by default.

21 changes: 21 additions & 0 deletions script/licenses/README.md
@@ -0,0 +1,21 @@
# 3rd Party :tada: dependencies

This directory contains scripts and files for generating a `NOTICE.txt` file containing all licenses for the third-party dependencies that Crawler uses.
It will look at the SPDX license for Ruby gems.
If this cannot be found, it will attempt to download the LICENSE file and add it to the project for future reference.
When a LICENSE file doesn't exist (or is in an unexpected location or format), a manual override must be added.

Downloaded license files are added to the directories `rubygems_licenses` or `misc_licenses`.

All license texts are then added to the repository's [NOTICE.txt](../../NOTICE.txt) file.

## Types of dependencies

- Ruby Gems from `Gemfile` and `Gemfile.lock`
- Misc. dependencies, like JRuby, Tika, etc. not managed by a package manager

## Generate NOTICE.txt

```bash
./script/licenses/generate_notice.rb
```
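
The lookup order described in this README can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: `resolve_license_source` and its keyword arguments are hypothetical names introduced only to show the order in which the sources are tried.

```ruby
# Hypothetical sketch of the lookup order described in the README:
# 1. use the SPDX license if one is known,
# 2. otherwise use a license file that can be downloaded,
# 3. otherwise require a manually added override.
# None of these names come from the PR itself.
def resolve_license_source(spdx_license:, license_url:, manual_override:)
  return [:spdx, spdx_license] if spdx_license
  return [:downloaded, license_url] if license_url
  return [:manually_added, manual_override] if manual_override

  raise ArgumentError, 'no license source available; add a manual override'
end

p resolve_license_source(spdx_license: 'MIT', license_url: nil, manual_override: nil)
p resolve_license_source(spdx_license: nil, license_url: 'https://example.com/LICENSE', manual_override: nil)
```

Raising when nothing is available mirrors the README's statement that a manual override must be added in that case.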
61 changes: 61 additions & 0 deletions script/licenses/generate_notice.rb
@@ -0,0 +1,61 @@
#!/usr/bin/env ruby

#
# Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
# or more contributor license agreements. Licensed under the Elastic License 2.0;
# you may not use this file except in compliance with the Elastic License 2.0.
#

# frozen_string_literal: true

NOTICE_TXT_PATH = File.expand_path('../../NOTICE.txt', __dir__)

require_relative 'lib/third_party'

def write_header_to_file(io)
io.puts 'Elastic Open Web Crawler'
io.puts 'Copyright 2024 Elasticsearch B.V.'
io.puts
io.puts 'The Elastic Open Web Crawler contains the following third-party dependencies:'
io.puts
end

def write_license_to_file(io, klass_instance, identifier, dependency)
io.puts '-' * 80
io.puts "Library: #{klass_instance.format_library_for_notice_txt(identifier, dependency)}"
io.puts "URL: #{dependency[:url]}" if dependency[:url]
io.puts "License: #{dependency[:license]}" if dependency[:license]
io.puts
File.open(dependency[:license_file_path], 'r') do |license_file|
io.puts(license_file.read)
io.puts
end
end

File.open(NOTICE_TXT_PATH, 'w') do |io|
write_header_to_file(io)

[
ThirdParty::RubygemsDependencies,
ThirdParty::MiscDependencies
].each do |klass|
klass_instance = klass.new
dependencies = klass_instance.get(with_license_files: true)
dependencies.keys.sort.each do |identifier|
dependency = dependencies.fetch(identifier)

unless dependency[:license_file_path]
ThirdParty::LOGGER.error("There is no license file for #{identifier}!")
exit(1)
end

unless File.exist?(dependency[:license_file_path])
err = "License file for #{identifier} does not exist locally (path: #{dependency[:license_file_path]})"
ThirdParty::LOGGER.error(err)
exit(2)
end

write_license_to_file(io, klass_instance, identifier, dependency)
end
end
end
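
To see the shape of a single `NOTICE.txt` entry without running the whole generator, the writer can be exercised against an in-memory buffer. The sketch below is an assumption-laden stand-in: `ExampleFormatter`, `write_license_entry`, and the `example-gem` data are hypothetical, and a temporary file plays the role of the license file.

```ruby
require 'stringio'
require 'tempfile'

# Hypothetical stand-in for the dependency classes in lib/third_party.
class ExampleFormatter
  def format_library_for_notice_txt(_identifier, dependency)
    "#{dependency[:name]} #{dependency[:version]}"
  end
end

# Same layout as the writer in the script: separator, metadata lines,
# a blank line, the license text, and a trailing blank line.
def write_license_entry(io, formatter, identifier, dependency)
  io.puts '-' * 80
  io.puts "Library: #{formatter.format_library_for_notice_txt(identifier, dependency)}"
  io.puts "URL: #{dependency[:url]}" if dependency[:url]
  io.puts "License: #{dependency[:license]}" if dependency[:license]
  io.puts
  io.puts File.read(dependency[:license_file_path])
  io.puts
end

# A temporary file stands in for a real license file on disk.
license_file = Tempfile.new('LICENSE')
license_file.write("Example license text\n")
license_file.flush

dependency = {
  name: 'example-gem',
  version: '1.0.0',
  license: 'MIT',
  license_file_path: license_file.path
}

buffer = StringIO.new
write_license_entry(buffer, ExampleFormatter.new, 'example-gem', dependency)
puts buffer.string
```

Writing to a `StringIO` rather than the real file keeps the experiment from touching the repository's `NOTICE.txt`.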
95 changes: 95 additions & 0 deletions script/licenses/lib/third_party.rb
@@ -0,0 +1,95 @@
#
# Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
# or more contributor license agreements. Licensed under the Elastic License 2.0;
# you may not use this file except in compliance with the Elastic License 2.0.
#

# frozen_string_literal: true

require 'logger'

module ThirdParty
LOGGER = Logger.new($stdout, level: Logger::DEBUG)

LICENSE_FILE_NAME_OPTIONS = %w[
LICENSE
LICENSE.md
LICENSE.txt
License.txt
LICENCE
LICENSE-MIT
Licence.md
Licence.rdoc
MIT_LICENSE
MIT-LICENSE
MIT-LICENSE.txt
BSDL
COPYING
COPYING.txt
].freeze
UNKNOWN_LICENSE = 'UNKNOWN'

module SPDX
class << self
def normalize_license(license)
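# Group the operators so that each one requires surrounding whitespace;
# ungrouped, the alternation would match a bare "AND" anywhere in the string.
return license if SUPPORTED_IDENTIFIERS.include?(license) || license.match?(/\s+(?:OR|AND|WITH)\s+/)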

ALIASES.fetch(license, nil)
end
end

SUPPORTED_IDENTIFIERS = %w[
0BSD
Apache-2.0
AFL-2.1
BSD-2-Clause
BSD-3-Clause
CC0-1.0
CC-BY-3.0
CC-BY-4.0
Elastic-2.0
EPL-1.0
ISC
GPL-2.0
LGPL-2.1
MIT
MPL-2.0
Ruby
Unlicense
].freeze

IDENTIFIER_TO_ALIASES = {
'AFL-2.1' => [
'AFLv2.1'
],
'BSD-2-Clause' => [
'BSD 2-Clause',
'BSD',
'BSD*',
'2-clause BSDL'
],
'Apache-2.0' => [
'Apache License Version 2.0',
'Apache License (2.0)'
],
'Ruby' => [
'ruby'
],
'Python-2.0' => [
'PSFL'
],
'MIT' => [
'MIT*'
]
}.freeze

ALIASES = IDENTIFIER_TO_ALIASES.each_with_object({}) do |(spdx_identifier, aliases), out|
aliases.each do |a|
out[a] = spdx_identifier
end
end
end
end

require_relative 'third_party/misc_dependencies'
require_relative 'third_party/rubygems_dependencies'
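
The normalization step in `ThirdParty::SPDX` combines three checks: an exact match against supported identifiers, a pass-through for compound SPDX expressions, and an alias lookup. The following is a minimal sketch of that idea using trimmed copies of the tables. The `EXAMPLE_` constants and `example_normalize_license` are hypothetical names, and the pattern here groups the operators so that `OR`, `AND`, and `WITH` all require whitespace on both sides.

```ruby
# Trimmed, illustrative copies of the tables in the diff.
EXAMPLE_SUPPORTED = %w[Apache-2.0 BSD-2-Clause MIT Ruby].freeze

EXAMPLE_IDENTIFIER_TO_ALIASES = {
  'BSD-2-Clause' => ['BSD 2-Clause', 'BSD'],
  'Apache-2.0' => ['Apache License Version 2.0'],
  'MIT' => ['MIT*']
}.freeze

# Invert identifier => [aliases] into alias => identifier for direct lookup.
EXAMPLE_ALIASES = EXAMPLE_IDENTIFIER_TO_ALIASES.each_with_object({}) do |(identifier, aliases), out|
  aliases.each { |name| out[name] = identifier }
end.freeze

# Compound expressions such as "MIT OR Apache-2.0" are passed through as-is.
EXAMPLE_SPDX_EXPRESSION = /\s+(?:OR|AND|WITH)\s+/

def example_normalize_license(license)
  return license if EXAMPLE_SUPPORTED.include?(license) || license.match?(EXAMPLE_SPDX_EXPRESSION)

  # Returns nil when the license is neither supported nor a known alias.
  EXAMPLE_ALIASES[license]
end

p example_normalize_license('BSD')
p example_normalize_license('MIT OR Apache-2.0')
p example_normalize_license('BRAND-X')
```

The last call is the interesting one: a name that merely contains the letters `AND` is not treated as an SPDX expression, so it falls through to the alias table and yields `nil`.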
99 changes: 99 additions & 0 deletions script/licenses/lib/third_party/base.rb
@@ -0,0 +1,99 @@
#
# Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
# or more contributor license agreements. Licensed under the Elastic License 2.0;
# you may not use this file except in compliance with the Elastic License 2.0.
#

# frozen_string_literal: true

require 'httpclient'

module ThirdParty
class Base
class << self
def get(*args)
new.get(*args)
end
end

def type
raise 'implement in subclass'
end

def licenses_path
raise 'implement in subclass'
end

def license_fallbacks
raise 'implement in subclass'
end

def license_file_fallbacks
raise 'implement in subclass'
end

def get(*)
raise 'implement in subclass'
end

def spdx_license_for_dependency(identifier, licenses)
spdx_licenses = licenses.filter_map { |license| SPDX.normalize_license(license) }

spdx_license = find_spdx_license(identifier, licenses.size, spdx_licenses)

if spdx_license
logger.info("#{type} #{identifier} using SPDX license.")
else
logger.warn("#{type} #{identifier} has no SPDX license identifier. Original licenses: #{licenses.inspect}")
end

spdx_license
end

def license_file_path_for_dependency(identifier)
unless license_file_fallbacks.key?(identifier)
logger.error("#{type} #{identifier} has no license file.")
exit(2)
end

override = license_file_fallbacks.fetch(identifier)
add_license_to_path(identifier, override)
end

def format_library_for_notice_txt(_identifier, dependency)
"#{dependency[:name]} #{dependency[:version]}"
end

private

def find_spdx_license(identifier, total_licenses, spdx_licenses)
if spdx_licenses.any? && spdx_licenses.size == total_licenses
spdx_licenses.join(' OR ')
elsif license_fallbacks.key?(identifier)
license_fallbacks.fetch(identifier)
end
end

def add_license_to_path(identifier, override)
identifier_in_filename = identifier.gsub('/', '--')

if override[:manually_added]
logger.info("#{type} #{identifier} using manually added file.")
licenses_path.join("_manually_added_#{identifier_in_filename}-LICENSE.txt").to_s
elsif override[:url]
download_license_file(identifier, identifier_in_filename, override)
end
end

def download_license_file(identifier, identifier_in_filename, override)
licenses_path.join("_downloaded_#{identifier_in_filename}-LICENSE.txt").to_s.tap do |license_file_path|
logger.info("#{type} #{identifier} downloading license from #{override[:url]}")
content = HTTPClient.get_content(override[:url])
File.write(license_file_path, content)
end
end

def logger
LOGGER
end
end
end
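
Two small rules in `Base` are easy to miss when reading the diff: an SPDX expression is only built when every declared license normalizes, and slashes in identifiers are replaced before they are used in file names. A minimal sketch of both, assuming a hypothetical fallback table (`EXAMPLE_LICENSE_FALLBACKS`) and hypothetical helper names:

```ruby
# Hypothetical fallback table; in the PR each subclass supplies its own.
EXAMPLE_LICENSE_FALLBACKS = { 'example-gem' => 'MIT' }.freeze

# Join the normalized licenses only if all declared licenses were recognized;
# otherwise fall back to a per-dependency override (nil when there is none).
def pick_spdx_license(identifier, declared_count, spdx_licenses, fallbacks = EXAMPLE_LICENSE_FALLBACKS)
  if spdx_licenses.any? && spdx_licenses.size == declared_count
    spdx_licenses.join(' OR ')
  else
    fallbacks[identifier]
  end
end

# Build a license file name, replacing "/" so the identifier is a valid file name.
def license_file_name(identifier, prefix)
  "_#{prefix}_#{identifier.gsub('/', '--')}-LICENSE.txt"
end

p pick_spdx_license('some-gem', 2, %w[MIT Ruby])
p pick_spdx_license('example-gem', 2, %w[MIT])
p license_file_name('org/lib', 'downloaded')
```

Requiring the counts to match means a partially recognized license list is never silently reduced to the subset that happened to normalize.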
60 changes: 60 additions & 0 deletions script/licenses/lib/third_party/misc_dependencies.rb
@@ -0,0 +1,60 @@
#
# Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
# or more contributor license agreements. Licensed under the Elastic License 2.0;
# you may not use this file except in compliance with the Elastic License 2.0.
#

# frozen_string_literal: true

require 'pathname'
require_relative 'base'

module ThirdParty
class MiscDependencies < Base
def type
'Misc. Dependency'
end

def licenses_path
LICENSES_PATH
end

def license_fallbacks
{}
end

def license_file_fallbacks
DEPENDENCIES.transform_values do |dependency|
dependency.fetch(:license_file_override)
end
end

def get(with_license_files: false)
DEPENDENCIES.each_with_object({}) do |(identifier, dependency), out|
out[identifier] = dependency.slice(:name, :version, :license, :url)

out[identifier][:license_file_path] = license_file_path_for_dependency(identifier) if with_license_files
end
end

LICENSES_PATH = Pathname.new(__dir__).join('..', '..', 'misc_licenses')
JRUBY_VERSION = File.read(File.expand_path('../../../../.ruby-version', __dir__)).strip.delete_prefix('jruby-')

DEPENDENCIES = {
'jruby' => {
name: 'jruby',
version: JRUBY_VERSION,
license: 'EPL-2.0 OR GPL-2.0 OR LGPL-2.1',
license_file_override: { manually_added: true },
url: 'https://www.jruby.org'
},
'tika' => {
name: 'tika',
version: '1.23',
license: 'Apache-2.0',
license_file_override: { manually_added: true },
url: 'https://github.com/apache/tika'
}
}.freeze
end
end
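
`MiscDependencies` derives two things from static data: the JRuby version from `.ruby-version`, and the per-dependency file overrides from the `DEPENDENCIES` table. The sketch below shows both transformations in isolation. The helper names, the `example-lib` entry, and the sample version string are assumptions for illustration and are not taken from the repository.

```ruby
# Strip whitespace and the "jruby-" prefix from the contents of a .ruby-version file.
# A value without the prefix is returned unchanged.
def jruby_version_from(contents)
  contents.strip.delete_prefix('jruby-')
end

# Hypothetical dependency table with the same shape as DEPENDENCIES in the diff.
EXAMPLE_DEPENDENCIES = {
  'example-lib' => {
    name: 'example-lib',
    version: '1.0',
    license_file_override: { manually_added: true }
  }
}.freeze

# Map each identifier to its override; fetch raises KeyError if an entry lacks one,
# which surfaces incomplete entries early.
def license_file_overrides(dependencies)
  dependencies.transform_values { |dependency| dependency.fetch(:license_file_override) }
end

p jruby_version_from("jruby-9.4.5.0\n")
p license_file_overrides(EXAMPLE_DEPENDENCIES)
```

Using `fetch` instead of `[]` is a deliberate strictness choice: a dependency added without an override fails loudly rather than producing a `nil` that only breaks later.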