Fix scripts and update README
navarone-feekery committed May 29, 2024
1 parent c06c6a4 commit 096659c
Showing 11 changed files with 38 additions and 278 deletions.
3 changes: 0 additions & 3 deletions .buildkite/Dockerfile
@@ -3,7 +3,4 @@ RUN apt-get update && \
apt-get install -y --no-install-recommends \
libicu-dev netbase make

# used for skipping jenv/rbenv setup
ENV IS_DOCKER=1

ENTRYPOINT [ "/bin/bash" ]
2 changes: 1 addition & 1 deletion Jarfile
@@ -2,7 +2,7 @@
# our java dependencies into vendor/jars (see https://github.com/mkristian/jar-dependencies for details)
#
# If you update this file, please run the following command to update the jars cache:
# rm -rf Jars.lock vendor/jars && script/development exec script/vendor_jars
# make clean install
#
# When adding a new dependency, please explain what it is and why we're adding it in a comment.
#---------------------------------------------------------------------------------------------------
59 changes: 23 additions & 36 deletions README.md
@@ -32,42 +32,29 @@ See [Crawling content](#crawling-content) for examples.

#### Running from source

Crawler uses `jenv` and `rbenv` to manage both java and ruby versions when running from source.

1. Install `jenv` and `rbenv`
- [Official documentation for installing jenv](https://www.jenv.be/)
- [Official documentation for installing rbenv](https://github.com/rbenv/rbenv?tab=readme-ov-file#installation)
2. Install the required java version (check the file `.java-version`)
- Crawler was developed using OpenJDK, so we recommend using an OpenJDK version of Java
- [Instructions for installing OpenJDK](https://openjdk.org/install/)
- Mac users can also use `brew` to install
```bash
# install with brew
$ brew install openjdk@21

# create symlink
$ sudo ln -sfn \
/opt/homebrew/opt/openjdk@21/libexec/openjdk.jdk \
/Library/Java/JavaVirtualMachines/openjdk-21.jdk
```
3. Add Java version to `jenv`
```bash
# add to jenv and update JAVA_HOME
$ jenv add /Library/Java/JavaVirtualMachines/openjdk-21.jdk/Contents/Home
$ export JAVA_HOME=$(/usr/libexec/java_home -v21)
# check java version has been correctly set by `.java-version` file
$ java --version
```
4. Install the required jruby version
```bash
# rbenv is easier to use and can install a version based on `.ruby-version` file
$ rbenv install
# check ruby version
$ ruby --version
```
5. Run `make install` to install Crawler dependencies
Crawler uses both JRuby and Java.
We recommend using version managers for both.
When developing Crawler we use `rbenv` and `jenv`.
There are instructions for setting up these env managers here:

- [Official documentation for installing jenv](https://www.jenv.be/)
- [Official documentation for installing rbenv](https://github.com/rbenv/rbenv?tab=readme-ov-file#installation)

Go to the root of the Crawler directory and check the expected Java and Ruby versions are being used:

```bash
# should output the same version as `.ruby-version`
$ ruby --version

# should output the same version as `.java-version`
$ java --version
```

If the versions seem correct, you can install dependencies:

```bash
$ make install
```

Crawler should now be functional.
See [Configuring Crawlers](#configuring-crawlers) to begin crawling web content.
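
The version check described above can also be scripted. The following is a minimal sketch and is not part of the repository: `version_matches` and `report_version` are hypothetical helper names, and stripping a `jruby-` prefix from the contents of `.ruby-version` is an assumption about how that file is written. The comparison is a plain substring match, so it can report false positives for short version strings.

```bash
# Hypothetical helpers, not part of the repository.
# Succeeds when the running version output mentions the required version.
version_matches() {
  required="${1#jruby-}"   # assumption: the version file may carry a "jruby-" prefix
  running="$2"
  [ -n "$required" ] || return 1
  case "$running" in
    *"$required"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints one status line; arguments: tool name, required version, running version output.
report_version() {
  if version_matches "$2" "$3"; then
    echo "$1: OK ($2)"
  else
    echo "$1: MISMATCH (required $2)"
  fi
}

# Only query the toolchain when both version files are present (i.e. at the repository root).
if [ -f .ruby-version ] && [ -f .java-version ]; then
  report_version ruby "$(cat .ruby-version)" "$(ruby --version 2>&1)"
  report_version java "$(cat .java-version)" "$(java --version 2>&1)"
fi
```

Run from the root of the Crawler directory, this prints one line per tool. A `MISMATCH` line suggests that `rbenv` or `jenv` has not picked up the version file.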
2 changes: 1 addition & 1 deletion lib/crawler/api/config.rb
@@ -344,7 +344,7 @@ def document_mapper
# Receives a crawler event object and outputs it into relevant systems
def output_event(event)
# Log the event
event_logger << "#{event.to_json}\n"
# event_logger << "#{event.to_json}\n"

# Count stats for the crawl
stats.update_from_event(event)
4 changes: 0 additions & 4 deletions lib/crawler/url_validator.rb
@@ -67,10 +67,6 @@ class InvalidCrawlConfigError < Error; end
attr_reader :raw_url, :checks, :results, :url_crawl_result

def initialize(url:, crawl_config:, checks: nil)
if configuration && configuration.crawler_domains.empty?
raise InvalidCrawlConfigError, 'Please configure at least one domain in the crawl config file.'
end

@crawl_config = crawl_config
# Default to running all checks for the given context
checks ||= valid_checks
2 changes: 1 addition & 1 deletion lib/crawler/url_validator/url_request_check_concern.rb
@@ -123,7 +123,7 @@ def redirect_validation_result(details) # rubocop:disable Metrics/AbcSize
end

# If we're running in a domain context, this is an inter-domain redirect that we cannot follow
unless configuration
unless @crawl_config
return validation_fail(:url_request, <<~MESSAGE, details)
The web server at #{url} redirected us to a different domain URL (#{location}).
If you want to crawl this site, please use #{location.domain_name} as the domain name.
2 changes: 1 addition & 1 deletion script/bundle
@@ -16,7 +16,7 @@ function bundle_command() {
set +x
echo
red_echo "ERROR: Bundle command failed!"
yellow_echo "Try to run ./script/setup-rubies and then retry this command"
yellow_echo "Try to run 'make install' and then retry this command"
echo
exit 42
fi
5 changes: 0 additions & 5 deletions script/development

This file was deleted.

16 changes: 0 additions & 16 deletions script/environment
@@ -5,20 +5,4 @@ source "$(dirname $0)/functions.sh"
set -e

load_version_constraints

echo "JRuby version required: ${JRUBY_VERSION}"
echo "Java version required: ${JAVA_VERSION}"

if [ -n "${IS_DOCKER}" ] && [ "${IS_DOCKER}" = "1" ]; then
echo "Skipping jenv and rbenv setup because it isn't required for Docker setups."
else
jenv_init
ensure_java_installed "$JAVA_VERSION"
jenv shell "$JAVA_VERSION"

rbenv_init
ensure_jruby_installed "$JRUBY_VERSION"
rbenv shell "$JRUBY_VERSION"
fi

check_bundle
139 changes: 11 additions & 128 deletions script/functions.sh
@@ -16,7 +16,6 @@ else
RESET=$(tput sgr0)
fi

#---------------------------------------------------------------------------------------------------
function yellow_echo() {
echo "${YELLOW}${*}${RESET}"
}
@@ -41,7 +40,6 @@ function green_echo_date() {
green_echo "[$(date +"%Y-%m-%d %H:%M:%S")] ${*}"
}

#---------------------------------------------------------------------------------------------------
function load_version_constraints() {
if [[ -z "${PROJECT_ROOT}" ]]; then
PROJECT_ROOT="$(dirname "${BASH_SOURCE[0]}")/.."
@@ -50,99 +48,21 @@ function load_version_constraints() {
export BUNDLER_VERSION="$(cat "$PROJECT_ROOT/.bundler-version")"
export BUNDLER_CONSTRAINT="~> $BUNDLER_VERSION"

export JRUBY_VERSION="$(cat "$PROJECT_ROOT/.ruby-version")"
export RUBY_VERSION="$(cat "$PROJECT_ROOT/.ruby-version")"
export JAVA_VERSION="$(cat "$PROJECT_ROOT/.java-version")"
}

#---------------------------------------------------------------------------------------------------
function rbenv_init() {
yellow_echo "Checking if rbenv is installed..."
if ! command -v rbenv; then
echo "ERROR: rbenv is not installed! Please install it by running 'brew install rbenv' (or use your OS-specific install methods)."
exit 2
fi
green_echo "rbenv: OK"
echo

yellow_echo "Enabling rbenv support..."
eval "$(rbenv init -)"
green_echo "Done!"
echo
}

#---------------------------------------------------------------------------------------------------
function jenv_init() {
yellow_echo "Checking if jenv is installed..."
if ! command -v jenv; then
echo "ERROR: jenv is not installed! Please install it by running 'brew install jenv' (or use your OS-specific install methods)."
exit 2
fi
green_echo "jenv: OK"
echo

yellow_echo "Enabling jenv support..."
eval "$(jenv init -)"
green_echo "Done!"
echo
}

#---------------------------------------------------------------------------------------------------
function nvm_init() {
yellow_echo "Checking if nvm is installed..."
if [ ! -s "$NVM_DIR/nvm.sh" ]; then
echo "WARNING: nvm is not installed! Skipping..."
return 0
fi
green_echo "nvm: OK"
echo
RUNNING_RUBY_VERSION=$(ruby --version)
RUNNING_JAVA_VERSION=$(java --version)

yellow_echo "Enabling nvm support..."
set +e
source "$NVM_DIR/nvm.sh"
nvm use
rt=$?
if [ $rt -eq 3 ]; then
try_then_error "Node version from .nvmrc is not installed" "nvm install"
elif [ $rt -ne 0 ]; then
red_echo "ERROR: something went wrong with 'nvm use'"
exit $rt
fi
set -e
green_echo "Done!"
echo
echo "----"
echo "Required Ruby version: ${RUBY_VERSION}"
echo "Running Ruby version: ${RUNNING_RUBY_VERSION}"
echo "----"
echo "Required Java version: ${JAVA_VERSION}"
echo "Running Java version: ${RUNNING_JAVA_VERSION}"
echo "----"
}

#---------------------------------------------------------------------------------------------------
function ensure_java_installed() {
JAVA_VERSION="$1"
yellow_echo "Checking if JAVA $JAVA_VERSION is installed..."
set +e
if ! jenv prefix; then
red_echo "ERROR: Java version $JAVA_VERSION is not installed! Please install it from homebrew or use your OS-specific install methods."
echo
yellow_echo "If you are on a mac, you may need to add homebrew-installed java to jenv: "
echo
echo " jenv add /Library/Java/JavaVirtualMachines/temurin-$JAVA_VERSION.jdk/Contents/Home && jenv rehash"
echo
exit 2
fi
set -e
green_echo "Done!"
echo
}

#---------------------------------------------------------------------------------------------------
function ensure_jruby_installed() {
JRUBY_VERSION="$1"
yellow_echo "Checking if JRuby $JRUBY_VERSION is installed..."
if [ -z "$(rbenv versions --bare | grep "^$JRUBY_VERSION")" ]; then
try_then_error "JRuby version $JRUBY_VERSION is not installed" "script/setup-rubies"
fi
green_echo "Done!"
echo
}

#---------------------------------------------------------------------------------------------------
function check_bundle() {
yellow_echo "Checking for missing gems..."
if ! bundle check > /dev/null; then
@@ -152,32 +72,6 @@ function check_bundle() {
echo
}

#---------------------------------------------------------------------------------------------------
function check_yarn() {
yellow_echo "Checking for missing NPM packages..."
if ! which yarn; then
red_echo "ERROR: yarn is not installed! Please install it by running 'brew install yarn' (or use your OS-specific install methods described here: https://legacy.yarnpkg.com/en/docs/install)."
exit 2
fi
if ! yarn check --integrity > /dev/null; then
try_then_error "NPM packages are missing" "yarn install"
fi
green_echo "Done!"
echo
}

#---------------------------------------------------------------------------------------------------
function check_git() {
yellow_echo "Checking your git tags..."
if [ -n "$(git tag | grep -E '^(dal05|v0\.2|show)')" ]; then
yellow_echo "Pruning bad tags"
git fetch --prune-tags --prune origin
yellow_echo "Bad tags pruned"
fi
green_echo "Done!"
}

#---------------------------------------------------------------------------------------------------
function try_then_error() {
ISSUE="$1"
COMMAND="$2"
@@ -190,7 +84,6 @@ function try_then_error() {
fi
}

#---------------------------------------------------------------------------------------------------
function __install_macosx_dev_deps() {
local root_dir

@@ -202,7 +95,7 @@ function __install_macosx_dev_deps() {
exit 1
fi

# Doesn't matter where the calling script is called from, this is where we at yo
# Doesn't matter where the calling script is called from
root_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )/../"

# Only run homebrew if dependencies have changed since last time
@@ -218,16 +111,6 @@ function __install_macosx_dev_deps() {
echo "this is fine as the temurin11 recipe needs to install files outside of /usr/local...${RESET}"
echo

if [[ "$(uname -s)" == 'Darwin' ]] && [[ "$(arch)" == *'arm64'* ]]
then
echo "${RED}⚠️ Warning ⚠️"
echo "It looks like you're running on an Apple M1 (nice), so you'll need to install Mailhog manually:${RESET}"
echo "${BLUE}$ go get github.com/mailhog/MailHog${RESET}"
echo
echo "${RED}And run it like so:${RESET}"
echo "${BLUE}$ ~/go/bin/MailHog${RESET}"
fi

sleep 2

( cd "${root_dir}" && brew bundle --verbose )
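
A note on the new `load_version_constraints`: `script/environment` enables `set -e` before calling it, and the function now runs `ruby --version` and `java --version` in plain assignments. Under `set -e`, a failing command substitution in a plain assignment normally aborts the script, so a missing binary would probably stop the run before the required/running summary is printed. One possible guard is sketched below. `running_version` is a hypothetical helper and is not part of `script/functions.sh`.

```bash
# Hypothetical helper, not part of script/functions.sh.
# Prints the first line of "<tool> --version", or a placeholder when the tool is missing.
running_version() {
  if command -v "$1" >/dev/null 2>&1; then
    "$1" --version 2>&1 | head -n 1
  else
    echo "not installed"
  fi
}

# Possible use inside load_version_constraints (assumption, shown as comments only):
#   RUNNING_RUBY_VERSION="$(running_version ruby)"
#   RUNNING_JAVA_VERSION="$(running_version java)"
```

With this guard the summary would still be printed when a tool is absent, showing `not installed` next to the required version instead of exiting early.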