docs
austin1237 committed Feb 29, 2024
1 parent 4d04af2 commit 4c9a7df
Showing 4 changed files with 60 additions and 0 deletions.
16 changes: 16 additions & 0 deletions README.md
@@ -1,3 +1,19 @@
# job-scraper
A timed event that scrapes relevant job links once a day and sends them to Discord.
![job-scraper (1)](https://github.com/austin1237/job-scraper/assets/1394341/39688936-66f2-4819-93bf-fcafb83930c4)

## Deployment
Deployment currently uses [Terraform](https://www.terraform.io/) to set up AWS services.
### Prerequisites
This repo needs a private [Amazon ECR repo](https://us-east-1.console.aws.amazon.com/ecr/repositories?region=us-east-1) created in the same region that the container-based lambda is deployed to (in our case us-east-1). Name the private repo `headless`.

### Setting up remote state
Terraform has a feature called [remote state](https://www.terraform.io/docs/state/remote.html) that keeps the state of your infrastructure in sync across multiple team members as well as any CI system.

This project **requires** this feature to be configured. To configure it, **RUN THE FOLLOWING COMMANDS ONCE PER TEAM**.

```bash
cd terraform/remote-state
terraform init
terraform apply
```
31 changes: 31 additions & 0 deletions headless/README.md
@@ -0,0 +1,31 @@
# lol-counter-source-api
A lambda that invokes a headless browser to render a page (including its JavaScript) and passes along the rendered HTML.

## Why is this lambda using a container deployment rather than the standard zip deployment?
[Puppeteer](https://pptr.dev/) requires a Chrome/Chromium binary, which exceeded the standard [lambda size limit](https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-limits.html#function-configuration-deployment-and-execution). Using a container image greatly increases the limit and allows the binary to be deployed. This service also currently uses [@sparticuz/chromium](https://github.com/Sparticuz/chromium) because the standard Puppeteer Chromium install has permissions issues when running in the deployed AWS environment.

## Prerequisites
You must have the following installed/configured on your system for this to work correctly<br />
1. [Docker](https://www.docker.com/)
2. [Docker-Compose](https://docs.docker.com/compose/)

## Development Environment
The development environment uses a pinned version of [AWS's Node 18 image](https://gallery.ecr.aws/lambda/nodejs) to mimic the running lambda.

```bash
docker-compose up
```

The output is similar to what you would see in CloudWatch logs, for example:

```bash
headless-lambda-1 | 18 Aug 2023 09:47:04,515 [INFO] (rapid) exec '/var/runtime/bootstrap' (cwd=/var/task, handler=)
```

The endpoint of the local container is `localhost:3000/2015-03-31/functions/function/invocations`. Send a POST request to it with the following body:
```json
{
  "queryStringParameters": {
    "url": "https://www.google.com"
  }
}
```
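
For callers that would rather script this request than send it by hand, here is a minimal Go sketch of the same POST. It is an illustration, not code from this repo: the helper names `invocationBody` and `invokeLocal` are hypothetical, and the `http://` scheme on the endpoint is an assumption.

```go
package main

import (
	"bytes"
	"encoding/json"
	"io"
	"net/http"
)

// localEndpoint is the invocation URL quoted above; the http:// scheme is an assumption.
const localEndpoint = "http://localhost:3000/2015-03-31/functions/function/invocations"

// invocation mirrors the JSON body shown above.
type invocation struct {
	QueryStringParameters queryParams `json:"queryStringParameters"`
}

type queryParams struct {
	URL string `json:"url"`
}

// invocationBody (hypothetical helper) encodes the request body for one page URL.
func invocationBody(pageURL string) ([]byte, error) {
	return json.Marshal(invocation{QueryStringParameters: queryParams{URL: pageURL}})
}

// invokeLocal (hypothetical helper) POSTs the body to endpoint and returns the raw response text.
func invokeLocal(endpoint, pageURL string) (string, error) {
	body, err := invocationBody(pageURL)
	if err != nil {
		return "", err
	}
	resp, err := http.Post(endpoint, "application/json", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	out, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(out), nil
}
```

With the container running, `invokeLocal(localEndpoint, "https://www.google.com")` should return whatever the lambda responds with; the exact response shape is not documented here, so the sketch returns it as plain text.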
7 changes: 7 additions & 0 deletions proxy/README.md
@@ -0,0 +1,7 @@
# Scraper
This is a Go lambda that receives a URL as a query string and passes along that website's HTML. This lambda does not render any JavaScript; for that functionality, see the folder called headless.
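
As a rough illustration of the query-string handling described above, the following Go sketch validates the incoming URL before anything is fetched. It is not the lambda's actual code: the helper name `targetURL`, the parameter name `url` (borrowed from the headless lambda's request body), and the restriction to http/https are all assumptions.

```go
package main

import (
	"errors"
	"net/url"
)

// targetURL (hypothetical helper) extracts and validates the page URL from
// the request's query-string parameters. The "url" key is an assumption.
func targetURL(params map[string]string) (string, error) {
	raw, ok := params["url"]
	if !ok || raw == "" {
		return "", errors.New("missing url query parameter")
	}
	parsed, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	// Only absolute http(s) URLs are accepted in this sketch.
	if (parsed.Scheme != "http" && parsed.Scheme != "https") || parsed.Host == "" {
		return "", errors.New("url must be an absolute http or https URL")
	}
	return raw, nil
}
```

Rejecting bad input up front keeps the proxy from issuing requests for empty or non-web URLs.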

## Prerequisites
You must have the following installed/configured on your system for this to work correctly<br />
1. [Go](https://go.dev/doc/install)

6 changes: 6 additions & 0 deletions scraper/README.md
@@ -0,0 +1,6 @@
# Scraper
This is a Go lambda that goes through the proxy API to receive website HTML. Once received, it parses the HTML and does a keyword check on the job description. If any keyword exists in the description, the job link and company are sent to Discord for manual review.
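
The keyword check described above could look roughly like the Go sketch below. It is a simplified illustration rather than the scraper's actual implementation: the function name `matchesKeyword` is hypothetical, and case-insensitive substring matching is an assumption.

```go
package main

import "strings"

// matchesKeyword (hypothetical helper) reports whether any keyword appears
// in the job description. Matching is case-insensitive substring matching,
// which is an assumption of this sketch.
func matchesKeyword(description string, keywords []string) bool {
	lowered := strings.ToLower(description)
	for _, kw := range keywords {
		kw = strings.ToLower(strings.TrimSpace(kw))
		// Skip blank keywords so they never match every description.
		if kw != "" && strings.Contains(lowered, kw) {
			return true
		}
	}
	return false
}
```

Substring matching is deliberately loose: a short keyword such as `go` would also match `google`, which may be acceptable here since matches are sent on for manual review anyway.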

## Prerequisites
You must have the following installed/configured on your system for this to work correctly<br />
1. [Go](https://go.dev/doc/install)
