Commit 4c9a7df (1 parent: 4d04af2): 4 changed files with 60 additions and 0 deletions.

# job-scraper
A timed event that, once a day, scrapes relevant job links and sends them to Discord.

## Deployment
Deployment currently uses [Terraform](https://www.terraform.io/) to set up AWS services.
### Prerequisites
This repo needs a private [Amazon ECR repository](https://us-east-1.console.aws.amazon.com/ecr/repositories?region=us-east-1) in the same region as the container-based lambda (in our case, us-east-1). Name the private repository `headless`.

### Setting up remote state
Terraform has a feature called [remote state](https://www.terraform.io/docs/state/remote.html) that keeps the state of your infrastructure in sync across multiple team members as well as any CI system.

This project **requires** remote state to be configured. To set it up, run the following commands **once per team**:

```bash
cd terraform/remote-state
terraform init
terraform apply
```

# lol-counter-source-api
A lambda that invokes a headless browser to render a page (including its JavaScript) and passes along the rendered HTML.

## Why is this lambda using a container deployment rather than the standard zip deployment?
[Puppeteer](https://pptr.dev/) requires a Chrome/Chromium binary, which exceeded the standard [lambda size limit](https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-limits.html#function-configuration-deployment-and-execution). A container image raises that limit considerably (from 250 MB unzipped for a .zip package to 10 GB for an image), which allows the binary to be deployed. This service also currently uses [@sparticuz/chromium](https://github.com/Sparticuz/chromium) because the Chromium build installed by standard Puppeteer ran into permissions issues in the deployed AWS environment.

## Prerequisites
You must have the following installed and configured on your system for this to work correctly:
1. [Docker](https://www.docker.com/)
2. [Docker Compose](https://docs.docker.com/compose/)

## Development Environment
The development environment uses a pinned version of [AWS's Node.js 18 lambda base image](https://gallery.ecr.aws/lambda/nodejs) to mimic the deployed lambda.

```bash
docker-compose up
```

The output is similar to what you would see in CloudWatch Logs, for example:

```text
headless-lambda-1 | 18 Aug 2023 09:47:04,515 [INFO] (rapid) exec '/var/runtime/bootstrap' (cwd=/var/task, handler=)
```

The local container's endpoint is `http://localhost:3000/2015-03-31/functions/function/invocations`. To invoke the lambda, send a POST request to it with the following body:
```json
{
  "queryStringParameters": {
    "url": "https://www.google.com"
  }
}
```
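
The same request can also be sent from Go. This is a minimal sketch, not code from the repository: `invokeEvent`, `buildInvokeBody`, and `invokeLocal` are hypothetical names, and it assumes the container started with `docker-compose up` is listening on port 3000 as described above. It prints whatever the handler returns, which should contain the rendered HTML.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// invokeEvent (hypothetical) mirrors the request body above: the target URL is
// nested under queryStringParameters, the same shape as an API Gateway proxy event.
type invokeEvent struct {
	QueryStringParameters map[string]string `json:"queryStringParameters"`
}

// buildInvokeBody encodes the invocation event for targetURL as JSON.
func buildInvokeBody(targetURL string) ([]byte, error) {
	return json.Marshal(invokeEvent{
		QueryStringParameters: map[string]string{"url": targetURL},
	})
}

// invokeLocal POSTs the event to the local runtime endpoint and returns the raw
// response body, i.e. whatever the lambda handler returned.
func invokeLocal(endpoint, targetURL string) (string, error) {
	body, err := buildInvokeBody(targetURL)
	if err != nil {
		return "", err
	}
	resp, err := http.Post(endpoint, "application/json", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	out, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("invocation failed with %s: %s", resp.Status, out)
	}
	return string(out), nil
}

func main() {
	// Assumes `docker-compose up` is running and exposes the emulator on port 3000.
	out, err := invokeLocal(
		"http://localhost:3000/2015-03-31/functions/function/invocations",
		"https://www.google.com",
	)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

Building the body with `json.Marshal` rather than a hand-written string keeps the nesting correct and escapes any special characters in the target URL.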
# Scraper
This is a Go lambda that receives a URL as a query string parameter and passes along that website's HTML. It does not render any JavaScript; for that functionality, see the `headless` folder.

## Prerequisites
You must have the following installed and configured on your system for this to work correctly:
1. [Go](https://go.dev/doc/install)

# Scraper
This is a Go lambda that goes through the proxy API to retrieve website HTML. Once the HTML is received, the lambda parses it and runs a keyword check on the job description. If any keyword appears in the description, the job link and company are sent to Discord for manual review.

## Prerequisites
You must have the following installed and configured on your system for this to work correctly:
1. [Go](https://go.dev/doc/install)