🕷️ Fix spider: Cuyahoga County Archives Advisory Commission #70

Merged: 16 commits into main from fix-cuya-county on Jan 24, 2024

Conversation

SimmonsRitchie
Contributor

@SimmonsRitchie SimmonsRitchie commented Jan 23, 2024

What's this PR do?

Fixes our Cuyahoga County Archives Advisory Commission spider (a.k.a. cuya_archives_advisory), which broke after URLs and page structure changed across the Cuyahoga County website.

[This PR builds on #69, which should be reviewed first]

Why are we doing this?

We want working scrapers, of course 🤖 The changes in this PR include URL and parser changes.

Steps to manually test

After installing the project with pipenv (see the README):

  1. Activate the virtual environment:
     pipenv shell
  2. Run the spider:
     scrapy crawl cuya_archives_advisory -O test_output.csv
  3. Monitor stdout and ensure that the crawl proceeds without raising any errors. Pay attention to the final status report from scrapy.
  4. Inspect test_output.csv to ensure the data looks valid. I suggest opening a few of the URLs under the source column of test_output.csv and comparing the data for the row with what you see on the page.
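
As a rough complement to the manual inspection above (not part of this PR), a reviewer could run a quick sanity check on the CSV before spot-checking rows by hand. The sketch below assumes only that the output has a `source` column, as mentioned in the steps; the function names `check_rows` and `check_csv` are hypothetical helpers, not something in this repo.

```python
import csv
from urllib.parse import urlparse


def check_rows(rows, url_column="source"):
    """Return a list of human-readable problems found in scraped rows.

    `rows` is any iterable of dict-like rows (e.g. a csv.DictReader).
    `url_column` defaults to "source", the column named in the test steps.
    """
    problems = []
    count = 0
    # Start at 2 because line 1 of the CSV file is the header row.
    for line_no, row in enumerate(rows, start=2):
        count += 1
        url = (row.get(url_column) or "").strip()
        parsed = urlparse(url)
        # Flag rows whose source is empty or is not an absolute http(s) URL.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(
                f"line {line_no}: bad or missing {url_column!r}: {url!r}"
            )
    if count == 0:
        problems.append("no rows found; the spider may have scraped nothing")
    return problems


def check_csv(path, url_column="source"):
    """Open a CSV written by `scrapy crawl ... -O <path>` and check its rows."""
    with open(path, newline="", encoding="utf-8") as handle:
        return check_rows(csv.DictReader(handle), url_column)
```

Calling `check_csv("test_output.csv")` after the crawl returns an empty list when every row has a plausible source URL. It only catches structural problems; comparing row data against the live pages still has to be done by eye.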

Are there any smells or added technical debt to note?

  • This is one of several upcoming spider fixes in response to changes on the Cuyahoga County website.

@SimmonsRitchie SimmonsRitchie marked this pull request as ready for review January 23, 2024 13:34
@SimmonsRitchie SimmonsRitchie merged commit c194277 into main Jan 24, 2024
2 checks passed
@SimmonsRitchie SimmonsRitchie deleted the fix-cuya-county branch January 24, 2024 01:41