Fix Cuyahoga elections spider #65

Closed
wants to merge 21 commits into from


@SimmonsRitchie (Contributor) commented Jan 9, 2024

What's this PR do?

Fixes our Cuyahoga County Board of Elections spider (aka cuya_elections), which broke after changes to the agency's page structure and URLs.

Why are we doing this?

We want working scrapers, of course 🤖 This PR updates the spider's URLs and some of its parsing methods.

Steps to manually test

After installing the project with pipenv (see the README):

  1. Activate the virtual environment:
     pipenv shell
  2. Run the spider:
     scrapy crawl cuya_elections -O test_output.csv
  3. Monitor stdout and make sure the crawl proceeds without raising any errors. Pay attention to scrapy's final status report.
  4. Inspect test_output.csv to make sure the data looks valid. I suggest opening a few of the URLs in the source column of test_output.csv and comparing each row's data with what you see on the page.
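For a quicker first pass over test_output.csv before comparing rows by hand, a small script can flag rows with missing values. This is a minimal sketch, not part of the PR: it assumes the export has title, start, and source columns (only source is mentioned above), so adjust REQUIRED_COLUMNS to match the CSV's actual header row.

```python
import csv
from typing import Dict, Iterable, List

# Assumed column names; check them against the header row of test_output.csv.
REQUIRED_COLUMNS = ("title", "start", "source")


def find_problems(rows: Iterable[Dict[str, str]]) -> List[str]:
    """Return one message per required value that is empty or missing."""
    problems = []
    # Data rows start on line 2 because line 1 of the CSV is the header.
    for line_no, row in enumerate(rows, start=2):
        for column in REQUIRED_COLUMNS:
            # row.get() returns None for columns absent from the header.
            if not (row.get(column) or "").strip():
                problems.append(f"line {line_no}: empty or missing '{column}'")
    return problems


def check_csv(path: str) -> List[str]:
    """Read a Scrapy CSV export and report rows with missing values."""
    with open(path, newline="", encoding="utf-8") as f:
        return find_problems(csv.DictReader(f))
```

From a Python shell in the project directory, `check_csv("test_output.csv")` returns an empty list when every row has the assumed columns filled in. It only catches gaps; spot-checking rows against the source pages is still needed to catch wrong values.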

Are there any smells or added technical debt to note?

  • This scraper now uses a hardcoded link for each meeting's "links" field. The agency has a separate page that lists the attachments for every meeting. If we want to get fancy, we could scrape that page and merge its data into each meeting. Given the number of broken scrapers we still need to fix, I think it's better to take this approach for now and move on. We can come back to this when time permits.
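To illustrate the follow-up idea: instead of giving every meeting the same hardcoded link, a scraper could collect the attachments page and match files to meetings by date. The sketch below shows only that merging step, with plain dicts standing in for scraped data. The input shapes (`start`, `date`, `href`, `title`) and the `{"href": ..., "title": ...}` link format are assumptions, and `FALLBACK_LINK` uses a placeholder URL, not the link hardcoded in this PR.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Dict, List

# Hypothetical placeholder: NOT the actual URL hardcoded in this PR.
FALLBACK_LINK = {"href": "https://example.com/meeting-attachments", "title": "Meeting attachments"}


def attach_links(meetings: List[Dict], attachments: List[Dict]) -> List[Dict]:
    """Attach scraped attachment links to meetings held on the same date.

    Hypothetical inputs:
      meetings:    dicts with a "start" datetime
      attachments: dicts with a "date" (datetime.date), "href" and "title"
    Meetings with no matching attachments fall back to the hardcoded link.
    """
    # Group links by calendar date so each meeting needs a single lookup.
    by_date: Dict[date, List[Dict[str, str]]] = defaultdict(list)
    for item in attachments:
        by_date[item["date"]].append({"href": item["href"], "title": item["title"]})

    merged = []
    for meeting in meetings:
        # .get() on a defaultdict does not create empty entries.
        links = by_date.get(meeting["start"].date())
        # Copy so meetings never share (and accidentally mutate) one list or dict.
        new_links = list(links) if links else [dict(FALLBACK_LINK)]
        merged.append({**meeting, "links": new_links})
    return merged
```

Matching on calendar date alone assumes at most one meeting per day; if the board holds several meetings on the same date, the match key would need the meeting title or time as well.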

@SimmonsRitchie SimmonsRitchie changed the title Fix cuya elections Fix Cuyahoga elections spider Jan 9, 2024
@SimmonsRitchie SimmonsRitchie deleted the fix-cuya-elections branch January 22, 2024 18:15