Fix Cuyahoga elections spider #65

SimmonsRitchie · 2024-01-09T22:57:43Z

What's this PR do?

Fixes our Cuyahoga County Board of Elections spider (aka cuya_elections), which broke after the agency's page structure and URLs changed.

Why are we doing this?

We want working scrapers, of course 🤖 This PR updates the spider's URLs and some of its parsing methods, including title parsing.

Steps to manually test

After installing the project with pipenv (see the README):

Activate the virtual environment:

pipenv shell

Run the spider:

scrapy crawl cuya_elections -O test_output.csv

Monitor stdout and ensure that the crawl proceeds without raising any errors. Pay attention to the final status report from scrapy.
Inspect test_output.csv to ensure the data looks valid. I suggest opening a few of the URLs in the source column of test_output.csv and comparing each row's data with what the page shows.
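
As a rough aid to the inspection step above, a small script can flag rows whose source value is not a usable URL before you start opening links by hand. This is only a sketch: the function name check_output is hypothetical, and the only column it assumes is the source column mentioned above. It does not replace comparing rows against the live pages.

```python
import csv
from urllib.parse import urlparse


def check_output(path, url_column="source"):
    """Return a list of human-readable problems found in the CSV at `path`."""
    problems = []
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # Bail out early if the expected column is absent from the header.
        if url_column not in (reader.fieldnames or []):
            return [f"missing column: {url_column}"]
        rows = list(reader)
    if not rows:
        problems.append("no rows scraped")
    for i, row in enumerate(rows, start=1):  # i counts data rows, not the header
        value = row[url_column] or ""
        parsed = urlparse(value)
        # A usable source needs an http(s) scheme and a host.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"row {i}: bad {url_column} URL {value!r}")
    return problems
```

Calling check_output("test_output.csv") after the crawl returns an empty list when every row has a well-formed source URL.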

Are there any smells or added technical debt to note?

This scraper now uses a hardcoded link for the "links" field of every meeting. The agency has a dedicated page where it posts the attachments for all meetings. If we want to get fancy, we could scrape that page and merge its attachments into each meeting's record. Given the number of broken scrapers we still need to fix, I think it's better to take the hardcoded approach for now and move on. We can come back to this when time permits.
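
To make the trade-off concrete, here is a minimal sketch of the hardcoded approach with room for the fancier one later. It is not the spider's actual code: ATTACHMENTS_URL is a placeholder (this PR description does not give the real address), links_for is a hypothetical helper, and the href/title dictionary shape and the date-keyed lookup are assumptions made for illustration.

```python
# Placeholder URL: the real attachments page address is not stated in this PR.
ATTACHMENTS_URL = "https://example.org/board-meeting-documents"


def links_for(meeting_date, scraped=None):
    """Return links for one meeting.

    `scraped` is an optional dict mapping a meeting date string to a list of
    link dicts, as a future attachments-page scrape might produce (assumed
    shape). Without it, every meeting gets the same hardcoded link.
    """
    if scraped and meeting_date in scraped:
        return scraped[meeting_date]
    # Current behaviour described above: one fixed link for all meetings.
    return [{"href": ATTACHMENTS_URL, "title": "Meeting documents"}]
```

With no scraped data, every call returns the same single link; passing a mapping later would let per-meeting attachments override the fallback without changing callers.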

SimmonsRitchie added 7 commits January 8, 2024 15:50

Fix meeting detail page URLs

45e7974

Fix title parsing

8ca1925

Adjust URL to filter out non-meetings

92ad0bd

Hardcode links

1793cb5

Formatting and linting fixes

6e72039

Replace pipenv github action

d9e6915

Replace pipenv github action

e9ccf14

SimmonsRitchie changed the title ~~Fix cuya elections~~ Fix Cuyahoga elections spider Jan 9, 2024

SimmonsRitchie added 14 commits January 9, 2024 16:59

Merge branch 'fix-deps3' into fix-cuya-elections

ae583c2

Black formatting

91cb916

Use experimental scrapy-sentry branch

738dc61

Revert "Use experimental scrapy-sentry branch"

5203b25

This reverts commit 738dc61.

Use experimental scrapy-sentry branch + lock

2efac1c

Revert "Merge branch 'fix-deps3' into fix-cuya-elections"

ff089fc

This reverts commit ae583c2, reversing changes made to 6e72039.

Merge branch 'test-scrapy-sentry' into fix-cuya-elections

87ab67a

Strip whitespace from desc

ac38dc4

Update tests to reflect spider updates

617ff3b

Formatting

7032aa8

Replace scrapy-sentry package

fa6907b

Modify ext name

f267b1e

upgrade to scrapy-sentry-errors v1

6fe2faf

Merge branch 'sentry-upgrade' into fix-cuya-elections

874a33f

SimmonsRitchie closed this Jan 22, 2024

SimmonsRitchie deleted the fix-cuya-elections branch January 22, 2024 18:15
