Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add MembersPage class #10

Merged
merged 3 commits into from
Apr 19, 2017
Merged

Add MembersPage class #10

merged 3 commits into from
Apr 19, 2017

Conversation

ondenman
Copy link
Contributor

@ondenman ondenman commented Mar 20, 2017

This PR adds the MembersPage class which represents a page of a paginated list of member URLs.

Part of: everypolitician/everypolitician-data#27746

@ondenman ondenman changed the title WIP: Add memberspage class WIP: Add MembersPage class Mar 20, 2017
decorator Scraped::Response::Decorator::CleanUrls

field :member_urls do
noko.css('div#RESULTADOS_DIPUTADOS div.listado_1 ul li a').map { |p| p[:href] }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per the comments on everypolitician-scrapers/spain_congreso_es#17, our normal approach to this is to have the a/@href in the selector. If there's a reason why that won't work here, you should note it, otherwise it's better to be consistent.

private

def next_page_link
noko.css('//div[@class = "paginacion"]//a[contains("Página Siguiente")]').first
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto here.

end

it 'should contain the expected url' do
response.member_urls.first.must_equal 'http://www.congreso.es/portal/page/portal/Congreso/Congreso/Diputados/BusqForm?_piref73_1333155_73_1333154_1333154.next_page=/wc/fichaDiputado&idDiputado=87&idLegislatura=10'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little surprised to see the "_piref" here. I thought the plan was to remove this from URLs where possible? Has that plan changed, or is that something that will happen later?

Copy link
Contributor Author

@ondenman ondenman Mar 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with stripping the '_piref' part of the URL out is that when we request the next page without the the '_piref' in the url, we get a page with the members listed twice:
screen shot 2017-03-20 at 15 32 23

I think the decorator should be used on the PersonMembership pages though. My plan is to use these URLs to create the MembershipPages.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we get a page with the members listed twice

Is that a problem? Calling .uniq on the list is fairly trivial…

chrismytton and others added 2 commits March 21, 2017 14:26
Following links with session information doesn't work correctly with
Ruby's open-uri, so rather than using Capybara this simply strips away
the session information.
Uses decorator to 'piref' info from MembersPage urls

URLs stripped of piref info point to pages that list the members
in duplicate. Calling .uniq ensures we get each URL only once.
@ondenman ondenman force-pushed the add-memberspage-class branch from 425f5f7 to 33ff039 Compare March 21, 2017 14:28
@ondenman ondenman force-pushed the add-memberspage-class branch from 33ff039 to b1d5690 Compare March 21, 2017 14:30
@ondenman ondenman changed the title WIP: Add MembersPage class Add MembersPage class Mar 21, 2017
This was referenced Mar 21, 2017
@ondenman ondenman requested a review from chrismytton April 19, 2017 09:54
@ondenman ondenman assigned chrismytton and unassigned tmtmtmtm Apr 19, 2017
Copy link
Contributor

@chrismytton chrismytton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✨ Looks good! 👍 ✨

@chrismytton chrismytton dismissed tmtmtmtm’s stale review April 19, 2017 10:17

Have re-reviewed the PR and confirmed that the changes requested have been made.

@chrismytton chrismytton merged commit 30bc5d4 into master Apr 19, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants