Using pageimages instead of using custom code to find imageless article #10

Open
PierreSelim opened this issue Jul 15, 2015 · 5 comments

@PierreSelim
Member Author

So, with mwclient:

site.api('query', prop='pageimages', titles='Nicolas Sarkozy')
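For reference, here is a minimal sketch of how the result of that call could be inspected. It assumes the default JSON layout (pages keyed by page id under `query.pages`) and that a page without a page image simply has no `pageimage` key. The helper name `pages_with_image` and the sample response are made up for illustration, not taken from the project.

```python
def pages_with_image(result):
    """Map each page title in a query response to True/False.

    True means the API reported a page image for that title.
    """
    pages = result.get('query', {}).get('pages', {})
    return {page['title']: 'pageimage' in page for page in pages.values()}


# Hand-written response in the assumed shape (placeholder values, not real data).
sample = {
    'query': {
        'pages': {
            '1': {'pageid': 1, 'ns': 0, 'title': 'Page A',
                  'pageimage': 'Example.jpg'},
            '2': {'pageid': 2, 'ns': 0, 'title': 'Page B'},
        }
    }
}

status = pages_with_image(sample)
```

With a real site, the same helper would be called on the value returned by `site.api(...)` instead of on `sample`.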

@PierreSelim
Member Author

I'm not totally convinced by this approach, for performance reasons: it apparently means sending more queries to the API than needed.

@JeanFred
Member

JeanFred commented Aug 3, 2015

Can’t you do both queries at the same time?
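One way to read "both queries at the same time" is to ask for several props in a single `query` request, since the MediaWiki API accepts a pipe-separated `prop` list. The sketch below only builds the parameters; `build_combined_query` is a hypothetical helper name, and whether this avoids the overhead in practice depends on how the rest of the code fetches article text.

```python
def build_combined_query(title):
    """Parameters for one request asking for both page image and wikitext."""
    return {
        'prop': 'pageimages|revisions',  # several props, pipe-separated
        'rvprop': 'content',             # ask the revisions prop for the text
        'titles': title,
    }


params = build_combined_query('Nicolas Sarkozy')

# With a real mwclient Site object this would be sent as:
#     result = site.api('query', **params)
```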

@PierreSelim
Member Author

The thing is, I believe (I might be wrong) that mwclient preloads the article text when a Page object is accessed, so sending a separate request would be extra overhead :)

@PierreSelim
Member Author

A pretty stupid implementation could look like this:

def isthereanimage(article):
    """Returns whether there is an image in the article or not."""
    LOG.info("Analyzing: %s", article.name.encode('utf-8'))
    result = site.api('query', prop='pageimages', titles=article.name.encode('utf-8'))
    ...
    # test result length :-)

Something a bit more clever would be to do all the queries in one request, with something like:

 result = site.api('query', prop='pageimages', titles='|'.join(articlenames))
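
A rough sketch of that batched idea, under a few assumptions: the API limits how many titles fit in one request (50 for regular accounts, as far as I know), so the names are sent in chunks; `pilimit='max'` is passed so that page images are returned for every page in the chunk; and a page counts as imageless when its entry has no `pageimage` key. The names `chunks` and `find_imageless` are hypothetical, and API continuation is not handled.

```python
def chunks(items, size=50):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def find_imageless(site, articlenames, batch_size=50):
    """Return the titles for which the API reports no page image.

    `site` is anything with an mwclient-style api(action, **params) method.
    Titles come back as the API spells them, which may differ from the input
    if the API normalized them. Nonexistent pages are reported as imageless too.
    """
    imageless = []
    for batch in chunks(list(articlenames), batch_size):
        result = site.api('query', prop='pageimages',
                          titles='|'.join(batch), pilimit='max')
        pages = result.get('query', {}).get('pages', {})
        for page in pages.values():
            if 'pageimage' not in page:
                imageless.append(page['title'])
    return imageless
```

This trades one request per article for one request per chunk of titles, which is the point of the suggestion above.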
