
pagegenerators: follow redirects, intersection, exclusion
Closed, Declined; Public

Description

Originally from: http://sourceforge.net/p/pywikipediabot/patches/625/
Reported by: andreasjs
Created on: 2013-08-24 21:57:56.794000
Subject: Pagegenerator: follow redirects, intersection, exclusion
Original description:
I added three new arguments:

-followredirects  Used with other arguments that specify a set of pages.
                  If a specified page is a redirect page, work on its
                  target page.

-intersecting     Argument to be used between two other arguments.
                  Work only on pages normally specified by both the
                  previous and the next argument.

-excluding        Argument to be used between two other arguments.
                  Work only on pages normally specified by the
                  previous argument but not by the next argument.

For example, one might want to find the pages edited by a specific user whose titles contain a certain keyword.
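The intended semantics of the three arguments can be sketched in plain Python. This is only an illustration, not the code from the patch: the function names `follow_redirects`, `intersecting` and `excluding`, the `resolve` callback, and the sample titles are all hypothetical stand-ins, and plain strings are used in place of real page objects.

```python
from typing import Callable, Hashable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T", bound=Hashable)


def follow_redirects(pages: Iterable[T],
                     resolve: Callable[[T], Optional[T]]) -> Iterator[T]:
    """Yield each page, replaced by its target when resolve() returns one."""
    for page in pages:
        target = resolve(page)
        yield page if target is None else target


def intersecting(previous: Iterable[T], following: Iterable[T]) -> Iterator[T]:
    """Yield pages from `previous` that `following` also produces."""
    wanted = set(following)  # the second generator is read completely
    for page in previous:
        if page in wanted:
            yield page


def excluding(previous: Iterable[T], following: Iterable[T]) -> Iterator[T]:
    """Yield pages from `previous` that `following` does not produce."""
    unwanted = set(following)  # the second generator is read completely
    for page in previous:
        if page not in unwanted:
            yield page


# Hypothetical sample data: titles edited by a user, titles matching a keyword.
user_contributions = ["Python (language)", "Monty Python", "Perl", "Py"]
titles_with_keyword = ["Monty Python", "Python (language)", "Python (genus)"]
redirects = {"Py": "Python (language)"}  # redirect title -> target title

both = list(intersecting(user_contributions, titles_with_keyword))
only_first = list(excluding(user_contributions, titles_with_keyword))
resolved = list(follow_redirects(user_contributions, redirects.get))
```

Here `both` corresponds to the example above (pages edited by the user whose titles match the keyword), while `only_first` is the complementary selection.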

A few other suggestions:
Exclude sections, even on files.
Compare pages via the Page.__cmp__ method to exclude duplicate pages, instead of building the string key
u"%s:%s:%s" % (page._site.family.name, page._site.lang, page._title)
(more transparent and easier to maintain).
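A minimal sketch of that duplicate filter, assuming the page objects are comparable and hashable. The `Page` class below is a hypothetical stand-in, not the pywikibot class, and because a seen-set needs hashing the sketch relies on `__eq__` and `__hash__` rather than on `__cmp__` itself.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Page:
    """Hypothetical stand-in; a frozen dataclass gets __eq__ and __hash__."""
    family: str
    lang: str
    title: str


def unique(pages: Iterable[Page]) -> Iterator[Page]:
    """Yield each page once, comparing the objects themselves."""
    seen = set()
    for page in pages:
        if page not in seen:  # uses Page.__hash__ and Page.__eq__
            seen.add(page)
            yield page


pages = [
    Page("wikipedia", "en", "Foo"),
    Page("wikipedia", "de", "Foo"),
    Page("wikipedia", "en", "Foo"),  # duplicate of the first entry
]
deduplicated = list(unique(pages))
```

The comparison logic then lives in one place (the page class) instead of being repeated wherever a string key is formatted.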


Version: unspecified
Severity: enhancement
See Also:
https://sourceforge.net/p/pywikipediabot/patches/625

Details

Reference
bz54537

Event Timeline

bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 2:09 AM
bzimport set Reference to bz54537.

The patch does not apply cleanly to either core or compat.

-excluding would be very helpful for working around bugs where one page is causing a problem, such as bug 69133.

Patch is by Andreas, if I am understanding the old sf.net history correctly.

Mpaa is creating 'intersecting' with https://gerrit.wikimedia.org/r/#/c/170832/

jayvdb set Security to None.
Xqt removed jayvdb as the assignee of this task. Sep 24 2019, 3:53 PM
  1. The -intersect option is already implemented.
  2. -excluding is not very useful, because all generators must be preloaded first, which needs too much memory.
  3. FollowRedirectPageBot can be used instead of the -followredirects option.
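The memory concern in point 2 can be illustrated with a small trace. This is a sketch under assumptions, not pywikibot code: `traced` and `excluding` are hypothetical helpers, and the sketch only shows that at least the excluded generator has to be read in full and held in memory before the first page can be yielded.

```python
from typing import Iterable, Iterator, List


def traced(name: str, items: Iterable[str], log: List[str]) -> Iterator[str]:
    """Yield items while recording the order in which they are consumed."""
    for item in items:
        log.append(f"{name}:{item}")
        yield item


def excluding(previous: Iterable[str], following: Iterable[str]) -> Iterator[str]:
    """Yield items of `previous` that `following` does not produce."""
    unwanted = set(following)  # whole excluded generator is loaded here
    for page in previous:
        if page not in unwanted:
            yield page


log: List[str] = []
result = excluding(
    traced("prev", ["A", "B", "C"], log),
    traced("next", ["B", "D"], log),
)
first = next(result)  # request only the first page
```

After the single `next()` call, `log` shows every item of the excluded generator consumed before the first item of the other one, which is why a large excluded set is costly.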