
Use API module 'parse' for retrieving interwiki links
Closed, Resolved (Public)

Description

Originally from: http://sourceforge.net/p/pywikipediabot/feature-requests/151/
Reported by: melancholie
Created on: 2008-06-13 14:47:11
Subject: Use API module 'parse' for retrieving interwiki links
Original description:
Currently pages are retrieved in a batch by using Special:Export.
Although this is fast (only one request is made), the method carries a huge data overhead!

Why not use the API with its 'parse' module? It can fetch only the interwiki links, which reduces traffic (overhead) a lot!

See:
http://de.wikipedia.org/w/api.php?action=parse&format=xml&page=Test&prop=langlinks
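
A minimal sketch of what such a request could look like from the bot side. The helper names `build_parse_url` and `extract_langlinks` are hypothetical, and the XML layout (`<ll lang="...">` elements) is assumed from the API's XML output format; the sample response is hand-written, not captured from a live wiki.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


def build_parse_url(host, page):
    """Build an action=parse URL that asks only for language links."""
    query = urlencode({
        "action": "parse",
        "format": "xml",
        "page": page,
        "prop": "langlinks",
    })
    return "http://%s/w/api.php?%s" % (host, query)


def extract_langlinks(xml_text):
    """Return (language code, title) pairs from a parse response."""
    root = ET.fromstring(xml_text)
    # Assumed layout: <api><parse><langlinks><ll lang="en">Title</ll>...
    return [(ll.get("lang"), ll.text or "") for ll in root.iter("ll")]


# Hand-written sample response (an assumption, not real server output).
SAMPLE = (
    '<api><parse><langlinks>'
    '<ll lang="en">Test</ll><ll lang="fr">Test</ll>'
    '</langlinks></parse></api>'
)

print(build_parse_url("de.wikipedia.org", "Test"))
print(extract_langlinks(SAMPLE))
```

Only the language links travel over the wire here, instead of the full page text that Special:Export returns.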

Outputs could be downloaded in parallel to emulate a batch (faster).
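
The parallel download could be sketched roughly as follows, using a thread pool with one request per page. `fetch_langlinks_xml` and `fetch_batch` are hypothetical names, and the worker count of 4 is an arbitrary assumption; a real bot would also need throttling and error handling.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_langlinks_xml(page, host="de.wikipedia.org"):
    """Fetch the raw XML of one action=parse langlinks request."""
    query = urlencode({"action": "parse", "format": "xml",
                       "page": page, "prop": "langlinks"})
    with urlopen("http://%s/w/api.php?%s" % (host, query)) as response:
        return response.read().decode("utf-8")


def fetch_batch(pages, fetch=fetch_langlinks_xml, max_workers=4):
    """Run one request per page in parallel and map page -> result."""
    pages = list(pages)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input pages.
        return dict(zip(pages, pool.map(fetch, pages)))
```

Calling `fetch_batch(["Test", "Bot", "Haus"])` would issue three requests concurrently; passing a different `fetch` callable makes it possible to try the batching logic without touching the network.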

----
At least make this method optional (config.py) so that data traffic can be reduced if wanted. The API is simply more efficient.


Version: unspecified
Severity: enhancement
See Also:
https://sourceforge.net/p/pywikipediabot/feature-requests/151

Details

Reference
bz55100

Event Timeline

bzimport raised the priority of this task from to Low. Nov 22 2014, 2:19 AM
bzimport set Reference to bz55100.
bzimport added a subscriber: Unknown Object (????).

Logged In: YES
user_id=2089773
Originator: YES

Note: Maybe combine it with 'generator'.

  • summary: Use API module parse for retrieving interwiki links --> Use API module 'parse' for retrieving interwiki links

Logged In: YES
user_id=2089773
Originator: YES

Important note for getting pages' interwikis in a batch:
http://de.wikipedia.org/w/api.php?action=parse&text={{:Test}}{{:Bot}}{{:Haus}}&prop=langlinks

Either the bot could then figure out which interwikis belong together, or a marker could be placed in between:
http://de.wikipedia.org/w/api.php?action=parse&text={{:Test}}{{MediaWiki:Iwmarker}}{{:Bot}}{{MediaWiki:Iwmarker}}{{:Haus}}&prop=langlinks

[[MediaWiki:Iwmarker]] (or 'Llmarker'?) would have to be set up by the MediaWiki developers with [[en:/de:Abuse-save-mark]] as content (but this is potentially misusable).
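
A rough sketch of how a bot might use such a marker, assuming the message existed. `build_batch_text` and `group_langlinks` are hypothetical helpers. The sketch also assumes that the API returns the language links in source order and does not merge the repeated marker links, neither of which is confirmed here.

```python
def build_batch_text(titles, marker="MediaWiki:Iwmarker"):
    """Transclude every page, separated by the (not yet existing) marker."""
    separator = "{{" + marker + "}}"
    return separator.join("{{:" + title + "}}" for title in titles)


def group_langlinks(titles, langlinks, marker_target="Abuse-save-mark"):
    """Split a flat list of (lang, target) pairs at each marker link."""
    groups = [[]]
    for lang, target in langlinks:
        if target == marker_target:
            # A marker link closes the current page's group.
            groups.append([])
        else:
            groups[-1].append((lang, target))
    if len(groups) != len(titles):
        raise ValueError("marker count does not match number of pages")
    return dict(zip(titles, groups))


print(build_batch_text(["Test", "Bot", "Haus"]))
```

The printed string is the value that would go into the `text` parameter of the request above, after URL encoding.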

Logged In: YES
user_id=2089773
Originator: YES

To avoid misuse and to avoid confusing bots, the yet-to-be-set-up MediaWiki message could contain [[foreigncode:{{CURRENTTIMESTAMP}}]] (cache issue?)

(sorry for spamming with this request ;-)

Logged In: YES
user_id=1806226
Originator: NO

Backwards compatibility with non-Wikimedia wikis?

Logged In: YES
user_id=2089773
Originator: YES

Backwards compatibility?

That's no reason for not making software more efficient where possible ;-)
That's also why I wrote something about "optional".
Because current MediaWiki wikis offer a much more efficient way of retrieving (only) certain contents (langlinks, categories), there should be a way to use that advantage! It will reduce load (the bot owner's and the server's)...

We are working on a rewrite. The rewrite uses the API as much as possible.

Parse mode has been deactivated because it was overloading the Squids. Nothing to do for now.