
DRTRIGON-25 F41 (copied from wiki, priority 4)
Closed, ResolvedPublic

Description

This issue was converted from https://jira.toolserver.org/browse/DRTRIGON-25.
Summary: F41 (copied from wiki, priority 4)
Issue type: Task - A task that needs to be done.
Priority: Major
Status: Closed
Assignee: drtrigon <dr.trigon@surfeu.ch>


From: drtrigon <dr.trigon@surfeu.ch>

Date: Fri, 03 Sep 2010 10:17:46

[1]
If/when this is resolved, the code can be simplified nicely. Something along these lines already appears to be in the works, see [2] and [3] (already is, see [4]).

[1] https://bugzilla.wikimedia.org/show_bug.cgi?id=19417
[2] http://de.wikipedia.org/w/index.php?title=Benutzer_Diskussion:DrTrigon&oldid=61622369#Dein_Bot_k.C3.BCrzt_etwas_stark_ab
[3] http://translatewiki.net/w/api.php?action=parse&page=Support&prop=sections
[4] http://toolserver.org/~drtrigon/cgi-bin/g_api_ver.py


Version: unspecified
Severity: major

Details

Reference
bz59578

Event Timeline

bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 2:40 AM
bzimport set Reference to bz59578.

From: drtrigon <dr.trigon@surfeu.ch>

Date: Sat, 04 Sep 2010 00:14:00

http://de.wikipedia.org/w/api.php?action=parse&text=%7B%7BBenutzer:DrTrigon%7D%7D&prop=sections

As can be seen, 'byteoffset' is now available BUT empty... ?!?


From: drtrigon <dr.trigon@surfeu.ch>

Date: Tue, 14 Sep 2010 16:53:25

Solved in r18: https://fisheye.toolserver.org/changelog/drtrigon/?cs=18

This solution is based on 'action=parse' and its 'anchor' and 'byteoffset' values. The 'anchor' value is critical since it cannot be generated locally (except by emulating the wiki parser) and HAS TO BE REQUESTED. In cases where this is not possible (e.g. because the request fails), a page cannot be separated into sections and thus has to be processed IN ONE PIECE.
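
To illustrate the idea (this is a minimal sketch, not the code from r18): assuming the section list from 'action=parse&prop=sections' has already been fetched and decoded into a list of dicts, and assuming 'byteoffset' counts UTF-8 bytes of the wikitext, the split could look roughly like this. The function name split_by_byteoffset and the return shape are hypothetical.

```python
def split_by_byteoffset(wikitext, sections):
    """Split wikitext into (anchor, chunk) pairs.

    Assumption: `sections` is the already-decoded section list of an
    action=parse&prop=sections response, each entry a dict carrying
    'anchor' and 'byteoffset' (offset in UTF-8 bytes).
    Falls back to the whole page in one piece if anything is unusable.
    """
    raw = wikitext.encode("utf-8")
    one_piece = [(None, wikitext)]

    offsets = []
    for sec in sections:
        anchor = sec.get("anchor")
        try:
            off = int(sec.get("byteoffset"))  # "" or None -> fallback
        except (TypeError, ValueError):
            return one_piece
        if not anchor or not 0 <= off <= len(raw):
            return one_piece
        offsets.append((off, anchor))
    offsets.sort()

    pieces = []
    try:
        # Text before the first heading has no anchor.
        first = offsets[0][0] if offsets else len(raw)
        if first > 0:
            pieces.append((None, raw[:first].decode("utf-8")))
        for i, (off, anchor) in enumerate(offsets):
            end = offsets[i + 1][0] if i + 1 < len(offsets) else len(raw)
            pieces.append((anchor, raw[off:end].decode("utf-8")))
    except UnicodeDecodeError:
        # An offset pointed into the middle of a multi-byte character.
        return one_piece
    return pieces or one_piece
```

Any missing, empty or out-of-range value makes the function return the page as a single chunk, which mirrors the IN ONE PIECE fallback described above.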

In case the 'byteoffset' is missing or wrong (see e.g. [1]), it will be generated with the help of 'line' and difflib [2]. In rare cases when this does not work or has problems, action=query&prop=revisions with 'rvsection' can be used as support (this is not implemented yet). The drawback is that this will be VERY SLOW and cause HIGH TRAFFIC.
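
A rough sketch of how a missing 'byteoffset' might be recovered from 'line' with difflib (again an illustration under assumptions, not the r18 implementation): the rendered heading text in 'line' is compared against every heading found in the wikitext, and the best match above a threshold wins. The helper name guess_byteoffset, the heading regex and the cutoff of 0.6 are all hypothetical choices.

```python
import difflib
import re

# Wikitext headings such as "== Title ==" (same number of '=' on both sides).
HEADING_RE = re.compile(r"^(=+)[ \t]*(.*?)[ \t]*\1[ \t]*$", re.MULTILINE)


def guess_byteoffset(wikitext, line, cutoff=0.6):
    """Return the UTF-8 byte offset of the heading most similar to `line`.

    `line` is the heading text as reported by the API; it may differ from
    the wikitext heading (e.g. stripped markup), hence the fuzzy match.
    Returns None if no heading is similar enough.
    """
    best_ratio, best_pos = 0.0, None
    for match in HEADING_RE.finditer(wikitext):
        ratio = difflib.SequenceMatcher(None, match.group(2), line).ratio()
        if ratio > best_ratio:
            best_ratio, best_pos = ratio, match.start()
    if best_pos is None or best_ratio < cutoff:
        return None
    # Convert the character position into a byte offset.
    return len(wikitext[:best_pos].encode("utf-8"))
```

If this returns None, the caller would have to fall back to another strategy or treat the page as a single piece.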

It should be obvious that the content of the page (and thus the request that fetches it) is always critical.

If any unsolvable problems occur, the page cannot be separated into sections and thus has to be processed IN ONE PIECE. These cases should be very rare anyway.

Side note: the issues with <references> tags do not exist anymore.

[1] http://de.wikipedia.org/w/api.php?action=parse&page=Wikipedia:Testseite&prop=sections
[2] http://docs.python.org/library/difflib.html


From: drtrigon <dr.trigon@surfeu.ch>

Date: Fri, 17 Sep 2010 00:23:08

There are also the 'noreferences.py' and 'reflinks.py' scripts/bots to handle the last issue (if it appears again).


From: drtrigon <dr.trigon@surfeu.ch>

Date: Mon, 20 Sep 2010 18:41:19

(switch close <-> resolve management)

This bug was imported as RESOLVED. The original assignee has therefore not been
set, and the original reporters/responders have not been added as CC, to
prevent bugspam.

If you re-open this bug, please consider adding these people to the CC list:
Original assignee: dr.trigon@surfeu.ch
CC list: dr.trigon@surfeu.ch