Section headers with templates are not correctly recognised
Open, Medium, Public

Description

Originally from: http://sourceforge.net/p/pywikipediabot/bugs/914/
Reported by: Anonymous user
Created on: 2009-04-20 16:13:14
Subject: Section headers with templates are not correctly recognised
Original description:
Sometimes the bot removes a valid interwiki link which leads to an anchor in another article.
See https://cs.wikipedia.org/w/index.php?title=Platnost&action=history


Version: unspecified
Severity: normal
See Also:
https://sourceforge.net/p/pywikipediabot/bugs/914

Details

Reference
bz55307

Event Timeline

bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 2:33 AM
bzimport set Reference to bz55307.
bzimport added a subscriber: Unknown Object (????).

valhallasw@dorthonion:~/src/pywikipedia/trunk$ python interwiki.py cs:Platnost\_%28právo%29
Getting 1 page from wikipedia:cs...
\[\[cs:Platnost \(právo\)\]\]: \[\[cs:Platnost \(právo\)\]\] gives new interwiki \[\[de:Gültigkeit\#Gültigkeit im Recht\]\]
Getting 1 page from wikipedia:de...

NOTE: \[\[de:Gültigkeit\#Gültigkeit im Recht\]\] does not exist. Skipping.
======Post-processing \[\[cs:Platnost \(právo\)\]\]======
Updating links on page \[\[cs:Platnost \(právo\)\]\].
Changes to be made: Robot: Removing \[\[de:Gültigkeit\#Gültigkeit im Recht\]\]
\- \[\[de:Gültigkeit\#Gültigkeit im Recht\]\]

ERROR: Found incorrect link to de in \[\[cs:Platnost \(právo\)\]\]
Submit? \(\[y\]es, \[n\]o, open in \[b\]rowser, \[g\]ive up, \[a\]lways\)

The link tries to refer to the section heading

=== \{\{Anker|Rechtsgültig\}\}Gültigkeit im Recht ===

Using \#G.C3.BCltigkeit\_im\_Recht instead does not help.

Removing \{\{Anker|...\}\} does work, if the \#Gültigkeit im Recht version is used.
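
For reference, the `.C3.BC` form tried above is MediaWiki's legacy anchor encoding: the heading is percent-encoded as UTF-8, spaces become underscores, and `%` is written as `.`. A minimal sketch of that mapping, assuming it covers the common case; `legacy_anchor` is a hypothetical helper, not a pywikipedia function:

```python
from urllib.parse import quote

def legacy_anchor(heading: str) -> str:
    """Approximate MediaWiki's legacy section-anchor encoding (hypothetical helper)."""
    # Spaces become underscores, then non-ASCII characters are percent-encoded as UTF-8.
    encoded = quote(heading.strip().replace(" ", "_"), safe=":")
    # The legacy scheme writes '.' where URL encoding writes '%'.
    return encoded.replace("%", ".")

print(legacy_anchor("Gültigkeit im Recht"))  # G.C3.BCltigkeit_im_Recht
```

Both spellings name the same section, which is consistent with the observation that switching between them does not change the outcome.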

  • labels: --> interwiki
  • milestone: --> confirmed
  • priority: 5 --> 6
  • summary: Removing of interwiki to anchor --> Section headers with templates are not correctly recognised

From the wikitext alone, it's hard to determine whether the section title is there \(it takes an evil regexp\). We could combine a regexp check with a fallback to the API using action=parse, e.g.
http://de.wikipedia.org/w/api.php?action=parse&text=\{\{:G%C3%BCltigkeit\}\}
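
Why a wikitext regexp struggles here can be shown with a small sketch. The pattern below is purely illustrative and is not the regexp pywikipedia uses; `has_section_naive` is a hypothetical name, and the heading text is the one quoted in this report (with the leading `===` assumed):

```python
import re

def has_section_naive(wikitext: str, section: str) -> bool:
    """Look for a heading whose text is exactly `section` (illustrative only)."""
    # Same number of '=' on both sides, optional whitespace around the title.
    pattern = r"^(=+)\s*" + re.escape(section) + r"\s*\1\s*$"
    return re.search(pattern, wikitext, re.MULTILINE) is not None

with_template = "=== {{Anker|Rechtsgültig}}Gültigkeit im Recht ===\n"
without_template = "=== Gültigkeit im Recht ===\n"

print(has_section_naive(with_template, "Gültigkeit im Recht"))     # False
print(has_section_naive(without_template, "Gültigkeit im Recht"))  # True
```

Loosening the pattern to skip `{{...}}` would handle this one case, but a template can also expand into extra heading text, which a regexp over the raw wikitext cannot see.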

The rewrite doesn't raise SectionErrors at all.

options:
\- stripping the check
\- try to get the regexp working
\- keep a simple regexp with an API fallback

questions
\- how to implement this in the rewrite?

Long story short: it's impossible to do without another API query. We can use

http://en.wikipedia.org/w/api.php?action=parse&prop=sections&page=Help:Editing

to do this. We cannot do it using regexps because of template expansions.
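
A minimal sketch of such a check, assuming the JSON response carries a `parse.sections` list whose entries have `line` (heading text, possibly with HTML markup) and `anchor` fields. `sections_url` and `section_exists` are hypothetical helpers, the sample data is hand-written rather than a real response, and the actual HTTP request is left out:

```python
import json
from urllib.parse import urlencode

def sections_url(site: str, page: str) -> str:
    """Build an action=parse&prop=sections query URL (hypothetical helper)."""
    query = urlencode({
        "action": "parse",
        "prop": "sections",
        "page": page,
        "format": "json",
    })
    return f"https://{site}/w/api.php?{query}"

def section_exists(response: dict, section: str) -> bool:
    """Return True if `section` matches a heading text or anchor in the response."""
    wanted = section.replace(" ", "_")
    for entry in response.get("parse", {}).get("sections", []):
        if entry.get("line") == section or entry.get("anchor") == wanted:
            return True
    return False

# Hand-written sample shaped like the API's JSON output; not a real response.
sample = json.loads(
    '{"parse": {"title": "Gültigkeit", "sections": '
    '[{"line": "Gültigkeit im Recht", "anchor": "Gültigkeit_im_Recht"}]}}'
)
print(sections_url("de.wikipedia.org", "Gültigkeit"))
print(section_exists(sample, "Gültigkeit im Recht"))
```

Because the section list comes from the parsed page, headings produced or decorated by templates are already expanded, at the cost of one extra request per linked page.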

Xqt triaged this task as Medium priority. Jan 27 2019, 6:30 AM