
weblinkchecker should ignore URLs inside some tags, part 2
Closed, Declined | Public | BUG REPORT

Description

Originally from: http://sourceforge.net/p/pywikipediabot/bugs/1164/
Reported by: djbarrett
Created on: 2010-04-12 18:33:11
Subject: weblinkchecker should ignore URLs inside some tags, part 2
Assigned to: xqt
Original description:
This is a follow-up to [pywikipediabot-Bugs-1969051] "weblinkchecker should ignore URLs inside some tags".

The fix in pyrev:8076 by xqt is appreciated, but it is not an appropriate solution. The particular tag I listed in the ticket, "<sql>", was just an example. The fix by xqt simply hard-coded this (bogus) example tag into the Pywikipedia source code:

svn diff -c8076 http://svn.wikimedia.org/svnroot/pywikipedia/trunk/pywikipedia

A better fix would be to recognize when you are reading a tag attribute:

<AnyTagGoesHere ... attr='http://whatever' ...>

{{AnyTemplateOrParserFunction | attr=http://whatever

and ignore the URL in these situations.
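The idea of recognizing attribute values can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in weblinkchecker.py: the names `URL_RE`, `ATTR_PREFIX_RE`, `inside_open_construct` and `urls_to_check` are hypothetical, the URL pattern is deliberately simplified, and the example is written for Python 3.

```python
import re

# Hypothetical, simplified URL pattern (not the one weblinkchecker uses).
URL_RE = re.compile(r"https?://[^\s'\"<>|{}\[\]]+")
# Text that ends in name= (optionally followed by an opening quote).
ATTR_PREFIX_RE = re.compile(r"[\w-]+\s*=\s*['\"]?$")


def inside_open_construct(prefix):
    """Return True if prefix ends inside an unclosed <tag or {{template."""
    in_tag = prefix.rfind("<") > prefix.rfind(">")
    in_template = prefix.rfind("{{") > prefix.rfind("}}")
    return in_tag or in_template


def urls_to_check(text):
    """Collect URLs, skipping those that are attribute or parameter values."""
    kept = []
    for match in URL_RE.finditer(text):
        prefix = text[:match.start()]
        is_attr_value = ATTR_PREFIX_RE.search(prefix) is not None
        if is_attr_value and inside_open_construct(prefix):
            continue  # looks like attr=URL inside a tag or template
        kept.append(match.group())
    return kept


sample = (
    "See http://example.org/a and <sql db='http://example.org/b'> "
    "and {{Foo | attr=http://example.org/c}} "
    "and <ref>http://example.org/d</ref>"
)
print(urls_to_check(sample))
```

In this sketch a URL in the body of a tag such as <ref> is still checked; only a URL that directly follows `name=` inside an open tag or template is skipped.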

$ python version.py
Pywikipedia [http] trunk/pywikipedia (r8050, 2010/04/01, 15:43:14)
Python 2.4.3 (#1, Sep 3 2009, 15:37:37)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-46)]


Version: unspecified
Severity: normal
See Also:
https://sourceforge.net/p/pywikipediabot/bugs/1164

Details

Reference
bz55276

Event Timeline

bzimport raised the priority of this task from to Needs Triage. Nov 22 2014, 2:31 AM
bzimport set Reference to bz55276.
bzimport added a subscriber: Unknown Object (????).

I disagree. It is entirely possible to have a sensible URL in a template (e.g. a reference). I'd suggest only adding 'exceptions', as was done in r8076.

I do not agree. It is legal to put URLs into <ref /> tags and others such as <noinclude>, or to assign a URL to a template field, so weblinkchecker normally shouldn't ignore such a URL; it should check whether the URL is still valid.

  • status: open --> open-rejected
  • assigned_to: nobody --> xqt
  • status: open-rejected --> pending-rejected

I see your point. Three notes:

1. Can this be an OPTION for weblinkchecker?

2. If not, can you at least strip off the trailing single quotes (shown in bug 1969051) so you don't get broken URLs? Single quotes are valid in tags but should not be part of the URL.

3. In any case, you should revert pyrev:8076 because there is no such tag as <sql>.
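The quote stripping asked for in note 2 could look something like the sketch below. The helper name `strip_trailing_quotes` is hypothetical, and the sketch assumes that a quote character at the very end of an extracted URL is always leftover attribute syntax and never part of the address.

```python
def strip_trailing_quotes(url):
    """Drop quote characters left over from e.g. attr='http://whatever'."""
    # Only the end of the string is touched; quotes inside the URL are kept.
    return url.rstrip("'\"")


print(strip_trailing_quotes("http://whatever'"))
```

Because only trailing characters are removed, a URL that contains a quote in the middle of its path is left unchanged.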

  • status: pending-rejected --> open-rejected

The <sql> tag is a non-standard tag, but it is used on the other bug reporter's wiki (as was clearly stated in his/her bug report).

valhallasw: Actually, I *am* the other bug reporter. :-) <sql> is a made-up tag for the example. We have 40 tags that exhibit the problem behavior.

Aklapper triaged this task as Low priority. Feb 4 2022, 8:07 PM
Aklapper changed the subtype of this task from "Task" to "Bug Report".

We use API:Extlinks now
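For context, an API:Extlinks request asks MediaWiki for the external links it has already parsed out of a page, so the client does not scan wikitext itself. The sketch below only builds such a request URL; it is not taken from the current tool. The helper name `extlinks_query_url`, the `api_url` value and the default limit are assumptions for illustration, and the example is written for Python 3.

```python
from urllib.parse import urlencode


def extlinks_query_url(api_url, title, limit=500):
    """Build a MediaWiki API URL listing the external links on one page."""
    params = {
        "action": "query",
        "prop": "extlinks",  # external links as parsed by MediaWiki
        "titles": title,
        "ellimit": limit,    # maximum number of links to return
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


# Placeholder endpoint, not a real wiki.
print(extlinks_query_url("https://example.org/w/api.php", "Main Page", 10))
```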