Originally from: http://sourceforge.net/p/pywikipediabot/bugs/1164/
Reported by: djbarrett
Created on: 2010-04-12 18:33:11
Subject: weblinkchecker should ignore URLs inside some tags, part 2
Assigned to: xqt
Original description:
This is a follow-up to [pywikipediabot-Bugs-1969051] "weblinkchecker should ignore URLs inside some tags".
The fix in pyrev:8076 by xqt is appreciated, but not an appropriate solution. The particular tag I listed in the ticket, "<sql>", was just an example. The fix by xqt simply hard-coded this example (bogus) tag into the Pywikipedia source code:
svn diff -c8076 http://svn.wikimedia.org/svnroot/pywikipedia/trunk/pywikipedia
A better fix would be to recognize when you are reading a tag attribute:
<AnyTagGoesHere ... attr='http://whatever' ...>
{{AnyTemplateOrParserFunction | attr=http://whatever
and ignore the URL in these situations.
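A rough sketch of that idea in Python follows. It is not taken from weblinkchecker.py: the names URL_RE, TAG_RE, TEMPLATE_PARAM_RE and find_checkable_urls are hypothetical, and the regular expressions are a simple heuristic that assumes tags and template parameters are not nested in unusual ways.

```python
import re

# All names below are hypothetical; none come from the Pywikipedia source.
# A bare URL, ending at whitespace or common wiki/HTML delimiters.
URL_RE = re.compile(r"https?://[^\s<>\"'|}\]]+")
# Any opening tag, whatever its name: <name ...attributes...>
TAG_RE = re.compile(r"<[A-Za-z][^<>]*>")
# A "| name=http://..." parameter inside a template or parser function call.
TEMPLATE_PARAM_RE = re.compile(r"\|\s*[\w-]+\s*=\s*https?://[^\s|}]+")


def find_checkable_urls(text):
    """Return URLs in text that are not tag attributes or template parameters."""
    # Blank out whole opening tags so URLs in their attributes disappear.
    stripped = TAG_RE.sub(" ", text)
    # Drop "name=URL" template parameters, keeping the pipe separator.
    stripped = TEMPLATE_PARAM_RE.sub("|", stripped)
    return URL_RE.findall(stripped)


sample = (
    "See http://example.org/a and <sql src='http://db.example/x'>q</sql> "
    "{{Foo | url=http://example.net/b}} [http://example.com/c label]"
)
print(find_checkable_urls(sample))
```

With this approach no tag name needs to be hard-coded; only the URL in running text and the bracketed external link in the sample are reported. Unnamed template parameters and URLs in tag content are still checked, which may or may not be desired.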
$ python version.py
Pywikipedia [http] trunk/pywikipedia (r8050, 2010/04/01, 15:43:14)
Python 2.4.3 (#1, Sep 3 2009, 15:37:37)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-46)]
Version: unspecified
Severity: normal
See Also:
https://sourceforge.net/p/pywikipediabot/bugs/1164