action=parse&pageid=n seems to not be parsing correctly in some cases
Closed, Invalid | Public

Description

Author: tristen_e

Description:
This URL:

http://en.wikipedia.org/w/api.php?action=parse&pageid=3209758

yields a page with a snippet of html looking like this:

&amp;lt;a href=&amp;quot;<a href="http://toolserver.org/~geohack/geohack.php?pagename=St_James_Old_Cathedral&amp;amp;amp;params=37_42_26_S_144_56_17.9_E_&amp;quot;">http://toolserver.org/~geohack/geohack.php?pagename=St_James_Old_Cathedral&amp;amp;amp;params=37_42_26_S_144_56_17.9_E_&amp;quot;</a> class=&amp;quot;external text&amp;quot; rel=&amp;quot;nofollow&amp;quot;&amp;gt;

which winds up looking somewhat like this:

<a href="<a href="http://toolserver.org/~geohack/geohack.php?pagename=St_James_Old_Cathedral&params=37_42_26_S_144_56_17.9_E_"">http://toolserver.org/~geohack/geohack.php?pagename=St_James_Old_Cathedral&params=37_42_26_S_144_56_17.9_E_"</a> class="external text" rel="nofollow">

you can reproduce it quite easily with the url:

http://en.wikipedia.org/w/api.php?action=parse&pageid=3209758

and viewing the source of the original page helps show how the html should look:

http://en.wikipedia.org/wiki/St_James_Old_Cathedral

sorry in advance in case i've made a mistake and i'm not interpreting the html correctly.

sorry in advance also if major is an exaggeration of the severity!

best regards and thank you for the API.

tristen


Version: unspecified
Severity: major
URL: http://en.wikipedia.org/w/api.php?action=parse&pageid=3209758

Details

Reference
bz27630

Event Timeline

bzimport raised the priority of this task to Medium. (Nov 21 2014, 11:28 PM)
bzimport set Reference to bz27630.

To me that looks perfectly sane, and correct...

What do you think is wrong?

I'm getting this:

&lt;a href=&quot;http://toolserver.org/~geohack/geohack.php?pagename=St_James_Old_Cathedral&amp;amp;params=37_42_26_S_144_56_17.9_E_&quot; class=&quot;external text&quot; rel=&quot;nofollow&quot;&gt; which looks good to me. If the API would output what you said it did, that'd be wrong.

What did you use for the &format= parameter?

tristen_e wrote:

i didn't use a format= parameter, therefore i suppose it was using the default of xmlfm. when i explicitly use format=xml it appears ok.

that's a solution, i'm happy with that! sorry to take your time.
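For anyone hitting the same thing, here is a minimal sketch of building the request with an explicit format parameter, so the result does not depend on whatever the default happens to be. The endpoint and page id are the ones from this report; the helper name `build_parse_url` is made up for illustration, and the default format may differ on other MediaWiki versions.

```python
from urllib.parse import urlencode

# Endpoint taken from the report above; other wikis use a different host.
API_ENDPOINT = "http://en.wikipedia.org/w/api.php"


def build_parse_url(pageid, fmt="xml"):
    """Return an action=parse URL that always carries an explicit format.

    build_parse_url is a hypothetical helper, not part of any MediaWiki client.
    """
    params = {"action": "parse", "pageid": pageid, "format": fmt}
    return API_ENDPOINT + "?" + urlencode(params)


# Machine-readable XML instead of the pretty-printed xmlfm page.
url = build_parse_url(3209758)
print(url)
```

Fetching that URL with any HTTP client should then return plain XML rather than an HTML page wrapped around it.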

The fm means html pretty printing for all of them...

And it is default

(In reply to comment #4)
> The fm means html pretty printing for all of them...
>
> And it is default

To clarify: the default format is xmlfm, which is XML with HTML pretty-printing. It's designed to be human-readable, not necessarily machine-readable. For actual XML, use format=xml.
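A rough illustration of why the two outputs in this thread differ, assuming the pretty-printer simply HTML-escapes the XML text before showing it (the real formatter also turns URLs into links, which is not modeled here). The parsed HTML is escaped once when it is embedded in XML, and once more when that XML is displayed inside an HTML page.

```python
import html

# The anchor tag as it appears in the rendered article's HTML source.
raw_html = (
    '<a href="http://toolserver.org/~geohack/geohack.php'
    '?pagename=St_James_Old_Cathedral&amp;params=37_42_26_S_144_56_17.9_E_"'
    ' class="external text" rel="nofollow">'
)

# One level of escaping: HTML carried as text inside an XML document
# (what format=xml returns).
as_xml_text = html.escape(raw_html)

# Two levels: that XML shown inside an HTML page (roughly what the page
# source of the xmlfm output contains). This step is an assumption about
# the pretty-printer, not its actual implementation.
as_xmlfm_source = html.escape(as_xml_text)

print(as_xml_text)
print(as_xmlfm_source)

# Undoing both levels gives back the original tag.
assert html.unescape(html.unescape(as_xmlfm_source)) == raw_html
```

So the extra `&amp;` layers in the page source of the default output are expected, and a client that wants the HTML back only has to decode the format=xml response once.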

Change 740858 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] calico: Allow to configure the IPAM module

https://gerrit.wikimedia.org/r/740858