
OpenSearch XML description excludes inline template text
Closed, Resolved, Public

Description

Author: fedot.p

Description:
OpenSearch results in XML format contain a Description node in every Item, which is usually the first sentence of the page that the Item describes.

The Description text is often corrupted because templates are stripped from it, as in the URL for this bug.

Moscow (Russian: Москва, romanised: Moskva, IPA: ru-Moskva.ogg [mɐˈskva] (help·info); see also other names) is the capital and the largest city of Russia.

Became:

Moscow (, romanised: Moskva, IPA: ; see also other names) is the capital and the largest city of Russia.
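The corruption above can be reproduced with a minimal sketch. It assumes the description is built by deleting every `{{...}}` template call from the wikitext, which this report does not confirm. The wikitext string and the template names in it are simplified and hypothetical, not copied from the actual page source.

```python
import re

# Hypothetical, simplified wikitext for the opening sentence; the template
# names are illustrative and not taken from the actual page source.
WIKITEXT = (
    "Moscow ({{lang-ru|Москва}}, romanised: Moskva, "
    "IPA: {{Audio-IPA|ru-Moskva.ogg|[mɐˈskva]}}; see also other names) "
    "is the capital and the largest city of Russia."
)

# Matches an innermost {{...}} template call (no braces inside it).
TEMPLATE = re.compile(r"\{\{[^{}]*\}\}")


def strip_templates(text: str) -> str:
    """Delete template calls, repeating until nested ones are gone too."""
    previous = None
    while previous != text:
        previous = text
        text = TEMPLATE.sub("", text)
    return text


print(strip_templates(WIKITEXT))
```

Under these assumptions, everything the templates would have rendered ("Russian: Москва" and the pronunciation) disappears, while the surrounding punctuation stays behind.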

Sometimes the results are entirely wrong, because a character (such as + or -) sits between templates in the page source and ends up as the Description in the results.
See: http://en.wikipedia.org/w/api.php?action=opensearch&limit=1&format=xmlfm&search=Roger%20Federer

Maybe it's possible to change the way OpenSearch returns these results so that it includes rendered or unrendered wikitext, or at least resolves some of the templates and detects non-alphanumeric characters at the start of the sentence?
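The last suggestion, detecting non-alphanumeric characters at the start of the sentence, could look roughly like the sketch below. The function names and the sample strings are hypothetical; this is not what the extension does, only one possible heuristic.

```python
def has_stray_leading_symbol(description: str) -> bool:
    """Return True if the first visible character is not a letter or digit."""
    stripped = description.lstrip()
    return bool(stripped) and not stripped[0].isalnum()


def strip_leading_symbols(description: str) -> str:
    """Drop everything before the first letter or digit."""
    for index, char in enumerate(description):
        if char.isalnum():
            return description[index:]
    return ""


# Hypothetical description left over after templates were removed.
sample = "+ - Roger Federer is a tennis player."
if has_stray_leading_symbol(sample):
    sample = strip_leading_symbols(sample)
print(sample)
```

A check like this would also trim descriptions that legitimately begin with a quotation mark or a parenthesis, so it may be safer to use it for flagging suspicious results than for rewriting them.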


Version: unspecified
Severity: enhancement
URL: http://en.wikipedia.org/w/api.php?action=opensearch&limit=1&format=xmlfm&search=moscow

Details

Reference
bz20411

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 10:47 PM
bzimport set Reference to bz20411.

Changing Product/Component to MediaWiki extensions/[other] since we don't have an OpenSearchXml component.

New search engine seems to have resolved this.