OpenSearchXml first sentences extraction produces bad results
Closed, ResolvedPublic

Assigned To

Authored By

	MaxSem
	Mar 9 2012, 3:05 PM

Description

The current regex, roughly "end capture after the second dot followed by whitespace", produces wildly inaccurate results for sentences with dots in the middle, for example when the article title contains dots:

https://en.wikipedia.org/w/api.php?action=opensearch&format=xmlfm&search=.s.p.%20v&limit=10

<Item>
  <Text xml:space="preserve">S. P. Venkatesh</Text>
  <Description xml:space="preserve">S. P. </Description>
  <Url xml:space="preserve">https://en.wikipedia.org/wiki/S._P._Venkatesh</Url>
</Item>
<Item>
  <Text xml:space="preserve">S. P. Velumani</Text>
  <Description xml:space="preserve">S. P. </Description>
  <Url xml:space="preserve">https://en.wikipedia.org/wiki/S._P._Velumani</Url>
</Item>

It should be something like "first dot followed by whitespace after a certain number of characters".
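To illustrate the proposed rule, here is a minimal sketch in Python. It is not the extension's actual code, and not necessarily what the eventual fix does. The function name `extract_first_sentence`, the `min_chars` parameter and its default of 20 are all assumptions made for illustration.

```python
import re

# A dot followed by whitespace or by the end of the text.
SENTENCE_END = re.compile(r"\.(?=\s|$)")


def extract_first_sentence(text: str, min_chars: int = 20) -> str:
    """Cut text at the first sentence-ending dot at or after min_chars.

    min_chars is a hypothetical threshold; dots before it (such as the
    initials in "S. P.") are skipped. If no such dot exists, the whole
    text is returned unchanged.
    """
    match = SENTENCE_END.search(text, min_chars)
    if match is None:
        return text
    return text[: match.end()]


# Made-up sample text, used only to exercise the function.
sample = "S. P. Venkatesh is a person used here as sample text. A second sentence follows."
print(extract_first_sentence(sample))
```

With the threshold in place, the dots in the initials no longer end the capture, so the description is not truncated to "S. P. ". A fixed threshold is still a heuristic: abbreviations that occur later in the sentence would cut it short.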

Version: unspecified
Severity: normal

Details

Reference: bz35083

Event Timeline

• bzimport raised the priority of this task from to Needs Triage. Nov 22 2014, 12:09 AM

• bzimport added a project: MediaWiki-extensions-OpenSearchXml.

• bzimport set Reference to bz35083.

MaxSem created this task. Mar 9 2012, 3:05 PM

Fixed (well, improved) in r113475.
