DidYouMean extension submitted for comment and testing
Closed, Declined · Public

Description

DidYouMean is designed for the English Wiktionary to automate the use of the
{{see}} template there which links articles whose titles differ only by
capitalisation, use of diacritics, spaces, hyphenation, apostrophes, etc.
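A minimal sketch of the kind of title normalisation described above, in Python rather than the extension's own PHP. The function name `normalize_title` and the exact set of ignored characters are assumptions for illustration, not the extension's actual rules: two titles count as "similar" when they map to the same key.

```python
import unicodedata


def normalize_title(title: str) -> str:
    """Hypothetical normalisation key; not the extension's actual rules."""
    # Decompose accented characters so diacritics become combining marks.
    decomposed = unicodedata.normalize("NFKD", title)
    # Drop the combining marks, leaving the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Fold case, then drop spaces, hyphens, and apostrophes.
    folded = stripped.casefold()
    return "".join(ch for ch in folded if ch not in " -'\u2019")


# Titles that differ only in case, diacritics, or punctuation share a key.
same_key = normalize_title("Résumé") == normalize_title("resume")
```

Storing this key alongside each title is what would let a lookup find all similar articles with a single equality match.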

It adds two metadata tables which are maintained by hooks in all places where
articles can be created, renamed, or deleted. Metadata is kept only for
non-redirects in the main namespace.

A list of links to "similar" articles is added to all article pages in view
mode and also to the 'nogomatch' and 'noarticletext' pages.


Version: unspecified
Severity: enhancement
URL: http://www.mediawiki.org/wiki/Extension:DidYouMean

Details

Reference
bz8648

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 9:30 PM
bzimport set Reference to bz8648.

source for DidYouMean extension

attachment didyoumean.tar.bzip ignored as obsolete

DidYouMean extension diff for mainline code

Hooks for 'noarticletext' and SpecialUndelete

attachment phase3-diff.txt ignored as obsolete

DidYouMean diff for the extension itself

The code for the extension and its installer

attachment extensions-diff.txt ignored as obsolete

wikt.3.connelm wrote:

Since en.wiktionary.org (and presumably others) have [Appendix:Names] and all
those name entries, were you planning on adding any other name-oriented
normalizing to this? Or is SOUNDEX the next phase?

Handling appendices would require parsing whole pages which is more complex than
just parsing the {{see}} template.

Soundex turned out to be a lot more promiscuous than I expected. It seemed to
take into account only the first part of each word, resulting in enormous lists
of matches for each word, many of them not as alike as you'd expect.
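To illustrate the point about Soundex, here is a small sketch of the standard American Soundex rules in Python. It is not the code that was tested for the extension; the helper name `soundex` is chosen for this example. Because the code keeps only the first letter plus three digits, longer words are compared on their opening consonants alone.

```python
# Consonant groups used by American Soundex.
_GROUPS = {"1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r"}
_CODES = {letter: digit for digit, letters in _GROUPS.items() for letter in letters}


def soundex(word: str) -> str:
    """Return the four-character Soundex code for an ASCII word."""
    letters = [ch for ch in word.lower() if ch.isascii() and ch.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate repeated codes
        code = _CODES.get(ch, "")  # vowels map to "" and reset prev
        if code and code != prev:
            result += code
        prev = code
        if len(result) == 4:
            break  # everything after the third digit is ignored
    return result.ljust(4, "0")


# Words that share only their opening consonants collapse to one code.
codes = {w: soundex(w) for w in ["transport", "transparent", "transfer", "trench"]}
```

All four words in `codes` receive the same code, which is the sort of over-matching described above.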

Metaphone should be better but I couldn't get the library to work in the account
you gave me.

I'd been thinking about anagrams and textonyms next but a) they are
language-dependent, and b) they require parsing and replacing whole sections of
articles which as often as not are not in any well-defined format.
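For reference, a rough sketch of how anagram and textonym keys could be computed, in Python. The function names and the English phone keypad layout are assumptions for this example; the keypad mapping in particular is one reason such keys are language-dependent.

```python
# Standard English phone keypad layout (an assumption; other languages differ).
_KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
_LETTER_TO_DIGIT = {l: d for d, letters in _KEYPAD.items() for l in letters}


def anagram_key(word: str) -> str:
    """Words with the same sorted letters are anagrams of each other."""
    return "".join(sorted(word.lower()))


def textonym_key(word: str) -> str:
    """Words typed with the same keypad digits are textonyms of each other."""
    return "".join(_LETTER_TO_DIGIT[ch] for ch in word.lower() if ch in _LETTER_TO_DIGIT)


are_anagrams = anagram_key("listen") == anagram_key("silent")
are_textonyms = textonym_key("good") == textonym_key("home")
```

Computing the keys is the easy part; the difficulty raised above is updating the free-form article sections where such lists are kept.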

Another idea is to scan all redlinks and possibly blue links except that they
won't have canonical casing and there is no easy way to sort the wheat from the
chaff akin to ignoring redirects in article space.

wikt.3.connelm wrote:

Well, I meant for the resulting main namespace entries, not taking apart the
Appendices themselves.

rotemliss wrote:

Please add wikibugs-l@wikipedia.org to the CC list when you assign the bugs.

robchur wrote:

First impressions are that this is quite a neat little extension and could have
great potential use. The "did you mean" message itself needs to be more
obtrusive - think coloured boxes - it's almost invisible on a search results page.

Thanks Rob. The idea was that on the English Wiktionary it will just look like
what we've already been doing for ages without all the manual labour. Once it's
out there people should modify it to do something bigger on the search page, and
maybe not ignore redirects for Wikipedia like it does for Wiktionary.

DidYouMean extension diff for mainline code

  • Fixed return value at 'noarticletext'
  • Use new hook in SpecialUndelete instead of my own

DidYouMean diff for the extension itself

  • Fix broken installer
  • Use new SpecialDelete hook instead of my own

attachment extensions-diff.txt ignored as obsolete

extension diff with changes suggested by Brion

  • Added table prefix in .sql file
  • Added addQuotes and tableName calls to constructed queries

attachment extensions-diff.txt ignored as obsolete

extension diff with changes suggested by Tim Starling

  • All functions and variables are now prefixed with wfDym-
  • The database lookup is now done inside the parser hook

attachment extensions-diff.txt ignored as obsolete

Fixed extension diff

Fixed a regression that slipped in.

Committed the current version to extensions in r19837 to make it a little easier
to work with updates while testing.

A few notes on current state of the extension...

Setup:

  • Should use update hooks so the table can get installed by standard update.php
  • install.php should be replaced with a script that simply allows rebuilding the normalization entries

Caching:

  • 'see also' bits embedded into pages won't be automatically updated when the page is already cached. For cache-correctness, it'll need to look up affected pages on addition/removal of normalization entries and schedule them for purges (and, possibly, link refresh)
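The cache-correctness point can be sketched as follows, in Python with an in-memory index standing in for the extension's metadata tables. The class `SimilarIndex` and its methods are hypothetical; the idea is only that every addition or removal of a normalization entry reports which other pages share the key and so would need a purge.

```python
from collections import defaultdict
from typing import Callable, Dict, Set


class SimilarIndex:
    """Hypothetical stand-in for the normalization tables."""

    def __init__(self, normalize: Callable[[str], str]) -> None:
        self._normalize = normalize
        self._by_key: Dict[str, Set[str]] = defaultdict(set)

    def add(self, title: str) -> Set[str]:
        """Record a title; return the existing pages whose cache is now stale."""
        key = self._normalize(title)
        to_purge = self._by_key[key] - {title}
        self._by_key[key].add(title)
        return to_purge

    def remove(self, title: str) -> Set[str]:
        """Forget a title; return the remaining pages whose cache is now stale."""
        key = self._normalize(title)
        self._by_key[key].discard(title)
        return set(self._by_key[key])


# str.casefold is used here only as a simple placeholder normalizer.
index = SimilarIndex(str.casefold)
index.add("Polish")
stale_after_add = index.add("polish")
stale_after_remove = index.remove("Polish")
```

In a real wiki the returned titles would be handed to whatever purge or job-queue mechanism is available, rather than simply returned to the caller.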

Internationalization:

  • It's hardcoded for particular English templates, which seems a bit icky.

In general I'm not too comfortable with the way it messes about with the text of pages as they're parsed. A totally separate 'similar pages' UI component might be cleaner. *shrug*

*Bulk BZ Change: +Patch to open bugs with patches attached that are missing the keyword*

sumanah wrote:

Marking "reviewed" as the extension has been reviewed by Brion in comment 16.

sumanah wrote:

I've removed DidYouMean from https://www.mediawiki.org/wiki/Review_queue until the author responds to comment 16.

Andrew Dunbar: Resetting the assignee and status of this issue because there has been no progress in recent years. Feel free to take it again when you are actually planning to fix this. Thanks.

hashar subscribed.

I am getting the extension archived (T196430) since nothing has happened in over ten years.