
Regularly publish updated word lists and definition lists
Closed, DeclinedPublic

Description

The world at large would greatly benefit from Wiktionary publishing lists of words and definitions on a regular basis, much as Wikimedia publishes raw dump files of all its wikis.

I envisage several levels. The rougher ones will be trivial to implement; the better ones will take a little more work. For each level, English is obviously wanted, with all other languages also desired.

  1. Raw list of words (page titles).
  2. List of words with all "common misspellings" removed.
  3. As per 2. but with all inflected forms removed (alternative spellings should stay)
  4. List of words per 3 (or possibly 2) with all definitions but lacking information on homonyms, example sentences, quotations, etc.
  5. As per 3 but with senses clearly separated from homonyms

Note that 4 and 5 will require some structure. A very basic XML format seems obvious.
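
To make the "very basic XML format" idea concrete, here is a minimal sketch of what a level 5 list could look like, with senses grouped under their homonyms. The element and attribute names (`wordlist`, `entry`, `homonym`, `sense`, `word`, `lang`) and the sample definitions are made up for illustration; nothing in this task specifies them.

```python
import xml.etree.ElementTree as ET


def build_entry(word, homonyms):
    """Build one <entry>; `homonyms` is a list of lists of definition strings,
    one inner list per homonym. All tag names here are hypothetical."""
    entry = ET.Element("entry", {"word": word})
    for definitions in homonyms:
        homonym = ET.SubElement(entry, "homonym")
        for definition in definitions:
            ET.SubElement(homonym, "sense").text = definition
    return entry


# Illustrative data only, not taken from Wiktionary.
root = ET.Element("wordlist", {"lang": "en"})
root.append(build_entry("bank", [["financial institution"], ["edge of a river"]]))

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

A level 4 list could reuse the same shape and simply drop the `homonym` layer, putting the `sense` elements directly under `entry`.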


Version: unspecified
Severity: enhancement

Details

Reference
bz21164

Event Timeline

bzimport raised the priority of this task from to Low. Nov 21 2014, 10:46 PM
bzimport set Reference to bz21164.
bzimport added a subscriber: Unknown Object (MLST).

There is Python code for a Wiktionary translation extractor now on toolserver: https://svn.toolserver.org/svnroot/p_enwikt/translations/

It's not fully documented and is no longer maintained by its author, but it apparently worked on many different Wiktionaries.

Wiktionary dumps are available at http://dumps.wikimedia.org/backup-index.html - closing as WORKSFORME.
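
For anyone who wants the level 1 list (raw page titles) from those dumps, a rough sketch follows. It assumes the usual MediaWiki XML export layout, where each `page` element has `title` and `ns` children and main-namespace pages have `ns` equal to `0`. The function name `iter_main_titles` and the inline sample are hypothetical, and the export schema version in the sample namespace is only an example.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def iter_main_titles(source):
    """Yield titles of main-namespace (ns == 0) pages from a MediaWiki XML export.
    `source` is a filename or a binary file-like object."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title = ns = None
        for child in elem:
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "ns":
                ns = child.text
        if ns == "0" and title:
            yield title
        elem.clear()  # drop the page's children to limit memory use on large dumps


# Tiny made-up sample standing in for a real dump file.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>bank</title><ns>0</ns></page>
  <page><title>Talk:bank</title><ns>1</ns></page>
  <page><title>river</title><ns>0</ns></page>
</mediawiki>"""

titles = list(iter_main_titles(io.BytesIO(SAMPLE)))
print(titles)
```

The published dumps are compressed, so for a real bzip2 file the same function could be fed `bz2.open(path, "rb")` instead of the in-memory sample. Levels 2 and above would additionally need to inspect the page text, which this sketch does not attempt.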