
Special page and/or parser function to check quotations from references
Closed, Declined (Public)

Description

It is possible to check quotations from references to see whether the page a URL points to in fact contains the given quote. If the quote is not correct, the reference, and whatever it is used to validate, can be marked as invalid.


Version: unspecified
Severity: normal

Details

Reference
bz41529

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 12:54 AM
bzimport set Reference to bz41529.
bzimport added a subscriber: Unknown Object (MLST).

IMO, this would be best done by bots. Quotes are often modified from the original (for example I could quote you as "jeblad@gmail.com: If not a correct quote the reference [...] can be invalidated." ) and even if genuinely no longer present, human intervention is required (for example, if you are referencing web content the page might have moved or it may still be in the Internet Archive). Also, it could be used on Wikipedia and is not Wikidata-specific.

I have a more complete description somewhere, but for Wikipedia it can be implemented as a "quote" tag function that also takes a URL to the cited site. In Wikidata it would be part of the reference object.

An easy way to do it is to first assume the quote is correct, but push a job to the job queue if no check result for it exists in memcached yet. If a result exists in memcached, the quote can be marked as valid or invalid right away. The result will be cached for a day or two in memcached, then a new job will be generated. When the job is run it will check the external site.
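
The flow described above could be sketched roughly as follows. This is only an illustration in Python, not MediaWiki code: a plain dict stands in for memcached, a deque stands in for the job queue, and the names QuoteChecker, status, run_jobs and the verify callable are all hypothetical. The two-day lifetime is one reading of "a day or two".

```python
import time
from collections import deque

# "A day or two" in the description; the exact value is an assumption.
CACHE_TTL_SECONDS = 2 * 24 * 60 * 60


class QuoteChecker:
    """Assume a quote is fine until a background job has checked it."""

    def __init__(self, verify, clock=time.time):
        self._verify = verify   # hypothetical callable (url, quote) -> bool
        self._clock = clock     # injectable so expiry can be tested
        self._cache = {}        # stand-in for memcached: key -> (is_valid, expires_at)
        self._queue = deque()   # stand-in for the job queue
        self._pending = set()   # keys that already have a queued job

    def status(self, url, quote):
        key = (url, quote)
        entry = self._cache.get(key)
        if entry is not None and entry[1] > self._clock():
            # Fresh cached result: answer right away.
            return "valid" if entry[0] else "invalid"
        # No result, or it expired: queue one job and assume the quote is correct.
        if key not in self._pending:
            self._pending.add(key)
            self._queue.append(key)
        return "assumed-valid"

    def run_jobs(self):
        # Each job checks the external site and caches the outcome.
        while self._queue:
            key = self._queue.popleft()
            self._pending.discard(key)
            is_valid = self._verify(*key)
            self._cache[key] = (is_valid, self._clock() + CACHE_TTL_SECONDS)
```

Rendering a page would only ever call status, so it never waits on the external site; the slow fetch happens inside run_jobs.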

There should be a small set of markers that act as wildcards during testing, mostly just square brackets (could need localization) that can contain anything. During matching they will be replaced by a non-greedy dot-star (.*?).
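
As a rough sketch of that matching step, in Python: the bracket syntax handled here (any text inside square brackets) and the function names quote_to_regex and quote_matches are assumptions for illustration. Literal parts of the quote are escaped, runs of whitespace are allowed to differ, and each bracket marker becomes a non-greedy dot-star.

```python
import re

# Assumed marker syntax: anything in square brackets, e.g. "[...]".
WILDCARD = re.compile(r"\[[^\]]*\]")


def quote_to_regex(quote):
    """Build a regex from a quote, treating bracket markers as wildcards."""
    escaped_parts = []
    for part in WILDCARD.split(quote):
        # Escape each word and let any run of whitespace separate them.
        words = part.split()
        escaped_parts.append(r"\s+".join(re.escape(word) for word in words))
    if not any(escaped_parts):
        raise ValueError("quote contains no literal text to match")
    # Each marker is replaced by a non-greedy dot-star.
    return re.compile(".*?".join(escaped_parts), re.DOTALL)


def quote_matches(quote, page_text):
    """Return True if the quote, wildcards included, occurs in page_text."""
    return quote_to_regex(quote).search(page_text) is not None
```

With this, the shortened quote from the earlier comment, "If not a correct quote the reference [...] can be invalidated.", matches the full sentence in the task description.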

Also the page requested will need some cleanup, but it seems that pretty simple regex-based scrubbing will be sufficient. Getting the raw text from a page (screen scraping) isn't that uncommon for bots and it is fairly simple.
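
A minimal sketch of such scrubbing, in Python, might look like this. The function name scrub_html and the list of tags treated as block-level are assumptions; regex-based stripping like this is fragile on malformed markup, so it is a starting point rather than a robust HTML parser.

```python
import html
import re

# Script and style elements are dropped together with their contents.
_DROP_BLOCKS = re.compile(r"<(script|style)\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL)
_COMMENTS = re.compile(r"<!--.*?-->", re.DOTALL)
# Assumed set of block-level tags; they become spaces so neighbouring words stay apart.
_BLOCK_TAGS = re.compile(
    r"</?(?:p|div|br|li|ul|ol|tr|td|th|h[1-6]|blockquote)\b[^>]*>", re.IGNORECASE
)
_OTHER_TAGS = re.compile(r"<[^>]+>")
_WHITESPACE = re.compile(r"\s+")


def scrub_html(markup):
    """Reduce an HTML page to plain text suitable for quote matching."""
    text = _DROP_BLOCKS.sub(" ", markup)
    text = _COMMENTS.sub(" ", text)
    text = _BLOCK_TAGS.sub(" ", text)
    # Inline tags such as <b> are removed without adding a space.
    text = _OTHER_TAGS.sub("", text)
    # Decode entities such as &amp; after the tags are gone.
    text = html.unescape(text)
    return _WHITESPACE.sub(" ", text).strip()
```

Inline tags are removed without inserting a space so that a quote like "said hi." still matches markup such as "said <b>hi</b>.".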

And yes, some parts of the code can be shared with an extension for Wikipedia. ;)

I don't see us doing this. Let's close it to get the number of bugs down.