
Add wikidata revision id to article's html
Closed, Declined (Public)

Description

Author: zhjyong

Description:
Many wiki articles now use the Wikidata repository to get their interwiki links. However, for a given article's HTML page, it is hard to tell which Wikidata revision was used to render it. That in turn makes it hard to keep the wikitext revision and the Wikidata revision in sync for that page.

Since the wikitext revision number is already included in an article's HTML, a simple way to solve this is to include the Wikidata revision number there as well.


Version: unspecified
Severity: minor

Details

Reference
bz45627

Event Timeline

bzimport raised the priority of this task from to Medium. Nov 22 2014, 1:39 AM
bzimport set Reference to bz45627.
bzimport added a subscriber: Unknown Object (MLST).

To give some context here: this is a request from Google's Wikipedia mapping team.

The revision of the Wikidata item can be in effect in several places on the page. Is this about all of those places, or is it about a <meta> entry in the header?

zhjyong wrote:

As far as I know, a wiki article reads only one revision of the Wikidata item, even though it may include interlanguage links, infobox data, and/or list data. Is that right?

If so, it seems simpler to have a <meta> entry in the header. What do you think?
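
For illustration, this is roughly how a consumer could read such a header entry. It is a minimal Python sketch, not an agreed format: the meta names "wikidata-item" and "wikidata-revision" are assumptions made up for this example, and nothing in this report fixes the real names.

```python
from html.parser import HTMLParser


class WikidataMetaParser(HTMLParser):
    """Collects <meta name="wikidata-..."> entries (hypothetical names)."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing <meta ... /> tags.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.startswith("wikidata-"):
            self.found[name] = attributes.get("content")


def read_wikidata_meta(html):
    """Return a dict of the wikidata-* meta entries found in the page."""
    parser = WikidataMetaParser()
    parser.feed(html)
    return parser.found
```

A client that already knows the wikitext revision from the page could then pair it with the value returned for "wikidata-revision".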

Something like this?
https://gerrit.wikimedia.org/r/#/c/51976/

Note that the revision stuff does not work; it's just in there to show the code flow. The item id stuff works.

In a client, it seems that the page (an article on Wikipedia) either uses one revision everywhere, or one of the two blocks, sitelinks or statements, uses a newer revision than the other. That means we could list the revision for each block that might get out of sync with the rest.

I'm not sure how we can handle a page consisting of blocks from several revisions. But in theory, parts should be kept if they are not changed, so the last known revision should be good enough. There is a revision id in the change propagation; we could use that instead of trying to request it from the repo. In some newer changesets we keep the item id for later, and we could do the same for the item's revision id. Unless something triggers a rebuild of the page, it can be assumed to use the same revision, and if only parts of the page change, the other parts would be the same anyhow. Yes, that could work.

@John: that looks like the right way to go, in principle. The original request was for an HTML comment though - MediaWiki already provides some, mainly for debugging. I like the meta tag, it's cleaner, but we could still add something like <!-- wikidata item q1234, rev 732784 --> to the end of the page, for good measure. What do you think?
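
To make the comment variant concrete, here is a minimal Python sketch of how a consumer might pick it up. It assumes exactly the format suggested above ("wikidata item q1234, rev 732784"); the function name is made up for this example, and the real comment, if it were ever added, could look different.

```python
import re

# Matches the comment format proposed in this report (an assumption, not a spec).
COMMENT_RE = re.compile(
    r"<!--\s*wikidata item (q\d+), rev (\d+)\s*-->", re.IGNORECASE
)


def find_wikidata_comment(html):
    """Return (item_id, revision) from the last matching comment, or None."""
    matches = COMMENT_RE.findall(html)
    if not matches:
        return None
    # The comment is expected at the end of the page, so take the last match.
    item_id, revision = matches[-1]
    return item_id, int(revision)
```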

Also note that getting the ItemContent from the EntityLookup as you do in your patch introduces performance issues, but that would be fixed via bug 45566.

...quick follow-up: we could just put the item revision into the ParserOutput object, and use that in a hook later.

Yes, it is in a todo in the example code.

zhjyong wrote:

Hi,

I just want to know the progress of this bug. Have we decided to implement some mechanism for it? If so, what is the current status? Thanks.

zhjyong: No progress, otherwise it would be mentioned here...

At this point a Wikipedia article indeed does only use data from one item. But this will change in the future. Is it a good idea to put all of the items and their revisions in the article's html?

(In reply to Lydia Pintscher from comment #10)

At this point a Wikipedia article indeed does only use data from one item.
But this will change in the future. Is it a good idea to put all of the
items and their revisions in the article's html?

IMHO yes, but the item uniquely associated with the page should be tagged as "main".
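
As a rough sketch of what that could look like, assuming the HTML comment format proposed earlier in this report is extended with a "main" marker (the marker syntax and the function name are hypothetical, chosen only for this example):

```python
def render_item_comments(items, main_id):
    """Build one HTML comment per (item_id, revision) pair.

    The item whose id equals main_id gets a trailing ", main" marker.
    The comment syntax is an assumption for illustration only.
    """
    comments = []
    for item_id, revision in items:
        suffix = ", main" if item_id == main_id else ""
        comments.append(
            "<!-- wikidata item %s, rev %d%s -->" % (item_id, revision, suffix)
        )
    return comments
```

The number of comments grows with the number of items a page uses, which is worth keeping in mind if a page pulls data from many items.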

Lydia_Pintscher removed a subscriber: Unknown Object (MLST).
Lydia_Pintscher claimed this task.

Yes, we could say which one is the "main" item, but what information would this give you? Soon a single page will be able to get information from a large number of items. The API is really a lot better for handling this kind of information.
I am going to close this because I don't think we can do this in a meaningful way that isn't covered better in other ways at this point.
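
For anyone landing here, one possible way to get the item and its latest revision through the API is sketched below in Python. This assumes the wbgetentities module on www.wikidata.org is the API meant above, which the comment does not spell out; the helper names are made up, and the response handling assumes each entity carries "id" and "lastrevid" fields when props=info is requested. Note that this gives the current revision of the item, not necessarily the one a cached page was rendered with.

```python
from urllib.parse import urlencode

# Assumed endpoint; the comment above only says "the API".
API_ENDPOINT = "https://www.wikidata.org/w/api.php"


def build_entity_info_url(site, title):
    """Build a wbgetentities URL for the item linked to a page on a client wiki."""
    params = {
        "action": "wbgetentities",
        "sites": site,      # e.g. "enwiki"
        "titles": title,    # page title on that wiki
        "props": "info",    # info includes the last revision id
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


def last_revisions(response):
    """Map item id to last revision id from a decoded JSON response."""
    revisions = {}
    for entity in response.get("entities", {}).values():
        # Entities that were not found carry no revision; skip them.
        if "id" in entity and "lastrevid" in entity:
            revisions[entity["id"]] = entity["lastrevid"]
    return revisions
```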