Add Wikibase API module that is usable from client wikis and available as a generator & prop module
Closed, ResolvedPublic

Description

Extend the MediaWiki API Query module to support basic Wikidata data retrieval locally. This would allow Wikidata data to be included as part of other API queries and even be used with generators (https://www.mediawiki.org/wiki/API:Query#Generators). The minimum requirement would be to retrieve Wikidata descriptions using page titles or IDs. (This would facilitate their use in search suggestions.) Other possible capabilities would include retrieving the Wikidata labels, aliases, claims, and inter-language links.


Version: unspecified
Severity: normal
Whiteboard: u=dev c=backend p=0
URL: https://www.mediawiki.org/wiki/Requests_for_comment/Wikidata_API

Details

Reference
bz72729

Event Timeline

bzimport raised the priority of this task from to High. Nov 22 2014, 3:47 AM
bzimport set Reference to bz72729.
bzimport added a subscriber: Unknown Object (MLST).

Yuri's RFC is for use on the Repo, though. The idea there is to use Wikibase stuff as generators. Ryan's request, if I understand it correctly, is to implement a property module that can be used to provide extra properties for pages listed by a generator on a client wiki.

If I understand correctly, the intended use case is this: you have a list of local page titles (e.g. from a prefix search) and want to list them; in the listing, you want to show some extra info from Wikidata, like the description. The suggestion is to allow API queries to include this extra information using an API prop module.

This could be done, but I wonder whether it's worth the effort. You can get the same info easily from Wikidata directly, with a single API call. For example, to get the Wikidata labels and descriptions, in English, associated with the pages Birch, Beech, and Beetle on enwiki, you can use the following query:

http://www.wikidata.org/w/api.php?action=wbgetentities&format=json&sites=enwiki&titles=Birch%7CBeech%7CBeetle&props=labels%7Cdescriptions&languages=en%7Cen-ca%7Cen-gb

Isn't this sufficient?
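
For reference, the query above can be assembled programmatically. This is only a sketch: the helper name build_wbgetentities_url is hypothetical, and the parameter values are simply the ones from the URL in this comment.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the comment above.
WIKIDATA_API = "http://www.wikidata.org/w/api.php"


def build_wbgetentities_url(titles, site, languages, props):
    """Build a wbgetentities URL for the given client-wiki page titles.

    Hypothetical helper for illustration; multi-value parameters are
    joined with "|" as in the example URL from the thread.
    """
    params = {
        "action": "wbgetentities",
        "format": "json",
        "sites": site,
        "titles": "|".join(titles),
        "props": "|".join(props),
        "languages": "|".join(languages),
    }
    # urlencode percent-encodes "|" as %7C.
    return WIKIDATA_API + "?" + urlencode(params)


url = build_wbgetentities_url(
    titles=["Birch", "Beech", "Beetle"],
    site="enwiki",
    languages=["en", "en-ca", "en-gb"],
    props=["labels", "descriptions"],
)
print(url)
```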

Yes, that's basically what folks are currently doing, but it isn't ideal. Ideally, we would like to be able to get regular page props and wikidata data from a single API call. Also, we would like to avoid the extra DNS lookup of an external HTTP request in high-traffic contexts (like search suggestions) if possible.

I second what Kaldari has said. Sure, it's sufficient, but it shouldn't be necessary. :-)

Considering that with my approach, you would be hitting wbgetentities with a couple of hundred queries from the mobile search interface, I suppose you are right: that isn't going to work. wbgetentities needs to load the full entity structure from the blob store, and that's slow...

We already have the data you want in the wb_terms table. I suppose adding a client-side module that works much like ApiQueryPageProps would be easy enough, and should make this a lot faster.

I can't promise that it will be performant enough though, I hear the API servers are pretty loaded. An alternative solution would be to add this information directly to Elastic, so it can be returned directly by the search module.

By the way, what do you use to generate the original list of local page titles? action=opensearch? action=wbsearchentities?

I have implemented a pageterms module, see I9b6b52f6b75e4d6a

Daniel, the apps currently use both prefixsearch and search generators. I can't speak for mobile web, but I guess it's similar. When the user clicks search we perform a title search first, then allow the user to switch to full text search from there. We currently have to collect the wikibase_items and then send off another request to wikidata.org to get the descriptions. Like Kaldari mentioned above, we would like to avoid that. Below are some examples we have currently implemented.

(1) Title search:
https://en.m.wikipedia.org/w/api.php?action=query&format=json&generator=prefixsearch&gpssearch=foo&gpsnamespace=0&gpslimit=12&prop=pageprops%7Cpageimages&ppprop=wikibase_item&piprop=thumbnail&pithumbsize=96&pilimit=12&list=prefixsearch&pssearch=formula&pslimit=12

(2) Full text search:
https://en.m.wikipedia.org/w/api.php?action=query&format=json&prop=pageprops%7Cpageimages&ppprop=wikibase_item&generator=search&gsrsearch=foo&gsrnamespace=0&gsrwhat=text&gsrinfo=&gsrprop=redirecttitle&gsroffset=0&gsrlimit=12&list=search&srsearch=foo&srnamespace=0&srwhat=text&srinfo=suggestion&srprop=&sroffset=0&srlimit=12&piprop=thumbnail&pithumbsize=96&pilimit=12
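
To make the two-request workaround described above concrete, here is a rough sketch of the client-side step between the two calls: pulling the wikibase_item values out of a query response and preparing the parameters for the follow-up wbgetentities request. The function names and the sample response are made up for illustration, and the sample assumes the usual action=query JSON layout (pages keyed by page ID under query.pages).

```python
def collect_wikibase_items(response):
    """Map page title -> Wikidata item ID from a prop=pageprops response.

    Pages without a wikibase_item page prop are skipped.
    """
    pages = response.get("query", {}).get("pages", {})
    items = {}
    for page in pages.values():
        item = page.get("pageprops", {}).get("wikibase_item")
        if item:
            items[page["title"]] = item
    return items


def build_description_params(item_ids, language):
    """Parameters for the second request, sent to wikidata.org."""
    return {
        "action": "wbgetentities",
        "format": "json",
        "ids": "|".join(item_ids),
        "props": "descriptions",
        "languages": language,
    }


# Hypothetical, hand-written sample response (not real API output).
sample_response = {
    "query": {
        "pages": {
            "1": {"pageid": 1, "title": "Foo", "pageprops": {"wikibase_item": "Q1"}},
            "2": {"pageid": 2, "title": "Foo bar"},
        }
    }
}

items = collect_wikibase_items(sample_response)
params = build_description_params(sorted(items.values()), "en")
print(items)
print(params)
```

A prop module on the client wiki would make this intermediate step, and the second request, unnecessary.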

As Kaldari noted, "PageTerms" is not self-explanatory. My first thought was it would contain per-page legal terms (e.g. license).

Change 173543 merged by jenkins-bot:
Introduce PageTerms API module

https://gerrit.wikimedia.org/r/173543

Tobi_WMDE_SW claimed this task.

@daniel Thank you for providing this! Just a quick question: Why is the description field an array?
https://de.m.wikipedia.org/w/api.php?action=query&format=json&prop=pageterms%7Cpageimages&wbptterms=description&generator=search&gsrsearch=ufo&gsrnamespace=0&gsrwhat=text&gsrinfo=&gsrprop=redirecttitle&gsrlimit=12&piprop=thumbnail&pithumbsize=96&pilimit=12&continue=

gives me
...
"terms": {
"description": [
"Wikimedia-Begriffsklärungsseite"
]
}

Currently I'm using the first entry only. Just curious.

@MaxSem Would it be possible to add the pageterms prop to mobileview? That would enable us to get the page description without an extra request.

Just for the sake of posterity, I will note here that this was done in this patch: https://gerrit.wikimedia.org/r/#/c/180895/

@bearND Sorry for the late response, totally missed your question.

Basically, some terms can have multiple values for each language (aliases, currently), while others cannot (labels, descriptions). The API module doesn't know anything about the specific kinds of terms, so it just treats them all as multi-value.

Your app knows about the semantics of "description" and uses this to determine that it will only ever need to look at the first value, and there should never be more than one value. That's fine, but worth a comment in the code.
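
A minimal sketch of what such a commented accessor might look like on the client side. The function name first_description is hypothetical, and the sample page object only mirrors the "terms" fragment quoted earlier in this thread.

```python
def first_description(page):
    """Return the description of a page from a prop=pageterms result.

    The pageterms module returns every term type as a list because it
    treats all terms as multi-value. Descriptions are single-valued per
    language, so only the first entry is relevant here.
    """
    values = page.get("terms", {}).get("description", [])
    return values[0] if values else None


# Page object shaped like the fragment quoted above.
page = {"terms": {"description": ["Wikimedia-Begriffsklärungsseite"]}}
description = first_description(page)
print(description)
```

Returning None when the list is missing or empty covers pages that have no connected item or no description in the requested language.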

@daniel I later figured that one. But thanks for responding.