
[Epic] Support for queries on-wiki (automated list generation)
Open, High, Public

Assigned To
None
Authored By
Lydia_Pintscher
May 22 2014, 12:20 PM

Description

From the development plan:
Users are able to write queries like “all poets who lived in 1982” or “all cities with more than 1 million inhabitants”. They are entered in a page in the Query namespace and internally saved as JSON. They are then executed when resources are available - usually not immediately. The result is cached. A query can be set to rerun at regular intervals or on demand by an administrator. The result of the query is shown on the same page. It can also be accessed via the API. The clients can include the result of a query in their pages to, for example, create list articles. This will make it possible to have, for example, automatically updated list articles on Wikipedia.
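To make the plan concrete, a stored query page might serialize to something like the following. This is a minimal sketch in Python; every field name in it is an assumption for illustration, not a decided schema.

```
# Hypothetical shape of a stored query page as described above.
# All field names are illustrative assumptions, not a decided schema.
import json

stored_query = {
    "title": "Query:Cities_over_one_million",
    # The query itself ("all cities with more than 1 million inhabitants"):
    "query": (
        "SELECT ?city ?population WHERE { "
        "?city wdt:P31/wdt:P279* wd:Q515 ; wdt:P1082 ?population . "
        "FILTER(?population > 1000000) }"
    ),
    "refresh": {"mode": "interval", "every": "P7D"},  # or "on-demand"
    "lastRun": None,   # queries run when resources are available
    "result": None,    # the cached result, also exposed via the API
}

print(json.dumps(stored_query, indent=2))
```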

However, it is not yet decided whether the queries will reside on Wikidata.org or on the client wikis. That is part of the discussion we will need to have in order to define the needs better.

Details

Reference
bz65626

Event Timeline

JanZerebecki renamed this task from support for complex queries to [Epic] support for complex queries. Sep 10 2015, 7:56 PM

Isn't this the very definition of the Wikidata-Query-Service, which is now deployed in production, meaning this is resolved?

This is about being able to use the query service from a wiki via a Lua module.

Deskana renamed this task from [Epic] support for complex queries to [Epic] support for complex queries on-wiki using Lua. Dec 23 2015, 6:20 AM

This is about being able to use the query service from a wiki via a Lua module.

I've updated the task title to reflect this.
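For context, whatever Lua interface ends up being exposed on-wiki would ultimately have to perform this round trip against the public WDQS endpoint. A minimal sketch in Python, using WDQS's documented SPARQL endpoint and result format:

```
# The round trip a Lua module (or the extension backing it) would need:
# send SPARQL to the public WDQS endpoint, get SPARQL-results JSON back.
import requests

ENDPOINT = "https://query.wikidata.org/sparql"
QUERY = "SELECT ?item WHERE { ?item wdt:P31 wd:Q5 } LIMIT 3"

resp = requests.get(
    ENDPOINT,
    params={"query": QUERY},
    headers={"Accept": "application/sparql-results+json"},
)
resp.raise_for_status()
for row in resp.json()["results"]["bindings"]:
    print(row["item"]["value"])  # e.g. http://www.wikidata.org/entity/Q...
```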

Lydia_Pintscher renamed this task from [Epic] support for complex queries on-wiki using Lua to [Epic] support for complex queries on-wiki using Lua (automated list generation). Mar 9 2016, 1:56 PM
iecetcwcpggwqpgciazwvzpfjpwomjxn renamed this task from [Epic] support for complex queries on-wiki using Lua (automated list generation) to [Epic] Support for queries on-wiki (automated list generation). Jun 1 2018, 1:25 PM

As one of the possible solutions for this: https://commons.wikimedia.org/wiki/User:TabulistBot

Not sure whether it closes the task, but at least it implements a major part of it.

@Lydia_Pintscher - I wonder if this may be enough?

I am noticing that TabulistBot isn't being used at all. So I wonder if the use case is there...

Listeria is a third-party tool, and we need a feature built into Wikimedia. Also, Listeria does not store the result in any structured way.

Listeria is a third-party tool, and we need a feature built into Wikimedia.

What makes you think so?

Another issue to consider: each update by ListeriaBot makes an edit to the page, and this clutters page histories for pages with embedded lists. It is much worse if the wiki hosts thousands of Listeria-generated pages, many of which are barely viewed but refreshed frequently, as on cywiki. Snapshots are sometimes useful (T291091: Snapshots for saved queries), but most of the time they are not.

Frequently used queries fall into several types: those that should be regularly refreshed (ListeriaBot currently does this, but many pages are not refreshed as frequently as the page specifies); those that may be refreshed on demand; and those that are relatively stable, such as a list of chemical elements or US presidents. For the last type, it is useful to introduce a semi-permanent result, where users who want to refresh it can see the diff between result sets. This also keeps out most vandalism, and may even become a way to monitor vandalism (with some mechanism to regularly check query results). All three types are stored queries.
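To make the three types concrete, here is a sketch of how a stored query could carry a refresh policy and show the diff between result sets on refresh; all names below are hypothetical.

```
# Hypothetical refresh policies for stored queries, plus a result-set
# diff so a refresh of a semi-permanent list can be reviewed first.
from enum import Enum

class RefreshPolicy(Enum):
    INTERVAL = "interval"              # re-run on a schedule
    ON_DEMAND = "on-demand"            # re-run only when requested
    SEMI_PERMANENT = "semi-permanent"  # stable lists; refresh shows a diff

def diff_results(old: set, new: set) -> dict:
    """What a refresh would add or remove, shown before it is accepted."""
    return {"added": new - old, "removed": old - new}

old = {("Q23", "George Washington"), ("Q11806", "John Adams")}
new = old | {("Q9960", "Ronald Reagan")}
print(diff_results(old, new))
```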

Also, there may be a parser function to invoke an arbitrary query on the fly, probably backed by some caching mechanism. This may be part of WikiLambda. In addition, there may be a "query template" mechanism to handle sets of queries whose only difference is some parameters (one example: people who died on <date>).
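Such a "query template" could be as little as parameter substitution into one stored SPARQL text. A sketch, where the placeholder syntax and function name are assumptions:

```
# Hypothetical "query template": one stored SPARQL text whose only
# variable part is a parameter ({date} is an illustrative placeholder).
TEMPLATE = """
SELECT ?person ?personLabel WHERE {{
  ?person wdt:P570 "{date}"^^xsd:dateTime .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
"""

def people_died_on(date: str) -> str:
    return TEMPLATE.format(date=date)

print(people_died_on("2014-05-22T00:00:00Z"))
```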

Listeria is a third-party tool, and we need a feature built into Wikimedia. Also, Listeria does not store the result in any structured way.

I want to make one point in more depth: the current ListeriaBot is simply not a good design - it does multiple things at once.

First, Listeria runs a SPARQL query, giving a (raw) result set. Second, it does some basic transforms, both automatic and manual: you can look up labels, descriptions, aliases, and some statements/qualifiers (this does not rely on SPARQL), and since 2.0 the values are automatically formatted (i.e. turned into links), with apparently no parameter to disable this. Third, the results are formatted into a table or template calls.

Ideally we should make these three independent parts: the query part may be cached in the ways described above; the transform part may be cached, possibly per language if results in the interface language are requested (if no terms are requested, or only monolingual terms are required, which can already be queried in SPARQL, this step can be skipped entirely); and the content is displayed using the cached result.

(If there are multiple possible transforms of one query, a further optimization may be to cache query results by some sort of hash, or to make transforms a top-level object, though the latter may be confusing to some users.)
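Put together, the split sketched above might look like this. Every function and cache below is hypothetical, with the raw cache keyed by a hash of the query text:

```
# Hypothetical split of ListeriaBot's one step into three cached parts:
# (1) raw query results, (2) per-language transforms, (3) display.
import hashlib

raw_cache = {}        # query hash -> raw result rows
transform_cache = {}  # (query hash, language) -> transformed rows

def query_key(sparql):
    return hashlib.sha256(sparql.encode("utf-8")).hexdigest()

def execute_sparql(sparql):
    """Stub for the actual WDQS call (see the round-trip sketch above)."""
    return [("Q42",), ("Q1",)]

def add_labels(row, lang):
    """Stub for the label/description lookup in the given language."""
    return row + ("label-in-" + lang,)

def run_query(sparql):
    key = query_key(sparql)
    if key not in raw_cache:
        raw_cache[key] = execute_sparql(sparql)
    return raw_cache[key]

def transform(sparql, lang):
    # Skipped entirely if no interface-language terms are needed.
    key = (query_key(sparql), lang)
    if key not in transform_cache:
        transform_cache[key] = [add_labels(r, lang) for r in run_query(sparql)]
    return transform_cache[key]

def render(sparql, lang):
    # Pure formatting over the cached transform; no queries run here.
    return "\n".join("| " + " | ".join(r) for r in transform(sparql, lang))

print(render("SELECT ?item WHERE { ?item wdt:P31 wd:Q5 }", "en"))
```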

They are entered in a page in the Query namespace and internally saved as JSON.

Why JSON? We have WDQS, which uses SPARQL, so it seems straightforward to me to store these on-wiki queries in SPARQL as well. (The cached results should probably be JSON, as that’s easier to consume and format for display, but for the queries themselves we don’t need to reinvent the wheel by creating a JSON schema.)
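For reference, WDQS already returns results in the standard SPARQL 1.1 JSON format, so the cached result could simply be that document stored as-is. The shape, shown as a Python literal with illustrative values:

```
# The standard SPARQL-results JSON shape WDQS already emits; caching it
# verbatim avoids inventing a new schema. Values here are illustrative.
cached_result = {
    "head": {"vars": ["city", "population"]},
    "results": {
        "bindings": [
            {
                "city": {
                    "type": "uri",
                    "value": "http://www.wikidata.org/entity/Q1490",
                },
                "population": {
                    "type": "literal",
                    "datatype": "http://www.w3.org/2001/XMLSchema#decimal",
                    "value": "14000000",
                },
            }
        ]
    },
}
```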

Another feature ListeriaBot is missing: it always assumes that each entry is an item. Thus it is not easily usable for lists of properties, lexemes, or non-entities (such as dates). Hacks exist, but they are merely hacks, not a proper solution.
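For example, lexeme lists are already straightforward in SPARQL but break the one-row-one-item assumption. The query below uses real WDQS vocabulary; its use on-wiki remains hypothetical:

```
# A query whose rows are lexemes plus plain string literals, not items;
# a generic on-wiki query feature should handle these without hacks.
LEXEME_QUERY = """
SELECT ?lexeme ?lemma WHERE {
  ?lexeme dct:language wd:Q1860 ;   # English-language lexemes
          wikibase:lemma ?lemma .   # lemma is a string, not an item
} LIMIT 5
"""
```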

If we want to support this on third-party wikis, a solution that is not highly coupled with Wikifunctions is recommended. This may be:

  • A feature to store a specific query in a query namespace and cache its results in some storage outside of WikiLambda, plus a feature to run arbitrary queries on the fly via WikiLambda, and/or
  • Extracting the async rendering feature of WikiLambda into some other extension, and basing the complex query service on that extension

In https://wikitech.wikimedia.org/wiki/User:DCausse/WDQS_Graph_Split_Impact_Analysis it is mentioned that Listeria is heavily throttled by WDQS (only a 1.33% success rate). It is therefore not a scalable approach.