
[Story] Create Special:EmptyItems
Closed, Declined (Public)

Description

Create a special page that lists all pages with an "empty" item.
There should be a proper way to select the "emptiness" of an item giving the possibility to search for:

  • Empty description, label and aliases in all languages
  • Empty site links

Details

Reference
bz39150

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 1:04 AM
bzimport set Reference to bz39150.
bzimport added a subscriber: Unknown Object (MLST).

Requires a way to actually query for this information.

thomas.douillard wrote:

I'm working on this issue (to start with something small), and I can see two possible definitions of an empty item:

  • An empty item is an item with just a label or a description and no alias: just one row in the term table
  • An empty item is an item not linked with anything (currently no wiki page)

Which one should it be: one of the two above, or another one?

I would think that an empty item is empty. It contains nothing. No label, no description in any language, nothing. I don't think we should consider an item that has a label as empty. Programmatically, this is defined by ItemContent::isEmpty(), but there's currently no way to query that in the database. Eventually, page_site should be usable for this, but currently the number there is not very meaningful, and never null. I think for empty items it's currently between 2 and 14 or something (it counts bytes in the item's serialized form).

Anyway.

Re your first definition: does that mean it has a label or a description in only one language? Or in any number of languages? Note that there will be quite a few items that just have a label and a description, often only in one language. I would consider these "stubs", not "empty". They will often be created when filling in a property of the type "item reference", when the respective target item doesn't exist yet: if I want to provide the mayor of a city, but the mayor has no Wikidata item yet (and no Wikipedia page in any language), I'd just give a label and description, and the system would create a stub item.

Re your second definition: there will be quite a few items with no Wikipedia links that can't have Wikipedia links because there just isn't any Wikipedia page about them. Stubs like the above, but also books needed for citations, etc. They should not be considered empty.

All that being said, I think it would be useful to have lists for all of these: empty items, "stub" items (label and/or description and/or aliases only), and "unconnected" items (items with no sitelinks).
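To make the three lists concrete, here is a minimal sketch of how the states could be told apart. It assumes an item is a plain dict with the keys "labels", "descriptions", "aliases" and "sitelinks"; those key names and the function `item_states` are illustrative, not Wikibase's actual data model. States are returned as a set because, taken literally, an empty item or a stub also has no sitelinks and is therefore "unconnected" as well.

```python
TERM_KEYS = ("labels", "descriptions", "aliases")


def item_states(item):
    """Return the set of states ("empty", "stub", "unconnected") for an item dict.

    The dict layout is a hypothetical stand-in for the real item structure.
    """
    has_terms = any(item.get(key) for key in TERM_KEYS)
    has_sitelinks = bool(item.get("sitelinks"))
    # Anything else that is non-empty (e.g. future property data) counts as content.
    has_other = any(
        value
        for key, value in item.items()
        if key not in TERM_KEYS and key != "sitelinks"
    )

    states = set()
    if not (has_terms or has_sitelinks or has_other):
        states.add("empty")
    if has_terms and not (has_sitelinks or has_other):
        states.add("stub")
    if not has_sitelinks:
        states.add("unconnected")
    return states


# The mayor example: label and description only, no sitelinks.
mayor = {
    "labels": {"en": "Mayor of Example"},
    "descriptions": {"en": "politician"},
}
print(sorted(item_states(mayor)))
```

Whether the lists should overlap like this, or be mutually exclusive, is a design decision the thread leaves open.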

It's currently not trivial to get these lists from the database, and I imagine we might get even more involved definitions once we have full support for properties (and maybe things like categories). One solution would be to detect these "states" of the item whenever a new revision is saved, and store them to the page_props table.

Yea, thinking about it, that's probably the way to go. ItemContent (or better, EntityContent) should get a getPageProps() method that would be used to push the appropriate page_props into the ParserOutput object returned by EntityContent::getParserOutput. That should cause these props to be saved in the DB, which makes it easy to construct the respective lists.
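A rough sketch of that flow, using Python and an in-memory SQLite table rather than MediaWiki itself. The column names pp_page, pp_propname and pp_value follow the page_props table; everything else is an assumption for illustration: the function names, the dict layout of an item, and the property names "item-empty" and "item-unconnected" are all hypothetical.

```python
import sqlite3


def get_page_props(item):
    """Hypothetical stand-in for the proposed getPageProps() method."""
    has_terms = any(item.get(key) for key in ("labels", "descriptions", "aliases"))
    has_sitelinks = bool(item.get("sitelinks"))
    props = {}
    if not has_terms and not has_sitelinks:
        props["item-empty"] = "1"        # hypothetical property name
    if not has_sitelinks:
        props["item-unconnected"] = "1"  # hypothetical property name
    return props


def save_revision(db, page_id, item):
    """Replace the stored props for a page, as would happen on each save."""
    db.execute("DELETE FROM page_props WHERE pp_page = ?", (page_id,))
    db.executemany(
        "INSERT INTO page_props (pp_page, pp_propname, pp_value) VALUES (?, ?, ?)",
        [(page_id, name, value) for name, value in get_page_props(item).items()],
    )


def pages_with_prop(db, name):
    """List page ids carrying a given prop; this is what a special page would query."""
    rows = db.execute(
        "SELECT pp_page FROM page_props WHERE pp_propname = ? ORDER BY pp_page",
        (name,),
    )
    return [row[0] for row in rows]


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE page_props (pp_page INTEGER, pp_propname TEXT, pp_value TEXT)")
save_revision(db, 1, {})
save_revision(db, 2, {"labels": {"en": "Mayor of Example"}})
save_revision(db, 3, {"labels": {"en": "Berlin"}, "sitelinks": {"enwiki": "Berlin"}})
print(pages_with_prop(db, "item-empty"))
print(pages_with_prop(db, "item-unconnected"))
```

Because the props are recomputed on every save, a page drops out of a list as soon as a new revision no longer matches the state.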

I have filed bug 40157 for the work required to get the necessary info into page_props, so we have separate tickets for the feature and the underlying mechanism.

@Thomas: want to have a go at 40157?

I wonder if we need a writeup of what different things like "empty", "stub", "unlinked", etc. mean in our context. I guess it's somewhat confusing that an item can be empty and still contain stuff.

(In reply to comment #5)

I wonder if we need a writeup for what different things like "empty", "stub",
"unlinked", etc, means in our context. I guess its somewhat confusing that an
item can be empty and still contain stuff.

I tried to define the relevant things in bug 40157.

But... in my mind, "empty" means *empty*. It's not empty if it contains stuff. But the serialization may still contain empty arrays for labels, etc, so the size of the serialized form is not useful.

Stuff in this context is things like '{}' and '{"labels":{}}'.
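A small illustration of why byte length is a poor proxy here, written in Python. It assumes the serialization is a JSON object whose top-level values are containers; `is_empty` is a hypothetical helper, not the actual isEmpty() implementation.

```python
import json


def is_empty(serialized):
    """True if no top-level field of the serialized item holds any content."""
    data = json.loads(serialized)
    return not any(data.values())


# Both of the first two blobs are "empty", yet their byte lengths differ.
for blob in ('{}', '{"labels":{}}', '{"labels":{"en":"x"}}'):
    print(len(blob.encode("utf-8")), is_empty(blob))
```

A check on the parsed structure gives the same answer for both empty forms, whereas a length threshold would have to guess.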

(In reply to comment #7)

Stuff in this context is things like '{}' and '{"labels":{}}'.

EntityObject::isEmpty() implements the check.

thomas.douillard wrote:

(In reply to comment #4)

I have filed bug 40157 for the work required to get the necessary info into
page_props, so we have separate tickets for the feature and the underlying
mechanism.

@Thomas: want to have a go at 40157?

This bug is the answer to my unasked next question, so yes I will take this one.

(In reply to comment #9)

(In reply to comment #4)

@Thomas: want to have a go at 40157?

This bug is the answer to my unasked next question, so yes I will take this
one.

Excellent, thank you!

If you have any questions, just post them to the bug or ask me directly. Or send mail to wikidata-l.

Lydia_Pintscher removed a subscriber: Unknown Object (MLST).
Jonas renamed this task from Create Special:EmptyItems to [Story] Create Special:EmptyItems. Sep 10 2015, 6:52 PM
Jonas updated the task description.
Jonas set Security to None.

There are already on-wiki checks in place for this. I don't think we should add an additional special page at this point.