
[Epic] Support new types of Entities in Wikibase Repository
Closed, Resolved, Public

Description

This task tracks the issues in the Wikibase Repository that prevent it from working with new types of Entities, and which are therefore blockers for Commons, advanced queries, etc.

The two main issues are:

  1. Bad assumptions about Entities: for instance, that they all have statements, or all have a fingerprint.
  2. OCP (Open/Closed Principle) violations: code that handles entities in a way that requires specific handling for different types, without providing an extension mechanism.

These can typically be found together.

Related emails:

Wikibase DataModel and Wikibase QueryEngine contained several instances of these problems as well. Since those have all been fixed some time ago, there are plenty of examples of how this can be done.

To spot code making one of the described mistakes, you can look for:

  • code type hinting against Entity
  • code using deprecated methods of Entity
  • classes that have a name starting with "Entity" and have checks like "if is item"

See also: T76019: [Story] Support new types of Entities in Wikibase Client

Details

Reference
bz73496

Related Objects

This task is connected to more than 200 other tasks. Only direct parents and subtasks are shown here.
Status    Assigned
Stalled   None
Open      None
Invalid   Lydia_Pintscher
Resolved  Addshore
Resolved  thiemowmde
Resolved  Addshore
Declined  None
Resolved  Addshore
Invalid   daniel
Invalid   daniel
Resolved  daniel
Resolved  daniel
Invalid   daniel
Invalid   daniel
Resolved  daniel
Resolved  Bene
Declined  None
Declined  Bene
Resolved  thiemowmde
Resolved  thiemowmde
Resolved  Bene
Resolved  None
Resolved  daniel
Resolved  adrianheine
Resolved  adrianheine
Resolved  thiemowmde
Resolved  thiemowmde
Resolved  adrianheine
Declined  adrianheine
Resolved  adrianheine
Resolved  adrianheine
Resolved  thiemowmde
Resolved  adrianheine
Resolved  thiemowmde
Resolved  thiemowmde
Declined  None
Invalid   None
Resolved  thiemowmde
Open      None
Invalid   None

Event Timeline

Jonas renamed this task from Support new types of Entities in Wikibase Repository to [Epic] Support new types of Entities in Wikibase Repository. (Sep 10 2015, 4:17 PM)

A brief survey of places that have the available entity types hard-coded:

Using Item::ENTITY_TYPE or Property::ENTITY_TYPE:

  • ItemContent and ItemHandler, resp. PropertyContent and PropertyHandler (no need to change this)
  • WikibaseClient::getEntityFactory, getEntityChangeFactory
  • WikibaseRepo::getEntityFactory, getEntityChangeFactory, getContentModelMappings
  • ValidatorBuilders::buildItemValidators
  • EntityPerPageTable::getItemsWithoutSitelinks (should probably move to SiteLinksTable)
  • EntityConstraintProvider::getUpdateValidators, getCreationValidators
  • TermValidatorFactory::getFingerprintValidator, getLabelValidator
  • EntityDiff::newFromType (move this to ItemHandler etc.)
  • EntityViewFactory::newEntityView
  • ...lots of test cases, which are mostly fine

Using CONTENT_MODEL_WIKIBASE_ITEM or CONTENT_MODEL_WIKIBASE_PROPERTY:

  • ItemContent and ItemHandler, resp. PropertyContent and PropertyHandler (no need to change this)
  • WikibaseRepo::getContentModelMappings
  • RepoHooks::onContentHandlerForModelID
  • Wikibase.php entry point: $wgHooks['FormatAutocomments'][]

Code directly bound to the concrete Item or Property classes:

  • DeserializerFactory and SerializerFactory (needs an extension point in the DataModelSerialization component)
  • SpecialNewItem and SpecialNewProperty (need a way to register more such special pages)
  • EditEntity API module checks instanceof Item
  • RdfBuilder checks instanceof Property
  • SpecialModifyTerm::checkTermChangePermissions checks entity type
  • EntityParserOutputGenerator::getParserOutput does instanceof Item
  • ParserOutputJsConfigBuilder typehints against Entity instead of EntityDocument

All of the above should use some sort of registry of entity types. The extension point for registering entity types could look like the extension point for data types: a global array, with one entry per entity type, mapping to a set of callback functions for performing type-specific tasks. Alternatively, we could try to push all registration logic into EntityHandler subclasses, and use the ContentHandler registration mechanism as the basis for registering entity types.
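A minimal sketch of the first option, using a Python dict as a stand-in for the PHP global array: each entity type maps to a set of callbacks, core code dispatches through the registry instead of checking "if is item", and an extension adds a type by registering it. The registry name, the callback keys ("serializer", "view"), and the "mediainfo" entry with ID "M1" are hypothetical, not actual Wikibase settings.

```python
from typing import Any, Callable

Callbacks = dict[str, Callable[..., Any]]

# Hypothetical registry: one entry per entity type, mapping to callbacks
# for type-specific tasks (mirrors the data type registration idea).
ENTITY_TYPE_DEFINITIONS: dict[str, Callbacks] = {
    "item": {
        "serializer": lambda e: {"type": "item", **e},
        "view": lambda e: f"<item view of {e['id']}>",
    },
    "property": {
        "serializer": lambda e: {"type": "property", **e},
        "view": lambda e: f"<property view of {e['id']}>",
    },
}


def register_entity_type(entity_type: str, callbacks: Callbacks) -> None:
    """Extension point: add a type without touching core code."""
    if entity_type in ENTITY_TYPE_DEFINITIONS:
        raise ValueError(f"entity type already registered: {entity_type}")
    ENTITY_TYPE_DEFINITIONS[entity_type] = callbacks


def new_entity_view(entity_type: str, entity: dict) -> str:
    """Core code looks up the callback instead of branching on the type."""
    try:
        factory = ENTITY_TYPE_DEFINITIONS[entity_type]["view"]
    except KeyError:
        raise ValueError(f"no view registered for entity type: {entity_type}") from None
    return factory(entity)


# An extension registers its own type; new_entity_view() needs no change.
register_entity_type("mediainfo", {"view": lambda e: f"<mediainfo view of {e['id']}>"})
print(new_entity_view("mediainfo", {"id": "M1"}))  # <mediainfo view of M1>
```

The same lookup would replace the hard-coded branches in the survey above (EntityViewFactory::newEntityView, getEntityFactory, and so on), each reading its own callback key. The alternative of pushing this into EntityHandler subclasses trades the flat array for one class per type.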

In addition, we need to change all type hints against Entity to EntityDocument, so we can support entities that e.g. have no aliases, or no statements.
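To illustrate why the narrower type hint matters, here is a Python sketch that uses typing.Protocol as a stand-in for PHP interfaces. EntityDocument only promises an ID; having statements is a separate, optional capability (the StatementsProvider name and the MediaInfo class are hypothetical). A function that assumes every entity has statements breaks on a type without them, while one that checks the capability does not.

```python
from dataclasses import dataclass, field
from typing import Protocol, runtime_checkable


@runtime_checkable
class EntityDocument(Protocol):
    """Minimal contract every entity type fulfils."""

    def get_id(self) -> str: ...


@runtime_checkable
class StatementsProvider(Protocol):
    """Optional capability, checked separately from EntityDocument."""

    def get_statements(self) -> list[str]: ...


@dataclass
class Item:
    entity_id: str
    statements: list[str] = field(default_factory=list)

    def get_id(self) -> str:
        return self.entity_id

    def get_statements(self) -> list[str]:
        return self.statements


@dataclass
class MediaInfo:
    """Hypothetical entity type with no statements and no aliases."""

    entity_id: str

    def get_id(self) -> str:
        return self.entity_id


def count_statements_assuming(entity) -> int:
    # Bad assumption: every entity has statements (AttributeError for MediaInfo).
    return len(entity.get_statements())


def count_statements(entity: EntityDocument) -> int:
    # Accept any EntityDocument, then ask for the capability explicitly.
    if isinstance(entity, StatementsProvider):
        return len(entity.get_statements())
    return 0


print(count_statements(Item("Q1", ["P31"])), count_statements(MediaInfo("M1")))  # 1 0
```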

Schema changes:

  • If we want Entities other than Items to support sitelinks, the schema of wb_items_per_site needs to change from numeric to prefixed ids.
  • If we need support for non-numeric IDs, the schema of wb_terms and wb_entity_per_page needs to change from numeric to prefixed ids.
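The following Python sketch shows the limitation of numeric ID columns, assuming the built-in Q (item) and P (property) prefixes and IDs made of a prefix plus a positive integer without leading zeros. A numeric-only key cannot tell Q42 from P42 unless the entity type is implied by the table or stored in another column, and an ID that is not prefix-plus-number cannot be stored at all; keeping the prefixed serialization avoids both. parse_entity_id is a hypothetical helper, not Wikibase's actual ID parser.

```python
import re

# Assumed prefixes for the two built-in entity types.
PREFIXES = {"Q": "item", "P": "property"}
ID_PATTERN = re.compile(r"([A-Z]+)([1-9][0-9]*)")


def parse_entity_id(serialization: str) -> tuple[str, int]:
    """Split a prefixed ID such as "Q42" into (entity type, numeric part)."""
    match = ID_PATTERN.fullmatch(serialization)
    if not match or match.group(1) not in PREFIXES:
        raise ValueError(f"not a known numeric entity ID: {serialization!r}")
    return PREFIXES[match.group(1)], int(match.group(2))


ids = ["Q42", "P42"]
# A numeric-only column collapses both IDs onto the same key.
numeric_keys = {parse_entity_id(s)[1] for s in ids}
# The prefixed serialization keeps them distinct.
prefixed_keys = set(ids)
print(len(numeric_keys), len(prefixed_keys))  # 1 2
```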

API/dump changes:

  • If we need support for non-numeric IDs, the JSON representation of entity IDs needs to change.

NOTE: this doesn't include the places in the frontend JS code that have entity types hard-coded.

Summary of some points brought up during a discussion among the Wikidata team, with respect to introducing a MediaInfo entity type:

  • MediaInfo should live in a separate extension. The data model and serialization for MediaInfo could be split out into a separate library that doesn't depend on Wikibase or MediaWiki. A separate git repo would however be tricky to manage, especially at the beginning when things are still very unstable. Perhaps it should be a separate component managed in the same git repo.
  • Q: Do we want or need to support filenames as entity IDs, or do we want to manage numeric IDs for internal use? Can we (ab)use the ID of the image description page?
  • Most things that are specific to an entity type are only needed on the repo. These can be managed by factory methods in a subclass of EntityHandler.
  • Entity-type specific things needed on the client:
    • serialization/deserialization
    • messages for localizing edit summaries
  • JS: for each entity type, we need to be able to define additional ResourceLoader modules. Such modules would then register themselves with the appropriate factories/registries in the frontend JS code.
  • MediaInfo should probably not derive from Entity, but implement EntityDocument directly. This requires a lot of type hints to be fixed in Wikibase.
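The idea of managing repo-side, type-specific services through factory methods on EntityHandler subclasses can be sketched in Python with abstract base classes. The method names, the validator rule, and the MediaInfoHandler class are hypothetical; in Wikibase the real EntityHandler is a PHP ContentHandler subclass with a different API.

```python
from abc import ABC, abstractmethod
from typing import Callable

Validator = Callable[[dict], bool]


class EntityHandler(ABC):
    """Hypothetical repo-side handler; one subclass per entity type."""

    @abstractmethod
    def get_entity_type(self) -> str:
        """Return the entity type this handler is responsible for."""

    @abstractmethod
    def new_validators(self) -> list[Validator]:
        """Factory method for type-specific validators."""

    def new_entity_view(self, entity_id: str) -> str:
        # Shared default that subclasses may override.
        return f"<{self.get_entity_type()} view of {entity_id}>"


class ItemHandler(EntityHandler):
    def get_entity_type(self) -> str:
        return "item"

    def new_validators(self) -> list[Validator]:
        # Illustrative rule only: an item must carry a labels mapping.
        return [lambda entity: isinstance(entity.get("labels"), dict)]


class MediaInfoHandler(EntityHandler):
    """Would live in the separate MediaInfo extension."""

    def get_entity_type(self) -> str:
        return "mediainfo"

    def new_validators(self) -> list[Validator]:
        return []  # no type-specific rules in this sketch


# Repo code looks handlers up by type instead of branching on it.
HANDLERS = {h.get_entity_type(): h for h in (ItemHandler(), MediaInfoHandler())}


def validate(entity_type: str, entity: dict) -> bool:
    return all(check(entity) for check in HANDLERS[entity_type].new_validators())
```

Because the abstract methods must be implemented, a new entity type cannot be registered without supplying its own factories, which is the extension mechanism the OCP point at the top of this task asks for.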

I moved my comment to the relevant ticket T125826#2001486.

@thiemowmde If we use the ID of the file description page instead of an auto-increment ID, we have a stable ID and don't have to track renames; the page table does it for us. I'm currently favoring that dirty hack... This will, however, only work once we integrate with the File namespace. The first iteration is planned to be completely standalone.

Addshore claimed this task.
Addshore subscribed.

Going to just mark this as resolved, as we do have multiple "other" entity types now.