
Design and decide how redirects should work
Closed, ResolvedPublic

Description

Redirects within the item namespace, redirects into and from the item namespace, etc. How are they created, why are they needed, etc? Design and specify their usage. May be closely related to bug 38664 (Merge items)


Version: unspecified
Severity: major
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=57744

Details

Reference
bz38962

Event Timeline

bzimport raised the priority of this task to High. Nov 22 2014, 12:52 AM
bzimport set Reference to bz38962.
bzimport added a subscriber: Unknown Object (MLST).

pinkampersand.wikimedia wrote:

I think redirects should be created through two processes:

Through special pages: For things like bug 38664, where a built-in tool will exist to make some sort of change that results in the elimination of an item, where there's another item on the same topic as the now-irrelevant item.

Manually: There should be a 'redirectitem' user right (the assignment of which would be determined by the community, but would probably by default only be given to admins), which would allow a user who holds it to redirect a deleted item to another item. This would be primarily for cases where an item's been merged manually (i.e. all current merges, since there's still no special page for merges, and, presumably, some merges nonetheless even once we have a special page), as well as for cases such as when an item's been deleted after its linked article was deleted, but then the linked article is created and a new item is made for it (see also bug 49100). There have also been some cases where a vandal removes a sitelink from an item, and then a bot just creates a new item for that link. All of these aren't huge issues right now, but are clearly something we're going to need to deal with if we expect third-party sites to want to use our information - it would be very difficult to deal with Q#s suddenly becoming invalid, and their topics being replaced by new numbers.

I'm obviously not commenting on the technical part of this, since I know nothing about that stuff, but I'm just giving a user-experience perspective of the situations where item redirects are necessary, and how they should be implemented.

I agree with everything that has been said in the comment above.

Being able to redirect items would greatly reduce the work of admins having to delete items that have been merged. The user right mentioned above could be assigned to a group of people who have proven they know how to merge items without losing any data.

This feature would make it a lot easier for external sites to use data on Wikidata, since the risk of an item 'moving' from one unique ID to another would be reduced.

Also once this is implemented it would be great to go back through every deleted item and try to 'redirect' as many as possible to their new locations. Many deletions currently include a link to the new / merged item in the summary. I will try to write a script to do some analysis of this.

My speedy analysis shows approximately 218,060 items that are deleted and that could more than likely be redirected to another item.

These are simply items that reference another item in their deletion summary.

As far as I know this is 218,060 out of 532,144, which is about 41%.
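For illustration, an analysis of this kind could look roughly like the sketch below. It is not the script actually used: it assumes the deletion summaries are already available as plain strings, the sample data and the name `find_redirect_candidates` are made up, and it simply treats any deleted item whose summary names a different Q-ID as a redirect candidate.

```python
import re

# Matches item IDs such as Q42; the word boundaries keep "Q42abc" from matching.
ITEM_ID = re.compile(r"\bQ[1-9]\d*\b")


def find_redirect_candidates(deletions):
    """Map each deleted item ID to the first *other* item ID named in its summary."""
    candidates = {}
    for deleted_id, summary in deletions.items():
        for mentioned in ITEM_ID.findall(summary):
            if mentioned != deleted_id:
                candidates[deleted_id] = mentioned
                break
    return candidates


# Hypothetical sample data: deleted item ID -> deletion summary.
deletions = {
    "Q100": "Merged with [[Q200]]",
    "Q101": "Empty item",
    "Q102": "Q102 duplicates Q300",
}

candidates = find_redirect_candidates(deletions)
share = len(candidates) / len(deletions)
print(candidates, f"{share:.0%}")
```

The same ratio applied to the figures above gives 218,060 / 532,144, or roughly 41%.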

What happens when you are on a redirect page and edit?
What happens when you try to EditEntity a redirect through the API?
What do history and diffs of redirects look like?
Undo on redirects?
How to display redirected entities in the UI?
How should ChangeOps deal with redirects?

Just jotting a few comments / ideas on the above down so I don't forget :)

What happens when you are on a redirect page and edit?

  • If you are on a redirect page and edit, you should have the ability to edit the redirect. But when visiting the redirect in the first place you would be redirected to the target item; generally you would only find yourself on the actual redirect page because you want to be there and edit it.

What happens when you try to EditEntity a redirect through the API?

  • This is an interesting one :/ I almost feel like saying you can't. Imagine a bot gets a list of items and works on them throughout a day. During the day one of the items gets redirected by another user. When the bot comes around to make its edit, if it had the ability to edit, it may simply overwrite the page, which may not be the right thing to do. Perhaps there could be a parameter to not follow redirects: by default they would be followed, but if you really want to, you can still edit the entity from the API (there may be a case where this is needed).

What do history and diffs of redirects look like?

  • History simply shows the changes made to the item up to the point of redirect and then the redirecting edit.
  • Diff would show all content in the entity on the left before the redirect and on the right something to signify the redirect (maybe #Redirect [[Q12345]]) if we want to stay along the lines of MW

Undo on redirects?

  • Naturally there should be an undo button in case mistakes are made. Potentially this could be something limited via permissions.

How to display redirected entities in the UI?

  • Pretty much the same as in core, a simple (Redirected from Q12345). In core this is just below the title of an article. It could go in the same place in Wikidata (below the label), or it could go just above; the benefit of the latter would be that the content does not shift slightly compared with a non-redirected item.

(In reply to comment #5)

What happens when you try to EditEntity a redirect through the API?

  • This is an interesting one :/ I almost feel like saying you can't. Imagine a bot gets a list of items and works on them throughout a day. During the day one of the items gets redirected by another user. When the bot comes around to make its edit, if it had the ability to edit, it may simply overwrite the page, which may not be the right thing to do. Perhaps there could be a parameter to not follow redirects: by default they would be followed, but if you really want to, you can still edit the entity from the API (there may be a case where this is needed).

In the standard API you can use the &redirects parameter (https://www.mediawiki.org/wiki/API:Query#Resolving_redirects). Something like that would work.
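A minimal sketch of that idea, assuming a toy in-memory store rather than the real API: `STORE`, `resolve`, `edit_label` and the `follow_redirects` flag are all hypothetical names. Redirects are followed by default and reported back to the caller, so a bot can notice that its edit landed on a different item. What should happen when a client opts out and the page is a redirect is still open in this thread; the sketch simply refuses.

```python
# Hypothetical in-memory store: each entry is either entity data or a redirect.
STORE = {
    "Q1234": {"redirect": "Q2345"},
    "Q2345": {"labels": {"en": "Example"}},
}


def resolve(entity_id, follow_redirects=True, max_hops=5):
    """Return (resolved_id, list of redirect IDs followed on the way)."""
    followed = []
    current = entity_id
    while follow_redirects and "redirect" in STORE[current]:
        if len(followed) >= max_hops:
            raise RuntimeError("too many redirects")  # guards against loops
        followed.append(current)
        current = STORE[current]["redirect"]
    return current, followed


def edit_label(entity_id, lang, text, follow_redirects=True):
    """Set a label, following redirects unless the caller opts out."""
    target, followed = resolve(entity_id, follow_redirects)
    entry = STORE[target]
    if "redirect" in entry:
        # Only reachable with follow_redirects=False. This sketch refuses
        # instead of silently overwriting the redirect (an assumption).
        raise ValueError(f"{target} is a redirect; refusing to overwrite it")
    entry["labels"][lang] = text
    return {"id": target, "redirected_from": followed}


result = edit_label("Q1234", "de", "Beispiel")
print(result)
```

Reporting the followed redirects in the response mirrors what the core API does for page titles when &redirects is set.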

pinkampersand.wikimedia wrote:

One technical note: On your standard MW redirect, you're still "on" the redirected page, instead of on the target. I.e., if you go to [[Foo]], you're viewing the content of [[Foobar]], but the URL still reads http://en.wikipedia.org/wiki/Foo. It occurs to me that this behavior could be rather inconvenient with Wikibase: Redirects will, inherently, be somewhat less stable than normal Q#s (since someone can always change a redirect's target), so we want to incentivize people to use the "right" Q#; also, the URL is normally the only place a non-bot user can figure out an item's Q# without having to click somewhere else. So, once a redirect system actually gets built, I think it would be best if the URL actually redirected, like with a special page. (I don't know how technically feasible this is, of course.) And then we could have, like, a &redirectfrom= URL parameter, which would generate that familiar "redirected from Q000" text.
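To make the URL idea concrete, here is a small sketch under stated assumptions: the base URL and both function names are invented for the example, and only the `redirectfrom` parameter name comes from the suggestion above. The first function builds the location a real HTTP redirect would point to; the second derives the notice from the parameter on the target page.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "https://www.wikidata.org/wiki/"  # assumed URL layout


def redirect_location(requested_id, target_id):
    """URL to send the browser to when requested_id redirects to target_id."""
    query = urlencode({"redirectfrom": requested_id})
    return f"{BASE}{target_id}?{query}"


def redirected_from_notice(url):
    """Text for the 'redirected from' note, or '' if the parameter is absent."""
    params = parse_qs(urlsplit(url).query)
    origin = params.get("redirectfrom")
    return f"(Redirected from {origin[0]})" if origin else ""


location = redirect_location("Q1234", "Q2345")
print(location)
print(redirected_from_notice(location))
```

With this approach the address bar always shows the target's Q#, while the notice still tells the reader which ID they originally asked for.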

Oh, also...

(In reply to comment #5)

  • Diff would show all content in the entity on the left before the redirect and on the right something to signify the redirect (maybe #Redirect [[Q12345]]) if we want to stay along the lines of MW

Ewww, no #REDIRECT. Wikibase already takes advantage of ContentHandler's having liberated us from unnecessary use of MediaWiki markup. I'm sure some of the devs can make us some pretty new Wikibase-type redirect content model. For diff view, it could just be "redirect-target: Q000" or something like that.

(That's assuming we even want to have the pre-redirect history shown... seeing as the redirect's target should have all the data the redirect had, we *could* have it that either only deleted items can be redirected, or that redirecting an item automatically deletes its old history.)

I have been using the 'Merge' tool and even with our current (fairly low) level of activity I often get a message telling me the tool could not create a delete message because of an edit conflict.

If the merge tool automatically created a redirect then these edit conflicts would not happen.

If the conversion to a redirect could be reversed easily then it would not be necessary to limit access to the revert tool to special approved persons and it could be made available to anyone using the merge tool.

Suggested workflow to revert a redirect:

  • Go to the redirect destination page.
  • Click "What links here".
  • Select the link to the redirect page.
  • Revert the redirect (via the history page or a special tool or whatever code magic you come up with).

Change 89810 had a related patch set uploaded by Daniel Kinzler:
Allow ItemContent to represent a redirect.

https://gerrit.wikimedia.org/r/89810

Redirects will be supported on the level of EntityContent, but not Entity. That is, they exist on the level of MediaWiki, not on the level of the Wikibase data model.

(In reply to comment #10)

If you say "on the level of MediaWiki" I guess you mean on the level of WikibaseRepo extension, right? Or do you mean Wikibase should handle redirects with the MediaWiki built-in feature for them?

@Bene: I mean Wikibase should handle redirects with the MediaWiki built-in feature for them. That sort of thing is exactly why the Content interface exists.

But I'm unsure how to go about it. I just sent a mail to wikidata-tech, asking for feedback. Here's the mail:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Hello all!

Once more, I'm taking a whack at implementing support for entity redirects -
that is, we want Q1234 to become a redirect to Q2345 after we merged Q1234 into
Q2345.

My question is how to best model this.

First off, a quick primer of the PHP classes involved:

  • We have the Entity base class, with Item and Property deriving from it, for
modeling entities.

  • As "glue" for treating entities as MediaWiki page content, we have the
EntityContent base class, and ItemContent and PropertyContent deriving from that
(along with their handler/factory classes, EntityHandler, ItemHandler and
PropertyHandler).

Presently, each type of Entity is assigned a MediaWiki namespace, and all pages
in that namespace have the corresponding content type.

Now, there are various ways to model redirects. I have tried a few:

  1. first I tried implementing redirects as a "mode" any Entity may have. The
advantage is that redirects can seamlessly occur anywhere "real" entities can
occur, e.g. in dumps, diffs, undo operations, etc. However, many operations that
are defined on "real" entities are not defined on redirects (e.g. "set label",
etc), so we would end up with a lot of "if redirect, do this, else, do that"
checks all over the code base.

  2. second I tried implementing redirects as a "mode" any EntityContent may have,
following the idea that redirects are really a MediaWiki concept, and should be
implemented outside the Wikibase data model. This still requires a lot of "if
redirect, do this, else, do that" checks, but only in places dealing with
EntityContent, not all places dealing with Entities.

  3. third I tried using a separate entity type for representing redirects: a
RedirectContent points to an EntityContent, there is no "mode", no need for
extra checks, no chance for confusion. However, we break a few basic
assumptions, most importantly the assumption that all pages in an entity
namespace contain an EntityContent of the respective type.

None of these solutions seem satisfactory for the indicated reasons, and also
some additional considerations explained below.

  4. This led me to come full circle and consider another option: make redirects
a special *type* of Entity, besides Item and Property. This would again allow
straightforward diffs (and thus undo-operations) between the redirected and the
previous, un-redirected version of a page; Also, it would not compromise code
that operates on Item or Property objects. But since there are still many
operations defined for Entity that make no sense for redirects, we would need to
insert another class into the hierarchy (let's call it LabeledEntity) which
would be a base class for Item and Property, but not for Redirect.

This would require quite a bit of refactoring, and would introduce redirects as
a "first-class citizen" into the Wikibase data model.

Which of the four options do you prefer? Or can you think of another, better,
option?
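As a rough illustration of the hierarchy the fourth option implies, the sketch below uses Python stand-ins for the PHP classes. Only the class names come from the text above; the constructors and the `set_label` method are assumptions made for the example.

```python
class Entity:
    """Anything that has an entity ID, including redirects."""

    def __init__(self, entity_id):
        self.id = entity_id


class LabeledEntity(Entity):
    """Intermediate base class for entities that carry labels."""

    def __init__(self, entity_id):
        super().__init__(entity_id)
        self.labels = {}

    def set_label(self, lang, text):
        self.labels[lang] = text


class Item(LabeledEntity):
    pass


class Property(LabeledEntity):
    pass


class Redirect(Entity):
    """Has an ID and a target, but none of the label operations."""

    def __init__(self, entity_id, target_id):
        super().__init__(entity_id)
        self.target_id = target_id


item = Item("Q2345")
item.set_label("en", "Example")
redirect = Redirect("Q1234", "Q2345")
print(hasattr(redirect, "set_label"))
```

Code written against Item or Property keeps working unchanged, while code written against Entity has to cope with objects that have no labels.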

Below are some more points to consider when deciding on a design:

In addition to the OO design questions outlined above, one important question
is how to handle redirects in the wb_entity_per_page table; that table contains
a 1:1 mapping of entity_id <-> page_id. It is used, among other things, for
listing all entities and for checking whether an entity exists. How should
redirects be represented in that table? They could be:

a) omitted (making it hard to list redirects),
b) or use the redirect's page ID (maintaining the 1:1 nature, but pointing to a
page that actually does not contain an entity),
c) or use the target entity's page ID (breaking the 1:1 nature of the table, but
associating the ID with the accurate place).

For b) and c), a new column would be needed to mark redirects as such.
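A sketch of variant b), using SQLite purely for illustration. The table and column names are simplified stand-ins, not the real wb_entity_per_page schema, and the nullable `redirect_target` column is an assumption that doubles as the "is a redirect" marker.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE entity_per_page (
        entity_id       TEXT PRIMARY KEY,
        page_id         INTEGER NOT NULL UNIQUE,  -- keeps the mapping 1:1
        redirect_target TEXT                      -- NULL for real entities
    )
    """
)
conn.executemany(
    "INSERT INTO entity_per_page VALUES (?, ?, ?)",
    [("Q2345", 10, None), ("Q1234", 11, "Q2345")],
)


def list_entities(conn):
    """IDs of pages that actually contain an entity."""
    rows = conn.execute(
        "SELECT entity_id FROM entity_per_page "
        "WHERE redirect_target IS NULL ORDER BY entity_id"
    )
    return [row[0] for row in rows]


def list_redirects(conn):
    """Mapping of redirect ID -> target ID."""
    rows = conn.execute(
        "SELECT entity_id, redirect_target FROM entity_per_page "
        "WHERE redirect_target IS NOT NULL"
    )
    return dict(rows.fetchall())


print(list_entities(conn), list_redirects(conn))
```

Every query that lists or checks "real" entities then needs the extra NULL filter, which is the cost of keeping redirects in the same table.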

When deciding on how to represent redirects internally, we need to consider how
they should be handled externally. Of course, when asking for an entity that got
redirected, we should get the target entity's content, be it using the "regular"
HTML page interface, the API, the linked data interface, etc.

In RDF dumps, we would probably want redirects to show up as owl:sameAs
relations (or something similar that allows one URI to be indicated as the
primary one). Should redirects also be present in JSON dumps? If so, should it
also be possible to retrieve the JSON representation of entities from the API,
in places where usually "real" entities would be?
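For the RDF case, a redirect could be written as a single triple. The sketch below emits N-Triples lines; the entity URI prefix and the function name are assumptions for the example, while the owl:sameAs URI is the standard one.

```python
ENTITY_BASE = "http://www.wikidata.org/entity/"  # assumed URI prefix
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def redirect_triples(redirects):
    """Render a {redirect_id: target_id} mapping as N-Triples lines."""
    lines = []
    for source, target in sorted(redirects.items()):
        lines.append(
            f"<{ENTITY_BASE}{source}> <{OWL_SAME_AS}> <{ENTITY_BASE}{target}> ."
        )
    return "\n".join(lines)


triples = redirect_triples({"Q1234": "Q2345"})
print(triples)
```

Note that owl:sameAs is symmetric, so on its own it does not say which of the two URIs is the primary one.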

-- daniel

Since the decision was made (according to the document linked in comment #13) to implement option 2, I am closing this bug.
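For readers who want a picture of what option 2 amounts to, here is a minimal sketch. It uses Python stand-ins for the PHP classes, the method names are assumptions, and it is not the code from the Gerrit change. The content object holds either an item or a redirect target, and the "if redirect" checks stay inside the content layer.

```python
class Item:
    """Stand-in for the data-model class; it knows nothing about redirects."""

    def __init__(self, item_id):
        self.id = item_id
        self.labels = {}


class ItemContent:
    """Page content that is either a real item or a redirect, never both."""

    def __init__(self, item=None, redirect_target=None):
        if (item is None) == (redirect_target is None):
            raise ValueError("give exactly one of item or redirect_target")
        self._item = item
        self._redirect_target = redirect_target

    def is_redirect(self):
        return self._redirect_target is not None

    def get_redirect_target(self):
        return self._redirect_target

    def get_item(self):
        if self.is_redirect():
            raise LookupError("redirect content holds no item")
        return self._item


real = ItemContent(item=Item("Q2345"))
moved = ItemContent(redirect_target="Q2345")
print(real.is_redirect(), moved.is_redirect())
```

Item itself stays free of any redirect notion, which matches the statement above that redirects live on the level of EntityContent, not Entity.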