Page MenuHomePhabricator

Add rd_interwiki colum to redirect table
Closed, ResolvedPublic

Description

While trying to fix 9237, it turns out that interwiki redirects are stored in the database without their interwiki prefix. While 9237 has a patch to remove interwiki redirects entirely, it would make more sense to keep them. Looking at the database, it appears the prefix is stripped entirely from the title.

After adding a redirect from [[Test]] to [[fr:Platypus]] on my local install, the query return this:
Rd_from = 58
Rd_namespace = 0
Rd_title = Platypus

Adding this column (and making sure it's populated correctly, I'm not 100% sure where this is done), allows bug 9237 to be fixed very easily and also opens the door to something such as [[Special:InterwikiRedirects]].


Version: unspecified
Severity: enhancement

Details

Reference
bz14418

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 21 2014, 10:08 PM
bzimport set Reference to bz14418.
bzimport added a subscriber: Unknown Object (MLST).

Seems like I'm not the only one who wants this. In my view, the redirect table should cover *all types* of redirects, so we can actually use it to redirect people, rather than parsing the article text like we do now.

Created attachment 5076
Basic implementation and population of rd_interwiki column

In this patch, I have implemented the rd_interwiki column for the redirect table. Changes to Article class to make sure it is set/updated as needed. Nothing makes use of this yet, but it's working properly.

Could probably use a maintenance script to back-fill all current redirects.

attachment rd_iw.patch ignored as obsolete

Patch looks good, but what's the whole "username of blocker" and "global blocking" thing about? I don't see how that ties in to rd_interwiki. Also, the comment in question doesn't seem to describe (at all) what the rd_interwiki field is about. Futhermore, why do you declare it as CHAR (32) rather than VARCHAR (32)?

(In reply to comment #3)

Patch looks good, but what's the whole "username of blocker" and "global
blocking" thing about? I don't see how that ties in to rd_interwiki. Also, the
comment in question doesn't seem to describe (at all) what the rd_interwiki
field is about. Futhermore, why do you declare it as CHAR (32) rather than
VARCHAR (32)?

The blocking comment was an oversight from a copy+paste of a different sql patch. Sorry about that.

I chose char(32) instead of varchar(32) as it's what the interwiki table uses for the iw_prefix column.

I'd recommend using the varchar form; we should update the interwiki table if it's still using char.

My inclination is also to let the field be NULL for non-interwikis, rather than forcing it to empty string.

Would it be desirable to also include a field for target fragments? (eg [[Dest#section]]) Currently to get the total destination link you'd have to load up the source text and parse it.

Created attachment 5111
Updated patch

Updated patch to incorporate feedback:

  1. Using varchar instead of char
  2. Use null as default instead of empty string
  3. Remove unrelated comment from sql patch
  4. Add rd_fragment column to populate with section links

Redirecting [[Test]] to [[wikipedia:Platypus#Virus]] then querying the redirect table returns the following:

rd_from = 2
rd_namespace = 0
rd_title = Platypus
rd_interwiki = wikipedia
rd_fragment = Virus

Changing values for the iw prefix, target page, fragment, et all update columns as expected.

attachment new.patch ignored as obsolete

Looking good. With rd_interwiki *and* rd_fragment, we can finally use the redirect table to resolve redirects rather than parsing the page contents. I attempted to do this in my fix fro bug 10931, but that failed for interwiki and fragment redirects.

Created attachment 5125
Proposed patch

Expanded on Demon's patch. Basically, it automatically sets rd_interwiki and rd_fragment for redirects that don't have them yet when they're viewed, and (at last!) implements redirect resolution using the redirect table rather than the page text. I've tested it and it works for me (even fragment+interwiki redirects).

In detail:

  • Get and insert/update rd_interwiki and rd_fragment where needed
  • Change Article::getRedirectTarget() to cover all cases and to update the row if rd_interwiki or rd_fragment is NULL (i.e. if it's a leftover from the old schema without those fields)
  • Rewrite Article::followRedirect() to incorporate most of the code from followRedirectText() (which has undergone a minor cleanliness rewrite) and to use getRedirectTarget()
  • Remove followRedirectText() as it's unneeded now
  • Add $interwiki parameter to Title::makeTitle(), makeTitleSafe() and makeName()
  • Document and update Title::getRedirectsHere()

attachment patch-14418.patch ignored as obsolete

Looks good to me. I was also doing some testing last night (already spoke to Roan about this), and the refreshLinks maintenance script will work just fine for back-filling any non-existent redirect fields. Eventually the code goes through Article::updateRedirectOn(), which is part of the overall updates here, so everything should populate as needed.

How about committing something and closing the issue?

Waiting for Brion to coordinate the schema changes, afaik.

Created attachment 5828
Update previous patch

Same patch, works exactly as intended. Updated to apply cleanly to HEAD.

Attached:

Put both field additions into the same patch. They can share an add_field call.

Change the rd_ns_title index, it will need to have rd_interwiki at the start of it.

  • array('rd_namespace', 'rd_title'),

+ array( 'rd_namespace', 'rd_title', 'rd_interwiki', 'rd_fragment' ),

Use "*", it's shorter

  • 'rd_title' => $retval->getDBKey()

+ 'rd_title' => $retval->getDBKey(),
+ 'rd_title' => $retval->getDBKey(),

You probably only need one of these.

	public function insertRedirect() {
  • $retval = Title::newFromRedirect( $this->getContent() );

+ $retval = Title::newFromRedirectRecurse( $this->getContent() );

This appears to be wrong and broken. There are two choices: have the redirect table point to the final destination and use a separate tracking table to update the table when the intermediate redirects change, or have the redirect table point to the immediate destination and follow the chain on fetch. This appears to be like the first option, except missing the complex update logic.

This was done in r69181, r51583.