
clean up null filenames from database by update.php
Closed, ResolvedPublic

Description

In SpecialWantedpages.php:

$tsafe = htmlspecialchars( $result->title );
return "Invalid title in result set; {$tsafe}";

  1. make sure all lines like these are translated.
  2. print out some mention of what should be done. E.g., for the above URI one just sees:

       1. 分類:新光三越(1) ‎(1 link)
       2. 分類:玩具反斗城 ‎(1 link)
       3. Invalid title in result set;
       4. 分類:仁化村 ‎(1 link)
       5. 分類:交通隊 ‎(1 link)

And the user doesn't know how to proceed to alleviate the problem.
Perhaps the expectation was that some partial name clue would always be printed.


Version: 1.15.x
Severity: major

Details

Reference
bz17751


Event Timeline

bzimport raised the priority of this task from to Medium.Nov 21 2014, 10:31 PM
bzimport set Reference to bz17751.
bzimport added a subscriber: Unknown Object (MLST).

It looks like in this case the title contains some invisible characters, which are, well, hard to show.

Created attachment 5888
still must make multilingual

OK, this patch will improve the feedback to the user.

However, it is still English only.

And in my case it revealed that the title is just ""! And

return "-".strlen($result->title) ."-";

just gives "-0-".

How could such an item get into the results? Is the problem with
SpecialWantedpages.php or does it indicate a deeper problem...? I
looked at SpecialWantedpages.php but it is too complex for me.


Localized the message in r48061.
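
(For illustration, the localized form presumably looks roughly like the following sketch. The message key 'wantedpages-badtitle' is assumed here for the example and may not match what r48061 actually introduced.)

// Illustrative sketch of a localized version of the hardcoded string above;
// 'wantedpages-badtitle' is an assumed message key, not confirmed from r48061.
$tsafe = htmlspecialchars( $result->title );
return wfMsgHtml( 'wantedpages-badtitle', $tsafe );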

Ideally, an invalid title should have never made it into the database in the first place, as it should be validated prior to editing.
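
(As a rough sketch of that validation idea, and not the actual MediaWiki code path here: Title::newFromText() returns null for a title it cannot parse, so a caller can filter out empty or malformed titles before they are used.)

// Illustrative sketch only: reject unparsable titles early.
// Title::newFromText() returns null for an empty or otherwise invalid title string.
$title = Title::newFromText( $result->title );
if ( $title === null ) {
	// Invalid title: skip this row instead of producing broken output.
	return false;
}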

OK, found the BIG BUG. Big because it corrupts the database, in my eyes.

You know internal anchor links, like #Bla, #中文?

Well, the former is innocuous. However, it turns out the latter,
because it is Chinese, causes an (empty!) entry to be made in
wiki_pagelinks!

$ echo "SELECT * FROM wiki_pagelinks WHERE pl_title = '';"|mysql mydb
pl_from pl_namespace pl_title
22 0

(So that's why Wantedpages gets junk!)
(Happens at least on my $wgLanguageCode='zh-tw' wikis. Didn't test 'en').

(In reply to comment #4)

OK, found the BIG BUG. Big because it corrupts the database, in my eyes.

You know internal anchor links, like #Bla, #中文?

Well, the former is innocuous. However, it turns out the latter,
because it is Chinese, causes an (empty!) entry to be made in
wiki_pagelinks!

$ echo "SELECT * FROM wiki_pagelinks WHERE pl_title = '';"|mysql mydb
pl_from pl_namespace pl_title
22 0

Doesn't that make this bug a duplicate of bug 17713?

*** This bug has been marked as a duplicate of bug 17713 ***

The problem now is that these null entries are still left in the database even after runs of update.php.

So even though no new null entries will be added, the ones already in the database remain, and bug reports will keep coming in from users who see the weird side effects.

Therefore update.php needs to clean them up.

I.e., without cleaning them up via update.php, each page with the null entries must be "null edited" by hand to correct the problem...
But of course the problem is not with the page at all.
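
(A cleanup step in update.php could be as small as the following sketch. This is illustrative only and not an actual patch; the function name is made up. It deletes the empty pagelinks rows through the MediaWiki database layer, which adds the wiki_ table prefix automatically.)

// Hypothetical cleanup step (illustrative sketch, not an actual update.php patch):
// remove the empty pagelinks rows left behind by the anchor-link bug.
function cleanupEmptyPagelinks() {
	$dbw = wfGetDB( DB_MASTER );
	$dbw->delete(
		'pagelinks',
		array( 'pl_namespace' => 0, 'pl_title' => '' ),
		__METHOD__
	);
}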

This is happening at WikiFur with our Spanish-language version. As I recall, the null entries went away when we removed the Cite extension (or didn't use references). Previously this resulted in a crash; now it just gives this message.

As a workaround, having the administrator manually run maintenance/refreshLinks.php apparently fixes them, though the script's output doesn't indicate as much.

You can remove them from the database by running the SQL query DELETE FROM pagelinks WHERE pl_namespace=0 AND pl_title=''; using maintenance/sql.php, but they'll also disappear over time as the wiki is edited. For these reasons I'm not sure this is worth an update script.

(In reply to comment #11)

You can remove them from the database by running the SQL query DELETE FROM
pagelinks WHERE pl_namespace=0 AND pl_title=''; using maintenance/sql.php , but
they'll also disappear over time as the wiki is being edited. For these reasons
I'm not sure this is worth an update script.

I have done this cleanup on all Wikimedia wikis.