Page MenuHomePhabricator

ipblocks.ipb_address has inconsistent format
Closed, ResolvedPublic

Description

Usernames of block users are stored in ipblocks.ipb_address. This field generally uses spaces in place of underscores. However many entries are inexplicably using underscores. This is screwing up the API.

Example:

https://en.wikipedia.org/w/api.php?action=query&list=blocks&bkusers=Mistress+Selina+Kyle

No results are currently found.

From the Toolserver's enwiki_p, however:

mysql> select * from ipblocks where ipb_address ='Mistress_Selina_Kyle' limit 1\G

  • 1. row ******* ipb_id: 175820 ipb_address: Mistress_Selina_Kyle ipb_user: 697854 ipb_by: 129409 ipb_reason: restoring block as this was not discussed; this user has been extremely disruptive and was never a useful contributor ipb_timestamp: 20060528233006 ipb_auto: 0 ipb_anon_only: 0 ipb_create_account: 1 ipb_expiry: infinity ipb_range_start: ipb_range_end:

ipb_enable_autoblock: 1

       ipb_deleted: 0
   ipb_block_email: 0
       ipb_by_text: SlimVirgin
ipb_allow_usertalk: 1

1 row in set (0.10 sec)

Similarly, https://en.wikipedia.org/w/api.php?action=query&list=blocks&bkusers=MilkMan_Chocolate only references one block entry. The database actually has two:

mysql> select * from ipblocks where ipb_address ='MilkMan_Chocolate' limit 1\G***** 1. row *****

            ipb_id: 36301
       ipb_address: MilkMan_Chocolate
          ipb_user: 416818
            ipb_by: 61329
        ipb_reason: Vandal
     ipb_timestamp: 20050904050301
          ipb_auto: 0
     ipb_anon_only: 0
ipb_create_account: 1
        ipb_expiry: infinity
   ipb_range_start: 
     ipb_range_end:

ipb_enable_autoblock: 1

       ipb_deleted: 0
   ipb_block_email: 0
       ipb_by_text: Rdsmith4
ipb_allow_usertalk: 1

1 row in set (0.01 sec)

mysql> select * from ipblocks where ipb_address ='MilkMan Chocolate' limit 1\G

  • 1. row ******* ipb_id: 36680 ipb_address: MilkMan Chocolate ipb_user: 416818 ipb_by: 58666 ipb_reason: sockpuppet, evading block ipb_timestamp: 20050906061037 ipb_auto: 0 ipb_anon_only: 0 ipb_create_account: 1 ipb_expiry: infinity ipb_range_start: ipb_range_end:

ipb_enable_autoblock: 1

       ipb_deleted: 0
   ipb_block_email: 0
       ipb_by_text: MarkSweep
ipb_allow_usertalk: 1

1 row in set (0.01 sec)


Version: unspecified
Severity: normal

Details

Reference
bz34014

Related Objects

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 22 2014, 12:08 AM
bzimport set Reference to bz34014.
bzimport added a subscriber: Unknown Object (MLST).

A bit more data. It looks like this phenomenon began in 2004 and ended in 2006:

mysql> select ipb_address, ipb_timestamp from ipblocks where ipb_address like '%\_%' order by ipb_timestamp asc limit 1;
+--------------------+----------------+

ipb_addressipb_timestamp

+--------------------+----------------+

Live_and_let_Troll20041025021840

+--------------------+----------------+
1 row in set (0.02 sec)

mysql> select ipb_address, ipb_timestamp from ipblocks where ipb_address like '%\_%' order by ipb_timestamp desc limit 1;
+--------------------------------+----------------+

ipb_addressipb_timestamp

+--------------------------------+----------------+

I_learned_poker_from_wikipedia20060711184521

+--------------------------------+----------------+
1 row in set (1.96 sec)

Sounds like we just need to go through all wikis and update the underscores back to spaces

UPDATE ipblocks SET ipb_address = REPLACE(ipb_address, '_', ' ') WHERE ipb_address LIKE '%\_%';

Wonder if it's worth adding an updater action for this too, or whether it's a WMF site issue

(In reply to comment #2)

Sounds like we just need to go through all wikis and update the underscores
back to spaces

UPDATE ipblocks SET ipb_address = REPLACE(ipb_address, '_', ' ') WHERE
ipb_address LIKE '%\_%';

Wonder if it's worth adding an updater action for this too, or whether it's a
WMF site issue

Well, as I pasted above, some of these (duplicative) block entries apply to the same user. I think you'll need a maintenance script to go through, find instances where ipb_user is not equal to 0 and is non-unique, select the block with the highest ipb_timestamp, delete the older entries, and then fix ipb_address as necessary.

Change 168965 had a related patch set uploaded by Anomie:
Maintenance script to clean up mismatched user names in ipblocks

https://gerrit.wikimedia.org/r/168965

Change 168965 merged by jenkins-bot:
Maintenance script to clean up mismatched user names in ipblocks

https://gerrit.wikimedia.org/r/168965

Maintenance script is created, now we just need to run it.

Change 169231 had a related patch set uploaded by Aaron Schulz:
Maintenance script to clean up mismatched user names in ipblocks

https://gerrit.wikimedia.org/r/169231

Change 169231 merged by jenkins-bot:
Maintenance script to clean up mismatched user names in ipblocks

https://gerrit.wikimedia.org/r/169231