
write script that touches every item and thereby updates database tables
Closed, Invalid (Public)

Description


Version: master
Severity: normal
Whiteboard: u=dev c=backend p=13
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=63230

Details

Reference
bz64600

Event Timeline

bzimport raised the priority of this task to High. Nov 22 2014, 3:15 AM
bzimport set Reference to bz64600.
bzimport added a subscriber: Unknown Object (MLST).

Is this about updating all tables in one run? What purpose does this have (in contrast to the recently merged rebuildItemsPerSite.php)?

Should this really do null edits (which can sometimes show up in the item's history), or only run all of the secondary database updates? I guess secondary storage updates are enough, so the bug title is slightly incorrect.

If we can do just the secondary updates without the null edits, that's what we need and want, IMHO.

Also, I would very much like to be able to specify which secondary updates to run, for cases where we don't want to run all of them.

For example, hitting CirrusSearch with unnecessary updates wouldn't be nice.

This is a pretty generic script, and should support a couple of modes for different use cases. I'm not saying all of these need to be implemented right now, just that we should have them in mind when designing this.

So, for a list of page IDs (optionally, page titles, possibly from a DB query), either:

  • just invalidate the parser cache by calling Title::invalidateCache()
  • re-parse (null-edit)
    • re-parse later (using RefreshLinksJob2 or similar)
  • re-apply data updates (using Content::getSecondaryDataUpdates(), using cached ParserOutput object if possible)
    • would be nice to be able to filter the updates by class name

The different modes should be implemented by separate classes that get called by the main script which iterates over ranges of target pages (strategy pattern).
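A minimal sketch of that layout, written in Python rather than PHP purely for illustration. Every name in it (PageUpdateStrategy, InvalidateCache, SecondaryUpdates, batches, run, and the two "Example..." update names) is hypothetical; a real implementation would call Title::invalidateCache() or Content::getSecondaryDataUpdates() where the comments indicate.

```python
from abc import ABC, abstractmethod
from typing import Iterable, Iterator, List, Optional


class PageUpdateStrategy(ABC):
    """One mode of the script. All names in this sketch are hypothetical."""

    @abstractmethod
    def apply(self, page_id: int) -> str:
        """Process one page and return a short log entry."""


class InvalidateCache(PageUpdateStrategy):
    def apply(self, page_id: int) -> str:
        # Stand-in for calling Title::invalidateCache() on the page.
        return f"invalidate:{page_id}"


class SecondaryUpdates(PageUpdateStrategy):
    # Placeholder update names, not real class names.
    AVAILABLE = ["ExampleLinksUpdate", "ExampleSearchUpdate"]

    def __init__(self, only: Optional[Iterable[str]] = None) -> None:
        # None means "run every update"; otherwise filter by class name.
        self.only = set(only) if only is not None else None

    def apply(self, page_id: int) -> str:
        # Stand-in for running what Content::getSecondaryDataUpdates() returns.
        selected = [u for u in self.AVAILABLE
                    if self.only is None or u in self.only]
        return f"update:{page_id}:{','.join(selected)}"


def batches(page_ids: Iterable[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive batches of at most `size` page IDs."""
    batch: List[int] = []
    for page_id in page_ids:
        batch.append(page_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def run(strategy: PageUpdateStrategy, page_ids: Iterable[int],
        batch_size: int = 100) -> List[str]:
    """Main loop: walk the targets in batches, delegating to the strategy."""
    log: List[str] = []
    for batch in batches(page_ids, batch_size):
        log.extend(strategy.apply(page_id) for page_id in batch)
    return log
```

The main loop never needs to know which mode is active, so adding a mode means adding a class and leaving the iteration code alone.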

On the command line, it would perhaps be nice to use verbs to describe these modes, e.g. call this as "updatePages.php invalidate", "updatePages.php parse-later", "updatePages.php update-secondary", etc.
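To make the verb idea concrete, here is a rough sketch of the command-line dispatch using Python's argparse subcommands (again Python only for illustration, not the language the script would be written in). Only the three verbs quoted above are registered; the page_ids positional argument and the --only flag are assumptions of this sketch, not part of any existing script.

```python
import argparse
from typing import List, Tuple

# Verbs quoted in the comment above; the help texts are paraphrased.
VERBS = {
    "invalidate": "invalidate the parser cache only",
    "parse-later": "queue a re-parse for later",
    "update-secondary": "re-apply secondary data updates",
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="updatePages")
    subparsers = parser.add_subparsers(dest="mode", required=True)
    for verb, help_text in VERBS.items():
        sub = subparsers.add_parser(verb, help=help_text)
        # Hypothetical positional argument: the page IDs to process.
        sub.add_argument("page_ids", nargs="+", type=int)
        if verb == "update-secondary":
            # Hypothetical flag for filtering updates by class name.
            sub.add_argument("--only", action="append", default=[])
    return parser


def parse(argv: List[str]) -> Tuple[str, List[int], List[str]]:
    """Return (mode, page IDs, class-name filter) for the given arguments."""
    args = build_parser().parse_args(argv)
    return args.mode, args.page_ids, getattr(args, "only", [])
```

With this, "updatePages invalidate 1 2 3" would select the invalidate mode for pages 1 to 3, and "updatePages update-secondary 5 --only SomeUpdate" would restrict the run to one update class.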

Change 139768 had a related patch set uploaded by Legoktm:
Add simple script to "touch" pages

https://gerrit.wikimedia.org/r/139768

(In reply to Daniel Kinzler from comment #5)

  • re-parse later (using RefreshLinksJob2 or similar)

RefreshLinksJob2 has been deprecated since 1.23, and I'm not really sure what the preferred way to do it is now. I implemented the three other types, though.

Per hooman: "We had that discussion on IRC: As null edits can cause changes in the serialization they *can* show up in the edit history, thus we don't want them (AFAIR)."

Change 139768 abandoned by Legoktm:
Add simple script to "touch" pages

Reason:
Don't have time to work on this, feel free to restore if the code is useful.

https://gerrit.wikimedia.org/r/139768

Lydia_Pintscher removed a subscriber: Unknown Object (MLST). Dec 1 2014, 2:31 PM
hoo lowered the priority of this task from High to Medium. Mar 26 2015, 4:11 PM

No longer as important as it used to be.

Lydia_Pintscher claimed this task.

We decided not to work on this, given that the SPARQL endpoint seems to be working well and covers most use cases.