Fix all the Wikia stats
Closed, ResolvedPublic

Description

Author: lwchris

Description:
All Wikia stats on http://wikistats.wmflabs.org/largest_html.php have failed to refresh since 2012-04-20. After 631 days, it's about time to finally fix this.

Either use api.php?action=query&meta=siteinfo&siprop=statistics&format=xml
or Special:Statistics?action=raw

Examples:
http://lyrics.wikia.com/api.php?action=query&meta=siteinfo&siprop=statistics&format=xml
http://lyrics.wikia.com/Special:Statistics?action=raw
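A minimal sketch of how the first option could be consumed, assuming the usual MediaWiki XML layout with a single <statistics> element under <query>. The helper names and the sample numbers are made up for illustration; this is not the actual wikistats code.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def siteinfo_url(base):
    """Build the siteinfo statistics URL for a wiki base URL."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "statistics",
        "format": "xml",
    }
    return base.rstrip("/") + "/api.php?" + urlencode(params)


def parse_statistics(xml_text):
    """Return the numeric attributes of <statistics> as a dict of ints."""
    root = ET.fromstring(xml_text)  # root is the <api> element
    node = root.find("./query/statistics")
    if node is None:
        raise ValueError("no <statistics> element in response")
    return {k: int(v) for k, v in node.attrib.items() if v.isdigit()}


# Hypothetical response body; the values are invented, not real LyricWiki data.
SAMPLE = (
    '<api><query><statistics pages="10" articles="4" edits="25" '
    'images="1" users="3" activeusers="2" admins="1" jobs="0" />'
    "</query></api>"
)

stats = parse_statistics(SAMPLE)
print(siteinfo_url("http://lyrics.wikia.com"))
print(stats["articles"], stats["edits"])
```

Fetching the URL itself is left out on purpose, so the sketch makes no requests against anyone's API.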

I think it shouldn't be very difficult to update the script.

Thanks in advance.

LWChris, Admin@LyricWiki


Version: unspecified
Severity: normal

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 2:38 AM
bzimport set Reference to bz59943.

The stats are not broken; they were stopped (as far as I know) because Wikia complained about the number of requests to their API. I suggest you write to community@wikia.com and ask them to comment somewhere (e.g. on this bug) that it's OK to make requests to their API.

yeah, we just stopped them. it seemed too extreme to update them even just once in 24hrs because there are just so many, and we also did not have a good way to sync the list of currently existing wikis. the plan was always to maybe ask Wikia if it's possible to provide all the stats from their DB somehow. no idea if that's easy for them or hard, but it would avoid tens of thousands of requests, one to each single wiki's API

So, after navigating the obscure repo tree, I finally found the file responsible for updates: http://git.wikimedia.org/blob/operations%2Fdebs%2Fwikistats.git/HEAD/usr%2Flib%2Fwikistats%2Fupdate.php

If I understand correctly, the update is run every 24h by a cron job. The simplest change I can think of is:

  1. add a sleep time of 1 second between consecutive requests;
  2. if a table has 1000 wikis or fewer (or is "mediawikis"), update them all;
  3. if a table has more than 1000 wikis, update only 1000-1500 per run, in this way: a) start from those whose last update is oldest, b) first update up to 500 wikis with more than 100 articles, c) then update up to 1000 of the other wikis.

In this way we would update the whole Wikia table in a month (or one year once it's completely filled) but have data 10 or so days old at most for the bigger wikis. And the cron would always run in a reasonable time.
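The selection rule proposed above can be sketched roughly as follows. This is an illustration, not the code in update.php: the Wiki record, the function names and the injected fetch/sleep callables are all hypothetical, and a table of exactly 1000 wikis is treated as the "update them all" case.

```python
import time
from dataclasses import dataclass


@dataclass
class Wiki:
    name: str
    articles: int
    last_update: float  # Unix timestamp of the last successful refresh


def select_batch(wikis, small_table=1000, big_quota=500,
                 other_quota=1000, big_threshold=100, always_full=False):
    """Pick the wikis to refresh in one cron run."""
    # Small tables (or tables flagged like "mediawikis") are refreshed in full.
    if always_full or len(wikis) <= small_table:
        return list(wikis)
    # a) least recently updated wikis come first
    oldest_first = sorted(wikis, key=lambda w: w.last_update)
    # b) up to 500 wikis with more than 100 articles
    big = [w for w in oldest_first if w.articles > big_threshold][:big_quota]
    # c) then up to 1000 of the remaining wikis
    others = [w for w in oldest_first
              if w.articles <= big_threshold][:other_quota]
    return big + others


def run_update(wikis, fetch, delay=1.0, sleep=time.sleep):
    """Refresh the selected batch, pausing between consecutive requests."""
    batch = select_batch(wikis)
    for i, wiki in enumerate(batch):
        if i:
            sleep(delay)  # 1 second between a request and the next one
        fetch(wiki)
    return batch


# Made-up table of 2000 wikis, just to exercise the selection.
table = [Wiki(f"w{i}", articles=i % 300, last_update=float(i))
         for i in range(2000)]
batch = select_batch(table)
print(len(batch), batch[0].name)
```

With at most 1500 requests and a 1 second pause, a run spends about 25 minutes sleeping, plus the time of the requests themselves. Whether that matches the "one month" estimate depends on how many wikis the Wikia table really holds, which the sketch does not assume.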

Change 108670 had a related patch set uploaded by Nemo bis:
[s23.org wikistats] Throttle updates for big farms, keep updating big wikis' stats

https://gerrit.wikimedia.org/r/108670

The patch has been merged, but I'm still seeing outdated statistics.

Yes, as noted in the commit message, the patch doesn't actually enable updates; it just paves the way for them. I suppose it's a hardcoded crontab, unless there's some other repo I missed.

Change 175904 had a related patch set uploaded (by Dzahn):
wikistats: add cron to enabled wikia updates

https://gerrit.wikimedia.org/r/175904

Patch-For-Review

Change 175904 merged by Dzahn:
wikistats: add cron to enable wikia updates

https://gerrit.wikimedia.org/r/175904

the cron jobs are created by puppet and configured in ./modules/wikistats/manifests/updates.pp. see the change above, which i merged. please check again tomorrow or so

hmm.. the cron job should be there but still no updates, i gotta check this

Dzahn mentioned this in Unknown Object (Diffusion Commit). Dec 10 2014, 8:37 PM
Dzahn lowered the priority of this task from Medium to Low. Jan 15 2015, 10:50 PM

actual cause: unknown table. exiting

:p

Change 185357 had a related patch set uploaded (by Dzahn):
wikistats: fix Wikia updating

https://gerrit.wikimedia.org/r/185357

Patch-For-Review

needed this fix:

https://gerrit.wikimedia.org/r/#/c/185357/3

now updates are running (in screen). it will take a while, but it works now

Change 185357 merged by Dzahn:
wikistats: fix Wikia updating

https://gerrit.wikimedia.org/r/185357