Page MenuHomePhabricator

Special:EducationProgram times out when not cached
Closed, DeclinedPublic

Description

I just tried to invoke https://en.wikipedia.org/wiki/Special:EducationProgram - it ended up leading to a timeout error. It looks like after repeated invocation it now returns a cached result. But with just tiny bits of test data it's really not acceptable for a landing page like this to be prohibitively slow on first load.


Version: unspecified
Severity: major

Details

Reference
bz42923

Event Timeline

bzimport raised the priority of this task from to High.Nov 22 2014, 1:01 AM
bzimport set Reference to bz42923.
bzimport added a subscriber: Unknown Object (MLST).

Ouch.

Based on the queries I can see on my local wiki (explain then against enwiki):

mysql:wikiadmin@db63 [enwiki]> SELECT /* ORMTable::rawSelect 192.168.0.18 */ student_id FROM ep_students WHERE student_user_id = '0' LIMIT 1\G
Empty set (0.00 sec)

mysql:wikiadmin@db63 [enwiki]> EXPLAIN SELECT /* EPRoleObject::hasCourse 192.168.0.18 */ course_id FROM ep_courses INNER JOIN ep_users_per_course ON ((upc_course_id=course_id)) WHERE upc_role = '2' AND upc_user_id = '0' LIMIT 1\G

  • 1. row ******* id: 1 select_type: SIMPLE table: ep_courses type: index

possible_keys: PRIMARY

    key: ep_course_instructor_count
key_len: 1
    ref: NULL
   rows: 1
  Extra: Using index
  • 2. row ******* id: 1 select_type: SIMPLE table: ep_users_per_course type: eq_ref

possible_keys: ep_users_per_course,ep_upc_course_id,ep_upc_role

    key: ep_users_per_course
key_len: 9
    ref: const,enwiki.ep_courses.course_id,const
   rows: 1
  Extra: Using index

2 rows in set (0.01 sec)

mysql:wikiadmin@db63 [enwiki]> EXPLAIN SELECT /* EPRoleObject::hasCourse 192.168.0.18 */ course_id FROM ep_courses INNER JOIN ep_users_per_course ON ((upc_course_id=course_id)) WHERE upc_role = '3' AND upc_user_id = '0' LIMIT 1\G

  • 1. row ******* id: 1 select_type: SIMPLE table: ep_courses type: index

possible_keys: PRIMARY

    key: ep_course_instructor_count
key_len: 1
    ref: NULL
   rows: 1
  Extra: Using index
  • 2. row ******* id: 1 select_type: SIMPLE table: ep_users_per_course type: eq_ref

possible_keys: ep_users_per_course,ep_upc_course_id,ep_upc_role

    key: ep_users_per_course
key_len: 9
    ref: const,enwiki.ep_courses.course_id,const
   rows: 1
  Extra: Using index

2 rows in set (0.00 sec)

mysql:wikiadmin@db63 [enwiki]> EXPLAIN SELECT /* ORMTable::count 192.168.0.18 */ COUNT(*) AS rowcount FROM ep_orgs LIMIT 1\G

  • 1. row ******* id: 1 select_type: SIMPLE table: ep_orgs type: index

possible_keys: NULL

    key: ep_org_active
key_len: 1
    ref: NULL
   rows: 1
  Extra: Using index

1 row in set (0.00 sec)

mysql:wikiadmin@db63 [enwiki]> EXPLAIN SELECT /* ORMTable::count 192.168.0.18 */ COUNT(*) AS rowcount FROM ep_courses LIMIT 1\G

  • 1. row ******* id: 1 select_type: SIMPLE table: ep_courses type: index

possible_keys: NULL

    key: ep_course_instructor_count
key_len: 1
    ref: NULL
   rows: 1
  Extra: Using index

1 row in set (0.00 sec)

mysql:wikiadmin@db63 [enwiki]> EXPLAIN SELECT /* ORMTable::count 192.168.0.18 */ COUNT(*) AS rowcount FROM ep_courses WHERE (course_end >= '20121210224746') AND (course_start <= '20121210224746') LIMIT 1\G

  • 1. row ******* id: 1 select_type: SIMPLE table: ep_courses type: range

possible_keys: ep_course_start,ep_course_end

    key: ep_course_start
key_len: 16
    ref: NULL
   rows: 1
  Extra: Using where

1 row in set (0.00 sec)

mysql:wikiadmin@db63 [enwiki]> EXPLAIN SELECT /* SpecialEducationProgram::getRoleCount 192.168.0.18 */ DISTINCT COUNT(upc_user_id) AS rowcount FROM ep_users_per_course WHERE upc_role = '0' LIMIT 1\G

  • 1. row ******* id: 1 select_type: SIMPLE table: ep_users_per_course type: ref

possible_keys: ep_upc_role

    key: ep_upc_role
key_len: 1
    ref: const
   rows: 1
  Extra: Using index

1 row in set (0.00 sec)

mysql:wikiadmin@db63 [enwiki]> EXPLAIN SELECT /* ORMTable::rawSelect 192.168.0.18 */ course_students FROM ep_courses WHERE (course_end >= '20121210224746') AND (course_start <= '20121210224746')\G

  • 1. row ******* id: 1 select_type: SIMPLE table: ep_courses type: range

possible_keys: ep_course_start,ep_course_end

    key: ep_course_start
key_len: 16
    ref: NULL
   rows: 1
  Extra: Using where

1 row in set (0.00 sec)

mysql:wikiadmin@db63 [enwiki]> EXPLAIN SELECT /* SpecialEducationProgram::getRoleCount 192.168.0.18 */ DISTINCT COUNT(upc_user_id) AS rowcount FROM ep_users_per_course WHERE upc_role = '1' LIMIT 1\G

  • 1. row ******* id: 1 select_type: SIMPLE table: ep_users_per_course type: ref

possible_keys: ep_upc_role

    key: ep_upc_role
key_len: 1
    ref: const
   rows: 1
  Extra: Using index

1 row in set (0.00 sec)

mysql:wikiadmin@db63 [enwiki]> EXPLAIN SELECT /* SpecialEducationProgram::getRoleCount 192.168.0.18 */ DISTINCT COUNT(upc_user_id) AS rowcount FROM ep_users_per_course WHERE upc_role = '2' LIMIT 1\G

  • 1. row ******* id: 1 select_type: SIMPLE table: ep_users_per_course type: ref

possible_keys: ep_upc_role

    key: ep_upc_role
key_len: 1
    ref: const
   rows: 2
  Extra: Using index

1 row in set (0.00 sec)

mysql:wikiadmin@db63 [enwiki]> EXPLAIN SELECT /* SpecialEducationProgram::getRoleCount 192.168.0.18 */ DISTINCT COUNT(upc_user_id) AS rowcount FROM ep_users_per_course WHERE upc_role = '3' LIMIT 1\G

  • 1. row ******* id: 1 select_type: SIMPLE table: ep_users_per_course type: ref

possible_keys: ep_upc_role

    key: ep_upc_role
key_len: 1
    ref: const
   rows: 1
  Extra: Using index

1 row in set (0.00 sec)

mysql:wikiadmin@db63 [enwiki]> EXPLAIN SELECT /* ORMTable::rawSelect 192.168.0.18 */ DISTINCT course_term FROM ep_courses\G

  • 1. row ******* id: 1 select_type: SIMPLE table: ep_courses type: index

possible_keys: NULL

    key: ep_course_term
key_len: 257
    ref: NULL
   rows: 1
  Extra: Using index

1 row in set (0.00 sec)

Oh.

That's only for the first table. enwiki has Totals per term and Gender breakdown

I've had user_touched locks, 504s..

mysql:wikiadmin@db63 [enwiki]> EXPLAIN SELECT /* ORMTable::rawSelect Reedy */ course_id,course_org_id,course_name,course_title,course_start,course_end,course_description,course_token,course_students,course_instructors,course_online_ambs,course_campus_ambs,course_field,course_level,course_term,course_lang,course_student_count,course_instructor_count,course_oa_count,course_ca_count,course_touched FROM ep_courses WHERE course_term = 'All of them'\G

  • 1. row ******* id: 1 select_type: SIMPLE table: ep_courses type: ref

possible_keys: ep_course_term

    key: ep_course_term
key_len: 257
    ref: const
   rows: 1
  Extra: Using where

1 row in set (0.01 sec)

mysql:wikiadmin@db63 [enwiki]> EXPLAIN SELECT /* ORMTable::rawSelect Reedy */ DISTINCT article_page_id FROM ep_articles WHERE article_course_id = '1'\G

  • 1. row ******* id: 1 select_type: SIMPLE table: ep_articles type: ref

possible_keys: ep_articles_course_page,ep_articles_course_id

    key: ep_articles_course_page
key_len: 4
    ref: const
   rows: 1
  Extra: Using where; Using temporary

1 row in set (0.01 sec)

^ Using temporary is a starting point

Reedy, can you pinpoint which parts of the queries are causing problems? It's not clear to me. Seems like they should run fine, unless some tables have gotten several orders of magnitude better then I'd expect. Are any of them bigger then 10k rows?

I've had user_touched locks

Is that related to this bug at all? If so, how? Sounds like a separate issue to me.

really not acceptable for a landing page like this to be prohibitively slow on

first load.

Fully agree this is not acceptable and should be fixed. However, it's not a landing page - in fact nothing links to it from the other pages. It's a tool meant for simple analytics, not student usage.

(In reply to comment #6)

Fully agree this is not acceptable and should be fixed. However, it's not a
landing page - in fact nothing links to it from the other pages. It's a tool
meant for simple analytics, not student usage.

I think Erik's objection (and mine certainly) is that one can pretty easily find this page by just snooping through Special:SpecialPages without doing anything...er...special. :-) That means that people are going to stumble into this, potentially having the unintended consequence of taking out a database slave on the cluster for a few seconds in so doing. It'll only be a matter of time before one of the more database-minded opsen/devs make the unilateral decision to disable this extension until this is fixed.

Reedy, can you reply to my questions?

I'm a bit to much in the dark on what the actual issue is to come up with a good fix.

Disabled the page pending resolution of this bug since apparently it's so serious: https://gerrit.wikimedia.org/r/#/c/43022/

Still not clear on how to proceed though :)

How can we move forward on re-implementing the funcationality of that special page?

Just having the top two tables, without the demographic table at the bottom, would be fine for now, if it's the last bit that was causing the trouble.

It could be made AJAXy.. With API a few different calls, it shouldn't time out broken into chunks.

We need to find what's slow and deal with/disable that.

Added a few more specific profiling calls to SpecialEducationProgram in https://gerrit.wikimedia.org/r/46439

It could be made AJAXy..

I discussed this with Frank many months ago and we decided this really is not high on the list priority wise :)

We need to find what's slow and deal with/disable that.

Yeah

Added a few more specific profiling calls

Thnx. Any insights yet?

Just having the top two tables

Any particular reason to get rid of the third one?

Just having the top two tables

Any particular reason to get rid of the third one?

Not really, except that the first two are data that I'd like to be able to check regularly, so if the slow part comes from the third one, I'd be quite happy to see the page available with just that data.

Unassigning myself till someone provides steps to reproduce this or some usable profiling info.

tchay wrote:

Let's table this until Ori gets back from Israel/Amsterdam. :-)

We'll reimplement whatever analytics features are most valuable at some point, but trying to reenable this special page is not on the roadmap.