database consumer could batch inserts (sometimes) [34pts]
Closed, Resolved, Public

Assigned To

None

Authored By

	• Springle
	Jul 3 2014, 3:03 AM

Description

The EventLogging database consumer generates single-row inserts. From a performance perspective (master overhead and slave replication lag), it would be better to batch inserts.

The batches need not be large or of a particular size. The approach could simply be opportunistic grouping of small numbers of rows in the same table when more than one is available.
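A minimal Python sketch of the opportunistic grouping described above. It assumes events arrive on an in-process `queue.Queue` as hypothetical `(table_name, row)` pairs; this is not EventLogging's actual event type or consumer loop, and `take_batches` and `max_batch` are illustrative names. The consumer blocks for one event, then takes only what is already queued, so it never waits for a batch to fill:

```python
import queue
from collections import defaultdict
from typing import Any, Dict, List, Tuple

# Hypothetical event shape: (table_name, row). Not EventLogging's real type.
Event = Tuple[str, Dict[str, Any]]


def take_batches(events: "queue.Queue[Event]", max_batch: int = 100) -> Dict[str, List[Dict[str, Any]]]:
    """Block until one event arrives, then take whatever else is already queued.

    Rows are grouped by table so each group can be written with one INSERT.
    Nothing waits for a batch to fill: a lone event becomes a one-row batch.
    max_batch is a hypothetical cap on how much one pass takes.
    """
    batches: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    table, row = events.get()  # wait for at least one event
    batches[table].append(row)
    taken = 1
    while taken < max_batch:
        try:
            table, row = events.get_nowait()  # opportunistic: never block here
        except queue.Empty:
            break
        batches[table].append(row)
        taken += 1
    return dict(batches)


if __name__ == "__main__":
    q = queue.Queue()
    q.put(("NavigationTiming_1", {"id": 1}))  # hypothetical table names
    q.put(("Edit_2", {"id": 2}))
    q.put(("NavigationTiming_1", {"id": 3}))
    print(take_batches(q))
    # → {'NavigationTiming_1': [{'id': 1}, {'id': 3}], 'Edit_2': [{'id': 2}]}
```

Because the drain step uses `get_nowait()`, latency under light load is unchanged: a single event still goes out immediately as a one-row batch, and batches only grow when several events are already waiting.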

IRC excerpt:

<ori> one thing that would have to be rethought is table creation
<ori> right now eventlogging always just tries to insert events
<ori> if the database errors out because the table doesn't exist, *then* it issues the create table statement
<ori> this is nice because you can drop or rename a table and it just gets recreated anew without any downtime
<ori> i don't have a ready-made model of how this would work in a world where we do batch inserts, but my reflexive hunch is that it's not a show-stopping problem, and that we could work around it
<springle> not knowing the code at all, but could it be simple opportunistic batching? when inserting a record, check if additional records for the same table exist, and group them
<springle> might only get a few each time, but that would still be better
<ori> yeah, totally
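One way the "just try to insert, create the table on failure" behaviour could carry over to batches, sketched in Python under stated assumptions: `sqlite3` stands in for the MySQL database only so the example runs on its own, every column is created as `TEXT`, and `insert_batch`, `create_table` and the `Edit_2` table are hypothetical, not EventLogging code. Each batch is written as a single multi-row `INSERT`; if it fails because the table is missing, the table is created and the whole batch is retried once:

```python
import sqlite3
from typing import Any, Dict, List


def create_table(conn: sqlite3.Connection, table: str, columns: List[str]) -> None:
    # Hypothetical schema: every column is TEXT. A real consumer would derive
    # column types from the event's schema. Identifiers are assumed trusted.
    cols = ", ".join(f'"{c}" TEXT' for c in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')


def insert_batch(conn: sqlite3.Connection, table: str, rows: List[Dict[str, Any]]) -> None:
    """Write all rows for one table as a single multi-row INSERT.

    If the table is missing, create it and retry the whole batch once.
    """
    if not rows:
        return
    columns = sorted(rows[0])  # assumes every row in the batch has the same keys
    col_list = ", ".join(f'"{c}"' for c in columns)
    one_row = "(" + ", ".join("?" for _ in columns) + ")"
    sql = f'INSERT INTO "{table}" ({col_list}) VALUES ' + ", ".join(one_row for _ in rows)
    params = [row[c] for row in rows for c in columns]
    try:
        conn.execute(sql, params)
    except sqlite3.OperationalError as err:
        # SQLite says "no such table"; MySQL would report error 1146 instead.
        if "no such table" not in str(err):
            raise
        create_table(conn, table, columns)
        conn.execute(sql, params)  # the failed attempt inserted nothing
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    insert_batch(conn, "Edit_2", [{"id": 1, "action": "save"}, {"id": 2, "action": "init"}])
    conn.execute('DROP TABLE "Edit_2"')  # simulate dropping the table
    insert_batch(conn, "Edit_2", [{"id": 3, "action": "save"}])
    print(conn.execute('SELECT COUNT(*) FROM "Edit_2"').fetchone()[0])  # → 1
```

Since a statement that fails on a missing table writes no rows, retrying the entire batch cannot duplicate data, so a dropped or renamed table is simply recreated by the next batch, as with single-row inserts.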

Version: unspecified
Severity: enhancement
Whiteboard: u=Analyst c=EventLogging p=34 s=2014-11-13

Details

Reference: bz67450

Event Timeline

• bzimport raised the priority of this task to Lowest. Nov 22 2014, 3:29 AM

• bzimport added a project: MediaWiki-extensions-EventLogging.

• bzimport set Reference to bz67450.

• bzimport added a subscriber: Unknown Object (MLST).

• Springle created this task. Jul 3 2014, 3:03 AM

More detailed information on why this item is important for making EL data public is available in this e-mail thread:

https://lists.wikimedia.org/pipermail/analytics/2014-August/002434.html

Change 169977 had a related patch set uploaded by Nuria:
[WIP] Batching event insertion

https://gerrit.wikimedia.org/r/169977

The actual beginning of the e-mail thread with the pertinent conversation: https://lists.wikimedia.org/pipermail/analytics/2014-August/002429.html

Change 169977 merged by jenkins-bot:
Batch event insertion

https://gerrit.wikimedia.org/r/169977

• kevinator added a project: Analytics-Engineering. Nov 25 2014, 11:44 PM

• kevinator set Security to None.

• kevinator moved this task from Incoming to Coding and Testing on the Analytics-Engineering board.

• kevinator moved this task from Coding and Testing to Done (Shipped) on the Analytics-Engineering board.

• kevinator renamed this task from database consumer could batch inserts (sometimes) to database consumer could batch inserts (sometimes) [34pts]. Nov 26 2014, 1:01 AM

• ggellerman added a project: Analytics-Sprint-2014-11-13. Nov 26 2014, 9:47 PM

• kevinator moved this task from Work in Progress to Done on the Analytics-Sprint-2014-11-13 board. Nov 26 2014, 9:48 PM

• kevinator removed a project: Analytics-Engineering. Nov 26 2014, 11:25 PM

The changes to batch event insertion (and subsequent fixes) are deployed to vanadium.
