database consumer could batch inserts (sometimes) [34pts]
Closed, Resolved (Public)

Description

The eventlogging database consumer generates single-row inserts. From a performance perspective (master overhead and slave replication lag) it would be better to batch inserts.

The batches need not be large or of a particular size. The approach could simply be opportunistic grouping of small numbers of rows in the same table when more than one is available.
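
A minimal sketch of what "opportunistic grouping" could look like, assuming events arrive on an in-process queue as `(table, row)` pairs. The names `drain_available`, `group_by_table`, and `max_batch` are hypothetical and are not taken from the EventLogging code base.

```python
import queue
from collections import defaultdict


def drain_available(events, max_batch=50):
    """Wait for one event, then take whatever else is already queued.

    Never waits for a batch to fill up: if only one event is
    available, the batch has exactly one element.
    """
    batch = [events.get()]  # blocks until at least one event exists
    while len(batch) < max_batch:
        try:
            batch.append(events.get_nowait())
        except queue.Empty:
            break  # nothing else is waiting, so stop here
    return batch


def group_by_table(batch):
    """Group (table, row) pairs so each table gets one multi-row insert."""
    grouped = defaultdict(list)
    for table, row in batch:
        grouped[table].append(row)
    return dict(grouped)
```

With this shape, a quiet period still produces single-row inserts, while a burst of events for the same table collapses into one statement per table.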

IRC excerpt:

<ori> one thing that would have to be rethought is table creation
<ori> right now eventlogging always just tries to insert events
<ori> if the database errors out because the table doesn't exist, *then* it issues the create table statement
<ori> this is nice because you can drop or rename a table and it just gets recreated anew without any downtime
<ori> i don't have a ready-made model of how this would work in a world where we do batch inserts, but my reflexive hunch is that it's not a show-stopping problem, and that we could work around it
<springle> not knowing the code at all, but could it be simple opportunistic batching? when inserting a record, check if additional records for the same table exist, and group them
<springle> might only get a few each time, but that would still be better
<ori> yeah, totally
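
One way the "insert first, create the table on failure" behaviour discussed above might carry over to batches is sketched below. It is an illustration only: it uses Python's built-in `sqlite3` as a stand-in for the production database, the helper name `insert_batch` is hypothetical, and the untyped `CREATE TABLE` is a placeholder for whatever schema-derived statement the real consumer would issue.

```python
import sqlite3


def insert_batch(conn, table, columns, rows):
    """Insert rows in one batch; create the table and retry if it is missing.

    `table` and `columns` are assumed to come from trusted schema
    metadata, since identifiers cannot be bound as SQL parameters.
    """
    cols = ", ".join(columns)
    marks = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({cols}) VALUES ({marks})"
    try:
        with conn:  # commits on success, rolls back on error
            conn.executemany(sql, rows)
    except sqlite3.OperationalError as err:
        if "no such table" not in str(err):
            raise  # unrelated failure: do not mask it
        with conn:
            # Placeholder DDL; a real consumer would build typed columns.
            conn.execute(f"CREATE TABLE {table} ({cols})")
            conn.executemany(sql, rows)
```

Because the whole batch targets a single table, a missing table fails the batch as a unit, so the retry can simply resend the same rows after the table is created. Dropping or renaming a table would still lead to it being recreated on the next batch.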


Version: unspecified
Severity: enhancement
Whiteboard: u=Analyst c=EventLogging p=34 s=2014-11-13

Details

Reference
bz67450

Event Timeline

bzimport raised the priority of this task from to Lowest. Nov 22 2014, 3:29 AM
bzimport set Reference to bz67450.
bzimport added a subscriber: Unknown Object (MLST).

More detail on why this item matters for making EL data public is available in this e-mail thread:

https://lists.wikimedia.org/pipermail/analytics/2014-August/002434.html

Change 169977 had a related patch set uploaded by Nuria:
[WIP] Batching event insertion

https://gerrit.wikimedia.org/r/169977

Change 169977 merged by jenkins-bot:
Batch event insertion

https://gerrit.wikimedia.org/r/169977

kevinator renamed this task from database consumer could batch inserts (sometimes) to database consumer could batch inserts (sometimes) [34pts]. Nov 26 2014, 1:01 AM

Changes to batch events (and subsequent fixes) have been deployed to vanadium.