
Replicate the Phabricator database to labsdb
Closed, Resolved · Public

Description

Gerrit data could be useful for writing a whole new class of tools :)


Version: unspecified
Severity: enhancement

Details

Reference
bz50422

Event Timeline

bzimport raised the priority of this task to Low. Nov 22 2014, 1:50 AM
bzimport added a project: Toolforge.
bzimport set Reference to bz50422.

This may not be all that relevant, given the likelihood of switching to Phabricator in the medium term.

In that case we can re-purpose the bug to make Phabricator's database available on labs after the switch happens.

We are also not switching Gerrit to Phabricator yet - just Bugzilla and RT.

No idea what "Phabricator" database this is about. Everything?
Plus, no use case is described here at all.
Plus, I don't see how to sanitize that data (legal).

I think the original use case (if I recall correctly from IRC) was to make the Gerrit /and Bugzilla/ databases available from labs.

I don't think there's a problem in principle, though it will require a great deal of consideration, sanitization in particular. That is not an insurmountable problem, since we do exactly that for the project databases.

I mostly intended to keep this bug open as a task tracker for the longer term.

(In reply to Andre Klapper from comment #4)

> No idea what "Phabricator" database this is about. Everything?
> Plus, no use case is described here at all.
> Plus, I don't see how to sanitize that data (legal).

Yes, everything that is not private.

The use case is to create tools like [[mw:Gerrit/Reports]] that don't need to rely on hacks like bug 52329.

> Plus, I don't see how to sanitize that data (legal).

The same way we do it for MediaWiki databases?

> > Plus, I don't see how to sanitize that data (legal).
>
> The same way we do it for MediaWiki databases?

Feel free to provide links to how it's done for MW DBs if it can be done "the same way".

> Feel free to provide links to how it's done for MW DBs if it can be done "the same way".

A couple of code links:

I don't believe there's an abundance of documentation about this, but @Springle and @yuvipanda can probably provide more background.

Things you would need:

  1. Legal sign-off
  2. List of tables that can be fully replicated, without any redaction
  3. List of tables that need to have some data redacted, and
    1. Which columns must be fully redacted
    2. Which columns must be conditionally redacted, and what those conditions are
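The checklist above can be sketched as a small data structure plus a function that applies it to a single row. This is a minimal in-memory illustration, assuming rows arrive as Python dicts; the names `RedactionSpec`, `sanitize` and `EXAMPLE_SPEC`, and every table and column name (`user`, `task`, `project`, `email`, `is_private`, ...), are hypothetical and do not come from Phabricator's schema or the existing labsdb tooling.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional, Set

Row = Dict[str, Any]


@dataclass
class RedactionSpec:
    """Hypothetical encoding of the checklist; names are illustrative, not real schema."""
    # Tables that can be replicated in full, without any redaction.
    full_tables: Set[str] = field(default_factory=set)
    # table -> columns that must always be nulled.
    always_null: Dict[str, Set[str]] = field(default_factory=dict)
    # table -> {column: predicate}; the column is nulled when predicate(row) is True.
    null_if: Dict[str, Dict[str, Callable[[Row], bool]]] = field(default_factory=dict)

    def replicated_tables(self) -> Set[str]:
        return self.full_tables | set(self.always_null) | set(self.null_if)


def sanitize(spec: RedactionSpec, table: str, row: Row) -> Optional[Row]:
    """Return the row as labs users would see it, or None if the table is not listed."""
    if table not in spec.replicated_tables():
        return None  # unlisted tables are never replicated
    out = dict(row)
    for col in spec.always_null.get(table, set()):
        if col in out:
            out[col] = None
    for col, must_null in spec.null_if.get(table, {}).items():
        # Predicates see the original row, so one redaction cannot mask another's input.
        if col in out and must_null(row):
            out[col] = None
    return out


# Hypothetical example spec; a missing is_private flag is treated as private.
EXAMPLE_SPEC = RedactionSpec(
    full_tables={"project"},
    always_null={"user": {"email", "password_hash"}},
    null_if={"task": {"description": lambda r: r.get("is_private", True)}},
)
```

Treating unlisted tables and missing flags as "do not publish" keeps the sketch failing closed, which matters more here than convenience.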

Then we'll have to:

  1. Set up triggers to do the conditional and unconditional nulling in the sanitarium DBs
  2. Set up views to expose the data to labsdb users.
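The two steps above could be generated from a per-table rule rather than written by hand. A sketch, assuming MySQL/MariaDB-flavoured SQL; `TableRule`, `trigger_sql`, `view_sql`, `ddl`, `EXAMPLE_RULE`, the database names and all table and column names are hypothetical. It only builds statement strings and is not the actual labsdb tooling.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TableRule:
    """Hypothetical redaction rule for one replicated table."""
    table: str
    columns: List[str]  # every column exposed to labs users, in order
    always_null: List[str] = field(default_factory=list)
    # column -> raw SQL condition over NEW.*; the column is nulled when it is TRUE
    null_if: Dict[str, str] = field(default_factory=dict)


def trigger_sql(rule: TableRule, event: str) -> Optional[str]:
    """BEFORE INSERT/UPDATE trigger that nulls columns in the sanitarium copy."""
    sets = [f"NEW.`{c}` = NULL" for c in rule.always_null]
    sets += [f"NEW.`{c}` = IF({cond}, NULL, NEW.`{c}`)" for c, cond in rule.null_if.items()]
    if not sets:
        return None  # fully replicated table: nothing to null
    return (
        f"CREATE TRIGGER `{rule.table}_sanitize_{event.lower()}` "
        f"BEFORE {event} ON `{rule.table}` FOR EACH ROW SET {', '.join(sets)};"
    )


def view_sql(rule: TableRule, src_db: str, view_db: str) -> str:
    """View for labs users; always-redacted columns are masked here as well."""
    cols = [f"NULL AS `{c}`" if c in rule.always_null else f"`{c}`" for c in rule.columns]
    return (
        f"CREATE OR REPLACE VIEW `{view_db}`.`{rule.table}` AS "
        f"SELECT {', '.join(cols)} FROM `{src_db}`.`{rule.table}`;"
    )


def ddl(rule: TableRule, src_db: str, view_db: str) -> List[str]:
    """Triggers (if any) followed by the view, in the order they would be applied."""
    stmts = [s for s in (trigger_sql(rule, e) for e in ("INSERT", "UPDATE")) if s]
    stmts.append(view_sql(rule, src_db, view_db))
    return stmts


# Hypothetical example: email is always hidden, realName unless is_public = 1.
EXAMPLE_RULE = TableRule(
    table="user",
    columns=["id", "name", "email", "realName"],
    always_null=["email"],
    null_if={"realName": "NOT (NEW.`is_public` <=> 1)"},
)
```

Two design choices worth noting. MySQL's `IF()` returns its third argument when the condition is NULL, so conditions should be written to fail closed; the example uses the NULL-safe `<=>` for that reason. Always-redacted columns are also masked in the view, so a missing trigger does not silently expose them.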

Lots of manual work + lots of DBA work :)

yuvipanda lowered the priority of this task from Low to Lowest. Jan 14 2015, 11:49 AM
coren removed coren as the assignee of this task. Mar 25 2015, 8:13 PM
coren set Security to None.
chasemp claimed this task.
chasemp subscribed.

We have revisited this in a few other tickets, and the TL;DR is that Phabricator holds a large amount of sensitive data for fundraising, procurement, security, etc. Blacklisting that data is not viable. The best we can do is whitelist public tasks and their associated objects and publish a structured data dump.

That is here:

http://dumps.wikimedia.org/other/misc/
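The whitelisting approach from the last comment can be sketched as follows: a task is kept only if its view policy is explicitly public, associated objects only if their parent task is kept, and fields only if they appear on an allow-list. The dict-based records, the field names, `public_dump`, and the assumption that the public policy value is the string `"public"` are all hypothetical; this is not a description of how the dumps above are actually produced.

```python
import json
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]

# Hypothetical field allow-lists: anything not listed is dropped, so a field
# added later (and possibly sensitive) is excluded by default.
TASK_FIELDS = ("id", "title", "status", "priority")
COMMENT_FIELDS = ("id", "taskID", "content")


def _pick(obj: Record, fields: Iterable[str]) -> Record:
    """Copy only the allow-listed fields that are present on the object."""
    return {f: obj[f] for f in fields if f in obj}


def public_dump(tasks: List[Record], comments: List[Record]) -> str:
    """Serialize public tasks and their comments as JSON (whitelist, not blacklist)."""
    # Row whitelist: keep a task only if its view policy is explicitly public;
    # a missing or unfamiliar policy excludes it.
    public = [t for t in tasks if t.get("viewPolicy") == "public"]
    public_ids = {t["id"] for t in public}
    # Associated objects are kept only when their parent task was kept.
    attached = [c for c in comments if c.get("taskID") in public_ids]
    return json.dumps(
        {
            "tasks": [_pick(t, TASK_FIELDS) for t in public],
            "comments": [_pick(c, COMMENT_FIELDS) for c in attached],
        },
        sort_keys=True,
    )
```

Whitelisting at both the row and the field level means anything new, such as a field added in a later upgrade or a task with an unfamiliar policy, is left out by default rather than leaked.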