Details
- Reference
- bz63203
Related Objects
- Mentioned Here
- T211835: Sunset Wikimetrics
Event Timeline
bingle-admin wrote:
Prioritization and scheduling of this bug are tracked on Mingle card https://wikimedia.mingle.thoughtworks.com/projects/analytics/cards/cards/1502
Given that mediawiki-utilities uses plain SQL and we are set to use Alembic, I do not see how this 'integration' could happen. Are we sure we want to keep this bug open?
Wikimetrics uses SQLAlchemy, which is a bit of a mismatch with mediawiki-utilities. I don't think that's too big of a deal; we could integrate the tools if it's a good idea. But that depends on which way Wikimetrics goes as a product and on how we structure our data pipeline.
One possibility is to have Wikimetrics become the ETL tool for public data. It could restructure our OLTP data, recent changes, and event streams into a more traditional, easy-to-work-with data warehouse. In that case, the logic from mediawiki-utilities would be very useful. We may wish to convert some of it to SQLAlchemy, but that's a minor point.
Another possibility is to have a separate ETL process, based on an existing tool or a combination of tools. Wikimetrics would then be refashioned to query the resulting data warehouse. In that case, mediawiki-utilities could be used to inform the ETL process, but it would have a very different purpose from Wikimetrics.
I don't have a strong opinion on which way we go, but I think we should keep this bug open as a reminder of the great logic encapsulated in mediawiki-utilities.