Author: metatron
Description:
I want to integrate the pagecounts of Tool Labs (or of Labs as a whole) into the tool https://tools.wmflabs.org/wikiviewstats/. This would require access to redacted webproxy logs that cover both the old web (Apache) and new web (lighttpd) setups.
It would be very helpful if these logs could be structured in the same way as the current pagecount dumps and released on a per-hour basis.
Further suggestions:
- identifier could be toollabs or labs.toollabs
- querystring part of url (?xyz=..) should be removed completely
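A minimal sketch of what these suggestions could look like in practice, assuming the redacted log has already been reduced to (url, bytes sent) pairs for one hour. The input shape and the function name `aggregate` are hypothetical; the actual webproxy log format is not specified in this report.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Identifier suggested above; "labs.toollabs" would work the same way.
IDENTIFIER = "toollabs"

def aggregate(requests):
    """Turn (url, bytes_sent) pairs into pagecount-style lines.

    The (url, bytes_sent) input shape is an assumption for illustration.
    """
    totals = defaultdict(lambda: [0, 0])  # path -> [hits, bytes]
    for url, size in requests:
        # urlsplit().path drops the query string (?xyz=..) completely.
        path = urlsplit(url).path
        totals[path][0] += 1
        totals[path][1] += size
    return [
        f"{IDENTIFIER} {path} {hits} {size}"
        for path, (hits, size) in sorted(totals.items())
    ]
```

Requests that differ only in their query string collapse into one line, so no query parameters end up in the published dump.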
Reference:
1.) IRC Petan Jan 2, 2014
2.) WIP: Tools: Add infrastructure for AWStats
https://gerrit.wikimedia.org/r/#/c/80332/
3.) IRC scfc_de Jan 2, 2014
scfc_de: hedonil: I hope to have finished puppetizing tools-webproxy by the end of the week (the AWStats stuff is done IIRC). As -webproxy is the heart of the web access, review & deployment will then be *very* careful :-), but in general, depending on Coren's schedule, it should be deployable by between the end of next week and the end of the month.
The current pagecount dumps are generated on a per-hour basis and share the following structure:
filename, e.g.:
pagecounts-20140101-020000.gz
fields (space-separated): 1. identifier 2. pagetitle 3. hits 4. bytes
en.d perform 3 60088
en.d rainforest 3 33780
en.d servers 3 22471
en.d situation 1 107043
en.d upwards 1 32565
en.d variety 2 59495
en Allergy 3 324964
en Arthur_Rubinstein 1 0
en Article 1 0
en British_cuisine 1 191021
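For illustration, the filename and line structure described above could be read roughly like this. This is only a sketch: the helper names `parse_filename` and `parse_line` and the `Record` type are hypothetical, and it assumes the four fields are separated by whitespace and that the timestamp in the filename marks the hour of the dump.

```python
from collections import namedtuple
from datetime import datetime

# One dump line: identifier, pagetitle, hits, bytes.
Record = namedtuple("Record", "identifier pagetitle hits bytes")

def parse_filename(name):
    """Return the timestamp encoded in a name like 'pagecounts-20140101-020000.gz'."""
    stem = name[:-3] if name.endswith(".gz") else name
    return datetime.strptime(stem, "pagecounts-%Y%m%d-%H%M%S")

def parse_line(line):
    """Split one dump line into its four fields, converting the counts to int."""
    identifier, pagetitle, hits, size = line.split()
    return Record(identifier, pagetitle, int(hits), int(size))
```

A Tool Labs dump that follows the same structure could be consumed by the same reader without changes.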
hierarchical structure of the identifier:
en - Wikipedia (en)
en.b - Wikibooks (en)
en.d - Wiktionary (en)
en.n - Wikinews (en)
etc.
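The identifier scheme above can be decoded with a small lookup, sketched here under the assumption that the part before the dot is the language code and the part after it selects the project. Only the suffixes listed in this report are included; the function name `split_identifier` is hypothetical.

```python
# Suffix -> project, limited to the examples listed above.
PROJECTS = {
    "": "Wikipedia",
    "b": "Wikibooks",
    "d": "Wiktionary",
    "n": "Wikinews",
}

def split_identifier(identifier):
    """Split e.g. 'en.d' into ('en', 'Wiktionary')."""
    lang, _, suffix = identifier.partition(".")
    # Suffixes not covered by the list above yield None.
    return lang, PROJECTS.get(suffix)
```

A new identifier such as toollabs would sit outside this language-based scheme, so consumers would need to handle it as a separate case.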
Version: unspecified
Severity: normal