
Segregate and document configuration variables
Closed, Declined (Public)

Description

One problem with using wikistats is that it has a lot of configuration variables.
A simple grep -Er "^[A-Za-z]+=" . finds:


./bash/collect_edits.sh:wikistats=/a/wikistats_git
./bash/collect_edits.sh:dumps=$wikistats/dumps
./bash/collect_edits.sh:perl=$dumps/perl
./bash/collect_edits.sh:csv=$dumps/csv
./bash/collect_edits.sh:input=/mnt/data/xmldatadumps/public/nlwikinews/20121115/nlwikinews-20121115-stub-meta-history.xml.gz
./bash/report_en.sh:wikistats=/a/wikistats_git
./bash/report_en.sh:dumps=$wikistats/dumps
./bash/report_en.sh:perl=$dumps/perl
./bash/report_en.sh:bash=$dumps/bash
./bash/report_en.sh:logs=$dumps/logs
./bash/report_en.sh:csv=$dumps/csv
./bash/report_en.sh:out=$dumps/out
./bash/progress_wikistats.sh:wikistats=/a/wikistats_git
./bash/progress_wikistats.sh:dumps=$wikistats/dumps
./bash/progress_wikistats.sh:perl=$dumps/perl
./bash/progress_wikistats.sh:out=$dumps/out
./bash/progress_wikistats.sh:dammit=/a/dammit.lt
./bash/progress_wikistats.sh:htdocs=stat1001.wikimedia.org::a/srv/stats.wikimedia.org/htdocs
./bash/zip_all.sh:wikistats=/a/wikistats_git
./bash/backup_monthly.sh:wikistats=/a/wikistats_git
./bash/backup_monthly.sh:backup=$wikistats/backup
./bash/backup_monthly.sh:dumps=$wikistats/dumps
./bash/backup_monthly.sh:csv=$dumps/csv
./bash/backup_monthly.sh:dt=$(date +[%Y-%m-%d][%H:%M])
./bash/report.sh:wikistats=/a/wikistats_git
./bash/report.sh:dumps=$wikistats/dumps
./bash/report.sh:perl=$dumps/perl
./bash/report.sh:bash=$dumps/bash
./bash/report.sh:csv=$dumps/csv
./bash/report.sh:out=$dumps/out
./bash/report.sh:htdocs=stat1001.wikimedia.org::a/srv/stats.wikimedia.org/htdocs/
./bash/report.sh:log=$dumps/logs/log_report_sh.txt
./bash/report.sh:interval=0 # only update non-English reports once per 'interval' days
./bash/report.sh:projectcode="$1"
./bash/count_commons_images_wlm.sh:wikistats=/a/wikistats_git
./bash/count_commons_images_wlm.sh:dumps=$wikistats/dumps
./bash/count_commons_images_wlm.sh:perl=$dumps/perl
./bash/count_commons_images_wlm.sh:perl=/home/ezachte/wikistats/dumps/perl # tests
./bash/count_commons_images_wlm.sh:csv=$dumps/csv
./bash/count_commons_images_wlm.sh:countrycodes=/a/wikistats_git/squids/csv/meta/CountryCodes.csv
./bash/count_editors.sh:wikistats=/a/wikistats_git
./bash/count_editors.sh:dumps=$wikistats/dumps
./bash/count_editors.sh:perl=$dumps/perl
./bash/count_editors.sh:csv=$dumps/csv
./bash/count_editors.sh:out=$dumps/out
./bash/count_editors.sh:htdocs=stat1001.wikimedia.org::a/srv/stats.wikimedia.org/htdocs/
./bash/count_editors.sh:bashpath="${PWD}"
./bash/backup_weekly.sh:wikistats=/a/wikistats_git
./bash/backup_weekly.sh:backup=$wikistats/backup
./bash/backup_weekly.sh:analytics=$wikistats/analytics
./bash/backup_weekly.sh:dammit=$wikistats/dammit.lt
./bash/backup_weekly.sh:dumps=$wikistats/dumps
./bash/backup_weekly.sh:perl=$dumps/perl
./bash/backup_weekly.sh:bash=$dumps/bash
./bash/backup_weekly.sh:csv=$dumps/csv
./bash/backup_weekly.sh:out=$dumps/out
./bash/backup_weekly.sh:projectcounts=$dammit/projectcounts
./bash/backup_weekly.sh:dt=$(date +[%Y-%m-%d][%H:%M])
./bash/report_all_editors.sh:wikistats=/a/wikistats_git
./bash/report_all_editors.sh:dumps=$wikistats/dumps
./bash/report_all_editors.sh:perl=$dumps/perl
./bash/report_all_editors.sh:csv=$dumps/csv
./bash/report_all_editors.sh:out=$dumps/out
./bash/report_all_editors.sh:htdocs=stat1001.wikimedia.org::a/srv/stats.wikimedia.org/htdocs/
./bash/zip_csv.sh:wikistats=/a/wikistats_git
./bash/zip_csv.sh:csv=$wikistats/dumps/csv
./bash/count_report_publish_wmf.sh:wikistats=/a/wikistats_git
./bash/count_report_publish_wmf.sh:dumps=$wikistats/dumps
./bash/count_report_publish_wmf.sh:perl=$dumps/perl
./bash/count_report_publish_wmf.sh:csv=$dumps/csv
./bash/count_report_publish_wmf.sh:out=$dumps/out
./bash/count_report_publish_wmf.sh:php=/a/mediawiki/core/languages
./bash/count_report_publish_wmf.sh:force=-f
./bash/count_report_publish_wmf.sh:date=today
./bash/archived_used_once_or_obsolete/regusers.sh:dumps=/mnt/data/xmldatadumps
./bash/archived_used_once_or_obsolete/titles.sh:m=wp
./bash/archived_used_once_or_obsolete/titles.sh:p=afwiki
./bash/archived_used_once_or_obsolete/titles.sh:dumps=/mnt/data/xmldatadumps
./bash/archived_used_once_or_obsolete/extract_reg_user.sh:wiki=enwiki
./bash/archived_used_once_or_obsolete/extract_reg_user.sh:date=20091103
./bash/archived_used_once_or_obsolete/extract_reg_user.sh:dumps=/mnt/data/xmldatadumps
./bash/archived_used_once_or_obsolete/publish_scripts.sh:htdocs=stat1001.wikimedia.org::a/srv/stats.wikimedia.org/htdocs/
./bash/archived_used_once_or_obsolete/publish_scripts.sh:perl=/a/wikistats/scripts/perl
./bash/archived_used_once_or_obsolete/publish.sh:now=date +%s
./bash/archived_used_once_or_obsolete/publish.sh:htdocs="stat1001.wikimedia.org::a/srv/stats.wikimedia.org/$dir/csv"
./bash/archived_used_once_or_obsolete/publish.sh:csv="/a/wikistats/csv_$1"
./bash/archived_used_once_or_obsolete/publish.sh:archive="/mnt/data/xmldatadumps/public/other/pagecounts-ez/wikistats" # odd name, temp location
./bash/archived_used_once_or_obsolete/publish.sh:publish="#publish.txt"
./bash/archived_used_once_or_obsolete/publish_regions.sh:htdocs=stat1001.wikimedia.org::a/srv/stats.wikimedia.org/htdocs/
./bash/report_one_only.sh:wikistats=/a/wikistats_git
./bash/report_one_only.sh:dumps=$wikistats/dumps
./bash/report_one_only.sh:perl=$dumps/perl
./bash/report_one_only.sh:bash=$dumps/bash
./bash/report_one_only.sh:logs=$dumsp/logs
./bash/report_one_only.sh:csv=$dumps/csv
./bash/report_one_only.sh:out=$dumps/out
./bash/report_one_only.sh:htdocs=stat1001.wikimedia.org::a/srv/stats.wikimedia.org/htdocs/
./bash/report_one_only.sh:mode=wp
./bash/report_one_only.sh:lang=en
./bash/count_prep_animations.sh:wikistats=/a/wikistats_git
./bash/count_prep_animations.sh:dumps=$wikistats/dumps
./bash/count_prep_animations.sh:perl=$dumps/perl
./bash/count_prep_animations.sh:perl=/home/ezachte/wikistats/dumps/perl # tests
./bash/count_prep_animations.sh:csv=$dumps/csv
./bash/count_prep_animations.sh:out=$wikistats/animations/growth
./bash/count_prep_animations.sh:htdocs=stat1001.wikimedia.org::a/srv/stats.wikimedia.org/htdocs/
./bash/count_report_publish_non_wp.sh:wikistats=/a/wikistats_git
./bash/count_report_publish_non_wp.sh:dumps=$wikistats/dumps
./bash/count_report_publish_non_wp.sh:bash=$dumps/bash
./bash/count_report_publish_non_wp.sh:log=$dumps/logs/log_count_report_publish_non_wp.txt
./bash/report_all.sh:wikistats=/a/wikistats_git
./bash/list_newest_dumps.sh:wikistats=/a/wikistats_git
./bash/list_newest_dumps.sh:dumps=$wikistats/dumps
./bash/list_newest_dumps.sh:perl=$dumps/perl
./bash/list_newest_dumps.sh:csv=$dumps/csv
./bash/list_newest_dumps.sh:dblists=$dumps/dblists
./bash/collect_countable_namespaces.sh:wikistats=/a/wikistats_git
./bash/collect_countable_namespaces.sh:perl=$wikistats/dumps/perl
./bash/collect_countable_namespaces.sh:perl=/home/ezachte/wikistats/dumps/perl # tests
./bash/collect_countable_namespaces.sh:csv=$wikistats/dumps/csv
./bash/collect_countable_namespaces.sh:htdocs=stat1001.wikimedia.org::a/srv/stats.wikimedia.org/htdocs/
./bash/report_regions.sh:wikistats=/a/wikistats_git
./bash/report_regions.sh:dumps=$wikistats/dumps
./bash/report_regions.sh:perl=$dumps/perl
./bash/report_regions.sh:bash=$dumps/bash
./bash/report_regions.sh:csv=$dumps/csv
./bash/report_regions.sh:out=$dumps/out
./bash/report_regions.sh:htdocs=stat1001.wikimedia.org::a/srv/stats.wikimedia.org/htdocs/
./bash/report_regions.sh:log=$dumps/logs/log_report_regions.txt
./bash/sort_dblists.sh:wikistats=/a/wikistats_git
./bash/sort_dblists.sh:dumps=$wikistats/dumps
./bash/sort_dblists.sh:perl=$dumps/perl
./bash/sort_dblists.sh:perl=/home/ezachte/wikistats/dumps/perl # tests
./bash/sort_dblists.sh:csv=$dumps/csv
./bash/sort_dblists.sh:dblists=$dumps/dblists
./bash/report_test.sh:wikistats=/a/wikistats_git
./bash/report_test.sh:dumps=$wikistats/dumps
./bash/report_test.sh:perl=$dumps/perl
./bash/report_test.sh:perl=/home/ezachte/wikistats/dumps/perl # test
./bash/report_test.sh:csv=$dumps/csv
./bash/report_test.sh:out=$dumps/out
./bash/pageviews_monthly_sp.sh:wikistats=/a/wikistats_git
./bash/pageviews_monthly_sp.sh:dumps=$wikistats/dumps
./bash/pageviews_monthly_sp.sh:perl=$dumps/perl
./bash/pageviews_monthly_sp.sh:csv=$dumps/csv
./bash/pageviews_monthly_sp.sh:out=$dumps/out
./bash/pageviews_monthly_sp.sh:report=$dumps/logs/log_pageviews_monthly_sp.txt
./bash/pageviews_monthly_sp.sh:htdocs=stat1001.wikimedia.org::a/srv/stats.wikimedia.org/htdocs/
./bash/count_report_publish_wp.sh:wikistats=/a/wikistats_git
./bash/count_report_publish_wp.sh:log=$wikistats/dumps/logs/log_count_report_publish_concise_wp.txt
./bash/report_publish_some.sh:wikistats=/a/wikistats_git
./bash/report_publish_some.sh:dumps=$wikistats/dumps
./bash/report_publish_some.sh:bash=$dumps/bash
./bash/merge_editors.sh:wikistats=/a/wikistats_git
./bash/merge_editors.sh:dumps=$wikistats/dumps
./bash/merge_editors.sh:perl=$dumps/perl
./bash/merge_editors.sh:csv=$dumps/csv
./bash/merge_editors.sh:log=$dumps/logs/log_merge_editors.txt
./bash/count.sh:project=$1
./bash/count.sh:wikistats=/a/wikistats_git
./bash/count.sh:dumps=$wikistats/dumps # folder for scripts and output
./bash/count.sh:perl=$dumps/perl
./bash/count.sh:perl=/home/ezachte/wikistats/dumps/perl # tests
./bash/count.sh:csv=$dumps/csv
./bash/count.sh:bash=$dumps/bash
./bash/count.sh:dblists=$dumps/dblists
./bash/count.sh:php=/a/mediawiki/core/languages
./bash/count.sh:trace=-r # trace resources
./bash/pageviews_monthly.sh:wikistats=/a/wikistats_git
./bash/pageviews_monthly.sh:dumps=$wikistats/dumps
./bash/pageviews_monthly.sh:perl=$dumps/perl
./bash/pageviews_monthly.sh:csv=$dumps/csv
./bash/pageviews_monthly.sh:out=$dumps/out
./bash/pageviews_monthly.sh:report=$dumps/logs/log_pageviews_monthly.txt
./bash/pageviews_monthly.sh:projectcounts=/a/dammit.lt/projectcounts
./bash/pageviews_monthly.sh:htdocs=stat1001.wikimedia.org::a/srv/stats.wikimedia.org/htdocs/
./bash/pageviews_monthly.sh:list=WhiteListWikis.csv
./bash/count_state_of_the_wiki.sh:wikistats=/a/wikistats_git
./bash/count_state_of_the_wiki.sh:dumps=$wikistats/dumps
./bash/count_state_of_the_wiki.sh:perl=$dumps/perl
./bash/count_state_of_the_wiki.sh:perl=/home/ezachte/wikistats/dumps/perl # tests
./bash/count_state_of_the_wiki.sh:csv=$dumps/csv
./bash/count_state_of_the_wiki.sh:log=$dumps/logs/count_wikis_by_size_by_growth.log
./bash/count_state_of_the_wiki.sh:htdocs=stat1001.wikimedia.org::a/srv/stats.wikimedia.org/htdocs/
./bash/publish_all.sh:wikistats=/a/wikistats_git
./bash/publish_all.sh:dumps=$wikistats/dumps
./bash/publish_all.sh:bash=$dumps/bash
./bash/publish_all.sh:bash=/home/ezachte/wikistats/dumps/bash # tests
./bash/sync_language_files.sh:wikistats=/a/wikistats_git
./bash/sync_language_files.sh:dumps=$wikistats/dumps
./bash/sync_language_files.sh:csv=$dumps/csv
./bash/tar_data_reportcard.sh:wikistats=/a/wikistats_git
./bash/tar_data_reportcard.sh:csv=$wikistats/dumps/csv
./bash/count_merge_editors.sh:wikistats=/a/wikistats_git
./bash/count_merge_editors.sh:dumps=$wikistats/dumps
./bash/count_merge_editors.sh:perl=$dumps/perl
./bash/count_merge_editors.sh:perl=/home/ezachte/wikistats/dumps/perl # tests
./bash/count_merge_editors.sh:csv=$dumps/csv
./bash/count_merge_editors.sh:log=$dumps/logs/count_merge_editors.log
./bash/zip_out.sh:wikistats=/a/wikistats_git
./bash/zip_out.sh:out=$wikistats/dumps/out
./bash/count_words.sh:x=1
./bash/count_wp_one.sh:wikistats=/a/wikistats_git
./bash/count_wp_one.sh:dumps=$wikistats/dumps
./bash/count_wp_one.sh:perl=$dumps/perl
./bash/count_wp_one.sh:perl=/home/ezachte/wikistats/dumps/perl # tests
./bash/count_wp_one.sh:csv=$dumps/csv
./bash/count_wp_one.sh:php=/a/mediawiki/core/languages
./bash/count_wp_one.sh:date=auto # 20101231 # auto
./bash/count_wp_one.sh:x=fywiki
./bash/count_wp_one.sh:project=wp
./perl/WikimediaDownload.pl:EXs=screen;EXw=EXs.width;navigator.appName!="Netscape"?
./perl/WikimediaDownload.pl:EXb=EXs.colorDepth:EXb=EXs.pixelDepth;
./perl/WikimediaDownload.pl:EXd=document;EXw?"":EXw="na";EXb?"":EXb="na";
./perl/WikimediaDownload.pl:src="http://nht-2.extreme-dm.com/n3.g?login=infodis&url=nojs&j=n&jv=n&pv=" />
./perl/WikiReportsScripts.pm:border=0 width=1 alt=''></a>
./perl/WikiReportsScripts.pm:EXs=screen;EXw=EXs.width;navigator.appName!='Netscape'?
./perl/WikiReportsScripts.pm:EXb=EXs.colorDepth:EXb=EXs.pixelDepth;
./perl/WikiReportsScripts.pm:EXd=document;
./perl/WikiReportsScriptsHtml.pm:border=0 width=1 alt=''></a>
./perl/WikiReportsScriptsHtml.pm:EXs=screen;EXw=EXs.width;navigator.appName!='Netscape'?
./perl/WikiReportsScriptsHtml.pm:EXb=EXs.colorDepth:EXb=EXs.pixelDepth;
./perl/WikiReportsScriptsHtml.pm:EXd=document;


Is it possible to concentrate all the configuration in a single file? I don't know anything about multiple-file bash/shell scripts. It would be nice to have only one file to edit.
If you tell me what's an acceptable path, I'd gladly submit patches.
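
One common way to do this in bash is to put the assignments in one file and have every job script load it with `.` (source). The following is a minimal sketch, not the repository's actual layout: the file name `wikistats_config.sh` is hypothetical, and the values are simply copied from the grep output above.

```shell
#!/bin/bash
# Sketch only: wikistats_config.sh is a hypothetical file name, not part of the repo.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# 1) The single file to edit: every path is derived from one root.
#    The quoted 'EOF' keeps the $variables unexpanded until the file is sourced.
cat > "$workdir/wikistats_config.sh" <<'EOF'
wikistats=/a/wikistats_git
dumps=$wikistats/dumps
perl=$dumps/perl
csv=$dumps/csv
out=$dumps/out
htdocs=stat1001.wikimedia.org::a/srv/stats.wikimedia.org/htdocs/
EOF

# 2) What each job script would do instead of repeating the assignments.
#    "." runs the file in the current shell, so its variables stay set here.
. "$workdir/wikistats_config.sh"

echo "perl scripts: $perl"
echo "csv files:    $csv"
echo "publish to:   $htdocs"
```

With this pattern, script-specific settings such as `log=` or `interval=` would stay in the individual scripts, after the line that sources the shared file.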


Version: unspecified
Severity: normal

Details

Reference
bz62566

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 3:02 AM
bzimport set Reference to bz62566.

Hm, also:

$ grep -r "/home/ezachte" .
./bash/report.sh:#perl=/home/ezachte/wikistats/dumps/perl # tests
./bash/count_commons_images_wlm.sh:perl=/home/ezachte/wikistats/dumps/perl # tests
./bash/count_editors.sh: perl=/home/ezachte/wikistats/dumps/perl # tests
./bash/backup_weekly.sh:cd /home/ezachte
./bash/report_all_editors.sh:# perl=/home/ezachte/wikistats/dumps/perl # tests
./bash/archived_used_once_or_obsolete/titles.sh:perl WikiStatsCollectArticleNames.pl -p $p -i $dumps/public/$p -o /home/ezachte/wikistats/titles
./bash/archived_used_once_or_obsolete/publish_all.sh:cd /home/ezachte/wikistats
./bash/count_prep_animations.sh:perl=/home/ezachte/wikistats/dumps/perl # tests
./bash/count_report_publish_non_wp.sh:# bash=/home/ezachte/wikistats/dumps/bash # tests
./bash/collect_countable_namespaces.sh:perl=/home/ezachte/wikistats/dumps/perl # tests
./bash/sort_dblists.sh:perl=/home/ezachte/wikistats/dumps/perl # tests
./bash/report_test.sh:perl=/home/ezachte/wikistats/dumps/perl # test
./bash/pageviews_monthly_sp.sh:#perl=/home/ezachte/wikistats/dumps/perl # tests
./bash/count.sh:perl=/home/ezachte/wikistats/dumps/perl # tests
./bash/pageviews_monthly.sh:# perl=/home/ezachte/wikistats/dumps/perl # tests
./bash/pageviews_monthly.sh:# projectcounts=/home/ezachte/test/projectcounts # tests
./bash/count_state_of_the_wiki.sh:perl=/home/ezachte/wikistats/dumps/perl # tests
./bash/publish_all.sh:bash=/home/ezachte/wikistats/dumps/bash # tests
./bash/count_merge_editors.sh:perl=/home/ezachte/wikistats/dumps/perl # tests
./bash/count_wp_one.sh:perl=/home/ezachte/wikistats/dumps/perl # tests
./perl/TestEzLib.pl: use lib "/home/ezachte/lib" ;
./perl/WikiCountsArguments.pm: use lib "/home/ezachte/lib" ;
./perl/WikiReportsSelectTimelines.pl: use lib "/home/ezachte/lib" ;
./perl/QD_tools/TestSortHash.pl:use lib "/home/ezachte/lib" ;
./perl/QD_tools/WikiStatsWikipediaWeekly.pl: require "/home/ezachte/wikistats/WikiReportsDate.pl" ;
./perl/QD_tools/WikiCountsRegUsers.pl: use lib "/home/ezachte/lib" ;
./perl/QD_tools/WikiCountsRegUsers.pl:if (-e "/home/ezachte/")
./perl/QD_tools/WikiCountsRegUsers.pl:if (-e "/home/ezachte/")
./perl/QD_tools/WikiStatsScanCategories.pl: { $dir = "/home/ezachte/wikistats/viewspercat" ; }
./perl/QD_tools/WikiStatsScanCategories.pl: require "/home/ezachte/wikistats/WikiReportsDate.pl" ;
./perl/QD_tools/WikiStatsPageViewsPerPagePerCategory.pl: use lib "/home/ezachte/lib" ;
./perl/QD_tools/WikiStatsPageViewsPerPagePerCategory.pl: $dir_out = "/home/ezachte/wikistats/viewspercat" ;
./perl/QD_tools/WikiExtractRegUsers.pl:if (-e "/home/ezachte/")
./perl/QD_tools/WikiExtractRegUsers.pl:#if (-e "/home/ezachte/")
./perl/WikiReportsProcessReverts.pm: use lib "/home/ezachte/lib" ;
./perl/WikiCountWords.pl: use lib "/home/ezachte/lib" ;
./perl/WikiReportsOutputTables.pm: use lib "/home/ezachte/lib" ;
./perl/EzLib.pm:use lib "/home/ezachte/lib" ;
./perl/EzLib.pm:if ($os_linux) # && (-d "/home/ezachte")) # runs on server, to be refined
./perl/EzLib.pm: $path_home = "/home/ezachte" ;
./perl/EzLib.pm: $path_pm = "/home/ezachte/lib/$file_pm" ;
./perl/WikiCountsTimeDistribution.pl: use lib "/home/ezachte/lib" ;
./perl/WikiReports.pl: use lib "/home/ezachte/lib" ;
./perl/WikiReportsSampledVisitorsLog.pl: $dir_root = "/home/ezachte" ;
./perl/WikiCountsScanNamespacesWithContent.pl: use lib "/home/ezachte/lib" ;
./perl/WikiCountsSummarizeProjectCounts.pl: use lib "/home/ezachte/lib" ;
./perl/WikiCountsRankPageHistory.pl: use lib "/home/ezachte/lib" ;
./perl/WikiCountsJobProgress.pl: use lib "/home/ezachte/lib" ;
./perl/WikiCounts.pl: use lib "/home/ezachte/lib" ;

though there is some

$ grep -r cfg_liblocation .
./squids/conf-editors/SquidReportArchiveConfig-editors.pm:#$cfg_liblocation = "/a/wikistats_git/squids/perl" ;
./squids/conf-editors/SquidReportArchiveConfig-editors.pm:$cfg_liblocation = "/home/spetrea/wikistats/wikistats/squids/perl" ;
./squids/conf-editors/SquidCountArchiveConfig-editors.pm:#$cfg_liblocation = "$squids/perl" ;
./squids/conf-editors/SquidCountArchiveConfig-editors.pm:#$cfg_liblocation = "/a/wikistats_git/squids/perl" ;
./squids/conf-editors/SquidCountArchiveConfig-editors.pm:$cfg_liblocation = "/home/spetrea/wikistats/wikistats/squids/perl" ;
./squids/testdata/regression-countries-count-arithmetic/SquidReportArchiveConfig.pm:$cfg_liblocation = "$CODE_BASE/perl" ;
./squids/testdata/regression-countries-count-arithmetic/SquidCountArchiveConfig.pm:$cfg_liblocation = "$CODE_BASE/perl" ;
./squids/testdata/regression-mingle-356-bugzilla-46269/SquidReportArchiveConfig.pm:$cfg_liblocation = "$CODE_BASE/perl" ;
./squids/testdata/regression-mingle-356-bugzilla-46269/SquidCountArchiveConfig.pm:$cfg_liblocation = "$CODE_BASE/perl" ;
./squids/testdata/regression-mismatch-world-north-south-unknown/SquidReportArchiveConfig.pm:$cfg_liblocation = "$CODE_BASE/perl" ;
./squids/testdata/regression-mismatch-world-north-south-unknown/SquidCountArchiveConfig.pm:$cfg_liblocation = "$CODE_BASE/perl" ;
./squids/testdata/merge-australia-into-oceania/SquidReportArchiveConfig.pm:$cfg_liblocation = "$CODE_BASE/perl" ;
./squids/testdata/merge-australia-into-oceania/SquidCountArchiveConfig.pm:$cfg_liblocation = "$CODE_BASE/perl" ;
./squids/testdata/regression-sample/SquidReportArchiveConfig.pm:$cfg_liblocation = "$CODE_BASE/perl" ;
./squids/testdata/regression-sample/SquidCountArchiveConfig.pm:$cfg_liblocation = "$CODE_BASE/perl" ;
./squids/testdata/regression-test-ipv6-wrong-external-domain/SquidReportArchiveConfig.pm:$cfg_liblocation = "$CODE_BASE/perl" ;
./squids/testdata/regression-test-ipv6-wrong-external-domain/SquidCountArchiveConfig.pm:$cfg_liblocation = "$CODE_BASE/perl" ;
./squids/testdata/regression-tablets-discrepancy_for_config_editors/SquidReportArchiveConfig.pm:$cfg_liblocation = "$CODE_BASE/perl" ;
./squids/testdata/regression-tablets-discrepancy_for_config_editors/SquidCountArchiveConfig.pm:$cfg_liblocation = "$CODE_BASE/perl" ;
./squids/testdata/regression-totals-fixes-for-squidreportclients/SquidReportArchiveConfig.pm:$cfg_liblocation = "$CODE_BASE/perl" ;
./squids/testdata/regression-totals-fixes-for-squidreportclients/SquidCountArchiveConfig.pm:$cfg_liblocation = "$CODE_BASE/perl" ;
./squids/conf-mobile/SquidCountArchiveConfig-mobile.pm:#$cfg_liblocation = "$squids/perl" ;
./squids/conf-mobile/SquidCountArchiveConfig-mobile.pm:#$cfg_liblocation = "/a/wikistats_git/squids/perl" ;
./squids/conf-mobile/SquidCountArchiveConfig-mobile.pm:$cfg_liblocation = "/home/spetrea/wikistats/wikistats/squids/perl" ;
./squids/conf-mobile/SquidReportArchiveConfig-mobile.pm:#$cfg_liblocation = "/a/wikistats_git/squids/perl" ;
./squids/conf-mobile/SquidReportArchiveConfig-mobile.pm:$cfg_liblocation = "/home/spetrea/wikistats/wikistats/squids/perl" ;
./squids/perl/SquidReportArchiveConfig.pm: $cfg_liblocation = "$squids/perl" ;
./squids/perl/SquidCountArchive.pl: croak "Expected \$cfg_liblocation to be defined inside config .pm file" if !defined $cfg_liblocation;
./squids/perl/SquidCountArchive.pl: unshift(@INC,$cfg_liblocation);
./squids/perl/SquidCountArchiveWriteOutput.pm:use lib $cfg_liblocation ;
./squids/perl/SquidCountryScanConfig.pm:$cfg_liblocation = "$squids/perl/" ;
./squids/perl/SquidCountArchiveConfig.pm:$cfg_liblocation = "$squids/perl" ;
./squids/perl/SquidCountryScan.pl: use lib $cfg_liblocation ;
./squids/perl/SquidReportArchive.pl: croak "Expected \$cfg_liblocation to be defined inside config .pm file" if !defined $cfg_liblocation;
./squids/perl/SquidReportArchive.pl: unshift(@INC,$cfg_liblocation);
./dumps/perl/SquidReportArchive.pl: use lib $cfg_liblocation ;
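
For the hardcoded `use lib` lines found by the first grep, one option that needs no Perl changes at call time is the `PERL5LIB` environment variable, which Perl reads to extend `@INC`. The sketch below only illustrates the mechanism: the `wikistats_lib` variable and its default directory are assumptions, since the thread does not say where the shared modules should live.

```shell
#!/bin/bash
# Sketch only: the wikistats_lib variable and its default are assumptions.
set -eu

# Directory holding the shared .pm files; can be overridden from the environment.
wikistats_lib="${wikistats_lib:-/a/wikistats_git/dumps/perl}"

# Perl adds every directory listed in PERL5LIB to @INC, so scripts started from
# this shell can find their modules without a hardcoded "use lib" line.
# ${PERL5LIB:+:$PERL5LIB} keeps any existing value and avoids a trailing colon.
export PERL5LIB="$wikistats_lib${PERL5LIB:+:$PERL5LIB}"

echo "PERL5LIB=$PERL5LIB"
```

Running `perl -e 'print "$_\n" for @INC'` from the same shell is a quick way to check that the directory was picked up.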

Pasting here what I'm doing for one wiki so that I don't forget. Some manual actions are also needed, but those might be an error in configuration so I'm keeping everything together.

0) Checkouts

  1. Replace the path (almost) everywhere: sed -i "s/\/a\/wikistats_git/MYESCAPEDPATH/g" dumps/bash/*sh dumps/perl/*pl dumps/perl/*pm
  2. Comment out the path tests (https://gerrit.wikimedia.org/r/118261) and adjust the other settings, including dumps_public, php, x (I put x=translatewiki) and project (I put wx); set the date variable to an 8-digit timestamp like 20140101 and add /$date to the -i argument.
  3. Create the directory $dumps_public/$project/$date and place your bz2 dump in it.
  4. Create the directories $csv, $csv/csv_$project/ , $csv/temp and set write permissions
  5. Manually create in the $csv directory a csv file (csv_mw/StatisticsContentNamespaces.csv) like http://stats.wikimedia.org/wikimedia/misc/StatisticsContentNamespaces.csv; in my case (https://translatewiki.net/w/api.php?action=query&meta=siteinfo&siprop=namespaces):

project code,language code,content namespaces
wx,translate,0|1102|1256|1214|1242|1250|1236|1202|1218|1254|1228|1240|1244|1210|1258|8|1230|1246|1212|1204|1220|1234|1216|1222|1238|1226|1208|1252|1200|1232|1206|1224
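
Steps 1, 3, 4 and 5 above can be sketched as a script. This runs in a scratch directory so it touches nothing real; the root path, the sample file standing in for `dumps/bash/*sh`, and the shortened namespace list are placeholders for illustration only. Using `|` as the sed delimiter is a swapped-in variant of step 1 that avoids having to escape the slashes in the path.

```shell
#!/bin/bash
# Sketch of steps 1, 3, 4 and 5 in a scratch directory. All concrete values
# below (root, date, sample file) are placeholders, not real settings.
set -eu

root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT

dumps_public=$root/public
csv=$root/dumps/csv
project=wx
date=20140101

# Step 1: with "|" as the sed delimiter, the slashes in the paths need no
# escaping. A one-line sample file stands in for dumps/bash/*sh.
printf 'wikistats=/a/wikistats_git\n' > "$root/sample.sh"
sed "s|/a/wikistats_git|$root|g" "$root/sample.sh" > "$root/sample.sh.new"
mv "$root/sample.sh.new" "$root/sample.sh"

# Step 3: directory that will hold the bz2 dump.
mkdir -p "$dumps_public/$project/$date"

# Step 4: csv directories, writable by the user running the scripts.
mkdir -p "$csv/csv_$project" "$csv/temp" "$csv/csv_mw"
chmod -R u+w "$csv"

# Step 5: content namespaces file (namespace list shortened for illustration).
cat > "$csv/csv_mw/StatisticsContentNamespaces.csv" <<EOF
project code,language code,content namespaces
$project,translate,0|8|1102
EOF

cat "$root/sample.sh"
```

The `|` delimiter only works as long as the replacement path itself contains no `|` or `&` characters.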

Run the script! Now it's been running for a few seconds without errors, maybe I'll get some output. :D

Got something: http://p.defau.lt/?1WfXUN56rhc3uwBiU7oXvw

Then report_one_only.sh

  1. Set mode to wx, language to whatever you used before *wiki in x (translate in my case)
  2. Create out/out_wx/EN/
  3. No config to skip views? fake them, e.g.: $ wget http://dumps.wikimedia.org/other/pagecounts-ez/wikistats/csv_wx.zip ; unzip csv_wx.zip PageViewsPerMonthAll.csv ; mv PageViewsPerMonthAll.csv csv/csv_wx/
  4. Add your wiki to the list in SetLanguageInfo and GetProjectBaseUrl in WikiReportsLiterals.pm as well as (I guess) dblists/master\ copy/special.dblist
  5. Adjust $threshold_articles and/or $threshold_edits in WhiteListLanguages of WikiReportsInput.pm, or you may see something like this in csv/csv_wx/WikiReportsLog.txt:
       No dump processed:
       < 10 articles: translate

Result so far (the report takes a few minutes): http://koti.kapsi.fi/~federico/twnstats/
There are clearly some more hidden switches to enable the full reports.

So, there is something more to do. No idea how to handle the configurations like *dblist or the whole WhiteListLanguages function (split to a file used only if requested by a command line argument?); variables in the middle of functions like $threshold* also need to be listed.
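
As a rough illustration of the "file used only if requested by a command line argument" idea, here is a shell-level sketch. Everything in it is hypothetical: the `-w` option, the `load_thresholds` function, the name=value override format and the default values are invented for the example, and the real thresholds are Perl variables inside WikiReportsInput.pm.

```shell
#!/bin/bash
# Sketch only: the -w option, the override file format and the default values
# are hypothetical; the real thresholds live in WikiReportsInput.pm.
set -eu

load_thresholds() {
  # Built-in defaults (placeholders), used when no override file is requested.
  threshold_articles=10
  threshold_edits=0
  local OPTIND=1 opt
  while getopts "w:" opt; do
    case "$opt" in
      w) . "$OPTARG" ;;   # override file uses plain name=value shell syntax
      *) return 2 ;;
    esac
  done
}

# Usage: first without, then with an override file.
conf=$(mktemp)
echo 'threshold_articles=1' > "$conf"

load_thresholds
echo "default:  $threshold_articles"

load_thresholds -w "$conf"
echo "override: $threshold_articles"

rm -f "$conf"
```

The point of the design is that a plain run keeps the current behaviour, while a small wiki can lower the thresholds without anyone editing the code.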

Add to comment 2:

  1. For plots, set $weekly_plotdata = $true in WikiCounts.pl

The Wikistats code base is notoriously difficult to maintain and full of quirks. I have never made a secret of the fact that I chose adding functionality over producing neat code or restructuring. It was mostly an after-hours project before I joined WMF, and at WMF a heap of new information requests got priority. Doing this with state-of-the-art software engineering techniques could have kept a small department busy.

See also https://www.mediawiki.org/wiki/Wikistats

and for a wider perspective https://meta.wikimedia.org/wiki/Data_analysis/mining_of_Wikimedia_wikis#Wikistats_scripts

We're rather late in the game now, not sure how long Wikistats will be used, but even so I appreciate any effort to do some digging into the inner workings, if only to better understand possible discrepancies in a migration scenario, and who knows some non Wikimedia wiki may still see use for parts of it.

So I'll be happy to provide assistance in understanding the code, and in getting it to work on, say, translatewiki.net. I won't have much time to restructure the code.

Some Q&A outside bugzilla might be expedient, happy to answer mail on ezachte.wikimedia.org or we could do one or more Skype sessions, id ezachte (pls mail me first).

I'm not blaming anyone and I'm sure you don't have time for a refactor, that's why I offer to do one myself if you tell me the general direction/method you'd be ok with. :) Thank you very much for your offer to help, I'll contact you when I get stuck and I'm ready to devote some time to coding.

I don't think we're late to the party: for instance, at no point in history have we had so many dumps to analyse (thanks to WikiTeam)!

Sure, let's see how to get this ball rolling.

I suggest we focus on runtime arguments and general setup first, as you already started to do. Everything becomes easier once you have a working setup, as you can step through the code with a debugger. (I used OptiPerl on Windows during development; now I mostly edit code on the WMF server directly.)

About comment 1/2: it looks like more than it is, as I split out nested paths into several statements. Having said that, I certainly agree this could be coded better and more flexibly, and I could use some advice here. Now I run most jobs from my home folder and push to git mainly for archive purposes. If that could be done without editing bash files, that would be a big win (environment variable?). BTW, somehow getting push working from /a/wikistats_git proved a bottleneck. Some authorization issues.
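
The environment variable idea could look like the sketch below, which uses bash's `${VAR:-default}` expansion. The variable name `WIKISTATS_ROOT` and the helper function are hypothetical; the default is the path already used in the scripts.

```shell
#!/bin/bash
# Sketch only: WIKISTATS_ROOT and wikistats_root are hypothetical names.
set -eu

# Print the caller's WIKISTATS_ROOT if it is set and non-empty,
# otherwise fall back to the server default.
wikistats_root() {
  printf '%s\n' "${WIKISTATS_ROOT:-/a/wikistats_git}"
}

wikistats=$(wikistats_root)
dumps=$wikistats/dumps
perl=$dumps/perl

echo "wikistats=$wikistats"
echo "perl=$perl"
```

A test run from a home checkout would then be started as `WIKISTATS_ROOT=/home/ezachte/wikistats ./count.sh`, with no edits to the bash file and no commented-out `# tests` lines.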

About the code base in general: as you know, WikiCounts.pl produces csv files, mostly from xml files (and some api results, php files, translatewiki output, etc). Some parts of WikiCounts are ugly or hard to comprehend even for me. Some parts are more or less self-contained and could maybe be made into standalone scripts (to further modularisation). WikiReports.pl is probably the hardest to read (but some parts, once modularized, may find reuse even in a Hadoop environment, e.g. page view reports).

As for coding conventions: especially in WikiReports there are lots of one-letter variables and very maintenance-unfriendly function names. I tried to make names for complex functions reasonably understandable, but chose really cryptic function names for many small one-line functions used for html mark-up (kind of a C-style function name approach). There is only a half-hearted system in those cryptic names, all in the name of getting code done. WikiReportsHtml.pm is by far the worst; I have to look up names there all the time. My dilemma is that giving all those mini-functions self-explanatory names would make the code using them much less readable (wood, trees). Now the html presentation details are kind of obfuscated, but that doesn't distract from the main presentation logic.

bingle-admin wrote:

Prioritization and scheduling of this bug is tracked on Mingle card https://wikimedia.mingle.thoughtworks.com/projects/analytics/cards/cards/1485

Erik, did the addition of -F and the refactor land the repo(s)? Are they ready to receive patches?

Nemo, I finally got all wikistats files in the repo. I think you can add patches now. Sorry for delay.

(In reply to Erik Zachte from comment #11)

> Nemo, I finally got all wikistats files in the repo. I think you can add
> patches now. Sorry for delay.

Wow, great, I'll try to start rebasing some of my patches next week.

Aklapper edited subscribers, added: Aklapper; removed: Tnegrin, ezachte, wikibugs-l-list.

Closing this ticket as Wikistats version 1 is dead per https://stats.wikimedia.org/Wikistats_1_announcements.htm . In case this ticket is still a valid bug report or feature request for Wikistats 2, then please reopen. Thanks a lot!