
Extended version information in user-agent
Open, Medium, Public, Feature

Description

Originally from: http://sourceforge.net/p/pywikipediabot/feature-requests/330/
Reported by: valhallasw
Created on: 2013-02-04 20:52:53
Subject: Extended version information in user-agent
Original description:
See the discussion at https://www.mediawiki.org/wiki/Special:Code/pywikipedia/11027#c33303

Implementation notes:

Hash of a file:
>>> import hashlib
>>> m = hashlib.sha1()
>>> m.update(open(path, 'rb').read())  # path is a placeholder for the file being hashed
>>> m.hexdigest()
'93ae86148e74a7c3a3d63f7810b48c51889fba46'

Classes used in stack trace:

>>> import inspect
>>> [(x.__module__, x.__name__) for x in (s[0].f_locals.get('self', None).__class__ for s in inspect.stack())]

Example result:
[('wikipedia_family', 'Family'), ('pdb', 'Pdb'), ('pdb', 'Pdb'), ('pdb', 'Pdb'), ('pdb', 'Pdb'), ('pdb', 'Pdb'), ('pdb', 'Pdb'), ('pdb', 'Pdb'), ('pdb', 'Pdb'), ('wikipedia_family', 'Family'), ('wikipedia', 'Site'), ('wikipedia', 'Site'), ('wikipedia', 'Site'), ('wikipedia', 'Site'), ('wikipedia', 'Site'), ('wikipedia', 'Page'), ('wikipedia', 'Page'), ('__main__', 'Subject'), ('__main__', 'Subject'), ('__main__', 'InterwikiBot'), ('__main__', 'InterwikiBot'), ('__builtin__', 'NoneType'), ('__builtin__', 'NoneType'), ('__builtin__', 'NoneType'), ('pdb', 'Pdb'), ('pdb', 'Pdb'), ('__builtin__', 'NoneType'), ('__builtin__', 'NoneType'), ('__builtin__', 'NoneType'), ('__builtin__', 'NoneType')]
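
The one-liner above reports `NoneType` for every frame that has no `self`. A minimal Python 3 sketch of the same idea that skips those frames might look like the following; `classes_in_stack` and the `Page` class are hypothetical names used only for illustration, not pywikibot code.

```python
import inspect


def classes_in_stack():
    """Return (module, class name) pairs for each stack frame that has a `self`."""
    result = []
    for frame_info in inspect.stack():
        # f_locals holds the local variables of that frame.
        instance = frame_info.frame.f_locals.get('self')
        if instance is None:
            continue  # plain function or module-level frame, nothing to report
        cls = type(instance)
        result.append((cls.__module__, cls.__name__))
    return result


class Page:  # hypothetical stand-in for a bot class
    def get(self):
        return classes_in_stack()


# The innermost frame with a `self` is Page.get, so it is listed first.
print(Page().get())
```

Skipping frames without `self` keeps the list short, at the cost of hiding plain functions that took part in the call.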


Version: core-(2.0)
Severity: enhancement
See Also:
https://sourceforge.net/p/pywikipediabot/feature-requests/330

Details

Reference
bz55016

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 2:14 AM
bzimport set Reference to bz55016.
bzimport added a subscriber: Unknown Object (????).

In July there were three threads of discussion about user-agents for bots.

http://thread.gmane.org/gmane.science.linguistics.wikipedia.technical/78356

http://lists.wikimedia.org/pipermail/pywikipedia-l/2014-July/008924.html

http://lists.wikimedia.org/pipermail/pywikipedia-l/2014-July/008932.html

Following that, Amir did some work to allow customisation of the user-agent, specifically adding site-based information (https://gerrit.wikimedia.org/r/#/c/147381/). I've put up a patch to make the user-agent functionality usable in more circumstances and easier to test: https://gerrit.wikimedia.org/r/#/c/152200/

As it is now a customisable string, it is possible to add email addresses, links to bot approvals, etc. It is lightly documented at

https://www.mediawiki.org/wiki/Manual:Pywikibot/User-agent

During the discussions I suggested something like what this bug is about: identifying what code is running (whether it is the 'maintainer' version or one customised by the bot operator) and putting that in the user-agent.

http://article.gmane.org/gmane.science.linguistics.wikipedia.technical/78363
http://article.gmane.org/gmane.science.linguistics.wikipedia.technical/78413

In an IRC discussion with valhallasw, I suggested that we include the contact details of the maintainers of the running script, which can be parsed from the script docstring.

So, currently pywikibot _has_ the commit hash, sequential pywikibot revision, and

It only puts the sequential pywikibot revision into the user-agent, in the variable {version}. The sequential pywikibot revision is only a good reference point, but the running code could be different.

The commit hash is (almost) useless for ops staff, as it frequently changes.

The $Rev$ for each file that is checked in is more granular, but that doesn't help if the script file is modified or isn't checked in.

If I understand this enhancement request, it is suggesting that we get a hash of some or all of the modules that are used by the running script, and include that in the user-agent.
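
As a rough illustration of hashing the modules used by the running script, the sketch below walks `sys.modules` and computes a SHA-1 for each loaded source file under a given name prefix. The helper names `file_sha1` and `loaded_module_hashes` are assumptions made for this example and are not part of pywikibot.

```python
import hashlib
import sys


def file_sha1(path, chunk_size=65536):
    """Return the SHA-1 hex digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def loaded_module_hashes(prefix):
    """Map module name to file hash for loaded modules whose name starts with prefix."""
    hashes = {}
    # Copy the items first: sys.modules can change while we iterate.
    for name, module in list(sys.modules.items()):
        path = getattr(module, '__file__', None)
        if not name.startswith(prefix) or not path:
            continue  # other packages, built-ins, namespace packages
        try:
            hashes[name] = file_sha1(path)
        except OSError:
            pass  # file not readable (e.g. loaded from an archive)
    return hashes


import json  # any imported package works as a demonstration

print(sorted(loaded_module_hashes('json')))
```

A full set of hashes is too long for a user-agent, so in practice it would probably need to be reduced to a single short token.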

IMO, the first step is to get a hash for the script/module executed on the command line. This hash will change less frequently, and will often be common even for different branches of pywikibot. If the file is unmodified, I suggest we keep the existing user-agent value for {script}/{version}, which has a version prefix of 'g' and 's' for git or subversion. If the file is modified, I suggest we put the file hash in {version}, with a different prefix - e.g. 'm69789e1' where 'm' is for 'modified'.
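
The proposal above could be sketched roughly as follows. How the hash of the unmodified file would be known is not settled in this thread, so the `released_sha1` parameter is an assumption, as are the function name `version_token` and the choice of seven hex digits (taken from the 'm69789e1' example).

```python
import hashlib


def version_token(script_path, released_sha1, vcs_version):
    """Return the value to substitute for {version}.

    vcs_version is the existing 'g...' or 's...' string. It is kept when the
    script file matches released_sha1 (an assumed, externally supplied hash
    of the unmodified file). Otherwise 'm' plus the first seven hex digits
    of the file's own SHA-1 is returned.
    """
    with open(script_path, 'rb') as handle:
        current = hashlib.sha1(handle.read()).hexdigest()
    if current == released_sha1:
        return vcs_version
    return 'm' + current[:7]
```

Called with the path of the script given on the command line, this yields either the unchanged git/subversion version string or an 'm'-prefixed token that identifies the modified file.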

Change 152200 had a related patch set uploaded by John Vandenberg:
User-agent graceful degradation

https://gerrit.wikimedia.org/r/152200

Change 152200 merged by jenkins-bot:
User-agent graceful degradation

https://gerrit.wikimedia.org/r/152200

@John Mark Vandenberg: Is the bug fixed?

Not yet. We would like to add maintainer details to the user agent, where the maintainer can be different for each script, and would be obtained via a module variable __maintainer__ or similar.
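
A minimal sketch of reading such a variable, assuming it is a plain string on the script module. The function names, the "(maintainer: ...)" suffix format, and the example address are all illustrative assumptions, not an agreed design.

```python
import types


def maintainer_of(module, default=None):
    """Return the module's __maintainer__ value, or default if it has none."""
    return getattr(module, '__maintainer__', default)


def user_agent_with_maintainer(base, module):
    """Append maintainer details to a user-agent string when available."""
    maintainer = maintainer_of(module)
    if maintainer:
        return '{0} (maintainer: {1})'.format(base, maintainer)
    return base


# Hypothetical script module, built in place of importing a real script.
script = types.ModuleType('example_script')
script.__maintainer__ = 'bot-operator@example.org'

print(user_agent_with_maintainer('example_script Pywikibot/2.0', script))
# prints: example_script Pywikibot/2.0 (maintainer: bot-operator@example.org)
```

Scripts that do not define the variable keep the unmodified user-agent, so the change would be backwards compatible.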

Aklapper changed the subtype of this task from "Task" to "Feature Request". Feb 4 2022, 12:23 PM