
Port and re-package copyright*.py
Open, Medium, Public, Feature

Description

copyright.py, copyright_clean.py and copyright_put.py together form a set of copyright-checking tools.

They should probably be moved to an independent package that depends on pywikibot.

Francesco, are these tools still in active use?


Version: core-(2.0)
Severity: enhancement

Details

Event Timeline

bzimport raised the priority of this task from to High. Nov 22 2014, 3:14 AM
bzimport set Reference to bz64848.
bzimport added a subscriber: Unknown Object (????).

It looks like Francesco hasn't made it from Bugzilla to Phabricator.

Marc, are you aware of these scripts in the old pywikibot-compat repo?

How do they compare to your tool (http://www.uberbox.org/~marc/csb.pl)? Can your tool easily be ported to work on other projects?

jayvdb raised the priority of this task from High to Needs Triage. Dec 5 2014, 3:38 AM
jayvdb set Security to None.
jayvdb removed a subscriber: Unknown Object (????).

I know it's in use on a couple of other projects, and it should be pretty easy to point at any language wiki whose text uses the Latin alphabet. (It speaks Unicode, of course, but it contains assumptions about how words and statements are constructed, and its guesses at what look like word stems are likely to be off with most other scripts.)

With a bit of expertise from someone who understands other languages, however, it should be adaptable without too much fuss.

I see @Earwig also has a bot which has (had?) similar functionality (and was originally based on copyright.py?).
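To illustrate the kind of assumption mentioned above, here is a minimal sketch in Python. It is not taken from csb.pl or copyright.py; the `split_words` helper and its regular expression are hypothetical and only show why splitting words on whitespace and punctuation suits Latin-script text but not scripts that do not separate words with spaces.

```python
import re

# Hypothetical tokenizer (an assumption for illustration only):
# a "word" is a maximal run of Unicode letters, so word boundaries
# come from whitespace, digits and punctuation.
WORD_RE = re.compile(r"[^\W\d_]+")


def split_words(text):
    """Return the runs of letters found in text."""
    return WORD_RE.findall(text)


# Latin-script text: spaces separate the words as expected.
print(split_words("The quick brown fox"))

# Chinese text has no spaces between words, so the whole
# sentence comes back as a single "word".
print(split_words("我喜欢读书"))
```

A tool built on such a tokenizer would need a script-specific word segmenter before its matching could work on wikis like these.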

https://github.com/earwig/earwigbot
https://github.com/earwig/earwigbot-plugins/blob/develop/tasks/afc_copyvios.py

and a Python app which has similar functionality:
http://tools.wmflabs.org/copyvios

Change 194824 had a related patch set uploaded (by Prianka):
Port and re-package copyright*.py

https://gerrit.wikimedia.org/r/194824

Change 196630 had a related patch set uploaded (by Prianka):
Portind and re-package copyright*.py

https://gerrit.wikimedia.org/r/196630

I see copyright.py uses the package pywikiparser, which existed in SVN and was mentioned in https://www.mediawiki.org/wiki/Git/Conversion/pywikibot, but I can't see the git repository for it. (There are a few git clones, like https://github.com/hroest/pywikipedia-git/tree/master/pywikiparser.)

Does anyone know where that SVN code went in git? (Or has this part of copyright.py been unused since the svn2git migration?)

pywikiparser was not moved, and I think it was even removed in SVN at some point. It was my attempt at a wikitext parser, but mwparserfromhell is much better.

Also, I think this should not go in the main pywikibot repository but in either its own repository or the third-party scripts repository.

Thank you for merging, @Xqt. I think this will be useful to have back. I remember trying to use it in the old days, and it gave me errors. A thorough code review and update would also be good. Best regards.

Aklapper changed the subtype of this task from "Task" to "Feature Request". Feb 4 2022, 12:23 PM