
category.py adds categories from transcluded templates
Closed, Resolved (Public)

Description

When moving a category using "python category.py move -from:XX -to:YY", it does not work properly.

It adds categories from transcluded templates.

For example, [https://ko.wikipedia.org/w/index.php?title=%EC%96%91%ED%8F%89_%EC%9A%A9%EB%AC%B8%EC%82%AC_%EC%9D%80%ED%96%89%EB%82%98%EB%AC%B4&diff=prev&oldid=11655206 This edit] adds "[[분류:경기도에 관한 토막글]]" (meaning "Category:Gyeonggi-do stubs"). This category comes from the transcluded template "{{토막글|경기도}}".

The function "page.categories()" of pywikibot-core differs from pywikibot-compat's.

The "category.py" in pywikibot-compat works well.


Version: core-(2.0)
Severity: major

Details

Reference
bz58084

Event Timeline

bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 2:24 AM
bzimport set Reference to bz58084.

So I suppose it should be using textlib to extract categories from wikicode? Or is there a nicer way?

I've looked a bit into this bug today as one of my robots made such a mistake too. The person who noticed the problem (commons user Denniss) said that he reported the problem to Siebrand and SieBot was fixed, so adding him to the CC list in the hope that he might have a working solution that is not yet upstream.

The problem seems to be the page.categories function, which returns all the categories, including the ones from templates. Instead, the getCategoryLinks function should be used in page.change_category. This currently happens in neither compat nor core, so I am not sure how compat can work correctly.
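
A minimal sketch of the distinction, not pywikibot's actual implementation: categories written directly in a page's wikitext can be found by scanning the text itself, so a category that only arrives through a transcluded template is never picked up. The helper name `categories_in_wikitext`, the namespace alias list, and the sample category "Example trees" are assumptions for illustration. The regex also ignores nowiki and comment sections, which a real parser would have to handle.

```python
import re


# Hypothetical helper, not part of pywikibot: returns only the category
# links that are literally written in the given wikitext.
def categories_in_wikitext(text, namespace_names=("Category", "분류")):
    names = "|".join(re.escape(n) for n in namespace_names)
    pattern = re.compile(
        r"\[\[\s*(?:%s)\s*:\s*([^\]|]+?)\s*(?:\|[^\]]*)?\]\]" % names,
        re.IGNORECASE,
    )
    return [m.group(1) for m in pattern.finditer(text)]


# The template call stays untouched; only the explicit link is reported.
page_text = "Some article text.\n{{토막글|경기도}}\n[[Category:Example trees]]\n"
print(categories_in_wikitext(page_text))
```

With this approach the stub category added by "{{토막글|경기도}}" does not appear in the result, because it is not present in the page's own text.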

I will make and test a patch later this week if nobody has a working solution.

Any word on a fix? "category.py move" is pretty much unusable until it's fixed. (This bug munged 50 pages on my wiki.) Thanks very much.

I just uploaded a review: https://gerrit.wikimedia.org/r/#/c/123434/ Thanks for reminding me about this bug!

I +2'd the patch, can you update it and check if the code is working properly?

Patch checked in. Please reopen if the problem was not fixed.

I have to reopen this: according to [https://commons.wikimedia.org/wiki/User_talk:Bj%C3%B8rnN#BjornNbot_error], the same problem is also present in the category -add option.
Please ensure this does not happen in any function.

Change 157632 had a related patch set uploaded by Ricordisamoa:
get categories from wikitext in the AddCategory bot

https://gerrit.wikimedia.org/r/157632

Change 157632 merged by jenkins-bot:
get categories from wikitext in the AddCategory bot

https://gerrit.wikimedia.org/r/157632

Out of curiosity... does this bug have unit tests to make sure it doesn't recur?

I've encountered this bug four separate times in the past ~6 years. It seems to be a common regression.

Commons user BjørnN mentioned further issues, performing a category page move if the target cat does not exist. Is this intended behaviour?

I don't see how this could be tested. The logic is within the category script and not in the code, so it requires a valid page, and it makes modifications to the page. So it's not simply “do what the script does and then check if the categories have changed accordingly”.

The script would have to be rewritten to allow “mocked sites” so that it thinks it makes a change.

Fabian: Are you saying that pywikipedia is not unit-tested against a baseline MediaWiki database? (I had assumed it was.)

I know it's a lot of work to set up, but IMHO database-backed unit tests are really worthwhile for program correctness and avoiding regressions. (My team uses MediaWiki's own PHP-based testing harness all the time to make sure our wiki extensions work when they modify wiki pages.)

What do you mean by tests against the MediaWiki database? As I understand it, it's only using the MediaWiki API, which shouldn't involve any database usage (although I'm pretty new to the project).

All I'm saying is that it would be great for pywikipedia to have a test that went like this:

Input: A wiki page Foo that transcludes template T, where template T contains "[[Category:A]]".

Run: "category.py move -from:A -to:B"

Expected result: Page Foo contains no category tags, and template T contains "[[Category:B]]"

Wrong result (regression): Page Foo contains "[[Category:B]]".
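
The scenario above can be sketched with an in-memory stand-in for a wiki. This is only an illustration of the assertions, under stated assumptions: it does not run category.py or contact a MediaWiki instance, and the names `wiki`, `move_category_in_text`, and `move_category` are hypothetical. The key point is that the rename is applied to each page's own wikitext, never to the expanded output of its templates.

```python
import re

# In-memory stand-in for a wiki: page title -> wikitext (all hypothetical).
wiki = {
    "Foo": "Article text.\n{{T}}\n",
    "Template:T": "Stub notice.[[Category:A]]",
}


def move_category_in_text(text, old, new):
    """Rename [[Category:old]] links written directly in this text."""
    pattern = re.compile(
        r"\[\[\s*Category\s*:\s*%s\s*(\|[^\]]*)?\]\]" % re.escape(old)
    )
    # Keep an existing sort key (the "|key" part) if there is one.
    return pattern.sub(
        lambda m: "[[Category:%s%s]]" % (new, m.group(1) or ""), text
    )


def move_category(pages, old, new):
    """Apply the rename to every page's own wikitext only."""
    return {
        title: move_category_in_text(text, old, new)
        for title, text in pages.items()
    }


result = move_category(wiki, "A", "B")
print(result["Foo"])
print(result["Template:T"])
```

A regression would show up as "[[Category:B]]" appearing in the text of Foo, which is exactly what the assertions in such a test would guard against.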

It doesn't matter to me whether this runs through the API or directly against a MediaWiki database. I'll bet this could be done using the MediaWiki unit test framework, having it run "category.py" by shell escape and then check the contents of page Foo and template T.

(In reply to Dan Barrett from comment #11)

Out of curiosity... does this bug have unit tests to make sure it doesn't
recur?

I've encountered this bug four separate times in the past ~6 years. It seems
to be a common regression.

Most textlib functions are already covered by tests and are very unlikely to cause regressions.

(In reply to Denniss from comment #12)

Commons user BjørnN mentioned further issues, performing a category page
move if the target cat does not exist. Is this intended behaviour?

For other issues:
https://bugzilla.wikimedia.org/buglist.cgi?component=category.py&list_id=340980&product=Pywikibot&resolution=---

(In reply to Dan Barrett from comment #14)

Fabian: Are you saying that pywikipedia is not unit-tested against a
baseline MediaWiki database? (I had assumed it was.)

I know it's a lot of work to set up, but IMHO database-backed unit tests are
really worthwhile for program correctness and avoiding regressions. (My team
uses MediaWiki's own PHP-based testing harness all the time to make sure our
wiki extensions work when they modify wiki pages.)

This is definitely out of the scope of this bug and should be discussed on Pywikibot-l first.

It is not a bug. You simply forgot to use the '-inplace' command-line argument which prevents the interwiki/category links from being rearranged.

And please don't use bug reports to discuss unrelated issues.

I've created bug 70336 to discuss writing tests.

(In reply to Ricordisamoa from comment #19)

It is not a bug. You simply forgot to use the '-inplace' command-line
argument which prevents the interwiki/category links from being rearranged.

And please don't use bug reports to discuss unrelated issues.

This IS a bug as they were improperly re-arranged (duplicated, placed before cats instead of below them). And don't teach me where to report issues if there's no proper place to do so without deep knowledge of which module is responsible for these issues.

(In reply to Denniss from comment #21)

This IS a bug as they were improperly re-arranged (duplicated, placed before
cats instead of below them). And don't teach me where to report issues if
there's no proper place to do so without deep knowledge of which module is
responsible for these issues.

This is not a bug in the category bot (and it would not have happened with the '-inplace' option enabled) but it should be investigated. Thanks for reporting it, but if you don't know how to create new bug reports here, the best place is the mailing list, not existing bugs.