
Bots excluded from flexible use of article title provided by lang. conv.
Closed, ResolvedPublic

Description

Author: michael.angelkovich

Description:
Range:
This problem concerns bots, and is present on SR.WP, but could be present on all wikis that use language conversion.

Time of occurrence:
As far as I have observed, it appeared together with the new method of login authentication.

Description:
As is probably already known, an article on SR.WP can have its title in either Cyrillic or Latin script, and an ordinary user can type the title in the other script and still get the proper content. For example:

The article called "Вектор" could also be requested as "Vektor", and the bot would get the contents of the article.

This is how it previously worked for bots: they could request a title in either script and get the right result. Since the change, the bot MUST pick the right script, or it gets a result as if the page didn't exist.

Can the previous behavior be restored?


Version: unspecified
Severity: trivial

Details

Reference
bz24296

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 11:02 PM
bzimport set Reference to bz24296.

PhiLiP.NPC wrote:

Hmm. Can you provide me some links for test?

(In reply to comment #1)

Hmm. Can you provide me some links for test?

http://sr.wikipedia.org/w/api.php?titles=Vektor|%D0%92%D0%B5%D0%BA%D1%82%D0%BE%D1%80&action=query&prop=info

shows that Vektor and Вектор are treated differently by the API, even though http://sr.wikipedia.org/wiki/%D0%92%D0%B5%D0%BA%D1%82%D0%BE%D1%80 and http://sr.wikipedia.org/wiki/Vektor lead to the same page.
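
A minimal sketch, in Python, of how a bot might build the query linked above. The endpoint and the `action`, `prop` and `titles` parameters are taken from that link; the function name `build_info_query` is made up for illustration, and nothing is actually sent over the network.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the link in this comment.
API_ENDPOINT = "http://sr.wikipedia.org/w/api.php"


def build_info_query(titles):
    """Return an action=query&prop=info URL for the given page titles.

    Hypothetical helper: it only builds the URL and sends no request.
    """
    params = {
        "action": "query",
        "prop": "info",
        # Several titles are passed in one request, separated by "|".
        "titles": "|".join(titles),
    }
    # urlencode percent-encodes the Cyrillic title as UTF-8 bytes.
    return API_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    # Both spellings are sent as distinct titles, which is what the
    # report is about.
    print(build_info_query(["Vektor", "Вектор"]))
```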

PhiLiP.NPC wrote:

I think it's because the API returns a "real" result from the database. Titles are auto-converted by the Language Converter, but the differences still remain in the database. Perhaps we can an extra parameter to provide a "variant-insensitive" query.

PhiLiP.NPC wrote:

typo:
Perhaps we can an => Perhaps we can add an

I'd imagine (without being all that familiar with the API internals, so all of this is imho, take with a grain of salt, etc.) that a good way to handle it would be similar to how redirects are handled with the &redirects parameter.

In my own bot framework, I provide a function which wraps a title in [[ ]] and sends it to action=parse to find any existing title in another variant. See also bug 24052.
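
A rough sketch of what such a workaround request could look like, in Python. Only the `[[ ]]` wrapping and `action=parse` come from the comment; the `text`, `prop=links` and `format=json` parameters are guesses about how the request would be made, the comment does not say which part of the response the framework reads, and `build_parse_probe` is a hypothetical name.

```python
from urllib.parse import urlencode

# Endpoint reused from the SR.WP links earlier in this thread.
API_ENDPOINT = "http://sr.wikipedia.org/w/api.php"


def build_parse_probe(title):
    """Build an action=parse URL whose wikitext is just a link to `title`.

    Hypothetical helper. The idea is that the parser resolves the link
    through the language converter, so the response may name the
    existing variant of the title.
    """
    params = {
        "action": "parse",
        # Wrap the title in [[ ]] so it is parsed as an internal link.
        "text": "[[" + title + "]]",
        # Assumption: asking for the link list is enough to see the
        # resolved title.
        "prop": "links",
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_parse_probe("Vektor"))
```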

Bryan.TongMinh wrote:

Philip, can you provide a pointer to where MediaWiki normally handles this?

michael.angelkovich wrote:

Bryan, I think that it worked normally back in January 2010. What I am certain of is that it had worked normally since 2008/2009, when I started doing bot work on the wiki.

Bryan.TongMinh wrote:

Patch to ApiPageSet

I don't know why it worked before. There appears to be no relevant code in ApiPageSet, which is where I would expect such code to reside. There are also no revisions related to this.

Attached patch should work, but I have not tested it since I do not have a wiki with variants set up.

attachment patch.txt ignored as obsolete

(In reply to comment #9)

Created an attachment (id=7558)
Patch to ApiPageSet

If your patch changes the current behavior, please add a parameter for it and do not activate it by default. I'm always using API as a way to avoid language converter.

michael.angelkovich wrote:

I'm always using API as a way to avoid language converter.

Just out of curiosity, why are you avoiding it?

Bryan.TongMinh wrote:

(In reply to comment #10)

(In reply to comment #9)

Created an attachment (id=7558)
Patch to ApiPageSet

If your patch changes the current behavior, please add a parameter for it and
do not activate it by default. I'm always using API as a way to avoid language
converter.

The normalization performed will be listed in the normalization section, but this would indeed be a breaking change, so we may want to introduce a parameter to explicitly enable conversion.
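
To illustrate how a bot could follow such a section back to the real title, here is a small Python sketch. It assumes the query result lists rewritten titles as from/to pairs under a `normalized` key, and that a conversion would be reported the same way under a `converted` key; both the shape and the `converted` name are assumptions, and the sample data is hand-written, not real API output.

```python
def resolve_title(requested, query_result, sections=("normalized", "converted")):
    """Follow from/to mappings to the title the API actually looked up.

    Hypothetical helper; `query_result` stands for the "query" part of
    a decoded JSON response.
    """
    title = requested
    for section in sections:
        # A section that is absent simply contributes no mappings.
        for entry in query_result.get(section, []):
            if entry["from"] == title:
                title = entry["to"]
                break
    return title


if __name__ == "__main__":
    # Hand-written sample, NOT real API output.
    sample = {
        "normalized": [{"from": "vektor", "to": "Vektor"}],
        "converted": [{"from": "Vektor", "to": "Вектор"}],
    }
    print(resolve_title("vektor", sample))
```

A title with no mapping comes back unchanged, so the same call works whether or not any rewriting took place.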

michael.angelkovich wrote:

Maybe the switch should be per wiki. As a bot owner on SR.WP, I can't imagine a single reason why one would want to deactivate this behavior, which is why I put the question to Liangent.

Besides that, I have nothing against introducing a parameter that allows changing the default behavior.

At least one reason is that zh conversion is not as simple as sr conversion, so unexpected conversions often happen and are often reported on zh.wp.

michael.angelkovich wrote:

I think both mine and Liangent's comments suggest that this type of conversion should be settable per Wiki. Not (only) through a parameter.

(In reply to comment #15)

I think both mine and Liangent's comments suggest that this type of conversion
should be settable per Wiki. Not (only) through a parameter.

What does your "Not (only) through a parameter." mean?

I guess you misunderstood my comment.

michael.angelkovich wrote:

I understood there was a need for the Chinese wiki not to be affected by any changes, while I also highlighted that the Serbian wiki would require just the opposite. Therefore I suggested that letting each wiki decide its own default behavior is a better solution than introducing a parameter that is likely to be always used on one project and never on another.

Actually I want a parameter, so bot operators can decide whether they use it or not.

I haven't used the API's ability to resolve redirects automatically much, so that may be the reason why I didn't say "Oh, I've been waiting for this feature for a long time!". But someone else *may* really need this feature.

michael.angelkovich wrote:

(In reply to comment #18)

Actually I want a parameter, so bot operators can decide whether they use it or
not.

As I stated above (#13), introducing a parameter to change the default behavior is fine. But it still doesn't appear to be the best solution globally, while setting the default behavior per wiki does.

Bryan.TongMinh wrote:

Patch that does not change default behaviour

Attached patch will not change default behaviour but instead adds a converttitles parameter. This way every API user can decide for themselves whether they need title conversion or not.
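
A minimal sketch, in Python, of how a bot might opt in to the new parameter. The parameter name `converttitles` comes from this comment; sending it with the value `1`, the helper name `build_query` and the `convert_titles` keyword are assumptions made for illustration. No request is sent.

```python
from urllib.parse import urlencode

# Endpoint reused from the SR.WP links earlier in this thread.
API_ENDPOINT = "http://sr.wikipedia.org/w/api.php"


def build_query(titles, convert_titles=False):
    """Build an action=query&prop=info URL, optionally with converttitles.

    Hypothetical helper. Conversion stays off unless the caller asks
    for it, matching the patch's goal of not changing the default.
    """
    params = {
        "action": "query",
        "prop": "info",
        "titles": "|".join(titles),
    }
    if convert_titles:
        # Assumption: the flag is enabled by being present; "1" is an
        # arbitrary value chosen here.
        params["converttitles"] = "1"
    return API_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_query(["Vektor"]))                       # default, no conversion
    print(build_query(["Vektor"], convert_titles=True))  # explicit opt-in
```

Keeping the flag opt-in means a client that relies on the API to bypass the language converter needs no changes.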


Bryan.TongMinh wrote:

Could somebody who has title conversion enabled test this patch? I don't have it.

michael.angelkovich wrote:

This seems to be working. It just produces a warning when the argument 'converttitles' is not sent:

Warning: Invalid argument supplied for foreach() in C:\root\A\Apache2\htdocs\mw\includes\api\ApiQuery.php on line 298

My suggestion about this patch is to also make the parameter name shorter ('ct', for instance), since it is going to be called frequently.

Bryan.TongMinh wrote:

Fixed in r69237. I had to make some minor changes, so please test it again.

I chose not to abbreviate converttitles because it is unambiguous about its meaning, while ct is not. Besides, mostly bots will use the API, so you only have to type it once :)

michael.angelkovich wrote:

Well, there might be something else unambiguous and shorter.

I suggested that not in order to spare my bot from longtyping but to save some traffic of WM servers. Actually that is why this thing should be settable per Wiki (i.e. every Wiki should be able to activate or deactivate it by default). Oh, well...

(In reply to comment #25)

Well, there might be something else unambiguous and shorter.

I suggested that not in order to spare my bot from longtyping but to save some
traffic of WM servers. Actually that is why this thing should be settable per
Wiki (i.e. every Wiki should be able to activate or deactivate it by default).
Oh, well...

Other stuff, such as HTTP headers, produces much more traffic.

michael.angelkovich wrote:

(In reply to comment #26)

Other stuff, such as HTTP headers, produces much more traffic.

Now, can you really shrink them? No. But there are other things you can shrink.

But WMF doesn't shrink them. This means several extra bytes in requests are not a big problem.

michael.angelkovich wrote:

I completely disagree with that, since every byte costs some money, which doesn't fall from the sky.