
transwiki import of invalid titles is possible on wmf wikis
Closed, Resolved (Public)

Description

A user on de.wp has transwiki-imported the page [[sv:s:t Olofsholm]], but titles beginning with "s:" are invalid on de.wp, because "s:" is the Wikisource interwiki prefix. Using the API, we found the page ID: https://de.wikipedia.org/?curid=6666199 A sysop will delete that page through the API, which allows a page ID to be used instead of a title.

Please warn about invalid titles during import and skip such pages. At the moment, the import goes through when a target namespace is used.

Thanks.
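
For illustration, the workaround described above (deleting by page ID rather than by title) could look roughly like the following sketch. It only builds the request parameters and sends nothing. The endpoint path, the placeholder token, and the helper name `build_delete_request` are assumptions made for this example; the `action=delete` module with its `pageid` parameter is part of the MediaWiki action API, but the exact token handling depends on the MediaWiki version.

```python
from urllib.parse import urlencode

# Assumed endpoint: the usual api.php location next to index.php.
API_ENDPOINT = "https://de.wikipedia.org/w/api.php"


def build_delete_request(page_id: int, token: str, reason: str) -> dict:
    """Build POST parameters that address the page by ID, never by title.

    Hypothetical helper for illustration; it performs no network access.
    """
    return {
        "action": "delete",
        "pageid": str(page_id),  # the page ID sidesteps title normalization
        "reason": reason,
        "token": token,          # placeholder; a real token must be fetched first
        "format": "json",
    }


params = build_delete_request(
    6666199,
    "PLACEHOLDER_TOKEN",
    "Invalid title created by transwiki import",
)
body = urlencode(params)  # form-encoded body a sysop's client would POST
```

Because the request carries no `title` parameter, the server has no title to normalize, which is why this route still reaches the page.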


Version: 1.18.x
Severity: major
URL: https://de.wikipedia.org/w/index.php?title=Spezial:Logbuch&offset=20120101113245&type=import&user=Ticketautomat&limit=1

Details

Reference
bz33564

Event Timeline

bzimport raised the priority of this task to High. Nov 22 2014, 12:00 AM
bzimport set Reference to bz33564.
bzimport added a subscriber: Unknown Object (MLST).

guandalug wrote:

(In reply to comment #0)

> A user on de.wp has transwiki imported the page [[sv:s:t Olofsholm]], but pages
> with a s: at the begin are invalid on de.wp (wikisource interwiki). Over the
> api we have found the page id: https://de.wikipedia.org/?curid=6666199 A sysop
> will delete that page with the api, which allow the use of a page id instead of
> a title.
>
> Please give a hint for invalid titles while importing and skip that page. The
> import is at the moment possible, when using a target namespace.
>
> Thanks.

Small correction: it's not the "s:" that causes trouble; it's the fact that the page title starts with a lowercase letter. This seems to be valid on sv-wikisource (and I see nothing amiss with that), but (as you know) it's not valid, or rather it's disabled, on Wikipedia.

Now, the import has created the page title precisely as it was told, with a lowercase 't', but every action we try on it (the API aside, which can use the page ID) uses a 'normalized' title, so it becomes a 'T'. Unfortunately, no page exists under that title, so it's effectively impossible to do anything with this page (again, not counting the API).
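
The mismatch can be reproduced with a small model. This is a simplified sketch, not MediaWiki's actual code: `normalize_title`, `PageStore`, and `is_importable` are hypothetical names, and normalization is reduced to uppercasing the first character. The title "t Olofsholm" is used as the stored title for illustration.

```python
def normalize_title(title: str) -> str:
    """Uppercase the first character, as a first-letter-capitalizing wiki does."""
    return title[:1].upper() + title[1:]


class PageStore:
    """Toy page table: maps page ID to the title exactly as it was stored."""

    def __init__(self) -> None:
        self.by_id: dict[int, str] = {}

    def import_raw(self, page_id: int, title: str) -> None:
        # Models the buggy import: the title is stored without normalization.
        self.by_id[page_id] = title

    def find_by_title(self, title: str):
        # Every title-based action normalizes the title before looking it up.
        wanted = normalize_title(title)
        for page_id, stored in self.by_id.items():
            if stored == wanted:
                return page_id
        return None

    def find_by_id(self, page_id: int):
        # ID-based access never touches the title, so it still works.
        return self.by_id.get(page_id)


def is_importable(title: str) -> bool:
    """One possible import-time check: skip titles that normalization would change."""
    return title == normalize_title(title)


store = PageStore()
store.import_raw(6666199, "t Olofsholm")
by_title = store.find_by_title("t Olofsholm")  # lookup uses "T Olofsholm"
by_id = store.find_by_id(6666199)
```

Here `by_title` is `None` while `by_id` still returns the stored title, which matches the behaviour reported above. A check along the lines of `is_importable` would let the importer warn about the page and skip it, as the report requests; whether the committed patch works this way is not stated here.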

A (simple) patch has been committed as Gerrit change 4503.