
Invalid Title in flickrripper
Closed, Declined
Public

Description

Originally from: http://sourceforge.net/p/pywikipediabot/bugs/1466/
Reported by: betacommand
Created on: 2012-06-19 19:52:25
Subject: Invalid Title in flickrripper
Assigned to: xqt
Original description:
Betacommand multichill: I know you wrote flickrripper.py and I'm trying to fix an issue with it, and thought it might be easier for you to fix
Betacommand lines 157-161 where it grabs the description and uses it for the file name
Betacommand when you start working with non-Latin descriptions it doesn't handle multi-byte characters well, it ended up with a title over 320 bytes
Betacommand the max MediaWiki lets you have is 255
multichill Lol
Betacommand multichill: really rather a pain
multichill So the check should probably encode it and then see how long it is?
Betacommand correct
multichill Or just lower the limit a bit?
Betacommand Thai letters for example are 3 bytes
Betacommand notes it was discovered with flickrripper.py -autonomous -user_id:40561337@N07 -addcategory:"Files from Abhisit Vejjajiva Flickr stream"
multichill Betacommand: Could you file a bug for this?
Betacommand multichill: you would need to cut it down to 85 to be safe
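
The figure of 85 follows from the worst case mentioned in the chat: at 3 bytes per Thai letter, a 255-byte limit holds 255 // 3 characters. A quick check, written for Python 3 (the constant names are illustrative only, and characters that need 4 bytes in UTF-8 would lower the number further):

```python
MAX_TITLE_BYTES = 255  # title limit quoted in the chat above

# Measure the UTF-8 size of one Thai letter (U+0E01, KO KAI).
bytes_per_thai_char = len("\u0e01".encode("utf-8"))

# Worst-case number of such characters that fit into the limit.
safe_chars = MAX_TITLE_BYTES // bytes_per_thai_char

print(bytes_per_thai_char, safe_chars)
```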


Version: unspecified
Severity: normal
See Also:
https://sourceforge.net/p/pywikipediabot/bugs/1466

Details

Reference
bz55195

Event Timeline

bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 2:26 AM
bzimport set Reference to bz55195.
bzimport added a subscriber: Unknown Object (????).

I guess the title is cut by MediaWiki and not by the slice operator, since slicing works correctly for unicode strings. len() also gives the number of characters, not the number of bytes. Do we have any size(object) method?
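
As a small illustration of that difference: the byte size of a title can be measured by encoding it first. The sketch below is written for Python 3 (the script at the time ran on Python 2), and the sample word is rebuilt from the first twelve numeric entities in the traceback further down this task; the variable names are made up for the example.

```python
# Code points of the first Thai word of the failing title
# (&#3609;&#3634;&#3618;... in the traceback).
code_points = [3609, 3634, 3618, 3585, 3619, 3633,
               3600, 3617, 3609, 3605, 3619, 3637]
title_part = "".join(chr(cp) for cp in code_points)

char_count = len(title_part)                  # number of code points
byte_count = len(title_part.encode("utf-8"))  # number of UTF-8 bytes

print(char_count, byte_count)
```

Here every character takes three bytes, so a limit checked with len() alone undercounts by a factor of three.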

An idea for getFilename (could anybody test whether it works?):

    if not title:
        #find the max length for a mw title
        maxBytes = 240 - len(project.encode('utf-8')) \
                       - len(username.encode('utf-8'))
        description = photoInfo.find('photo').find('description').text
        if description:
            descBytes = len(description.encode('utf-8'))
            if descBytes > maxBytes:
                # maybe we cut more than needed, anyway we do it
                items = max(0, len(description) - maxBytes + descBytes)
                description = description[:items]
            title = cleanUpTitle(description)
        else:
            title = u''
            # Should probably have the id of the photo as last resort.

  • assigned_to: nobody --> xqt

fix committed in r10387, please check

  • summary: Invalid Title --> Invalid Title in flickrripper
  • status: pending --> pending-fixed

The issue is still not fixed; actually, it's worse:
C:\Dev\SVN\pywikipedia>flickrripper.py -autonomous -user_id:40561337@N07 -addcategory:"Files from Abhisit Vejjajiva Flickr stream"
5703017392
Traceback (most recent call last):
  File "C:\Dev\SVN\pywikipedia\flickrripper.py", line 609, in <module>
    main()
  File "C:\Dev\SVN\pywikipedia\flickrripper.py", line 599, in main
    removeCategories, autonomous)
  File "C:\Dev\SVN\pywikipedia\flickrripper.py", line 257, in processPhoto
    filename = getFilename(photoInfo)
  File "C:\Dev\SVN\pywikipedia\flickrripper.py", line 172, in getFilename
    % (title, project, username)).exists():
  File "C:\Dev\SVN\pywikipedia\wikipedia.py", line 1284, in exists
    self.get()
  File "C:\Dev\SVN\pywikipedia\wikipedia.py", line 705, in get
    expandtemplates = expandtemplates)
  File "C:\Dev\SVN\pywikipedia\wikipedia.py", line 787, in _getEditPage
    raise BadTitle('BadTitle: %s' % self)
pywikibot.exceptions.BadTitle: BadTitle: [[commons:File:&#3609;&#3634;&#3618;&#3
585;&#3619;&#3633;&#3600;&#3617;&#3609;&#3605;&#3619;&#3637; &#3649;&#3621;&#363
2;&#3588;&#3603;&#3632;&#3648;&#3604;&#3636;&#3609;&#3607;&#3634;&#3591;&#3629;&
#3629;&#3585;&#3592;&#3634;&#3585;&#3585;&#3619;&#3640;&#3591;&#3592;&#3634;&#35
85;&#3634;&#3619;&#3660;&#3605;&#3634; &#3626;&#3634;&#3608;&#3634;&#3619;&#3603
;&#3619;&#3633;&#3600;&#3629;&#3636;&#3609;&#3650;&#3604;&#3609;&#3637;&#3648;&#
3595;&#3637;&#3618;&#3585;&#3621;&#3633;&#3610;&#3618;&#3633;&#3591;&#3611;&#361
9;&#3632;&#3648;&#3607;&#3624;&#3652;&#3607;&#3618; &#3623;&#3633;&#3609;&#3629;
&#3634;&#3607;&#3636;&#3605;&#3618;&#3660;&#3607;&#3637;&#3656; 8 &#3614;&#3620;
&#3625;&#3616;&#3634;&#3588;&#3617; &#3614;.&#3624;.2554 (Photographer attached
to the Prime Minister of the Kingdom of Thailand (H.E.Mr.Abhisit Vejjajiva) , Pe
erapat Wimolrungkarat - &#3614;&#3637;&#3619;&#3614;&#3633;&#3602;&#3609;&#3660;
&#3623;&#3636;&#3617;&#3621;&#3619;&#3633;&#3591;&#3588;&#3619;&#3633;&#3605;&#
3609;&#3660;) @is50mm - Flickr - Abhisit Vejjajiva.jpg]]

  • status: pending-fixed --> open

Where do the HTML entities come from? Are they part of the Flickr page?

Those are the Thai parts of the page title; they are converted when the exception is thrown.

I do not see a conversion by the exception. I converted the title from HTML entities to Unicode in my last commit.

Line 787 doesn't return the title; it returns the whole page (self). When you print the object rather than the title, it gets converted there. I used a log to confirm that the title was UTF-8 before filing this bug.

Thanks for testing. The length calculation was wrong. I've corrected it.
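
For readers following along, one common way to keep such a calculation simple is to cut the encoded bytes rather than the characters, then discard any character left incomplete at the end. The sketch below is not the code committed for this bug; `truncate_utf8` is a hypothetical helper name, and it is written for Python 3 while the script at the time targeted Python 2.

```python
def truncate_utf8(text, max_bytes):
    """Return the longest prefix of text whose UTF-8 encoding fits in max_bytes."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Cut at the byte limit; errors="ignore" drops a trailing partial character.
    return encoded[:max(0, max_bytes)].decode("utf-8", errors="ignore")


# Four Thai letters (3 bytes each, 12 bytes in total) cut to a 10-byte budget.
thai = "".join(chr(cp) for cp in [3609, 3634, 3618, 3585])
short = truncate_utf8(thai, 10)

print(len(short), len(short.encode("utf-8")))
```

With a budget computed as in the proposal above (a fixed limit minus the encoded lengths of the project and user name), the result never exceeds the limit and never ends in half a character.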

jayvdb set Security to None.
jayvdb edited subscribers, added: Betacommand; removed: Unknown Object (????), Xqt.

Was this appropriately fixed? Judging from the dates on http://sourceforge.net/p/pywikipediabot/bugs/1466/?page=1, I guess it was fixed in compat (and then ported to core in Oct 2013).

Xqt removed Xqt as the assignee of this task. Feb 27 2015, 12:45 PM

flickrripper is no longer actively maintained (see T223826).
If you are still using this script, please reopen this task and ask for the script to be restored.