Port catimages.py to core
Closed, Declined, Public

Description

catimages from compat should be moved into a separate Python package that depends on pywikibot core and on many other libraries.
Checklist for dependencies of catimages.py

This table has moved to https://commons.wikimedia.org/wiki/User:AbdealiJK/file-metadata/Dependencies; please update it there!

Package name | PYPI package | Ubuntu package | Py2.6 | Py3.5 | CI
numpy | yes | yes | yes | yes
scipy | yes | yes | yes | yes
cv | deprecated and replaced by cv2
cv2 | yes for py2 | yes | yes | yes
pyexiv2 | deprecated by gexiv2. But preferably use exiftool as it's more complete
gi (new dep) | yes (py2, py3) | yes | yes | yes (gnome-continuous)
gtk | Only used to find intersection of Rectangles. Preferably remove this dep? | yes (py2, py3) | yes | yes | yes (gnome-continuous)
rsvg | yes (py2, py3) | yes | yes | yes (gnome-continuous)
cairo | yes (py2, py3) | yes | yes | yes (gnome-continuous)
magic | yes | yes | yes | yes
jseg | This is currently a zip file. Pypi pkg needed
jseg/jpeg-6b | Can Pillow be used instead ? jpeg-6b seems to be a zip
_music21 | Do we still need the patch ? If not, use pypi | yes
opencv (own) | Do we still need this ? (haarcascade)
pydmtx | yes, used for QR Codes. Probably use OpenCV instead ?
py_w3c | yes, but use requests/bs4 instead ?
_zbar | Use a new library for barcodes. Not been updated since 2010
_bob | Do we still need the patch ? If not use pypi | yes
xbob_flandmark | yes, But flandmark is deprecated for clandmark | no
bob.ip.flandmark | yes, A newer xbob.flandmark, But deprecated for clandmark | yes
py_flandmark | no. Incomplete python bindings are installed with clandmark | no
pywt | yes | yes
slic | This is currently a zip file. Use vlfeat instead ?
vlfeat | yes Probably use this instead of clandmark and slic. | yes | yes | yes
yaafelib | No pypi pkg. Use librosa as alternative maybe? - as it has pypi | yes
matplotlib | yes | yes | yes | yes
pycolorname | yes | yes | yes | yes
Pillow | yes | yes | yes | yes
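
The gtk entry says that gtk is only used to find the intersection of rectangles. As a rough illustration of how small a dependency-free replacement could be, here is a minimal pure-Python sketch. The function name `intersect_rects` and the `(x, y, width, height)` tuple layout are assumptions made for this example; they are not taken from catimages.py.

```python
def intersect_rects(a, b):
    """Return the overlap of two rectangles, or None if they do not overlap.

    Each rectangle is an (x, y, width, height) tuple. This layout is an
    assumption for illustration, not the format used by catimages.py.
    """
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    left = max(ax, bx)
    top = max(ay, by)
    right = min(ax + aw, bx + bw)
    bottom = min(ay + ah, by + bh)
    # Rectangles that only touch along an edge are treated as not overlapping.
    if right <= left or bottom <= top:
        return None
    return (left, top, right - left, bottom - top)
```

Whether touching rectangles should count as intersecting would need to be checked against the behaviour the existing code relies on.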

Binary tools:

  • exiftool
  • convert (imagemagick)
  • pdftotext
  • pdfimages
  • ffprobe
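
A ported package would probably want to report missing binary tools early instead of failing halfway through an analysis. The following is only a sketch of such a check, assuming Python 3.3 or later (for `shutil.which`) and assuming each tool is installed under the executable name listed here; `BINARY_TOOLS` and `find_missing_tools` are hypothetical names.

```python
import shutil

# Executable names as they would be looked up on PATH. These are assumed
# to match the list above; distributions may package them differently.
BINARY_TOOLS = ["exiftool", "convert", "pdftotext", "pdfimages", "ffprobe"]


def find_missing_tools(tools=BINARY_TOOLS):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing binary tools: " + ", ".join(missing))
    else:
        print("All binary tools found.")
```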

List of analysis done by catimages (Categories populated and metadata being analyzed): https://etherpad.wikimedia.org/p/Zl7V7KuK7J

Details:

Primary mentor: @DrTrigon
Co-mentor: @jayvdb
Other mentors: (optional, Phabricator username)
Skills: python and computer vision
Estimated project time for a senior contributor: 2-4 weeks
Microtasks: T76211 T128946 T67192
Conpherence: Z360, Z441
Meetings: T133762

Details

Reference
bz64838

Event Timeline

Given the size of this task, it could be suitable for a dedicated GSOC/Outreachy project. I expect the port of the main script will finish well within the allocated time. But with so many dependencies, there are a lot of tests to be written, and there is a lot of extra work that can be done to properly package the custom dependencies for PyPI and to push modified code upstream.

Is this a potential project of Outreachy round (Dec 2015 to March 2016)?

I suspect so. We'd need to write it, and do a little analysis to be confident regarding what is able to be achieved by the end of the project.

This is a message posted to all tasks under "Need Discussion" at Possible-Tech-Projects. Outreachy-Round-11 is around the corner. If you want to propose this task as a featured project idea, we need a clear plan with community support, and two mentors willing to support it.

This is a message sent to all Possible-Tech-Projects. The new round of Wikimedia Individual Engagement Grants is open until 29 Sep. For the first time, technical projects are within scope, thanks to the feedback received at Wikimania 2015, before, and after (T105414). If someone is interested in obtaining funds to push this task, this might be a good way.

Hey there,

I'm interested in doing this for a GSoC project. Is there anyone who'd like to mentor this project?

Also, could someone set microtasks here so that I could begin working on them?

I spoke to @DrTrigon via email and he mentioned that he'd be willing to help in the specific case of catimages.py, as he wrote it. Although he isn't currently active on wiki, would it be fine to add him as a mentor?

He also raised the question of whether the bot is actually useful to Commons, and whether it *should* be ported. Any opinions on this?

IMPORTANT: This is a message posted to all tasks under "Need Discussion" at Possible-Tech-Projects. Wikimedia has been accepted as a mentor organization for GSoC '16. If you want to propose this task as a featured project idea, we need a clear plan with community support, and two mentors willing to support it.

@jayvdb @valhallasw , @DrTrigon , are you willing to push this project for the current round of GSoC '16/Outreachy-12 internships? If yes, please feel free to add yourself as mentors/co-mentors of the task. Note the requirements for a GSoC '16/Outreachy project:
It should not take more than 2-3 weeks for a senior developer and it should have a mentor and a co-mentor confirmed.

Abdeali? ;)

On 2 March 2016 at 21:18:07 CET, Sumit <no-reply@phabricator.wikimedia.org> wrote:

Sumit added a comment.

@jayvdb @valhallasw , @DrTrigon , are you willing to push this project
for the current round of GSoC '16/Outreachy-12 internships? If yes,
please feel free to add yourself as mentors/co-mentors of the task.
Note the requirements for a GSoC '16/Outreachy project:
It should not take more than 2-3 weeks for a senior developer and it
should have a mentor and a co-mentor confirmed.

TASK DETAIL

https://phabricator.wikimedia.org/T66838

Dr. Trigon

If @DrTrigon is willing to be a mentor, then I am happy to co-mentor.

As this project is mostly about packaging, dependencies, etc., I think T76211: pywikibot external pycolorname used by catimages.py is a good microtask, i.e. creating a proper PyPI package for pycolorname or opencv. That will definitely demonstrate the competency to tackle catimages.

I have edited the description of the issue based on other projects in the Possible-Tech-Projects board. I've also moved it to the "Featured for GSoC and Outreachy" column as it meets the prerequisites and has 2 mentors.

@DrTrigon @jayvdb, thanks for the support! This is now a featured project for GSoC '16/Outreachy-12. Feel free to edit the task description accordingly if you think clarifications/refinements are needed.

Just went through the deps:

  • cv2: as I remember, cv2 was customised by D.a.B only so that it would run under the restrictions of the toolserver at the time (something like a libgcc version mismatch)
  • jseg & jseg/jpeg-6b: this is code by others and we have to inform them. I don't think there's a problem, since they already allowed me to use the code in the past (this might hold for other deps as well)

Re opencv & cv2, I think we should discuss that at T128946#2091079

@AbdealiJK , could you go through the list of dependencies in the task description and identify the pypi package for each (edit the task description), and highlight any which are still not in pypi. See https://github.com/wikimedia/pywikibot-compat/blob/master/externals/__init__.py for hints on where compat loaded them from.

deps: Don't forget to consider binary tools like e.g. "exiftool"... there is a bunch of these.

Thanks @DrTrigon , I had forgotten about those.

These binary tools will need to be added to the addons: apt: packages: list in .travis.yml (or replaced with Python packages if there is a suitable equivalent).

Another aspect to consider is whether any of the catimages.py logic could be copied into MediaWiki or its extensions. For example, take the code which works on EXIF data: the addition of the categories 'Unidentified people', 'Faces' and 'Groups' is based on EXIF Faces data, which is available during the upload, so the upload process could suggest adding those categories. A similar problem is T130120: Suggest nearby categories based on EXIF latitude/longitude.

Indeed! Actually, I think most of the code could profit from going into an extension, e.g. since that should allow all the files to be processed during upload already (instead of having to download them later on), as you mentioned. So I would say that is recommended. The reason I never did this was my lack of PHP experience and the proof-of-concept state of the project.

As py33 seems to be a bit problematic, since catimages has so many dependencies, please add a column for it and use yes/no, or an old version number if the current version no longer supports py33.

Often old branches of large packages are very stable and still in widespread use, so they are effectively still supported.

If there are only a few problems, we can have some features not work on py33. If there are many core features broken, we may need to declare py33 as not supported.
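
One way to let some features not work on a given interpreter is to probe each feature's modules at startup and switch the feature off if an import fails. The sketch below only illustrates that idea: the feature names and the module groupings in `FEATURE_MODULES` are hypothetical and do not reflect how catimages.py is actually organised.

```python
import importlib

# Hypothetical mapping from a feature to the modules it would need.
# Neither the feature names nor the groupings come from catimages.py.
FEATURE_MODULES = {
    "faces": ["cv2", "numpy"],
    "colors": ["pycolorname", "PIL"],
    "audio": ["music21"],
}


def module_available(name):
    """Return True if the module `name` can be imported on this interpreter."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


def available_features(feature_modules=FEATURE_MODULES):
    """Map each feature to True only if all of its modules import cleanly."""
    return {
        feature: all(module_available(module) for module in modules)
        for feature, modules in feature_modules.items()
    }
```

A script could then skip any analysis whose feature maps to False and log which dependency was missing, rather than refusing to run at all.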

Also, as the table is getting quite large, it should be migrated to mediawiki.org, probably just in your userspace for now.

DrTrigon updated the task description.
Multichill subscribed.

I'm going to decline this one. It's not clear who wants this and nobody is working on it. Otherwise this will just end up rotting for more years at the bottom of the backlog.

Of course I can be completely wrong and you can reopen this.