Page MenuHomePhabricator

Add more variables to AbuseFilter file uploading evaluation
Closed, InvalidPublic

Details

Reference
bz19565

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 21 2014, 10:40 PM
bzimport added a project: AbuseFilter.
bzimport set Reference to bz19565.
bzimport added a subscriber: Unknown Object (MLST).

(Batch change)

These are low-priority miniprojects that I can mop up at some point when I'm doing general code work as opposed to working on a specific projects.

[Batch change]

Removing Dave McCabe from CC, who I somehow managed to add to the CC list of 12 bugs assigned to me.

Some checks are possible:

  • Title, user, action and SHA1 of file

Reference:
http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/AbuseFilter/AbuseFilter.hooks.php?view=markup#l280

Especially for Commons it would be desirable if we could check the *content* (wikitext) of the new file for "source= google,..." and other invalid sources to automatically tag them.

Thank you.

(In reply to comment #3)

Especially for Commons it would be desirable if we could check the *content*
(wikitext) of the new file for "source= google,..." and other invalid sources
to automatically tag them.

Thank you.

I tried to lay the foundation for this in https://gerrit.wikimedia.org/r/51326 (not yet merged) where I changed the extension to use a hook which provides more data than just the file names... but this still requires some changes to MediaWiki itself and then a follow up to the linked patch.

I'll keep you updated.

Just found that AbuseFilter already checks file uploads, but there aren't many variables available.

Currently those variables include:

Edit count of user (user_editcount)
Name of user account (user_name)
Groups (including implicit) user is in (user_groups)
Whether or not a user is editing through the mobile interface (user_mobile)
file_articleid = page id of the image page, or "0" for new uploads
file_namespace = "6"
file_text = unprefixed page name of the image page
file_prefixedtext = full page name of the image page
Action (action) = "upload"
SHA1 hash of file contents (file_sha1)
Whether or not the change was made through a tor exit node (tor_exit_node)
Unix timestamp of change (timestamp)

Missing:

upload comment
wikitext of the image page (for new uploads)

Any updates about progress on this bug Please . We prefere this getting fixed. Rgds

This report has low priority, but if you have a high interest in getting a specific problem fixed, you could provide a patch to speed up the process.
See https://www.mediawiki.org/wiki/How_to_become_a_MediaWiki_hacker for more information.

This is still blocked by changes to how MediaWiki handles uploads (and especially the page creations linked to them)... we need more context from that side (right now AbuseFilter has no way to know about the wikitext used as description).

This bug enables bad-faith editors to upload media files with an invalid {{permissionOTRS}} tag without anyone noticing. Therefore, in my opinion, this is not a low priority enhancement.

[As long as nobody works on this ticket it de-facto will be low priority, no matter what the Priority field will be set to.]

This is a great loophole for copyright abuse and uploading professional photographs to Commons without permission. Once an OTRS ticket gets "recycled", an image would be highly unlikely to get noticed as a copyright problem or that the ticket was not added by anyone with OTRS access.

If this cannot be fixed, we really should consider if an efficient bot can be designed to do the same job.

In T21565#1031373, @Fae wrote:

This is a great loophole for copyright abuse and uploading professional photographs to Commons without permission. Once an OTRS ticket gets "recycled", an image would be highly unlikely to get noticed as a copyright problem or that the ticket was not added by anyone with OTRS access.

If this cannot be fixed, we really should consider if an efficient bot can be designed to do the same job.

What exactly is your point? What variables are missing for OTRS-checking?
If AF doesn't pick up a upload it should, then it's a bug, but outside of this tasks scope.

ok @Fae, I think I understand and you mean T89252.

Yep, probably. It's not a "bad" OTRS ticket though, this is just finding a way to flag when a non-OTRS user adds what might be a perfectly good ticket. In fact sometimes we want batch uploaders to do things this way, so it's not something to stop, just something to track.

The underlying technical problem is that uploads proceed in two steps. First the file is uploaded to the server and then the text is processed and saved to the database. We have a hook into the upload process, but at that point in software the uploaded text is not yet available.

After poking around the code a bit, my instinct is that the best approach probably to add a new hook in UploadBase/performUpload prior to calling LocalFile->upload. Call it something like, UploadBeforeSaveText and send it the file info as well as the $comment, $page_text, and $user. And then move all AbuseFilter processing of uploads to that hook. The biggest potential pitfall is probably making sure the code aborts correctly and doesn't leave dangling stuff on the file system (the code is complicated enough that I'm not currently sure what cleaning might be necessary in response to a late abort).

The only other sensible alternative that I see would be to make page_text, etc. properties of UploadBase and associate it with the file much earlier in the upload process, before verifyUpload gets called. In some ways this might be preferable conceptually, as it would enforce a tighter integration of file description text and file content (and doesn't require a new hook), but doing that seems like it would require many more dependency changes since one would be modifying the parameters sent to performUpload among other things.

Does anyone else have suggestions about how to proceed here?

The underlying technical problem is that uploads proceed in two steps. First the file is uploaded to the server and then the text is processed and saved to the database. We have a hook into the upload process, but at that point in software the uploaded text is not yet available.

After poking around the code a bit, my instinct is that the best approach probably to add a new hook in UploadBase/performUpload prior to calling LocalFile->upload. Call it something like, UploadBeforeSaveText and send it the file info as well as the $comment, $page_text, and $user. And then move all AbuseFilter processing of uploads to that hook. The biggest potential pitfall is probably making sure the code aborts correctly and doesn't leave dangling stuff on the file system (the code is complicated enough that I'm not currently sure what cleaning might be necessary in response to a late abort).

The only other sensible alternative that I see would be to make page_text, etc. properties of UploadBase and associate it with the file much earlier in the upload process, before verifyUpload gets called. In some ways this might be preferable conceptually, as it would enforce a tighter integration of file description text and file content (and doesn't require a new hook), but doing that seems like it would require many more dependency changes since one would be modifying the parameters sent to performUpload among other things.

Does anyone else have suggestions about how to proceed here?

Please see the referenced tasks. See T89252 and especially T89302: Create check hook before really uploading file with infos about description page and file, which is basically what you suggest. Poke and get feedback/attention there (maybe from MediaWiki-Core-Team).

matmarex changed the task status from Open to Stalled.Apr 3 2016, 12:14 AM
matmarex subscribed.

I'm having a hard time figuring out what is being requested here. From the task title, I thought you want file metadata like width or height, but from the comments, this seems to be a duplicate of T89252.

This overlaps with either T131643 or T89252, depending what you meant, so I'm going to close this one.