
Remove "hits" from CAPTCHA dictionary
Closed, Resolved, Public

Description

I'm forking this from bug 10408. The word "hits" tends to produce unfortunate CAPTCHAs whenever it gets appended to a plural word. While we discuss at the original bug whether adding a blacklist regexp for naughty words is reasonable, in the meantime it would make sense to simply remove the offending word from Wikimedia's word list.
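Removing single words only goes so far, because the problem comes from how two harmless words combine. Here is a minimal sketch of the blacklist-regexp idea being discussed at the original bug. It assumes the check runs on the final concatenated CAPTCHA text. The pattern list and the function names `compile_blacklist` and `is_acceptable` are hypothetical and do not come from ConfirmEdit.

```python
import re
from typing import Iterable


def compile_blacklist(patterns: Iterable[str]) -> re.Pattern:
    """Combine hypothetical blacklist patterns into one case-insensitive regex."""
    return re.compile("|".join(f"(?:{p})" for p in patterns), re.IGNORECASE)


def is_acceptable(words: list[str], blacklist: re.Pattern) -> bool:
    """Check the *concatenated* CAPTCHA text, not each word on its own.

    Individually harmless words such as "peas" and "hits" can join into
    text that contains an offensive substring.
    """
    return blacklist.search("".join(words)) is None


# Hypothetical pattern list, for illustration only.
BLACKLIST = compile_blacklist([r"shit"])

print(is_acceptable(["peas"], BLACKLIST))          # True
print(is_acceptable(["hits"], BLACKLIST))          # True
print(is_acceptable(["peas", "hits"], BLACKLIST))  # False: "peashits"
```

A per-word check would pass both "peas" and "hits". Only the joined text reveals the problem, so a substring check on the output complements pruning the word list.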


Version: unspecified
Severity: minor
URL: http://en.wikipedia.org/wiki/Image:Peashits.png

Details

Reference
bz16166

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 10:17 PM
bzimport set Reference to bz16166.
bzimport added a subscriber: Unknown Object (MLST).

Brion, any idea on this one?

The words 'shit' and 'shits' should be removed too. :)

Do we actually have a "dictionary", or is this farmed off to Google using ConfirmEdit/Recaptcha...?

I think we use (or used) a system dictionary file (e.g. /usr/share/words or some such) to generate the CAPTCHA images via the Python script included with ConfirmEdit. I'm not sure whether they're being actively generated now or whether we're just using a large existing image pool.

(Wikimedia doesn't use Recaptcha due to its proprietary licensing.)

Yes, we use that file with a few filters (I prefer not to give too many details here, although our word list is public).
I don't think the files are ever regenerated.
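The real Wikimedia filters are deliberately not described here. The following is only a minimal sketch of the kind of pruning a script might apply to a system dictionary before generating images. The length bounds, the lowercase-ASCII rule, the `removed` set and the name `filter_wordlist` are all illustrative assumptions, not Wikimedia's actual filters.

```python
from typing import Iterable


def filter_wordlist(lines: Iterable[str], removed: set[str],
                    min_len: int = 4, max_len: int = 8) -> list[str]:
    """Turn raw dictionary lines into a CAPTCHA word list (hypothetical rules).

    Keeps lowercase ASCII-alphabetic words of bounded length, drops any word
    listed in `removed` (exact match, case-insensitive) and removes duplicates
    while preserving the original order.
    """
    removed_lc = {w.lower() for w in removed}
    kept: list[str] = []
    seen: set[str] = set()
    for line in lines:
        word = line.strip()
        # Skip proper nouns ("Paris"), possessives ("cat's") and non-ASCII entries.
        if not (word.isascii() and word.isalpha() and word.islower()):
            continue
        if not (min_len <= len(word) <= max_len):
            continue
        if word in removed_lc or word in seen:
            continue
        seen.add(word)
        kept.append(word)
    return kept
```

Because a text file object yields one line at a time, an opened dictionary file can be passed straight in as `lines`.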

Anyone with shell access should be able to fix this and regenerate the files; assigning to Tim just because he is familiar with it.

(In reply to comment #4)

(Wikimedia doesn't use Recaptcha due to its proprietary licensing.)

We do for fund-raising. The reason we don't normally use it is the reliance on an external service (as of the last discussion), but apparently we pay no attention to that for fund-raising.

We shouldn't. The developer who worked on it didn't really know how to use our FancyCaptcha at that time.

What's left to even do on this one?

(In reply to comment #8)

What's left to even do on this one?

I think resolving bug 21025 would be a better use of time than focusing energy on this bug.

Marked as depending on bug 21025, as they are closely related.

RobLa, this is one of those 10-minute bugs that lag for years because they need shell access.

Steps to take:

  • Find out where on fenari the blacklist is.
  • Commit it to ConfirmEdit (closes bug 21025)
  • Add 'hits' to it and commit.
  • Run captcha.py into a new folder, e.g. /mnt/upload7/private/captcha-en (see bug 38699 or similar IRC logs for bug 38391).
  • Change $wgCaptchaDirectory in CommonSettings.php to the new folder and sync-file it.
  • Close this bug

CAPTCHAs that have already been sent but not yet answered will still work.
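The following is a minimal sketch of two of the steps above. It assumes the blacklist is a plain text file with one word per line. The first helper adds "hits" to that file idempotently. The second creates a fresh output directory, named after the captcha-en example, instead of overwriting the current pool. The helper names `add_to_blacklist` and `fresh_output_dir` are hypothetical. Generating the images is still left to captcha.py.

```python
from pathlib import Path


def add_to_blacklist(path: Path, *words: str) -> list[str]:
    """Add words to a one-word-per-line blacklist file (assumed format).

    The file is rewritten sorted and de-duplicated, so running this twice
    with the same words leaves it unchanged.
    """
    existing: set[str] = set()
    if path.exists():
        existing = {line.strip().lower()
                    for line in path.read_text().splitlines() if line.strip()}
    updated = sorted(existing | {w.strip().lower() for w in words})
    path.write_text("\n".join(updated) + "\n")
    return updated


def fresh_output_dir(base: Path, name: str) -> Path:
    """Create a new, empty directory for the regenerated images.

    Raises FileExistsError instead of reusing an existing directory, so the
    old image pool is never overwritten in place.
    """
    target = base / name
    target.mkdir(parents=True, exist_ok=False)
    return target
```

The helper refuses to reuse an existing directory, which keeps the old images on disk. CAPTCHAs that were already sent can therefore still be answered after $wgCaptchaDirectory is switched.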

Looks like /mnt/upload7/private/captcha2 has word lists, including a bad one.

Can you commit it to the ConfirmEdit repository? :)

MW (ConfirmEdit extension) territory -> no "ops" in this case.

(In reply to comment #11 by Aaron Schulz)

Looks like /mnt/upload7/private/captcha2 has word lists, including a bad one.

Does this need any help from another team? I assume not.

Now that bug 21025 is resolved, this bug just needs someone in ops to verify which blacklist is being used on Wikimedia wikis.

There isn't one, really. The one in my home directory on fenari should be put somewhere standard, probably in Puppet too (though that would be an amusing commit to make).

I assume Wikimedia wikis are using captcha.py. The only question then is whether it's already passing the --blacklist option currently. If not, the default value should get picked up in the next code update. Maybe.
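The open question here is how a command-line default behaves when the option is omitted. Below is a minimal sketch using Python's argparse. The `--blacklist` default of `blacklist` and the `--wordlist` option are hypothetical. This is not captcha.py's actual parser, and its options and defaults may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical option parser for a CAPTCHA generator (not captcha.py's own)."""
    parser = argparse.ArgumentParser(description="Generate CAPTCHA images (sketch).")
    parser.add_argument("--wordlist", required=True,
                        help="dictionary file to pick words from")
    parser.add_argument("--blacklist", default="blacklist",
                        help="file of forbidden words (default: %(default)s)")
    return parser
```

If an invocation omits `--blacklist`, it gets the default value. An explicit `--blacklist` always overrides the default.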

(In reply to comment #16)

I assume Wikimedia wikis are using captcha.py. The only question then is whether it's already passing the --blacklist option currently. If not, the default value should get picked up in the next code update. Maybe.

We only run it manually; I did the last run. It should really happen periodically, with proper image rotation, though. I think that's a separate bug report.
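Periodic regeneration with rotation could look roughly like the following minimal sketch. It assumes each run writes into a new directory named with a sortable date stamp (e.g. captcha-20140101) and that older generations are then pruned. The naming scheme, `prune_generations` and the `keep` count are hypothetical.

```python
import shutil
from pathlib import Path


def prune_generations(base: Path, prefix: str, keep: int) -> list[Path]:
    """Delete all but the `keep` newest generation directories under `base`.

    Assumes generations are named `<prefix><sortable stamp>`, so sorting by
    name sorts by age. Directories without the prefix are left alone.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    gens = sorted(p for p in base.iterdir()
                  if p.is_dir() and p.name.startswith(prefix))
    doomed = gens[:-keep]  # empty when there are `keep` or fewer generations
    for p in doomed:
        shutil.rmtree(p)
    return doomed
```

Keeping at least two generations leaves the previous pool on disk while CAPTCHAs served from it are still being answered.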

The word "hits" tends to produce unfortunate CAPTCHAs whenever it gets appended to a plural word

Already resolved by T23025.