Page MenuHomePhabricator

Security review BounceHandler extension for deployement
Closed, ResolvedPublic

Description

Sub part of Bug: https://bugzilla.wikimedia.org/show_bug.cgi?id=69019

Quoting: https://bugzilla.wikimedia.org/show_bug.cgi?id=69019#c0

The BounceHandler extension generates a unique VERP address on every sent email, and has the bouncehandler API that can handle incoming email bounces once the bounce is HTTP POSTed to it via curl from exim. The extension was built as part of the VERP project to properly handle email bounces.

Pipe incoming bounces to the bouncehandler API:-
You need to add the following to your incoming mails to *@wikimedia.com receving PIPE transport:
command = /usr/bin/curl "action=bouncehandler" --data-urlencode "email@-" http://$IP/api.php

The genrated VERP address is of the form
$wikiId-base36( $UserID )-base36( $Timestamp )-hash( $algorithm,$key, $prefix )@$email_domain

where
$wikiId - The wiki database name, to support multiple wikis
$userID - The user_id from table 'user' - to uniquely identify a recipient.
$Timestamp - The unix timestamp.
$prefix = $wikiId. '-'. base_convert( $uid, 10, 36). '-'. base_convert( $timeNow, 10, 36);

It use the Plancake mail parser external library to extract the headers as an addition to the optional inbuilt regex functions.


Version: wmf-deployment
Severity: normal

Details

Reference
bz69099

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 22 2014, 3:33 AM
bzimport set Reference to bz69099.

We use the plancake mail parser library originally hosted at
https://packagist.org/packages/floriansemm/official-library-php-email-parser and in github - https://github.com/plancake/official-library-php-email-parser/blob/master/PlancakeEmailParser.php

We use the master branch, since it seems to be the most active one, rather than the v1.0
We have made it in a way, that our local regex functions class will be run when the plancake class is not existing.

Hi Tony, my comments about your php code bellow. Mostly should be simple adjustments. Let me know if you need clarification on anything.

For the rest of it:

  • I looked at PlancakeEmailParser.php in master of that repo, and it looks sane, but it would be best to have composer pull in a specific version, in case someone manages to get something crazy in.
  • I'm sure someone in Ops will check the exim rule that calls the api, but do make sure they're correctly escaping the emails address and validating certificates when using curl.

BounceHandler.php

  • novalidate-cert - can your example here validate the cert? Add a comment with an insecure example if you want to, but we should encourage safe defaults.

processIMAPEmails.php

  • DRY - a lot of this code looks like duplicates from other classes. You can a fake request to call the api code directly. But when you have two version of security critical code, updates tend to only make it to one. For example:
    • A row is still inserted if hash doesn't match up since empty array() returned from getUserDetails

includes/ProcessBounceWithRegex.php

  • Steal the regex from plancake to split lines - (\r?\n|\r)
    • that will also let you avoid the trim( $emailLine ) when looking for the end of the header, which seems like it could cause problems.

includes/ProcessBounceEmails.php

  • Relying on no return value from getUserDetails to prevent extra rows being inserted seems fragile (e.g., you return an empty array in processIMAPEmails)
  • Taking the Date from the email header means it's attacker controlled-- see my comments on BounceHandlerActions::handleFailingRecipient()

includes/ProcessBounceWithPlancake.php

  • DRY: put processEmail in base class
  • It would be good to gracefully handle exceptions from PlancakeEmailParser
  • getTo() returns an array, which isn't handled well in the rest of the code

includes/BounceHandlerActions.php

  • unSubscribeUser
    • Updating user table based on email is a bad idea. User the user_id which was covered by the signature. (i.e., attacker sets a legit email address, initiates an email, get's the verp address, then changes their email in MediaWiki to match a victim's, replies to the (valid) verp and victim is unconfirmed.
    • You should really use user object to handle these instead of direct DB access - The cache isn't cleared properly.
  • handleFailingRecipient
    • The current code doesn't use $bounceTimestamp or $bounceValidPeriod, but $bounceTimestamp isn't authenticated, so can't be trusted if you're planning to query the database for hits in a range.

BounceHandlerHooks.php

  • Only handle $recip[0]? That seems like it will leak a verp hash to non-recipients, so anyone on the list can unconfirm the unlucky first user in the array.

(In reply to Chris Steipp from comment #3)

Hi Tony, my comments about your php code bellow. Mostly should be simple
adjustments. Let me know if you need clarification on anything.

For the rest of it:

  • I looked at PlancakeEmailParser.php in master of that repo, and it looks

sane, but it would be best to have composer pull in a specific version, in
case someone manages to get something crazy in.

  • I'm sure someone in Ops will check the exim rule that calls the api, but

do make sure they're correctly escaping the emails address and validating
certificates when using curl.

BounceHandler.php

  • novalidate-cert - can your example here validate the cert? Add a comment

with an insecure example if you want to, but we should encourage safe
defaults.

Removed IMAP support as it is not going to be used. The feature was put in to the code before we came up with the more systematic method of using the API POST.

processIMAPEmails.php

  • DRY - a lot of this code looks like duplicates from other classes. You can

a fake request to call the api code directly. But when you have two version
of security critical code, updates tend to only make it to one. For example:

  • A row is still inserted if hash doesn't match up since empty array()

returned from getUserDetails

  • file removed **

includes/ProcessBounceWithRegex.php

  • Steal the regex from plancake to split lines - (\r?\n|\r)
    • that will also let you avoid the trim( $emailLine ) when looking for the

end of the header, which seems like it could cause problems.

done

includes/ProcessBounceEmails.php

  • Relying on no return value from getUserDetails to prevent extra rows being

inserted seems fragile (e.g., you return an empty array in processIMAPEmails)

  • Taking the Date from the email header means it's attacker controlled-- see

my comments on BounceHandlerActions::handleFailingRecipient()

includes/ProcessBounceWithPlancake.php

  • DRY: put processEmail in base class
  • It would be good to gracefully handle exceptions from PlancakeEmailParser
  • getTo() returns an array, which isn't handled well in the rest of the code

includes/BounceHandlerActions.php

  • unSubscribeUser
    • Updating user table based on email is a bad idea. User the user_id which

was covered by the signature. (i.e., attacker sets a legit email address,
initiates an email, get's the verp address, then changes their email in
MediaWiki to match a victim's, replies to the (valid) verp and victim is
unconfirmed.

  • You should really use user object to handle these instead of direct DB

access - The cache isn't cleared properly.

  • handleFailingRecipient
    • The current code doesn't use $bounceTimestamp or $bounceValidPeriod, but

$bounceTimestamp isn't authenticated, so can't be trusted if you're planning
to query the database for hits in a range.

BounceHandlerHooks.php

  • Only handle $recip[0]? That seems like it will leak a verp hash to

non-recipients, so anyone on the list can unconfirm the unlucky first user
in the array.

Change 152934 had a related patch set uploaded by 01tonythomas:
An additional base64_encode( hash_hmac( prefix, true ) ) was added

https://gerrit.wikimedia.org/r/152934

Change 153016 had a related patch set uploaded by 01tonythomas:
Fixed various issues with the BounceHandler extension

https://gerrit.wikimedia.org/r/153016

Change 152934 merged by jenkins-bot:
An additional base64_encode( hash_hmac( prefix, true ) ) was added

https://gerrit.wikimedia.org/r/152934

Change 153403 had a related patch set uploaded by 01tonythomas:
Generate VERP address for a single/array of recipient addresses

https://gerrit.wikimedia.org/r/153403

Change 153016 merged by jenkins-bot:
Fixed various issues with the BounceHandler extension

https://gerrit.wikimedia.org/r/153016

Change 153403 merged by jenkins-bot:
Generate VERP address for a single/array of recipient addresses

https://gerrit.wikimedia.org/r/153403

Almost every issue pointed out here is fixed in the above patch sets.

Change 153871 had a related patch set uploaded by 01tonythomas:
Improved the unsubscribe user method used by BounceHandler extension

https://gerrit.wikimedia.org/r/153871

Change 153871 merged by 01tonythomas:
Improved the unsubscribe user method used by BounceHandler extension

https://gerrit.wikimedia.org/r/153871

It looks like this extension will need to be deployed on loginwiki (what Tony just told me): Does that change your perspective, Chris?

Just asking because, well, loginwiki is a really really special wiki :)

(In reply to Greg Grossmeier from comment #15)

It looks like this extension will need to be deployed on loginwiki (what
Tony just told me): Does that change your perspective, Chris?

Just asking because, well, loginwiki is a really really special wiki :)

Tony, can you explain why loginwiki? That seems like the least useful wiki to install it on, and we typically only install extensions there that are necessary for it's sso functionality.

(In reply to Chris Steipp from comment #16)

Tony, can you explain why loginwiki? That seems like the least useful wiki
to install it on, and we typically only install extensions there that are
necessary for it's sso functionality.

Hey, actually we where discussing about a wiki in prod that would have maximum user accounts, and I think hoo or lego came up with the name loginwiki. The extension greps in the CentralAuth table to identify the failing user. We are comfortable with installing it anywhere in prod, which have CentralAuth access and can interact with the english-wiki ( which would create most bounces ).

If it only has to run on one wiki, then I'd be more comfortable with meta or enwiki. I don't want a larger attack surface on loginwiki than we absolutely need.

(In reply to Chris Steipp from comment #18)

If it only has to run on one wiki, then I'd be more comfortable with meta or
enwiki. I don't want a larger attack surface on loginwiki than we absolutely
need.

Talked with Hoo and Jeff on that, and they are positive to get it installed on en-wiki :) Will prepare the patch soon.