Page MenuHomePhabricator

Replace MD5 password hashing with more secure hash
Closed, ResolvedPublic

Description

Author: theevilipaddress

Description:
MediaWiki uses MD5 for encrypting user passwords, however that encryption method isn't secure any longer (see for example the lead section of http://en.wikipedia.org/wiki/MD5, including references). Though it might not be the most urgent problem so far (thus marked as minor), I believe that MediaWiki should switch someday to a more secure encryption method, at the moment SHA-2 seems to be the recommendation.


Version: 1.24rc
Severity: enhancement
URL: http://www.mail-archive.com/wikitech-l@lists.wikimedia.org/msg08830.html
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=54948

Details

Reference
bz28419

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

happy.melon.wiki wrote:

But given that the whole point is to include a physical barrier to someone being able to brute-force a stolen database on a GPU cluster or somesuch, storing a hash of the non-db key in the db turns the problem back into resting on the preimage security of whatever hash function is used on the key. Generate the (potentially many) preimages that match the hash (by brute force if necessary), and run them in parallel in a dictionary attack on the entire password set. You only need *one* user to have used a trivial password to identify which of the hash preimages is the actual secret key, thereafter it's exactly the same as currently, just with a larger salt. Having a non-db secret key only provides a fundamentally *new* layer of protection (as opposed to just an incrementally stronger implementation of the existing security) if no information at all about it is stored in the database.

This is all rather off-topic for this particular bug, anyway...

ayg wrote:

(In reply to comment #12)

But given that the whole point is to include a physical barrier to someone
being able to brute-force a stolen database on a GPU cluster or somesuch,
storing a hash of the non-db key in the db turns the problem back into resting
on the preimage security of whatever hash function is used on the key.

We aren't worried about the preimage security of the hash function. No widely-used cryptographic hash function that I know of has ever had a practical preimage attack. Even MD4, which is so badly broken that you can generate collisions *by hand*, has no preimage attack better than 2^102 or so.

What we're worried about is brute-forcing, and you can't brute-force a (for instance) 40-byte hexadecimal string. That's 2^160 possibilities, which reduces to 2^80 given birthday attacks, which is wildly impractical. Make it 60 bytes if you like, or 100.

Even if you had a preimage attack, that only gives you *some* preimage. If you choose a salt that's at least a few bytes longer than the hash, then with overwhelming probability, the preimage generated by your attack will not be the same as the actual salt. That makes it useless to an attacker.

Generate the (potentially many) preimages that match the hash (by brute force
if necessary),

You cannot generate preimages for a cryptographic hash function using brute force, unless the input is guessable. Many passwords are guessable, or at least short (which amounts to the same thing), which is why brute-forcing works at all. A long random string cannot be brute-forced, and can't even be obtained from vulnerabilities in the hash function (as long as it's a few bytes longer than the hash itself).

Put another way: if the hash is 256 bits and the salt is 320 bits, then there will be at least 2^64 possible salts on average that have a given hash. Even if you had a magic oracle that enumerated all the possible preimages of the hash of the right length, you'd never be able to try them all. For perspective, storing that many salts would take up millions of terabytes; and if you had a 10 GHz CPU that could somehow do one hash execution per clock cycle, it would take several decades to check all the hashes. If that seems a little too easy to get, make the salt 1024 bits -- it's not going to affect execution speed noticeably.

Having a non-db secret
key only provides a fundamentally *new* layer of protection (as opposed to just
an incrementally stronger implementation of the existing security) if no
information at all about it is stored in the database.

Correct. For all intents and purposes, the cryptographic hash of a long random string provides no information about the string at all. Thus it's fine to store.

(In crypto terms, the information it provides is "negligible". Almost nothing in crypto gives you better assurances than "except with negligible probability". You could successfully invert an RSA public key in constant time, too, but only with negligible probability.)

This is all rather off-topic for this particular bug, anyway...

This bug is about replacing MediaWiki's hash function, so it seems quite germane. Maybe we should make a mediawiki.org page to write up the conclusions, so people don't have to wade through the discussion to get at them.

I was experimenting a while ago with Extension:SecurePasswords, which was an attempt to improve upon MediaWiki's default password hashing scheme. Many of the measures implemented were clearly unnecessary, but it got me thinking.

First of all, I don't see how the speed of C code for a hashing algorithm v. the speed of PHP code means anything in terms of brute-forcing. Maybe I'm missing something (and if I am, please explain it). The rate of brute-forcing is only affected by the speed of the language the code is written in. Now since MediaWiki throttles password attempts, we can safely assume that any brute force attempt will only be successful if the attacker has direct database access anyway. And any attacker with any knowledge of programming languages is going to program their brute forcing application in C. So in the end the rate at which the attacker can brute force passwords has nothing to do with how fast PHP does it. It's not like the brute forcing application is racing against the MediaWiki installation to hash passwords.

Among other things, I found that a PBKDF-style iteration scheme was about six
times slower in PHP than in C, not orders of magnitude. With 32k iterations of
SHA1, PHP took around 50-100 ms, while C (using OpenSSL's SHA1 implementation)
took around 8 ms, or about 120 passwords/second. On a GPU, I got about 10,000
passwords a second. (With straight uniterated SHA1, I got 1.8 million
passwords/second on a CPU, and 220 million on a GPU. I didn't salt, but
salting would make no difference to the performance, since it only affects
initialization.)

With this in mind, I wouldn't say using PBKDF2 is that bad an idea, and to further the scheme, the current password hash (double MD5 hash) can be combined with an external salt as aforementioned before being put through PBKDF to add that extra layer of security.

(Another idea that I personally think is unnecessary but was used in Extension:SecurePasswords is a method by which each user's password hash is done using a different HMAC hash, i.e., depending on some manipulation of the user ID, one user may have Whirlpool while another might have SHA-512. To see what I mean exactly you can look at the Extension:SecurePasswords source.)

ayg wrote:

(In reply to comment #14)

First of all, I don't see how the speed of C code for a hashing algorithm v.
the speed of PHP code means anything in terms of brute-forcing. Maybe I'm
missing something (and if I am, please explain it).

Suppose PHP executed hash algorithm 1 half as fast as C, and hash algorithm 2 a tenth as fast, for a fixed number of iterations. Suppose further that we choose the number of iterations so that executing it in PHP takes 50 ms, so as not to delay login noticeably. Then with algorithm 1, a C implementation will execute twice as fast as PHP, i.e., 25 ms/hash, or 40 hashes/s. With algorithm 2, it will be 5 ms/hash, or 200 hashes/s. The fact that algorithm 2 can be executed five times as fast in C relative to PHP, for the same number of iterations, translates into an attacker getting five times the throughput when cracking. It's all because we need to choose the number of iterations based on the speed in PHP, but the attacker gets to use C.

(In reply to comment #15)

(In reply to comment #14)

First of all, I don't see how the speed of C code for a hashing algorithm v.
the speed of PHP code means anything in terms of brute-forcing. Maybe I'm
missing something (and if I am, please explain it).

Suppose PHP executed hash algorithm 1 half as fast as C, and hash algorithm 2 a
tenth as fast, for a fixed number of iterations. Suppose further that we
choose the number of iterations so that executing it in PHP takes 50 ms, so as
not to delay login noticeably. Then with algorithm 1, a C implementation will
execute twice as fast as PHP, i.e., 25 ms/hash, or 40 hashes/s. With algorithm
2, it will be 5 ms/hash, or 200 hashes/s. The fact that algorithm 2 can be
executed five times as fast in C relative to PHP, for the same number of
iterations, translates into an attacker getting five times the throughput when
cracking. It's all because we need to choose the number of iterations based on
the speed in PHP, but the attacker gets to use C.

Ah OK. That makes a lot more sense.

Well, I made a draft implementation of pbkdf2 in PHP's hash module, and compared them both on my Core 2 Duo server. I used the SHA-512 hash of "test1" and the hash of "test2" as the password and salt respectively, and I used 100,000 iterations for PBKDF2 (recommended is 1000+). And on an average of 1000 iterations, the PHP code actually was 0.0419 seconds faster than the C code, which was a 25% difference. Now I'm assuming this is just some trivial overhead that caused the PHP code to be faster, but the point is that the C code doesn't really have that much of an advantage. Most of the heavy work (the actual hashing) is being done down in C anyway.

Created attachment 9521
New password style using PBKDF2

Here's a PHP patch for using PBKDF2 in MediaWiki. On my desktop computer it seemed to work just fine with 10,000 iterations, which is 10x the minimum recommended amount.

attachment NewPasswordStyle.patch ignored as obsolete

sumanah wrote:

Thanks for the patch, Tyler! I added the keywords "patch" & "need-review" to indicate that they await review. In the future, please do add those keywords to a bug when you attach a patch, to speed up the code review. Thanks again.

Sorry about that. Will add them next time.

EN.WP.ST47 wrote:

Patch with corrected whitespace

The original patch had incorrect whitespace and therefore would only apply with patch -l, and even then obviously used four spaces instead of tabs. This patch corrects that issue. I also removed some (debugging) code that echoed the computed and stored salt and hash when the password entered was incorrect. I have also tested this patch. New accounts use the correct password style, old accounts are updated to use the new style, and both types of accounts are able to successfully log in and are shown the correct warning when entering the wrong password.

Attached:

(In reply to comment #21)

Created attachment 9561 [details]
Patch with corrected whitespace

The original patch had incorrect whitespace and therefore would only apply with
patch -l, and even then obviously used four spaces instead of tabs. This patch
corrects that issue. I also removed some (debugging) code that echoed the
computed and stored salt and hash when the password entered was incorrect. I
have also tested this patch. New accounts use the correct password style, old
accounts are updated to use the new style, and both types of accounts are able
to successfully log in and are shown the correct warning when entering the
wrong password.

Thanks. I don't submit that many patches so I always forget that MediaWiki uses tabs instead of spaces. (And the debugging code I actually removed. I guess I uploaded the wrong patch file by accident.)

Attached:

Sorry that this review is backed up in the queue. I happened to notice Django's latest release announcement[1] (they just switched to PBKDF2) and it reminded me about this issue. I think once we get our security engineer hired, we'll be able to take a closer look at this.

This is in a particularly important area for correctness, and we're pretty conservative about futzing with it. It's also a potential migration headache for existing users (including Wikimedia), so we need to be careful about what we do.

My initial feedback on the patch: it seems like going with PBKDF2 is a pretty defensible position. I think it'll still be important to respond to all of Tim's points in his original email, but I also don't think we're going to be able to wait for Tim to have the time to implement this exactly the way he envisions. Thank you Tyler for moving this forward with some code.

The one thing that I think is a very important point of his original suggestions is this "2. Upgradeable: it should be possible to compute the C-type hash from
the B-type hash, to allow upgrades without bothering users." Does this patch achieve this goal? I can't quite tell.

One big area for improvement is the function naming. Our current function naming is tragic, and this patch compounds the error in a way that is both confusing and backwards-incompatible. The current API uses "oldCrypt" for A-type crypt strings, and "crypt" for B-type crypt strings. This patch uses "reallyOldCrypt" for A-type, "oldCrypt" for B-type crypt, and "crypt" for C-type. "really"? really? :-) I won't give you too much grief, Tyler, since you seem to have just been following the convention of an unfamiliar (and, in this case, silly) codebase, but I think this is one time that deference to tradition doesn't work.

Here's my suggestion for naming functions: implement cryptTypeA(), cryptTypeB(), and cryptTypeC(). Have a deprecated "oldCrypt" call cryptTypeA(), and implement the crypt type selection logic in the new crypt().

[1] https://docs.djangoproject.com/en/dev/topics/auth/#auth-password-storage

Some very good points. I definitely agree with changing the function names. XD But as far as the upgradeable part goes, it would be a security hazard to generate new hashes based on the old hashing algorithm.

From what I can tell, there are two purposes to using PBKDF2 over our current crypt method:

  1. Use of SHA-512 over MD5, which is vastly superior.
  2. Increase in amount of work required to calculate the hash.

If we were to hash the current hash into the new hash, it would carry over any collisions from MD5 over into the final PBKDF2 hash. However, the current patch is constructed so that if a specific global variable is set to True, whenever a user with an old hash performs a successful login, it recalculates the hash and stores it.

Patch needs updates to use MWCryptRand for it's salt instead of the deprecated insecure wfGenerateToken

I agree with the notion of storing something in the db related to the secret token. We should really be able to say "Incorrect password secret found in system configuration. You may have to reset your password by email to log in to your account." instead of saying "Invalid password or username." even when the password is right.

My thought is to include say the first say 3 bytes (6 hex chars) of a sha1 of the secret key along with the other salt.
This way the secret is tied to the password it was used to hash instead of to the whole database.

That way we can make our password secret configuration variable an array. Because we have a sub-hash of the secret used we can hash our secrets and compare their sub-hashes with that one. If none match we have bad config and can't handle the password. If one matches then we know what secret to use.

This will allow for proper migration. For example if a wiki does manage to leak it's secret along with it's database after resetting the user_token column they can change their password secret configuration. Instead of replacing the key, they add a new one to the array. Doing so means that MediaWiki is capable of validating passwords using both the new and old secret. As users log in MediaWiki would automatically re-hash the passwords with the new secret. This would allow the wiki after compromise to re-secure the passwords in the database by telling all of it's users to log back in and change their passwords. Instead of making every single password in the database completely invalid and forcing users to do email resets and call in to support just to re-secure their account.

Also, I think the number of iterations used should be included in the stored string. If you upgrade your 10000 iterations to 50000 rounds all the passwords become invalid. Storing the number of iterations used means that a wiki can happily upgrade the number of iterations use without invaliding passwords and allowing them to be upgraded. Otherwise a wiki is forced to delay and potentially never upgrade to better security because doing so would break user logins. It also prevents the same issue of telling every user their password is invalid when it's not.

Also turning crypt() into oldCrypt and making crypt() ':C:' and making oldCrypt reallyOldCrypt isn't correct.

The separation between crypt() and oldCrypt() is not the addition of new password hashing schemes. It's the change from the old database scheme where we just dumped the hash into the database. To the new scheme where we include a ':X:' to denote what type of password hash it is.

Looking at the spec for PBKDF2 and some other implementations this looks a little off:

substr( base64_encode( $hash ), 0, $wgPasswordLength )

The substr is supposed to be done on the raw $hash, not on the base64 output. This could output a different hash than a standard PBKDF2 implementation would.

I started a branch for this:
https://github.com/dantman/mediawiki-core/compare/master...2012%2Fpassword-hashing

So far I've:

  • Split our crypt() and comparePassword() into a pluggable class system that can be extended and works much more sanely (you only implement the hashing function once, instead of implementing it in two spots)
  • Implemented a PBKDF2-HMAC password type implementation

Things on the todo list:

  • Implement a version of the PBKDF2-HMAC which also has a secret key in it.
  • Make the login page understand the other types of errors that the password system can output.
  • Implement automatic upgrading of passwords on login. (Also make sure it's implementation doesn't conflict with auth extensions)
  • Adding password system tests to the phpunit test framework.
  • Adding some test cases to ensure that PBKDF2-HMAC conforms to the specification.

https://github.com/dantman/mediawiki-core/compare/master...2012%2Fpassword-hashing

I don't mean to be offensive, but this is a really bad way to implement password checking. The object-based method in the linked branch blatantly abuses OOP (static functions are not meant to be used that way). Also it's unnecessarily complicated.

My recommendation would be to simply have one Password class. A hook would allow the registering of new password types. (The hook signature would be passfunc($password, $options=array()).)

This class would optionally take a hash as its constructor and parse the hash. Then the object would have a crypt() function, which would determine the hash type (or choose the preferred type if no original hash was given) and pass the password (and optionally any options if a hash was given to the constructor) to a hook function. Then there would be a compare() function that simply calls the crypt() function on a given password and compares it to the original hash.

What are you talking about?

The use of statics seams perfectly in line with the way we use them in other places of core. Just look at SpecialPage, SpecialPageFactor, Html, Linker, MWCryptRand, etc...

The password system was designed that way to simplify the implementation of common password types while remaining flexible enough for implementations that don't follow the common : pattern to be implemented.

Additionally I kept in mind the fact that we are going to want to be able to setup and teardown these classes easily so that we can add unit tests of this password system to ensure that there are never any regressions that suddenly cause all passwords to be invalid, or even worse, allow anyone to log into any account using any password.

A simple 'FOO' type that uses sha256 and a hmac can be implemented with just a little bit of code:
class Password_TypeFOO extends BasePasswordType {

protected function run( $params, $password ) {
  list( $salt ) = self::params( $params, 1 );
  return hash_hmac( 'sha256', $password, $salt );
}

protected function cryptParams() {
  $salt = MWCryptRand::generateHex( 8 );
  return array( $salt );
}

}

While at the same time one that uses crypt() which doesn't follow our patterns can also be implemented simply.
class Password_TypeCRYPT implements PasswordType {

public function getName() { return 'CRYPT'; }

public function crypt( $password ) {
  return crypt( $password );
}

public function compare( $data, $password ) {
  return Status::newGood( crypt( $password, $data ) === $data );
}

public function isPreferredFormat() { return true; }

}

What I'm saying is that the complicated system you're using is entirely useless and unnecessary. A password hashing algorithm should only need one singular function, crypt(), that takes a password and options and returns a hash.

Introducing cryptParams() and an entirely static class that serves the sole purpose of registering other classes is not only unnecessary but confusing. All you need is one password class that abstracts the functionality of hashing and comparing passwords, and then have a hook for registering password functions.

Why have an entire class with four functions (three of which being unnecessary) when you can just use one and not have a complicated system of objects and inheritance. (Also, there should not be "preferred formats" for the individual hashing algorithms. A site administrator will choose which password scheme he or she wants and that will be considered the "current format". Different people have different ideas on the security they want.)

You're using completely abstract arguments that ignore the reality of the system we're trying to implement this in.

We have no abstract concept of 'options'. All we have is a big blob of text like "250:32:d41d8cd98f00b204e9800998ecf8427e". Right now "WE" have a pattern of arbitrarily separating options using ':'. However this is in no way a standard we should require. Other implementations like crypt() use $ to separate options. And there is a possibility of our parsing things this way conflicting with the way another implementation does it and making any implementation have to use hacks to undo what we shouldn't have done in the first place.
Forcing one storage format for options and the hash gets in the way of the possibility of password implementations which use external libraries like bcrypt or scrypt.

md5 was good enough years ago. Now it's not. Best security practices change over time and add more requirements. So I am opposed to any short-sighted attempt to simplify an interface by ripping out pieces that give it enough flexibility to handle future password hashing algorithms with different requirements. We made that mistake once already, and I don't intend to have to rewrite this again because we didn't think ahead.


preferred type, common type... they're the same thing, we don't need to bike-shed over names. The preferred type is there for precisely that reason, so that the current/preferred password format can be overridden and we're not stuck with A and B hardcoded.

And yes individual schemes should be able to declare whether something is preferred or not. The point of that is so that when someone say changes the PBKDF2 settings and wants 20000 iterations we don't leave 10000 iteration passwords in the database when the wiki is configure to use 20000. We don't know what configuration variables the implementation gives the sysadmin or what format options are in, so that method is there for implementations to declare when the settings used for a password do not match what the site admin has configured into their settings.

I randomly grabbed the name preferred since it's also used in password upgrading when enabled.

By the way. I double checked. And yes, the separation between Password and PasswortType IS necessary for the implementation of password system unit tests.

There are a number of problems with that that I don't think you realize. There needs to be some way to determine what type of hash it is, and that cannot be done if the delimiter can be whatever the hell the hashing plugin wants.

Regardless, in a pluggable hashing system, it is not the hashing function's job to format the hash for storage in the database. Formatting and storage is something that should be unified and handled in a centralized manner by some sort of Password class. I cannot foresee any reason ever for MediaWiki to want to up and switch delimiters for no apparent reason.

All that is needed is a central Password class that will handle all formatting and hash parsing and then pass off actual cryptography work to the respective functions. And yes, the colon-separated format is a format WE came up with, but why in God's name would MediaWiki need to allow hashing functions to change the delimiter or storage format? There is no security advantage to using dollar signs over colons that I know of.


As far as the preferred/current type issue, I understand. I was just confused by All PHP hashing functions output a binary hash (or hex). A pluggable hashingthe wording and took it the wrong way. There does need to be a way for each hashing function to tell MediaWiki that its specific configuration options have been changed and that old hashes should be updated.

So here is what I suggest:

class Password {

public static function init();
public static function register(name, function, defaultOptions);
public static function parseHash(hash); /* Returns [[option, ...], hash]
public function __construct(hash="");
public function crypt(password);
public function compare(password);
public function current();

}

init - Does all the stuff your implementation does right now.
register - Maps the name to the function and list of default options.
parseHash - Parses the hash and returns a list of options and the hash.
__construct - Makes a new class based on a hash (or not).
crypt - Passes the plaintext to the hashing function and formats the raw hash.
compare - Passes the plaintext to crypt() and compares it to the internal hash.
current - Sees if the options in the given hash match the default options.

This system allows for a) a password hashing system that makes sense and isn't confusing and b) an easy method of updating hashes. No need for a hierarchy of classes, wach with it's own set of functions, etc.

(In reply to comment #34)

There are a number of problems with that that I don't think you realize. There
needs to be some way to determine what type of hash it is, and that cannot be
done if the delimiter can be whatever the hell the hashing plugin wants.

Yes it can.

We can have one of 2 formats:

  • A 32 char lowercase hex string; This is an old password from before we introduced types. It gets compared with oldCrypt.
  • Data starting with ':A:', etc...

Data which is not an oldCrypt string must start with ':TYPE:', a TYPE cannot contain a :. The data after the type identifier is treated as raw and passed to the password type implementation raw allowing it to handle the data in whatever format it wants.

The 'Password' class handles the :TYPE: while type implementations work with the raw data after the :TYPE:.
This provides a guarantee that:

  • New data will never match our old 32 char strings
  • There is always a type identifier for the password
  • The password type implementations may work with raw data if necessary so they can use a suitable data serialization.

Regardless, in a pluggable hashing system, it is not the hashing function's job
to format the hash for storage in the database. Formatting and storage is
something that should be unified and handled in a centralized manner by some
sort of Password class. I cannot foresee any reason ever for MediaWiki to want
to up and switch delimiters for no apparent reason.

All that is needed is a central Password class that will handle all formatting
and hash parsing and then pass off actual cryptography work to the respective
functions. And yes, the colon-separated format is a format WE came up with, but
why in God's name would MediaWiki need to allow hashing functions to change the
delimiter or storage format? There is no security advantage to using dollar
signs over colons that I know of.

There's not security advantage. There is however a fact you are ignoring.
There are already existing implementations that use dollar signs.
crypt(), bcrypt(), etc... (likely scrypt as well)
Enforcing that all data must use : separators interferes with the ability to write a password implementation that plugs a well known 3rd party password hashing library that uses a different separator into MediaWiki.

(In reply to comment #35)

(In reply to comment #34)

There are a number of problems with that that I don't think you realize. There
needs to be some way to determine what type of hash it is, and that cannot be
done if the delimiter can be whatever the hell the hashing plugin wants.

Yes it can.

We can have one of 2 formats:

  • A 32 char lowercase hex string; This is an old password from before we

introduced types. It gets compared with oldCrypt.

  • Data starting with ':A:', etc...

Data which is not an oldCrypt string must start with ':TYPE:', a TYPE cannot
contain a :. The data after the type identifier is treated as raw and passed to
the password type implementation raw allowing it to handle the data in whatever
format it wants.

FWIW, what I described here is almost precisely what we already do with ':A:', ':B:'. We don't explode everything by ':' at the type comparison, we test the string to see if it starts with ':A:' or ':B:'. We only use explode() once we get into the actual type implementation. The only difference here is we can now use type names that aren't single characters. And we give proper error conditions if the stored data does not match a valid format.

A crypt() backed password implementation may end up storing passwords in an end format that looks like:
:CRYPT:$1$nsYRDLy7$invZ4E66PRu85Uss0mmdA0
((The fact that a :CRYPT: exists at the start being transparent to the implementation))

The implementation would look something like what I posted before.


There's another factor I should mention. Password may not necessarily stay purely static.
Take a look at MWCryptRand. The class was actually originally written purely with static variables. Now you see a singleton structure with real* methods and statics that everyone uses. This was an intentional design decision TimStarling had me change the code to fit. He wants the unit test framework to be able to easily set up and tear down instances and have test subclasses that allow the unit test framework to be able to play with the internals and ensure that the class functions correctly.
It's a distinct possibility that Password may end up the same way in order to allow the unit test framework to control it.
So lumping password type and password into a single class may be short-sighted ignoring how it could possibly get in the way of later decisions on how to unit test it.

OK, now I understand the logic. In that case I'll give my support FWIW. The only thing I would recommend is cleanup the structure a little bit. There's no need for a PasswordType interface when there's a base abstract class that's already required to be inherited by the primary Password class.

Also, I would change the cryptParams() function to defaultCryptParams() and replace preferred format with current format.

(In reply to comment #37)

OK, now I understand the logic. In that case I'll give my support FWIW. The
only thing I would recommend is cleanup the structure a little bit. There's no
need for a PasswordType interface when there's a base abstract class that's
already required to be inherited by the primary Password class.

The separation between PasswordType and BasePasswordType (I think I used to call it AbstractPasswordType or CommonPasswordType, should I rename it back?) is what allows the flexible implementation of things like crypt(3).
An implementation that uses our : pattern extends BasePasswordType and lets BasePasswordType abstract away all the storage, crypt, and comparison bits. While an implementation that needs to call a 3rd party library or say crypt(3) with raw data instead implements PasswordType directly and uses the raw data to handle crypt and compare itself.
I don't believe I referenced BasePasswordType anywhere but in the extends. All instanceof checks should be against the PasswordType interface.

Though I do think I could turn the PasswordType interface into an abstract class so that isPreferredFormat can have the same require true; default as BasePasswordClass.

Also, I would change the cryptParams() function to defaultCryptParams() and
replace preferred format with current format.

Hmmm... 'default' might work. At the same time they don't feel like 'default'. It's more of a callback saying "I'm hashing a password for the first time, please create the parameters to run the algorithm with." Using 'default' makes it sound a little static when it's not, since cryptParams() returns things like completely random salts.

(In reply to comment #38)

(In reply to comment #37)

OK, now I understand the logic. In that case I'll give my support FWIW. The
only thing I would recommend is cleanup the structure a little bit. There's no
need for a PasswordType interface when there's a base abstract class that's
already required to be inherited by the primary Password class.

The separation between PasswordType and BasePasswordType (I think I used to
call it AbstractPasswordType or CommonPasswordType, should I rename it back?)
is what allows the flexible implementation of things like crypt(3).
An implementation that uses our : pattern extends BasePasswordType and lets
BasePasswordType abstract away all the storage, crypt, and comparison bits.
While an implementation that needs to call a 3rd party library or say crypt(3)
with raw data instead implements PasswordType directly and uses the raw data to
handle crypt and compare itself.
I don't believe I referenced BasePasswordType anywhere but in the extends. All
instanceof checks should be against the PasswordType interface.

Though I do think I could turn the PasswordType interface into an abstract
class so that isPreferredFormat can have the same require true; default as
BasePasswordClass.

That's my bad. Mixed up the name of the interface and abstract class and thought you were requiring password types to be children of the base class. Ignore my previous comment.

Also, I would change the cryptParams() function to defaultCryptParams() and
replace preferred format with current format.

Hmmm... 'default' might work. At the same time they don't feel like 'default'.
It's more of a callback saying "I'm hashing a password for the first time,
please create the parameters to run the algorithm with." Using 'default' makes
it sound a little static when it's not, since cryptParams() returns things like
completely random salts.

Default isn't perfect, but all I know is that it shouldn't be preferred. It just gives the wrong idea. Maybe newCryptParams()? Idk.

(In reply to comment #39)

Also, I would change the cryptParams() function to defaultCryptParams() and
replace preferred format with current format.

Hmmm... 'default' might work. At the same time they don't feel like 'default'.
It's more of a callback saying "I'm hashing a password for the first time,
please create the parameters to run the algorithm with." Using 'default' makes
it sound a little static when it's not, since cryptParams() returns things like
completely random salts.

Default isn't perfect, but all I know is that it shouldn't be preferred. It
just gives the wrong idea. Maybe newCryptParams()? Idk.

"preferred", yeah that of course doesn't fit. This is creation of parameters for a crypt() call, not comparing data against the preferential format defined in settings. Are we mixing cryptParams() and isPreferredFormat() up here?

Lol, sorry. I've been working on like three hours of sleep these past few days so forgive me if I'm jumbling things in my head. What I meant to say was that just having cryptParams() is a bit misleading because those parameters are only used for new passwords, not all the time.

Hmm... you might be a little right on that and a little wrong on that.

We have:

  • crypt() - Hashes a password
  • compare() - Compares a password against pre-hashed data

The user system was the same:

  • User::crypt() - Hash a password
  • User::comparePasswords() - Compares a password against pre-hashed data

In our system 'crypt' has only referred to that first hashing of data. Not to the comparison.

I suppose it could be mixed up when you throw in crypt(3) which acts simultaneously as both crypt() and half of compare() depending on how many arguments you give it.

But why the separation? For any password hashing algorithm, compare(hash, password) === (hash == crypt(password)), assuming crypt is passed the proper options.

Also, implementing this pluggable hashing system could make solving a number of other Special:Userlogin bugs much much easier.

The patch should be submitted to gerrit!!

This: http://www.opine.me/a-better-way-to-store-password-hashes/ sounds like a possibly good idea. Would require schema change (one or another way) though.

Damn. That's one of those ideas you can't believe you never thought of before. This is simply ingenious and I plan on taking my hand at implementing it unless somebody has an objection.

(In reply to comment #47)

Damn. That's one of those ideas you can't believe you never thought of before.
This is simply ingenious and I plan on taking my hand at implementing it unless
somebody has an objection.

Password reset may be a problem, since you have to disable the old password when resetting new. You may do it by changing salt, but the old one will remain forever, so we will probably need password change throttling or something like that.

Actually now that I've read the entire idea and thought about it I'm having second thoughts. It would be safer to just use scrypt or similar memory-intensive KDFs rather than risk the possible hash collisions that could result from this approach.

(In reply to comment #46)

This: http://www.opine.me/a-better-way-to-store-password-hashes/ sounds like
a possibly good idea. Would require schema change (one or another way) though.

That's an interesting read. But I'm not really buying it. At least not for the case of MediaWiki.

Firstly, I don't buy the whole "large pile of spaghetti data == slow password cracking".
The task of brute forcing passwords is still the same general pattern. Take user salt, generate password, validate. Where validation is a lookup in this big pile of data instead of an equality test.
I don't consider this as big of a hurdle as the author seems to think of it.

Firstly, the author mentions "search the entire Hashes table". But this is a complete characterization of the job involved here. There's no need to do anything as slow as a search of the table. This is a very simple existence check. And we've already developed numerous ways to speed up that kind of test. Anyone with enough brains to get ahold of this data is going to be smart enough to use some sort of index for this. Done in a way that optimizes it's RAM usage. Not shoved into some database that does things it does not need to. The operation isn't going to take long.

I'm also skeptical about "big RAM requirement". Last I checked for this kind of task an index of the data is actually smaller than the data itself. And it's not that hard to get 32GB of RAM into a machine. This means that the GOAL of this way of doing password storage is to make sure that your password storage alone takes up well over 32GB of disk storage. Even if your wiki doesn't even use 1GB of data storage.

The article also suggests that the need to store the entire table of hashes would somehow be a hard feat. This might be true if you assume the only way someone is going to extract the data would be with SQL injection. But that's not the vector we're focusing on. I believe we're more worried about things like database backup leaks. And other types of vulnerabilities which in the end, the attacker likely already has access to the whole data.

Though most of what I rebutted isn't even the biggest issue here:

  • Most MediaWiki installations run on VPS servers, shared hosts, etc... where they have disk capacity limits under 20GB of storage.
  • This article suggests that basic password storage should be so hard that you can't store it on a USB stick and can't fit it in ram. This practically necessitates password storage over 32GB.

In other words. This doesn't seem to work for 99% of MediaWiki installations without forcing thousands of wikis to start paying hosting costs several times what they are currently paying.

It also practically spits on the idea that small sites should keep backups of their data. If password storage bloats databases to this size I doubt as many small sites are going to be willing to keep backups.

(In reply to comment #48)

(In reply to comment #47)

Damn. That's one of those ideas you can't believe you never thought of before.
This is simply ingenious and I plan on taking my hand at implementing it unless
somebody has an objection.

Password reset may be a problem, since you have to disable the old password
when resetting new. You may do it by changing salt, but the old one will remain
forever, so we will probably need password change throttling or something like
that.

A logged in user wants to change their password. And we're going to get in their way?

Frankly this way of password storage implies the need for large data storage. So I don't believe that password change spamming is as big of an issue as the data storage itself.
Frankly if the frequency at which users change their password creates an issue, then there is something fundamentally wrong with the password storage.

He mentions in the article that the index is rather large (larger than the actual data). It's possible his data is inaccurate or unrepresentative, but considering the rather random nature of hash functions I doubt this is so. His technique will require a large usage of RAM. However, the other thing that he mentions is the ability for the attacker to steal the database in the first place. Assuming an attacker gains access to a database dump, the point he is trying to make is that it is much harder for an attacker to steal a 2 TB of data than 200 MB of data.

Regardless, that doesn't mean the technique is sound and safe to use. He makes the disastrous assumption that hash functions work just like PRNGs, but they don't. The guarantee for hash functions is that given two message x and y, P(x != y => H(x) = H(y)) is extremely close to zero. In English, the chances of two given messages having the same hash is very small, which is why it works for password authentication. The case he is making is not the comparison of two hashes, but the existence of a hash in a set. In other words, given a set of existing messages S and a new message x, what is P(x not in S => H(x) in H(S)). If hash functions were like PRNGs, then this would be a simply calculation (the answer to which is roughly |S|/sizeof(H(x), which provides reasonably low probabilities for hash functions at least 64 bits in length). But the distribution of hash functions is not exactly random and is a lot more complicated than that. I would wait for researchers to analyze this scenario and calculate that probability with accuracy before ever using such a system.

An alternative, and much simpler solution, would be to use scrypt. It's both CPU and memory intensive, so it protects against brute force attacks with the same amount of protection as the method proposed in the article.

Yeah, but from what I've seen, scrypt is far superior to PBKDF2. It's specifically crafted to use an arbitrary amount of RAM in addition to CPU time. This PHP extension implements it: https://github.com/DomBlack/php-scrypt

https://gerrit.wikimedia.org/r/16049

A pluggable password system that doesn't rely entirely on static functions. Hash algorithm sub-classes need only implement four functions: a function that identifies if a given hash is an instance of that type, a function that extracts parameters from a hash (or gives defaults), a crypt(), and a testing function that returns test vectors for PHPUnit.

The patch completely replaces User::crypt and User::comparePasswords by replacing the User::$mPassword and User::$mNewpassword properties with Password objects. All comparisons take place using member methods.

(In reply to comment #43)

But why the separation? For any password hashing algorithm, compare(hash,
password) === (hash == crypt(password)), assuming crypt is passed the proper
options.

  • A hash == crypt(password) ends up comparing parameters (unnecessarily) at a serialized text level. It's much cleaner and consistent to do compare a direct hash of the password. We want format flexibility so using compare() lets the crypt implementation extract the hash instead of the outside code making assumptions about the structure.
  • While it's true that historically compare(password, hash) = hash == crypt(password, params) there's no strict reason why that needs to be the case. It's perfectly reasonable for a password implementation to output something that won't pass strict equality but will be comparable to the internal implementation. (In fact I think I could see someone trying to do that using some form of crypto.
  • The separation of crypt() and compare() also helps keep implementation clean.
  • The crypt(password, hash) == hash pattern is also not an intuitive one. Outside code implementing the comparison has to do verbose things it should not need to do. And it's not the kind of thing a programmer expects to do. This style of handling additionally adds extra burden to the crypt() implementation itself that doesn't even need to be there.

IIRC overall it was a result of the goal to keep implementation and usage intuitive. If you take a look at the differences in code you'll notice that the actual implementation of Password_TypeA has barely any code in it at all. This keeps actual password implementations directly focused on what they actually NEED to do. Not bogged down with irrelevant trivialities that get in the way of making sure that the password implementation is correct.

Where the hell did everybody on MW learn the definition of clean and consistent? Think about what you're saying. Given two hashes that take the exact same options and exact same passwords, it's somehow "consistent" for it to spit out different hashes each time? That does not make any sense, nor can I think of any semantic reason that this would occur. A password hash is just a set of options and then a hash produced from the combination of the password, those options, and a hashing algorithm. Even the native PHP password hashing API, which is being implemented for PHP 5.5, the equality of crypt() and compare() holds true (they actually do exactly what I am proposing and just call crypt() and then compare the hashes). Yet for some reason MW needs an interface that supports not only any format hash ever created or even thought of, but it also has to support hashing algorithms that just decide to randomly change their formatting at will.

And if your goal is to keep implementation and usage intuitive, then you've done a perfect job of throwing both out the window, because the code posted above is probably the most confusing password hashing system I have ever seen. At this point I'm can't even be sure that this is serious and I'm not just being trolled out of my mind.

There should be nothing overly confusing about it (beyond perhaps some documentation tweaks, etc... that might need improvement).

There are 3 parts to the password system, only 2 of 3 should be intuitive:

  • Integration of actual password algorithms into the system should be simple. The only complexity of which should be the password algorithm itself. The details of parameter parsing etc... should not be something the implementation should have to deal with if there isn't some prior reason for this detail to be handled by them (eg: You're dealing with a library rather than a first hand implementation)
  • Usage of the password system itself should be simple too. The usage should not have any complexity beyond calling methods like Password::crypt and storing the resulting data. Comparison (or rather, validation) should also be simple to the usage. It should not require any extra assumptions in usage. Hence Password::compare should be separate from crypt and should not require usage of the system to make assumptions like serialized data outputs being the same.
  • The last part of the system is the stack itself that password implementations integrate into and usage of the system makes calls to. This part of the system should NOT be simple. This part of the system should be complex enough that it can handle all of the common scenarios and abstract them away from both the password implementation integration and the usage of the system. For every part you try to make this part of the system simple you put extra complexity and burden of security on the usage and integration code that does not belong there.

It's like an ssl/tls stack. It's easy to write a protocol and wrap it inside ssl/tls (like the integration part of the system). And it's easy to write a client and use a ssl/tls stack to handle the transport (like the usage part of the system). But the ssl/tls stack itself is insanely complex (like the internals of the password implementation).


The idea of :A: being fixed while the stuff after it is variable is also a sound implementation pattern. It's a container format for a stack, just like a tcp packet inside of an ip packet.

Given: ":B:0549900c:8490f5e1c4283c1986a8a59b287d74dc"

The bottom (outermost, lowest) level container is ":B:[...]" it consists of a : followed by data indicating the type of data contained inside the body followed by a terminating : and finally the remainder is the body used by the next level of the stack. It's like the ip packet level of the ip stack.

The next level of the container is "[...]:8490f5e1c4283c1986a8a59b287d74dc" consisting of a series of params separated by : with the final part of the ata being defined as the hash. This level of the stack understands introduces the concept of hash comparison and high level parameters for the next level of the stack. This is like the tcp level of the IP stack.

The final (top, highest) level of the container works with the high level data params=[0549900c] and the hash. This part of the stack understands that the only parameter is a salt and understands how to make a hash out of it along with a password. The stack component directly below uses that data to do the hash comparison. This level of the stack is like say the HTTP, IRC, etc... level of the stack.

So if you want to related the password stack writing a standard password implementation is like implementing HTTP, etc... on top of tcp which sits on top of ip. While implementing a crypt() handler is like implementing udp directly on top of ip instead of using tcp.


The result is simple to use when you use it much like a library or component (eg: Most of us can utilize the Parser class and can implement hooks into the parser class. But the parser itself as a component is so massive and complex few people try to touch the component itself).

The relevant code, using ':B:' as an example is:

  1. Integration; Integrating the password algorithm into the codebase.
  2. Done at the highest level

class Password_TypeB extends BasePasswordType {
protected function run( $params, $password ) {

		list( $salt ) = self::params( $params, 1 );
		return md5( $salt . '-' . md5( $password ) );

}
protected function cryptParams() {

		$salt = MWCryptRand::generateHex( 8 );
		return array( $salt );

}
}

--

This implements a level to handle the 'B' type of password hash. The code starts with a run() method to take a high level list of params and a password and convert that into a hash. The B algorithm consists of combining the salt defined as the first parameter, a '-', and a md5ed version of the password then putting that through md5. It's a simple hash based password algorithm so it uses the BasePasswordType layer to abstract away common hash comparisons and parameter parsing. The code finishes up by defining a method to create default params (a random salt) for a new password.

Likewise usage likewise is also simple.

Password::crypt( $password ) => $data Prepare a password data for storage
Password::compare( $data, $password ) => verified?
Verify a password against stored data

--


Before I finish off the last of this I should also point out something else on the topic of keeping crypt and compare separate at the highest level.

You yourself pointed out the php proposal for password handling.
Taking a look at it password_hash and password_verify are separate. In other words in order to support using this native code as a password implementation crypt and compare must be separate.


(In reply to comment #57)

Where the hell did everybody on MW learn the definition of clean and
consistent? Think about what you're saying. Given two hashes that take the
exact same options and exact same passwords, it's somehow "consistent" for it
to spit out different hashes each time? That does not make any sense, nor can I
think of any semantic reason that this would occur. A password hash is just a
set of options and then a hash produced from the combination of the password,
those options, and a hashing algorithm. Even the native PHP password hashing
API, which is being implemented for PHP 5.5, the equality of crypt() and
compare() holds true (they actually do exactly what I am proposing and just
call crypt() and then compare the hashes). Yet for some reason MW needs an
interface that supports not only any format hash ever created or even thought
of, but it also has to support hashing algorithms that just decide to randomly
change their formatting at will.

There is a very important thing to point out here.
We're NOT talking about hashes or hashing functions. We're talking about the stored format of passwords.
Preparing a password is different from just hashing. If this were just a hash we wouldn't have parameter data to mess with and there would be nothing extra to consider.

Password storage is a different category of cryptography. It has a different goal than what the cryptographic hashes category intends to provide.

Different categories of cryptography act differently. Hashes function in a way that expects the same output for every run. But this does not hold true for other areas of cryptography. Take a look at some asymmetric signing and even symmetric mac algorithms. You'll find that for some of the algorithms each time they sign a piece of text the signature comes out different. Despite that the signature can still be verified. What's important is not strict equality verification but instead cryptographic equality verification. This is consistent within cryptography, but not consistent within output (hence something you cannot do with the crypt() pattern)

Right now we use the term "Password hashes" because initially all we did with passwords was hash them, there was no concept of password hashing. And while we've expanded from that for now we're using the hash category of cryptography. So we call them password hashes.
However that doesn't mean it'll stay that way forever. It's possible that in the future someone may have a new idea on how to interfere with password cracking. And this time it may come from a different area of cryptography like signing.

Heck, I could almost see that right now. Someone deciding that instead of just making a password hash they will create a layer around that which will include an out-of-band cryptographic signature or perhaps encryption into the output instead of simply including it into some hash.

Btw, I just installed the git master version of PHP and tested the hash_pbkdf2.

The hash_pbkdf2 being included into PHP 5.5.0 properly matches ours implementation. So when php 5.5.0 is released we can conditionally include hash_pbkdf2 into our password implementation to eliminate any php overhead from PBKDF2 when the latest PHP is installed.

Except it would be preferable to use scrypt over PBKDF2, so we should look to try and find a platform-independent solution for implementing scrypt in MW.

I suggest we stop the cryptoparanoia competition here and finally get at least SHA-2 or WHIRLPOOL (since you people make such a great deal of implementing PBKDF2 so you can't get it done without tearing each other apart). We're still using MD5 right now for password storage.

I could go ahead and patch the system myself. But it looks that it has been done at least twice, can we finally get *something* into gerrit? Preferrably in the small portions. If I were you, I'd start with OOP rewrite of the current password system without any new backends, then commit a patch with PDKBF2 backend, etc.

(In reply to comment #60)

Except it would be preferable to use scrypt over PBKDF2, so we should look to
try and find a platform-independent solution for implementing scrypt in MW.

Have you tried reading scrypt?

Firstly scrypt actually uses pbkdf2 inside of it's algorithm. So even if we did try writing a scrypt implementation in php the fact that we have hash_pbkdf2 coming out in 5.5.0 is still a good thing.

However, after that I looked deeper into scrypt. One of the things scrypt uses in it's algorithm is an algorithm they call smix which uses a combination of the ROMix algorithm, BlockMix algorithm, and they use Salsa20/8 for the hash algorithm.

Now, if the person trying to implement scrypt in php at this point hasn't already burnt out from the mere thought of having to implement all these algorithms in php by themselves. I'll have to point something else out. The salsa20 hash algorithm was removed from php in 5.4.

With all that on the table I have a feeling that it is impossible to write a properly efficient platform-independent version of scrypt in php. Certainly not something we're going to write.

So while we can hope the unofficial php-scrypt module is correctly written and people can decide to install that and write a MW implementation that uses it. We cannot use scrypt as a default until someone accepts that module into php and starts shipping it.

(In reply to comment #61)

I suggest we stop the cryptoparanoia competition here and finally get at least
SHA-2 or WHIRLPOOL (since you people make such a great deal of implementing
PBKDF2 so you can't get it done without tearing each other apart). We're still
using MD5 right now for password storage.

I could go ahead and patch the system myself. But it looks that it has been
done at least twice, can we finally get *something* into gerrit? Preferrably in
the small portions. If I were you, I'd start with OOP rewrite of the current
password system without any new backends, then commit a patch with PDKBF2
backend, etc.

The OOP rewrite is the larger task. The PBKDF2 implementation usually ends up being done at some point along the implementation of the OOP backend as a test case for the new features that the OOP system requires since our old backends barely use them.

My implementation is sitting in a GitHub branch:
https://github.com/dantman/mediawiki-core/compare/master...2012/password-hashing
You can look at the branch to see the small portions that have stacked up.

It only needs password updating and proper login page errors to be complete enough for use.

(In reply to comment #61)

I suggest we stop the cryptoparanoia competition here and finally get at least
SHA-2 or WHIRLPOOL (since you people make such a great deal of implementing
PBKDF2 so you can't get it done without tearing each other apart). We're still
using MD5 right now for password storage.

I could go ahead and patch the system myself. But it looks that it has been
done at least twice, can we finally get *something* into gerrit? Preferrably in
the small portions. If I were you, I'd start with OOP rewrite of the current
password system without any new backends, then commit a patch with PDKBF2
backend, etc.

There already was something in Gerrit. I submitted a patch earlier last week that fully implemented an OOP password system. You can see the patch here: https://gerrit.wikimedia.org/r/16049

Unfortunately, I abandoned it due to the insane arguments we've been having in this thread. If you want I can adjust the patch (so that it uses hard-coded types like Daniel has proposed) and re-submit it to Gerrit.

(In reply to comment #62)

(In reply to comment #60)

Except it would be preferable to use scrypt over PBKDF2, so we should look to
try and find a platform-independent solution for implementing scrypt in MW.

Have you tried reading scrypt?

Firstly scrypt actually uses pbkdf2 inside of it's algorithm. So even if we did
try writing a scrypt implementation in php the fact that we have hash_pbkdf2
coming out in 5.5.0 is still a good thing.

However, after that I looked deeper into scrypt. One of the things scrypt uses
in it's algorithm is an algorithm they call smix which uses a combination of
the ROMix algorithm, BlockMix algorithm, and they use Salsa20/8 for the hash
algorithm.

Now, if the person trying to implement scrypt in php at this point hasn't
already burnt out from the mere thought of having to implement all these
algorithms in php by themselves. I'll have to point something else out. The
salsa20 hash algorithm was removed from php in 5.4.

With all that on the table I have a feeling that it is impossible to write a
properly efficient platform-independent version of scrypt in php. Certainly not
something we're going to write.

So while we can hope the unofficial php-scrypt module is correctly written and
people can decide to install that and write a MW implementation that uses it.
We cannot use scrypt as a default until someone accepts that module into php
and starts shipping it.

Yes I have read scrypt. And it is no small task indeed. I should probably rephrase. I just meant to say that we should try and plan for some support for scrypt, as it is preferred over PBKDF2. The unofficial PHP extension for it is actually just a wrapper for the original scrypt binary published by the author, so we can place trust in its accuracy.

Now that I think about it, maybe this is the perfect opportunity to make an extension that uses the new pluggable password system. Once the system is implemented, I'll put together an extension that adds scrypt to the password types.

(In reply to comment #62)
Now that I think about it, maybe this is the perfect opportunity to make an
extension that uses the new pluggable password system. Once the system is
implemented, I'll put together an extension that adds scrypt to the password
types.

Yup, the idea of making this a pluggable password system was so that we could easily implement support for scrypt and any other system that we can't use as a default in MediaWiki instead with the use of an extension that can be used by anyone who is willing to make sure that any server they ever use has all the necessary dependencies to make it work.

(In reply to comment #64)

(In reply to comment #61)

I suggest we stop the cryptoparanoia competition here and finally get at least
SHA-2 or WHIRLPOOL (since you people make such a great deal of implementing
PBKDF2 so you can't get it done without tearing each other apart). We're still
using MD5 right now for password storage.

I could go ahead and patch the system myself. But it looks that it has been
done at least twice, can we finally get *something* into gerrit? Preferrably in
the small portions. If I were you, I'd start with OOP rewrite of the current
password system without any new backends, then commit a patch with PDKBF2
backend, etc.

There already was something in Gerrit. I submitted a patch earlier last week
that fully implemented an OOP password system. You can see the patch here:
https://gerrit.wikimedia.org/r/16049

Unfortunately, I abandoned it due to the insane arguments we've been having in
this thread. If you want I can adjust the patch (so that it uses hard-coded
types like Daniel has proposed) and re-submit it to Gerrit.

Both our implementations are still incomplete.
Namely neither of us have implemented handling for displaying issues on the login form. We can't go outputting fatal errors and "Incorrect password entered." when we have an issue with the data stored in the database.

I've cleaned up my implementation a bit since the last time. The Status trick is gone, I've added some documentation, and split up and renamed some of the code.
I plan to push the password update code once I've been able to test it (Platonides broke maintenance/dev/ and I had to fix it).
I'm also considering renaming the isPreferred[...], and switching the interface to an abstract class like you had.
I considered the idea of tests in implementation code like you have. It's a nice idea but I'm still trying to think of how to do that while still making things like rfc6070 testing possible without introducing the possibility of serialization mistakes letting bugs in the implementation slip by. Maybe I'll split it out into a plain pbkdf2 implementation and pbkdf2 based password storage implementation so I can separate the tests and make the code easier to read and add hash_pbkdf2 to.

I was hoping to collaborate on getting this branch of code I started months ago finished and ready for actual use in core.

  • I've cleaned the code up some more and added some more documentation.
  • Implemented password upgrading.
  • Switched to the test-cases-in-implementation style.
  • Separated the PBKDF2-HMAC implementation into it's own method and prepared the system for hash_pbkdf2 to be used when released.
  • Implemented login form and api handling for the new password system.

A Gerrit branch is being created so the code will end up in the repo soon.

The code is pretty much complete at this point. However I'm still mulling over how to handle recursive layer password types like a PEPPER type that adds a fixed salt not stored in the database but does it without mandating what password type you use.

(In reply to comment #69)

  • I've cleaned the code up some more and added some more documentation.
  • Implemented password upgrading.
  • Switched to the test-cases-in-implementation style.
  • Separated the PBKDF2-HMAC implementation into it's own method and prepared

the system for hash_pbkdf2 to be used when released.

  • Implemented login form and api handling for the new password system.

A Gerrit branch is being created so the code will end up in the repo soon.

The code is pretty much complete at this point. However I'm still mulling over
how to handle recursive layer password types like a PEPPER type that adds a
fixed salt not stored in the database but does it without mandating what
password type you use.

Awesome! Yeah I saw the Gerrit branch yesterday.

And I was thinking about that as well (the recursive layers idea). Unfortunately, to create a proper implementation of such a feature, a given layer would have to be compatible with any other type of hash. That means that each layer would either a) have to use every possible option (impossible) or b) only use things that are common to all hash types.

(In reply to comment #70)

(In reply to comment #69)

  • I've cleaned the code up some more and added some more documentation.
  • Implemented password upgrading.
  • Switched to the test-cases-in-implementation style.
  • Separated the PBKDF2-HMAC implementation into it's own method and prepared

the system for hash_pbkdf2 to be used when released.

  • Implemented login form and api handling for the new password system.

A Gerrit branch is being created so the code will end up in the repo soon.

The code is pretty much complete at this point. However I'm still mulling over
how to handle recursive layer password types like a PEPPER type that adds a
fixed salt not stored in the database but does it without mandating what
password type you use.

Awesome! Yeah I saw the Gerrit branch yesterday.

I asked for the branch but the ability to push was never setup. Then I was told that Gerrit has private sandboxes and I could have actually pushed the code directly there and had them create a branch based off of it. So now I'm waiting for someone to either setup push or drop the branch and have me push to my private sandbox.

And I was thinking about that as well (the recursive layers idea).
Unfortunately, to create a proper implementation of such a feature, a given
layer would have to be compatible with any other type of hash. That means that
each layer would either a) have to use every possible option (impossible) or b)
only use things that are common to all hash types.

Yeah layering is really tricky. I can't think of a way to implement actual pepper (ie: a static config based salt) without involving an extra complex api, restrictions that get in the way of flexibility, restrictions that make pepper only work on specific implementations, or forcing pepper to be re-implemented for every type.

The only alternative I could think of was to make an encryption layer rather than a pepper layer.
Instead of peppering the salt the layer's output would contain a key to indicate which shared secret is used and an IV. The next big blob of raw data would be decrypted with the shared secret and then would be passed through Password::verify (ie: The blob of data would be encrypted password data output from Password::crypt()).
That would let us create a layer that serves the same purpose of making password data useless without the config file even though we're using encryption instead of pepper.
This might actually also have the advantage of working in password types where it's impossible to control the salt and data format. eg: A crypt() like system where you can't specify the salt would still work.
;) It's a fairly worthless fact, but this would also mean our worthless saltless :A: md5 hash would actually be compatible with this tactic.
((I was actually going to try writing it, until I found that mcrypt is not installed on my laptop and it's hell to try to install it in OSX))

Oh btw, do you have a better name than BasePasswordType that does not involve the word "Hash"?
OnewayPasswordType KeyCompareHashType, ... bleh

I sort of want a better name than BasePasswordType but I'm really trying to avoid perpetuating the lie of calling these a hash because we're not storing hashes. These cease to be cryptographic hash algorithms the moment we start introducing things like salts. What we're doing is practically key derivation rather than hashing. But trying to jump into something like DKeyPasswordType will just be confusing.

I'm afraid your imagination is as good as mine. BasePasswordType isn't too bad. :P

A recent take on password weakness: http://arstechnica.com/security/2012/08/passwords-under-assault/

This doesn't help editors who use the same password on the wiki as elsewhere, but it is a good start.

Reassigning to Daniel since he's working on it (and Tim isn't)

(In reply to comment #73)

Branch is in git now:
https://gerrit.wikimedia.org/r/gitweb?p=mediawiki%2Fcore.git;a=shortlog;h=refs%2Fheads%2Fpassword-hashing

Daniel: Did this get merged at some points? / Are there plans to merge this? What work is left to merge it?

Daniel Friesen: Did this get merged at some points? / Are there plans to merge this? What work is left to merge it?

Daniel Friesen: Did this get merged at some points? / Are there plans to merge
this? What work is left to merge it?

It hasn't been merged yet.

The basic stuff is done. But the handling for layered things (like a scheme to add a shared key from localsettings) hasn't been done. And while not critical to functionality. I think that it might require some modifications to that basic functionality to be written.

Change 77645 had a related patch set uploaded by Parent5446:
Added password hashing API

https://gerrit.wikimedia.org/r/77645

With recent events [0], I'd like to capitalize on lots of people wanting this and get it pushed through soonish.

I think we actually have 3 parts to the bug:

  • Updating MediaWiki to better handle multiple password types, and make it easier to extend by extensions and in the future. There were probably a couple of things I would change in Tyler's now-abandoned gerrit 77645, but I think it's close. Tyler, what would it take to finish that?
  • Updating the current WMF database to use a stronger format. Tim's suggestion from 2010 I think is still pretty good (although we would probably want an 8 or 9 work now). Bcrypt could also be used in a similar way (take the bcypt of the :B:-format hash). Whatever the WMF uses, we'll probably use something that we can do a one-way conversion from :B: hashes, so that we upgrade our entire database without user interactions. However, php 5.3.7 is required for a sane, native php version of bcrypt [1], so I'm actually leaning towards Tim's Whirlpool at this point.
  • Since all of the legacy password formats can be converted to :B: hashes, and if we make an upgrade from :B: to a new, strong :C: format, I think MediaWiki should remove all of the insecure types (and $wgPasswordSalt, bug 54948).

A couple times on this bug pepper was brought up, which would have saved us in this recent incident, since only database tables were leaked and not our private code repo. So I think we need that included in whatever :C: method we chose. In our case, I think adding an HMAC keyed with a secret (the "pepper") would work. Then if our secret is ever stolen (but not our password hashes), we can keep adding new pepper keys, and re-store our hashes with another HMAC applied using the new secret.

If both the pepper and the hashes are stolen, then we need (a good) way to force password changes, but I'll open a separate bug for that.

[0] - https://meta.wikimedia.org/wiki/October_2013_private_data_security_issue
[1] - http://www.php.net/security/crypt_blowfish.php

(In reply to comment #81)

With recent events [0], I'd like to capitalize on lots of people wanting this
and get it pushed through soonish.

I think we actually have 3 parts to the bug:

  • Updating MediaWiki to better handle multiple password types, and make it

easier to extend by extensions and in the future. There were probably a
couple
of things I would change in Tyler's now-abandoned Gerrit change #77645, but
I think
it's close. Tyler, what would it take to finish that?

This is done in the password branch:
https://git.wikimedia.org/log/mediawiki%2Fcore.git/refs%2Fheads%2Fpassword-hashing

  • Updating the current WMF database to use a stronger format. Tim's

suggestion
from 2010 I think is still pretty good (although we would probably want an 8
or
9 work now). Bcrypt could also be used in a similar way (take the bcypt of
the
:B:-format hash). Whatever the WMF uses, we'll probably use something that we
can do a one-way conversion from :B: hashes, so that we upgrade our entire
database without user interactions. However, php 5.3.7 is required for a
sane,
native php version of bcrypt [1], so I'm actually leaning towards Tim's
Whirlpool at this point.

The password-hashing branch uses PBKDF2-HMAC+Whirlpool. Passwords are converted on login/change.

A couple times on this bug pepper was brought up, which would have saved us
in
this recent incident, since only database tables were leaked and not our
private code repo. So I think we need that included in whatever :C: method we
chose. In our case, I think adding an HMAC keyed with a secret (the "pepper")
would work. Then if our secret is ever stolen (but not our password hashes),
we
can keep adding new pepper keys, and re-store our hashes with another HMAC
applied using the new secret.

This is actually what held up the password-hashing branch. I was trying to make it possible to implement layers in the system other than just hashing/key derivation types. I didn't try to get password-hashing into core (which otherwise is pretty much ready) as adding this layering would require some changes to the class API that would have needed extra backcompat work if I didn't make the changes before being released.

Under that incarnation of the system the "pepper" would have to be implemented through encryption rather than salting. ((Though thinking about it now, that's actually not much of a minus. Doing it that way actually means if just your pepper leaks you can change it without user interaction.))
Though if I convinced people to make a schema change expanding the user_password size I "might" have come up a way to let the system use pepper rather than encryption for some password types.

If both the pepper and the hashes are stolen, then we need (a good) way to
force password changes, but I'll open a separate bug for that.

Tell me when you have one. A system somewhere to do that sounds good. A simple flag is one possibility. Though something slightly more complex is probably desirable so that we can give proper messages like "<X> admin has flagged your account as requiring a change of password" or "Due to <information about some leak> we are requiring all users to change their password".

(In reply to comment #82)

This is done in the password branch:
https://git.wikimedia.org/log/mediawiki%2Fcore.git/refs%2Fheads%2Fpassword-
hashing

Thanks! I will take a close look at it next week.

The password-hashing branch uses PBKDF2-HMAC+Whirlpool. Passwords are
converted
on login/change.

The issue with that is a large number of the 40M users in the WMF databases will never login again, but I'd like to make sure they have a strongly hashed password so an attacker can't take over their unused account. So I'd like to find a way we can take what's in the db, and upgrade it without user interaction.

A couple times on this bug pepper was brought up, which would have saved us
in
this recent incident, since only database tables were leaked and not our
private code repo. So I think we need that included in whatever :C: method we
chose. In our case, I think adding an HMAC keyed with a secret (the "pepper")
would work. Then if our secret is ever stolen (but not our password hashes),
we
can keep adding new pepper keys, and re-store our hashes with another HMAC
applied using the new secret.

This is actually what held up the password-hashing branch. I was trying to
make
it possible to implement layers in the system other than just hashing/key
derivation types. I didn't try to get password-hashing into core (which
otherwise is pretty much ready) as adding this layering would require some
changes to the class API that would have needed extra backcompat work if I
didn't make the changes before being released.

Under that incarnation of the system the "pepper" would have to be
implemented
through encryption rather than salting. ((Though thinking about it now,
that's
actually not much of a minus. Doing it that way actually means if just your
pepper leaks you can change it without user interaction.))
Though if I convinced people to make a schema change expanding the
user_password size I "might" have come up a way to let the system use pepper
rather than encryption for some password types.

Encrypting the hash would work too. The ability to change it without user interaction I think is critical, but either way is fine.

If both the pepper and the hashes are stolen, then we need (a good) way to
force password changes, but I'll open a separate bug for that.

Tell me when you have one. A system somewhere to do that sounds good. A
simple
flag is one possibility. Though something slightly more complex is probably
desirable so that we can give proper messages like "<X> admin has flagged
your
account as requiring a change of password" or "Due to <information about some
leak> we are requiring all users to change their password".

Bug 54997 is what's I'm thinking right now (since I've worked with a bunch of enterprise software in the past that did it that way).

(In reply to comment #81)

With recent events [0], I'd like to capitalize on lots of people wanting this
and get it pushed through soonish.

I think we actually have 3 parts to the bug:

  • Updating MediaWiki to better handle multiple password types, and make it

easier to extend by extensions and in the future. There were probably a
couple
of things I would change in Tyler's now-abandoned Gerrit change #77645, but
I think
it's close. Tyler, what would it take to finish that?

It's not abandoned. You must be thinking of my previous patch. The one you linked is still active, and is just waiting code review. I just -2ed it to prevent merging.

(In reply to comment #83)

The password-hashing branch uses PBKDF2-HMAC+Whirlpool. Passwords are
converted
on login/change.

The issue with that is a large number of the 40M users in the WMF databases
will never login again, but I'd like to make sure they have a strongly hashed
password so an attacker can't take over their unused account. So I'd like to
find a way we can take what's in the db, and upgrade it without user
interaction.

Since the password system already easily handles multiple password formats I suppose we could come up with two password formats. PBKDF for new passwords and a :C: format that is based on existing :A:/:B: hashes that we can convert without user interaction.

That said. Thinking about how the idea of :C: works it might be possible to instead create a layer that could essentially do a :C: for any non-opaque password storage type using the implementation of any other non-opaque password storage type. The fundamental idea is simply using the hash of a password as the input to a new type of password storage instead of a password.

I'd need an interface for hash/dkey based password storage to expose the hash separately from the parameters. But by the layer could opaquely store the type and params used by say PBKDF, the type and params used by :B:, and the hash. With that information it would be possible to create both an instance of the B type and the PBKDF type and have the wrapper layer use the abstract methods on both classes to implement hash a password with both.

That actually has the advantage over a simple :C: type of being able to wrap any non-opaque type and in the future will still be capable of doing that for any new non-opaque password type. eg: Say in the future we discover that the PBKDF algorithm is still fine but there's a weakness in WHIRLPOOL and we need to use some brand new hash type but we can't convert from WHIRLPOOL to that hash type. The same wrapper can be used to take the existing PBKDF WHIRLPOOL and then wrap it in PBKDF with the new hash type.

Of course there is a small disadvantage. With a simple restricted :C: type; In the future if you created a :D:, :E:, and then :F: each time wrapping the old password hash you could keep relatively the same params. But for a wrapper layer each time you'd have to wrap the wrapper with the type and params for the new type:
((abstractly speaking WRAP { PBKDF:params, :B:... } becomes WRAP { NewType:params WRAP { PBKDF:params, :B:... } and continues like that))
While that's fine the downside there of course is that each time you update passwords en-masse and a user never logs in resetting it back to a normal password type the storage data grows in size. Eventually the size becomes large enough it won't fit in our user_password field and you can no longer convert a user's password to a new type without their interaction.

Bug 54997 is what's I'm thinking right now (since I've worked with a bunch of
enterprise software in the past that did it that way).

I don't like the idea of recurring "expiry" per-se but that does look fine for marking users as requiring a change of password.

Daniel, do you mind if I put a sqashed version of your branch, rebased on master, into gerrit? I'd like to have a couple of people comment on it.

(In reply to comment #86)

Daniel, do you mind if I put a sqashed version of your branch, rebased on
master, into gerrit? I'd like to have a couple of people comment on it.

Sure. Though you should probably -2 it, there's no PasswordLayerType yet.

Also a properly revived branch is going to need to update $wgRedactedFunctionArguments.

Btw, here's the encryption based pepper I was playing with to figure out what I need for a password layer type.
https://github.com/dantman/mediawiki-core/commit/d73f956076c2a169c47de497b70782991ef46503

Just so their easy to review, both Daniel and Tyler's patches are in gerrit

Daniel's - https://gerrit.wikimedia.org/r/#/c/89031
Tyler's - https://gerrit.wikimedia.org/r/#/c/77645

Tyler's seems a little closer to how I would have written it, so I'm leaning towards improving that patch a little. But I'd be happy with either.

Note, we can't use the bcrypt implementation as default from Tyler's patch, since it isn't compatible with php 5.3.2, so we would still want to define and add a :C: type for the WMF. And probably import Daniel's PBKDF for anyone who needs to be NIST compliant.

Change 110646 had a related patch set uploaded by CSteipp:
Add Whirlpool using the password api

https://gerrit.wikimedia.org/r/110646

Change 110646 abandoned by CSteipp:
Add Whirlpool using the password api

Reason:
Consensus from https://www.mail-archive.com/wikitech-l@lists.wikimedia.org/msg73695.html seems to be that several people are unsure about the safety of this, so I'm going to remove it for now.

https://gerrit.wikimedia.org/r/110646

Sorry to pollute this bug with operational details, but since gerrit 77645 looks ready to merge, we have to do a little dance to make sure we don't break things in the WMF environment-- namely CentralAuth copies the user_password field when it merges accounts, so CentralAuth needs to use the new password classes also.

I think I'll merge gerrit 77645, and then at the WMF, we'll set $wgPasswordDefault = 'B' temporarily for all wikis. We'll update CentralAuth to use the Password class, and then we'll remove the $wgPasswordDefault in our environment so users will start using pbkdf2. Once we're using pbkdf2, we'll run the maintenance script to layer all wiki's passwords with pbkdf2-legacyB.

And since CentralAuth already lets the User object do the comparison, ignore my last comment. CentralAuth uses the new hashes just fine.

(In reply to Chris Steipp from comment #92)

And since CentralAuth already lets the User object do the comparison, ignore
my last comment. CentralAuth uses the new hashes just fine.

... except when CentralAuth unmerges accounts. If a user creates a new account and CentralAuth gets the pbkdf2 hash, and later they are unmerged in CentralAuth, they won't be able to login to any wikis where the password api hasn't been deployed. So we do need the $wgPasswordDefault = 'B' for the week while code is deployed to all wikis.

(In reply to Chris Steipp from comment #93)

... except when CentralAuth unmerges accounts. If a user creates a new
account and CentralAuth gets the pbkdf2 hash, and later they are unmerged in
CentralAuth, they won't be able to login to any wikis where the password api
hasn't been deployed. So we do need the $wgPasswordDefault = 'B' for the
week while code is deployed to all wikis.

Awesome! When I saw your first comment I was afraid we'd be stuck here waiting for CentralAuth to catch up. Having the default as 'B' while changes are deployed sounds like a good route to me.

  • Bug 68389 has been marked as a duplicate of this bug. ***

scott wrote:

return md5( $salt.'-'.md5( $password ) ) == $realHash;
return self::crypt( $password, $salt ) == $hash;
return self::reallyOldCrypt( $password, $userId ) === $hash;

Can we swap out the == and === logic for one of the following:

  • Constant time hash comparison code (see hash_equals() in PHP 5.6.0 and PHP implementations, such as Taylor Hornby's PBKDF2 library)?
  • "Double HMAC" with a random nonce

i.e.
+ /**
+ * A comparison of two strings, not vulnerable to timing attacks
+ * @param string $answer the secret string that you are comparing against.
+ * @param string $test compare this string to the $answer.
+ * @return bool True if the strings are the same, false otherwise
+ */
+ static function hash_equals( $answer, $test ) {
+ if (function_exists('hash_equals')) {
+ return hash_equals($answer, $test);
+ }
+ if ( strlen( $answer ) !== strlen( $test ) ) {
+ $passwordCorrect = false;
+ } else {
+ $result = 0;
+ for ( $i = 0; $i < strlen( $answer ); $i++ ) {
+ $result |= ord( $answer[$i] ) ^ ord( $test[$i] );
+ }

+ $passwordCorrect = ( $result === 0 );
+ }
+ return $passwordCorrect;
+ }

OR

+ /**
+ * A comparison of two strings, not vulnerable to timing attacks
+ * @param string $answer the secret string that you are comparing against.
+ * @param string $test compare this string to the $answer.
+ * @return bool True if the strings are the same, false otherwise
+ */
+ static function hash_equals( $answer, $test ) {
+ if (function_exists('hash_equals')) {
+ return hash_equals($answer, $test);
+ }
+ $nonce = MWCryptRand::generate(16);
+ return hash_hmac('sha256', $test, $nonce) === hash_hmac('sha256', $answer, $nonce);
+ }

Bug 68389 is still a security bug. Does duping it to a public one mean it's actually not something sensitive?

Change 148442 had a related patch set uploaded by Parent5446:
Replaced hash_equals with a custom function

https://gerrit.wikimedia.org/r/148442

scott wrote:

(In reply to Liangent from comment #99)

Bug 68389 is still a security bug. Does duping it to a public one mean it's
actually not something sensitive?

It's not sensitive at all. It's literally a duplicate of the issue here, because I failed to locate the existing bug report and fired off a patch.

(In reply to scott from comment #102)

(In reply to Liangent from comment #99)

Bug 68389 is still a security bug. Does duping it to a public one mean it's
actually not something sensitive?

It's not sensitive at all. It's literally a duplicate of the issue here,
because I failed to locate the existing bug report and fired off a patch.

Well putting a bug in the "Security" product hides it from public view.

Change 149658 had a related patch set uploaded by Parent5446:
Changed password default to PBKDF2

https://gerrit.wikimedia.org/r/149658

Change 77645 merged by jenkins-bot:
Added password hashing API

https://gerrit.wikimedia.org/r/77645

Change 150028 had a related patch set uploaded by Parent5446:
Fixed hook documentation for removed hooks

https://gerrit.wikimedia.org/r/150028

Change 150028 merged by jenkins-bot:
Fixed hook documentation for removed hooks

https://gerrit.wikimedia.org/r/150028

Change 149658 merged by jenkins-bot:
Changed password default to PBKDF2

https://gerrit.wikimedia.org/r/149658

Change 148442 abandoned by Parent5446:
Replaced hash_equals with a custom function

https://gerrit.wikimedia.org/r/148442