Page MenuHomePhabricator

Check email addresses before storing in database for compliance with RFC 2822
Closed, DeclinedPublic

Assigned To
Authored By
Wikinaut
Nov 29 2004, 11:12 PM
Referenced Files
F1373: email-address-validation.php
Nov 21 2014, 7:03 PM
F1371: regexall.php
Nov 21 2014, 7:03 PM
F1372: regexpart.php
Nov 21 2014, 7:03 PM
F1370: checkaddress.pl
Nov 21 2014, 7:03 PM

Description

Email addresses are currently stored unchecked for mailing passwords or
EmailUser. Any *potential* security hole in php mail() or PEAR:Mail function as
called in UserMailer.php could be exploited by entering pseudo addresses which
do something unwanted.

Remark: I do not mean the email authentication as described in
http://bugzilla.wikipedia.org/show_bug.cgi?id=866 , which only authenticates
addresses for the "higher" functions such as EmailUser and Enotif. But please be
reminded, that the "I forgot my password" mails are sent to un-authenticated
addresses anyway.

Proposed is
(a) converting to all lower case
(b) a check for valid mail address strings

The Enotif and Email Authentification patches 454 and 866 are coming with
solutions to this bug, as they only allow users to store one email address which
matches the regular expression
^([a-z0-9_.-]+([a-z0-9_.-]+)*\@[a-z0-9_-]+([a-z0-9_.-]+)*([a-z.]{2,})+)$

I gave provisionally highest priority to this bug.

A fix is easy be checking the user entered (dirty=unauthenticated) email address
simply before storing in SpecialUserlogin.php against the above regular
expression. In case of not matching address, an empty (blank) string address can
be provisonally stored in table user.

I repeat: my Enotif and my Email authentication procedure will be published with
a solution to this problem.


Version: unspecified
Severity: enhancement
URL: http://www.faqs.org/rfcs/rfc2822.html

Details

Reference
bz959

Event Timeline

bzimport raised the priority of this task from to Lowest.Nov 21 2014, 7:03 PM
bzimport set Reference to bz959.

To fix this, I introduced a new function in User.php

/**

  • does the string match roughly an email address ?
  • @param string $addr email address
  • @static
	 */

function isValidEmailAddr ( $addr ) {

		return preg_match( '/^([a-z0-9_.-]+([a-z0-9_.-]+)*\@[a-z0-9_-]+([a-z0-9_.-]+)*([a-z.]{2,})+)$/', strtolower($addr));

}

It is currently used in

  1. SpecialUserlogin.php (when a user enters a new address into the login form),
  2. SpecialPreferences.php (when a user enters a new address into the preferences form),

"new address" is stored, if
lowercase(new address string) != lowercase(old stored address string)
&& (new address string != '')
&& IsValidEmailAddr

Remark: UserMailer.php in Enotif patch performs its own checks against a simplified RFC-822 format.
Fixed within Enotif 2.00

Can you please reference the appropriate RFC to ensure that this
validation regexp is correct and will not reject valid addresses?

I am using the regular expression as used in AutoMailto() fix of UseMod Wiki,
see http://www.usemod.com/cgi-bin/wiki.pl?WikiPatches/AutoMailto .

^([a-z0-9_.-]+([a-z0-9_.-]+)*\@[a-z0-9_-]+([a-z0-9_.-]+)*([a-z.]{2,})+)$

return "true" for a single valid email address in the formats I know. It
checking against a simplified RFC-822 format, but will need some further fine
tuning, which I cannot effort at the moment; perhaps, someone of the community
can assist here, please:

EVERYONE is invited to check this regular expression.

Its purpose within MediaWiki CVS 1.4 is:
The RE shall validate one single valid email address string (i.e. without a name
string) and return "true" in such a case.

My regular expression is used often and can be regarded as starting point for
optimisation. If anyone has doubts: please let me know and inidcated a
reference, where you base you objection upon.

zigger wrote:

A number of things seem missing straight away. Most people do not need these,
but please refer to RFC2822 section 3, starting with the definition of "address"
or "addr-spec" (?) and working backwards. Hopefully you can find some compliant
regex elsewhere.

reminder for Tom (myself): don't forget to check wfStrencode() respectively strict use of the new database-wrapper functions, when
storing e-Mail address.

reminder for Brion and other developers having code access:
current storing of email addresses in SpecialUserlogin.php and SpecialPreferences.php in MediaWiki (pre-CVS release) needs urgently
(as far as I see) escaping against potential database injections, as any user can store any string at the moment.

Version lowered to 1.3.7 to ring the bells.

Tom's claim about SQL injection is entirely unfounded, as a cursory look at the
code or simple experimentation will show.

jeluf wrote:

The regexp rejects several valid email addresses, e.g. plus signs and exclamation
marks are valid characters for the mailbox name.

Latest Punicode extensions to DNS allow the use of a limited set of diacritic
characters for the hostname.

  • introducing a new function isValidEmailAddr() in User.php, which performs a

check of the entered email address

(In reply to comment #7)

The regexp rejects several valid email addresses, e.g. plus signs and exclamation
marks are valid characters for the mailbox name.

Latest Punicode extensions to DNS allow the use of a limited set of diacritic
characters for the hostname.

I reopened the bug, because according to JeLuf some email addresses might result
in false-rejects. But the new function is a starting point.

gareth.rees wrote:

Please read RFC2822 http://www.faqs.org/rfcs/rfc2822.html The local-part of an e-mail address may contain any
ASCII character. (It might be reasonable for you to only allow the local-part to be in dot-atom format, which rules out
special characters like "@<>[]:;,\ but allows letters, numbers, and !#$%&'*+-/=?^_`{|}~. See section 3.2.4 of
RFC2822.

(In reply to comment #10)

Please read RFC2822 http://www.faqs.org/rfcs/rfc2822.html The local-part of

an e-mail address may contain any

ASCII character. (It might be reasonable for you to only allow the local-part

to be in dot-atom format, which rules out

special characters like "@<>[]:;,\ but allows letters, numbers, and

!#$%&'*+-/=?^_`{|}~. See section 3.2.4 of

RFC2822.

Coming soon. thanks for letting me know.

bugzilla_wikipedia_org.to.jamesd wrote:

The proposed check seems to reject making an email address empty. In UK and much
European law it's required that there be a way to remove personal information on
request and making admins or developers do that instead of letting the software
do it is undesirable. Email addresses are considered personal information.

Please do not convert the saved email addresses to lower case. Changing cases
introduces the possibility of bugs when extensions to the use of ASCII
characters are involved - the case conversion would need to know the local
character set encoding method used by the receiving system, information which is
not available. Note also that a local address is allowed to have many mailboxes
which differ only in case (perhaps because of character set translation), so
case conversion could result in the wrong local mailbox on the receiving system
being used. Conversion also makes life needlessly annoying for humans, who may
be using mixed case deliberately to split words.

(In reply to comment #12)

The proposed check seems to reject making an email address empty. In UK and much
European law it's required that there be a way to remove personal information on
request and making admins or developers do that instead of letting the software
do it is undesirable. Email addresses are considered personal information.

JamesDay,

perhaps you misunderstood my check routine.

The logic in SpecialPreferences and UserlOgin.php *does* allow the user to empty
his/her address: of course, it is and will be possible to blank out (to empty)
the own email address

Please do not convert the saved email addresses to lower case. Changing cases
introduces the possibility of bugs when extensions to the use of ASCII
characters are involved - the case conversion would need to know the local
character set encoding method used by the receiving system, information which is
not available. Note also that a local address is allowed to have many mailboxes
which differ only in case (perhaps because of character set translation), so
case conversion could result in the wrong local mailbox on the receiving system
being used. Conversion also makes life needlessly annoying for humans, who may
be using mixed case deliberately to split words.

I agree. I introduced and proposed the conversion to lowercase to facilitate the
"unique" (same) userid AND (same) emailaddress checking with Kate's Namechecher
tool targetting to a single-user-login.

As soon as I am allowed to commit to the CVS, I will commit such changes as
proposed by all of you to the isValidEmailAddress code (etc.).

Tom

avarab wrote:

Paul Warren made what he claims to be a fullly RFC 822 compatable PCRE that
matches valid email addresses, see:
http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html

The regular expression follows (not that I haven't gotten it to work.)

(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:
\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(
?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[
\t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0
31]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\
](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+
(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:
(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z

(?=[\["()<>@,;:\\".\[\]]))"(?:[^\"\r\\]\\.(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)

?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\
r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[
\t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)
?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t]
)*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*
)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)
*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+

\Z(?=[\["()<>@,;:\\".\[\]]))"(?:[^\"\r\\]\\.(?:(?:\r\n)?[ \t]))*"(?:(?:\r

\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t
]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031
]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](
?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?
:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?
:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?
:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?
[ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\]
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|
\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>
@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"
(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t]
)*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?
:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[
\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-
\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(
?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;
:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([
^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\"
.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\
]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\

  • \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\

r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\]
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]

\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \0

00-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\
.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,
;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?
:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*
(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[
^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]
]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*(
?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(
?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[
\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t
])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t
])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?
:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|
\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:
[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\
]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)
?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["
()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)
?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>
@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[
\t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,
;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t]
)*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?
(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:
\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[
"()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])
*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])
+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\
.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z

(?=[\["()<>@,;:\\".\[\]]))\[([^\[\]\r\\]\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(

?:\r\n)?[ \t])*))*)?;\s*)

bugzillas+padREMOVETHISdu wrote:

The following Perl code may or may not be the same as the regexp in the previous
comment and may or may not be more readable:

$newline = '(?:(?:\r\n)?[ \t])';
$newlines = "$newline+";
$opt_newlines = "$newline*";
$normal_chars = '[^()<>@,;:\\".\[\] \000-\031]+';
$dot = "\.$opt_newlines";
$backslashdot = '\\.';
$domainliteral = '([^\[\]\r\\]|'.$backslashdot.')*';
$endofstring = '\Z';
$lookahead_special_char = '(?=[\["()<>@,;:\\".\[\]])';
$stop_normal_chars = "(?:$newlines|$endofstring|$lookahead_special_char)";
$quoted_string_char = '[^\"\r\\]';
$quoted_string = '"(?:'."$quoted_string_char|$backslashdot|$newline".')*"';
$username_component =
"(?:$normal_chars$stop_normal_chars|$quoted_string$opt_newlines)";
$domain_component =
"(?:$normal_chars$stop_normal_chars|\[$domainliteral\]$opt_newlines)";
$at = "@$opt_newlines";
$at_sameline = '@';
$username = "$username_component(?:$dot$username_component)*";
$domainname = "$domain_component(?:$dot$domain_component)*";
$l_ang = "\<$opt_newlines";
$r_ang = "\>$opt_newlines";
$colon = ":$opt_newlines";
$comma = ',\s*';
$comma_no_whitespace = ',';
$route = "(?:$at_sameline$domainname(?:$comma_no_whitespace$at$domainname)*$colon)";
$opt_route = "$route?";
$plain_email_addr = "$username$at$domainname";
$route_addr = "$l_ang$opt_route$plain_email_addr$r_ang";
$email_addr_opt_with_name = "(?:$plain_email_addr|$username_component$route_addr)";
$email_addr_list =
"(?:$email_addr_opt_with_name(?:$comma$email_addr_opt_with_name)*)";
$opt_email_addr_list = "$email_addr_list?";
$semicolon = ';\s*';
$above_regexp =
"(?:$email_addr_opt_with_name|$username_component$colon$opt_email_addr_list$semicolon)";

bugzillas+padREMOVETHISdu wrote:

(In reply to comment #15)

The following Perl code may or may not be the same as the regexp in the previous
comment and may or may not be more readable:

The requirements of $at_sameline and $comma_no_whitespace and the inconsistent
usage of $opt_newlines and \s* (which allows \r without a \n or vice versa)
might be a bug in the regexp or in the RFC or in my brain or in my editor.

avarab wrote:

Lowering the severity of this, as it is a theoretical bug, there's no evidence
that this poses a risk at present time.

avarab wrote:

I was under the impression that we didn't block any email addresses currently,
but the regex we use blocks valid email addresses, upping the priority to BLOCKER.

(In reply to comment #19)

I was under the impression that we didn't block any email addresses currently,
but the regex we use blocks valid email addresses, upping the priority to BLOCKER.

Avar:
please can you supply us with more data:

(a) Which MediaWiki version do you refer to ?
(b) What user e-mail address exactly is false-rejected ?

Thanks in advance

avarab wrote:

In reply to (comment #20)

(1) HEAD (this bug is marked as 1.5-cvs so all comments should be considered to
apply to that branch)
(2) An email address with the plus sign and countless others, your regexp only
matches a very small subset of valid email addresses according to RFC 2822, it
will even reject an IPv4 address after the @.

Unless we can come up with some code that doesn't reject valid email addresses I
think we should remove this check alltogather, there's no proof whatsoever that
invalid email addresses pose any kind of threat and thiscan only serve to induce
annoyance by rejecting valid addresses.

(In reply to comment #21)

In reply to (comment #20)

(2) An email address with the plus sign and countless others, your regexp only
matches a very small subset of valid email addresses according to RFC 2822, it
will even reject an IPv4 address after the @.

Unless we can come up with some code that doesn't reject valid email addresses I
think we should remove this check alltogather, there's no proof whatsoever that
invalid email addresses pose any kind of threat and thiscan only serve to induce
annoyance by rejecting valid addresses.

In User.php, please modify the function

/**

  • does the string match roughly an email address ?
  • @param string $addr email address
  • @static
	 */

function isValidEmailAddr ( $addr ) {

		return preg_match( '/^([a-z0-9_.-]+([a-z0-9_.-]+)*\@[a-z0-9_-]+([a-z0-9_.-]+)*([a-z.]{2,})+)$/', strtolower($addr));

}

> so that it returns *true*.

Later, someone can improve the preg_match

avarab wrote:

A perl program that returns 0 if it's fed a valid address, and 1 if it's invalid

This is a perl program to test if an email address is valid, I modified it from
the Mail::RFC822::Address code, use it like:

$ perl checkaddress.pl "email.address@to.check.com" && echo valid || echo
invalid

Attached:

(In reply to comment #23)

This is a perl program to test if an email address is valid, I modified it from
the Mail::RFC822::Address code, use it like:

Good morning Avar,

can you recode this PERL to PHP please - this would be of big help (as I can
concentrate on other issues).
Mediawiki is currently coded in PHP and if you want to propose the improved
check routine, you can better post the PHP version here.

It should be compatible to and can than replace the current shorter function:

/**

  • does the string match roughly an email address ?
  • @param string $addr email address
  • @static
	 */

function isValidEmailAddr ( $addr ) {

		return preg_match(

'/^([a-z0-9_.-]+([a-z0-9_.-]+)*\@[a-z0-9_-]+([a-z0-9_.-]+)*([a-z.]{2,})+)$/',
strtolower($addr));
}

I guess, that Brion or JeLuf will commit any improved convincing version, which
helps to improve MediaWiki REL1_5.
Thanks your contribution.

Looking forward to your replacement function in PHP.

dmaj007 wrote:

/**

  • does the string match roughly an email address ?
  • @param string $addr email address
  • @static
	 */

function isValidEmailAddr ( $addr ) {
if (preg_match('/(?:(?:\\r\\n)?[ \\t])*(?:(?:(?:[^()<>@,;:\\\\".\\[\\]
\\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[
\\t]))*"(?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[
\\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[
\\t]))*"(?:(?:\\r\\n)?[ \\t])*))*@(?:(?:\\r\\n)?[
\\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\]
\\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*))*|(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[
\\t]))*"(?:(?:\\r\\n)?[ \\t])*)*\\<(?:(?:\\r\\n)?[
\\t])*(?:@(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\]
\\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*))*(?:,@(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\]
\\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\]
\\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*))*)*:(?:(?:\\r\\n)?[ \\t])*)?(?:[^()<>@,;:\\\\".\\[\\]
\\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[
\\t]))*"(?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[
\\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[
\\t]))*"(?:(?:\\r\\n)?[ \\t])*))*@(?:(?:\\r\\n)?[
\\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\]
\\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*))*\\>(?:(?:\\r\\n)?[ \\t])*)|(?:[^()<>@,;:\\\\".\\[\\]
\\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[
\\t]))*"(?:(?:\\r\\n)?[ \\t])*)*:(?:(?:\\r\\n)?[
\\t])*(?:(?:(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[
\\t]))*"(?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[
\\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[
\\t]))*"(?:(?:\\r\\n)?[ \\t])*))*@(?:(?:\\r\\n)?[
\\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\]
\\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*))*|(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[
\\t]))*"(?:(?:\\r\\n)?[ \\t])*)*\\<(?:(?:\\r\\n)?[
\\t])*(?:@(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\]
\\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*))*(?:,@(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\]
\\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\]
\\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*))*)*:(?:(?:\\r\\n)?[ \\t])*)?(?:[^()<>@,;:\\\\".\\[\\]
\\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[
\\t]))*"(?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[
\\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[
\\t]))*"(?:(?:\\r\\n)?[ \\t])*))*@(?:(?:\\r\\n)?[
\\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\]
\\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*))*\\>(?:(?:\\r\\n)?[ \\t])*)(?:,\\s*(?:(?:[^()<>@,;:\\\\".\\[\\]
\\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[
\\t]))*"(?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[
\\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[
\\t]))*"(?:(?:\\r\\n)?[ \\t])*))*@(?:(?:\\r\\n)?[
\\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\]
\\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*))*|(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[
\\t]))*"(?:(?:\\r\\n)?[ \\t])*)*\\<(?:(?:\\r\\n)?[
\\t])*(?:@(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\]
\\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*))*(?:,@(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\]
\\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\]
\\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*))*)*:(?:(?:\\r\\n)?[ \\t])*)?(?:[^()<>@,;:\\\\".\\[\\]
\\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[
\\t]))*"(?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[
\\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[
\\t]))*"(?:(?:\\r\\n)?[ \\t])*))*@(?:(?:\\r\\n)?[
\\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\]
\\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*))*\\>(?:(?:\\r\\n)?[ \\t])*))*)?;\\s*)/', $text)) {
return true;
} else {
return false;
}
}
however! if ONLY AN EMAIL ADRESS IS SENT IN, YOU WANT TO APPEND A \\A RIGHT
AFTER THE '/ THAT STARTS THE REGEX

dmaj007 wrote:

Regex checks a string containing ONLY!! an email adress

This will check if the string contains only an email adress, no more, no less.

Attached:

dmaj007 wrote:

The check if there is a vaild email somewhere inside of the test string

This will return true if there is one valid email in the whole string.

Attached:

dmaj007 wrote:

These patches were commited to bugzilla because my total match rant was
incorrect. The patches, however, are defined exactly by their description. One
vaildates true if there is a single valid email in a string. The other if the
string contains exactly one email adress.

avarab wrote:

(In reply to comment #27)

Created an attachment (id=459) [edit]
The check if there is a vaild email somewhere inside of the test string

This doesn't pre-strip the email address of comments (see sub strip_comments in
attachment 453) or check if the address consists only of valid characters (see
the return line in sub valid in attachment 453)

avarab wrote:

Moving the priority down again

majewsky wrote:

Implementation of the e-mail address verification without regexps

I see that, in MW 1.9.2, the User::isValidEmailAddr function still just checks
whether the string is not empty and there is an "@". I read a bit in certain
resources like Wikipedia and the RFCs and composed the function you can find in
the fourth attachment. It makes (nearly) no use of regexps, instead it reads
and validates the string char by char. I made a speed test on my local Apache
server with PHP (measurement with microtime()). It took 7.228 seconds to
execute my function 100,000 times while the regular expression in the second
attachment needed 14.435 seconds under the same conditions.

Attached:

majewsky wrote:

Sorry, I made an error. What I expected to be the execution time of the regexp
implementation, was the execution time of both implementations. Some test runs
with corrected time measurement: (my vs. the regexp implementation)

5.758 vs. 6.592
5.763 vs. 6.604

ayg wrote:

Validating e-mail addresses is incredibly error-prone and I don't see any reason
to do it. It will probably introduce more issues than it fixes. I suppose
there's a theoretical chance that some vulnerability will be found that relies
on entering an invalid address, but is that chance really significant enough to
bother with? We should check if there's an @ somewhere to stave off people
entering totally the wrong thing by mistake, and maybe check length if there's a
length limit and check for the presence of invalid characters if there's a
simple list of valid characters, but trying (and probably failing) to implement
the entire RFC doesn't seem useful.

majewsky wrote:

Checking the lengths and the characters is what my function does. Not more and
not less.

Mediawiki functionality being tested without a solution for 5 years now after the (very) potential security thread was raised, fully agreeing with comment #33, I propose to close the bug: works for me.

happy.melon.wiki wrote:

The current code does not satisfy the original bug request, hence WONTFIX, not WORKSFORME.

I'm going to refrain from reopening this for now, but I am surprised that the 'solution' to this really complex problem that no one wants to fix because of:

a) its complexity,
b) the inability of suggested fixes to accommodate rfc 2822 (there are actually a number of other RFCs that govern the validity of email addresses as well)

is to simply look for an '@' in the email string. Particularly when an email address technically does not require an '@' in the string...

Some folks out there have spent a lot of time in working this problem over the past years. After some digging, I unearthed what actually looks like a fabulous email validator. It validates against the various RFCs out there governing email addresses AND it will provide warnings when an email address is technically considered valid but would more likely be a typo by a user.

The license is open, the source has been community developed and maintained (though principally by Dominic Sayers) and they provide a myriad of email addresses to test against the validator that have been carefully constructed using the RFCs.

Please take a look at: http://www.dominicsayers.com/isemail/

Note that this has been superseded by bug 22449: for MW 1.17/1.18 the email validation rules specified for HTML 5 client-side input validation are now being used.

This uses a subset of RFC 5322 rules for the mailbox part, and a subset of RFC 1034 for the host part.

Per comments on bug 22449, this appears to be liberal enough to not be a problem for the folks using slightly funky characters, while still being useful for catching obviously broken cases. It also means that we keep consistency between client-side validation and server-side validation, which is always a plus.

(Additional validation rules can be enforced by extensions hooking into the server-side validation function.)