
rewrite quickIsNFCVerify() to use preg_match() with an offset to accommodate larger files
Open, Lowest, Public, Feature Request

Description

Broken out from T30146, which started with a narrower focus that was solved by a narrower fix.

Per notes & patches on that bug, the preg_match_all() in UtfNormal::quickIsNFCVerify uses a lot of memory for mixed ASCII/non-ASCII strings, such as those found in languages that use Latin script with accented or other non-ASCII letters.

This results in hitting memory limits on fairly large input strings, much sooner than it ought to.

Rewriting the function so that it works through the string in chunks as it splits should avoid that huge memory bump, but my initial tests were too slow using preg_match() and an offset, and still somewhat slow using preg_replace_callback().
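
As a rough illustration of the chunked approach described above, here is a minimal sketch (the function name, callback, and pattern are placeholders, not the actual UtfNormal code): step through the subject with preg_match() and an explicit byte offset, handling one segment per iteration instead of letting preg_match_all() build a $matches array covering the entire string.

```php
<?php
// Hypothetical sketch only -- not the real quickIsNFCVerify() implementation.
// Walks the string segment by segment with preg_match() + an offset, so peak
// memory is bounded by the longest single segment rather than the whole input.
function forEachSegment( $string, $pattern, $callback ) {
	$offset = 0;
	$length = strlen( $string );
	while ( $offset < $length ) {
		if ( !preg_match( $pattern, $string, $m, PREG_OFFSET_CAPTURE, $offset ) ) {
			break;
		}
		list( $segment, $pos ) = $m[0];
		if ( $segment === '' ) {
			break; // guard against zero-length matches
		}
		call_user_func( $callback, $segment ); // process one chunk at a time
		$offset = $pos + strlen( $segment );   // advance past this match
	}
}

// Example: treat runs of ASCII bytes and runs of high bytes separately,
// so only the non-ASCII runs would need further NFC verification.
forEachSegment(
	"caf\xc3\xa9 au lait",
	'/[\x00-\x7f]+|[\x80-\xff]+/',
	function ( $segment ) {
		// inspect or normalize $segment here
	}
);
```

The trade-off, as noted above, is speed: one preg_match() call per segment adds per-call overhead that preg_match_all() amortizes across the whole string, which is why the initial tests with this approach were slow.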

includes/normal/UtfNormalMemStress.php can be used to stress-test this.


Version: 1.18.x
Severity: enhancement

Details

Reference
bz28427

Event Timeline

bzimport raised the priority of this task to Lowest. Nov 21 2014, 11:33 PM
bzimport set Reference to bz28427.
bzimport added a subscriber: Unknown Object (MLST).

I suppose that this error is related to this bug?

PHP fatal error in
/usr/local/apache/common-local/php-1.17/includes/normal/UtfNormal.php line 285:
Allowed memory size of 125829120 bytes exhausted (tried to allocate 71 bytes)

http://fr.wikisource.org/w/index.php?title=Fichier:Port_-_Dictionnaire_historique,_g%C3%A9ographique_et_biographique_du_Maine-et-Loire,_tome_1.djvu&action=purge

This is a big file: 882 pages (85.71 MB).

That'll be another instance of bug 28146 with the DjVu text extraction; merging the fix for that to 1.17 and deploying it should resolve it.

Hi veteran contributors. Is this problem still valid? Is General/Unknown its best location?

Marking as Lowest, since nobody seems to be working or planning to work on this currently.

Aklapper changed the subtype of this task from "Task" to "Feature Request".Feb 4 2022, 11:02 AM

The code still exists with the same issue, but it's extremely unlikely to cause errors in production. I don't see any such errors in Logstash.