Edit analytics needs a character delta (the change in character count relative to the parent revision) for each revision. The field is needed in both the stub and the full article dumps.
Rob suggested that PHP's UTF-8 support (e.g. just calling mb_strlen($buffer, 'UTF-8')) would quickly dispose of the multi-byte problem and give us a fairly accurate character count. Counting characters rather than bytes will let us compare edits across languages, since the number of bytes per character in UTF-8 varies by script.
If this raises serious performance concerns, we can fall back to a byte count.
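A minimal sketch of the two options, assuming the dump code has the raw UTF-8 text of a revision and of its parent revision at hand. The function names charDelta and byteDelta are hypothetical, and treating a page's first revision as growing from empty text is an assumption. The example shows why the choice matters: adding the same word grows a Cyrillic page by more bytes than characters, while for ASCII text the two deltas coincide.

```php
<?php
// Hypothetical helpers for per-revision size deltas in a dump.
// Assumption: $oldText is the parent revision's UTF-8 text, or null
// for a page's first revision (treated as growing from empty text).

/** Character delta: counts Unicode code points via mb_strlen. */
function charDelta(?string $oldText, string $newText): int
{
    return mb_strlen($newText, 'UTF-8') - mb_strlen($oldText ?? '', 'UTF-8');
}

/** Fallback byte delta: cheaper, but depends on the script in use. */
function byteDelta(?string $oldText, string $newText): int
{
    return strlen($newText) - strlen($oldText ?? '');
}

// Each Cyrillic letter takes 2 bytes in UTF-8, so the deltas differ.
echo charDelta('Москва', 'Москва Москва'), "\n"; // 7
echo byteDelta('Москва', 'Москва Москва'), "\n"; // 13
```

Note that mb_strlen counts code points, not user-perceived characters, so a letter written with a combining accent counts as two. That is likely what "fairly accurate" has to absorb, and it should still be far more comparable across languages than byte counts.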
Version: unspecified
Severity: enhancement