Extension:TweetANew patch to support multibyte characters (e.g. CJK) and the t.co length change
Closed, Resolved
Public

Description

Author: isaka

Description:
[1]
Twitter accepts tweets of 140 characters or fewer.
Extension:TweetANew truncates a tweet to at most 140 bytes.

As a result, TweetANew truncates tweets too short in languages that use multibyte characters (such as Japanese or Chinese).
This is because it uses PHP's plain string functions, which process a string as an array of bytes.
I rewrote it with PHP's mbstring functions, which process a string character by character.
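
To illustrate the difference between counting bytes and counting characters, here is a minimal sketch. It is not part of the patch: the sample text is made up for illustration, and it assumes UTF-8 input and that the mbstring extension is installed.

```php
<?php
// Illustration only: the sample string is an assumption, not TweetANew data.
// Requires the mbstring extension.
$text = '日本語のツイート'; // 8 characters, each 3 bytes long in UTF-8

$bytes = strlen( $text );             // strlen() counts bytes
$chars = mb_strlen( $text, 'UTF-8' ); // mb_strlen() counts characters

echo "bytes: $bytes, characters: $chars\n"; // prints "bytes: 24, characters: 8"
```

A byte-based limit of 140 would therefore allow only about 46 such characters instead of 140.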

[2]
Twitter now wraps all URLs with "t.co", and the wrapped length has changed from 20 to 22 characters (for https, from 21 to 23).
https://dev.twitter.com/docs/tco-url-wrapper
https://dev.twitter.com/blog/upcoming-tco-changes
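
The effect of these lengths on the tweet budget can be sketched as follows. This is only a sketch under assumptions: `tweetTextLimit()` is a hypothetical helper name, not a TweetANew function, and it assumes the URL appears exactly once in the tweet text. Unlike the patch in this report, it matches the scheme only at the start of the URL.

```php
<?php
// Hypothetical helper, not part of TweetANew. Returns how many characters
// the raw tweet text (including the full URL) may contain, assuming Twitter
// counts the wrapped URL as 22 characters (http) or 23 characters (https).
function tweetTextLimit( $url ) {
	$max = 140;
	if ( stripos( $url, 'https:' ) === 0 ) {
		return $max - 23 + mb_strlen( $url, 'UTF-8' );
	} elseif ( stripos( $url, 'http:' ) === 0 ) {
		return $max - 22 + mb_strlen( $url, 'UTF-8' );
	}
	// No recognised URL scheme: the plain 140-character limit applies.
	return $max;
}

// example.org is a placeholder URL used only for this illustration.
echo tweetTextLimit( 'http://example.org/wiki/Some_long_page_title' ), "\n";
```

A URL longer than its wrapped length raises the limit on the raw text, and a shorter one lowers it, because Twitter counts the wrapped length either way.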

The following patch is working on my wiki: http://kimagurenote.net/kn/TweetANew
Please test it.

--- TweetANew.body.php.head 2013-12-07 10:54:04.000000000 +0900
+++ TweetANew.body.php 2014-01-15 22:57:20.000000000 +0900
@@ -231,17 +231,19 @@
 	 */
 	public static function makeSendTweet( $tweet_text, $finalurl ) {
 		global $wgTweetANewTwitter, $wgLang;
-		# Calculate length of tweet factoring in longURL
-		if ( strlen( $finalurl ) > 20 ) {
-			$tweet_text_count = ( strlen( $finalurl ) - 20 ) + 140;
+
+		# Calculate length of tweet factoring in t.co
+		if ( stripos( $finalurl, 'https:' ) !== false ) {
+			$tweet_text_count = 140 - 23 + mb_strlen( $finalurl );
+		} elseif ( stripos( $finalurl, 'http:' ) !== false ) {
+			$tweet_text_count = 140 - 22 + mb_strlen( $finalurl );
 		} else {
 			$tweet_text_count = 140;
 		}
 
 		# Check if length of tweet is beyond 140 characters and shorten if necessary
-		if ( strlen( $tweet_text ) > $tweet_text_count ) {
-			$tweet_text = $wgLang->truncate( $tweet_text, $tweet_text_count );
+		if ( mb_strlen( $tweet_text ) > $tweet_text_count ) {
+			$tweet_text = mb_substr( $tweet_text, 0, $tweet_text_count - 3 ) . '...';
 		}
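
The character-based truncation step can be tried in isolation with a sketch like the one below. `truncateByCharacter()` is a hypothetical name used only here, not a TweetANew or MediaWiki function. The sketch assumes UTF-8 text, the mbstring extension, and a limit of at least 3.

```php
<?php
// Hypothetical helper, not part of TweetANew. Shortens $text to at most
// $limit characters (not bytes), ending with '...' when it was cut.
// Assumes $limit >= 3 so that there is room for the ellipsis.
function truncateByCharacter( $text, $limit ) {
	if ( mb_strlen( $text, 'UTF-8' ) <= $limit ) {
		return $text;
	}
	return mb_substr( $text, 0, $limit - 3, 'UTF-8' ) . '...';
}

// 200 multibyte characters, cut down to a 140-character limit.
$short = truncateByCharacter( str_repeat( 'あ', 200 ), 140 );
echo mb_strlen( $short, 'UTF-8' ), "\n"; // prints 140
```

Because `mb_substr()` cuts on character boundaries, the result stays valid UTF-8, whereas a byte-based cut can split a multibyte character in half.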

I'm sorry about my broken English;-)


Version: unspecified
Severity: normal

Details

Reference
bz60227

Event Timeline

bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 3:06 AM
bzimport set Reference to bz60227.
bzimport added a subscriber: Unknown Object (MLST).

Change 108331 had a related patch set uploaded by Dereckson:
Improve UTF-8 and links support

https://gerrit.wikimedia.org/r/108331

Thank you for your patch. I've submitted it.

If you wish, you can get access to the code review system, and so be able to submit patches yourself, at the following URL:

The procedure for pushing code to code review is described at one of the following two URLs:

[varnent: Should this extension have its own component in Bugzilla?]

admin wrote:

(In reply to comment #3)

[varnent: Should this extension have its own component in Bugzilla?]

Sure. :)

Change 108331 abandoned by Varnent:
Improve UTF-8 and links support

Reason:
Changes made in next update

https://gerrit.wikimedia.org/r/108331

This bug has been fixed in change I0372c272d9bdada1978f9d92be763253ee93200c.

[ Bug assigned back to patch submitter. ]