Line Breaking Problem in WebKit

2008/2/24
Satoshi Nakagawa

Summary

A line should be breakable between 、 (U+3001, ideographic comma) and a following ASCII token.

。 (U+3002, ideographic full stop) has the same problem.

Description

In the current WebKit implementation, 、 and 。 are not treated as line-break opportunities when 、 or 。 is followed by an ASCII token.

For example, は、ABC cannot be broken into two lines, although it should be breakable into は、 and ABC.

In Japanese writing, these characters are used just like the comma and period in English. Just as English text can break a line after a comma or period, Japanese text can break after 、 and 。.

IE 6/7 and Firefox 2/3 handle this correctly: they break は、ABC into は、 and ABC as expected.

The Original Cause

The root cause of this problem is the Unicode Line Breaking Algorithm (UAX #14) itself.

WebKit relies on ICU's line break iterator, and ICU implements the Unicode Line Breaking Algorithm very strictly.

See “Problem in Unicode Line Breaking Algorithm” for details.

Test Case

These are good:

レンダリングエンジン。
WebKit

インストール方法は、
http://webkit.org/building/tools.html

The following two should wrap like the “good” examples above:

レンダリングエンジン。WebKit

インストール方法は、http://webkit.org/building/tools.html

These are bad:

レンダリングエンジ
ン。WebKit

インストール方法
は、http://webkit.org/building/tools.html
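To make the expected behavior concrete, here is a toy C++ model of the test case. It is a sketch under stated assumptions, not UAX #14 and not WebKit's line breaker: it greedily wraps text at a hypothetical width measured in columns, assuming non-ASCII characters take 2 columns and ASCII characters take 1. The hypothetical `canBreakBetween` allows a break between two non-ASCII characters, never before 、 or 。, and never next to an ASCII character, except that when `applyFix` is true it also allows a break after 、 or 。.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only: NOT UAX #14 and NOT WebKit's actual line breaker.
// Assumptions (hypothetical): non-ASCII characters are 2 columns wide,
// ASCII characters are 1 column wide.

constexpr char16_t ideographicComma = 0x3001;     // 、
constexpr char16_t ideographicFullStop = 0x3002;  // 。

bool isAscii(char16_t c) { return c < 0x80; }

bool isIdeographicStop(char16_t c) {
    return c == ideographicComma || c == ideographicFullStop;
}

// May a line break between `before` and `after`?
bool canBreakBetween(char16_t before, char16_t after, bool applyFix) {
    if (isIdeographicStop(after))
        return false;  // never start a line with 、 or 。
    if (applyFix && isIdeographicStop(before))
        return true;   // the fix: allow a break after 、 or 。
    // Otherwise only break between two non-ASCII characters.
    return !isAscii(before) && !isAscii(after);
}

int columns(char16_t c) { return isAscii(c) ? 1 : 2; }

// Greedy wrapping: append unbreakable segments to the current line until
// the next one would overflow `width`; an oversized segment still gets a
// line of its own.
std::vector<std::u16string> wrap(const std::u16string& text, int width, bool applyFix) {
    std::vector<std::u16string> lines;
    std::u16string line, segment;
    int lineWidth = 0, segmentWidth = 0;
    auto flushSegment = [&] {
        if (!line.empty() && lineWidth + segmentWidth > width) {
            lines.push_back(line);
            line.clear();
            lineWidth = 0;
        }
        line += segment;
        lineWidth += segmentWidth;
        segment.clear();
        segmentWidth = 0;
    };
    for (std::size_t i = 0; i < text.size(); ++i) {
        segment += text[i];
        segmentWidth += columns(text[i]);
        // A segment ends at the end of the text or at a break opportunity.
        if (i + 1 == text.size() || canBreakBetween(text[i], text[i + 1], applyFix))
            flushSegment();
    }
    if (!line.empty())
        lines.push_back(line);
    return lines;
}
```

Under these assumptions, a width of 24 columns reproduces both pairs above: `wrap(u"レンダリングエンジン。WebKit", 24, false)` yields レンダリングエンジ / ン。WebKit, while passing `true` yields レンダリングエンジン。 / WebKit, and the インストール方法は、 example behaves the same way. The width is chosen only so the toy model lines up with the test case; real rendering depends on fonts and container size.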

The Fix

Index: WebCore/platform/text/CharacterNames.h
===================================================================
--- WebCore/platform/text/CharacterNames.h	(revision 30353)
+++ WebCore/platform/text/CharacterNames.h	(working copy)
@@ -39,6 +39,8 @@
     const UChar bullet = 0x2022;
     const UChar horizontalEllipsis = 0x2026;
     const UChar ideographicSpace = 0x3000;
+    const UChar ideographicComma = 0x3001;
+    const UChar ideographicFullStop = 0x3002;
     const UChar leftToRightMark = 0x200E;
     const UChar leftToRightEmbed = 0x202A;
     const UChar leftToRightOverride = 0x202D;
Index: WebCore/rendering/break_lines.cpp
===================================================================
--- WebCore/rendering/break_lines.cpp	(revision 30353)
+++ WebCore/rendering/break_lines.cpp	(working copy)
@@ -51,6 +51,8 @@
         case '-':
         case '?':
         case softHyphen:
+        case ideographicComma:
+        case ideographicFullStop:
             return true;
         default:
             return false;
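The patch itself only adds two case labels to an existing switch. As a standalone sketch (not WebKit code), the C++ below copies that switch into a hypothetical free function `allowsBreakAfter`; WebKit's real function name and signature are not shown in the hunk. The helper `breakAfterOffsets` is also hypothetical and simply lists where this rule alone permits a break. The values of `ideographicComma` and `ideographicFullStop` match the patch; `softHyphen` is assumed to be U+00AD, the standard soft hyphen.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical standalone copies of the constants used by the hunk.
constexpr char16_t softHyphen = 0x00AD;
constexpr char16_t ideographicComma = 0x3001;     // 、 (added by the patch)
constexpr char16_t ideographicFullStop = 0x3002;  // 。 (added by the patch)

// Mirrors the patched switch: characters after which a break is allowed.
bool allowsBreakAfter(char16_t ch) {
    switch (ch) {
    case u'-':
    case u'?':
    case softHyphen:
    case ideographicComma:     // new case from the fix
    case ideographicFullStop:  // new case from the fix
        return true;
    default:
        return false;
    }
}

// Offsets (in UTF-16 code units) just after a character accepted by
// allowsBreakAfter(); a match at the very end of the text is skipped.
std::vector<std::size_t> breakAfterOffsets(const std::u16string& text) {
    std::vector<std::size_t> offsets;
    for (std::size_t i = 0; i + 1 < text.size(); ++i) {
        if (allowsBreakAfter(text[i]))
            offsets.push_back(i + 1);
    }
    return offsets;
}
```

For u"は、ABC", `breakAfterOffsets` returns {2}, i.e. the position between は、 and ABC. The hunk does not show how this predicate is combined with ICU's iterator inside WebKit, so the sketch illustrates only the predicate, not the full line-breaking logic.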

Discussion

http://bugs.webkit.org/show_bug.cgi?id=17411