BiDi Mozilla

Last events
Developers
Specifications:
General BiDi
Arabic Specific
Hebrew Specific
 
Reference and Related Specification
Open Issues
Free Resources
Schedule

Mozilla Language Enabling Feature
Arabic/Hebrew (Bi-Di) language Enabling
status update

Editor: Franck Portaneri <franck@langbox.com>

Last Update: April 18th, 2002
New location of the original page: http://www.langbox.com/bidimozilla

 

This status update page is updated weekly, mainly on the basis of the
news:netscape.public.mozilla.i18n discussions.
To add updates and news, please mail your input or text to me.

Last Events

Also, the Hebrew Mozilla l10n team (http://www.mozilla.org.il) has started working, and it is reporting (in Hebrew) quite a few bugs about the RTL user interface...

Lina's comments are:

"Regarding the latest gfx code, forgot to mention one important thing.

For now, the BiDi gfx changes ensure correct rendering only on the Windows platform.
On all the other platforms, I suppose, BiDi text would be displayed correctly in the following cases:

  • a non-BiDi platform with visual text mode;
  • a BiDi platform with implicit text mode.

Also, the Windows-specific code should be enhanced to decide whether to reverse text on the basis of the embedding level, instead of testing for the presence of Hebrew characters as it does at present.
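Lina's last point can be illustrated with a short C++ sketch. It contrasts the present heuristic (reverse a run when it contains a Hebrew letter, U+0590 to U+05FF) with the suggested rule (reverse a run when its resolved embedding level is odd, which is how the Unicode BiDi algorithm marks right-to-left text). The function names are hypothetical and do not come from the Mozilla gfx code. Text is assumed to be held as UTF-32 code points.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Present heuristic (hypothetical reconstruction): reverse a run whenever
// it contains at least one character from the Hebrew block U+0590..U+05FF.
bool ContainsHebrew(const std::u32string& run) {
  for (char32_t c : run) {
    if (c >= 0x0590 && c <= 0x05FF) return true;
  }
  return false;
}

// Suggested rule: an odd embedding level means the run is right-to-left,
// so it must be drawn reversed. An even level means left-to-right.
bool ShouldReverse(uint8_t embeddingLevel) {
  return (embeddingLevel & 1) != 0;
}
```

The heuristic would never reverse an Arabic run, because Arabic text contains no Hebrew letters. The level test treats Arabic and Hebrew alike.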


Summary of the Bidi-related processing implemented in this code:

    • Parsing: consume text tokens, taking their BiDi category into account.
    • Retrieve the "CSS display" part of the style sheet.
    • Content model: resolve text classification of each token, taking the CSS display property into account.
    • Frame model: sort frames according to the resolved embedding level of their content.
    • Rendering context: ensure that the text is displayed in the correct order.
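The frame-model step (sorting frames by resolved embedding level) corresponds to rule L2 of the Unicode BiDi algorithm. The C++ sketch below applies that rule to whole frames rather than to individual characters. `Frame` is a hypothetical stand-in for a text frame whose characters all share one embedding level; it is not Mozilla's nsIFrame.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a text frame: a piece of text whose characters
// all share one resolved embedding level.
struct Frame {
  std::string text;
  uint8_t level;
};

// Puts the frames of one line into visual order using rule L2 of the
// Unicode BiDi algorithm: from the highest level down to the lowest odd
// level, reverse every maximal sequence of frames at that level or higher.
void ReorderFrames(std::vector<Frame>& frames) {
  if (frames.empty()) return;
  int highest = 0;
  int lowest = 255;
  for (const Frame& f : frames) {
    highest = std::max(highest, static_cast<int>(f.level));
    lowest = std::min(lowest, static_cast<int>(f.level));
  }
  const int lowestOdd = lowest | 1;  // lowest odd level not below the minimum
  for (int level = highest; level >= lowestOdd; --level) {
    size_t i = 0;
    while (i < frames.size()) {
      if (frames[i].level < level) {
        ++i;
        continue;
      }
      size_t j = i;
      while (j < frames.size() && frames[j].level >= level) ++j;
      std::reverse(frames.begin() + i, frames.begin() + j);
      i = j;
    }
  }
}
```

For example, with logical frames at levels 0, 1, 1, 0, the two right-to-left frames swap places and the left-to-right frames stay where they are. The characters inside each odd-level frame still have to be drawn reversed, which is the rendering-context step.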

 

To use this tar file, untar it in the mozilla directory (it overlays files!), then build it. (Note that Bidi is turned on in config.mak by defining IBMBIDI.)

  • Feb 10, 2000: Matitiahu Allouche from IBM Israel posts a long document on the Bi-Di Mozilla status and his proposed design. This very complete and detailed document presents:
    • Issues raised in past discussions
    • Principles for BiDi support in Mozilla
    • Proposed design
    • Division into tasks (Matitiahu also mentions that IBM Israel is working on tasks T1, T2, T3, T4, T5, T6 and T13)

    --> Please send your remarks and comments...

  • Feb 9, 2000: Maha Abou El-Rous from IBM Egypt reports on the status of the BiDi Mozilla project and gives a plan for it.
    • Status:
      • We have a preliminary design of the required changes to the User Interface. Once it is finalized, it will be posted for your feedback and comments.
      • Layout text
      • Layout table
      • Listbox, single element listbox, and scrollbar
      • Investigating CSS Bidi attributes
      • Arabic Shaping Engine

    • Plan:
        Task                                      Module
        Code Page                                 Intl
        Font Support                              -
        UI: Preference page                       -
        UI: View menu - Bidi option               -
        UI: Character set menu additions          XPFE
        Layout Text                               htmlparser, layout
        Layout Table                              layout
        Controls: Layout RTL                      layout
        Controls: Bidi Multi/single line editor   editor
        Search                                    layout
        Selection                                 layout
        Clipboard operations                      -
        Shaping: Character shaping                gfx
        Shaping: Numeral shaping                  gfx
        HTML, CSS, XUL Bidi attributes            htmlparser, layout

    --> Please send your remarks and comments...

  • Jan 31, 2000: Mike Kaply from IBM informed me that: "The Bidi team is doing design work and they are getting ready to post some stuff. Unfortunately, with the holidays and some other personal issues, there have been some slight delays, but we are looking to post the task list and design very soon."...

  • Dec 8, 1999: Great news: Mike Kaply from IBM announced that IBM has assembled a team in Israel and Egypt (Yaacov Akiba Slama and Maha Abou El-Rous) to help with Bidi work in Mozilla. This group has already worked on Bidi enablement for the Netscape 4.x products on OS/2 and Windows...
    The introduction of this new team revived the Bi-Di, Arabic, Persian and Hebrew (logical/visual) discussions on news:netscape.public.mozilla.i18n.

  • Jonathan Rosenne also forwarded a message from the Hebrew SIG of the Israeli Internet Association, whose first meeting will address Hebrew and the Internet, with Mozilla as the first priority. 
 
Developers
Feature Owner:
Alexander Khalil  <iskandar@ee.tamu.edu>
Franck Portaneri <franck@langbox.com>
WinFE:
Barak Ori <barak@comfy.co.il>
XFE:
Franck Portaneri <franck@langbox.com>
Mark Leisher <mleisher@crl.nmsu.edu>
MacFE:
Adil Allawi <adil@diwan.com> has started an in-house project and might show a beta at the Gitex show. He is open to collaboration with the Mozilla team.
XP:
BI-DI : Michael Kaply <mkaply@us.ibm.com>, Yaacov Akiba Slama <slama@il.ibm.com> and Maha Abou El-Rous
Arabic : Franck Portaneri <franck@langbox.com> and Mark Leisher <mleisher@crl.nmsu.edu>
Hebrew : Dotan Dimet <dotan@usa.net>, Ariel Backenroth <arielb@rice.edu>
QA:
Alexander Khalil  <iskandar@ee.tamu.edu>
Anoosh Hosseini <anoosh@gpg.com>

JKL <jklnet@usa.net>
Doron Shikmoni  <doron@erez.cc.biu.ac.il>
Jonathan Rosenne  <rosenne@qsm.co.il>
Dov Grobgeld <dov@orbotech.co.il>
Ariel Magnum <amagnum@bigfoot.com>
Shay Elkin <antil_za@mailandnews.com>
Roozbeh Pournader <roozbeh@sina.sharif.ac.ir>  
Document:
Alexander Khalil <iskandar@ee.tamu.edu>

Want to participate?

  1. Visit the mozilla.org site, especially http://www.mozilla.org/community.html
  2. Subscribe to the netscape.public.mozilla.i18n newsgroup (mailto:mozilla-i18n-request@mozilla.org?subject=subscribe)
  3. Have a look at http://www.mozilla.org/docs/refList/i18n/scripts.html and http://www.mozilla.org/docs/refList/i18n/schedule.html
  4. Download the source tree and build it on your system
  5. Contact the project owner by e-mail, with a cc to mozilla-i18n@mozilla.org, to introduce yourself.
Specifications

Most of the support is common to Arabic and Hebrew because both languages are bidirectional (Bi-Di).
Of course, the charsets differ, and so does the final rendering stage, which is more complex for Arabic because of "glyph shaping determination" (choosing each letter's contextual form). So this part of the document is split into two sections, Arabic and Hebrew:

General BiDi
IBM Code review
Jan 08, 2001
by Steve Clark <buster@netscape.com>

Last Thursday, I held a design and code review meeting on the Bi-Di code submission from IBM. Thanks to all those who attended and sent me feedback. Here's a summary of where I think we are today.

1. Architecture

The overall design of the new code is fine, as far as we can tell. There are plenty of things that need to be fixed, but the basic concept is perfectly acceptable. However, a few issues must be addressed before we can include the code on the trunk.

A) platform-specific code

In general, we do not allow #ifdef PLATFORM code in XP modules. You need to factor out the platform-specific portions of your code and isolate platform code in its own module. Then the build system can do the right thing at build time, without polluting the XP modules with tons of #ifdef code. Along these lines, it is absolutely *not* required that you implement Bi-Di on all platforms. However, your implementation should strive to be free of platform-specific assumptions, so that others can implement it on their systems. Erik has volunteered to help validate your design against other platforms (I think he volunteered to validate Linux himself, and he "volunteered" Frank for Mac.)

B) illegal dependencies

You added a dependency between layout and the view system that isn't legal. Kevin Mcclusky can provide the details, but basically you are making bad assumptions about frames in the view code. Kevin, please elaborate.

C) misuse of interfaces

You have added concrete functions and member variables to several interfaces. This is illegal: XPCOM interfaces are abstract contracts that cannot include this sort of implementation. Also, you should not have #ifdef blocks in an interface. An interface is a public contract that will soon (probably at Mozilla 1.0) become immutable, so it cannot depend on compile-time switches. If you need optional additional functionality, it has to go on a new interface, which the concrete classes that need to support those methods can optionally implement.
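The rule can be sketched in plain C++. The classes below are hypothetical, and `dynamic_cast` stands in for XPCOM's QueryInterface so that the example stays self-contained. The base interface is pure virtual, and the optional Bi-Di method lives on a second interface that only some concrete classes implement.

```cpp
#include <cassert>
#include <string>

// Pure abstract contract: no data members, no function bodies, no #ifdef.
class ITextRenderer {
 public:
  virtual ~ITextRenderer() = default;
  virtual std::string Draw(const std::string& aText) = 0;
};

// Optional Bi-Di functionality goes on a separate interface instead of being
// added to ITextRenderer behind a compile-time switch.
class IBidiTextRenderer {
 public:
  virtual ~IBidiTextRenderer() = default;
  virtual std::string DrawRTL(const std::string& aText) = 0;
};

// A concrete class that supports Bi-Di opts in by implementing both.
class BidiRenderer : public ITextRenderer, public IBidiTextRenderer {
 public:
  std::string Draw(const std::string& aText) override { return aText; }
  // Toy RTL drawing: reverses bytes, which is only meaningful for ASCII input.
  std::string DrawRTL(const std::string& aText) override {
    return std::string(aText.rbegin(), aText.rend());
  }
};

// A concrete class without Bi-Di support implements only the base interface.
class PlainRenderer : public ITextRenderer {
 public:
  std::string Draw(const std::string& aText) override { return aText; }
};

// Callers ask at run time whether the optional interface is available.
// XPCOM would use QueryInterface here; dynamic_cast plays that role.
std::string DrawRightToLeft(ITextRenderer& aRenderer, const std::string& aText) {
  if (auto* bidi = dynamic_cast<IBidiTextRenderer*>(&aRenderer)) {
    return bidi->DrawRTL(aText);
  }
  return aRenderer.Draw(aText);  // fall back to plain drawing
}
```

Because the base interface never changes, it can be frozen, while Bi-Di support is added or left out per class.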

2. Documentation

One thing that makes reviewing a submission of this size very difficult is a lack of documentation. Some of the individual code blocks are well documented, but there is no overview to guide us. To get this code successfully integrated into the branch, we need 4 levels of documentation:

A) an overview document.

This need not be long or formal; just something to help us understand the philosophy behind the changes. Where are major pieces of data stored (such as whether Bi-Di is enabled, or required for a particular page)? Which classes do which portion of the work? What work exactly is being done (e.g., frame reordering)? I don't think the overview document needs to be complete and polished before the code can go in, but I do think something is needed before the next round of reviews.

B) interface documentation.

Though we're not always good at it, we do try hard to get all major classes and public interfaces thoroughly documented. It would be a big help if each new method had a comment block describing what the method does, its arguments, its return value, and any possible side effects. We urge people to use javadoc syntax, because there are tools that automatically build documentation from such comments. See nsIFrame.h for an example of a fairly well documented interface.
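As an illustration of the requested style, here is a hypothetical helper with a javadoc-style block that covers its purpose, arguments, return value and side effects. The `a` prefix on argument names follows the Mozilla convention. The function itself is only an example and is not part of the IBM submission.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>

/**
 * Puts one directional run into display order.
 *
 * @param aRun    the run's characters in logical (memory) order; reversed
 *                in place if the run is right-to-left.
 * @param aLevel  the run's resolved embedding level; an odd level means
 *                the run is right-to-left.
 * @return        true if aRun was reversed, false if it was left unchanged.
 *
 * Side effects: modifies aRun. Does not mirror characters such as
 * parentheses; that is left to the caller.
 */
bool ReverseRunIfRTL(std::u32string& aRun, uint8_t aLevel) {
  if ((aLevel & 1) == 0) return false;
  std::reverse(aRun.begin(), aRun.end());
  return true;
}
```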

C) code-level documentation.

For the most part, the submission was pretty good about including appropriate code-level comments. More is better, of course. In particular, documenting the use of member variables inside of classes is very helpful.

D) adhering to coding conventions.

Parts of the submission were very poor at sticking to the Mozilla coding conventions. This makes the code much more difficult to read. Please see http://www.mozilla.org/newlayout/doc/codingconventions.html

3. Performance

One of the biggest concerns is the impact on clients that are not interested in providing Bi-Di support. Let's break this down into several categories:

A) code size

Clearly, clients that are not interested in supporting Bi-Di should not have to pay a significant penalty for the additional code it requires. The two ways we can think of to minimize the impact are to factor as much as possible into a separate library, or to leave significant code chunks in #ifdef BIDI blocks. I'd urge people to think about which code could reasonably be factored into its own library, since the support costs of #ifdef code are high.
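One way to factor Bi-Di code into its own library is sketched below under assumed names. The core layout code holds only a function pointer, which the Bi-Di library sets when it is loaded, so a client without Bi-Di pays for a single null check. The reordering function is a toy (it just reverses the string) that stands in for a real Bi-Di engine.

```cpp
#include <cassert>
#include <string>

// The only Bi-Di-related thing the core layout code knows about: a hook
// type and a pointer that stays null unless a Bi-Di library installs it.
using BidiReorderFn = std::string (*)(const std::string&);
static BidiReorderFn gBidiReorder = nullptr;

// Called by the (hypothetical) separately built Bi-Di library at load time.
void RegisterBidiReorder(BidiReorderFn fn) { gBidiReorder = fn; }

// Core layout path: without the Bi-Di library, the cost is one null check.
std::string LayOutLine(const std::string& line) {
  if (!gBidiReorder) return line;
  return gBidiReorder(line);
}

// Toy stand-in for the library's reordering function: it simply reverses
// the bytes, which is only meaningful for the ASCII test strings used here.
std::string ToyReorder(const std::string& line) {
  return std::string(line.rbegin(), line.rend());
}
```

The #ifdef alternative would compile the call out entirely, but it leaves two code paths to build and maintain.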

B) memory usage

Reading the code, it doesn't look like the Bi-Di code adds any significant amount of bloat. We'll have to take measurements once it's integrated to validate, but so far, it looks good.

C) performance

Most reviewers are less concerned with the performance of the code when Bi-Di is required than with its impact when Bi-Di is not needed to lay out a page. There seemed to be a few areas where Bi-Di code was being executed unnecessarily. These could probably be fixed by simply checking whether anything on the page warrants Bi-Di calculation before executing the new code.
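A cheap pre-check of this kind is sketched below. It scans the text once for characters from the main right-to-left blocks and for the explicit directional controls. The ranges are an approximation chosen for the sketch (for example, they ignore right-to-left scripts outside the Basic Multilingual Plane), and the function name is hypothetical.

```cpp
#include <cassert>
#include <string>

// Returns true if the text contains a right-to-left character or an explicit
// directional control, i.e. if the Bi-Di pass may be needed at all.
// Approximate ranges: BMP right-to-left blocks only, no supplementary planes.
bool NeedsBidi(const std::u32string& text) {
  for (char32_t c : text) {
    if (c >= 0x0590 && c <= 0x08FF) return true;  // Hebrew .. Arabic Extended-A
    if (c >= 0xFB1D && c <= 0xFDFF) return true;  // Hebrew/Arabic presentation forms
    if (c >= 0xFE70 && c <= 0xFEFC) return true;  // Arabic presentation forms-B
    if (c == 0x200F) return true;                 // RIGHT-TO-LEFT MARK
    if (c >= 0x202A && c <= 0x202E) return true;  // LRE, RLE, PDF, LRO, RLO
  }
  return false;
}
```

A real check would also have to look for dir="rtl" attributes and the CSS direction property, because these can require Bi-Di layout even for text with no right-to-left characters.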

4. Implementation problems

There are plenty of minor problems that need to get fixed. Too many to put in a newsgroup posting! But here are some general trends:

A) memory leaks

There are a few places where you leak objects because of early returns in a method. Using nsCOMPtr would prevent this.
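The leak pattern and its fix are sketched below. nsCOMPtr manages reference-counted XPCOM objects. To keep the example self-contained, `std::unique_ptr` is used as a stand-in, since the point is the same: the object is released on every return path. `BidiData` and both functions are hypothetical.

```cpp
#include <cassert>
#include <memory>

// Counts live objects so the sketch can show whether anything leaked.
struct BidiData {
  static int sLive;  // number of BidiData objects currently alive
  bool resolved = false;
  BidiData() { ++sLive; }
  ~BidiData() { --sLive; }
};
int BidiData::sLive = 0;

// Leaky pattern: the early return skips the delete.
bool ResolveLeaky(bool failEarly) {
  BidiData* data = new BidiData();
  if (failEarly) return false;  // leak: the delete below is never reached
  data->resolved = true;
  delete data;
  return true;
}

// Smart-pointer pattern: the object is destroyed on every return path.
bool ResolveSafe(bool failEarly) {
  auto data = std::make_unique<BidiData>();
  if (failEarly) return false;  // `data` is released automatically here
  data->resolved = true;
  return true;                  // ...and here
}
```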

B) null pointer checks

There are many places where pointers are used without first being checked for null. These include new allocations, method parameters, and out-parameters returned from function calls. At a minimum, assertions need to be added to validate the pointer. And unless the pointer is guaranteed to be valid, you should add a null pointer check and return an error if it is null.
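Both recommendations (assert, then check and return an error) are sketched below. The result codes are hypothetical and only loosely modelled on XPCOM's nsresult. The allocation uses `std::nothrow`, so a failure yields null instead of throwing and must be checked like any other pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical result codes, loosely modelled on XPCOM's nsresult.
enum class Result { Ok, NullPointer, OutOfMemory };

// Allocates a buffer of embedding levels, checking both the out-parameter
// and the result of the allocation.
Result NewLevelBuffer(size_t aCount, uint8_t** aBuffer) {
  assert(aBuffer && "aBuffer must not be null");  // catches bugs in debug builds
  if (!aBuffer) return Result::NullPointer;       // still safe in release builds
  *aBuffer = new (std::nothrow) uint8_t[aCount];
  if (!*aBuffer) return Result::OutOfMemory;
  return Result::Ok;
}

// Counts the odd (right-to-left) entries in an array of embedding levels,
// validating both the input array and the out-parameter.
Result CountOddLevels(const uint8_t* aLevels, size_t aCount, size_t* aResult) {
  assert(aResult && "aResult must not be null");
  if (!aResult) return Result::NullPointer;
  *aResult = 0;
  if (aCount > 0) {
    assert(aLevels && "aLevels must not be null when aCount > 0");
    if (!aLevels) return Result::NullPointer;
  }
  for (size_t i = 0; i < aCount; ++i) {
    if (aLevels[i] & 1) ++(*aResult);
  }
  return Result::Ok;
}
```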

C) 64-bit compatibility

Chris Waterson noticed some code that seemed to make bad assumptions about 32-bit pointers. We already have one 64-bit system, and in general we strive to avoid assumptions about the hardware. Chris, could you elaborate on the specifics here?
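The review does not say which code was at fault. A common form of this bug is storing a pointer in a 32-bit integer, which truncates addresses on 64-bit systems. The sketch below shows the portable alternative, `uintptr_t`. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Not portable: on a 64-bit system a pointer is 64 bits wide, so
//   uint32_t key = (uint32_t)somePointer;
// drops the upper half of the address.
//
// Portable: uintptr_t is an unsigned integer type wide enough to hold any
// object pointer on the platform being compiled for.
static_assert(sizeof(uintptr_t) >= sizeof(void*), "uintptr_t must hold a pointer");

uintptr_t PointerToKey(const void* aPtr) {
  return reinterpret_cast<uintptr_t>(aPtr);
}

const void* KeyToPointer(uintptr_t aKey) {
  return reinterpret_cast<const void*>(aKey);
}
```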

I'll forward individual comments separately.

Proposed plan
Dec 11, 1999
by Frank Tang <ftang@netscape.com>

Frank Tang proposes the following priorities:

  • 1. Add an XP Bidi engine, taken from an existing package: FriBidi or the Pretty Good BiDi Algorithm (PGBA).
    (Mark Leisher did an excellent comparison page.)
  • 2. Look at the layout code: resolve directionality and break text of different directions into different text frames.
  • 3. Add a directionality attribute to the text frame.
  • 4. We already flow text frames depending on DIR, so we probably don't need to change that part.
  • 5. Make sure LTR text frames call GFX DrawString from left to right.
  • 6. Fix GFX bugs.
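Steps 2 and 3 can be sketched as follows, assuming step 1 has already produced one embedding level per character. The paragraph is cut into maximal runs of equal level, and each run would become its own text frame carrying its direction. `DirectionalRun` and `SplitIntoRuns` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One directional run: a candidate for its own text frame.
struct DirectionalRun {
  size_t start;    // offset of the first character
  size_t length;   // number of characters
  uint8_t level;   // embedding level shared by all characters in the run
  bool IsRTL() const { return (level & 1) != 0; }
};

// Splits a paragraph into maximal runs of equal embedding level, given the
// per-character levels produced by a Bi-Di engine.
std::vector<DirectionalRun> SplitIntoRuns(const std::vector<uint8_t>& levels) {
  std::vector<DirectionalRun> runs;
  size_t i = 0;
  while (i < levels.size()) {
    size_t j = i + 1;
    while (j < levels.size() && levels[j] == levels[i]) ++j;
    runs.push_back({i, j - i, levels[i]});
    i = j;
  }
  return runs;
}
```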

Detail Design: Find public source code or write new code from scratch for the Bi-Di API

Three implementations are now free or nearly open source: the PGBA in Mark Leisher's UCData package, IBM's ICU, and FriBidi.

Mark Leisher did an excellent comparison of the results of these packages, and of IE 5.0.

However, on such systems the GUI side (dialog boxes, text input forms...) will behave only in Latin (no dual keyboard management). This problem has to be fixed at the GTX level.

Here are some details on these implementations:


19-Nov-1999: Mark Leisher <mleisher@crl.nmsu.edu> announces the Version 2.3 of the UCData package, which includes the PGBA.

What is the PGBA? The PGBA is a small, simple, and fast one-pass Unicode bi-directional text reordering algorithm that works "pretty good" for most text. It provides an effective alternative to the Unicode Bidi algorithm for implicit reordering of bi-directional text. It does not currently support the explicit bi-directional codes available in Unicode. Support for logical and visual cursor motion through the reordered string is included.

Some problems with the PGBA have been fixed, speed has been improved, the code has been reduced in size and made somewhat clearer, a man page for the bidi API has been added, and the documentation has been improved a bit. The README file in the distribution details the changes. The home page will eventually have a section showing the results from the PGBA, the IBM ICU bidi implementation, and the FriBiDi implementation.

See http://crl.nmsu.edu/~mleisher/ucdata.html for documentation and download.

7-Oct-1999: Mark Leisher <mleisher@crl.nmsu.edu> announced the availability of version 2.1 of his UCData freeware package, which adds the "Pretty Good BiDi Algorithm". The good news is that Frank Tang embedded UCData 1.9 into XPCOM on the Mac, Windows and Unix in April 1999.

Mark Leisher says: << ... This release provides some bug fixes, an update for the new (apparently undocumented?) Unicode 3.0 bi-directional categories, and the addition of the "Pretty Good BiDi Algorithm." The PGBA is an elegant and simple one-pass BiDi reordering algorithm that works pretty dang good for most text. It has some deliberate, but (hopefully) minor shortcomings just so developers who use it have something to keep them occupied :-) The PGBA is in no way related to the Unicode BiDi Algorithm except by coincidence.

IMPORTANT: The PGBA is dependent on UCData because of the interpretation of certain 3.0 BiDi categories. To be explicit, the following BiDi category assumptions are made when building the character type data file:

  • "AL" is equivalent to the "R" property.
  • "BM", "NSM", "LRE", "RLE", "LRO", "RLO", "PDF" are all equivalent to the "ON" property.

If your character type package of preference has these assumptions, then using the PGBA will be no problem.

>>
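The category assumptions in Mark Leisher's message amount to a simple folding table, sketched below in C++ with a hypothetical function name. Unicode defines no "BM" category. The sketch assumes "BN" (Boundary Neutral) was meant, and also accepts the spelling "BM" as written.

```cpp
#include <cassert>
#include <string>

// Folds Unicode 3.0 Bi-Di categories into the set the PGBA expects:
// AL is treated as R, and the listed marks and explicit codes as ON.
std::string FoldForPgba(const std::string& aCategory) {
  if (aCategory == "AL") return "R";  // Arabic letters behave as R
  if (aCategory == "BN" || aCategory == "BM" ||  // "BM" as spelled in the message
      aCategory == "NSM" || aCategory == "LRE" || aCategory == "RLE" ||
      aCategory == "LRO" || aCategory == "RLO" || aCategory == "PDF") {
    return "ON";  // explicit codes, marks and boundary neutrals become ON
  }
  return aCategory;  // every other category passes through unchanged
}
```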

Short and simple info page: http://crl.nmsu.edu/~mleisher/ucdata.html
The distribution is available in .tar.gz and .zip form from:

http://crl.nmsu.edu/~mleisher/ucdata-2.1.tar.gz
http://crl.nmsu.edu/~mleisher/ucdata21.zip
ftp://crl.nmsu.edu/CLR/multiling/unicode/ucdata.tar.gz
ftp://crl.nmsu.edu/CLR/multiling/unicode/ucdata.zip


3-Nov-1999: Markus Scherer <schererm@us.ibm.com> from IBM Cupertino mentioned that ICU has implemented the Unicode 3.0 BiDi algorithm since the end of September (ICU 1.3). The current version is ICU 1.4.2. Mark Leisher did some testing on it. If you have tried this BiDi API, please send feedback on it.


15-Jan-1999: Dov Grobgeld <dov@imagic.weizmann.ac.il> announces the first alpha version of FriBidi, a free BiDi library that adheres closely to the Unicode BiDi algorithm. See http://imagic.weizmann.ac.il/~dov/freesw/FriBidi for more info.

Detail Design : Use an HTML Explicit or Implicit description of the RTL management

    This part should determine whether Mozilla's Arabic support expects all RTL/LTR management to be done:
      explicitly:
      i.e. only forced through the HTML dir attribute (and the BDO element) and directives, as described in the HTML 4.0 proposal.

      implicitly:
      i.e. if the charset definition is something like:
      <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-6">
      then the default direction is forced to RTL (right alignment).
      both allowed :
      with the introduction of something like :
      <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-6">  for Implicit
      <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-6-e"> for explicit
    But this point should be in accordance with the HTML 4.0 definition. Please send your feedback here; this is really an open subject that needs more input and discussion...
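To make the "both allowed" option concrete, here is a C++ sketch of how a page's default direction could be chosen under the charset convention proposed above. The function and enum are hypothetical. Two choices are assumptions of the sketch: only `iso-8859-6` implies RTL, and an explicit dir attribute always wins.

```cpp
#include <cassert>
#include <cctype>
#include <string>

enum class Direction { LTR, RTL };

// Chooses a page's default direction under the "both allowed" proposal:
//  - an explicit dir attribute ("ltr" or "rtl") always wins;
//  - charset iso-8859-6 (implicit) makes RTL the default;
//  - iso-8859-6-e (explicit) and all other charsets keep the HTML default,
//    LTR, so the direction comes only from markup.
Direction DefaultDirection(std::string charset, const std::string& dirAttribute) {
  if (dirAttribute == "rtl") return Direction::RTL;
  if (dirAttribute == "ltr") return Direction::LTR;
  for (char& c : charset) {
    // Charset names are compared case-insensitively.
    c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
  }
  if (charset == "iso-8859-6") return Direction::RTL;
  return Direction::LTR;
}
```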
Detail Design: Extend the Mozilla layout source code with the Bi-Di API (By Franck Portaneri) -
    The API function calls must be embedded within the Mozilla source tree to get the Bi-Di and Arabic support built in. This is a complex part where the following issues must be taken into account:
     
    • Dissociate the "Bi-Di" and "Glyph Shaping" process (to allow both Arabic and Hebrew support)
    • Work on full paragraph context (merge all text segments of a paragraph in order to do the rendering process)
    • Embed the "Output Rendering" process on the text display level.
    • Embed the "Text Selection highlight" process on the text display level.
    • Embed the "Mouse Position handling" process at the mouse pointing level (for selection operation)
    • Manage the full RTL presentation: right alignment, reversed scroll bar sliding...
    • Check the printing subsystem and work with the "UNIX Non-Latin1 Printing Enhancement" module owner.
    • Take care of coexistence with a BiDi operating system and avoid conflicts.
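The first item, keeping glyph shaping separate from Bi-Di reordering, can be illustrated with a small C++ sketch. Shaping picks each Arabic letter's contextual form (isolated, initial, medial or final) from its neighbours in logical order. It can therefore run as its own pass, and Hebrew text, which needs no shaping, can skip it. The joining table covers only the basic Arabic letters U+0620 to U+064A and ignores transparent marks such as harakat; a real shaper would use the full Unicode joining data. All names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Joining { None, Right, Dual };
enum class Form { Isolated, Initial, Medial, Final };

// Minimal joining table for the basic Arabic letters; every other
// character (including Hebrew and Latin) is treated as non-joining.
Joining JoiningType(char32_t c) {
  switch (c) {
    case 0x0621:  // HAMZA does not join at all
      return Joining::None;
    case 0x0622: case 0x0623: case 0x0624: case 0x0625: case 0x0627:
    case 0x0629: case 0x062F: case 0x0630: case 0x0631: case 0x0632:
    case 0x0648:  // alef forms, teh marbuta, dal, thal, reh, zain, waw
      return Joining::Right;
    default:
      return (c >= 0x0620 && c <= 0x064A) ? Joining::Dual : Joining::None;
  }
}

// Chooses each character's contextual form from its logical neighbours.
// This pass does not depend on Bi-Di reordering.
std::vector<Form> ShapeForms(const std::u32string& text) {
  std::vector<Form> forms(text.size(), Form::Isolated);
  for (size_t i = 0; i < text.size(); ++i) {
    Joining self = JoiningType(text[i]);
    if (self == Joining::None) continue;
    // Joins backwards if the previous letter can connect to its successor.
    bool joinsPrev = i > 0 && JoiningType(text[i - 1]) == Joining::Dual;
    // Joins forwards only if this letter is dual-joining and the next joins.
    bool joinsNext = self == Joining::Dual && i + 1 < text.size() &&
                     JoiningType(text[i + 1]) != Joining::None;
    if (joinsPrev && joinsNext) forms[i] = Form::Medial;
    else if (joinsPrev) forms[i] = Form::Final;
    else if (joinsNext) forms[i] = Form::Initial;
  }
  return forms;
}
```

For BEH, ALEF, BEH the result is initial, final, isolated: ALEF joins only to the letter before it, so the last BEH stands alone.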
Detail Design : GFX code extension for Bi-Di (by Frank Tang)

18-Aug-1999: Frank Tang fixed some bugs in the Mac GFX code for Unicode BiDi rendering. The screenshot results are as follows:


for Arabic