Latin Thaana Converter 2.0

Latin Thaana Converter is a small, simple program for Microsoft Windows that transliterates latinized (i.e. romanized) Thaana back into the Thaana script. I originally released this tool in 2003 under the name "Latin Dhivehi Converter"/"Lat2Dhiv". This new release carries a new name (which I think is more technically correct for what it does) and sports a few aesthetic changes, but is functionally almost exactly the same as the original - it is basically a recompile of my old code on the .NET Framework.

Automated transliteration of Latin Thaana is not an entirely easy task. Lookup-table-based algorithms are simple to implement but cannot correctly handle cases of sukun, have issues with most other fili and generally suffer from a host of other problems as well. Latin Thaana Converter uses a finite state machine, and its transliteration mappings are based on a more extensive scheme extracted from an analysis of a body of Latin Thaana-to-Thaana sample data. It may be worth mentioning that the analysis revealed that up to 4 characters were being used (and needed) for some Thaana transliterations. However, the quality of the transliteration is limited by the accuracy and diversity of the sample data I used, and hence it is by no means perfect.
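
To illustrate why a plain lookup table falls short, here is a minimal sketch of a state-based transliterator in JavaScript. It is not the converter's actual algorithm or mapping scheme: the function name transliterate, the small mapping tables (at most 2 Latin characters per Thaana letter, where the real scheme needs up to 4) and the two rules it applies are assumptions made purely for illustration. The rules are: add a sukun after a consonant that is not followed by a vowel, and add an alifu carrier before a vowel that does not follow a consonant.

```javascript
// Illustrative subset of Latin-to-Thaana mappings (assumed, not the converter's scheme).
const CONSONANTS = new Map(Object.entries({
  h: "\u0780", sh: "\u0781", n: "\u0782", r: "\u0783", b: "\u0784",
  lh: "\u0785", k: "\u0786", v: "\u0788", m: "\u0789", f: "\u078A",
  dh: "\u078B", th: "\u078C", l: "\u078D", g: "\u078E", s: "\u0790",
  d: "\u0791", z: "\u0792", t: "\u0793", y: "\u0794", p: "\u0795",
  j: "\u0796", ch: "\u0797",
}));
const VOWELS = new Map(Object.entries({
  a: "\u07A6", aa: "\u07A7", i: "\u07A8", ee: "\u07A9", u: "\u07AA",
  oo: "\u07AB", e: "\u07AC", ey: "\u07AD", o: "\u07AE", oa: "\u07AF",
}));
const ALIFU = "\u0787"; // carrier letter for a vowel with no consonant before it
const SUKUN = "\u07B0"; // marks a consonant that carries no vowel

function transliterate(latin) {
  const text = latin.toLowerCase();
  let out = "";
  let state = "boundary"; // "boundary", "consonant" or "vowel"
  let pos = 0;
  while (pos < text.length) {
    // Greedy match: try a two-character token first, then a single character.
    let kind = "other";
    let glyph = text[pos];
    let len = 1;
    for (const tryLen of [2, 1]) {
      const key = text.slice(pos, pos + tryLen);
      if (key.length !== tryLen) continue;
      if (CONSONANTS.has(key)) { kind = "consonant"; glyph = CONSONANTS.get(key); len = tryLen; break; }
      if (VOWELS.has(key)) { kind = "vowel"; glyph = VOWELS.get(key); len = tryLen; break; }
    }
    if (kind === "vowel") {
      // A vowel sign needs a base letter; supply alifu if no consonant precedes it.
      if (state !== "consonant") out += ALIFU;
      out += glyph;
      state = "vowel";
    } else {
      // The previous consonant had no vowel, so close it with a sukun.
      if (state === "consonant") out += SUKUN;
      out += glyph; // unmapped characters (spaces, punctuation) pass through
      state = kind === "consonant" ? "consonant" : "boundary";
    }
    pos += len;
  }
  if (state === "consonant") out += SUKUN;
  return out;
}

console.log(transliterate("dhivehi bas"));
```

A one-to-one table would have no way of knowing that the final "s" in "bas" needs a sukun; the state carried between characters is what makes that decision possible. Real Latin Thaana is far more ambiguous than this sketch allows for.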

Since writing this program in 2003, I have experimented with probabilistic FSMs and also put machine learning techniques to the task with better results. I plan to write more extensively on Thaana transliteration algorithms at a later time...

Usage

1. Copy-paste or type the Latin Thaana text into the "Text in Latin Thaana" box.
2. Click "Convert".
3. The converted text appears in the "Text in Thaana" box.

Download

- Latin Thaana Converter 2.0 Installer (126KB, MS Windows)
- Latin Thaana Converter 2.0 Executable only (22.8KB, MS Windows)

Hope someone finds it useful :-)

Javascript Thaana Keyboard version 4.1

I released Javascript Thaana Keyboard v4.0 only 10 days ago but I've since been made aware of a few bugs in the script that had gone unnoticed during testing back then. I decided to cut another release to fix those bugs which, although minor, could potentially be annoying to end-users. This new release also crams in a few tweaks to improve performance.

Changelog:

+ Fixed handling of Delete key and other special keys
+ Added correct handling for Thaana brackets "()"
+ Improved performance

Usage:

Usage remains the same as before, so please refer to my post on the 4.0 release.

Demo:

Check out the demonstration and testing page here.

Download:

- full source version (10.6 KB)
- packed version (2.46 KB) [recommended]

Update (19-Dec-2008): This version is now superseded by release v4.2.

Javascript Thaana Keyboard version 4.0

Here is an update to my Javascript Thaana Keyboard (JTK). This 4.0 release packs in a bunch of new features, making JTK much more powerful and more flexible than any of the earlier releases.

Keyboard support:

Most notable in this new release is the introduction of support for the different types of Thaana keyboard in use. JTK now supports the following keyboard layouts:

Phonetic (Segha version): This keyboard is perhaps the most popular Thaana keyboard layout. JTK identifies this keyboard as "phonetic".

Phonetic (Hassan Hameed version): This keyboard is similar to the one above but notably differs in its mapping of alifu, abafili, aabaafili, gaafu and the sukun. JTK identifies this keyboard as 'phonetic-hh'.

Typewriter: This is the standardized Thaana layout used on typewriters. JTK identifies this keyboard as 'typewriter'.

Browser support:

JTK 4.0 adds support for IE5.5, which still has a very significant market share. Hence JTK should now work perfectly well on Microsoft Internet Explorer 5.5+, Mozilla Firefox 1+, Opera 9+, Apple Safari 2+ and Google Chrome 0.1+.

Basic usage:

The basic usage allows for fast and easy integration of JTK on your Thaana web pages.

1. Link the file in the HEAD section of the page:

2. For each element accepting input (i.e. INPUTs, TEXTAREAs, content-editable DIVs), assign the special class name "thaanaKeyboardInput". JTK will automatically handle text entry to any element with that class name. You can assign further classes to the elements without ill effect, if needed.

3. There are two ways to set the keyboard used for an element.
defaultKeyboard method: This method allows setting a default keyboard to be used on all elements handled by JTK. To do this, add the following to the HEAD section of your web page but make sure it is added after the code inserted from step 1 above.

thaanaKeyboardState method: This method allows per-element control over the type of keyboard used by an element handled by JTK. To do this, add a form element (can be a radio, checkbox, select, hidden or text field) with its name set to the id of the text entry element suffixed with the string "_thaanaKeyboardState". The value of the field should be 'off', 'phonetic', 'phonetic-hh' or 'typewriter', indicating the status and the keyboard in use.

So, if you had a text entry field with the id "fullname" then the keyboard could be specified using a hidden field as follows:



4. Make sure that the text direction for the Thaana fields is set to "rtl". This can be easily achieved using CSS, by adding a class definition for the "thaanaKeyboardInput" class or by any other method of your choice. Adding the following to your CSS definition should suffice for most uses:
.thaanaKeyboardInput {
    font-family: faruma, 'mv iyyu nala', 'mv elaaf normal';
    direction: rtl;
}

If the above instructions are followed correctly, the JTK Thaana functionality will take effect as soon as the page has loaded!
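
As a rough illustration of the thaanaKeyboardState naming rule in step 3, the sketch below builds the field name and a hidden-field tag for a given element id. The helper names (keyboardStateFieldName, keyboardStateHiddenField) are hypothetical and not part of JTK; only the "_thaanaKeyboardState" suffix and the four allowed values come from the description above. It also assumes the id contains no characters that need HTML escaping.

```javascript
// The four values the state field may hold, as listed in step 3.
const KEYBOARD_STATES = ["off", "phonetic", "phonetic-hh", "typewriter"];

// Field name = id of the text entry element + the fixed suffix.
function keyboardStateFieldName(elementId) {
  return elementId + "_thaanaKeyboardState";
}

// Hypothetical helper: returns the markup for a hidden state field.
function keyboardStateHiddenField(elementId, state) {
  if (!KEYBOARD_STATES.includes(state)) {
    throw new Error("Unknown keyboard state: " + state);
  }
  return '<input type="hidden" name="' + keyboardStateFieldName(elementId) +
    '" value="' + state + '">';
}

console.log(keyboardStateHiddenField("fullname", "phonetic"));
// prints: <input type="hidden" name="fullname_thaanaKeyboardState" value="phonetic">
```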

Advanced Usage: The JTK object, methods and properties

For developers who want finer control over its behavior, JTK now makes itself available as a public object named "thaanaKeyboard".

The following properties and methods are exposed by the "thaanaKeyboard" object:

defaultKeyboard: [property] The Thaana keyboard layout type to default to when JTK enabled elements do not have a keyboard specified.
Valid values are: 'off' to keep Thaana disabled, 'phonetic' to use the standard phonetic layout, 'phonetic-hh' to use the phonetic layout by Dr. Hassan Hameed and 'typewriter' to use the typewriter layout.

setHandlerById ( id, action ): [method] Sets the state of the JTK handler for a page element.
The id argument should be a string containing the id of any content-editable element. The action argument should specify either "enable" or "disable" depending on whether input handling for Thaana should be enabled or disabled, respectively.

setHandlerByClass ( class, action ): [method] Sets the state of the JTK handler for a set of page elements.
The class argument should be a string containing the class name of any set of content-editable elements (e.g. inputs, textareas). The action argument should specify either "enable" or "disable" depending on whether input handling for Thaana should be enabled or disabled, respectively.
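
A short sketch of how these members might be used together. The wrapper function configureThaana, the element id "email" and the class name "dhivehiField" are hypothetical; only defaultKeyboard, setHandlerById and setHandlerByClass are taken from the listing above. The wrapper takes the keyboard object as a parameter, so on a page with JTK loaded you would pass in the global thaanaKeyboard object.

```javascript
// Hypothetical wrapper around the members described above.
function configureThaana(keyboard) {
  // Fall back to the Hassan Hameed phonetic layout where no keyboard is specified.
  keyboard.defaultKeyboard = "phonetic-hh";
  // Switch Thaana handling off for a single element (assumed id "email").
  keyboard.setHandlerById("email", "disable");
  // Switch Thaana handling on for every element with an assumed class "dhivehiField".
  keyboard.setHandlerByClass("dhivehiField", "enable");
}

// In the browser, once the JTK script has loaded, the global object exists.
if (typeof thaanaKeyboard !== "undefined") {
  configureThaana(thaanaKeyboard);
}
```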

License:

JTK 4.0 is released under the MIT License, allowing its use in both personal and commercial applications as long as the copyright and license permission notice remains intact.

Demo:

Check out the demonstration and testing page here.

Download:

- original full source version (10.0 KB)
- packed version (2.33 KB) [recommended]

As always, drop a line here if you use it and/or have problems or suggestions. Enjoy. :-)

Update (31-Oct-2008): This version is now superseded by release v4.1.

Thaana conversions class for PHP 5 - v0.3

Here is a minor update to my previously released Thaana Conversions class for PHP. This new version adds the function convertUtf8ToEntities() which I had forgotten to include in the previous public release.

The Thaana Conversions class for PHP provides a number of useful functions for the conversion and transliteration of text between various Thaana representation formats.

Functions listing:
- convertUtf8ToUnicodeIntegers()
- convertUtf8ToAscii()
- convertUtf8ToEntities()
- convertEntitiesToUnicodeIntegers()
- convertEntitiesToUtf8()
- convertEntitiesToAscii()
- convertUnicodeIntegersToUtf8()
- convertUnicodeIntegersToEntities()
- convertUnicodeIntegersToAscii()
- convertAsciiToUtf8()
- convertAsciiToUnicodeEntities()
- convertAsciiToUnicodeIntegers()
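
To make the Unicode-side formats concrete, here is a small JavaScript sketch of how they relate to each other. It is not the PHP class's code, and the function names are hypothetical. It assumes that "Unicode integers" means an array of code points and that "entities" means decimal HTML numeric character references. ASCII Thaana is left out because it depends on a font-specific mapping table.

```javascript
// Text -> array of Unicode code points ("Unicode integers", as assumed here).
function toUnicodeIntegers(text) {
  return Array.from(text, (ch) => ch.codePointAt(0));
}

// Code points -> decimal HTML entities such as "&#1931;".
function unicodeIntegersToEntities(codes) {
  return codes.map((code) => "&#" + code + ";").join("");
}

// Decimal HTML entities -> code points. Anything that is not a decimal
// numeric entity is ignored in this sketch.
function entitiesToUnicodeIntegers(entities) {
  return Array.from(entities.matchAll(/&#(\d+);/g), (m) => parseInt(m[1], 10));
}

// Code points -> text.
function unicodeIntegersToText(codes) {
  return String.fromCodePoint(...codes);
}

const codes = entitiesToUnicodeIntegers("&#1931;&#1960;&#1928;&#1964;&#1920;&#1960;");
console.log(codes);                            // [ 1931, 1960, 1928, 1964, 1920, 1960 ]
console.log(unicodeIntegersToText(codes));     // the same word as Thaana text
console.log(unicodeIntegersToEntities(codes)); // back to the entity string
```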


Requires:
PHP 5

License:
Open Source MIT License

Usage:
<?php
// Load the class
require 'thaana_conversions.obj.php';

// Initialize the Thaana object
$thaana = new Thaana_Conversions();

// Example: Converting Thaana expressed as HTML entities to ASCII
echo $thaana->convertEntitiesToAscii('&#1931;&#1960;&#1928;&#1964;&#1920;&#1960;');

// Example: Converting ASCII Thaana to UTF-8
echo $thaana->convertAsciiToUtf8('rWacje');
?>


Download:
- Thaana_Conversions.zip (v0.3, 4KB)

Drop me a line if you run into trouble with any of the functionality or have comments/queries. Enjoy :-)

Update (29-Jan-2009): This version is now superseded by the v0.4 release.

Radheef for Ubiquity

Earlier this week, Mozilla Labs released a very interesting (and useful) new extension for Firefox called "Ubiquity". It basically taps into web services and integrates them more closely into the browsing experience. For an introduction to Ubiquity, it'd be best to read the post on the Mozilla Labs blog where the product was launched - the video featured in the post shows some quite impressive practical enhancements to the browsing experience made possible by the product.

Radheef for Ubiquity
Radheef for Ubiquity adds the verb 'bas' to Ubiquity to allow quick and easy lookup of definitions of Dhivehi words from any Thaana web page using the Radheef. To look up a word you find on a Thaana web page, you can either select the word and bring up Ubiquity, or type the word into the Ubiquity command window. The definition results are shown immediately within the Ubiquity command bubble.

To install this, visit the Radheef for Ubiquity page after Ubiquity is installed and follow the normal verb installation steps.

Screenshots:

In use on a selected word on the Unicode-based Jazeera Daily website


In use on a selected word on the ASCII-based Haveeru Daily website. It offers helpful suggestions if no result is available for the query word as-is.


Happy browsing :-)

Firefox 3 Thaana display bug: review and fixes

Maldivians who use Firefox will be aware that certain Dhivehi websites, such as Miadhu Online, no longer display Thaana fonts correctly since the switch to the recently released version 3 of the popular browser. I would like to review the issue for the benefit of Maldivian web developers and put forward some solutions that could be used. Further, I would also like to make available a fix that ordinary web users can themselves use until website owners (or the Firefox developers) fix the issue.

Problem description

The Firefox 3.x series (and, to a lesser extent, the 2.x series) fails to display Thaana correctly in web pages when certain non-Unicode Thaana fonts are applied to the elements using CSS. The same pages, however, render without issue in Internet Explorer, Safari and Opera.

DOCTYPE - One contributing factor seems to be the DOCTYPE of the page. My guess is that this issue may have something to do with quirksmode rendering or standards compliance. The lack of a DOCTYPE in the markup gives correct rendering of the Thaana fonts on the page. However, omission of the DOCTYPE cannot and should not be considered a solution as DOCTYPE is required for most page markup and browsers need the correct DOCTYPE specification to correctly render modern pages.

Font - Another factor seems to be the font file used. The Thaana characters fail to be rendered correctly when almost all of the commonly used Thaana fonts, such as A_Faseyha, A_Waheed and A_Randhoo, are used. However, some fonts do work without issue - A_Ilham for example.

Here are some demo pages to highlight the issues. Each page has four lines of Thaana: the first is Thaana text enclosed in a font tag specifying a (problematic) Thaana font; the second is an H3 headline whose font family is set to a (problematic) Thaana font using CSS alone; the third is again an H3 headline whose font family is set to a (problematic) Thaana font using CSS, but with the text placed inside a font tag; and the fourth is an H3 headline whose font family is set to a (working) Thaana font using CSS alone.
View Thaana on page with: no DOCTYPE, HTML 4.01 DOCTYPE and XHTML 1.0 DOCTYPE.

Developer's fix

There are two definite solutions that can be easily applied by web developers.

Solution 1: Add HTML font tags around any and all text that is to be displayed in Thaana. Specify the font to be used in the "face" attribute of the font tags as usual. The downside of this method is that it results in a significant increase in page size. Haveeru News seems to have addressed the problem using this method. Here's an example:
bwlimIhunc aufulunc 

should be transformed into 

bwlimIhunc aufulunc
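
In code terms, Solution 1 is a simple wrapping step. The sketch below is a hypothetical helper (wrapInFontTag is not from any library) that a template or script could call on each run of ASCII Thaana text. The face name A_Faseyha is just one of the fonts mentioned earlier, and the text is assumed to be already safe to place inside HTML.

```javascript
// Hypothetical helper: wrap a run of ASCII Thaana text in a font tag.
function wrapInFontTag(text, face) {
  // Escape double quotes so the face value cannot break out of the attribute.
  const safeFace = String(face).replace(/"/g, "&quot;");
  return '<font face="' + safeFace + '">' + text + "</font>";
}

console.log(wrapInFontTag("bwlimIhunc aufulunc", "A_Faseyha"));
// prints: <font face="A_Faseyha">bwlimIhunc aufulunc</font>
```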

Solution 2: Change the font used in the CSS definition to "A_Ilham". It is, perhaps, not as clean and pretty as "A_Faseyha", but until Firefox is fixed it will have to do.

A further alternative solution would be for the site owners and developers to take this occasion to shift to Unicode Thaana. It is much more reliable and is the currently recommended method of displaying Thaana on the web. Jazeera Daily, Haama Daily and MvHeadlines, to name a few, are all using Unicode for text display and entry. You can utilize the PHP-based Thaana Conversions class I released to convert the existing non-Unicode Thaana text to Unicode - and you can do such conversion on-the-fly on page requests.

User's fix

I wrote a quick bookmarklet-based solution several weeks ago for my use after getting annoyed with having to open Internet Explorer to view pages from sites affected by this bug. This solution will, or rather should, work on any affected site and on any computer.

Simply right-click on this link - Jaa's Thaana Fix - and select "Bookmark this link" from the drop-down menu. Alternatively, you can drag and drop the link onto your bookmarks toolbar. When you are on a page that is messed up by the bug, such as Miadhu Online, Vaikaradhoo Live or Kavaasaa, click the "Thaana fix" link on your Bookmarks menu or toolbar. You will need to do this for each page you view.

Happy reading :-)

Javascript Thaana Keyboard version 3.0

I released my Javascript Unicode Keyboard Handler for Thaana early this year as open-source software so that web developers producing Dhivehi websites can let users type Thaana straight into text entry fields without forcing them to switch keyboards through their operating system. The code has since made its way into many different Dhivehi websites. However, the code I released then was mostly as-is from its original version, which I had written back in 2003. Sadly, this means that its behavior can be a little unpredictable in certain modern browsers - especially Opera and Safari.

I've now rewritten the code with the intent of producing cleaner, easier-to-use code that works without fail on all modern browsers. This version is (more or less!) guaranteed to work, and has been tested, on Firefox 2+, Opera 9+, Internet Explorer 6+ and Safari 2+ and has also been tested on Windows, Mac and Linux operating systems.

I am a big fan of separating code from design, so in keeping with that ideal this new version uses a more modern way of assigning the Thaana keyboard functionality instead of the inline javascript event handling used by the previous version (look below for an example). Since everything needs a spunky name, I've also changed the old name to the more descriptive "Javascript Thaana Keyboard", which future versions of the script will keep.

As before, it is being released under the MIT License, which allows its use in both personal and commercial applications as long as the copyright and license permission notice remains intact - so what the guy at basfoiy.com has done is a definite no-no.

Usage:

1. Link the file in the HEAD section of the page:


2. For each text input element (i.e. INPUTs or TEXTAREAs), assign the class name "thaanaKeyboardInput". You can assign further classes to the elements without ill effect, if needed.

3. Using CSS, set any Unicode-compatible Dhivehi font (and size) to be used for the fields. You can easily do that by adding a class definition for the "thaanaKeyboardInput" class or by any other method of your choice.

4. The Thaana functionality is automatically applied to any elements with the required class name when the page is loaded!

Demo:

Check out the demonstration and testing page here.

Download:

- original full source version (7.34 KB)
- minified version (2.01 KB)
I recommend you use the minified version.

As always, drop a line here if you use it and/or have problems or suggestions. Enjoy. :-)

Update (20-Oct-2008): This version is now superseded by the new and improved v4.0.