
Uniscribe Library at work

WinEdt has been a native Unicode application for many years (since version 7, to be precise). However, up to and including WinEdt 10 it lacked the ability to properly handle complex languages that involve bidirectional text, non-spacing characters, complicated wrapping rules, etc. WinEdt 11 uses the powerful MS Uniscribe library, which gives it the ability to overcome these limitations. This HTML document (written in WinEdt) illustrates the new functionality.

The pangrams used to test the new WinEdt scripting engine with various (complex) languages were obtained from different sources on the internet. I do not speak any of these languages and I cannot guarantee that the pangrams are correct (or even that some users will not find them offensive). If native speakers spot any mistakes or have suggestions for improvements, please send me your revisions and I will include them in future versions.

If you open the HTML source for this page in WinEdt 11 you'll notice that it justifies paragraphs to the specified right margin (rather than leaving ragged text). You will also notice the mix of two fonts (Tahoma and Consolas) and the use of fallback fonts whenever the default font lacks glyphs for a particular script. This functionality is new in WinEdt 11.

Below is a snapshot of WinEdt working on this HTML document (note that selected text in a bidirectional string is not contiguous):

[Screenshot: WinEdt in action]


Bidirectional text

We start with the most challenging examples of complex text processing: handling a mix of right-to-left and left-to-right strings (a much-requested feature that previous versions of WinEdt lacked).
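
To make the idea of mixed-direction text concrete, here is a minimal Python sketch that splits a string into left-to-right and right-to-left runs using the bidirectional class of each character. It is not how WinEdt or Uniscribe works internally: the function names are hypothetical, and the rule for neutral characters (attach them to the preceding run) is a deliberate simplification of the full Unicode Bidirectional Algorithm.

```python
import unicodedata
from itertools import groupby

# Strong bidirectional classes from the Unicode Character Database.
RTL_CLASSES = {"R", "AL"}  # Hebrew-like and Arabic-like letters
LTR_CLASSES = {"L"}        # Latin, Greek, Cyrillic, ...


def strong_direction(ch):
    """Return 'rtl', 'ltr', or None if the character has no strong direction."""
    bidi = unicodedata.bidirectional(ch)
    if bidi in RTL_CLASSES:
        return "rtl"
    if bidi in LTR_CLASSES:
        return "ltr"
    return None


def directional_runs(text):
    """Split text into (direction, substring) runs.

    Simplification (an assumption of this sketch): spaces, digits,
    punctuation and non-spacing marks join the preceding run, and leading
    neutrals take the direction of the first strong character.
    """
    current = next((d for d in map(strong_direction, text) if d), "ltr")
    tagged = []
    for ch in text:
        current = strong_direction(ch) or current
        tagged.append((current, ch))
    return [
        (direction, "".join(ch for _, ch in group))
        for direction, group in groupby(tagged, key=lambda pair: pair[0])
    ]


# Latin, Arabic, Latin, Hebrew (escapes keep the source itself left-to-right).
runs = directional_runs("WinEdt 11 \u0634\u0645\u0633 and \u05e9\u05dc\u05d5\u05dd!")
for direction, chunk in runs:
    print(direction, repr(chunk))
```

A selection that is contiguous in memory can cover several such runs, and each run is laid out in its own direction, which is why selected text in a bidirectional string may appear as separate pieces on screen.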

Arabic

صِف خَلقَ خَودِ كَمِثلِ الشَمسِ إِذ بَزَغَت — يَحظى الضَجيعُ بِها نَجلاءَ مِعطارِ (A poem by Al Farāhīdi)
هلا سكنت بذي ضغثٍ فقد زعموا — شخصت تطلب ظبياً راح مجتازا
اصبر على حفظ خضر واستشر فطنا، وزج همك في بغداذ منثملا
نصٌّ حكيمٌ لهُ سِرٌّ قاطِعٌ وَذُو شَأنٍ عَظيمٍ مكتوبٌ على ثوبٍ أخضرَ ومُغلفٌ بجلدٍ أزرق ...
(This example illustrates proper justification of Arabic glyphs: inserting white space would not be enough for this script!)

Hebrew

Urdu

ٹھنڈ میں، ایک قحط زدہ گاؤں سے گذرتے وقت ایک چڑچڑے، باأثر و فارغ شخص کو بعض جل پری نما اژدہے نظر آئے۔
ALA-LC: Ṭhanḍ meṉ, ek qaḥat̤-zadah gāʾoṉ se guẕarte waqt ek ciṛciṛe, bā-ʾas̱ar o-fārig̱ẖ s̱ẖaḵẖṣ ko baʿẓ jal-parī numā aẕẖdahe naz̤ar āʾe.
Translation: In the cold, passing through an arid village, an irritable, influential and leisurely person saw some mermaid-like pythons.

ژالہ باری میں ر‌ضائی کو غلط اوڑھے بیٹھی قرأة العین اور عظمٰی کے پاس گھر کے ذخیرے سے آناً فاناً ڈش میں ثابت جو، صراحی میں چائے اور پلیٹ میں زرده آیا۔

Uyghur

ئاۋۇ بىر جۈپ خوراز فرانسىيەنىڭ پارىژ شەھرىگە يېقىن تاغقا كۆچەلمىدى.
Uyghur Latin Script: Awu bir jüp xoraz Fransiyening Parizh shehrige yëqin taghqa köchelmidi.
Translation: Those two roosters were not able to move to the mountain near Paris in France.

زۆھرەگۈل ئابدۇۋاجىت فرانسىيەنىڭ پارىژدىكى خېلى بىشەم ئوقۇغۇچى.
Uyghur Latin Script: Zöhregül Abduwajit Fransiyening Parizhdiki xëli bishem oqughuchi.
Translation: Zöhregül Abduwajit is a quite unpleasant student in Paris, France.

Complex languages and fallback fonts

This section illustrates some other complex languages (in no particular order) using diacritic marks, non-spacing characters and fallback fonts (since most fonts lack many glyphs for exotic languages).
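
As a rough illustration of why non-spacing characters need special handling, the Python sketch below groups each base character with the combining marks that follow it, so that a cursor or selection never lands between a letter and its mark. The function name is hypothetical, and treating every code point in a mark category (Mn, Mc, Me) as part of the preceding cluster is a simplification; real grapheme segmentation has more rules.

```python
import unicodedata


def split_clusters(text):
    """Group each base character with the combining marks that follow it.

    Simplification (an assumption of this sketch): a "mark" is any code
    point whose general category starts with "M".
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch  # attach the mark to the previous cluster
        else:
            clusters.append(ch)
    return clusters


# First word of the Arabic example: sad + kasra (a non-spacing mark) + fa.
arabic_word = "\u0635\u0650\u0641"
# The Hindi word for "Hindi": ha + vowel sign i + anusvara + da + vowel sign ii.
hindi_word = "\u0939\u093f\u0902\u0926\u0940"

for word in (arabic_word, hindi_word):
    print(len(word), "code points,", len(split_clusters(word)), "clusters")
```

The Arabic word has three code points but two clusters, and the Hindi word has five code points but two clusters, which is the kind of mismatch an editor has to account for when moving the caret or measuring text.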

Thai

Korean

In current usage, Hangul has 14 simple consonant letters, 6 simple vowel letters, and 4 iotized vowel letters; there are also 5 double consonant letters, 11 consonant clusters, and 11 diphthongs, made from combinations of the simple consonants or simple vowels. Of these, the above phrase contains all the simple consonant letters, simple vowel letters, and iotized vowel letters, along with 1 double consonant letter (ㄲ “gg”), 1 consonant cluster (ㄶ “nh”), and one diphthong (ㅢ “ui”).

Chinese

There are several thousand Chinese characters; a pangram would be impractical.

Japanese

Since there are tens of thousands of kanji characters, Japanese pangrams are ones containing all kana.

Iroha Uta

The Iroha is a poem that uses each of the 47 classical kana characters exactly once. (The characters ゐ and ゑ are obsolete in modern Japanese.) Iroha is so classically entrenched that any modern construction of a Japanese pangram in classical form is called iroha-uta.
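
The "exactly once" claim can be checked mechanically. The sketch below is only an illustration: the poem text in it is the commonly cited hiragana form (classical orthography, without voicing marks), added here for the example rather than copied from this page.

```python
from collections import Counter

# Commonly cited hiragana form of the Iroha (an addition for this example).
IROHA = (
    "いろはにほへと" "ちりぬるを"
    "わかよたれそ" "つねならむ"
    "うゐのおくやま" "けふこえて"
    "あさきゆめみし" "ゑひもせす"
)

counts = Counter(IROHA)
# Perfect pangram: 47 characters in total and none of them repeated.
is_perfect = len(IROHA) == 47 and all(n == 1 for n in counts.values())
print(len(IROHA), "kana,", len(counts), "distinct, perfect:", is_perfect)
```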

Tori Naku Uta

Ametsuchi no Uta

Taini no Uta


More Scripts...

With fallback fonts, WinEdt can handle the following examples reasonably well...

Malayalam

Cherokee

Hindi

Sanskrit

Tibetan

Myanmar

Javanese

This poem is used as the ordering of the Javanese script (it is a perfect pangram, which means there is only one instance of each letter).

WinEdt will not handle this script well (in combination with other fallback fonts) because WinEdt currently uses a fixed line height. The descent of this font requires a larger line height; otherwise the font must either be scaled down until it is too small to be legible or have its descent clipped...


Problematic scripts...

Windows fonts may not contain glyphs for some scripts. This may change in the future...

Klingon

On my Windows 11 system no font contains glyphs in this range (the browser fails to display them as well):


Not so complex...

Below are a few more examples involving the Greek and Cyrillic alphabets. Ordinary Unicode functionality (without complex processing) is sufficient to handle these scripts.

Greek

Russian

Traditional telegraph test (lacks ъ and ё):
The same text using quasi-obsolete spelling for the last word to include ъ:
Each letter exactly once:
Each letter exactly once:
Microsoft used it in fontview.exe for Cyrillic fonts without «же»:
Used in KDE:
Lacks ъ and ё:

Easy breezy...

There are hundreds (if not thousands) of pangrams in many languages. However, most are based on the Latin alphabet with a few language-specific diacritical marks (included in Unicode) and do not require any complex text processing functionality. Pretty much all European (and many other) languages fall into this category...

Esperanto

Latin

Includes the letters k, y and z, used for words derived from Greek, but not the letters j, v or w, consonants that evolved from the vowels i and u.

Windows and Unicode

 Unicode for Windows is a project in progress...
 On Windows 7 some scripts (above) will not display due to the lack of fonts
 or glyphs. On Windows 10 and 11 they will display correctly.

 The following 20 script tags are defined on Windows 10 but currently no font
 that comes with Windows 10 has glyphs for them (this may change in the future).
 On Windows 11 the Sans Serif Collection font has glyphs for some of them...

   SCRIPT="bali" // ᬅᬆᬇᬈᬉᬊᬋᬌᬍᬎᬏᬐᬑᬒᬓᬔᬕᬖᬗᬘᬙᬚᬛᬜᬝᬞᬟᬠᬡᬢᬣᬤᬥᬦᬧᬨᬩᬪᬫᬬᬭᬮᬯᬰᬱ
   SCRIPT="batk" // ᯀᯁᯂᯃᯄᯅᯆᯇᯈᯉᯊᯋᯌᯍᯎᯏᯐᯑᯒᯓᯔᯕᯖᯗᯘᯙᯚᯛᯜᯝᯞᯟᯠᯡᯢᯣᯤᯥ᯦ᯧᯨᯩᯪᯫᯬᯭᯮᯯᯰᯱ
   SCRIPT="buhd" // ᝀᝁᝂᝃᝄᝅᝆᝇᝈᝉᝊᝋᝌᝍᝎᝏᝐᝑᝒᝓ
   SCRIPT="cham" // ꨀꨁꨂꨃꨄꨅꨆꨇꨈꨉꨊꨋꨌꨍꨎꨏꨐꨑꨒꨓꨔꨕꨖꨗꨘꨙꨚꨛꨜꨝꨞꨟꨠꨡꨢꨣꨤꨥꨦꨧꨨꨩꨪꨫꨬꨭꨮꨯꨰꨱ
   SCRIPT="hano" // ᜠᜡᜢᜣᜤᜥᜦᜧᜨᜩᜪᜫᜬᜭᜮᜯᜰᜱᜲᜳ᜴᜵᜶
   SCRIPT="kali" // ꤀꤁꤂꤃꤄꤅꤆꤇꤈꤉ꤊꤋꤌꤍꤎꤏꤐꤑꤒꤓꤔꤕꤖꤗꤘꤙꤚꤛꤜꤝꤞꤟꤠꤡꤢꤣꤤꤥꤦꤧꤨꤩꤪ꤫꤬꤭
   SCRIPT="lana" // ᨠᨡᨢᨣᨤᨥᨦᨧᨨᨩᨪᨫᨬᨭᨮᨯᨰᨱᨲᨳᨴᨵᨶᨷᨸᨹᨺᨻᨼᨽᨾᨿᩀᩁᩂᩃᩄᩅᩆᩇᩈᩉᩊᩋᩌᩍᩎᩏᩐᩑ
   SCRIPT="lepc" // ᰀᰁᰂᰃᰄᰅᰆᰇᰈᰉᰊᰋᰌᰍᰎᰏᰐᰑᰒᰓᰔᰕᰖᰗᰘᰙᰚᰛᰜᰝᰞᰟᰠᰡᰢᰣᰤᰥᰦᰧᰨᰩᰪᰫᰬᰭᰮᰯᰰᰱ
   SCRIPT="limb" // ᤀᤁᤂᤃᤄᤅᤆᤇᤈᤉᤊᤋᤌᤍᤎᤏᤐᤑᤒᤓᤔᤕᤖᤗᤘᤙᤚᤛᤜᤝᤞ
   SCRIPT="mand" // ࡀࡁࡂࡃࡄࡅࡆࡇࡈࡉࡊࡋࡌࡍࡎࡏࡐࡑࡒࡓࡔࡕࡖࡗࡘ࡙࡚࡛
   SCRIPT="mtei" // ꫠꫡꫢꫣꫤꫥꫦꫧꫨꫩꫪꫫꫬꫭꫮꫯ꫰꫱ꫲꫳꫴꫵ꫶
   SCRIPT="qavs" // ︀︁︂︃︄︅︆︇︈︉︊︋︌︍︎️
   SCRIPT="rjng" // ꤰꤱꤲꤳꤴꤵꤶꤷꤸꤹꤺꤻꤼꤽꤾꤿꥀꥁꥂꥃꥄꥅꥆꥇꥈꥉꥊꥋꥌꥍꥎꥏꥐꥑꥒ꥓
   SCRIPT="samr" // ࠀࠁࠂࠃࠄࠅࠆࠇࠈࠉࠊࠋࠌࠍࠎࠏࠐࠑࠒࠓࠔࠕࠖࠗ࠘࠙ࠚࠛࠜࠝࠞࠟࠠࠡࠢࠣࠤࠥࠦࠧࠨࠩࠪࠫࠬ࠭
   SCRIPT="saur" // ꢂꢃꢄꢅꢆꢇꢈꢉꢊꢋꢌꢍꢎꢏꢐꢑꢒꢓꢔꢕꢖꢗꢘꢙꢚꢛꢜꢝꢞꢟꢠꢡꢢꢣꢤꢥꢦꢧꢨꢩꢪꢫꢬꢭꢮꢯꢰꢱ
   SCRIPT="sund" // ᮃᮄᮅᮆᮇᮈᮉᮊᮋᮌᮍᮎᮏᮐᮑᮒᮓᮔᮕᮖᮗᮘᮙᮚᮛᮜᮝᮞᮟᮠᮡᮢᮣᮤᮥᮦᮧᮨᮩ᮪᮫ᮬᮭ
   SCRIPT="sylo" // ꠃꠄꠅ꠆ꠇꠈꠉꠊꠋꠌꠍꠎꠏꠐꠑꠒꠓꠔꠕꠖꠗꠘꠙꠚꠛꠜꠝꠞꠟꠠꠡꠢꠣꠤꠥꠦꠧ꠨꠩꠪꠫
   SCRIPT="tagb" // ᝠᝡᝢᝣᝤᝥᝦᝧᝨᝩᝪᝫᝬᝮᝯᝰᝲᝳ
   SCRIPT="tavt" // ꪀꪁꪂꪃꪄꪅꪆꪇꪈꪉꪊꪋꪌꪍꪎꪏꪐꪑꪒꪓꪔꪕꪖꪗꪘꪙꪚꪛꪜꪝꪞꪟꪠꪡꪢꪣꪤꪥꪦꪧꪨꪩꪪꪫꪬꪭꪮꪯꪰꪱ
   SCRIPT="tglg" // ᜀᜁᜂᜃᜄᜅᜆᜇᜈᜉᜊᜋᜌᜎᜏᜐᜑᜒᜓ᜔


What is a Pangram?

This section is borrowed from Wikipedia:

A pangram (Greek: παν γράμμα, pan gramma, "every letter") or holoalphabetic sentence for a given alphabet is a sentence using every letter of the alphabet at least once. Pangrams have been used to display typefaces, test equipment, and develop skills in handwriting, calligraphy, and keyboarding.

The best known English pangram is "The quick brown fox jumps over the lazy dog." It has been used since at least the late 19th century, was utilized by Western Union to test Telex / TWX data communication equipment for accuracy and reliability, and is now used by a number of computer programs (most notably the font viewer built into Microsoft Windows) to display computer fonts.

An example in another language is the German "Victor jagt zwölf Boxkämpfer quer über den großen Sylter Deich", containing all letters used in German, including every umlaut (ä, ö, ü) plus the ß. It has been used since before 1800.

Short pangrams in English are more difficult to come up with and tend to use uncommon words, because the English language uses some letters (especially vowels) much more frequently than others. Longer pangrams may afford more opportunity for humor, cleverness, or thoughtfulness. A perfect pangram contains every letter of the alphabet only once and can be considered an anagram of the alphabet; it is the shortest possible pangram. An example is the phrase "Cwm fjord bank glyphs vext quiz" (cwm, a loan word from Welsh, means a steep-sided valley, particularly in Wales).
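
The definitions above translate directly into a few lines of Python. This is a minimal sketch with hypothetical function names; it assumes the alphabet is passed as a string of lowercase letters and normalizes the input to NFC so that precomposed letters such as ä compare equal regardless of how they were typed.

```python
import string
import unicodedata
from collections import Counter


def letter_counts(sentence, alphabet):
    """Count each alphabet letter in the sentence, ignoring case and other characters."""
    letters = set(alphabet)
    normalized = unicodedata.normalize("NFC", sentence).lower()
    return Counter(ch for ch in normalized if ch in letters)


def is_pangram(sentence, alphabet=string.ascii_lowercase):
    """True if every letter of the alphabet occurs at least once."""
    return set(letter_counts(sentence, alphabet)) == set(alphabet)


def is_perfect_pangram(sentence, alphabet=string.ascii_lowercase):
    """True if every letter of the alphabet occurs exactly once."""
    counts = letter_counts(sentence, alphabet)
    return set(counts) == set(alphabet) and all(n == 1 for n in counts.values())


# German alphabet for this example: a-z plus the umlauts and ß.
GERMAN = string.ascii_lowercase + "äöüß"

print(is_pangram("The quick brown fox jumps over the lazy dog."))
print(is_perfect_pangram("Cwm fjord bank glyphs vext quiz"))
print(is_pangram("Victor jagt zwölf Boxkämpfer quer über den großen Sylter Deich", GERMAN))
```

All three calls report a pangram, while `is_perfect_pangram` rejects the fox sentence because several of its letters repeat.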