The corpus vile for ElChipo's preliminary Brillographic studies will be the CD-ROM EI article on "‘Arabiyya, Arabic Language and Literature," corresponding to I:561b-603a. (Ohmygosh! 73 pages of it print out, and needless to say the konkly hogen-mogens don't number pages.)
The official position on what you are supposed/allowed to do runs like this in the README.TXT file, X:\EI\DATA\README\README.TXT (3,790 bytes, dated 7 Dec 1999):

    The copy-to-clipboard function does not preserve font information. So if pasting the selection into another application, such as Microsoft Word, it will be necessary to set the fonts manually. The fonts to use are Baskerville MT for Brill 00 and 02. The recommended approach is to first select all the text and change to font 02. Some characters, such as numerals, will then not display properly and these should be changed to font 00.
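The README's recipe (everything in 02, numerals back to 00) comes down to splitting the pasted text into runs. Here is a small sketch of which runs would need font 00. It assumes that only the ASCII digits 0-9 count as "numerals". The README says "some characters, such as numerals", so other characters may need the same treatment. font_runs is a hypothetical helper, not part of the CD-ROM software. Presumably the digits need 00 because in 02 some of those codes (31, 34, 36) are Brillglyphs, according to the table further down.

```python
from itertools import groupby

FONT_00 = "Baskerville MT for Brill 00"
FONT_02 = "Baskerville MT for Brill 02"


def font_runs(text: str) -> list[tuple[str, str]]:
    """Split pasted text into (run, font) pairs.

    Runs of ASCII digits get font 00; everything else keeps font 02.
    Treating only 0-9 as "numerals" is an assumption.
    """
    return [
        ("".join(chars), FONT_00 if is_digit else FONT_02)
        for is_digit, chars in groupby(text, key=lambda c: "0" <= c <= "9")
    ]


print(font_runs("I:561b"))
```

A run-based split keeps adjacent digits together. A multi-digit page reference therefore becomes one font change instead of one per character.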
ElChipo rashly infers from this language that it is not impermissible to cut text out of the aforementioned double-dutch application and paste it into something sensible. Plus maybe look at it and make a few changes on the way?
This 02 font does an admirable limbs-of-Osiris impersonation.
But that's enough pretty pictures; first, one more bit of comic relief, and now let us get to work.
This basic inventory of 42 special Brillglyphs takes account only of transliterated Arabic and Persian. The immediate object is a filter program that will preserve everything on the clipboard that is either English or transliterated Arabic and discard all the rest, or perhaps convert it to ???? ???? ???? modo Redmondico.
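The two policies named here are discarding what is not wanted and turning it into question marks "modo Redmondico". They correspond neatly to Python's standard codec error handlers, "ignore" and "replace". The sketch below only illustrates those two policies on Unicode text encoded to Windows code page 1252. It is not the Brill filter itself, and squeeze is a hypothetical name.

```python
# ‘Arabiyya in Arabic script, written with escapes: five letters.
ARABIC_WORD = "\u0639\u0631\u0628\u064a\u0629"


def squeeze(text: str, policy: str) -> bytes:
    """Encode text to cp1252.

    policy="ignore" silently drops every character cp1252 cannot hold;
    policy="replace" turns each such character into b"?".
    """
    return text.encode("cp1252", errors=policy)


sample = "\u2018Arabiyya (" + ARABIC_WORD + ")"
print(squeeze(sample, "ignore"))   # the Arabic letters simply vanish
print(squeeze(sample, "replace"))  # one ? per Arabic letter
```

Transliterated Arabic fares no better than Arabic script under the Redmond policy. An ā encoded with "replace" also comes out as a bare ?, since cp1252 has no a-macron.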
The EJB transliteration items may be divided into three classes, as follows:
(1) 18 Brillgraph letters

These digraphs will be notated inside slashes, as /sh/. In the EI, they appear with both letters underscored. The Brillgraphs required are, in Arabic alphabetical order, / th dj ch kh dh zh sh gh /. Hence the individual characters needed are:

    D4   /c+hachek/     31*  /C+hachek/
    FE   /d/            F2   /D/
    FF   /g/            F3   /G/
    8A   /h/            F4   /H/
    9A   /j/            F5   /J/
    A6   /k/            F6   /K/
    AF   /s/            F7   /S/
    B2   /t/            F8   /T/
    BC   /z/            FB   /Z/

These Brillglyphs presumably occur only in Brillgraphs, but given upper- and lower-case, each Brillgraph comes in three possible forms, as for instance / SH Sh sh /.
===
(2) 22 Brillblobs

These are single letters with diacritics. The vowels have a macron, the consonants an infralinear dot.

    B6   \a\            24*  \A\
    B7   \d\            34*  \D\
    BF   \e\            36*  \E\
    C8   \h\            5C*  \H\
    CA   \i\            5E*  \I\
    A6   \k\            7C*  \K\
    D9   \o\            88   \O\
    DD   \s\            91   \S\
    E3   \t\            95   \T\
    E5   \u\            97   \U\
    ED   \z\            A3   \Z\
===
(3) 2 Miscellaneous

    22*  ’alif == smooth breathing
    23*  ‘ayn == rough breathing
===
Valid ASCII in "Baskerville MT for Brill 02"

    20 21 28 29 2C 2E 3A 3B 3F 41-5A 61-7A

Everything in that list is passed through unchanged. All the codes above are mapped to Dushizat or Lunicode or some known thing. Everything else should be deleted, EXCEPT contiguous bytes in the 30-39 area, which are 98% certain to be digits and not Brillguff.
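Here is a minimal Python sketch of the filter just described. It assumes the clipboard contents arrive as raw bytes in the Brill 02 encoding. The Unicode targets are assumptions standing in for "Dushizat or Lunicode or some known thing":

* combining macron below (U+0331) for the underscore
* dot below (U+0323) and macron (U+0304) for the diacritics
* caron (U+030C) for the hachek
* ’ and ‘ for ’alif and ‘ayn (U+02BE/U+02BF would be another option)

The names BRILL02, PASS_THROUGH and brill_to_unicode are hypothetical. The table gives A6 for both /k/ and \k\, so one of the two entries must be a slip; the sketch arbitrarily keeps \k\. Digits are tested first, so the starred codes 31, 34 and 36 always come out as digits.

```python
import unicodedata

MACRON, DOT_BELOW, CARON = "\u0304", "\u0323", "\u030c"
MACRON_BELOW = "\u0331"  # assumed stand-in for the EI underscore

# (1) Brillgraph letters: base letters; the underscore is added below.
BRILLGRAPH = {
    0xD4: "c" + CARON, 0x31: "C" + CARON,
    0xFE: "d", 0xF2: "D", 0xFF: "g", 0xF3: "G", 0x8A: "h", 0xF4: "H",
    0x9A: "j", 0xF5: "J", 0xF6: "K", 0xAF: "s", 0xF7: "S",
    0xB2: "t", 0xF8: "T", 0xBC: "z", 0xFB: "Z",
    # 0xA6 (/k/) left out: the table also assigns A6 to \k\ below.
}

# (2) Brillblobs: macron on vowels, dot below on consonants.
BRILLBLOB = {
    0xB6: "a" + MACRON, 0x24: "A" + MACRON,
    0xB7: "d" + DOT_BELOW, 0x34: "D" + DOT_BELOW,
    0xBF: "e" + MACRON, 0x36: "E" + MACRON,
    0xC8: "h" + DOT_BELOW, 0x5C: "H" + DOT_BELOW,
    0xCA: "i" + MACRON, 0x5E: "I" + MACRON,
    0xA6: "k" + DOT_BELOW, 0x7C: "K" + DOT_BELOW,
    0xD9: "o" + MACRON, 0x88: "O" + MACRON,
    0xDD: "s" + DOT_BELOW, 0x91: "S" + DOT_BELOW,
    0xE3: "t" + DOT_BELOW, 0x95: "T" + DOT_BELOW,
    0xE5: "u" + MACRON, 0x97: "U" + MACRON,
    0xED: "z" + DOT_BELOW, 0xA3: "Z" + DOT_BELOW,
}

# (3) Miscellaneous: ’alif and ‘ayn (choice of quote marks is an assumption).
MISC = {0x22: "\u2019", 0x23: "\u2018"}

# NFC turns base + combining mark into a precomposed letter where one exists.
BRILL02 = {b: unicodedata.normalize("NFC", s + MACRON_BELOW)
           for b, s in BRILLGRAPH.items()}
BRILL02.update({b: unicodedata.normalize("NFC", s)
                for b, s in {**BRILLBLOB, **MISC}.items()})

# Valid ASCII: space ! ( ) , . : ; ? A-Z a-z
PASS_THROUGH = ({0x20, 0x21, 0x28, 0x29, 0x2C, 0x2E, 0x3A, 0x3B, 0x3F}
                | set(range(0x41, 0x5B)) | set(range(0x61, 0x7B)))


def brill_to_unicode(data: bytes) -> str:
    """Map Brill 02 bytes to Unicode, dropping everything unrecognized."""
    out = []
    for b in data:
        if 0x30 <= b <= 0x39:        # digits win, even over 31/34/36
            out.append(chr(b))
        elif b in BRILL02:
            out.append(BRILL02[b])
        elif b in PASS_THROUGH:
            out.append(chr(b))
        # anything else is Brillguff and is silently deleted
    return "".join(out)


print(brill_to_unicode(b"\x23Arabiyya, I:561b"))
```

The hyphen (2D) is not on the pass-through list, so an input like "al-" comes out as "al". Whether that is intended is worth checking before trusting the output.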
A puzzle about the digits. An adjoining parenthesis hardly proves that a 0x3? byte can't be some weird Brillglyph. Still, it is probably best to guess that they are always digits, except perhaps for '4', which is the upper-case D with an underdot. Also, they use the Codepage 1252 SHACHEK/shachek
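The following is one hypothetical way to act on that guess. Every run of 0x30-0x39 bytes is read as digits, except a lone '4' that touches a letter-like byte, which is taken as \D\. "Letter-like" here means an ASCII letter or any byte at 0x80 or above. This rests on the assumption that high bytes in this font are mostly Brillglyphs. By the table above, 31 (/C+hachek/) and 36 (\E\) are in the same position as 34, so a fuller version might treat lone '1' and '6' the same way. This sketch singles out only '4', as the text does. read_digit_runs is an illustrative name.

```python
import re

DIGIT_RUN = re.compile(rb"[0-9]+")


def _letter_like(b: bytes) -> bool:
    # An ASCII letter, or any high byte (assumed to be a Brillglyph).
    return bool(b) and (b.isalpha() or b[0] >= 0x80)


def read_digit_runs(data: bytes) -> list[tuple[bytes, str]]:
    """Classify each run of 0x30-0x39 bytes as 'digits', or, for a lone
    '4' next to a letter-like byte, as 'D-underdot'."""
    result = []
    for m in DIGIT_RUN.finditer(data):
        start, end = m.span()
        before, after = data[start - 1:start], data[end:end + 1]
        if m.group() == b"4" and (_letter_like(before) or _letter_like(after)):
            result.append((m.group(), "D-underdot"))
        else:
            result.append((m.group(), "digits"))
    return result


print(read_digit_runs(b"I:561b-603a"))  # both runs read as digits
print(read_digit_runs(b"4\xb6d"))       # lone 4 before \a\: read as D-underdot
```

Multi-digit runs are never reinterpreted, which matches the "98% certain" intuition for contiguous digits. Only the isolated case is second-guessed.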