ElChipabic Transliteration From Scratch

25 Mar 01 / 02 Apr 01

(( Prolegomena

Although I probably won't be consistent, my theory is as follows:

(1) slashes frame transliteration of marks: /!l%rby+ !lfSHe/
(2) angle brackets frame letters or keyboard keys: <%> is upper-case <5>.
(3) square brackets frame transcription (sc. of pronunciation): [alarabi:yatulfua:]

The little bit of font-dependent transliteration here depends on the font called simply "Dushizat." ))

Let us start with what is obvious. The following twenty Latin letters have pretty well predestined values, no?

a b d f h i j k l m n p q r s t u w y z

(/p/ gets in because when we say "Arabic" we mean "Arabic not without Persian.")

That beginning gives us half of the consonants (17 of 34) and a fifth of the diacritics (3 of 14). Since those numbers may not be quite obvious, here's the target alphabet and diacritic list written down more or less the Eurolearned way:

(alif) b t th p j ch ḥ kh d dh r z zh s sh ṣ ḍ ṭ ẓ ʿ gh f q [24]
k g l m n h (tāʾ marbūṭa) w y (alif maqṣūra) (taṭwīl) [11 + 24 = 35]
a i u ("dagger alif") an in un (sukūn) (shadda) [9]

to which must be added the Bermuda triangle of Arabic orthography:

hamza alone
hamza above
hamza below
madda
waṣla [5 + 9 = 14]

49 items, and not one of them can be done without. (He announced unilaterally.) That is the repertory they actually write with. That is what needs to be transliterated.

I have already prejudged one question here and done so pretty much contra mundum. I have treated hamza as a pure diacritic, as if it could appear above or below any letter at all. It doesn't.
If we do things the world's way, though, we need even more apparatus, because then instead of those last five items we need at least eight:

hamza alone
alif with hamza on top
hāʾ with hamza on top
wāw with hamza on top
alif maqṣūra with hamza on top
alif with hamza underneath
alif with madda
alif with waṣla

The world's way is right, I think, if it is a matter of preparing an Arabic font. But that is not what we are doing. The transliteration of a foreign language is not ordinarily conceived of as a set of instructions to the print shop.

The "world" I just spoke of is the computer world. The learned world also wants to do things wrong, but they want to err differently. They want to write, say,

wuzarāʾu "viziers"
ʾamīrun "a prince"
ʾimāratun "a principality"
raʾūfun "merciful"
qāʾilun "saying" (participle)
ʾākalahum "he fed them"

That plan makes their slender little Greekling smooth breathing -- so small you hardly notice it -- stand for six different things. It would be nice if Arabic were in fact written so that this sort of transliteration is accurate. In that case poor students wouldn't want to hoot when the textbook says that verbs with hamza are "basically conjugated regularly." However it is not the duty of a transliteration to impose spelling reform on its original. ("Dr. Edward ad, please call your office.")

Quite apart from political correctness, there are real orthographical correctness problems with this plan. The learned will, needless to say, instantly know how the original spelling would look in all six cases, and they can teach their students the rules too. Unfortunately, however, you can't apply those rules unless you really know what word you are writing. Vowels and all. That is to say, the learned sort of bad transliteration only works if you are always writing fully vocalized text. (I suppose this is Wicked Orientalism again, really: the natives certainly ought to write all their vowels all the time like more normal people do.
But it may just be Western Academic Showing Off: "Doesn't everybody know all the vowels in all the words?")

In any case, erring with the learned here totally suppresses the fact that the most common spellings of these words in fact go like this in a purely mechanical rendition:

wzr!c   !myr   !m!r+   rwwf   q!el   !klhm

If you didn't know the words and the spelling-of-hamza rules, could you ever guess from that data and those transcriptions what the smooth breathing is supposed to signify? You'd just utterly confound alif and hamza, I think, and even then, you would be able to make no sense of /rwwf/ at all.

That /rwwf/ raʾūf is a nice shibboleth. The problem for a transliterator is to find some way of representing it that makes sense no matter how lightly or heavily pointed it is. Something like ElChipo's /rwwf / /rw'wf/ /raw'uwf/ perhaps?

Where were we? Oh yes, we had solved seventeen consonants and three diacritics, or rather, we had observed that they aren't problematical in the first place.

The only problem is that some people madly think that or ought to be used for alif. This is nonsense, of course, however correct paleographically. If <a> means alif, what they are going to write fatḥa with, I cannot imagine. Even if they were never going to write any short vowels at all, <a> for alif would seem pretty dreadful. The Phoenician epigraphers' smooth breathing would be much more satisfactory, since at least that doesn't pretend to be a vowel.

Can we use the smooth breathing to mean alif, then? I hardly think so. The same problem arises as with <a> for alif: if we use it that way, we cannot use it for something else with stronger claims, in this case hamza. In fully vocalized text, writing a long [a:] vowel as /a'/ would be very misleading, no matter what we wrote for the sequence alif-short a-hamza. Would you believe /sa'la/ [sa:la] "he poured" as opposed to /sa'ala/ [saʔala] "he asked" -- no matter what mark you use instead of <'> in the latter?
It seems clear that <'> has to be hamza just as <a> has to be fatḥa. And therefore alif must be something different from either. Like for instance <!>, which looks tolerably like its original.

The learned will dislike this plan, because another fact about the actual script that they want for some reason to suppress is how the long vowels are really written. This is not a doubtful matter: /a!/ and /iy/ and /uw/ when fully vocalized, and plain /!/, /y/, /w/ the other 99.44% of the time. The learned dearly want their macrons, although using them completely misrepresents the Semitic situation. If you grant them their <ā> and <ī> and <ū>, you've allowed them to stop transliterating and start stealthily practicing spelling reform once again. Long vowels in Arabic involve a fully credentialed extra letter thrust into the word, not a diacritic placed over something else that wasn't there in the first place.

I don't myself see why /ka!tib/ /maktuwb/ /taktubiyna/ are so horrendously ugly as compared to the obvious Latinizing grace and beauty of /kātib/ /maktūb/ /taktubīna/, and even if I did see it, I'd probably argue that it is a very salutary ugliness, one that reminds us that 1,000 Arabs in every 1,001 will think any representation of their language but the received one more or less grotesque. (Grotesque bordering on blasphemous, possibly.)

Of course I've got my prejudices too. For example, /kaatib/ /maktoob/ /taktubeena/ drive me up the wall, but I think that is because these spellings, at least the last two, make sense only in English and somehow therefore have the flavor of ain't and y'all and the like. In any event, they obviously have no more to do with the standard spelling than the Faculty Club version does.

So then, alif is <!> and hamza (above a letter) is <'> and fatḥa is <a>. Not negotiable. The long vowels are /a! - iy - uw/. Not negotiable.

That much settled, the biggest outstanding class is the velarized consonants, usually <ḍ>, <ḥ>, <ṣ>, <ṭ> and <ẓ>.
There is nothing wrong with writing them that way except that it requires very unusual fonts. But what other ways are available? Well, I can think of two. How about (1) D H S T Z or (2) dx hx sx tx zx, for a start?

The first is ElChipo's traditional ASCII way. It obviously entails that the transliteration will not have upper-case versus lower-case any more than the standard script does. ElChipo considers that a merit, but the rest of the world seems to find caPitAl letters in the midDle of words very alarming and unsightly.

Probably nobody will like the second proposal either, although if a good Muslim language like Albanian has , why shouldn't we allow ourselves ? Certainly that way the Latin-is-lovely crew can impose their first and second class distinction:

Dxaraba Hxasan.un sxadiyqana! (or even /sxadiykxana!/ if you must!)
!"in:a (!"In:a) sxadiyqana! dxarabahu Hxasan.un

or whatever. I attest that one can get used even to things like Przybyszewski, and after all, isn't quite as far out as .

It need not of course be /x/ that marks the spot under the velars of Academabic, BUT there is one major restriction on the choice of it that ought to be imposed, namely that it occur only as the second half of a digraph and have no independent meaning.

This important rule brings us to the great H problem.

ch dh gh kh sh th zh

seem pretty inevitable, but since unfortunately plain <h> is even more inevitable, they will undoubtedly have to go. If you think only of fully vocalized Arabic, the necessity is not altogether apparent, but if you think of reproducing exactly what the Arabs themselves write, there will be puzzles about how things divide up almost every third time a noun or verb takes a third-person suffix. The whole dash-H crew must walk the plank.

Of the alphabet that we haven't used, <c> and <e> and <o> and <v> would all be bizarre as digraph-makers, and <g> is bound to be used either for an obvious Arabic letter or an obvious Persian one. So it will have to be something extra-alphabetic.
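
The division puzzle behind the H problem can be made concrete with a small sketch. Everything in it is an illustrative assumption rather than part of ElChipo's own apparatus: the function name `segmentations`, the cut-down letter inventories, and the sample word /byth/ (bayt plus the third-person suffix -hu, "his house," written without vowels). It simply lists every way a string can be cut into units of a given inventory.

```python
# Plain single-letter units; a cut-down inventory is enough for the sketch.
PLAIN = set("!bdfghjklmnpqrstwyz")

# Hypothetical inventories: digraphs built with <h> versus digraphs built
# with a mark that has no meaning of its own (here the apostrophe).
H_SCHEME = PLAIN | {"ch", "dh", "gh", "kh", "sh", "th", "zh"}
APOSTROPHE_SCHEME = PLAIN | {"c'", "d'", "g'", "k'", "s'", "t'", "z'"}


def segmentations(text, inventory):
    """Return, sorted, every way of cutting text into units of inventory."""
    results = []

    def walk(pos, units):
        if pos == len(text):
            results.append(tuple(units))
            return
        for unit in inventory:
            if text.startswith(unit, pos):
                walk(pos + len(unit), units + [unit])

    walk(0, [])
    return sorted(results)


if __name__ == "__main__":
    # Two readings under the H scheme: b-y-t-h and b-y-th.
    print(segmentations("byth", H_SCHEME))
    # Only one reading once the digraph marker cannot stand alone.
    print(segmentations("byth", APOSTROPHE_SCHEME))
    print(segmentations("byt'h", APOSTROPHE_SCHEME))
```

With <h> as the digraph-maker the unvocalized string divides two ways; with a marker that never stands alone it divides only one way, which is the restriction argued for above.
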
My first thought is the underscore <_>, but it is already spoken for, once we agree that handling /tatxwi___y___l/ is not a luxury. A case might be made out for

c' d' g' k' s' t' z'

That's how the learned transfudge palatalization in Russian, and it doesn't conflict with the hamza meaning of <'>, since that requires only the unambiguity of

!' h' w' e'

On the whole, I think it will do.

We haven't agreed that <e> is the /!'alif maqsxuwra+/ yet, but the choice is pretty inevitable. Writing /banae/ "he built" looks pretty good and is probably sound as historical linguistics to boot.

I've already used <+> for the /ta!c marbuwtxa+/ several times, and I think it will do. It looks quite like a <t>, and if it is used to write mathematics, it can have spaces on either side of it. (Items that want to be written with a <t> are the worst problem: /ta!c t'a!c txa!c ta!c marbuwtxa+/. And beyond that, Urdu looms....)

So where are we? The basic alphabet becomes

! b t t' p j c' hx k' d d' r z z' s s' sx dx tx zx (ʿayn) g' f q
k g l m n h w y

with the extensions

+ e !' w' e'

&ayn may as well be <&>. It is far too important a sound to be represented by that negligible grave accent.

That leaves us with three problems about hamza before we get to the vocalization:

hamza alone <c>
hamza below <">
madda <@>

Those are the traditional ElChipo values, picked mainly for mnemonic reasons. <@> looks like a big sort of <a>; you seen one quote mark, you seen 'em all; <c> looks much like a hamza. <c> might arguably do better for Persian /c'iym/. Unfortunately we cannot possibly use <'> for isolated hamza, no matter whether the apostrophe is used for the H-digraph crew or not, because <!'> cannot be permitted to mean both alif with a hamza on top and alif followed by a hamza on the line, both of which occur often enough. A case might be made for <`> (that semi-invisible grave accent) as the hamza-below marker. But see below...

Oops. Before we get to vocalization, there is /s'ad:a+/, which has nothing whatsoever to do with vowels.
As you can see, ElChipo writes a colon, which is used as a length marker by linguisticians. This is not 100% satisfactory, though, because colon used as punctuation invites confusion. ("Why not just double the consonant?," Dr. Pedant is likely to ask. Not a hard question. Because we want to be able to write unvocalized Arabic without ambiguity, we who lack Dr. P.'s knowledge of all the vowels in all the words in /t!j !l&rws/. Case closed.)

As to the vowels, ElChipo's A/I/U for nunnation will have to go, since we are to distinguish upper- and lower case; do not appeal. Trigraphs will do the trick nicely: <.an> <.in> <.un> could hardly be anything else. It probably doesn't matter too much where the alif of the accusative comes, / !'ahl.an! wasahl!.an /, as long as the three musketeers always band together (and as long as our users don't take to sticking periods into the middle of words for purposes of their own.)

The "dagger alif" is traditionally <^> with ElChipo, and why change it? Doth it not direct thine eye and mind to Him Above? If it means exponentiation, put spaces on both sides of it.

And now the rest is silence, or rather /sukuwn/ and /wasxla+/. El Chipo's traditional <#> has got to go. Even I admit as much. Since I've remarked how negligible that grave accent is, let us use it for this and the double quote for inferior hamza. ("Why not leave sukūn out?", asks Dr. P. Not a hard question. We'll leave it out whenever the writers of what we transcribe left it out, and put it in whenever they put it in. Let Dr. P. go ask them why they spell as they do.)

So finally, /wasx`la+/. It's tempting to recycle that <`> here too, since an alif with a sukuwn on it (without a hamza in between) would be a curiosity. Certainly the traditional ECdS <~> is, like <#> for /sukuwn/, far too prominent visually for what it means linguistically. (Perhaps <~> might do for the alif+waṣla combination, though, and plain <@> for the alif+madda one.
Unlike hamza, these marks co-occur with one proper letter only.)

=====

Reconsidering a week after writing the above and looking at some whole paragraphs of the consequences, I have the following to add:

(1) <&> for ʿayn is very problematical. Like that pestiferous <#>, it is simply too big and bold for everything else. ECdS has always allowed it to be optionally written as /o/, and that would on balance be better, I think. It has to be balanced against the reluctance of every Latin-script language I know of to use <o> as anything but a vowel. But if not /o/, then what? /c/ might do, but then what becomes of hamza sola? <v> is also available, but recognizing the second caliph when dressed up as /Vumaru !`b`nu l`K'atx:a!bi/ would be a challenge. Whereas if he turns up as Oumaru, one could assume only a weekend in Paris.... Still, they did that sort of thing with Chinese, did they not? Adopt letter values that utterly thumb their noses at what native markers expect, I mean.

(2) The trigraphs for nunnation are again too much claptrap for what they mean. Besides, those internal periods would prevent your average $1399.95 productivity suite software from counting the number of sentences correctly. An unthinkable interference with our betters. Perhaps we could import from Brazil and write digraphs /a~ i~ u~ /? I believe I may try that.

(3) Of the systematic outrages, / c' d' g' s' t' z' / strike me as much less outrageous than / dx hx sx tx zx /. Possibly it is again simply a matter of how much ink is used. But my reactions may be untypical, since I am entirely used to using /x/ for and so tend to see a whole imaginary Arabic letter in such cases in a way normal people would not.

(4) I have realized that admitting upper case and digraphs effectively removes half the point of the original ECdS scheme. Those marks were not only supposed to correspond strictly to authentic Arab ones, they were engraved on the keys which you hit to generate the Arab marks on the screen.
(5) The real lesson of this scribble may be that what is proposed is not sensibly proposed in the first place. That is, there may be no way of representing Arabic in Latin letters (with or without strange diacritics) that will do in general. What is suitable for fully vocalized Arabic looks ill when used to write the language as the Arabs themselves write it, and any direct representation of the latter is likely to suggest the U CN LRN 2 TK SHRTHND & MK BG MNY sort of thing on the inside of matchbook covers.

El Chipo de Silicio -- Normal W20 -- filename NEWXLIT.W20