[html4all] pronunciation, homophones, and homographs

Sun May 25 16:09:03 PDT 2008

Hi Philip,

On May 25, 2008, at 9:58 PM, Philip TAYLOR (Ret'd) wrote:

>
>
> Robert J Burns wrote:
>> Hello 4all,
>>
>> [apologies for the long email]
>>
>> I've been doing some thinking about aural pronunciation of HTML and
>> homophones and homographs for some time now. [long snip]
>
> An interesting and provocative article, Rob : congratulations !
> I'll bounce some of your ideas off some of my non-native-speaker
> (of English) linguist friends and feed back any comments they
> may have.

Thanks for the reply. That's certainly a point of view I'd like to
hear from. I suppose in many ways English has a greater need for a
separate phonetic alphabet than other languages, but when someone
wants to encode phonemes from languages other than their primary
language, it makes sense to me that they would want to use graphemes
derived from their own familiar primary language. One example I can
think of is the use of the Latin letter H for the voiceless glottal
fricative, which matches how that letter is often used in English.
However, for someone whose primary language is Spanish, the Latin
letter J may make a better mnemonic. The same goes for speakers of
Arabic, Urdu, Korean, Hebrew, etc. Using appropriate mnemonic
graphemes for each language makes a phonetic alphabet easier to use
and more likely to be widely adopted.

Another important issue is that phonetic alphabets change over time,
sometimes swapping one grapheme for another in the representation of a
particular phoneme. By encoding the phonemes themselves (and not the
graphemes representing the phonemes), changes to a phonetic alphabet
can be handled by updating fonts while leaving the text document
completely unchanged. Similarly, a user can switch from one phonetic
alphabet to another simply by changing fonts (for example, from the
IPA to the Uralic Phonetic Alphabet). Also, input systems can be
localized to the user's needs, so that a user may use their usual
keyboard, or a character input palette depicting the graphemes familiar
from their primary language, even while the input system is actually
inputting phoneme characters and not grapheme characters.
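To make the input side concrete, here is a minimal Python sketch. It
is purely hypothetical: Unicode assigns no phoneme characters, so the
PHONEME_H and PHONEME_A code points below are borrowed from a private
use area, and the LAYOUTS tables are invented keyboard mappings. The
point is only that an English typist pressing "h" and a Spanish typist
pressing "j" would produce the same stored character.

```python
# Hypothetical: Unicode has no phoneme characters, so private-use code
# points (Supplementary Private Use Area-A) stand in for them here.
PHONEME_H = "\U000F0000"  # voiceless glottal fricative
PHONEME_A = "\U000F0001"  # open front unrounded vowel

# Invented per-locale layouts: the user types a familiar letter, and the
# input system emits the phoneme code point, not the letter itself.
LAYOUTS = {
    "en": {"h": PHONEME_H, "a": PHONEME_A},
    "es": {"j": PHONEME_H, "a": PHONEME_A},
}

def type_phonemes(keys: str, locale: str) -> str:
    """Translate a sequence of keystrokes into phoneme characters."""
    layout = LAYOUTS[locale]
    # Keys with no phoneme assignment (e.g. space) are passed through.
    return "".join(layout.get(k, k) for k in keys)

english_input = type_phonemes("ha", "en")
spanish_input = type_phonemes("ja", "es")
print(english_input == spanish_input)  # True: same phonemes stored
```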

Finally, I think this could lead to better international interchange
of phoneme text. Every user can view a phoneme text document in the
phonetic alphabet they're most familiar with. When I open the document
on my computer, my user defaults lead the OS to associate an IPA font
with the phonemes, so I see IPA graphemes. However, someone else
opening the same document sees the graphemes of the phonetic alphabet
set in their own user defaults. Others simply hear synthesized speech
uttering the phonemes.
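The display side can be sketched the same way. Again this is only an
illustration under loud assumptions: the private-use code points stand
in for phoneme characters Unicode does not have, and each entry in the
hypothetical ALPHABETS table plays the role of a font. The stored
document is never rewritten; only the table selected by the user's
defaults changes.

```python
# Hypothetical: private-use code points stand in for phoneme characters.
PHONEME_H = "\U000F0000"  # voiceless glottal fricative
PHONEME_A = "\U000F0001"  # open front unrounded vowel

# Each "font" is modelled as a table from phoneme code point to grapheme.
# These tables are illustrative, not taken from any real font.
ALPHABETS = {
    "ipa": {PHONEME_H: "h", PHONEME_A: "a"},
    "spanish-mnemonic": {PHONEME_H: "j", PHONEME_A: "a"},
}

def render(phoneme_text: str, alphabet: str) -> str:
    """Display the same phoneme text using one alphabet's graphemes."""
    table = ALPHABETS[alphabet]
    # Characters without a mapping (spaces, punctuation) pass through.
    return "".join(table.get(ch, ch) for ch in phoneme_text)

document = PHONEME_H + PHONEME_A  # stored once, never changed
print(render(document, "ipa"))               # ha
print(render(document, "spanish-mnemonic"))  # ja
```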

As I said before, this is a departure for Unicode that I expect would
face some resistance. Unicode has, up until now, focussed exclusively
on encoding graphemes as characters, so its members might have trouble
even thinking of a character as a phoneme (and not a grapheme).
However, I think this is a natural evolution for Unicode, and since it
wouldn't need any of the precious Basic Multilingual Plane (BMP) code
points, it shouldn't be much of a burden to devote perhaps 512 code
points to phonemes out of the roughly 800 thousand code points still
available for assignment.

Take care,
Rob