[html4all] Linguistic variants in a single document ?

Robert Burns rob at robburns.com
Tue Sep 25 16:38:47 PDT 2007


Hi Gez,

On Sep 25, 2007, at 6:11 PM, Gez Lemon wrote:

> Hi Rob,
>
> On 25/09/2007, Robert Burns <rob at robburns.com> wrote:
>> I don't think this requires any additional markup. This is something
>> that could be handled nicely by a browser without any other markup.
>> Especially at the word level it would simply involve substituting
>> British spellings for US spellings or vice versa. For some of the
>> phrases Phil cited, that might be a bit more complicated, but still
>> possible with no markup. A simple checkbox to indicate the user
>> wanted spellings in the preferred language would be the only thing
>> necessary in terms of user interaction.
>
> I can see how that might work at the word level, but the biggest issue
> with catch-all approaches is that they tend to make nonsense of
> edge-cases. For example, if a web-page was to state something simple
> like, "Americans spell colour as color", the resulting text for an
> American would be, "Americans spell color as color", and for a Brit it
> would be, "Americans spell colour as colour"; neither sentence makes
> sense.

That was certainly a use case I had not considered. However, I don't
think it's necessarily so unusual as to require new markup
facilities. The translation/transliteration between GB and US could
be applied only at the document level. Other language declarations at
the phrase level or the paragraph level should continue to be
respected. Quotations, whether at the block level or the phrase
level, should probably be respected as well. So in the use case you
cite above an American might see the sentence as:

"Americans spell color as color"

while a British user would see:

"Americans spell colour as color"

While the former may be a little strange for an American to read,
presumably it would be clear from the context. Also, an author could
mark up the words color and colour explicitly, as
<span xml:lang='en-US'>color</span> and
<span xml:lang='en-GB'>colour</span>, and the sentence would display
fine. Again, that wouldn't require any additional markup facilities,
but it might help to define something in a spec. I would imagine this
is a pretty rare use case, though.
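
As a rough sketch of the behaviour described above (not anything a
real browser does), here is a small Python filter that swaps spellings
in a fragment's text but leaves alone anything inside an element with
an explicit lang or xml:lang attribute, or inside q or blockquote. The
word table, the class name SpellingLocalizer and the function
localize are all hypothetical, and the fragment is assumed to sit
below the element that carries the document-level language.

```python
import re
from html.parser import HTMLParser

# Tiny illustrative word list (an assumption); a real one would be far larger.
GB_TO_US = {"colour": "color", "centre": "center", "organise": "organize"}
US_TO_GB = {us: gb for gb, us in GB_TO_US.items()}

QUOTE_TAGS = {"q", "blockquote"}          # quotations are left untouched
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}  # never get an end tag


class SpellingLocalizer(HTMLParser):
    """Re-emits a fragment, swapping spellings only in unprotected text."""

    def __init__(self, table):
        # Keep entities as written so the output stays valid markup.
        super().__init__(convert_charrefs=False)
        self.table = table
        words = "|".join(map(re.escape, table))
        self.pattern = re.compile(r"\b(" + words + r")\b", re.IGNORECASE)
        self.protected = [False]  # one flag per open element
        self.out = []

    def _swap(self, match):
        word = match.group(0)
        new = self.table[word.lower()]
        return new.capitalize() if word[0].isupper() else new

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())
        if tag in VOID_TAGS:
            return
        has_lang = any(name in ("lang", "xml:lang") for name, _ in attrs)
        # Protection is inherited by everything nested inside.
        self.protected.append(
            self.protected[-1] or has_lang or tag in QUOTE_TAGS
        )

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")
        if tag not in VOID_TAGS and len(self.protected) > 1:
            self.protected.pop()

    def handle_data(self, data):
        if self.protected[-1]:
            self.out.append(data)
        else:
            self.out.append(self.pattern.sub(self._swap, data))

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def localize(fragment, table):
    parser = SpellingLocalizer(table)
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)
```

Calling localize(fragment, GB_TO_US) gives the US rendering and
localize(fragment, US_TO_GB) the British one. The sketch ignores
inflected forms (colours), comments, and badly nested markup, and it
shows the same weakness Gez points out: a sentence about spelling
with no lang markup and no quotation element still gets rewritten.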

> It's difficult to determine intent without a directive to be sure. You
> could always use a directive that instructs the user agent to leave
> the phrase alone, which might help for some edge-cases, but you're
> still back to depending on markup for understanding. Ignoring markup
> completely is similar to the idea that a machine can scan an image and
> come up with appropriate alternate text in the absence of
> author-provided alternate text. A machine could probably recognise key
> features from an image, but it's unrealistic to think that a machine
> could reliably come up with something concise that makes sense of the
> image in the context the author wanted to impart; only an author can
> provide that level of information.

I agree here. Obviously the use of images is much more common than
the use of language discussing various spellings in different regions
of the world. I'm not sure why calls for heuristics are ever aimed at
consumer UAs rather than only at authoring UAs. Once the heuristics
have been proven over a few decades in authoring UAs we might want to
talk about moving them to consumer UAs, but we shouldn't be dropping
things from the HTML language because we think those heuristic
methods will materialize somehow.

Take care,
Rob



