[html4all] Content-type negotiation as an alternative to ALT, LONGDESC and fallback
Leif Halvard Silli
lhs at malform.no
Mon Aug 27 17:51:42 PDT 2007
Hi Rob, [a long letter again]
2007-08-27 01:38:15 +0200 Robert Burns:
>> Could you expand on what you mean by «within HTML»? And do you see this
>> as a method for bridging the online/offline gap for authors?
> [...] but the file format itself should be treated as authoritative
> (this contrasts with the HTTP specification that wants to treat
> whatever the server says as authoritative).
So, if the content says "XHTML", you could not serve it as text by setting the extension to .txt?
> So in moving alternate equivalent fallback handling from within HTML markup
> (using elements and attributes) to the HTTP server through content
> negotiation is a complication (perhaps needless complexity). So my
> inclination is to try to make everything work with a single file (with no
> server overhead).
I suppose you want to link, via elements and attributes, to the alternate content. And not keep it inside the file.
> Then the server can provide header information and content
> negotiation that reduces network traffic, but it shouldn't be treated as
Above you said the file extension should not be authoritative - for the server, I guess. Here you say that the header info from the server should not be authoritative - by the UA, then?
> 'Consider this example.
> 1) [...] the content declares its type [for the server] [...]
> 2) [...] browser [...] users environment to negotiate [...]
> Using server headers in this situation is very efficient. It allows the UA
> to query the server to get at the exact files it needs. If it could not
> perform these queries through content negotiation, the UA would need to
> discover and download one file after the other to see what files it needed
> to satisfy the user's needs. That leads to much more network traffic.
Is the only difference - in theory - from today's situation that the server does not look at the extensions, but at the head section of the files instead?
Possible cons: How do you preserve cool URIs in this situation? URIs which are just as cool whichever version of the content the user is served?
I ask because: Since we both know Mac OS 9, we know about free-form file names there. But this freedom comes at a cost. The user must invent those extensions instead. Of course, the user/author can choose to follow those conventions anyhow, but if there is no benefit other than the author's private sense of order, then we will end up having "book.mp3" as alternative content for "article.mov" - instead of "article.mp3" for "article.mov". E.g. when the users/readers look at the filename, they will wonder if it is really meant to be the same thing. The same goes if "book.ru" is supposed to be the replacement for "article.en".
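A minimal sketch of the server-side negotiation discussed here, assuming variants share one base name and differ only in a language extension ("article.en", "article.ru"). The function names and the default are hypothetical, and the header parsing is simplified (no whitespace around "q=", no wildcard handling):

```python
def parse_accept_language(header):
    """Return the language tags of an Accept-Language header, best first."""
    ranked = []
    for position, item in enumerate(header.split(",")):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        quality = 1.0  # a missing q parameter means q=1
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if quality > 0:  # q=0 means "not acceptable"
            ranked.append((-quality, position, tag.strip().lower()))
    return [tag for _, _, tag in sorted(ranked)]


def choose_variant(base_name, available, header, default):
    """Pick e.g. 'article.ru' from the files that share one base name."""
    for tag in parse_accept_language(header):
        primary = tag.split("-")[0]
        for candidate in (tag, primary):
            name = f"{base_name}.{candidate}"
            if name in available:
                return name
    return default


files = {"article.en", "article.ru"}
print(choose_variant("article", files, "nn, ru;q=0.8, en;q=0.5", "article.en"))
```

With this naming convention the URI can stay "article" whichever variant is served; with "book.ru" as the alternative for "article.en" no such lookup is possible.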
> So to me the server's ability to negotiate content and provide informative
> headers is great. However, the insistence that these headers must be
> authoritative is a problem because it's easy to misconfigure a server or
As a consequence of the headers not needing to be authoritative, what would we get? The headers say this file is Russian, but your browser knows better? How? When it has loaded the file?
> otherwise make mistakes in doing so. Also for an author that specifically
> wants a file treated as another type (like treating an HTML file as raw
> text), such authors may not have control over the server or even know where
> the document will be hosted. To me this suggests it is better, in
> handling the separation of concerns properly, to place as much as
> possible within the document itself. The XML norm for an XML declaration
> that includes character encoding is one example. The existence — in
> formats such as PDF and PNG and many others — of a few bytes at the
> beginning of the binary file that show up as 'pdf' or 'png' is another
Forgetting the content negotiation issues for a while (they mostly aren't so relevant when it comes to the choice between encodings): So the server picks this from the file. Good, mostly, except when the author wants to overrule this info. (For many file formats, the author has no practical way to edit this information - unless a file extension method is available, which is probably one of the reasons extensions are usually preferred.)
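A minimal sketch of what "the server picks this from the file" could look like, assuming only the three cases mentioned in the quoted text: the PNG signature, the PDF header and an XML declaration carrying an encoding. The function name and the returned media types are illustrative choices, not anything proposed in the thread:

```python
import re

# Leading bytes that identify the format from within the file itself.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"%PDF-", "application/pdf"),
]

# An XML declaration such as <?xml version="1.0" encoding="UTF-8"?>
XML_DECLARATION = re.compile(
    rb"""<\?xml[^>]*encoding\s*=\s*["']([A-Za-z0-9._-]+)["']"""
)


def sniff(data):
    """Guess (media type, declared encoding) from the first bytes of a file."""
    for magic, media_type in SIGNATURES:
        if data.startswith(magic):
            return media_type, None
    match = XML_DECLARATION.match(data)
    if match:
        return "application/xml", match.group(1).decode("ascii")
    return None, None


print(sniff(b'<?xml version="1.0" encoding="UTF-8"?><html/>'))
print(sniff(b"%PDF-1.4\n"))
```

Note that the sketch has no way for an author to overrule the result, which is the objection raised above: renaming the file to ".txt" changes nothing that `sniff` looks at.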
> Using the file format itself to specify its own metadata is a much safer
> path. It ensures that there's no separation between what a file is and the
> metadata that describes it.
The «safer» argument is only true if the author has difficulties in affecting this info. It is not difficult to provide wrong charset info in the META element. Its syntax is also difficult to remember - unlike the extensions. And why is safer an issue? Perhaps simple (to change) is more important than safe (from changing)? Editors/Authors want it to be simple to edit.
The «freedom» of iTunes is that the name is meaningless, except to the machine itself - which only needs it for identifying the item as an independent «thing».
> For example, if you use iTunes you can actually
> move all of your files from one computer to another. This can be through a
> path that has no server or filesystem metadata. You can rename the files to
> pure gibberish. However, when you drag those files on iTunes, every piece of
> metadata about the file will be added to the new iTunes library (this
> doesn't count the filename itself which is usually not meaningful and
> unintelligible; it also doesn't include the playlists and perhaps album
> art). This is how I think it should be. The file is the authoritative source
> of the file's metadata. Any extraction of that metadata, including the
> setting of the filename extension to '.ac4', should be derivative.
Well, the meta info about the charset/encoding is derivative whether it is given here or there. The same goes for other meta info. If iTunes is a pattern, then I suppose that you are out of luck if you change the extension from .ac4 to .mp3 or to .txt - it will then be read the wrong way. At least outside iTunes.
> So just to sum up what I'm saying. 1) File formats should include the
> authoritative metadata about themselves. 2) Transport protocols play a
> valuable role in reducing the amount of network traffic needed to
> communicate the essential pieces of metadata about a file.
> So what does this say about the server / local distinction. It says to me
> that whenever we can get data about the file within the file itself and not
> rely on a separate transport protocol, that's best. Whenever HTML can
> include the mechanisms and features to author a document that includes
> various fallback for accessibility and internationalization and
> bitmapped-based media resolution and bitdepth, that's better than relying on
> a separate transport protocol (the bitmapped-based media resolution may be a
> bit of an exception since, unlike the others, this is intended mainly to
> reduce unnecessary network traffic).
I understand that for a local file, the UA could load the entire page, and just render those parts of it that fit with the UA's user profile. But what if your UA has to load both a text file and an audio file, etc., from the server?
Oh, btw, even today, the LINK element can probably be used for some of this:
> Designates substitute versions for the document
> in which the link occurs. When used together with the
> lang attribute, it implies a translated version of the
> document. When used together with the media attribute, it
> implies a version designed for a different medium (or media).
But LINK is more meant as «LONGDESC»: If your UA supports it, you can select an alternate variant manually.
I cannot see that real content negotiation has to rely on content headers from the server.
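A minimal sketch of how a UA could choose among LINK alternates from the markup alone, without any server headers. It assumes the alternates are marked with rel="alternate" and carry their language in either hreflang or lang; the file names and the helper names are hypothetical:

```python
from html.parser import HTMLParser


class AlternateLinks(HTMLParser):
    """Collect <link rel="alternate"> elements as attribute dictionaries."""

    def __init__(self):
        super().__init__()
        self.alternates = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "alternate" in rel:
            self.alternates.append(attributes)


def pick_alternate(markup, preferred_langs):
    """Return the href of the first alternate matching the user's languages."""
    parser = AlternateLinks()
    parser.feed(markup)
    for lang in preferred_langs:
        for link in parser.alternates:
            declared = link.get("hreflang") or link.get("lang") or ""
            if declared.lower() == lang.lower():
                return link.get("href")
    return None


markup = """
<link rel="alternate" lang="ru" href="article.ru.html">
<link rel="alternate" hreflang="nn" href="article.nn.html">
<link rel="stylesheet" href="style.css">
"""
print(pick_alternate(markup, ["nn", "ru"]))
```

The cost noted above remains: the UA has to fetch and parse the first document before it learns that a better variant exists.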
> Now that doesn't mean it all has to be handled through a single flat file
> all through markup, but that can be nice and convenient. The other
> alternative is to specify a directory format that mimics the server
> content-negotiation process. By directory format I'm thinking along the
> lines of Apple's directory package where an entire directory can appear as a
> single file in the filesystem and its contents are stringently defined by
> some specification (it doesn't have to appear as a single file however;
This sounds somewhat realistic - even quite a good idea. But to me it seems somehow the opposite of what you said above. The language project files/folders (or «localisation project files/folders»?) are well known to me. And the localisation files there need, as you say, to be placed within stringently named language folders.
A «Site folder standard» could be defined. And as long as you worked with this pattern, you could author a self-contained site that could probably have content negotiation. Sounds great :-)
> OS X presents .framework directories as if they're folders). Also the
> WebArchive format is a flat file that can really represent a complex
> hierarchy of files. So while these particular methods are a bit beyond the
> scope of our WG (except in specifying how UAs should treat these directories
> or file formats), we could specify fairly complete methods of authoring
> these things right in the HTML markup itself (using elements and attributes
> alone). So when I say within HTML itself, I'm mainly concerned with being
> able to <em>author</em> content with elements and attributes alone.
> Obviously we would have to specify how UAs handle that authored content,
> which we would be doing to specify how they handle a particular directory
> structure with filename extensions, but my preference is to do it with
> markup alone as much as possible.
>> Even with that link to the preferences, it is inconvenient to change the
>> preferred language *for the entire system* just because you are gonna test
>> the preferred language functionality of your pages. I'd say that that
>> isn't a very workable solution at all.
> I somewhat understand what you're saying, but I don't think that's a huge
> problem. As long as Safari picks up on the changes immediately on a reload
> testing should be easy (I haven't tested that though). As long as the author
> switches it back before launching another application, it will not affect
> any other applications.
It is mostly a (severe) inconvenience. There is little that hinders them from making it just as easy to switch language as e.g. UA identity.
>> In addition, the OS X language preferences system is not perfect.
>> (Speaking for my own sick mother, they have not built in support for more
>> than one Norwegian language variant - while we have two official
>> standards. Whereas Firefox lets me select language freely - and offers many
>> more languages, I think - and also the option of setting up my very own
>> language code.)
> This does sound like a bigger problem. Users cannot add fine-grained
> language preferences in the system preferences. I'd like to see that fixed
> for the whole system. My sense is that Apple only listed languages that it
> tries to provide, forgetting that other developers (and HTML authors, for
> example) might provide others.
I have reported a bug about it ... I should do some follow-up on it ...
And take note, Charles, if you read this. Opera supports both forms of Norwegian - on the Windows platform at least. But on the Mac, one cannot even get it to prefer Norwegian Nynorsk. Because of the bug mentioned above.
>> Think of how fast you can change to IE identity in Opera, for instance! It
>> should be that easy to regulate the entire "UA profile".
> So you're thinking here about a feature for authoring and testing then,
Although some aspects of it could be useful for normal users as well, probably. For instance, if you can change your browser to prefer audio over text over image etc., then many AT users would need that.
>>> Like you suggested Leif, this could be in the form of listing
>>> different media types for sub-resources: primarily just ranking
>>> video, audio, and text, I think
> Of course the default would be more like: video, image, audio, text.
Probably. For sighted ;-)
>> Should that list not be taken from the Media Descriptor list, which
>> has screen, tty, tv, projection, handheld, print, braille, aural? Hm ...
>> perhaps not.
>>  <http://www.w3.org/TR/html401/types.html#h-6.13>
> My sense is that this should be more focussed on MIME types and mostly on
> the primary MIME types video, audio, text and image (which I forgot to list
> last time).
I think you are right.
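A minimal sketch of such a ranking, assuming each alternative is known by its MIME type and that only the primary type (video, image, audio, text) is compared. The function name is hypothetical; the file names reuse the "article.mov" / "article.mp3" example from earlier in this message, plus an assumed "article.html":

```python
def pick_by_media_preference(alternatives, ranking):
    """Pick the (mime_type, url) pair whose primary type ranks best.

    alternatives: list of (mime_type, url) pairs
    ranking: primary MIME types, most preferred first
    """
    def rank(entry):
        primary = entry[0].split("/", 1)[0].lower()
        # Types the user did not rank at all sort after every ranked type.
        return ranking.index(primary) if primary in ranking else len(ranking)

    return min(alternatives, key=rank) if alternatives else None


alternatives = [
    ("video/quicktime", "article.mov"),
    ("audio/mpeg", "article.mp3"),
    ("text/html", "article.html"),
]
sighted_default = ["video", "image", "audio", "text"]
audio_first = ["audio", "text", "image", "video"]

print(pick_by_media_preference(alternatives, sighted_default))
print(pick_by_media_preference(alternatives, audio_first))
```

Changing one list in the UA profile is all it takes to switch from the "sighted" default to an audio-first order.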
>>> I think the iCab approach to solving this is interesting. Extending
>>> that to actually select the appropriate file from among
>>> several files in the same local directory would not be out of the
>>> question. In this way, typing an URL into the browser and leaving
>>> off parts of the filename extension could still select the correct
>>> file based on user preferences (and similar for files referenced
>>> within the main document).
> It could even be extended to the OS level where clicking on a file that had
> a readily available equivalent with a different language the user preferred
> could be presented too. Or double-clicking on an image file and the user is
> notified that an equivalent aural description is available (based on
> filename extensions or other metadata).
You mean just like one gets the application in the language that corresponds to the preferred languages, I suppose. Getting a choice all the time would be inconvenient. If the right choice isn't automatically «served», then one would have to either reconfigure one's preferences or get access to another language from within the application.
>> However, [a Mac OS X Web Archive] only solves the problem of saving
>> something from online to offline. Editing is more what we are
> No, actually a Mac OS X Web Archive can be used for authoring too. It can
> be opened and saved with any WebKit based authoring tool.
OK. Did not know this. Neither Mail nor Safari are my fav tools.
>> Again, not sure how you define «within HTML markup». Do you mean simply
>> «as text within the HTML document itself»? Or do you mean an attribute
>> that tells which version are available?
> It's probably clearer after all of my exposition above, but I do mean
> including as much of the alternatives in a single document: whether that's a
> web archive or XML or HTML.
Yes and no. If we really want to have it «within the document» we have to go for data URLs. And serving Zip files (which web archives often are) is a new thing for me.
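A minimal sketch of what «within the document» means with a data URL: the bytes of the alternate resource are base64-encoded and placed directly in the attribute value. The helper name, the payload and the alt text are placeholders:

```python
import base64


def to_data_url(payload, media_type):
    """Encode raw bytes as a base64 data: URL."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{media_type};base64,{encoded}"


url = to_data_url(b"hello", "text/plain")
print(url)

# The resource now travels inside the markup, with no separate file to lose.
print(f'<img alt="placeholder" src="{to_data_url(b"...", "image/png")}">')
```

Base64 makes the embedded data roughly a third larger than the original bytes, and every alternative is downloaded whether or not the UA needs it.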
> This just simplifies matters by keeping the
> tightly related content in a single file. Likewise a directory structure
> could be defined (including one based on filename extensions), but
> that makes things more complicated.
As noted, the directory structure is what I could believe in.
> The opposite approach could make the
> single files easy for authors to work with and have the server extract the
> necessary piece or pieces to send over the wire to the UA.
Well, it could be an idea.
>> I understand the problem you want to solve as «how to enable content
>> negotiation without relying on the server». Or is it more specifically «how
>> to solve this problem just by markup»?
I now see that you offered both variants above ...
>> Also, I think of our debate in
>> Public HTML, then you mentioned charset sniffing (by the servers - and
>> ultimately also by the browser) as one road to take.
>> Something that «just works» is of course an expert's greatest fear -
>> he isn't needed anymore. But I really wonder about how you can specify
>> alternate content within the markup itself.
> Well, I think the proposal Sander made or something like that could work
> (even for alternate languages). I think the directory approach could work
> too, but it's more complicated for authors (files can get separated and
> misplaced). A Web Archive like approach could also work if HTTP servers
> could be made aware of the alternates within the same file and extract them
> for delivery to the client. In this way we would change both the local to
> be more server like and the server to be more local like.
I was not prepared for Sander's - or your - ideas. :-) (But I also had my own idea to present). Now I understand them somewhat better.
> : <http://esw.w3.org/topic/HTML/ABetterAlt>
I guess this was a leftover reference from something else you wrote ;-)