[html4all] Content-type negotiation as an alternative to ALT, LONGDESC and fallback
rob at robburns.com
Sun Aug 26 16:38:15 PDT 2007
[sorry for the long post; for those uninterested please just delete
(it's on the archives anyway):-)]
On Aug 26, 2007, at 4:52 PM, Leif Halvard Silli wrote:
> 2007-08-26 17:46:32 +0200 Robert Burns:
>> [Re: Content-type negotiation as an alternative to ALT, LONGDESC and
>> fallback; UA gathering of user preferences; bridging the HTTP
>> server / local
>> file system divide ]
>> I too was thinking this discussion was related to the HTTP
>> discussion going
>> on the WG list. One of the problems I see in both situations is
>> they both
>> try to rely on a particular transport protocol when I think HTML
>> should be
>> more independent of that. I think Phil's idea is interesting as a
>> measure more so than a permanent solution. As I said before, I
>> would want to
>> make sure there was a way within HTML markup alone to accomplish
> If we think of the profile attribute as 'within' HTML, then what I
> said should definitively be 'within' HTML. (Unless you meant
> 'within/directly in the HTML document' - such as ALT.) Currently,
> as I understand it, PROFILE only allows us to define «link types».
> It could perhaps be expanded to also include the src URLs etc. Well
> - perhaps this has nothing to do with "link types" and another
> kind of attribute/profile is needed. However, I think it would be
> natural to think about a "local" profile and an online profile.
> Could you expand on what you mean by «within HTML»? And do you see
> this as a method for bridging the online/offline gap for authors?
Well, in relation to Phil's suggestion regarding alternate equivalent
fallback content, I think handling it with HTTP content negotiation
is an ingenious idea. However, my inclination with this idea (just
like the discussion on public-html) is to try to avoid the extra
complexity that results from involving the server. As I've said
before, I believe the file format is the ideal place to keep
authoritative information about the files. The extraction of this
data for ready use in a file system or on a server (or for spotlight
searches) is very convenient, but the file format itself should be
treated as authoritative (this contrasts with the HTTP specification
that wants to treat whatever the server says as authoritative).
So moving alternate equivalent fallback handling from within HTML
markup (using elements and attributes) to the HTTP server through
content negotiation is a complication (perhaps needless complexity).
So my inclination is to try to make everything work with a single
file (with no server overhead). Then the server can provide header
information and content negotiation that reduces network traffic, but
it shouldn't be treated as authoritative.
Consider this example.
1) An author creates some content and places it on a server. All of
the content declares its type (and character encoding where relevant)
in the first few bytes of the file. These starting bytes are
the authoritative statement about the file's type and encoding.
2) A user enters an address into a browser. The browser uses
information about the user's environment to negotiate the content with
the server. This could include the language of the document, the
types of files; the resolution and bit-depth of bitmapped media; the
user's preference for video, audio or text (where relevant).
Using server headers in this situation is very efficient. It allows
the UA to query the server to get at the exact files it needs. If it
could not perform these queries through content negotiation, the UA
would need to discover and download one file after the other to see
what files it needed to satisfy the user's needs. That leads to much
more network traffic.
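To make that negotiation step concrete, here is a rough sketch (in Python, with invented file names) of how a server might rank an Accept-Language header's q-values and pick a variant. This is a toy illustration of the general idea, not how any particular server implements it.

```python
# Toy sketch of server-driven language negotiation. The variant file
# names ("page.en.html" etc.) are made up for illustration.

def parse_accept_language(header):
    """Return [(language_tag, quality)] sorted by descending quality."""
    prefs = []
    for part in header.split(","):
        piece = part.strip()
        if not piece:
            continue
        if ";q=" in piece:
            tag, q = piece.split(";q=", 1)
            prefs.append((tag.strip(), float(q)))
        else:
            prefs.append((piece, 1.0))  # no q-value means quality 1.0
    return sorted(prefs, key=lambda p: p[1], reverse=True)

def negotiate(header, available):
    """Pick the first available variant in the user's preference order."""
    for tag, _q in parse_accept_language(header):
        if tag in available:
            return available[tag]
    return None

variants = {"en": "page.en.html", "nn": "page.nn.html", "nb": "page.nb.html"}
print(negotiate("nn;q=0.9, nb;q=0.8, en;q=0.5", variants))  # page.nn.html
```

The point is that one round trip of headers can stand in for downloading each candidate file just to inspect it.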
So to me the server's ability to negotiate content and provide
informative headers is great. However, the insistence that these
headers must be authoritative is a problem because it's easy to
misconfigure a server or otherwise make mistakes in doing so. Also, an
author who specifically wants a file treated as another type
(like treating an HTML file as raw text) may not have
control over the server or even know where the document will be
hosted. To me this suggests it is better — in handling the
separation of concerns properly — to place as much as possible
within the document itself. The XML norm for an XML declaration that
includes character encoding is one example. The existence — in
formats such as PDF and PNG and many others — of a few bytes at the
beginning of the binary file that show up as 'pdf' or 'png' is another.
Using the file format itself to specify its own metadata is a much
safer path. It ensures that there's no separation between what a file
is and the metadata that describes it. For example, if you use iTunes
you can actually move all of your files from one computer to another.
This can be through a path that has no server or filesystem metadata.
You can rename the files to pure gibberish. However, when you drag
those files onto iTunes, every piece of metadata about the files will be
added to the new iTunes library (this doesn't count the filename
itself, which is usually meaningless and unintelligible anyway; it also
doesn't include the playlists and perhaps album art). This is how I
think it should be. The file is the authoritative source of the file's
metadata. Any extraction of that metadata — including the setting of
the filename extension to '.aac' — should be derivative.
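As a small illustration of treating the file itself as authoritative, a UA could sniff the leading bytes of a file rather than trusting a transport header or a filename extension. The two signatures below (PNG and PDF) are real magic numbers; everything else about the sketch is simplified.

```python
# Minimal sketch: identify a file's type from its first bytes (its
# "magic number") instead of trusting an extension or an HTTP header.
# Only two formats are covered, for illustration.

MAGIC = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"%PDF": "application/pdf",
}

def sniff_type(data):
    """Return the MIME type implied by the file's leading bytes, or None."""
    for signature, mime in MAGIC.items():
        if data.startswith(signature):
            return mime
    return None

print(sniff_type(b"%PDF-1.4 ..."))  # application/pdf
```

Renaming such a file changes nothing the sniffer sees, which is exactly the property being argued for here.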
So just to sum up what I'm saying: 1) File formats should include the
authoritative metadata about themselves. 2) Transport protocols play
a valuable role in reducing the amount of network traffic needed to
communicate the essential pieces of metadata about a file.
So what does this say about the server / local distinction? It says
to me that whenever we can get data about the file within the file
itself and not rely on a separate transport protocol, that's best.
Whenever HTML can include the mechanisms and features to author a
document that includes various fallback for accessibility and
internationalization and bitmapped-based media resolution and
bitdepth, that's better than relying on a separate transport protocol
(the bitmapped-based media resolution may be a bit of an exception
since, unlike the others, this is intended mainly to reduce
unnecessary network traffic).
Now that doesn't mean it all has to be handled through a single flat
file all through markup, but that can be nice and convenient. The
other alternative is to specify a directory format that mimics the
server content-negotiation process. By directory format I'm thinking
along the lines of Apple's directory packages, where an entire
directory can appear as a single file in the filesystem and its
contents are stringently defined by some specification (it doesn't
have to appear as a single file, however; Mac OS X presents
.framework directories as if they're folders). Also the WebArchive
format is a flat file that can really represent a complex hierarchy
of files. So while these particular methods are a bit beyond the
scope of our WG (except in specifying how UAs should treat these
directories or file formats), we could specify fairly complete
methods of authoring these things right in the HTML markup itself
(using elements and attributes alone). So when I say within HTML
itself, I'm mainly concerned with being able to <em>author</em>
content with elements and attributes alone. Obviously we would have
to specify how UAs handle that authored content, just as we would
have to specify how they handle a particular directory structure
with filename extensions, but my preference is to do it with markup
alone as much as possible.
>> Having said that, I think you're right that we should include
>> more norms and
>> guidance for UAs to handle other aspects of content negotiation.
>> I feel
>> language is handled adequately in such browsers as Safari by
>> using the
>> system's language preferences. I'd be interested in hearing more
>> about the
>> problem as you see it, Leif, in doing that. It would be nice if
>> Safari made that
>> clear in its preferences with a button to take the user directly
>> to the
>> language preference pane.
> The problem as I see it, with the Safari/OS X solution, is related
> to authors/content producers - those that have to work with more
> than one language. (Well, the line between users and authors is
> blurred.) Even with that link to the preferences, it is inconvenient to
> change the preferred language *for the entire system* just because
> you are gonna test the preferred language functionality of your
> pages. I'd say that that isn't a very workable solution at all.
I somewhat understand what you're saying, but I don't think that's a
huge problem. As long as Safari picks up on the changes immediately
on a reload, testing should be easy (I haven't tested that though). As
long as the author switches it back before launching another
application, it will not affect any other applications.
> In addition, the OS X language preferences system is not perfect.
> (Speaking for my own sick mother, they have not built in support
> for more than one Norwegian language variant - while we have two
> official standards. Whereas Firefox lets me select language freely -
> and offers many more languages, I think - and also the option of
> setting up my very own language code.)
This does sound like a bigger problem. Users cannot add fine-grained
language preferences in the system preferences. I'd like to see that
fixed for the whole system. My sense is that Apple only listed
languages that it tries to provide, forgetting that other developers
(and HTML authors, for example) might provide others.
> Think of how fast you can change to IE identity in Opera, for
> instance! It should be that easy to regulate the entire "UA profile".
So you're thinking here about a feature for authoring and testing.
>> Other content negotiation would be good. With resolution
>> independence on the
>> horizon, I would really like to see servers and UAs adding
>> negotiation for bitmapped-based media. Even before resolution
>> independence comes to screen media, such negotiation would be useful for
>> providing a
>> screen display image separate from the image used to print the
>> document.
>> Phil's proposal would suggest the need for some sort of
>> equivalent media
>> negotiation. Like you suggested Leif, this could be in the form
>> of listing
>> different media types for sub-resources: primarily just ranking
>> audio and text, I think.
> We are now, of course, discussing something not only for IMG but
> over all :-)
But I was thinking that if an author ranked specifically for IMG:
audio, text, then they would get audio substituted for any IMG and if
no audio was available then they would get text. Of course the
default would be more like: video, image, audio, text.
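A rough sketch of that ranking idea, with a hypothetical list of equivalents each labeled by MIME type (the file names and the shape of the list are invented for illustration). The ranking ["audio", "text"] means: substitute audio for the image when available, otherwise fall back to text.

```python
# Sketch of ranked fallback by primary MIME type. The equivalents list
# and file names are hypothetical.

def pick_equivalent(ranking, equivalents):
    """Return the first equivalent whose primary MIME type is in the ranking."""
    by_type = {mime.split("/")[0]: url for mime, url in equivalents}
    for primary in ranking:
        if primary in by_type:
            return by_type[primary]
    return None

equivalents = [("image/png", "chart.png"),
               ("audio/mpeg", "chart-description.mp3"),
               ("text/html", "chart-description.html")]
print(pick_equivalent(["audio", "text"], equivalents))   # chart-description.mp3
print(pick_equivalent(["video", "image"], equivalents))  # chart.png
```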
> Should that list not be taken from the Media Descriptor list,
> which has screen, tty, tv, projection, handheld, print, braille,
> aural? Hm ... perhaps not.
>  <http://www.w3.org/TR/html401/types.html#h-6.13>
My sense is that this should be more focussed on MIME types and
mostly on the primary MIME types video, audio, text and image (which
I forgot to list last time).
>> Once this is instituted in servers and UAs, then the problem of
>> local /
>> server separation becomes larger.
> It does not :-) Well, as one person once characterised the cell
> phone: «It is just another channel where you can make yourself
> unreachable.» So this new thing would of course be yet another
> thing that perhaps would not work. But the good news - or bad - is
> that you do not need to use it. You can specify full URIs.
>> I think the iCab approach to solving this
>> is interesting. Extending that to actually select the
>> appropriate file
>> from among several files in the same local directory would not be
>> out of the
>> question. In this way, typing an URL into the browser and leaving
>> off parts
>> of the filename extension could still select the correct file
>> based on user
>> preferences (and similar for files referenced within the main
>> document).
It could even be extended to the OS level, where clicking on a file
that had a readily available equivalent in a language the
user preferred could present that equivalent too. Or double-clicking
on an image file could notify the user that an equivalent aural description is
available (based on filename extensions or other metadata).
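A sketch of how such local selection might work, loosely in the spirit of Apache's MultiViews: scan the directory for variants of a requested name and pick the one matching the user's language preferences. The naming scheme ("index.nn.html" etc.) is just one assumed convention.

```python
# Sketch of local filesystem variant selection by filename extension.
# The "<stem>.<lang>.<suffix>" naming convention is an assumption.

import os

def select_variant(directory, stem, suffix, preferred_languages):
    """Pick '<stem>.<lang>.<suffix>' matching the user's language order."""
    names = set(os.listdir(directory))
    for lang in preferred_languages:
        candidate = f"{stem}.{lang}.{suffix}"
        if candidate in names:
            return candidate
    plain = f"{stem}.{suffix}"  # fall back to the unadorned file, if any
    return plain if plain in names else None
```

With "index.en.html" and "index.nb.html" on disk and preferences ["nn", "nb", "en"], this would pick "index.nb.html" — the same decision a negotiating server would make, but with no server involved.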
>> Another kind of solution for local files that occurred to me is
>> Apple's web
>> archive format. I don't think it currently handles content
>> negotiation, but
>> it is an ingenious file format that includes any number of web
>> pages and
>> their sub-resources in the same document. It also includes the
>> HTTP headers,
>> so adding content negotiation information would also be possible.
>> In other
>> words in creating a web archive, the UA would alternately
>> negotiate with the
>> server for different content and add each response from the
>> server to the archive.
> I did not know anything of this. I use the iCab web archive,
> mostly. It only saves one page at a time, if I am not mistaken.
> Although in the beta, there is an option where you can save a
> whole session. But I assume that session archives are what you are
> talking about?
I'm more speaking from a developer point of view that the Mac OS X
Web Archive could be made to work this way. It's not currently the
way Safari uses it.
> However, this only solves the problem of saving something from
> online to offline. Editing is more what we are discussing.
No, actually a Mac OS X Web Archive can be used for authoring too.
It can be opened and saved with any WebKit based authoring tool.
>> Despite the potentially fruitful avenues for investigation, I
>> still want to
>> include solutions as much as possible within HTML markup itself.
>> Even with
>> the discussions on the HTML WG list regarding HTTP, I'm concerned
>> that many
>> of those problems rely too much on the server and the HTTP protocol.
> Again, not sure how you define «within HTML markup». Do you mean
> simply «as text within the HTML document itself»? Or do you mean an
> attribute that tells which versions are available?
It's probably clearer after all of my exposition above, but I do mean
including as many of the alternatives as possible in a single document: whether
that's a web archive or XML or HTML. This just simplifies matters by
keeping the tightly related content in a single file. Likewise a
directory structure could be defined — including one based on
filename extensions — but that makes things more complicated. The
opposite approach could make single files easy for authors to
work with and have the server extract the necessary piece or pieces to
send over the wire to the UA.
> Also, a server can do more than a file system can. Well, it is of
> course also the other way around. But HTTP is better at serving :-)
> Our debate here deals with making the local situation more capable
> - so one can mimic the online servers. (I say this because you - in
> public HTML - said that the situation is potentially different now,
> with more advanced computers etc.)
I'm not clear what you're referring to here. As I said the file
system could be made to intelligently select files for a user based
on filename extensions, other filesystem attributes or some other way
of defining the relations between the files. This is a capability that
HTTP has that local files do not have. However, many of the other
HTTP capabilities are particularly designed for client-server
communication and therefore do not help the client side. Similar
functions are served by Spotlight on Mac OS X on the client side. For
example, an application can use a Spotlight query just like it would
use an HTTP header request to discover and then select the exact file
to load according to the user's needs.
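To illustrate that client-side analogue, a Spotlight-like metadata query might look roughly like this; the index entries and attribute names are invented for the sketch.

```python
# Sketch of a Spotlight-like metadata query: an application filters a
# metadata index much as a UA would negotiate via HTTP headers. The
# index and its attributes are hypothetical.

INDEX = [
    {"path": "talk.en.mp3", "kind": "audio", "language": "en"},
    {"path": "talk.nn.mp3", "kind": "audio", "language": "nn"},
    {"path": "talk.en.html", "kind": "text", "language": "en"},
]

def query(index, **attrs):
    """Return paths of entries whose metadata matches every given attribute."""
    return [entry["path"] for entry in index
            if all(entry.get(k) == v for k, v in attrs.items())]

print(query(INDEX, kind="audio", language="nn"))  # ['talk.nn.mp3']
```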
>> Requiring that the server be the authoritative source for content
>> type is
>> the problem. Certainly the server should be configured to gather and
>> transmit the correct information about content type, but making it
>> authoritative is the problem. Here too, I worry that leaning
>> too much on
>> the server for a solution to this problem needlessly complicates the
>> situation. Now to create accessible documents, an entirely new
>> set of
>> protocols and RFCs are needed. So while I think it's worth
>> pursuing (and the
>> scope of the HTML WG certainly includes guidance to UAs like this),
>> we should still try to make sure the HTML markup can meet the
>> needs itself.
> Do you have any ideas of how?
> I guess that I am an impatient person. I fear that we shall focus
> (too) much on some dreamy thing. We haven't even used what we have
> to a full extent.
> I understand the problem you want to solve as «how to enable
> content negotiation without relying on the server». Or is it more
> specifically «how to solve this problem just by markup»? Also, when I
> think of our debate in Public HTML, you mentioned charset
> sniffing (by the servers - and ultimately also by the browser) as
> one road to take.
> Something that «just works» is of course an expert's greatest
> fear - he isn't needed anymore. But I really wonder about how you
> can specify alternate content within the markup itself.
Well, I think the proposal Sander made or something like that could
work (even for alternate languages). I think the directory approach
could work too, but it's more complicated for authors (files can get
separated and misplaced). A Web Archive like approach could also work
if HTTP servers could be made aware of the alternates within the same
file and extract them for delivery to the client. In this way we
would change both the local side to be more server-like and the server
to be more local-like.
More information about the HTML4all.org mailing list