[html4all] Content-type negotiation as an alternative to ALT, LONGDESC and fallback

Sun Aug 26 16:38:15 PDT 2007

Hi Leif,

[sorry for the long post; if you're not interested, please just delete
it (it's in the archives anyway) :-)]

On Aug 26, 2007, at 4:52 PM, Leif Halvard Silli wrote:

> 2007-08-26 17:46:32 +0200 Robert Burns:
>
>> [Re: Content-type negotiation as an alternative to ALT, LONGDESC and
>> fallback; UA gathering of user preferences; bridging the HTTP
>> server / local file system divide]
>>
>> I too was thinking this discussion was related to the HTTP discussion
>> going on in the WG list. One of the problems I see in both situations
>> is that they both try to rely on a particular transport protocol, when
>> I think HTML should be more independent of that. I think Phil's idea
>> is interesting as a stopgap measure, more so than as a permanent
>> solution. As I said before, I would want to make sure there was a way
>> within HTML markup alone to accomplish these things.
>
> If we think of the profile attribute as 'within' HTML, then what I
> said should definitely be 'within' HTML. (Unless you meant
> 'within/directly in the HTML document', such as ALT.) Currently,
> as I understand it, PROFILE only allows us to define «link types».
> It could perhaps be expanded to also include the src URLs etc. Well,
> perhaps this has nothing to do with "link types" and another
> kind of attribute/profile is needed. However, I think it would be
> natural to think about a "local" profile and an online profile.
>
> Could you expand on what you mean by «within HTML»? And do you see
> this as a method for bridging the online/offline gap for authors?

Well, in relation to Phil's suggestion regarding alternate equivalent
fallback content, I think handling it with HTTP content negotiation is
an ingenious idea. However, my inclination with this idea (just like
in the discussion on public-html) is to try to avoid the extra
complexity that results from involving the server. As I've said
before, I believe the file format is the ideal place to keep
authoritative information about a file. Extracting this data for
ready use in a file system or on a server (or for Spotlight searches)
is very convenient, but the file format itself should be treated as
authoritative (this contrasts with the HTTP specification, which
wants to treat whatever the server says as authoritative).

So moving alternate equivalent fallback handling out of HTML markup
(elements and attributes) and onto the HTTP server through content
negotiation is a complication (perhaps a needless one). My
inclination is to try to make everything work with a single file
(with no server overhead). The server can then provide header
information and content negotiation that reduce network traffic, but
it shouldn't be treated as authoritative.

Consider this example.

1) An author creates some content and places it on a server. All of
the content declares its type (and character encoding, where relevant)
in the first few bytes of the file. These starting bytes are the
authoritative statement of the file's type and encoding.

2) A user enters an address into a browser. The browser uses
information about the user's environment to negotiate the content with
the server. This could include the language of the document, the
types of files, the resolution and bit depth of bitmapped media, and
the user's preference for video, audio or text (where relevant).

Using server headers in this situation is very efficient. It allows
the UA to query the server for exactly the files it needs. If it
could not perform these queries through content negotiation, the UA
would need to discover and download one file after another to see
which files it needed to satisfy the user's needs. That leads to much
more network traffic.
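To make the negotiation step a little more concrete, here is a minimal
sketch (Python, purely illustrative) of how a UA might turn an ordered
list of user preferences into the quality-weighted values that HTTP
headers such as Accept-Language and Accept carry. The function name and
the 0.1 step between q-values are my own assumptions, not something any
particular browser is known to do.

```python
def build_preference_header(preferred):
    """Turn items listed in order of preference into a header value.

    The first item gets the implicit q=1; each later item gets a lower
    quality value, stepping down by 0.1 and never dropping below 0.1.
    (Hypothetical helper; the step size is an arbitrary choice.)
    """
    parts = []
    for rank, item in enumerate(preferred):
        q = max(1.0 - 0.1 * rank, 0.1)
        # The top choice needs no explicit q parameter.
        parts.append(item if rank == 0 else f"{item};q={q:.1f}")
    return ", ".join(parts)


# A user who prefers Nynorsk, then Bokmål, then English:
#   Accept-Language: nn, nb;q=0.9, en;q=0.8
print(build_preference_header(["nn", "nb", "en"]))
```

The server can then pick the best-matching variant in a single round
trip instead of the UA fetching candidates one by one.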

So to me the server's ability to negotiate content and provide
informative headers is great. However, the insistence that these
headers must be authoritative is a problem, because it's easy to
misconfigure a server or otherwise make mistakes. Also, an author who
specifically wants a file treated as another type (like treating an
HTML file as raw text) may not have control over the server or even
know where the document will be hosted. To me this suggests it is
better, in handling the separation of concerns properly, to place as
much as possible within the document itself. The XML convention of an
XML declaration that includes the character encoding is one example.
The few bytes at the beginning of binary formats such as PDF and PNG
(and many others) that show up as 'pdf' or 'png' are another example.
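As a rough illustration of a file declaring its own type, the sketch
below reads a file's leading bytes and checks them against a few
well-known signatures (PNG's and PDF's among them), falling back to the
encoding declared in an XML declaration. The signature table is only a
small subset and the sniff() name is hypothetical; real sniffing rules
cover far more cases (an XHTML file, for instance, would only be
reported here as generic XML).

```python
import re

# Leading "magic" bytes for a few well-known formats (illustrative subset).
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"%PDF-", "application/pdf"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
]

# Matches an XML declaration such as <?xml version="1.0" encoding="UTF-8"?>
XML_DECL = re.compile(rb'^<\?xml[^>]*?encoding=["\']([A-Za-z0-9._-]+)["\']')


def sniff(head: bytes):
    """Return (media_type, encoding) taken from the first bytes of a file.

    Either value is None when the file does not declare it up front.
    """
    for magic, media_type in SIGNATURES:
        if head.startswith(magic):
            return media_type, None
    match = XML_DECL.match(head)
    if match:
        return "application/xml", match.group(1).decode("ascii")
    return None, None


print(sniff(b'<?xml version="1.0" encoding="UTF-8"?><root/>'))
```

Nothing here consults a server header or a filename extension: the
answer comes entirely from the file's own bytes.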

Using the file format itself to specify its own metadata is a much
safer path. It ensures that there's no separation between what a file
is and the metadata that describes it. For example, if you use iTunes
you can move all of your files from one computer to another, through
a path that carries no server or filesystem metadata. You can rename
the files to pure gibberish. However, when you drag those files onto
iTunes, every piece of metadata about each file will be added to the
new iTunes library (this doesn't count the filename itself, which at
that point is meaningless; it also doesn't include the playlists and
perhaps the album art). This is how I think it should be. The file is
the authoritative source of the file's metadata. Any extraction of
that metadata, including setting the filename extension to '.ac4',
should be derivative.

So, to sum up what I'm saying: 1) file formats should include the
authoritative metadata about themselves; 2) transport protocols play
a valuable role in reducing the amount of network traffic needed to
communicate the essential pieces of metadata about a file.

So what does this say about the server / local distinction? It says
to me that whenever we can get data about the file from within the
file itself, without relying on a separate transport protocol, that's
best. Whenever HTML can include the mechanisms and features to author
a document that includes various fallbacks for accessibility,
internationalization, and bitmapped media resolution and bit depth,
that's better than relying on a separate transport protocol (bitmapped
media resolution may be a bit of an exception since, unlike the
others, it is intended mainly to reduce unnecessary network traffic).

Now that doesn't mean it all has to be handled through a single flat
file, all through markup, but that can be nice and convenient. The
other alternative is to specify a directory format that mimics the
server content-negotiation process. By directory format I'm thinking
along the lines of Apple's directory packages, where an entire
directory can appear as a single file in the filesystem and its
contents are stringently defined by some specification (it doesn't
have to appear as a single file, however; Mac OS X presents .framework
directories as ordinary folders). Also, the WebArchive format is a
flat file that can represent a complex hierarchy of files. While these
particular methods are a bit beyond the scope of our WG (except in
specifying how UAs should treat these directories or file formats), we
could specify fairly complete methods of authoring these things right
in the HTML markup itself (using elements and attributes alone). So
when I say within HTML itself, I'm mainly concerned with being able
to *author* content with elements and attributes alone. Obviously we
would have to specify how UAs handle that authored content, just as
we would have to specify how they handle a particular directory
structure with filename extensions, but my preference is to do it
with markup alone as much as possible.

>> Having said that, I think you're right that we should include more
>> norms and guidance for UAs to handle other aspects of content
>> negotiation. I feel language is handled adequately in browsers such
>> as Safari by using the system's language preferences. I'd be
>> interested in hearing more about the problem as you see it, Leif, in
>> doing that. It would be nice if Safari made that clear in its
>> preferences, with a button to take the user directly to the language
>> preference pane.
>
> The problem as I see it, with the Safari/OS X solution, is related
> to authors/content producers: those who have to work with more
> than one language. (Well, the line between users and authors is
> thin.)
>
> Even with that link to the preferences, it is inconvenient to
> change the preferred language *for the entire system* just because
> you are going to test the preferred-language functionality of your
> pages. I'd say that isn't a very workable solution at all.

I somewhat understand what you're saying, but I don't think that's a
huge problem. As long as Safari picks up the changes immediately on a
reload, testing should be easy (I haven't tested that, though). And as
long as the author switches it back before launching another
application, it will not affect any other applications.

> In addition, the OS X language preferences system is not perfect.
> (Speaking for my own sick mother, they have not built in support
> for more than one Norwegian language variant, while we have two
> official standards. Firefox, by contrast, lets me select language
> freely, offers many more languages, I think, and also the option of
> setting up my very own language code.)

This does sound like a bigger problem. Users cannot add fine-grained
language preferences in the system preferences. I'd like to see that
fixed for the whole system. My sense is that Apple only listed the
languages it tries to provide itself, forgetting that other developers
(and HTML authors, for example) might provide others.

> Think of how fast you can switch to an IE identity in Opera, for
> instance! It should be that easy to adjust the entire "UA profile".

So you're thinking here about a feature for authoring and testing  
then, right?

>> Other content negotiation would be good. With resolution independence
>> on the horizon, I would really like to see servers and UAs adding
>> resolution negotiation for bitmap-based media. Even before resolution
>> independence comes to screen media, such negotiation would be useful
>> for providing a screen display image separate from the image used to
>> print the document. Phil's proposal would suggest the need for some
>> sort of equivalent media negotiation. Like you suggested, Leif, this
>> could be in the form of listing different media types for
>> sub-resources: primarily just ranking video, audio, and text, I think.
>
> We are now, of course, discussing something not only for IMG but  
> over all :-)

But I was thinking that if an author ranked, specifically for IMG,
audio then text, they would get audio substituted for any IMG, and if
no audio was available they would get text. Of course the default
would be more like: video, image, audio, text.
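A minimal sketch of that ranking, assuming (hypothetically) that each
alternate for an IMG is known by its URL and MIME type, and that only
the top-level type (video, image, audio, text) is compared. The
function name and data shape are illustrative, not part of any
proposal.

```python
# Default preference order suggested above.
DEFAULT_RANKING = ("video", "image", "audio", "text")


def choose_alternate(alternates, ranking=DEFAULT_RANKING):
    """Pick the alternate whose top-level MIME type ranks highest.

    `alternates` maps URLs to MIME types such as "audio/mpeg".
    Returns the chosen URL, or None if no alternate has a ranked type.
    """
    for wanted in ranking:
        for url, mime in alternates.items():
            # Compare only the part before the slash: "audio/mpeg" -> "audio".
            if mime.split("/", 1)[0] == wanted:
                return url
    return None


alts = {
    "chart.png": "image/png",
    "chart.mp3": "audio/mpeg",
    "chart.txt": "text/plain",
}
print(choose_alternate(alts))                       # default order picks the image
print(choose_alternate(alts, ("audio", "text")))    # a user ranking audio first
```

With the ranking ("audio", "text"), an IMG that has no audio alternate
would fall through to its text alternate.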

> Should that list not be taken from the Media Descriptor list [1],  
> which has screen, tty, tv, projection, handheld, print, braille,  
> aural? Hm ... perhaps not.
>
> [1] <http://www.w3.org/TR/html401/types.html#h-6.13>

My sense is that this should be more focussed on MIME types and  
mostly on the primary MIME types video, audio, text and image (which  
I forgot to list last time).

>> Once this is instituted in servers and UAs, the problem of local /
>> server separation becomes larger.
>
> It does not :-) Well, as one person once characterised the cell
> phone: «It is just another channel where you can make yourself
> unreachable.» So this new thing would of course be yet another
> thing that perhaps would not work. But the good news (or bad) is
> that you do not need to use it. You can specify full URIs.
>
>> I think the iCab approach to solving this is interesting. Extending
>> that to actually select the appropriate file from among several
>> files in the same local directory would not be out of the question.
>> In this way, typing a URL into the browser and leaving off parts of
>> the filename extension could still select the correct file based on
>> user preferences (and similarly for files referenced within the main
>> document).
>
> Exactly.

It could even be extended to the OS level: when the user clicks on a
file that has a readily available equivalent in a language the user
prefers, that equivalent could be presented too. Or, on double-clicking
an image file, the user could be notified that an equivalent aural
description is available (based on filename extensions or other
metadata).
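Something like the following sketch could drive that kind of local
selection. It assumes a hypothetical naming convention in which
language variants append a language tag to the base filename
(about.html.nb, about.html.nn), loosely in the spirit of Apache's
MultiViews; the function name and the convention are illustrative
only.

```python
def select_local_file(base, filenames, preferred_languages):
    """Choose among variants named like 'base.lang' (e.g. 'about.html.nb').

    The variant matching the earliest preferred language wins; a file
    named exactly `base` is the fallback. Returns None if nothing fits.
    (Hypothetical convention, not an existing OS or browser feature.)
    """
    prefix = base + "."
    variants = {}
    for name in filenames:
        if name.startswith(prefix):
            # Map the trailing tag (e.g. "nb") to the full filename.
            variants[name[len(prefix):].lower()] = name
    for lang in preferred_languages:
        if lang.lower() in variants:
            return variants[lang.lower()]
    return base if base in filenames else None


files = ["about.html", "about.html.nb", "about.html.nn", "about.html.en"]
print(select_local_file("about.html", files, ["nn", "en"]))
```

Passing os.listdir() of the document's directory as `filenames` would
apply the same idea to real local files.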

>> Another kind of solution for local files that occurred to me is
>> Apple's web archive format. I don't think it currently handles
>> content negotiation, but it is an ingenious file format that
>> includes any number of web pages and their sub-resources in the same
>> document. It also includes the HTTP headers, so adding content
>> negotiation information would also be possible. In other words, in
>> creating a web archive, the UA would alternately negotiate with the
>> server for different content and add each response from the server
>> to the archive.
>
> I did not know anything of this. I use the iCab web archive,
> mostly. It only saves one page at a time, if I am not mistaken,
> although in the beta there is an option to save a whole session.
> But I assume that session archives are what you are talking about?

I'm speaking more from a developer's point of view: the Mac OS X
Web Archive could be made to work this way. It's not currently the
way Safari uses it.

> However, this only solves the problem of saving something from  
> online to offline. Editing is more what we are discussing.

No, actually a Mac OS X Web Archive can be used for authoring too.
It can be opened and saved with any WebKit-based authoring tool.

>
>> Despite the potentially fruitful avenues for investigation, I still
>> want to include solutions as much as possible within HTML markup
>> itself. Even with the discussions on the HTML WG list regarding
>> HTTP, I'm concerned that many of those solutions rely too much on the
>> server and the HTTP protocol.
>
> Again, I'm not sure how you define «within HTML markup». Do you mean
> simply «as text within the HTML document itself»? Or do you mean an
> attribute that tells which versions are available?

It's probably clearer after all of my exposition above, but I do mean
including as many of the alternatives as possible in a single
document, whether that's a web archive or XML or HTML. This simplifies
matters by keeping the tightly related content in a single file.
Likewise, a directory structure could be defined (including one based
on filename extensions), but that makes things more complicated. The
opposite approach could make single files easy for authors to work
with and have the server extract the necessary piece or pieces to
send over the wire to the UA.

> Also, a server can do more than a file system can. Well, it is of
> course also the other way around. But HTTP is better at serving :-)
> Our debate here deals with making the local situation more capable,
> so one can mimic the online servers. (I say this because you, in
> public-html, said that the situation is potentially different now,
> with more advanced computers etc.)

I'm not clear what you're referring to here. As I said, the file
system could be made to intelligently select files for a user based
on filename extensions, other filesystem attributes, or some other way
of defining the relations between files. This is a capability that
HTTP has and local files do not. However, many of the other HTTP
capabilities are designed specifically for client-server
communication and therefore do not help on the client side. Similar
functions are served by Spotlight on the client side in Mac OS X. For
example, an application can use a Spotlight query, just as it would
use an HTTP header request, to discover and then select the exact file
to load according to the user's needs.

>> Requiring that the server be the authoritative source for content
>> type is the problem. Certainly the server should be configured to
>> gather and transmit the correct information about content type, but
>> making it authoritative is the problem. Here too, I worry that
>> leaning too much on the server for a solution to this problem
>> needlessly complicates the situation. Now, to create accessible
>> documents, an entirely new set of protocols and RFCs is needed. So
>> while I think it's worth pursuing (and the scope of the HTML WG
>> certainly includes guidance to UAs like this already), we should
>> still try to make sure the HTML markup can meet the needs itself.
>
> Do you have any ideas of how?
>
> I guess I am an impatient person. I fear that we shall focus
> (too) much on some dreamy thing. We haven't even used what we have
> to its full extent.
>
> I understand the problem you want to solve as «how to enable
> content negotiation without relying on the server». Or is it more
> specifically «how to solve this problem just by markup»? Also, I
> think of our debate in public-html, when you mentioned charset
> sniffing (by the servers, and ultimately also by the browser) as
> one road to take.
>
> Something that «just works» is of course an expert's greatest
> fear: he isn't needed anymore. But I really wonder how you
> can specify alternate content within the markup itself.

Well, I think the proposal Sander made [1], or something like it,
could work (even for alternate languages). I think the directory
approach could work too, but it's more complicated for authors (files
can get separated and misplaced). A Web Archive-like approach could
also work if HTTP servers could be made aware of the alternates within
the same file and extract them for delivery to the client. In this
way we would change both the local side to be more server-like and the
server to be more local-like.

[1]: <http://esw.w3.org/topic/HTML/ABetterAlt>