[html4all] HTML tags on the wiki

Robert Burns rob at robburns.com
Mon Aug 27 10:44:35 PDT 2007


Hello 4all,

I'm getting a little closer to figuring out the wiki issues. I think  
I've uncovered how to add file uploads to the wiki. This will mean we  
can use the built-in wiki markup for adding images. Such wiki markup  
adds alt= to all images. Unfortunately, it repeats the img's alt text
in the title attribute of the img element's parent anchor
element. When the keyword 'caption' is added to the markup, the text is
then repeated a third time in the image's caption (along with img@alt
and a@title).

One option is to allow all HTML wrapped in <html> tags. This is  
discouraged as a security risk when using a completely open wiki[1].  
We could easily restrict access somewhat and it would probably be  
safe to enable this feature.

The other thing I was looking for was an HTML tag whitelist. It turns
out that the whitelist is written in PHP and it has to be changed
before compiling the MediaWiki software[2]. This is something I think
perhaps DanC should do in setting up a new wiki for W3C. It's
not something I feel I could accomplish myself, though I might look into it.
Basically it would involve making changes to the whitelist and then
compiling our own HTML4All MediaWiki version.

There is a bug report to MediaWiki to add some of the obvious missing  
tags (abbr, dfn, q, etc.).[2][3] Obviously there's no danger in
allowing these tags and they can also improve accessibility for the  
wiki. However, the presumption seems to be that adding <img> and  
<object> would be dangerous. I'm not sure if that's the case. They  
may be, but I'd rather hear the arguments than simply assume they
are. Perhaps enabling <object> without <param> would make it usable
while removing some of the bigger chances for a security hole. Some of
the <object> attributes might also pose some danger, such as codebase,
codetype, and classid. Again, I'm not saying these are security holes,
but they could be. Any security holes would have to be exploits of
software that users already have installed on their systems.
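
To make the whitelist idea concrete, here is a rough sketch in Python
(not MediaWiki's actual PHP sanitizer). The tag and attribute lists are
assumptions chosen for illustration: <object> is allowed but <param> is
not, and codebase, codetype and classid are dropped simply because they
are not listed. Checking the URLs in src, data and cite is left out and
would be needed in practice.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist for illustration; not MediaWiki's real list.
# Attributes not named here (classid, codebase, codetype, onclick, ...)
# are dropped from allowed tags.
ALLOWED = {
    "abbr": {"title"},
    "dfn": {"title"},
    "q": {"cite"},
    "img": {"src", "alt", "title"},
    "object": {"data", "type"},
}


class AllowlistFilter(HTMLParser):
    """Re-emit allowed tags; show every other tag as escaped text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED:
            # Disallowed tag (e.g. <param>, <script>): render it as text.
            self.out.append(escape(self.get_starttag_text()))
            return
        kept = "".join(
            f' {name}="{escape(value or "")}"'
            for name, value in attrs
            if name in ALLOWED[tag]
        )
        self.out.append(f"<{tag}{kept}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <img ... />: emit the start tag only.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        closing = f"</{tag}>"
        self.out.append(closing if tag in ALLOWED else escape(closing))

    def handle_data(self, data):
        self.out.append(escape(data))


def sanitize(fragment):
    """Return the fragment with only allowlisted tags and attributes."""
    parser = AllowlistFilter()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)
```

With this sketch, sanitize('<object data="m.svg" classid="x">') keeps the
object element and its data attribute but silently drops classid.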

Finally, I think perhaps the MediaWiki software approach to this  
might be to instead add new syntax for images to differentiate  
between alt=, title= and captions. Something like:

[Image:srcURI | caption:caption-text | title:title-text | some-alt-text ]

This way the MediaWiki software would convert that syntax into either
an img or object element depending on the configuration. Perhaps  
filing a new bug on that would be helpful.
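
As a rough illustration of how such a syntax could be handled, here is a
small Python sketch. It is not MediaWiki code: the function names, the
rule that an unlabelled field is the alt text, and the caption paragraph
are all my own assumptions, and it only produces an img element, not the
object variant.

```python
from html import escape


def parse_image_markup(markup):
    """Split '[Image:src | caption:... | title:... | alt text ]' into parts."""
    text = markup.strip()
    if not (text.startswith("[Image:") and text.endswith("]")):
        raise ValueError("not an Image markup string")
    fields = [f.strip() for f in text[len("[Image:"):-1].split("|")]
    parts = {"src": fields[0], "caption": "", "title": "", "alt": ""}
    for field in fields[1:]:
        if field.startswith("caption:"):
            parts["caption"] = field[len("caption:"):].strip()
        elif field.startswith("title:"):
            parts["title"] = field[len("title:"):].strip()
        else:
            # Assumption: a field without a label is the alt text.
            parts["alt"] = field
    return parts


def render_img(parts):
    """Build an img element whose alt, title and caption stay distinct."""
    attrs = f'src="{escape(parts["src"])}" alt="{escape(parts["alt"])}"'
    if parts["title"]:
        attrs += f' title="{escape(parts["title"])}"'
    html = f"<img {attrs}>"
    if parts["caption"]:
        # Hypothetical caption markup; a real wiki would choose its own.
        html += f'<p class="caption">{escape(parts["caption"])}</p>'
    return html
```

The point is that each of the three texts is written once and ends up in
exactly one place, rather than the alt text being copied into the title
and the caption.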

I was also thinking of writing to DanC on this to alert him to the  
fact that W3C may want to compile its own version of MediaWiki to add  
support for more semantics and accessibility.

I'd like to hear from the group on what approach I should take.  
Obviously if I or anyone else can handle a recompile and installation  
of our own MediaWiki software that would be great. However, if we  
can't do that, should we require login to the site and turn on all  
HTML tags?

Take care,
Rob

[1]: Enable all HTML in <html></html> wrapper:
<http://www.mediawiki.org/wiki/Manual:%24wgRawHtml>

[2]: MediaWiki bug report on adding some additional HTML tags:
<http://bugzilla.wikimedia.org/show_bug.cgi?id=671>

[3]: Diff of proposed HTML tag white list:
<http://bugzilla.wikimedia.org/attachment.cgi?id=3331>



