02-14-2006
Squashing Information Bugs
There is a ton of great information online, and plenty of garbage as well. One of the recurrent themes of this site is the need for more metadata, generated both by users and site owners, to be used by search engines to improve result quality, and by web browser clients to increase information utility. One key use for this metadata is assessing information accuracy. Site owners are in a good position to outline the limitations of their knowledge. Site users are in a good position to evaluate whether the information on a site worked for them, or whether it has become outdated. Used in combination, this could be a powerful system for detailing the conditions under which information may be of value to a knowledge consumer.
Here is a quick illustration of why this matters. Over the last few days I have been working on installing WordPress 2.0 and Drupal on my home machines. I want to do some theme and plugin development locally, and wanted to develop a deeper understanding of all the pieces involved in these open source content management systems. At home I have a Mac laptop running OS X 10.4 and a Windows desktop running Windows XP Pro. To get these CMSs working you basically need Apache, PHP, MySQL, and mod_rewrite.
Installing each of these in a Windows environment is fairly straightforward once you cobble together all the appropriate installers and instruction sets. However, getting to the point where you have all that is not so simple. There are lots of great articles out there explaining how to set everything up, but every one I looked at had some significant bug that made the installation process much longer than it needed to be. The fixes, in short: make sure IIS is not running when you run the Apache installer, make sure you edit the right section of httpd.conf when trying to enable mod_rewrite, and understand that some Drupal themes just don’t work with PHP 5 right now. I could now install these systems in 10 minutes on any XP machine, but due to all the buggy info I ran across, I spent 6 hours in total getting everything working (and mod_rewrite still isn’t working on my Mac).
Development information is a somewhat specialized case, but any info can be buggy. I’ve seen product recommendation sites incorrectly list features of products. I’ve seen news articles that wrongly describe someone’s educational background. Bad information is everywhere. Even when the basic information available is worthwhile, it can be rendered less valuable by poor information design (check out Edward Tufte’s analysis of the space shuttle disaster in Visual Explanations for an example of how serious the consequences of this kind of ‘formatting’ problem can be).
So here are my concrete suggestions. There should be a common syntax for article authors to post corrections to their pages. When a browser finds correction metadata, it can highlight it in some way, especially for users who visited the article before the corrections were posted. Article owners should also be able to post limitations that govern the utility of their articles. For example, a technical article should indicate what kind of technical setup it applies to. For a product review, you might indicate whether or not you tested the product yourself, and if so whether it was a pre-release version or the final consumer version. This data could also be specially highlighted to help users understand whether your article will actually help them with their questions.
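To make the author side of this a little more tangible, here is a minimal sketch in Python. Everything in it is a placeholder made up for illustration: the class names, the `article-meta`, `limitation`, and `correction` element names, and the example URL are assumptions, not a proposed format. It only shows the two ideas from the paragraph above: an author publishes limitations and dated corrections, and a client picks out the corrections posted after a reader's last visit.

```python
from dataclasses import dataclass, field
from datetime import date
import xml.etree.ElementTree as ET


@dataclass
class Correction:
    posted: date  # when the author published the correction
    text: str     # what was wrong and what replaces it


@dataclass
class ArticleMeta:
    url: str
    limitations: list[str] = field(default_factory=list)
    corrections: list[Correction] = field(default_factory=list)

    def corrections_since(self, last_visit: date) -> list[Correction]:
        """Return corrections posted after the reader's last visit."""
        return [c for c in self.corrections if c.posted > last_visit]

    def to_xml(self) -> str:
        """Serialize to XML using hypothetical element names."""
        root = ET.Element("article-meta", url=self.url)
        for item in self.limitations:
            ET.SubElement(root, "limitation").text = item
        for c in self.corrections:
            node = ET.SubElement(root, "correction", posted=c.posted.isoformat())
            node.text = c.text
        return ET.tostring(root, encoding="unicode")


# Hypothetical article about the kind of install described earlier.
meta = ArticleMeta(
    url="http://example.com/apache-on-xp",
    limitations=["Tested on Windows XP Pro only"],
    corrections=[
        Correction(date(2006, 2, 10), "Stop IIS before running the Apache installer."),
    ],
)

# A reader who last visited on Feb 1 should have the Feb 10 correction highlighted.
to_highlight = meta.corrections_since(date(2006, 2, 1))
```

A browser or toolbar would presumably compare the `posted` date against its own visit history; how that history is stored is left open here.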
There should also be a common syntax for article users to point out errors and problems they encountered using the article. This metadata would have to be accessible to other users somehow, so some kind of toolbar entry combined with community search results would be best (something like My Yahoo coupled with the Yahoo Toolbar, with new features). The feedback should be able to cover everything from the accuracy of the basic data being reported to problems with how the information is displayed (Tufte-style analysis).
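The reader side could be sketched in a similar spirit. The Python below is only an illustration under stated assumptions: the `Report` type, the three feedback kinds, and the `summarize` helper are hypothetical names chosen to match the categories mentioned above (accuracy, outdated information, and presentation problems), not part of any existing toolbar or service.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical feedback categories, taken from the kinds of problems discussed above.
KINDS = {"accuracy", "outdated", "presentation"}


@dataclass(frozen=True)
class Report:
    url: str   # the article the reader is commenting on
    kind: str  # one of KINDS
    note: str  # free-text description of the problem

    def __post_init__(self):
        if self.kind not in KINDS:
            raise ValueError(f"unknown feedback kind: {self.kind!r}")


def summarize(reports, url):
    """Count reports per kind for one article, e.g. for display in a toolbar."""
    return Counter(r.kind for r in reports if r.url == url)


# Made-up reports for a made-up article.
reports = [
    Report("http://example.com/apache-on-xp", "accuracy", "Wrong httpd.conf section."),
    Report("http://example.com/apache-on-xp", "accuracy", "Does not mention IIS."),
    Report("http://example.com/apache-on-xp", "outdated", "Theme fails on PHP 5."),
    Report("http://example.com/other", "presentation", "Steps are out of order."),
]

summary = summarize(reports, "http://example.com/apache-on-xp")
```

A shared summary like this is what other readers would see before investing six hours in an article that has two open accuracy reports against it.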
Over the next couple of weeks I’ll be suggesting a particular XML document structure representing the kind of feedback and meta information I am interested in readers and authors sharing.