Squashing Information Bugs

There is a ton of great information online. There is lots of garbage out there as well. One of the recurrent themes of this site is the need for more metadata, generated both by users and site owners, to be used by search engines to improve result quality, and by web browser clients to increase information utility. One of the key needs for this is precisely in the area of assessing information accuracy. Site owners are in a good position to outline the limitations of their knowledge. Site users are in a good position to evaluate whether the information on a site worked for them, or whether it has become outdated. Used in combination, this could be a powerful system for detailing the relevant conditions under which information may be of value to a knowledge consumer.

A quick illustration of why this matters. Over the last few days I have been working on installing WordPress 2.0 and Drupal on my home machine. I want to do some theme and plugin development locally, and wanted to develop a deeper understanding of all the pieces involved in these open source content management systems. At home I have a Mac laptop running OS X 10.4 and a Windows desktop running Windows XP Pro. To get these CMSs working you basically need Apache, PHP, MySQL, and mod_rewrite.

Installing each of these in a Windows environment is fairly straightforward once you cobble together all the appropriate installers and instruction sets. However, getting to the point where you have all that is not so simple. There are lots of great articles out there explaining how to set everything up, but every one I looked at had some significant bug that made the installation process much longer than it needed to be. In short: make sure IIS is not running when you run the Apache installer, make sure you edit the right section of httpd.conf when trying to enable mod_rewrite, and understand that some Drupal themes just don’t work with PHP 5 right now. I could now install these systems in 10 minutes on any XP machine, but due to all the buggy info I ran across, I spent 6 hours in total getting everything working (and mod_rewrite still isn’t working on my Mac yet).

The context of development information is somewhat specialized, but any info can be buggy. I’ve seen product recommendation sites incorrectly list features of products. I’ve seen news articles that wrongly describe someone’s educational background. Bad information is out there everywhere. Even when the basic information available is worthwhile, it can be rendered less valuable by poor information design (check out Edward Tufte’s analysis of the space shuttle disaster in Visual Explanations for an example of how serious the consequences of this kind of ‘formatting’ problem can be).

So here are my concrete suggestions. There should be a common syntax for article authors to post corrections to their pages. From a browser point of view, when correction metadata is found, it can be highlighted in some way, especially for users who visited the article before the corrections were posted. Article owners should also be able to post limitations that govern the utility of their articles. For example, for a technical article you should indicate what kind of technical setup the article applies to. For a product review, you might indicate whether or not you tested the product yourself, and if so whether it was a pre-release version or the final consumer version. This data could also be specially highlighted to help users understand whether your article will actually help them with their questions.
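To make the author-side idea a little more concrete, here is a minimal Python sketch. It is not a proposed standard: the `Correction` and `ArticleMeta` classes, their field names, the `unseen_corrections` helper, and the sample dates and notes are all made up for illustration. The only behavior taken from the suggestion above is that corrections posted after a reader's last visit get singled out, and that the author's stated limitations travel with the article.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Correction:
    posted: date   # when the author published the correction
    note: str      # what was wrong and what replaces it


@dataclass
class ArticleMeta:
    # All field names here are hypothetical, chosen for illustration only.
    limitations: list[str] = field(default_factory=list)
    corrections: list[Correction] = field(default_factory=list)


def unseen_corrections(meta: ArticleMeta, last_visit: date) -> list[Correction]:
    """Return corrections posted after the reader's last visit, oldest first."""
    newer = [c for c in meta.corrections if c.posted > last_visit]
    return sorted(newer, key=lambda c: c.posted)


# Sample data is invented; it only echoes the kind of notes discussed above.
meta = ArticleMeta(
    limitations=["Applies to Windows XP Pro setups only"],
    corrections=[
        Correction(date(2006, 1, 10), "Stop IIS before running the Apache installer."),
        Correction(date(2006, 2, 2), "Edit the right section of httpd.conf to enable mod_rewrite."),
    ],
)

for limitation in meta.limitations:
    print(f"Limitation: {limitation}")
for c in unseen_corrections(meta, last_visit=date(2006, 1, 15)):
    print(f"New since your visit ({c.posted}): {c.note}")
```

A browser or toolbar would need to remember the last visit date per page for this to work; how that is stored is left open here.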

There should also be a common syntax for article users to point out errors and problems they encountered using the article. This metadata would have to be somehow accessible to other users, so some kind of toolbar entry / community search result situation would be best (like My Yahoo coupled with the Yahoo Toolbar with new features). The feedback should be able to cover everything from the accuracy of the basic data being reported to problems with how the information is being displayed (Tufte-style analysis).

Over the next couple of weeks I’ll be suggesting a particular XML document structure representing the kind of feedback and meta information I would like readers and authors to share.

XFN and the future of Why? linking

What does a link mean? If you are going to have link relationships govern the logic of search applications, it seems you have to have a good answer to this deceptively simple question. In many situations it seems that we take links as a kind of vote of confidence. There are lots of reasons you might link to content though. Maybe you are holding up something you think should be celebrated (like Bluehost.com, the best web host ever in my opinion). Perhaps though you are pointing out something you think is awful. I remember all the great links Ask.com used to get from F’ed Company back in 2001. Maybe you are pointing to some other site you own, trying to use your properties as a web ring to bring each other up (what everyone seems to be doing now, creating tons of fake content sites and interlinking them). I think the steady decline in the quality of the SERPs in major search engines in lots of commercial subject areas attests to the difficulty they are having in figuring out what links really mean and determining their relative legitimacy.

There are current efforts though that could potentially make this question much easier to answer. XFN, the XHTML Friends Network protocol put forward by gmpg.org, is a great example of how embedding metadata in links can help create a more exposed intentional web. XFN allows you in blogrolls (or other places) to indicate what your relationship is to the authors of the other sites you are linking to. You can specify for example that you are someone’s partner or friend. XFN is very cool, but limited in scope to unearthing human interrelationships represented by links. While that makes a ton of sense in the world of blogs, it is not quite as appropriate to a corporate or organizational site. It would be great though to take this idea and extend it to an entire vocabulary of embedded meta info in links, in essence having every link tell you why it exists (why? links). I should be able to say in my link tags, I made this link to try to make money through an affiliate relationship. Or, I made this link because I think people should be aware of the information this article is sharing. Or, I made this link because this photo is hideous and I can’t believe someone put it in their personal profile.
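As a rough sketch of what reading this kind of link metadata could look like, the Python below pulls the `rel` tokens off each link and separates the ones that are XFN relationship values from everything else. The XFN values listed are the ones defined by XFN 1.1. The `why-affiliate` and `why-ridicule` tokens are invented here purely to illustrate the why? link idea; they are not part of XFN or any existing vocabulary, and the `RelCollector` class name is likewise just a label for this example.

```python
from html.parser import HTMLParser

# Relationship values defined by XFN 1.1.
XFN_VALUES = {
    "friend", "acquaintance", "contact", "met", "co-worker", "colleague",
    "co-resident", "neighbor", "child", "parent", "sibling", "spouse", "kin",
    "muse", "crush", "date", "sweetheart", "me",
}


class RelCollector(HTMLParser):
    """Collect (href, xfn_tokens, other_tokens) for every <a> tag with a rel attribute."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        tokens = (attr.get("rel") or "").lower().split()
        if not tokens:
            return  # links without rel carry no stated intention
        xfn = [t for t in tokens if t in XFN_VALUES]
        other = [t for t in tokens if t not in XFN_VALUES]
        self.links.append((attr.get("href"), xfn, other))


# "why-affiliate" and "why-ridicule" are hypothetical tokens, not part of XFN.
sample = """
<a href="http://example.com/jane" rel="friend met">Jane</a>
<a href="http://example.com/shop" rel="why-affiliate">Buy it here</a>
<a href="http://example.com/photo" rel="why-ridicule">Look at this</a>
<a href="http://example.com/plain">No metadata</a>
"""

collector = RelCollector()
collector.feed(sample)
for href, xfn, other in collector.links:
    print(href, "XFN:", xfn, "other:", other)
```

A search engine or browser could then group links by the non-XFN tokens, which is where an agreed vocabulary of reasons would have to come in.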

Like everything else of course this would be prone to abuse. Spammers would just use whatever the most positive meta inflection was for their purposes. However, leaving that issue aside for the moment, if we could promote this kind of linking behavior, it really could make the web much more searchable. You could use news type links to come up with sites that are relevant to info seeking queries. You could use humor type links to find relevant results for entertainment seeking queries. Quite aside from what it might do for search engines, this would also provide interesting data for potential interfaces for browsers. Let’s say I want to hide all the ads for example, or that I only want to see links that will show me the kind of informational relationships I am looking for. This would make that possible, in an opt in way for both browsers and web authors.

Of course, there’s no particular reason to limit this kind of thing to links. Why not have a whole internal, element-based metadata language where I can tag a picture as an ad or a sentence as a joke or a paragraph as a recipe? Page-level metadata is appropriate to situations where a page is monolithic in focus, but these days many sites are far from monolithic, having many intentions on one page and many forms of information serving various purposes. Let us make our intentions clear! And then pay attention, and let’s see what happens.

How Search Literate are you?

I define Search Literacy as the ability to find what one is looking for in an efficient manner. This breaks down into 2 major questions. 1. Did you find what you were looking for? (Did you end up with the right information? Are you happy with your purchase?) 2. Were you able to conduct the search in a time-effective manner? (Given the value of the information you were seeking, did you spend a reasonable amount of time conducting the search?) Having worked in IT for over 7 years now, 3 of which were spent working for a search company (Ask Jeeves; I was a Jeeviant at the height of the dot com craze), I consider myself pretty technically proficient and quite search literate. I thought I would test myself though in my search for a Digital Voice Recorder.

(Spoilers Below:  If you want to compare your techniques to mine, don’t read any further in this article.  Take the test yourself by trying to decide what Digital Voice Recorder you would purchase.  I wanted a recorder that would download data to a pc that I could use for both documenting ideas and sampling sounds to use in my music.  I was looking to spend about $150 but would be willing to spend more for great features if there was a compelling reason to do so. What model would you pick?  How long did you spend looking?)

For an electronics decision at some point I like to visit a physical store to look at the products I’m considering.  Usually this occurs after I’ve done my online research.  In this case though, the idea to buy the recorder came at a time when I had good access to physical stores, but very poor web access.  So, I went to both Walmart and Staples and looked at what they had available before I had ever looked online for information.  This trip made me aware that for a top of the line recorder I would likely spend $160 or more.  Some recorders had absolutely no output port so you couldn’t transfer files to a pc.  Some recorders were compatible with transcription software while others weren’t.  Some played mp3 files and supported multiple formats.  None of them seemed to have removable storage or use rechargeable batteries. (I later found out that some do offer removable storage, which became a feature requirement for me). 

For a purchase decision like this I usually first of all look for an expert community and begin looking at popular products mentioned in that community (something like http://www.notebookreview.com/, a great site if you’ve never seen it). After trying a few variations around the basic idea of ‘digital voice recorder reviews’ in Google, I did find http://www.voicerecognition.com/solutions/digital_recorders/ which looked initially like a potentially useful resource. This was an ecommerce site though, and like so many info-commerce sites (sites that are giving you ‘useful information’ in the context of trying to sell you something), it was hard to determine to what extent they could be trusted. Many of the other sites of the same nature that I ran across were clearly biased, always telling you the most expensive item was the best, and never really giving you any negative reviews.

If I can’t find a category-specific expert community, I then go to more general expert communities like CNET or Anandtech for electronics. Neither of these had a section devoted to recorders, and neither had reviews of some of the models I had noted in my search so far.

After exhausting expert communities, I then look for user opinions. Amazon is still my first place to go for these, and then Usenet groups via Google’s group search (I still type in deja.com every once in a while for old time’s sake - gotta give it to Google on one thing, they make good purchases). Amazon had some useful information and there were lots of newsgroup reviews that helped. In particular, one user on Amazon helped me quite a bit by pointing out that some of the Sony recorders actually came bundled with transcription software that, if purchased separately, cost almost as much as the recorders themselves. This made bundled transcription software a new required feature for me.

At this point, I had narrowed my search down to two models. I went looking for the lowest price point for both models, and then had to decide, given the price difference, whether the difference in feature set pointed towards getting the more or less expensive model. I used a couple of different comparison shopping engines for this purpose. I don’t go directly to any particular engine though, and usually just type the product name plus the word “prices” into Google. After looking at several comparison engines, I located the merchant to buy from, and made my purchase. If I were being super diligent, I would have looked for merchant reviews in Google Groups to see if there were any complaints, as well as looking on BBBOnline to see if there was any cause for concern. The merchant looked quite legitimate though and had a good rating in all the comparison engines, so I made the purchase without undergoing this step.

One thing I noticed in going through the comparison engines is that they add different markups to the products depending on their revenue model. Most of them are affiliates of the merchants, and either they or the merchants mark up the products according to the commission rate. So, if you have a comparison engine (DealTime, mySimon, BizRate) that you’ve used in the past, shop around before you make a purchase, as it may not be showing you the bottom-line rate the merchant will give. Another way to get around this issue would be to find the lowest-cost merchant in one engine, go to their site, bookmark it, clear your cookies, and return. If they have any special channel-based pricing, hopefully this would eliminate it.

At the end of this process, I felt quite good about my searching methods. I ended up with a recorder I’m very happy with (a Sony ICD MX20VTP) and only spent a few hours researching the decision. I would say though that my test was conducted in one of the easiest topic areas imaginable, consumer electronics. There are lots of great review sites, people share information about the products they buy on newsgroups, and the product line is very well represented in all the shopping engines. This is also a product search, which the web is very well set up to handle. Finally, this is a search for information that is available online. There are lots of searches that are not so readily handled online. For example, how do you find your wallet when you’ve lost it? So, over the next few weeks I’m going to be working on a more thorough test writeup for a variety of different information scenarios.

Check back if you feel you are up to the test.

Oh Tribalism!

More and more people are talking about how society is becoming increasingly tribal in character. Instead of broad common values and common media types uniting us, we are splintered into smaller, more specialized groups often distributed geographically and connected only by virtual web-based or incidental communities. Comic lovers and trekkies unite at conventions. Open source enthusiasts come together on slashdot. And so on and so on. Whether or not this phenomenon is really producing new social forms, or whether there is a reversion back to more historically prevalent behavior, something is definitely altering in how people see themselves as related.

There are a variety of interestingly novel tribe types. One of the strongest bonds seems to be in Consumer Communities (Consumities) - groups of people with common purchasing patterns. Did you have an Atari 2600? Did you have neon shoelaces on your roller skates in the early 80’s? Are you a boat person? These communities are inherently dynamic as anyone can buy their way in, even if you can’t establish legitimacy as an insider. In the nineteenth century Hegel talked about the importance of property to the development of identity, and in our post-industrial world his words ring quite true. There are also Communities of Interest (Thought Friends). So, rather than united by a purchase, you are united in a belief or a fascination in some subject. Christian groups. Sports opinion radio. Flame wars. Of course, these two types often overlap as interests involve purchasing decisions. Unfortunately we still have our physical tribal distinctions as well. Tall and short. Skin color. Gender. Where you live. As consumptive and intellectual communities become more accessible and distributed, I wonder whether those in more culturally isolated positions become more invested in the importance of their physical identity. There is no reason of course that this would have to be limited to a particular social position. The shifting lines of commerce and thought can be scary to anyone who wants to fit in.

Understanding tribal behavior and the potential consequences of the new forms of grouping that are increasingly common is a huge topic this site will return to on many occasions. Are you into it? If so, there is a special category link for people like you.

Clicks and Cliques

It seems to me that Google is in a weird way the old high school popularity contest recreated on its head. By privileging sites with a great number of inbound links, it makes certain popular sites rise to the top of the pile. While this may produce relevance, a worthy goal, it may be doing so at the cost of privileging more conventional knowledge over more esoteric or revolutionary versions of ideas. So let’s say you have a crazy theory about the universe that no one agrees with. It may mean that no one links to you. Unless we can add something to the search interface that allows someone to uncover fringe theories, we end up potentially hindering the progress of knowledge by foregrounding the popular version of events over the people trying to challenge the common understanding.

To rectify this it would be nice to have something which would be a kind of lowest sensible relevance filter. So, show me something which is related to what I am looking for through on-page grammatical/semantic analysis, but which from a linking standpoint is a relative orphan. It would also be cool to have metadata collected from users about whether they agree or disagree with what they are reading, so that I can eventually read news that people agree is decent versus news they might classify as bogus (cough Fox, cough), or vice versa.
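Here is one way such a filter might be sketched in Python. Everything in it is an assumption made for illustration: the word-overlap `relevance` score is a crude stand-in for real on-page grammatical/semantic analysis, the `min_relevance` and `max_inbound` thresholds are arbitrary, and the page data is invented. The point is only the shape of the idea: require on-page relevance first, then prefer pages with few inbound links.

```python
def relevance(query: str, text: str) -> float:
    """Fraction of query words that appear in the page text (a crude
    stand-in for real on-page semantic analysis)."""
    q = set(query.lower().split())
    words = set(text.lower().split())
    return len(q & words) / len(q) if q else 0.0


def orphan_results(query, pages, min_relevance=0.5, max_inbound=5):
    """Keep pages that are relevant enough but have few inbound links,
    least-linked first."""
    hits = [
        p for p in pages
        if relevance(query, p["text"]) >= min_relevance
        and p["inbound_links"] <= max_inbound
    ]
    return sorted(
        hits,
        key=lambda p: (p["inbound_links"], -relevance(query, p["text"])),
    )


# Invented sample pages: one popular, one fringe, one unrelated.
pages = [
    {"url": "popular.example", "text": "the standard theory of the universe", "inbound_links": 9000},
    {"url": "fringe.example", "text": "a crazy new theory of the universe", "inbound_links": 1},
    {"url": "unrelated.example", "text": "pinball machine repair tips", "inbound_links": 0},
]

for p in orphan_results("theory universe", pages):
    print(p["url"])
```

With this sample data only the fringe page survives: the popular page is relevant but heavily linked, and the unrelated page is an orphan but not relevant.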

Power to the People - Reader Response in Action

Though search engines are fixated on meeting their users’ needs (they should be anyway), too little control is given to users in determining their own fate. There should be many more tools that let users give the engines data to fine-tune their algorithms. For example, it should be a one-click operation for me to tell the engine that the result I got was a spam result due to the fine SEO efforts of some agency, with little or nothing to do with what I was looking for. There should be an easy way for users to enter metadata about results that please or offend them for the benefit of later users. Let users comment on results, and let the community of users moderate those comments like one of the many great forums out there (Slashdot, etc.). While initially giving users more power might lead to more spamming, over the long run, it might be the most effective anti-spam filter out there.
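A minimal sketch of the one-click spam flag, in Python, might look like the following. The `FeedbackStore` class, its method names, and the `threshold` of two users are all hypothetical; no search engine's actual feedback mechanism is being described. The sketch counts each user once per result and pushes results flagged by enough users to the bottom, leaving the engine's order otherwise untouched.

```python
from collections import defaultdict


class FeedbackStore:
    """In-memory tally of one-click spam flags per (query, url) pair."""

    def __init__(self):
        self._flags = defaultdict(set)  # (query, url) -> set of user ids

    def flag_spam(self, query: str, url: str, user_id: str) -> None:
        # Using a set means repeated clicks from one user count only once.
        self._flags[(query, url)].add(user_id)

    def spam_count(self, query: str, url: str) -> int:
        return len(self._flags.get((query, url), ()))

    def rerank(self, query, urls, threshold=2):
        """Move results flagged by at least `threshold` users to the bottom,
        keeping the engine's original order otherwise."""
        ok = [u for u in urls if self.spam_count(query, u) < threshold]
        flagged = [u for u in urls if self.spam_count(query, u) >= threshold]
        return ok + flagged


store = FeedbackStore()
store.flag_spam("voice recorder", "spam.example", "alice")
store.flag_spam("voice recorder", "spam.example", "alice")  # duplicate click
store.flag_spam("voice recorder", "spam.example", "bob")

print(store.rerank("voice recorder", ["spam.example", "review.example"]))
```

As noted above, this kind of power invites abuse of its own, so a real system would also need community moderation of the flags themselves, which this sketch leaves out.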

Yahoo’s efforts in this area are interesting. I will have more comments on what Yahoo is doing in this sphere later this week.

Legibility 1: Finding a Date. Theoretical design for a nightclub.

Finding someone is a sometimes baffling endeavor. There are lots of great services now where you can spell out exactly what you want online and be matched up through various profile points to potentially compatible individuals. I found my partner before the online dating services were really developed, so I still remember when you did things like go out to dance clubs to find someone. I used to go to mixed clubs quite a bit in San Francisco, where some people were straight and others were gay. It was cool that people were mixing but there was of course tension around who was there for what.

If we took the online dating paradigm and integrated it into the nightclub we could create a completely legible environment to simplify things. Imagine that the floor, furniture and seating were differently colored and labeled, so there was an area to hang out for singles versus people already in a relationship. You could divide it up into straight and gay. You could further subdivide to make it clear who wasn’t tolerant of monogamy or who was really, really into feet. You could divide things by income bracket and political party. Put all the people that have ever been arrested in a particular area, along with all kinds of demarcation around who arrived with whom and who has been involved before in past relationships. The club could be called ID (Identifiable Differences).

Of course there would be the problem of people not telling the truth. Beyond that though, what would this legibility produce in terms of pushing or not pushing boundaries? If you’ve never been with a person of the same sex and you all of a sudden get groped in a nightclub by someone fitting that description, it does make you wonder whether your lack of experience makes any sense. But when everything is made legible, are we constrained? Or would it instead, through its clear distinctions, make you keenly aware of what you were and were not, and hold you accountable for your limitations? It would be great if someone would build it so we could find out. I would be hanging out in the corner with the other magick freak smart boy loners that are really into pinball (we all fit some kind of description in the end).

The problem of legibility

The quest for relevance in search results seems largely focused around determining user intention. Are you shopping? Are you looking for pictures? Are you just wasting time? In pinpointing intention to fulfill these needs though something is potentially lost. How many great sites have you found for example from ‘irrelevant’ results? The problem of legibility is a larger social problem at a time when so many things are wrapped up in questions of identity. Who are you? What do you care about? What defines you? As someone who has always been a fan of the shadows in life and the beauty of the undefined, identity is a frightening thing. Whatever happened to Whitman’s barbaric yawp of self-contradiction? Why must everything be so clear? For the next few posts I’m going to look at legibility in a few different contexts to try to tease out some of the problems this quest entails.

What is this place?

While I am actively engaged in commercial search marketing, what interests me about search is not only how you make money from it, but more significantly what our ways of exploring tell us about the world / thoughtspace we inhabit and are constantly altering. We are as a society becoming increasingly tribal, with a rapidly diversifying and intensifying set of struggles around issues of identity, privacy, legibility, and cognitive coherence. All of this is occurring at a time when communication systems are becoming so powerful and omnipresent that one cannot escape the clutter. It is no wonder then that searching has become one of the most important and valuable skills in the world.

This site is intended as a forum to share thoughts about search as a larger set of social problems and cultural manifestations. It is my hope to extend the discussion of searching and exploring beyond just the web and to examine it in the broader historical context of human tools and techniques for exploration and discovery. There will be discussions of particular SEO strategies and techniques, but that is not the sole focus of this site.
