<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>

<channel>
	<title>SearchPhilosophy.org &#187; Search Result Quality</title>
	<atom:link href="http://searchphilosophy.org/category/search-result-quality/feed/" rel="self" type="application/rss+xml" />
	<link>http://searchphilosophy.org</link>
	<description>Thoughts about Searching</description>
	<pubDate>Sun, 11 Nov 2007 03:45:48 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5.1</generator>
	<language>en</language>
			<item>
		<title>Why doesn&#8217;t Google have a better 404 page?</title>
		<link>http://searchphilosophy.org/2006/06/13/why-doesnt-google-have-a-better-404-page/</link>
		<comments>http://searchphilosophy.org/2006/06/13/why-doesnt-google-have-a-better-404-page/#comments</comments>
		<pubDate>Tue, 13 Jun 2006 20:07:32 +0000</pubDate>
		<dc:creator>EveMedia</dc:creator>
		
		<category><![CDATA[Search Result Quality]]></category>

		<guid isPermaLink="false">http://searchphilosophy.org/2006/06/13/why-doesnt-google-have-a-better-404-page/</guid>
		<description><![CDATA[I use Google Analytics all the time and find it very useful for tracking web user behavior. Analytics isn&#8217;t always the easiest word to type, though, and more times than I can count I&#8217;ve tried to go to www.google.com/analtyics (or some similar misspelling of the URL). This takes you to one of the lamest things about Google, [...]]]></description>
			<content:encoded><![CDATA[<p>I use Google Analytics all the time and find it very useful for tracking web user behavior. Analytics isn&#8217;t always the easiest word to type, though, and more times than I can count I&#8217;ve tried to go to <a href="http://www.google.com/analtyics">www.google.com/analtyics</a> (or some similar misspelling of the URL). This takes you to one of the lamest things about Google: their terrible 404 page.</p>
<p>Now, Google is supposed to help us all find the things we are looking for, right? So why can&#8217;t they handle something as simple as a mistyped URL? They could at least take the URL and run it through their own search to see if something comes up. Interestingly, if you pass &#8220;/analtyics google&#8221; to Google, you do at least get what you are looking for in the AdWords results. (Maybe they could use this to generate some AdWords revenue from themselves.)</p>
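<p>As a rough illustration of the kind of fallback a 404 handler could try, here is a minimal Python sketch that compares a mistyped path against a list of known paths using the standard library&#8217;s difflib. The path list, the function name, and the 0.8 similarity cutoff are all assumptions made up for this example; nothing here reflects how Google actually works.</p>

```python
from difflib import get_close_matches

# Hypothetical list of valid top-level paths; a real site would use its own URL index.
KNOWN_PATHS = ["analytics", "adwords", "maps", "mail", "news"]


def suggest_path(requested_path, known_paths=KNOWN_PATHS, cutoff=0.8):
    """Return the closest known path for a mistyped one, or None if nothing is close."""
    cleaned = requested_path.strip("/").lower()
    matches = get_close_matches(cleaned, known_paths, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    # A 404 page could show "Did you mean ...?" whenever a suggestion is found.
    print(suggest_path("/analtyics"))
```

<p>A 404 page could offer the suggestion as a &#8220;did you mean&#8221; link, and fall back to a normal site search when no known path is close enough.</p>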
<p>In any case, a company with so much money and the market leadership in searching the internet should figure out some way to better facilitate user navigation of this type. Google, it seems, needs to take some time to better find itself.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchphilosophy.org/2006/06/13/why-doesnt-google-have-a-better-404-page/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Idea:  Use the average time to Adwords click to track spammers</title>
		<link>http://searchphilosophy.org/2006/02/28/idea-use-the-average-time-to-adwords-click-to-track-spammers/</link>
		<comments>http://searchphilosophy.org/2006/02/28/idea-use-the-average-time-to-adwords-click-to-track-spammers/#comments</comments>
		<pubDate>Wed, 01 Mar 2006 02:58:42 +0000</pubDate>
		<dc:creator>EveMedia</dc:creator>
		
		<category><![CDATA[Search Result Quality]]></category>

		<category><![CDATA[Spamming]]></category>

		<guid isPermaLink="false">http://searchphilosophy.org/2006/02/28/idea-use-the-average-time-to-adwords-click-to-track-spammers/</guid>
		<description><![CDATA[In the Algorithm research session at Search Engine Strategies today, several panelists mentioned that search engines might use the length of time one spends on a result page to downgrade its relevance in certain circumstances. If everyone clicks back to the search engine quickly after visiting a page, for example, the engines may take this as an indication [...]]]></description>
			<content:encoded><![CDATA[<p>In the Algorithm research session at Search Engine Strategies today, several panelists mentioned that search engines might use the length of time one spends on a result page to downgrade its relevance in certain circumstances. If everyone clicks back to the search engine quickly after visiting a page, for example, the engines may take this as an indication of user dissatisfaction with the result.</p>
<p>In a similar vein, one could track the time between the click on a SERP result and the click on a corresponding ad on the destination page as a possible indicator of spam activity. There are so many pages now that are nothing but lifted content designed to get into the SERPs and earn revenue through AdWords or similar ad networks. If the average visitor clicks an ad very quickly after being sent to a page, this might identify these ad-centric pages and allow them to be moderated down or out of the results as appropriate.</p>
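<p>To make the idea concrete, here is a minimal Python sketch that averages the delay between the SERP click and the ad click for each page and flags pages where that average is very short. The record layout, the five-second threshold, and the minimum sample size are assumptions chosen purely for illustration; a real engine would have to tune them and combine this with other signals.</p>

```python
from collections import defaultdict
from statistics import mean

# Hypothetical thresholds, chosen only for illustration.
MAX_AVG_SECONDS = 5.0
MIN_SAMPLES = 3


def flag_ad_centric_pages(visits, max_avg_seconds=MAX_AVG_SECONDS, min_samples=MIN_SAMPLES):
    """Flag pages whose visitors click an ad unusually soon after arriving.

    visits: iterable of (page_url, serp_click_time, ad_click_time) tuples with
    times in seconds; ad_click_time is None when the visitor clicked no ad.
    Returns {page_url: average seconds from SERP click to ad click}.
    """
    delays = defaultdict(list)
    for page_url, serp_click_time, ad_click_time in visits:
        if ad_click_time is not None:
            delays[page_url].append(ad_click_time - serp_click_time)

    flagged = {}
    for page_url, times in delays.items():
        # Skip pages with too few ad clicks to say anything meaningful.
        if len(times) >= min_samples:
            average = mean(times)
            if average <= max_avg_seconds:
                flagged[page_url] = average
    return flagged
```

<p>Only visits that end in an ad click are counted here, so this measures how fast the clicks come, not how often they happen; the share of visitors who click at all would be a separate signal.</p>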
<p>This approach is, of course, somewhat limited, since the engine must be able to match the ads being served in order to track the time difference. Also, there is a conflict of interest in these situations, as the engines may themselves be benefiting from the revenue these sites produce. In the long run, though, getting rid of these sites will improve the user experience, and thus improve the value of the ad channel to potential advertisers. So even if its value is somewhat limited, it might be worth trying to put something like this in place to at least guarantee the integrity of the ad network within the serving engine itself.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchphilosophy.org/2006/02/28/idea-use-the-average-time-to-adwords-click-to-track-spammers/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Subtle Misinformation</title>
		<link>http://searchphilosophy.org/2006/02/26/subtle-misinformation/</link>
		<comments>http://searchphilosophy.org/2006/02/26/subtle-misinformation/#comments</comments>
		<pubDate>Sun, 26 Feb 2006 14:23:36 +0000</pubDate>
		<dc:creator>EveMedia</dc:creator>
		
		<category><![CDATA[Legibility]]></category>

		<category><![CDATA[Search Result Quality]]></category>

		<guid isPermaLink="false">http://searchphilosophy.org/2006/02/26/subtle-misinformation/</guid>
		<description><![CDATA[There has never been an easier time to get your thoughts out there.  For like $10 a month you can have a website, spout your opinions, and share your wisdom with the world.  More and more people spend time reading things like blogs now, and these independent resources are playing a greater role in informing the culture.  At the same time, this means that there has never been an easier time to spread false information. ]]></description>
			<content:encoded><![CDATA[<p><font face="Times New Roman" size="3">There has never been an easier time to get your thoughts out there. For like $10 a month you can have a website, spout your opinions, and share your wisdom with the world. More and more people spend time reading things like blogs now, and these independent resources are playing a greater role in informing the culture. At the same time, this means that there has never been an easier time to spread false information. For example, put something totally false in your blog, like a mention of the recent study linking cooking with Teflon to breast cancer. Is there any such study? Not that I know of, but you can say it. Who really checks all the claims made in what they read? It would be interesting to have, instead of an SEO contest, a LIES contest (Loading Information Errors into Society) where people tried to get the greatest number of people to repeat their made-up nonsense as truth. This could be tracked by things like media mentions in mainstream journalistic broadcasts, or repetition as truth on unrelated websites. The dark side of the urban legend is that it doesn&#8217;t have to be about something as silly as a dog choking on a finger; it can be a false notion about any topic, serious or otherwise, that spreads and gains ground. So get out there and start fibbing. You have the power to shape reality.</font></p>
]]></content:encoded>
			<wfw:commentRss>http://searchphilosophy.org/2006/02/26/subtle-misinformation/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Idea for system to identify the original source of content</title>
		<link>http://searchphilosophy.org/2006/02/16/idea-for-system-to-identify-the-original-source-of-content/</link>
		<comments>http://searchphilosophy.org/2006/02/16/idea-for-system-to-identify-the-original-source-of-content/#comments</comments>
		<pubDate>Thu, 16 Feb 2006 05:35:45 +0000</pubDate>
		<dc:creator>EveMedia</dc:creator>
		
		<category><![CDATA[Legibility]]></category>

		<category><![CDATA[Search Result Quality]]></category>

		<guid isPermaLink="false">http://searchphilosophy.org/?p=10</guid>
		<description><![CDATA[One of the problems with trying to fight content redundancy over multiple sites is that if you don&#8217;t properly identify the original source of the content, you might penalize the author by wrongly labeling them as a redundant instance. This has been pointed out by various sources as a possible problem with the current Google rules around duplicate content.
Here [...]]]></description>
			<content:encoded><![CDATA[<p>One of the problems with trying to fight content redundancy over multiple sites is that if you don&#8217;t properly identify the original source of the content, you might penalize the author by wrongly labeling them as a redundant instance. This has been pointed out by various sources as a possible problem with the current Google rules around duplicate content.</p>
<p>Here is one idea for how this problem might be solved.</p>
<ol>
<li>A site owner creates an account at a search engine where they establish their identity and receive a unique site id.</li>
<li>The site owner then logs in and provides an article, to which the search engine in turn assigns an article id. The author embeds this id in the content via some kind of meta tag when the article is published to the web.</li>
<li>When content is analyzed by the engine during its normal indexing process, if it sees an article id, the search engine cross-references the id with the site owner id and profile to see if it is registered to that site owner. If the article is registered, the site owner is considered the source whenever there is a question of duplicate content where some site must be hidden.</li>
<li>To deal with instances where content is fraudulently registered, you would need some kind of challenge process where someone could submit evidence that they were the rightful author. In turn, site owners found to be repeat offenders in terms of false content registrations could have their accounts revoked and could be removed from the SERPs entirely in extreme cases.</li>
</ol>
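<p>The four steps above can be sketched as a small in-memory registry in Python. This is only a toy model under stated assumptions: the class name, the method names, and the use of simple counters for ids are all hypothetical, and the challenge process in step 4 is reduced to a single call that revokes an account.</p>

```python
import itertools


class ArticleRegistry:
    """In-memory stand-in for a search engine's registration service (hypothetical)."""

    def __init__(self):
        self._site_ids = itertools.count(1)
        self._article_ids = itertools.count(1)
        self._site_for_owner = {}    # owner name -> site id
        self._site_for_article = {}  # article id -> site id
        self._revoked_sites = set()

    def register_site(self, owner):
        """Step 1: establish identity and hand out a unique site id."""
        if owner not in self._site_for_owner:
            self._site_for_owner[owner] = next(self._site_ids)
        return self._site_for_owner[owner]

    def register_article(self, site_id):
        """Step 2: issue an article id for the owner to embed in the published page."""
        if site_id not in self._site_for_owner.values():
            raise KeyError("unknown site id")
        if site_id in self._revoked_sites:
            raise PermissionError("site account has been revoked")
        article_id = next(self._article_ids)
        self._site_for_article[article_id] = site_id
        return article_id

    def is_original(self, site_id, article_id):
        """Step 3: check whether an id found during indexing belongs to this site."""
        if site_id in self._revoked_sites:
            return False
        return self._site_for_article.get(article_id) == site_id

    def revoke_site(self, site_id):
        """Step 4: outcome of a successful challenge against a repeat offender."""
        self._revoked_sites.add(site_id)
```

<p>During indexing, the engine would call something like is_original with the site it is crawling and the article id found in the page, and treat any other site carrying the same text as the duplicate.</p>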
<p>There would seem to be benefits all around from this kind of system. First of all, site owners who work hard to create original content would have more time to spend being creative and could spend less time on Copyscape and similar services hunting down the people stealing their articles. It might promote greater use of RSS, as feed systems could be built so that an entry is not syndicated until it is tagged with a valid article id, giving possible syndicators less to worry about in terms of duplicate content penalties. The engines themselves would benefit as well, since content would be pumped into them directly and they would be less dependent on going out to find it. Also, the engines would have yet another possible tool to use to round up and discipline spammers.</p>
<p>There are of course a ton of logistical issues to overcome, and figuring out what to do with all the existing content out there would be an enormous hassle, with huge potential for fraudulent claims. However, ultimately we need something better than the copyright system to identify the source of content.</p>
<p>By the way, if someone does create such a system, I claim my right to article ID #1. First registered post!</p>
]]></content:encoded>
			<wfw:commentRss>http://searchphilosophy.org/2006/02/16/idea-for-system-to-identify-the-original-source-of-content/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Squashing Information Bugs</title>
		<link>http://searchphilosophy.org/2006/02/14/squashing-information-bugs/</link>
		<comments>http://searchphilosophy.org/2006/02/14/squashing-information-bugs/#comments</comments>
		<pubDate>Tue, 14 Feb 2006 11:59:10 +0000</pubDate>
		<dc:creator>EveMedia</dc:creator>
		
		<category><![CDATA[Search Result Quality]]></category>

		<guid isPermaLink="false">http://searchphilosophy.org/?p=9</guid>
		<description><![CDATA[There is a ton of great information online. There is lots of garbage out there as well. One of the recurrent themes of this site is the need for more metadata, generated both by users and site owners, to be used by search engines to improve result quality, and by web browser clients to increase [...]]]></description>
			<content:encoded><![CDATA[<p>There is a ton of great information online. There is lots of garbage out there as well. One of the recurrent themes of this site is the need for more metadata, generated both by users and site owners, to be used by search engines to improve result quality, and by web browser clients to increase information utility. One of the key needs for this is precisely in the area of assessing information accuracy. Site owners are in a good position to outline the limitations of their knowledge. Site users are in a good position to evaluate whether the information on a site worked for them, or whether it has become outdated. Used in combination, this could be a powerful system for detailing the relevant conditions under which information may be of value to a knowledge consumer.</p>
<p>A quick illustration of why this matters. Over the last few days I have been working on installing WordPress 2.0 and Drupal on my home machines. I want to do some theme and plugin development locally, and wanted to develop a deeper understanding of all the pieces involved in these open source content management systems. At home I have a Mac laptop running OS X 10.4 and a Windows desktop running Windows XP Pro. To get these CMS systems working you basically need Apache, PHP, MySQL, and mod_rewrite.</p>
<p>Installing each of these in a Windows environment is fairly straightforward once you cobble together all the appropriate installers and instruction sets. However, getting to the point where you have all that is not so simple. There are lots of great articles out there explaining how to set everything up, but every one I looked at had some significant bug that made the installation process much longer than it needed to be. In short, make sure IIS is not running when you run the Apache installer, make sure you edit the right section of httpd.conf when trying to enable mod_rewrite, and understand that some Drupal themes just don&#8217;t work with PHP 5 right now. I could now install these systems in 10 minutes on any XP machine, but due to all the buggy info I ran across, I spent 6 hours in total getting everything working (and mod_rewrite still isn&#8217;t working on my Mac yet).</p>
<p>The context of development information is somewhat specialized; however, any info can be buggy. I&#8217;ve seen product recommendation sites incorrectly list features of products. I&#8217;ve seen news articles that wrongly describe someone&#8217;s educational background. Bad information is out there everywhere. Even when the basic information available is worthwhile, it can also be rendered less valuable by poor information design (check out Edward Tufte&#8217;s analysis of the space shuttle disaster in <a href="http://www.amazon.com/exec/obidos/redirect?link_code=ur2&#038;tag=consumitycom-20&#038;camp=1789&#038;creative=9325&#038;path=http%3A%2F%2Fwww.amazon.com%2Fgp%2Fproduct%2F0961392126%2Fsr%3D8-3%2Fqid%3D1139918160%2Fref%3Dpd_bbs_3%3F%255Fencoding%3DUTF8">Visual Explanations</a> for an example of how serious the consequences of this kind of &#8216;formatting&#8217; problem can be).</p>
<p>So here are my concrete suggestions. There should be a common syntax for article authors to post corrections to their pages. From a browser point of view, when correction metadata is found, it can be highlighted in some way, especially for users who visited the article before the corrections were posted. Article owners should also be able to post limitations that govern the utility of their articles. For example, for a technical article you should indicate what kind of technical setup the article applies to. For a product review, you might indicate whether or not you tested the product yourself, and if so whether it was a pre-release version or the final consumer version. This data could also be specially highlighted to help users understand whether your article will actually help them with their questions.</p>
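<p>As one possible reading of the browser-side behavior described above, here is a minimal Python sketch that picks out the corrections a returning reader has not yet seen. The Correction record and its field names are hypothetical placeholders; no syntax for the correction metadata has been defined yet.</p>

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Correction:
    """Hypothetical correction record an author might attach to an article."""
    posted_at: datetime
    text: str


def corrections_since_last_visit(corrections, last_visit):
    """Return corrections posted after the reader's last visit, oldest first.

    A first-time reader (last_visit is None) has no earlier version to compare
    against, so nothing needs special highlighting.
    """
    if last_visit is None:
        return []
    unseen = [c for c in corrections if c.posted_at > last_visit]
    return sorted(unseen, key=lambda c: c.posted_at)
```

<p>A browser or toolbar could call this with the time of the reader&#8217;s previous visit from its history and highlight whatever comes back.</p>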
<p>There should also be a common syntax for article users to point out errors and problems they encountered using the article. This metadata would have to be somehow accessible to other users, so some kind of toolbar entry / community search result situation would be best (like My Yahoo coupled with the Yahoo Toolbar with new features). The feedback should be able to cover everything from the accuracy of the basic data being reported to problems with how the information is being displayed (Tufte-style analysis).</p>
<p>Over the next couple of weeks I&#8217;ll be suggesting a particular XML document structure representing the kind of feedback and meta information I am interested in readers and authors sharing.</p>
]]></content:encoded>
			<wfw:commentRss>http://searchphilosophy.org/2006/02/14/squashing-information-bugs/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>
