One of the problems with trying to fight content redundancy across multiple sites is that if you don’t properly identify the original source of the content, you might penalize the author by wrongly labeling them as a redundant instance. Various sources have pointed this out as a possible problem with the current Google rules around duplicate content.
Here is one idea for how this problem might be solved.
- A site owner creates an account at a search engine where they establish their identity and receive a unique site id.
- The site owner then logs in and submits an article, and the search engine assigns it an article id. The author embeds this id in the content via some kind of meta tag when the article is published to the web.
- When the engine analyzes content during its normal indexing process and sees an article id, it cross-references the id with the site id and profile to see whether the article is registered to that site owner. If it is, the site owner is considered the source whenever a duplicate content question arises and one of the copies must be hidden.
- To deal with instances where content is fraudulently registered, you would need some kind of challenge process where someone could submit evidence that they were the rightful author. In turn, site owners found to be repeat offenders in terms of false content registrations could have their accounts revoked and could be removed from the SERPs entirely in extreme cases.
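To make the first three steps a bit more concrete, here is a rough in-memory sketch in Python. Everything in it is an assumption for illustration: the `article-id` meta tag name, the `S1`/`A1` id formats, and the `Registry` class with its `register_site`, `register_article`, and `pick_source` methods are all hypothetical, and the challenge process from the last step is left out entirely. One detail worth noting is that a scraper copying a page will copy its meta tag too, so the check has to confirm the id is registered to the site actually serving the page, not merely that an id is present.

```python
import itertools
from html.parser import HTMLParser

META_NAME = "article-id"  # hypothetical tag name; the post only says "some kind of meta tag"


class ArticleIdParser(HTMLParser):
    """Extracts the article id from <meta name="article-id" content="...">."""

    def __init__(self):
        super().__init__()
        self.article_id = None

    def handle_starttag(self, tag, attrs):
        values = dict(attrs)
        if tag == "meta" and values.get("name") == META_NAME:
            self.article_id = values.get("content")


def extract_article_id(html):
    parser = ArticleIdParser()
    parser.feed(html)
    parser.close()
    return parser.article_id


class Registry:
    """Hypothetical in-memory stand-in for the engine's site and article records."""

    def __init__(self):
        self._next_site = itertools.count(1)
        self._next_article = itertools.count(1)
        self.sites = {}     # site_id -> domain the owner verified
        self.articles = {}  # article_id -> submitted text
        self.owner_of = {}  # article_id -> site_id

    def register_site(self, domain):
        # Step 1: the owner establishes identity and receives a unique site id.
        site_id = f"S{next(self._next_site)}"
        self.sites[site_id] = domain
        return site_id

    def register_article(self, site_id, text):
        # Step 2: the owner submits an article and receives an article id.
        if site_id not in self.sites:
            raise KeyError(f"unknown site id {site_id}")
        article_id = f"A{next(self._next_article)}"
        self.articles[article_id] = text
        self.owner_of[article_id] = site_id
        return article_id

    def pick_source(self, duplicates):
        # Step 3: among (domain, html) pages already judged to be duplicates,
        # prefer the page whose embedded id is registered to that same domain.
        for domain, html in duplicates:
            site_id = self.owner_of.get(extract_article_id(html))
            if site_id is not None and self.sites[site_id] == domain:
                return domain
        return None  # no valid claim; fall back to the engine's usual heuristics
```

For example, if `original.example` registers an article and a scraper republishes the page with the tag intact, `pick_source([("scraper.example", page), ("original.example", page)])` returns `"original.example"`, because the copied id points back to the original site rather than the scraper.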
There would seem to be benefits all around from this kind of system. First of all, site owners who work hard to create original content would have more time to spend being creative and less time on Copyscape and similar services hunting down the people stealing their articles. It might promote greater use of RSS, since feed systems could be built to hold back each entry from syndication until it is tagged with a valid article id, giving would-be syndicators less to worry about in terms of duplicate content penalties. The engines themselves would benefit as well, since they would have a reliable way to get content submitted to them directly instead of depending so heavily on crawling to find it. Finally, the engines would gain yet another tool for rounding up and disciplining spammers.
There are of course a ton of logistical issues to overcome, and figuring out what to do with all the existing content out there would be an enormous hassle, with huge potential for fraudulent claims. However, ultimately we need something better than the copyright system to identify the source of content.
By the way, if someone does create such a system, I claim my right to article ID #1. First registered post!