Wake up Web 2.0 to the Challenge
My intention is to uncover one very large-scale implementation that incorporates the fundamentals behind what people are calling Web 2.0, and to show how it can be extrapolated to its full potential.
First, I want to draw your attention to an interesting comment made by none other than the Google duo in their famous research paper on PageRank and their centralized repository.
To find it, search for "The Anatomy of a Large-Scale Hypertextual Web Search Engine."
It was written in a more academic, idealistic time, when speculation and ideology were still offered in the spirit of academic insight and predictive precision.
Interestingly enough, the paper closes with this thought about the limitations of Google's centralized indexing system, and about the next frontier of search, which they identify as distributed, decentralized search indexing.
Here is the final paragraph:
Of course a distributed systems like Gloss [Gravano 94] or Harvest will often be the most efficient and elegant technical solution for indexing, but it seems difficult to convince the world to use these systems because of the high administration costs of setting up large numbers of installations. Of course, it is quite likely that reducing the administration cost drastically is possible. If that happens, and everyone starts running a distributed indexing system, searching would certainly improve drastically.
-Sergey Brin and Lawrence "PageRank" Page
As we can see, the famous duo left us some wisdom and parting thoughts on the next major evolution in information retrieval.
A decentralized database is clearly a beautiful evolution.
It's already happened with blogs.
Each blog is just a mini-index of posts, totally searchable on its own.
But blogs have also adopted standard document formats for RSS feeds, which allow their posts to be aggregated, processed, and indexed elsewhere and made available to other users and aggregators.
Blogs have easily demonstrated one possible way to make content shareable, indexable, and readable across various web services.
In fact, one of the most interesting developments in the social networking arena could be made possible by something as simple as offering "Friend Feeds" or "Profile Feeds," which let profiles be aggregated elsewhere.
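A minimal Python sketch of that aggregate-then-index step follows. It assumes two tiny RSS 2.0 feeds inlined as strings; the feed contents, the `.example` URLs, and the helper names `parse_feed`, `aggregate`, and `build_index` are all hypothetical, chosen only for illustration. It relies only on the standard library's `xml.etree.ElementTree`.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Two tiny, hypothetical RSS 2.0 feeds standing in for independent blogs.
FEED_A = """<rss version="2.0"><channel><title>Blog A</title>
<item><title>Distributed indexing</title><link>http://a.example/1</link>
<description>Why search should be decentralized</description></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Blog B</title>
<item><title>Friend feeds</title><link>http://b.example/7</link>
<description>Sharing social profiles as feeds</description></item>
</channel></rss>"""


def parse_feed(xml_text):
    """Extract the items of one RSS 2.0 document, tagged with the channel title."""
    channel = ET.fromstring(xml_text).find("channel")
    source = channel.findtext("title", default="")
    items = []
    for item in channel.findall("item"):
        items.append({
            "source": source,
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        })
    return items


def aggregate(feeds):
    """Merge the items of many independently published feeds into one list."""
    merged = []
    for xml_text in feeds:
        merged.extend(parse_feed(xml_text))
    return merged


def build_index(items):
    """Inverted index: lowercase word -> set of item links containing it."""
    index = defaultdict(set)
    for item in items:
        text = f"{item['title']} {item['description']}".lower()
        for word in re.findall(r"[a-z0-9]+", text):
            index[word].add(item["link"])
    return index


items = aggregate([FEED_A, FEED_B])
index = build_index(items)
print(sorted(index["feeds"]))  # ['http://b.example/7']
```

The aggregator never needs write access to either blog: it only reads the published feeds, which is the property that makes this kind of third-party indexing possible.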
Some aggregators will focus on being comprehensive, covering as many major sources as possible.
Other aggregators will offer specialized databases of specific domains of knowledge.
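The split between comprehensive and specialized aggregators can be sketched as one aggregator class with an optional topic filter. Everything here is an assumption for illustration: no standard "Profile Feed" format is implied, and the `Profile` and `Aggregator` names, the site-qualified user IDs, and the interest-overlap rule are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Profile:
    """One entry from a hypothetical "Profile Feed" published by a social site."""
    user_id: str           # site-qualified ID, e.g. "alice@site-a.example"
    name: str
    interests: frozenset


@dataclass
class Aggregator:
    """Collects profiles from many sites; a non-empty topic set makes it specialized."""
    topics: frozenset = frozenset()   # empty means "comprehensive"
    profiles: dict = field(default_factory=dict)

    def ingest(self, feed):
        for profile in feed:
            # A specialized aggregator keeps only profiles that overlap its domain.
            if not self.topics or profile.interests & self.topics:
                self.profiles[profile.user_id] = profile


site_a = [Profile("alice@site-a.example", "Alice", frozenset({"search", "music"})),
          Profile("bob@site-a.example", "Bob", frozenset({"cooking"}))]
site_b = [Profile("carol@site-b.example", "Carol", frozenset({"search"}))]

comprehensive = Aggregator()
specialized = Aggregator(topics=frozenset({"search"}))
for feed in (site_a, site_b):
    comprehensive.ingest(feed)
    specialized.ingest(feed)

print(len(comprehensive.profiles))     # 3
print(sorted(specialized.profiles))    # ['alice@site-a.example', 'carol@site-b.example']
```

Both aggregators read the same public feeds; they differ only in what they choose to keep.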
How easy would it be to build tools that model and process the friend-o-sphere and the web-index-o-sphere, once an open, distributed, decentralized database was fully available to compliant warehousers willing to maintain the integrity of the system?
I would say that this is nothing less than a true revolution in media reaching its full potential.
Can the social sites that want a piece of the MySpace pie afford NOT to make use of this extensible platform?
Can search-engine underdogs afford not to commoditize the web index, compete instead on processing, user behavior, and computation, and thus turn the world of "search" upside down?
The new data warehouse model is not the "site"; it is the "service," or feed.
Sites will remain places where data from web services is accessed and processed.
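To suggest how easy modeling the friend-o-sphere could become, here is a sketch that merges hypothetical friend feeds from two sites into one friendship graph and measures degrees of separation with a breadth-first search. The feed layout, the `member@site` ID scheme, and the function names are assumptions, not an existing format.

```python
from collections import defaultdict, deque

# Hypothetical "Friend Feeds": each site publishes (member, friends) pairs,
# with IDs qualified by site so graphs from different sites can be merged.
FRIEND_FEEDS = {
    "site-a.example": [("alice@site-a", ["bob@site-a", "carol@site-b"])],
    "site-b.example": [("carol@site-b", ["dave@site-b"])],
}


def build_graph(feeds):
    """Merge all friend feeds into one undirected friendship graph."""
    graph = defaultdict(set)
    for entries in feeds.values():
        for member, friends in entries:
            for friend in friends:
                graph[member].add(friend)
                graph[friend].add(member)
    return graph


def degrees_of_separation(graph, start, goal):
    """Breadth-first search; returns the hop count, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, dist = queue.popleft()
        if person == goal:
            return dist
        for friend in graph.get(person, ()):
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, dist + 1))
    return None


graph = build_graph(FRIEND_FEEDS)
print(degrees_of_separation(graph, "bob@site-a", "dave@site-b"))  # 3
```

Bob and Dave sit on different sites, yet the path Bob, Alice, Carol, Dave only appears once both sites' feeds are merged.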
For more information on our Social Networking Initiatives, please visit
http://www.connectsociety.com
And for information on our web object repository framework, please join us at
http://www.linkassociation.com
Search is going to fragment into its disciplines: compilation, structuring, and retrieval.
Web 1.0 - going to my library and doing a keyword search for books in that one building.
Web 2.0 - going to your library and using its computer to search all libraries, or only the libraries relevant to you.
- Being able to use one or more supplemental classification systems for all or part of the library (such as Dewey Decimal, Library of Congress, or any other well-defined topical hierarchy or ontology).
- Being able to use any retrieval technique from any library in the world (not just the ones your library offers).
That's just the beginning. Once this rich data is available on demand, software that knows what you want will get even better.
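The library analogy can be sketched as three decoupled stages matching the disciplines of compilation, structuring, and retrieval: gathering records from any chosen set of libraries, grouping them with a pluggable classifier, and ranking them with a pluggable retrieval function. The catalogs, the Dewey-style hundreds-class classifier, and the keyword-overlap ranker are hypothetical stand-ins; any classification scheme or retrieval technique could be passed in their place.

```python
from typing import Dict, List

# Hypothetical catalogs: each "library" exposes records with a title and subject code.
LIBRARIES: Dict[str, List[dict]] = {
    "city": [{"title": "Search Engines in Practice", "code": "025.04"}],
    "university": [{"title": "Information Retrieval", "code": "025.04"},
                   {"title": "Jazz History", "code": "781.65"}],
}


def compile_records(libraries, names=None):
    """Compilation: gather records from all libraries, or only the chosen ones."""
    chosen = names if names is not None else libraries.keys()
    return [dict(rec, library=name) for name in chosen for rec in libraries[name]]


def classify_dewey_class(record):
    """One possible classifier: map a Dewey-style code to its hundreds class."""
    return record["code"].split(".")[0][0] + "00"


def structure(records, classifier):
    """Structuring: group records under whatever scheme the caller supplies."""
    shelves = {}
    for rec in records:
        shelves.setdefault(classifier(rec), []).append(rec)
    return shelves


def keyword_rank(query, records):
    """Retrieval: one swappable technique, ranking by query words found in the title."""
    words = set(query.lower().split())
    scored = [(len(words & set(r["title"].lower().split())), r) for r in records]
    return [r for score, r in sorted(scored, key=lambda p: -p[0]) if score > 0]


def federated_search(query, names=None, classifier=classify_dewey_class,
                     ranker=keyword_rank, shelf=None):
    records = compile_records(LIBRARIES, names)
    shelves = structure(records, classifier)
    pool = shelves.get(shelf, []) if shelf else records
    return [(r["library"], r["title"]) for r in ranker(query, pool)]


print(federated_search("information retrieval"))
# [('university', 'Information Retrieval')]
```

Because each stage is a separate function, a user could keep one library's catalog but swap in another classification scheme or ranking technique without touching the rest.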
Snippets from a paper on distributed indexing (reference 2):
This project describes an indexing scheme that will permit thorough yet efficient searches of millions of retrieval systems.
(In the system:)
An index broker is a database that builds indices. [...] Index brokers can index the contents of a "primary database" and, in fact, other index brokers.
The broker's "generator", the query that it registers at the site brokers of the databases that it indexes, defines an index broker.
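The broker arrangement quoted above can be sketched as a toy Python model. It is a simplification under stated assumptions, not the protocol from the paper: the class names, the representation of a generator as a Python predicate, and the push-on-insert behavior are all hypothetical. A site broker sits in front of a primary database; an index broker registers its generator there and receives only matching records; and because an index broker exposes the same `register` interface, other index brokers can index it in turn.

```python
class SiteBroker:
    """Fronts one primary database and forwards matching records to subscribers."""

    def __init__(self):
        self.records = []
        self.subscribers = []   # list of (generator predicate, index broker)

    def register(self, generator, broker):
        """An index broker registers its generator query here."""
        self.subscribers.append((generator, broker))
        # Back-fill: hand over existing records that already match.
        for record in self.records:
            if generator(record):
                broker.receive(record)

    def add(self, record):
        self.records.append(record)
        for generator, broker in self.subscribers:
            if generator(record):
                broker.receive(record)


class IndexBroker(SiteBroker):
    """An index broker is itself a database, so other brokers can index it too."""

    def __init__(self, generator):
        super().__init__()
        self.generator = generator   # the query that defines this broker

    def subscribe_to(self, source):
        source.register(self.generator, self)

    def receive(self, record):
        if record not in self.records:
            self.add(record)   # store it and pass it on to downstream brokers


library = SiteBroker()
physics = IndexBroker(lambda r: r["topic"] == "physics")
science = IndexBroker(lambda r: r["topic"] in {"physics", "biology"})
physics.subscribe_to(library)
science.subscribe_to(physics)   # an index broker indexing another index broker
library.add({"title": "Optics", "topic": "physics"})
library.add({"title": "Sonnets", "topic": "poetry"})
print([r["title"] for r in science.records])  # ['Optics']
```

Here each broker's contents are fully determined by its generator and its sources, echoing the quoted idea that the generator defines the index broker.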
We need more than a few aspiring research mega-labs to be crunching this data.
The world yearns to breathe knowledge.
Toward Fellowship and Advancement,
MR
References:
1. The Anatomy of a Large-Scale Hypertextual Web Search Engine
Brin, Page. 1998
2. Distributed Indexing: A Scalable Mechanism for Distributed Information Retrieval
Danzig, Ahn, Noll, Obraczka. (danzig@usc.edu)