Wake Up Web 2.0

Sunday, October 29, 2006

Forms of Thought

The Topology of the Forms of Thought


Topology of Thought




Every system with a top-down component of structure needs some rules by which it is governed.

The coherence and integrity of the system are based on its adoption of universal rules and structures.

Hence, the model incorporates the logic of both descriptive and prescriptive methods.

Both inductive creation as well as deductive determination are supported.

The bridge between the form of thought and the content of language is crossed by symbolic representation.

This is a model of symbolic representation. Or, rather, a meta-model.

This model includes all systems of deductive and inductive logic, as well as conceptual spaces, interpreted by symbolic logical systems.

The classes of components in the system:



1. Logic - Grammars - These are the rules governing the logical systems. They define the process by which sentences, questions and proofs are interpreted.

Grammars represent well-formedness of conceptualization within the model. They also contain the logical process interpretations used to evaluate truth statements or procedures.

The logic or grammar class falls into the Interpretation super-class of core classes. This is because it plays a clarifying or elucidating role.

Other synonyms that relate to the logical layer are: system, network, constraints, meta-model, graph, rules, game.


As a whole class, Logical Rule systems can describe collections, including themselves. As well, sub-classes of rule systems include Processes and Procedures, which are NOT self-referential.

The core essence of a logic is that it is descriptive. The core property is well-formedness.

- The next two core classes fall into the type of "Identifiers", since they are involved in the process of naming and identifying objects as they pertain to the Platonic world of forms, or to the material world.

2. Concept

It represents some "space" projected from some model.

Concepts have internal topologies as well as external topologies and properties within their space.

Other notions of concepts are abstractions, simplifications, generalizations. Understandings and implications are a sub-class of concepts.

Concepts are used in a bottom-up fashion. They are flexible, unrestricted, ad-hoc, and interpretive.

Their creation process is one of identification. Their nature is determined by their properties.

Concepts refer to and may contain objects.
Concept creation is a process, and processes are concepts.
Symbols represent a bridge among concepts, to map and convey higher-order concepts.
Symbols as a technique represent the concept of conceptualization or abstraction.
Symbols as a usage, however, represent the sub-class concept of a grammar.

A concept is interpreted using a Model; a concept may or may not itself be a Model (Grammar). Some concepts are grammars.

Concepts have properties. Properties are founded on concepts.

The core process of a concept is formulation. Creation. Thought.

The core property is Intangibility. Abstraction. Idea.


3. Object

An Object's core property is its name or designation.

Objects serve the process of Identification, Encapsulation, Forming boundaries, Defining, Discriminating, Differentiating.

Objects are Concepts. Objects can be collections, or atomic. Collections are objects.

A concept is one kind of object.

An object relies on a concept for its meaning, but also refers to a specific name or designation to connect that meaning to a subspace of a conceptual space.

The object is a label on the map of meaning.

Words are objects. Objects use words as components to identify their scope.

Node, Edge.



In our model, a concept is an object, and an object is a concept.




4. Process - Procedures that are NOT self-referential. Role as Verb. Sub-class of Logic.

Procedures are applications of rules.

Proof is a Process. (The proof symbol is a separate object)

Establishment of truth or a result. Returning an answer.

Answer.

Processes enable consistency and compliance.

Evaluation. Applications.

Processes fall into the super-class of "Interpretive Classes".

This is because their purpose is clarification, or elucidation.

Procedures are determinative.

They represent an application of rules to symbols that may or may not comply with those rules.

Processes take an abstract rule-based system and apply it to a symbol, which is presumably mapped to concepts or objects.
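
As a rough illustration of a process applying a rule system to a symbol, the Python sketch below treats each rule as a named predicate and the process as a function that reports compliance. The names `RULES`, `apply_rules` and `is_well_formed`, and the two example rules, are hypothetical and chosen only for illustration; they are not part of the model itself.

```python
# Hypothetical rule system: each rule is a named predicate over a symbol (a string).
RULES = {
    "non_empty": lambda symbol: len(symbol) > 0,
    "alphabetic": lambda symbol: symbol.isalpha(),
}

def apply_rules(symbol, rules=RULES):
    """Apply every rule to the symbol and report compliance per rule."""
    return {name: rule(symbol) for name, rule in rules.items()}

def is_well_formed(symbol, rules=RULES):
    """A symbol is well-formed when it complies with all of the rules."""
    return all(apply_rules(symbol, rules).values())

print(apply_rules("jaguar"))
print(is_well_formed("jaguar42"))
```

The symbol may or may not comply; the process only determines which is the case.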

Process is the Action that enables language.

Process as Verb. In grammar, a process applies to a subject. The subject of a process is typically a symbol or a space, depending on whether it is a logical, analytic, algebraic, or graphical process.

In design, Process belongs to the "Structural" super-class.

In usage, Process belongs to the "Interpretation" super-class.








5. Properties. Are Attributes.

Components.

Defining Characteristics.

Data Members

Members.

Fields.


Properties can be "natural kinds", or composite properties aggregating concepts.

Every other class has properties.

The concept of a property is that of characteristics.

The usage of a property is that of identification, or differentiation. Properties can be used to unify objects and concepts, or to divide them.

Properties are objects. (everything is an object)

Properties fall into the interpretation, or clarifying aspect.

Property names are symbols that are used to evaluate the scope of conceptual spaces.


6. Collection.

Resource. Data. Store. Set.

Simple ontology. List + document types.

Document Types are collections of properties.

Experienced through a frame of reference.

May be a conceptual collection which represents a notion.

Structure of Resources. Map of Resources.

Interpretations and Intuitions and Understanding are resources.

Collections are structural.

Collections are objects, contain objects, contain properties, may represent properties.
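
One way to picture the "everything is an object" relationships described so far is a small class hierarchy. This is only a sketch under my own assumptions: the class names mirror the core classes above, but the attributes (`name`, `properties`, `members`) and the sample values are hypothetical.

```python
class Object:
    """Base class: everything in the model is an object with a name."""
    def __init__(self, name):
        self.name = name
        self.properties = {}  # property name -> value

class Property(Object):
    """Properties are objects."""

class Symbol(Object):
    """Symbols are objects with no components of their own."""

class Collection(Object):
    """Collections are objects that contain other objects."""
    def __init__(self, name, members=()):
        super().__init__(name)
        self.members = list(members)

# Hypothetical sample data.
word = Symbol("jaguar")
color = Property("color")
cats = Collection("cats", [word])
cats.properties[color.name] = "spotted"

print(isinstance(cats, Object), [m.name for m in cats.members])
```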



7. Symbol

Information. Small self-contained units of data. Composed of Nothing. Empty. Irreducible representations.

Representation, Data, Number, Abstraction.

Signal. Sign. Message. Word. Letter.

Names are symbols that designate objects.

Visualization.

Properties of symbols.

Symbols are objects, and make use of the concept of abstract representation.

Symbols have no components. Properties are interpreted, not intrinsic.




8. Relationships consist of a special kind of set or collection.

They consist of objects as well as a property.

They connect collections to one another and to properties.

Collections (of objects, concepts, symbols, properties, ...) can be related to one another, or to other, heterogeneous collections.

An object can be related to an object through a property.

And an object can be related to a property. (through another property)





They are used to signify a property of a collection.

They are enabled by the process of identification.


Connections. Links. Vertex.

Set of pairs with values.

A function is a special kind of relation: one in which each input is paired with exactly one output.
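
To make the "set of pairs" idea concrete, here is a minimal Python sketch that stores a relation as a set of pairs and checks whether it behaves as a function, meaning each input appears on the left of exactly one pair. The helper `is_function` and the sample relations are hypothetical, used only for illustration.

```python
def is_function(relation, domain):
    """True when every element of the domain appears as the left-hand
    member of exactly one pair in the relation."""
    lefts = [left for left, _ in relation]
    return all(lefts.count(x) == 1 for x in domain)

# Hypothetical relations over a tiny set of objects.
names = {("cat", "jaguar"), ("car", "jaguar")}   # each object has one name
senses = {("jaguar", "cat"), ("jaguar", "car")}  # one word, two senses

print(is_function(names, {"cat", "car"}))
print(is_function(senses, {"jaguar"}))
```

The first relation is a function even though two objects share one name; the second is not, because one word maps to two senses.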

Relationships are used for "structural intent" or "structural use".

And are composed and enabled through "structural process".



9. Use. Meaning, Purpose, Context.

This refers to how an object from a class is used.

Use is functionally a representation for Meaning.

Behaviour type.

Part of Speech represents a use of a collection of symbols as it relates to the process of relating other symbols.

Action. Use is a type of process.

Purpose. Meaning. Context. Semantic Context.

Essence.




This concludes the core classes that have well-defined properties.

Another core class is that of the user of the language.

10. The User

The reader, listener, or audience.

Agent. Client. Speaker. Input. Frame. Boundary. Terminus. External Node. Uncertainty. Externality. Limitation. Environment.

Frame. Perspective. Dimension. Vector.







Other supplementary kinds of class types.


Operator - Conjunction.

And, or, if, if then, then if, if and only if, else, until.


Prepositions, Postpositions. These concern relationships between objects in terms of spatial and orientation metrics.

(applied to spatial dimensions.)


Punctuation - Document Type. Model of Well-formedness.




Word - Symbol
Pronoun - Pointer (special kind of symbol that points to a location in space.)
Article (a, the) - identifier that contains the property of context, scope, semantic meaning.



Adjective is a property.

Definition - Core property, Axiom.

Meaning, part of speech, use - Intent, Use

Relation - Set Function, Value

Sentence - Collection, Notion, Thought, Resource, Message


Participant - User



Adverbs - Modifier of process, or meaning, Descriptor of meaning.

that modify verbs - sub process, Meta-process- ex. Process Differently - or Process Descriptively.

that modify adjectives - sub property. meta-property. Ex. Identify Meta-descriptively

that modify adverbs - Purposefully Alternatively Descriptively Process


One can see that the adverb plays a special role, and needs further investigation.



Tense

Voice

Aspect

Implicature

Friday, October 20, 2006

Google Needs a Classification System - such as Dewey Decimal

 
Google Needs a Dewey Decimal System
 
 
Google's algorithm is already very useful for what it does - it delivers 
likely solutions to what might have been the intent of your 
search.
 
It will always be good at that, and continue to get better at doing 
JUST THAT.
 
But, without a Dewey-Decimal-like system, it falls short as an 
information retrieval ideology.

Google as ideology fails when it cannot realize that the world's 
library belongs to the world; not to private advertisers. Or information 
hogs. Or media giants. Or software companies.
 
In, Of, and By the Public this Library will be.
 
 
 
Benefits of a Distributed System

In a purely technical sense, the distributed system does not directly 
make search better, in and of itself.

It makes search better indirectly, by allowing search filters and 
classification systems to be developed and shared 
by everyone in a standard and low-cost environment.
 
Aggregating resources is extremely easy with this system architecture. 
This will encourage many current web publishers to adopt rigorous niche 
classification systems that fit into larger-scope Labeling systems.
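
As a small illustration of niche classifications nesting inside a larger-scope labeling system, the Python sketch below files each resource under a path-like label and retrieves everything at or below a given label. The catalog entries, the label paths, and the function `resources_under` are all hypothetical; real schemes such as Dewey Decimal use their own codes.

```python
# Hypothetical labels: each resource is filed under a path in a larger-scope scheme.
CATALOG = {
    "big-cats-weekly": "science/zoology/mammals",
    "rainforest-notes": "science/zoology",
    "classic-car-club": "technology/vehicles/cars",
}

def resources_under(label, catalog=CATALOG):
    """Return resources whose label equals, or is nested below, the given label."""
    base = label.rstrip("/")
    prefix = base + "/"
    return sorted(
        name for name, path in catalog.items()
        if path == base or path.startswith(prefix)
    )

print(resources_under("science/zoology"))
print(resources_under("technology"))
```

A niche publisher only has to pick a label inside the larger scheme; anyone browsing a broader label picks up the niche resource automatically.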
 
 
 
It also lowers the cost of innovation in all aspects of the search 
process, by making crawling obsolete, and by making aggregation of 
resources a technologically trivial process. Even if aggregation is still a complex and 
multifarious process in terms of the information architecture, the 
"code work" and interfaces are all taken care of. There will be various 
competing providers of the classification schemes, and the markets will 
decide which ones thrive. The architecture allows these to be made 
available to the user.
 
In the exact same vein, it opens the market for enhanced providers of 
filters and algorithms, and various processes that sort, map, and 
predictively analyze the result sets imported from other tools.  
 
In this way, the distributed system greatly changes the economics of 
the many search engine development areas.
 
By isolating the various components of the "search engine", this new 
system will foster greater customization and innovation. The components 
are the aggregation, the relevance processes (symbolic, 
conceptual, authoritative, socially computed, or a blend) and the sorting and 
filtering algorithms.
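
The separation of components can be sketched as a pipeline whose stages are passed in as plain functions, so that any one of them can be swapped for a competing provider's version. This is only an illustration under my own assumptions: the function names, the toy document list, and the word-count relevance score are hypothetical stand-ins, not a description of any real engine.

```python
def search(query, aggregate, relevance, keep):
    """Run a query through three independently replaceable components."""
    candidates = aggregate(query)                                  # aggregation
    scored = [(relevance(query, doc), doc) for doc in candidates]  # relevance
    kept = [pair for pair in scored if keep(*pair)]                # filtering
    kept.sort(key=lambda pair: pair[0], reverse=True)              # sorting
    return [doc for _, doc in kept]

# Hypothetical stand-ins for each component.
DOCS = ["jaguar cat habitat", "jaguar car dealer", "cat care tips"]

def aggregate(query):
    """Collect documents that share at least one word with the query."""
    return [d for d in DOCS if any(w in d.split() for w in query.split())]

def relevance(query, doc):
    """Count how many times the query words occur in the document."""
    return sum(doc.split().count(w) for w in query.split())

def keep(score, doc):
    """Keep only documents that match at least two query words."""
    return score >= 2

print(search("jaguar cat", aggregate, relevance, keep))
```

Replacing `keep` or `relevance` with someone else's function changes the results without touching the aggregation step, which is the economic point being made here.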
 
The distributed system also changes the politics of search indexes, 
social networks, and any kind of structured resource.
 
 
The current system does not aid those that seek exhaustive, 
authoritative, definitive document sets rather than sets of possible matches.

When the search result set is complete, spam-free, and 
clear of off-topic content, then it can be seen as a datum in and of 
itself.

It becomes an index of a certain kind of query intent.

This datum can then be meaningfully combined, or summarized and 
combined, with other complementary or supplementary resources, allowing this 
structured index to be used by other resources to provide greater depth 
and breadth to the search experience.
 
 
 
The other fundamental limitation of Google is its ignorance of the 
intent of the searcher.

When searching for "jaguar", it gives some possible document lists.

But the result of the "relevance" process and the sorting process 
cannot be considered anything but an attempt at possible relevance.

Sorting the results as a SET in this case is informationally 
valueless, because the results are collected from various contexts of the term - 
the cat, the car, the sports club.
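
To show what the alternative to one mixed list could look like, the Python sketch below splits results for "jaguar" into one list per context. It assumes each result already carries a context label supplied by a person or a classification scheme; the sample titles, the labels, and the function `group_by_context` are hypothetical.

```python
from collections import defaultdict

# Hypothetical results for the query "jaguar", each tagged with a context.
RESULTS = [
    ("Jaguar habitat and diet", "cat"),
    ("Jaguar XK road test", "car"),
    ("Jaguars season schedule", "sports club"),
    ("Jaguar conservation status", "cat"),
]

def group_by_context(results):
    """Split one mixed result list into one list per context."""
    groups = defaultdict(list)
    for title, context in results:
        groups[context].append(title)
    return dict(groups)

for context, titles in group_by_context(RESULTS).items():
    print(context, titles)
```

Once the results are grouped, sorting within a single context becomes meaningful in a way that sorting the mixed set is not.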
 
It's true that a subset of the results represents value to that searcher. 
But it is the laborious job of the searcher to define that subset for 
himself, unfortunately one document at a time. And, worst of all, even 
after the searcher has essentially sorted the search results based on 
relevancy for his intent, that information is usually uncaptured, or 
completely lost. Even the implicit, indirect clues lying in 
user behaviour, which could be correlated with the user's intent and satisfaction 
with the results, are hoarded by a company that can't benefit from their 
value, due to its antiquated strategic central position as aggregator.

So, it is definitely the classification that is the largest improvement 
of all.
 
The distributed platform doesn't directly offer any solution itself to 
classify all objects by mechanical processes.
 
But, it does allow systems of communication by PEOPLE, who after all 
are the final judges of the merit of search results for a query intent.

The searcher is the final judge of relevancy, so why not make them the 
primary judge?

After all, keyword density, PageRank, and anchor links are only 
CORRELATIONS with authority and usefulness; they do not represent usefulness 
themselves.

And, the more the patterns are known by manipulators, the more they can 
be gamed in that fashion.

Google suffers from an adversarial classification problem: there's always a 
way in - spam, or investment in white hat SEO.
 
 
 
 
By lowering the costs of building ontologies (the Dewey Decimal systems of 
tomorrow) and advanced search processes and filters, distributed 
indexing has an indirect and pronounced effect on search relevance and cost.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Crawling the web is now a redundant process, just as peer-to-peer 
technology has made other processes obsolete, such as ripping 
your own CD when you can just download the MP3 from a p2p network. 
Storing for the first time is done. It's just re-getting, finding from one of 
many shared resources.
 
 
Looking beyond the page - 
 
References - Google only makes its final sorting decision based on 
popularity, not the degree to which a document references other 
authoritative sources.

That means a document that is highly referenced will have priority over 
those that are not.

All else being equal, this is a good technique for determining a 
correlation to authority.
 
 
Some searchers may only want to search sources that themselves are well 
documented, well-cited, and employ rich sets of references within their 
document.
 
 
Having references within their document to other authoritative 
documents is a correlate not only of relevance, but also of authority in the 
true sense of the word.


It is true that Google does give some benefit to those linking out to 
sites with good trust rank, and that outgoing links can affect the SEO 
results, but this is more of a way to avoid the spam filter than 
something that will cause a surge in the rankings.
 
Basically, there is no way for the searcher to specify the degree of 
references in the documents they search. This is a limitation.
 
Not all queries require a document heavy in citations, but with spam, 
shoddy content, and made-for-AdSense pages, the option to filter out 
documents without citations and references is a necessary one for true 
knowledge retrievers.
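
The kind of filter I have in mind can be sketched in a few lines of Python. It assumes each document record already carries a count of the outside sources it cites; the field names, the sample documents, and the function `with_min_citations` are hypothetical.

```python
# Hypothetical documents with a count of the outside sources each one cites.
DOCUMENTS = [
    {"title": "Survey of big cats", "citations": 14},
    {"title": "Made-for-ads page", "citations": 0},
    {"title": "Field notes", "citations": 3},
]

def with_min_citations(documents, minimum):
    """Keep only documents that cite at least `minimum` other sources."""
    return [d for d in documents if d["citations"] >= minimum]

print([d["title"] for d in with_min_citations(DOCUMENTS, 3)])
```

The searcher, not the engine, chooses the threshold, and a threshold of zero turns the filter off for queries that don't need it.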
 
 
 
 
 
Document Density (of Topics or Concepts)
 
 
Just as contemporary web "Page" based search engines use "keyword 
density" to estimate or predict the likely relevance of a resulting page, 
the new system of searching "resources" will be largely influenced by 
that resource's proportion of documents pertaining to the given topic.
 
For instance, one resource may have fewer overall results, but have 
results almost completely dedicated to your intended information.
 
This is certainly a more time-efficient resource to begin with, and 
resources of this type make better resources for beginners and experts 
(perhaps not the same ones, but denser resources will be better in both 
instances)
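
The density idea reduces to a simple ratio: on-topic documents divided by total documents in the resource. The Python sketch below ranks resources by that ratio; the resource names and counts are made up for illustration, and `topic_density` is a hypothetical helper, not an established metric.

```python
def topic_density(on_topic, total):
    """Fraction of a resource's documents that pertain to the topic."""
    return on_topic / total if total else 0.0

# Hypothetical resources: (name, on-topic documents, total documents).
RESOURCES = [
    ("general web index", 5000, 10_000_000),
    ("big cat archive", 400, 500),
]

ranked = sorted(RESOURCES, key=lambda r: topic_density(r[1], r[2]), reverse=True)
for name, on_topic, total in ranked:
    print(name, round(topic_density(on_topic, total), 4))
```

In this made-up case the general index has far more matching documents, yet the small archive ranks first because almost everything in it is on-topic.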
 
Sparse resources will many times actually give more results.
 
Anyone who's gotten over 1 million Google results for a query probably 
hasn't made time to go through them all, and probably realizes that 
this would be useless.
 
Sparse results mean that although the resource contains results, since 
they are a minuscule part of its document set, it is unlikely to have 
well-structured indexes and supplemental indexes pertaining to the topic, 
and is also likely to lack useful query-intent clarification procedures 
for your specific intent.
 
 
The sparse resource may have fewer results, or may have more results.
 
The additional results usually contain garbage, spam, and "lesser sources", 
ones that probably aren't worthy informational content, but are 
on-topic nonetheless.

Even if the sparse collection truly contains more "good resources", the 
fact that they are essentially HIDDEN in the result set with other junk 
makes it a less valuable resource.
 
The dense sources have none of these distractions.
 
The sparse sources may have more 'good resources', but they don't have 
any classification system, or sub-indices, or query clarification 
tools to let a person search with tools that pertain to their domain.

In a sense, sparse sources are just one area of a tangled web. The 
larger tangled web isn't structured conceptually, and neither will be the 
'good resource' page results, however numerous they are.

The SCOPE of Progress

The Public Venture : How public resources will revolutionize data 
markets and information retrieval.
 
 
I show how centralized sources, obscured by secrets and 
limitations, are not the future of finding information.
 
I see technology allowing large data sets to be interoperable, to 
effectively merge, while I foresee the underlying business interests and 
markets on a course to fragmentation into the various components of the 
info retrieval process (crawling, indexing, aggregating, categorizing, sorting, filters and interfaces).

I give an idea of what the world might be like if social network 
profiles began to use the system of publishing now used by blogs. 
This would be a more 'open' and free form of mass publication, because 
large publishers would no longer have an exclusive role in controlling their medium.


I show how, in this world, the popular social networking activity 
would never be able to be acquired or controlled.
 
I no longer see the social network services delivered by the current 
giants remaining the "destination" for their users. They will be 
reduced to just a few of the many tools people use to reach and 
communicate with the larger outside databases - The SocialSphere.
 
This doesn't mean that these companies won't be wildly successful as media companies, it 
means that their power as exclusive social networks is inherently unstable, and they will constantly 
have to re-define themselves.
 
Myspace will always be a successful web destination, but its focus on 
core social networking will have to become less and less (just like 
Google's focus on search).

Google says its core is 70% search, but I think most of that is 
advertising research and optimization. This clearly has its place, but 
Google has realized that to improve their search DRASTICALLY would be to 
admit there is a DRASTIC problem with their current technique.
 
 
Myspace will be the new MTV, with pretty faces, new studio promotions 
and large brand atmosphere.
 
The users will remain, but will begin using more powerful tools to go 
beyond only the myspace experience.
 
 
There is one area of innovation that large markets like Google and 
Myspace cannot even go near.
 
 
That is the market of becoming part of a larger world of information 
sharing.
 
 
Currently, each social network is a gated community.
 
Even if membership is open, you can only search and see information 
from ONE site, ONE database.
 
This means that if Myspace doesn't get around to implementing a feature, 
it won't happen.

Even if the feature takes less than 1 hour of coding time.
 
 
 
Google will lose relevance as far as aggregating documents and lose some prominence in search, but will remain highly successful at incubating other internet properties, primarily in web media, and probably not as much in web widgets.

The reason is that anyone can make widgets, but not everyone can make 
large media acquisitions that place them alongside large publisher networks and advertiser demand.

So, Google will remain strong thanks to its cash position and its 
media portfolio, as well as its technological eminence as a data 
processor.
 
Google will have an exciting future as a company, more focused on 
advertising and media in the future rather than web document search.
 
 
Web search will continue to get more corporatized, with larger and more 
numerous advertising placements. If a paid placement program were 
ever to occur for indexing, as it does with Yahoo's paid inclusion, it would 
mean the end of the Google resource as we know it.
 
 
 
 
 
Evolving advertising trends suggest advertisers will begin to track the entire search experience, along with 
email, social networking and video viewing experience to optimize campaigns 
just for YOU.
 
 
Top 12 things Google will never do.
  
1. Google won't ever know what it's searching. 
         No conceptual understanding.
         No categorization scheme. Google doesn’t discern between fiction, nonfiction, reference, age-appropriate material, ...
2. Google will never understand what your intent was from your keyword. It can't understand the context or purpose of a search from just symbols alone. It has no conceptual framework.

         Can't refine based on conceptual query intent rather than keyword. Offers no way to refine search.
         Does "baseball" mean the "game", the "league" (MLB), or the "ball"? It 
is used in all of these contexts interchangeably.
3. Can't learn from users, or let users use groups to filter. The goal of 'making the world's information accessible' may be noble, but unless you open the system to
LET OTHERS make the world's information structured and accessible, it's going to remain an artificially "intelligent" GAME. - Feeling lucky?
4. No privacy. One company amasses all data and sells or uses the data on its own.
 
5. Ownership and Control
         Censorship. 
         Advertising corrupting experience or results.
         Market position inhibits integration, and possibilities.
         Non-monopoly status of search, and publishing industries.
 
By contrast, the Benefits of an Open Platform - open data, open summaries, open relevance scores, 
sorts and filters.
          Customization, Upgrades, Flexibility
         More competitive markets
         Interoperability
         Non-monopoly status of search, and publishing industries.
         Everyone gets to use the cool toys at the Googleplex.
 
 
6. Never complete search of all libraries.
 If Google can't find it, forget about it. They won't point you in the direction of another resource that is known to have good results
for your kind of query. Google doesn't summarize other rich resources. They try to be the one resource. They can't admit their limitations.

Even Yahoo would direct you to another engine if your query gave no results.

You need to admit when another resource has better results. Google is too centered around 'documents' and not enough on resources as a whole.
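
Pointing the searcher elsewhere is not hard to picture. The Python sketch below returns local results when there are any and otherwise names other resources known to be strong for that kind of query. The referral registry, the query kinds, and the function `answer` are hypothetical and exist only to illustrate the idea.

```python
# Hypothetical registry: outside resources known to be strong for a kind of query.
REFERRALS = {
    "zoology": ["big cat archive", "university zoology library"],
    "automotive": ["classic car club index"],
}

def answer(query_kind, local_results, referrals=REFERRALS):
    """Return local results when there are any; otherwise point to other resources."""
    if local_results:
        return {"results": local_results, "try_instead": []}
    return {"results": [], "try_instead": referrals.get(query_kind, [])}

print(answer("zoology", []))
print(answer("automotive", ["Jaguar XK road test"]))
```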
 
7. Always prone to spam, manipulation through data, manipulation with 
money, non-first-rate resources by design.
 There is bad spam, and then there is manipulation of results that is somehow approved, like buying web sites, buying links, doing PR stunts.

Does this lead to better results? Or just more "competitive" ones?
 
8. No authority, or authenticity, or certainty. Everything is just 
guesswork, probability, mass voting, mob rule, educated guesses. They 
give you up to millions of results because you AREN'T feeling lucky.
 I can't even trust which mob I want to rule my results? I'm stuck with the largest mob.
 
9. Advertising Banditry. Publishers don't know how much they make. 
Click fraud rampant. CPC model ensures advertisers pay the most. But the 
publisher sees little of this.
AdSense inefficiencies.
 
10. Made only to search unstructured data; Can't understand structured 
knowledge.
 Google's intent, and very assumption for it to work, is that it knows NOTHING about what it is searching in AND what it is searching for.

Google was made to sort the structureless web. It's great at that, but the structureless web sucks.
 
 
11. Only designed to crawl the outer web. The inner web is uncrawled.
 
 Other applications of distributed computing: 
Anti-manipulation, Pro-accountability in areas of

Accounting
Elections
Trade

 
Google is a tool, not an answer. Its initial results, in aggregate, 
are valueless without human sifting afterwards.
 
 
The environment that created Google was one of total lack of 
cohesiveness to the web.



As long as there is still noise and potentially bad results, Google's results don't represent knowledge.

The user has to turn that information into knowledge by going through and evaluating the results.



 
 