[om-list] Re: Naming Systems

Wed Sep 27 01:33:43 EDT 2000

Luke Call wrote:

> One thing I'm not sure of yet is if
> their concept can handle arbitrary knowledge stored in an object model
> and is able to do all the things in our requirements list. 

First class symbolic representation systems can handle arbitrary knowledge by
their very nature. I doubt Cyc does very much in the distributed database
area, but what they do is nearly irrelevant to us unless some major OS
manufacturer decides to bundle Cyc with a popular operating system.

> or maybe we don't want to
> pre-define any taxonomy--our taxonomy is the real world.

Strictly speaking, the real world does not have any taxonomies at all. A
taxonomy is a hybrid of a model and a naming convention for that model.  The
model, of course, isn't "real" either; it's just a method of thinking about the
real world.

In short, the only way we can have a model without a taxonomy is to neglect to
give names to the entities within the model.  I have yet to see anyone do any
serious thinking without giving names to the entities he is thinking about, so
I don't think that is a very likely possibility.

The more serious problem is one of communication - how are you possibly going
to discuss the validity of a model with someone else if you have no names for
anything?  Or, more to the point, how are two models supposed to be merged or
analyzed automatically unless there is a rigorous naming convention for the
things they have in common?

You could make references using globally unique synthetic identifiers, but
they keep your model from being analyzed in a human readable format. Would it
be possible to write a rock solid multi-million line program using a point and
click interface?  If you wanted to debug it, even the simplest pieces would
look something like:

#include {787A1520-75B9-11CF-980D-444553540000}

{E4C18D40-1CD5-101C-B325-00AA001F3168} {EC4CF635-D196-11CE-9027-02608C4BF3B5}
({E4C18D40-1CD5-101C-B325-00AA001F3168} argc, 
 {1EFB6596-857C-11D1-B16A-00C0F0283628} ** argv)
{
 {C74190B6-8589-11D1-B16A-00C0F0283628}("Hello, World!\n");
}

Alternatively, if we had followed the advice of one early hypertext developer,
what we now know as URLs might look like huge numbers (humbers) instead:

e.g. http://837598.327056.891372598.327582937.5982375

Key systems like that, while potentially more efficient for the computer, make
life really difficult for developers, which is one of the reasons global
numeric universal naming conventions rarely get anywhere.

> I often consider that
> the D&C says the earth will be someday be like glass, a Urim and Thummim
> to its inhabitants, showing all things pertaining to a lower order of
> kingdoms etc. etc., which strikes chords in me relative to having one
> massive database with all available truth stored in it. 

I believe the statement should be taken literally - i.e. rather than storing a
model of reality in some sort of database, the Earth becomes a reality
projector of sorts, more like a telescope than a book.

> If the goal is to store all
> knowledge, then what is knowledge, at an atomic level? I currently
> believe it is an object model of the things in my mind or someone else's
> mind. 

I think it is safe to say that most informal knowledge is just a collection
of associations of various strengths, much like a neural network.  Formal
knowledge, however, involves reasoning in some sort of symbolic system.  The
first requirement for any symbolic system is a naming system, or its
equivalent, to keep track of the differences between the various symbols.

The very first name equivalent in your brain is the collection of associations
that surround a given entity.  Instead of "Mom", it is "the entity that
matches this facial pattern".  Surely a child can perform basic reasoning
without ever learning the proper names for the entities he is reasoning
about.  But nearly as certainly, such reasoning is slow and inefficient
compared to the speed of thinking that comes when he can call up entities by
name rather than by association alone. 

However, names are far more important for communication than for internal
processing. The power of a name lies completely in the fact that there is a
generally large audience that knows both the name and what it properly
represents.  So powerful, in fact, that no one bothers to try to do formal
thinking in any other way.  It doesn't matter how smart you are, if you can't
convert what you want to say into well organized sentences composed in a
vocabulary already known by your audience, your thoughts are nearly useless.

> I don't (yet?) understand how extensive lists of logic statements
> (and surely not statistical sentence comparisons) get you there. 

First of all, an object model *is* an abbreviated list of logical statements.
The power of logical statements is that they can represent knowledge that an
object model cannot, or at least cannot without using the object model as a
grammatical system for constructing logical statements.  As an example, you
might consider how you would represent the Pythagorean theorem in an object
model.
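
To illustrate the first sentence, here is a rough C++ sketch of how an object
expands into plain statements. The Fact and Object types and the to_facts
function are invented for this example and are not part of any existing
system.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical types, for illustration only.
// A ground statement of the form predicate(subject, value).
struct Fact {
    std::string subject;
    std::string predicate;
    std::string value;
};

// A toy "object": a name plus attribute/value pairs.
struct Object {
    std::string name;
    std::map<std::string, std::string> attributes;
};

// Expand an object into the list of statements it abbreviates.
std::vector<Fact> to_facts(const Object& obj) {
    // A statement needs a subject, so an unnamed object cannot be expanded.
    assert(!obj.name.empty());
    std::vector<Fact> facts;
    for (const auto& attr : obj.attributes) {
        facts.push_back(Fact{obj.name, attr.first, attr.second});
    }
    return facts;
}
```

Every Fact produced this way is about one specific, named object. Nothing in
the list contains a variable, which is why a statement that ranges over every
right triangle has no natural place in it.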

The really big difference is that logical systems have the notion of a
variable and ordinary object models do not.  Consider what is called "first
order predicate logic" - it consists of statements with variables and
quantifiers.  Variables are symbols for objects of a certain class.
Quantifiers are operators like "there exists", "for all", "there exists at
least n", and so forth.

The key difference between a variable and an entity in an object model is that
there may be more than one variable of precisely the same class within a
logical statement which have no distinction except for the roles that they
play within a given statement:

e.g. For all numbers X and Y, there exists exactly one number Z such that X -
Y = Z
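
As a rough illustration, the statement above can be checked mechanically over
a small finite sample of integers. This is only a sketch under that
assumption; the helper names (exists_exactly_one, for_all_pairs,
subtraction_has_unique_result) are made up for the example. Note that X, Y,
and Z are all plain ints and are told apart only by the roles they play.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical helpers, for illustration only.

// "There exists exactly one z in domain such that pred(z)."
bool exists_exactly_one(const std::vector<int>& domain,
                        const std::function<bool(int)>& pred) {
    int count = 0;
    for (int z : domain) {
        if (pred(z)) ++count;
    }
    return count == 1;
}

// "For all x and y in domain, pred(x, y)."
bool for_all_pairs(const std::vector<int>& domain,
                   const std::function<bool(int, int)>& pred) {
    for (int x : domain) {
        for (int y : domain) {
            if (!pred(x, y)) return false;
        }
    }
    return true;
}

// For all X and Y in xy_domain, exactly one Z in z_domain has X - Y = Z.
bool subtraction_has_unique_result(const std::vector<int>& xy_domain,
                                   const std::vector<int>& z_domain) {
    // An empty sample would make the claim vacuously true, so reject it.
    assert(!xy_domain.empty());
    return for_all_pairs(xy_domain, [&](int x, int y) {
        return exists_exactly_one(z_domain,
                                  [&](int z) { return x - y == z; });
    });
}
```

With X and Y drawn from {0, 1, 2}, the check succeeds only if the Z sample
contains each of -2 through 2 exactly once.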

> To
> create an object model all you may need (at least to begin with?) are
> ways to add objects/classes/attributes/relationships etc., and
> manipulate them, in a way that remains coherent. Again, I am probably
> missing the picture as you see it, but it seems like that's all we need
> to start, then we can study the algorithms to traverse (search) it
> effectively, nicer interfaces, multiuser, bulk data import, and so on.

You are making the assumption that most knowledge worth representing can be
represented easily in the form you describe.  My point is that such a form is
much too restrictive to handle even basic logical inferences, let alone
principles of math or physics.

> For example, one of my first test uses of the system will be to input
> the personal organizational things that I now enter into a Sharp Wizard
> and various outline/text editor things I have. Then genealogical
> information and family pictures, etc. I don't want it limited,
> but very flexible and still very consistent. 

Why don't you write a prototype that does just that?  It can't be horribly
difficult, and it would certainly have great utility as an information
organizer even if it can't do the more obscure things, like proving
mathematical theorems, that I would like a system to be capable of (at least
in principle).

> Perhaps others will do the
> same--modeling or recording information from biology, language, or
> whatever interests them, without being constrained by a need to really
> understand any formal system or taxonomy (though there may be such,
> under the hood)

Again, the whole purpose of formal naming systems or taxonomies is to make
communication possible.  Without a naming system, every communication is
reduced to the level of a two year old pointing at a cookie jar.

> I also am hoping that any model our software builds would be independent
> of human language. Of course it has human language terminology tied to
> it, but any object (and/or class, as we've discussed previously) could
> have any number of names, and eventually or system could recognize the
> context of
> the user and thus tie the user's current terminology set to the objects
> being viewed at any given time. 

*Any* unified model of knowledge defines a new synthetic language, or
"interlingua".  At least one "namespace" is unavoidable - the only question is
whether it is human readable or not.  The Cyc people have designed a symbolic
representation system that is more distinct from English than Chinese is.  The
system couldn't care less what the names of the symbols are; they are simply
given English-like names so that the system developers and users can easily
translate from the way the system thinks to the way they think.
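
To make the separation between symbols and their labels concrete, here is a
minimal C++ sketch. It is not how Cyc is implemented; SymbolId, LabelTable,
and the vocabulary tags are assumptions invented for the example. The system
works on opaque ids, and labels exist only so that people can read and write
them.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical sketch: symbols are opaque ids; labels are for people.
using SymbolId = int;

class LabelTable {
public:
    // Attach a label to a symbol within one vocabulary (e.g. "en", "de").
    void add_label(SymbolId id, const std::string& vocabulary,
                   const std::string& label) {
        assert(!label.empty());
        by_label_[std::make_pair(vocabulary, label)] = id;
        by_symbol_[std::make_pair(id, vocabulary)] = label;
    }

    // Resolve a user's word to the underlying symbol; -1 if unknown.
    SymbolId resolve(const std::string& vocabulary,
                     const std::string& label) const {
        auto it = by_label_.find(std::make_pair(vocabulary, label));
        return it == by_label_.end() ? -1 : it->second;
    }

    // Render a symbol in a vocabulary; empty string if it has no label there.
    std::string render(SymbolId id, const std::string& vocabulary) const {
        auto it = by_symbol_.find(std::make_pair(id, vocabulary));
        return it == by_symbol_.end() ? std::string() : it->second;
    }

private:
    std::map<std::pair<std::string, std::string>, SymbolId> by_label_;
    std::map<std::pair<SymbolId, std::string>, std::string> by_symbol_;
};
```

One symbol can carry any number of labels, and a symbol with no label in the
reader's vocabulary still exists; it simply cannot be shown to that reader in
a readable form.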

> Thus, there is no "namespace", just an
> extensive set of related objects, recognized primarily by their
> relationship to the user, or to some searched-for object in the user's
> "personal" namespace (or context)

I should mention that languages like C++ support an arbitrarily deep hierarchy
of namespaces and we can too.

> , the actual determiner of an object's
> identity being not a name but attributes--place in time and space (and
> genealogy, for people).

Identification based on simple attributes alone causes problems like the
situation of having two people independently develop theories of advanced
economics using only words known to first graders and then attempting to have
a computer merge them together into a single unified theory without further
input.  A shared formal naming system places this task within the realm of
possibility.

> The phrase I think of often is
> "truth is things as they are, as they were, and as they are to come", so
> then our object model seems so big that is is unwieldy to try have a
> namespace to name all data types

Surely one could make up synthetic identifiers to take the place of names for
objects without formal names, e.g. {C74190B6-8589-11D1-B16A-00C0F0283628}

> , except within a given context, where
> collections of objects represent "contexts", or the terminology that a
> given set of users remembers things by (which names may be used by
> another user for a dozen other things, in various other contexts). Some
> objects may not
> have a name at all, but merely a time/place and relationships to other
> objects. There are hierarchies of contexts for different languages,
> domain areas, or even for different periods of a person's life. But I
> don't envision a namespace. (I say all this so you can see where I'm
> coming from, and then how best to correct me.)

There is nothing that stops an entity from having a name in an infinite number
of namespaces.  The only necessity for rigorous representation is to have a
good way to determine which namespace should be used. Languages like C++ use
namespace qualifiers to do this, e.g. Geometry::Triangle.  Alternatively you
can abbreviate by writing within a given context, e.g.:

namespace Geometry
{
  class Polygon;
  class Triangle isA Polygon;
}
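
The notation above is shorthand rather than compilable C++. For reference, a
version that an actual C++11 compiler accepts might look like the sketch
below. The Music namespace is a hypothetical addition, included only to show
two unrelated entities sharing a short name.

```cpp
#include <type_traits>

namespace Geometry {
    class Polygon {};
    // Real C++ spelling of "Triangle isA Polygon".
    class Triangle : public Polygon {};
}

// Hypothetical second namespace, added only to show a name clash.
namespace Music {
    class Triangle {};  // same short name, unrelated entity
}

namespace Geometry {
    // Inside the namespace, the short name Triangle means Geometry::Triangle.
    static_assert(std::is_base_of<Polygon, Triangle>::value,
                  "Triangle is a kind of Polygon");
}

// Outside, the qualifier selects which Triangle is meant.
static_assert(!std::is_same<Geometry::Triangle, Music::Triangle>::value,
              "the two Triangles are distinct entities");
static_assert(!std::is_base_of<Geometry::Polygon, Music::Triangle>::value,
              "Music::Triangle is not a Polygon");
```

All of the checks run at compile time, so the file compiling at all is the
demonstration.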

Now if we had a globally distributed namespace system, everything I write by
hand would have to look something like:

namespace Mark_Butler_19701014
{
  ::Physics::Space is ::Geometry::Euclidean.
}

The leading colons ('::') are like a leading slash ('/') in Unix - they refer
to the root namespace.  In other words, what I just wrote says that Mark
Butler believes that standard physical space has the standard geometrical
property Euclidean.  I also could write:

namespace Mark_Butler_19701014
{
  entity Physical_Space;
  entity Euclidean_Geometry;
  Physical_Space isA Euclidean_Geometry.
}

The disadvantage of the latter is that if I want to merge my model with
someone else's, somebody has to manually find matches and check correspondence
for every entity we might have in common.  When a system gets sufficiently
complex, that becomes intractable, hence the proliferation of formal naming
systems in every branch of human endeavor.
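
A small C++ sketch of the merge problem, under some loud assumptions: a model
is reduced to the set of entity names it mentions, a name counts as shared
only if it starts at the root namespace ('::'), and the pairwise count is just
a crude stand-in for the manual effort. All of the names below (Model,
is_global, auto_matched, manual_comparisons) are invented for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Hypothetical sketch: a model is just the set of entity names it mentions.
using Model = std::set<std::string>;

// A name is global if it starts at the root namespace ("::").
bool is_global(const std::string& name) {
    return name.compare(0, 2, "::") == 0;
}

// Names that both models share and that can be matched without human input.
// Private names that merely happen to be spelled alike are not trusted.
std::vector<std::string> auto_matched(const Model& a, const Model& b) {
    std::vector<std::string> shared;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(shared));
    shared.erase(std::remove_if(shared.begin(), shared.end(),
                                [](const std::string& n) {
                                    return !is_global(n);
                                }),
                 shared.end());
    return shared;
}

// Worst case, each private name on one side has to be compared by hand
// against each private name on the other side.
std::size_t manual_comparisons(const Model& a, const Model& b) {
    auto count_private = [](const Model& m) {
        return static_cast<std::size_t>(
            std::count_if(m.begin(), m.end(), [](const std::string& n) {
                return !is_global(n);
            }));
    };
    return count_private(a) * count_private(b);
}
```

The manual count grows with the product of the two models' private
vocabularies, while the automatically matched names cost nothing beyond a set
intersection.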

> By the way, what does Cyc mean when they say it isn't "frame-based"?
> Does that mean it uses a bunch of declarations and logic algorithms to
> traverse them, instead of tying all of them together in a unified model
> with relationships running all around?

The short answer is that "frame-based" systems correspond to what is known as
the "psychological view" of intelligent reasoning, whereas most other systems
are based on the "logical view" of intelligent reasoning.  The psychological
view, roughly speaking, is fuzzy, empirical, and associative, whereas the
logical view, roughly speaking, is definite, designed, and explicit. Cyc is an
example of the logical view, while knowledge derived from statistical sentence
analysis is more likely to be based on the psychological view.

For a much better explanation, please read the following article:

What is a Knowledge Representation? R. Davis, H. Shrobe, and P. Szolovits. AI
Magazine, 14(1):17-33,1993.

http://www.medg.lcs.mit.edu/ftp/psz/k-rep.html

-- 
Mark Butler	       ( butlerm at middle.net )
Software Engineer  
Epic Systems              
(801)-451-4583