- Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]

From: Russell Standish <r.standish.domain.name.hidden>

Date: Mon, 6 Jun 2005 10:21:15 +1000

On Fri, Jun 03, 2005 at 04:22:07PM -0700, "Hal Finney" wrote:

*> Russell Standish recently mentioned his paper "Why Occam's Razor" which
*

*> can be found at http://parallel.hpc.unsw.edu.au/rks/docs/occam/ . Among
*

*> other things he aims to derive quantum mechanics from a Schmidhuber type
*

*> ensemble. I have tried to read this paper but never really understood it.
*

*> Here I will try to ask some questions, taking it slowly.
*

*>
*

*> On this page, http://parallel.hpc.unsw.edu.au/rks/docs/occam/node2.html ,
*

*> things get started. Russell describes a set of infinite bit strings he
*

*> calls "descriptions". He writes:
*

*>
*

*> "By contrast to Schmidhuber, I assume a uniform measure over these
*

*> descriptions -- no particular string is more likely than any other."
*

*>
*

*> This surprises me. I thought that Schmidhuber assumed a uniform measure
*

*> over bit strings considered as programs for his universal computer. So
*

*> what is the "contrast" to his work?
*

Nowhere in Schmidhuber (1997) does he propose a measure over the "input

programs". What he does is is justify the appearance of a universal

prior in the set of descriptions by passing the raw data through a

reference UTM. Presumably the set of descriptions is sampled uniformly

by the observer.

In Schmidhuber (2000), the set of descriptions is generated by a

machine which has resource bounds. This leads to the notion of "speed

prior" which differs from the universal prior in several important

respects. I sometimes refer to the two different ensembles as

Schmidhuber I and Schmidhuber II.

I am beginning to regret calling the all descriptions ensemble with

uniform measure a Schmidhuber ensemble. I think what I meant was that

it could be generated by a standard dovetailer algorithm, running for

2^\aleph_0 timesteps. However, as the cardinality of "my" ensemble is

actually "c" (cardinality of the real numbers), it is quite probably a

completely different beast. It is also not generated by a program,

Schmidhuber style, it simply is (in the sense of being the simplest

set - equivalent to nothing).

*>
*

*> It seems that the greater contrast is that while Schmidhuber assumed that
*

*> the bit strings would be fed into a computer that would produce outputs,
*

*> Russell is taking the bit strings directly as raw data.
*

*>
*

Quite true.

*> But I am confused about their role.
*

*>
*

*> "Since some of these descriptions describe self aware substructures..."
*

*>
*

*> Whoa! This is a big leap for me. First, I am not too happy that mere bit
*

*> strings have been elevated with the title "descriptions". A bit string on
*

*> its own doesn't seem to have the inherent meaning necessary for it to be
*

*> considered a description.
*

Many apologies for deploying terminology in a different way to you

expect. A description (in my terminology) does not necessarily have

meaning. It is simply data. This is in accord with how I use the term

in casual English usage too - a description is simply a string of

letters, and may or may not be meaningful. Meaning is attached by an observer.

Now an observer will expect to find a SAS in one of the descriptions

as a corrolory of the anthropic principle, which is explicitly stated

as one of the assumptions in this work. I make no bones about this - I

consider the anthropic principle a mystery, not self-evident like

many people. Why should an observer expect to see a token of erself

embedded in reality? That is the mystery of the AP.

*> And now we find not only that the bit string is
*

*> a description, but it is a complex enough description to describe SAS's?
*

*> How does that work?
*

*>
*

The bitstrings are infinite in length. By reading enough bits, they can

have arbitrarily complex meanings attached to them.

*> It's especially confusing to read the introductory word "since" as though
*

*> this is all quite obvious and need not be explained. To me it is very
*

*> confusing.
*

*>
*

Sorry for not going slow enough. The habits of concise expression are

hard to shake.

*> The page goes on to identify these SAS's as observers. Now they are mappings,
*

*> or equivalently Turing Machines, which map finite bit strings to integers.
*

*> These integers are the "meanings" of the bit strings.
*

Not equivalently. Not all maps can be represented by a Turing machine,

only computable ones.

*>
*

*> I believe the idea here is that the bit strings are taken as prefixes
*

*> of the "description" bit strings in the ensemble. It is as though the
*

*> "observers" are observing the "descriptions" a bit at a time, and mapping
*

*> them to a sequence of integer "meanings". Is that correct?
*

*>
*

Indeed that is one interpretation. The most important point is that

the observer map is a "prefix" map, in the sense of prefix machines of

Algorithmic Information Theory. In reading a bit string one bit at a

time, once a meaning is attached to the string, that is the meaning

for evermore - the observer cannot change er mind after reading a few

more bits.

Schmidhuber (2000) deals with machines that do "change their mind", so

perhaps there is some extension possible in this direction.

*> So here is another confusion about the role of the "description" bit
*

*> strings in the model. Are they things that "observer" TM's observe and
*

*> map to integers? Or are they places where observers live, as suggested
*

*> by the "Since" line quoted above? Or both?
*

*>
*

All that is discussed in this paper is appearances - we only try to

explain the phenomenon (things as they appear). No attempt is made to

explain the noumenon (things as they are), nor do we need to assume

that there is a noumenon. Bruno Marchal has a detailed discussion on

this in his thesis, and concludes that he "has no need for this

hypothesis" (what he calls the extravagant hypothesis).

So the former statement is true : things that "observer" TM's observe and

map to integers. It is also true that descriptions of self aware

observers will appear within the description by the Anthropic

Principle. The phenomenon of observerhood is included. However where

the observers actually live is not a meaningful question in this framework.

*> Now it gets a little more complicated: "Under the mapping O(x), some
*

*> descriptions encode for identical meanings as other descriptions, so
*

*> one should equivalence class the descriptions."
*

*>
*

*> The problem I have is, O takes only finite bit strings.
*

Only a finite number of bits are significant. They still operate on

infinite bitstrings.

*> So technically
*

*> a "description", which is an infinite bit string, does not encode a
*

*> meaning.
*

Yes.

*> What I think is meant here, though, is that two "descriptions"
*

*> (i.e. infinite bit strings) will be considered equivalent if for every
*

*> finite prefix of the strings, the O() mapping is the same.
*

Yes

*> So if we
*

*> think of O as "observing" the "description" bit strings one by one,
*

*> it will go through precisely the same sequence of integer "meanings"
*

*> in each case. Is that right?
*

*>
*

I think you're saying that O(x) has unique value for any given value

of x. In which case you're right.

*> "In particular, strings where the bits after some bit number n are
*

*> ``don't care'' bits, are in fact equivalence classes of all strings that
*

*> share the first n bits in common."
*

*>
*

Sets of strings corresponding to a given meaning are an equivalence

class of descriptions, yes.

*> I think what this considers is a special O() and a special string prefix
*

*> such that if O sees that particular n-bit prefix, all extensions of
*

*> that prefix get mapped to the same "meaning" integer. In that case the
*

*> condition described in my previous paragraph would be met, and all
*

*> strings with this n-bit prefix would be equivalent.
*

*>
*

yes. This is the condition that O(x) is a prefix map.

*> "One can see that the size of the equivalence class drops off
*

*> exponentially with the amount of information encoded by the string."
*

*>
*

*> That seems a little questionable because the size of the equivalence class
*

*> is infinite in all cases. However I think Russell means to use a uniform
*

*> measure where the collection of all strings with a particular n-bit
*

*> prefix have a measure of 1/2^n. It's not clear how well this measure
*

*> really works or whether it applies to all sets of infinite strings.
*

*>
*

Maybe I'm being sloppy. Size here means measure, not cardinality. I

can appreciate your confusion.

The uniform measure works, because the set of all descriptions is

one-to-one mappable to the real interval [0,1], except for a set of

points of measure zero (rational numbers with power of 2 denominators)

where 2 descriptions map to the same real number.

Just as the uniform measure works with the reals (and allows integral

calculus to work), the uniform measure works with descriptions. The

measure of equivalence classes O^{-1}(n) also follows in a straight

forward fashion from the uniform measure.

*> "Under O(x), the amount of information is not necessarily equal to the
*

*> length of the string, as some of the bits may be redundant."
*

*>
*

*> Now we have this new concept of "the amount of information" which has
*

*> not previously been defined. This sentence is really hard for me.
*

*> What does it mean for bits to be redundant?
*

This sentence is part of the hand-waving "furniture" - its meant to

make things easier for the reader, not harder. Most people have an

intuitive notion of what redundancy means. Please ignore it if it

makes you feel better.

Later on I do define amount of information, or complexity to use its

proper name as {\cal C}_O(x) = -\log_2 P_O(O(x)), and appeal to the

coding theorem as a way of relating it to Kolmogorov complexity K(x).

*> We just discussed strings
*

*> where all those after bit n are "don't care", but this sentence seems
*

*> to be envisioning other kinds of redundancies.
*

*>
*

Indeed.

*> "The sum P_O(s) = [sum over p such that O(p)=s of] 2^(-|p|)
*

*> where |p| means the number of bits of p consumed by O in returning s,
*

*> gives the size of the equivalence class of all descriptions having
*

*> meaning s."
*

*>
*

*> Boy, that's a tough one now. We consider all bit strings p such that O(p)
*

*> = s. Now, is this supposed to just be those cases described earlier where
*

*> the bits after |p| are "don't care" bits? Or is it all strings p such
*

*> that O(p) = s, including all extensions of a string where we "don't care"
*

*> after some point? If the latter, it seems problematical because there
*

*> would be an uncountably infinite number of such p's. Also I think the sum
*

*> would not converge in that case. So I think it is probably the former.
*

*>
*

You have indeed hoisted me here. Well done! Of course the mistake is

not really serious, and can be patched up by taking the former

meaning you suggest, or merely to note that the measure of the set

with with common prefix of length |p| is precisely 2^{-|p|}, and then

the sum is over all such subsets of descriptions.

*> But then, what if there is a string p such that O(p)=s, but if we append
*

*> anything to p, either p+0 or p+1, then O of those strings is not s?
*

*> Does that p add to the equivalence class or not?
*

*>
*

Remember O(x) is a prefix map. That cannot happen.

*> And is it the case that for all O and for every "description" bit string,
*

*> there will be some prefix beyond which all the bits are "don't care"?
*

*> Would that follow from the finiteness of O? If not, how do these
*

*> "descriptions" feed into the measure?
*

*>
*

If there are no "don't care" bits, the description contributes zero

measure. This is not a problem, so long as there is at least one class

of strings with don't care bits.

*> And I'm a little concerned about the appearance of 2^(-|p|) here without
*

*> much motivation.
*

Perhaps making the summation index p run over subsets rather than

descriptions would help here, as I men tioned above. The subsets

correpond to finite length bitstrings (the "prefix"), and the |p|

corresponds to the length of that prefix.

*>
*

*> "The quantity C_O(x) = - log2( P_O( O(x) ) )
*

*> is a measure of the information content, or complexity of a description x."
*

*>
*

*> Now I am becoming confused about whether O() is supposed to apply
*

*> to the infinite bit strings that are "descriptions" or only to their
*

*> finite prefixes. I thought it was the latter. I don't see how a TM
*

*> can come up with a meaningful value on an infinite bit string. It can
*

*> never read in the whole input if it is infinite in size.
*

*>
*

It does not need to.

*> "If only the first n bits of the string are significant, with no
*

*> redundancy, then it is easy to see C_O(x)=n."
*

*>
*

*> That part does make sense, if in fact the definition of P_O(s) was
*

*> only with reference to "description" strings which were "don't care"
*

*> after some number of bits. And I think the part about "no redundancy"
*

*> means there are no other bit string prefixes which O maps to the same
*

*> value as it maps the n-bit prefix of x.
*

*>
*

*> The page then goes on to make some comments about measure applied to
*

*> universes. Here again I am confused about how to relate it to all that
*

*> has been descibed. What are the analogs of universes, in this model?
*

*> Is it "descriptions", the infinite bit strings? From what has been
*

*> presented so far, I don't understand how to relate our experience of
*

*> reality to this model.
*

*>
*

Each description is a possible universe, composed of an infinite

amount of information. Any observer will of course only comprehend a

finite amount information, and hence be in a superposition of a subset

of universes corresponding to that finite information. Admittedly the

usage of the term "universe" is slightly strange here.

Alterantively, one could talk about "observer moments" as

corresponding to the equivalence classes of descriptions. This

interpretation would be more natural to many here on this list.

In section 3 of the paper, I now introduce a temporal dimension, with

the observer repeatedly sampling the set of all descriptions, with the

proviso that successor states can only differ by a finite number of bits.

*> That's it for the first page. Hopefully these questions will help to
*

*> show where I am getting confused.
*

*>
*

*> Hal Finney
*

Cheers

Received on Sun Jun 05 2005 - 21:42:20 PDT

Date: Mon, 6 Jun 2005 10:21:15 +1000

On Fri, Jun 03, 2005 at 04:22:07PM -0700, "Hal Finney" wrote:

Nowhere in Schmidhuber (1997) does he propose a measure over the "input

programs". What he does is is justify the appearance of a universal

prior in the set of descriptions by passing the raw data through a

reference UTM. Presumably the set of descriptions is sampled uniformly

by the observer.

In Schmidhuber (2000), the set of descriptions is generated by a

machine which has resource bounds. This leads to the notion of "speed

prior" which differs from the universal prior in several important

respects. I sometimes refer to the two different ensembles as

Schmidhuber I and Schmidhuber II.

I am beginning to regret calling the all descriptions ensemble with

uniform measure a Schmidhuber ensemble. I think what I meant was that

it could be generated by a standard dovetailer algorithm, running for

2^\aleph_0 timesteps. However, as the cardinality of "my" ensemble is

actually "c" (cardinality of the real numbers), it is quite probably a

completely different beast. It is also not generated by a program,

Schmidhuber style, it simply is (in the sense of being the simplest

set - equivalent to nothing).

Quite true.

Many apologies for deploying terminology in a different way to you

expect. A description (in my terminology) does not necessarily have

meaning. It is simply data. This is in accord with how I use the term

in casual English usage too - a description is simply a string of

letters, and may or may not be meaningful. Meaning is attached by an observer.

Now an observer will expect to find a SAS in one of the descriptions

as a corrolory of the anthropic principle, which is explicitly stated

as one of the assumptions in this work. I make no bones about this - I

consider the anthropic principle a mystery, not self-evident like

many people. Why should an observer expect to see a token of erself

embedded in reality? That is the mystery of the AP.

The bitstrings are infinite in length. By reading enough bits, they can

have arbitrarily complex meanings attached to them.

Sorry for not going slow enough. The habits of concise expression are

hard to shake.

Not equivalently. Not all maps can be represented by a Turing machine,

only computable ones.

Indeed that is one interpretation. The most important point is that

the observer map is a "prefix" map, in the sense of prefix machines of

Algorithmic Information Theory. In reading a bit string one bit at a

time, once a meaning is attached to the string, that is the meaning

for evermore - the observer cannot change er mind after reading a few

more bits.

Schmidhuber (2000) deals with machines that do "change their mind", so

perhaps there is some extension possible in this direction.

All that is discussed in this paper is appearances - we only try to

explain the phenomenon (things as they appear). No attempt is made to

explain the noumenon (things as they are), nor do we need to assume

that there is a noumenon. Bruno Marchal has a detailed discussion on

this in his thesis, and concludes that he "has no need for this

hypothesis" (what he calls the extravagant hypothesis).

So the former statement is true : things that "observer" TM's observe and

map to integers. It is also true that descriptions of self aware

observers will appear within the description by the Anthropic

Principle. The phenomenon of observerhood is included. However where

the observers actually live is not a meaningful question in this framework.

Only a finite number of bits are significant. They still operate on

infinite bitstrings.

Yes.

Yes

I think you're saying that O(x) has unique value for any given value

of x. In which case you're right.

Sets of strings corresponding to a given meaning are an equivalence

class of descriptions, yes.

yes. This is the condition that O(x) is a prefix map.

Maybe I'm being sloppy. Size here means measure, not cardinality. I

can appreciate your confusion.

The uniform measure works, because the set of all descriptions is

one-to-one mappable to the real interval [0,1], except for a set of

points of measure zero (rational numbers with power of 2 denominators)

where 2 descriptions map to the same real number.

Just as the uniform measure works with the reals (and allows integral

calculus to work), the uniform measure works with descriptions. The

measure of equivalence classes O^{-1}(n) also follows in a straight

forward fashion from the uniform measure.

This sentence is part of the hand-waving "furniture" - its meant to

make things easier for the reader, not harder. Most people have an

intuitive notion of what redundancy means. Please ignore it if it

makes you feel better.

Later on I do define amount of information, or complexity to use its

proper name as {\cal C}_O(x) = -\log_2 P_O(O(x)), and appeal to the

coding theorem as a way of relating it to Kolmogorov complexity K(x).

Indeed.

You have indeed hoisted me here. Well done! Of course the mistake is

not really serious, and can be patched up by taking the former

meaning you suggest, or merely to note that the measure of the set

with with common prefix of length |p| is precisely 2^{-|p|}, and then

the sum is over all such subsets of descriptions.

Remember O(x) is a prefix map. That cannot happen.

If there are no "don't care" bits, the description contributes zero

measure. This is not a problem, so long as there is at least one class

of strings with don't care bits.

Perhaps making the summation index p run over subsets rather than

descriptions would help here, as I men tioned above. The subsets

correpond to finite length bitstrings (the "prefix"), and the |p|

corresponds to the length of that prefix.

It does not need to.

Each description is a possible universe, composed of an infinite

amount of information. Any observer will of course only comprehend a

finite amount information, and hence be in a superposition of a subset

of universes corresponding to that finite information. Admittedly the

usage of the term "universe" is slightly strange here.

Alterantively, one could talk about "observer moments" as

corresponding to the equivalence classes of descriptions. This

interpretation would be more natural to many here on this list.

In section 3 of the paper, I now introduce a temporal dimension, with

the observer repeatedly sampling the set of all descriptions, with the

proviso that successor states can only differ by a finite number of bits.

Cheers

-- *PS: A number of people ask me about the attachment to my email, which is of type "application/pgp-signature". Don't worry, it is not a virus. It is an electronic signature, that may be used to verify this email came from me if you have PGP or GPG installed. Otherwise, you may safely ignore this attachment. ---------------------------------------------------------------------------- A/Prof Russell Standish Phone 8308 3119 (mobile) Mathematics 0425 253119 (") UNSW SYDNEY 2052 R.Standish.domain.name.hidden Australia http://parallel.hpc.unsw.edu.au/rks International prefix +612, Interstate prefix 02 ----------------------------------------------------------------------------

- application/pgp-signature attachment: stored

*
This archive was generated by hypermail 2.3.0
: Fri Feb 16 2018 - 13:20:10 PST
*