
From: Hal Finney <hal.domain.name.hidden>

Date: Fri, 3 Jun 2005 16:22:07 -0700 (PDT)

Russell Standish recently mentioned his paper "Why Occam's Razor" which

can be found at http://parallel.hpc.unsw.edu.au/rks/docs/occam/ . Among

other things he aims to derive quantum mechanics from a Schmidhuber type

ensemble. I have tried to read this paper but never really understood it.

Here I will try to ask some questions, taking it slowly.

On this page, http://parallel.hpc.unsw.edu.au/rks/docs/occam/node2.html ,

things get started. Russell describes a set of infinite bit strings he

calls "descriptions". He writes:

"By contrast to Schmidhuber, I assume a uniform measure over these

descriptions -- no particular string is more likely than any other."

This surprises me. I thought that Schmidhuber assumed a uniform measure

over bit strings considered as programs for his universal computer. So

what is the "contrast" to his work?

It seems that the greater contrast is that while Schmidhuber assumed that

the bit strings would be fed into a computer that would produce outputs,

Russell is taking the bit strings directly as raw data.

But I am confused about their role.

"Since some of these descriptions describe self aware substructures..."

Whoa! This is a big leap for me. First, I am not too happy that mere bit

strings have been elevated with the title "descriptions". A bit string on

its own doesn't seem to have the inherent meaning necessary for it to be

considered a description. And now we find not only that the bit string is

a description, but it is a complex enough description to describe SAS's?

How does that work?

It's especially confusing to read the introductory word "since", which

presents all of this as though it were quite obvious and needed no

explanation.

The page goes on to identify these SAS's as observers. Now they are mappings,

or equivalently Turing Machines, which map finite bit strings to integers.

These integers are the "meanings" of the bit strings.

I believe the idea here is that the bit strings are taken as prefixes

of the "description" bit strings in the ensemble. It is as though the

"observers" are observing the "descriptions" a bit at a time, and mapping

them to a sequence of integer "meanings". Is that correct?
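To check my reading, here is a toy sketch of an observer consuming a
description one bit at a time. The particular mapping (the value of the
prefix as a binary number) is my own invented example, not anything from
the paper:

```python
# Toy "observer": a function from finite bit strings to integer
# "meanings". The choice of mapping here is purely illustrative.
def O(prefix: str) -> int:
    return int(prefix, 2)

description = "10110100"  # stand-in for the start of an infinite description

# Feeding O the prefixes one bit at a time yields a sequence of meanings:
meanings = [O(description[:n]) for n in range(1, 5)]
print(meanings)  # [1, 2, 5, 11]
```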

So here is another confusion about the role of the "description" bit

strings in the model. Are they things that "observer" TM's observe and

map to integers? Or are they places where observers live, as suggested

by the "Since" line quoted above? Or both?

Now it gets a little more complicated: "Under the mapping O(x), some

descriptions encode for identical meanings as other descriptions, so

one should equivalence class the descriptions."

The problem I have is, O takes only finite bit strings. So technically

a "description", which is an infinite bit string, does not encode a

meaning. What I think is meant here, though, is that two "descriptions"

(i.e. infinite bit strings) will be considered equivalent if for every

finite prefix of the strings, the O() mapping is the same. So if we

think of O as "observing" the "description" bit strings one by one,

it will go through precisely the same sequence of integer "meanings"

in each case. Is that right?

"In particular, strings where the bits after some bit number n are

``don't care'' bits, are in fact equivalence classes of all strings that

share the first n bits in common."

I think what this considers is a special O() and a special string prefix

such that if O sees that particular n-bit prefix, all extensions of

that prefix get mapped to the same "meaning" integer. In that case the

condition described in my previous paragraph would be met, and all

strings with this n-bit prefix would be equivalent.
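A small sketch of the kind of O I have in mind here, again with a mapping
I made up myself:

```python
# Toy observer for which bits after bit 3 are "don't care": it reads
# only the first 3 bits, so every extension of a given 3-bit prefix
# gets the same meaning and (as I read the paper) the extensions all
# fall into one equivalence class.
def O(prefix: str) -> int:
    return int(prefix[:3], 2)

print(O("1010000"), O("1011111"))  # 5 5 -- same 3-bit prefix, same meaning
```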

"One can see that the size of the equivalence class drops off

exponentially with the amount of information encoded by the string."

That seems a little questionable because the size of the equivalence class

is infinite in all cases. However I think Russell means to use a uniform

measure where the collection of all strings with a particular n-bit

prefix have a measure of 1/2^n. It's not clear how well this measure

really works or whether it applies to all sets of infinite strings.
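Here is how I picture that measure working out, as a toy calculation:

```python
from fractions import Fraction

# The uniform measure as I understand it: the set of all infinite
# strings sharing a given n-bit prefix (a "cylinder set") gets measure
# 1/2^n, even though the set itself is uncountably infinite.
def cylinder_measure(prefix: str) -> Fraction:
    return Fraction(1, 2 ** len(prefix))

print(cylinder_measure("101"))    # 1/8
print(cylinder_measure("10110"))  # 1/32
```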

"Under O(x), the amount of information is not necessarily equal to the

length of the string, as some of the bits may be redundant."

Now we have this new concept of "the amount of information" which has

not previously been defined. This sentence is really hard for me.

What does it mean for bits to be redundant? We just discussed strings

where all those after bit n are "don't care", but this sentence seems

to be envisioning other kinds of redundancies.

"The sum P_O(s) = [sum over p such that O(p)=s of] 2^(-|p|)

where |p| means the number of bits of p consumed by O in returning s,

gives the size of the equivalence class of all descriptions having

meaning s."

Boy, that's a tough one now. We consider all bit strings p such that O(p)

= s. Now, is this supposed to just be those cases described earlier where

the bits after |p| are "don't care" bits? Or is it all strings p such

that O(p) = s, including all extensions of a string where we "don't care"

after some point? If the latter, it seems problematical because there

would be an uncountably infinite number of such p's. Also I think the sum

would not converge in that case. So I think it is probably the former.
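To pin down that "former" reading, here is a toy computation with a
made-up observer:

```python
from fractions import Fraction

# Reading the sum the "former" way: add 2^(-|p|) only over the minimal
# prefixes p at which O settles on meaning s, with everything after |p|
# a "don't care" bit. Toy setup (mine, not the paper's): O returns
# meaning s once it has seen either "0" or "11".
minimal_prefixes_for_s = ["0", "11"]
P_O = sum(Fraction(1, 2 ** len(p)) for p in minimal_prefixes_for_s)
print(P_O)  # 3/4, i.e. 1/2 + 1/4
```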

But then, what if there is a string p such that O(p)=s, but if we append

anything to p, either p+0 or p+1, then O of those strings is not s?

Does that p add to the equivalence class or not?

And is it the case that for all O and for every "description" bit string,

there will be some prefix beyond which all the bits are "don't care"?

Would that follow from the finiteness of O? If not, how do these

"descriptions" feed into the measure?

And I'm a little concerned about the appearance of 2^(-|p|) here without

much motivation.

"The quantity C_O(x) = - log2( P_O( O(x) ) )

is a measure of the information content, or complexity of a description x."

Now I am becoming confused about whether O() is supposed to apply

to the infinite bit strings that are "descriptions" or only to their

finite prefixes. I thought it was the latter. I don't see how a TM

can come up with a meaningful value on an infinite bit string. It can

never read in the whole input if it is infinite in size.

"If only the first n bits of the string are significant, with no

redundancy, then it is easy to see C_O(x)=n."

That part does make sense, if in fact the definition of P_O(s) was

only with reference to "description" strings which were "don't care"

after some number of bits. And I think the part about "no redundancy"

means there are no other bit string prefixes which O maps to the same

value as it maps the n-bit prefix of x.
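A quick check of that arithmetic with made-up numbers:

```python
import math

# If exactly one n-bit prefix maps to meaning s, with no redundancy,
# then P_O(s) = 2^(-n) and the complexity comes out to exactly n.
n = 5
P_O = 2.0 ** (-n)
C_O = -math.log2(P_O)
print(C_O)  # 5.0
```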

The page then goes on to make some comments about measure applied to

universes. Here again I am confused about how to relate it to all that

has been described. What are the analogs of universes in this model?

Is it "descriptions", the infinite bit strings? From what has been

presented so far, I don't understand how to relate our experience of

reality to this model.

That's it for the first page. Hopefully these questions will help to

show where I am getting confused.

Hal Finney

Received on Fri Jun 03 2005 - 20:16:02 PDT

