
From: <juergen.domain.name.hidden>
Date: Fri, 12 Oct 2001 10:29:15 +0200

In reply to Russell Standish and Juho Pennanen I'd just like to
emphasize the main point, which is really trivial: by definition, a
uniform measure on the possible futures makes all future beginnings of
a given size equally likely. Then regular futures clearly are not any
more likely than the irregular ones. I have no idea what makes Russell
think this is debatable. In most possible futures your computer will
vanish within the next second. But it does not. This indicates that our
future is _not_ sampled from a uniform prior.
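(A toy sketch of the uniformity point, added for concreteness; the choice of constant strings as the "regular" futures is my own simplification, not Schmidhuber's:)

```python
from itertools import product

# Under a uniform measure on all n-bit continuations of a history,
# every specific continuation has probability 2**-n, regular or not,
# and the "regular" ones -- here, crudely, the two constant strings --
# are a vanishing fraction of the whole ensemble.
n = 12
futures = list(product("01", repeat=n))             # all 2**n equally likely futures
regular = [f for f in futures if len(set(f)) == 1]  # all-0s and all-1s only

p_any_future = 1 / len(futures)           # 2**-n, identical for every future
p_regular = len(regular) / len(futures)   # tiny: regularity is not favored

print(p_any_future)  # 0.000244140625
print(p_regular)     # 0.00048828125
```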

Some seem to think that the weak anthropic principle explains the
regularity. The argument goes like this: "Let there be a uniform measure
on all universe histories, represented as bitstrings. Now take the tiny
subset of histories in which you appear. Although the measure of this
subset is tiny, its conditional measure, given your very existence,
is not: According to the weak anthropic principle, the conditional
probability of finding yourself in a regular universe compatible with
your existence equals 1."

But it is essential to see that the weak anthropic principle does not
have any predictive power at all. It does not tell you anything about
the future. It cannot explain away futures in which you still exist
but irregular things happen. Only a nonuniform prior can explain this.
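(A toy numerical version of this conditioning argument; the pattern "101" as a stand-in for "histories in which you appear" is my own illustrative choice:)

```python
from itertools import product

# Restrict a uniform measure on 10-bit histories to those whose past
# (first 9 bits) contains the marker "101", standing in for "you exist".
# Within that conditional ensemble, the final bit -- the "future" -- is
# still exactly 50/50: conditioning on existence does not make regular
# continuations any more likely.
histories = ["".join(h) for h in product("01", repeat=10)]
with_observer = [h for h in histories if "101" in h[:9]]

ends_0 = sum(h[-1] == "0" for h in with_observer)
ends_1 = sum(h[-1] == "1" for h in with_observer)
print(ends_0, ends_1)  # equal counts: the conditional measure stays uniform over futures
```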

Which nonuniform prior is plausible? My favorite is the "resource-optimal"
Speed Prior. I am hoping and expecting that someone will confirm it soon
by finding a rather fast pseudorandom generator responsible for apparently
noisy events in our universe. But others may be more conservative and
go for the more dominant enumerable Solomonoff-Levin prior mu^M or maybe
even for the nonenumerable prior mu^E (which dominates mu^M while still
being computable in the limit) or maybe even for the extreme prior mu^G.
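(A toy contrast between a length-only prior and a runtime-penalizing one; the weighting formulas and the (length, runtime) pairs below are my own simplifications, not the actual definitions of mu^M, mu^E, mu^G or the Speed Prior, which are given in Schmidhuber's papers:)

```python
import math

# A Solomonoff-Levin-style weight depends only on program length l(p),
# roughly 2**-l(p); a speed-prior-style weight also penalizes runtime
# t(p).  The two hypothetical programs below compute the same output.
programs = {               # name: (length in bits, runtime in steps) -- made up
    "short_slow": (10, 10**6),
    "long_fast": (20, 10**2),
}

def sl_weight(l, t):
    return 2.0 ** -l                    # length only; runtime is ignored

def speed_weight(l, t):
    return 2.0 ** -(l + math.log2(t))   # length plus log of runtime (toy penalty)

for name, (l, t) in programs.items():
    print(name, sl_weight(l, t), speed_weight(l, t))
```

Note how the ranking flips: the length-only weight prefers `short_slow`, while the runtime-penalized weight prefers `long_fast`.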

Juergen Schmidhuber

http://www.idsia.ch/~juergen/
http://www.idsia.ch/~juergen/everything/html.html
http://www.idsia.ch/~juergen/toesv2/

> From everything-list-request.domain.name.hidden Thu Oct 11 17:36:41 2001
> From: Juho Pennanen <juho.pennanen.domain.name.hidden>
>
> I tried to understand the problem that doctors Schmidhuber and
> Standish are discussing by describing it in the most concrete terms I
> could, below. (I admit beforehand I couldn't follow all the details
> and do not know all the papers and theorems referred to, so this could
> be irrelevant.)
>
> So, say you are going to drop a pencil from your hand and are trying
> to predict whether it's going to fall down or up this time. Using what
> I understand of the comp TOE, I would take the set of all programs
> that at some state implement a certain conscious state, namely the
> state in which you remember starting your experiment of dropping the
> pencil and have already recorded the end result (I abbreviate this
> conscious state as CS. To be exact it is a set of states, but that
> shouldn't make a difference).
>
> The space of all programs would be the set of all programs in some
> language, coded as infinite sequences of 0's and 1's. (I do not know
> how much the chosen language + coding affects the whole thing.)
>
> Now for your prediction you need to divide the implementations of CS
> into two sets: those in which the pencil fell down and those in which
> it fell up. Then you compare the measures of those sets. (You would
> need to assume that each program is run just once, or something of the
> sort. Some programs obviously implement CS several times when they
> run. So you would maybe include only those programs that implement CS
> infinitely many times, and weight them by the density of CS
> occurrences during their run.)
>
> One way to derive the measure you need is to assume a measure on the
> set of all infinite sequences (i.e. on all programs). For this we have
> the natural measure, i.e. the product measure of the uniform measure
> on the set containing 0 and 1. And as far as my intuition goes, this
> measure would lead to the empirically correct prediction of the
> direction of the pencil's fall. If I understood it right, this is not
> too far from what Dr. Standish was claiming? And we wouldn't need any
> speed priors.
>
> But maybe the need for a speed prior would come into play if I thought
> more carefully about the detailed assumptions involved, e.g. that each
> program is run just once, at the same speed, etc.? I am not sure.
>
> Juho
>
> /************************************************
> Juho Pennanen
> Department of Forest Ecology, P.O.Box 24
> FIN-00014 University of Helsinki
> tel. (09)191 58144 (+358-9-191 58144)
> GSM 040 5455 845 (+358-40-5455 845)
> http://www.helsinki.fi/people/juho.pennanen
> *************************************************/
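(The product measure Juho describes can be sketched concretely; the specific prefixes below are hypothetical placeholders for programs whose runs end in "down" or "up":)

```python
# Under the uniform product measure on infinite 0/1 sequences, the set
# of sequences beginning with a given finite prefix (a "cylinder set")
# has measure 2**-len(prefix).  A prediction then compares the total
# measure of the cylinders in which the pencil "fell down" against
# those in which it "fell up".
def cylinder_measure(prefix: str) -> float:
    return 2.0 ** -len(prefix)

down_prefixes = ["000", "010", "011"]  # hypothetical prefixes leading to "down"
up_prefixes = ["001"]                  # hypothetical prefixes leading to "up"

p_down = sum(cylinder_measure(p) for p in down_prefixes)
p_up = sum(cylinder_measure(p) for p in up_prefixes)
print(p_down, p_up)  # 0.375 0.125
```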

> Resent-Date: Thu, 11 Oct 2001 18:57:25 -0700
> From: Russell Standish <R.Standish.domain.name.hidden>
>
> juergen.domain.name.hidden wrote:
> > Huh? A PDF? You mean a probability density function? On a continuous set?
>
> Probability Distribution Function. And PDFs are defined on all
> measurable sets, not just continuous ones.
>
> > No! I am talking about probability distributions on describable objects.
> > On things you can program.
>
> Sure...
>
> > Anyway, you write "...observer will expect to see regularities even with
> > the uniform prior" but that clearly cannot be true.
>
> This is where I beg to differ.
>
> > > Since you've obviously barked up the wrong tree here, it's a little
> > > hard to know where to start. Once you understand that each observer
> > > must equivalence an infinite number of descriptions due to the
> > > boundedness of its resources, it becomes fairly obvious that the
> > > smaller, simpler descriptions correspond to larger equivalence classes
> > > (hence higher probability).
> >
> > Maybe you should write down formally what you mean? Which resource bounds?
> > On which machine? What exactly do you mean by "simple"? Are you just
> > referring to the traditional Solomonoff-Levin measure and the associated
> > old Occam's razor theorems, or do you mean something else?
>
> It is formally written in "Why Occams Razor", and the extension to
> non-computational observers is a fairly obvious corollary of the
> contents of "On Complexity and Emergence".
>
> To summarise the results for this list, the basic idea is that an
> observer acts as a filter, or category machine. On being fed a
> description, the observer places it into a category, or equivalence
> class of descriptions. What we can say is that real observers have a
> finite number of such equivalence classes into which they can
> categorise descriptions (be they basins of attraction in a neural
> network, or output states in the form of books, etc). This is
> important, because if we consider the case of UTMs being the
> classification device, and outputs from halting programs as being the
> categories, then we end up with the S-L distribution over the set of
> halting programs. Unfortunately, the set of halting programs has
> measure zero within the set of all programs (descriptions), so we end
> up with the White Rabbit paradox, namely that purely random
> descriptions have probability one.
>
> One solution is to ban the "Wabbit" universes (which you do by means
> of the speed prior), and recover the nicely behaved S-L measure,
> which, essentially, is your solution (i.e. S-L convolved with the
> Speed Prior).
>
> The other way is to realise that the only example of a conscious
> observer we are aware of is actually quite resilient to noise - only a
> small number of bits in a random string will be considered
> significant. We are really very good at detecting patterns in all
> sorts of random noise. Hence every random string will be equivalenced
> to some regular description. One then ends up with an _observer
> dependent_ S-L-like probability distribution over the set of
> categories made by that observer. The corresponding Occam's Razor
> theorem follows in the usual way.
>
> I hope this is clear enough for you to follow...
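(Russell's "observer as category machine" can be illustrated with a toy model; the run-length categorizer below is my own stand-in for a resource-bounded observer, not his formal construction from "Why Occams Razor":)

```python
from collections import Counter
from itertools import product

# A resource-bounded "observer" that can only remember the lengths of
# the first three runs of an 8-bit description.  Feeding it all 2**8
# uniformly weighted descriptions induces a highly non-uniform
# distribution over its finitely many categories: many distinct
# descriptions get equivalenced to the same category.
def categorize(bits: str, max_runs: int = 3) -> tuple:
    runs, i = [], 0
    while i < len(bits) and len(runs) < max_runs:
        j = i
        while j < len(bits) and bits[j] == bits[i]:
            j += 1                 # extend the current run of equal bits
        runs.append(j - i)
        i = j
    return tuple(runs)             # the observer's category for this description

counts = Counter(categorize("".join(b)) for b in product("01", repeat=8))
for cat, n in counts.most_common(3):
    print(cat, n)  # the largest categories absorb many distinct descriptions
```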

> > You are talking falsifiability. I am talking verifiability. Sure, you
> > cannot prove randomness. But that's not the point of any inductive
> > science. The point is to find regularities if there are any. Occam's
> > razor encourages us to search for regularity, even when we do not know
> > whether there is any. Maybe some PhD student tomorrow will discover a
> > simple PRG of the kind I mentioned, and get famous.
> >
> > It is important to see that Popper's popular and frequently cited and
> > overrated concept of falsifiability does not really help much to explain
> > what inductive science such as physics is all about. E.g., physicists
> > accept Everett's ideas although most of his postulated parallel universes
> > will remain inaccessible forever, and therefore are _not_ falsifiable.
> > Clearly, what's convincing about the multiverse theory is its simplicity,
> > not its falsifiability, in line with Solomonoff's theory of inductive
> > inference and Occam's razor, which is not just a wishy-washy philosophical
> > framework like Popper's.
> >
> > Similarly, today's string physicists accept theories for their
> > simplicity, not their falsifiability. Likewise, nobody is able to test
> > whether gravity is the same on Sirius, but believing it makes things
> > simpler.
> >
> > Again, the essential question is: which prior is plausible? Which
> > represents the correct notion of simplicity? Solomonoff's traditional
> > prior, which does not care about temporal complexity at all? Even more
> > general priors computable in the limit, such as those discussed in
> > the algorithmic TOE paper? Or the Speed Prior, which embodies a more
> > restricted concept of simplicity that differs from Kolmogorov complexity
> > because it takes runtime into account, in an optimal fashion?
> >
> > Juergen Schmidhuber
> >
> > http://www.idsia.ch/~juergen/
> > http://www.idsia.ch/~juergen/everything/html.html
> > http://www.idsia.ch/~juergen/toesv2/
>
> The problem is that the uniform prior is simpler than the speed
> prior. It is very much the null hypothesis against which we should
> test the speed prior hypothesis. That is why falsifiability, and more
> importantly verifiability, is important. It is entirely possible that
> we exist within some kind of stupendous, but nevertheless
> resource-limited, virtual reality. But is it too much to ask for
> possible evidence of this?
>
> My main point is that your objection to the uniform prior (essentially
> the so-called "White Rabbit" problem) is a non-problem.
>
> Cheers
>
> ----------------------------------------------------------------------------
> Dr. Russell Standish                     Director
> High Performance Computing Support Unit, Phone 9385 6967, 8308 3119 (mobile)
> UNSW SYDNEY 2052                         Fax 9385 6965, 0425 253119 (")
> Australia                                R.Standish.domain.name.hidden
> Room 2075, Red Centre                    http://parallel.hpc.unsw.edu.au/rks
> International prefix +612, Interstate prefix 02
> ----------------------------------------------------------------------------

Received on Fri Oct 12 2001 - 01:30:26 PDT


This archive was generated by hypermail 2.3.0 : Fri Feb 16 2018 - 13:20:07 PST