# Re: Predictions & duplications

From: Russell Standish <R.Standish.domain.name.hidden>
Date: Tue, 30 Oct 2001 10:06:08 +1100 (EST)

Juergen Schmidhuber wrote:
>
> > > > From: Russell Standish <R.Standish.domain.name.hidden>
> > > > The only reason for not accepting the simplest thing is if it can be
> > > > shown to be logically inconsistent. This far, you have shown no such
> > > > thing, but rather demonstrated an enormous confusion between measure
> > > > and probability distribution.
> > >
> > > Russell, this is really too much; please read any intro to measure
> > > theory,
> > > e.g., the one by Halmos. Measures in general are _not_ probability
> > > distributions. They are more than that; you can view them as _sets_ of
> > > probability distributions on sets that are related to each other in a
> > > specific way. Probability density functions define measures and
> > > therefore
> > > in general aren't probability distributions either. The uniform PDF on
> > > [0..1] yields the perfect trivial example; that's exactly why I used it
> > > before. In the computability context, compare LI/Vitanyi 1997, chapters
> > > 4.3 (discrete sample space with (semi)distribution m(x)) vs chapter 4.5
> > > (continuous sample space with (semi)measure M(x)).
> >
> > This is an excellent reference, thank you. Example 4.2.1 gives the
> > construction of the uniform measure over the set of strings, precisely
> > what you claim didn't exist when we started this discussion.
>
> You just want to annoy me, don't you? For the last time: There is a
> uniform *measure* on the strings, but no uniform probability
> distribution.

Who is annoying who? I never claimed a uniform probability density (I
actually looked up the difference between probability density and
probability distribution yesterday in Li and Vitanyi) - only a uniform
measure. In fact, my first comment to you was "Why do you insist on it
being a PDF?". Remember?

The simplest prior is the uniform measure on infinite strings.

>
> > > > 1) Halting theorem implies the set of halting programs is of measure
> > > > zero within the set of all programs
> > >
> > > You mean, given a uniform measure? But why should you need the halting
> > > theorem for this trivial statement? In fact, what exactly do you mean
> > > by
> > > halting theorem? (Schoenfield's book provides a good intro to
> > > mathematical
> > > logic and computability theory.)
> > >
> >
> > The halting theorem is that there is no algorithm for predicting
> > whether any program will halt. Of course, the result I was referring
> > to is that the set of compressible strings is of measure zero. The set
> > of halting programs actually has finite measure, since \Omega >
> > 0. However, there is still finite probability of an input string
> > being nonhalting, and indeed the probability seems to be greater than
> > 0.7 (ie \Omega <0.217643 - see pg 218 Li and Vitanyi), so this still
> > presents a problem for computationalism.
>
> This is mixing up everything, in particular, different measures.
> 1) Infinite strings with finite program have *nonzero* measure, given
> the *universal* measure (Omega is a number derived from the universal
> measure). 2) They have measure zero, given the *uniform* measure (Omega
> has nothing to do with it).

The uniform measure and the universal measure are related. The uniform
prior is on the set of input strings to a UTM, the universal measure
is the induced measure on the output strings.

No. 1) here is the relevant statement, not 2).

3) To see 2) we do not need the halting
> theorem - we just compare the finite programs (countably many) to all
> strings (uncountably many) and we are done. 4) The halting theorem is
> no problem whatsoever when it comes to computable universes. Why care
> whether some computable universe stops or not? If it does, it does,
> otherwise it does not. In both cases we can compute it in the limit.
>
> > > > The GP program (aka Universal Dovetailer) is not the simplest
> > > > thing. The simplest describable thing is the set of all descriptions,
> > > > that simply exist without being generated. That has precisely zero
> > > > information or complexity, whereas the UD has some complexity of the
> > > > order of a few 100 bits. The simplest prior is the uniform one, ie
> > > > there is no bias whatsoever in the selection.
> > >
> > > Exist without being generated? Precisely zero information or
> > > complexity? But with respect to what?
> >
> > With respect to the observer. With the respect to anything in
> > fact. The set of all possible descriptions has precisely zero
> > information, by definition, regardless of what observer or context you
> > employ.
>
> This is a vague informal statement, not a definition. Which are the
> possible descriptions? What's the formal difference between a
> description
> and a non-description? What is your "anything" if there is no device
> for describing it formally? If "anything" does not convey any bit of
> info then what exactly is it that conveys one bit? Or two?
>

Choose an alphabet (a finite set of symbols). The set of all
descriptions is the set of all infinite length strings constructed
from that alphabet. I would suggest the notation A*, where A is the
alphabet, but I think this usually refers to the set of finite length
strings.

Now to measure the information content of any particular string, you
need to define an interpreter (or observer) which is simply a map o(x)
from the set of all descriptions to a set of equivalence classes
(which can be identified as a subset of the whole numbers N).

If that map is partially recursive, then the interpreter can be
replaced by a Turing machine. But the map need not be partially
recursive, and I believe it is possible to do this with stochastic
maps (ie a map with an additional input connected to a random oracle).

The information of a string s is simply given by:

I(s) = -log m(o^{-1}(o(s)))/m(Omega)

where Omega is the set of all descriptions, m the uniform measure, and
o^{-1}(o(s)) is the equivalence class equivalent to s. The ratio above
exists, in appropriate limits, even if the value m(Omega) is infinite.

Nevertheless, regardless of what map is chosen, the set of all
descriptions has zero information content, as log 1 = 0.

> > Any traditional definition of
> > > information/simplicity requires the specification of a formal
> > > description
> > > method, such as a Turing machine. You don't need one? Then how do you
> > > define information or complexity? How exactly do you define "simple" ?
> >
> > Actually, all it needs is an equivalencing procedure. See my "On
> > Complexity and Emergence" paper. A Turing machine will do the job for
> > you, but so will a neural network or an "expert system" following
> > heuristic rules.
>
> So you have to write down formally this equivalencing procedure, right?
> Why should this be different in principle from writing down the formal
> rules for a Turing machine? Why is a simplicity measure that depends on
> your equivalencing procedure superior to a simplicity measure depending
> on a Turing machine? (The TM procedure includes yours - because it
> includes all procedures - but not the other way round.)
>