From: Juergen Schmidhuber <juergen.domain.name.hidden>

Date: Tue, 26 Mar 2002 14:25:21 +0100

Many on this list have discussed the anthropic principle. It
essentially says that the conditional probability of finding
yourself in a universe compatible with your existence equals 1.
But unfortunately the anthropic principle does not have any
predictive power. It does NOT predict there won't be any flying
rabbits tomorrow.

Clearly we need more than the anthropic principle to explain
the predictability of our universe. In particular, we need an
optimal way of predicting the future, given past observations.
And there is one!

Normally we do not know the true conditional probability
distribution p(next event | past). But assume we do know that
p is in some set P of distributions. Choose a fixed weight w_q
for each q in P such that the w_q add up to 1 (for simplicity,
let P be countable). Then construct the Bayesmix
M(x) = Sum_q w_q q(x), and predict using M instead of the
optimal but unknown p.
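
For concreteness, here is a minimal Python sketch of such a Bayesmix
predictor for binary events (the function and the toy coin hypotheses
are my own illustration, not taken from the papers cited below):

  def bayesmix_predict(past, models, weights):
      """M(next bit = 1 | past): mix the models' predictions,
      weighting each q by its prior w_q times its likelihood of
      the past, i.e. by its Bayesian posterior weight."""
      post = []
      for q, w in zip(models, weights):
          like = 1.0
          for t, bit in enumerate(past):
              p1 = q(past[:t])                # q's P(bit t = 1 | prefix)
              like *= p1 if bit else 1.0 - p1
          post.append(w * like)
      z = sum(post)                           # equals M(past)
      return sum(pw * q(past) for pw, q in zip(post, models)) / z

  # Toy usage: a fair coin vs. a coin biased toward 1, equal priors.
  models  = [lambda past: 0.5, lambda past: 0.9]
  weights = [0.5, 0.5]
  print(bayesmix_predict([1, 1, 1, 1], models, weights))  # about 0.87

Each model maps a past bit-string to its conditional probability that
the next bit is 1; after four 1s in a row the mixture already leans
toward the biased-coin hypothesis.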

How wrong is it to do that? The recent exciting work of my postdoc
Marcus Hutter (IDSIA) provides general and sharp (!) loss bounds:
Let LM(n) and Lp(n) be the total expected losses of the M-predictor
and the p-predictor, respectively, for the first n events. Then
LM(n) - Lp(n) is at most of the order of sqrt[Lp(n)]. That is, M is
not much worse than p. And in general, no other predictor can do
better than that!
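
To make the shape of the bound concrete (my own rendering; the exact
constants depend on the loss function, see the paper cited below):
with w_p denoting the prior weight of the true distribution p,

\[
0 \;\le\; L_M(n) - L_p(n) \;\le\; 2\sqrt{L_p(n)\,\ln w_p^{-1}} + 2\ln w_p^{-1}.
\]

Since the excess grows only like sqrt[Lp(n)], the average excess loss
per event vanishes as n grows.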

In particular, if p is deterministic, then the M-predictor soon
won't make any errors any more.
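
In that case Lp(n) = 0 (for a suitable loss function), and the bound
displayed above collapses to a constant independent of n:

\[
L_M(n) \;\le\; 2\ln w_p^{-1} \quad \text{for all } n,
\]

so the M-predictor makes only finitely many errors in total.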

If P contains ALL recursively computable distributions, then M
becomes the celebrated enumerable universal prior. That is, after
decades of somewhat stagnating research we now have sharp loss
bounds for Solomonoff's universal (but incomputable) induction
scheme. And if we also include the distributions computable in
the limit, we get sharp loss bounds for the even more general
priors introduced in "Algorithmic Theories of Everything":
http://www.idsia.ch/~juergen/toesv2/
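
One standard way to picture the weights in this universal setting
(my notation; see the link above for precise definitions): give each
computable q the weight

\[
w_q = 2^{-l(q)},
\]

where l(q) is the length of a shortest program computing q on a fixed
universal machine. The penalty term ln w_q^{-1} in the loss bound then
equals l(q) ln 2: the price of not knowing p in advance is proportional
to the complexity of the true environment.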

Similarly, if we replace M by the Speed Prior S - where S(x) is
small if x is hard to compute by any method - we obtain appropriate
loss bounds for computable S-based induction:
http://www.idsia.ch/~juergen/toesv2/node32.html
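
From memory (the precise definition is at the link above), S is based
on a Levin-style search that interleaves all programs: phase i runs
every program p with l(p) <= i for at most 2^{i-l(p)} instructions, and

\[
S(x) \;=\; \sum_{i \ge 1} 2^{-i} \sum_{p \to x \text{ in phase } i} 2^{-l(p)},
\]

so outputs that no program produces quickly receive small S-probability.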

Alternatively, reduce M to what you get by just adding up the
weighted probability estimates for future financial data generated
by 1000 commercial stock-market prediction software packages. If
only one of them happens to work well (but you do not know which),
you should still get rich.
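
A toy numerical illustration of this point (all numbers invented; the
"packages" here are just fixed-bias coin predictors, while real ones
would be adaptive):

  import math
  import random

  random.seed(0)
  N, true_p = 1000, 0.7
  # Package 0 happens to know the true probability; the other 999
  # guess fixed random biases.
  biases = [true_p] + [random.uniform(0.01, 0.99) for _ in range(N - 1)]
  logw = [math.log(1.0 / N)] * N            # uniform prior weights

  for _ in range(300):                      # 300 observed up/down events
      bit = 1 if random.random() < true_p else 0
      for k, b in enumerate(biases):
          logw[k] += math.log(b if bit else 1.0 - b)

  best = max(range(N), key=lambda k: logw[k])
  print("bias of the highest-weight package:", round(biases[best], 3))
  # prints a bias near 0.7: posterior mass flows to the good predictors

With enough data the posterior weights concentrate on packages whose
forecasts match the truth, which is why the mixture gets rich alongside
the one good package.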

To learn more, please read "Optimality of Universal Bayesian Sequence
Prediction for General Loss and Alphabet":
ftp://ftp.idsia.ch/pub/techrep/IDSIA-02-02.ps.gz
and also check out Hutter's other recent papers at ICML, ECML, NIPS,
and the Int. J. of Foundations of CS: www.idsia.ch/~marcus

Juergen Schmidhuber

http://www.idsia.ch/~juergen/

Received on Tue Mar 26 2002 - 05:26:51 PST
