Optimal Prediction

From: Juergen Schmidhuber <juergen.domain.name.hidden>
Date: Tue, 26 Mar 2002 14:25:21 +0100

Many on this list have discussed the anthropic principle. It
essentially says that the conditional probability of finding
yourself in a universe compatible with your existence equals 1.

But unfortunately the anthropic principle does not have any
predictive power. It does NOT predict there won't be any flying
rabbits tomorrow.

Clearly we need more than the anthropic principle to explain
the predictability of our universe. In particular, we need an
optimal way of predicting the future, given past observations.

And there is one!

Normally we do not know the true conditional probability
distribution p(next event | past). But assume we do know that
p is in some set P of distributions. Choose a fixed weight w_q
for each q in P such that the w_q add up to 1 (for simplicity,
let P be countable). Then construct the Bayesmix
M(x) = Sum_q w_q q(x), and predict using M instead of the
optimal but unknown p.
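
For concreteness, here is a minimal Python sketch of such an
M-predictor for a binary alphabet and a finite P. Each candidate q
is represented by a function returning the conditional probability
q(next bit | history); all names are mine, purely for illustration,
not code from Hutter's paper.

def mixture_conditional(next_bit, history, P, w):
    # M(next | history) = sum over q of  W_q(history) * q(next | history),
    # where W_q(history) is q's posterior weight: its prior weight w_q
    # times q's probability of the history seen so far, renormalized.
    post = []
    for q, w_q in zip(P, w):
        likelihood = 1.0
        for t, bit in enumerate(history):
            likelihood *= q(bit, history[:t])
        post.append(w_q * likelihood)
    z = sum(post) or 1.0                     # guard against an all-zero mix
    post = [x / z for x in post]
    return sum(W * q(next_bit, history) for q, W in zip(P, post))

def predict(history, P, w):
    # Predict the bit that minimizes expected 0/1 loss under M.
    return 1 if mixture_conditional(1, history, P, w) >= 0.5 else 0

The posterior reweighting is just Bayes' rule: candidates that
explain the observed history well come to dominate the mixture.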

How wrong is it to do that? The recent exciting work of my postdoc
Marcus Hutter (IDSIA) provides general and sharp (!) loss bounds:

Let LM(n) and Lp(n) be the total expected losses of the M-predictor
and the p-predictor, respectively, for the first n events. Then
LM(n)-Lp(n) is at most of the order of sqrt[Lp(n)]. That is, M is
not much worse than p. And in general, no other predictor can do
better than that!
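
For orientation, the bound has (roughly; see the paper for the exact
constants) the form

  LM(n) - Lp(n)  <=  2 sqrt[ Lp(n) ln(1/w_p) ]  +  2 ln(1/w_p),

so the hidden constant is governed by ln(1/w_p), the negative log of
the prior weight you assigned to the true distribution. (For the
universal mixture below, w_q is essentially 2^-K(q), so ln(1/w_p) is
proportional to the length of the shortest program computing p.)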

In particular, if p is deterministic, then the M-predictor makes at
most finitely many errors and soon stops erring altogether.
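
The reason is simple: whenever the M-predictor errs on a bit that the
deterministic p predicted with certainty, M must have assigned
probability at most 1/2 to that bit, so each error shrinks M's
probability of the observed history by a factor of at least 2
relative to p; since that probability never falls below w_p, at most
about log2(1/w_p) errors are possible. A toy check with the Python
sketch above (the candidate distributions are of course made up):

# The true distribution is "always 1"; the other candidates are
# made-up Bernoulli predictors that ignore the history.
def always_one(bit, history):
    return 1.0 if bit == 1 else 0.0

def bernoulli(theta):
    return lambda bit, history: theta if bit == 1 else 1.0 - theta

P = [always_one, bernoulli(0.5), bernoulli(0.3), bernoulli(0.1)]
w = [0.25] * 4

history, errors = [], 0
for t in range(100):
    errors += (predict(history, P, w) != 1)  # the true next bit is always 1
    history.append(1)
print(errors)                                # a single early error, then none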

If P contains ALL recursively computable distributions, then M
becomes the celebrated enumerable universal prior. That is, after
decades of somewhat stagnating research we now have sharp loss
bounds for Solomonoff's universal (but incomputable) induction
scheme. And if we also include the distributions computable in
the limit, we get sharp loss bounds for the even more general
priors introduced in "Algorithmic Theories of Everything":
http://www.idsia.ch/~juergen/toesv2/

Similarly, if we replace M by the Speed Prior S - where S(x) is
small if x is hard to compute by any method - we obtain appropriate
loss bounds for computable S-based induction:
http://www.idsia.ch/~juergen/toesv2/node32.html

Alternatively, reduce M to what you get by simply adding up the
weighted probability estimates for future financial data produced by
1000 commercial stock-market prediction packages. If just one of them
happens to work well (but you do not know which one), you should
still get rich.
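
In code terms this is once more the sketch from above, now with P
given by the packages' probability estimates and uniform weights
1/1000. Everything below (packages, market history) is invented,
just to show the call:

# 999 useless packages (coin flips) plus one well-calibrated package.
packages = [bernoulli(0.5) for _ in range(999)] + [bernoulli(0.9)]
w = [1.0 / 1000] * 1000
observed = ([1] * 9 + [0]) * 20        # invented history: "up" 90% of the time
print(predict(observed, packages, w))  # predicts 1; by now almost all posterior
                                       # weight sits on the calibrated package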

To learn more, please read

Optimality of Universal Bayesian Sequence Prediction for General Loss
and Alphabet: ftp://ftp.idsia.ch/pub/techrep/IDSIA-02-02.ps.gz

and also check out Hutter's other recent papers at ICML, ECML, NIPS,
Int. J. of Foundations of CS: www.idsia.ch/~marcus



Juergen Schmidhuber
http://www.idsia.ch/~juergen/