Serious *Mistake* by Schmidhuber

From: Osher Doctorow
Date: Fri, 6 Sep 2002 12:20:28 -0700

From: Osher Doctorow, Fri. Sept. 6, 2002 11:45AM

I have read about half of J. Schmidhuber's *A computer scientist's view of
life, the universe, and everything* (1997), and he has interesting ideas
and clarity of presentation, but I have to disagree with him in a number of
places where he uses conditional probability, including his section
Generalization and Learning. I hasten to add that I do not view
alternative theories as *wrong* but as competing and that they should almost
all survive for competition, motivation, and also because many of them turn
out to have useful contributions long after they have been regarded as

Schmidhuber (S for short) concludes that generalization is impossible in
general by using a proof based on conditional probability, and similarly he
concludes, also by a conditional probability proof, that the learner's life
is limited in general. Most readers will undoubtedly stare at this
statement in bewilderment, since as far as they know nothing is wrong with
conditional probability.

They are partly correct and partly wrong. Nothing is wrong with
conditional probability, which is the main tool of the Bayesian school (or
as I abbreviate it, the BCP or Bayesian Conditional Probability-Statistics
school), for Fairly Frequent Events. For Rare Events, something very
strange happens. This was how my wife Marleen and I began our exploration
of Rare Events in 1980. Conditional probability divides two probabilities
and regards that as an indication of the probability of one event *given*
another event, where *given* is used in the sense of *freezing the other
event in place*. Some real analysis experts will argue that this is all
justified by the Radon-Nikodym derivative of the Lebesgue-Radon-Nikodym
theorem(s), not quite realizing that the proofs of those theorems hold only
up to equivalence classes outside sets of measure ZERO. But events of
probability zero are the Rarest Events. Moreover, division of
probabilities blows up even in small (one-sided) neighborhoods of
probability 0 since division by 0 is impossible. Thus, not only can
conditional probability not model events of probability 0, but it cannot
even model events of probability close to 0 (Rare Events).
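
The behavior near zero can be sketched numerically. The following is only an
illustration under assumptions of my own choosing: the function names and the
sample probabilities are hypothetical and do not come from S's paper. It shows
that the quotient is undefined when the conditioning event has probability 0,
and that for a Rare Event a tiny absolute change in the joint probability
moves the quotient by a large amount.

```python
def conditional_probability(p_joint, p_given):
    """P(B | A) = P(A and B) / P(A); undefined when P(A) == 0."""
    if not 0.0 <= p_joint <= p_given <= 1.0:
        raise ValueError("need 0 <= P(A and B) <= P(A) <= 1")
    # Plain float division: raises ZeroDivisionError when p_given == 0.0.
    return p_joint / p_given


def ratio_shift(p_joint, p_given, delta):
    """Change in P(B | A) when the joint probability moves by delta."""
    before = conditional_probability(p_joint, p_given)
    after = conditional_probability(p_joint + delta, p_given)
    return after - before


# Hypothetical sample values: the same absolute perturbation of 1e-10
# applied to a Fairly Frequent Event and to a Rare Event.
common = ratio_shift(0.25, 0.5, 1e-10)    # conditioning event has P = 0.5
rare = ratio_shift(5e-10, 1e-9, 1e-10)    # conditioning event has P = 1e-9

print("shift for frequent event:", common)
print("shift for rare event:    ", rare)

try:
    conditional_probability(0.0, 0.0)
except ZeroDivisionError as err:
    print("probability-zero event:", err)
```

The quotient itself never exceeds 1 when the joint probability is at most the
probability of the conditioning event; what grows without bound as that
probability shrinks is the sensitivity of the quotient to small changes.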

Is there a simple solution? Yes! Product/Goguen fuzzy multivalued
logical implication x-->y is defined as y/x for x not 0. So it corresponds
to conditional probability where x and y are carefully chosen probabilities
in the probability-statistics analog. Lukasiewicz and Rational Pavelka
fuzzy multivalued logical implications (Rational Pavelka is the predicate
logic generalization of Lukasiewicz propositional logic) are x-->y = 1 + y -
x for y <= x for the non-trivial case. The latter does not involve
division by 0 and does not blow up in any (one-sided) neighborhood of zero.
Logic-Based Probability (LBP) uses precisely the same definition of 1 + y -
x in place of y/x for exactly the same probabilities x, y which BCP uses.
My wife and I introduced LBP in 1980. It may be remarked here that the Godel
fuzzy multivalued logic, which we showed applies to Very Frequent (Very
Common) Events, uses x-->y = y and refers in the probability-statistics
analog to INDEPENDENT events. Since events are in general not independent
unless independence can be established in special cases, LBP is the
correct result to use.
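
The three implications can be put side by side in a short sketch. This is a
minimal illustration, assuming only the formulas quoted above and restricted
to the non-trivial case 0 <= y <= x <= 1; the function names are hypothetical
and the sample values are arbitrary. (Standard treatments of these logics
assign the value 1 whenever x <= y; that case is not handled here.)

```python
def _check(x, y):
    # The sketch covers only the non-trivial case discussed in the text.
    if not 0.0 <= y <= x <= 1.0:
        raise ValueError("sketch covers only 0 <= y <= x <= 1")


def product_implication(x, y):
    """Product/Goguen: x-->y = y/x, defined only for x not 0."""
    _check(x, y)
    if x == 0.0:
        raise ValueError("y/x is undefined at x = 0")
    return y / x


def lukasiewicz_implication(x, y):
    """Lukasiewicz/Rational Pavelka: x-->y = 1 + y - x (the LBP analog)."""
    _check(x, y)
    return 1.0 + y - x


def godel_implication(x, y):
    """Godel, as quoted in the text: x-->y = y."""
    _check(x, y)
    return y


# Arbitrary sample point with y <= x.
x, y = 0.5, 0.25
print("product    :", product_implication(x, y))
print("Lukasiewicz:", lukasiewicz_implication(x, y))
print("Godel      :", godel_implication(x, y))

# At x = y = 0 only the quotient form fails to produce a value.
print("Lukasiewicz at (0, 0):", lukasiewicz_implication(0.0, 0.0))
try:
    product_implication(0.0, 0.0)
except ValueError as err:
    print("product at (0, 0):", err)
```

The only structural difference between the first two functions is that a
quotient is replaced by a difference, which is why the second stays defined
and bounded in every one-sided neighborhood of zero.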

So when S claims that generalization is impossible in general and that the
learner's life is limited in general, he has to be referring to Fairly
Frequent Events, not Rare Events or even Very Frequent Events (which use the
Godel analog).

But surely that leaves much room for S to maneuver in? In a way, yes, and
in a way, no. S is very interested in the Great Programmer or even a
decreasing sequence of Great Programmers each delegating authority to the
other in different universes and so on. The Great Programmer thinks on the
level of the Universe or All Universes or the particular Universe in the
sequence. So we have to ask: which type of fuzzy multivalued logic or its
probability-statistics analog (or proximity function - geometry - topology
analog, which we developed as exact analogs of the above) most influences
the Universe(s)?

The answer turns out to be very simple, namely Lukasiewicz/Rational Pavelka
(Rare Event) or its probability-statistics analog LBP. This is because it
is generally agreed that in our universe a Rare Event called a Big Bang
occurred (I have proven that even if it did not, as in the Steinhardt-Turok
and Gott-Li cyclic or backward time loop cosmological theories, LBP is the
key influence probability), and that other very rare events, such as
inflation, the transition from the radiation-dominated to the
matter-dominated era, and the fairly recent transition from a
non-accelerating to an accelerating universe, all played critical roles in
the development of the Universe.

I should also mention that Shannon Information-Entropy and its Kolmogorov
generalizations blow up near zero because the logarithm does, and that the
only *influence* type of Shannon Information-Entropy is based on conditional
probability, which of course also blows up at zero. Rare Event
Information-Entropy does not use logarithms but (positive or negative)
exponentials, and of course does not divide probabilities so it does not
blow up at or near zero denominator.

Quantum-field-theory-oriented physicists may be slightly disturbed at this
point, since QFT totally eliminates probabilities except in the *formal*
location of Schrodinger's equation which is regarded as a *deterministic*
equation (another anomaly that I will be glad to argue about at another time
or place). Happily or unhappily, they have no choice in the matter of the
above results, since they hold across about 10 different branches of
mathematics and almost an equal number of branches of physics. Curiously
enough, Quantum Mechanics theorists manage to get probability back into the
picture, including their much-used CONDITIONAL probability, while
simultaneously disavowing the stochastic (probability) school and claiming
allegiance to the Statistics School (apparently unaware that there is no
statistics without probability), which plays only a *formal* role in
supporting the *deterministic Schrodinger and Heisenberg* equations.

Osher Doctorow
Received on Fri Sep 06 2002 - 12:33:31 PDT
