Re: Bayesian boxes and Independence of Scales from GSLevy.domain.name.hidden on 1999-05-10 (everything)

From: <GSLevy.domain.name.hidden>
Date: Mon, 10 May 1999 14:41:41 EDT

I agree that the implicit distribution of m is essential in determining
whether we should switch after making our first choice. Interestingly, if we
assume a logarithmic distribution and compute the expected value of the
logarithm of the content of the other box (the one not selected), we find that
it is identical to the logarithm of the content of the selected box (writing m
for the amount we found):

Expected Value = (1/2) (Log(m/2) + Log(2m))
               = (1/2) (Log(m) - Log(2) + Log(m) + Log(2))
               = Log(m)

This implies that the logarithmic distribution is the one assumed by common
sense: don't switch. In other words, the likelihood that any particular number
is equal to m is the same at every scale, i.e. INDEPENDENT OF SCALE. This independence of
scale has some intriguing connection with the MWI and the issue that the
probability of survival after death is extremely low but FROM THE POINT OF
VIEW of the observer it must be equal to one.
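
A quick numerical check of the computation above, as a minimal sketch. It
assumes, as the 50-50 argument does, that the other box holds x/2 or 2x with
probability 1/2 each; the function name and the sample amount are made up for
illustration. The arithmetic mean favours switching, while the mean of the
logarithms leaves us indifferent.

```python
import math

def other_box_expectations(x):
    """Return (arithmetic mean, mean of logs) for the unopened box,
    ASSUMING it holds x/2 or 2x with probability 1/2 each."""
    candidates = (x / 2, 2 * x)
    arithmetic = sum(candidates) / 2
    log_mean = sum(math.log(c) for c in candidates) / 2
    return arithmetic, log_mean

# Hypothetical observed amount of 100 pounds.
arith, log_mean = other_box_expectations(100.0)
print(arith)                      # arithmetic mean of the other box
print(log_mean, math.log(100.0))  # mean log vs. log of what we hold
```

On the linear scale the other box looks 25% better; on the logarithmic scale
the two boxes look the same.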

In a message dated 99-05-05 12:31:25 EDT, you write:

<<
Stefan Rueger <smr3.domain.name.hidden> wrote:

> Iain,
>
> Let box one contain m pounds and box two 2m pounds, where m is
> any real positive number. You may choose a box, look at the
> amount of money inside and decide either to keep this amount or
> the other box's amount. (For this exercise you may think of
> cheques with real numbers on them and a special bank account where
> you can actually bank any real number of pounds.)
>
> In Peter's inaugural, he suggested that if you pick a box at
> random and find x pounds, you might conclude that the other box
> either contains x/2 or 2x pounds, the average of which is 1.25x.
> So you are better off (on average by 25%!) choosing the other
> box. He left it to the audience to decide whether this conclusion
> is valid or not.
>
> It was immediately clear to me - and I am sure, it is immediately
> clear to you - that the argument is flawed.

Errrm... do you mean the argument exactly as given above (with the
conclusion that choosing randomly and then always switching - which is
of course just a different style of "choosing randomly"! - is 1.25 times
better than not switching) is flawed? If so, then you and I certainly
agree it's flawed, and probably Pete agrees too! (Though he'll have to
speak for himself of course.) Or do you mean the "meta-argument" (which
correctly points out the absurdity of one random 50-50 choice generation
protocol being 1.25 times better than another, but goes on to conclude
sweepingly that *no* choice protocol, random or non-random, can be
better than any other for this problem) is flawed? If this latter, then
I agree with you that the "meta-argument" is flawed - though maybe for
different reasons than yours! - and I think Pete possibly *disagrees*
and takes the meta-argument to be just fine, sweeping conclusion and
all! (Again, of course, he'll have to speak for himself... I really
shouldn't play guessing games with other people's opinions like this. I
wasn't even at his inaugural talk after all!!)

>
> We both know that choosing a box at random (e.g., using a fair
> coin) and keeping the contents gives you 1.5m pounds on average.
> The same is true by symmetry if you choose a box randomly and keep
> the contents of the **other** box: both methods must add up to the
> 3m pounds.
>

Exactly. That's the "correctly points out" clause at the beginning of
what I'm calling the meta-argument above. Fine so far. And indeed any
other *non-quantity-of-money-driven* elaboration of "50-50 random
choice" would be no better or worse too. You can think up really
glorious protocols here, like choosing a box, choosing an integer to say
how many times to "dither back and forth" (of course this integer may as
well be restricted to 0 or 1 since dithering back and forth is cyclic
mod 2), and then duly "dithering" that many times before finally
announcing the box one settles on. All these
non-quantity-of-money-driven elaborations are the same as just flipping
a coin and getting the thing over with.
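
The point about "dithering" can be checked by exact enumeration. The sketch
below is only an illustration: the function name and the sample value of m
are made up, and the protocol is assumed to ignore the amount of money
entirely.

```python
def expected_return(m, dithers):
    """Exact expected payoff when the first box is picked by a fair coin
    and the player then swaps boxes `dithers` times, never looking at
    the amount (an assumed, amount-blind protocol)."""
    boxes = (m, 2 * m)
    total = 0.0
    for first in (0, 1):                # the two equally likely first picks
        final = (first + dithers) % 2   # swapping back and forth is cyclic mod 2
        total += 0.5 * boxes[final]
    return total

# Hypothetical m = 10 pounds; try 0, 1, 2 and 3 dithers.
results = {k: expected_return(10.0, k) for k in range(4)}
print(results)
```

Every entry comes out at 1.5m, whatever the number of dithers.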

>
> Peter's argument has the following flaw: The core argument is
> that reading one box's cheque (say, amount x) makes the contents
> of the other box's cheque amount a random variable with values 2x
> and x/2 and a uniform distribution (50% probability for either).
> But the contents of the other box are not random at all! They are
> either 2x or x/2, and this depends on the experimenter's mood
> when the boxes were set up.

I agree here - everything depends on the distribution the experimenter
used to generate the secret number m (the number which is used to fill
the boxes with m and 2m) in the first place. Even if the experimenter
later says something like "But I didn't use a distribution! I just shut
my eyes and thought of a number on a whim!", that statement is
scientifically meaningless - after all their "whim" has to be encoded
somehow in their brain state, and so the distribution is "whatever
probabilities brains in that whim-state have for thinking up numbers".
By hook or by crook, there's a distribution for m lurking in there!

> This is a classic example of assuming a probability distribution
> for something that you don't know. People who do this

...(including me!)...

> call themselves Bayesians. The problem with the Bayesian approach
> is that in some cases, when you are not careful, it can produce
> misleading results.

Ah, but all good Bayesians are careful. :-)

I think what you mean by being "not careful", in the context of this
example, is taking the distribution for the question "Is the box I've
just opened the m box or the 2m box?" to always be 50-50, *even after
you've read the quantity of money it contains*. That would indeed be not
just careless but wrong - it is, after all, 50-50 *before* you've read
the quantity of money, and *therefore*, for that very reason, is not in
general 50-50 *after* you've read the quantity of money. If you had
access to the experimenter's secret distribution for generating m in the
first place (which is not the same thing as access to the value of m
itself on this occasion, I hasten to add!), you could perform the update
in a fully proper and knowledgeable way. If you are a Bayesian, then
even in the absence of access to the experimenter's distribution for m,
you take it upon yourself to do the next best thing to this: you make
your own guess as to "what kind of distributions experimenters like when
playing games with people", you weight *these* distributions according
to a guessed set of weightings ("meta-distribution"?) of your own, you
collapse the meta-distribution of "ordinary" distributions into one
consolidated distribution by the standard
"multiply-it-all-out-and-add-it-all-up" method, and you use *that*
next-best-thing consolidated distribution (and your just-obtained
knowledge of the quantity of money in the box you opened) to update your
weightings - previously 50-50 - for that dangling question this long and
rambling paragraph started off with: "Is the box I've just opened the m
box or the 2m box?". Then, of course, you switch iff your updated
weightings favour the unopened box. Phew! :-)
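
The recipe in the paragraph above can be sketched in a few lines of Python.
Everything concrete here is an assumption made for illustration: the priors
over m are discrete, the two candidate priors and their 50-50
meta-weights are invented, the function names are hypothetical, and "favour
the unopened box" is read as "the unopened box has the higher expected
amount".

```python
def consolidate(priors, weights):
    """Collapse a weighted family of discrete priors over m into one prior
    (multiply it all out and add it all up)."""
    combined = {}
    for prior, w in zip(priors, weights):
        for m, p in prior.items():
            combined[m] = combined.get(m, 0.0) + w * p
    return combined

def prob_opened_is_m_box(prior, x):
    """P(opened box is the m box | it contains x), first pick being 50-50."""
    like_m = 0.5 * prior.get(x, 0.0)        # m = x: we hold the smaller box
    like_2m = 0.5 * prior.get(x / 2, 0.0)   # m = x/2: we hold the larger box
    total = like_m + like_2m
    if total == 0.0:
        raise ValueError("observed amount is impossible under this prior")
    return like_m / total

def should_switch(prior, x):
    """ASSUMED decision rule: switch iff the other box has the higher
    expected amount under the updated weightings."""
    p = prob_opened_is_m_box(prior, x)
    expected_other = p * 2 * x + (1 - p) * x / 2
    return expected_other > x

# Two invented guesses at the experimenter's habits, weighted 50-50.
small = {1.0: 0.5, 2.0: 0.5}
large = {2.0: 0.5, 4.0: 0.5}
prior = consolidate([small, large], [0.5, 0.5])

print(prob_opened_is_m_box(prior, 2.0))  # no longer 1/2 once x is known
print(should_switch(prior, 1.0), should_switch(prior, 8.0))
```

Seeing the smallest possible amount makes switching a certainty, seeing the
largest possible amount rules it out, and in between the weightings move away
from 50-50.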

>
> Thinking deeper about Peter's box problem after the inauguration,
> I thought that there might be a similar experiment that actually
> increases your expected return above 1.5m.
>
> You may find this astonishing, but I actually came up with a
> strategy that has an expected return of **more than** 1.5m -
> no matter what m is (as long as m is bigger than zero).
>
> In order to increase the suspense a little and give you some time
> to think about this, I'll reveal my method to you - after you
> reply to this mail saying something to the effect of "It would be
> astonishing and completely unbelievable that such a method should
> exist".
>
> Regards!
>
> Stefan

Well, as should now be clear, I wouldn't use *those* words - and in fact
the algorithm I said a good Bayesian would follow (the business about
the consolidated distribution, in my long and rambling paragraph above)
*would*, I believe, increase the expected return above 1.5m in most
real-life cases. Even using a very crude caricature of what I called the
meta-distribution in that paragraph would probably raise the expected
return a wee bit above 1.5m. So, I'm not saying such an algorithm would
be "astonishing and completely unbelievable". Not at all! But of course
I'd still like you to tell me your method. I'm too tired to work out a
particular instantiation of my own "algorithm in words only" in my long
and rambling paragraph above! :-)
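
As an illustration of how crude an amount-driven rule can be and still not
hurt, here is a minimal sketch of a fixed-threshold rule: switch iff the
opened amount is below some threshold t. This is not claimed to be Stefan's
method, which has not been revealed; the threshold, the function name and the
sample numbers are all assumptions.

```python
def threshold_return(m, t):
    """Exact expected payoff of 'switch iff the opened amount is below t'
    when the first box is picked by a fair coin. The threshold t is a
    hypothetical parameter chosen by the player in advance."""
    boxes = (m, 2 * m)
    total = 0.0
    for first in (0, 1):                        # equally likely first picks
        seen, other = boxes[first], boxes[1 - first]
        total += 0.5 * (other if seen < t else seen)
    return total

# Hypothetical m = 10 pounds, so the baseline 1.5m is 15.
print(threshold_return(10.0, 5.0))   # t below both amounts: never switch
print(threshold_return(10.0, 15.0))  # t between m and 2m
print(threshold_return(10.0, 50.0))  # t above both amounts: always switch
```

The rule never does worse than 1.5m, and it does strictly better whenever t
happens to fall between m and 2m.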

Bye for now... (and apologies to those on the Theories of Everything
mailing list for whom this might all be a bit out of context - but it is
a fairly well-known "toy probability paradox" in the literature, and
Stefan's explanation of it at the beginning is quite self-contained.)

Iain.

--
I have discovered a truly marvellous Theory of Everything. Unfortunately
this T-shirt is too small to contain it.
(Or: yeah, sure, the Theory of Everything will fit on a T-shirt. But in
what size of font?)

>>
Received on Mon May 10 1999 - 11:50:45 PDT
