UDist and measure of observers

From: Hal Finney <hal.domain.name.hidden>
Date: Fri, 22 Jul 2005 13:35:17 -0700 (PDT)

I want to describe in more detail how I see the Universal Distribution
(UDist) applying to the measure of observers and observer moments.
I apologize in advance for the length of this message; someday I will
collect this and my others on this topic into a set of web pages.

To briefly reiterate, in this model every information pattern or object
is said to exist in a Platonic sense. These are the only things that
exist. Further, these objects are associated with a measure defined by
the Universal Distribution or UDist, which is defined with respect to a
given Universal Turing Machine (UTM). Basically the measure of an object
is the probability that a random program run through that UTM produces
that object. Another way to think of the measure is as the fraction
of all programs which produce that object. You can also imagine an
infinite array of UTMs, each one working on a distinct input program
and producing outputs, where the measure of an object is proportional
to the fraction of the outputs that match that object's pattern.
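To make this concrete, here is a toy sketch in Python. It is purely my
own illustration: the miniature "machine" below is invented for the
example and is nothing like a real UTM, but it shows the idea of
estimating an object's measure as the fraction of random fixed-length
programs that produce it, and why regular patterns pick up more measure
than irregular ones.

    import random

    # Toy "UTM": programs are bit strings read two bits at a time.
    #   00 -> append '0', 01 -> append '1', 10 -> double the output, 11 -> halt
    def run_toy_machine(program, max_output=64):
        out = ""
        for i in range(0, len(program) - 1, 2):
            op = program[i:i+2]
            if op == "00":
                out += "0"
            elif op == "01":
                out += "1"
            elif op == "10":
                out += out              # duplicate everything written so far
            else:                       # "11" halts the machine
                break
            if len(out) > max_output:
                break
        return out

    # Estimate measure as the fraction of random programs whose output
    # matches the target pattern.
    def estimate_measure(target, program_length=40, samples=200000):
        hits = 0
        for _ in range(samples):
            program = "".join(random.choice("01") for _ in range(program_length))
            if run_toy_machine(program) == target:
                hits += 1
        return hits / samples

    # A highly regular string is produced by many short descriptions and
    # gets far more weight than an irregular string of the same length.
    print(estimate_measure("0" * 16))             # roughly a few times 1e-4
    print(estimate_measure("0110100110010110"))   # effectively zero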

To apply this concept to observers, we first need to think of an observer
as an information pattern. I adopt a block universe perspective and think
of time as a dimension. Then we can see the dynamic activity that is
part of an observer's thinking as producing a pattern in space and time.

Let us consider the human brain. Our current theories are that thinking
and perception can be thought of as reflecting the activity of neural
cells. These cells fire off electrical impulses at various rates
and times, and are hooked up into an elaborate and complex network of
interconnections, whose properties change over time.

It seems reasonable that a complete record of the neural activity and
dynamics of the interconnection map over a period of time would be an
effective and accurate representation of the information associated
with the mental activity during that time period. In fact this is
probably more detailed than is needed; clearly much neural processing
is unconscious, and further, small variations in individual cell activity
will not produce perceptual changes in consciousness. But at this point
we cannot be certain about what parts could be simplified or eliminated.
No doubt with future study of brain function we will gradually gain a
greater understanding of consciousness and be able to reduce the information
content needed to fully specify an observer's experiences even further.

It is an instructive exercise to try to estimate how much information
is needed to specify this representation of an observer as it exists
within some window of time. Obviously this will be very approximate.
I'll take 1E10 (10^10 or 10 billion) as the number of neurons and
1E5 (100 thousand) as the number of connections per neuron. Representing
this map takes 1E10 * 1E5 * 33 bits per connection, or about 3E16 bits.
Then we need to specify the connection strength for each connection;
perhaps 20 bits is about right, which gives 1 part in a million accuracy.
This gets us up to about 5E16 bits.
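Spelled out as arithmetic, here is the same estimate as a few lines of
Python (nothing new, just the numbers above restated):

    from math import log2

    neurons = 1e10                    # number of neurons
    connections_per_neuron = 1e5      # connections per neuron
    target_bits = log2(neurons)       # ~33 bits to name the target neuron
    strength_bits = 20                # ~1 part in a million for the strength

    map_bits = neurons * connections_per_neuron * (target_bits + strength_bits)
    print(f"{map_bits:.1e} bits for the static map")   # about 5e16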

This is a static representation. As the brain thinks, these connection
strengths change. My understanding is that they don't change very fast.
Some estimates I have seen suggest that long term memory can only record
1 bit per second. I know that sounds amazingly slow but it reflects
how much we forget. Even if this is low by 10 orders of magnitude
it still means that the information needed to record changes in the
interconnection map is insignificant next to the information needed
to specify the map in the first place, which I estimated as about 5E16 bits.
So I will not count this information contribution, not because it doesn't
happen, but because it is so small compared to the information in the
static network.

Next we need to record the actual neural firing patterns. Neurons fire
at a maximum rate of about 1000/sec, but we need to record more than
the firing rates. The relative timing of the neurons is important as
well, both because the brain seems to encode information in that timing
and because whether a neuron fires or not depends on the relative timing
of the various impulses it receives.

Let's suppose we record the neural firing activity at the microsecond
level; not just a boolean value about whether it fired, but some
indication of the neuron's level of activation and its recovery rate.
We will use a 20 bit value to represent this, and record it every
microsecond. This takes 1E10 * 20 * 1E6 bits times the number of
seconds, or about 2E17 bits per second of brain activity recorded.

The bottom line from all this estimation, which is obviously very very
rough, is that it takes about (0.5+2s) times 10^17 bits to record the
pattern associated with s seconds of brain activity, in enough detail
that we could plausibly claim to have fully captured the essence of those
seconds of consciousness. That's roughly 5 times 10^17 bits, call it
10^18, for 2 seconds of consciousness.

10^18 bits is about 100 million gigabytes. For comparison, typical new
computer disks today are about 100-500 gigabytes. So it is a pretty
big chunk of data.
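Putting the static map and the firing record together, here is the same
bottom line as a short Python sketch (the inputs are just the estimates
above, and I count a gigabyte as 8E9 bits):

    map_bits = 5e16               # static interconnection map, from above
    neurons = 1e10
    state_bits = 20               # activation/recovery state per sample
    samples_per_second = 1e6      # one sample per microsecond

    def recording_bits(seconds):
        return map_bits + neurons * state_bits * samples_per_second * seconds

    for s in (1, 2, 10):
        bits = recording_bits(s)
        gigabytes = bits / (8 * 1e9)
        print(f"{s:2d} s: {bits:.1e} bits, about {gigabytes:.1e} GB")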

What is the point of this calculation? Effectively, it gives us a lower
bound on the measure of a possible set of observer moments. Even if
nothing else works, we could write a program which would output this
particular pattern simply by embedding that pattern in the program itself.
We could do the UTM equivalent of "print 1001011110010..." to print out
whatever pattern we want. The size of that program will be roughly the
size of the information pattern itself, in this case roughly 10^18 bits.
And the measure of that information pattern will be 1/2 to this power,
or 1/2^(10^18). This is an astronomically low measure.

In effect, this measure estimates the probability of a given moment of
consciousness appearing purely at random, out of random noise. This
answers the question sometimes posed of whether the random vibrations
in the atoms of air or a rock crystal are conscious, since we could
select a subset of them and match them against the information pattern of
consciousness described above. The answer is that the only program which
could do so would have to embed the entire information pattern within it,
and the measure of such a program is the tiny figure quoted above.
Assuming that a given consciousness has other, non-infinitesimal sources
of measure, the contribution from such random-noise patterns is
undetectably small by comparison.

So what other sources of measure might there be? In other words,
how could we write a much shorter program which could output this same
information pattern?

One thing we could do is to try to compress the redundancy out of the
pattern. As I have defined it, even though I tried not to be terribly
excessive in the number of bits I allocated to the various terms, there
is certainly some redundancy. For one thing, I proposed to record
each neuron's firing pattern independently, even though in principle a
neuron's firing could be computed from its inputs and the
interconnection matrix, both of which are recorded. In that case we
might get by with only storing the inputs to the brain from the perceptual
nerves, and use the stored interconnection matrix to compute everything
else. This could greatly reduce the per-second information requirement.

However I don't think this will work as well as it sounds, because I
suspect that the brain is chaotic in nature and so small changes in
initial conditions will lead ultimately to totally different output.
Even though I stored the interconnections with a precision of 1 in a
million, that may not be enough to keep the reconstructed brain pattern
identical to the recorded one. Likewise, omitting the changes in the
interconnections will probably not work if they have to be tracked
accurately in order to keep the simulation in sync with the recorded
activity. I would not be surprised if dropping the precise internal
firing patterns and trying to reconstruct them required storing enough
correction data to largely eliminate the savings. And besides, even if
this worked perfectly, the interconnection network alone still takes
some 5E16 bits, so for observer moments of a few seconds the most you
could save is a factor of ten or so.

No, we need a much more radical strategy to shrink the program. And here
is the remarkable thing: instead of 10^18 bits, or even perhaps 10^15
bits if we could compress the data by a factor of 1000, I estimate that
only about 10^5 bits would be enough to encode this information, perhaps
even 10^4. The contribution to measure from such a short program is so
much larger than that of the "brute force" 10^18 version that the latter
can be completely neglected.
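Since numbers like 1/2^(10^18) underflow anything you could actually
compute with, the comparison is easiest to see in log2 terms, as in this
little sketch:

    # Compare the measure contributions of the "print the pattern" program
    # (~1e18 bits) and a short program of the size estimated below (~1e4
    # bits).  Work with log2(measure) = -(program size in bits), since the
    # measures themselves are far too small to represent directly.
    log2_brute_force = -1e18
    log2_universe = -1e4

    log2_ratio = log2_universe - log2_brute_force    # about +1e18
    print(f"short program outweighs brute force by a factor of 2^{log2_ratio:.3g}")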

So how do we record this brain pattern in so little data? The answer
is that we adopt a completely different approach. Instead of specifying
all of the information, we instead specify the natural laws and initial
conditions for a universe which is suitable for the evolution of life
and intelligence. The program runs and creates that universe, and then
outputs the brain pattern which one of the observers in that universe
is experiencing. I will show below that the information content of such
a program is plausibly of the order I claimed.

This requires an implicit assumption that the brain pattern in question
can in fact be produced by an observer who evolved naturally and is
experiencing the events in a plausible universe. In other words, this
will not work for all brain patterns that we could ever imagine. But if
our own brain patterns represent actual experiences in a real universe,
a universe which is not too implausible, then it should work for them.
So I will assume that this is the case.

The program to output the brain pattern can be conceptually divided
into two parts. One part computes the universe, and the other outputs
the observer moments in question, in the format described above, as an
interconnection matrix and record of neural firings. We can estimate
the size of the program by the sum of the estimated sizes of the two
parts.
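Schematically, the program looks something like the Python skeleton
below. Every name in it is a placeholder of my own choosing, and nothing
here is remotely implementable today; it only shows how the two parts
fit together.

    # Purely schematic outline of the two-part program.
    def run_universe(laws, initial_conditions):
        """Part 1: compute the spacetime history implied by the laws."""
        raise NotImplementedError("stands in for ~1e4 bits of physics")

    def extract_observer(history, position, start_time, duration):
        """Part 2: find the brain at the hard-coded coordinates and emit its
        interconnection matrix plus microsecond firing record."""
        raise NotImplementedError("stands in for ~1e4-1e5 bits of analysis code")

    def output_observer_moments(laws, initial_conditions,
                                position, start_time, duration):
        history = run_universe(laws, initial_conditions)
        return extract_observer(history, position, start_time, duration)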

How big a program is needed to create our universe? Nobody knows, but the
answer seems to be, not that big. One of the most striking properties
of the natural laws which have been discovered so far is that they are
mathematically simple. Explaining why this is so is considered a major
philosophical puzzle, and answering this question is one of the biggest
successes of the UDist principle.

We don't yet have a complete theory of the laws of physics, but given
what is known, quantum theory and relativity, and prospective new models
like string theory and loop quantum gravity, it certainly doesn't appear
that they will be very big. Wolfram gave an estimate that the universe
could be completely modelled using 5 lines of Mathematica code. Now,
Mathematica has an extensive mathematical library but I doubt that most
of it is used. 5 lines of code times 70 characters per line times 8 bits
per character is about 3E3 bits. Let's assume that the math functions
triple the size and we get about 1E4 bits.
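As arithmetic, that estimate is just:

    lines = 5
    chars_per_line = 70
    bits_per_char = 8
    library_factor = 3      # assume the math library triples the size

    code_bits = lines * chars_per_line * bits_per_char
    print(code_bits, code_bits * library_factor)   # 2800 and 8400: ~3E3 and ~1E4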

We also need to specify the initial conditions, but there again the
information content seems to be low. I quoted Tegmark last week that the
information content of the big bang is "close to zero". I don't know how
to translate this into a specific number of bits but I will assume that it
is substantially less than 1E4 bits and not increase the estimate so far.

So we have it that a program of about 1E4 (ten thousand) bits should be
able to re-create our universe in all its glory, including observers
like ourselves (in fact, exactly like ourselves). That is our estimate
for the size of the first part of the program which will output the
information pattern for a given sequence of observer moments.

For the size of the second part, we are given the output of the first
part, a complete representation of the state of nature at every point and
every instant in the universe, and we need to output a nicely ordered data
structure representing an observer's mental activity as described above.

There are several problems here. One is that the universe is very big,
and potentially has many observers who have many observer-moments. We
need to select a particular starting moment of a particular observer so as
to output the pattern above. A related problem is just identifying where
the observers are in that vast expanse of space-time. Another problem
is that the "natural" way of expressing the universe's state as output
by the first program may not be very similar to what we perceive as the
universe around us. It appears, based on our understanding of physics,
that it is going to be expressed as a theory of what we understand as
the very small, at the Planck scale, 1E-35 meters and 1E-44 seconds.
It is likely, then, that the output of the universe program is going to
be expressed at that scale. From that perspective, the activity of a
neuron is both enormous and insubstantial, effectively just a very large
scale averaging of the much more dynamic activity at this, the natural
scale of physics.

This last problem, the difference in scale, can probably be dealt with
very simply. It may be enough simply to average the state over a large
region in order to get a good, macroscopic (in this context, meaning at
the scale of atoms and molecules) view of what is going on; or it may
be that subatomic particles like electrons and quarks will be clearly
represented even at the Planck scale and we can use them to identify
atoms and molecules at the larger scale.

The next problem is to locate and localize the observer within the entire
framework of spacetime. The way I envision this being done is that
the position of the observer's brain in space and time is hard-coded
into the program. The position can be expressed as a fraction of the
size of the universe, to the required resolution. I would think that
for a program which is going to analyze neural structure, nanometer
resolution is enough to localize the brain. The typical synaptic gap
between neurons is about 10-20 nm. In terms of time, since I sought
microsecond resolution for the neural state, localizing the observer
to microsecond resolution should be adequate.

The size of the universe is unknown, but let us for convenience work with
the size of the visible universe, about 3E10 (30 billion) light years.
This is 3E35 nanometers. It takes about 118 bits to represent a number
of that size, and we have 3 dimensions, so it takes about 350 bits to
fully localize a brain in space the size of the visible universe.

In terms of time, the universe is about 1E10 (10 billion) years old,
which is 3E23 microseconds. 80 bits is enough for that. Putting these
together, 350+80 or about 430 bits will give us a starting point accurate
to a microsecond and a nanometer for producing a description of a brain.
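Those bit counts come straight from taking logarithms of the sizes
involved, for example:

    from math import ceil, log2

    universe_nm = 3e35    # visible universe (~3E10 light years) in nanometers
    universe_us = 3e23    # age of the universe in microseconds

    space_bits = 3 * ceil(log2(universe_nm))  # three coordinates at nm resolution
    time_bits = ceil(log2(universe_us))       # one time coordinate at us resolution
    print(space_bits, time_bits, space_bits + time_bits)    # 354, 78, 432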

The actual analysis software should be straightforward. We need to locate
all the neurons, record their interconnection patterns, and then their
firing rates and activity levels. All of these have relatively simple
physical correlates given that you can analyze matter at an arbitrarily
fine scale. Locating the neurons can be done by tracing their outer
membranes. The interconnection patterns should be determined by the
amount of area they have in common, the number and distribution of
vesicles and receptors in the area, and basic chemistry as to whether
the connection is inhibitory or excitatory. This is a matter of simple
geometry and counting. Likewise, the activity level is a function of
the concentration of various chemicals inside vs outside the neural
membrane and can be calculated very simply.

This level of software is adequate to create the data structure defined
above for completely specifying the neural activity which corresponds
to a given set of observer moments. It amounts to simple counting,
area calculation, and averaging. My guess again is that 10^4 to 10^5
bits is fully adequate to perform these tasks. Adding the < 10^3 bits
needed to localize the observer still keeps it within this range.

Combining the software to create the universe, perhaps 10^4 bits, and the
software to output the observer description, about 10^4 to 10^5 bits,
we get the size proposed above, 10^4 to 10^5 bits for a self-contained
program which will output the observer description in question. On this
basis we can use a number like 1/2^(10^4) as an estimate for the measure
of such a set of observer moments.
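As a final tally, restating the pieces above in Python:

    physics_bits = 1e4             # laws plus initial conditions
    localization_bits = 430        # position and start time of the observer
    analysis_range = (1e4, 1e5)    # neuron-tracing and output software

    for analysis_bits in analysis_range:
        total = physics_bits + localization_bits + analysis_bits
        print(f"program size ~ {total:.1e} bits, measure ~ 1/2^{total:.1e}")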

Hopefully this explanation will clarify how we can apply the UDist
model to calculate the measure of observer moments as well as other
information structures. It also illustrates how far we are from the
scientific knowledge necessary to come up with more precise estimates
for the information content of conscious entities.

Nevertheless, even with the crude level of knowledge available today,
we can make many powerful predictions from this kind of model. One case
described above is the paradox of whether conscious entities exist all
around us due to vibrations in air molecules, which this analysis lets us
reject in a quantitative sense. Hans Moravec in particular has argued
that such entities have a reality equal to our own, which is clearly
wrong. A similar analysis disposes of the long-standing philosophical
debate over whether a clock implements every finite state machine (and
hence every conscious entity). Other puzzles, such as the impact on
measure of replays and duplicates, can also be addressed and solved in
this framework. I have described other predictions and solutions in my
earlier messages on this topic.

Again, I hope that laying out my calculations in this much detail will
help people to see somewhat concretely how the Universal Distribution
works and how measure can be analyzed using actual software engineering
concepts. It makes the UDist much more real as a useful tool for
understanding measure and making predictions.

Hal Finney
Received on Fri Jul 22 2005 - 17:29:29 PDT