The past few weeks, I've been working on a web page that looks at concepts
like entropy, uncertainty, and information theory. It started out as a simple
definition of entropy, but grew so
much that I split most of the material off into a separate handout on
Information Theory Models.
In the process of looking for web resources, I found a fascinating book,
Information Theory, Inference, and Learning Algorithms by David J.C.
MacKay (ISBN: 0521642981). The authors has placed the
full text
of this book in pdf format (all 640 pages) on the web. You are not
allowed to print the pdf file, but it does allow you to browse this book. It
is a rather eclectic mix of topics. Here's the table of contents:
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. v
1 Introduction to Information Theory . . . . . . . . . . . . . . . . 3
2 Probability, Entropy, and Inference . . . . . . . . . . . . . . . . . 22
3 More about Inference . . . . . . . . . . . . . . . . . . . . . . . . 48
I Data Compression . . . . . . . . . . . . . . . . . . . . . . . . 65
4 The Source Coding Theorem . . . . . . . . . . . . . . . . . . . . 67
5 Symbol Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
6 Stream Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
7 Codes for Integers . . . . . . . . . . . . . . . . . . . . . . . . . . 132
II Noisy-Channel Coding . . . . . . . . . . . . . . . . . . . . . .
137
8 Correlated Random Variables . . . . . . . . . . . . . . . . . . . . 138
9 Communication over a Noisy Channel . . . . . . . . . . . . . . . 146
10 The Noisy-Channel Coding Theorem . . . . . . . . . . . . . . . . 162
11 Error-Correcting Codes and Real Channels . . . . . . . . . . . . 177
III Further Topics in Information Theory . . . . . . . . . . . . .
191
12 Hash Codes: Codes for Efficient Information Retrieval . . . . . 193
13 Binary Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
14 Very Good Linear Codes Exist . . . . . . . . . . . . . . . . . . . 229
15 Further Exercises on Information Theory . . . . . . . . . . . . . 233
16 Message Passing . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
17 Communication over Constrained Noiseless Channels . . . . . . 248
18 Crosswords and Codebreaking . . . . . . . . . . . . . . . . . . . 260
19 Why have Sex? Information Acquisition and Evolution . . . . . 269
IV Probabilities and Inference . . . . . . . . . . . . . . . . . . .
281
20 An Example Inference Task: Clustering . . . . . . . . . . . . . . 284
21 Exact Inference by Complete Enumeration . . . . . . . . . . . . 293
22 Maximum Likelihood and Clustering . . . . . . . . . . . . . . . . 300
23 Useful Probability Distributions . . . . . . . . . . . . . . . . . . 311
24 Exact Marginalization . . . . . . . . . . . . . . . . . . . . . . . . 319
25 Exact Marginalization in Trellises . . . . . . . . . . . . . . . . . 324
26 Exact Marginalization in Graphs . . . . . . . . . . . . . . . . . . 334
27 Laplace's Method . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
28 Model Comparison and Occam's Razor . . . . . . . . . . . . . . 343
29 Monte Carlo Methods . . . . . . . . . . . . . . . . . . . . . . . . 357
30 E cient Monte Carlo Methods . . . . . . . . . . . . . . . . . . . 387
31 Ising Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 400
32 Exact Monte Carlo Sampling . . . . . . . . . . . . . . . . . . . . 413
33 Variational Methods . . . . . . . . . . . . . . . . . . . . . . . . . 422
34 Independent Component Analysis and Latent Variable Modelling
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 437
35 Random Inference Topics . . . . . . . . . . . . . . . . . . . . . . 445
36 Decision Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . 451
37 Bayesian Inference and Sampling Theory . . . . . . . . . . . . . 457
V Neural networks . . . . . . . . . . . . . . . . . . . . . . . . .
467
38 Introduction to Neural Networks . . . . . . . . . . . . . . . . . . 468
39 The Single Neuron as a Classiffier . . . . . . . . . . . . . . . . . . 471
40 Capacity of a Single Neuron . . . . . . . . . . . . . . . . . . . . . 483
41 Learning as Inference . . . . . . . . . . . . . . . . . . . . . . . . 492
42 Hopfield Networks . . . . . . . . . . . . . . . . . . . . . . . . . . 505
43 Boltzmann Machines . . . . . . . . . . . . . . . . . . . . . . . . . 522
44 Supervised Learning in Multilayer Networks . . . . . . . . . . . . 527
45 Gaussian Processes . . . . . . . . . . . . . . . . . . . . . . . . . 535
46 Deconvolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 549
VI Sparse Graph Codes . . . . . . . . . . . . . . . . . . . . . . 555
47 Low-Density Parity-Check Codes . . . . . . . . . . . . . . . . . 557
48 Convolutional Codes and Turbo Codes . . . . . . . . . . . . . . . 574
49 Repeat{Accumulate Codes . . . . . . . . . . . . . . . . . . . . . 582
50 Digital Fountain Codes . . . . . . . . . . . . . . . . . . . . . . . 589
VII Appendices . . . . . . . . . . . . . . . . . . . . . . . . . . .
. 597
A Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
598
B Some Physics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 601
C Some Mathematics . . . . . . . . . . . . . . . . . . . . . . . . . . 605
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
613
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
620
There are also some delightfully fun exercises. Here are three examples.
You have a two-pan balance; your job is to weigh out bags of our with
integer weights 1 to 40 pounds inclusive. How many weights do you need? [You
are allowed to put weights on either pan. You're only allowed to put one our
bag on the balance at a time.]
Pattern recognition by molecules. Some proteins produced in a cell have
a regulatory role. A regulatory protein controls the transcription of
specific genes in the genome. This control often involves the protein's
binding to a particular DNA sequence in the vicinity of the regulated gene.
The presence of the bound protein either promotes or inhibits transcription
of the gene. (a) Use information-theoretic arguments to obtain a lower bound
on the size of a typical protein that acts as a regulator specific to one
gene in the whole human genome. Assume that the genome is a sequence of 3 109
nucleotides drawn from a four letter alphabet A; C; G; T; a protein is a
sequence of amino acids drawn from a twenty letter alphabet. [Hint: establish
how long the recognized DNA sequence has to be in order for that sequence to
be unique to the vicinity of one gene, treating the rest of the genome as a
random sequence. Then discuss how big the protein must be to recognize a
sequence of that length uniquely.]
In a magic trick, there are three participants: the magician, an
assistant, and a volunteer. The assistant, who claims to have paranormal
abilities, is in a soundproof room. The magician gives the volunteer six
blank cards, five white and one blue. The volunteer writes a different
integer from 1 to 100 on each card, as the magician is watching. The
volunteer keeps the blue card. The magician arranges the five white cards in
some order and passes them to the assistant. The assistant then announces the
number on the blue card. How does the trick work?
Estimate in bits the total sensory experience that you have had in your
life, visual information, auditory information, etc. Estimate how much
information you have memorized. Estimate the information content of the works
of Shakespeare. Compare these with the capacity of your brain assuming you
have 1011 neurons each making 1000 synaptic connections, and that the
capacity result for one neuron (two bits per connection) applies. Is your
brain full yet?
I enjoyed browsing through the book over the past few days and I might have
to buy it, just to support someone bold enough to put the full book out on
the web for curious people like me.