A source is an ordered pair $(S, P)$, where $S = \{s_1, \ldots, s_q\}$ is a finite set, known as the source alphabet, and $P$ is a probability distribution over $S$. Denote the probability of $s_i$ by $P(s_i)$ or $p_i$.
Theorem 1.1
Suppose $H(p_1, \ldots, p_n)$ satisfies the following properties:
1. $H$ is defined and continuous for all $p_1, \ldots, p_n$ satisfying $p_i \ge 0$ and $\sum_{i=1}^n p_i = 1$.
2. For $n \ge 1$, $H\left(\frac{1}{n}, \ldots, \frac{1}{n}\right) < H\left(\frac{1}{n+1}, \ldots, \frac{1}{n+1}\right)$.
3. (grouping) $H(p_1, \ldots, p_n) = H(p_1 + \cdots + p_k, p_{k+1}, \ldots, p_n) + (p_1 + \cdots + p_k)\, H\!\left(\frac{p_1}{p_1 + \cdots + p_k}, \ldots, \frac{p_k}{p_1 + \cdots + p_k}\right)$.
A function $H$ satisfies properties 1-3 if and only if it has the form
$$H(p_1, \ldots, p_n) = -C \sum_{i=1}^n p_i \log p_i,$$
where $C > 0$ and $p_i \log p_i = 0$ if $p_i = 0$.
Let $P = (p_1, \ldots, p_n)$ be a probability distribution. Then $H_b(P) = -\sum_{i=1}^n p_i \log_b p_i$ is the $b$-ary entropy of the distribution $P$.
If $P$ is the distribution of a source, $H_b(P)$ is the entropy of the source.
Example 1.1.3
The entropy function of a two-outcome distribution $(p, 1-p)$ is $H(p) = -p \log_2 p - (1 - p) \log_2 (1 - p)$.
Definition: entropy of a random variable
Let $X$ be a random variable with range $\{x_1, \ldots, x_n\}$. If $p_i = P(X = x_i)$, then the entropy of $X$ is defined by $H(X) = -\sum_{i=1}^n p_i \log p_i$.
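A small Python sketch of this definition (the helper name `entropy` and the convention $0 \log 0 = 0$ are my additions):

```python
from math import log

def entropy(probs, b=2):
    """b-ary entropy: -sum of p_i * log_b(p_i), with 0 log 0 taken as 0."""
    return -sum(p * log(p, b) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # 1.0 bit: a fair coin
print(entropy([0.9, 0.1]))   # ~0.469 bits: a biased coin is more predictable
```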
Lemma 1.2.1
For all $x > 0$, $\ln x \le x - 1$, with equality if and only if $x = 1$.
Lemma 1.2.2
Let $(p_1, \ldots, p_n)$ and $(q_1, \ldots, q_n)$ satisfy $p_i \ge 0$, $q_i > 0$, and $\sum p_i = \sum q_i = 1$. Then
$$-\sum_{i=1}^n p_i \log_b p_i \le -\sum_{i=1}^n p_i \log_b q_i,$$
with equality if and only if $p_i = q_i$ for all $i$.
Theorem 1.2.3 (the range of the entropy function)
Let $X$ be a discrete random variable with range of size $n$. Then $0 \le H(X) \le \log n$; $H(X) = 0$ if and only if $X$ takes a single value with probability 1, and $H(X) = \log n$ if and only if $X$ is uniformly distributed.
Theorem 1.2.4
Let $P = (p_1, \ldots, p_n)$ be a probability distribution.
If $p_i = 0$ for some $i$, then $H(p_1, \ldots, p_n) = H(p_1, \ldots, p_{i-1}, p_{i+1}, \ldots, p_n)$; outcomes of probability zero do not affect the entropy.
Theorem 1.2.5
If $X$ and $Y$ are random variables, then $H(X, Y) \le H(X) + H(Y)$. Equality holds if and only if $X$ and $Y$ are independent.
Theorem 1.2.9
The entropy function is convex down (concave): $H(\lambda P + (1 - \lambda) Q) \ge \lambda H(P) + (1 - \lambda) H(Q)$ for probability distributions $P$, $Q$ and $0 \le \lambda \le 1$.
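A quick numeric sanity check of Theorems 1.2.5 and 1.2.9 on random distributions (a sketch; the helper names are mine):

```python
import random
from math import log

def H(probs):
    return -sum(p * log(p, 2) for p in probs if p > 0)

random.seed(1)
for _ in range(1000):
    # Random joint distribution on a 3 x 3 range.
    joint = [[random.random() for _ in range(3)] for _ in range(3)]
    total = sum(map(sum, joint))
    joint = [[p / total for p in row] for row in joint]
    px = [sum(row) for row in joint]          # marginal distribution of X
    py = [sum(col) for col in zip(*joint)]    # marginal distribution of Y
    hxy = H([p for row in joint for p in row])
    assert hxy <= H(px) + H(py) + 1e-9        # Theorem 1.2.5

    # Concavity (Theorem 1.2.9) on two-point distributions.
    p, q, lam = random.random(), random.random(), random.random()
    P, Q = [p, 1 - p], [q, 1 - q]
    mix = [lam * a + (1 - lam) * b for a, b in zip(P, Q)]
    assert H(mix) >= lam * H(P) + (1 - lam) * H(Q) - 1e-9
print("all checks passed")
```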
Definitions:
code alphabet: a finite set $A = \{a_1, \ldots, a_r\}$.
string or word: any finite sequence of elements of $A$.
The set of all strings over $A$ is denoted by $A^*$.
$r$-ary code: a nonempty subset $C$ of $A^*$.
The radix of the code: $r = |A|$.
E.g. $A = \{0, 1\}$ gives binary codes, and $A = \{0, 1, 2\}$ gives ternary codes.
Definition
An encoding scheme for the source $(S, P)$ is an ordered pair $(C, f)$, where $C$ is a code and $f : S \to C$ is an encoding function.
Definition
The average codeword length of an encoding scheme $(C, f)$ for the source $(S, P)$ is $\mathrm{AveLen}(C, f) = \sum_{i=1}^q p_i \,\mathrm{len}(f(s_i))$, where $p_i = P(s_i)$.
Definition
fixed length code / block code: all codewords in the code have the same length.
Otherwise, it is a variable length code.
A fixed length (resp. variable length) encoding scheme is one whose code is a fixed length (resp. variable length) code.
Definition
A code $C$ is uniquely decipherable if whenever $c_1, \ldots, c_m$ and $d_1, \ldots, d_n$ are codewords in $C$ and $c_1 c_2 \cdots c_m = d_1 d_2 \cdots d_n$, then $m = n$ and $c_i = d_i$ for all $i$.
Definition
A code is instantaneous if each codeword in any string of codewords can be decoded as soon as it is received.
If a code is instantaneous, then it is also uniquely decipherable; the converse is not true (e.g. $\{0, 01\}$ is uniquely decipherable but not instantaneous).
Definition
A code has the prefix property if no codeword is a prefix of any other codeword; that is, if whenever $c = x_1 x_2 \cdots x_n$ is a codeword, then $x_1 x_2 \cdots x_k$ is not a codeword for $1 \le k < n$.
Theorem 2.1.1
A code is instantaneous if and only if it has the prefix property.
Theorem 2.1.2 Kraft's Theorem
There exists an instantaneous $r$-ary code with codeword lengths $l_1, \ldots, l_q$ if and only if the lengths satisfy Kraft's inequality
$$\sum_{i=1}^q \frac{1}{r^{l_i}} \le 1.$$
Theorem 2.1.3 McMillan's Theorem
The codeword lengths of a uniquely decipherable $r$-ary code must satisfy Kraft's inequality.
Theorem 2.1.4
If a uniquely decipherable code exists with codeword lengths $l_1, \ldots, l_q$, then an instantaneous code also exists with the same codeword lengths.
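The proof of Kraft's theorem is constructive. Here is a sketch of one standard construction (function names are mine): given lengths satisfying Kraft's inequality, assign codewords in order of increasing length.

```python
def kraft_sum(lengths, r=2):
    """Left-hand side of Kraft's inequality."""
    return sum(r ** -l for l in lengths)

def to_base(n, r, width):
    """Base-r digits of n, padded to `width` digits."""
    digits = []
    for _ in range(width):
        n, d = divmod(n, r)
        digits.append(str(d))
    return "".join(reversed(digits))

def instantaneous_code(lengths, r=2):
    """Build a prefix-free r-ary code with the given codeword lengths,
    assumed to satisfy Kraft's inequality."""
    assert kraft_sum(lengths, r) <= 1
    code, val, prev = [], 0, 0
    for l in sorted(lengths):
        val *= r ** (l - prev)    # scale the counter up to length l
        code.append(to_base(val, r, l))
        val += 1
        prev = l
    return code

print(instantaneous_code([1, 2, 3, 3]))   # ['0', '10', '110', '111']
```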
An optimal $r$-ary encoding scheme for a source is one with minimum average codeword length.
Theorem 2.3.1
Suppose $(C, f)$ is an instantaneous $r$-ary encoding scheme for a source with distribution $P$. Then $\mathrm{AveLen}(C, f) \ge H_r(P)$.
Theorem 2.3.2 The noiseless coding theorem
For any probability distribution $P$, we have $H_r(P) \le \mathrm{MinAveLen}_r(P) < H_r(P) + 1$, where $\mathrm{MinAveLen}_r(P)$ is the minimum average codeword length over all instantaneous $r$-ary encoding schemes for $P$.
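One way to check these bounds numerically is with binary Huffman coding, which is known to attain the minimum average length. A sketch (helper names mine):

```python
import heapq
from math import log

def huffman_lengths(probs):
    """Codeword lengths of an optimal binary (Huffman) code."""
    heap = [(p, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, s1 = heapq.heappop(heap)   # merge the two least likely groups
        p2, s2 = heapq.heappop(heap)
        for i in s1 + s2:
            lengths[i] += 1            # each symbol in the merge gets 1 bit longer
        heapq.heappush(heap, (p1 + p2, s1 + s2))
    return lengths

P = [0.4, 0.2, 0.2, 0.1, 0.1]
L = huffman_lengths(P)
ave = sum(p * l for p, l in zip(P, L))
H = -sum(p * log(p, 2) for p in P)
print(H, ave, H + 1)   # H_2(P) <= MinAveLen_2(P) < H_2(P) + 1
```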
Definition: n-th extension
The $n$-th extension of a source $(S, P)$ is the source $(S^n, P^n)$ whose alphabet is the set of all strings of length $n$ over $S$, with $P^n(s_{i_1} s_{i_2} \cdots s_{i_n}) = p_{i_1} p_{i_2} \cdots p_{i_n}$.
Binary symmetric channel: each transmitted bit is received incorrectly (flipped) with probability $p$ and received correctly with probability $1 - p$.
Binary erasure channel: each transmitted bit is erased (received as $?$) with probability $p$ and received correctly with probability $1 - p$.
Definition
The conditional entropy of $X$ given $Y$ is defined by $H(X \mid Y) = \sum_j P(Y = y_j)\, H(X \mid Y = y_j)$, where $H(X \mid Y = y_j) = -\sum_i P(X = x_i \mid Y = y_j) \log P(X = x_i \mid Y = y_j)$.
Theorem 3.1.1
If $X$ and $Y$ are random variables, then $H(X, Y) = H(Y) + H(X \mid Y) = H(X) + H(Y \mid X)$.
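A worked check of this identity on a small joint distribution (the table values are made up for illustration):

```python
from math import log

def H(probs):
    return -sum(p * log(p, 2) for p in probs if p > 0)

# Hypothetical joint distribution P(X = i, Y = j).
joint = [[0.25, 0.25],
         [0.00, 0.50]]
py = [sum(col) for col in zip(*joint)]       # marginal of Y

# H(X|Y) = sum_j P(Y = y_j) H(X | Y = y_j)
h_x_given_y = sum(pyj * H([row[j] / pyj for row in joint])
                  for j, pyj in enumerate(py) if pyj > 0)

hxy = H([p for row in joint for p in row])   # joint entropy H(X, Y)
print(hxy, H(py) + h_x_given_y)              # equal: both 1.5
```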
channel matrix
In a channel matrix $M = (p_{ij})$, $p_{ij} = P(b_j \text{ received} \mid a_i \text{ sent})$.
Each row corresponds to a single input; each column corresponds to a single output.
lossless: the input is completely determined by the output (one-to-many).
deterministic: the output is completely determined by the input (many-to-one).
noiseless: both lossless and deterministic (one-to-one).
useless: the input and output are independent; knowledge of one tells nothing about the other (many-to-many).
row/column symmetric: each row/column contains the same collection of numbers, possibly in a different order.
symmetric: both row and column symmetric.
Theorem
For a row symmetric channel, $H(Y \mid X)$ does not depend on the input distribution.
Theorem
For a column symmetric channel, a uniform input distribution produces a uniform output distribution.
Definition
The mutual information of $X$ and $Y$ is $I(X; Y) = H(X) - H(X \mid Y)$. The definition is the same for random vectors $\mathbf{X}$ and $\mathbf{Y}$.
Properties
$I(X; Y) = I(Y; X) = H(X) + H(Y) - H(X, Y)$; $I(X; Y) \ge 0$, with equality if and only if $X$ and $Y$ are independent.
Definition
The capacity of a channel is the maximum mutual information $I(X; Y)$, taken over all input distributions of $X$: $C = \max_{P(X)} I(X; Y)$.
Theorem
The capacity of a symmetric channel, with inputs $a_1, \ldots, a_r$ and outputs $b_1, \ldots, b_s$, is $C = \log s - H(R)$, where $H(R)$ is the entropy of any row $R$ of the channel matrix; it is achieved by the uniform input distribution.
capacity:
lossless: $C = \log r$, where $r$ is the number of inputs.
deterministic: $C = \log s$, where $s$ is the number of outputs.
noiseless: $C = \log r$.
useless: $C = 0$.
binary symmetric channel: $C = 1 - H(p)$, where $p$ is the crossover probability and $H(p) = -p \log_2 p - (1 - p) \log_2 (1 - p)$.
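A sketch that brute-forces the capacity definition for a binary symmetric channel and compares it with the closed form $1 - H(p)$ (function names mine):

```python
from math import log

def H(probs):
    return -sum(p * log(p, 2) for p in probs if p > 0)

def mutual_information(px, channel):
    """I(X;Y) = H(Y) - H(Y|X) for a channel matrix P(y_j | x_i)."""
    py = [sum(px[i] * channel[i][j] for i in range(len(px)))
          for j in range(len(channel[0]))]
    h_y_given_x = sum(px[i] * H(channel[i]) for i in range(len(px)))
    return H(py) - h_y_given_x

p = 0.1                               # crossover probability
bsc = [[1 - p, p], [p, 1 - p]]        # channel matrix of the BSC
cap = max(mutual_information([a, 1 - a], bsc)
          for a in (i / 1000 for i in range(1001)))
print(cap, 1 - H([p, 1 - p]))         # both ~0.531; the max is at uniform input
```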
A decision scheme is a partial function from the set of output strings to the set of codewords.
"partial" means the function is only defined for part of output strings.
decision error/decoding error: when the decision $f(y)$ is not the codeword that was sent.
Assume the codeword sent is $c$, and the output string received is $y$.
The probability of a decision error, given that $y$ is received, is $1 - P(f(y) \text{ sent} \mid y \text{ received})$.
Definition
An ideal observer is a decision scheme $f$ for which $f(y)$ satisfies $P(f(y) \text{ sent} \mid y \text{ received}) = \max_{c \in C} P(c \text{ sent} \mid y \text{ received})$ for every output string $y$.
Theorem
For any input distribution, an ideal observer decision scheme minimizes the probability of a decision error among all decision schemes.
Ideal observers depend on the input distribution.
For a uniform input distribution, the ideal observer is a maximum likelihood decision scheme, i.e., it maximizes the forward probability $P(y \text{ received} \mid c \text{ sent})$.
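A tiny example showing that the ideal observer and maximum likelihood decoding can disagree when the input distribution is not uniform (the codewords and priors are hypothetical):

```python
p = 0.1                                  # BSC crossover probability
priors = {"000": 0.95, "111": 0.05}      # hypothetical P(c sent)

def forward(y, c):
    """Forward probability P(y received | c sent) on the BSC."""
    errs = sum(a != b for a, b in zip(y, c))
    return p ** errs * (1 - p) ** (len(y) - errs)

y = "011"
ideal = max(priors, key=lambda c: priors[c] * forward(y, c))  # max P(c | y)
ml = max(priors, key=lambda c: forward(y, c))                 # max P(y | c)
print(ideal, ml)    # '000' vs '111': the two schemes decode differently
```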
Definition
An $r$-ary code of length $n$ and size $M$ is called an $(n, M)$-code.
The rate of the code is $R = \frac{1}{n} \log_r M$.
Theorem: The Noisy Coding Theorem
Consider a discrete memoryless channel with capacity $C$. For any positive number $R < C$, there exists a sequence $C_n$ of $r$-ary $(n, M_n)$-codes, and corresponding decision schemes $f_n$, with the following properties:
1. the rate of $C_n$ is at least $R$;
2. the probability of a decision error for $f_n$ tends to $0$ as $n \to \infty$.
Theorem: Fano's inequality
For any decision scheme for a code with $M$ codewords, and any input distribution, if $p_e$ denotes the probability of a decision error, then $H(X \mid Y) \le H(p_e) + p_e \log(M - 1)$, where $X$ is the codeword sent, $Y$ is the word received, and $H(p_e)$ is the entropy of the distribution $(p_e, 1 - p_e)$.
Theorem: Weak Converse to the Noisy Coding Theorem
Consider a discrete memoryless channel with capacity $C$. Suppose that $C_n$ is a sequence of $(n, M_n)$-codes, with corresponding decision schemes $f_n$, and that the average probability of error of $f_n$ is $p_e^{(n)}$. Then if the rates satisfy $\frac{1}{n} \log M_n \ge R > C$, there exists a constant $\beta > 0$ for which $p_e^{(n)} \ge \beta$ for all sufficiently large $n$.
Theorem: Strong Converse to the Noisy Coding Theorem
Consider a discrete memoryless channel, with capacity $C$. Suppose that $C_n$ is a sequence of $(n, M_n)$-codes, with corresponding decision schemes $f_n$, and that the average probability of error of $f_n$ is $p_e^{(n)}$. Then if the rates satisfy $\frac{1}{n} \log M_n \ge R > C$, we must have $p_e^{(n)} \to 1$ as $n \to \infty$.
Definition
code alphabet: a finite set $A = \{a_1, \ldots, a_q\}$ of size $q$.
$A^n$ is the set of all strings of length $n$ over $A$.
$q$-ary block code $C$: any nonempty subset of $A^n$.
codeword: each string in $C$.
an $(n, M)$-code: a block code of length $n$ and size $M = |C|$.
The rate of a $q$-ary $(n, M)$-code is $R(C) = \frac{1}{n} \log_q M$.
Definition
A discrete memoryless channel consists of an input alphabet $A = \{a_1, \ldots, a_q\}$, an output alphabet $B = \{b_1, \ldots, b_s\}$ containing $A$, and a set of channel probabilities, or transition probabilities, $P(b_j \text{ received} \mid a_i \text{ sent})$, satisfying $\sum_{j=1}^s P(b_j \mid a_i) = 1$ for all $i$.
other definitions:
binary symmetric channel, crossover probability, channel matrix
input distribution
decision scheme, decision error/decoding error, ideal observer, average probability of error, maximum likelihood decision, the Noisy Coding Theorem.
minimum distance decoding (MDD): decode a received word as a codeword closest to it in Hamming distance; on a binary symmetric channel with crossover probability $p < 1/2$, this coincides with maximum likelihood decoding.
An $(n, M, d)$-code is a code with length $n$, size $M$, and minimum distance $d$.
t-error-correcting: any pattern of at most $t$ errors is corrected by MDD.
exactly t-error-correcting: $t$-error-correcting but not $(t+1)$-error-correcting.
Theorem
A code is exactly $t$-error-detecting if and only if $d = t + 1$.
A code is exactly $t$-error-correcting if and only if $d = 2t + 1$ or $d = 2t + 2$.
Equivalently, a code with minimum distance $d$ is exactly $\lfloor (d-1)/2 \rfloor$-error-correcting.
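A brute-force check of a small code's minimum distance and the resulting error-correcting capability (the example code is hypothetical):

```python
from itertools import combinations

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

def min_distance(code):
    return min(hamming(x, y) for x, y in combinations(code, 2))

C = ["00000", "01101", "10110", "11011"]   # a (5, 4, 3)-code
d = min_distance(C)
print(d, (d - 1) // 2)   # d = 3, so C is exactly 1-error-correcting
```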
Definition
An $(n, M, d)$-code is maximal if it is not contained in any $(n, M + 1, d)$-code.
Theorem
Denote $t = \lfloor (d - 1)/2 \rfloor$ and $q = 1 - p$.
For the binary symmetric channel with crossover probability $p < 1/2$, the probability of a decoding error for a maximal $(n, M, d)$-code satisfies
$$\sum_{k=d}^{n} \binom{n}{k} p^k q^{n-k} \;\le\; P_e \;\le\; \sum_{k=t+1}^{n} \binom{n}{k} p^k q^{n-k}.$$
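A sketch evaluating the two binomial tails in this bound (names mine):

```python
from math import comb

def tail(n, lo, p):
    """P(at least `lo` bit errors in n uses of a BSC with crossover p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(lo, n + 1))

n, d, p = 7, 3, 0.05
t = (d - 1) // 2
print(tail(n, d, p))      # lower bound: >= d errors force a decoding error
print(tail(n, t + 1, p))  # upper bound: <= t errors are always corrected
```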
Definition
Let $x$ be a word in $A^n$, where $|A| = q$, and let $\rho$ be a nonnegative real number. The sphere of radius $\rho$ about $x$ is the set $S(x, \rho) = \{ y \in A^n : d(x, y) \le \rho \}$.
Definition
Let $C$ be a code in $A^n$. The packing radius $\mathrm{pr}(C)$ of $C$ is the largest integer $\rho$ for which the spheres $S(c, \rho)$ about each codeword $c$ are disjoint.
The covering radius $\mathrm{cr}(C)$ of $C$ is the smallest integer $\rho$ for which the spheres $S(c, \rho)$ about each codeword cover $A^n$.
Theorem
A code $C$ is $t$-error-correcting if and only if the spheres $S(c, t)$ about each codeword are disjoint, if and only if $t \le \mathrm{pr}(C)$.
Definition
A code $C$ is perfect if $\mathrm{pr}(C) = \mathrm{cr}(C)$.
Equivalently, there exists a number $\rho$ for which the spheres $S(c, \rho)$ about each codeword are disjoint and cover $A^n$.
Theorem: The sphere-packing condition
Let $C$ be a $q$-ary $(n, M, d)$-code. Then $C$ is perfect if and only if $d$ is odd and
$$M \sum_{k=0}^{(d-1)/2} \binom{n}{k} (q - 1)^k = q^n.$$
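A sketch of the sphere-packing condition as a predicate; it confirms, for instance, the parameters of the binary Hamming and Golay codes (function names mine):

```python
from math import comb

def sphere_size(n, q, rho):
    """Number of q-ary words within Hamming distance rho of a fixed word."""
    return sum(comb(n, k) * (q - 1) ** k for k in range(rho + 1))

def is_perfect(n, M, d, q=2):
    """Sphere-packing condition: d odd and M * |S(x, (d-1)/2)| = q^n."""
    return d % 2 == 1 and M * sphere_size(n, q, (d - 1) // 2) == q ** n

print(is_perfect(7, 16, 3))      # True: the binary [7,4,3] Hamming code
print(is_perfect(23, 4096, 7))   # True: the binary [23,12,7] Golay code
print(is_perfect(7, 16, 4))      # False: even minimum distance
```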
Definition
A $q$-ary $(n, M)$-code is systematic if there are $k$ positions $i_1, \ldots, i_k$ with the property that, by restricting the codewords to these positions, we get all of the $q^k$ possible $q$-ary words of length $k$.
The set $\{i_1, \ldots, i_k\}$ is called an information set, and the corresponding codeword symbols are information symbols.
Theorem
If $p$ is a prime number and $n$ is a positive integer, there is (up to isomorphism) exactly one field of size $q = p^n$, denoted by $GF(q)$ or $\mathbb{F}_q$.
This extends to the vector space of all $n$-tuples over $\mathbb{F}_q$, denoted by $V(n, q) = \mathbb{F}_q^n$.
Linear codes
A code $C$ is a linear code if it is a subspace of $V(n, q)$.
If $C$ has dimension $k$ over $\mathbb{F}_q$ and minimum distance $d$, $C$ is an $[n, k, d]$-code.
Definition
The weight $w(x)$ of a word $x$ is the number of nonzero positions in $x$. The minimum weight $w(C)$ of a code $C$ is the minimum weight among all nonzero codewords in $C$.
For a linear code $C$, $d(C) = w(C)$.
A generator matrix $G$ for a linear $[n, k]$-code $C$ is a $k \times n$ matrix whose rows form a basis for $C$. The codewords in $C$ are the linear combinations of the rows of $G$.
Standard form: $G = [I_k \mid A]$, where $I_k$ is the identity matrix of size $k$.
A generator matrix in standard form is systematic on its first $k$ coordinate positions.
Definition
Let $C$ be an $[n, k]$-code. The set $C^\perp = \{ v \in V(n, q) : v \cdot c = 0 \text{ for all } c \in C \}$ is called the dual code of $C$.
Theorem
$C^\perp$ is a linear $[n, n - k]$-code, and $(C^\perp)^\perp = C$.
Definition
parity check matrix: a matrix $H$ that is a generator matrix for the dual code $C^\perp$; for an $[n, k]$-code $C$, $H$ is an $(n - k) \times n$ matrix.
$x \in C$ if and only if $H x^T = 0$.
The rows of $H$ are the coefficients of a system of linear equations (the parity check equations) whose solutions are precisely the codewords in $C$.
Theorem
Let $C$ be an $[n, k, d]$-code, with parity check matrix $H$. Then every set of $d - 1$ columns of $H$ is linearly independent, and some set of $d$ columns is linearly dependent.
Theorem: Gilbert-Varshamov Bound
There exists a $q$-ary linear $[n, k]$-code with minimum distance at least $d$ provided that
$$\sum_{i=0}^{d-2} \binom{n-1}{i} (q - 1)^i < q^{n-k}.$$
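The bound is easy to evaluate; a sketch (names mine):

```python
from math import comb

def gv_exists(n, k, d, q=2):
    """Gilbert-Varshamov condition for a q-ary linear [n, k] code with
    minimum distance at least d."""
    return sum(comb(n - 1, i) * (q - 1) ** i for i in range(d - 1)) < q ** (n - k)

print(gv_exists(7, 4, 3))   # True: consistent with the [7,4,3] Hamming code
print(gv_exists(7, 5, 3))   # False: the bound makes no promise here
```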
Definition
Let $C$ be an $[n, k]$-code, with parity check matrix $H$. For any $x \in V(n, q)$, $s(x) = H x^T$ is the syndrome of $x$.
The quotient space of $V(n, q)$ modulo $C$ is $V(n, q)/C = \{ x + C : x \in V(n, q) \}$.
Theorem
Let $C$ be an $[n, k]$-code, with parity check matrix $H$.
$u$ and $v$ in $V(n, q)$ have the same syndrome if and only if they are in the same coset of the quotient space $V(n, q)/C$.
Theorem
Let $C$ be an $[n, k]$-code, with parity check matrix $H$.
Minimum distance decoding is equivalent to decoding a received word $x$ as the codeword $x - e$, where $e$ is a word of smallest weight in the coset $x + C$, i.e., a word of smallest weight with the same syndrome as $x$.
standard array
$0$ | $c_2$ | $c_3$ | $\cdots$ | $c_M$
---|---|---|---|---
$e_2$ | $e_2 + c_2$ | $e_2 + c_3$ | $\cdots$ | $e_2 + c_M$
$\vdots$ | $\vdots$ | $\vdots$ | | $\vdots$

The first row lists the codewords in $C$. The $i$-th row is formed by choosing a word $e_i$ of smallest weight that is not yet in the array, and adding it to each word of the first row, to form the coset $e_i + C$. The words in the first column are the coset leaders.
syndrome decoding: if $x$ is received, compute its syndrome $s(x)$, find the coset leader $e$ with the same syndrome, and decode $x$ as $x - e$.
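A sketch of syndrome decoding for the binary $[7,4,3]$ Hamming code (one common choice of parity check matrix; helper names mine):

```python
from itertools import combinations

# Parity check matrix of the binary [7,4,3] Hamming code.
H = [[1, 0, 1, 0, 1, 0, 1],
     [0, 1, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]

def syndrome(x):
    return tuple(sum(h * xi for h, xi in zip(row, x)) % 2 for row in H)

# Map each syndrome to a coset leader (a smallest-weight word with that
# syndrome), enumerating words in order of increasing weight.
leaders = {}
for wt in range(8):
    for pos in combinations(range(7), wt):
        e = [1 if i in pos else 0 for i in range(7)]
        leaders.setdefault(syndrome(e), e)

def decode(x):
    e = leaders[syndrome(x)]                     # coset leader of x + C
    return [(a - b) % 2 for a, b in zip(x, e)]   # decode x as x - e

sent = [1, 0, 1, 0, 1, 0, 1]      # a codeword: its syndrome is (0, 0, 0)
recv = sent[:]
recv[4] ^= 1                      # one transmission error
print(decode(recv) == sent)       # True: a single error is corrected
```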
Definition
A linear $[n, k, d]$-code must have $d \le n - k + 1$ (the Singleton bound).
A linear $[n, k, d]$-code with $d = n - k + 1$ is called a maximum distance separable (MDS) code.
Theorem
$C$ is an MDS code if and only if any $n - k$ columns of the parity check matrix $H$ are linearly independent.
If a code is MDS, then so is its dual code.
Theorem
$C$ is MDS if and only if any $k$ columns of the generator matrix $G$ are linearly independent.
An MDS code is systematic on any $k$ positions.
Theorem
Let $C$ be an $[n, k]$-code with generator matrix $G = [I_k \mid A]$ in standard form.
$C$ is an MDS code if and only if every square submatrix of $A$ is nonsingular.
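A brute-force version of this criterion over a prime field (the $[4, 2]$ example over $GF(5)$ is hypothetical; names mine):

```python
from itertools import combinations

def det_mod(M, q):
    """Determinant mod q by cofactor expansion (fine for tiny matrices)."""
    if len(M) == 1:
        return M[0][0] % q
    return sum((-1) ** j * M[0][j] *
               det_mod([row[:j] + row[j + 1:] for row in M[1:]], q)
               for j in range(len(M))) % q

def is_mds_standard_form(A, q):
    """Check that every square submatrix of A is nonsingular mod q,
    for a code with generator matrix [I | A] over GF(q), q prime."""
    m, n = len(A), len(A[0])
    for size in range(1, min(m, n) + 1):
        for rows in combinations(range(m), size):
            for cols in combinations(range(n), size):
                sub = [[A[i][j] for j in cols] for i in rows]
                if det_mod(sub, q) == 0:
                    return False
    return True

A = [[1, 1],
     [1, 2]]
print(is_mds_standard_form(A, 5))   # True: [I | A] generates a [4,2,3] MDS code
```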
The support of a vector $v$ is the set of all coordinate positions where $v$ is nonzero.