Thursday, February 18, 2010

Barnes Wall Lattices 2 - A detour via RM codes and Soft Decision Decoding

Well Hello Again,

Today we will again discuss the ongoing topic in a little more detail. But this time, I would like to take a detour.

I will return to the paper later, but first we need to get some things out of the way. (Actually, it's intended more for me to get these concerns out of my own head - oh well.)

So, the paper talks about creating an algorithm that finds the lattice point closest to a given target point. To this end, the previous post took you through the algorithm ParBW which relied on an algorithm called SeqBW which in turn relied on a Soft Decision Decoding algorithm for Reed Muller Codes.

Naturally, we will need to walk through these to understand the material, but before that a number of remarks and concepts need to be understood. You probably know them already, but since I am learning them as I read the paper, I will post about them anyway.

So, let me begin by asking what Soft Decision Decoding (henceforth SDD) is and what it is good for. When I first heard of SDD, I wondered what in Merlin's name this decoding procedure was, as all I had seen till then was a majority-vote method. (You determine whether a received bit in a codeword was $1$ or $0$ by a bunch of parity-check equations; if most of them said it was $1$ (resp. $0$), you accepted that as the bit's value.) Now prepare yourself for a slurry of definitions. We will return to the mathematical part soon enough (depending on how you define soon).

If you already know these concepts, you can skip directly to the second row of stars.

******************************************************************************

But here I had a clear-cut indication that there was something else brewing in the mathematical universe. All I knew till that point was what you call a Hard Decision Decoding algorithm (henceforth HDD). So, what was the difference between these decoding methods? HDD, it turned out, was the method you used over Binary Symmetric Channels, or put more simply, in those settings where you knew that the word you receive was going to contain only $1$'s and $0$'s.

On the other hand, SDD is a bit more realistic. Since modern communication techniques involve a fair amount of analog machinery as well, somewhere you convert the digital information to analog and then reconvert it at the receiver's end. This reconversion needs to be looked into carefully, as we are interested in recovering from communication errors. Thus, you need to amend your decoding process so that it picks up from the point where you have just received the "raw" message, which has not yet been digitized. You now have to account for a received word which is not all $0$'s and $1$'s.

On a side note, the channels on which you receive these are called Additive White Gaussian Noise channels (AWGN channels). Now, as you can see, your decoding process just got more complex. But it will also be more accurate, and we would certainly like to use it whenever it does not tax us too badly in terms of decoding time.

Turns out, with Reed-Muller codes, which have a very lovely structure and description, you can indeed reap some benefits by using SDD.

In the rest of the post, I will describe soft decision decoding of Reed-Muller codes, which you can also find in the paper. To save typing effort, I would ask you to kindly look at page 6 of the paper. Okay, you might point out that this function, $RMDec^{\psi}$, is being called from $SeqBW$, an algorithm I have not talked about. In retrospect, after having read this paper, it seems instructive to go through this decoding algorithm first, which is what we do next (basically, the idea behind my taking this approach is to bring your knowledge of the topic to the level where you may find the authors' intuition accessible).

******************************************************************************

Hmm... so, soft decoding. First, we ask where those painful non-bit quantities are. Look at the algorithm's input - the vector $t$ is what includes the non-bit quantities. The vector $t$ has $N = 2^n$ entries: an $N$-bit vector $b$ interleaved with an $N$-tuple $\rho$ of reliabilities, "non-bit entries" all of which lie between $0$ and $1$.

Basically, INPUT: $t_j \in \{0,1\} \times [0,1]$ for all $j = 1, \ldots, N$ (where $N = 2^n$).
And the desired OUTPUT is the decoded codeword.

Now our idea is pretty simple. In case we have an $RM(r, n)$ code with $r = 0$, we take a majority vote. In case it is an $RM(r, n)$ code with $r = n$, we just return the vector of $b$'s. And in case it is neither of these, well - take a guess - we recurse.

(Why shouldn't we? After all, Reed-Muller codes offer such a nice recursive construction that recursion is only a natural candidate.)

So, look again at the problem that we want to solve using our recursive scheme. It involves decoding in $RM(r-1, n-1)$ and $RM(r, n-1)$, which together help you decode in $RM(r, n)$.

[Here is why. You can treat a codeword of $RM(r, n)$ as the evaluation table of a polynomial $f$ of degree at most $r$ (for instance $f = x_1 \cdot x_2 \cdots x_r$). You can write $f = x_1 \cdot f_1 + f_2$, where $f_1$ has degree at most $r-1$ in the remaining variables (so it corresponds to $RM(r-1, n-1)$) and $f_2$ has degree at most $r$ in the remaining variables (so it corresponds to $RM(r, n-1)$). Thus, the $x_1 \cdot f_1$ part necessarily includes $x_1$, while $f_2$ necessarily misses it. The sum has an "$f_2$ zone" that is free from interference from $f_1$: on the half of the coordinates where $x_1 = 0$, the term $x_1 \cdot f_1$ vanishes and you see $f_2$ alone.]
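(If you like to see such things in code, here is a tiny sketch of mine - not from the paper - that builds RM generator matrices by exactly this split and checks the "$f_2$ zone" claim; the function name is made up.)

```python
import numpy as np

def rm_generator(r, n):
    """Generator matrix of RM(r, n) over GF(2), built by the recursion that
    mirrors the split f = x_1*f_1 + f_2 described above (a sketch, not the paper's code)."""
    if r == 0:
        return np.ones((1, 2 ** n), dtype=int)   # evaluations of the constant polynomial 1
    if r == n:
        return np.eye(2 ** n, dtype=int)         # RM(n, n) is all of {0,1}^(2^n)
    f2_gen = rm_generator(r, n - 1)       # the f_2 part: does not involve x_1
    f1_gen = rm_generator(r - 1, n - 1)   # the f_1 part: gets multiplied by x_1
    top = np.hstack([f2_gen, f2_gen])                  # f_2 looks the same on both halves
    bot = np.hstack([np.zeros_like(f1_gen), f1_gen])   # x_1*f_1 vanishes where x_1 = 0
    return np.vstack([top, bot])

# The "f_2 zone": XORing the two halves of any RM(r, n) codeword cancels f_2
# and leaves behind the evaluations of f_1, a codeword of RM(r-1, n-1).
c = rm_generator(2, 4).sum(axis=0) % 2   # some codeword of RM(2, 4)
print(c[:8] ^ c[8:])                     # a codeword of RM(1, 3)
```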


So, our algorithm involves finding a vector $t^+$ and another vector $t^-$, each half the size of the original vector $t$. The $j^{th}$ entry of $t^+$ is decided by first splitting $t$ into two halves, $t^0$ and $t^1$. You XOR the $j^{th}$ bits of the two halves and pair the result with the minimum of the two corresponding reliability entries: $t_j^+ \leftarrow (b_j^0 \oplus b_j^1,\ \min(\rho_j^0, \rho_j^1))$.

You compute the $j^{th}$ entry of the vector $t^-$ using essentially the same idea but different equations.
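To fix ideas, here is a minimal Python skeleton of the recursion as I have described it so far. The name rm_sdd_sketch and the interface are mine, not the paper's, and the $t^-$ branch is deliberately left as a placeholder since I have not spelled it out above.

```python
def rm_sdd_sketch(t, r, n):
    """Skeleton of the recursive soft-decision decoder as described in this post.

    t is a list of (bit, reliability) pairs of length 2**n, with bit in {0, 1}
    and reliability in [0, 1].  Only the pieces spelled out above are filled in;
    the t^- branch and the recombination follow RMDec^psi in the paper.
    """
    N = len(t)
    assert N == 2 ** n

    if r == 0:
        # RM(0, n) is the repetition code: take a majority vote over the bits
        # (a soft variant would weight the vote by the reliabilities).
        ones = sum(b for b, _ in t)
        return [1 if ones > N // 2 else 0] * N

    if r == n:
        # RM(n, n) is all of {0,1}^N: just return the vector of b's.
        return [b for b, _ in t]

    # Split t into two halves and form t^+ exactly as described above:
    # XOR the bits, keep the smaller of the two reliabilities.
    t0, t1 = t[:N // 2], t[N // 2:]
    t_plus = [(b0 ^ b1, min(r0, r1)) for (b0, r0), (b1, r1) in zip(t0, t1)]

    v = rm_sdd_sketch(t_plus, r - 1, n - 1)   # the RM(r-1, n-1) part

    # Forming t^- (and combining it with v) uses "the same idea but different
    # equations" -- see the paper for the exact rule.
    raise NotImplementedError("t^- branch and recombination: see the paper")
```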

The idea behind these calculations, as revealed in reference 10 cited in the paper, is that the components of the received word are assumed to be statistically independent of each other. The received word consists of the codeword plus an error. For a memoryless channel, the errors in different components of the received word are statistically independent of each other. Different components of one codeword, however, are in general not independent of each other. Therefore, this assumption is only an approximation.

Why this approximation is good enough, and why we should not try some other whacky rule for calculating $t_j^+$, as of now, beats me.

For instance, why do we not set $t_j^+ \leftarrow (b_j^0 \oplus b_{(j+1) \bmod N/2}^1,\ \min(\rho_j^0,\rho_j^1))$?

Am working on it. Will write it up as soon as some more clarity dawns.

Till then have a good time

-Akash

Wednesday, February 3, 2010

Barnes Wall Lattices - I

Hello again Everybody,

Today we discuss this paper by Daniele Micciancio and Antonio Nicolosi.

Grab your own copy, though; it will be helpful. Assuming that you have a copy in hand now, you can follow this post. I will be discussing their (the authors') work in relaxed chunks, explaining things in a little more detail where I think it should be helpful.

You might notice that this post is labeled post I. There will be follow-up posts as I understand more about the topic.

So, time to start. Here is what a Barnes-Wall lattice is (essentially copied from the paper). For any positive integer $n$, $BW^n$ is the $N = 2^n$-dimensional lattice over $G$ (the Gaussian integers) generated by the rows of the $n$-fold Kronecker product

$BW^n = \begin{pmatrix}1&1\\0&\phi\end{pmatrix}^{\otimes n}$

Here, the symbol $\phi$ is the Gaussian integer $i + 1$, and $\otimes$ denotes the Kronecker product (the paper's notation differs slightly).

Note that $\phi$ is the prime of least squared norm in $G$.

You can now go through the paper up to the point where the authors note that $BW^{n+1} = \{[u,\ u+\phi \cdot v]:\ u,v \in BW^n\}$.
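If you want to play with this numerically, here is a small numpy sketch of mine that builds the generator as the $n$-fold Kronecker power; the block structure of the Kronecker product makes the $[u,\ u+\phi \cdot v]$ recursion visible.

```python
import numpy as np

phi = 1 + 1j                          # the Gaussian integer phi = 1 + i
M = np.array([[1, 1], [0, phi]])      # the 2x2 matrix from the definition

def bw_generator(n):
    """Rows of the n-fold Kronecker power of M generate BW^n over the Gaussian integers."""
    G = np.array([[1.0 + 0j]])
    for _ in range(n):
        G = np.kron(M, G)
    return G

# The block structure makes the recursion visible: kron(M, G) = [[G, G], [0, phi*G]],
# so the rows are [u, u] and [0, phi*v], and Gaussian-integer combinations of them
# give exactly the vectors [u, u + phi*v] with u, v in BW^{n-1}.
print(bw_generator(2))
```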

Now you are invited to have a look at the algorithm presented in the paper for parallel Bounded Distance Decoding - that is - at the algorithm ParBW.

Next, I will prove that the algorithm works correctly. It is here that you might want to see the relaxed exposition in this post. To simplify the situation, first we will make a few simple observations following the authors.

But before that, we restate the problem that the authors are going to address: given a vector $s \in C^n$ which is within distance $d_{\min}/2$ of some lattice point, find the lattice point $z$ closest to it.

(1) Any vector $z \in BW^n$ can be written as $[z_0, z_1]$ where $z_0$ and $z_1$ belong to $BW^{n-1}$.

(2) The converse of (1) is not true. Huh... you did not really think it was that easy. In fact, if $z_0$ and $z_1$ belong to $BW^{n-1}$, it is $[z_0,\ z_0 + \phi \cdot z_1]$ that belongs to $BW^n$.

(3) The squared $\ell_2$-norm of $[s_0, s_1]$ splits: $\|[s_0, s_1]\|^2 = \|s_0\|^2 + \|s_1\|^2$. That's true no matter where you split the vector $s$. Further, we split $s$ down the middle so that both $s_0$ and $s_1$ (and also $z_0$ and $z_1$ above) are in $C^{n-1}$ (resp. $BW^{n-1}$) if, to begin with, $s$ was in $C^n$ (resp. $z$ was in $BW^n$).

(4) There is an automorphism, that is, a distance-preserving linear transform, that maps points of $BW^n$ to other points of $BW^n$. The authors take $T: [z_0, z_1] \mapsto \phi/2 \cdot [z_0 - z_1,\ z_0 + z_1]$.

A better way to visualize this, as Chris told me, is to write down the corresponding matrix and check that it is unimodular (its determinant is a unit in the corresponding domain). You can see that the transformation is captured by the following matrix, which acts on the column vector $[z_0, z_1]$ from the left and produces the column vector on the right-hand side above.

\[\phi/2 \cdot \begin{pmatrix}1&-1\\1&\ 1\end{pmatrix} \] as you might check.

The determinant of this matrix is $2 \cdot \phi^2/4 = i$. Indeed, it's unimodular.
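Here is a quick numerical sanity check of mine for both claims - the determinant is the unit $i$, and the map preserves distances (the matrix is unitary):

```python
import numpy as np

phi = 1 + 1j
T = (phi / 2) * np.array([[1, -1], [1, 1]])

print(np.linalg.det(T))                          # approximately 1j, i.e. the unit i
print(np.allclose(T.conj().T @ T, np.eye(2)))    # True: T is unitary, hence distance preserving
```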

Just for clarification's sake, we write $z_- = \phi/2 \cdot (z_0 - z_1)$ and $z_+ = \phi/2 \cdot (z_0 + z_1)$. Similarly we define $s_-$ and $s_+$.

(5) Now, this is a pretty important observation: given the image of a point $z$ under any of the transformations we list below, it is possible to recover $z$.

So, consider the following transformation, which is not distance preserving: the one that maps $[z_0, z_1]$ to $[z_0,\ \phi/2 \cdot (z_0+z_1)] = [z_0, z_+]$. It is captured by the matrix
\[P = \begin{pmatrix}1&0\\\phi/2&\phi/2\end{pmatrix} \]
and you can come up with the corresponding inverse matrix $P^{-1}$, which is really lovely.

It is precisely the matrix that does the transformation you see in line 9 of the function ParBW.

Looking ahead, a quick calculation reveals that
\[P^{-1} = \begin{pmatrix}1&0\\-1&2/\phi\end{pmatrix} \]

So, if you apply $P$ to some lattice point in $BW^n$ and then follow it with $P^{-1}$, you recover the point. The same holds for the other transformations.
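And a one-line check of mine that $P^{-1}$ really undoes $P$:

```python
import numpy as np

phi = 1 + 1j
P     = np.array([[1, 0], [phi / 2, phi / 2]])
P_inv = np.array([[1, 0], [-1, 2 / phi]])

print(np.allclose(P @ P_inv, np.eye(2)))   # True: P^{-1} undoes P
```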

With these remarks in place, the algorithm is straightforward to analyze (assuming, of course, that SeqBW works correctly). We will talk about SeqBW in a later post.

Just follow the algorithm, and recurse when the authors say to. In case you are following the paper (you definitely should), notice that there is a minor slip when the authors say "The algorithm recursively computes four $N/2$-dimensional vectors $z_*$ (for the symbol $* \in \{0,1,+,-\}$) with the property that if $s_*$ is within squared distance $N/2$ from $BW^{n-1}$, then $z_* = \bar{z_*}$"

The statement should end in  "...squared distance $N/8$ from $BW^{n-1}$, then $z_* = \bar{z_*}$".

Anyway, now the idea is pretty clear. We calculated $s_-$ and $s_+$ as fail-safe guarantees (in a sense that will become clear soon enough).

The transformation taking $[s_0, s_1]$ to $[s_-, s_+]$ is distance preserving, so the shortest distance of $s$ from a lattice point is the same as the shortest distance of $[s_-, s_+]$ from the transformed lattice point (which is again a lattice point in the original lattice!).

Thus, by observation (4) above, we are cruising smoothly. If $s$ is within the decoding radius, then at least one among $s_0, s_1$ and at least one among $s_-, s_+$ is close to $BW^{n-1}$ (within squared distance $N/8$, matching the corrected statement above).

By applying the inverse transforms we learned of in (5), we can recover four candidate close points. The closest one to $s$ among these is the answer, since we assumed that $s$ was within the decoding radius of some lattice point.
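Putting the pieces of this post together, here is a rough Python sketch of that final recombination step. This is my reading of the description above, not the paper's pseudocode: the names par_bw_sketch and decode_half are made up, and the four lifts simply invert the definitions $z_- = \phi/2 \cdot (z_0 - z_1)$ and $z_+ = \phi/2 \cdot (z_0 + z_1)$.

```python
import numpy as np

phi = 1 + 1j

def par_bw_sketch(s, decode_half):
    """Sketch of the ParBW recombination described above (my reading, not the paper's code).

    s is a complex numpy vector of length N (a power of two >= 2); decode_half
    stands in for the recursive bounded-distance decoder on length-N/2 targets.
    """
    half = len(s) // 2
    s0, s1 = s[:half], s[half:]
    s_minus = (phi / 2) * (s0 - s1)        # s_- as defined in the post
    s_plus  = (phi / 2) * (s0 + s1)        # s_+

    # Recursively decode the four half-length targets in BW^{n-1}.
    z0, z1 = decode_half(s0), decode_half(s1)
    zm, zp = decode_half(s_minus), decode_half(s_plus)

    # Undo the transforms: from any pair mixing {z0, z1} with {zm, zp} we can
    # reconstruct a full-length candidate, using z_- = phi/2*(z_0 - z_1) and
    # z_+ = phi/2*(z_0 + z_1).
    candidates = [
        np.concatenate([z0, (2 / phi) * zp - z0]),
        np.concatenate([z0, z0 - (2 / phi) * zm]),
        np.concatenate([(2 / phi) * zp - z1, z1]),
        np.concatenate([z1 + (2 / phi) * zm, z1]),
    ]

    # One of {s0, s1} and one of {s_-, s_+} is within the half-size decoding
    # radius, so at least one candidate is the closest lattice point to s.
    dists = [np.linalg.norm(s - c) for c in candidates]
    return candidates[int(np.argmin(dists))]
```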

This hopefully covers the function ParBW clearly. Allow me to catch my breath; I will blog about the remaining portions soon.

Thanks and Have a Good Day
-Akash

Welcome to a Coding Theory Blog

Hello again everyone,

I am starting a new blog. This one will discuss the ideas in coding theory as I start working through the topic.

I am pretty new to this area, but I am pretty excited about it. Many scientists have contributed to this fascinating field. Two of my favorites among them are also co-authors of a wonderful book on recreational mathematics, Winning Ways for your Mathematical Plays. They are Elwyn Berlekamp and John Horton Conway.

Also, the noted computer scientist Madhu Sudan won the prestigious Nevanlinna Prize for his work on error correcting codes.

There is one confession: the posts in this blog will be mostly without structure. An underlying theme will be hard to find. This is partly because I am hard pressed for time and partly because I am writing this blog mainly for my own selfish ends - to better understand a problem I am working on under the supervision of Chris Peikert.

I will start from somewhat advanced topics, and if you are following these posts, a fair amount of mathematical background will be assumed (including some mathematics I studied only recently).

For example, it will be assumed that you are fluent in linear algebra and combinatorial reasoning. A fair deal of mathematical maturity will be assumed as well.

I still hope that, if you do follow this blog, you will find the topics it touches upon a little enlightening - after all, that is the target of this blog: to explain a few things I will be touching upon in a little more relaxed detail.

Thank You
Have a Good Day
-Akash