Well Hello Again,
Today we will again discuss the ongoing topic in a little more detail. But this time, I would like to take a detour.
I would return to the paper later. But first, we need to get somethings out of the way first. (Actually, its intended more for me to get these concerns out - oh well)
So, the paper talks about creating an algorithm that finds the lattice point closest to a given target point. To this end, the previous post took you through the algorithm ParBW which relied on an algorithm called SeqBW which in turn relied on a Soft Decision Decoding algorithm for Reed Muller Codes.
Naturally, we would need to walk through these to understand the material, but before that a number of remarks and concepts need to be understood. Probably you know them already, but as I am learning these as I read those, I will post about them anyway.
So, let me begin by asking what is Soft Decision Decoding (henceforth SDD) and what is it good for? When I first heard of SDD, I wondered what in Merlin's name is this decoding procedure as all I had seen till then was a majority vote method. (You determine whether the received bit in a codewrod was $1$ or $0$ by a bunch of parity check equations. If most of them said its $1$ (resp $0$) you accepted that as the bit's value). Now prepare yourself for a slurry of definitions. We will return to the mathematical part soon enough (depending on how you define soon).
If you already know these concepts you can directly skip to the 2nd row of stars.
******************************************************************************
But here I had a clear cut indication that there was something else brewing in the mathematical universe. All I knew till that point was what you call Hard Decision Decoding algorithm (henceforth HDD). So, what was the difference between these decoding methods? HDD, it turned out, was the method that you used in Binary Symmetric Channels, or put more simply, in those settings where you knew that the codeword that you receive was going to contain only $1$'s and $0$'s.
On the other hand, SDD is a bit more realistic. Since, the modern communication techniques involve fair amount of analog stuff as well, somewhere you convert the digital information to analog and then reconvert it at the receiver's end. This reconversion needs to be looked into carefully as we are interested in recovering from communication errors. Thus, you need to make amends to your decoding process so that it picks up from the place when you just recieve the "raw" message which has not yet been digitzed. You will now have to account for a received word which is not all $0$'s and $1$'s.
On a side note, the channels on which you receive these are called Additive White Gaussian Noise Channels (or AWGN channel). Now, as you can see, your deciding process just got more complex. But it will also be more accurate and certainly we will like to use it when it does not tax us too bad in terms of decoding time.
Turns out, with Reed Muller Codes, which have very lovely structure and description, you indeed can reap some benefits by using SDD.
In the rest of the post, I would describe soft decision decoding of Reed Muller Codes which you can find in the paper as well. For saving the typing effort, I would ask you to kindly look at page-6 of the paper. Okay, you might ask that this function, $RMDec^{\psi}$ is being called from $SeqBW$, an algorithm I have not talked about. In retrospect, after having read this paper, it seems instructive to go through this algorithm first which is what we do next (basically, the idea behing me taking this approach is to bring you knowledge of this topic to the level so that you may find the authors intuition accessible).
******************************************************************************
Hmm...So soft decoding. First, we ask where are those painful non-bit quantities. Look at the algorithms input - the vector $t$ is what includes some non-bit quantities. The vector $t$ is a $2n$-bit length vector. And it contains a $n$-bit vector $b$ interleaved with a $n$-tuple vector which has got "non-bit entries" all of which lie between $0$ and $1$.
Basically INPUT is $t_j =$ {$0,1$} X $[0,1] \forall j = [1...N (=2^n)]$.
And the desired OUTPUT is the decoded codeword.
Now our idea is pretty simple. In case, we have a $RM(r, m)$ code with $r = 0$ we take a majority vote. In case, it is a $RM(r, m)$ code with $r = m$, we just return the vector of $b$'s. And in case its neither of this well - take a guess - we recurse.
(Why should not we...After all Reed Muller codes offer such a nice recursive construction, recursion is ony a natural candidate).
So, look again at the problem that we want to solve with using our recursive scheme. It involves calculating $RM(r-1, n-1)$ and $RM(r, n-1)$ which can help you calculate $RM(r,n)$.
[Here is why. You can treat $RM(r, n)$ as a codeword giving rise to a $r$-degree polynomial (say $f = x_1 \cdot x_2 .... \cdot x_r$. You can write $f = f_1 + f_2$ where $f_1$ is a polynomial obtained from $RM(r, n-1)$ and $f_2$ is obtained from $RM(r-1, n-1)$. Thus, you can think of $f_1$ as a polynomial necessarily including $x_1$ while $f_2$ necessarily misses it. Adding these two will have a $f_2$ zone as its free from interference from $f_1$ because half the time $f_1 = 0$ as $x_1 = 0$.]
So, our algorithm involves finding a vector $t_j^+$ and another vector $t_j^-$ each half the size of the original vector $t$. The $j^{th}$ entry in $t_j$ is decided by first splitting $t$ into two halves - $t_0$ and $t_1$. Next, it involves XORing the $j^{th}$ bits in the two halves together with the the minimum of the reliability information from the $j^{th}$ corresponding reliability entries on both sides.
You compute $j^{th}$ entry of the vector $t_j^-$ essentially using the same idea but different equations.
The idea behind these calculations, as revealed in the reference 10 cited in the paper, is that that the components of the received word are assumed to be statistically independent from each other. The received word consists of the codeword and an error. For a memoryless channel the errors in different components of the received word are statistically independent from each other. Different components of one codeword, however, are in general not independent from each other. Therefore, this assumption is only an approximation.
Why this approximation is good enough and why not try some other whacky thing for calculating $t_j^+$, as of now, beats me.
For instance, why do we not set $t_j^+ \leftarrow b_j^0 \oplus b_{j+1 \bmod N/2}, min(\rho_j^0,\rho_j^1)$.
Am working on it. Will write it up as soon as some more clarity dawns.
Till then have a good time
-Akash
Subscribe to:
Post Comments (Atom)
Soft decision decoding is quite Hard :P
ReplyDeleteYes, I feel the same way....however, it is quite interesting
ReplyDelete