% EMIS/ELibM Electronic Journals
%
% Outdated Archival Version
%
% These pages are not updated anymore. They reflect the state of 20 August 2005. For the current production of this journal, please refer to http://www.math.psu.edu/era/.



% **************************************************************************
% * The TeX source for AMS journal articles is the publisher's TeX code    *
% * which may contain special commands defined for the AMS production      *
% * environment.  Therefore, it may not be possible to process these files *
% * through TeX without errors.  To display a typeset version of a journal *
% * article easily, we suggest that you either view the HTML version or    *
% * retrieve the article in DVI, PostScript, or PDF format.                *
% **************************************************************************
% Author Package file for use with AMS-LaTeX 1.2
%Modified June 4, 1996
% Date: 11-JUN-1996
%
%
%final form 5/27/96
%\controldates{16-JUL-1996,16-JUL-1996,16-JUL-1996,23-JUL-1996}
\documentstyle[newlfont]{era-l} %{article}%,draft
%\issueinfo{2}{1}{}{1996}

%\hsize=168true mm
%\vsize=200true mm

\newtheorem{lemma}{Lemma}
\newtheorem{lemonetxt}{Lemma 1}% \rom{(Salary List code of a simplex)}}
\renewcommand{\thelemonetxt}{}
\newtheorem{lemtwotxt}{Lemma 2}% \rom{(Salary List code of aball)}}
\renewcommand{\thelemtwotxt}{}
\newtheorem{corlem}{Corollary of Lemma 3}\renewcommand{\thecorlem}{}

\newtheorem{lemfivetxt}{Lemma 5}% \rom{(Arithmetical)}}
\renewcommand{\thelemfivetxt}{}
\newtheorem{theoremn}{Theorem}\renewcommand{\thetheoremn}{}
\newtheorem{corofthm}{Corollary of the Theorem}\renewcommand{\thecorofthm}{}

\theoremstyle{remark}
\newtheorem{remark}{Remark}

\newcommand{\code}{\operatorname{code}}
\begin{document}


\title{Compression and restoration of square integrable functions}
%\centerline{\bf \huge Compression and Restoration of }
%\centerline{\bf \huge Square Integrable Functions.}
%\bigskip

\author{Rafail Krichevskii}
\address{Sobolev Mathematical Institute, Novosibirsk, 
Russia}
\email{rekri@@math.nsk.su}

\author{Vladimir Potapov}
\address{Sobolev Mathematical Institute, Novosibirsk, Russia}
\email{rekri@@math.nsk.su}

%\issueinfo{2}{1}{}{1996}
\commby{Ingrid Daubechies}
\date{October 20, 1995, and, in revised form, May 6, 1996}
%\revdate{May 6, 1996}
\subjclass{Primary 94A11; Secondary 42C10}
\keywords{Wavelets, Haar functions, compression, computational complexity}

%\centerline{ \sl R.E.Krichevskii and V.N.Potapov }
%\smallskip

%\centerline{Sobolev Mathematical Institute, Novosibirsk, Russia }


%\bigskip
\begin{abstract}
We consider classes of smooth functions on $[0,1]$
with mean square norm. We present a wavelet-based method  for obtaining
approximate pointwise reconstruction of every function with nearly
minimal cost without substantially increasing  the amount
of data stored. In more detail: each function $f$ of a class is supplied
with a binary code of minimal (up to a constant factor) length, where  the
minimal length  equals the
  $\varepsilon$-entropy
of the class, $ \varepsilon > 0$. Given
that code of $f$ we can calculate $f$,  $\varepsilon$-precisely in $L_2$,
at any specific $N, N\geq 1,$ points of $[0,1]$ consuming
$O(1+\log^*((1/\varepsilon)^{(1/\alpha)}/N))$ operations per point.
  If the quantity $N$ of points is a constant,
 then we consume  $O(\log^*1/\varepsilon)$ operations per point.
If $N$ goes up to   the $\varepsilon$-entropy, then the per-point time
consumption goes down to a constant, which is less than the corresponding
constant in the fast algorithm of Mallat [11]. Since the iterated logarithm
$\log^*$ is a very slowly increasing function, we can say that our calculation
method is nearly optimal.
\end{abstract}

\maketitle

%\bigskip

\section{Introduction}
%\medskip

Mathematical models of signal compression have attracted
attention for a long time. Many of those models have the
following form. There is a set $P$ with a norm $||\cdot||$.
A set $M$ is an $\varepsilon$-net for $P$, $\varepsilon>0$, if
for any $x\in P$ there is an $m\in M$ such that $||x-m||\leq \varepsilon$.
The minimal $\log$-cardinality of $\varepsilon$-nets is the metric
$\varepsilon$-entropy $H_\varepsilon(P)$ of $P$, with  $\log$ here and in
the sequel  binary. Points of $M$ are encoded  by different binary numbers.
The code of $m$, $||x-m||\leq \varepsilon$, is a compressed representation
of $x$, with  $x$  recoverable from $m$ with $\varepsilon$-precision.
One wishes to achieve maximal compression, i.e., to make the length
of the codes as close to $H_\varepsilon(P)$ as possible, keeping the cost of
the encoding $(x\rightarrow m)$ and the decoding $(m\rightarrow x)$
reasonably small.
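To make the scheme concrete, here is a minimal numerical sketch (our illustration, not part of the paper's construction): a uniform grid of step $2\varepsilon$ is an $\varepsilon$-net for $[-1,1]$, and the index of the nearest node serves as the compressed representation of $x$.

```python
import math

def encode(x, eps):
    """Index of the nearest node of a grid of step 2*eps on [-1, 1];
    the grid is an eps-net for the interval."""
    return round((x + 1.0) / (2.0 * eps))

def decode(index, eps):
    """Recover the net point m with |x - m| <= eps from its index."""
    return -1.0 + 2.0 * eps * index

eps = 0.01
nodes = math.floor(1.0 / eps) + 1    # cardinality of the net
bits = math.ceil(math.log2(nodes))   # code length ~ log-cardinality

for x in [-1.0, -0.3337, 0.5, 0.987]:
    assert abs(x - decode(encode(x, eps), eps)) <= eps + 1e-12
```

Maximal compression then means bringing `bits` down to the metric entropy while keeping `encode` and `decode` cheap.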

We shall restrict our consideration to the  $L_2$ norm and the classes
$P_\alpha$ of functions with square integrable $\alpha$-th
derivatives, $\alpha >0$. $P_\alpha$ is defined as the set of all
functions $f$ with domain $[0,1]$ such that
$\int_0^1 f(t) \, dt = 0$ and $||f^{(\alpha)}|| \leq 1$, where
$f^{(\alpha)}$ is the $\alpha$-th derivative in the sense of
Hermann Weyl, i.e.,
%$$
\[f^{(\alpha)}(t)= \sum_{n=-\infty}^{\infty}
c_n(2\pi in)^\alpha e^{2\pi int} ,\] %$$
and $c_n$ are the Fourier coefficients
of $f$.

There are many ways of compressing a function $f\in P_\alpha$
into a bit stream with order $O(H_\varepsilon(P_\alpha))$ and
$O(\varepsilon)$ distortion in $L_2$ (examples: transform coding
in the Fourier Domain, in the Legendre Polynomial Domain,
in the Wavelet Domain, differential coding of spline coefficients, etc.).
The goal of the present paper  is to find a   method  which allows the fastest
decompression.  By decompression  we mean the $\varepsilon$-precise
calculation of $f$ at   one or several points of $[0,1]$ from its code.
The code is kept intact, allowing us to calculate repeatedly.

First of all we briefly survey methods of fast computation. We count
operations in flops, an idealized unit of additions/multiplications. The fast algorithm
of Pan, Reif and Tate \cite{12} enables one to compute $\varepsilon$-precisely
the values of a $J$-degree polynomial at $N $ points spending
$O((N + \log\frac{1}{\varepsilon})(\log\log\frac{1}{\varepsilon})^2
+ J\cdot\min(\log J, \log\frac{1}{\varepsilon}))$ flops.
Their algorithm is a modification of  Rokhlin's approach \cite{13}.

To compute the values of a  wavelet polynomial (a linear
combination of wavelets of levels below a number $\log J$) at $N$ points
one uses the fast algorithm of Mallat \cite{11}.  If $N=J$, then
Mallat's algorithm  consumes $O(N)$ flops. Suppose $N < J$.
Partition $[0,1]$ into $N$ subintervals of equal length and
pick up a point in each  of them. Then  Mallat's  algorithm may
consume $O(N(\log J - \log N  ))$ flops.
The computation at a single point requires  $O(J)$ flops for
polynomials or $O(\log J)$ flops for wavelet polynomials.

Many papers are devoted to  $\varepsilon$-entropy estimations.
Such estimations produce  compression methods.

The classes $P_\alpha$ were introduced by Kolmogorov and
Tikhomirov in \cite{8}. Their $\varepsilon$-entropy satisfies the
two-sided bound
\[C\cdot (1/\varepsilon)^{(1/\alpha)} \leq
H_\varepsilon(P_\alpha)\leq
C\cdot (1/\varepsilon)^{(1/\alpha)}. \]
Here and elsewhere the letter $C$ stands for different
positive constants, for simplicity's sake. Kolmogorov and Tikhomirov
first passed from $L_2$ to $l_2$ where
$P_\alpha$ became an ellipsoid. Retaining the Fourier coefficients below
the level  $J=\lceil (\frac{1}{\varepsilon})^{1/\alpha} \rceil$
they $\varepsilon$-approximated the ellipsoid by a finite-dimensional one,
in which  they constructed a cubic lattice. It yielded the upper
bound; the lower bound was obtained by the volume argument.
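In discrete form the Kolmogorov--Tikhomirov construction can be sketched as follows (our toy illustration: an abstract $\ell_2$ coefficient sequence stands in for the Fourier coefficients, and only the lattice-rounding error is checked):

```python
import math

def compress(coeffs, alpha, eps):
    """Keep the first J ~ (1/eps)^(1/alpha) coefficients and round each
    to a cubic lattice of step eps/sqrt(J); the rounding error is at
    most eps/2 in l2, while the truncated tail is controlled by
    smoothness."""
    J = math.ceil((1.0 / eps) ** (1.0 / alpha))
    step = eps / math.sqrt(J)
    return J, step, [round(c / step) for c in coeffs[:J]]

def decompress(J, step, indices):
    return [i * step for i in indices]

alpha, eps = 1.0, 0.05
# toy coefficients decaying like n**(-alpha - 1/2)
coeffs = [(n + 1) ** (-1.5) for n in range(1000)]
J, step, idx = compress(coeffs, alpha, eps)
approx = decompress(J, step, idx)
lattice_err = math.sqrt(sum((c - a) ** 2 for c, a in zip(coeffs, approx)))
assert lattice_err <= eps / 2 + 1e-12
```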

Besov spaces in  $L_p$-norm  are discussed in \cite{5}. To
$\varepsilon$-represent a function from such a space the authors
used a linear combination of $J$ wavelets, $\log J =
O(\log\frac{1}{\varepsilon})$.  We think that it yields some bounds on
$\varepsilon$-entropy which may be compared  with the bounds in \cite{14}.


The size of the paper does not allow us to refer to all the
literature on data compression. We mention only a few papers.


Our main result is the following.
We supply every function $f\in P_\alpha$   with  a binary word of length
$O((\frac{1}{\varepsilon})^{1/\alpha})$ bits,  which is minimal
up to a constant factor.

When the values of $f$ at $N$ points are needed, we can
 compute them with running time  $O(1+\log^*((1/\varepsilon)^{(1/\alpha)}/N))$
flops per
point. This running  time decreases from  $O(\log^*1/\varepsilon)$
 to a constant, as $N$ increases from a constant to the 
$\varepsilon$-entropy of the class.
Computing $f$, we   employ
a black box subroutine  of a smooth enough mother scaling function of
I. Daubechies \cite{2,3}. If $\alpha\leq 1$, then we do not need such a subroutine
and do not use multiplications. Thus, the computation  of $f$
 by  our method is  faster than
such a computation by means of trigonometric or even wavelet decompositions.
Since the degrees of $\varepsilon$-approximating polynomials are
$O((\frac{1}{\varepsilon})^{1/\alpha})$ for  some  $f\in P_\alpha$,
which is very high,  fast computation methods of
polynomials do not give good results here.

We are going to develop a short and quickly computable representation
of any function $f\in P_\alpha$, $\alpha > 0$. Namely, with 
$\varepsilon$-precision,  
$f$ is a sum of $\log^*1/\varepsilon$ addends. Each addend belongs to
a ball  lying in a finite-dimensional linear span of scaling functions. 
The total
$\varepsilon$-entropy of the balls equals the $\varepsilon$-entropy of
the class. To obtain such a representation,
we start with a wavelet
expansion of $f$. The projection of $P_\alpha$ on the linear span of
wavelets of one and the same level  is called a layer.
$O(\log \frac {1}{\varepsilon})$ layers are required to 
$\varepsilon$-approximate
$P_\alpha$; consequently, it takes $ O(\log \frac {1}{\varepsilon})$
flops to $\varepsilon$-compute $f$ at a single point.
It is less than $O(\frac {1}{\varepsilon})$    flops consumed by a Fourier
expansion, but we want to compute still faster.  The merging of all
layers in a single meta-layer gives the  ultimate speed of $C$ flops
per point. We pay for such a speed by the increase of the amount of data
stored: the entropy of the meta-layer
becomes $\log\frac{1}{\varepsilon}$-times bigger
than $H_\varepsilon(P_\alpha)$. To keep the total entropy within
$O(H_\varepsilon(P_\alpha))$, we merge (fine to coarse) first two
layers into a meta-layer,  then four, and so on in nearly explosive way
$C^{C^{...}}$, $C>1$; see  Lemma 5.

To show that such merging gives the required entropy and speed we
 estimate in Section 4 the radius of a ball containing a layer.
We find in Section 5 how that radius  responds to merging. We build up
in Section 2 an algorithm of fast coding and decoding of balls
within given precision (Salary List  algorithm).

A function belonging to a meta-layer is a linear combination of
scaling functions of the same level. On the other hand, it belongs to a ball.
  The Salary List  algorithm gives
a short code to such a combination allowing us to find the
coefficients quickly.
We use  finitely many coefficients of every level to compute $f$ at  any single
point $x\in [0,1] $. It yields the Theorem (Section 5). Combining the Theorem
with the fast method of Mallat allows us to compute $f$ at any $N$ points.

We believe that the results may be generalized to other norms and classes.


%\bigskip
\section {Salary List: fast enumeration of integer-valued vectors in
simplexes and balls}

A $p$-dimensional simplex with side $N$ is a set of vectors with nonnegative
coordinates whose sum does not exceed $N$. The coordinates may be thought of
as salaries of employees of a firm, $N$ being the total salary. Encoding
each integer-valued vector by the concatenation of $\lceil \log (N+1) \rceil$-digit numbers representing
the coordinates makes the length of the code equal to
$p\cdot \lceil \log (N+1)\rceil$, which is not optimal, since the volume of the simplex
is $N^p/p!$. The length of the code is bounded below by $p\cdot \log (N/p)$. We
present a fast algorithm attaining that bound.


%{\sl Lemma 1} (Salary List Code of Simplex). 
\begin{lemonetxt}[Salary List code of a simplex] Every integer-valued vector
of a $p$-dimensional simplex with side $N$
can be given a binary word of length
$O(p\cdot(1+\log N/p))$. Each coordinate of a vector can be found via its
word in $O(\log N\cdot \log\log p+\log p)$ operations over bits.
\end{lemonetxt}
%{\it Proof:} 
\begin{pf}
We define first a preliminary code of a simplex. The dimension
is supposed to equal $2^r , r\geq 0$; the sum of the coordinates equals $N$;
$l(x)$ is the length of the binary notation of a nonnegative integer $x$.
Obviously,
\begin {equation}
    l(x)\leq \log (x+1)+1.
\end{equation}
We encode an integer-valued vector $x=(x_1,\dots,x_p)$  by an $r$-level labelled
tree. The label of its root (first level) is the $l(N)$-digit number
$N_0=x_1+\cdots+x_{p/2}$. The left son is labelled by the $l(N_0)$-digit number
$N_{00}=x_1+\cdots+x_{p/4}$, and the right one by the $l(N_1)$-digit number
$N_{10}=x_{p/2 +1}+\cdots+x_{3p/4}$, where $N_1=N-N_0$ is the sum of the
second half of the coordinates. Proceeding in this way we build a labelled
tree. The total length of labels on its $i$-th level, by (1) and the
arithmetic mean--geometric mean inequality, does not exceed
%$$
\[2^i+ 2^{i-1}\cdot \log (1+ N/2^{i-1}).\] %$$
Summing it over $1\leq i \leq r$ and using the formulas for the sums of
geometric progressions and their derivatives, we upperbound the total length
of all labels as $O(p\cdot \log \frac {4(p+2N)}{p}).$
A vector $x$ can be restored through its tree. The tree itself can be
restored through the concatenation (without commas) of all its labels.
That concatenation is the preliminary code.

We subdivide the coordinates into $p/\log p$  equal-dimensional subvectors.
The conclusive Salary List code consists, first, of the number
$x_1+\cdots+x_p ,$ second, the concatenation of the sums of the coordinates of
those subvectors, and third, the preliminary codes of the subvectors. Such a
subdividing allows us to accelerate the decoding by slightly increasing
the length of the code.
\renewcommand{\qed}{}\end{pf}
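The preliminary (labelled-tree) code can be sketched as follows (our illustration; the final subdivision into $p/\log p$ subvectors is omitted):

```python
def l(x):
    """Length of the binary notation of a nonnegative integer (l(0) = 1)."""
    return max(1, x.bit_length())

def encode(vec, total):
    """Emit, in depth-first order, the sum of the left half of every
    block, padded to l(parent sum) bits; right-half sums are never
    transmitted, since the decoder recovers them by subtraction."""
    bits = []
    def rec(v, s):
        if len(v) == 1:
            return
        half = len(v) // 2
        left = sum(v[:half])
        bits.append(format(left, "0{}b".format(l(s))))
        rec(v[:half], left)
        rec(v[half:], s - left)
    rec(vec, total)
    return "".join(bits)

def decode(code, p, total):
    """Inverse of encode; p is the dimension (a power of two)."""
    pos = 0
    def rec(n, s):
        nonlocal pos
        if n == 1:
            return [s]
        w = l(s)
        left = int(code[pos:pos + w], 2)
        pos += w
        return rec(n // 2, left) + rec(n // 2, s - left)
    return rec(p, total)

salaries = [3, 0, 7, 1, 0, 2, 5, 2]   # p = 8, total salary N = 20
code = encode(salaries, sum(salaries))
assert decode(code, len(salaries), sum(salaries)) == salaries
```

Note how the digit budget of a label shrinks with the sum of its block; this is what keeps the total length near $p\log(N/p)$ rather than $p\log N$.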
%{\sl Lemma 2} (Salary List Code of Ball). 
\begin{lemtwotxt}[Salary List code of a ball] A $p$-dimensional ball of radius
$R$ can be $\varepsilon$-precisely encoded by binary words of length
$O(p\cdot(1+\log R/\varepsilon)).$ The decoding time per coordinate is
$O(\log Rp/\varepsilon\cdot\log\log p)$  operations over bits.
\end{lemtwotxt}
%{\it Proof:} 
\begin{pf}
Develop in the ball a cubic $\varepsilon /\sqrt p$-lattice. The
enumeration of its nodes is reduced to the enumeration of integer-valued
vectors of a simplex by the Cauchy-Schwarz inequality
%$$
\[|x_1|+\cdots+|x_p| \leq \sqrt {x_1^2+\cdots+x_p^2}\cdot \sqrt p.\] %$$
\renewcommand{\qed}{}\end{pf}
%\pagebreak
%\bigskip
\section{Wavelet representation of the classes $P_\alpha$}
%\medskip

Following \cite{2,3} we introduce some basic notions of the wavelet theory.
There is a function $\psi$ called the mother wavelet such that its dilations
and translations $\psi_{j,k}$
%$$
\[\psi_{j,k} (x) = \sqrt{2^j}\psi (2^jx - k),\qquad k,j = 0,\pm 1,\dots,\] %$$
constitute an orthonormal basis of $L_2(-\infty,+\infty)$.
We say that $\psi_{j,k}$   is a $j$-level wavelet function.
The linear span of $j$-level wavelet functions is denoted by
$W_j$; the projection onto $W_j$ is denoted by $Q_j$.
The projection $Q_jP_\alpha$ of the class $P_\alpha$ on $W_j$
is called the $j$-th layer of $P_\alpha$  generated by the
mother $\psi$, $j=0,\pm 1,\dots$. The layers are orthogonal to each
other.

For any mother $\psi(x)$ there is a function $\phi(x)$ called the mother
scaling function. Its dilations and translations are denoted by
$\phi_{j,k}$. We say that   $\phi_{j,k}$ is a $j$-level scaling function;
the linear span of $j$-level scaling functions is denoted by $V_j$.
The equality $V_j\oplus W_j = V_{j+1}$ holds.

Hence, we have

% {\sl Remark 1:}
\begin{remark}
The $j$-th level layer belongs to any span $V_k$, $k\geq j+1$.

I. Daubechies has shown in \cite{2,3} 
that for any $r\geq 1$ there is an $r$-times
differentiable mother scaling function $\phi(x)$. The support of $\phi$
is finite.
The corresponding mother wavelet $\psi(x)$ has also a finite support.
It is also $r$-times differentiable. Moreover, its first $r$ moments vanish.
\end{remark}
We have

% {\sl Remark 2:  }
\begin{remark}
Any point $x\in [0,1]$ belongs to supports of a finite number of wavelets and
scaling functions of each level.
For any $j=0,\pm 1,\dots$ there are at most $2^j + C$ functions
$\psi_{j,k}$ whose supports have nonempty intersections with $[0,1]$.
Hence, the dimension of the $j$-th level layer is not
greater than  $2^j + C$, $j=0,1,\dots$ .

The well-known Haar basis is a special case of wavelets.
Indicators of binary
rational  intervals are scaling functions
of the Haar  basis.
\end{remark}
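In the Haar case the relation $V_j\oplus W_j=V_{j+1}$ amounts to normalized averages and differences of adjacent coefficients; a minimal sketch (ours):

```python
import math

def haar_split(c):
    """One analysis step: coefficients in V_{j+1} -> scaling part (V_j,
    normalized pairwise averages) and wavelet part (W_j, differences)."""
    s = [(a + b) / math.sqrt(2) for a, b in zip(c[::2], c[1::2])]
    d = [(a - b) / math.sqrt(2) for a, b in zip(c[::2], c[1::2])]
    return s, d

def haar_merge(s, d):
    """Inverse (synthesis) step: V_j + W_j = V_{j+1}."""
    out = []
    for a, b in zip(s, d):
        out.extend([(a + b) / math.sqrt(2), (a - b) / math.sqrt(2)])
    return out

x = [4.0, 2.0, 5.0, 7.0, 1.0, 1.0, 0.0, 2.0]
s, d = haar_split(x)
assert all(abs(u - v) < 1e-12 for u, v in zip(haar_merge(s, d), x))
# orthogonality of the split: energy is preserved
assert abs(sum(t * t for t in x) - sum(t * t for t in s + d)) < 1e-9
```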
\setcounter{lemma}{2}
\begin{lemma} %{\sl Lemma 3.}
For any class $P_\alpha$ of smooth functions there  is a mother wavelet $\psi$
such that the $j$-th level layer generated by $\psi$ belongs to a ball
whose radius is not greater than $C\cdot 2^{-j\alpha}$ and whose
dimension is not greater than $C + 2^j$, where $C$ depends on $\psi$ only.
For $j=0,1,2,\dots $,
%$$
\[
\sum_{k=1}^{2^j+C} c_{j,k}^2  \leq    C\cdot 2^{-2j\alpha}, \] %$$
%$$
\[ c_{j,k} = (f,\psi_{j,k}).\] %$$
\end{lemma}
%{\it Proof:}
\begin{pf}
Consider a mother wavelet $\psi (x)$ having a finite support
contained in an interval $(0,S)$ and  $n$ vanishing moments,
where $n= \lfloor \alpha \rfloor$ if $\alpha$ is not an integer, and
$n = \alpha - 1$  if $\alpha$ is an integer.

Taking the Taylor expansion for $f(\frac{y+k}{2^j})$ with
integral remainder term at the point $\frac{k}{2^j}$,
we obtain:
\begin{equation}
c_{j,k} = \frac{2^{-j/2}}{\Gamma(n)}\int_0^S \bigg(\psi(y)
\int_{k/{2^j}}^{(y+k)/{2^j}} \Big(f^{(n)}(\tau)-f^{(n)}\Big(\frac{k}
{2^j}\Big)\Big)\Big(\frac{y+k}{2^j} - \tau\Big)^{n-1} \,d\tau\bigg)\,dy.
\end{equation}


Since  $f\in P_\alpha$, we have
\begin{equation}
f^{(n)}(\tau)-f^{(n)}\Big(\frac{k}{2^j}\Big)= 
\frac{1}{\Gamma(\alpha - n)}\int_{k/{2^j}}^{\tau} 
f^{(\alpha)}(t)( \tau -t)^{\alpha-n-1} \,dt.
\end{equation}
Substitute (3) into (2), change the order of integration and
use the definition of $\Gamma$-function:
\begin{equation}
c_{j,k} = \frac{1}{\Gamma(\alpha)}\int^{(S+k)/{2^j}}_{k/{2^j}} \psi_{j,k}(x)
\int_{k/{2^j}}^x f^{(\alpha)}(t)(x -t)^{\alpha -1} \,dt \,dx.
\end{equation}
Square (4) and apply the Cauchy-Schwarz inequality. It yields
\begin{equation}
c_{j,k}^2 \leq \frac{2^j (\sup|\psi|)^2}{(2\alpha+1)\Gamma^2(\alpha+1)}
\Big(\frac{S}{2^j}\Big)^{2\alpha +1}\int^{(S+k)/{2^j}}_{k/{2^j}}
(f^{(\alpha)}(t))^2 \,dt.
\end{equation}
 Since $f\in P_\alpha$, we have the statement of the lemma.
\renewcommand{\qed}{}\end{pf}
%{\sl Corollary of Lemma 3.}
\begin{corlem}
Let $\alpha > 0$, $P_\alpha$ be a class of smooth functions,
$\varepsilon > 0$, and $J= \frac{1}{\alpha}\log\frac{1}{\varepsilon} + C$.
Then there is a mother wavelet such that $P_\alpha$ is
$\varepsilon$-approximated by the direct sum of layers $Q_jP_\alpha$,
$-\infty < j \leq J$.
\end{corlem}
%\bigskip

\section{Layer merging}
%\medskip

Let $a<b$; the direct sum of the layers $Q_jP_\alpha$, $a<j\leq b$, is
called the meta-layer $\ell(a,b]$.

\begin{lemma}
For any $\varepsilon > 0$ the elements of the
meta-layer $\ell (a,b]$ can be encoded by binary words of length
$O(2^b\log(2^{-a\alpha}/\varepsilon))$. Those elements can be
restored from their codes with $\varepsilon$-precision in
$O(\log(2^{b-a\alpha}/\varepsilon)\log b)$ operations over bits.
\end{lemma}
%{\it Proof:} 
\begin{pf}By Lemma 3, $\ell (a,b]$ belongs to a ball in $V_{b+1}$
of radius %$$
\[\sqrt{C\sum_{j=a-1}^{b} 2^{-2j\alpha}},\] %$$
 which is not greater than
$C\cdot 2^{-a\alpha}$. The dimension of the ball is not greater than
$2^{b+1} + C$ by  Remark 2. Now the lemma follows from Lemma 2.
\renewcommand{\qed}{}\end{pf}

%\bigskip
\section {Approximating $P_\alpha$ by direct sums of meta-layers }
%\medskip
\begin{lemfivetxt}[Arithmetical] %{\sl Lemma 5}(Arithmetical)
Let $\xi_0,\xi_1,\dots$ be the sequence defined by
\[\xi_0=0,\qquad \xi_{i+1}=2^{\xi_i-i+1},\qquad i=0,1,2,\dots\ .\]
Then the sequence is monotonic, and for any $R>0$  there is a number $k$ such that $\xi_k \geq R$ and
$k \leq C\cdot \log^*R $.
\end{lemfivetxt}
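The sequence can be checked numerically (a sketch, under the convention that the recursion starts at $i=0$):

```python
def first_index_reaching(R):
    """xi_0 = 0, xi_{i+1} = 2**(xi_i - i + 1); return the first k with
    xi_k >= R.  The sequence runs 0, 2, 4, 8, 64, 2**61, ..., so k
    grows like the iterated logarithm of R."""
    xi, i = 0, 0
    while xi < R:
        xi = 2 ** (xi - i + 1)
        i += 1
    return i

for R, k in [(2, 1), (8, 3), (64, 4), (10 ** 18, 5)]:
    assert first_index_reaching(R) == k
```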
\setcounter{lemma}{5}
\begin{lemma}
%{\sl Lemma 6} 
For any class $P_\alpha$  and $\varepsilon >0$  there is
a mother wavelet such that $P_\alpha$
is $\varepsilon$-approximated by the direct sum of
$k+1=O(\log^* 1/\varepsilon)$  meta-layers. The $\varepsilon$-entropy of 
the sum equals $O(H_\varepsilon(P_\alpha))$. Each function of
the sum can be $\varepsilon$-restored through its Salary List code of
length $O(H_\varepsilon(P_\alpha)) $  in $C$ flops per coordinate (in a basis of
scaling functions).
\end{lemma}
%{\it Proof:} 
\begin{pf}Let
\begin{equation}
J=1/\alpha\cdot\log 1/\varepsilon +C.
\end{equation}
Take the sequence defined by Lemma 5; let $k$ be the minimal number for which
$\xi_k\geq J+1.$ According to that lemma,
\begin{equation}
k\leq C\cdot \log^* 1/\varepsilon.
\end{equation}
Define a monotonic sequence
\begin{equation}
m_l=J-\xi_{k-l}
\end{equation}
and a sequence of meta-layers
$\ell (-\infty,m_0],\ell(m_0,m_1],\dots,\ell(m_{k-1},m_k]$. By the Corollary of Lemma 3,
$P_\alpha$ is $\varepsilon$-approximated by the direct sum of those meta-layers.
For each $f$ from the class there is a function $g$ in the sum such that
$||f-g||<\varepsilon.$  The function $g$ is the sum of its projections on the
meta-layers:
\begin{equation}
g=g_0+\cdots+g_k, \quad g_l\in \ell(m_{l-1},m_l].
\end{equation}
Let
\begin{equation}
\varepsilon_l=\frac{\varepsilon\cdot 2^{m_l}}{2^{J+2}},\qquad l=0,1,\dots\ .
\end{equation}

Each element of the direct sum is encoded by the concatenation of codes of its
projections.
We encode the $l$-th component  (9) with $\varepsilon_l$-precision.
Then the total decoding error is $\sqrt{\sum_{l=0}^k{\varepsilon_l^2}}$, which
by (8) and (10) is not greater than $\varepsilon$.
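Indeed, by (8) the numbers $m_l$ are distinct integers not exceeding $J$, so (10) gives
\[
\sum_{l=0}^{k}\varepsilon_l^2
=\frac{\varepsilon^2}{2^{2J+4}}\sum_{l=0}^{k}2^{2m_l}
\leq\frac{\varepsilon^2}{2^{2J+4}}\sum_{i=0}^{\infty}2^{2(J-i)}
=\frac{\varepsilon^2}{12}\leq\varepsilon^2.
\]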

The component $g_0$  belongs to a finite-dimensional ball of radius 1, 
so the length
of its $\varepsilon_0$-code is $O(\log 1/\varepsilon)$  by Lemma 2.
 For the total length of the  code of $g$, by Lemma 4, (6), 
(10), and the monotonicity  of $m_l$, we have:
\begin{equation}
|\code \,g|=C\cdot\sum_{l=1}^k{2^{m_l}
\cdot\log \frac{2^{J(1+\alpha)+2}}{2^{m_{l-1}\cdot(1+\alpha)}} } +|\code\, g_0|
\end{equation}

Substitute (8) into (11):
\begin{equation}
|\code\, g|=C\cdot(\alpha+1)\cdot 2^J\cdot \sum_{l=1}^k{\xi_{l+1}/2^{\xi_l}}.
\end{equation}
Use in (12) the definition of the sequence $\xi_l$ :
\begin{equation}
|\code\, g|=C\cdot(\alpha+1)\cdot 2^J \sum_{l=1 }^k{2^{1-l}}.
\end{equation}

From (6) and (13) we have:
%$$
\[|\code\, g|=C\cdot H_\varepsilon(P_\alpha).\] %$$
To decode a coordinate of a projection we spend, by Lemma 4, at most
%$$
\[C\cdot(\log 2^J/\varepsilon)(\log J)=C\cdot\log 1/
\varepsilon\cdot\log\log 1/\varepsilon \] %%$$
operations over bits, which is equivalent to $C$ flops by the
Sch\"{o}nhage-Strassen algorithm.
The code of $g$ falls into the codes of its projections. Their lengths are known.
The lemma is proved.
\renewcommand{\qed}{}\end{pf}
%{\sl Theorem.} 

\begin{theoremn}Any function $f\in P_\alpha$ can be given a binary
code of minimal, up to a constant factor, length $O((\frac{1}{\varepsilon} )^{1/\alpha})$
through which it can be restored at any given  individual point with
$\varepsilon$-precision in nearly minimal time $\log^* \frac{1}{\varepsilon}$
flops.
A precomputed table of a smooth mother scaling function is entered
$\log^* \frac{1}{\varepsilon}$ times. For $\alpha\leq 1$ only adds are
performed, and the table is not needed.
\end{theoremn}
%{\it Proof:} 
\begin{pf} By Remark 2, any point $x\in [0,1]$ belongs to the supports of a
finite number of scaling functions of each level.
Now the Theorem follows from Lemma 6.
\renewcommand{\qed}{}\end{pf}
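For $\alpha\leq 1$ (the Haar case) the evaluation is particularly transparent; in the following sketch (ours), each meta-layer is stored as a table of its piecewise-constant values on dyadic cells:

```python
def eval_haar(levels, x):
    """Evaluate a sum of piecewise-constant components at x in [0, 1).
    Entry j stores the component's value on each dyadic cell
    [k/2**j, (k+1)/2**j); exactly one cell per level contains x, so
    the cost is one table lookup and one addition per level -- no
    multiplications."""
    total = 0.0
    for vals in levels:
        cells = len(vals)                   # = 2**j
        k = min(int(x * cells), cells - 1)  # cell containing x
        total += vals[k]
    return total

# a toy function stored on three levels
levels = [[1.0], [0.5, -0.5], [0.25, -0.25, 0.0, 0.5]]
assert eval_haar(levels, 0.1) == 1.0 + 0.5 + 0.25
assert eval_haar(levels, 0.75) == 1.0 - 0.5 + 0.5
```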

\begin{corofthm}%{\sl Corollary of Theorem}
 Any function $f\in P_\alpha$ can be given a binary
code of minimal (up to a constant factor) length
$O((\frac{1}{\varepsilon} )^{1/\alpha})$
through which it can be restored at any given  $N$  points with
$\varepsilon$-precision. The time consumption is 
$O(1+\log^*((1/\varepsilon)^{(1/\alpha)}/N))$ flops per point, which is a
constant $C_1 $ for $N=
O((1/\varepsilon)^{1/\alpha})$. The constant $C_1 $ is
less than the corresponding constant in
Mallat's algorithm.
\end{corofthm}
%{\it Proof:} 
\begin{pf}The corollary is obtained by a straightforward combination of our
Theorem with Mallat's algorithm \cite{11}. First we apply Mallat's method taking
into account expansion (9). As soon as the support of a scaling function
contains only one given point, we start applying our Theorem.
\renewcommand{\qed}{}\end{pf} %\bigskip

\section{Acknowledgement}
%\medskip

The authors are very much indebted to the anonymous referee for
the thorough reading of the manuscript and many deep remarks.

%\bigskip

%\section{References}
%\medskip

\begin{thebibliography}{20}

\bibitem[1]{1} M. Antonini, M. Barlaud, P. Mathieu, and I. Daubechies,  
\textit{Image coding using wavelet transforms},
%{\it 
IEEE Trans. Image Processing, %{\bf 
\textbf{1} (1992), 205--220.


\bibitem[2]{2} I. Daubechies, \textit{ Orthonormal bases of
compactly supported wavelets},  Comm. Pure Appl. Math.,
%{\bf 
\textbf{41}  (1988), 909--996.
 \MR{90m:42039}

\bibitem[3]{3} \bysame, \textit{Ten lectures on wavelets},
CBMS-NSF Regional Conf. Series in Appl. Math., vol. 61, SIAM, Philadelphia, PA, 1992.
 \MR{93e:42045}

\bibitem[4]{4} R. A. DeVore, B. Jawerth,  and  B. J. Lucier,
\textit{Surface compression}, 
Computer-Aided Geometric Design, %{\bf 
\textbf{9} (1992), 219--239.
 \MR{93i:65029} 

\bibitem[5]{5} \bysame, 
\textit{Image compression through wavelet transform coding},
IEEE Trans. Info. Theory %{\bf 
\textbf{38} (1992),  719--746.
\CMP{92:11}

\bibitem[6]{6}  R. A. DeVore, B. Jawerth,  and V. Popov,
\textit{Compression
of wavelet decompositions}, Amer. J. Math. %{\bf 
\textbf{114} (1992), 737--785.
 \MR{94a:42045}

\bibitem[7]{7}  D. L. Donoho, \textit{Unconditional bases are 
optimal bases for
data compression and statistical estimation}, Applied
and Computational Harmonic Analysis %{\bf 
\textbf{1} (1993), 100--115.
 \MR{94j:94011} 


\bibitem[8]{8} A. N. Kolmogorov and V. M. Tikhomirov, 
\textit{$\varepsilon$-entropy
and $\varepsilon$-capacity of sets in function spaces},
Uspekhi Mat. Nauk  %{\bf 
\textbf{14}(1959), no. 2, 3--86. (Russian)
\MR{22:2890} 

\bibitem[9]{9} R. Krichevsky, \textit{Universal compression and retrieval},
Kluwer Academic Publishers, 1995.
 \MR{95g:94006}

\bibitem[10]{10} B. J. Lucier, \textit{Wavelets and image compression},
Mathematical Methods in CAGD and Image Processing
(T. Lyche and L.L. Schumaker, eds.), Academic Press, Boston, MA,
1992, pp. 1--10.
\MR{93d:65022}

\bibitem[11]{11} S. Mallat, \textit{Multiresolution approximation and wavelet
 orthonormal bases of $L^2$}, Trans. Amer. Math. Soc. \textbf{315} (1989),
 69--87.
\MR{90e:42046}

\bibitem[12]{12} V. Pan, J. Reif, and S. Tate, \textit{The power of 
combining the
techniques of algebraic and numerical computing: improved approximate
multipoint polynomial evaluation and improved multipole algorithms}
(Proc. of the 33rd Annual Symp. on Foundations of Computer Science),
IEEE Comp. Soc. Press, Pittsburgh, 1992.

\bibitem[13]{13} V. Rokhlin, \textit{A fast algorithm for the discrete
Laplace transformation}, J. of Complexity \textbf{4} (1988), 12--32.
 \MR{89b:65309}

\bibitem[14]{14} H. Triebel, \textit{Approximation numbers and entropy
numbers of embeddings of fractional Besov-Sobolev spaces in Orlicz spaces},
Proc. London Math. Soc. %{\bf 
\textbf{66} (1993), 589--618.
 \MR{94g:46042} 

\end{thebibliography}

\end{document}