Adaptive polynomial filters

Mathews, V. John




Publication Type	journal article
School or College	College of Engineering
Department	Electrical & Computer Engineering
Creator	Mathews, V. John
Title	Adaptive polynomial filters
Date	1991
Description	While linear filters are useful in a large number of applications and relatively simple from conceptual and implementational viewpoints, there are many practical situations that require nonlinear processing of the signals involved. This article explains adaptive nonlinear filters equipped with polynomial models of nonlinearity. The polynomial systems considered are those nonlinear systems whose output signals can be related to the input signals through a truncated Volterra series expansion, or a recursive nonlinear difference equation. The Volterra series expansion can model a large class of nonlinear systems and is attractive in filtering applications because the expansion is a linear combination of nonlinear functions of the input signal. The basic ideas behind the development of gradient and recursive least-squares adaptive Volterra filters are first discussed, followed by adaptive algorithms using system models involving recursive nonlinear difference equations. Such systems are attractive because they may be able to approximate many nonlinear systems with great parsimony in the use of coefficients. Also discussed are current research trends and new results and problem areas associated with these nonlinear filters. A lattice structure for polynomial models is also described.
Type	Text
Publisher	Institute of Electrical and Electronics Engineers (IEEE)
Volume	8
Issue	3
First Page	10
Last Page	26
Language	eng
Bibliographic Citation	Mathews, V. J. (1991). Adaptive polynomial filters. IEEE Signal Processing Magazine, 8(3), 10-26. July.
Rights Management	© 1991 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
Format Medium	application/pdf
Format Extent	1,482,249 bytes
Identifier	ir-main,15075
ARK	ark:/87278/s6pc3m3f
Setname	ir_uspace
ID	707296
OCR Text	Adaptive Polynomial Filters
V. JOHN MATHEWS

While linear filters are useful in a large number of applications and relatively simple from conceptual and implementational viewpoints, there are many practical situations that require nonlinear processing of the signals involved. This article explains adaptive nonlinear filters equipped with polynomial models of nonlinearity. The polynomial systems considered are those nonlinear systems whose output signals can be related to the input signals through a truncated Volterra series expansion, or a recursive nonlinear difference equation. The Volterra series expansion can model a large class of nonlinear systems and is attractive in adaptive filtering applications because the expansion is a linear combination of nonlinear functions of the input signal. The basic ideas behind the development of gradient and recursive least-squares adaptive Volterra filters are first discussed, followed by adaptive algorithms using system models involving recursive nonlinear difference equations. Such systems are attractive because they may be able to approximate many nonlinear systems with great parsimony in the use of coefficients. Also discussed are current research trends and new results and problem areas associated with these nonlinear filters. A lattice structure for polynomial models is also described.

Linear filters have played a crucial role in the development of various signal processing techniques. The obvious advantage of linear filters is their inherent simplicity. Design, analysis, and implementation of such filters are relatively straightforward tasks in many applications. However, there are several situations in which the performance of linear filters is unacceptable. A simple but highly pervasive type of nonlinearity is the saturation-type nonlinearity. Trying to identify these types of systems using linear models can often give misleading results.
Another situation in which nonlinear models do well while linear models fail miserably is that of relating two signals with nonoverlapping spectral components. When confronted with a nonlinear systems problem, many engineers shy away from the situation (in the words of Rugh [Ru81], "hoping that the problem will go away"), mainly because the solutions are often difficult from an analytical and/or computational point of view. Moreover, the rich variety of highly developed tools available for solving linear systems engineering problems is simply not there for most nonlinear systems problems. These difficulties are magnified even further in the case of adaptive nonlinear systems. The purpose of this paper is to give the reader an introduction to adaptive nonlinear systems. Without going into great mathematical detail, this paper will discuss two common models of nonlinearity employed in adaptive filtering applications and some adaptive filter structures that evolve from the use of these models.

(1053-5888/91/0700-0010$1.00 © 1991 IEEE; IEEE SP MAGAZINE; photo credit: FPG)

System analysis using nonlinear structures has several applications. High-speed communications channels often need nonlinear equalizers for acceptable performance. Although channel equalization using linear tap-delay-line structures is adequate in many applications, there are other situations in which such structures will not work at all. For example, Lucky [Lu75] has conjectured that the error-probability performance of data transmission systems operating at rates above 4800 bits/s is limited almost entirely by nonlinear distortion. In telephone transmission, nonlinearities arise principally from inaccuracies in signal companding. In digital satellite links, the satellite amplifiers are usually driven to near the saturation point, where they exhibit highly nonlinear characteristics.
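The remark about signals with nonoverlapping spectral components can be made concrete with a simple assumed example (a single sinusoidal input x[n] = cos(ω0 n) with 0 < ω0 < π/2, and a real, stable linear time-invariant filter with frequency response H(e^{jω})). A linear filter can only scale and phase-shift the sinusoid at ω0. A memoryless quadratic term, by contrast, produces a component at 2ω0:

$$x[n] = \cos(\omega_0 n) \;\Longrightarrow\; \sum_{m} h[m]\,x[n-m] = \left|H(e^{j\omega_0})\right|\cos\!\left(\omega_0 n + \angle H(e^{j\omega_0})\right)$$

$$2x^2[n] - 1 = 2\cos^2(\omega_0 n) - 1 = \cos(2\omega_0 n)$$

Under these assumptions, the signal cos(2ω0 n) is reproduced exactly by the quadratic system 2x^2[n] - 1. Every linear filter output, however, is a sinusoid at ω0, and its time average against cos(2ω0 n) is zero. The best linear estimate is therefore identically zero.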
Several researchers have used the Volterra series representation [Sa83a, Sa83b, Sc80, Sc81, Ru81] of nonlinear systems to implement nonlinear channel equalizers [Be83, Be85, Be87, Bi84a, Fa78]. Other applications of nonlinear models and filtering in communication problems include echo cancellation [Ag82, Ca85, Si84, Sm88, Th71], performance analysis of data transmission systems [Be76, Be79, Ki83, Ma85], adaptive noise cancellation [Co80, St85], and detection of nonlinear functions of Gaussian processes [Ke85]. Nonlinear filters are very useful in modeling biological phenomena [Hu86, Ko86, Ma78], myoelectric signal processing [Ja84], characterization of semiconductor devices [Ja77, Na67, Na70, Pr75, Re84], image processing [Ra87, Ts88], modeling drift oscillations in random seas [Ko83b], and several other areas.

Linear time-invariant systems are completely characterized by their unit impulse response. For arbitrary nonlinear systems, however, no such unified framework exists. Consequently, researchers working on nonlinear filters are forced to restrict themselves to certain, less general, nonlinear system models. Nonlinear filters developed using such models include order statistics filters [Bo83b, Le85, No82], homomorphic filters [Op68], morphological filters [Ma87a, Ma87b], and filters based on Volterra and other polynomial descriptions of the nonlinearities involved. Order statistics filters are attractive because of their robustness and computational simplicity. As the name suggests, they are based on the order statistics of the input signal to the filter, i.e., the location of any given data sample when the samples under consideration are rearranged in ascending or descending order of magnitude. A very widely used order statistic filter is the median filter.
Such filters have good edge-preserving properties and are very useful in removing additive impulse noise from the input signals (in general, noise belonging to long-tailed distributions). They have found applications especially in image processing. Homomorphic filters are among the oldest types of nonlinear filters and have applications in image enhancement, seismic signal processing, and removal of multiplicative noise from input signals. Models of the human visual system based on homomorphic filters have been extensively used in image coding applications [St72]. Morphological filters utilize geometric features of the input signals and are employed in applications involving shape recognition, edge detection, and others. A good description of time-invariant nonlinear filters belonging to all of the above classes may be found in [Pi90a].

In this paper, we will concentrate on polynomial models of nonlinearity. Such models are more general than most of the other models discussed above. Two specific cases will be considered in some detail: adaptive filters employing truncated Volterra series representations of nonlinear systems, and adaptive filters using recursive nonlinear difference equations to relate the input and output signals of the system. It is possible to treat the truncated Volterra series representation as a special case of the recursive nonlinear system representation and to consider a unified framework for polynomial system representations. Even so, we will discuss the two cases separately. The Volterra system model is extremely popular in adaptive nonlinear filtering and has developed an identity of its own in the last few years. The theory of adaptive nonlinear filters employing nonlinear feedback models, on the other hand, is very much in its infancy. Such systems are very attractive from an implementational point of view, but there are several problems for which effective solutions have not yet been found.
Discussing the two cases separately will enable us to treat such problems more effectively. Adaptive order statistic filters are available [Pa88, Pi90b], but we will not discuss them here.

VOLTERRA SERIES EXPANSION FOR NONLINEAR SYSTEMS

Let x[n] and y[n] represent the input and output signals, respectively, of a discrete-time, causal nonlinear system. The Volterra series expansion for y[n] in terms of x[n] is given by [Sa83a, Sa83b, Sc80, Sc81, Ru81]

$$y[n] = h_0 + \sum_{m_1=0}^{\infty} h_1[m_1]\,x[n-m_1] + \sum_{m_1=0}^{\infty}\sum_{m_2=0}^{\infty} h_2[m_1,m_2]\,x[n-m_1]\,x[n-m_2] + \cdots + \sum_{m_1=0}^{\infty}\sum_{m_2=0}^{\infty}\cdots\sum_{m_p=0}^{\infty} h_p[m_1,m_2,\ldots,m_p]\,x[n-m_1]\,x[n-m_2]\cdots x[n-m_p] + \cdots \tag{1}$$

In (1), $h_p[m_1, m_2, \ldots, m_p]$ is known as the p-th order Volterra kernel of the system. Without any loss of generality, one can assume that the Volterra kernels are symmetric. That is, $h_p[m_1, m_2, \ldots, m_p]$ is left unchanged by any of the p! permutations of the indices $m_1, m_2, \ldots, m_p$.

We will not delve deeply into the questions of convergence and uniqueness of Volterra series expansions of nonlinear systems. The interested reader may refer to [Bo85, Br76, Le78, Sa83c]. One can think of the Volterra series expansion as a Taylor series expansion with memory. The limitations of the Volterra series expansion are similar to those of the Taylor series expansion: neither expansion does well when there are discontinuities in the system description. As an example, consider a memoryless nonlinear system described by (see Fig. 1)

$$y[n] = A\,\mathrm{sign}(x[n]) \tag{2}$$

where sign(·) denotes the signum function. There is no convergent Taylor series expansion for the system in (2) about x[n] = 0. It is then straightforward to infer that no convergent Volterra series expansion exists for systems involving this type of nonlinearity.

Fig. 1. Almost all physical systems exhibit some type of saturation effect. The figure shows three types of saturation nonlinearities for memoryless systems. A Taylor series expansion that converges on the entire real axis exists only for the nonlinearity depicted in (b). Fortunately, the saturation effects in a wide class of memoryless physical systems can be modeled adequately using an input-output relationship like the one shown in (b). The limitations and advantages associated with modeling dynamic nonlinear systems using Volterra series expansions in the input signal are similar to those of Taylor series expansions for memoryless nonlinearities. [Figure: panels (a), (b), and (c), each plotting output amplitude (level A marked) against input amplitude.]

Even though they are clearly not applicable in all situations, Volterra system models have been successfully employed in a wide variety of applications, and such models continue to be popular with researchers in this area. Among the early works on nonlinear system analysis is a very important contribution by Wiener [Wi58]. His analysis technique involved white Gaussian input signals and used "G-functionals" to characterize nonlinear system behavior. Following his work, several researchers have employed Volterra series expansions and related representations for estimation and time-invariant nonlinear system identification [Ba63, Ba64, Bo83a, Br70, Ew80, Ey63, Fa80, Ko84a, Ko84b, La81, Th84]. Two recent books [Ru81, Sc80] describe the theory of nonlinear system representation and parameter estimation using Volterra series expansions. The review articles [Bi80, Bi84c, Sc81] also detail some of the work done in (nonadaptive) estimation of nonlinear system parameters using Volterra series representations.

Since an infinite series expansion like (1) is not useful in filtering applications, one must work with truncated Volterra series expansions of the form (see Fig. 2)

$$y[n] = \sum_{m_1=0}^{N-1} h_1[m_1]\,x[n-m_1] + \sum_{m_1=0}^{N-1}\sum_{m_2=0}^{N-1} h_2[m_1,m_2]\,x[n-m_1]\,x[n-m_2] + \cdots + \sum_{m_1=0}^{N-1}\cdots\sum_{m_P=0}^{N-1} h_P[m_1,\ldots,m_P]\,x[n-m_1]\cdots x[n-m_P] \tag{3}$$

($h_0$ can often be estimated outside the basic adaptive filter structure. Therefore, we will, without loss of generality, assume that $h_0 = 0$.) Note that there are $O(N^P)$ coefficients in this polynomial expansion; i.e., the number of coefficients is proportional to $N^P$. One big disadvantage of the Volterra system model in (3) is that the complexity of implementing filters using this model can be very large even for moderately large values of N and P. Consequently, most of the practical applications of systems employing Volterra series expansions involve low-order models. Later on, we will consider a model that is more parsimonious in the number of coefficients.

Fig. 2. A truncated Volterra system of order P = 2 with N - 1 = 2 delay elements. Note that the output of this system is linear in the input signal to each coefficient. This fact greatly simplifies the design problems involving Volterra series representations. On the other hand, even for moderately large values of N and P, the number of coefficients becomes very large. Consequently, the truncated Volterra series representation is most useful in applications where the values of N and P are relatively small.

ADAPTIVE FILTERS USING TRUNCATED VOLTERRA SERIES EXPANSIONS

Figure 3 shows the block diagram of an adaptive Volterra filter. For simplicity, let us consider a second-order (P = 2) Volterra series expansion. The adaptive filter in this case tries to estimate the desired response signal d[n] using a second-order truncated Volterra series expansion in the input signal x[n] as

$$\hat{d}[n] = \sum_{m_1=0}^{N-1} \hat{h}_1[m_1;n]\,x[n-m_1] + \sum_{m_1=0}^{N-1}\sum_{m_2=m_1}^{N-1} \hat{h}_2[m_1,m_2;n]\,x[n-m_1]\,x[n-m_2] \tag{4}$$

(Because the kernel is symmetric, the inner sum in (4) starts at $m_2 = m_1$, so each distinct product appears only once.) $\hat{h}_1[m_1;n]$ and $\hat{h}_2[m_1,m_2;n]$ in (4) are the adaptive filter coefficients. They are iteratively updated at each time so as to minimize some convex function of the error signal, defined as

$$e[n] = d[n] - \hat{d}[n] \tag{5}$$

Fig. 3. A block diagram of the adaptive Volterra filter. d[n] is the desired response signal and x[n] is the input to the adaptive filter. $\hat{d}[n]$ is an estimate of d[n] and is computed as a truncated Volterra series expansion in x[n]. The objective, as in most adaptive filtering problems, is to choose the coefficients of the adaptive filter so that an appropriate convex function of the error signal e[n] is minimized. The adaptation algorithm depends on the choice of this cost function. Among the most commonly used algorithms are the least mean square (LMS) algorithm and its variations. Recursive least squares (RLS) algorithms for adaptive Volterra filtering are at least an order of magnitude more complex than LMS-type algorithms.

The derivation of adaptive Volterra filters is relatively straightforward because the estimate $\hat{d}[n]$, and therefore the error signal e[n], depends linearly on the filter coefficients. In other words, $\hat{d}[n]$ is a linear combination of the input signals to the individual filter coefficients. (In the case of the second-order Volterra filter, the relevant signals are x[n], x[n-1], ..., x[n-N+1], x^2[n], x[n]x[n-1], ..., x[n]x[n-N+1], ..., x^2[n-N+1].) This fact also makes the theoretical performance analysis of such filters a relatively straightforward extension of the linear filtering case.

The LMS adaptive filter [Ha86] updates the coefficients at each time using a steepest-descent-type algorithm that tries to minimize the instantaneous squared error e^2[n]. The update equations for the second-order Volterra filter can easily be shown to be [Co80, Ko85]

$$\hat{h}_1[m_1;n+1] = \hat{h}_1[m_1;n] - \frac{\mu_1}{2}\,\frac{\partial e^2[n]}{\partial \hat{h}_1[m_1;n]} = \hat{h}_1[m_1;n] + \mu_1\,e[n]\,x[n-m_1] \tag{6}$$

and

$$\hat{h}_2[m_1,m_2;n+1] = \hat{h}_2[m_1,m_2;n] - \frac{\mu_2}{2}\,\frac{\partial e^2[n]}{\partial \hat{h}_2[m_1,m_2;n]} = \hat{h}_2[m_1,m_2;n] + \mu_2\,e[n]\,x[n-m_1]\,x[n-m_2] \tag{7}$$

where $\mu_1$ and $\mu_2$ are small positive constants that control the speed of convergence and the steady-state and tracking properties of the filter. (The second equality in each line follows from (4) and (5); for example, $\partial e^2[n]/\partial \hat{h}_1[m_1;n] = -2e[n]\,x[n-m_1]$.) For more general cases, similar update equations can easily be derived. Several variations of the LMS algorithm are also available. Adaptive Volterra filters with time-varying convergence parameters are presented in [Si87].
The adaptation algorithm employed in these filters is a variation of the "sign algorithm" [Cl81, Ma87c], which is simpler to implement than the LMS algorithm. Adaptive Volterra filters based on distributed arithmetic implementations are presented in [Si86, Sm88]. A gradient adaptive quadratic filtering algorithm (only second-order coefficients are used here) employing an LU decomposition of the quadratic coefficient matrix is discussed in [Lo88]. That paper also discusses VLSI implementations of adaptive Volterra filters.

For notational simplicity as well as ease of performance analysis, it is usual to rewrite the adaptive filtering algorithm using vector notation. The relevant equations are shown in Table I. Note that the structure of the adaptive filter differs from that of the linear case only in the way in which the vectors are defined.

TABLE I
THE LMS SECOND-ORDER VOLTERRA FILTER
Coefficient vector: $H[n] = [\hat h_1[0;n], \hat h_1[1;n], \ldots, \hat h_1[N-1;n], \hat h_2[0,0;n], \hat h_2[0,1;n], \ldots, \hat h_2[0,N-1;n], \hat h_2[1,1;n], \ldots, \hat h_2[N-1,N-1;n]]^T$
Input vector: $X[n] = [x[n], x[n-1], \ldots, x[n-N+1], x^2[n], x[n]x[n-1], \ldots, x[n]x[n-N+1], x^2[n-1], \ldots, x^2[n-N+1]]^T$
Initialization: H[0] can be arbitrarily chosen.
Algorithm:
$e[n] = d[n] - H^T[n]X[n]$
$H[n+1] = H[n] + \mu\,X[n]\,e[n]$
Note: $(\cdot)^T$ denotes matrix transpose. $\mu$ is a diagonal matrix with $\mu_1$ appearing in the first N diagonal entries and $\mu_2$ appearing in the rest of the diagonal entries.

Under some simplifying assumptions, it is relatively straightforward to show that the mean values of the coefficients converge (for stationary environments) to their optimal values if the convergence constants are chosen such that $0 < \mu_1, \mu_2 < 2/\lambda_{\max}$, where $\lambda_{\max}$ is the maximum eigenvalue of the autocorrelation matrix of the input vector X[n]. The problem, as in the linear case, is that the eigenvalues of the autocorrelation matrix control the speed of convergence. In general, the larger the eigenvalue spread (the ratio of the maximum and minimum eigenvalues), the slower the convergence. This is particularly troublesome in the nonlinear filtering case, since the eigenvalue spreads are in general very large. Even when the input signal is white, the presence of the nonlinear entries in the input vector will cause the eigenvalue spread to be more than one. For example, for a white, zero-mean Gaussian input with unit variance, $E\{x^4[n]\} = 3$ while $E\{x^2[n]\,x^2[n-1]\} = 1$, and the eigenvalues of the autocorrelation matrix of X[n] range from 1 to N + 2. Consequently, it is important to seek alternate algorithms and structures whose convergence behavior is independent of, or less dependent on, the statistics of the input signal. One approach is to use recursive least-squares (RLS) algorithms in place of the LMS adaptive filter. Another alternative is to use lattice (or other orthogonalized) structures to implement the nonlinear filters. We will very briefly discuss the ideas behind RLS adaptive Volterra filters next.

The LMS adaptive filter can be considered an approximate solution to the statistical optimization problem that tries to minimize the mean-squared value of the estimation error at each time. RLS adaptive filters, on the other hand, yield the exact solution to an optimization problem formulated in a deterministic fashion. One such formulation gives rise to the exponentially weighted RLS adaptive filter. In the case of the second-order Volterra filter, such adaptive systems minimize the following cost function at each time n:

$$J[n] = \sum_{k=0}^{n} \lambda^{n-k}\left(d[k] - H^T[n]\,X[k]\right)^2 \tag{8}$$

where H[n] and X[n] are the coefficient and input signal vectors, respectively, as defined in Table I, and λ (0 < λ < 1) is a factor that controls the memory span of the adaptive filter. The solution to this problem at each time can easily be found by differentiating J[n] with respect to H[n], setting the derivative to zero, and solving for H[n]. The optimal solution at time n is given by

$$H[n] = C^{-1}[n]\,P[n] \tag{9}$$

where

$$C[n] = \sum_{k=0}^{n} \lambda^{n-k}\,X[k]\,X^T[k] \tag{10}$$

and

$$P[n] = \sum_{k=0}^{n} \lambda^{n-k}\,d[k]\,X[k] \tag{11}$$

H[n] can be recursively updated by realizing that

$$C[n] = \lambda\,C[n-1] + X[n]\,X^T[n] \tag{12}$$

and

$$P[n] = \lambda\,P[n-1] + d[n]\,X[n] \tag{13}$$

One can reduce the computational complexity a little by making use of the matrix inversion lemma for inverting C[n]. This results in the algorithm given in Table II. The derivation is similar to that for the RLS linear adaptive filter given in [Ha86, Chapter 8].

TABLE II
THE RLS ADAPTIVE SECOND-ORDER VOLTERRA FILTER
Coefficient vector: $H[n] = [\hat h_1[0;n], \hat h_1[1;n], \ldots, \hat h_1[N-1;n], \hat h_2[0,0;n], \hat h_2[0,1;n], \ldots, \hat h_2[0,N-1;n], \hat h_2[1,1;n], \ldots, \hat h_2[N-1,N-1;n]]^T$
Input vector: $X[n] = [x[n], x[n-1], \ldots, x[n-N+1], x^2[n], x[n]x[n-1], \ldots, x[n]x[n-N+1], x^2[n-1], \ldots, x^2[n-N+1]]^T$
Initialization: $H[0] = [0, 0, \ldots, 0]^T$; $C^{-1}[0] = \delta^{-1} I$, where δ is a small positive constant
Algorithm:
$k[n] = \dfrac{\lambda^{-1}\,C^{-1}[n-1]\,X[n]}{1 + \lambda^{-1}\,X^T[n]\,C^{-1}[n-1]\,X[n]}$
$\varepsilon[n] = d[n] - H^T[n-1]\,X[n]$
$H[n] = H[n-1] + k[n]\,\varepsilon[n]$
$C^{-1}[n] = \lambda^{-1}\,C^{-1}[n-1] - \lambda^{-1}\,k[n]\,X^T[n]\,C^{-1}[n-1]$
$e[n] = d[n] - H^T[n]\,X[n]$

We now present the results of an experiment that compares the performance of the two algorithms. In the experiment, both the LMS and RLS algorithms were used to identify an unknown, time-invariant second-order Volterra system from measurements of its input signal and a noisy version of its output. The memory span of the unknown system was four samples long (i.e., N = 4), and the coefficients were given by

$$[h_1[0], h_1[1], h_1[2], h_1[3]] = [-0.78, -1.48, -1.39, 0.04]$$

and

$$[h_2[0,0], h_2[0,1], h_2[0,2], h_2[0,3], h_2[1,1], h_2[1,2], h_2[1,3], h_2[2,2], h_2[2,3], h_2[3,3]] = [0.54, 3.72, 1.86, -0.76, -1.62, 0.76, -0.12, 1.41, -1.52, -0.13] \tag{14}$$

respectively. The input signal x[n] was obtained by processing a zero-mean Gaussian signal with a linear filter whose impulse response sequence is given by

$$h_n = \begin{cases} 0.25, & n = 0 \\ 1.0, & n = 1 \\ 0.25, & n = 2 \\ 0.0, & \text{otherwise} \end{cases} \tag{15}$$

The input signal variance was selected so that the power of the corresponding output of the unknown Volterra system was about 1. The desired response signal d[n] was obtained by adding a zero-mean, white Gaussian sequence (uncorrelated with x[n]) to the output of the unknown system. The output signal to measurement noise ratio was chosen to be approximately 30 dB. The forgetting factor λ was chosen to be 0.995. The step sizes $\mu_1$ and $\mu_2$ of the LMS filter were chosen so that the steady-state excess mean-squared estimation errors of the LMS and RLS algorithms were about the same. Fifty different experiments were conducted using 20,000 data samples each. The data used in each of these experiments were uncorrelated with those used in the other 49 experiments. The results presented in Fig. 4 have been averaged over the 50 independent experiments.

Figure 4 displays a measure of the mean-squared deviations of the adaptive filter coefficients from the coefficients of the unknown system for the first 10,000 time samples. The linear and quadratic coefficients are considered separately. The measures displayed in the figure are defined as

$$\|V_L[n]\| = 10\log_{10}\frac{\sum_{i=0}^{N-1}\left(\hat h_1[i;n] - h_1[i]\right)^2}{\sum_{i=0}^{N-1}\left(h_1[i]\right)^2} \tag{16}$$

and

$$\|V_Q[n]\| = 10\log_{10}\frac{\sum_{i=0}^{N-1}\sum_{j=i}^{N-1}\left(\hat h_2[i,j;n] - h_2[i,j]\right)^2}{\sum_{i=0}^{N-1}\sum_{j=i}^{N-1}\left(h_2[i,j]\right)^2} \tag{17}$$

for the linear and quadratic coefficients, respectively.

Fig. 4. The two curves in each figure compare the speed of convergence of the RLS and LMS adaptive Volterra filters. The performance measure is defined in the text. The plots on the top compare the performance of the linear coefficients, and those at the bottom compare the performance of the quadratic coefficients of the adaptive filters. The parameters μ and λ of the filters were selected such that the curves eventually meet (i.e., the steady-state performances of the two systems are similar). The superior convergence behavior of the RLS algorithm in this example is obvious. However, this improved performance comes at the cost of a substantial increase in the computational complexity. [Plots: two panels (top: linear coefficients; bottom: quadratic coefficients), LMS and RLS curves versus number of iterations.]

These results demonstrate that the RLS algorithm clearly outperforms the LMS adaptive filter in terms of speed of convergence. The experiments were repeated for a white Gaussian input signal as well as for white and colored signals generated from a uniformly distributed random process. The results were similar to those shown in Fig. 4. The results of the performance comparison in this example are typical of the behavior of the two systems. This statement is especially true when the signal-to-measurement noise ratio of the input signals is large.

Fig. 5. Block diagram of a simple bilinear system. This system is representative of more general nonlinear systems that are described using recursive nonlinear difference equations. The key advantage of such systems is that it is possible to represent many nonlinear systems with relatively few coefficients when compared with Volterra system representations. The obvious disadvantage of these representations is that adaptive systems using these models must be continuously monitored for stability. Another disadvantage, not shared by recursive linear systems, is that any noise in the input signals to the adaptive filter will appear in the system model in a multiplicative fashion, and this will affect the performance of the adaptive systems.

An operations count will show that the LMS algorithm has a computational complexity proportional to $N^2$ ($O(N^2)$) multiplications per time instant, whereas the complexity of the RLS algorithm is $O(N^4)$ multiplications per time instant. The reason is that the coefficient vector of the second-order filter has N + N(N+1)/2 entries. The LMS update costs a number of multiplications proportional to this length, while the RLS update of the matrix $C^{-1}[n]$ costs a number proportional to its square.
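The NumPy sketch below implements the LMS recursion of Table I and the RLS recursion of Table II side by side, so the two update rules can be compared directly. It runs them on a small, hypothetical identification problem: memory N = 3, white input uniform on [-1, 1], and coefficient values invented for illustration, not those of (14). The helper names are also hypothetical. The default δ = 0.01 and the step sizes μ1 = μ2 = 0.05 are assumed values. λ = 0.995 matches the experiment above. The coefficient count uses the number of distinct coefficients of a symmetric truncated Volterra filter, which for P = 2 equals N + N(N+1)/2.

```python
import math

import numpy as np


def num_volterra_coeffs(N, P):
    """Distinct coefficients of a symmetric P-th order truncated Volterra filter with memory N."""
    return sum(math.comb(N + p - 1, p) for p in range(1, P + 1))


def volterra2_regressor(x_recent):
    """Input vector X[n] of Tables I and II.

    x_recent = [x[n], x[n-1], ..., x[n-N+1]]; the quadratic part is ordered
    x^2[n], x[n]x[n-1], ..., x[n]x[n-N+1], x^2[n-1], ..., x^2[n-N+1].
    """
    N = len(x_recent)
    quad = [x_recent[i] * x_recent[j] for i in range(N) for j in range(i, N)]
    return np.concatenate([x_recent, quad])


def lms_volterra2(x, d, N, mu1, mu2):
    """Table I: H[n+1] = H[n] + mu X[n] e[n], mu = diag(mu1 (first N), mu2 (rest))."""
    L = num_volterra_coeffs(N, 2)
    mu = np.concatenate([np.full(N, mu1), np.full(L - N, mu2)])
    H = np.zeros(L)
    for n in range(N - 1, len(x)):              # start once x[n-N+1] exists
        X = volterra2_regressor(x[n - np.arange(N)])
        e = d[n] - H @ X                        # e[n] = d[n] - H^T[n] X[n]
        H = H + mu * X * e
    return H


def rls_volterra2(x, d, N, lam=0.995, delta=0.01):
    """Table II: exponentially weighted RLS with C^{-1}[0] = I / delta."""
    L = num_volterra_coeffs(N, 2)
    H = np.zeros(L)
    Cinv = np.eye(L) / delta
    for n in range(N - 1, len(x)):
        X = volterra2_regressor(x[n - np.arange(N)])
        CX = Cinv @ X
        k = CX / (lam + X @ CX)                 # gain k[n]: Table II form, scaled by lam
        eps = d[n] - H @ X                      # a priori error, uses H[n-1]
        H = H + k * eps
        Cinv = (Cinv - np.outer(k, CX)) / lam   # C^{-1} is symmetric, so X^T C^{-1} = CX^T
    return H


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    N = 3
    # Hypothetical "unknown" system: 3 linear + 6 quadratic coefficients.
    H_true = np.array([0.5, -0.3, 0.2, 0.4, 0.1, -0.2, 0.3, 0.05, -0.1])
    x = rng.uniform(-1.0, 1.0, 20000)
    d = np.zeros_like(x)
    for n in range(N - 1, len(x)):
        d[n] = H_true @ volterra2_regressor(x[n - np.arange(N)])
    d += 0.01 * rng.standard_normal(len(x))    # measurement noise (assumed level)
    print("LMS:", np.round(lms_volterra2(x, d, N, 0.05, 0.05), 2))
    print("RLS:", np.round(rls_volterra2(x, d, N), 2))
```

For N = 4 and P = 2, num_volterra_coeffs gives 14 coefficients. The LMS loop touches each of them once per sample. The RLS loop updates the full 14 × 14 matrix C^{-1}[n] at every sample, which is where the difference in complexity discussed above comes from.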
The price paid for the better performance, in terms of increased computational complexity, is exorbitant in many applications. Fast algorithms that reduce the computational complexity considerably can be derived by making use of the fact that most of the elements of the data vectors X[n] and X[n-1] are the same [Ci84, Lj78]. A particularly easy-to-understand exposition of the ideas involved in the derivation of fast algorithms for the linear filtering case is given in [Al86]. Such an algorithm, requiring $O(N^3)$ multiplications per time instant for second-order Volterra filtering, has been developed in [Le91, Ma88]. Since this method is a more efficient realization of the algorithm in Table II, it exhibits better convergence and tracking properties than the LMS Volterra filters. It also seems to be more robust to statistical variations of the input signals. However, note that its computational complexity is still considerably greater than the $O(N^2)$ complexity of LMS adaptive filters. A computationally simpler, approximate RLS adaptive solution has been developed in [Da87]. However, the approximations assume that the input signal to the adaptive filter is Gaussian, and the system performance breaks down when the input sequence belongs to non-Gaussian distributions. A significant problem with the methods in [Da87, Ma88] is the very poor numerical behavior exhibited by the "fast" RLS algorithms.

ADAPTIVE FILTERS USING RECURSIVE NONLINEAR DIFFERENCE EQUATIONS

The major problem associated with the Volterra series representation of nonlinear systems is that a very large number of coefficients is required to characterize many nonlinear processes. Consequently, it is important to search for alternate representations that may be more parsimonious in their use of coefficients.
One such model is that in which the input-output relationship is governed by a recursive nonlinear difference equation of the type

$$ y[n] = \sum_{i=1}^{P} \mathcal{P}_i\big(y[n-1], y[n-2], \ldots, y[n-N+1], x[n], x[n-1], \ldots, x[n-N+1]\big) \qquad (18) $$

where $\mathcal{P}_i(\cdot, \ldots, \cdot)$ is an i-th order polynomial in the quantities within the parentheses. Just as linear IIR filters can represent many systems with far fewer coefficients than their FIR counterparts, system representations using recursive nonlinear difference equations can model many nonlinear systems with much more parsimony than Volterra series representations [Bi84b, Di88]. Perhaps the simplest of the nonlinear systems in this category is the bilinear system, whose input-output relationship is given by

$$ y[n] = \sum_{i=1}^{N-1} c_i\, y[n-i] + \sum_{i=0}^{N-1} \sum_{j=1}^{N-1} b_{ij}\, y[n-j]\, x[n-i] + \sum_{i=0}^{N-1} a_i\, x[n-i] \qquad (19) $$

In spite of its simplicity, this is an important nonlinear model. It can be shown under relatively mild conditions that a large class of nonlinear systems, including Volterra systems, can be approximated with arbitrary precision by bilinear system models with a finite number of coefficients (see [Br76, Mo80] and the references in these papers for details). Furthermore, most of the ideas discussed here for bilinear systems can easily be extended to the more general recursive nonlinear system models. The block diagram of a bilinear system for the case N = 3 is shown in Fig. 5. Several properties of bilinear time series are discussed in [Su81]. A survey of the applications, control, and identification of bilinear systems can be found in [Mo80]. Another work that extensively discusses the properties of bilinear systems is [Br74].

As for linear IIR adaptive filters, there are two different approaches to solving adaptive filtering problems with recursive nonlinear system models: the equation-error and output-error approaches. The basic ideas behind these two approaches are depicted in Fig. 6. The interested reader may refer to the tutorial article [Sh89] by Shynk on adaptive IIR filters for more details regarding these two approaches.

[Fig. 6: block diagram. The input x[n] drives the unknown bilinear system (output d'[n]) and the adaptive bilinear system. Measurement noise added to d'[n] gives d[n], and e[n] = d[n] - $\hat{d}[n]$. The adaptive system computes $\hat{d}[n] = \sum_{i=1}^{N-1} c_i[n]\,y[n-i] + \sum_{i=0}^{N-1}\sum_{j=1}^{N-1} b_{ij}[n]\,y[n-j]\,x[n-i] + \sum_{i=0}^{N-1} a_i[n]\,x[n-i]$, with y[n] = d[n] or $\hat{d}[n]$.]

Fig. 6. This figure explains the differences between the equation-error and output-error approaches to adaptive bilinear filtering in the context of a system identification problem. Equation-error algorithms use d[n] and x[n] as the inputs to the adaptive system to obtain the system output $\hat{d}[n]$. Since the statistics of d[n] are in general different from those of the "true" output d'[n] of the unknown system, the estimates of the unknown coefficients will in general be biased. Output-error algorithms use past samples of $\hat{d}[n]$ to obtain $\hat{d}[n]$. Since $\hat{d}[n]$ is an estimate of d'[n], the statistics of $\hat{d}[n]$ will hopefully be close to those of d'[n] (at least after adaptation has taken place). We can therefore expect to get unbiased (or at least nearly unbiased) estimates of the coefficients. The relative merits of the two approaches are briefly discussed in the text. More details can be found in [Sh89].

IEEE SP MAGAZINE JULY 1991

[Fig. 7: average coefficient value versus NUMBER OF ITERATIONS; top, equation-error algorithm; bottom, output-error algorithm.]

Fig. 7. Equation-error adaptive nonlinear filters share one disadvantage with their linear counterparts: they produce biased estimates in the presence of measurement noise (compared with output-error adaptive algorithms). The plots on the top show the average behavior of one of the coefficients of the adaptive filter under three different noise conditions. Note that they converge to wrong values (the correct value of the coefficient of the unknown system is 1). The plots at the bottom were obtained using an output-error algorithm, and the curves do converge to the correct values in this example. However, the error surface of this system may have local minima, and differently initialized systems can converge to wrong coefficient values.

Equation-error algorithms are straightforward to develop, and the mean-squared estimation error surface has a unique minimum. However, this minimum may not be at the correct solution to the problem if there is noise present in the desired response signal. Furthermore, there is no guarantee that the adaptive filter will be stable at all times (including at convergence). The basic idea is to simply use samples of the input signal x[n] and the desired response signal d[n] to obtain the adaptive filter estimate as

$$ \hat{d}[n] = \sum_{i=1}^{N-1} c_i[n]\, d[n-i] + \sum_{i=0}^{N-1} \sum_{j=1}^{N-1} b_{ij}[n]\, d[n-j]\, x[n-i] + \sum_{i=0}^{N-1} a_i[n]\, x[n-i] \qquad (20) $$

where $c_i[n]$, $b_{ij}[n]$, and $a_i[n]$ are the adaptive filter coefficients at time n. The adaptive filter coefficients can be updated using a gradient algorithm, an RLS solution, or some other appropriate technique. The gradient update equations (which can be derived as in (6) and (7)) for minimizing the mean-squared estimation error $E\{(d[n] - \hat{d}[n])^2\}$ are

$$ c_i[n+1] = c_i[n] + \mu_c\, d[n-i]\, e[n] \qquad (21) $$

$$ b_{ij}[n+1] = b_{ij}[n] + \mu_b\, d[n-j]\, x[n-i]\, e[n] \qquad (22) $$

$$ a_i[n+1] = a_i[n] + \mu_a\, x[n-i]\, e[n] \qquad (23) $$

where

$$ e[n] = d[n] - \hat{d}[n] \qquad (24) $$

is the estimation error at time n, and $\mu_a$, $\mu_b$, and $\mu_c$ are constants that control the rate at which the adaptive filter converges. Note that if d[n] contains noise, the statistics of the input signal to the adaptive filter will differ from those of the "ideal" desired response signal, and this will result in biased estimates. In fact, additive measurement noise in the input signals to the adaptive filter complicates the problem considerably more than it does the linear IIR filtering problem.
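The following minimal Python sketch applies (20)-(24) sample by sample to make the equation-error recursion concrete. It illustrates the technique under stated assumptions and is not the article's implementation. The class name `EquationErrorBilinearLMS` is hypothetical, as are the coefficient stacking (the order later used in (25)), the noise-free test system, and the step sizes.

```python
import random


class EquationErrorBilinearLMS:
    """Hypothetical sketch of the equation-error bilinear LMS filter, eqs. (20)-(24).

    Coefficients are stacked as [c_1..c_{N-1}, b_{0,1}..b_{N-1,N-1}, a_0..a_{N-1}],
    the same ordering as the coefficient vector of eq. (25).
    """

    def __init__(self, N, mu_c, mu_b, mu_a):
        self.N = N
        n_c, n_b, n_a = N - 1, N * (N - 1), N
        self.theta = [0.0] * (n_c + n_b + n_a)
        self.mu = [mu_c] * n_c + [mu_b] * n_b + [mu_a] * n_a
        self.d_hist = [0.0] * N  # d_hist[k] holds d[n-k] (measured desired response)
        self.x_hist = [0.0] * N  # x_hist[k] holds x[n-k]

    def _regressor(self):
        d, x, N = self.d_hist, self.x_hist, self.N
        u = [d[i] for i in range(1, N)]                            # terms multiplying c_i
        u += [d[j] * x[i] for i in range(N) for j in range(1, N)]  # terms multiplying b_ij
        u += [x[i] for i in range(N)]                              # terms multiplying a_i
        return u

    def step(self, x_n, d_n):
        """Process one sample pair and return (d_hat[n], e[n])."""
        self.x_hist = [x_n] + self.x_hist[:-1]
        self.d_hist = [d_n] + self.d_hist[:-1]  # d_hist[0] = d[n] is not used in the regressor
        u = self._regressor()
        d_hat = sum(t * ui for t, ui in zip(self.theta, u))  # eq. (20)
        e = d_n - d_hat                                      # eq. (24)
        # Eqs. (21)-(23): one gradient step, with a separate step size per coefficient group.
        self.theta = [t + m * ui * e for t, m, ui in zip(self.theta, self.mu, u)]
        return d_hat, e


# Usage: noise-free identification of d[n] = 0.5 d[n-1] + 0.5 x[n-1] with N = 2
# (hypothetical test system; coefficient order is c_1, b_01, b_11, a_0, a_1).
rng = random.Random(0)
filt = EquationErrorBilinearLMS(N=2, mu_c=0.2, mu_b=0.2, mu_a=0.2)
d_prev = x_prev = 0.0
for _ in range(5000):
    x_n = rng.uniform(-1.0, 1.0)
    d_n = 0.5 * d_prev + 0.5 * x_prev
    filt.step(x_n, d_n)
    d_prev, x_prev = d_n, x_n
print([round(t, 3) for t in filt.theta])
```

Because the measured d[n] enters the regressor directly, including the products d[n-j]x[n-i], any noise in d[n] also enters the regressor. That is the source of the bias discussed above. The noise-free usage case therefore converges to the true coefficients.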
Because of the product terms x[n-i]d[n-j] in the computation of the output of the adaptive bilinear system (see equation (20)), multiplicative noise components will be present at the output. This situation is quite different from that of the linear IIR filtering problem. In spite of this, the above approach is attractive because of its simplicity and is very useful in low-noise environments.

As explained in Fig. 6, output-error methods feed back the output of the adaptive system to estimate the current sample of the desired response signal. Many of the gradient adaptive output-error algorithms described in [Sh89] can be extended to the nonlinear estimation problem. Active research is currently under way to understand the properties of such systems through empirical and theoretical analyses, but no published results are yet available on these types of adaptive nonlinear filters. Perhaps the simplest among the various methods available in the literature is the suboptimal least-squares method presented by Billings and Voon [Bi84b] (this method has also been referred to as the extended least-squares algorithm in [Fn87]). Moore [Mo82] has established convergence results for this algorithm when applied to linear estimation problems, and this analysis seems to carry over to the nonlinear case. The suboptimal least-squares algorithm is perhaps easiest to explain using vector notation. Let

$$ H[n] = \big[c_1[n],\, c_2[n],\, \ldots,\, c_{N-1}[n],\, b_{0,1}[n],\, \ldots,\, b_{N-1,N-1}[n],\, a_0[n],\, \ldots,\, a_{N-1}[n]\big]^T \qquad (25) $$

and

$$ X[n] = \big[\hat{d}_{n-1}[n-1],\, \hat{d}_{n-2}[n-2],\, \ldots,\, \hat{d}_{n-N+1}[n-N+1],\, x[n]\,\hat{d}_{n-1}[n-1],\, \ldots,\, x[n-N+1]\,\hat{d}_{n-N+1}[n-N+1],\, x[n],\, \ldots,\, x[n-N+1]\big]^T \qquad (26) $$

denote the adaptive filter coefficient vector and the input vector to the adaptive filter, respectively. Here $\hat{d}_k[l]$ denotes the estimate of the desired response signal at time l made using the adaptive filter coefficients at time k.
Then, the adaptive filter coefficient vector at time n is obtained as the solution that minimizes

$$ J[n] = \sum_{k=0}^{n} \lambda^{n-k} \big(d[k] - H^T[n]\, X[k]\big)^2 \qquad (27) $$

The adaptive filter output at time n is given by

$$ \hat{d}_n[n] = H^T[n]\, X[n] \qquad (28) $$

and H[n] is estimated as

$$ H[n] = C^{-1}[n]\, P[n] \qquad (29) $$

where C[n] and P[n] are as defined in equations (10) and (11), and X[n] is as defined in equation (26). As discussed before, one can use the matrix inversion lemma to obtain a more computationally efficient solution to the problem. The formulation and solution above look very similar to the RLS Volterra filtering problem discussed earlier. Even so, equations (28) and (29) do not represent an exact least-squares solution, in the following sense. The exact least-squares minimization problem in equation (8) requires that the cost function J[n] be defined using the estimation error values

$$ e_n[k] = d[k] - H^T[n]\, X[k] \qquad (30) $$

computed at time k using the solution H[n] to be obtained at the current time. The problem is therefore formulated as if we were finding an entirely different solution at each time (even though it is possible to update the coefficients on the basis of previous solutions). In the problem described in equations (25)-(29), the quantities $\hat{d}_{k-1}[k-1], \hat{d}_{k-2}[k-2], \ldots, \hat{d}_{k-N+1}[k-N+1]$ that appear in the input vector X[k] are computed at times k-1, k-2, ..., k-N+1, respectively. Consequently, the coefficient solution at time n does depend on the previous solutions (at least implicitly), and the solution is not an exact least-squares solution. Although the method is clearly suboptimal in this sense, experimental results presented in [Bi84b, Fn87] seem to indicate that it performs very well. [Bi84b] also discusses two other, somewhat more complicated algorithms for output-error adaptive filters for nonlinear systems described by recursive nonlinear difference equations. Several variations of the ideas discussed above have been presented in [Da89, Fn87, Ga89].
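The suboptimal least-squares recursion can be sketched for the three-parameter model (31) used in the simulation example that follows. The function `rls_bilinear` and its parameters `lam`, `delta`, `output_error`, and `H0` are hypothetical names. In the sketch, the inverse of C[n] in (29) is propagated with the matrix inversion lemma, and the a posteriori output of (28) is fed back into the regressor as in (26). Setting `output_error=False` builds the regressor from past samples of d[n] instead, which gives the equation-error counterpart. The initialization P = I/δ is a common RLS convention that the article does not specify.

```python
import random


def rls_bilinear(x, d, lam=0.995, delta=1e-3, output_error=True, H0=None):
    """Hypothetical sketch: exponentially weighted RLS for the model of eq. (31),
        d_hat[n] = a*y[n-1] + b*y[n-2]*x[n-1] + c*x[n-1],
    where y holds past a posteriori outputs d_hat (suboptimal LS / output error)
    or past desired samples d (equation error). Returns H = [a, b, c] at every step.
    """
    M = 3
    H = list(H0) if H0 is not None else [0.0] * M
    # P plays the role of C^{-1}[n] in eq. (29); initialized to (1/delta) I.
    P = [[(1.0 / delta if i == j else 0.0) for j in range(M)] for i in range(M)]
    y1 = y2 = x1 = 0.0  # y[n-1], y[n-2], x[n-1]
    history = []
    for xn, dn in zip(x, d):
        phi = [y1, y2 * x1, x1]                                   # regressor, cf. eq. (26)
        Pphi = [sum(P[i][j] * phi[j] for j in range(M)) for i in range(M)]
        denom = lam + sum(p * q for p, q in zip(phi, Pphi))
        k = [v / denom for v in Pphi]                             # RLS gain vector
        e = dn - sum(h * p for h, p in zip(H, phi))               # a priori error
        H = [h + ki * e for h, ki in zip(H, k)]
        # Matrix inversion lemma: P <- (P - k phi^T P) / lam  (P is symmetric).
        P = [[(P[i][j] - k[i] * Pphi[j]) / lam for j in range(M)] for i in range(M)]
        d_hat = sum(h * p for h, p in zip(H, phi))                # eq. (28), a posteriori
        y2, y1 = y1, (d_hat if output_error else dn)
        x1 = xn
        history.append(list(H))
    return history


def bilinear_system(x, a, b, c):
    """Noise-free output of eq. (31) with zero initial conditions."""
    out, y1, y2, x1 = [], 0.0, 0.0, 0.0
    for xn in x:
        yn = a * y1 + b * (y2 * x1) + c * x1
        out.append(yn)
        y2, y1, x1 = y1, yn, xn
    return out


# Usage with hypothetical, stable parameters (not the article's a = 1, b = -0.7, c = 0.5).
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
theta_true = [0.5, 0.2, 1.0]
d = bilinear_system(x, *theta_true)
H_ee = rls_bilinear(x, d, output_error=False)[-1]              # equation error, zero start
H_oe = rls_bilinear(x, d, output_error=True, H0=theta_true)[-1]  # output error, true start
print(H_ee, H_oe)
```

In this noise-free check, the equation-error run recovers the hypothetical parameters almost exactly, and the output-error run started at the true parameters stays there. Neither check reproduces the noisy experiments of Figs. 7 and 8, which require measurement noise and ensemble averaging.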
The advantage of output-error algorithms over equation-error algorithms is clear: the former may be less sensitive than the latter to additive noise components in the desired response signal. However, the error surface may have local minima (see [Sh89] for illustrations of this idea), and the adaptive filter may not converge to the global minimum unless it is initialized properly. Research aimed at a better understanding of the properties of output-error adaptive nonlinear filters, and at designing better and more efficient ones, is currently under way.

We will now present a simulation example demonstrating some of the ideas discussed above. Once again, we consider the identification of an unknown system. The system to be identified is bilinear, with the input-output relationship

$$ y[n] = a\, y[n-1] + b\, y[n-2]\, x[n-1] + c\, x[n-1] \qquad (31) $$

where a = 1, b = -0.7, and c = 0.5. This system is the same as the one used in [Fn87]. The input signal x[n] to the adaptive filter was a zero-mean, white Gaussian sequence with variance 0.05. (The reason for selecting this low input signal variance will become clear a little later.) The desired response signal was obtained by corrupting the output of the unknown system with additive white noise that is uncorrelated with the input signal. The adaptive filter used the same model as equation (31) (i.e., the only unknown quantities are a, b, and c). The results presented are ensemble averages over fifty independent experiments.

Figures 7a and 7b display the average behavior of the adaptive filter coefficient corresponding to the unknown parameter a in equation (31). They were obtained using the equation-error and output-error methods, respectively, for different measurement noise levels. Both algorithms used λ = 0.995, and all the coefficients were initialized to zero. Notice that, in accordance with our earlier discussion, the equation-error algorithm converges to the wrong solution at high noise levels. The output-error algorithm, on the other hand, converges to the correct solution for all noise levels in this experiment. It may be possible to artificially create situations in which the output-error filter converges to a local minimum, but the algorithm exhibited very good behavior in all our experiments. However, much work needs to be done before we can claim a complete understanding of the properties of such filters.

Just as adaptive IIR (linear) filters have many problems that are not shared by their FIR counterparts, adaptive nonlinear systems using recursive nonlinear difference equations have many problems that are not shared by adaptive Volterra filters. The most important is stability. Such algorithms must either be guaranteed to be stable at all times, or be monitored continuously for stability. If they are found to be unstable at any time, the coefficients must be modified so that the resulting filter is stable. The problems associated with stability are much more severe for nonlinear systems than for linear systems. To see this, consider the bilinear system used in the experiments. Suppose for the time being that b = 0. Then, the system

$$ y[n] = y[n-1] + 0.5\, x[n-1] \qquad (32) $$

is only marginally stable. When b is nonzero, one would therefore expect a very large class of input signals to make the system unstable. This is true of nonlinear feedback systems in general: one can almost always find bounded input signals that drive the system to instability. (The notion of input-dependent stability may offend many purists. Most feedback nonlinear systems are unstable in the general sense. Even so, we can often define classes of input signals for which such systems provide useful outputs, or model signals and other real-world systems with good accuracy. Consequently, it is not at all unusual to talk about input-dependent stability in the context of recursive nonlinear systems.)

[Fig. 8: average coefficient values versus NUMBER OF ITERATIONS for the output-error filter driven by the amplified input.]

Fig. 8. The stability problems of recursive nonlinear systems are substantially larger than those of recursive linear systems. Often, the stability of such systems depends on the input signal. Many such systems can be driven to instability simply by magnifying their input signal. (This can never happen with linear systems.) This figure illustrates one such example. The plots show the behavior of the output-error filter when the input signal is an amplified version of the one used to obtain Fig. 7b. One explanation for the erratic behavior of the coefficients is that the underlying unknown system is unstable for the present input, so the dynamic range of the output signal is extremely large. Consequently, the adaptive filter has a very hard time tracking the coefficients properly.

The above problem causes great difficulty in the design and analysis of adaptive feedback nonlinear systems. To illustrate it further, consider the experimental setup described earlier, except that the input signal variance is 1.0 instead of 0.05. Figure 8 displays the average behavior of the three coefficients of the adaptive filter. Note that the coefficient behavior has become very erratic. This is caused at least in part by the fact that both the underlying unknown system and the system the adaptive filter has identified are unstable for the given input signal.
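One informal way to see why the stability of (31) depends on the input is to freeze the input at a constant value x[n] = X. The recursion then becomes linear and time-invariant, y[n] = a y[n-1] + bX y[n-2] + cX, with characteristic polynomial z² - a z - bX. This frozen-input check is a heuristic added here for intuition only. It is not a stability test for random inputs, and the function names are hypothetical. The sketch evaluates the largest pole magnitude for the parameter values a = 1 and b = -0.7 of the simulation example.

```python
import cmath


def frozen_input_poles(X, a=1.0, b=-0.7):
    """Roots of z^2 - a z - b X: the characteristic polynomial of eq. (31)
    when the input is frozen at the constant value x[n] = X."""
    root = cmath.sqrt(a * a + 4.0 * b * X)
    return (a + root) / 2.0, (a - root) / 2.0


def frozen_spectral_radius(X, a=1.0, b=-0.7):
    """Largest pole magnitude; a value above 1 means the frozen recursion diverges."""
    return max(abs(z) for z in frozen_input_poles(X, a, b))


for X in (-0.5, 0.0, 0.5):
    print(f"X = {X:+.1f}: spectral radius = {frozen_spectral_radius(X):.4f}")
```

The script prints spectral radii of 1.2746, 1.0000, and 0.5916 for X = -0.5, 0, and 0.5. The value 1 at X = 0 is the marginal stability of (32). More generally, with a = 1 the two roots sum to 1 and multiply to -bX. Any constant input with bX > 0 therefore produces a real pole larger than 1.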
Most of the techniques that are currently available cannot adequately handle this problem without human intervention. One exception is the work by Fnaiech and Ljung [Fn87], which discusses several variations of the ideas presented in this section. They stabilize the filter by means of a time-varying Kalman filter. Using a theorem in [Ja70], they argue that this approach always results in a stable nonlinear system, and they demonstrate the validity of the claim with simulation examples. However, the details are beyond the scope of this paper.

[Fig. 9: five-channel tapped-delay-line diagram for P = 2 and N = 4; the signals shown include x²[n], x[n]x[n-1], x[n]x[n-2], x[n]x[n-3], and delayed versions down to x[n-3] and x²[n-3].]

Fig. 9. The adaptive Volterra filtering problem can easily be translated into an adaptive, multichannel, linear filtering problem. In the example shown here, with P = 2 and N = 4, one can visualize five channels. The signals are tapped from the input points as well as from the outputs of the delay elements, and are linearly combined to form the estimate of the desired response signal. This "multichannel" problem differs from traditional multichannel adaptive filters, and is perhaps a little more difficult, because the number of delay elements differs from channel to channel. However, this problem can be overcome, and this structure is the basis for fast RLS Volterra filters and certain lattice realizations of the adaptive Volterra filter.

ADAPTIVE LATTICE POLYNOMIAL FILTERS

Adaptive lattice filters try to orthogonalize the input signals to the filter. They then estimate the desired response signal as a linear combination of the transformed signals, which are hopefully orthogonal to each other. Lattice filters have several advantages in adaptive filtering applications.
Lattice filters equipped with LMS-type adaptation algorithms tend to converge faster, and less dependently on the input signal, than their direct-form counterparts. They also tend to have better numerical properties than direct-form adaptive filters. The filter parameters in each lattice stage can be adapted independently of the rest of the stages. Also, the structure is fairly modular, and therefore adaptive lattice filters are very suitable for VLSI implementation. In this section, we will develop a lattice structure for a second-order truncated Volterra system. The ideas developed are equally applicable to other types of polynomial systems. To develop the lattice parameterization of Volterra filters, it is convenient to view the nonlinear filtering problem as a linear multichannel filtering problem. Fig. 9 depicts this characterization for the second-order Volterra filter. It differs from traditional multichannel adaptive filtering problems in that each channel uses a different number of delay elements (and coefficients). To overcome this difficulty, many lattice realizations of Volterra filters [Le86] add coefficients and delay elements to each "channel" so that every "channel" has the same number of coefficients. (This actually corresponds to special shapes for the region of support of the Volterra kernels.) Adaptive lattice Volterra filters that are designed specifically for Gaussian input signals, and work well only for such signals, have been presented in [Ko83a]. However, lattice structures designed independently of the statistics of the input signals are also available for truncated Volterra systems as given in equation (3).
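The bookkeeping behind the multichannel view of Fig. 9 can be written down directly. The hypothetical helper below handles a second-order filter with memory N. It lists the taps of the linear channel and of each product channel x[n]x[n-k]. Channel k needs only N - k taps: delaying x[n]x[n-k] by m samples gives x[n-m]x[n-m-k], and its oldest lag must stay within N - 1. Counting the taps gives N + N(N+1)/2 coefficients, which is 14 for N = 4.

```python
def volterra_channels(N):
    """Hypothetical helper: taps of each channel in the multichannel view of a
    second-order (P = 2) truncated Volterra filter with memory N (cf. Fig. 9).

    A linear tap x[n-m] is written (m,); a quadratic tap x[n-m]x[n-m-k] is (m, m+k).
    """
    channels = {"x[n]": [(m,) for m in range(N)]}
    for k in range(N):
        name = "x^2[n]" if k == 0 else f"x[n]x[n-{k}]"
        # Delaying x[n]x[n-k] by m gives x[n-m]x[n-m-k]; the oldest lag m + k
        # must not exceed N - 1, so this channel has only N - k taps.
        channels[name] = [(m, m + k) for m in range(N - k)]
    return channels


chans = volterra_channels(4)
for name, taps in chans.items():
    print(f"{name:12s} {len(taps)} taps")
print("total coefficients:", sum(len(t) for t in chans.values()))
```

For N = 4 the five channels have 4, 4, 3, 2, and 1 taps. This unequal channel length is exactly what makes the multichannel problem non-standard.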
We will now discuss one such structure. It is based on a multichannel lattice filter developed by Ling and Proakis [Li84] and a nonlinear lattice predictor developed by Zarzycki [Za85]. For simplicity, we will consider the case N = 3 and P = 2. Fig. 10 shows a block diagram of the nonlinear lattice predictor. Let us group the signals involved in the estimation at time n into three columns as shown below.

$$ \underbrace{\begin{bmatrix} x[n] \\ x^2[n] \end{bmatrix}}_{\text{Column 0} \,=\, \mathbf{x}_0[n]} \qquad \underbrace{\begin{bmatrix} x[n-1] \\ x^2[n-1] \\ x[n]\,x[n-1] \end{bmatrix}}_{\text{Column 1} \,=\, \mathbf{x}_1[n]} \qquad \underbrace{\begin{bmatrix} x[n-2] \\ x^2[n-2] \\ x[n-1]\,x[n-2] \\ x[n]\,x[n-2] \end{bmatrix}}_{\text{Column 2} \,=\, \mathbf{x}_2[n]} \qquad (33) $$

The basic idea employed in the derivation of the lattice Volterra filter is to obtain a Gram-Schmidt orthogonal decomposition of $\mathbf{x}_0[n]$, $\mathbf{x}_1[n]$, and $\mathbf{x}_2[n]$. (All lattice filters try to obtain a Gram-Schmidt orthogonalization of appropriate input vectors.) Let $\mathbf{b}_0[n]$, $\mathbf{b}_1[n]$, and $\mathbf{b}_2[n]$ represent an orthogonal basis set for $\mathbf{x}_0[n]$, $\mathbf{x}_1[n]$, and $\mathbf{x}_2[n]$. Then, any linear combination of the elements of $\mathbf{x}_0[n]$, $\mathbf{x}_1[n]$, and $\mathbf{x}_2[n]$ can equivalently be written as a linear combination of the elements of $\mathbf{b}_0[n]$, $\mathbf{b}_1[n]$, and $\mathbf{b}_2[n]$, and vice versa. (In other words, the elements of both sets of vectors have exactly the same linear span.) Instead of estimating the desired response signal d[n] as a linear combination of the elements of $\mathbf{x}_0[n]$, $\mathbf{x}_1[n]$, and $\mathbf{x}_2[n]$, we can therefore compute the estimate as a linear combination of the elements of $\mathbf{b}_0[n]$, $\mathbf{b}_1[n]$, and $\mathbf{b}_2[n]$. Let

$$ \hat{d}[n] = \mathbf{k}_0^T\, \mathbf{b}_0[n] + \mathbf{k}_1^T\, \mathbf{b}_1[n] + \mathbf{k}_2^T\, \mathbf{b}_2[n] \qquad (34) $$

be the best estimate so obtained. Here $\mathbf{k}_0$, $\mathbf{k}_1$, and $\mathbf{k}_2$ are appropriate coefficient vectors, with their possible time dependence suppressed. One of the biggest advantages of the lattice structure follows from the mutual orthogonality of $\mathbf{b}_0[n]$, $\mathbf{b}_1[n]$, and $\mathbf{b}_2[n]$: each coefficient vector $\mathbf{k}_i$ can be computed solely from the joint statistics of d[n] and $\mathbf{b}_i[n]$. For example, the minimum mean-squared solution for $\mathbf{k}_0$ is given by

$$ \mathbf{k}_0 = \Big(E\{\mathbf{b}_0[n]\, \mathbf{b}_0^T[n]\}\Big)^{-1} E\{d[n]\, \mathbf{b}_0[n]\} \qquad (35) $$

and does not depend on $\mathbf{b}_1[n]$ or $\mathbf{b}_2[n]$.

Fig. 10. Block diagram of a lattice filter structure for Volterra systems with N = 3 and P = 2. The number of lines going into and out of a system component indicates the number of input and output signals, respectively, of that component. The backward prediction error vectors $\mathbf{b}_0[n]$, $\mathbf{b}_1[n]$, and $\mathbf{b}_2[n]$ are orthogonal to each other. The components of these vectors span the whole space spanned by the elements of $X[n] = (x[n], x^2[n], x[n-1], x^2[n-1], x[n]x[n-1], x[n-2], x^2[n-2], x[n-1]x[n-2], x[n]x[n-2])^T$. (The elements within each of these vectors are not orthogonal to each other. Orthogonality within a vector can be achieved by a Gram-Schmidt orthogonalization of its elements.) At each stage of the lattice, the prediction error vector has one more element than at the previous stage. This additional prediction error signal, which corresponds to estimating x[n]x[n-i] at the i-th stage, must be computed outside the basic lattice structure. The coefficients denoted by the letter g are used to compute these additional prediction-error signals. Efficient computation of the backward prediction-error vectors also requires computation of the forward prediction-error vectors $\mathbf{f}_0[n]$, $\mathbf{f}_1[n]$, and $\mathbf{f}_2[n]$. (See text for details.) For joint process estimation (estimating a different signal d[n] using the elements of X[n]), we need only find the appropriate linear combination of the components of the backward prediction-error vectors. Development of gradient and least-squares adaptive algorithms based on this lattice structure is now relatively straightforward.

One well-known way of obtaining $\mathbf{b}_0[n]$, $\mathbf{b}_1[n]$, and $\mathbf{b}_2[n]$ is to define $\mathbf{b}_i[n]$ as the i-th order backward prediction error vector for $\mathbf{x}_i[n]$. $\mathbf{b}_i[n]$ is then the estimation error when $\mathbf{x}_i[n]$ is estimated using the previous column vectors of (33). $\mathbf{b}_0[n]$ is defined to be

$$ \mathbf{b}_0[n] = \mathbf{x}_0[n] = \begin{bmatrix} x[n] \\ x^2[n] \end{bmatrix} \qquad (36) $$

and $\mathbf{b}_1[n]$ and $\mathbf{b}_2[n]$ are defined to be the estimation error vectors when column 1 and column 2, respectively, of (33) are estimated using the elements of all the previous columns. Given $\mathbf{b}_0[n]$ and $\mathbf{b}_1[n]$, $\mathbf{b}_2[n]$ can be computed from three quantities: $\mathbf{b}_1[n-1]$, the first-order "forward prediction error" vector $\mathbf{f}_1[n]$ (defined shortly), and some allied quantities. To see this, note that $\mathbf{b}_1[n-1]$ is the error vector obtained when we estimate

$$ \begin{bmatrix} x[n-2] \\ x^2[n-2] \\ x[n-1]\,x[n-2] \end{bmatrix} \qquad (37) $$

using the elements of the set {x[n-1], x²[n-1]}, i.e., $\mathbf{x}_0[n-1]$. The key point is that $\mathbf{x}_1[n-1]$ is nothing but the top three elements of $\mathbf{x}_2[n]$ (similarly, $\mathbf{x}_0[n-1]$ appears as the top two elements of $\mathbf{x}_1[n]$). Moreover, $\mathbf{b}_2[n]$ is the prediction error vector when we estimate $\mathbf{x}_2[n]$ using $\mathbf{x}_1[n]$ and $\mathbf{x}_0[n]$. $\mathbf{b}_1[n-1]$ contains all the information about the top three elements of $\mathbf{x}_2[n]$ that can be extracted from the top two elements of $\mathbf{x}_1[n]$. The problem is now to find how much additional information is contained in $\mathbf{x}_0[n]$ and in x[n]x[n-1], the last element of $\mathbf{x}_1[n]$. The "new" information is present in that part of

$$ \begin{bmatrix} x[n] \\ x^2[n] \\ x[n]\,x[n-1] \end{bmatrix} \qquad (38) $$

that is "not related to," or orthogonal to,

$$ \begin{bmatrix} x[n-1] \\ x^2[n-1] \end{bmatrix} \qquad (39) $$

This component is precisely the estimation error obtained when the three-element vector in (38) is estimated using the two-element vector in (39). The resulting estimation error vector is nothing but the first-order forward prediction error vector. In general, the i-th order forward prediction error vector $\mathbf{f}_i[n]$ is defined as the error vector produced when the data vector

$$ \big[\, x[n],\;\; x^2[n],\;\; x[n]\,x[n-1],\;\; x[n]\,x[n-2],\;\; \ldots,\;\; x[n]\,x[n-i] \,\big]^T \qquad (40) $$

is estimated using all possible linear and quadratic terms formed from the elements of the set {x[n-1], x[n-2], ..., x[n-i]}. Let $\tilde{\mathbf{b}}_2[n]$ represent the top three elements of $\mathbf{b}_2[n]$. Based on the above discussion, we can express $\tilde{\mathbf{b}}_2[n]$ as

$$ \tilde{\mathbf{b}}_2[n] = \mathbf{b}_1[n-1] - (\mathbf{K}_2^b)^T\, \mathbf{f}_1[n] \qquad (41) $$

where $\mathbf{K}_2^b$ is the appropriate coefficient matrix and its possible time dependence has been suppressed.
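The orthogonal decomposition of (33) and the decoupling property of (35) can be checked numerically on a finite data record. The sketch below is a batch, time-averaged stand-in for the ensemble expectations in the text: sums over a record replace E{·}. It is not the recursive lattice itself. The white uniform input, the record length, the desired response, and all function names are hypothetical. The sketch forms b_0[n], b_1[n], and b_2[n] by least-squares projection of each column of (33) onto the previous columns. It then compares a joint fit of d[n] on all nine elements with three separate per-block fits of the form (35).

```python
import random


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def solve(A, y):
    """Solve the small dense system A z = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def ls_coeffs(target, basis):
    """Least-squares coefficients for estimating `target` from the `basis` signals."""
    G = [[dot(u, v) for v in basis] for u in basis]
    return solve(G, [dot(u, target) for u in basis])


def residual(target, basis):
    """Estimation error of `target` after least-squares projection onto `basis`."""
    w = ls_coeffs(target, basis)
    return [t - sum(wi * u[n] for wi, u in zip(w, basis)) for n, t in enumerate(target)]


# Hypothetical data record; each signal is a list over time n = 2, ..., T-1.
rng = random.Random(0)
T = 2000
x = [rng.uniform(-1.0, 1.0) for _ in range(T)]
ns = range(2, T)
col0 = [[x[n] for n in ns], [x[n] ** 2 for n in ns]]
col1 = [[x[n - 1] for n in ns], [x[n - 1] ** 2 for n in ns], [x[n] * x[n - 1] for n in ns]]
col2 = [[x[n - 2] for n in ns], [x[n - 2] ** 2 for n in ns],
        [x[n - 1] * x[n - 2] for n in ns], [x[n] * x[n - 2] for n in ns]]

# b_i: error when column i of (33) is estimated from all previous columns, cf. (36).
b0 = col0
b1 = [residual(s, col0) for s in col1]
b2 = [residual(s, col0 + col1) for s in col2]

# Hypothetical desired response built from elements of (33).
d = [x[n] - 0.4 * x[n - 1] * x[n - 2] + 0.3 * x[n] * x[n - 2] for n in ns]

joint = ls_coeffs(d, b0 + b1 + b2)                                  # one 9 x 9 problem
separate = ls_coeffs(d, b0) + ls_coeffs(d, b1) + ls_coeffs(d, b2)   # per block, as in (35)
print(max(abs(p - q) for p, q in zip(joint, separate)))
```

Because the blocks are orthogonal over the record, the joint and per-block coefficients agree to rounding error. The lattice exploits exactly this property to adapt each stage independently.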
Similarly, one can show that the top three elements of $\mathbf{f}_2[n]$, denoted $\tilde{\mathbf{f}}_2[n]$, can be evaluated as

$$ \tilde{\mathbf{f}}_2[n] = \mathbf{f}_1[n] - (\mathbf{K}_2^f)^T\, \mathbf{b}_1[n-1] \qquad (42) $$

where the notation is similar to that used in (41). The last element of $\mathbf{b}_2[n]$ is the error in estimating x[n]x[n-2] from the five elements in the first two columns of (33). It has to be computed separately. This element can be computed by subtracting a linear combination of all the elements of $\mathbf{b}_0[n]$ and $\mathbf{f}_1[n]$ from x[n]x[n-2], because the components of $\mathbf{b}_0[n]$ and $\mathbf{f}_1[n]$ span the same space as the elements of the first two columns of (33). In general, the last element of $\mathbf{b}_i[n]$ can be obtained by subtracting from x[n]x[n-i] an appropriate linear combination of the elements of the vectors $\mathbf{b}_0[n-1], \mathbf{b}_1[n-1], \ldots, \mathbf{b}_{i-2}[n-1]$, and $\mathbf{f}_{i-1}[n]$. Similarly, the last element of $\mathbf{f}_i[n]$ can be obtained by subtracting from x[n]x[n-i] an estimate of x[n]x[n-i] formed as a linear combination of the elements of $\mathbf{b}_0[n-1], \mathbf{b}_1[n-1], \ldots, \mathbf{b}_{i-1}[n-1]$. Table III gives the basic lattice predictor algorithm for a second-order Volterra system with N - 1 delays.

TABLE III
LATTICE FILTER STRUCTURE
The structure shown is for a second-order Volterra system with arbitrary N. Possible time dependence of the coefficients has been suppressed.

Initialization

$$ \mathbf{f}_0[n] = \mathbf{b}_0[n] = \begin{bmatrix} x[n] \\ x^2[n] \end{bmatrix}, \qquad e_0[n] = d[n] $$

Lattice Section 1

$$ \mathbf{b}_1[n] = \begin{bmatrix} \mathbf{b}_0[n-1] - (\mathbf{K}_1^b)^T \mathbf{f}_0[n] \\ x[n]\,x[n-1] - (\mathbf{g}_{1,0}^b)^T \mathbf{f}_0[n] \end{bmatrix}, \qquad \mathbf{f}_1[n] = \begin{bmatrix} \mathbf{f}_0[n] - (\mathbf{K}_1^f)^T \mathbf{b}_0[n-1] \\ x[n]\,x[n-1] - (\mathbf{g}_{1,0}^f)^T \mathbf{b}_0[n-1] \end{bmatrix} $$

$$ c_{i,0}[n] = x[n]\,x[n-i] - (\mathbf{g}_{i,0}^f)^T\, \mathbf{b}_0[n-1], \qquad i = 2, 3, \ldots, N-1 $$

$$ e_1[n] = e_0[n] - \mathbf{k}_0^T\, \mathbf{b}_0[n] $$

Lattice Sections 2 thru N-1
DO FOR i = 2, 3, ..., N-1

Backward prediction error update:

$$ \mathbf{b}_i[n] = \begin{bmatrix} \mathbf{b}_{i-1}[n-1] - (\mathbf{K}_i^b)^T \mathbf{f}_{i-1}[n] \\ c_{i,i-2}[n] - (\mathbf{g}_{i,i-1}^b)^T \mathbf{f}_{i-1}[n] \end{bmatrix} $$

Forward prediction error update:

$$ \mathbf{f}_i[n] = \begin{bmatrix} \mathbf{f}_{i-1}[n] - (\mathbf{K}_i^f)^T \mathbf{b}_{i-1}[n-1] \\ c_{i,i-2}[n] - (\mathbf{g}_{i,i-1}^f)^T \mathbf{b}_{i-1}[n-1] \end{bmatrix} $$

Auxiliary variable update:

$$ c_{j,i-1}[n] = c_{j,i-2}[n] - (\mathbf{g}_{j,i-1}^f)^T\, \mathbf{b}_{i-1}[n-1], \qquad j = i+1, i+2, \ldots, N-1 $$

Joint process estimation error update:

$$ e_i[n] = e_{i-1}[n] - \mathbf{k}_{i-1}^T\, \mathbf{b}_{i-1}[n] $$

END LOOP

Final joint process estimation error

$$ e[n] = e_N[n] = e_{N-1}[n] - \mathbf{k}_{N-1}^T\, \mathbf{b}_{N-1}[n] $$

Notes: $\mathbf{K}_i^b$ and $\mathbf{K}_i^f$ are (i+1)×(i+1) matrices. The g coefficient vectors have as many elements as the prediction-error vectors they multiply. The $c_{ij}[n]$ are scalar signals used to compute the last elements of $\mathbf{f}_i[n]$ and $\mathbf{b}_i[n]$.

Once the lattice structure has been developed, deriving an adaptive filter based on it is not very difficult. Gradient algorithms like those presented in [Gr77] can easily be extended to the nonlinear case. The key idea in LMS-type lattice filters is that the coefficients in each stage can be optimized independently of later stages. This is possible because the relevant signals in different stages are orthogonal when optimal lattice coefficients are used. For example, consider the discussion surrounding equations (34) and (35). Each coefficient vector $\mathbf{k}_i$ can evidently be evaluated as the optimal coefficient vector for estimating the desired response signal d[n] as a linear combination of the elements of $\mathbf{b}_i[n]$. Let

$$ e_i[n] = d[n] - \sum_{j=0}^{i-1} \mathbf{k}_j^T\, \mathbf{b}_j[n] \qquad (43) $$

Since $\mathbf{b}_0[n], \ldots, \mathbf{b}_{i-1}[n]$ are orthogonal to $\mathbf{b}_i[n]$, $e_i[n]$ contains all the components of d[n] that can be estimated using $\mathbf{b}_i[n]$. Therefore, estimating $e_i[n]$ using $\mathbf{b}_i[n]$ gives the same result as estimating d[n] using $\mathbf{b}_i[n]$. Adaptation of $\mathbf{k}_i$ can thus be treated as a separate adaptive filtering problem in which $e_i[n]$ is the desired response signal, $\mathbf{b}_i[n]$ is the input to the adaptive filter, and $e_{i+1}[n]$ is the estimation error.
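The stage-by-stage adaptation just described (equations (43)-(45)) can be sketched as follows. The sketch rests on one assumption: the backward prediction error vectors are not produced by an actual lattice predictor here. Instead, independent zero-mean random vectors of sizes 2, 3, and 4 (as for N = 3 and P = 2) stand in for b_0[n], b_1[n], and b_2[n], since such vectors are orthogonal in expectation. The function name, step size, and true coefficients are hypothetical, and a single μ is used for all stages.

```python
import random


def lattice_joint_process_lms(b_seq, d_seq, mu):
    """Hypothetical sketch of stage-wise LMS joint-process estimation, eqs. (43)-(45).

    b_seq[n] is a list of backward prediction error vectors [b_0[n], ..., b_{M-1}[n]]
    (assumed to come from a lattice predictor); d_seq[n] is d[n]. Stage i uses
    e_i[n] as its desired response and b_i[n] as its input.
    Returns the final coefficient vectors k_0..k_{M-1} and the final errors e[n].
    """
    M = len(b_seq[0])
    k = [[0.0] * len(b) for b in b_seq[0]]
    errors = []
    for bs, dn in zip(b_seq, d_seq):
        e = dn                                                          # e_0[n] = d[n]
        for i in range(M):
            e_next = e - sum(kj * bj for kj, bj in zip(k[i], bs[i]))        # eq. (44)
            k[i] = [kj + mu * e_next * bj for kj, bj in zip(k[i], bs[i])]   # eq. (45)
            e = e_next
        errors.append(e)                                                # e[n] = e_M[n]
    return k, errors


# Usage with stand-in orthogonal inputs (an assumption, not real lattice outputs).
rng = random.Random(0)
k_true = [[0.8, -0.5], [0.3, 0.6, -0.2], [0.4, -0.3, 0.5, 0.1]]
b_seq, d_seq = [], []
for _ in range(20000):
    bs = [[rng.uniform(-1.0, 1.0) for _ in kt] for kt in k_true]   # independent, zero mean
    b_seq.append(bs)
    d_seq.append(sum(sum(p * q for p, q in zip(kt, b)) for kt, b in zip(k_true, bs)))
k_hat, errs = lattice_joint_process_lms(b_seq, d_seq, mu=0.01)
print([[round(v, 2) for v in kh] for kh in k_hat])
```

Each stage sees the contributions of the later stages only as uncorrelated noise. Its coefficients therefore settle near the true values without any knowledge of the other stages, jittering slightly with an amplitude set by μ.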
The relevant equations for the LMS-type adaptive filter are

$$ e_{i+1}[n] = e_i[n] - \mathbf{k}_i^T[n]\, \mathbf{b}_i[n] \qquad (44) $$

and

$$ \mathbf{k}_i[n+1] = \mathbf{k}_i[n] + \mu\, e_{i+1}[n]\, \mathbf{b}_i[n] \qquad (45) $$

(Different values of μ may be used for different stages.)

As in the above development, the coefficients in each stage of the predictor part of the lattice structure can be adapted independently of the other stages. The coefficient matrix $\mathbf{K}_i^b$ of the backward prediction problem can be updated by treating it as a separate adaptive filtering problem. In this problem, $\mathbf{f}_{i-1}[n]$ is the input signal, $\mathbf{b}_{i-1}[n-1]$ is the desired response signal, and $\tilde{\mathbf{b}}_i[n]$ is the error signal. The corresponding adaptation algorithm is

$$ \tilde{\mathbf{b}}_i[n] = \mathbf{b}_{i-1}[n-1] - (\mathbf{K}_i^b[n])^T\, \mathbf{f}_{i-1}[n] \qquad (46) $$

and

$$ \mathbf{K}_i^b[n+1] = \mathbf{K}_i^b[n] + \mu\, \mathbf{f}_{i-1}[n]\, \tilde{\mathbf{b}}_i^T[n] \qquad (47) $$

Again, μ can be different for each stage, or even time-varying. One may also use a matrix in place of the scalar μ. The update equations for the coefficients of the forward predictor sections can be derived in a similar fashion. So can the updates for the auxiliary quantities used to compute the last elements of $\mathbf{f}_i[n]$ and $\mathbf{b}_i[n]$. Their derivation, which is straightforward and omitted here, completes the development of the LMS adaptive Volterra lattice filter.

One disadvantage of the lattice structure compared with the direct-form structure is the number of coefficients it requires. The lattice needs $O(N^3)$ coefficients to completely describe a second-order Volterra system with N delays, while the direct-form structure needs only $O(N^2)$. The computational complexity of gradient adaptive algorithms based on the lattice structure is therefore also proportional to $N^3$ operations per time instant. This is comparable to the complexity of fast RLS algorithms (although still lower than that of most RLS adaptive Volterra filters). Consequently, the computational advantage of gradient adaptive lattice Volterra filters over RLS adaptive Volterra filters is not as significant as it is for direct-form implementations. Least-squares adaptive lattice Volterra filters with $O(N^3)$ computational complexity and extremely good numerical properties have recently been developed [Sy90]. However, an exposition of the ideas employed in deriving such algorithms is beyond the scope of this introductory paper, and the interested reader is referred to [Sy90] for details. Algorithms for adaptive least-squares lattice bilinear filters have also been developed [Ba90a, Ba90b]. Another related work is [Pa81]. Korenberg has developed algorithms that use Gram-Schmidt orthogonalization of the input data and can be applied to the general class of polynomial system models [Ko88]. For the Volterra and bilinear system models, the lattice filter structure discussed in this paper turns out to be considerably more efficient than Korenberg's approach.

CONCLUDING REMARKS

This tutorial article presented an introduction to adaptive nonlinear filtering theory. The first part of the paper emphasized system models based on truncated Volterra series expansions and the adaptive filters built on such models. The paper also gave a brief introduction to adaptive filtering with recursive nonlinear system models, and described the basics of a lattice nonlinear filter structure. There is no single general theory of nonlinear system analysis, so we had to restrict ourselves to these two nonlinear models. Consequently, adaptive nonlinear filters based on other nonlinear models were not discussed. Adaptive nonlinear filtering is an exciting and challenging area with a wide variety of applications. Considerable progress has been made in recent years, but much remains to be done.
There is a great deal of research activity in this area at present. We can expect a substantial number of new techniques to be developed in the near future, along with potential breakthroughs that have great impact on practical applications.

Acknowledgments

This work was supported in part by NSF under Grant Nos. MIP-8708970 and MIP-8922146 and by a University of Utah Faculty Fellow Award. I have learned a great deal from my students at the University of Utah. In particular, I would like to acknowledge very useful interactions with Heung Ki Baik, Junghsi Lee, and Mushtaq Syed, as well as their contributions to the development of many of the ideas discussed in the paper. Junghsi Lee provided the results of all the experiments described in the paper.

V. John Mathews (S'82, M'85, SM'90) was born in Nedungadappally, Kerala, India, in 1958. He received the B.E. (Hons.) degree in electronics and communication engineering from the University of Madras, India, in 1980, and the M.S. and Ph.D. degrees in electrical and computer engineering from the University of Iowa, Iowa City, in 1981 and 1984, respectively. From 1980 to 1984 he held a Teaching-Research Fellowship at the University of Iowa, where he also worked as a Visiting Assistant Professor with the Department of Electrical and Computer Engineering from 1984 to 1985. He is currently an Assistant Professor with the Department of Electrical Engineering, University of Utah, Salt Lake City. His research interests include adaptive filtering, spectrum estimation, and data compression. Dr. Mathews is an associate editor of the IEEE Transactions on Signal Processing.

REFERENCES

[Ag82] Agazzi, O., D. G. Messerschmitt, and D. A. Hodges, "Nonlinear echo cancellation of data signals," IEEE Trans. Communications, Vol. COM-30, pp. 2421-2433, November 1982.
[Al86] Alexander, S. T., "Fast adaptive filters: A geometrical approach," IEEE ASSP Magazine, Vol. 3, No. 4, pp. 18-28, October 1986.
[Ba63] Barrett, J. F., "The use of functionals in the analysis of nonlinear systems," Journal of Electronics and Control, Vol. 15, No. 6, pp. 567-615, 1963.
[Ba64] Balakrishnan, A. V., "A general theory of nonlinear estimation problems in control systems," Journal of Mathematical Analysis and Applications, Vol. 8, No. 1, pp. 4-30, 1964.
[Ba90a] Baik, H. K., V. J. Mathews, and R. T. Short, "Adaptive lattice bilinear filters," Proc. SPIE's 1990 Int. Symp. on Optics and Optoelectronic Applied Science and Engineering, Conference on Advanced Signal Processing Algorithms, Architectures and Implementations, San Diego, California, July 1990.
[Ba90b] Baik, H. K. and V. J. Mathews, "Adaptive lattice bilinear filters," submitted to IEEE Trans. Signal Processing, December 1990.
[Be76] Benedetto, S., E. Biglieri, and R. Daffara, "Performance of multilevel baseband digital systems in nonlinear environment," IEEE Trans. Communications, Vol. COM-24, pp. 1166-1175, 1976.
[Be79] Benedetto, S., E. Biglieri, and R. Daffara, "Modeling and performance evaluation of nonlinear satellite links - A Volterra series approach," IEEE Trans. Aerospace and Electronic Systems, Vol. AES-15, pp. 494-507, 1979.
[Be83] Benedetto, S. and E. Biglieri, "Nonlinear equalization of digital satellite channels," IEEE J. Selected Areas in Communications, Vol. SAC-1, pp. 57-62, 1983.
[Be85] Bellafemina, M. and S. Benedetto, "Identification and equalization of nonlinear channels for digital transmission," Proc. IEEE Int. Symp. Circuits and Systems, Kyoto, Japan, pp. 1477-1480, June 1985.
[Be87] Benedetto, S., E. Biglieri, and V. Castellani, Digital Transmission Theory, Prentice Hall, Englewood Cliffs, New Jersey, 1987.
[Bi80] Billings, S. A., "Identification of nonlinear systems - A survey," IEE Proceedings, Vol. 127, Pt. D, No. 6, pp. 272-285, November 1980.
[Bi84a] Biglieri, E., A. Gersho, R. D. Gitlin, and T. L. Lim, "Adaptive cancellation of nonlinear intersymbol interference for voiceband data transmission," IEEE J. Selected Areas in Communications, Vol. SAC-2, pp. 765-77, 1984.
[Bi84b] Billings, S. A. and W. S. F. Voon, "Least squares parameter estimation algorithms for nonlinear systems," Int. J. Systems Science, Vol. 15, No. 6, pp. 601-615, 1984.
[Bi84c] Billings, S. A., "Identification of nonlinear systems," in Nonlinear System Design (Eds. S. A. Billings, J. O. Gray, and D. H. Owens), Peter Peregrinus, Ltd., London, UK, 1984.
[Bo83a] Boyd, S., Y. S. Tang, and L. O. Chua, "Measuring Volterra kernels," IEEE Trans. Circuits and Systems, Vol. CAS-30, No. 8, pp. 571-577, August 1983.
[Bo83b] Bovik, A. C., T. S. Huang, and D. C. Munson, "A generalization of median filtering using linear combinations of order statistics," IEEE Trans. Acoust., Speech, Signal Proc., Vol. ASSP-31, No. 6, pp. 1342-1349, December 1983.
[Bo85] Boyd, S. and L. O. Chua, "Fading memory and the problem of approximating nonlinear operators with Volterra series," IEEE Trans. Circuits and Systems, Vol. CAS-32, No. 11, pp. 1150-1161, November 1985.
[Br70] Brillinger, D., "The identification of polynomial systems by means of higher order spectra," J. Sound and Vibration, Vol. 12, pp. 301-313, 1970.
[Br74] Bruni, C., G. DiPillo, and G. Koch, "Bilinear systems: An appealing class of 'nearly linear' systems in theory and applications," IEEE Trans. Automatic Control, Vol. AC-19, pp. 334-348, 1974.
[Br76] Brockett, R., "Volterra series and geometric control theory," Automatica, Vol. 12, pp. 167-176, 1976.
[Ca85] Casar-Corredera, J. R., M. Garcia-Otero, and A. R. Figueiras-Vidal, "Data echo nonlinear cancellation," Proc. IEEE Int. Conf. Acoust., Speech, Signal Proc., Tampa, Florida, pp. 32.4.1-4, March 1985.
[Ci84] Cioffi, J. M. and T. Kailath, "Fast, recursive least-squares transversal filters for adaptive filtering," IEEE Trans. Acoust., Speech, Signal Proc., Vol. ASSP-32, No. 2, pp. 304-337, April 1984.
[Cl81] Claasen, T. A. C. M. and W. F. G. Mecklenbräuker, "Comparison of the convergence of two algorithms for adaptive FIR digital filters," IEEE Trans. Acoust., Speech, Signal Proc., Vol. ASSP-29, pp. 670-678, June 1981.
[Co80] Coker, M. J. and D. N. Simkins, "A nonlinear adaptive noise canceller," Proc. 1980 IEEE Int. Conf. Acoust., Speech, Signal Proc., pp. 470-473, 1980.
[Da87] Davila, C. E., A. J. Welch, and H. G. Rylander, III, "A second-order adaptive Volterra filter with rapid convergence," IEEE Trans. Acoust., Speech, Signal Proc., Vol. ASSP-35, No. 9, pp. 1259-1263, September 1987.
[Da89] Dai, H. and N. K. Sinha, "Robust recursive least-squares method with modified weights for bilinear system identification," IEE Proceedings, Vol. 136, Pt. D, No. 3, pp. 122-126, May 1989.
[Di88] Diaz, H. and A. A. Desrochers, "Modeling of nonlinear discrete-time systems from input-output data," Automatica, Vol. 24, No. 5, pp. 629-641, 1988.
[Ew80] Ewen, E. J. and D. D. Weiner, "Identification of weakly nonlinear systems using input and output measurements," IEEE Trans. Circuits and Systems, Vol. CAS-27, No. 12, pp. 1255-1261, December 1980.
[Ey63] Eykhoff, P., "Some fundamental aspects of process-parameter estimation," IEEE Trans. Automatic Control, Vol. AC-8, pp. 347-357, October 1963.
[Fa78] Falconer, D. D., "Adaptive equalization of channel nonlinearities in QAM data transmission systems," Bell System Tech. J., Vol. 57, No. 7, pp. 2589-2611, September 1978.
[Fa80] Fakhouri, S. Y., "Identification of the Volterra kernels of nonlinear systems," IEE Proceedings, Vol. 127, Pt. D, No. 6, pp. 296-304, November 1980.
[Fn87] Fnaiech, F. and L. Ljung, "Recursive identification of bilinear systems," Int. J. Control, Vol. 45, No. 2, pp. 453-470, 1987.
[Ga89] Gao, X. Y., W. M. Snelgrove, and D. A.
Johns, "Nonlinear IIR adaptive filtering using a bilinear structure," Proc. IEEE Int. Symp. Circuits and Systems, Portland, Oregon, May 1989. [Gr77]Griffiths, L. J., "A continuously adaptive filter implemented as a lattice structure," Proc. IEEE Int Conf. Acoust. Speech, Signal Proc., Hartford, CT, pp. 683-686, May 1977. [Ha86]Haykin, S., Adaptive Filter Theory, Prentice Hall, Englewood Cliffs, New Jersey. 1986. [Hu86]Hunter, I. W. and M. J. Korenberg, "The identification of nonlinear biological systems: Wiener and Hammerstein cascade models," Biological Cybernetics, Vol. 55, pp. 135-144, 1986. [Ja70]Jazwinsky, A. H., Stochastic Processes and Filtering Theory, Academic Press, New York, 1970. [Ja77]Javed, A., B. A. Syrett, and P. A. Goud, "Intermodulation distortion analysis of reflection-type IMPATT amplifiers using Volterra series representation," IEEE Transactions on Microwave Theory and Techniques, Vol. MTT-25, No. 9, pp. 729-733, September 1977. [Ja84]Jacobsen, S. C., S. G. Meek, and R. R. Fullmer, "An adaptive myoelectric filter," 6th IEEE Conf. Eng. in Med. and Biol. Soc., 1984. [Ke85]Kenefic, R. J. and D. D. Weiner. "Application of the Volterra functional expansion in the detection of nonlinear functions of Gaussian processes," IEEE Transactions on Communications, Vol. COM-33, No. 3, pp. 276-279, March 1985. [Ki83]Kim, T. L. and J. K. Omura, "Error-rate estimates in digital communication over a nonlinear channel with memory." IEEE Transactions in Communications, Vol. COM-31, No. 3, pp. 407-412, March 1983. [Ko83a]Koh, T. and E. J. Powers, "An adaptive nonlinear digital filter with lattice orthogonalization," Proc. IEEE Int. Conf. Acoust., Speech. Signal Proc., Boston, Massachusetts, pp. 3740, April 1983. [Ko83b]Koh, T., E. J. Powers, and R. W. Miksad, "An approach to time-domain modeling of nonlinear drift oscillations in random seas," Proceedings of the International Symposium on Offshore Engineering, Rio de Janeiro, Brazil, 1983. [Ko84a]Korenberg, M. 
J., "Fast orthogonal identification of nonlinear difference equation and functional expansion models," Proc. 30th Midwest Symp. Circuits and Systems, pp. 570-575, 1984. [Ko84b]Korenberg, M. J., "Statistical identification of Volterra kernels of high-order systems," Proc. IEEE Int Symp. Circuits and Systems, pp. 570-575, 1984. [Ko85]Koh, T. and E. J. Powers, "Second-order Volterra filtering and its application to nonlinear system identification." IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-33, No. 6, pp. 1445-1455,December 1985. [Ko86]Korenberg, M. J. and I. W. Hunter, "The identification of nonlinear biological systems: LNL cascade models," Biological Cybernetics, Vol. 55, pp. 125-134, 1986. \|Ko88]Korenberg, M. J., "Identifying nonlinear difference equation and function expansion representations: The fast orthogonal algorithm," Annals of Biomedical Engineering, Vol. 16, pp. 123-142, 1988. [La81]Lawrence, P. J., "Estimation of the Volterra functional series of a nonlinear system using frequency-response data," IEE Proceedings, Vol. 128, Pt. D, No. 5, pp. 206-210, September 1981. [Le78]Lesiak, C. and A. Krener, ‘The existence and uniqueness ofVolterra series for nonlinear systems," IEEE Trans. Automatic Control Vol. AC-23, pp. 1090-1095, 1978. [Le85]Lee. Y. H. and S. A. Kassam, "Generalized median filtering and related nonlinear filtering technique," IEEE Trans. Acoust, Speech, Signal Proc., Vol. ASSP-33, No. 3, pp. 672-683, June 1985. [Le86]Lenk, P. J. and S. R. Parker, "Nonlinear modeling by discrete orthogonal lattice structure," Proc. IEEE Int. Symp. Circuits and Systems, 1986. \|Le91]Lee, J. and V. J. Mathews, "Afast recursive least-squares second-order Volterra filter and its performance analysis," submitted to IEEE Trans. Signal Proc.. January 1991. [Li84]Ling, F. and J. G. Proakis, "A generalized multichannel least squares lattice algorithm based on sequential processing stages," IEEE Trans. Acoust. Speech, Signal Proc., Vol. 
ASSP- 32, No. 2, pp. 381-390, April 1984. [Lj78]Ljung, L., M. Morf, and D. Falconer, "Fast calculation of gain matrices for recursive estimation," Int. J. Control Vol. 27, pp. 1-19, January 1978. [Lo88]Lou, Y., C. L. Nikias, and A. N. Venetsanopoulos, "Efficient VLSI array processing structures for adaptive quadratic digital filters," Circuits. Systems, Signal Process., Vol. 7, No. 2, 1988. [Lu75]Lucky, R. W., "Modulation and detection for data transmission on the telephone channel," In New Directions in Signal Processing in Communication and Control, J. K. Skwirzynski, ed., Leiden, Holland. Noordhoff, 1975. [Ma78]Marmarelis, P. Z. and V. Z. Marmarelis, Analysis of Physiological Systems, Plenum, New York, 1978. [Ma85]Maqusi, M., "Performance of baseband digital data transmission in nonlinear channels with memory, IEEE Transactions on Communications, Vol. COM-33, No. 7, pp. 715-719, July 1985. [Ma87a]Maragos, P. and R. W. Schafer, "Morphological filters, part I: Their set theoretic analysis and relations to linear shift invariant filters." IEEE Trans. Acoust, Speech, SignalProc., Vol. ASSP-35, No. 8, pp. 1153-1169, August 1987. [Ma87b]Maragos, P. and R. W. Schafer, "Morphological filters, part II: Their relation to median, order static and stack filters," IEEE Trans. Acoust., Speech, Signal Proc.. Vol. ASSP-35, No. 8, pp. 1170-1184, August 1987. [Ma87c]Mathews, V. J., "Improved convergence analysis of stochastic gradient adaptive filters using the sign algorithm," IEEE Transactions on Acoust, Speech. SignalProc., Vol. ASSP- 35, No. 4, pp. 450-454, April 1987. [Ma88]Mathews, V. J. and J. Lee, "A fast recursive least- squares second-order Volterra filter," Proceedings of IEEE Int. Conf. Acoust., Speech, Signal Proc., New York. pp. 1383-1386, April 1988. [Mo80]Mohler, R. R. and W. J. Kolodziej, "An overview ofbilinear system theory and applications," IEEE Trans. Systems, Man, and Cybernetics, Vol. SMC-10, pp. 683-688, October 1980. 
JULY 1991 IEEE SP MAGAZINE [Mo82]Moore, J. B., "Global convergence of output error recursions in colored noise," IEEE Trans. Automatic Control, Vol. AC-27, No. 6, pp. 1189-1199, December 1982. [Na67]Narayanan, S., ‘Transistor distortion analysis using Volterra series representation," Bel! Systems Technical Journal, Vol. 46, pp. 991-1204, May-June 1967. [Na70]Narayanan, S., "Application of Volterra series to intermodulation distortion of transistor feedback amplifiers," IEEE Transactions on Circuit Theory, Vol. CT-17, pp. 518-527, November 1970. [No82]Nodes, T. A. and N. C. Gallagher, "Median filters: Some modifications and their properties," IEEE Trans. Acoust., Speech, SignalProc.,Vol. ASSP-30, No. 5, pp. 739-746, October 1982. [Op68]Oppenheim, A. V,, R. W. Schafer, and T. G. Stockham, Jr "Nonlinear filtering of multiplied and convolved signals," Proceedings IEEE, Vol. 56, No. 8, pp. 1264-1291, August 1968. [Pa81 IParker, S. R. and F. A. Perry, "A discrete ARMA model for nonlinear system identification,' IEEE Trans. Circuits and Systems, Vol. CAS-28, No. 3, March 1981. [Pa88]Palmieri, F. and C. G. Boncelet Jr.. "A class of nonlinear adaptive filters," Proc. IEEE Int. Conf. Acoust, Speech, Signal Proc.. New York, pp. 1483-1486, April 1988. [Pi90a]Pitas, 1. and A. N. Venetsanopoulos. Nonlinear Digital Filters - Principles and Applications, Kluwer Academic Publishers, Boston, MA, 1990. [Pi90b]Pitas, I. and A. N. Venetsanopoulos, "LMS and RLS adaptive L-filters," Proc. IEEE Int. Conf. Acoust, Speech, Signal Proc., Albuquerque, NM, pp. 1389-1392, April 1990. [Pr75]Prochazka, A. and R. Neumann, "High-frequency distortion analysis of a semiconductor diode for CATV applications, IEEE Transactions on Consumer Electronics, Vol. CE-21, No. 2, pp. 120-129, May 1975. [Ra87]Ramponi, G. and G. L. Sicuranza, "Decision-directed nonlinear filters for image processing," Electronic Letters, Vol. 23, No. 23, pp. 1218-1219, November 5, 1987. 
[Re84]Reiss, W., "Nonlinear distortion analysis of P-I-N diode attenuators using Volterra series representation," IEEE Transactions on Circuits and Systems, Vol. CAS-31, No. 6, pp. 535-542, June 1984. [Ru81]Rugh, W. J., Nonlinear System Theory. The Volterra- Wiener Approach, Johns Hopkins University Press, Baltimore, Maryland, 1981. [Sa83a]Sandberg, I. W., "On Volterra expansions for time-varying nonlinear systems," IEEE Transactions on Circuits and Systems, Vol. CAS-30, No. 2, pp. 61-67, February 1983. [Sa83b]Sandberg, 1. W., "Series expansions for nonlinear systems," Circuits, Systems and Signal Processing, Vol. 2, No. 1, pp. 77-87, 1983. [Sa83c]Sandberg, 1. W., 'The mathematical foundations of associated expansions of mildly nonlinear systems, IEEE Trans. Circuits and Systems, Vol. CAS-30, pp. 441-455, July 1983. [Sc80]Schetzen, M., The Volterra and Wiener Theory of the Nonlinear Systems, Wiley and Sons, New York, 1980. [Sc81]Schetzen, M., "Nonlinear system modeling based on the Wiener theory," Proceedings IEEE, Vol. 69, No. 12, pp. 15571573, December 1981. [Sh89]Shynk, J. J., "Adaptive IIR filtering," IEEE ASSP Magazine, Vol. 6, No. 2, pp. 4-21, April 1989. [Si84]Sicuranza, G. L., A. Bucconi, and P. Mitri, "Adaptive echo cancellation with nonlinear digital filters," Proc. IEEE Int. Conf. Acoust., Speech, SigncdProc., San Diego, California, pp. 3.10.14, March 1984. [5186]Sicuranza, G. L. and G. Ramponi, Adaptive nonlinear digital filters using distributed arithmetic," IEEE Trans, on Acoust. Speech, and Signal Proc.. Vol. ASSP-34, No. 3, June 1986. [5187]Sicuranza, G. L. and G. Ramponi, "A variable-step adaptation algorithm for memoiy-oriented Volterra filters, IEEE Trans, on Acoust. Speech, and Signal Proc., Vol. ASSP-35, No. 10, October 1987. [Sm88]Smith, M. J., C. F. N. Cowan, and P. F. Adams, "Nonlinear echo cancellers based on transpose distributed arith metic," IEEE Transactions on Circuits and Systems, Vol. CAS-35, No. 1, pp. 6-18. January 1988. 
[St72]Stockham, T. G. Jr., "Image processing in the context of a visual model," Proceedings IEEE, Vol. 60, No. 7, pp. 828-842, July 1972. [St85]Stapleton, J. C. and S. C. Bass, "Adaptive noise cancellation for a class of nonlinear, dynamic reference channels,' IEEE Transactions on Circuits and Systems, Vol. CAS-32, No. 2, pp. 143-150, February 1985. [Su81]Subba Rao, T., "On the theory of bilinear time series models," J. Royal Statistical Society, Series B, Vol. 43, 1981. [Sy90]Syed, M. A. and V. J. Mathews, "Lattice and QR decomposition- based algorithms for recursive least squares adaptive nonlinear filters," Proc. IEEE Int. Symp. Circuits and Systems, New Orleans, Louisiana, May 1990. [Th71]Thomas,E. J., "Some considerations on the application of the Volterra representation of nonlinear networks to adaptive echo cancellers," Bell. Syst. Tech. J., Vol. 50, pp. 2979-2805, 1971. [Th84]Thapar, H. K. and B. J. Leon, 'Transform-domain and time-domain characterization of nonlinear systems with Vol- terra series," IEEE Transactions on Circuits and Systems, Vol. CAS-31, No. 10, pp. 906-912, October 1984. [Ts88]Tsai, T. C. and D. Anastassiou, "Nonlinear recursive filters and applications to image processing," Proc. ICASSP ‘88, pp. 828-831, New York, April 1988. [Wi58]Wiener, N., Nonlinear Problems in RandomTheory, Wiley and Sons, New York, 1958. [Za85]Zarzycki, J., Nonlinear Prediction Ladder Filters for Higher Order Stochastic Sequences, Springer-Verlag, Berlin, 1985. IEEE SP MAGAZINE JULY 1991
Reference URL	https://collections.lib.utah.edu/ark:/87278/s6pc3m3f