Approximation theory of the MLP model in neural networks

Published online by Cambridge University Press: 07 November 2008

Allan Pinkus
Affiliation:
Department of Mathematics, Technion – Israel Institute of Technology, Haifa 32000, Israel E-mail: pinkus@tx.technion.ac.il

Abstract

In this survey we discuss various approximation-theoretic problems that arise in the multilayer feedforward perceptron (MLP) model in neural networks. The MLP model is one of the more popular and practical of the many neural network models. Mathematically it is also one of the simpler models. Nonetheless the mathematics of this model is not well understood, and many of these problems are approximation-theoretic in character. Most of the research we will discuss is of very recent vintage. We will report on what has been done and on various unanswered questions. We will not be presenting practical (algorithmic) methods. We will, however, be exploring the capabilities and limitations of this model.

Type
Research Article
Copyright
Copyright © Cambridge University Press 1999

