## Neural networks

- Shultz, T. R., Nobandegani, A. S., & Fahlman, S. E. (2022). Cascade-Correlation. In D. Phung, G. Webb, & C. Sammut (Eds.),
*Encyclopedia of Machine Learning and Data Science*. New York, NY: Springer. - A review of algorithms in the Cascade-Correlation family and some of their applications.
- Nobandegani, A. S., & Shultz, T. R. (18 January 2017). Converting cascade-correlation neural nets into probabilistic generative models. - Humans are not only adept at recognizing what class an input instance belongs to (i.e., the classification task) but, perhaps more remarkably, can also imagine (i.e., generate) plausible instances of a desired class with ease when prompted. Inspired by this, we propose a framework for transforming Cascade-Correlation Neural Networks (CCNNs) into probabilistic generative models, thereby enabling CCNNs to generate samples from a category of interest. CCNNs are a well-known class of deterministic, discriminative NNs that autonomously construct their topology and have been successful in accounting for a variety of psychological phenomena. Our proposed framework is based on a Markov Chain Monte Carlo (MCMC) method, called the Metropolis-adjusted Langevin algorithm, which capitalizes on the gradient information of the target distribution to direct its explorations towards regions of high probability, thereby achieving good mixing properties. Through extensive simulations, we demonstrate the efficacy of our proposed framework.
- Kharratzadeh, M., & Shultz, T. R. (2016). Neural implementation of probabilistic models of cognition.
*Cognitive Systems Research, 40*, 99-113. - Understanding the neural mechanisms underlying probabilistic models remains important because these models provide a computational framework rather than specifying mechanistic processes. Here, we propose a deterministic neural-network model that estimates and represents probability distributions from observable events, via probability matching. Our model learns to represent these probabilities from the occurrence patterns of individual events. This neural implementation of probability matching is paired with a neural module applying Bayes’ rule, forming a comprehensive neural scheme to simulate Bayesian learning and inference. The model also provides novel explanations of base-rate neglect, a notable deviation from Bayes.
- Kharratzadeh, M., & Shultz, T. (December, 2015). Probability matching via deterministic neural networks. NIPS 2015 Workshop on Cognitive computation: Integrating neural and symbolic approaches. Montreal, QC, Canada. - We propose a constructive neural-network model composed of deterministic units that estimates and represents probability distributions from observable events. The probability distributions are learned from positive and negative reinforcements of the inputs. Our model is psychologically plausible because, like humans, it learns to represent probabilities without receiving any summary representation of them. We discuss how the estimated probabilities can be used in a setting with deterministic units to produce matching behavior in choice. Our work is a step towards understanding the neural mechanisms underlying probability matching behavior by specifying processes at the algorithmic level.
- Kharratzadeh, M., & Shultz, T. R. (2015). Neural implementation of probabilistic models of cognition. - We propose a constructive neural-network model to estimate and represent probability distributions from observable events. Probability distributions are learned from positive and negative reinforcements of inputs as individual events are experienced. This is paired with a second neural module applying Bayes' rule, thus forming a comprehensive neural scheme to simulate Bayesian learning and inference. The model provides novel explanations of some deviations from Bayes, including base-rate neglect and overweighting of rare events.
- Kharratzadeh, M., & Shultz, T. R. (2013). Neural-network modelling of Bayesian learning and inference. In M. Knauff, M. Pauen, N. Sebanz, & I. Wachsmuth (Eds.),
*Proceedings of the 35th Annual Conference of the Cognitive Science Society* (pp. 2686-2691). Austin, TX: Cognitive Science Society. - We propose a modular neural-network structure for implementing the Bayesian framework for learning and inference. Our design has three main components: two for computing the priors and likelihoods based on observations and one for applying Bayes’ rule. We show that deterministic neural networks can learn a wide variety of probability distributions when equipped with a learning cessation mechanism. Through comprehensive simulations, we show that our proposed model succeeds in implementing Bayesian learning and inference. We also provide a novel explanation of base-rate neglect, the most well-documented deviation from Bayes’ rule, by modelling it as a weight decay mechanism that increases entropy.
- Shultz, T. R., & Fahlman, S. E. (2010). Cascade-Correlation. In C. Sammut & G. I. Webb (Eds.),
*Encyclopedia of Machine Learning*, Part 4/C, 139-147. Heidelberg, Germany: Springer-Verlag. - A review of algorithms in the Cascade-Correlation family.
- Dandurand, F., Berthiaume, V., & Shultz, T. R. (2007). A systematic comparison of flat and standard cascade-correlation using a student-teacher network approximation task.
*Connection Science, 19*, 223-244. - Student-teacher network approximation tasks were used to investigate the ability of flat and standard CC networks to learn the input-output mapping of other, randomly initialized flat and standard CC networks. For low-complexity approximation tasks, there was no significant performance difference between flat and standard student networks. Both standard and flat CC generalized well on problems of varying complexity. On high-complexity tasks, flat CC networks had fewer connection weights and learned with less computational cost than standard networks did.
- Shultz, T. R., Rivest, F., Egri, L., Thivierge, J. P., & Dandurand, F. (2007). Could knowledge-based neural learning be useful in developmental robotics? The case of KBCC.
*International Journal of Humanoid Robotics, 4*, 245–279. - A review of experiments with KBCC indicates that recruitment of relevant existing knowledge typically speeds learning and sometimes enables learning of otherwise impossible problems. Some additional domains of interest to developmental robotics are identified in which knowledge-based learning seems essential.
- Dandurand, F., Shultz, T. R., & Rivest, F. (2007). Complex problem solving with reinforcement learning. In
*Proceedings of the 6th IEEE International Conference on Development and Learning* (ICDL-2007), pp. 157-162. IEEE. - We simulate the complex problem-solving task of finding which ball in a set is lighter or heavier than the others, given a limited number of weighings. We use a SARSA-based softmax learning algorithm in which the reward function is learned with cascade-correlation neural networks. Humans may use means-ends analysis to self-generate rewards in such sequential problems.
- Egri, L., & Shultz, T. R. (2006). A compositional neural-network solution to prime-number testing.
*Proceedings of the Twenty-eighth Annual Conference of the Cognitive Science Society* (pp. 1263-1268). Mahwah, NJ: Erlbaum. - KBCC networks create a compositional representation of the prime-number concept and use this representation to decide whether their input is a prime number.
- Shultz, T. R., Rivest, F., Egri, L., & Thivierge, J. P. (2006). Knowledge-based learning with KBCC.
*Proceedings of the Fifth International Conference on Development and Learning ICDL 2006*. Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN. - Preliminary conference version of Shultz et al. (2007).
- Rivest, F., & Shultz, T. R. (2005). Learning with both adequate computational power and biological realism.
*Proceedings of the 2005 Canadian Artificial Intelligence Conference: Workshop on Correlation Learning* (pp. 15-23). University of Victoria, Victoria, BC. - We show mathematically that the computationally powerful learning rules used in the CC family of algorithms can be rewritten in a form that is a small extension of the biologically realistic Hebb rule.
- Rivest, F., & Shultz, T. R. (2005). Knowledge-based cascade-correlation: A review. Inductive transfer: 10 years later.
*NIPS 2005 Workshop*. - A brief review of research on KBCC, which allows the recruitment of existing knowledge in new learning.
- Rivest, F., & Shultz, T. R. (2004). Compositionality in a knowledge-based constructive learner.
*Papers from the 2004 AAAI Fall Symposium, Technical Report FS-04-03*, pp. 54-58. Menlo Park, CA: AAAI Press. - KBCC simplifies and accelerates the learning of parity and chessboard problems. Previously learned knowledge of simpler versions of these problems is recruited in the service of learning more complex versions. A learned solution can be viewed as a composition in which the components are not altered, showing that concatenative compositionality can be achieved in neural systems.
- Thivierge, J. P., Dandurand, F., & Shultz, T. R. (2004). Transferring domain rules in a constructive network: Introducing RBCC.
*Proceedings of the IEEE International Joint Conference on Neural Networks*, 1403-1409. - In rule-based CC (RBCC), symbolic rules are converted into neural networks that can then be recruited as in KBCC.
- Thivierge, J. P., & Shultz, T. R. (2003). Information networks with modular experts. In M. H. Hamza (Ed.),
*IASTED Artificial Intelligence and Applications* (pp. 753-758). Zurich. - Information networks learn by adjusting the amount of information fed to the hidden units. We extend this to manipulate the amount of information fed to modular network experts. Competition among them is obtained by information maximization, and collaboration by constrained information maximization.
- Shultz, T. R., & Rivest, F. (2003). Knowledge-based cascade-correlation: Varying the size and shape of relevant prior knowledge. In H. Yanai, A. Okada, K. Shigemasu, Y. Kano, & J. J. Meulman (Eds.),
*New developments in psychometrics* (pp. 631-638). Tokyo: Springer-Verlag. - Artificial neural networks typically ignore the role of knowledge in learning by starting from random connection weights. Our new algorithm, knowledge-based cascade-correlation (KBCC), finds, adapts, and uses its relevant knowledge, thereby speeding learning. The more relevant the prior knowledge, the more likely KBCC is to recruit it for solving the target problem.
- Takane, Y., Oshima-Takane, Y., & Shultz, T. R. (2003). Neural network simulations by cascade correlation and knowledge-based cascade correlation networks. In T. Higuchi, Y. Iba, & M. Ishiguro (Eds.),
*Proceedings of Science of Modeling: The 30th Anniversary Meeting of the Information Criterion (AIC)* (pp. 245-254). Report on Research and Education 17. Tokyo: The Institute of Statistical Mathematics. - CC and KBCC networks are compared on learning the semantics of personal pronouns.
- Thivierge, J. P., Rivest, F., & Shultz, T. R. (2003). A dual-phase technique for pruning constructive networks.
*Proceedings of the IEEE International Joint Conference on Neural Networks 2003* (pp. 559-564). - Removing unimportant connection weights in both input and output phases of CC reduces network size while speeding learning and improving generalization.
- Thivierge, J. P., & Shultz, T. R. (2002). Finding relevant knowledge: KBCC applied to splice-junction determination.
*IEEE International Joint Conference on Neural Networks 2002* (pp. 1401-1405). - With biological rules as prior knowledge, KBCC performs splice-junction determination by recruiting relevant rules.
- Rivest, F., & Shultz, T. R. (2002). Application of knowledge-based cascade-correlation to vowel recognition.
*IEEE International Joint Conference on Neural Networks 2002* (pp. 53-58). - KBCC adapts existing related knowledge in learning to recognize vowels, speeding learning without losing accuracy.
- Shultz, T. R., & Rivest, F. (2001). Knowledge-based cascade-correlation: Using knowledge to speed learning.
*Connection Science, 13*, 43-72. - KBCC recruits previously learned sub-networks as well as single hidden units. It finds, adapts, and uses its relevant knowledge to significantly speed learning.
- Kamimura, R., Kamimura, T., & Shultz, T. R. (2001). Structural information control for flexible competitive learning. In V. Kurkova, N. C. Steele, R. Neruda, & M. Karny (Eds.),
*Artificial neural nets and genetic algorithms: Proceedings of the International Conference in Prague, Czech Republic, 2001* (pp. 90-93). Heidelberg, Germany: Springer-Verlag. - A new information-theoretic method called structural information overcomes fundamental problems in competitive learning, such as dead neurons and the choice of an appropriate number of neurons in the competitive layer.
- Kamimura, R., Kamimura, T., & Shultz, T. R. (2001). Information theoretic competitive learning and linguistic rule acquisition.
*Transactions of the Japanese Society for Artificial Intelligence, 16*, 287-298. - A new unsupervised information-theoretic method for competitive learning discovers linguistic rules more explicitly than the traditional competitive method.
- Kamimura, T., Kamimura, R., & Shultz, T. R. (2001). Linguistic rule acquisition by information maximization: Neural networks infer the use of donatory verbs.
*Proceedings of the IASTED International Symposia, Applied Informatics: Artificial Intelligence and Applications* (pp. 90-94). Anaheim, CA: ACTA Press. - Maximizing information generates internal representations leading to rule discovery for Japanese donatory verbs.
- Kamimura, R., Kamimura, T., & Shultz, T. R. (2001). Self-organization by information control.
*Proceedings of the IASTED International Symposia, Applied Informatics: Artificial Intelligence and Applications* (pp. 188-192). Anaheim, CA: ACTA Press. - Competition among neurons is achieved by maximizing information in competitive units. Cooperation is realized by making neighboring connections behave in the same way. Adaptation occurs by maximizing information content and simultaneously making neighboring connections as similar as possible.
- Shultz, T. R., & Rivest, F. (2000). Using knowledge to speed learning: A comparison of knowledge-based cascade-correlation and multi-task learning.
*Proceedings of the Seventeenth International Conference on Machine Learning* (pp. 871-878). San Francisco, CA: Morgan Kaufmann. - Comparison with multi-task learning (MTL) reveals that KBCC uses its knowledge more effectively to learn faster.
- Shultz, T. R., & Rivest, F. (2000). Knowledge-based cascade-correlation.
*Proceedings of the International Joint Conference on Neural Networks, Vol. V* (pp. 641-646). Los Alamitos, CA: IEEE Computer Society Press. - Preliminary conference version of Shultz & Rivest (2001).
- Takane, Y., Oshima-Takane, Y., & Shultz, T. R. (1999). Analysis of knowledge representations in cascade correlation networks.
*Behaviormetrika, 26*, 5-28. - Mechanisms and characteristics of nonlinear function learning and representation in CC networks are examined with a variety of knowledge-representation-analysis tools.
- Shultz, T. R., Oshima-Takane, Y., & Takane, Y. (1995). Analysis of unstandardized contributions in cross connected networks. In D. Touretzky, G. Tesauro, & T. K. Leen (Eds.),
*Advances in Neural Information Processing Systems 7* (pp. 601-608). Cambridge, MA: MIT Press. - In contribution analysis, analyzing the variance-covariance matrix of contributions yields more valid insights by taking account of connection weights.
- Takane, Y., Oshima-Takane, Y., & Shultz, T. R. (1995). Network analyses: The case of first and second person pronouns.
*Proceedings of the 1995 IEEE International Conference on Systems, Man and Cybernetics* (pp. 3594-3599). - Function approximation in CC network simulation of pronoun acquisition.
- Takane, Y., Oshima-Takane, Y., & Shultz, T. R. (December, 1994). Approximations of nonlinear functions by feed-forward networks.
*Proceedings of the 11th Annual Meeting of the Japan Classification Society* (pp. 26-33). Tokyo: Japan Classification Society. - How CC networks approximate the continuous XOR problem; evidence from several knowledge-representation-analysis tools.
- Takane, Y., Oshima-Takane, Y., & Shultz, T. R. (1994). Methods for analyzing internal representations of neural networks. In T. Kubo (Ed.),
*Proceedings of the 22nd Annual Meeting of the Behaviormetric Society* (pp. 246-247). Tokyo: The Behaviormetric Society. - Five methods for analyzing what a neural network has learned.
- Shultz, T. R., & Oshima-Takane, Y. (1994). Analysis of unscaled contributions in cross connected networks.
*Proceedings of the World Congress on Neural Networks* (Vol. 3, pp. 690-695). Hillsdale, NJ: Erlbaum. - A principal components analysis (PCA) of unscaled network contributions (products of sending-unit activations and connection weights entering output units) yields more interesting insights about CC networks than comparable analyses of contributions scaled by the sign of output targets.
- Shultz, T. R., & Elman, J. L. (1994). Analyzing cross connected networks. In J. D. Cowan, G. Tesauro, & J. Alspector (Eds.),
*Advances in Neural Information Processing Systems 6* (pp. 1117-1124). San Francisco, CA: Morgan Kaufmann. - Sanger's contribution analysis is extended to the analysis of CC networks. A contribution is the product of an output weight and the associated activation on the sending unit, whether that sending unit is an input or a hidden unit, multiplied by the sign of the output target for the current input pattern. The matrix of contributions by input patterns can be subjected to principal components analysis (PCA) to extract the main features of variation in the contributions.
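
The contribution analysis summarized in Shultz & Elman (1994) can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' code: it assumes the activations of all sending units (inputs and hidden units) and the output-side weights have already been extracted from a trained network, and the function names `contribution_matrix` and `pca` and the toy numbers are hypothetical. Setting `scale_by_target_sign=False` gives the unscaled contributions described in Shultz & Oshima-Takane (1994). The PCA here works on centred columns, i.e., the variance-covariance matrix; standardizing the columns first would give a correlation-based PCA instead, which is the kind of choice the 1995 entry on unstandardized contributions addresses.

```python
import numpy as np


def contribution_matrix(activations, weights, targets, scale_by_target_sign=True):
    """Contributions of every sending unit to every output, for every input pattern.

    activations: (n_patterns, n_senders) activations of the units feeding the
        output layer (input units and hidden units in a cascade network)
    weights:     (n_senders, n_outputs) weights on the connections into the outputs
    targets:     (n_patterns, n_outputs) output targets, assumed nonzero
    Returns an (n_patterns, n_senders * n_outputs) matrix: one row per pattern,
    one column per sender-to-output connection.
    """
    # contrib[p, k, o] = activation of sender k on pattern p * weight from k to o
    contrib = activations[:, :, None] * weights[None, :, :]
    if scale_by_target_sign:
        # multiply by the sign of the target for output o on pattern p
        contrib = contrib * np.sign(targets)[:, None, :]
    return contrib.reshape(activations.shape[0], -1)


def pca(matrix, n_components=2):
    """PCA of a patterns-by-contributions matrix, via SVD of the centred columns."""
    centred = matrix - matrix.mean(axis=0)
    _, singular_values, vt = np.linalg.svd(centred, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    components = vt[:n_components]      # principal axes in contribution space
    scores = centred @ components.T      # each pattern's coordinates on those axes
    return scores, components, explained[:n_components]


# Toy example: 3 patterns, 2 sending units, 1 output (hypothetical numbers)
activations = np.array([[1.0, 0.5], [0.2, -1.0], [0.0, 0.3]])
weights = np.array([[2.0], [-1.0]])
targets = np.array([[0.5], [-0.5], [0.5]])
C = contribution_matrix(activations, weights, targets)
# rows of C: [2.0, -0.5], [-0.4, -1.0], [0.0, -0.3]
scores, components, explained = pca(C)
```

The component loadings show which sender-to-output connections vary together across patterns, and the scores place each input pattern in that reduced space.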
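
Nobandegani & Shultz (2017) build their generative framework on the Metropolis-adjusted Langevin algorithm (MALA). The sketch below shows only the generic MALA sampler, under stated assumptions: the CCNN-derived target distribution from that paper is not reproduced and is replaced by a hypothetical two-dimensional Gaussian, and the function name `mala`, the step size, and the sample counts are illustrative choices. Each proposal takes a gradient step on the log density plus Gaussian noise; a Metropolis-Hastings test then accepts or rejects it.

```python
import numpy as np


def mala(log_density, grad_log_density, x0, step_size, n_samples, rng):
    """Metropolis-adjusted Langevin algorithm.

    log_density:      log of the (possibly unnormalized) target density
    grad_log_density: gradient of log_density
    Returns the chain of samples and the fraction of accepted proposals.
    """
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_samples, x.size))
    n_accepted = 0
    drift_scale = 0.5 * step_size**2

    def log_proposal(to, frm):
        # log q(to | frm) for the Gaussian Langevin proposal, up to a constant
        mean = frm + drift_scale * grad_log_density(frm)
        return -np.sum((to - mean) ** 2) / (2.0 * step_size**2)

    for i in range(n_samples):
        # Langevin proposal: a gradient step toward higher density plus Gaussian noise
        proposal = (x + drift_scale * grad_log_density(x)
                    + step_size * rng.standard_normal(x.size))
        # Metropolis-Hastings correction, needed because the proposal is asymmetric
        log_accept = (log_density(proposal) + log_proposal(x, proposal)
                      - log_density(x) - log_proposal(proposal, x))
        if np.log(rng.uniform()) < log_accept:
            x = proposal
            n_accepted += 1
        samples[i] = x
    return samples, n_accepted / n_samples


# Hypothetical stand-in target: a 2-D Gaussian with mean mu and identity covariance
mu = np.array([1.0, -2.0])
samples, accept_rate = mala(
    log_density=lambda x: -0.5 * np.sum((x - mu) ** 2),
    grad_log_density=lambda x: -(x - mu),
    x0=np.zeros(2),
    step_size=0.9,
    n_samples=5000,
    rng=np.random.default_rng(0),
)
# after discarding burn-in, samples[1000:].mean(axis=0) should lie close to mu
```

The gradient term is what the abstract credits for good mixing: it drifts proposals toward regions of high probability, while the Metropolis-Hastings step keeps the target as the chain's stationary distribution despite the discretized Langevin move.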