Measuring Catastrophic Forgetting in AI

4 Measuring Catastrophic Forgetting

In this section, we examine the various ways that have been proposed to measure catastrophic forgetting. The most prominent of these is retention. Retention-based metrics directly measure the drop in performance on a set of previously learned tasks after learning a new task. Retention has its roots in psychology (e.g., Barnes and Underwood (1959)), and McCloskey and Cohen (1989) used it as a measure of catastrophic forgetting. The simplest way of measuring the retention of a learning system is to train it on one task until it has mastered that task, then train it on a second task until it has mastered that second task, and finally report the system's new performance on the first task. McCloskey and Cohen (1989) used this procedure in a two-task setting, but more elaborate formulations exist for situations with more than two tasks (e.g., see Kemker et al. (2018)).

An alternative to retention that likewise appears in both the psychology literature and the machine learning literature is relearning. Relearning was the first formal metric used to quantify forgetting in the psychology community (Ebbinghaus, 1913), and was first used to measure catastrophic forgetting by Hetherington and Seidenberg (1989). The simplest way of measuring relearning is to train a learning system on a first task to mastery, then train it on a second task to mastery, then train it on the first task to mastery again, and finally report how much more quickly the system mastered the first task the second time around. While in some problems relearning is of lesser import than retention, in others it is much more significant.
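The two measurement procedures described above can be sketched as follows. This is a minimal illustration, not code from the paper: the `train_to_mastery` and `evaluate` helpers are hypothetical stand-ins for whatever training and evaluation routines a given learning system provides.

```python
def measure_retention(model, task_a, task_b, train_to_mastery, evaluate):
    """Retention: train on task A, then task B, then report the drop on task A.

    train_to_mastery(model, task) and evaluate(model, task) are assumed
    helpers (hypothetical, not from the paper).
    """
    train_to_mastery(model, task_a)
    perf_at_mastery = evaluate(model, task_a)   # performance on A at mastery
    train_to_mastery(model, task_b)
    perf_after_b = evaluate(model, task_a)      # performance on A after learning B
    return perf_at_mastery - perf_after_b       # drop on A = forgetting


def measure_relearning(model, task_a, task_b, train_to_mastery):
    """Relearning: train A -> B -> A, report how much quicker A was remastered.

    train_to_mastery is assumed to return the number of training steps taken.
    """
    steps_first = train_to_mastery(model, task_a)
    train_to_mastery(model, task_b)
    steps_second = train_to_mastery(model, task_a)
    return steps_first - steps_second           # savings when relearning A
```

Note that the two metrics need not agree: a system can show a large retention drop yet relearn the first task almost immediately, which is exactly the situation where relearning-based measures are the more informative of the two.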
A simple example of such a problem is one where forgetting is made inevitable by resource limitations and the rapid reacquisition of knowledge is paramount.

A third measure of catastrophic forgetting, activation overlap, was introduced by French (1991). In that work, French argued that catastrophic forgetting is a direct consequence of the overlap of the distributed representations learned by ANNs. He then postulated that catastrophic forgetting could be measured by quantifying the degree of this overlap exhibited by the ANN. The original formulation of the activation overlap of an ANN given a pair of samples looks at the activations of the hidden units of the ANN and takes the element-wise minimum of these activations between the two samples. To bring this idea in line with contemporary thinking (e.g., Kornblith et al. (2019)) and modern network design, we propose instead using the dot product of these activations between the samples. Mathematically, we can thus write the activation overlap of a network with hidden units h₀, h₁, …, hₙ with respect to two samples a and b as the dot product Σᵢ hᵢ(a) hᵢ(b).

:::info
Authors:
Dylan R. Ashley
Sina Ghiassian
Richard S. Sutton
:::

:::info
This paper is available on arxiv under CC BY 4.0 Deed (Attribution 4.0 International) license.
:::
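The dot-product activation overlap described above can be sketched as follows; the vectors of hidden-unit activations are assumed to have already been extracted from the network for the two samples, and French's original element-wise-minimum formulation is included only for comparison.

```python
import numpy as np


def activation_overlap(hidden_a, hidden_b):
    """Dot-product activation overlap between two samples.

    hidden_a, hidden_b: 1-D arrays holding the hidden-unit activations
    h_0, ..., h_n produced by the network for samples a and b.
    Returns sum_i h_i(a) * h_i(b).
    """
    hidden_a = np.asarray(hidden_a, dtype=float)
    hidden_b = np.asarray(hidden_b, dtype=float)
    return float(np.dot(hidden_a, hidden_b))


def elementwise_min_overlap(hidden_a, hidden_b):
    """French's (1991) original formulation: summed element-wise minimum."""
    hidden_a = np.asarray(hidden_a, dtype=float)
    hidden_b = np.asarray(hidden_b, dtype=float)
    return float(np.sum(np.minimum(hidden_a, hidden_b)))
```

For activations in [0, 1], both quantities are large when the two samples drive the same hidden units and small when their representations are disjoint; the dot product additionally extends naturally to unbounded activations such as those of ReLU networks.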