Most, if not all, of the research I conduct involves computational modeling of memory in some form. Even purely empirical projects often involve testing the predictions of computational models.
While computational modeling is not the most popular topic in memory research, it allows researchers to get away from the “20 questions with nature” game (Newell, 1973) and think more broadly about how memory works. We can manipulate variables in our experiments one at a time and still learn something, but that learning is too incremental to make real progress in our understanding.
Computational modeling, in contrast, allows us to explore everything that a memory theory needs to specify: learning, representation, retrieval, and decision-making. We can make a variety of assumptions at each of these stages and directly observe how well models capture the data. And when I say “data” here, I don’t even mean one particular dataset – we can think globally and capture a range of data, even data collected from different labs.
The computational modeling I do can fit into a few of the different categories I describe below.
Interference, Representation, and Retrieval in Recognition Memory
If you were to ask most people in the field about models of recognition memory, they would likely have volumes to say about signal detection theory (SDT) or the dual process signal detection (DPSD) model.
While these models are useful, they primarily specify the decision stage of memory. They posit decision variables such as distributions of familiarity or high-threshold recollection, but neither model specifies how these decision variables are generated.
This is consequential because our most fundamental question about memory – namely, “what causes forgetting?” – can’t really be answered by such models.
Most memory researchers today agree that interference is primarily responsible for forgetting. But it is still not clear what causes interference itself.
Much of the work I have done attempts to get at this question. I have used global matching models, which assume that retrieval operates by matching memory cues against the entirety of memory in parallel to produce a measure of “global similarity” – how similar the cues are to memory as a whole – which then drives a decision process. The field has largely converged on the view that this is how recognition operates, and the majority of process models of recognition memory employ global matching. These include Minerva 2 (Hintzman, 1988), Search of Associative Memory (SAM: Gillund & Shiffrin, 1984), the Theory of Distributed Associative Memory (TODAM: Murdock, 1982, 1997), Retrieving Effectively from Memory (REM: Shiffrin & Steyvers, 1997), and the Generalized Context Model (Nosofsky, 1988).
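To make the global matching idea concrete, here is a minimal sketch. It borrows Minerva 2’s cubed-similarity “echo intensity”, but the trace format and scaling are simplifications of my own rather than any published model’s exact equations:

```python
import numpy as np

rng = np.random.default_rng(0)

n_traces, n_features = 20, 50
# Each stored trace is a random ternary feature vector, as in Minerva 2.
memory = rng.choice([-1.0, 0.0, 1.0], size=(n_traces, n_features))

def global_similarity(probe, memory):
    """Match a probe against every trace in parallel and sum the result.

    Similarity here is a dot product scaled by vector length, cubed so
    that near-matches dominate the sum; the scaling is a simplification
    of Minerva 2's actual normalization.
    """
    sims = memory @ probe / n_features   # one similarity value per trace
    return np.sum(sims ** 3)             # global match across all of memory

target = memory[0]                                    # a studied item
lure = rng.choice([-1.0, 0.0, 1.0], size=n_features)  # an unstudied item

# The target's global match is (almost always) higher than the lure's.
print(global_similarity(target, memory), global_similarity(lure, memory))
```

The key property is that every trace contributes to the match, so everything stored in memory is a potential source of interference.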
Still, a number of questions persist. What representations underlie the global similarity computation? How do such models perform when paired with more sophisticated decision-making models? Several of these models were evaluated qualitatively against manipulations such as list length and category length, but were not applied in a more comprehensive manner. Much of my work has explored these questions, and I detail the highlights below:
- My dissertation work concerned measuring the sources of interference within a matrix model I developed with my adviser Simon Dennis. Prior to my time, there was a debate as to whether interference stems from the other items on a study list (item-noise) or from the prior contexts in which items were experienced (context-noise; e.g., Dennis & Humphreys, 2001) – a toy sketch of this decomposition follows after this list. I applied our model using hierarchical Bayesian techniques to ten recognition memory datasets spanning manipulations of list length, list strength, stimulus type, and word frequency, in both item and associative recognition. Despite this broad range of constraints, the parameter estimates of the model strongly suggested that interference was dominated by the prior contexts in which an item was experienced (context-noise and background-noise). However, this depended somewhat on the stimulus type, with more confusable and similar items being somewhat more susceptible to item-noise. Some of these datasets also featured in my first publication, which was on the list-strength effect (Osth & Dennis, 2015, Psychological Review).
- I extended the Osth and Dennis (2015) model in a number of subsequent publications. One constraint we did not include was the decline in performance across test trials, which has been attributed to item-noise from learning the test items (Criss, Malmberg, & Shiffrin, 2011; Gillund & Shiffrin, 1984). We applied the model at the level of individual test trials while simultaneously modeling decisions with the diffusion decision model (DDM) to account for retrieval latency distributions (see “Decision-Making in Episodic Memory” below for more information). We found strong support for the idea that the decline in performance was actually due to changes in the episodic context that occur as a consequence of memory retrieval. We were also able to account for a number of other puzzling findings, such as the finding that lexical decision trials do not impair performance and that list strength can modulate the decline in performance through testing (Osth, Jansson, Dennis, & Heathcote, 2018, Cognitive Psychology).
- The modeling efforts above addressed item and associative recognition. However, there has been a noticeable dearth of process modeling of source memory, which has instead been dominated by decision models. We extended our matrix model to source memory very simply: by assuming that items are also bound to their sources and that these bindings are likewise globally matched against the contents of memory. We applied the model to three datasets manipulating list strength in item recognition and source memory and again found a dominance of pre-experimental sources of interference, namely context-noise and background-noise (Osth, Fox, McKague, Heathcote, & Dennis, 2018, Journal of Memory and Language).
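To illustrate what item-noise versus context-noise means at the level of a global match, here is a toy decomposition. This is not the actual Osth & Dennis model – the additive form and the parameter values are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def lure_match(n_list_items, n_prior_contexts,
               item_sd=0.1, context_sd=0.5, background_sd=1.0):
    """Toy lure match strength: a sum of noisy contributions from each
    interference source. The additive form and parameter values are
    illustrative only, not the Osth & Dennis (2015) equations."""
    item_noise = rng.normal(0.0, item_sd, n_list_items).sum()
    context_noise = rng.normal(0.0, context_sd, n_prior_contexts).sum()
    background_noise = rng.normal(0.0, background_sd)
    return item_noise + context_noise + background_noise

# With these parameters, quadrupling list length barely widens the lure
# distribution: interference is dominated by prior contexts.
short_list = [lure_match(20, 100) for _ in range(10_000)]
long_list = [lure_match(80, 100) for _ in range(10_000)]
print(np.std(short_list), np.std(long_list))
```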
But wait! It sounds like I’m telling a consistent story in which item-noise plays no role in recognition memory. But we have recently been finding that this is not the case:
- Brandt, Zaiser, and Schnuerch (2018) replicated the null list length effects commonly found by Simon Dennis and colleagues. However, they found that when analysis was restricted to the first study-test cycle, a sizeable list length effect emerged! They attributed this to the fact that subsequent lists “balance out” the number of items in memory: if someone studies a 20-item list first, a subsequent 80-item list results in 100 items in memory, whereas a participant who studies an 80-item list first and a 20-item list second also ends up with 100 items in memory (a toy illustration of this arithmetic follows below). Thus, the buildup of proactive interference (PI) from prior lists actually *obscures* a list length effect.
None of our past modeling efforts had considered a buildup of PI throughout the experiment. Thus, my student Julian Fox and I extended the Osth, Jansson, Dennis, and Heathcote (2018) model by actually adding items from each study list and test trial into the contents of memory, such that interference builds across the course of the experiment. The model captured list length, word frequency, and test position effects, and in addition we were able to capture the buildup of PI throughout the experiment.
Most important was how the buildup of PI changed our inferences about interference: once PI was accounted for, our conclusions reversed and a dominance of item-noise was found! In addition, we found that item-noise and changes in context made roughly comparable contributions to the decline in performance through testing (Fox, Dennis, & Osth, in revision).
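Here is a toy version of the balancing-out arithmetic, assuming only that recognition gets noisier as the total number of stored items grows; the functional form and constants are made up for illustration:

```python
import numpy as np

def dprime(items_in_memory, signal=2.0):
    """Toy discriminability that falls as memory fills up; the functional
    form and constants are purely illustrative."""
    return signal / np.sqrt(1.0 + 0.05 * items_in_memory)

# Items in memory when each group is tested on its first and second list:
short_first = [20, 20 + 80]   # 20-item list studied first
long_first = [80, 80 + 20]    # 80-item list studied first

print([round(dprime(n), 2) for n in short_first])  # [1.41, 0.82]
print([round(dprime(n), 2) for n in long_first])   # [0.89, 0.82]
# A clear list length effect on the first list, but none afterward:
# both groups end up with 100 items in memory.
```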
Decision-Making in Episodic Memory
One of the things I learned during my PhD is that, in most cases, memory can’t really be divorced from decision-making. If I show you a photograph of someone and ask “Did you see this person?”, I’m asking you to make a decision based on how your memory responds to a memory cue.
For this reason, we can gain a better understanding of memory by also understanding how people make decisions on the basis of what is retrieved from memory.
Much of my work on this subject concerns linking memory models with models of decision-making, such as the diffusion decision model (DDM: Ratcliff, 1978) or the linear ballistic accumulator model (LBA: Brown & Heathcote, 2008). These models offer a couple of major advantages over more traditional and simpler models of decision-making, such as SDT.
The first and most obvious advantage is that models such as the DDM and LBA allow us to account not only for what was retrieved from memory, but also for the latency with which it was retrieved. It is well established that stronger memories are retrieved more quickly than weaker ones, and thus, for quite some time, some very useful data have been omitted from our models and analyses.
In addition, such models allow for sharper inferences. Traditional approaches assume that, aside from differences in bias across conditions, all of the remaining observed measures reflect output from memory. Evidence accumulation models such as the DDM and LBA instead account for decision noise. In particular, in such models performance varies as a function of a threshold for responding: some conditions and/or individuals may exhibit differing levels of performance simply because different speed-accuracy thresholds were adopted in response to the constraints of the task.
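To illustrate the speed-accuracy threshold idea, here is a bare-bones random-walk simulation of a two-boundary diffusion process. The parameter values are arbitrary, and this caricature omits components of the full DDM such as between-trial variability and non-decision time:

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_ddm(drift, threshold, n_trials=1000, dt=0.001, noise=1.0):
    """Simulate a two-boundary diffusion: evidence starts at 0 and drifts
    until it crosses +threshold (correct) or -threshold (error).
    Returns mean accuracy and mean decision time (in seconds)."""
    rts, correct = [], []
    for _ in range(n_trials):
        x, t = 0.0, 0.0
        while abs(x) < threshold:
            x += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
            t += dt
        rts.append(t)
        correct.append(x > 0)
    return np.mean(correct), np.mean(rts)

# Identical memory evidence (drift), different response thresholds:
for a in (0.6, 1.2):
    acc, rt = simulate_ddm(drift=1.0, threshold=a)
    print(f"threshold={a}: accuracy={acc:.2f}, mean RT={rt:.2f}s")
```

With the same drift, raising the threshold buys accuracy at the cost of time – exactly the kind of tradeoff that, if unmodeled, contaminates estimates of memory strength.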
Much of this work has occurred in a recognition memory context, where it’s simpler to specify decision models due to there being only two response options available. Below are some highlights:
- The DDM and the LBA differ in their sources of decision noise, with the DDM having within-trial noise in addition to between-trial noise in drift rates and starting points. While previous work by Donkin, Brown, and Heathcote (2009) found that the two models produce very similar conclusions in most cases, we found that the different sources of decision noise led to very different conclusions about data from receiver operating characteristics (ROCs). In particular, while most models have estimated greater variability for targets than for lures, we found that in some cases the LBA estimated equal variability for targets and lures. We found some other interesting things as well: in a binary ROC procedure, speed-accuracy thresholds were not constant across bias conditions, indicating a failure of selective influence in the procedure (Osth, Bora, Dennis, & Heathcote, 2017, Journal of Memory and Language).
- In 2015, Simon Dennis and I published a matrix model of recognition memory that was used to estimate sources of interference from items and contexts (Osth & Dennis, 2015, Psychological Review). However, we did not consider latency data in that analysis, and I had always felt a bit uncomfortable about the fact that the interference estimates were likely contaminated by the influence of decision noise. A few years later, I combined that model with the DDM to produce a complete model of recognition memory that specifies encoding and retrieval processes along with the decision process, similar to the Exemplar-Based Random Walk (EBRW) model (Nosofsky, Little, Donkin, & Fific, 2011) – a generic sketch of this kind of linking appears below. Using hierarchical Bayesian approaches, we used the model to account for choice probabilities and complete latency distributions in a number of datasets, exploring the sources of interference along with the causes of performance declines through the course of testing. We found that such test-related declines were primarily due to changes in the retrieval context through the course of testing, but also that participants often changed their speed-accuracy thresholds through the course of testing (Osth, Jansson, Dennis, & Heathcote, 2018, Cognitive Psychology).
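The linking idea, in sketch form. This is a generic mapping rather than the published linking function: the global match of a probe to memory becomes the drift rate of the diffusion process, so stronger matches drift faster toward the “old” boundary:

```python
def drift_from_match(global_match, criterion=0.5, scale=2.0):
    """Map a probe's global match onto a DDM drift rate: matches above
    the criterion drift toward "old", below it toward "new". The linear
    form and parameter values are illustrative, not the published ones."""
    return scale * (global_match - criterion)

print(drift_from_match(0.9))   # strong (target-like) match -> positive drift
print(drift_from_match(0.2))   # weak (lure-like) match -> negative drift
```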
More recently, I have been working on introducing such models to the free recall task. Free recall is interesting because the pool of potential responses is very large!
- Simon Farrell and I applied racing evidence accumulation models to the first responses in free recall to jointly account for serial position curves and latency distributions. We applied such models using hierarchical Bayesian techniques to 14 datasets where latency was recorded from vocal responses. We found… all kinds of interesting things! While most analyses of forgetting functions have found evidence for power laws, we found almost unanimous support for exponential functions of recency (the two functional forms are contrasted in the sketch below). In addition, we found strong support for the notion that primacy effects are due to reinstatement of the start of the list, as opposed to arising from extra strength in a rehearsal buffer; to our knowledge, these two accounts had not previously been distinguished. Finally, we found evidence that participants began accumulating evidence before the recall cue in immediate free recall (Osth & Farrell, 2019, Psychological Review).
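For reference, the two candidate recency functions differ in whether the rate of loss is constant (exponential) or slows down with lag (power). A quick contrast, with illustrative parameter values of my own choosing:

```python
import numpy as np

# Recall strength as a function of how long ago an item was studied.
# Parameter values are illustrative only.
lags = np.arange(1, 11)
exponential = 0.9 * np.exp(-0.3 * lags)   # constant proportional loss per lag
power = 0.9 * lags ** -0.75               # loss decelerates with lag

for lag, e, p in zip(lags, exponential, power):
    print(f"lag {lag:2d}: exponential={e:.3f}  power={p:.3f}")
```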
Representation of Serial Order
One of the most interesting questions I grappled with during my PhD was what kind of representations underlie our memory for serial order. That is, given that we experience a sequence, what types of representations enable us to retrieve that sequence?
The oldest theoretical notion is that items in the sequence are chained together through associations. I was most sympathetic to this notion, as models of free recall and paired-associate tasks, such as the SAM model (Gillund & Shiffrin, 1984), have generally posited such associations.
I was quite surprised to learn that a large literature accumulated in the 1990s showing that chaining models cannot provide a viable account of memory for serial order! Instead, the field converged on other representations, such as direct utilization of a primacy gradient of strengths (Page & Norris, 1998) or associations to within-list positions (Henson, 1998; Brown, Preece, & Hulme, 2000).
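The contrast between the two kinds of representation can be caricatured in a few lines; both schemes here are toys for illustration, not published models:

```python
# Two toy retrieval schemes for the studied sequence A B C D E F.
study = "ABCDEF"

# Chaining: each item cues the next via item-to-item associations.
chain = {a: b for a, b in zip(study, study[1:])}

# Positional coding: each within-list position cues its item directly.
positions = {i: item for i, item in enumerate(study)}

# Suppose a participant erroneously recalls D after A. The two schemes
# now diverge: chaining must continue from the erroneous item, whereas
# a positional cue for the second output position is unaffected.
print(chain["D"])     # 'E' -- chaining predicts recall continues past D
print(positions[1])   # 'B' -- positional coding can still recover B
```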
My adviser, Simon Dennis, and I were somewhat surprised by these findings. Using a similar logic to Geoff Ward’s, we began conducting some rather simple experiments attempting to replicate these phenomena using procedures more in line with those of the free recall literature. Namely, instead of using small sets of digits or letters, we conducted experiments using large sets of words that were not reused across trials, with large numbers of participants (100+).
- Chaining models are often tested by looking at what follows an erroneous recall. That is, consider a case where the sequence A B C D E F is studied and a participant erroneously skips from A to D. What happens next? Chaining models predict that the following response should be located near D. In fact, recall is much more likely to continue with a response closer to A – participants are more likely to reverse course and fill in their missing responses (a sketch of this scoring logic appears after this list). While most studies in the serial recall literature had found evidence for fill-in, a more recent report by Solway, Murdock, and Kahana (2012) found evidence for chaining models using long lists of words. We found that their data were likely due to high rates of omissions: trials with more omissions showed more of an in-fill pattern (continuing onward from D), while trials with fewer omissions showed more fill-in. We concluded that models that naturally account for fill-in can likely account for both patterns of responses, as a high incidence of omissions can obscure an underlying fill-in effect, but the opposite is not the case (Osth & Dennis, 2015, Journal of Experimental Psychology: Learning, Memory, & Cognition).
- The fill-in effect is contrary to chaining models, but what supports positions as an underlying representation? A major piece of evidence concerns what occurs when people make prior-list intrusions. That is, consider a list G H I J K L that follows a list A B C D E F. As it turns out, when participants erroneously intrude an item from the first list, the intrusion tends to occur in the same position they are attempting to recall! That is, if a participant attempting to recall I (the 3rd position) makes an intrusion, they are most likely to recall C. These data were commonly found with small sets of items, and in some cases they came from experiments where participants wrote their responses on lined response grids, where it is not only apparent which position they are trying to recall, but in some cases they could even see their responses from the previous trial. We controlled for each of these possibilities using a large set of words and typed responses that disappeared after entry. We were quite surprised to find the same effects as traditionally observed – intrusions were most likely to be recalled in the same position they occupied on the prior trial. In addition, contrary to results in free recall, intrusions were most likely to be followed by responses from the current list, implying that participants can recover from an intrusion and get back to the list they are trying to recall (Osth & Dennis, 2015, Journal of Experimental Psychology: Learning, Memory, & Cognition).
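Here is a sketch of how the response following a skip error might be scored; the classification rule is my own simplification of the analyses described above:

```python
def classify_after_skip(study, recalled):
    """Classify the response following an erroneous forward skip.

    "Fill-in" = the next response returns to a skipped-over item;
    "in-fill" = it continues onward past the erroneous item, as chaining
    models predict. The scoring rule is a simplification for illustration.
    """
    pos = {item: i for i, item in enumerate(study)}
    outcomes = []
    for prev, err, nxt in zip(recalled, recalled[1:], recalled[2:]):
        if pos[err] - pos[prev] > 1:          # a forward skip occurred
            if pos[prev] < pos[nxt] < pos[err]:
                outcomes.append("fill-in")
            elif pos[nxt] > pos[err]:
                outcomes.append("in-fill")
    return outcomes

print(classify_after_skip("ABCDEF", ["A", "D", "B"]))  # ['fill-in']
print(classify_after_skip("ABCDEF", ["A", "D", "E"]))  # ['in-fill']
```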