The present research was designed to determine whether an ambiguous, visually presented event is better recalled when an emotional (relative to neutral) verbal interpretation of the event is read before or after the video of the event is viewed. There are two competing hypotheses. First, researchers have found that emotional events are better recalled than neutral events. As such, one possibility is that presenting an emotional verbal interpretation of the event, read either before or after the video itself, may enhance subsequent memory of the event. Alternatively, research on the “verbal overshadowing effect” shows that a subsequent verbal description of an event can impair memory for the event itself. This suggests that information presented asynchronously to the video may adversely affect memory for the video. We showed participants (N = 649) 2-min videos that could be interpreted in either a mildly or a very negative emotional way. Before or after viewing a video, participants were given a script that allowed for a neutral or negative verbal interpretation of the video; the negative interpretation caused them to have a more robust emotional response to the video. Memory was then assessed either immediately or following a 1- or 7-day delay. Memory for both the video (using detail, inference, and wrong probes) and the text (using verbatim, paraphrase, inference, and wrong probes) was examined. Results revealed an “emotional verbal overshadowing effect,” such that emotional information presented asynchronously to the video produced the greatest decrement in subsequent memory.