Prosodic cues in speech, such as F0, intensity, and articulation rate, have been linked to cognitive processes, including memory retrieval. Here we examine whether subtle variations in spoken responses to cues in a paired-associates learning task differentially reflect (a) the availability of the memory trace and (b) the learner’s confidence in the response. Forty participants completed a learning task in which spoken retrieval attempts were followed by confidence ratings. We analyzed the acoustic properties of their speech (F0, intensity, and articulation rate) and related them to recall performance and to the confidence ratings. Intensity was associated with objective recall performance, an index of memory trace availability, whereas F0 and articulation rate were primarily related to speakers’ confidence ratings, an index of subjective certainty about the response. To evaluate these relationships more formally, we compared three structural equation models reflecting competing accounts: one in which prosody primarily indexes memory strength, one in which it primarily indexes confidence, and a hybrid model in which different features map onto distinct processes. Model comparisons favored the hybrid account, in which intensity is more closely linked to memory strength and F0 and articulation rate track confidence. These findings align with the idea that memory performance and metamemory processes affect distinct prosodic features in speech at separate time points during an utterance. Practically, they suggest that real-time analysis of speech prosody could be used to infer cognitive and metacognitive states without explicit input from the speaker, with potential applications in speech and memory research as well as educational technologies.