Tools that turn speech into text not only save time, Zhicheng Lin finds, but also allow him to multitask and to participate fully in meetings. Credit: Zhicheng Lin

For most academics, the sound of typing is the sound of progress. But it’s also the sound of a bottleneck: a slow, physically taxing process that stands between our ideas and the page. We accept this as a necessary part of the job, but should we?

Last year, persistent wrist pain from hours spent hunched over a keyboard forced me to question that necessity. The solution, I found, was to reclaim my voice. Modern dictation tools powered by artificial intelligence (AI) allowed me to compose text at a conversational speed, easily outpacing even the most proficient typists (who achieve a maximum of around 80 words per minute) with my natural speaking cadence of 130 words per minute or more. The ergonomic benefits were immediate.

Although voice-to-text software now boasts remarkable accuracy, these tools remain largely untapped in academic workflows. They are often perceived as accessibility aids or tools for quick voice memos rather than instruments for scholarly production. This is a missed opportunity. A strategic voice-based workflow can transform how we capture ideas, draft manuscripts and engage with research.

Transcription

Academic work thrives on fleeting insights. Ideas emerge on walks between buildings, in the shower and in the middle of the night. They surface during interviews with research participants, emerge from seminar discussions and crystallize during informal conversations with colleagues. Conventional note-taking forces us to choose between participating fully in the moment and scrambling to jot down what’s happening. Transcription eliminates this compromise.

Transcription converts existing audio recordings into text: the audio exists first, the text follows. Recording a meeting allows you to engage completely, avoiding the distraction of manual note-taking.
The resulting transcript becomes a searchable archive of decisions, insights and action points. For researchers conducting interviews, automated transcription transforms hours of playback and typing into minutes of review and annotation.

Perhaps most importantly, transcription makes it easy to capture ideas when it would be difficult to write them down. Mulling over a research problem while walking to campus? Recording thoughts on your phone takes seconds. Lying in bed when a solution to a methodological challenge suddenly becomes clear? Voice memos preserve the insight without requiring you to reach for paper or a laptop.

There are psychological benefits, too. Speaking our thoughts aloud allows us to bypass the internal editor that often stalls written expression. A blank page can be intimidating, but a voice recorder simply listens.

Dictation

Dictation, by contrast, produces text as you speak. In this instance, speech replaces the keyboard. Dictation is compositional rather than retrospective, demanding focused attention, but at the speed of thought rather than of fingers. For first drafts, e-mail responses or reviewer comments, this acceleration compounds quickly.

But viewing it simply as a means of increasing speed undersells dictation’s value. Physical relief from keyboard work addresses an occupational hazard. Academic careers span decades; repetitive strain injuries accumulate silently until they become debilitating. Dictation offers not just efficiency, but also sustainability.

Moreover, the technology enables us to truly multitask. Dictating while walking transforms commutes into writing sessions. Simple physical activities (folding laundry, organizing shelves or taking light exercise) can accompany composition without compromising either task.
This reclaims otherwise dead time for productive work.

Building your voice workflow

Effective voice integration requires matching tools to tasks and developing new routines around both transcription and dictation. Start with low-stakes applications to build comfort before tackling important work.

For basic recording, you might already have all you need. Apple’s Voice Memos, for example, can handle most academic recording needs adequately.

For sensitive material requiring offline processing, MacWhisper transcribes entirely on your device, a capability crucial for sensitive interviews and confidential discussions. The software also handles batch processing, accepts various audio formats and offers a one-time licence purchase rather than a recurring fee. (SpeechPulse is another option, for both Mac and PC users.)

If cloud-based options appeal, Google AI Studio provides free transcription through multimodal large language models (LLMs) such as Gemini 2.5 Pro, although it does use your data for training purposes. ChatGPT record mode (currently available for paid subscribers on macOS) records, transcribes and automatically generates structured summaries in an editable workspace. Users can also request the original transcript or edit it by giving the AI conversational commands, or prompts. Otter and Granola excel at meeting transcription with automatic speaker identification and summary generation, and ElevenLabs’ Scribe provides high-accuracy transcription.

On the dictation front, computers and phones include basic features, but specialized tools often perform better. For maximum accuracy, I prefer Aqua Voice (for which free, paid and group tiers are available). Its Deep Context feature uses on-screen context, such as the active application and visible text, to improve recognition of domain-specific terms and to apply context-appropriate formatting.
It also supports a dictionary of custom words.

For cross-platform work, consider Wispr Flow, which is available for macOS, Windows and iPhone, and has both free and paid tiers. Users of Apple devices might also consider MacWhisper, SuperWhisper or Spokenly, which offer both local and cloud-based processing.

Practical implementation