Introduction
Unilateral vocal fold paralysis (UVFP) is a voice disorder characterised by impaired motility of a single vocal fold. In the general population, the incidence of UVFP among voice pathologies is estimated at 1.2%1. Its aetiopathogenesis is broad and includes mechanical trauma to the head and neck, neoplasms, neurological diseases, idiopathic origin and, most commonly, iatrogenic causes (typically injuries occurring during thyroid and cardiothoracic surgery). The most common and perceivable symptom of UVFP is dysphonia, with patients producing breathy voices due to the air leakage caused by glottal insufficiency. Such impaired communication may harm quality of life, possibly leading to stress and isolation, a burden that is particularly severe for professional voice users (such as teachers, actors, or singers). Moreover, UVFP may cause speech dyspnea, swallowing problems and, in the most severe cases, body stabilisation difficulties due to hyperventilation while speaking.
Voice rehabilitation constitutes the initial treatment for glottic insufficiency. Surgery becomes necessary when unsatisfactory results are obtained2. There is no clinical consensus regarding the best intervention to close the glottic gap and recover the mucosal wave. A promising technique is injection laryngoplasty (IL), in which a mouldable, biocompatible material (e.g., autologous fat and fascia) is injected to medialise the paralysed vocal fold2. The main advantages of this approach are that it does not require open surgery, relies on widely available material, is cost-effective and does not hamper spontaneous reinnervation. Moreover, feedback on voice improvements can be collected in real time: this is an important aspect because it may help the phonosurgeon regulate over-injection, thereby counteracting the main and inevitable disadvantage of IL, fat reabsorption.
With specific preparation procedures, the beneficial effect of IL on voice outcomes can be permanent.
The gold standard for the clinical evaluation and monitoring of UVFP treatment efficacy is the direct visualisation of vocal fold movements through laryngoscopy. Nonetheless, complementary assessments are advisable: voice production is a multidimensional phenomenon, so a multidimensional approach to studying voice changes over time is necessary, especially to demonstrate positive results obtained by any type of treatment. Furthermore, indirect methods of evaluating voice quality might compensate for the lack of high-resolution imaging devices in primary care units and remove the need to be physically present in hospitals. A well-established method is the perceptual evaluation of voice, typically performed along with laryngoscopy, which employs standardised batteries such as the commonly used GRBAS scale, accounting for the overall degree of dysphonia severity (G), roughness (R), breathiness (B), asthenia (A) and strain (S)3. Nevertheless, auditory assessments may suffer from drawbacks such as inter-rater variability and dependence on physicians’ experience. Acoustic analysis has become a widespread technique that automatically provides several parameters objectively describing multiple aspects of voice production, such as phonation (the fundamental frequency F0), period and amplitude perturbation (jitter and shimmer), and noise (noise-to-harmonics ratio NHR, and normalised noise energy NNE). These measures proved sensitive to changes in vocal quality in UVFP patients treated with laryngoplasty. Meta-analyses have highlighted that jitter and maximum phonation time (MPT) significantly improved both short- and long-term after the injection, whereas shimmer was significantly reduced only in the short-term period4.
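To make the perturbation measures above concrete, the following minimal sketch computes local jitter and shimmer, assuming the glottal cycle periods and peak amplitudes have already been extracted from the recording (e.g., by a pitch-tracking step). The function names and the simple relative-difference formulation are illustrative, not the exact algorithms of any specific analysis tool.

```python
def jitter_local(periods):
    """Mean absolute difference between consecutive cycle periods,
    expressed as a percentage of the mean period (local jitter)."""
    diffs = [abs(periods[i] - periods[i - 1]) for i in range(1, len(periods))]
    return 100.0 * (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))


def shimmer_local(amplitudes):
    """The same cycle-to-cycle perturbation applied to peak amplitudes
    (local shimmer, in percent)."""
    diffs = [abs(amplitudes[i] - amplitudes[i - 1]) for i in range(1, len(amplitudes))]
    return 100.0 * (sum(diffs) / len(diffs)) / (sum(amplitudes) / len(amplitudes))
```

A perfectly periodic signal yields 0% for both measures, while alternating cycle lengths, as in breathy or diplophonic phonation, drive them upward. Note that both quantities presuppose a reliable cycle (F0) segmentation, which is precisely what becomes fragile in UVFP voices.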
Individual studies have also reported significant improvement of F05 and HNR6.
However, the computation of perturbation measures, which depend directly on the estimation of F0, can become extremely challenging and unreliable, especially in UVFP patients, owing to their highly irregular, breathy, and partially aphonic utterances2,7. Moreover, recent works demonstrated the existence of several nonlinear phenomena in voice production to which the algorithms embedded in common acoustic-characterisation tools are insensitive. This mismatch between the mathematical framework and the attributes of the biosignal can lead to ambiguous results in acoustic analysis, hindering its advantages and applicability4,7, and impeding a clear, robust evaluation of treatment efficacy. For instance, subglottic airflow properties, the rheological characteristics of the vocal fold tissue, their mechanical collision, and asymmetry between left and right vocal fold movements may generate unique signal properties that require novel and alternative techniques to be extracted from audio recordings and investigated8,9. To address this gap in the literature, an emerging body of research has proposed nonlinear dynamical systems theory as a promising candidate for investigating dysphonic voices from a broader perspective7,10. As the complete dynamics of a system is often complex and not directly observable, this approach typically relies on reconstructing a given biosignal in the so-called state space, an alternative representation obtained by considering multiple delayed copies of the original time series. This reconstruction reveals the hidden internal states of the system and allows metrics to be computed that describe its full evolution even when little information (here, a single time series: the voice recording) is available.
Background on nonlinear acoustic analysis
The characterisation of acoustic samples can be divided into three types of feature sets: perturbation, cepstrum, and complexity measures.
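The state-space reconstruction described above, i.e., stacking delayed copies of the scalar series into vectors (time-delay embedding), can be sketched as follows. The delay and embedding dimension used here are illustrative defaults; in practice they are selected with criteria such as the first minimum of mutual information and the false-nearest-neighbours test.

```python
def delay_embed(x, m=3, tau=2):
    """Reconstruct the state space of scalar series x by time-delay
    embedding: each state vector stacks m samples spaced tau apart.
    Returns the list of m-dimensional delay vectors."""
    n = len(x) - (m - 1) * tau  # number of full delay vectors
    return [[x[i + j * tau] for j in range(m)] for i in range(n)]
```

All the state-space metrics discussed below (correlation dimension, Lyapunov exponents) operate on point clouds of this kind rather than on the raw waveform.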
A multidimensional approach is typically advised to describe the functioning of the phonatory system efficiently, as some properties related to vocal diseases (e.g., aperiodicity) are also inherent to non-pathological states9. Moreover, when analysing dysphonic voices, perturbation parameters such as jitter and shimmer can become unreliable for detecting deviations from normophonic subjects and, above all, for comparisons across other pathologies and between pre- and post-operative conditions11. Therefore, parameters that can be extracted from audio recordings without relying on the computation of F0, and that can account for well-known nonlinearities in speech production, have become more popular in the last two decades. Among complexity parameters, three categories may be identified: measures describing the geometrical properties and trajectories of the state space, information-theoretic measures, and self-similarity measures.
In the first group, the correlation dimension (D2) is a particular type of fractal dimension that estimates the geometric shape of the region of state space occupied by the attractor (or a set of points). It specifies the degrees of freedom needed to describe and generate the corresponding physiological process12. A more complex system has a higher dimension, meaning that more independent variables may be needed to describe its dynamic state. In voice analysis, D2 was found to be significantly higher in dysphonic patients than in normophonic subjects12,13,14,15, also yielding high recognition rates and accuracies in distinguishing the two populations12,15. On the other hand, the Largest Lyapunov Exponent (LLE) quantifies the sensitivity of the attractor to initial conditions. It has been introduced in acoustic analysis because the larynx can be considered a dissipative system in which oscillations are both attenuated and emphasised16.
Lyapunov exponents describe the evolution of trajectories in each dimension of the attractor, and the largest value (i.e., the LLE) provides a simple measure of how rapidly two initially nearby trajectories diverge or converge in phase space. An LLE \(> 0\) indicates divergence, and vice versa.
The main disadvantage of these metrics is that they require speech dynamics to be purely deterministic. Such an assumption can be proven only under specific circumstances, and this modelling approach does not account for randomness, which characterises voice, especially in the presence of specific physiological phenomena such as airflow turbulence15. Therefore, information theory measures have been introduced in acoustic analysis to overcome this issue, since they are agnostic to the deterministic or non-deterministic nature of the signal. One of the most used features is the Approximate Entropy (AE), which reflects the unpredictability of the fluctuations in a time series. Sample entropy (SE) was proposed as an improved version of AE, as the latter is intrinsically biased toward regularity: SE eliminates self-matching, leads to more consistent results, and is largely independent of signal length17. SE allowed the vocal phenotypes of four genetic syndromes to be separated18, proved to be higher in dysphonic patients19, and was successfully used as a feature in machine learning paradigms20,21. Research has also investigated voice parametrisation with other variants of entropy, e.g., correlation, Shannon, Kolmogorov, Rényi, and fuzzy entropy9.
Regarding self-similarity measures, detrended fluctuation analysis (DFA) has been implemented in voice analysis as an alternative to entropy measures to study chaotic vibration. DFA determines the self-affinity of a signal22 through a scaling exponent \(\alpha\), and it has been used to detect the presence of turbulence noise in audio samples.
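As a concrete illustration, the scaling exponent \(\alpha\) can be obtained with the following minimal monofractal DFA sketch: the mean-removed series is integrated into a profile, split into non-overlapping windows of size s, linearly detrended per window, and \(\alpha\) is the slope of \(\log F(s)\) versus \(\log s\). The fixed scale set and linear detrending are simplifying assumptions of this sketch, not the exact procedure of the cited works.

```python
import math


def _linfit(xs, ys):
    """Ordinary least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def dfa_alpha(x, scales=(4, 8, 16, 32)):
    """Monofractal DFA scaling exponent of series x (q = 2 case)."""
    mean = sum(x) / len(x)
    profile, acc = [], 0.0
    for v in x:                      # integrate the mean-removed series
        acc += v - mean
        profile.append(acc)
    log_s, log_f = [], []
    for s in scales:
        sq_acc, nwin = 0.0, 0
        for start in range(0, len(profile) - s + 1, s):
            seg = profile[start:start + s]
            t = list(range(s))
            b, a = _linfit(t, seg)   # linear detrending per window
            sq_acc += sum((y - (a + b * ti)) ** 2
                          for ti, y in zip(t, seg)) / s
            nwin += 1
        log_s.append(math.log(s))
        log_f.append(math.log(math.sqrt(sq_acc / nwin)))
    alpha, _ = _linfit(log_s, log_f)  # slope of the log-log fluctuation plot
    return alpha
```

Uncorrelated noise yields \(\alpha \approx 0.5\), while an integrated (random-walk-like) signal yields \(\alpha \approx 1.5\). Replacing the q = 2 root-mean-square average of the per-window fluctuations with q-order moments is the core generalisation made by MFDFA.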
The scaling exponent was found to be significantly higher in disordered voices7 and in Parkinson’s disease23.
Nevertheless, DFA (as well as entropy) assumes that scale invariance does not depend on time and space. Such variations, however, often occur in biomedical signals, including voice and speech24,25, and consequently indicate a multifractal, rather than monofractal, structure. Hence, multifractal DFA (MFDFA) could be a more adequate and promising technique for analysing and characterising the multiple components, interactions, and scales that the voice production system involves over time26,27. A few works have applied MFDFA to affective speech and newborn cries to develop more efficient automatic recognition tools28,29. Still, more research is needed to understand its capabilities when applied to disordered voices.
Notably, MFDFA has been implemented to study the so-called complexity matching (CM) between interacting systems30. CM refers to a maximised flow of information between systems when they share similar complexity; its occurrence has been demonstrated in participants engaged in joint activities, as well as between two effectors (i.e., the lower limbs) of the same individual30. In speech analysis, acoustic onsets occurring during dyadic conversation that reflect turn-taking proved to follow a power-law distribution (similarly to critical events of complex networks), and a CM effect was found in the temporal structure of language when comparing participants talking about topics on which they do or do not share common opinions and beliefs31.
The current study
This paper proposes a novel approach to studying voice quality recovery, applied to a specific sample, i.e., UVFP patients after autologous fat IL.
Firstly, to account for the dependence of voice scale invariance on time and space, several complexity metrics will be computed following a multiscale strategy to describe vocal properties comprehensively.
This will allow vocal properties to be investigated on multiple levels simultaneously, possibly underlining specific aspects of voice production related to phonation and articulation.
Secondly, the study aims to introduce and highlight a potential complexity-matching effect, calculated through MFDFA, when comparing audio recordings from pre- and post-operative conditions. In our context, even if it is not possible to allude to a concurrent interaction, since the audio recordings are acquired in two different instances (see Sect. “Patients”), this innovative perspective may quantify how much information, i.e., the irregular, chaotic voice behaviour of the pre-operative condition, remains in the post-treatment one. Here, it is hypothesised that if patients improve their voice quality, the similarity between pre- and post-surgery voice will be low, and vice versa for patients with unsatisfactory voice recovery after the intervention.
Additionally, the study analyses how confounding factors may affect the rehabilitation process. By considering the degree of recovery obtained through voice perceptual assessment, the UVFP population will be divided into two subgroups, and age, aetiology, disease time, and post-measurement time will be investigated to uncover possible statistical differences. This could be useful in clinical practice to guide otolaryngologists in planning and monitoring patients’ follow-ups.
Methods
Patients
A total of 69 participants were included in the study, of whom 40 were female (mean age = \(49.3\pm 12.0\) years) and 29 were male (mean age = \(49.7\pm 14.6\) years). They were diagnosed with UVFP through videolaryngostroboscopy with a flexible endoscope or a \(70^{\circ }\) rigid fiberoptic endoscope, supported by voice perceptual assessment performed with the GRB scale, a variant of the original GRBAS scale3 proposed in32. Patients affected by UVFP who did not fully recover voice quality with voice therapy were considered eligible for autologous fat injection laryngoplasty (AFIL).
Inclusion criteria were: age > 18 years, UVFP lasting at least 6 months, and persistent dysphonia and voice fatigue. Exclusion criteria were: age