Influence of Smartphones and Software on Acoustic Voice Measures

This study assessed the within-subject variability of voice measures captured using different recording devices (i.e., smartphones and head mounted microphone) and software programs (i.e., Analysis of Dysphonia in Speech and Voice (ADSV), Multi-dimensional Voice Program (MDVP), and Praat). Correlations between the software programs that calculated the voice measures were also analyzed. Results demonstrated no significant within-subject variability across devices and software and that some of the measures were highly correlated across software programs. The study suggests that certain smartphones may be appropriate to record daily voice measures representing the effects of vocal loading within individuals. In addition, even though different algorithms are used to compute voice measures across software programs, some of the programs and measures share a similar relationship.

and correlations of acoustic measures taken simultaneously from a head mounted microphone and a Samsung Galaxy Note 3 were significant and strong (r = 0.73, Uloza et al., 2015). The purpose of the current pilot study was to compare within-subject variability among voice measures with different recording devices (i.e., head mounted microphone, Apple, and Android smartphones) and software (i.e., ADSV, MDVP, and Praat). In addition, correlations among voice software programs that provided the voice analysis were also assessed.

METHODS
Ten vocally healthy women and men produced three trials of /a/ sustained for five seconds and three trials of "we were away a year ago" at a comfortable fundamental frequency (F0) and intensity. "We were away a year ago" was selected because all the phonemes are voiced, providing a connected speech example of continuous vocal fold vibration. The vocal health of the participants was determined perceptually during conversational speech on the day of testing by the researchers. Each trial was separated by 10 seconds. A head mounted condenser microphone (AKG C420, Northridge, CA), iPhones 5 and 6s, and Samsung Galaxy S5 were placed 4 centimeters (cm) from the participant's mouth for voice recording (see Figure 1). A 4 cm plastic stick was used to measure the distance from mouth to microphones. All utterances were recorded simultaneously on all devices. Three apps, RØDE Rec LE (iPhone 5) and Recordium (iPhone 6s) for Apple, and Smart Voice Recorder (Samsung Galaxy S5), recorded .wav files. These apps were free, allowed email of the recorded .wav files, and offered a 44,100 Hz sampling rate for recording. The .wav files from the head mounted microphone were saved directly onto the computer that performed the analysis. The middle portion of /a/ (i.e., four seconds, 0.5 seconds trimmed off the beginning and end) and the entire sentence were analyzed.
Figure 1. The experimental set-up with the recording devices (iPhone 5 and 6s, Samsung Galaxy S5, head mounted microphone) and the plastic stick that measured 4 cm from the mouth to the microphones.
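As a rough illustration of the trimming step described in the Methods (keeping the middle four seconds of a five-second /a/ recorded at 44,100 Hz), the following Python sketch drops 0.5 seconds from each end of a sequence of samples. The function name `trim_middle` and the silent placeholder recording are assumptions made for illustration; this section does not state which tool was used to trim the files.

```python
def trim_middle(samples, sample_rate=44100, trim_seconds=0.5):
    """Return the samples with trim_seconds removed from each end.

    Hypothetical helper: illustrates the trimming arithmetic only.
    """
    n_trim = int(round(trim_seconds * sample_rate))
    if 2 * n_trim >= len(samples):
        raise ValueError("recording is too short to trim both ends")
    return samples[n_trim:len(samples) - n_trim]


# Placeholder for a five-second sustained /a/ at 44,100 Hz (all zeros).
recording = [0.0] * (5 * 44100)
middle = trim_middle(recording)

# Duration of the analyzed portion in seconds.
print(len(middle) / 44100)  # 4.0
```

With these defaults, 22,050 samples are removed from each end, leaving 176,400 samples (four seconds) for analysis.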
The acoustic analysis was completed using Praat (Boersma & Weenink, 2015), free software on the web, and KayPENTAX's (Montvale, NJ) Multi-dimensional Voice Program (MDVP) and Analysis of Dysphonia in Speech and Voice (ADSV). The measures of interest included: fundamental frequency (F0), standard deviation of the F0 (SD of F0), jitter%, shimmer%, noise-to-harmonics ratio (NHR), cepstral peak prominence (CPP), and Acoustic Voice Quality Index (AVQI, Maryn, De Bodt, & Roy, 2010) (see Table 1). The acoustic measures of F0, SD of F0, jitter%, shimmer%, and NHR were chosen because they represent time-based measures of voice in frequency and amplitude from a nearly periodic voice signal and are measured accurately from a sustained vowel. CPP was chosen because it is an alternative to time-based measures and it can be applied to continuous speech, which may provide a more representative sample of voice as compared to a sustained vowel. In addition, all of these measures, except AVQI, are among the minimum instrumented measures recommended by the Special Interest Group (SIG) 3 Voice and Voice Disorders of the American Speech-Language-Hearing Association (ASHA) for completion of a comprehensive voice evaluation.

RESULTS
The main effects of software, device, utterance, and trial were analyzed along with two- and three-way interactions for both women and men participants. For F0 and SD of F0, the main effect of utterance was significant for women (F0 p < 0.001 and SD of F0 p < 0.001), indicating that F0 and SD of F0 were different for /a/ and the sentence. No other significant main effects or interactions were found. For men, none of the main effects or interactions for F0 were significant. The differences in F0 seen for women across sustained /a/ and the sentence did not carry over to men. Perhaps with the lower F0s seen in men, distinctions between sustained phonation and connected speech were not apparent in this study. That is, with added mass to the vocal folds in men there may be no significant difference in F0 for the different speech tasks (i.e., vowel vs. connected speech). For SD of F0 in men, the main effects of software and utterance were significant (p < 0.001). There was also a significant two-way interaction between software and utterance (p < 0.001). No other significant main effects or interactions were seen for SD of F0 in men. The variability around the mean for F0 in men did demonstrate differences across sustained phonation and connected speech.
For jitter% and shimmer% in women, main effects for software (p < 0.001), devices (p < 0.001), and the two-way interaction between software and devices (p < 0.001 for jitter% and p = 0.01 for shimmer%) were significant. For jitter% in men, main effects for software (p < 0.001) and trial (p = 0.01) were significant; however, no interactions were significant. For shimmer% in men, the main effect for devices (p = 0.01) was significant. No other main effects or interactions were seen.
For NHR in women and men, main effects for software (p < 0.001 for women and p = 0.05 for men) and devices (p < 0.001) were significant, but none of the two- and three-way interactions were significant.
For CPP in women and men, the main effects for software, devices, and utterance were all significant (p < 0.001) and the two-way interaction for software and devices was significant (p < 0.001 for women and p = 0.04 for men). In addition, for men, the main effect for trial was significant (p < 0.001). Across women and men for CPP, no other main effects or interactions were significant.
For AVQI, software was not a main effect because Praat is the only program that analyzes AVQI. The main effect for devices was significant in both women and men (p < 0.001 for women and p = 0.01 for men). The other main effect of trial and the two-way interaction of devices and trial were not significant for either women or men. Means and standard deviations for all dependent variables are presented in Tables 1 and 2.
Correlations between software yielded the following results. There was a strong correlation between CPP values calculated by Praat and ADSV for women (r = 0.96, p < 0.001) and for men (r = 0.94, p < 0.001). For women, there were additional strong correlations between Praat and MDVP for jitter% and for NHR (r = 0.64, p < 0.001 for both). Shimmer% in women was not strongly correlated between Praat and MDVP (r = 0.11, p = 0.07). For men, there were no additional strong correlations (jitter% r = 0.198, p < 0.001; NHR r = 0.29, p < 0.001; shimmer% r = 0.12, p = 0.04).
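To make the correlation results concrete, the sketch below computes a correlation coefficient between values of one measure produced by two programs. It assumes the reported r values are Pearson coefficients, which the text does not state explicitly. The function `pearson_r` and the CPP-like numbers are hypothetical and are not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values of one measure from two programs (not study data).
# Program B reads 3 units higher than program A on every recording.
program_a = [10.0, 12.0, 14.0, 16.0]
program_b = [value + 3.0 for value in program_a]

print(pearson_r(program_a, program_b))  # 1.0
```

The example shows why two programs can report different absolute values yet still be strongly correlated: a constant offset between programs changes the means but not the correlation.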

DISCUSSION
Within subjects, for both women and men, the iPhone 5 and 6s, Samsung Galaxy S5, and the head mounted microphone yielded no significant differences when comparing voice analysis for F0, SD of F0, jitter%, shimmer%, NHR, CPP, and AVQI across MDVP, ADSV, and Praat. This result is supported by the absence of significant three-way interactions of software, device, and trial, indicating that there was no change in the dependent variables across software and across device from trial one to trial three. In addition, algorithms differ for calculating jitter%, shimmer%, NHR, and CPP across software. Even with the different algorithms, there was a strong correlation between ADSV and Praat for calculating CPP in both women and men and also between MDVP and Praat for calculating jitter% and NHR in women only. The overall values may be different, but the trends for these measures follow similar trajectories. It is interesting to note that jitter% and NHR were not strongly correlated across MDVP and Praat for men. Perhaps the lower F0s are disrupting the relationship between the algorithms. There was no difference between women and men for CPP because it is not a time-based measure.
The current results are consistent with previous work that suggested certain apps may be used to accurately and reliably measure environmental noise (Kardous & Shaw, 2014) and that a Samsung Galaxy Note 3 compared with a head mounted microphone produced strong correlations between acoustic voice measures (Uloza et al., 2015). A recent study presents contradictory suggestions that the use of apps for dB readings of the human voice is premature because none of the three apps tested were comparable to a Larson-Davis (Depew, NY) Model 831 Type 1 sound level meter (SLM) (Fava, Oliveira, Baglione, Pimpinella, & Spitzer, 2016). Results indicated that three SLM apps on an iPhone 5 and a RadioShack (Fort Worth, TX) SLM yielded inconsistent dB readings for the human voice at soft, habitual, and loud levels when compared with a Type 1 SLM. Frankly, it is not surprising that the results in Fava and colleagues (2016) were significantly different across recording devices for the human voice recordings and outside of the established criterion of ±2 dB. The procedures did not account for within-subject variability across trials. For example, participants only produced one trial of soft /a/ sustained for five seconds. Because the microphones are different across devices, it is expected that the mean results will vary. In fact, the results from the current study were similar to Fava and colleagues (2016) when only looking at the main effect of device. In the current study, there were differences in the means of some of the voice measures across the smartphones and the head mounted microphone. The clinically relevant question is related to maintaining microphone recording integrity across trials in the same individual. The current study addressed that question and found that the smartphones and the head mounted microphone tested enabled consistent analysis of the voice measures within subject across women and men.
Considering the results of this pilot study, it is possible to capture reliable daily vocal loading effects using smartphones and free apps. To limit variability, use the same phone and the same app within each individual and require a 4 cm distance from mouth to microphone. The results are applicable to the phones and the apps used in the study. Future work needs to investigate other phones and other apps, especially given the rapid evolution of smartphones. If the SLP does not have access to KayPENTAX's software (i.e., MDVP and ADSV), the minimum acoustic instrumented measures recommended by SIG 3 of ASHA can still be completed using Praat, a free software program downloaded from the internet. In addition, the SLP can include AVQI, which is a measure that is only calculated through Praat. CPP measured through Praat is highly correlated with CPP measured through ADSV for both women and men. Jitter% and NHR are also highly correlated between MDVP and Praat for women only. Even with the measures that are not highly correlated between Praat and ADSV or Praat and MDVP, what matters is within-person change. Differences seen in that individual from pre- to post-treatment carry the most weight regardless of the software program used to perform the analysis. The SLP can complete an acoustic voice evaluation, representing the daily effects of vocal loading, using accessible and low-cost options.
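The emphasis on within-person change can be illustrated with a small sketch. It assumes, purely for illustration, that each device adds a constant offset to the measured value. This is a simplification and not a model tested in the study. The device names, offsets, and CPP-like values below are hypothetical.

```python
def within_person_change(pre, post):
    """Change in a voice measure for one person, post minus pre."""
    return post - pre


# Hypothetical underlying values for one person (not study data).
underlying_pre, underlying_post = 12.0, 10.5

# Assumed constant measurement offset per device (illustrative only).
device_offsets = {"head_mounted_mic": 0.0, "smartphone": 1.5}

changes = {
    device: within_person_change(underlying_pre + offset,
                                 underlying_post + offset)
    for device, offset in device_offsets.items()
}

print(changes)  # {'head_mounted_mic': -1.5, 'smartphone': -1.5}
```

Under this assumption the absolute readings differ by device, but the pre-to-post change is identical as long as the same device is used for both recordings, which is consistent with the recommendation to keep the phone and app fixed within each individual.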