Compensatory Lengthening by British Learners of Japanese

Katsumi NAGAI
Department of Applied Linguistics, Edinburgh University

Abstract

Every phoneme has its own intrinsic segmental duration, which varies with its phonetic environment. This experiment examines the effect of voicing the second consonant in C1V1.C2V2 test words spoken by native speakers of Japanese and by British learners of Japanese at two levels. A compensatory effect between the voiced consonant and the preceding vowel occurs across the CV boundary. Elementary learners can be distinguished because they show little durational compensation. The result suggests that this effect could be applied to evaluating learners' achievement in second language learning.

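Summary

The phonemes of each language are said to have an intrinsic duration that is affected not only by speech rate and emphasis but also by the environment in which they occur. Elementary and advanced learners of Japanese whose native language is English, and native speakers of Japanese, pronounced CVCV words embedded in a sentence, and the durations of the individual segments were compared. The compensatory effect between the second consonant and its preceding vowel differed according to the learners' level of attainment. The effect is expected to be usable as an index of second language phonological acquisition.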

1. Introduction and Preceding Studies

Duration, as a phonetic and physical measure of the time taken by a speech event, needs to be distinguished from phonological length. In this paper, duration is defined as the time a speaker takes between the start point and the end point of an utterance, which is considered to be free from variation among listeners. Because every speaker has a default tongue setting, the tongue takes longer to reach its target the further it must deviate from the natural position for a schwa. Of course, every segment becomes longer when the speaker speaks at a slow tempo. A stressed syllable also has a longer duration than the unstressed syllables in an English word (Fry 1955).

It is said that every phoneme of a language has its own segmental duration (Lehiste 1970:18ff.). Firstly, English tense vowels and diphthongs are usually longer than their corresponding lax vowels, as observed between /i/ and /I/. Secondly, differences in duration are reported as a function of place of articulation: Klatt (1976) reports that bilabial stops are typically slightly longer than alveolars and velars. Thirdly, voiceless obstruents are longer than voiced ones. Klatt (1976) says voiceless fricatives are about 40 ms longer than the corresponding voiced fricatives, and shows that the voiceless fricative /s/ in "soo" is inherently longer than the voiced fricative /z/ in "zoo". Borden et al. (1994:142) explain: "while these durational differences are found across language, suggesting that it is basically conditioned by physiology, English shows very large differences in vowel duration before voiced and voiceless consonants, suggesting a learned overlay."

In the case of Japanese, Sugito (1996:274) reports that a native speaker's second voiceless consonant in 'tata' is longer than the second voiced consonant in 'dada', while the reverse is true for Chinese speakers. Research on Spanish has also shown a small average difference (18 ms) between vowels preceding voiced and voiceless consonants. Campbell (1992:213) states that Japanese vowels are, contrary to English, shorter when they are followed by voiced consonants. This inconsistency implies that the effect of consonant voicing on the preceding vowel is not an innate factor. Therefore, it is necessary to test compensatory lengthening in both native and nonnative speakers. Would it be reasonable to assume an inherent relative difference in segmental duration between the utterances of native Japanese speakers and those of British learners of Japanese?

2. Experiment

2.1. Aim
The aim of this experiment is to clarify whether voicing of the second consonant of Japanese CV.CV words spoken by both English and Japanese speakers has an effect on each segmental duration. It is hypothesized that the intrinsic effect of voicing in Japanese speech sounds varies consistently with learners' achievement level. Is the duration of voiceless consonants produced by Japanese speakers consistently longer than that of voiced consonants? Does the vowel in the first CV sequence become longer across the moraic boundary when the second consonant is [+voiced]?
2.2. Design
Pairs of test words are read by subjects to investigate whether the effect of voicing the second consonant is at work. Supposing both native Japanese speakers and British learners of Japanese have their own intrinsic segmental durations of speech sounds, it is plausible that the difference in their first languages has an effect on segmental duration and on the compensation for changes in it. If English speakers' voicing of the second consonant has a different effect from that of Japanese speakers, it would be a sign that their phonetically intrinsic segmental duration is being overridden by the effect of voicing.

The null hypotheses in this experiment are (1) H0: Native speakers' and advanced learners' duration of a voiceless segment in a Japanese sentence is as long as that of their voiced counterparts, and (2) H0: Native speakers' and advanced learners' vowel duration preceding the voiced consonant does not change because the compensatory effect occurs only within a single moraic CV boundary. Data from the advanced learners are expected to fall between native speakers and elementary learners.

2.3. Materials
The following sets of test words are read by subjects. The words shown below without glosses are nonsense words. Although Sugito (1989:172) reports that Japanese syllables do not change in duration even if the test words in a frame sentence have different pitch patterns, all the test words were read with an HL pitch pattern.

kaka              kata ('shoulder')  kasa ('umbrella')

kaga (place-name) kada (place-name)  kaza

The six words above were embedded in the frame sentence:

Kore-ga    ...  desu.

This-NOM   ...  is

"This is ...."

It must be mentioned that some of the elementary learners misread the test sentences during the first rehearsal. All the British subjects had learned the Japanese kana syllabaries; however, the elementary learners sometimes seemed to have trouble reading the list because a Japanese voiced sound is written with a character very similar to its voiceless counterpart. That is, simply adding double dots ( ゛) distinguishes voiced (が ga) from voiceless (か ka) segments in Japanese. If all the test words were written in Roman transcription, on the other hand, native Japanese speakers would find them unnatural and have difficulty reading the sentences. This fact implies a need for research that does not rely on the alphabet or kana, such as close examination of spontaneous speech outside the recording room. A set of test words consisting only of nonsense words might be more suitable; however, Klatt (1976) concludes that the similarities between nonsense syllable studies and spontaneous speech are greater than the differences. The reading lists were written in hiragana. The total number of utterances amounted to 6 words × 5 repetitions × 12 subjects = 360.
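The size of the data set can be checked by enumerating the design. The following is a minimal sketch in Python; the subject labels are hypothetical, since the paper does not name its subjects, and the word list is taken from section 2.3.

```python
from itertools import product

# Test words from section 2.3 (voiceless C2 first, then voiced C2).
WORDS = ["kaka", "kata", "kasa", "kaga", "kada", "kaza"]
REPETITIONS = 5   # each sentence was read five times
SUBJECTS = 12     # 4 native speakers + 4 elementary + 4 advanced learners

# Hypothetical subject labels, used only for illustration.
subject_ids = [f"S{n:02d}" for n in range(1, SUBJECTS + 1)]

# One (subject, word, repetition) triple per recorded utterance.
utterances = list(product(subject_ids, WORDS, range(1, REPETITIONS + 1)))

print(len(utterances))  # 360
```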
2.4. Subjects
Subjects in this experiment are four native speakers of Japanese, four British elementary learners of Japanese, and four British advanced learners of Japanese. All the native Japanese are speakers of standard Tokyo dialect, and are all language teachers in Japan.

The British elementary learners of Japanese are sophomores at the University of Edinburgh. They have studied Japanese for one and a half years, with four lectures on basic Japanese grammar and two follow-up tutorials per week. They have no experience of studying abroad.

The British advanced learners are senior students at the same university, who lived in Japan for one year as exchange students studying Japanese language. The curriculum they finished includes Japanese grammar, translation into and from Japanese, 'conversation and discussion', and Kanji. Although their vocabulary in Japanese is somewhat limited and some unnatural accentuation still remains, they have little difficulty in making themselves understood in Japanese.

2.5. Procedure
Each sentence was read five times after a few minutes of practice. The utterances were recorded using a sensitive condenser microphone (Sennheiser MKH-815). The signal was sent to a microphone amplifier (Soundcraft 200B) and recorded digitally on a DAT recorder (SONY PCM-2700A) with 16-bit/44.1 kHz sampling. The recordings were analyzed on a UNIX workstation (Sun SPARCstation 5) with a D/A and A/D conversion board. The duration of the segmental units of the words was measured from wide-band spectrograms and time-domain waveforms on a VDT screen of the X-Waves analyzer, digitized at a 16 kHz sampling frequency, with the aid of intensity curves and time-domain curves (Figure 1 and Figure 2). The criteria used for positioning the cursors in the display are shown in Table 1 below:


1. Consonant 1 (/k/) closure duration: the interval between the offset of energy in the formants of the preceding vowel (/a/) and the release burst spike.
2. Voice Onset Time (VOT) of Consonant 1 (/k/): from the first release burst to the first visible glottal pulse of the following vowel (/a/). If the following vowel is devoiced, the VOT becomes the entire aspiration-filled interval between the two /k/ closures.
3. Vowel 1 (/i/ or /a/) duration: from the first glottal pulse to the closure of the following consonant.
4. Consonant 2 (/k/, /g/, /t/, /d/, /s/, or /z/) duration: from the offset of energy in the formants of Vowel 1 (/a/) to either the release burst before Vowel 2 (for /k/ and /t/) or the onset of energy in the formant structure of Vowel 2 (for /s/, /z/, /d/, and for /k/ or /t/ without evidence of a release burst; see Figure 1 and Figure 2). If a trace of nasalized /g/ is detected, the duration of the nasal constriction is also measured.
5. Voice Onset Time for Consonant 2 (/k/ and /t/): from the beginning of the first release burst on the initial stop to the first visible striation representing glottal pulsing. If very weak pulses called "edge vibration" (Lisker and Abramson 1967) are observed, they are ignored as an inaudible signal.
6. Vowel 2 duration: from the first glottal pulse following the release of Consonant 2 to the closure for the /d/ of the carrier word desu.
Table 1 Measurement of duration
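
The criteria in Table 1 amount to subtracting successive cursor positions. The sketch below shows one way this could be computed; the field names and the example cursor times are hypothetical and are not taken from the recordings.

```python
from dataclasses import dataclass


@dataclass
class Boundaries:
    """Cursor positions in ms for one C1V1.C2V2 token (hypothetical names)."""
    c1_closure_start: float  # offset of formant energy of the preceding vowel
    c1_burst: float          # first release burst of C1
    v1_onset: float          # first visible glottal pulse of V1
    c2_start: float          # offset of formant energy of V1
    c2_release: float        # release burst of C2, or onset of V2 formants
    v2_onset: float          # first glottal pulse of V2 (equals c2_release if no VOT)
    v2_end: float            # closure for the /d/ of "desu"


def segment_durations(b: Boundaries) -> dict:
    """Return the six durations of Table 1 as differences between cursors."""
    return {
        "C1_closure": b.c1_burst - b.c1_closure_start,
        "C1_VOT": b.v1_onset - b.c1_burst,
        "V1": b.c2_start - b.v1_onset,
        "C2": b.c2_release - b.c2_start,
        "C2_VOT": b.v2_onset - b.c2_release,
        "V2": b.v2_end - b.v2_onset,
    }


# Invented cursor times for illustration only.
token = Boundaries(100, 160, 185, 265, 335, 350, 430)
durations = segment_durations(token)
print(durations["V1"], durations["C2"])  # 80 70
```

Because the segments are contiguous, the six values sum to the interval between the first and last cursors, which gives a simple consistency check on each measured token.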


Figure 1 Measurement of kasa by a native speaker of Japanese


Figure 2 Waveform of Figure 1.

When two or more burst spikes were observed before the offset of the consonantal closure, the first one was used to measure the voice onset time, as shown in Figure 3 below. The cursors in the display window were reset before each measurement (Ohala and Lyberg 1976).

The independent variable 'GROUPS' had three levels (native speakers, elementary learners, and advanced learners), while 'VOICED' had two levels (+/-voiced) in the repeated-measures ANOVA. The VOTs are included in V1 or V2 when the data are statistically tested because they are considered to vary consistently with the place of articulation of the stop (Port and Rotunno 1979).
Figure 3 An example of two burst spikes
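
For readers unfamiliar with the F ratio, the sketch below computes it for a plain one-way comparison of two conditions. This is a simplification: it is not the repeated-measures procedure used in this experiment, and the duration values are invented for illustration.

```python
from statistics import mean


def one_way_f(*groups):
    """Return (F, df_between, df_within) for a one-way between-groups ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical C2 durations in ms, 12 values per condition.
voiceless_c2 = [92, 88, 95, 90, 87, 93, 91, 89, 94, 90, 86, 92]
voiced_c2 = [70, 74, 68, 72, 75, 69, 71, 73, 67, 72, 70, 74]

f_value, df1, df2 = one_way_f(voiceless_c2, voiced_c2)
print(f"F({df1},{df2}) = {f_value:.2f}")
```

With two conditions of 12 values each, the degrees of freedom are (1, 22). A repeated-measures analysis would additionally remove between-subject variation from the error term.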

2.6. Results
Mean durations and standard deviations are shown in Table 2. Each value is a mean pooled across the subjects. The results are plotted in Figure 4, Figure 5 and Figure 6. The most prominent feature is that the native speakers' overall word duration is similar despite the segmental variety. The total word duration of kaga is slightly longer than that of the other test words. However, almost all of the native speakers keep their word duration consistent in spite of the variation in segmental duration. This means that if the number of morae is the same, the duration of the word is almost the same because of mora timing. On the other hand, word duration for elementary learners varies much more than that of native speakers. The total word duration of the elementary learners becomes longer when the test word includes a voiced C2. This implies that isochronous mora timing is not realized correctly when elementary learners speak Japanese words. Advanced learners fall between native speakers and elementary learners in total word duration.

Repeated-measures ANOVA reveals that voiced C2 duration for native speakers is shorter than voiceless C2 duration (F(1,22)=11.42, p<0.005), but that voiced C2 duration for elementary learners is not shorter than that of its voiceless counterpart (F(1,22)=0.15, n.s.). Advanced learners fall between native speakers and elementary learners (F(1,22)=3.29, p=0.083). This result allows null hypothesis (1) to be rejected. While native speakers' voiced C2 is shorter than voiceless C2, voicing of C2 makes the preceding V1 longer (F(1,22)=17.26, p<0.001). The same is true for advanced learners (F(1,22)=11.09, p<0.005), but this compensatory effect is not seen in elementary learners' utterances (F(1,22)=0.13, n.s.). Null hypothesis (2) is therefore rejected as well.

Subject group has a significant main effect on total word duration (F(2,69)=141.57, p<0.001) across all test words, probably because elementary learners take nearly twice as long as native speakers to utter the test words. The total word duration of each group is, however, not affected by the [+/-voiced] distinction (native speakers: F(1,22)=0.09, n.s.; advanced learners: F(1,22)=1.07, n.s.). This implies that the total duration of Japanese words could also be used as a scale of learners' achievement in second language phonology.

Table 2 Mean segmental duration (S.D.) in ms.

Figure 4 Segmental duration by native speakers.

Figure 5 Segmental duration by elementary learners.

Figure 6 Segmental duration by advanced learners.

The /k/ and /t/ in C2 position differ from each other in place of articulation. This difference has no effect on the total word duration of native speakers (F(1,14)=0.04, n.s.). For this /k/-/t/ distinction, elementary learners (F(1,14)=0.19, n.s.) and advanced learners (F(1,14)=0.39, n.s.) showed no difference in total word duration either. Another pair, /s/ and /z/, in C2 position differs in manner of articulation. This pair also exhibits no effect on the total word duration (native speakers: F(1,14)=0.01, n.s.; elementary learners: F(1,14)=0.19, n.s.; advanced learners: F(1,14)=0.78, n.s.). The results above reconfirm the findings of Port et al. (1987) with native Japanese subjects. It therefore seems that the effect of C2 voicing, regardless of differences in articulation, would be a suitable scale for evaluating learners' second language phonology, because the effects of manner of articulation in Japanese are, as Campbell (1992) and Sugito (1989) pointed out, not very large.

3. Discussion and Conclusion

The intrinsic difference in duration between voiced and voiceless consonants plays a key role in the experiment. It has been reported in Port et al. (1980) and Campbell (1987) that voiced consonants are intrinsically short and that a vowel becomes longer when it precedes a voiced consonant. The results above revealed that the same kind of compensatory segmental adjustment is observed in the speech of native speakers and advanced learners of Japanese. While the Japanese voiceless obstruent C2 read by English speakers is longer than the voiced one, British elementary learners of Japanese did not show this effect, perhaps because of their insufficient phonological system, which is expected to cause the lengthening of devoiced consonants. In other words, learners can be classified according to whether they show the universal-looking lengthening effect of voicing. Acquiring this effect might be one of the elementary learners' targets in achieving more natural utterances.

A more conspicuous fact is that a compensatory effect occurs across the CV boundary of the Japanese C1V1.C2V2 word. That is, C2 segmental duration affects the preceding V1 duration across the moraic CV boundary. This compensation between C2 and the preceding V1 is observed only in the utterances of native Japanese speakers and advanced learners. Because V1 duration in elementary learners' speech does not vary with the duration of the following C2, the compensatory effect across the boundary of two morae might also be one of the important factors in natural Japanese speech. It may support the existence of a unit of two morae, or bimoraic foot, which would explain the Japanese compensatory effect more adequately.
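
One way to express this cross-boundary compensation numerically is to compare how much V1 lengthens before a voiced C2 with how much the voiced C2 shortens. The sketch below is an illustration only: the ratio is a hypothetical index that is not used in this paper, and the mean durations are invented.

```python
def compensation(voiceless: dict, voiced: dict):
    """Compare mean 'V1' and 'C2' durations (ms) of a voiceless/voiced word pair.

    Returns (v1_gain, c2_loss, ratio). A ratio near 1 means that the
    lengthening of V1 offsets the shortening of voiced C2, so that the
    V1+C2 span stays roughly constant; a ratio near 0 means no compensation.
    """
    v1_gain = voiced["V1"] - voiceless["V1"]   # V1 lengthening before voiced C2
    c2_loss = voiceless["C2"] - voiced["C2"]   # shortening of voiced C2
    ratio = v1_gain / c2_loss if c2_loss else float("nan")
    return v1_gain, c2_loss, ratio


# Invented mean durations in ms, e.g. for a kata/kada pair.
kata = {"V1": 70, "C2": 90}
kada = {"V1": 85, "C2": 70}

v1_gain, c2_loss, ratio = compensation(kata, kada)
print(v1_gain, c2_loss, ratio)  # 15 20 0.75
```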

It is true that, in this experiment, a Japanese CV mora is at the same time a CV syllable. Though discussion of the phonological segmentation of interlanguage is beyond the scope of this short paper, Otake's (1992) experiment with target monitoring tasks also hints at the co-existence of syllables and morae in Japanese phonology. Apart from the dispute over whether the effect is language-universal or not, language teachers and researchers need to keep studying what prevents elementary learners' compensatory lengthening and how their first language interacts with their interlanguage phonology.

Traditional pedagogy has claimed that the basic phonological unit of Japanese is the moraic (C)V sequence, which has a constant duration. However, teachers of Japanese need to know the more important fact that compensatory timing control in Japanese is realized not only inside a single CV sequence but also across adjacent CV units, so as to keep the total word duration consistent.

4. References

Beckman, M. 1982. "Segmental duration and the 'mora' in Japanese." Phonetica. 39. 113-135.
Beckman, M. and Shoji Atsuko. 1984. "Spectral and Perceptual Evidence for CV Coarticulation in Devoiced /si/ and /syu/ in Japanese." Phonetica. 41. 61-71.
Blumstein, S. E. and Stevens, K. N. 1979. "Acoustic invariance in speech production: Evidence from measurements of the spectral characteristics of stop consonants." Journal of the Acoustical Society of America. 66. 1001-1017.
Borden, G. J., Harris, K. S., and Raphael, L. J. 1994. Speech Science Primer (Third Edition). Baltimore: Williams & Wilkins.
Campbell, N. 1992. "Speech Timing in English and Japanese." Proceedings for International Symposium on Japanese Prosody. Nara, Japan.
Clark, J. and Yallop, C. 1995. An introduction to Phonetics and Phonology (Second Edition). Oxford: Blackwell.
Cutler, A., Mehler, J., Norris, D., and Segui, J. 1986. "The syllable's differing role in the segmentation of French and English." Journal of Memory and Language. 25. 385-400.
Daigakuto Kagaku Kokai Symposium Soshikiiinkai. 1993. Kokusaikasuru Nihongo. Tokyo: KUBAPRO.
Flege, J. E. and Hillenbrand, J. 1984. "Limits on phonetic Accuracy in Foreign Language Speech Production." Journal of the Acoustical Society of America. 76(3). 706-721.
Fowler, A. 1980. "Coarticulation and theories of extrinsic timing." Journal of Phonetics. 8. 113-133.
Fry, D. B. 1955. "Duration and intensity as Physical Correlates of Linguistic Stress." Journal of the Acoustical Society of America. 27. 765-768.
Ganong, W. F. and Zatorre, R. J. 1980. "Measuring phoneme boundaries in four ways." Journal of the Acoustical Society of America. 68. 431-439.
Han, M. S. 1962. "The Feature of Duration in Japanese." Onsei no Kenkyu. 10. 65-80.
Hoequist, C. Jr. 1983. "Syllable Duration in Stress-, Syllable- and Mora-Timed Languages." Phonetica. 40. 203-237.
James, A. and Leather, J. (ed.) 1986. Sound Patterns in Second Language Acquisition. Dordrecht: Foris.
Kaiki N. and Sagisaka Y. 1992. "The control of segmental duration in speech synthesis using statistical methods." In Tokura (ed.). Speech Perception, Production and Linguistic Structure. Amsterdam: IOS Press.
Klatt, D. H. 1975. "Vowel lengthening is Syntactically Determined in a Connected Discourse." Journal of Phonetics. 3. 129-140.
Klatt, D. H. 1976. "Linguistic uses of segmental duration in English: Acoustic and perceptual evidence." Journal of the Acoustical Society of America. 59(5). 1208-1221.
Ladefoged, P. 1993. A Course in Phonetics (Third Edition). New York: Harcourt Brace Jovanovich.
Ladefoged, P. and Maddieson, I. 1996. The Sounds of the World's Languages. Oxford: Blackwell.
Laver, J. 1994. Principles of phonetics. Cambridge: Cambridge University Press.
Lehiste, I. 1970. Suprasegmentals. Cambridge, MA: MIT Press.
Lehiste, I. 1972. "The timing of utterances and linguistic boundaries." Journal of the Acoustical Society of America. 51. 2018-2024.
Mehler, J., Dommergues, J., Frauenfelder, U., and Segui, J. 1981. "The syllable's role in speech segmentation." Journal of Verbal Learning and Verbal Behavior. 9. 295-302.
Meyer, A. S. 1992. "Investigation of phonological encoding through speech error analyses: Achievements, limitations, and alternatives." Cognition. 42. 181-211.
Morton, J. and Frankish, C. 1976. "Perceptual centres (P-centers)." Psychological Review. 83. 405-408.
Ohala, J. and Lyberg S. 1976. "Comments on 'temporal interactions within a phrase and sentence.'" Journal of the Acoustical Society of America. 59. 990-992.
Otake Koji. 1992. "Nihongo onseino chikakujyono bunsetsuno tan'i: onsetsuto mora." In Haraguchi S. (ed.). Nihongono morato onsetsukozoni kansuru sogoteki kenkyu 2. Mombusho 1992.
Pierrehumbert, Janet. 1994. "Syllable structure and word structure: a study of triconsonantal clusters in English." In Keating, P. A. (ed.). Phonological Structure and Phonetic Form: Papers in Laboratory Phonology III. Cambridge: Cambridge University Press. 168-190.
Port, R. F., Dalby, J., and O'Dell, M. 1987. "Evidence for mora timing in Japanese." Journal of the Acoustical Society of America. 81(5). 1574-1585.
Port, R. F., Al-Ani, S., and Maeda S. 1980. "Temporal compensation and universal phonetics." Phonetica. 37. 235-252.
Port, R. F. and Rotunno, R. 1979. "Relation between voice-onset time and vowel duration." Journal of the Acoustical Society of America. 66. 654-662.
Sugito Miyoko. 1989. "Nihongoto Eigono accent-to intonation." In Sugito (ed.) 1989. Nihongoto Nihongo Kyoiku. Vol. 2. Tokyo: Meiji Shoin.
Sugito Miyoko. 1996. Nihongono Oto. Osaka: Izumi Shoin.
Takagi Naoyuki and Mann, V. 1994. "A perceptual basis for the systematic phonological correspondences between Japanese load(sic.) words and their English source words." Journal of Phonetics. 22. 343-356.
Vance, T. J. 1987. An introduction to Japanese phonology. New York: State University of New York Press.

This research at Edinburgh University was supported by Rotary Foundation Grant 1996. I am also grateful to all staff at the Department of Linguistics and Applied Linguistics, especially to Dr. Turk, Dr. Sorace, and Mr. K. Mitchell for their kind advice. All errors and misunderstandings are mine. Please write to nagai@holyrood.ed.ac.uk or nagai@lisa.lang.cn.ac.jp


(c) Katsumi NAGAI 1998, Centre for Research and Educational Development in Higher Education, and Faculty of Education, Kagawa University, 760-8521 JAPAN