On the other hand, English is called a stress-timed language, in which stresses occur at almost equal intervals. While stressed syllables in English are pronounced with a greater amount of energy than unstressed syllables, as Ladefoged (1993) explains, Japanese speakers mainly change pitch height to accentuate a segment. The rules of stress alignment, which can apply recursively to realize a well-formed description, have been studied by a number of researchers. However, it must be noted that stress-timing and mora-timing are categorizations made by different criteria. Since stress is a suprasegmental feature of utterances, stress can be placed on both syllables and morae. It is necessary to examine the phonetic reality of mora-timing in both native speakers and learners of Japanese, because mora-timing by speakers of stress-timed languages has been relatively neglected.
The concept of the mora, as an abstract isochronous unit of timing in Japanese, was comprehensively re-examined by Port et al. (1987). They stated that the mora needs to be defined by the segments which comprise it, rather than as a traditional CV syllable which yields compensation within itself. Unlike Beckman's experiment (1982), their experiments also investigated utterances which contain more than two morae. They concluded that "the concept of the mora as an abstract isochronous unit of timing in Japanese captures many of the most salient features of timing in this language" (Port et al. 1987:1584). If so, acquisition of mora-timing would be indispensable for learners of Japanese. The experiments in this paper try to answer the following questions by examining the reality and importance of British learners' mora-timing in Japanese: (i) "Does the length of time British learners have studied Japanese affect the total duration and mora-timing of Japanese words?" and (ii) "Do the learners count morae or syllables when the number of morae and the number of syllables differ?"
Watashi-wa  ...-to   ii-mashi-ta.
I-TOP       ...-ACC  say-POLITE-PAST
"I said ...."
# of morae  ra-set          ka-set          si-set
1           ra              ka              si
2           raku            kata            sita
3           rakuda          katana          sitaku
4           rakudaga        katanasi        sitakusu
5           rakudagata      katanarasi      sitakusuru
6           rakudagataka    katanarasida    sitakusuruka
7           rakudagatakasi  katanarasidake  sitakusurukana
Table 1 Test words in Experiment 1
The British elementary learners of Japanese are sophomores at the University of Edinburgh. They have studied Japanese for nearly one and a half years, with six lectures and one tutorial per week at the Centre for Japanese Studies. They have no experience of studying abroad.
The British advanced learners are senior students at the same university who lived in Japan for one year as exchange students to study the Japanese language. The curriculum they completed includes Japanese grammar, translation into and from Japanese, conversation and discussion, and Kanji. Their Japanese vocabulary is somewhat limited and some unnatural accentuation remains, but they have little difficulty in making themselves understood in Japanese.
Their utterances were recorded using a sensitive condenser microphone (Sennheiser MKH-815). The signal was sent to a microphone amplifier (Soundcraft 200B) and recorded digitally on a DAT recorder (SONY PCM-2700A) at 16-bit/44.1 kHz sampling. The recordings were analyzed on a UNIX workstation (Sun SPARCstation) with a D/A and A/D conversion board. The duration of the segmental units of the words was measured from wide-band spectrograms and time-domain waveforms on the VDT screen of the X-Waves analyzer, digitized at a 16 kHz sampling rate.
Consistency was the first priority in measuring duration. The criteria for segmentation followed the standards in Ladefoged (1993:199ff.). The beginning of /s/ was taken to be the onset of the noise pattern in the higher frequencies, and, in the case of /k/, the onset of the closure was measured. The end of each word was taken to be the onset of the closure of /t/ in the carrier sentence. Apart from the confusion between psychological and physical scales, segmentation of speech sound is said to be very difficult because speech is a continuum. Beckman and Shoji (1984) also state that "a central problem in the study of speech production and perception is the difficulty of reconciling linguistic representations of an utterance as a series of discrete, static, temporally unspecified phonemic segments with the lack of such units in the acoustic signal". However, Ganong and Zatorre (1980) tested the reliability of four methods of measuring phoneme boundaries, and recent spectrograms and waveforms generated precisely by computer are quite reliable and powerful tools for analyzing speech sound. Blumstein and Stevens's pioneering work (1979:1002) says, 'the observation, based on acoustic theory, that short-time spectra sampled at consonantal release show distinctive gross characteristics for different places of articulation suggests that these properties are utilized by the human speech perception mechanism in order to extract information conveying place of articulation.' Careful investigation of the spectra, where amplitude is plotted against frequency by computer, can therefore be expected to reveal a close correlation between foreign learners' segmentation control and the acoustic signal.
One of the most difficult tasks was pinpointing the onset of /r/. Unlike stop sounds, which have a clear onset and offset of closure, the onset of /r/ is unclear, especially after /a/ in the carrier sentence. Intensity curves and time-domain waveforms were essential aids there.
Where the focus is on the difference between native speakers and learners of Japanese, the si-set of the test words must be examined carefully, because the set is expected to include devoiced high vowels after /s/. If /i/ in the si-set is consistently devoiced, the number of syllables in the si-set words would be different, because the devoiced mora contributes only a consonant, giving CCV ([ʃta]) and CCVCCV ([ʃtakhsu]) structures. English speakers might be distinguished from native Japanese speakers when the number of morae and the number of syllables in a word differ, as in sita (2 morae/1 syllable) and sita.kusu (4 morae/2 syllables) in the si-set, probably because English speakers are used to syllable-based segmentation. However, such an effect is hardly seen in the si-set results of Experiment 1: English speakers can hardly be distinguished from Japanese speakers in terms of word duration. There are two possible reasons. (i) English speakers' segmentation at CVC boundaries in Japanese words is different from that in English words. That is, they can realize the isochrony of morae when they speak Japanese CVC syllables, putting two phonological peaks on the CVC word, as is seen in the moraic consonants of the Osaka dialect. (ii) The high vowels in the test words (/i/ and /u/) are not devoiced. In other words, the test set has not a CVC structure but a CV.CV structure.
If (i) above is right, it would be necessary to examine English speakers' segmentation of Japanese CVC words more closely. Why do their utterances in Japanese sound less natural than those of native speakers despite the linear increase in duration? This question leads to the aim of Experiment 2, in which the difference between mora-timing and syllable-timing is tested directly. On the other hand, (ii) could be a counter-proposition to Port et al. (1987). The spectrograms were therefore double-checked to see whether the vowels in the si-set were devoiced or deleted. Close examination showed that /i/ between /s/ and /t/ (in its carrier sentence) was often devoiced, especially when native speakers spoke at a fast tempo. One speaker, who frequently voiced /i/ between /s/ and /t/, sometimes pronounced it weakly or devoiced it. When the same subject spoke at a fast tempo, /i/ was often absent from the spectrogram. It was observed in his data that the more slowly and carefully the speaker spoke, the less devoicing and deletion of /i/ occurred. However, it was impossible to deduce a general rule or environment for the devoicing.
Japanese is often referred to as a language with voiceless vowels. Ladefoged and Maddieson (1996:49) explain that vibration of the vocal folds is prevented by opening the glottis so widely that the folds are too far apart to vibrate, or by subglottal pressure that is too low or too high, even if the other articulatory organs are set appropriately. Phonologically, the simplest rule would be that "high vowels (/i/ and /u/) are devoiced when preceded and followed by voiceless obstruents," as seen in Fromkin and Rodman (1983:37). The devoicing and deletion of Japanese vowels appear complicated because most preceding studies include subjective conditions such as "in slow or careful speech." Sakuma (1963:232) and Vance (1987:48), for example, claim that devoicing is applied "in careless pronunciation". Observation of the data in Experiment 1 suggested that devoicing in Japanese behaves like allophonic free variation in both native and non-native speakers of Japanese. A pedagogically more important fact is that devoicing or deleting vowels has little effect on the total word duration. In other words, word duration increases almost linearly as the number of morae increases, even if a vowel in the test word is deleted or devoiced.
In order to test whether English speakers' mora-timing is working properly, the duration of test words which have two syllables and three morae (e.g. baa.ku and bak.ku) is compared with that of test words which have three syllables and three morae (e.g. ba.ku.do). If elementary learners of Japanese cannot use a mora-counting tactic when they speak, the duration of the two-syllable/three-mora words (baa.ku and bak.ku) would be shorter than that of the three-syllable/three-mora words (ba.ku.do). Therefore, the null hypothesis in this experiment is H0: the elementary learners' duration of three-mora/three-syllable words (ba.ku.do and bi.ku.do) is not longer than that of three-mora/two-syllable words (baa.ku, bak.ku, bii.ku, and bik.ku).
baku (2/2)  baaku (3/2)  bakku (3/2)  bakudo (3/3)
biku (2/2)  biiku (3/2)  bikku (3/2)  bikudo (3/3)
(number of morae/syllables)
Table 2 Test words in Experiment 2
Two of them (bikku and bakku) are loan words from English (devoiced 'big' and 'bag'), and another two, which are less common words in Japanese, have meanings of their own (biku 'creel' and baku 'tapir'). All the rest are nonsense words. The sentence list is written in hiragana orthography.
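The mora/syllable counts in Table 2 can be reproduced with a short sketch. It assumes (these conventions are not stated in the paper) that the romanized words contain only plain CV morae, that long vowels are written as a doubled vowel letter, and that geminates are written as a doubled consonant. The function name is hypothetical, and the syllable count ignores any vowel devoicing.

```python
VOWELS = set("aiueo")

def count_morae_and_syllables(word):
    """Return (morae, syllables) for a simple romanized Japanese word.

    Assumptions (illustrative, not from the paper): only plain CV morae,
    long vowels spelled as a doubled vowel ("aa"), geminates spelled as a
    doubled consonant ("kk"). Devoicing is ignored.
    """
    morae = 0
    syllables = 0
    for i, ch in enumerate(word):
        prev = word[i - 1] if i > 0 else ""
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch in VOWELS:
            morae += 1              # every vowel letter carries one mora
            if ch != prev:
                syllables += 1      # second half of a long vowel adds no syllable
        elif ch == nxt:
            morae += 1              # first half of a geminate is a moraic consonant
    return morae, syllables

for w in ["baku", "baaku", "bakku", "bakudo"]:
    print(w, count_morae_and_syllables(w))
```

Under these assumptions the same function also returns one to seven morae for the test words of Experiment 1, since none of them contains a long vowel or a geminate.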
Measurement of intervals on a visual display followed the method and criteria of Port et al. (1987) and Port and Rotunno (1979). The amplitude and waveform windows of X-Waves were essential aids because the relative degree of darkness in a wide-band spectrogram sometimes showed only a rough cut of the segment. In such cases the magnified waveform needed to be examined on the screen. Close observation of the amplitude of the waveform and repeated audio playback between the two cursors enabled fairly precise segmentation. When more than two burst spikes were observed before the offset of the consonantal closure, the first one was used to measure the voice onset time. In order to minimize sampling and measurement error, a reliability check with a newly sampled spectrogram was conducted, and the cursors on the VDT window were reset at every measurement (Ohala and Lyberg 1976). All data were analyzed with the SPSS statistical package (4.0 for UNIX System V/386). The MANOVA command was used because the same experimental unit was measured repeatedly. The repeated data were set out horizontally so that all of a subject's scores across occasions resided in one case. This type of multivariate data setup prevents subjects from being involved in a random effect nested under between-subject factors (SPSS 1988:33, Davidson 1996:158-160). As in Port et al. (1987), Voice Onset Time was calculated as a part of the following vowel (/u/) in the statistical tests.
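The "horizontal" data layout described above, with one case per subject holding all repeated measurements, can be sketched as follows. This is only an illustration in Python rather than SPSS syntax; the function name, subject labels, and duration values are hypothetical placeholders, not data from the experiment.

```python
from collections import defaultdict

def to_wide(long_rows, words):
    """Turn (subject, word, duration_ms) records into one row per subject.

    Each output row is [subject, duration for words[0], duration for words[1], ...];
    a missing measurement is represented by None.
    """
    by_subject = defaultdict(dict)
    for subject, word, duration_ms in long_rows:
        by_subject[subject][word] = duration_ms
    return [
        [subject] + [by_subject[subject].get(w) for w in words]
        for subject in sorted(by_subject)
    ]

# Hypothetical placeholder values, for illustration only.
rows = [
    ("s1", "baku", 300), ("s1", "baaku", 400),
    ("s2", "baku", 310), ("s2", "baaku", 420),
]
for case in to_wide(rows, ["baku", "baaku"]):
    print(case)
```

With this layout each subject appears exactly once, so the repeated measurements become columns of a single case rather than separate cases.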
The duration of two-mora words (baku and biku) should be two-thirds that of three-mora words if each mora duration is constant. Native speakers' mean duration of two-mora words (281ms) is 73.2% of that of three-mora words (388ms). This is close to 75%, and two-mora words are shorter than three-mora words (baku: F(1,10)=15.33, p<0.01; biku: F(1,10)=13.35, p<0.01). Most two-mora words produced by non-native speakers are also shorter than three-mora words (elementary learners: baku F(1,10)=5.16, p<0.05; biku F(1,10)=4.46, p=0.06; advanced learners: baku F(1,10)=6.06, p<0.05; biku F(1,10)=9.86, p<0.01), but elementary learners' two-mora words (86.5% of three-mora words) are relatively longer than those of advanced learners (81.4% of three-mora words).
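The arithmetic behind this comparison can be sketched as below. The two means are the native speakers' values quoted above; the function names are hypothetical. Note that the ratio of the two pooled means works out to about 72.4%, slightly below the 73.2% given in the text, which may have been averaged in another way (for example per word or per speaker); that explanation is an assumption, not something the source states.

```python
def duration_ratio(short_ms, long_ms):
    """Ratio of the shorter word's duration to the longer word's duration."""
    return short_ms / long_ms

def ms_per_mora(word_ms, n_morae):
    """Mean duration per mora, assuming the word is divided evenly."""
    return word_ms / n_morae

expected = 2 / 3                        # strict isochrony: 2 morae vs 3 morae
native = duration_ratio(281, 388)       # native speakers' pooled means (ms)

print(f"expected under strict isochrony: {expected:.1%}")   # 66.7%
print(f"ratio of native pooled means:    {native:.1%}")     # 72.4%
print(f"ms per mora, two-mora words:   {ms_per_mora(281, 2):.1f}")
print(f"ms per mora, three-mora words: {ms_per_mora(388, 3):.1f}")
```

A ratio above two-thirds means that the morae of the two-mora words are on average somewhat longer than those of the three-mora words.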
The native speakers' duration of initial /b/ is longer in the two-syllable/three-mora words. For example, /b/ in baaku and bakku is longer than in baku and bakudo (F(1,14)=4.75, p<0.05), and the same is true of biiku and bikku (F(1,14)=5.42, p<0.05). This result is consistent with Port et al. (1987). However, elementary learners' duration of /b/ in baaku and bakku is not longer than in baku and bakudo (F(1,14)=0.06, n.s.), and /b/ in biiku/bikku is not longer either (F(1,14)=0.01, n.s.). Advanced learners are indistinguishable from elementary learners with respect to /b/ duration (/b/ in baaku/bakku: F(1,14)=0.21, n.s.; /b/ in biiku/bikku: F(1,14)=0.21, n.s.).
It is natural that /k/ in bakku or bikku is longer than in the other words, because the control of closure duration plays an important role in the naturalness of Japanese stop consonants (Han 1992). Native speakers (2.38 times) and advanced learners (2.36 times) produce a closure more than twice as long. However, the ratio is smaller for elementary learners (1.72 times). This contributes to the elementary learners' foreign accent, because insufficient closure of Japanese geminate consonants yields unnaturalness and the perception of a single consonant (Sugito 1989:169, Nagai 1994). It is also true that opinions are still divided on the relation between the closure duration of stop consonants and their naturalness, as seen in Beckman (1982) and Han (1992). Campbell's statistical survey (1992) reports that strong correlations are observed between the duration of consonants and their type, gemination, and position in the breath group. He shows that there is another strong correlation between vowel duration and the neighboring phonemic environment or the position in the sentence. However, it is also important that extra duration is said to go into pauses in Japanese (Sugito 1982:343ff.). The same effect is reported for English (Goldman-Eisler 1968).
Port et al. (1987) showed that native speakers' /k/ in baaku and biiku is longer than in baku, bakudo, biku, and bikudo. However, such effects are rarely seen in the results of this experiment. Native speakers' /k/ in baaku/biiku is not longer than /k/ in baku/biku and bakudo/bikudo (baaku: F(1,10)=3.23, n.s.; biiku: F(1,10)=1.36, n.s.). Port et al. (1987) claimed that their finding of longer /k/ after longer vowels was evidence against positing isochronous syllable-timed compensation. This discrepancy might imply that the voicing of the second consonant is more influential than the duration of its preceding vowel. Lastly, the diversity of speaking rates between groups must be mentioned. The mean durations of three-mora words (baaku, bakku, bakudo, biiku, bikku, and bikudo) pooled within groups are 388ms for native speakers, 713ms for elementary learners, and 516ms for advanced learners. This may indicate that the higher the level learners achieve, the faster they can speak. According to Haggins (1964), the normal range of conversational speaking rate in English is 4 to 7 syllables per second (250ms to 143ms per syllable). A large number of the values for elementary learners are outside this range.
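The rate conversions used here can be sketched as follows. The three pooled means are the values quoted above; the function names are hypothetical. The sketch expresses the Japanese figures per mora, which is an assumption made for illustration: the quoted range refers to syllables in English, and the test words contain two or three syllables, so the two scales are not directly comparable.

```python
def ms_per_unit(rate_per_second):
    """Duration of one unit (ms) at a given rate in units per second."""
    return 1000.0 / rate_per_second

def units_per_second(total_ms, n_units):
    """Rate in units per second for n_units produced in total_ms."""
    return n_units / (total_ms / 1000.0)

# The quoted conversational range: 4 to 7 syllables per second.
print(f"{ms_per_unit(4):.0f} ms to {ms_per_unit(7):.0f} ms per syllable")

# Pooled mean durations of three-mora words (ms), as reported above.
MEAN_THREE_MORA_MS = {"native": 388, "elementary": 713, "advanced": 516}

for group, ms in MEAN_THREE_MORA_MS.items():
    rate = units_per_second(ms, 3)
    print(f"{group}: {ms / 3:.0f} ms per mora, {rate:.1f} morae per second")
```

Counting the same words by syllables instead would only require passing two or three as `n_units`, depending on the word.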
Needless to say, the physiological mechanisms of phonation do not differ among speakers of different languages. This means that English speakers and Japanese speakers share the same articulatory system. This may be taken for granted; however, some language teachers are still not free from superstitions such as "Japanese people cannot speak as rhythmically as English-speaking people because we lack the innately good rhythmic sense they have." Naturally, it takes more effort to open the jaw over a long distance. The test words baaku and biiku include long vowels, which should be longer than the vowels in baku/bakudo and biku/bikudo, because shorter vowels tend to involve less jaw opening.
Mora-timing in Japanese might not be compatible with the models constructed for syllable-timed and stress-timed languages (Port, Al-Ani, and Maeda 1980). Even though segmental durations show linguistic 'free variation,' language teachers need to make them 'learnable' for English speakers as language-specific characteristics. If speaking a second language is a mapping from the learners' stress-timed segmentation onto the mora-timed one, it would be interesting to posit a function between, for example, the duration of /k/ by native speakers and /k/ by language learners. If the mapping onto mora-timing is regarded as a filter on English phonology, it might be described as a bundle of rules extracted from the differences between the two languages. For example, English speakers' lip rounding for the [u] sound, which begins during the articulation of a preceding sound such as [s] in the word strew (Borden, Harris, and Raphael 1994:156), must be filtered out there. The rules need to include a "converter" from syllable-counting to mora-counting, such as a "syllable-opener" that makes basic Japanese CV syllables out of CVC syllables. Because typical problems of assigning ill-formed timing seem to occur when a CVC syllable is reorganized into a moraic CV structure, studying the segmentation problem of closed syllables might be a shortcut for research on second language learning. At the phonological level, Kubozono (1995) succeeds in explaining a relation between Japanese loan words and their English source words. Takagi and Mann (1994) advocate the hypothesis that the length of vowels and consonants in Japanese loan words can be predicted from their source words. Their perceptual experiment revealed that lax vs. tense vowels in English words systematically correspond to short vs. long vowels respectively, and that the stops after lax vs. tense vowels correspond to geminate vs. single consonants in the Japanese loan words.
The two experiments in this paper are attempts to bridge the gap between the preceding studies and second language learning. Further experimental studies with a variety of words and pitch patterns would be essential in order to apply the findings to teaching the pronunciation of Japanese.
Figure 2 Mean word duration by elementary learners (at fast tempo)
Figure 3 Mean word duration by advanced learners (at fast tempo)
Figure 4 Mean mora duration by native speakers (at fast tempo)
Figure 5 Mean mora duration by elementary learners (at fast tempo)
Figure 6 Mean mora duration by advanced learners (at fast tempo)
Figure 7 Segmental and word duration by native speakers
Figure 8 Segmental and word duration by British elementary learners of Japanese
Figure 9 Segmental and word duration by British advanced learners of Japanese