THE MEANS AND METHODS OF SPEECH EXAMINATION

X-RAY PHOTOGRAPHY

Those processes of articulation which are hidden from direct visual observation can be made visible and readily analysable by means of X-ray photography. In this way the larynx, the articulatory movements taking place in the supraglottal cavities and the configuration of the articulatory organs producing a speech sound at a given moment all become accessible to phonetic investigation.

Radiographic speech investigation is almost as old as the discovery of X-radiation itself: it was used for linguistic purposes as early as 1897, two years after X-rays were discovered.[3]

Although the method is old, it is not out of date, since no other instrument can replace it (not even fibre-optic cinematography); moreover, radiological techniques have developed and improved considerably in recent decades.

For phonetic purposes the picture appearing on the screen of the X-ray apparatus must be fixed and stored so that it can be examined thoroughly, because in real-time (synchronic) observation we can perceive and consciously register only a small fragment of the complex series of articulatory movements that produce a speech sound. It therefore became necessary to photograph the X-ray image on celluloid. Photoradiography produces a static picture of a single phase of articulation, while cineradiography captures the articulatory organs in their dynamic process. The invention of the videotape recorder has opened new opportunities for the radiological investigation of speech mechanisms: the time- and material-consuming processing of film is no longer needed, the recordings are immediately ready for use, and synchronising the sound and picture sequences poses no problem.

I took my photoradiograms at the Radiological Clinic of the Semmelweis University of Medical Sciences with the kind help of Professor Dr. Istvan Torok. The up-to-date Siemens Sirescop 2 apparatus (see Figure 2), with its image intensifier, enabled us to take a relatively large number of photographs of one informant. The photographs were taken at a distance of 60 cm from the X-ray tube, with the informant standing, with an exposure time of 0.04 sec, at 64 kV and 16 mAs, on sheet film of 13 by 18 cm (Forte Medifort R and ORWO HS 11). No contrast medium was used.
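The exposure data above are tied together by the usual radiographic relation: current-time product (mAs) = tube current (mA) × exposure time (s). The following minimal Python sketch applies that relation to the quoted 16 mAs and 0.04 sec. It assumes both figures refer to a single exposure, and the function name is illustrative only.

```python
def tube_current_ma(mas: float, exposure_s: float) -> float:
    """Tube current (mA) implied by a current-time product (mAs) and an exposure time (s)."""
    if exposure_s <= 0:
        raise ValueError("exposure time must be positive")
    return mas / exposure_s


# Figures quoted in the text: 16 mAs with an exposure time of 0.04 sec.
current = tube_current_ma(16, 0.04)
print(f"implied tube current: {current:.0f} mA")  # 400 mA
```

The same relation read the other way shows why the short 40 msec exposure was workable only with a comparatively high tube current.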

Because of the short exposure time I had the sounds produced in their natural environment, i.e. in words. By listening attentively and observing the articulation on a monitor we managed to time the exposures to the sounds in question. Even so, reduced vowels and plosives demanded great practice and good reflexes. I had previously acquainted the informants with the purpose and procedure of the experiment and familiarised them with the working of the articulatory organs by showing them an X-ray film on Hungarian sound production. Before each exposure I announced the word to be pronounced and told the radiologist on which sound to make the exposure. The informant repeated the word once and then, at a signal from the radiologist, said it again while the photoradiogram was taken. The entire experiment was recorded on a UHER tape recorder, and in spite of the noise of the apparatus the recording clearly shows on which element of the uttered sound string each exposure was made.

In the conspectus I publish the contact prints of the X-ray films. Since fine differences in shade are largely lost in copying, and the information contained in a radiogram is hard to convey in print, I have made schematic drawings of the X-ray films for greater clarity and easier orientation; these bring out even the contours that are not visible in the prints. With one or two exceptions, all radiograms were taken of sounds as pronounced by Anatoly Gusev.

ARTICULATORY ORGANS IN THE RADIOGRAM

(Numbers indicate the articulatory organs in the schematic radiogram of Plate 1, while capital letters indicate the resonating chambers.)

A – mouth cavity
B – pharyngeal cavity
C – atrium
D – nasal cavity [4]

LABIOGRAPHY

The role of the lips in articulation was recognized from the beginning of the development of phonetics.

Lip articulation is the type of articulatory activity most easily observed without any special appliance, by mere visual perception. However, the articulatory functioning of the lips consists of a series of complex movements that are swift and therefore hard to follow, so the thorough observation and description of their details was not easy. Furthermore, in speech the articulation of individual sounds is realized in constant transition, and it is often difficult to isolate one phase of the articulation and describe it as characteristic of the sound in question. Therefore, at an earlier stage of phonetic investigation, isolated sounds were pronounced in prolongation and drawings were made of the lip positions.

On the basis of lip articulation, speech sounds can be divided into three groups. In the first group the lips serve as the place of articulation (bilabial plosives, fricatives and rolled sounds, as well as labiodental fricatives and affricates) and thus have a central role in articulation. In the second, they play an important part in determining sound quality by changing the opening of the resonating chambers (rounded versus spread lips is a principle of vowel classification, distinguishing labial from illabial vowels). Finally, there are sounds in whose articulation the lips play only a minor role: owing to coarticulation these sounds have no independent lip position but take over that of the neighbouring sounds.

Of the various instruments and methods available for investigating lip articulation (such as the branched apparatus for measuring the distance between the lips, the apparatus for measuring lip pressure, electromyographic instruments, etc.), we have chosen the photolabiographic and cinelabiographic procedures. Both have an invaluable advantage over other methods in that they can be employed without compromising the naturalness of speech.

Labiograms not only record lip articulation in the narrow sense of the word but they also yield analysable pictures of the articulatory movements of the face from both a frontal and a side view.

For the sake of exact measurement and description we placed measurement points on the informants' faces. By measuring the distances between these points we obtained numerical data about a) the movement of the upper lip away from the tip of the nose, b) changes in the distance between the upper and lower lips, c) the movements of the lower lip in relation to the genial process, and d) changes in the distance between the tip of the nose and the genial process, which give the degree of opening between the jaws; e) in side-view labiograms we also observed the positional changes of the lips and the genial process in the horizontal plane.

Unwanted movements and positional changes of the head were averted by seating the informant in a chair which had been specifically constructed for the purpose and provided with a headrest. For taking side-view pictures we used a mirror adjusted at the necessary angle and fixed to the chair.

The photolabiograms were taken of the characteristic phases (the so-called "pure phases") of the articulation of sounds. The informant pronounced the sounds not in isolation but in meaningful strings of sounds, i.e. in words. Each exposure had to be timed to the sound under examination, which, after some practice, we could do with fairly good results. To find out whether a photo had been taken at the right moment we tape-recorded the entire experiment. From the utterance and the click of the camera shutter on the recording we were later able to identify or check what the labiogram showed. In more problematic cases we also used slowed-down playback.

The cinelabiograms were taken with a 16 mm Pentaflex camera at a speed of 32 pictures per second. We thus obtained an average of 8-12 pictures of each fully articulated sound, of which 9 are included in the present conspectus.

Nine pictures by themselves provide a sufficient basis for a correct evaluation of the articulatory transitions. For reduced vowels and certain consonants fewer pictures were obtained, because these sounds are shorter. Even these sounds, however, were examined not in isolation but in words. The cuttings were made from the stills or from enlargements.

Later, within the Hungarian-Russian contrastive phonetic research project, I also recorded some material with a Siemens videotape recorder on 1-inch tape at a speed of 50 pictures per second, but the videotape has not yet been copied. Our labiographic experiments are illustrated in Figure 3.
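Frame rate fixes how finely the cinelabiograms and videotape recordings can resolve an articulation in time. The small Python sketch below converts between pictures and milliseconds for the 32 and 50 pictures-per-second rates mentioned above. It assumes that a sound simply spans consecutive, evenly spaced pictures, and the function names are hypothetical. On that assumption, the 8-12 pictures per sound at 32 pictures per second correspond to roughly 250-375 msecs of articulation.

```python
def frame_interval_ms(fps: float) -> float:
    """Time between successive pictures, in milliseconds."""
    return 1000.0 / fps


def duration_for_frames(n_frames: int, fps: float) -> float:
    """Time span (ms) covered by n evenly spaced consecutive pictures."""
    return n_frames * frame_interval_ms(fps)


def frames_covering(duration_ms: float, fps: float) -> int:
    """Approximate number of whole pictures taken during a sound of the given duration."""
    return int(duration_ms * fps / 1000.0)


for fps in (32, 50):  # 16 mm cine camera; videotape recorder
    print(f"{fps} pictures/sec: one picture every {frame_interval_ms(fps):.2f} ms")

# Time spans represented by 8 and 12 pictures at 32 pictures/sec
print(duration_for_frames(8, 32), duration_for_frames(12, 32))  # 250.0 375.0
```

The same functions show that the 50 pictures-per-second videotape would yield about 1.5 times as many pictures of a sound as the cine camera.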

MEASUREMENT POINTS OF THE PHOTOLABIOGRAMS

(See the photolabiogram in Plate 1)

a) The opening between the upper and lower lips. (The first set of statistics in the analysis section gives the distance between the outer measurement points, the second the distance between the inner edges of the lips as a percent of the distance measured at rest position.)

b) The opening between the corners of the lips. (In the analysis section we find two sets of statistics again. The first gives the distance between the outer measurement points, the second the distance between the inner corners of the lips.)

c) The opening of the lower jaw. (This set of statistics represents the changes of distance between the measurement points on the tip of the nose and the genial process.)

d) The horizontal movement of the lower jaw. (This is measured in the side-view picture.)
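The measurements a) to c) reduce to distances between pairs of marked points, and some of them are reported relative to the rest position. A minimal Python sketch of that computation follows. The point names and all coordinates (in mm) are invented for illustration and do not come from the conspectus, and the side-view measurement d) is left out.

```python
from math import dist

# Hypothetical 2-D coordinates (mm) of measurement points read off a frontal labiogram.
Point = tuple[float, float]


def opening(points: dict[str, Point], a: str, b: str) -> float:
    """Euclidean distance between two named measurement points."""
    return dist(points[a], points[b])


def percent_of_rest(current: float, rest: float) -> float:
    """Express a distance as a percentage of the same distance at rest position."""
    return 100.0 * current / rest


rest = {"upper_lip": (0.0, 10.0), "lower_lip": (0.0, 0.0),
        "left_corner": (-25.0, 5.0), "right_corner": (25.0, 5.0),
        "nose_tip": (0.0, 40.0), "chin": (0.0, -30.0)}   # "chin" = genial process
vowel = {"upper_lip": (0.0, 12.0), "lower_lip": (0.0, -8.0),
         "left_corner": (-20.0, 4.0), "right_corner": (20.0, 4.0),
         "nose_tip": (0.0, 40.0), "chin": (0.0, -42.0)}

measures = {
    "a) lip opening": ("upper_lip", "lower_lip"),
    "b) corner opening": ("left_corner", "right_corner"),
    "c) jaw opening": ("nose_tip", "chin"),
}
for label, (p, q) in measures.items():
    pct = percent_of_rest(opening(vowel, p, q), opening(rest, p, q))
    print(f"{label}: {pct:.0f}% of rest")  # 200%, 80%, 117% for these invented points
```

Expressing each distance relative to rest makes recordings of different informants comparable despite differences in face size.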

PALATOGRAPHY

The palatograms were taken by so-called mirrored palatography. In this method the informant's tongue is covered with a greasy and innocuous paint made from medical carbon (carbo medicinalis) and cocoa powder. Then the informant utters the sound to be examined while the mirror image of his palate is photographed onto 35 mm film by means of the palatograph, which is shown in Figure 4.

Enlargements may subsequently be made from the film on photographic paper at will. After taking the photograph traces of the paint can easily be removed from the palate and the tongue by rinsing.

In the palatograms obtained in this way the articulations of the palate and the tongue can be well observed and studied. Mirrored photopalatography has numerous advantages over the artificial palate used earlier: a) it affects the naturalness of articulation to a smaller degree; b) it shows the articulation not only of the palate but also of the teeth and the tongue; c) no foreign body has to be inserted between the tongue and the palate; d) a photograph can contain more information than a drawing made of the artificial palate.

The authenticity of the rendering was ensured by prior practice and by sound recordings of the experiment. Since this method allowed only individual sounds to be pronounced and photographed, I tried to obtain linguistically relevant sound production by having the informants practise these sounds in words and by making them conscious of their articulation before photographing. For vowels, I gave the informants words to repeat a couple of times before their tongues were covered with paint; after the paint had been applied, I told them to produce only the wanted sound while recalling the words. With consonants the problem was more easily solved, because they could be uttered in words and in syllables formed with the addition of [a] (da, ta, za, na, as...). The greatest difficulty occurred with unstressed and especially with reduced vowels. By replaying the recording I was able to check the correctness of the pronunciation.

I took several trial photographs with each informant to accustom them to the experiment before the final photographing. On the last occasion I took the palatograms of the full stock of speech sounds, with some short breaks.

In the conspectus I publish the palatograms of two persons (mostly of T. Krylova and, to a lesser extent, of R. Frolova) because I found them the most suitable.

THE DIVISION OF THE PALATOGRAM (See Plate 1)

a – upper front teeth (dentes superior)
b – the alveolar ridge (alveolum)
c – the rear part of the alveolar ridge (postalveolum)
d – the hard palate (palatum durum)
e – the soft palate (velum molle)

C – the central zone
L – the lateral zone
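Once a palatogram has been divided into zones in this way, the contact area can be summarised zone by zone. The sketch below assumes a hypothetical digitisation in which the blackened area is traced onto a coarse grid, with rows following zones a-e from front to back and columns grouped into lateral and central bands. The grid size and the example pattern are invented, and this is not the procedure used for the conspectus.

```python
# Hypothetical digitisation: rows follow the sagittal zones a-e (front to back);
# the 5 columns are grouped into a left lateral (L), central (C) and right lateral (L) band.
ZONES = ["a", "b", "c", "d", "e"]
BANDS = ["L", "C", "C", "C", "L"]  # band label for each grid column


def contact_by_zone(grid: list[list[int]]) -> dict[tuple[str, str], int]:
    """Count blackened (contact) cells for each (zone, band) pair."""
    counts: dict[tuple[str, str], int] = {}
    for zone, row in zip(ZONES, grid):
        for band, cell in zip(BANDS, row):
            key = (zone, band)
            counts[key] = counts.get(key, 0) + cell
    return counts


# Invented pattern: full contact across the alveolar ridge (zone b),
# lateral contact only in zones c and d.
example = [
    [0, 0, 0, 0, 0],  # a: upper front teeth
    [1, 1, 1, 1, 1],  # b: alveolar ridge
    [1, 0, 0, 0, 1],  # c: rear part of the alveolar ridge
    [1, 0, 0, 0, 1],  # d: hard palate
    [0, 0, 0, 0, 0],  # e: soft palate
]
counts = contact_by_zone(example)
print({k: v for k, v in counts.items() if v})
```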

LINGUOGRAPHY

To gain knowledge of the articulatory processes within the oral cavity we can use linguography. By analysing palatograms we can establish the place of articulation with sufficient precision, but we cannot tell which part of the tongue actually touches the palate in the course of articulation. To describe the articulatory features of the various speech sounds we also had to specify whether their articulation was, e.g., apical, coronal, dorsal or palatalized. The joint analysis of palatograms and linguograms allows a more comprehensive and authentic description of articulation.

I prepared the linguograms with a procedure similar to that employed for palatograms. Instead of the tongue, however, I now covered the palate with the substance described above. After the articulation we took a photograph of the tongue. Traces of paint unquestionably show the contact area of the tongue.

The preparation of the informants, the reproduction of the speech sounds and the checking of the correctness of articulation by means of tape-recording were done as they were for the palatographic experiment. The device which we call the linguograph is shown in Figure 5.

The linguograms in this conspectus have been taken of T. Krylova's articulation.

THE DIVISION OF THE LINGUOGRAM (See the linguogram in Plate 1)

a – the tip of the tongue (apex)
b – the front of the upper part of the tongue (praedorsum)
c – the centre of the upper part of the tongue (mediodorsum)
d – the back of the upper part of the tongue (postdorsum)
e – the right and left rims of the tongue (corona)

OSCILLOGRAPHY

Outside the somatic channels of man, the speech chain can be studied as an acoustic phenomenon. The speech sound (as we have pointed out earlier) is a quasi-identical complex of [...] vibrations, a vibration stereotype that can be defined by physical parameters. Its linguistic nature consists in its being an element of speech, which in turn is the realization of language. Thus the extent and identity of a speech sound are determined not only by the identity, similarity or proximity of its acoustic features but primarily by its functional identity in speech, i.e. by its role within the sound string. It follows that, e.g., [a] and [-a-] differ not only acoustically but also in their speech function, because they occur at different points of the sound string, in different environments and phonetic positions (cp.: [...], [...]). From the point of view of correct pronunciation and speech recognition, such differences in sound quality are important.

At the same time, the difference between [a] as pronounced by speaker X and [a] as pronounced by speaker Y is not significant in speech; it reflects an extralinguistic, individual characteristic (= the speaker's voice quality). This is why it is right to distinguish between speech sounds and their realizations by individual speakers.

If we disregarded the primacy of function in speech and tried to delimit a speech sound solely by its acoustic features, we would have to divide, e.g., [...a] into at least three acoustic units or three different qualities. The differences in the physical reality of the silent phase and the plosion phase in the articulation of plosives could also be mentioned here.

Other types of differences in sound quality play a role in the semiotic functioning of language (cp.: [...], [...], [...], [...]). This is usually termed the phonological function. Examining this function of language leads us to study the mutual relationships between elements of speech and elements of language, i.e. between the material, empirical units of speech and the elements of the linguistic system; in other words, to phonology and morphophonology.

The investigation of the acoustic properties of speech sounds covers the following features: a) length, b) pitch, c) loudness (intensity) and d) quality. While the articulatory records, with the exception of cinelabiograms, either display only a short phase of the articulation (40 msecs in the case of radiograms, 33 msecs in the case of photolabiograms) or give a combined picture of the articulation of the tongue and the palate (palatograms and linguograms), oscillograms can record the full process of phonation.

The oscillographic investigations were carried out from magnetic tape with a 12-channel loop oscillograph of type K-115 made in the USSR (see Figure 6). For registration I used Forte photographic paper of type Kardofort Ultra, 120 mm wide. The registration speed was 500 mm/sec, so 1 mm represents 2 msecs. For the records we used 4 channels with galvanometers working in different frequency transmission bands, so as to isolate the frequency bands that play a role in differentiating the various types and characteristics of speech sounds.

THE TECHNICAL DATA OF THE OSCILLOGRAM

I prepared the oscillograms of the various speech sounds in words representing the specific sound environments. It is from these that I selected the conspectus sections, which are 65 mm long and show the sounds in a time frame of 130 msecs. The oscillograms were made of A. Gusev's voice.
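Because the paper moved at a constant speed, lengths on the oscillogram convert directly into time, and a period measured on the fundamental-tone curve converts into a frequency. A minimal Python sketch of this arithmetic follows, using the 500 mm/sec registration speed quoted above. The function names are illustrative and the 4 mm period in the example is invented.

```python
PAPER_SPEED_MM_PER_S = 500.0  # registration speed quoted in the text


def mm_to_ms(length_mm: float, speed_mm_per_s: float = PAPER_SPEED_MM_PER_S) -> float:
    """Convert a length measured on the oscillogram paper into milliseconds."""
    return 1000.0 * length_mm / speed_mm_per_s


def f0_from_period_mm(period_mm: float, speed_mm_per_s: float = PAPER_SPEED_MM_PER_S) -> float:
    """Fundamental frequency (cps) from one period measured on the fundamental-tone curve."""
    return 1000.0 / mm_to_ms(period_mm, speed_mm_per_s)


print(mm_to_ms(1))            # 2.0 ms per millimetre
print(mm_to_ms(65))           # 130.0 ms: length of one conspectus section
print(f0_from_period_mm(4))   # a 4 mm (8 ms) period -> 125.0 cps
```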

The lowest curve of our oscillograms serves for the analysis of the fundamental tone. To "filter" this out we used a Danish Frokjaer-Jensen audio frequency filter of type 440, which passed the component of the speech signal between 50 and 220 cps to the 2.5 kcps galvanometer of the oscillograph.
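The same band-limiting idea can be illustrated digitally. The sketch below is not the analogue Frokjaer-Jensen filter: it swaps in an idealised FFT mask, written in Python with NumPy, that keeps only components between 50 and 220 cps. The sampling rate and the frequencies of the synthetic test signal are assumptions.

```python
import numpy as np


def band_pass(signal: np.ndarray, rate: float, low: float = 50.0, high: float = 220.0) -> np.ndarray:
    """Keep only spectral components between `low` and `high` cps (ideal FFT mask)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / rate)
    spectrum[(freqs < low) | (freqs > high)] = 0.0   # zero everything outside the band
    return np.fft.irfft(spectrum, n=len(signal))


rate = 8000.0                                    # assumed sampling rate (samples/sec)
t = np.arange(int(rate)) / rate                  # one second of signal
fundamental = np.sin(2 * np.pi * 120 * t)        # invented 120 cps fundamental
overtone = 0.5 * np.sin(2 * np.pi * 1500 * t)    # invented higher component
filtered = band_pass(fundamental + overtone, rate)
print(np.max(np.abs(filtered - fundamental)))    # residual error, close to zero
```

A brick-wall mask like this is convenient for illustration; a real-time analogue filter necessarily has gradual slopes and phase shifts instead.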

The curve of the second channel was produced by a galvanometer with a frequency of 600 cps, which ensured linear transmission between 0 and 300 cps, and decreasing transmission above 300 cps with a decrease of 5 % per 100 cps. At 1,200 cps this already meant a decrease of 75 %.

For the third channel we used a galvanometer with a frequency of 1,200 cps. This provided linear transmission between 0 and 600 cps, while above 600 cps it showed a decrease of 10% per 200 cps.

It is the curve of the fourth channel that yields the fullest and most complex visual representation of the speech sound. Its galvanometer, with a frequency of 15,000 cps, was used for linear transmission in the region between 0 and 9,000 cps; between 10,000 and 15,000 cps it worked with an attenuation of 25%, and above 15 kcps with decreasing transmission.

SPECTROGRAPHY

With regard to their physical properties the elements of speech can be divided into three groups: a) vocal sounds consisting of periodic vibrations ([a], [o], [u], [e], [i], etc.), b) noises consisting of aperiodic vibrations ([p], [t], [k], [ts], [tʃ], [f], [s], [ʃ], [x], etc.) and c) sounds containing both vocalic and noise elements ([j], [n], [m], [r], [l] and [b], [d], [g], [dz], [dʒ], [v], [z], [ʒ], etc.).
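The distinction between periodic and aperiodic vibrations can be made concrete with a normalised autocorrelation measure, which approaches 1 for a periodic signal and stays near 0 for noise. This is a swapped-in illustrative technique, not the method used in this study. The Python sketch below assumes an 8,000 samples-per-second sampling rate and uses synthetic signals only.

```python
import math
import random


def max_autocorrelation(x: list[float], min_lag: int, max_lag: int) -> float:
    """Largest normalised autocorrelation (clipped at 0) over a range of lags; near 1 = periodic."""
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        a, b = x[:-lag], x[lag:]
        num = sum(p * q for p, q in zip(a, b))
        den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
        if den > 0:
            best = max(best, num / den)
    return best


rate = 8000                        # assumed sampling rate (samples/sec)
n = 800                            # 100 ms of signal
lags = (rate // 400, rate // 60)   # lags corresponding to periods of 60-400 cps
periodic = [math.sin(2 * math.pi * 120 * i / rate) for i in range(n)]  # vowel-like tone
rng = random.Random(0)
aperiodic = [rng.uniform(-1.0, 1.0) for _ in range(n)]                 # noise-like signal
print(round(max_autocorrelation(periodic, *lags), 3))   # close to 1
print(round(max_autocorrelation(aperiodic, *lags), 3))  # close to 0
```

Sounds of the third group would fall in between, since a periodic voice component is mixed with noise.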

Spectrographic analysis allows us to study the constituent elements of sounds in detail and to examine the internal structure of the various components and their relationships to one another. Vocal sounds are complex sounds whose spectra contain a fundamental tone of high intensity and low frequency and several (usually 3-5) bundles of reinforced harmonics of higher frequency and lower intensity, known as formants. That is why their spectra have distinct formant patterns and characteristic formant structures. In the spectra of noises we find internally unstructured components of varying duration in the frequency band corresponding to the manner of articulation. In the spectra of voiced consonants the fundamental tone represents the vocalic element, while the noise element is identical with that of the voiceless consonants having the same place and manner of articulation.

For the investigation of the acoustic structure of Russian speech sounds we used two kinds of apparatus. I started work with a Kay 6061-A Sona Graph, or dynamic spectrograph (see Figure 7), which records the acoustic components of sounds on a special thermosensitive paper in sequences of 2.5 secs in the region between 85 and 8,000 cps. The record itself is called a sonagram after the English name of the apparatus. The Sona Graph can perform wide-band and narrow-band analysis, with or without recording of the intensity curve, in linear or logarithmic representation. In addition, it can make a so-called amplitude segment of any 8 msec long section of the sound. The acoustic parameters appear in the sonagram in three dimensions: along the vertical axis we can read the frequency (= pitch) of the components, along the horizontal axis the time, and from the degree of blackening of the component traces the intensity (= strength) of the components. For the precise determination of intensity relations, amplitude segments can be made in which frequency is shown along the horizontal axis and intensity in dB along the vertical. The amplitude segments of the conspectus were made of A. Gusev's voice in the region between 85 and 8,000 cps, with a narrow-band analysing filter, in linear representation.
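A digital counterpart of the amplitude segment (frequency on the horizontal axis, intensity in dB on the vertical) can be sketched with NumPy's FFT. This is only an approximation of what the Sona Graph produced: it assumes a Hann window, a 16,000 samples-per-second sampling rate and a synthetic two-component signal, and takes only the 8 msec section length from the text. The function name is hypothetical.

```python
import numpy as np


def amplitude_section(signal: np.ndarray, rate: float, start_s: float, length_s: float = 0.008):
    """Return (frequencies in cps, levels in dB relative to the strongest component) for a short stretch."""
    i0 = int(round(start_s * rate))
    n = int(round(length_s * rate))
    frame = signal[i0:i0 + n] * np.hanning(n)          # taper the 8 msec stretch
    magnitude = np.abs(np.fft.rfft(frame))
    levels_db = 20 * np.log10(magnitude / magnitude.max() + 1e-12)  # 0 dB = strongest
    return np.fft.rfftfreq(n, d=1.0 / rate), levels_db


rate = 16000.0                                         # assumed sampling rate
t = np.arange(int(0.1 * rate)) / rate                  # 100 ms of synthetic signal
tone = np.sin(2 * np.pi * 1000 * t) + 0.1 * np.sin(2 * np.pi * 3000 * t)
freqs, levels = amplitude_section(tone, rate, start_s=0.05)
print(freqs[np.argmax(levels)])   # frequency of the strongest component
print(round(levels[24], 1))       # level of the 3,000 cps component, about -20 dB
```

With only 8 msecs of signal the frequency resolution is coarse (125 cps per bin here), which is the usual trade-off between time and frequency resolution in such sections.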

Only after completing the conspectus was it possible to use a new kind of spectrographic apparatus, an improved Sona Graph developed by the American Voiceprint Laboratories, referred to as Type 700 (see Figure 8). For better quality and easier interpretation of the records I made the spectrograms with this apparatus too, and it is these that I am publishing here. This apparatus also analyses 2.5 sec long speech sequences, in the region between 25 and 8,000 cps, in linear or logarithmic representation. Additionally, at an analysing bandwidth of 300 cps it is suitable for making what are known as contour-lined spectrograms. The spectrograms in the conspectus were made with this kind of analysis, because in this way the intensity of the components is not merely estimated from the degree of blackening but can be determined with numerical precision from the number of contours; adjacent contours indicate an intensity difference of 5 dB. The apparatus is provided with a built-in segmentator allowing the 2.5 sec segment to be narrowed at will, so that a selected segment (a single sound or part of a sound) can be analysed. Scale magnification is also possible: any frequency band below 8,000 cps can be magnified for a more thorough examination of the details. Frequency markers at every 1,000 cps and time markers at every 100 msecs can be added to the spectrograms simultaneously with the analysis. The apparatus is also suitable for making amplitude segments.
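With contours spaced at 5 dB, an intensity difference can be read off by counting contours and then, if required, converted into an amplitude or power ratio using the standard decibel definitions. A minimal Python sketch of this arithmetic follows; the function names are illustrative and the contour counts in the example are invented.

```python
CONTOUR_STEP_DB = 5.0  # intensity difference between adjacent contours (from the text)


def level_difference_db(contours_a: int, contours_b: int, step_db: float = CONTOUR_STEP_DB) -> float:
    """Intensity difference in dB between two components, from their contour counts."""
    return (contours_a - contours_b) * step_db


def amplitude_ratio(db: float) -> float:
    """Amplitude (sound-pressure) ratio corresponding to a level difference in dB."""
    return 10 ** (db / 20)


def power_ratio(db: float) -> float:
    """Intensity (power) ratio corresponding to a level difference in dB."""
    return 10 ** (db / 10)


diff = level_difference_db(6, 2)  # invented contour counts for two components
print(diff, amplitude_ratio(diff), round(power_ratio(diff)))  # 20.0 10.0 100
```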

In order to facilitate the observation of the acoustic character of speech sounds I presented vowels in logarithmic, consonants in linear representation. The analyses were made of A. Gusev's voice.


AKADEMIAI KIADO, 1981
