dc.description.abstract | This work addresses the information structure of the Minas Gerais variety of spoken Brazilian Portuguese (BP), focusing on Topic (TOP), based on the C-ORAL-BRASIL informal speech corpus. There are two general goals: (I) to contribute to the C-ORAL-BRASIL project by recording sessions of spontaneous speech, transcribing and segmenting them, performing text-to-speech alignment, tagging and, most importantly, validating the informal section of the corpus; (II) to contribute to the study of the information structure of spoken BP by investigating the characteristics of the TOP. The perspective that underlies the research is the Theory of Language into Act (L-Act). According to L-Act, the speech act is the fundamental communicative activity and the utterance is its linguistic realization; the utterance is therefore the reference unit for the study of spontaneous speech. Utterances are delimited in the speech flow by prosodic boundaries that speakers perceive as conclusive (terminal); an utterance may consist of one or more intonational units. In the latter case, the smaller units are delimited by prosodic boundaries that speakers perceive as non-conclusive (non-terminal). Prosodic parsing is the basis for structuring speech: prosodic units correspond to information units. The most important units of an information pattern are Comment (COM) and Topic (there are others, fulfilling textual or dialogic functions). The COM is the only unit necessary to form an information pattern, because it conveys the illocutionary force and is sufficient to perform a speech act. The TOP has the function of defining the domain of reference in which the illocution expressed in the COM must be interpreted. Regarding the research corpus, this work contributed to the compilation of the informal C-ORAL-BRASIL. 
The spontaneous speech transcriptions of the recorded sessions were prosodically segmented, marking terminal and non-terminal prosodic boundaries. As part of general goal (I), a statistical validation of this prosodic annotation was conducted by checking inter-annotator agreement with the Kappa test. The results showed an overall agreement of 0.9 for the annotation of terminal boundaries and 0.7 for non-terminal boundaries, a result of great statistical relevance. To fulfill goal (II), a subcorpus of C-ORAL-BRASIL was extracted and each unit received intonational tagging for its information function according to the L-Act categories. It was found that the information structure of speech is more or less complex depending on the communicative situation. When the interaction among participants is less anchored in the immediate context, utterances tend to be more complex, with more occurrences of TOP. The TOP is most commonly realized by a noun phrase (NP, 39.5%) or a verb phrase (VP, 37.7%). In the first case, the NP is typically filled by nouns and their determiners, but there is a high occurrence of demonstrative pronouns in dialogues and of personal pronouns in monologues. In the second case, the VP is predominantly used to express hypothetical situations in conversations and dialogues, while in monologues the TOP more often delimits a temporal domain of reference. As for prosodic properties, the TOP in BP can be performed with four different prosodic forms (types 1, 2, 3 and 4), with type 2 being the most common (47%) and type 3 the rarest (5%). Overall, this study provides a new perspective on the study of speech, emphasizing the use of new methods that enable a deeper understanding of the organization of information in spoken language and of how context influences the use of specific linguistic structures. | |