
Literary and Linguistic Computing 1993 8(4):243-257; doi:10.1093/llc/8.4.243
© 1993 by Association for Literary & Linguistic Computing



Representativeness in Corpus Design

DOUGLAS BIBER

Department of English, Northern Arizona University

Douglas Biber, Department of English, Northern Arizona University, PO Box 6032, Flagstaff, AZ 86011–6032, USA. E-mail: biber{at}nauvax.ucc.nau.edu
The present paper addresses a number of issues related to achieving ‘representativeness’ in linguistic corpus design: what it means to ‘represent’ a language, how to define the target population, stratified versus proportional sampling of a language, sampling within texts, and the required sample size (number of texts) of a corpus. The paper distinguishes among various ways that linguistic features can be distributed within and across texts, analyses the distributions of several particular features, and discusses the implications of these distributions for corpus design.
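As a rough illustration of two of the sampling issues listed above, the sketch below contrasts proportional allocation (each register gets texts in proportion to its assumed share of the population) with one simple stratified alternative, equal allocation across strata, which keeps rare registers from being swamped by common ones. It also includes the standard textbook estimate of how many texts are needed to estimate a feature's mean rate within a tolerable error, n = (z·s/E)². These are generic statistical sketches under our own assumptions, not the paper's own procedure; the function names, register labels, and figures are hypothetical.

```python
import math
from typing import Dict, List


def proportional_allocation(weights: Dict[str, float], total: int) -> Dict[str, int]:
    """Hypothetical helper: allocate `total` texts in proportion to each register's
    assumed population weight, using largest-remainder rounding so counts sum to `total`."""
    norm = sum(weights.values())
    raw = {k: total * v / norm for k, v in weights.items()}
    alloc = {k: math.floor(x) for k, x in raw.items()}
    leftover = total - sum(alloc.values())
    # Give the remaining texts to the registers with the largest fractional parts.
    for k in sorted(raw, key=lambda k: raw[k] - alloc[k], reverse=True)[:leftover]:
        alloc[k] += 1
    return alloc


def equal_allocation(strata: List[str], total: int) -> Dict[str, int]:
    """Hypothetical helper: one simple stratified scheme that gives every stratum
    (nearly) the same number of texts, regardless of its population share."""
    base, extra = divmod(total, len(strata))
    return {k: base + (1 if i < extra else 0) for i, k in enumerate(strata)}


def required_texts(sd: float, tolerable_error: float, z: float = 1.96) -> int:
    """Textbook sample-size estimate: texts needed to estimate a feature's mean rate
    within +/- tolerable_error at roughly 95% confidence (z = 1.96), rounded up."""
    return math.ceil((z * sd / tolerable_error) ** 2)


if __name__ == "__main__":
    # Hypothetical population weights: conversation dominates, written registers are rare.
    weights = {"conversation": 90, "news": 6, "academic": 4}
    print(proportional_allocation(weights, 30))  # rare registers get only 1-2 texts
    print(equal_allocation(list(weights), 30))   # every register gets 10 texts
    print(required_texts(sd=5.0, tolerable_error=1.0))
```

With 30 texts, proportional allocation leaves the hypothetical academic register with a single text, while equal allocation covers each register with ten; which is preferable depends on whether the corpus is meant to mirror overall frequencies or to capture the range of variation across registers.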

The paper argues that theoretical research should come first in corpus design, both to identify the situational parameters that distinguish among texts in a speech community and to identify the types of linguistic features that will be analysed in the corpus. These theoretical considerations should be complemented by empirical investigations of linguistic variation in a pilot corpus of texts, which provide a basis for specific sampling decisions. The actual construction of a corpus would then proceed in cycles: an original design based on theoretical and pilot-study analyses, followed by collection of texts, followed by further empirical investigation of linguistic variation and revision of the design.




This article has been cited by other articles:

Z. Xiao and A. McEnery. Two Approaches to Genre Analysis: Three Genres in Modern American English. Journal of English Linguistics, March 1, 2005; 33(1): 62-82.

C. F. Meyer. ADS Annual Lecture: Can You Really Study Language Variation in Linguistic Corpora? American Speech, December 1, 2004; 79(4): 339-355.


