Text linguistics at the millennium: Corpus data and missing links
Robert de Beaugrande
Abstract
Text linguistics seems to have originated chiefly in order to expand the search for
constraints, which was being noticeably impeded by the self-imposed restrictions
in a ‘linguistics’ centred on isolated, invented sentences and abstract
formalisms. Yet early attempts to bring the ‘text’ into the scope of such a
linguistics now seem inverted: for us the actual text, not the invented
sentence, must be the essential linguistic unit, and is sustained by internal
systemic organisation and by its external systemic organisation within one or
more ‘intertexts’. In the coming millennium, this prospect can now finally
be documented and clarified by working with very large corpora of authentic
texts, whereby we can hope to uncover some of the vital and delicate missing
links between ‘language’ and ‘text’.
A. ‘Language’ and ‘text’ in ‘modern linguistics’
1.
Modern ‘general linguistics’ has been a singular enterprise, at times most
sharply distinguished from other approaches to ‘language’ by its
self-imposed restrictions. Since its outset, it has been influenced by Saussure’s
(1966 [1916]: 232) aspiration that ‘the true and unique object of
linguistics’ should be ‘language studied in and for itself’. In effect,
this vision of ‘linguistic’ science has been pursuing the question: what
would ‘language’ look like when it’s off by itself and not being used (cf.
§ 14, 82)?
2.
An unwelcome answer would be that it no
longer looks like a real language. If so, the term ‘language’ used by
this linguistics loses its ordinary meaning, namely: a mode of communication used among the members of a human community.
Some common uses in this meaning can be seen in these authentic data supplied in
July 1994 from the ‘Bank of English’ at Birmingham University, the world’s
largest computerised text-data corpus, then containing over 200 million words
(cf. § 60, 64, 89):
(1)
you should be pleased that the
French language has been spared
(2)
he has no qualifications in teaching
English as a Foreign Language
(3)
it’s old-fashioned, and it’s in
a foreign language. People are frightened of it
(4)
I was told afterward that my
language was most entertaining
(5)
fatally damaged? I don’t want to
use language of that sort
(6)
there is a lot of bad language and
gratuitously oafish behaviour
(7)
General Kryuchkov used the language of the Cold War when he accused the US
(8)
He violently opposes the new language law, which makes
major concessions to ethnic minorities
Sample (1) concerns the ‘French language’ spoken as a native language by a whole
nation, whereas sample (2) concerns the ‘English language’ as a
subject-matter to be ‘taught’ to, and learned by, people who speak a
different native language. Sample (3) implies that ‘a foreign language’ may
contribute to ‘frightening people’. Samples (4), (5), and (6) indicate how
particular uses of ‘language’ get evaluated. In sample (7), ‘language’
covers both the style and the content of what a Russian General said —
belligerent in tune with the ‘Cold War’. And sample (8) mentions a
government regulation concerning which language or languages should be used and
when, for instance as an ‘official language’ in a multilingual country. So
all these uses relate to real people who either might use a ‘language’ for
communication, or else might be hindered in doing so because they had ‘no
qualifications’ (2), or because the ‘language’ was ‘foreign’ (3) or
was restricted by a ‘language law’ (8), and so on. None of the uses matches
‘language in and for itself’ in the austere theoretical meaning of
Saussurian linguistics.
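Samples (1) to (8) read like keyword-in-context (KWIC) concordance lines, each showing a stretch of text on either side of the word ‘language’. The following is a minimal sketch of how such lines might be drawn from a plain-text corpus. It assumes the corpus is held as one Python string and that a fixed window of characters is wanted on each side. The function name `kwic` and its `width` parameter are hypothetical, chosen only for illustration; this is not the software actually used for the Bank of English.

```python
import re


def kwic(text: str, keyword: str, width: int = 30) -> list[str]:
    """Return keyword-in-context lines for every whole-word match of keyword.

    Hypothetical helper for illustration only: matching is case-insensitive,
    and `width` characters of context are kept on each side of the keyword.
    """
    # \b anchors restrict matches to whole words, so 'languages' is skipped
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        start, end = match.span()
        # Newlines are flattened so each concordance line stays on one row
        left = text[max(0, start - width):start].replace("\n", " ")
        right = text[end:end + width].replace("\n", " ")
        # Right-align the left context so the keywords line up in one column
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines


if __name__ == "__main__":
    # Two of the authentic samples above, joined into a toy 'corpus'
    sample = (
        "I don't want to use language of that sort.\n"
        "There is a lot of bad language and gratuitously oafish behaviour."
    )
    for line in kwic(sample, "language", width=25):
        print(line)
```

Aligning the keyword in a central column makes concordance lines easy to scan for recurring neighbours, such as ‘bad language’ or ‘foreign language’. A real corpus tool would also need tokenisation and sorting, which this sketch leaves out.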
3.
This mismatch might explain why Saussure (1966
[1916]: 9, 11)
asserted that ‘speech cannot be studied’, ‘for we cannot discover its
unity’; it is only a ‘heterogeneous mass’ of ‘accessory and accidental
facts’ (§ 10, 21f, 39, 82). Later, Chomsky (1965: 4, 201) asserted in similar
vein that the ‘observed use of language’ ‘surely cannot constitute the
subject-matter of linguistics, if this is to be a serious discipline’; ‘much
of the actual speech observed consists of fragments and deviant expressions of a
variety of sorts’. As if in parallel, these same linguists asserted that ‘the
concrete entities of language are not directly accessible’ (Saussure, 1966
[1916]: 110); and that ‘knowledge of the language, like most facts of interest
and importance, is neither presented for direct observation nor extractable from
data by inductive procedures of any known sort’ (Chomsky, 1965: 18).
4.
The mismatch is thus a signal that ‘language’, being the true
‘subject-matter of linguistics’ as a ‘serious discipline’, is not what
real people hear or see in ‘actual speech’. Another well-known claim then
falls into place: ‘linguistic theory is concerned primarily with an ideal
speaker-listener, in a completely homogeneous speech-community, who knows its
language perfectly’ (Chomsky, 1965: 3). Neither this ‘speaker’ nor this
‘community’ ‘exist in the real world’, as Chomsky (1977: 172) has calmly
conceded. So the true nature of ‘language’ is known only to academics whose
degrees in ‘theoretical linguistics’ somehow equip them with privileged
access to ‘perfect knowledge’ (cf. § 29, 34, 83).
5.
One of Saussure’s (1966: 8) most candid acknowledgements also falls into
place: whereas
‘other sciences work with objects that are given in advance’, in
‘linguistics’ ‘it is the viewpoint that creates the object’. What he did
not acknowledge, but what has been extensively displayed in the subsequent history of
modern linguistics, is that multiple ‘viewpoints’ create multiple
‘objects’, whereby the meaning of the term ‘language’ has grown steadily
more unstable and obscure. Linguistics has been fragmented into disputatious
factions, each assigning to ‘language’ its own idealised meaning. And much
time and print have been expended on disputes over whose idealisation is
better, without turning for adjudication to the evidence of real language in
‘actual speech’ (Beaugrande, 1997a, 1998a, 1998b).
6.
What then about the status and meaning of the term text?
In the field proposed by linguists from Saussure up through Chomsky, the
‘text’ would seem to merit no home at all. As Halliday (1994: xxii) has
remarked, Saussure’s ‘understanding of the relationship between the system
of language and its instantiation in acts of speaking’ implied that ‘the
text’ ‘can be dispensed with’; and ‘linguistics, for much of the
twentieth century’, has accordingly been ‘obsessed with the system at the
expense of text’.
7.
A few linguists did proffer a home for the ‘text’ in their theories, but
failed to secure it in their own practices of investigation. For Hjelmslev (1969
[1943]: 12), ‘the linguistic investigator is given’ ‘the as yet unanalysed
text in its undivided and absolute integrity’ (but cf. § 88). So
‘linguistic theory starts from the text as its datum and attempts to show the
way to a self-consistent and exhaustive description of it through an analysis’
(1969: 21). The ‘theory’ must also ‘indicate how any other text of the
same premised nature can be understood in the same way’ by ‘furnishing us
with tools that can be used on any text’ (1969: 16). ‘Obviously, it would be
humanly impossible to work through all existing texts’, and ‘futile’ as
well, if ‘the theory must also cover texts as yet unrealised’ (1969: 17)
(cf. § 88). Even so, Hjelmslev counselled that ‘linguistic theory’ should
seek to ‘describe and predict’ ‘any conceivable or theoretically possible
texts’ ‘in any language whatsoever’ (1969: 16f). He even stipulated that
‘linguistic theory’ should ‘foresee’ ‘a language without a text
constructed in that language’ (1969: 39f) (cf. § 37).
8.
Hjelmslev’s line of argument implied a programmatic move to situate the
‘text’ inside the same abstract theoretical sphere as ‘language’ (or
‘langue’) defined by Saussure, whose programme Hjelmslev is known to have
heartily approved. We might sense a remarkable expansiveness in the aspiration
for ‘linguistic theory’ to provide one single method of ‘description’
and ‘analysis’: first for the one ‘unanalysed given text’, then for
‘all theoretically possible texts’, and finally even for the non-existent
texts of ‘a language without a text’. How this aspiration could be put into
practice is obscure: in the four published volumes of Hjelmslev’s writings, I
could not find even one demonstration. His only directive was that ‘the text
is regarded as a class analysed into components, then these components as
classes analysed into components, and so on until the analysis is exhausted’
(1969: 12f). So the text would be like a big chunk of language waiting to be
taken to pieces, which would then be broken into smaller pieces, over and over,
until we arrive at the opposite end from its ‘undivided integrity’ (§ 7,
57).
9.
We might contrast Hjelmslev’s programme with that of J.R. Firth, who declared
that ‘the text is the main concern of the linguist’ (1968 [1952-59]: 24, 90,
173). ‘All texts’ ‘in modern spoken languages’ are considered to
‘carry the implication of utterance’ and are ‘referred to typical
participants in some generalised context of situation’ (1957 [1951]: 220,
226). So the ‘attested language text’ should be ‘duly recorded’ and
‘abstracted from the matrix of experience’ (1968: 199f, 99). Since it may
not be ‘possible or desirable to present the whole of the materials collected
during the observation period’, some ‘corpus’ is ‘essential’ (1968:
32) (cf. sections E, F, and G).
10.
Still, Firth did not intend to countermand Hjelmslev’s programme by situating
the text on the side of ‘speech’ in opposition to Saussure’s
‘language’, because Firth (1968: 28, 41, 127, 139f) roundly rejected the
whole dichotomy between ‘langue and parole’. By ‘referring’ the text to
‘typical participants’ and ‘generalised contexts’ and by ‘abstracting
it from the matrix of experience’, the linguist would lift the text up from
the merely ‘heterogeneous, accessory, and accidental’ plane
where we saw Saussure situating ‘speech’ (§ 3). Unfortunately, Firth’s
own four published volumes give only a few sketchy demonstrations, far short in
practice of what his writings on theory projected.
11.
At all events, the ‘text’ was likely to remain on the margins of linguistics
as long as it was eclipsed by the sentence. Curiously, the ‘sentence’ did not start out as the
prototype of ‘language’
in the meaning of Saussure. He had argued that although ‘the sentence is the
ideal type of syntagm’, ‘it belongs to speaking [parole], not to language
[langue]’ (1966 [1916]: 124). However, his reservations distinctly placed the sentence on both
sides: ‘in the syntagm there is no
clear-cut boundary between the language fact, which is a sign of collective
usage, and the fact that belongs to speaking and depends on individual freedom;
in a great number of instances it is hard to classify a combination of units
because both forces have combined in producing it, and they have combined in
indeterminate proportions’ (1966: 124f; see Beaugrande, 1999a for discussion).
12.
These early reservations did not prevent the sentence from later occupying the
centre of modern linguistics, most famously when the ‘generative’ approach
defined a ‘language’ to be an ‘infinite set of sentences’ (Chomsky,
1957: 13) (cf. § 32). Officially, the ‘sentence’ was a unit purely
restricted to ‘syntax’ or ‘grammar’, two terms now used as if they were
interchangeable, although they cannot be, as we shall see (§ 17f). Yet this
restriction could not be sustained during attempts to actually formulate a
‘generative grammar’, even for small samplings of invented English
sentences. The term ‘sentence’ was also being unofficially used (among other
things) for a semantic unit that should be called a ‘proposition’ and for a
pragmatic unit that should be called a ‘speech act’, as if to paper over the
‘indeterminate proportions’ in Saussure’s reservation (Beaugrande, 1997a).
So just when ‘generative’ linguists were volubly announcing that the
‘study of competence abstracts away from the whole question of performance’
and ‘why speakers say what they say, how language is used in various social
groups, how it is used in communication, etc.’ (Dresher and Hornstein, 1976:
328), the concept of the ‘sentence’ was being quietly stretched and bent in
the direction of ‘performance’ (cf. § 50).
13.
As long as the ‘sentence’ was getting stretched, the need to recognise the
‘text’ might not have seemed urgent. But the restrictions of the single
sentence eventually had to become onerous for linguistic investigations even in
the narrowed area of ‘syntax’ or ‘grammar’. And some linguists would
eventually respond by looking ‘beyond the sentence’ and at the ‘text’.
Not surprisingly, much early work in ‘text linguistics’ emblematically
aspired to build upon what now came to be called, for purposes of contrast,
‘sentence linguistics’ — a term that would have seemed oddly redundant to
the ‘generative’ approach. By exploiting the ‘indeterminate proportions’
implicit in the sentence (§ 11), such work did not need to settle the question of
whether the ‘text’ might be a unit of the ‘actual speech’ rejected by
the linguists who had followed Saussure and Chomsky (cf. § 3, 10, 23).
14.
The most straightforward strategy for getting the text inside a linguistics
designed for the sentence would be to define the ‘text’ as a sequence of sentences. As such, the ‘text’ could smoothly
inherit the established properties of the sentence, such as being
‘grammatical’, ‘well-formed’, and ‘rule-governed’. Yet disputes
arose in the early 1970s over the question of whether those properties were
sufficient, so that established ‘sentence linguistics’ could account for
whatever might be found (Dascal and Margalit, 1974); or whether new properties
would be found that belong or apply only to texts (van Dijk, 1972).
15.
Text linguists naturally favoured the second
option, which justified their work as an accredited field. But in retrospect, I
find the question inverted. Following such leads as Hartmann (1968, 1971) and
Schmidt (1973), we can accept the text as the essential linguistic unit and then
explore the status of the sentence as one of the potential segments within a
text, and thus one that could explicitly benefit from the previously implicit
bending in the direction of performance (§ 12, Beaugrande, 1999a). Moreover,
the sentence could richly inherit the properties that have since been
established for the text, such as being ‘cohesive’, ‘coherent’,
‘intentional’, ‘acceptable’, ‘informative’, and ‘situational’
(Beaugrande and Dressler, 1981) (cf. § 59).
16.
In the process, we might reach the unsettling conclusion that syntax doesn’t exist in natural language insofar as the term means
a system of ‘formal rules’ for arranging words together in sentences (cf. §
14) (compare García, 1979; Givón, 1979).
As already implied by Saussure’s reservation about the ‘sentence’ (§ 11),
speakers and writers certainly select and combine words in response to important
‘non-syntactic’ factors, such as lexical preferences, communicative
situations, and personal motivations (§ 98-101). To assert that these factors
are all ‘beyond the sentence’ or ‘outside the sentence’ is to ignore
their potent effects inside the
sentence and to block off significant resources for description and analysis.
17.
The restrictions of syntax become most obtrusive when it gets directly equated
with ‘grammar’ (§ 12). Despite the once fashionable notion of a ‘deep
structure underlying surface structure’ (e.g. Chomsky, 1965), a ‘grammar’
which looks only at the order of words inside isolated sentences and not at
‘why speakers say what they say, how language is used in various social
groups, how it is used in communication’ (§ 12) must remain a shallow and
superficial enterprise. In English — unlike many other languages —
word-order is sufficiently ‘frozen’ in some areas to sustain carefully
selected ‘Aspects of the Theory of
Syntax’; and these have indeed been the concerns of ‘generative
linguistics’ until it withdrew into ‘universals’ and ‘mental
representations’, where the ‘sentence’ is no longer crucial (see
Beaugrande, 1998a for discussion).
18.
A realistic and empirically justified ‘grammar’ would rather be the front
end for all the relevant motivations that speakers or writers typically
apply or reflect when they put classes of words (e.g. Nouns) or of word-parts
(e.g. Suffixes) in one order rather than another (cf. § 101). This lesson might
also be drawn from the recalcitrant obstacles encountered by the
early projects of text linguistics to construct a text
grammar from the top down in the narrow and abstract
‘generative’ sense. The ‘text grammar’ for a fairly short text by
Bertolt Brecht (‘Herrn K.s Lieblingstier’), a text with a simple
vocabulary and even some parallelisms in its phrasing, got tangled in an explosive
complexity of ‘rules’1 (cf. van Dijk, Ihwe, Rieser, and Petöfi,
1972), and the project was eventually abandoned with no official conclusion (cf.
§ 38). The same fate can be predicted for any project to describe ‘texts’
on comparable levels of abstraction and formality: what gets ‘abstracted
away’ during the ‘formalisation’ would be vitally needed in order to
account just for the order of words,
and far more for the choices of words
(cf. § 40).
19.
The converse approach for a ‘text linguistics’, and the one that has
gradually won out, is to work from the bottom up. We need to examine a
comprehensive range and variety of authentic texts and explore what sorts of
properties deserve to be accounted for, including, but not restricted to, those
of ‘grammar’ in the broad sense of § 18. We can apply whichever categories
and concepts of previous ‘linguistics’ seem productive, but we can also
apply ones from adjacent fields, such as literary studies, cognitive science,
artificial intelligence, ethnography, economics, and political science (cf.
Beaugrande, 1980, 1997a, in preparation) —
whatever bases we can enlist in exploring how speakers
and writers do select and combine words inside phrases, clauses, sentences, or any other
relevant units, such as paragraphs, essays, or science textbooks.
20.
This approach has a distinctly Firthian flavour, although we can exploit fields,
methods, and resources that were not available to Firth and his pupils. Today,
we are far better positioned to ‘observe’ and ‘collect’ a ‘corpus’ of ‘attested
language texts’ and to determine what is ‘typical’ and ‘generalisable’
(§ 9f). Admittedly, the sheer size of the task remains daunting enough to
indicate why Saussurian linguistics was so eager to rule out the ‘study of
speech’ (§ 3). Yet the task deserves to be at the top of our agenda for the
next millennium.
B. Virtual system and actual system
21.
As a provisional strategy, some ‘text linguists’ (myself included) have for
a number of years been proposing to view the relation between a language and a
text as one between a virtual system
and an actual system. One pioneer
of this view was the eminent linguist and language philosopher Peter Hartmann;2 he and his pupils (e.g. Siegfried J. Schmidt, Roland
Harweg, Walter A. Koch, Götz Wienold) were the most thoughtful originators of
text linguistics.
Against any linguistics that might view texts as being
like Saussure’s ‘speech’, i.e., ‘heterogeneous’, ‘accidental’, and
devoid of ‘unity’ (§ 3), we maintain that the text is internally
systemic on its own terms and is externally
systemic with respect to other texts (§ 47, 99). The text is thus an intersystemic
event during which multiple systems interact and converge (§ 89).
22.
This viewpoint does not mean that ‘the interplay of langue and parole somehow
vanishes on the level of texts’ — a consequence inferred from my work by
Lindemann (1981: 126). Much in the spirit of Firth (§ 10), text linguistics
firmly rejects the dichotomy of ‘langue and parole’ for having been deeply
misconceived all along. The major flaw, not widely recognised, has lain in
attributing to ‘language’ an ideal
order, and to ‘speech’ an accidental
disorder (Beaugrande, 1998a, 1998b, 1999a). The bizarre implication would be
that using a ‘language’ in ‘speech’ triggers an abrupt catastrophic
transition from stable and integrative order over to unstable and disintegrative
disorder. Conversely, the linguist whose ‘study abstracts away from how
language is used in communication’ (§ 12) miraculously restores the pristine,
ideal order. The practices of linguistic investigation would consist chiefly of
‘idealising language’; yet we should be wary of projecting an ideal that is
fully disconnected from real language (Beaugrande, 1998a, 1998b) (§ 97).
23.
Nor again does this same viewpoint mean that ‘texts seem to be exclusively
created by an actualisation of lower-level virtualities’, and that ‘texts do
not dispose of any virtual systemic aspects of their own’ — further
consequences inferred by Lindemann (1981: 126). Of course texts hinge crucially
upon the ‘virtual systemic aspects’; our problem is that ‘virtuality’ is
not readily isolated from ‘actuality’ when we work with real texts.
Lindemann himself proposed that ‘actually occurring texts may be virtualised
by exploring’ their ‘systemic aspects’, yet this is precisely what we
text linguists already do in practice, though without describing our practices by that
term. However, we cannot simply be putting ‘virtuality’ back into a space
where it has been ‘actualised out’, because texts
always retain some aspects of their virtuality as long as the language is
still known and used.3 So the ‘text’ is not just a unit of ‘actual
speech’ (‘parole, langage, performance’ etc.) in the senses of Saussure
and Chomsky (cf. § 3, 10, 13), but is rather a unit
which actually links language to speech.
24.
Some powerful evidence for the ‘virtuality’ would be that a single text can
be received and interpreted (‘actualised’) in more than one way by different
participants in a text-event or even by the same participant at different times.
Such is indeed a constitutive principle of literature; the literary text might, as Wellek and Warren (1956:
152) have done, be compared to ‘langue’, and each ‘individual
realisation’ to ‘parole’. Yet far from navigating between the order of
‘langue’ and the disorder of ‘parole’ (§ 22), the audience recreates a novel order for the ‘textual world’. Your motivation and reward
for reading or listening to a literary text is thus to be a privileged
participant each time in constructing its ‘world’, whose order is in part
your own creative achievement (Beaugrande, 1988). The same process would apply,
on a far less conscious plane of creative awareness, for the actualisation and
re-actualisation of any text;
literature merely accentuates and thematises the process to deepen and broaden
our understanding of the human situation.
25.
The implication would be that the ‘same’ text need not have the ‘same
order’ for everybody. At least in its fine details, each actualisation is
unique and unrepeatable. Moreover, each participant has a partially
individualised knowledge of the virtual system, with greater variation in the
lexicon and lesser variation in the grammar (§ 83ff). So we must resolutely
address the questions of how and how far the respective actualisations of any
text can agree or coincide well enough for a text receiver to feel confident of
‘understanding’ what the producer ‘meant’; or for several participants
to receive a text in what is thought to be the ‘same way’. Evidently, the
degree of convergence among participants in a text-event is achieved on-line
through the interaction of multiple systems whose design both anticipates and
adjusts to these achievements.
26.
The production and reception of a text would be complementary
transitions between a more open mode
and a more closed mode of systemic
order; and these two modes would determine each other in a continual
dialectic. Contrary to the assumptions of linguistics cited above,
we would conclude that these two modes of order must be quite proximate: the
order of language is more closed and the order of texts more open than has been
widely acknowledged before (§ 56).
27.
Text linguistics accordingly needs to develop models of how the one mode of
order sustains and tunes the other. We might postulate a cycle whereby the language
‘actualises’ to sustain and tune the texts, whilst the texts
‘virtualise’ to sustain and tune the language. My intransitive usages of the
two verbs may sound esoteric, since the agents of these reciprocal activities
are of course the participants in the text-event; but the activities are
normally just by-products of communicative interaction and do not match what the
participants are consciously doing or intending to do (Hartmann, 1963). The
exceptions would of course be the poets, whose enterprise consists of seeking
and testing new ways of organising language (Beaugrande, 1978, 1988).
28.
Our own enterprise as text linguists consists of consciously ‘virtualising’
texts whenever we ‘actualise’ them for the purpose of drawing inferences
about the order of the virtual language system (§ 23). A key question, still to
be adequately explored, is our own version of the well-known problem of
‘participants’ versus ‘observers’ (cf. Harweg, 1968). How far can our conscious
and attentional ‘actualising-for-virtualising’ correspond to, or
accurately describe, the unconscious
and automatic actualising and
virtualising of ordinary text producers and receivers; and what might occur
during our processes of making conscious and focusing attention? Quite
plausibly, the potential of the texts being investigated gets considerably more
elaborated precisely insofar as we prolong and intensify the cycle postulated in
§ 27 (see also § 83ff).
29.
In addition, the principle of each participant having a partially individualised
knowledge of the virtual system (§ 25) must apply to us too. What might be our
status as individuals and within the wider community? Even if we could set aside
our academic training and consider ourselves reasonably typical,
we could not yet consider ourselves representative,
due to several constraints. The most obvious constraint is that the language
experience of any individual, even one
who has read or written extensively, is only a small part of the experience of
the community. Instead of conveniently taking the community to be
‘homogeneous’ and purporting to have privileged access to the ‘perfect
knowledge of the language’ (§ 4), we need to lay open the degrees of real
uniformity and diversity among language users and among their texts to
large-scale adjudication with substantive evidence.
30.
A less obvious constraint on being representative could be inferred from the virtuality
I have suggested that texts always retain (§ 23). This cannot be directly
measured by their manifested uniformity, because ‘virtuality’ is by definition open and
dynamic, and what is manifested is the actuality
that partly confirms it and partly specifies or modifies it to suit the context.
Instead of vowing that
‘knowledge of the language’ is ‘neither presented for direct observation nor
extractable from data’ (§ 3), we can assume that ‘knowledge of the
(virtual) language’ never totally
converges with any set of actualities, but does move steadily
closer toward convergence as the set gets larger and more diversified. In
that sense, we all — text linguists or not — remain ‘language learners’
throughout our lives as we accumulate experience with texts. Text linguists are
just exceptionally self-conscious ‘learners’, ‘observing’
and ‘extracting’ whatever we can in the knowledge that there is always far
more beyond it all.
31.
Still another constraint on being representative is the unresolved uncertainty
about intuition providing the source
for a ‘grammar’ to ‘describe the intrinsic competence of the idealised
native speaker’ (Chomsky, 1965: 24). Contrary to such well-known claims,
unaided intuition is not broader and deeper than language experience but
narrower and shallower. It is best secured for the more stable and general
‘frozen’ areas of the virtual system whose virtuality seems static and hence
independent of ‘actual speech’, whence the preference for restricting
linguistics to a few Aspects of the Theory
of Syntax (cf. § 17). But intuition is not well secured for the dynamic
cycles of actualising and virtualising precisely because of the continual
tuning. The interacting systems settle down in precise detail only at the
respective stages of the interaction itself. So the native speakers’ intuition
may well not be reliable for telling you what they would
say until there arises a real-life occasion when they do say it. Intuition operates most smoothly after the fact in making
sense of what has already been said (§ 99).
32.
This overall line of argument also indicates that the language system never
settles down in a ‘synchronic’ dimension; consequently, all attempts since
Saussure to construct a ‘synchronic description’ are doomed to be partial,
restricted, and provisional. However, these limits should not disband the
‘synchronic’ linguists so much as resituate their enterprise. Like
total convergence (§ 30), ‘synchronicity’ is a factor we can move closer to
yet never attain. It is not set apart in some ideal ‘language’ disconnected
from ‘speech’ but represents the totality of simultaneous actualisings of
the virtual system (cf. Harweg, 1968: 142). So we cannot remotely achieve any
‘synchronic description’, however partial, by ‘dispensing
with texts’ (cf. § 6); we need bigger and better means for examining what
large sets of texts have in common.
33.
We thus return to a deep question: if language and text, or virtual and actual,
are closely interconnected yet never converge, how then are their connections
sustained? On a highly general plane, we might stipulate: the language is a set of texts; each text is one member of the set.
We might interpret this stipulation by viewing a ‘language’
as one vast ‘continuous text’, as ‘the totality of what has been written
and spoken (or perhaps even thought) in a language’ (Harweg, 1968: 142, my
translation).
34.
We might feel reminded here of the definition (already cited in § 12)
of a language being ‘an infinite set of sentences’ (Chomsky, 1957: 13).4
The term ‘infinite’ merits closer examination, as I have shown elsewhere
(Beaugrande, 1998a). The ‘infinity’ was hypothesised on the simplistic
mechanical basis of ‘recursive
devices’ (Chomsky, 1957: 23f) whereby any sentence could, in theory, be made
longer or more complex. In practice, the set of infinitely long or complex
sentences is an empty set
and thus an intractable object for constructing a ‘grammar’. Nor can it
plausibly be attributed even to the
‘perfect knowledge’ of ‘ideal speakers’, who also form an empty set of
humans (or superhumans) (cf. § 4).
35.
Still, for the sake of argument, we could infer that if
a text is defined to be a ‘sequence of sentences’ (§ 14), then
‘generative’ theory would allow for an infinitely
long text. That text would
be theoretically equivalent to one superlong sentence with ‘sentence
boundaries as sentential connectives’ (Fodor and Katz, 1964: 491). Yet such a
text would also inescapably belong to an empty set. At most, our theorising
could plausibly postulate the concept of an infinitely
long intertext, whose practical correlate could be a conversation in
which the ‘last word’ can never be said. Such a notion has a
‘post-modern’ or ‘post-structuralist’ flavour, though we are arguably
still dealing with an empty set. What is given is a real set of intertexts such
as conversations, each of which is at any one moment finite, though quite
possibly still open, such as the discourse of history, which could be
definitively terminated only if history came to an end.
36.
Halliday (1997: 6) has for some time ‘preferred to reverse the principle and
characterise a language as an infinite system generating a finite body of
text’; but following a recent discussion where I pointed out that, in terms of
mathematical theory, the ‘infinite’ must include combinations with
infinitesimal probabilities, he now favours ‘replacing “infinite” with
“indefinitely large”’. The finiteness of the set of texts in existence at
any given moment is hardly disputable, but it is a weak constraint for a
living language, especially one spoken as widely as English. Another weak
constraint comes from the ‘indefinite largeness’ of the set of potential
texts in English. The key question, as pointed out elsewhere by Halliday (1993),
remains: how does the set of existing texts constrain the set of potential texts
such that, at any given moment, some ways of adding to the existing set are substantially
more probable than others?
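Halliday’s question can be made concrete, in a deliberately simplified way, with a bigram model: count which word follows which in a set of existing texts and convert the counts into relative frequencies, so that some continuations come out more probable than others. This is a swapped-in illustration, not Halliday’s own proposal, and the three-line corpus and the function name `bigram_probabilities` are hypothetical.

```python
from collections import Counter, defaultdict


def bigram_probabilities(texts: list[str]) -> dict[str, dict[str, float]]:
    """Estimate P(next word | word) by relative frequency in the given texts."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for text in texts:
        words = text.lower().split()
        # Pair each word with the word that immediately follows it.
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1

    probabilities: dict[str, dict[str, float]] = {}
    for word, followers in counts.items():
        total = sum(followers.values())
        probabilities[word] = {nxt: n / total for nxt, n in followers.items()}
    return probabilities


# Hypothetical toy corpus standing in for the set of 'existing texts'
existing = [
    "the language has been spared",
    "the language has changed",
    "the text has been read",
]
probs = bigram_probabilities(existing)
print(probs["has"])       # continuations attested after "has"
print(probs["language"])  # continuations attested after "language"
```

Under this naive estimate, a continuation never attested in the existing texts receives no probability at all, which is far too strict for the ‘indefinitely large’ set of potential texts; practical models add smoothing for that reason.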
37.
Problems with sets also arose when, influenced by the high-level aspirations of
‘generative linguistics’, some text linguists attempted to formulate the
constraints for distinguishing the set of texts from the set of
non-texts (see Dressler, 1972 for discussion). At the time, these
constraints seemed intuitively plausible in theory, but we can now see why they
proved intractable in practice. The attempt to create a non-text is always just
one step within an ongoing textual event, and thus still a sub-text, however
unconventional. Its ‘textuality’ is locally disrupted but globally
sustained. So in both theory and practice, the ‘set of non-texts’ is one
more empty set and
therefore obeys no constraints. Nor can it
be identified with the set of non-existent texts for which Hjelmslev would have
us postulate a ‘language’ (§ 7f), since those would still be ‘texts’,
although virtual — texts of a ‘language’ whose ‘textual process’,
i.e., its actualisation, is itself purely ‘virtual’ (cf. Hjelmslev, 1969
[1943]: 40).
38.
I shall cite just one more constraint on our being representative: the practices
of text linguists who analyse and describe texts inevitably add to the set of
texts. We thus incur the additional challenge of probing the relations between
the set of texts we are analysing and the set of texts we are producing as we go
along (cf. § 83ff). This challenge corresponds to Firth’s (1968) distinction
between
‘language under description’ and ‘language of description’, and to his
epigram about linguistics being ‘language turned back on itself’. Evidently
because science in general has preferred to regard its own texts ‘as
a purer code, eschewing rhetoric and simply reporting natural fact’ (Bazerman,
1988: 6, 14), this challenge is still far from being met. My own studies have
consistently found that the relation between scientific texts and the domain
they purport to account for is complex and unstable, especially when that domain
is language itself, as in modern linguistics. Far
too little attention has been devoted to the practical strategies of a linguist
producing and receiving a text about ‘language’, even though they have
significant effects upon the operations and results of ‘theorising’
(Beaugrande, 1984, 1991).
39.
For the present discussion, one effect I would highlight has been the widespread
textual strategy among linguists of inventing
sets of artificial pseudo-texts,
usually isolated sentences which are either trivial in their
‘grammaticalness’ (like ‘the man hit the ball’) or else wildly bizarre
in their ‘ungrammaticalness’ (like ‘*ball the hit ball man the’). These
are intended to capture some constraints, optimistically called ‘rules’, of
the ideal system being called ‘language’; and their artificial status may
even be deemed to purge them of the ‘heterogeneous’
and ‘accidental’ qualities of ‘speech’ (cf. § 3). But in the account I
have proposed here, the production of pseudo-data would ‘tune’ the language
system out of its natural ‘frequencies’, and thus lead to the postulating of
a pseudo-system which does not deserve
to be called a ‘language’.
40.
A second effect I would highlight has been the widespread textual strategy among
linguists of formalising texts,
apparently to render them more amenable to a scientific account (as in Koch,
1971; Ballmer, 1975). The actual result is to create a set
of semi-texts that are artificial in a different way from pseudo-texts,
namely in having some features or aspects of texts whilst many other features
have been ‘abstracted away’ (§ 12, 18). These semi-texts are not trivial;
in practice, they may be almost incomprehensible to everyone but their creators.
The production of semi-data would thus also tune the language system out of its
natural frequencies, and lead to the postulating of a semi-system
which deserves to be called a ‘language’ no more than does a pseudo-system.
The practices of the linguists during this production remain quite arbitrary and
ad hoc until we have a full and credible account of which features or aspects
should or should not be ‘abstracted away’ and how the remainder should be
‘formalised’.
41.
The practices of ‘formalising’ texts might correspond to a theory of the
‘text’ itself being a theoretical unit underlying a practical unit called
‘discourse’ (cf. van Dijk, 1972). Here, we might recall Hjelmslev’s
aspiration to
get the ‘text’ inside the same abstract theoretical sphere as Saussure’s
‘language’, which Hjelmslev himself never attempted to demonstrate in
practice (§ 8). And, as I have remarked (§ 18), the demonstrations for ‘text
grammars’ displayed an explosive complexity of rules and features, precisely
insofar as a text is not solely a theoretical or virtual unit but also a
practical and actual unit. If we disregard or ‘abstract away from’ its
‘practicality’ and ‘actuality’, we lose a host of significant
constraints; trying to replace or reconstruct them all with rules and features
is a gratuitous, self-defeating exercise.
42.
Instead of complicating the relation between language and text with sets of
pseudo-data and semi-data, the preferable alternative strategy would be to let
the texts
represent themselves (§ 82). The texts would retain the same representation
as recorded speech or writing within the experience of the community of
participants in text-events. Special annotations, such as representing the
intonation of speech in a written format, should be clearly warranted by the
purposes of the investigation and should not unduly impede comprehension of or
access to the data.
43.
The practices of text linguists would then not entail transforming texts into
some other representation but rather transforming our own modes of accessing and
exploiting texts as data sources. We need to base our own authority not upon
holding higher academic degrees in ‘theoretical linguistics’ (cf. §§ 4,
29) but upon examining large sets of authentic textual data (section H).
C. Theory and practice
44.
These deliberations about the theories and practices of text linguists counsel
us to move onto a higher plane (doing what Peter Hartmann used to call ‘Überhöhung’).5
There, we could define a ‘language’ to be a
general theory of human knowledge and experience evolving in a dialectical
relation to texts as a set of
practices for working out the theory (cf.
Hartmann, 1963; Halliday, 1987). By that definition, all the members of a
language community are implicated in the ‘theory and practice’ of language
and text (cf. Beaugrande, 1997b, 1998c). More precisely, they sustain and share
a dynamic theory which evolves through a criss-crossing
interaction of many implicit and partial theories about how the practices of
everyday life and ordinary talk are organised. Whilst the diverse contexts come
and go, different partial sub-theories (rather than the whole theory) are
applied and specified in and through the practices.
45.
To be sure, a language is a quite unique type of theory. It cannot be
effectively tested and verified or falsified in the familiar manner of a
‘scientific theory’, because it partially constitutes
what it postulates. We cannot get outside
language in order to talk about it without implicating ourselves in it (cf. §
38). Such would seem to be the aspiration of projects to ‘formalise’
language, but they merely end up replacing it with a ‘semi-system’ whose relation
to real language presents even more knotty problems (§ 40).
46.
Nor again can we say that any one language is a ‘more correct theory’ or a
‘more valid theory’ than any other. The potential of any language for
expressing human experience is infinite in theory, though always finite at any
one moment of practice. Some languages have, for historical or political
motives, had their potential more fully actualised in specific domains, such as
science and technology. This is true of English, but it by no means ‘validates’
English as the ‘superior’ language so often extolled in the discourse of
‘International English’ (discussion in Pennycook, 1994; Beaugrande, 1999b).
47.
Now, by the definitions proposed above, what has been officially called a
‘theory of language’ or a ‘linguistic theory’ so far would be a ‘meta-theory’,
whereas the texts we produce to formulate and expound the theory would display
our ‘meta-practices’. According to my line of argument here, these
‘meta-domains’ would be fundamentally different depending on what their
object of investigation is declared to be. In the terms of Saussurian and
Chomskyan linguistics, ‘language studied in and for itself’ is a theory
about itself, about pure ‘theoreticalness’ disconnected from the practices of ‘actual speech’. In
consequence, their ‘linguistic
theory’ would be a ‘meta-meta-theory’, as
shown most programmatically in the English title of Hjelmslev’s Prolegomena
to a Theory of Language. This theory is doubly remote from practice, and the
practices of ‘doing linguistics’ in general and ‘building linguistic
theories’ in particular are radically under-constrained. Not surprisingly,
many books on ‘theoretical linguistics’ have been sharply critical of the
prior state of the field and have proposed to make a fresh start (Beaugrande,
1991: 334ff). Over time, ‘linguistic theory
appears to offer a stunning variety and disparity of clashing doctrines’, and
‘striking divergences in terms, slogans, and technical contrivances’
(Jakobson, 1970: 12).
48.
But if our
object is declared to be authentic texts, then a ‘text-linguistic
theory’ stays proximate to the practices of real language-users, and our own
meta-practices would not be radically under-constrained. Indeed, we can
productively invest our own status as participants in analysing or interpreting
data produced for a community to which we rightly belong, such as the readers of
the Times, so that our
‘text-linguistic practices’ are reconciled with the practices in our object
domain, though of course not simply converging with them. Against the tendency
of ‘scientific
discourse’ ‘to hide itself’, as if it were somehow ‘not writing at
all’ (Bazerman, 1988: 14) (§ 38), a ‘science of text’ would gain by
displaying and even analysing its own discourse and the latter’s implication
in the constitutive and constructive practices of text-events (see section H).
49.
Working back toward ourselves from the opposite side, we should also devote far
more attention to the ‘theoreticalness’ of producing and receiving texts in
and about everyday life. Most producers and receivers would probably consider
themselves ordinary practitioners or practical users of a language, who have no
interest or authority concerning ‘theoretical’ matters. They might be
irritated or perplexed to be told that knowing a language endows them with one
among the most extensive and powerful theories a human can achieve, far beyond
the revered but restricted ‘theories’ of ‘science’. But until they can
acknowledge this vast endowment they will not be adequately empowered to use
this resource for gaining access to knowledge and society (cf. Beaugrande,
1997a, 1998c).
50.
In sum, a chief reciprocal task for a science of texts would be to cultivate an
active sense of the ‘practicalness’ of scientific texts (ours in particular)
and the ‘theoreticalness’ of ordinary texts, within a programme for
reconciling the two sides within a dialectical interaction. Doing so would
require reconstructing the ‘missing links’ between theory and practice, and
between a language and the texts in that language. I shall now consider some
prospects as we look ahead to the new millennium.
D. Missing links
51.
Perhaps the term ‘missing links’ sounds overly dramatic. After all, the
linkage between a language and a text — between theory and practice — is
achieved whenever human beings communicate, although they don’t recognise
their achievement in those terms. But in terms of scientific investigation, our
theories and models of this achievement undeniably manifest some ‘missing
links’. I have been suggesting here that the toughest problems have stemmed
from the assumption that a ‘language’ has a quite different mode of
organisation from that of ‘actual speech’ (or texts), and from the
corresponding aspiration to describe language independently of actual speech. So
the urgent move now would be to develop a more ‘text-like’ view of a
language and a more ‘language-like’ view of a text (§ 55).
52.
One promising approach would be the ‘systemic functional linguistics’
prominently developed by Michael Halliday, who was once a pupil of J.R. Firth.
Halliday was among the first to realise that the order of a language must be
‘systemic’ in modes that anticipate, reflect, and support the ‘systemic’
uses of the language in texts (cf. § 21). Firth’s (1968: 192) vision of a
text as a ‘longer piece’ to be ‘described as a relational network of
structures and systems at clearly distinguished but congruent levels, converging
again in renewal of connection with experience’ accorded with Halliday’s
(1994: xiv) vision of ‘a language’ ‘as networks of interlocking
options’. Quite plausibly, the networks constitute or contribute to one type of
‘missing links’ whose precise nature is now beginning to be explained.
53.
As Halliday (1997: 6) has recently acknowledged, his approach must contend with
the vast, exponential size of the networks needed to represent ‘systemic
potential’ as ‘alternative combinations of features’. The network needs to
‘present’ not just ‘those sets of options that are currently being
instantiated’ but also the ‘open-ended’ ‘further expansion of that
potential’ (1997: 7). Moreover, any such network representation should reflect
the cline between options which are highly probable and options which are highly
improbable (§ 36) (Halliday, 1993). Some options might be fully possible but
rarely or never get used (§ 62).
54.
So the networks for a language would be both too unwieldy and too
under-determined to serve directly as networks for a text. Conversely, the
networks for texts, such as those I introduced for the pedestrian ‘rocket’ text
(Beaugrande, 1980), are too compact and specific to represent the networks of
the English language. Once more, we seem to detect some ‘missing links’,
this time between two types of systemic networks.
55.
The strategic place to look for ‘missing links’ would now be in a very large
corpus of authentic texts. Such a corpus can offer our best means to promote
both a text-like view of language, and a language-like view of texts (§ 51). It
can also display the competence of the language community bent very far toward
performance, as well as the community’s performance bent very far toward its
competence (cf. § 12). We might undertake to complement the systemic functional
approach by exploring appropriately sorted corpus data which may ‘provide
evidence for our system networks, allowing them to extend much further in
delicacy while continuing to model language as potential’ (cf. Halliday, 1997:
24).
56.
Corpus data are so eminently suited to informing us about ‘networks’ because
they offer concrete displays of the constraints upon how sets of choices can
interact. In the ‘lexicon’ part of the ‘lexicogrammar’ of English, these
constraints constitute the collocability
in the virtual system, and the textual actualisations are the lexical
collocations. In the ‘grammar’ part of the ‘lexicogrammar’,
these constraints constitute the colligability
in the virtual system, and the textual actualisations are the grammatical
colligations. Following Halliday (1961) and Hasan (1986), we can say
that the lexical choices are more delicate,
and the grammatical choices are less delicate; ultimately, the lexicon would be the ‘most delicate
grammar’, whilst the grammar would be the ‘least delicate lexicon’. Every point on this continuum can access every other. Thus, what is
‘collocable’ can constrain not just the collocations but the colligations;
what is ‘colligable’ can constrain not just the colligations but the
collocations; and so on (cf. § 63). Such constraints ensure that the order of
language is reasonably closed, whilst the order of texts is reasonably open (§
26).
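Collocations of the kind described above are usually first approached by counting co-occurrences. The sketch below counts the words found within a fixed window around a chosen node word. The function name `collocates`, the window of two words on each side and the sample lines are assumptions for illustration, not data from the Bank of English; serious corpus work uses far larger samples and statistical association measures rather than raw counts.

```python
from collections import Counter


def collocates(texts: list[str], node: str, window: int = 2) -> Counter:
    """Count words occurring within `window` positions of `node` (raw co-occurrence)."""
    found: Counter = Counter()
    for text in texts:
        words = text.lower().split()
        for i, word in enumerate(words):
            if word != node:
                continue
            # Scan the span around the node, clipped to the sentence boundaries.
            start = max(0, i - window)
            end = min(len(words), i + window + 1)
            for j in range(start, end):
                if j != i:  # skip the node itself
                    found[words[j]] += 1
    return found


# Hypothetical sample lines, not real corpus data
sample = [
    "the french language has been spared",
    "a language has its own grammar",
    "the english language is widely spoken",
]
print(collocates(sample, "language").most_common(3))
```

Raw counts of this kind favour frequent function words such as ‘the’; the more delicate collocational constraints only show up when observed co-occurrence is compared with what chance alone would predict.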
57.
These prospects counsel us to adopt a synthetic
approach of exploring how choices converge in texts, rather than the analytic
approach of subdividing the text into steadily smaller pieces,
as counselled by Hjelmslev (who never showed us how) (§ 8). We would then
highlight the systemic continuity of the text event: not just the mere ‘co-presence’ of
the smallest meaningful units (e.g. ‘morphemes’), but the interaction of
meanings within a coherence that is not just the sum of the meanings of the
parts (Beaugrande, 1984).
58.
Perhaps an analogy from particle physics might be helpful. There, the four
‘forces’ have
recently been reinterpreted not as brute pushes and pulls, but as information
exchanges of ‘messenger particles’:
photons for the electromagnetic force, ‘gluons’ for the strong force holding
the nucleus of the atom together, the W and Z ‘bosons’ for the weak force regulating
radioactive decay, and — presumed on grounds of consistency but not yet
observed — ‘gravitons’ for gravity. By analogy, we could envision semantons
as sub-symbolic messenger particles of meaning being interchanged among the
virtual meanings of morphemes or whole words (cf. Smolensky, 1989; Beaugrande, 1997a).
Through these ‘information interchanges’, some word-choices appear to
‘attract’ each other at varying degrees of strength, in analogy to magnetism
or gravity, and thus to determine which areas of virtual meaning are being
actualised to constitute the context (cf. § 63).
59.
The interactive constraints and ‘attractions’ indicated in Fig. 1 would
continually supply and sustain the links — ‘missing links’ insofar as they
have eluded our investigations to date — between the language and the text.
Part of the actualised output would be the cohesion
and coherence of the text as
‘standards of textuality’, but also the other ‘standards’ identified in
text linguistics: intentionality, acceptability,
informativity, and situationality
(Beaugrande and Dressler, 1981) (§ 15). The ‘most missing links’ of all,
namely the links of intertextuality, now begin to emerge, perhaps like the ‘cold dark matter’
that astronomers believe holds galaxies together. For the first time ever, we can assemble and
compare the evidence for detailed and delicate intertextual constraints shared
among hundreds or even thousands of related collocations or colligations.
Instead of seeing the text
as a set or sequence of sentences, we can
finally see it as a contribution to intertext
(§ 35, 87).