List of self-segregating morphology methods
Here are a few mailing-list threads about the topic:
- A thread started by Rex May in March 2006 on the AUXLANG mailing list
- A thread started by Jim Henry in April 2006 on the CONLANG mailing list
- A thread started by Gary Shannon in September 2010 on the CONLANG mailing list
- All morphemes are the same length. E.g., all are one syllable, or all are two syllables, and the phonotactics are such that syllable boundaries are unambiguous. Or all morphemes are the same length in segments, e.g. all contain three phonemes (some may be CVC, some CVV, some VCV, etc., so not all have the same number of syllables). The former type of system is probably easier for humans to parse; the latter is possibly easier for computers.
- A subset of the phonemes of the language is designated as the initial set (a), and the rest as the subsequent set (b). A word must begin with one or more phonemes from the initial set and end with one or more from the subsequent set. (Tceqli uses this method, with plosives and fricatives in the initial set, and vowels, nasals and liquids in the subsequent set.) Words might have the forms ab, aab, abb, aaab, abbb, etc., and morpheme boundaries fall wherever a b phoneme is followed by an a phoneme. Some variations of this method are:
  - You could divide the phonological segments into these classes: a. Segments that can be the first segment of a morpheme but never a non-first segment. b. Segments that can never be the first segment of a morpheme but can be any non-first segment. Morphemes then look like a, ab, abb, abbb, abbbb, etc., and a morpheme boundary occurs just before each a.
  - You could divide the phonological segments into these classes: c. Segments that can be the last segment of a morpheme but never a non-last segment. d. Segments that can never be the last segment of a morpheme but can be any non-last segment. Morphemes then look like c, dc, ddc, dddc, ddddc, etc., and a morpheme boundary occurs just after each c.
  - If you require every morpheme to contain at least two segments, you could divide the phonological segments into these classes: e. Segments that can be the first or last segment of a morpheme but never a medial (non-first, non-last) segment. f. Segments that can be neither the first nor the last segment of a morpheme but can be any medial segment. Morphemes then look like ee, efe, effe, efffe, effffe, etc. (Without the two-segment minimum, ee might be "e, e" or might be "ee".) Morpheme boundaries occur just after each fe and just before each ef, but a run of ee morphemes has to be parsed globally; you couldn't tell how to parse it unless you had the whole utterance.
  - If you require every morpheme to contain at least two segments, you could instead divide the phonological segments into these classes: b. As before, segments that can never be the first segment of a morpheme but can be any non-first segment. d. As before, segments that can never be the last segment of a morpheme but can be any non-last segment. Morphemes then look like db, ddb, dbb, ddbb, dddb, dddbb, ddbbb, etc. (one or more d's followed by one or more b's), and a morpheme boundary occurs between the b and the d of each bd sequence.
- A subset of vowels is used only in initial (or only in final) syllables, and the remaining vowels in all other syllables. Konya did this, with /e i o u/ in initial syllables and /a/ in second and subsequent syllables of a polysyllable. Or one could use pure vowels everywhere except in final syllables, which must have a diphthong; or the same with front and back, rounded and unrounded, or nasal and oral vowels.
- The initial phoneme indicates the number of syllables to follow (as in Jeff Prothero's Plan B); or indicates the number of phonemes to follow (various forms of Huffman encoding).
- Require the last segment of each morpheme to code the length of the morpheme. This has the disadvantage of requiring you to parse from the end of an utterance backward.
- All morphemes begin and end in a consonant and have no consonant clusters within them. A consonant cluster therefore marks a morpheme boundary.
- Inverse of the above: all morphemes/words begin and end in a vowel and have no sequences of two vowels within them. Two vowels in a row mark a morpheme boundary. (Ilomi uses a variation of this, with two vowels in a row marking a word boundary and /n/ between two vowels marking a morpheme boundary within a compound word.)
- A modification of either of the above methods: to avoid adjacent vowels slurring into diphthongs, or possibly difficult consonant clusters at word/morpheme boundaries, reserve a particular consonant (perhaps a glottal stop /?/, or /n/ or /l/) to mark boundaries between VCV... morphemes, or a particular vowel (perhaps schwa) to mark boundaries between CVC... morphemes.
- Tone or stress marking to distinguish a morpheme's initial (or final) syllable from the following (or preceding) ones, and perhaps to distinguish monosyllables from the initial (or final) syllables of polysyllabic words.
  - E.g., all morphemes are stressed on the initial syllable (or all on the final syllable). This requires all morphemes to be at least two syllables long, so it is probably undesirable if conciseness is a goal.
  - Or all monosyllables and the final syllables of polysyllabic morphemes have low tone, while initial and medial syllables of polysyllabic morphemes have high tone (or vice versa).
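Several of the boundary rules in the list above are simple enough to state as code. The Python sketch below implements a few of them: fixed length, the initial/subsequent (a/b) sets, the consonant-cluster and vowel-pair methods, the e/f edge-marking variant with its two-segment minimum, and a length prefix. It assumes one character per segment. The phoneme sets and the length table are hypothetical examples, not the inventories of Tceqli, Ilomi or Plan B, and the length prefix is a plain count rather than a Huffman code.

```python
from typing import Callable, Dict, List, Set

# Assumptions: one character = one segment; these phoneme sets are
# hypothetical examples, not the inventory of any language named above.
VOWELS: Set[str] = set("aeiou")
CONSONANTS: Set[str] = set("bdfgklmnprstvz")
INITIAL: Set[str] = set("bdgkptfsvz")  # plosives and fricatives = set (a)


def split_fixed(word: str, n: int) -> List[str]:
    """Fixed-length method: every morpheme has exactly n segments."""
    if len(word) % n != 0:
        raise ValueError(f"{word!r} is not a whole number of {n}-segment morphemes")
    return [word[i:i + n] for i in range(0, len(word), n)]


def split_at(word: str, is_boundary: Callable[[str, str], bool]) -> List[str]:
    """Cut between adjacent segments x, y whenever is_boundary(x, y) is true."""
    if not word:
        return []
    parts: List[str] = []
    start = 0
    for i in range(1, len(word)):
        if is_boundary(word[i - 1], word[i]):
            parts.append(word[start:i])
            start = i
    parts.append(word[start:])
    return parts


def initial_set_boundary(x: str, y: str) -> bool:
    """a/b method: a subsequent (b) segment followed by an initial (a) segment."""
    return x not in INITIAL and y in INITIAL


def cluster_boundary(x: str, y: str) -> bool:
    """CVC... morphemes with no internal clusters: any CC marks a boundary."""
    return x in CONSONANTS and y in CONSONANTS


def vowel_pair_boundary(x: str, y: str) -> bool:
    """VCV... morphemes with no internal VV: any VV marks a boundary."""
    return x in VOWELS and y in VOWELS


def split_edge_marked(word: str, edge: Set[str]) -> List[str]:
    """e/f method with a two-segment minimum; assumes well-formed input."""
    cuts = {0, len(word)}
    for i in range(len(word) - 1):
        if word[i] in edge and word[i + 1] not in edge:
            cuts.add(i)  # just before each "ef"
        if word[i] not in edge and word[i + 1] in edge:
            cuts.add(i + 2)  # just after each "fe"
    bounds = sorted(cuts)
    parts: List[str] = []
    for lo, hi in zip(bounds, bounds[1:]):
        chunk = word[lo:hi]
        if all(s in edge for s in chunk):
            # A run of "ee" morphemes: pair up from the last known boundary.
            parts.extend(chunk[k:k + 2] for k in range(0, len(chunk), 2))
        else:
            parts.append(chunk)
    return parts


def split_length_prefixed(word: str, lengths: Dict[str, int]) -> List[str]:
    """Length-prefix method: the first segment says how many segments follow."""
    parts: List[str] = []
    i = 0
    while i < len(word):
        end = i + 1 + lengths[word[i]]
        if end > len(word):
            raise ValueError(f"truncated morpheme at position {i}")
        parts.append(word[i:end])
        i = end
    return parts


if __name__ == "__main__":
    print(split_fixed("kalomitu", 2))                    # ['ka', 'lo', 'mi', 'tu']
    print(split_at("stamkolu", initial_set_boundary))    # ['stam', 'kolu']
    print(split_at("kamputel", cluster_boundary))        # ['kam', 'putel']
    print(split_at("alieko", vowel_pair_boundary))       # ['ali', 'eko']
    print(split_edge_marked("aiataou", VOWELS))          # ['ai', 'ata', 'ou']
    print(split_length_prefixed("patilkemu", {"p": 1, "t": 2, "k": 3}))
    # ['pa', 'til', 'kemu']
```

The first two variations of the a/b method fit the same `split_at` helper: `lambda x, y: y in A` cuts before each a, and `lambda x, y: x in C` cuts after each c, for whatever hypothetical sets A and C a language chooses. The e/f splitter shows why runs of ee morphemes need global parsing: it can only pair them up by counting from the last cut it found.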
All the schemes that segregate vowels into initial and non-initial (or final and non-final) sets, plus those that mark boundaries with tone or stress, need restricted phonotactics to avoid ambiguity at syllable boundaries. E.g., if syllables can be (C)V(n), then a sequence like "tánpùníkà" (with acute and grave accents marking high and low tones) could be parsed as "tánpùn-íkà" or as "tánpù-níkà", i.e. the syllable patterns could be CVN CVN V CV or CVN CV CV CV. Making the initial consonant mandatory, or forbidding "n" from occurring at the beginning of a syllable (or just at the beginning of a morpheme), would fix that; analogous but more complex problems may occur with any syllable structure that allows optional final consonants.
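The "tánpùníkà" ambiguity can be checked mechanically by enumerating every possible syllabification. This Python sketch does that by brute-force recursion over a regular expression for the syllable template. Tone marks are left out because they don't affect where syllables can be cut, and the consonant and vowel classes are assumptions chosen only to cover this example.

```python
import re
from typing import List


def syllabifications(word: str, syllable: str) -> List[List[str]]:
    """Every way to cut `word` into pieces that each fully match the regex `syllable`."""
    pattern = re.compile(syllable)

    def parses(rest: str) -> List[List[str]]:
        if not rest:
            return [[]]
        found: List[List[str]] = []
        for end in range(1, len(rest) + 1):
            head = rest[:end]
            if pattern.fullmatch(head):
                found.extend([head] + tail for tail in parses(rest[end:]))
        return found

    return parses(word)


# Hypothetical segment classes, just large enough for the example word.
C, V = "[ptkmns]", "[aeiou]"
OPTIONAL_ONSET = f"{C}?{V}n?"       # (C)V(n)
REQUIRED_ONSET = f"{C}{V}n?"        # CV(n): initial consonant mandatory
NO_N_ONSET = f"[ptkms]?{V}n?"       # (C)V(n), but "n" may not begin a syllable

if __name__ == "__main__":
    print(syllabifications("tanpunika", OPTIONAL_ONSET))
    # [['tan', 'pu', 'ni', 'ka'], ['tan', 'pun', 'i', 'ka']]
    print(syllabifications("tanpunika", REQUIRED_ONSET))
    # [['tan', 'pu', 'ni', 'ka']]
    print(syllabifications("tanpunika", NO_N_ONSET))
    # [['tan', 'pun', 'i', 'ka']]
```

With the unrestricted (C)V(n) template both parses survive; either of the two restrictions mentioned above leaves exactly one.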
Self-segregation at both morpheme and word levels
- The Ilomi method mentioned above, with /n/ marking morpheme boundaries within compound words and a sequence of two vowels marking a word boundary; or its consonantal inverse, with CVC... morphemes, a schwa or similar unstressed vowel marking morpheme boundaries within words, and consonant clusters marking boundaries between words.
- A variation on the above, with several intra-word joiner morphemes reserved for specifying the manner and/or order in which the morphemes of a compound modify each other. There could be high- and low-precedence joiners: /ipeNahumafi/ could be parsed into /ipe/, /ahu/ and /afi/, and the joiners /N/ and /m/ would then specify that /ahu/ + /afi/ modifies /ipe/, rather than /ipe/ + /ahu/ being modified by /afi/ (i.e. /m/ binds more tightly than /N/), avoiding ambiguity within compound words. Or the different joiner morphemes could specify the way the modifier morpheme applies to its head (quality, source, purpose, admixture, equal mixture, etc.); with a larger set, there could be high- and low-precedence versions of each manner joiner.
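A two-level parse of this kind can be sketched in Python as follows. It assumes Ilomi-style words that begin and end in a vowel (so two adjacent vowels end a word) and treats /N/ and /m/ from the /ipeNahumafi/ example as hypothetical joiners, with /m/ binding more tightly than /N/. The grouping step is ordinary operator-precedence combination: it records which morphemes group together, not which member is the head.

```python
from typing import Dict, List, Tuple, Union

VOWELS = set("aeiou")
# Hypothetical joiners: a larger number binds more tightly.
# Plain Ilomi would just use {"n": 1}.
JOINERS: Dict[str, int] = {"N": 1, "m": 2}

Tree = Union[str, Tuple["Tree", str, "Tree"]]


def split_words(utterance: str) -> List[str]:
    """Words begin and end in a vowel, so two adjacent vowels mark a word boundary."""
    words: List[str] = []
    start = 0
    for i in range(1, len(utterance)):
        if utterance[i - 1] in VOWELS and utterance[i] in VOWELS:
            words.append(utterance[start:i])
            start = i
    words.append(utterance[start:])
    return words


def split_morphemes(word: str) -> Tuple[List[str], List[str]]:
    """A joiner consonant between two vowels marks a morpheme boundary."""
    morphemes: List[str] = []
    joiners: List[str] = []
    start = 0
    for i in range(1, len(word) - 1):
        if word[i] in JOINERS and word[i - 1] in VOWELS and word[i + 1] in VOWELS:
            morphemes.append(word[start:i])
            joiners.append(word[i])
            start = i + 1
    morphemes.append(word[start:])
    return morphemes, joiners


def group(morphemes: List[str], joiners: List[str]) -> Tree:
    """Merge around the tightest-binding joiner first (leftmost on ties)."""
    items: List[Tree] = list(morphemes)
    ops = list(joiners)
    while ops:
        k = max(range(len(ops)), key=lambda j: JOINERS[ops[j]])
        items[k:k + 2] = [(items[k], ops[k], items[k + 1])]
        del ops[k]
    return items[0]


if __name__ == "__main__":
    for w in split_words("ipeNahumafiuko"):
        print(w, "->", group(*split_morphemes(w)))
    # ipeNahumafi -> ('ipe', 'N', ('ahu', 'm', 'afi'))
    # uko -> uko
```

Swapping the two joiners (/ipemahuNafi/) would instead group /ipe/ + /ahu/ first, which is exactly the ambiguity the precedence distinction is meant to resolve.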
And Rosta's Livagian uses another method which, though not a self-segregating morphology in the strict sense, partly serves the same purpose with fewer restrictions on the phonological shape of words. It does, however, require full knowledge of the lexicon to parse unambiguously. The key is that no actual morpheme may look like a prefix or suffix substring of another actual morpheme. So, for instance, if in the string "kesumalipe" you recognize "kesu" and "pe" as familiar morphemes, you know that the string must be "kesu", then either "mali" or "ma" + "li", then "pe": since "kesu" is a real morpheme in a language meeting this criterion, there cannot be a morpheme "kesuma" or "kesumali", and since "pe" is real, there can't be a morpheme like "lipe" or "malipe". But if you have only learned the phonology of the language and don't know much vocabulary yet, you can't deduce the morpheme boundaries from the phonotactics of the word; you would have to start by looking up "k" in the lexicon, then "ke", then "kes", until you find "kesu", then start looking for "m", "ma", etc.
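The lookup procedure just described amounts to greedy decoding of a prefix code. Here is a minimal Python sketch, with a hypothetical four-morpheme lexicon standing in for real Livagian vocabulary. It checks the prefix/suffix criterion, then parses by looking up successively longer prefixes.

```python
from typing import List, Set


def affix_free(lexicon: Set[str]) -> bool:
    """True if no morpheme is a prefix or a suffix of a different morpheme."""
    return not any(
        a != b and (b.startswith(a) or b.endswith(a))
        for a in lexicon
        for b in lexicon
    )


def parse_with_lexicon(word: str, lexicon: Set[str]) -> List[str]:
    """Look up longer and longer prefixes ("k", "ke", "kes", ...) until one is a morpheme.

    Because no morpheme is a prefix of another, the first match is the only
    possible one, so this greedy scan never has to back up.
    """
    morphemes: List[str] = []
    start = 0
    for end in range(1, len(word) + 1):
        if word[start:end] in lexicon:
            morphemes.append(word[start:end])
            start = end
    if start != len(word):
        raise ValueError(f"no morpheme matches {word[start:]!r}")
    return morphemes


LEXICON = {"kesu", "mali", "pe", "to"}  # hypothetical vocabulary

if __name__ == "__main__":
    assert affix_free(LEXICON)
    print(parse_with_lexicon("kesumalipe", LEXICON))  # ['kesu', 'mali', 'pe']
```

Only the prefix half of the criterion is needed for this left-to-right scan; the suffix half is what would let the same scan run from right to left.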