Conlang Documentation

Revision as of 08:16, 20 July 2025


This page offers strategies for documenting your conlang as well as tools for doing so.

All conlangs start out as rough outlines, perhaps a single page with a hastily scribbled phonology table and a few inflectional paradigms. These are called sketchlangs. Many, if not most, will never go beyond this stage. That's not a bad thing. However, if you intend to keep working on a language long-term, you need to develop a systematic way of writing documentation. There are at least two, and preferably three, things you'll need to keep track of: your grammar, your lexicon, and example texts. The example texts may be kept as a separate document or integrated into the other two.

You can use any word processor or text editor for the grammar, but keep in mind that file formats can be deprecated and software isn't supported forever. Whatever you use should be able to produce headings, to organize the document into sections for phonology, syntax, etc., and tables, to illustrate inflectional paradigms (if the language is not analytic) or example glosses. It goes without saying that it should also be able to render whatever characters you've chosen for your Romanization, or for the native script if you aren't using the Latin alphabet. Markdown is particularly suitable for this task, as the bare markup is intended to be presentable as-is without being rendered, and it supports both headings and (in most dialects) tables. You can write a plaintext document in Markdown and use any number of converters to render it as HTML, PDF, etc. Whatever you use, make sure it can be converted into other formats in the event you change platforms or otherwise lose access to the software.
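As a rough illustration of the Markdown tables mentioned above, the following Python sketch renders a small paradigm as a pipe-style table, the syntax most Markdown dialects with table support accept. The function name `markdown_table` and the noun forms are hypothetical, invented only for this example.

```python
def markdown_table(headers, rows):
    """Render headers and rows as a pipe-style Markdown table."""
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(row) + " |")
    return "\n".join(lines)


# Hypothetical paradigm for an invented noun stem; not from any real conlang.
headers = ["Case", "Singular", "Plural"]
rows = [
    ["Nominative", "kal", "kali"],
    ["Accusative", "kalu", "kalim"],
]

table = markdown_table(headers, rows)
print(table)
```

The output stays readable as plain text, which is the point of choosing Markdown in the first place.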

For the lexicon, it's best to use a format that can be filtered and queried. While most plaintext editors and word processors have a find function, it merely searches for matches in an unstructured manner. Many conlangers use a spreadsheet application like Excel, Google Sheets, LibreOffice Calc, etc. It's a good idea to organize it like a dictionary, with columns for the lemma, part of speech, definition, and, if you care to, etymology. Most of these programs allow you to filter columns to search for a particular word or definition. However, spreadsheets make it hard to write lengthy definitions or usage notes.
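To make the difference between filtering a column and a plain text search concrete, here is a minimal Python sketch that assumes the spreadsheet has been exported as CSV with the columns suggested above. The column names, the helper functions, and the second row are assumptions for illustration; only the entry for qnpq is taken from the Commonthroat example later in this article.

```python
import csv
import io

# Stand-in for a spreadsheet exported as CSV.
# The "xxxx" row is a hypothetical placeholder, not a real lexicon entry.
LEXICON_CSV = """lemma,partOfSpeech,definition,etymology
qnpq,verb,count,
xxxx,noun,placeholder entry,
"""


def load_lexicon(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def filter_entries(entries, column, needle):
    """Return entries whose given column contains needle (case-insensitive)."""
    needle = needle.lower()
    return [entry for entry in entries if needle in entry[column].lower()]


entries = load_lexicon(LEXICON_CSV)
verbs = filter_entries(entries, "partOfSpeech", "verb")
print([entry["lemma"] for entry in verbs])
```

Because the search is restricted to one column, looking for "verb" finds verbs only, and not every entry whose definition happens to contain the word.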

If you want to be more verbose but still have the ability to look up entries easily later, a personal wiki is a good alternative. It has the additional advantage of letting you link between entries, for example to reference other terms in etymologies or to list derived terms. Obsidian is a good choice since it stores notes as individual Markdown files, though keep in mind that certain minutiae of how it operates make it hard to use diacritics in file names or to use a case-sensitive Romanization, as is done with Klingon.

TiddlyWiki is great if you want to share your work online. It's a single monolithic HTML file that can be uploaded to a bare-bones hosting service like Neocities. Its naming scheme for articles is case-sensitive, and it's very customizable if you're willing to endure a steep learning curve. And if you really want to, there are always full wikis like DokuWiki or MediaWiki (which is what FrathWiki uses).

You can also use a plaintext data interchange format to organize lexicon entries. This has the double advantage of being human-readable in whatever text editor you care to use and of being structured in a way that can be parsed by any number of programming languages with the appropriate libraries. Three such formats are widely supported, having existed for over 20 years at the time of writing: XML, JSON, and YAML. Of these, XML is the oldest, but its syntax is quite verbose compared to the other two. YAML is the most human-readable, but has somewhat less support than JSON. JSON is probably the sweet spot between breadth of support and human-readability. Converting between JSON and YAML is trivial, so you can write a lexicon in YAML and convert it to JSON for processing. A fourth format similar to YAML, NestedText, is perhaps the best suited to the needs of a conlang lexicon. It has no scalar data types other than string, meaning you don't need to worry about when to quote values, but it has very little support.
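As a sketch of what parsing with the appropriate libraries can look like, the following Python snippet loads a JSON lexicon using only the standard library and builds a lookup table by lemma. It assumes the lexicon is stored as a JSON array of entry objects with the field names used in this article's Commonthroat example; that layout is an assumption, not a standard. YAML is left out because Python's standard library has no YAML parser, so it would need a third-party package.

```python
import json

# A lexicon stored as a JSON array of entry objects (assumed layout).
LEXICON_JSON = """
[
  {
    "lemma": "qnpq",
    "pronunciation": "h1233h",
    "partOfSpeech": "verb",
    "definition": "count"
  }
]
"""

entries = json.loads(LEXICON_JSON)

# Index the entries by lemma for direct lookup.
by_lemma = {entry["lemma"]: entry for entry in entries}
print(by_lemma["qnpq"]["definition"])

# Serialize back to text; ensure_ascii=False keeps diacritics and other
# non-ASCII characters readable instead of escaping them.
round_tripped = json.dumps(entries, ensure_ascii=False, indent=2)
```

Writing the file back with `ensure_ascii=False` matters for conlangs whose Romanization uses characters outside ASCII, since the saved file then stays legible in a plain text editor.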

A detailed explanation of how these formats work is outside the scope of this article, but the following examples, which show one entry from the Commonthroat lexicon rendered in each format, may give an idea of what they look like.

JSON:

{
  "lemma": "qnpq",
  "pronunciation": "h1233h",
  "partOfSpeech": "verb",
  "definition": "count"
}

XML:

<entry>
  <lemma>qnpq</lemma>
  <pronunciation>h1233h</pronunciation>
  <partOfSpeech>verb</partOfSpeech>
  <definition>count</definition>
</entry>

YAML/NestedText:

- lemma: qnpq
  pronunciation: h1233h
  partOfSpeech: verb
  definition: count
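To show that these renderings carry the same data, here is a minimal Python sketch that reads the entry from XML and from JSON with the standard library and compares the results. The helper `entry_from_xml` is hypothetical, and the XML element names are assumed to match the JSON keys exactly, including the capitalization of `partOfSpeech`. XML keeps any whitespace around element text, so the sketch strips it.

```python
import json
import xml.etree.ElementTree as ET

ENTRY_XML = """
<entry>
  <lemma>
    qnpq
  </lemma>
  <pronunciation>h1233h</pronunciation>
  <partOfSpeech>verb</partOfSpeech>
  <definition>count</definition>
</entry>
"""

ENTRY_JSON = """
{
  "lemma": "qnpq",
  "pronunciation": "h1233h",
  "partOfSpeech": "verb",
  "definition": "count"
}
"""


def entry_from_xml(text):
    """Turn a flat <entry> element into a dict of tag name to stripped text."""
    root = ET.fromstring(text.strip())
    return {child.tag: (child.text or "").strip() for child in root}


xml_entry = entry_from_xml(ENTRY_XML)
json_entry = json.loads(ENTRY_JSON)
print(xml_entry == json_entry)
```

Once every format is reduced to the same dictionary, the rest of a lookup or export script does not need to care which format the lexicon was written in.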


Whatever you end up using, there are some things to keep in mind. First, back up, back up, back up! A lot of hard work goes into making a conlang, and you don't want to lose it. Do not use a forum thread as your primary means of documentation, and do not store your work only in the cloud: it has the same problem as proprietary document formats, in that support isn't guaranteed forever and you may someday lose access. Second, write as though you have an audience. A language is a complex system and may change a lot as you keep working on it, so it can quickly become a black box even to you if you come back after a long break or run into a text you wrote long ago. Use complete sentences, and write lots of example texts to illustrate grammatical features. Documentation in your native language (if not English), or even in the conlang itself, is also welcome; writing in the conlang can show you where it still needs work.