Appendix 2: Input Format¶
Fields marked with ** are required. Text fields are limited to 100 characters except for:
textoriginal
,textmkup
andcasevalues
have no length limitation- the three comment fields
collcmt
,textcmt
andcasecmt
can be up to 500 characters - coder ID, which is limited to 32 characters.
Collection fields¶
- collid
- Collection ID, which needs to be unique within the workspace. If this is not provided in the file, collfilename is assigned by the program
- collfilename
- directory and name of the YAML file (without the suffix) where the file was read from; this is assigned by the program
- colldate
- collection date YYYY-MM-DD
- **colledit
- datetime of editing of this collection [provided by system]
- collcmt
- collection comments
- categories [optional]
- categories and items for dynamic selection menus (dynselect)
- texts
- one or more related texts
- cases
- zero or more coded records
Category fields¶
These are all indented: the first line is the category name followed by a required colon (:). This is followed by the
menu options, one per line preceded by an indent and a hyphen-space (- ``). If the menu option begins with an asterisk (
*``)
it is the default value for the menu. The following figure shows an example of menu items specified for three categories,
statecat
,``torgcat`` and loccat.
Text fields¶
- **textid
- unique text ID for CIVET. This needs to be unique within the
workspace, and given how collections might get mixed across
workspace folders, ideally should be unique for the entire project.
If a value for the
text
field is not provided it will be assigned by the program. - **textdate
- text date YYYY-MM-DD
- textdelete:
- Boolean: text has been marked for deletion.
- textpublisher
- publisher [any string]
- textpubid
- publisher ID [any string]
- textbiblio
- bibliographic citation
- textgeogloc
- geographical locations
- textauthorr
- author [any string]
- textlang
- language
- textlicense
- copyright notification or other license information
- **textlede
- lede/headline/abstract—this is a short summary of the article which will be highlighted and also will appear in the sorting routine.
- textcmt
- comment
- **textoriginal
- original text of the story; this will not be modified by the system
- textmkup
- marked up text: this is the annotated version of the story with any mark-up that has been added either automatically on manually
- textmkupdate
- datetime time of editing of this block [provided by system]
- textmkupcoder
- coder ID
Case fields¶
- ** caseid
- Internal case/event ID. This is assigned by the program and probably should not be changed; external IDs can be entered as variables.
- ** casedate
- Date and time this case was coded [provided by system]
- casecmt
- comment for case
- casecoder
- coder ID
- casevalues
- This is a string formatted as a Python dictionary which contains pairs of variable names and values
Date formats¶
[This has not been consistently implemented in Beta-0.9]
Dates are ISO-8601 (http://en.wikipedia.org/wiki/ISO_8601; http://www.w3.org/TR/NOTE-datetime; https://xkcd.com/1179/; http://www.cl.cam.ac.uk/mgk25/iso-time.html) so generally either
- YYYY-MM-DD
- YYYY-MM-DDThh:mm:ss
- YYYY-MM-DDThh:mm:ss[+-]hh:mm
UTF-8 Encodings¶
The system currently translates UTF-8
encodings to ASCII using the
Django function encoding.smart_str()
. We expect to eventually
convert CIVET to Python 3.x (at present it is Python 2.7) which
is UTF-8 “native” but it isn’t there yet, so you are best off doing
your own conversions during the process of converting the original
texts to the YAML formatting.
Sample File¶
The following figure shows an example of a simple YAML file; This is a screen capture of a file being edited with BBEdit, hence the color mark-up. A workspace demonstration file with several collections can also be downloaded in the program.