Appendix 2: Input Format

Fields marked with ** are required. Text fields are limited to 100 characters except for:

  • textoriginal, textmkup and casevalues have no length limitation
  • the three comment fields collcmt, textcmt and casecmt can be up to 500 characters
  • coder ID, which is limited to 32 characters.
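
The limits above can be checked before a file is loaded. The following is a minimal sketch, not CIVET's own validation code: the function names are invented for illustration, and because this appendix does not spell out the names of the coder ID fields, they are passed in as a parameter.

```python
# Limits taken from the text above. The helper names are hypothetical.
DEFAULT_LIMIT = 100
UNLIMITED_FIELDS = {"textoriginal", "textmkup", "casevalues"}
COMMENT_FIELDS = {"collcmt", "textcmt", "casecmt"}
COMMENT_LIMIT = 500
CODER_LIMIT = 32


def max_length(field, coder_fields=frozenset()):
    """Return the character limit for a field, or None if it is unlimited."""
    if field in UNLIMITED_FIELDS:
        return None
    if field in COMMENT_FIELDS:
        return COMMENT_LIMIT
    if field in coder_fields:
        # Assumption: the caller supplies the coder ID field names.
        return CODER_LIMIT
    return DEFAULT_LIMIT


def too_long(record, coder_fields=frozenset()):
    """List the fields of a {field: text} dict whose values exceed their limit."""
    problems = []
    for field, value in record.items():
        limit = max_length(field, coder_fields)
        if limit is not None and len(value) > limit:
            problems.append(field)
    return problems
```

Calling too_long() on each block of a file gives a quick list of fields to shorten before import.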

Collection fields

Collection ID, which needs to be unique within the workspace. If this is not provided in the file, collfilename is assigned by the program
directory and name of the YAML file (without the suffix) where the file was read from; this is assigned by the program
collection date YYYY-MM-DD
datetime of editing of this collection [provided by system]
collection comments
categories [optional]
categories and items for dynamic selection menus (dynselect)
one or more related texts
zero or more coded records

Category fields

These are all indented: the first line is the category name followed by a required colon (:). This is followed by the menu options, one per line, each preceded by an indent and a hyphen-space (- ). If the menu option begins with an asterisk (*), it is the default value for the menu. The following figure shows an example of menu items specified for three categories: statecat, torgcat and loccat.

Example of categories
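
The category layout described above can also be read with a few lines of Python. This is a sketch under stated assumptions, not the parser CIVET uses: the function name is hypothetical, the menu values in the sample are invented (only the category names statecat and loccat come from the text), and it assumes the leading asterisk is a marker rather than part of the stored value.

```python
def parse_categories(lines):
    """Parse category lines into {name: {"options": [...], "default": ...}}."""
    categories = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("- "):
            if current is None:
                raise ValueError("menu option before any category: %r" % raw)
            option = line[2:].strip()
            if option.startswith("*"):
                # Assumption: the asterisk only flags the default option.
                option = option[1:].strip()
                categories[current]["default"] = option
            categories[current]["options"].append(option)
        elif line.endswith(":"):
            # Category name followed by the required colon.
            current = line[:-1].strip()
            categories[current] = {"options": [], "default": None}
        else:
            raise ValueError("unrecognized category line: %r" % raw)
    return categories


# Invented sample values, for illustration only.
sample = """
    statecat:
        - *open
        - closed
    loccat:
        - north
        - south
"""
parsed = parse_categories(sample.splitlines())
```

Here parsed["statecat"]["default"] holds the starred option, and a category with no starred option keeps a default of None.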

Text fields

unique text ID for CIVET. This needs to be unique within the workspace, and given how collections might get mixed across workspace folders, ideally should be unique for the entire project. If a value for the text field is not provided it will be assigned by the program.
text date YYYY-MM-DD
Boolean: text has been marked for deletion.
publisher [any string]
publisher ID [any string]
bibliographic citation
geographical locations
author [any string]
copyright notification or other license information
lede/headline/abstract: this is a short summary of the article, which will be highlighted and will also appear in the sorting routine.
original text of the story; this will not be modified by the system
marked up text: this is the annotated version of the story with any mark-up that has been added either automatically or manually
datetime of editing of this block [provided by system]
coder ID

Case fields

** caseid
Internal case/event ID. This is assigned by the program and probably should not be changed; external IDs can be entered as variables.
** casedate
Date and time this case was coded [provided by system]
comment for case
coder ID
This is a string formatted as a Python dictionary which contains pairs of variable names and values
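
Because casevalues is a string formatted as a Python dictionary, it can be turned back into a dictionary with the standard library. The sketch below uses ast.literal_eval, which accepts only literals and so does not execute code embedded in the string. Whether CIVET itself reads the field this way is not stated here, and the variable names and values in the example are hypothetical.

```python
import ast


def parse_casevalues(text):
    """Turn a casevalues string into a dict without executing arbitrary code."""
    value = ast.literal_eval(text)
    if not isinstance(value, dict):
        raise ValueError("casevalues is not a dictionary: %r" % text)
    return value


# Hypothetical variable names and values, for illustration only.
values = parse_casevalues("{'actor': 'police', 'deaths': '3'}")
```

A malformed string raises ValueError or SyntaxError rather than returning a partial result, which makes bad records easy to spot.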

Date formats

[This has not been consistently implemented in Beta-0.9]

Dates are ISO-8601, so generally either

  • YYYY-MM-DDThh:mm:ss
  • YYYY-MM-DDThh:mm:ss[+-]hh:mm
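
Both timestamp forms, and the plain YYYY-MM-DD form used for the collection and text dates, can be read and written with the standard library. This is a sketch that assumes Python 3.7 or later (CIVET itself is Python 2.7 at present); the helper names and the sample timestamps are hypothetical.

```python
from datetime import date, datetime, timedelta


def parse_timestamp(text):
    """Parse YYYY-MM-DDThh:mm:ss with or without a [+-]hh:mm offset."""
    return datetime.fromisoformat(text)


def parse_date(text):
    """Parse a plain YYYY-MM-DD date."""
    return date.fromisoformat(text)


# Invented sample values, for illustration only.
naive = parse_timestamp("2015-03-14T09:26:53")
aware = parse_timestamp("2015-03-14T09:26:53-05:00")
day = parse_date("2015-03-14")

# isoformat() writes the value back in the same layout.
round_trip = aware.isoformat()
```

A timestamp without an offset comes back with no time zone attached, so the two forms should not be compared directly without deciding which zone the naive value refers to.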

UTF-8 Encodings

The system currently translates UTF-8 encodings to ASCII using the Django function encoding.smart_str(). We expect to eventually convert CIVET to Python 3.x, which is UTF-8 “native” (at present it is Python 2.7), but it isn’t there yet, so you are best off doing your own conversions while converting the original texts to the YAML format.
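
One way to do such a conversion yourself is sketched below. It is written for Python 3 as a separate preprocessing step and is not part of CIVET; the function name is hypothetical and the small punctuation table is an assumption that you would extend for your own sources. Accented letters are decomposed and their accents dropped, and any remaining non-ASCII character is discarded.

```python
import unicodedata

# Punctuation with no ASCII decomposition. This table is an assumption;
# extend it to cover whatever appears in your own texts.
PUNCTUATION = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en and em dashes
}


def to_ascii(text):
    """Return an ASCII-only approximation of a Unicode string."""
    for src, dst in PUNCTUATION.items():
        text = text.replace(src, dst)
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and anything else outside ASCII.
    return decomposed.encode("ascii", "ignore").decode("ascii")
```

Because characters without a mapping are dropped silently, it is worth comparing the length of the input and output on a sample of texts before converting a whole collection.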

Sample File

The following figure shows an example of a simple YAML file. This is a screen capture of a file being edited with BBEdit, hence the color mark-up. A workspace demonstration file with several collections can also be downloaded in the program.

YAML file