Appendix: Instructions for updating FAST data sets

This appendix shows the current process we go through to update the data sets on a quarterly basis. A number of the steps are idiosyncratic to our specific technical environment, so this should be seen as illustrative only. Over time we have gradually simplified the process by automating various tasks -- for example, the calculation of the mediation scores used to involve three separate runs of a program originally developed for the research reported in Schrodt et al. (2001), but that program was subsequently modified to compute the statistics in a single run. All of the non-commercial computer programs used here are open source; most are available on the KEDS project web site, and the others can be provided by the authors on request.

Acquiring AFP text

  1. In a browser, go to the LEXIS-NEXIS page on the KU library electronic data sources page:
  2. Click "Sources" in the blue bar that runs across the top of the page. This will take you to a page titled "Source List".
  3. In the Source List "Find Title" box, enter "Agence France Presse" and click "Find Title". This will produce a list of several AFP sources.
  4. Click the "Search this title" link of the first option (English) in the list. This will take you to a standard NEXIS search page, except that it is now set to search only AFP.
  5. Enter the search strings below.

    israel or palestin! or syria or jordan or lebanon or egypt or plo

    This string (and other search strings used in the KEDS project) can be found in the www.input file that will be used below.

  6. Change the "in" selection to "Full text".
  7. Enter a series of dates that will give you fewer than 1000 stories; this is the limit on the stories returned by a NEXIS search on the Academic-Universe site. Sometimes this takes a bit of experimenting, but a search period of two weeks generally works. Click "Search".
  8. Click the first story in the resulting list. Copy the address, which will look something like _m=69565bab69eb293ee7c03173007d31bd&_docnum=1&wchp=dGLbVzb-lSlzV &_md5=7178503bdb21eac6c35eb90a4d666814, so be sure you get the whole thing. (Netscape messed this up in the past; MS-Explorer, Safari and Mozilla seem to work fine.)
  9. Use BBEdit (a text editing program for the Macintosh) to open the file "www.input" in the "download.dir" directory on the KU Unix server raven. Paste the address you just copied into the first line of this file (there can be more material in the file; the program reads only the first line).
  10. Log onto the raven account. Enter
    cd download.dir [puts you into the download subdirectory]
    perl [runs the nexispider program]
  11. The program will ask for a file prefix; enter AFPLE and hit [return]
  12. You should see the stories start scrolling by; this goes fairly quickly, pausing periodically when there is a delay retrieving a story.

    [You'all young whippersnappers should know that in the old days, we had to do this over a phone line with a 300 baud modem. Before we left to walk to school 5 miles barefoot in the snow uphill both ways in July]

    If the program stops immediately you probably gave it the URL of the NEXIS index, rather than the URL of the first story; return to step [8].

  13. Tricks if you are downloading a bunch of stories:

    a. You can log into raven multiple times and have several copies of nexispider running simultaneously (this is why we use Unix systems rather than MS-Windows...). nexispider automatically assigns non-conflicting names to the resulting files, based on the prefix you entered in step [11] and the beginning and end of the download (it gets this information from the file; you don't need to enter it).

    b. As soon as nexispider starts in any window, you can change the www.input file, putting a new URL at the top of the file.

    c. Click the "Edit search" option in the top bar of the NEXIS window to change the dates of a search.

  14. Now use the Unix "mv" command to move the file(s) to the appropriate subdirectory, e.g.
    mv AFPLE.020601-020615 text_files/levant/yr2003/
    where "text_files/levant/yr2003/" is the year 2003 subdirectory inside the levant subdirectory inside text_files, which is how we have things organized at the moment.
  15. Hint: you can move a bunch of files at once using a wild-card:
    mv AFPLE.* text_files/levant/yr2003/

  16. Move into that directory using the "cd" command:
    cd text_files/levant/yr2003/
  17. Copy the listing of the subdirectory into a file named "filelist", e.g. with the command
    ls > filelist
  18. Open the file "filelist" in BBEdit using BBEdit's "Open from FTP server..." menu option (you'll need to go through the intermediate directories to get there) and delete any file names that are not AFP downloads (these will include "filelist" itself and possibly some other left-over junk). The remaining files will be in chronological order, assuming they still have the names given to them by nexispider.
  19. Run the reformatting program, which combines all of the files, puts the entries into correct chronological order (they are in reverse order in the NEXIS output), and deletes everything except the lead (first) sentence of each story. The output will be in a file with a ".rev" suffix; the original files are left unchanged.
  20. Change the name of the output file to something informative. This file is now ready to be run through TABARI.
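
The retrieval in steps [8]-[12] can be sketched as follows. This is a hedged illustration, not the actual nexispider code: it assumes the spider works by substituting successive _docnum values into the URL of the first story (the parameter visible in the address copied in step [8]) and by naming its output file from the prefix entered in step [11] plus the download dates.

```python
import re

def story_url(first_url, docnum):
    """Substitute a new _docnum value into the URL of the first story."""
    return re.sub(r"_docnum=\d+", "_docnum=%d" % docnum, first_url)

def output_name(prefix, start_date, end_date):
    """Build an output file name of the form AFPLE.020601-020615."""
    return "%s.%s-%s" % (prefix, start_date, end_date)

first = "_m=69565bab&_docnum=1&wchp=dGLbVzb"
print(story_url(first, 7))                       # _m=69565bab&_docnum=7&wchp=dGLbVzb
print(output_name("AFPLE", "020601", "020615"))  # AFPLE.020601-020615
```

The actual program also fetches each generated URL and waits out the occasional NEXIS delay, which is why the scrolling pauses periodically.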
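
The combine-and-reverse step just above can be sketched like this. The function and record layout are hypothetical (the real program parses NEXIS-formatted text files); the point is the logic: files are processed in chronological order, stories within each file are reversed because NEXIS returns them newest-first, and only the lead sentence of each story is kept.

```python
def reverse_and_trim(files):
    """files: downloaded files in chronological order; each file is a list of
    stories (newest first, as NEXIS returns them); each story is a list of
    sentences. Returns the lead sentences in chronological order."""
    leads = []
    for stories in files:
        for story in reversed(stories):   # undo the reverse-chronological order
            leads.append(story[0])        # keep only the lead sentence
    return leads

files = [
    [["Story 2 lead.", "detail"], ["Story 1 lead.", "detail"]],  # first file
    [["Story 4 lead."], ["Story 3 lead.", "detail"]],            # second file
]
print(reverse_and_trim(files))
# ['Story 1 lead.', 'Story 2 lead.', 'Story 3 lead.', 'Story 4 lead.']
```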

Coding and Aggregating

  1. Go to Schrodt's G4 and go into the Data/FAST Data folder. This contains a set of folders with names of the form FAST YYYY-MM. Each folder contains CAMEO and WEIS dictionaries for doing the update, and a file containing the data up to the end of the previous update (this will be data from the beginning of the current year). There should also be a copy of the TABARI automated coding program in there somewhere. Make a new copy of this folder and rename it with the last month (and year, if starting a new year) of the data.
  2. Open the new reversed data and the previously updated data, and append the new data to the old. Rename this data set AFPLE.YYYY-MM.L (e.g. AFPLE.2003-07.L), where MM is the last month of the data. (The ".L" suffix indicates that these are leads only.)
  3. Move the copy of TABARI and the new text file into the CAMEO folder for the relevant region. The text file will probably replace an existing file, unless you've switched to a new year.
  4. Update the project file by modifying the name of the text file it points to (adjusting the month), and then also change the name of the project file itself (updating the month).
  5. Run TABARI with the project file that is in the folder; A)utocode the new file. Rename the output file LVNT.WEIS.YYYY-MM (e.g. LVNT.WEIS.2003-07). Delete the earlier version of the file (there is still a copy in the folder you copied the directory from).
  6. Run this file through the One-A-Day Filter program; the output file will have a .filt suffix.
  7. Run the Trim_Events program on the filtered file; delete the .filt infix so that .trim remains as the suffix.
  8. Copy the trimmed file into the K_Count folder in FAST Data. Open the file Goldstein.FAST.TAB and edit the "Prefix" and "Files" fields to correspond to the new yy-mm combination.
  9. Run the K-Count program. Make a new folder FASTyy-mm and put all of the output files (which also contain Israel-Lebanon information), plus the .trim file, in that folder. Copy the PAL>ISR and ISR>PAL files back to the FAST YYYY-MM folder.
  10. Make a copy of the file FAST.YYMM.xls (e.g. FAST.0307.xls) and update the final year-month infix. Open this file in MS-Excel.
  11. Open the PAL>ISR and ISR>PAL files in BBEdit and transfer the results into the "Goldstein total" and "Goldstein average" cells, adding months as needed. (Calculate the averages using a calculator from the Goldstein score and the number of events per month; both are reported in the K-Count output files.) Select (i.e. click on) the various lines in the embedded graphs and update the final cell of each series; note that this involves updating both the $A entry (to get the labels) and the $B/$C entry (to get the numbers).
  12. Run the FAST_Mediation program with the *.trim file as input and YY-MM as the suffix. The file Mediate.out.YY-MM (e.g. Mediate.out.03-07) contains the counts of mediation events for the UN, USA, and Europe. (Note: enter just the suffix, not the entire file name, since the latter will cause a buffer overflow and freeze the computer...)
  13. Enter these figures into the "Mediation" worksheet of the FAST.YYMM.xls file; update the final cells of the series.
  14. Update the Documentation worksheet in the spreadsheet.
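
The One-A-Day Filter in step [6] removes duplicate reports of the same event. A minimal sketch of the idea, assuming each event record reduces to a (date, source, target, event-code) tuple: only the first occurrence of each identical combination on a given day is kept.

```python
def one_a_day(events):
    """Keep only the first occurrence of each identical
    (date, source, target, event-code) combination."""
    seen = set()
    kept = []
    for event in events:
        key = event[:4]          # date, source, target, event code
        if key not in seen:
            seen.add(key)
            kept.append(event)
    return kept

events = [
    ("030701", "ISR", "PAL", "223"),
    ("030701", "ISR", "PAL", "223"),   # duplicate report on the same day: dropped
    ("030701", "PAL", "ISR", "223"),   # different dyad: kept
    ("030702", "ISR", "PAL", "223"),   # same combination, new day: kept
]
print(len(one_a_day(events)))          # 3
```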
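
The Goldstein totals and averages entered in step [11] come straight from the K-Count output, but the arithmetic can be sketched as follows. The weights shown are illustrative placeholders, not the published Goldstein scale; the monthly average is simply the monthly total divided by the number of events in that month.

```python
from collections import defaultdict

GOLDSTEIN = {"223": -10.0, "054": 3.0}   # illustrative weights only

def monthly_goldstein(events):
    """events: (yymmdd, source, target, code) tuples.
    Returns {(source, target, yymm): (total, count, average)}."""
    acc = defaultdict(lambda: [0.0, 0])
    for date, source, target, code in events:
        key = (source, target, date[:4])            # date[:4] is the yymm month
        acc[key][0] += GOLDSTEIN.get(code, 0.0)     # running Goldstein total
        acc[key][1] += 1                            # running event count
    return {k: (t, n, t / n) for k, (t, n) in acc.items()}

events = [
    ("030701", "PAL", "ISR", "223"),
    ("030715", "PAL", "ISR", "054"),
]
print(monthly_goldstein(events))
# {('PAL', 'ISR', '0307'): (-7.0, 2, -3.5)}
```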

Updating the casualty data

  1. In a browser, go to
    click the "Statistics" link, then, on that page, the "Monthly Totals" link under "Fatalities". (Note that the data may or may not have been compiled to the end of the month; if it hasn't, wait a few days until it has been.)
  2. Get the total of the following figures:
  3. Total these and enter the result in the "Deaths" worksheet of the FAST.YYMM.xls file.

  4. Go to
    Again, note that the data may or may not have been compiled to the end of the most recent month, although PRCS usually does daily updates.
  5. Copy the appropriate monthly totals from the "Total Deaths" column. Enter this total in the "Deaths" worksheet of the FAST.YYMM.xls file; update the final cells of the series.

Updating the WEIS data

  1. Make a copy of the Data/ folder and rename it with the new final date.
  2. The folder for updating the .summary file is currently Vinland G3/Programming/K_Count.03. Move the LVNT.WEIS.yyyy-mm.trim file into it; currently this means moving across volumes on the G3, so a copy will be made automatically.
  3. Edit the Goldstein.TAB file, changing the lines
    PREFIX=LVyy-mm. [near top]
    LVNT.WEIS.yyyy-mm.trim [near bottom]
  4. Run the program KEDS_Count, which is the product of an ancient Pascal compiler; it takes forever to run and produces a very large number of individual dyadic files.
  5. Edit the file merge.files to correspond to the new prefix. Run the program, which produces the file merge.output. This contains the new summary information.
  6. Open the .summary file and merge.output in BBEdit. Cut and paste the new data into the .summary file, and change the name. It wouldn't hurt to read this into Excel and plot a few of the series to make sure nothing weird has happened.
  7. Open LVNT.WEIS.yyyy-mm.filt and append the new records from it to LEVANT.WEIS.
  8. Edit Levant.WEIS.ReadMe to reflect the new dates (these occur in three or four different places, and this probably needs some additional rewriting). Also update the total event count.
  9. Stuff and zip the files using the StuffIt utility; when making the .zip version, change the preferences so that text files are converted to MS-DOS line endings.
  10. Upload to the KEDS web site, and change the HTML text to reflect the new dates and event totals. Test the downloads to make sure they work correctly.
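
The line-ending conversion mentioned in step [9] (so the .zip archive opens cleanly on MS-Windows) amounts to rewriting Unix (LF) or old Macintosh (CR) line endings as CRLF. StuffIt handles this via its preferences; a hypothetical sketch of the conversion itself:

```python
def to_msdos(text):
    """Convert any mix of CR, LF, or CRLF line endings to CRLF."""
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")  # unify first
    return normalized.replace("\n", "\r\n")                      # then emit CRLF

print(repr(to_msdos("line one\nline two\r")))   # 'line one\r\nline two\r\n'
```

Normalizing before expanding avoids turning an existing CRLF into CRCRLF.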

Last update: 30 September 2004