This appendix describes the process we currently use to update the data sets each quarter. Several of the steps are idiosyncratic to our technical environment, so the procedure should be read as illustrative only. Over time we have simplified the process by automating various tasks. For example, calculating the mediation scores used to require three separate runs of a program originally developed for the research reported in Schrodt et al. 2001; that program was subsequently modified so that the statistics are computed in a single run. All of the non-commercial computer programs used here are open source; most are available on the KEDS project web site, and the others can be provided by the authors on request.
- In a browser, go to the LEXIS-NEXIS page listed on the KU library electronic data sources page: http://web.lexis-nexis.com/universe
- Click "Sources" in the blue bar that goes across the top of the page. This will take you to a page titled "Source List".
- In the Source List "Find Title" box, enter "Agence France Press" and click "Find Title". This will produce a list of several AFP sources.
- Click the "Search this title" link of the first option (English) in the list. This will take you to a standard NEXIS search page, except that it has now been set to search only AFP.
- Enter the search string below:
israel or palestin! or syria or jordan or lebanon or egypt or plo
This string (and other search strings used in the KEDS project) can be found in the www.input file that will be used below.
- Change the "in" selection to "Full text"
- Enter a range of dates that will return fewer than 1000 stories, which is the limit on the stories returned by a NEXIS search on the Academic-Universe site. This sometimes takes a bit of experimenting, but a search period of two weeks generally works. Click "Search".
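The two-week search periods can be laid out in advance rather than worked out one search at a time. The following is a minimal sketch only: the function name `search_windows` is hypothetical, the 14-day default is simply the rule of thumb given above, and any window that still returns 1000 or more stories has to be narrowed by hand.

```python
from datetime import date, timedelta

def search_windows(start, end, days=14):
    """Split the period start..end (inclusive) into consecutive windows of at most `days` days."""
    windows = []
    current = start
    while current <= end:
        # The last window is cut short so it never runs past `end`.
        last = min(current + timedelta(days=days - 1), end)
        windows.append((current, last))
        current = last + timedelta(days=1)
    return windows

# One quarter, split into two-week search periods.
for first, last in search_windows(date(2003, 4, 1), date(2003, 6, 30)):
    print(first.isoformat(), "to", last.isoformat())
```

Each printed pair gives the start and end dates to type into one NEXIS search.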
- Click the first story in the resulting list. Copy the address, which will look something like
http://web.lexis-nexis.com/universe/document?_m=69565bab69eb293ee7c03173007d31bd&_docnum=1&wchp=dGLbVzb-lSlzV&_md5=7178503bdb21eac6c35eb90a4d666814
The address is long, so be sure you get the whole thing. (Netscape messed this up in the past; MS-Explorer, Safari and Mozilla seem to work fine.)
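A quick check can catch a truncated address before it goes into www.input. This is a sketch under a loud assumption: the parameter names it expects (`_m`, `_docnum`, `wchp`, `_md5`) are taken only from the single example address shown here, written without any whitespace, and NEXIS may well use others. The function name `looks_complete` is hypothetical and is not part of nexispider.pl.

```python
from urllib.parse import urlsplit, parse_qs

# Assumption: these names come from the one example address in this appendix.
EXPECTED_PARAMS = ("_m", "_docnum", "wchp", "_md5")

def looks_complete(address):
    """Return True if the address has no whitespace and carries every expected parameter."""
    if any(ch.isspace() for ch in address):
        return False
    query = parse_qs(urlsplit(address).query)
    return all(name in query for name in EXPECTED_PARAMS)

full = ("http://web.lexis-nexis.com/universe/document?"
        "_m=69565bab69eb293ee7c03173007d31bd&_docnum=1"
        "&wchp=dGLbVzb-lSlzV&_md5=7178503bdb21eac6c35eb90a4d666814")
truncated = full.split("&_md5")[0]   # simulates a copy that lost its tail

print(looks_complete(full), looks_complete(truncated))   # prints: True False
```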
- Use BBEdit (a text editing program for the Macintosh) to open the file "www.input" in the "download.dir" directory on the KU Unix server raven. Paste the address you just copied into the first line of this file. There can be other text in the file; the program reads only the first line.
- Log onto the raven account. Enter
cd download.dir [puts you into the download subdirectory]
perl nexispider.pl [runs the nexispider program]
- The nexispider.pl program will ask for a file prefix; enter AFPLE and hit [return]
- You should see the stories start scrolling by; this will go pretty quickly and periodically pause when there is a delay retrieving a story.
[Y'all young whippersnappers should know that in the old days, we had to do this over a phone line with a 300 baud modem. Before we left to walk to school 5 miles barefoot in the snow uphill both ways in July.]
If the program stops immediately, you probably gave it the URL of the NEXIS index rather than the URL of the first story; return to step [8] (copying the address of the first story).
- Tricks if you are downloading a bunch of stories:
a. You can log into raven multiple times and have several copies of nexispider.pl going simultaneously (this is why we use Unix systems rather than MS-Windows...). nexispider automatically assigns non-conflicting file names to the resulting files, based on the prefix you entered in step [11] and the beginning and end of the download (it gets this information from the file; you don't need to enter it).
b. As soon as nexispider.pl starts in any window, you can change the www.input file, putting a new URL at the top of the file.
c. Click the "Edit search" option in the top bar of the NEXIS window to change the dates on a search.
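The naming scheme is what lets several downloads run side by side without overwriting each other. The sketch below shows one way such names could be built and read back. It assumes the pattern PREFIX.YYMMDD-YYMMDD, which we infer only from the example names in this appendix; it is not the code in nexispider.pl, and `build_name` and `parse_name` are hypothetical helpers.

```python
import re
from datetime import date, datetime

# Assumption: a prefix of letters, a dot, then two six-digit YYMMDD dates.
NAME_PATTERN = re.compile(r"^([A-Za-z]+)\.(\d{6})-(\d{6})$")

def build_name(prefix, first, last):
    """Build a download file name from a prefix and the first and last story dates."""
    return f"{prefix}.{first:%y%m%d}-{last:%y%m%d}"

def parse_name(name):
    """Return (prefix, first_date, last_date), or None if the name does not fit the pattern."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return None
    prefix, first, last = match.groups()
    to_date = lambda text: datetime.strptime(text, "%y%m%d").date()
    return prefix, to_date(first), to_date(last)

print(build_name("AFPLE", date(2003, 6, 1), date(2003, 6, 15)))
print(parse_name("AFPLE.030616-030630"))
```

Because the dates are part of the name, two downloads covering different periods cannot collide even when they share a prefix.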
- Now use the Unix "mv" command to move the file(s) to the appropriate subdirectory, e.g.
mv AFPLE.030601-030615 text_files/levant/yr2003/
where "text_files/levant/yr2003/" is the year 2003 subdirectory inside the levant subdirectory of text_files, which is how we have things set up at the moment.
Hint: You can move a bunch of files at once using a wild-card:
mv AFPLE.* text_files/levant/yr2003/
- Move into that directory using the "cd" command
cd text_files/levant/yr2003/
- Write the names of the files in the subdirectory into a file named "filelist" using the command
ls > filelist
- Open the file "filelist" in BBEdit using BBEdit's "Open from FTP server..." menu option (you'll need to go through the intermediate directories to get there) and delete any file names that are not AFP downloads (these will be "filelist", "nexisreverse.pl" and possibly some other left-over junk). The remaining files will be in chronological order, assuming they have the names given them by nexispider.
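The hand-editing of "filelist" amounts to keeping only the names that look like nexispider downloads and leaving them in sorted order. A minimal sketch of that filter follows, again assuming the PREFIX.YYMMDD-YYMMDD naming pattern inferred from the examples here. The function name `afp_downloads` is hypothetical, and a plain alphabetical sort matches chronological order only while every file falls in the same century.

```python
import re

def afp_downloads(names, prefix="AFPLE"):
    """Keep names of the form PREFIX.YYMMDD-YYMMDD and return them sorted."""
    pattern = re.compile(rf"^{re.escape(prefix)}\.\d{{6}}-\d{{6}}$")
    return sorted(name for name in names if pattern.match(name))

# A directory listing as it might appear in "filelist" before editing.
listing = ["AFPLE.030615-030630", "filelist", "nexisreverse.pl", "AFPLE.030601-030615"]
print("\n".join(afp_downloads(listing)))
```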
- Run
perl nexisreverse.pl
which combines all of the files, puts the entries into correct chronological order (they are in reverse order in the NEXIS output), and deletes everything except the lead (first) sentence. The output will be in a file with a ".rev" suffix; the original files are left unchanged.
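The logic of this step can be illustrated in a few lines. This is not the code of nexisreverse.pl. It is a sketch that assumes the stories have already been separated into strings, that the files arrive oldest first, and that each file lists its stories newest first. The lead-sentence rule is deliberately naive: it cuts at the first ".", "!" or "?" followed by white space, so it will cut too early at an abbreviation such as "U.S." Both function names are hypothetical.

```python
import re

def lead_sentence(story):
    """Return the story text up to and including the first sentence-ending mark."""
    text = " ".join(story.split())             # collapse line breaks and runs of spaces
    match = re.search(r"[.!?](?=\s|$)", text)  # first mark followed by space or end of text
    return text if match is None else text[: match.end()]

def combine_chronologically(files):
    """files: list of story lists; files oldest first, stories newest first within each file."""
    leads = []
    for stories in files:
        for story in reversed(stories):        # undo the reverse order within each file
            leads.append(lead_sentence(story))
    return leads

files = [
    ["B second. More text.", "A first. More."],
    ["D fourth.\nExtra.", "C third. X"],
]
print(combine_chronologically(files))
```

The input lists are not modified, which mirrors the way the original download files are left unchanged.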
- Change the name of the output to something informative. This file is now ready to be run through TABARI.