SinCris WDC: Recommendations for the National Editors

Information for the National Editors

Introduction

You are responsible for gathering biographical data on crystallographers in your country. It is hoped that most of this data will come to you by e-mail as completed data entry forms. However, you may also receive data entry files on diskette or other magnetic media, and if access to e-mail in your country is limited or absent, you may need to arrange for data to be entered by hand from paper forms.

Whether you enter the data yourself, or receive it direct from contributors, each entry will consist of a series of data fields, defined in the STAR format described by Syd Hall [(1991), J. Chem. Inf. Comput. Sci. 31, 326-333]. It is probably most convenient if every individual entry is in a separate file, though it is possible to combine several entries in a file. Indeed, when your work of data collection is complete, you will send to Chester a single STAR file containing all the entries for your country.

STAR File Format; Validation

Let us suppose for now that you will place each individual entry into a separate file, for ease of preliminary handling. Each such file must be a valid STAR file. The data entry form (distributed here as form.wdc) is such a file. It is filled with empty data items (query marks '?' for every item required). The contributor is expected to replace each query with the appropriate data. If the replacement text is a single word or number, it may go in without any problem. If it is a short phrase, with several words separated by spaces, the entire phrase should be surrounded by matching quote marks (' or ") so long as it will fit on a single 80-character line. If it must extend over more than one line, a semicolon must be given as the leftmost character of the first line, and of the line following the last line of text. These rules are discussed in more detail in an article in Acta Cryst. (1993), A49, 000-000.
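The quoting rules above can be sketched as a small shell function. This is only an illustration: star_quote is a hypothetical helper, not part of the distributed package, and the 78-character limit (80 minus the two quote marks) and the sample values are assumptions made for the example.

```shell
#!/bin/sh
# Sketch only: star_quote is a hypothetical helper, not part of the WDC package.
# It prints one value in the form called for by the quoting rules above.

nl='
'

star_quote() {
    value=$1
    case $value in
        '')
            # No data supplied: keep the query mark used in the empty form.
            printf '?\n' ;;
        *"$nl"*)
            # Several lines: semicolon in column 1 of the first line and of
            # the line following the last line of text.
            printf ';%s\n;\n' "$value" ;;
        *" "*)
            if [ "${#value}" -gt 78 ]; then
                # Assumed limit: too long to fit, with quotes, on one line.
                printf ';%s\n;\n' "$value"
            else
                case $value in
                    *"'"*'"'*|*'"'*"'"*)
                        # Both kinds of quote mark occur: use the text form.
                        printf ';%s\n;\n' "$value" ;;
                    *"'"*) printf '"%s"\n' "$value" ;;
                    *)     printf "'%s'\n" "$value" ;;
                esac
            fi ;;
        *)
            # A single word or number needs no quoting.
            printf '%s\n' "$value" ;;
    esac
}

star_quote 42
star_quote 'University of Strelsau'
star_quote "Rupert's Tower"
```

The three calls print 42, 'University of Strelsau' and "Rupert's Tower" respectively. The sketch does not take into account the space occupied by the data name on the same line.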

The STAR file begins with a data_xxxx block name, then a "loop_" statement followed by a list of data names defining the data that will follow. Any particular data item will be recognised by processing software according to its position within the file; if it is the fifth data item, its nature is defined by the fifth data name in the list. It is therefore essential that the file structure is maintained, otherwise it would be possible to lose track of which data name referred to any given fragment of data.
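To illustrate how position alone ties a value to its data name, the following sketch pairs the values in a very simple loop with the names that precede them. It is not how QUASAR or any other supplied program works; pair_items is a hypothetical function, it handles only unquoted single-word values, and the data names other than _interests_key_words are invented for the example.

```shell
#!/bin/sh
# Sketch only: pair each value in a simple loop_ with its data name by position.
# Reads a STAR fragment on standard input. Quoted and multi-line values are
# NOT handled; this is an illustration, not a STAR parser.

pair_items() {
    awk '
        /^data_/ || /^loop_/ { next }        # skip block name and loop keyword
        /^_/ && nvalues == 0 {               # the list of data names
            names[nnames++] = $1
            next
        }
        nnames > 0 {
            for (i = 1; i <= NF; i++) {
                # The k-th value belongs to name number (k mod nnames).
                printf "%s = %s\n", names[nvalues % nnames], $i
                nvalues++
            }
        }
    '
}

# Hypothetical entry; only _interests_key_words is a name taken from the text.
pair_items <<'EOF'
data_example
loop_
_family_name
_first_name
_interests_key_words
Hentzau Rupert diamond
EOF
```

If a value were dropped or an extra one inserted, every later value would be attached to the wrong name, which is why the file structure must be preserved.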

The program QUASAR is a general-purpose STAR handling tool which can be used to validate the structure of any STAR file. You can test it by typing the following sequence of commands:

(the first line starts the program; the next line specifies the name of the file to test - here it is the "test.wdc" test file supplied; and the third line is the command to check the logical structure).

You should run QUASAR on any STAR file you receive, and rerun it any time you edit such a file.

Keyword Checking

Each entry file should be examined to ensure that it conforms to the rules for presentation defined in the Dictionary file stardic.W92, which is also included in the package. Old versions of this file must be replaced by new ones available from Chester. Most checking can be done by eye, but to ease the burden of checking the keywords supplied in the _interests_key_words field, the program keychk has been supplied. It identifies the keywords data field and extracts the phrases in that field. Each phrase is matched against the list of keywords, and its presence or absence noted. If present, the relevant list in which the phrase appears is given. For a phrase of several individual words separated by underscore characters (e.g. Non-crystalline_potassium_compounds), if the entire phrase is not matched, further matches are sought by progressively dropping the leftmost component. The match is performed after converting all input characters to lower case, but you should use your editorial judgement to edit the input field so that only proper names begin with an upper-case letter.

The program is a pure filter, so you should invoke it under Unix by typing
keychk < test.wdc
or
cat test.wdc | keychk
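The matching strategy described above (lower-casing, then progressively dropping the leftmost component) can be sketched in a few lines of shell. This is not keychk itself: match_phrase is a hypothetical function, the KEYWORDS list is a small stand-in for the real keyword lists, and unlike keychk the sketch neither names the list in which a match is found nor reports on the dropped component.

```shell
#!/bin/sh
# Sketch only, not keychk. KEYWORDS is an illustrative stand-in for the
# real keyword lists, one keyword per line.
KEYWORDS='crystallography
carbon_compounds
diamond
minerals
non-crystalline'

match_phrase() {
    # Convert the input to lower case before matching.
    phrase=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    while :; do
        # -x: whole-line match, -F: fixed string, -q: no output.
        if printf '%s\n' "$KEYWORDS" | grep -qxF -- "$phrase"; then
            printf '%s is in the list\n' "$phrase"
            return 0
        fi
        printf '%s is not in the list\n' "$phrase"
        case $phrase in
            *_*) phrase=${phrase#*_} ;;   # drop the leftmost component
            *)   return 1 ;;              # nothing left to drop
        esac
    done
}

match_phrase Crystallography_of_carbon_compounds
```

With the stand-in list above, the call reports that crystallography_of_carbon_compounds and of_carbon_compounds are not in the list, and then that carbon_compounds is.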

VALENT - A Tool for Handling Entries

The shell script 'valent' is provided as an example of a utility that you may use in administering the data you receive. Before you run it for the first time, you should edit the file so that the variable 'TOPDIR' is set to the full pathname of the directory your files reside in, and 'COUNTRY' is set to the name of the country you are handling. Be aware that case is significant. Thus write France, not FRANCE or france.

Valent allows you to:

You may try to use option 2 (adding a file) with the test file test.wdc to check the logical structure of the file. The result of the test is printed on the screen, and you are asked if you wish to check the keywords. If errors are indicated in the file structure, you should exit at this point and correct the file. When there are no errors, you may answer in the affirmative to run keychk on the file. The results of this program are also written to the screen. You are then asked whether you wish to append the entry to the master file for your country. If you answer 'n', you exit the program, so that the valent utility has simply validated the file. If you answer 'y', the data in the file will be appended to the master file for your country.

Try option 1 (input from the keyboard) with your own data. It calls a program named input, which lets you add as many entries as you want and checks their structure and the validity of their keywords. New data will be added to a file created by input, called "your_country_acc.wdc". This file is deleted only when its contents are added to your national database. You can edit it if you find that some entries contain invalid keywords.

Example of Use of the Package

The tools provided are intended to assist you in your work of collecting and editing your country's entries; you are free to work in different ways if you prefer, using specific features of your own computer system to help in your work, or writing your own software if you so wish. Here we present an example of how one subeditor (the Subeditor for Ruritania) may go about his work.
    [ THE EDITOR READS HIS MAIL AND DISCOVERS A MESSAGE FROM A CONTRIBUTOR.
      THE CONTRIBUTOR HAS HELPFULLY GIVEN HIS NAME IN THE Subject: FIELD ]
	       
$ mail
Mail version SMI 4.0 Thu Jul 23 13:52:20 PDT 1992  Type ? for help.
"/usr/spool/mail/bm": 1 message 1 new
>N  1 rupert@zenda.bitnet Fri Nov 13 14:33  134/4682  WDC entry for hentzau

    [ THE EDITOR SAVES THE MESSAGE TO A FILE hentzau.ent IN HIS WORKING
      DIRECTORY. HIS MAIL SYSTEM WILL APPEND THE MESSAGE TO ANY EXISTING
      FILE OF THAT NAME. HIS PLAN IS TO HAVE EVERY ENTRY IN A FILE NAMED
      AFTER THE CONTRIBUTOR - THIS WILL MAKE COLLATION EASIER LATER. EACH
      FILE IS GIVEN THE SUFFIX ".ent" (FOR "entry") ]
& w /usr/home/subed/wdc/hentzau.ent
"/usr/home/subed/wdc/hentzau.ent" [New file] 132/4657
& q

    [ THE EDITOR NOW CHANGES TO HIS WORKING DIRECTORY AND RUNS VALENT ]
$ cd /usr/home/subed/wdc
$ valent 
Input new data from Keyboard  [1]  
               from a file    [2]  
Exit                          [3] 
 
2
Give filename
hentzau.ent

  STAR File Processor (May 18 92)
 -------------------------------- 
 STAR archive file is hentzau.ent                   
 Checking archive file for logical integrity.
 Error >>> Data structure error at data item  dshgfadshgfdsahgfkadsjhgfadsjhg
           Fatal error -- archive line     1
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Copyright (c)1992 International Union of Crystallography
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    [ OOPS - FORGOT TO REMOVE THE MAIL HEADER! SOME MAIL SOFTWARE WILL DO
      THIS AUTOMATICALLY; OR IT MAY BE EASY TO WRITE A SCRIPT TO DO IT ]
Add to country list [y|n] ? 
    [ EXIT AT THIS POINT ]
n
    [ EDIT THE FILE AND RERUN ]
$ vi hentzau.ent 
$ valent
Input new data from Keyboard  [1]  
               from a file    [2]  
Exit                          [3] 
				
2
Give filename
hentzau.ent


  STAR File Processor (May 18 92: mod 920712 BM)      
 -------------------------------- 
 STAR archive file is hentzau.ent                   
 Checking archive file for logical integrity.
 Checking complete and correct.

Check keywords [y|n] ? 
    [ NOW CHECK THE KEYWORDS ]
y

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Copyright (c)1992 International Union of Crystallography
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

(Hentzau): crystallography_of_carbon_compounds is not in any list
               crystallography is in the Methods list
               of_carbon_compounds is not in any list
               (of) - ignore
               carbon_compounds is in the Compounds list
           diamond is in the Compounds list
           non-crystalline_minerals is not in any list
               non-crystalline is in the Attributes list
               minerals is in the Compounds list

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Add to country list [y|n] ? 
    [ LOOKS OK. NOW THE EDITOR WILL EXAMINE THE OTHER FIELDS. HE HAS
      DECIDED TO KEEP THE ENTRIES IN SEPARATE FILES FOR NOW ]
n
$

    [ SOME CONSIDERABLE TIME LATER, WHEN THE EDITOR HAS COLLECTED AND VALIDATED
      ALL THE RURITANIAN ENTRIES, HE WILL CREATE HIS COUNTRY LIST AT ONE
      SITTING. FIRST HE ENSURES THAT ANY OLD COPIES OF HIS COUNTRY LIST
      ARE GONE ]
$ rm ruritania.wdc
    [ NOW HE TYPES A LOOP INSTRUCTION TO PROCESS EVERY ENTRY IN ALPHABETIC ORDER
      SO THAT ENTRIES WILL BE ADDED TO THE COUNTRY LIST ALREADY COLLATED. EVEN
      THOUGH HE HAS TAKEN CARE WITH ALL HIS EDITING AND RERUN QUASAR EVERY
      TIME HE HAS EDITED A FILE, HE REMAINS CAUTIOUS AND PROCESSES EVERY ENTRY
      THROUGH VALENT ]
$ for F in *.ent
> do
> valent $F
> done
     [  . . . SOME HOURS LATER ]
$ mail -s "WDC country list for Ruritania" teched@iucr.org < ruritania.wdc

If you use a different version of the Unix operating system, or if you use similar tools on a different operating system, you may not be able to follow this model exactly. However, it should suggest to you the way in which you might like to proceed with your data collection and validation.
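
The example above notes that a script could remove the mail header from a saved message. A minimal sketch follows, assuming that the entry proper begins at the first line starting with "data_" and that everything before that line is mail header or covering text. The function name strip_header is hypothetical and is not part of the package.

```shell
#!/bin/sh
# Sketch only: strip_header is a hypothetical helper, not part of the package.
# Assumption: the STAR entry starts at the first line beginning with "data_";
# all earlier lines (mail header, greeting) are discarded.

strip_header() {
    # Print from the first "data_" line to the end of the input.
    # With no file argument, sed reads standard input.
    sed -n '/^data_/,$p' "$@"
}

# Possible use before running valent (file names as in the example above):
#   strip_header hentzau.ent > hentzau.tmp && mv hentzau.tmp hentzau.ent
```

Any text that follows the entry, such as a mail signature, is not removed by this sketch and would still need to be edited out by hand.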

Please send your comments and suggestions to Yves Epelboin, epelboin@lmcp.jussieu.fr.


Last update April 11 1996 Y.E.
This service is made available through a grant from CNRS and Ministère de l'Education Nationale