'CIF' (Crystallographic Information File) is a subset
of STAR (Self-defining Text Archive and Retrieval format [1]).
The CIF format is suitable for archiving, in any order, all types
of text and numerical data. The goal of
CIF is a data structure that is general, upwardly
compatible, and flexible, and that facilitates electronic publication.
CIF was developed by the IUCr Working Party on Crystallographic
Information in an effort sponsored by the IUCr Commission on Crystallographic
Data and the IUCr Commission on Journals. The results of this
effort were seen in a dictionary of data items sufficient for
archiving the small molecule crystallographic experiment and its
results [2]. This dictionary was adopted by the IUCr at its 1990
Congress in Bordeaux. CIF is now the format in which structure
papers are submitted to Acta Crystallographica C; software has
been developed to automatically typeset a paper from a CIF.
In 1990, the IUCr formed a working group that would
expand this dictionary by including data items relevant to the
macromolecular crystallographic experiment. This working group
was chaired by Paula Fitzgerald (Merck) and included Enrique Abola
(Protein Data Bank), Helen Berman (Rutgers), Phil Bourne (then
at Columbia), Eleanor Dodson (York), Art Olson (Scripps), Wolfgang
Steigemann (Martinsried), Lynn Ten Eyck (SDSC), and Keith Watenpaugh
(then Upjohn).
The original short-term goal of the working group was to fulfill the mandate set by the IUCr: to define the CIF data names that needed to be included in the CIF dictionary in order to adequately describe the macromolecular crystallographic experiment and its results. Long-term goals were also established: to provide sufficient data names so that the experimental section of a structure paper could be written automatically, and to facilitate the development of tools so that computer programs could easily interface with mmCIF. During the course of the development of the mmCIF dictionary, however, these goals were greatly expanded, and the resulting dictionary can now be thought of as a flat-file representation of a fully relational database schema describing the complete macromolecular crystallographic experiment and its results.
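To illustrate what a flat-file representation of relational data can look like, the following is a minimal Python sketch, not part of the dictionary or of any mmCIF tool. It assumes a simplified fragment in which a "loop_" keyword is followed by data names sharing a category prefix and then by whitespace-separated rows; the sample values and the parse_loop helper are hypothetical and chosen only for illustration. The category plays the role of a table, the data names the role of its columns, and each row a record.

```python
# Hypothetical mmCIF-style fragment: one "loop_" block whose data names
# share the category prefix "_atom_site", followed by rows of values.
# The coordinate values are invented for illustration.
SAMPLE = """\
loop_
_atom_site.id
_atom_site.type_symbol
_atom_site.label_atom_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
1 N N  25.3 10.1 -4.2
2 C CA 26.0 11.2 -3.5
"""


def parse_loop(text):
    """Return (category, rows) for a single simple loop_ block.

    Only whitespace-separated values are handled; quoted strings,
    multi-line text fields, comments, and multiple blocks are out of
    scope for this sketch.
    """
    names, values = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "loop_":
            continue
        if line.startswith("_") and not values:
            names.append(line)          # column header (data name)
        else:
            values.extend(line.split())  # row values
    if not names or len(values) % len(names) != 0:
        raise ValueError("value count is not a multiple of the column count")
    # "_atom_site.id" -> category "atom_site", column "id"
    category = names[0].split(".")[0].lstrip("_")
    columns = [name.split(".", 1)[1] for name in names]
    width = len(columns)
    rows = [dict(zip(columns, values[i:i + width]))
            for i in range(0, len(values), width)]
    return category, rows


category, rows = parse_loop(SAMPLE)
print(category, len(rows), rows[1]["label_atom_id"])
```

Under these assumptions, each category maps naturally onto one table of a relational schema, which is the sense in which the flat file mirrors a database.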
In order to describe the progress of this project and to solicit community feedback, several informal and formal meetings were held. The first meeting, hosted by Eleanor Dodson, convened in April 1993 at the University of York. The attendees included the mmCIF working group, structural biologists, and computer scientists. A major focus of the discussion was whether the formal structure of the dictionary, which was implemented using the Dictionary Definition Language (DDL 1.0), was adequate to deal with the complexity of the structural data items. Criticisms included the idea that the data typing was not strong enough and that there were no formal links among the data items. A new working group was formed to try to address these issues. The second workshop was hosted by Phil Bourne in Tarrytown, NY, in October 1993. The topics at that meeting focused on the development of software tools and the DDL. In October 1994, a workshop hosted by Shoshana Wodak at the Free University of Brussels resulted in the development of a new DDL that addressed the various problems that had been identified. Following the Brussels meeting, the mmCIF dictionary (including a complete image of the CIF core dictionary) was recast in DDL 2.1.
The mmCIF dictionary has continued to grow and be
refined during the several years of its development, originally
based on input from the working group, and subsequently based
on input garnered at the three CIF workshops. By mid-1995, a
version of the mmCIF dictionary that was considered complete in
most regards was in hand, and that dictionary was presented to
the community at large for review at the 1995 ACA Meeting in Montreal.
The review was (and still is) managed via a Web page
and a mailing list. The Web page (http://ndbserver.rutgers.edu/mmcif)
contains copies of the dictionary (as plain text and as an HTML
Web-searchable version), as well as background material, examples
of mmCIF files, and archives of the discussions on the mmCIF mailing
list. The Web page also contains information on the DDL, and
access to a number of mmCIF software tools.
The mailing list is used for posting comments from the community, suggestions for changes, errata and such. To subscribe to the mailing list, send a one-line message containing the text "subscribe mmciflist Your Name" to
requests@ndbserver.rutgers.edu.
To post to the mailing list, send messages to mmciflist@ndbserver.rutgers.edu.
The review process was an active one, with a large
number of people taking a close look at the dictionary and making
very useful comments, corrections, and suggestions for additional
data items. The New Jersey contingent of the working party met
regularly, discussed responses to each of the issues that were
raised on the mailing list, and made changes to the dictionary
based on the results of those discussions. Updated versions of
the dictionary were then posted on the Web page.
By late winter of 1996, we felt that the dictionary had assumed its final form, and we posted announcements about the mmCIF dictionary and its availability to a number of widely read crystallographic newsgroups. These announcements have generated a small number of rather minor corrections and additions to the dictionary.
Following the IUCr meeting in Seattle, Version 1.0 of the dictionary will be released. There are still a number of wording changes that will need to be made to mmCIF dictionary definitions to bring them into alignment with the newly revised version of the CIF core dictionary, but we DO NOT ANTICIPATE ANY FURTHER REVISIONS OF SUBSTANCE to the mmCIF dictionary. In particular, the ATOM SITE records, the heart and soul of the dictionary, will not be modified. We thus encourage users of all types, including software
developers, to begin working with the dictionary. We anticipate that as people begin to really use the mmCIF data structure, they will find further data items that they would like to see included, but only those data items that constitute obvious omissions to the current schema will be added to Version 1.0; true expansions of the data structure will be deferred to the eventual Version 2.0.
The development of the mmCIF dictionary and the associated
DDL 2.2.1 has been an enormous task, and any list of contributors
to the effort will certainly be incomplete. Still, we have so
appreciated the people that have taken the time to think carefully
and constructively about all of this, and we would like to recognize
their efforts. But we must begin by recognizing Syd Hall, David
Brown and Frank Allen, who began the entire CIF effort and who
recruited us to do the extensions for macromolecular structure.
The background given above lists people who were
members of the original working party, but the number of people
who contributed to the original design of the mmCIF data structure
is in fact much larger. We would like to thank Steve Bryant (NCBI),
Vivian Stojanoff (PDB), Jean Richelle (Brussels), Eldon Ulrich
(Madison), and Brian Toby (NIST).
There are also the people who realized the shortcomings
of the original DDL and worked hard to convince us that a more
rigorous underpinning for the dictionary would be needed. Their
suggestions (and pointed criticisms) resulted in the development
and implementation of DDL 2.1. Our thanks go to Michael Scharf
(EMBL), Peter Grey (Aberdeen), Peter Murray-Rust (Glaxo), Dave
Stampf (PDB), and Jan Zelinka (York).
Writing the dictionary and developing the new DDL
were just the starting points for evaluation and critique, and
this effort has been greatly aided by the input from COMCIFS,
the IUCr committee with oversight over this process (Brian McMahon,
Coordinating Secretary). But the real process of review, after
the dictionary was released to the public for comment in August
1995, has involved a much larger cast. We cannot say enough about
the valuable input we have gotten from Fran Bernstein (PDB), Herb
Bernstein (BNL), Dale Tronrud (Oregon), and Peter Keller (Daresbury).
Our efforts have been greatly enabled by the staff
of the Nucleic Acid Database at Rutgers University, who have dealt
with many of the technical issues of implementation of mmCIF with
real data. So we would also like to thank Anke Gelbin, Shu-Hsin
Hsieh, and Christine Zardecki.
Without the three CIF workshops, this effort would never have taken the shape and focus it now has, and we are eternally grateful to the organizers of those workshops - Eleanor Dodson, Phil Bourne, and Shoshana Wodak - and to the sponsors who provided the funding - ESF, EU, NSF, and DOE.
[1] S.R. Hall (1991) J. Chem. Inf. Comput. Sci., 31, 326-333.
[2] S.R. Hall, F.H. Allen and I.D. Brown (1991) Acta Cryst., A47, 655-685.