Therevid Databases: Reinventing the Wheel or Prudently Hitching a Ride?
[Note:
Copyright 1997 by Gail E. Kampmeier and Michael E. Irwin. This paper
was submitted for the Entomological Collections Network abstracts
of presentations made at the 1996 meeting in Louisville, KY, December
1996. Although the references to the condition of the therevid databases
were current at the time of writing, the Mandala database system has
evolved significantly since then. Please request a demo from Gail
Kampmeier (gkamp@uiuc.edu) or
see the Mandala website.]
"Rolling our own" database design using an off-the-shelf database engine
such as Claris' FileMaker Pro(tm) was not our original intention when we
applied for a National Science Foundation PEET (Partnerships for Enhancing
Expertise in Taxonomy) grant in early 1995. We were beta-testing Rob Colwell's
Biota, which showed promise, but the released version that might have given
us the flexibility we needed came out too late: we were already off and
running.
The
illusion that YOU can't ever design a database for your data, that
you need to rely on a professional programmer, or a dedicated management
system that may cost big $$$, scares many biologists into unnecessary
procrastination and inactivity. Databases for your data don't need
to be complex, let alone as complex as our ever-evolving databases (which
started off relatively simply, by the way). They just need to
let you enter your data easily AND get it out again, whether
for your own analyses and interpretations, for the day some greater-than-sliced-bread
OTHER system comes along, or for a move to pull everybody's
data together into one megadatabase system.
For us, FileMaker Pro presented such a solution. Although best known
to Macintosh users, the database engine has been cross-platform
for two versions (2.1, quasi-relational, and 3.0, relational),
and was recently cited as the second most popular database on the
PC. It is most famous for its easy-to-use interface and for letting
ordinary people (not just techno-geeks) quickly and easily
set up a framework for their data and begin the real task of entering
the data (more on features of FMP).
This
is not to say that FileMaker Pro does not have its complexities,
or that you will remain ultimately satisfied with the first way
you have organized your data (or even the second or third). But
with some forethought and research into the kinds of information
you should be documenting, you should always be able to take your
data and reorganize it, add fields, change the way it is presented,
or ship it to another platform, application, or database. Some of
the important questions to ask are:
What kind of data do I have? What makes it unique? Is it a specimen? A taxon? A collecting event? All of the above?
What kinds of outputs (queries) will I want to make of my data? Does my method of organization make it reasonable to search for the kinds of knowledge I hope to gain by using a database to organize my data in the first place?
What are the necessary fields (categories) for my data? Can I safely combine pieces of information that I know I'll never want to see in any other combination into one field (e.g., "location"), rather than break it into umpteen smaller fields for "country", "state/province", "county", ... and "microsite"? (The answer is NO! You need to break up the locality information into separate fields; you can always write a calculation that puts them together again.)
Who will be using this database? Do you assume that whoever uses the database will have the same competency in using it as you? Or do you make its operation as explicit and as foolproof as possible? Where is the line between utility and beauty, and how much time should you spend going beyond utility?
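The advice about locality fields can be illustrated outside FileMaker Pro. The following is only a minimal Python sketch, not our actual database design: the class name, the field names, and the `full_locality` method are all hypothetical stand-ins for separate fields plus a calculation that puts them together again.

```python
from dataclasses import dataclass


@dataclass
class Locality:
    """One collecting locality kept as separate fields (names are illustrative)."""
    country: str = ""
    state_province: str = ""
    county: str = ""
    microsite: str = ""

    def full_locality(self) -> str:
        """Calculated field: join the non-empty parts, largest unit first."""
        parts = [self.country, self.state_province, self.county, self.microsite]
        return ", ".join(p for p in parts if p)


loc = Locality(country="USA", state_province="Illinois", county="Champaign")
print(loc.full_locality())  # USA, Illinois, Champaign
```

Because the pieces are stored separately, you can still search or sort by any one of them; going the other way (splitting a single combined "location" string after the fact) is much harder to do reliably.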
Capturing Insect Specimen Data: The Case of the Therevidae (Diptera)
Our
therevid databases center around the management of specimen-based
information. Each specimen is given a unique number with a 3-letter
prefix. If specimens already have barcodes or other unique numbers
attached, these numbers are used, with an appropriate 3-letter prefix.
The label information accompanying each specimen is then entered
into a series of related databases for
label information as it appears with the specimen;
lots, each defined as a collecting event, including locality of collection (political divisions from country to smallest political unit; named geographic features; elevation; longitude and latitude), collectors, date of collection, and method;
taxon name and authority;
determination history (determiner, year, determined as);
loan history (contrary to many museum-based databases, this is not the emphasis of our management system);
tracking of illustrations made;
atmospheric and substrate conditions at the time of collection; and
any accompanying biological or ecological information that may be included about the specimen on the labels, including associations with other specimens.
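As a rough illustration of the specimen numbering described above, here is a small Python sketch. The exact format is an assumption: the text only says each number carries a 3-letter prefix, so the uppercase-only rule, the zero-padding width, and the example prefix "ABC" are hypothetical.

```python
import re
from typing import Tuple

# Assumed format: three uppercase letters followed by digits, e.g. "ABC000123".
SPECIMEN_ID = re.compile(r"^([A-Z]{3})(\d+)$")


def make_specimen_id(prefix: str, number: int, width: int = 6) -> str:
    """Build an identifier from a 3-letter prefix and a number.

    Zero-padding to `width` digits is an assumption made for tidy sorting.
    """
    if not re.fullmatch(r"[A-Z]{3}", prefix):
        raise ValueError("prefix must be exactly three uppercase letters")
    if number < 0:
        raise ValueError("number must not be negative")
    return f"{prefix}{number:0{width}d}"


def split_specimen_id(specimen_id: str) -> Tuple[str, int]:
    """Split an identifier back into its prefix and numeric part."""
    match = SPECIMEN_ID.match(specimen_id)
    if match is None:
        raise ValueError(f"not a valid specimen identifier: {specimen_id!r}")
    return match.group(1), int(match.group(2))


print(make_specimen_id("ABC", 123))    # ABC000123
print(split_specimen_id("ABC000123"))  # ('ABC', 123)
```

An existing barcode number can be passed in as `number`, so specimens that already carry a unique number keep it and only gain the prefix.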
In addition, information is recorded about the
sex (male, female, unknown);
type (e.g., holotype, paratype, specimen);
condition of the specimen (is it missing body parts?);
dissections (what parts have been dissected?);
preservation method (pinned? pointed? 70% ethanol? etc.);
molecular studies, if the specimen was used in any (how was it preserved? is there another tracking number given by the molecular biology lab? a GenBank number?);
stage collected;
stage(s) in collection; and
pupation and emergence dates for reared specimens.
All of the databases
have context-sensitive (field-specific), database-specific, and general help developed using ClickWare's ClickHelp(tm). Expectations of form and content of each field and actions of buttons are detailed for the user.
feature electronic tracking of questions and problems with a specialized database that allows users to input unanswered questions and others to respond with answers and track when the problem has been resolved. This eliminates the pieces of paper that accumulate and suddenly disappear into a black hole somewhere before they are answered. It also allows tracking of the types of questions asked, sometimes leading to better database design, and enables the user to see if certain kinds of questions have been dealt with in the past.
Curious about...
how many specimens were trapped during a collecting event? Note that if specimens have not been named, this field is blank in the list.
where specimens of a particular taxon have been collected?
View this information via "portals" or windows that allow you
to see data from a related database without the information being
stored again in that database (conserving the size of various files
and thus influencing the speed of operations once files become large).
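For readers more familiar with SQL-style databases, a portal behaves much like a join: the related data is looked up when needed rather than stored a second time. The sketch below uses Python's built-in `sqlite3` module rather than FileMaker Pro, and every table name, column name, and record in it is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lot (lot_id INTEGER PRIMARY KEY, country TEXT, locality TEXT);
    CREATE TABLE specimen (
        specimen_id TEXT PRIMARY KEY,
        lot_id INTEGER REFERENCES lot(lot_id),
        taxon TEXT  -- NULL until the specimen has been named
    );
""")
# Made-up records: locality details live only in the lot table.
conn.executemany("INSERT INTO lot VALUES (?, ?, ?)",
                 [(1, "USA", "Site A"), (2, "Australia", "Site B")])
conn.executemany("INSERT INTO specimen VALUES (?, ?, ?)",
                 [("ABC000001", 1, "Taxon A"),
                  ("ABC000002", 1, None),
                  ("ABC000003", 2, "Taxon A")])


def specimens_per_lot(conn):
    """How many specimens were trapped during each collecting event?"""
    return conn.execute(
        "SELECT l.lot_id, COUNT(s.specimen_id) FROM lot l "
        "LEFT JOIN specimen s ON s.lot_id = l.lot_id "
        "GROUP BY l.lot_id ORDER BY l.lot_id"
    ).fetchall()


def localities_for_taxon(conn, taxon):
    """Where have specimens of a particular taxon been collected?"""
    return conn.execute(
        "SELECT DISTINCT l.country, l.locality FROM specimen s "
        "JOIN lot l ON l.lot_id = s.lot_id "
        "WHERE s.taxon = ? ORDER BY l.country",
        (taxon,),
    ).fetchall()


print(specimens_per_lot(conn))                # [(1, 2), (2, 1)]
print(localities_for_taxon(conn, "Taxon A"))  # [('Australia', 'Site B'), ('USA', 'Site A')]
```

Each specimen row carries only the number of its lot, so a correction to a locality is made once and shows up everywhere that lot is viewed.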
Needs
To aid in geocoding of collecting locations, worldwide
gazetteers with major and minor geographical features, latitude,
longitude, and elevation information are essential. Information
on named features in the United
States is readily available via the WWW, but world locations
are difficult to impossible to find. Affordable CD-ROMs that are
cross-platform would be ideal.
Data input began in July 1995, with undergraduate students processing
specimens from collections from around the world. Information on
over 34,000 specimens has been input into the databases.
The
databases were presented four times in 1996 at various meetings.
Feedback from users and discussions at meetings where they have
been presented have helped the databases evolve to the form seen
today.