QUB |
Archaeology and Palaeoecology |
The 14Chrono Centre
Manual for psimpoll and pscomb
Data preparation
psimpoll is essentially a plotting program. Percentages and concentrations
(but not necessarily accumulation rates), and most other derived values,
must be calculated beforehand. All input files
should be in plain text format;
how they are produced is irrelevant to psimpoll.
pscomb takes as input files that have previously been written by
psimpoll. There is therefore no data preparation, other than running
psimpoll. The main points to bear in mind when running psimpoll, if the output
will be used in pscomb, are:
- Change any taxon names within psimpoll to the form that you
want on the pscomb diagram. This might, for example, mean including a site
name with a taxon name. Use option `t' when selecting taxa interactively
(psimpoll menu K);
- Ensure that taxa destined to be plotted against a
common axis really do have the same axes. For example, use
menu H in
psimpoll to plot samples within a time-scale that will be used uniformly
across a number of sites.
The easiest way to get started is simply to ignore most of this documentation.
Prepare a small dataset with any data, real or imagined (for instance by
copying the example main input file).
Then run psimpoll
with the command psimpoll filename. psimpoll will run and create a
file called filename.PS. Then send filename.PS to a PostScript
printer, as described under Running psimpoll.
Datasets are structured around `samples' and `taxa'. A `sample' is the basic
unit of study, within which various attributes have been measured. Samples
may be stratigraphically related to each other, in which case the term `level'
is often used for them. The measured
attributes are `taxa', and they may be concrete (e.g., the abundance of
Betula pollen), or abstract (e.g., the rate-of-change of something or
other). The `number of taxa' on line 2 must include them all, except in the
rare cases indicated explicitly below. Any text in an input file may include
certain options (for oblique style, accents, superscripts, etc.): these
are described fully under menu I.
The order of entries in a file is important, as described below, but the exact
layout of the file is less important. In general, text items (such as taxon
names) are considered to be ended by either a comma (','), or by the end of a
line. Numerical values, such as the data themselves, are generally ended by
either one or more spaces or the end of the line. TAB characters have the same
effect as spaces. It is thus possible to have either of two basic layouts: one
item per line, giving a file with one column, or one taxon per line, beginning
with the name, followed by a comma, and then all the numerical values for that
taxon separated by spaces. Each layout has its uses, depending on where the
data come from. All intermediate combinations of these basic layouts are
possible, as long as each item is correctly ended (with comma, space,
newline, TAB, as appropriate).
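The `one taxon per line' layout can be illustrated with a short Python sketch. It is not part of psimpoll: the function names are hypothetical, and it assumes a taxon record without the extra units label that the `x' option requires. The name ends at the first comma, and the values are split on spaces or TABs.

```python
def format_taxon_line(name, values):
    """Write one taxon in the 'one taxon per line' layout (hypothetical helper)."""
    # Text item ended by a comma, then numerical values separated by spaces.
    return name + ", " + " ".join(str(v) for v in values)


def parse_taxon_line(line):
    """Split a 'one taxon per line' record into (name, values)."""
    # The text item (the taxon name) ends at the first comma.
    name, _, rest = line.partition(",")
    # str.split() with no argument treats runs of spaces and TABs alike.
    values = [float(token) for token in rest.split()]
    return name.strip(), values
```

For example, parse_taxon_line("Betula, 12 15\t9") gives the name Betula and the three values 12, 15 and 9, whether the separators are spaces or TABs.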
- Item 1
- A title, any reasonable number of characters, on a line of its
own. Must not begin with the character `#', to prevent
confusion with TILIA-type files.
- Item 2
- Number of taxa in the dataset.
- Item 3
- Number of samples in the dataset.
- Item 4
- A two-character code
(first character and
second character).
- Items 5 onwards
- Depths, ages, sample numbers, or sample labels.
Labels, if used (option `t'), should be presented one
per line, or separated by commas, to a maximum length of
39 characters per label. Numbers can be
separated by spaces or presented one per line, and may
be positive or negative. Following the usual convention
of measuring depths from the top of a sequence downwards,
positive values increase down the column, and negative
values increase upwards. If the first character of the
2-character code (Item 4) is `m', the sample value should
be followed by the sample thickness, in the same units.
- Items 6 onwards
- The data, presented by taxa (i.e., taxon name (maximum 39 characters),
followed by values for each sample for that taxon).
When using the `x' option (Item 4), the taxon name must be
followed by a label for the units (see above): this can
be a space character (i.e., no unit). The data can be
presented with each item on a separate line, or with
the taxon name followed by a comma, label (if any)
followed by a comma, then the data values separated by spaces.
- Items 7-14
- optional additional data: give one value per sample,
except that a single value prefixed by `-' indicates
that the value applies to all samples. Items 7-13 are
needed for concentration confidence intervals: Item 14
is additional information for accumulation rate confidence
intervals.
- Item 7
- marker pollen suspension (tablet) concentration.
- Item 8
- if known, sample standard deviation of marker concentration;
otherwise enter `1', and the sample standard deviation is
calculated as the square root of the marker concentration
(this assumes a Poisson distribution in marker suspensions).
- Item 9
- volume of marker suspension added or the number of tablets.
- Item 10
- if known, sample standard deviation of marker suspension
added; otherwise enter 1, and the standard deviation is
calculated as 0.8% of the volume added. For tablets, enter 0.
- Item 11
- marker grain count for each sample (must be >100 for
calculation of confidence intervals);
- Item 12
- volume of sediment added.
- Item 13
- if known, sample standard deviation of sediment volume;
otherwise enter 1, and the standard deviation is
calculated as 1% of the sediment volume.
- Item 14
- if known, height of the sediment sample along the core;
otherwise enter 1, and it will be assumed to be the cube
root of the sediment volume. For a sampler with a circular
cross-section, enter the diameter of the sampler.
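The default values described for Items 8, 10, 13 and 14 can be written out as a minimal Python sketch. The function names are hypothetical and the code only restates the rules given above; it is not taken from psimpoll itself.

```python
import math


def marker_concentration_sd(marker_concentration):
    # Item 8 default: square root of the marker concentration
    # (assumes a Poisson distribution in marker suspensions).
    return math.sqrt(marker_concentration)


def marker_volume_sd(volume_added):
    # Item 10 default: 0.8% of the volume of marker suspension added.
    return 0.008 * volume_added


def sediment_volume_sd(sediment_volume):
    # Item 13 default: 1% of the sediment volume.
    return 0.01 * sediment_volume


def sample_height(sediment_volume):
    # Item 14 default: cube root of the sediment volume.
    return sediment_volume ** (1.0 / 3.0)
```

With a marker concentration of 10000, for instance, the default standard deviation is 100.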
Several optional effects can be achieved by suitable editing of the dataset.
- Samples. Any sample value can be excluded by marking it with
`x', after the number, with no intervening spaces (e.g.,
210x). Use to omit suspect samples without redoing the
whole file, or to plot a stratigraphic portion of the diagram
(but note the use of menu H
for this latter purpose);
- Data. Any data value equal to (by default) -1 will be ignored in
producing the plot file. Use where a data value is missing for whatever
reason (e.g., charcoal values in every other sample). The default
value can be changed in menu Bg;
- Taxon names. Several effects are possible here, to allow
the mixing of plots of pollen data with certain non-pollen data
using reasonable scales and captions, and to select a
subset of taxa for plotting. Access to all is by prefixing the
taxon name with one or more characters. All labels associated with
taxon names can be modified in menu Ik.
The effects
listed below are to do with the type of object being
plotted: its recognition, treatment, and labelling of its axes.
Additional effects are possible within the piece of text used as
the taxon name. These are all detailed below in the description of
menu I. Certain taxa have particular input
requirements, and, if present, must be identified by
prefixes.
Other prefixes affect labelling and
presentation of axes. They can be omitted initially, and given
interactively by renaming the taxon (see
menu K).
Other effects can be achieved by inclusion of characters within
taxon names (see menu I).
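The sample and data conventions above (a trailing `x' to exclude a sample, and -1 for a missing data value) can be sketched in Python as follows. This is an illustration under stated assumptions, not psimpoll's own code: the function names are hypothetical, and the missing-value marker is fixed at the default of -1.

```python
MISSING = -1.0  # default missing-value marker (the manual says menu Bg can change it)


def parse_sample_token(token):
    """Return (value, excluded) for a sample entry such as '210' or '210x'."""
    # An excluded sample has 'x' directly after the number, with no space.
    excluded = token.endswith("x")
    value = float(token[:-1] if excluded else token)
    return value, excluded


def plottable(sample_tokens, data_values, missing=MISSING):
    """Pair samples with one taxon's data, dropping excluded samples and missing values."""
    kept = []
    for token, datum in zip(sample_tokens, data_values):
        value, excluded = parse_sample_token(token)
        if excluded or datum == missing:
            continue
        kept.append((value, datum))
    return kept
```

Given samples 200, 210x, 220 and 230 with data values 5, 7, -1 and 3, only the pairs for 200 and 230 remain.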
psimpoll can recognise and handle input files prepared in
TILIA-type format. These include those produced by TILIA itself (general
format) (Grimm 1992), and .p15 files
available from NOAA. These files
are distinguished by having a `#' as the first character of the first line.
They are generated from TILIA (ver. 2) by selecting [D] Save data file
from the TILIA (ver. 2) main menu, then [A] General format, and answering the
prompts for file name, dataset title, and number of decimal places for the
data.
psimpoll can also handle the ASCII files of raw pollen data
available from NOAA. It does so by calculating percentages from the raw data,
using TILIA's taxon category codes to construct a sum. The codes currently
defined in psimpoll (following those found in NOAA files) are:
- A
- Trees and shrubs;
- B
- Herbs;
- C
- Dwarf shrubs;
- D
- Equisetum;
- F
- Pteridophytes;
- G, M
- Sphagnum;
- J
- Parasitic plants;
- Q
- Aquatics;
- X
- Indeterminable grains.
The main pollen sum is constructed from A + B + C + D + F + J. The taxa in the
groups `Sphagnum', `Aquatics' and `Indeterminable grains' are
calculated as percentages of the main sum plus the sum of their group. These
sums are saved and included in the dataset, as the individual subsums, and as taxa called 'Sum ABCDFJ' and 'Sum GM'. They can be reused if the dataset is
saved in psimpoll format (menu J). Taxa with
category codes other than those listed above are left unchanged, and may not
plot correctly.
The table above can be used to edit the NOAA .ASC files
appropriately if the category codes used do not coincide with those above.
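The construction of the sums can be sketched in Python for a single sample. This is a hedged illustration, not psimpoll's implementation: the function and variable names are hypothetical, and it assumes that taxa within the main sum are expressed as percentages of the main sum alone, which the text above implies but does not state.

```python
MAIN_CODES = set("ABCDFJ")  # codes contributing to the main pollen sum
GROUPS = {"GM": set("GM"), "Q": {"Q"}, "X": {"X"}}  # groups outside the main sum


def percentages(counts):
    """counts maps taxon name -> (category code, raw count) for one sample."""
    main_sum = sum(n for code, n in counts.values() if code in MAIN_CODES)
    group_sums = {
        group: sum(n for code, n in counts.values() if code in codes)
        for group, codes in GROUPS.items()
    }
    result = {}
    for name, (code, n) in counts.items():
        if code in MAIN_CODES:
            base = main_sum  # assumption: main-sum taxa use the main sum alone
        else:
            group = next((g for g, codes in GROUPS.items() if code in codes), None)
            if group is None:
                result[name] = n  # unlisted code: value left unchanged
                continue
            base = main_sum + group_sums[group]  # main sum plus the group's own sum
        result[name] = 100.0 * n / base if base else 0.0
    return result
```

With a main sum of 100 grains and 25 Sphagnum spores (code G), Sphagnum is calculated on a base of 125 and so comes out at 20%.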
psimpoll also detects, reads and uses radiocarbon date information
from NOAA .ASC files, provided that this is detected in a standard
format:
# Radiocarbon dates:
# Depth Thk Age SDup SDlo Lab no. Basis Material
# 159.0 5.0 10170 60 60 TP-313 U Wood
# 212.0 6.0 10520 60 60 TP-456 U Charcoal
...
# Dating notes: Wood = pine
#
The phrase 'Radiocarbon dates:' is used to detect the start of the section,
and either 'Dating notes:' or a line with just '#' is used to detect the
end of the section. If missing, any values for 'Thk', 'SDup' or 'SDlo' are set to zero. 'Depth'
and 'Age' must be present.
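The detection rules can be sketched in Python as below. This is an illustration only: the function name and the dictionary keys are hypothetical, and because the text does not say how a missing 'Thk', 'SDup' or 'SDlo' appears in the file, the sketch assumes that all five numeric columns are present and skips any row where they are not.

```python
def read_radiocarbon_dates(lines):
    """Collect the radiocarbon dates from the lines of a NOAA .ASC file."""
    dates = []
    in_section = False
    for raw in lines:
        line = raw.strip()
        if not in_section:
            # The section starts at the phrase 'Radiocarbon dates:'.
            if "Radiocarbon dates:" in line:
                in_section = True
            continue
        # The section ends at 'Dating notes:' or at a line with just '#'.
        if "Dating notes:" in line or line == "#":
            break
        fields = line.lstrip("#").split()
        try:
            depth, thk, age, sd_up, sd_lo = (float(f) for f in fields[:5])
        except ValueError:
            continue  # column headings, '...', or a row with missing columns
        dates.append({
            "depth": depth, "thk": thk, "age": age,
            "sd_up": sd_up, "sd_lo": sd_lo,
            "lab_no": fields[5] if len(fields) > 5 else "",
            "basis": fields[6] if len(fields) > 6 else "",
            "material": " ".join(fields[7:]),
        })
    return dates
```

Applied to the example above, the sketch returns two dates, the first at depth 159.0 with age 10170 and laboratory number TP-313.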
psimpoll looks for optional associated files for information to
include on the diagram and to control writing of the diagram.
The names of these are normally derived from the main input
file name, and, by default, each should be on the same disc and in the same
directory as the dataset file, have the same first four characters of the
filename, with the next characters indicating the type of the information
file.
Two files save the state of psimpoll in its current run for
optional re-use in a later run. One, with the filename extension .CFG,
holds the current configuration of modifiable variables. The second,
with the extension .SCR, saves the
details of interactive plotting. In both cases, the filename extension
replaces any extension on the main input
filename, and the rest of the file name is as the main input file.
Finally, a file called psimpoll.COL contains the
details of available colours.
Follow the links below for file formats and examples.
- fileTS
- Troels-Smith lithological symbols
(Example);
- fileC14
- Radiocarbon-ages
(Example);
- fileCAL
- Control file for calibrated ages
(Example);
- xxxx
- Calibrated ages (from BCal)
(Example);
- fileZONE
- Zonation data for inclusion in labelled boxes at the end of the
diagram (Example);
- file.CFG
- A complete list of settings from the menu.
When it is read in, any filenames
that specify a drive letter are changed to have the same drive
letter as the main input file. This file is read by default,
but may be written from menu O
(Example);
- file.SCR
- A record of the taxa and options selected from
interactive plotting. Written by default, but read for re-use
by choosing menu P
(Example)
- psimpoll.COL
- A colour palette for psimpoll, containing
colour definitions on the CMYK system, referenced to an identifier
and a name. Colours in the palette are accessed by name in the
TS file, the ZONE file,
menu C, and
interactively. They may also be accessed
by the identifier from the interactive menu,
for brevity.
Thus, a file called a:\dallp could have associated with it the
files a:\dallTS, a:\dallC14, a:\dallZONE,
a:\dallp.CFG, a:\dallp.SCR and psimpoll.COL.
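The default naming rule can be sketched in Python. The function name is hypothetical, and the sketch covers only the default names described here, not any names changed through the menus. It uses PureWindowsPath so that the a:\ example is handled the same way on any platform.

```python
from pathlib import PureWindowsPath


def associated_files(main_input):
    """Derive the default associated file names from the main input file name."""
    path = PureWindowsPath(main_input)
    first_four = path.name[:4]
    # Information files: first four characters of the name plus a type indicator.
    names = {
        kind: str(path.with_name(first_four + kind))
        for kind in ("TS", "C14", "CAL", "ZONE")
    }
    # State files: the extension replaces any extension on the main input name.
    for extension in (".CFG", ".SCR"):
        names[extension] = str(path.with_suffix(extension))
    return names
```

Calling associated_files(r"a:\dallp") reproduces the names listed above, for example a:\dallTS and a:\dallp.CFG. The colour palette is not derived from the input name, so psimpoll.COL is left out.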
I recommend using the first four characters of a filename to indicate site
(e.g., dall for `Dallican Water'), and the fifth letter to
indicate data type (e.g., p for percentages, c
for concentrations, a for accumulation rates, and b for a
concentration dataset which is to be converted to accumulation rates).
The same
set of associated files will then work for all types of diagrams from the same
site, except that the .CFG and .SCR files are specific to
the main
input file. The colour palette file can be made specific to a site, or
kept general. Avoid filename suffixes. psimpoll ignores the
TS file when datasets are presented by age.
The names of associated files (except for .CFG, .SCR, and
psimpoll.COL) can be changed (menu E).
psimpoll output may be written to up to 9 different output files.
All are text files and can be read by any text editor or word processor
(but beware of saving them in anything other than text format). The two
PostScript files can be sent directly to PostScript devices for display
or printing. The files are:
- .ps
- main output file: this is written in PostScript
page description language, and contains all the instructions
for plotting the diagram. It is not intended for reading by
mere humans, but some details of the format of this file,
for the curious, are given in
Technical matters;
- AD
- age-depth conversion file: this contains the details
of any
age-depth conversion, with confidence interval estimates for ages and
gradients when these have been calculated. The default file name used is
fileAD where `file' is the first four characters in the main
input filename. A different name can be given in
menu Eb;
- AD.ps
- age-depth plotted output file: this contains a
graphical representation (PostScript format) of the details
of any
age-depth conversion, with confidence interval estimates for ages
when these have been calculated. The default file name used is
fileAD.ps where `file' is the first four characters in the main
input filename. A different name can be given in
menu Ei;
- STAT
- data analysis file: this may contain the results of
statistical analyses: zonation, principal components
analysis, Fourier analysis, and independent splitting of taxa.
In some cases this
output supplements or repeats information that is also given on
screen (e.g., zonation). In other cases, all the results are in
this file (e.g., Fourier analysis). The default file name used is
fileSTAT where file is the first four characters
of the main input file name. A different name can be given in
menu Ec;
- DAT
- data output file: contains output of the
current dataset, incorporating any changes made during the current
run of psimpoll (e.g., conversion of depths to ages).
See menu J. The default file name used is
fileDAT where file is the first four characters
of the main input file name. A different name can be given in
menu Eh;
- .CFG
- configuration file: see above;
- .SCR
- script file: see above;
- ZONE
- zonation file: zonation output for the desired number of zones,
in a format for immediate input and
use by psimpoll, is written to this file. An
example is available.
- .LOG
- log file (psimpoll.LOG): written to the same directory
as the executable file, with details of some of
the parameters used by psimpoll during a run. This file
is mainly intended to help KDB track errors. It is overwritten
with each run.
pscomb has two output files. The main output file has the same format
as the psimpoll main output file. Additionally, there is a log file
(pscomb.LOG) for the same purpose as the psimpoll log file.
Copyright © 1995-2007 K.D. Bennett
Archaeology and Palaeoecology | 42 Fitzwilliam St | Belfast BT9 6AX | Northern Ireland | tel +44 28 90 97 5136
Archaeology and Palaeoecology | The 14Chrono Centre | URL http://www.qub.ac.uk/arcpal/ |