QUB | Archaeology and Palaeoecology | The 14Chrono Centre

Manual for psimpoll and pscomb

Data preparation

psimpoll is basically a plotting program. Calculations of percentages and concentrations (but not necessarily accumulation rates), and most other things must be done beforehand. All input files should be in plain text format, but how they are produced is irrelevant to psimpoll.

pscomb receives as input files that have been previously output by psimpoll. There is therefore no data preparation, except the running of psimpoll. The main points to bear in mind when running psimpoll if output will be used in pscomb are:

Simple use

The easiest way to get started is simply to ignore most of this documentation. Prepare a small dataset, with any data (real or imagined, using or copying from the example main input file). Get psimpoll running, with the command psimpoll filename. psimpoll will run and create a file called filename.PS. Then send filename.PS to a PostScript printer, as described under Running psimpoll.

Format of main input file

Datasets are structured around `samples' and `taxa'. A `sample' is the basic unit of study, within which various attributes have been measured. Samples may be stratigraphically related to each other, in which case the term `level' is often used for them. The measured attributes are `taxa', and they may be concrete (e.g., the abundance of Betula pollen), or abstract (e.g., the rate-of-change of something or other). The `number of taxa' on line 2 must include them all, except rare cases indicated explicitly below. Any text in an input file may include certain options (for oblique style, accents, superscripts, etc): these are described fully under menu I.

The order of entries in a file is important, as described below, but the exact layout of the file is less important. In general, text items (such as taxon names) are considered to be ended by either a comma (',') or the end of a line. Numerical values, such as the data themselves, are generally ended by either one or more spaces or the end of the line. TAB characters have the same effect as spaces. It is thus possible to have either of two basic layouts: one item per line, giving a file with one column, or one taxon per line, beginning with the name, followed by a comma, and then all the numerical values for that taxon separated by spaces. Each layout has its uses, depending on where the data come from. All intermediate combinations of these basic layouts are possible, as long as each item is correctly ended (with comma, space, newline, TAB, as appropriate).
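The two basic layouts can be illustrated with a short Python sketch. It is not part of psimpoll: the function names are hypothetical, and the 39-character limit is the one given for taxon names under Items 6 onwards. It only shows how a taxon name and its values might be written so that each item is correctly ended.

```python
MAX_NAME_LENGTH = 39  # maximum taxon name length given in this manual


def check_name(name):
    """Reject names that would be split or truncated when read back."""
    if "," in name:
        raise ValueError("a comma would end the taxon name early")
    if len(name) > MAX_NAME_LENGTH:
        raise ValueError("taxon name longer than 39 characters")


def taxon_line(name, values):
    """One taxon per line: name, a comma, then values separated by spaces."""
    check_name(name)
    return name + ", " + " ".join(str(v) for v in values)


def taxon_column(name, values):
    """One item per line: the name, then each value on its own line."""
    check_name(name)
    return "\n".join([name] + [str(v) for v in values])


print(taxon_line("Betula", [12.5, 30.1, 8.0]))
print(taxon_column("Betula", [12.5, 30.1, 8.0]))
```

Either form, or a mixture of the two, should be acceptable as long as names end with a comma or newline and numbers end with a space, TAB, or newline.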

Item 1
A title, any reasonable number of characters, on a line of its own. Must not begin with the character `#', to prevent confusion with TILIA-type files.
Item 2
Number of taxa in the dataset.
Item 3
Number of samples in the dataset.
Item 4
A two-character code (first character and second character).
Items 5 onwards
Depths, ages, sample numbers, or sample labels. Labels, if used (option `t'), should be presented one per line, or separated by commas, to a maximum length of 39 characters per label. Numbers can be separated by spaces or presented one per line, and may be positive or negative. Following the usual convention of measuring depths from the top of a sequence downwards, positive values increase down the column, and negative values increase upwards. If the first character of the 2-character code (Item 4) is `m', the sample value should be followed by the sample thickness, in the same units.
Items 6 onwards
The data, presented by taxa (i.e., taxon name (maximum 39 characters), followed by values for each sample for that taxon). When using the `x' option (Item 4), the taxon name must be followed by a label for the units (see above): this can be a space character (i.e., no unit). The data can be presented with each item on a separate line, or with the taxon name followed by a comma, label (if any) followed by a comma, then the data values separated by spaces.
Items 7-14
optional additional data: give one value per sample, except that a single value prefixed by `-' indicates that the value applies to all samples. Items 7-13 are needed for concentration confidence intervals: Item 14 is additional information for accumulation rate confidence intervals.
Item 7
marker pollen suspension (tablet) concentration.
Item 8
if known, sample standard deviation of marker concentration; otherwise enter `1', and the sample standard deviation is calculated as the square root of the marker concentration (this assumes a Poisson distribution in marker suspensions).
Item 9
volume of marker suspension added or the number of tablets.
Item 10
if known, sample standard deviation of marker suspension added; otherwise enter 1, and the standard deviation is calculated as 0.8% of the volume added. For tablets, enter 0.
Item 11
marker grain count for each sample (must be >100 for calculation of confidence intervals).
Item 12
volume of sediment added.
Item 13
if known, sample standard deviation of sediment volume; otherwise enter 1, and the standard deviation is calculated as 1% of the sediment volume.
Item 14
if known, height of the sediment sample along the core; otherwise enter 1, and it will be assumed to be the cube root of the sediment volume. For a sampler with a circular cross-section, enter the diameter of the sampler.
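The default values described for Items 8, 10, 13 and 14 can be written out as a short Python sketch. This only restates the rules given above; it is not psimpoll's own code, and the function names are hypothetical.

```python
import math


def default_marker_concentration_sd(marker_concentration):
    # Item 8: square root of the marker concentration (Poisson assumption).
    return math.sqrt(marker_concentration)


def default_marker_volume_sd(volume_added):
    # Item 10: 0.8% of the volume of marker suspension added.
    return 0.008 * volume_added


def default_sediment_volume_sd(sediment_volume):
    # Item 13: 1% of the sediment volume.
    return 0.01 * sediment_volume


def default_sample_height(sediment_volume):
    # Item 14: cube root of the sediment volume.
    return sediment_volume ** (1.0 / 3.0)


print(default_marker_concentration_sd(10000.0))
print(default_marker_volume_sd(1.0))
print(default_sediment_volume_sd(2.0))
print(default_sample_height(8.0))
```

These functions are useful mainly for checking what the program will assume when a default is requested, so that known values can be supplied instead where they differ.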

Options within main input file

Several optional effects can be achieved by suitable editing of the dataset.

Input from a TILIA-type file

psimpoll can recognise and handle input files prepared in TILIA-type format. These include those produced by TILIA itself (general format) (Grimm 1992), and .p15 files available from NOAA. These files are distinguished by having a `#' as the first character of the first line. They are generated from TILIA (ver. 2) by selecting [D] Save data file from the TILIA (ver. 2) main menu, then [A] General format, and answering the prompts for file name, dataset title, and number of decimal places for the data. psimpoll can also handle the ASCII files of raw pollen data available from NOAA. It does so by calculating percentages from the raw data, using TILIA's taxon category codes to construct a sum. The codes currently defined in psimpoll (following those found in NOAA files) are:
A
Trees and shrubs;
B
Herbs;
C
Dwarf shrubs;
D
Equisetum;
F
Pteridophytes;
G, M
Sphagnum;
J
Parasitic plants;
Q
Aquatics;
X
Indeterminable grains;
The main pollen sum is constructed from A + B + C + D + F + J. The taxa in the groups `Sphagnum', `Aquatics' and `Indeterminable grains' are calculated as percentages of the main sum plus the sum of their group. These sums are saved and included in the dataset, as the individual subsums, and as taxa called 'Sum ABCDFJ' and 'Sum GM'. They can be reused if the dataset is saved in psimpoll format (menu J). Taxa with category codes other than those listed above are left unchanged, and may not plot correctly.

The table above can be used to edit the NOAA .ASC files appropriately if the category codes used do not coincide with those above.
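The percentage calculation described above can be sketched in Python for a single sample. This is an illustration, not psimpoll's implementation: the function and variable names are hypothetical, and it assumes that taxa inside the main sum are expressed as percentages of the main sum itself, which the text implies but does not state.

```python
MAIN_SUM_CODES = set("ABCDFJ")
# Groups outside the main sum; codes G and M share one group (Sphagnum).
EXTRA_GROUPS = {"G": "GM", "M": "GM", "Q": "Q", "X": "X"}


def percentages(counts, codes):
    """counts: {taxon: raw count}; codes: {taxon: category code}."""
    main_sum = sum(n for t, n in counts.items() if codes[t] in MAIN_SUM_CODES)
    group_sums = {}
    for taxon, n in counts.items():
        group = EXTRA_GROUPS.get(codes[taxon])
        if group is not None:
            group_sums[group] = group_sums.get(group, 0) + n

    result = {}
    for taxon, n in counts.items():
        code = codes[taxon]
        if code in MAIN_SUM_CODES:
            denominator = main_sum  # assumption: main-sum taxa use the main sum
        elif code in EXTRA_GROUPS:
            # Main sum plus the sum of the taxon's own group.
            denominator = main_sum + group_sums[EXTRA_GROUPS[code]]
        else:
            result[taxon] = n  # other codes are left unchanged
            continue
        result[taxon] = 100.0 * n / denominator if denominator else 0.0
    return result


counts = {"Betula": 150, "Poaceae": 50, "Sphagnum": 50}
codes = {"Betula": "A", "Poaceae": "B", "Sphagnum": "G"}
print(percentages(counts, codes))
```

In this example the main sum is 200, so Sphagnum is calculated against 200 + 50 = 250.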

psimpoll also detects, reads and uses radiocarbon date information from NOAA .ASC files, provided that this is detected in a standard format:

# Radiocarbon dates:
#   Depth   Thk     Age   SDup   SDlo  Lab no.    Basis  Material
#   159.0   5.0   10170     60     60  TP-313       U    Wood
#   212.0   6.0   10520     60     60  TP-456       U    Charcoal

...

# Dating notes:   Wood = pine
#

The phrase 'Radiocarbon dates:' is used to detect the start of the section, and either 'Dating notes:' or a line containing just '#' is used to detect the end of the section. If missing, any values for 'Thk', 'SDup' or 'SDlo' are set to zero. 'Depth' and 'Age' must be present.
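The detection rules can be illustrated with a minimal Python sketch. It is not psimpoll's reader: the function name is hypothetical, and for simplicity it assumes that all five numeric columns are present on every data line, so the handling of missing 'Thk', 'SDup' or 'SDlo' values is not attempted here.

```python
def read_radiocarbon_dates(lines):
    """Return one dict per data row of the 'Radiocarbon dates:' section."""
    dates = []
    in_section = False
    for line in lines:
        text = line.strip()
        if not in_section:
            # The section starts at the line containing this phrase.
            in_section = "Radiocarbon dates:" in text
            continue
        if "Dating notes:" in text or text == "#":
            break  # end of the section
        fields = text.lstrip("#").split()
        if len(fields) < 5:
            continue  # blank or incomplete line (assumption: skip it)
        try:
            depth, thk, age, sd_up, sd_lo = (float(f) for f in fields[:5])
        except ValueError:
            continue  # the column header, or any other non-numeric line
        dates.append({"depth": depth, "thk": thk, "age": age,
                      "sd_up": sd_up, "sd_lo": sd_lo})
    return dates


example = [
    "# Radiocarbon dates:",
    "#   Depth   Thk     Age   SDup   SDlo  Lab no.    Basis  Material",
    "#   159.0   5.0   10170     60     60  TP-313       U    Wood",
    "#   212.0   6.0   10520     60     60  TP-456       U    Charcoal",
    "# Dating notes:   Wood = pine",
    "#",
]
for row in read_radiocarbon_dates(example):
    print(row)
```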

Associated files

psimpoll looks for optional associated files for information to include on the diagram and to control writing of the diagram. The names of these are normally derived from the main input file name, and, by default, each should be on the same disc and in the same directory as the dataset file, have the same first four characters of the filename, with the next characters indicating the type of the information file.

Two files save the state of psimpoll in its current run for optional re-use in a later run. One, with the filename extension .CFG, holds the current configuration of modifiable variables. The second, with the extension .SCR, saves the details of interactive plotting. In both cases, the filename extension replaces any extension on the main input filename, and the rest of the file name is the same as that of the main input file.

Finally, a file called psimpoll.COL contains the details of available colours.

Follow the links below for file formats and examples.

fileTS
Troels-Smith lithological symbols (Example);
fileC14
Radiocarbon-ages (Example);
fileCAL
Control file for calibrated ages (Example);
xxxx
Calibrated ages (from BCal) (Example);
fileZONE
Zonation data for inclusion in labelled boxes at the end of the diagram (Example);
file.CFG
A complete list of settings from the menu. When it is read in, any filenames that specify a drive letter are changed to have the same drive letter as the main input file. This file is read by default, but may be written from menu O (Example);
file.SCR
A record of the taxa and options selected from interactive plotting. Written by default, but read for re-use by choosing menu P (Example)
psimpoll.COL
A colour palette for psimpoll, containing colour definitions on the CMYK system, referenced to an identifier and a name. Colours in the palette are accessed by name in the TS file, the ZONE file, menu C, and interactively. They may also be accessed by the identifier from the interactive menu, for brevity.
Thus, a file called a:\dallp could have associated with it the files a:\dallTS, a:\dallC14, a:\dallZONE, a:\dallp.CFG, a:\dallp.SCR and psimpoll.COL. I recommend using the first four characters of a filename to indicate site (e.g., dall for `Dallican Water'), and the fifth letter to indicate data type (e.g., p for percentages, c for concentrations, a for accumulation rates, and b for a concentration dataset which is to be converted to accumulation rates). The same set of associated files will then work for all types of diagrams from the same site, except that the .CFG and .SCR files are specific to the main input file. The colour palette file can be made specific to a site, or kept general. Avoid filename suffixes. psimpoll ignores the TS file when datasets are presented by age.

The names of associated files (except for .CFG, .SCR, and psimpoll.COL) can be changed (menu E).
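The default naming scheme can be sketched in Python as follows. This is only an illustration of the rules described above, with a hypothetical function name; it uses Windows-style paths to match the a:\dallp example, and it leaves out psimpoll.COL because that name does not depend on the main input file.

```python
from pathlib import PureWindowsPath


def associated_files(main_input):
    """Default associated file names derived from the main input file name."""
    path = PureWindowsPath(main_input)
    prefix = path.name[:4]  # first four characters of the file name
    names = {}
    for suffix in ("TS", "C14", "CAL", "ZONE"):
        # Same drive and directory, first four characters, then the type.
        names[suffix] = str(path.with_name(prefix + suffix))
    # .CFG and .SCR replace any extension on the full main input file name.
    names[".CFG"] = str(path.with_suffix(".CFG"))
    names[".SCR"] = str(path.with_suffix(".SCR"))
    return names


for kind, name in associated_files(r"a:\dallp").items():
    print(kind, name)
```

Running this for a:\dallp and for a:\dallc gives the same TS, C14, CAL and ZONE names but different .CFG and .SCR names, which is why the first four can be shared between diagram types for one site.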

Output files

psimpoll output may be written to up to 9 different output files. All are text files and can be read by any text editor or word processor (but beware of saving them in anything other than text format). The two PostScript files can be sent directly to PostScript devices for display or printing. The files are:
.ps
main output file: this is written in PostScript page description language, and contains all the instructions for plotting the diagram. It is not intended for reading by mere humans, but some details of the format of this file, for the curious, are given in Technical matters;
AD
age-depth conversion file: this contains the details of any age-depth conversion, with confidence interval estimates for ages and gradients when these have been calculated. The default file name used is fileAD where `file' is the first four characters in the main input filename. A different name can be given in menu Eb;
AD.ps
age-depth plotted output file: this contains a graphical representation (PostScript format) of the details of any age-depth conversion, with confidence interval estimates for ages when these have been calculated. The default file name used is fileAD.ps where `file' is the first four characters in the main input filename. A different name can be given in menu Ei;
STAT
data analysis file: this may contain the results of statistical analyses, zonation, principal components analysis, Fourier analysis, and independent splitting of taxa. In some cases this output supplements or repeats information that is also given on screen (e.g., zonation). In other cases, all the results are in this file (e.g., Fourier analysis). The default file name used is fileSTAT where `file' is the first four characters of the main input file name. A different name can be given in menu Ec;
DAT
data output file: contains output of the current dataset, incorporating any changes made during the current run of psimpoll (e.g., conversion of depths to ages). See menu J. The default file name used is fileDAT where file is the first four characters of the main input file name. A different name can be given in menu Eh;
.CFG
configuration file: see above;
.SCR
script file: see above;
ZONE
zonation file: zonation output for the desired number of zones, in a format for immediate input and use by psimpoll, is written to this file. An example is available.
.LOG
log file (psimpoll.LOG): written to the same directory as the executable file, with details of some of the parameters used by psimpoll during a run. This file is mainly intended to help KDB track errors. It is overwritten with each run.
pscomb has two output files. The main output file has the same format as the psimpoll main output file. Additionally, there is a log file (pscomb.LOG), which serves the same purpose as the psimpoll log file.
Copyright © 1995-2007 K.D. Bennett
