Manual for psimpoll and pscomb

QUB | Archaeology and Palaeoecology | The 14Chrono Centre


Technical matters

Programming language
Computers
PostScript
Fonts and character encoding
Compatibility
Page layout
Known problems and limitations

Programming language

psimpoll and pscomb are written in ANSI C. The C language dates from about 1972, when it was first developed at Bell Laboratories by Dennis Ritchie. Several dialects appeared subsequently, but the language is now standardized according to the recommendations of the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO). In principle, therefore, the source code can be compiled to run on any hardware that has an ANSI-conforming C compiler. The source code was prepared as text files on an IBM PC, and has been compiled using the strict ANSI-compliant options of the following compilers:

Borland Turbo C++ 3.0 on IBM-compatible PC
Free Software Foundation gcc under SunOS 4.1 UNIX on Sun4m
cc under IRIX 5.1 UNIX on IP22
cc under SCO Development System on IBM-compatible PC
Symantec THINK C on Macintosh
Linux 2.0.18 - 2.2.24
Mac OS X 10.1-10.3

The use of ANSI C rather than QuickBASIC (the language of psimpoll versions before 2.00) produces a smaller executable file that runs faster, with source code that is portable.

Computers

psimpoll has been installed successfully under DOS on IBM PS/2 30, Viglen 386SX20, Viglen 486DX266, Elonex 386 SX-T, Toshiba T1200XE, and several Pentium PCs; on the Macintosh (Mac Plus, Mac Classic, Mac LC II, and Performa computers); and under UNIX on Sun, IRIX, and AIX machines. It has also been installed to run under UNIX through SCO Open Desktop / Open Server on a Viglen 486DX266, and under Linux on Pentium-based machines. I would be interested to hear of other installations.

PostScript

The plotting of pollen diagrams has always been a highly individual process, dependent on locally available hardware and software. Because different printers and plotters run from different sets of control sequences, it has not been possible to transport the information needed to plot a pollen diagram except as the data itself or as a finished, camera-ready diagram. In principle, two users with identical hardware and software could exchange a plotter file, but in practice this is rarely done.

The PostScript language provides a way to describe any piece of graphics or text in a text file (with ASCII coding) that can be read and interpreted on many printers (especially, but not exclusively, laser printers). Software packages now often include a PostScript driver which can either send output directly to the printer/plotter, or generate an ASCII file with the commands to generate the work on any output device with a PostScript interpreter. Since a driver is available in many packages, and the interpreter is available in many output devices, the ASCII file provides a useful way to link the two without both being present at the same site. As an ASCII file, a PostScript file can be easily moved by e-mail.

PostScript is probably most readily available, as standard, in Apple LaserWriter printers, but can be added to many other printers by purchasing an additional card. The Apple LaserWriter is an excellent laser printer, and it can be connected to PC-compatibles through a COM port, but it may be necessary to tinker with the wiring to overcome the incompatibility between Apple and IBM hardware (see, for example, Glover 1989, Fig. 5).

PostScript is a `simple interpretative programming language' (Adobe 1990, pp. 345-678), which describes the appearance of a page, whether text, graphics, or both. Adobe own the copyright in the operators and specification of PostScript, but allow anyone to use them under certain conditions. A typical file will have a header, such as %!PS-Adobe-3.0, definitions of sundry parameters for the page (e.g., font, character size), and definitions of operators (to reduce repetition in the body of the page). %% followed by a keyword introduces one of a standard set of `document structuring conventions', which the interpreter treats as comments. Campbell (1993) gives a brief introduction to some commonly used commands. Operands precede their operators, so (ABC) show prints ABC at the current location, and showpage prints the current page and prepares for the next. For graphics, 20 40 lineto adds one segment to a `path', running from the current point (e.g., 0 0, after 0 0 moveto) to 20 40 in the current co-ordinates. Text and graphics alike can be moved around on the page, rotated, and scaled.

PostScript files are ASCII, so they can be edited using an appropriate editor (e.g., EDIT in DOS 5, or the `non-document' mode of WordStar), and I do this with text files to include special characters, but there is a limit to what can be achieved by this kind of fiddling. More significant is the possibility of writing PostScript drivers in any programming language, and thus producing output from almost any software that can be printed on any output device with a PostScript interpreter.
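As an illustration of how small such a driver can be, the following sketch in C (the language psimpoll itself is written in) builds the text of a one-page PostScript file using the constructs described above: a header, document structuring comments, a path segment drawn with lineto, a string printed with show, and showpage. The function name ps_minimal_page, its buffer interface, and the choice of Helvetica at 12 points are assumptions made for the example; this is not the output code of psimpoll.

```c
#include <string.h>

/* A one-page PostScript program. Adjacent string literals are joined by
   the compiler, so comments can annotate each line. */
static const char minimal_page[] =
    "%!PS-Adobe-3.0\n"                           /* header identifying the file */
    "%%Pages: 1\n"                               /* document structuring comments */
    "%%EndComments\n"
    "newpath 0 0 moveto 20 40 lineto stroke\n"   /* one path segment, 0 0 to 20 40 */
    "/Helvetica findfont 12 scalefont setfont\n" /* font and size (assumed) */
    "72 72 moveto (ABC) show\n"                  /* operands precede the operator */
    "showpage\n"                                 /* print the page, prepare the next */
    "%%EOF\n";

/* Copy the program into buf (capacity n, including the terminating '\0').
   Returns its length in characters, or -1 if buf is too small. */
int ps_minimal_page(char *buf, size_t n)
{
    size_t len = strlen(minimal_page);
    if (len + 1 > n)
        return -1;
    memcpy(buf, minimal_page, len + 1);   /* include the terminator */
    return (int)len;
}
```

Writing the buffer to a file (e.g., with fwrite) gives a file that can be sent to any PostScript printer or viewer.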

Use of PostScript thus has considerable potential for improving the ready portability of pollen diagrams, whether for publication or as part of information exchange, and that is why psimpoll and pscomb produce their output as PostScript files.

Fonts and character encoding

Characters are stored in computer systems as numbers, usually integers less than 256. The coding is reasonably stable for numbers less than 128, using the ASCII system, although other codings do exist. From 128 upwards, matters become complex, and vary between systems. PostScript can use several types of encoding, of which two are important for text characters. The default is `Adobe Standard' encoding, and the other is `ISO Latin 1' encoding. These produce identical output characters for numbers below 128 (except that a hyphen in `Standard' is a dash in `ISO Latin 1'), but differ for higher numbers. psimpoll uses ISO Latin 1 encoding, because this has most of the accented characters needed, but Standard encoding is needed for the Polish suppressed-l and French ligature characters (see Table).

The opening section of the PostScript file defines a series of fonts (F1, F2, etc.) in terms of the font selected by the user, converting the encoding from the default into ISO Latin 1. Another series of fonts (Fa, Fb, etc.) is then defined without conversion, and thus uses the default Standard encoding. When a piece of text is written, one of the fonts is selected, and this will nearly always be from the F1 series. If a Polish suppressed-l or French ligature is needed, however, the appropriate font from the other series is used automatically.
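The two font series can be sketched as follows. The first function writes the common PostScript idiom for re-encoding a font: copy the font dictionary (except its FID entry), replace its Encoding with ISOLatin1Encoding, and register the copy under a new name. The second decides whether a glyph must come from the unconverted series. This is a hypothetical sketch, not psimpoll's code: the function names are invented, the interpreter is assumed to provide ISOLatin1Encoding (as Level 2 interpreters do), and the glyph list covers only the Polish and French examples mentioned above.

```c
#include <stdio.h>
#include <string.h>

/* Write into buf PostScript that defines new_name as a copy of base_font
   whose encoding is ISO Latin 1 instead of the default Standard encoding.
   Returns the length written, or -1 if buf (capacity n) is too small. */
int ps_reencode_latin1(char *buf, size_t n,
                       const char *base_font, const char *new_name)
{
    int len = snprintf(buf, n,
        "/%s findfont\n"
        "dup length dict begin\n"
        /* copy every entry except FID, which definefont will recreate */
        "  {1 index /FID ne {def} {pop pop} ifelse} forall\n"
        "  /Encoding ISOLatin1Encoding def\n"
        "  currentdict\n"
        "end\n"
        "/%s exch definefont pop\n",
        base_font, new_name);
    return (len < 0 || (size_t)len >= n) ? -1 : len;
}

/* Glyphs present in Standard encoding but absent from ISO Latin 1; text
   needing them must use the unconverted (Fa, Fb, ...) series.
   Assumed list: Polish suppressed-l and the French oe ligature. */
static const char *const standard_only[] = { "Lslash", "lslash", "OE", "oe" };

/* Returns 1 if glyph (a PostScript glyph name) needs the Standard series. */
int needs_standard_series(const char *glyph)
{
    size_t i;
    for (i = 0; i < sizeof standard_only / sizeof standard_only[0]; i++)
        if (strcmp(glyph, standard_only[i]) == 0)
            return 1;
    return 0;
}
```

A font such as F1 could then be defined as a procedure that selects the re-encoded copy at a given size, while Fa selects the original, unconverted font.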

The exact amount of space taken up by printed characters must be known in order to calculate the space needed for text in labels (e.g., zone labels). This is done with data from Adobe font metrics (AFM) files, using the versions included with WordStar 5 and Sun UNIX, and those available by anonymous ftp from unix.hensa.ac.uk. These files also give the heights of characters, which are used to position certain accents correctly.
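A minimal sketch of how AFM data can be used for this, assuming lines of the form `C 65 ; WX 667 ; N A ; ...' from the character metrics section of an AFM file, where C is the character code and WX the width in units of 1/1000 of the font size (the numbers here are only illustrative). The function names are hypothetical, and only the width is read; heights would come from the bounding-box (B) field. The last function converts points to units of 0.01 mm, as used for page locations.

```c
#include <stdio.h>

/* Parse the code (C) and width (WX) from an AFM character-metrics line.
   Returns 1 on success, 0 if the line does not start with that pattern. */
int afm_read_width(const char *line, int *code, int *wx)
{
    return sscanf(line, "C %d ; WX %d", code, wx) == 2;
}

/* Store the width for a single-byte code; unencoded glyphs (C -1) are skipped. */
int afm_store_width(const char *line, int widths[256])
{
    int code, wx;
    if (!afm_read_width(line, &code, &wx) || code < 0 || code > 255)
        return 0;
    widths[code] = wx;
    return 1;
}

/* Width of s in points: AFM widths are in 1/1000 of the font size. */
double text_width_pt(const char *s, const int widths[256], double size_pt)
{
    long total = 0;                          /* summed widths, 1/1000 em */
    for (; *s != '\0'; s++)
        total += widths[(unsigned char)*s];
    return total * size_pt / 1000.0;
}

/* Convert points (1/72 inch) to units of 0.01 mm: 1 pt = 2540/72 units. */
double pt_to_hundredths_mm(double pt)
{
    return pt * 2540.0 / 72.0;
}
```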

Many accented characters are available as single characters in ISO Latin 1 encoding, but others need to be created by combining the unaccented character with the accent (e.g., the long Hungarian umlaut). Positioning the accent accurately requires information on the widths of the character and the accent, the height of the character, and the height of the base of the accent when printed normally. With these data, the character is printed, then the current position of the `pen' is `back-spaced' to the point where it should start drawing the accent centred over (or under) the character, raising the `pen' position if necessary so that the accent will not overprint the character. The accent is then drawn, and the pen is returned to the position it had after drawing the character.
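The arithmetic of the back-spacing can be sketched as follows, for an accent placed over a character. All quantities are assumed to be in the same units (e.g., AFM values scaled to the font size), and gap, the minimum clearance between character and accent, is a hypothetical parameter; accents placed under a character would need the analogous calculation downwards.

```c
/* Relative moves used to draw an accent centred over a character that has
   just been shown. All values share one unit (e.g. AFM units x size / 1000). */
typedef struct {
    double back_x;   /* x move after the character: back-space to accent start */
    double raise_y;  /* y move before the accent: lift so it clears the character */
    double fwd_x;    /* x move after the accent: return to end of character */
    double drop_y;   /* y move after the accent: undo the lift */
} AccentMove;

/* char_w, accent_w: advance widths; char_h: top of the character;
   accent_base: bottom of the accent when printed normally;
   gap: minimum clearance wanted between them (hypothetical parameter). */
AccentMove accent_move(double char_w, double accent_w,
                       double char_h, double accent_base, double gap)
{
    AccentMove m;
    /* After show the pen is at start + char_w; a centred accent starts at
       start + (char_w - accent_w) / 2, so move back by (char_w + accent_w) / 2. */
    m.back_x = -(char_w + accent_w) / 2.0;
    /* Lift only if the accent would otherwise come closer than gap. */
    m.raise_y = (char_h + gap > accent_base) ? char_h + gap - accent_base : 0.0;
    /* After the accent the pen is at start + (char_w + accent_w) / 2. */
    m.fwd_x = (char_w - accent_w) / 2.0;
    m.drop_y = -m.raise_y;
    return m;
}
```

In PostScript terms, the character is shown, `back_x raise_y rmoveto' positions the pen, the accent is shown, and `fwd_x drop_y rmoveto' returns the pen to where it would have been after the character alone.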

The fonts within each of the two series differ in size (title size, basic, and subscript / superscript) and in style (roman or oblique).

psimpoll uses the `Symbol' font to obtain special characters, notably the circle-and-cross character in radiocarbon age columns, and the letters of the Greek alphabet.

Compatibility

The format of the main data files has not changed from previous versions of psimpoll, and the C14, TS, and ZONE associated files are also unchanged. The ZONA file has been abandoned from psimpoll version 2.21 onwards, since psimpoll can now calculate zones itself and generate a zonation file with age data included. However, the scope of the .CFG file has been extended considerably, so .CFG files from old versions will no longer work, and may cause the program to crash, or at least to behave in unexpected ways. .CFG files created by versions of psimpoll earlier than 2.20 should be deleted before running the current version. pscomb requires input from PostScript files created by psimpoll version 2.20 or later.

Page layout

In PostScript's default co-ordinate system, the origin (0, 0) is at the bottom left corner of a page held vertically (`portrait'), and the units are 1/72 inch. Diagrams produced with psimpoll are laid out along the long axis of the page, so the diagram `origin' is the upper left corner when the page is held horizontally (`landscape').

It is assumed by default that the paper being used is A4, 297 mm long and 210 mm wide (this can be changed in menu Ae). Locations on the page are measured in units of 0.01 mm, so the PostScript file needs a scale factor to convert them to its default units. The default usable area of the page is 240 mm long and 240/√2 = 169.7 mm high, centred on the page. The length can be changed in menu Ac; the height is adjusted automatically to maintain the same proportion, but can also be changed directly (menu Ad). For US letter paper, 11" × 8.5", the usable paper length should be given as 226 mm, and the usable height as 123 mm.
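The page arithmetic can be checked with a short sketch. One unit of 0.01 mm is (0.01/25.4) × 72 = 72/2540 of a PostScript default unit, so a file using these units could scale its co-ordinates with `72 2540 div dup scale'. The function below centres the usable area on the page and applies the default height of length/√2; the names and the convention of passing 0 to request the default height are assumptions for the example.

```c
/* Usable area of the page, in units of 0.01 mm. */
typedef struct {
    double usable_len;  /* along the long axis of the page */
    double usable_ht;   /* across the page */
    double margin_len;  /* gap between page edge and usable area, long axis */
    double margin_ht;   /* gap across the page */
} PageArea;

#define SQRT2 1.4142135623730951

/* Multiplier from 0.01 mm units to PostScript's default 1/72 inch units:
   0.01 mm = (0.01 / 25.4) * 72 pt = 72 / 2540 pt. */
double hundredths_mm_scale(void)
{
    return 72.0 / 2540.0;
}

/* Centre a usable area of the given size (mm) on the page (mm).
   A height <= 0 selects the default proportion, length / sqrt(2). */
PageArea centre_usable_area(double page_len_mm, double page_wid_mm,
                            double usable_len_mm, double usable_ht_mm)
{
    PageArea a;
    if (usable_ht_mm <= 0.0)
        usable_ht_mm = usable_len_mm / SQRT2;
    a.usable_len = usable_len_mm * 100.0;
    a.usable_ht  = usable_ht_mm * 100.0;
    a.margin_len = (page_len_mm - usable_len_mm) / 2.0 * 100.0;
    a.margin_ht  = (page_wid_mm - usable_ht_mm) / 2.0 * 100.0;
    return a;
}
```

For A4 with the default 240 mm length, this gives a usable height of about 16971 units (169.7 mm), with margins of 2850 units along the page and about 2015 units across it.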

Within this usable area, psimpoll fills the space with `boxes', building up from left to right until the space is full (at a rate that depends on the scale factor from menu A). A `box' may be an axis, a descriptive column (e.g., sediment), or a plotted curve. Each box is preceded by a translation that shifts the origin along the page, and the instructions for drawing the box are then given relative to this new origin. By default, the diagram height occupies 75% of the usable height of the page, i.e., 127 mm; this figure can be modified in menu Ad. The location of captions is fixed relative to the diagram, so they will move closer in if the height of the diagram is reduced.
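The left-to-right filling of the usable area can be sketched as follows: each box's offset is the length already used, and when a box would overrun the usable length a new page is begun. This mirrors the accumulate-and-check role that the %t comment serves for pscomb, described below. The box lengths, the BoxPlace structure, and the page-break rule are assumptions for illustration, not psimpoll's internal logic.

```c
/* Position of one box: which page it falls on, and its offset along the
   usable length (0.01 mm) from the start of that page's usable area. */
typedef struct {
    int page;
    long offset;
} BoxPlace;

/* Fill pages left to right with boxes of the given lengths. A box that would
   overrun usable_len starts a new page; a box longer than usable_len is still
   placed, on an otherwise empty page. Returns the number of pages used. */
int place_boxes(const long *lengths, int nboxes, long usable_len, BoxPlace *out)
{
    int i, page = 0;
    long used = 0;                      /* length already filled on this page */

    if (nboxes <= 0)
        return 0;
    for (i = 0; i < nboxes; i++) {
        if (used > 0 && used + lengths[i] > usable_len) {
            page++;                     /* no room: begin the next page */
            used = 0;
        }
        out[i].page = page;
        out[i].offset = used;           /* translation for this box */
        used += lengths[i];
    }
    return page + 1;
}
```

In the PostScript file, each box would then be preceded by a translation along the y axis equal to the difference between its offset and that of the previous box.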

Within the program, all text is rotated through 90° to bring it from its natural horizontal position in portrait orientation to the natural horizontal position in landscape orientation. Rotations specified in menu De (default 45°) are in addition to this rotation, and set the angle of the text relative to the horizontal in landscape orientation.
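The combined rotation can be sketched as a small emitter that writes a label with the 90° landscape rotation plus the angle from menu De, wrapped in gsave/grestore so that the rotation does not affect what follows. It assumes PostScript's convention that positive angles turn anticlockwise, uses a hypothetical function name, and rejects text containing backslashes rather than escaping it (text with unbalanced parentheses would also need escaping).

```c
#include <stdio.h>
#include <string.h>

/* Write into buf a PostScript fragment that shows text at (x, y), rotated
   90 degrees for landscape reading plus extra_deg (e.g. the menu De angle).
   gsave/grestore confine the translation and rotation to this label.
   Returns the length written, or -1 if text contains a backslash (not
   escaped here) or buf (capacity n) is too small. */
int ps_rotated_label(char *buf, size_t n, long x, long y,
                     double extra_deg, const char *text)
{
    int len;
    if (strchr(text, '\\') != NULL)
        return -1;
    len = snprintf(buf, n,
                   "gsave %ld %ld translate %g rotate 0 0 moveto (%s) show grestore\n",
                   x, y, 90.0 + extra_deg, text);
    return (len < 0 || (size_t)len >= n) ? -1 : len;
}
```

With the default 45°, a label `Zone 1' at (1000, 2000) is written as gsave 1000 2000 translate 135 rotate 0 0 moveto (Zone 1) show grestore.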

The numbers in the PostScript output file giving locations on the page are thus in units of 0.01 mm. The first of each pair is the x-axis co-ordinate, increasing from the left edge of a page held in portrait orientation (i.e., down the depth axis of the resulting diagram). The second number of each pair is the y-axis co-ordinate, increasing from the lower edge of a page held in portrait orientation (i.e., along the resulting diagram). The co-ordinate origin shifts along the page, in the y-axis direction, for the plotting of each box.

Within the PostScript output file, boxes are demarcated by a series of comments that enable pscomb to break the file back into boxes and rebuild a new file by combining boxes from different files. The comment lines begin with `%', which means that the PostScript interpreter ignores them, and they are interpreted by pscomb as follows:

%e: Stop interpreting the input file. Further lines are read in, but ignored until %s is encountered. Used to exclude the original setup and new page instructions;
%i: Occurs within a box, and marks the beginning of lines that pscomb should ignore. Used to exclude the instructions that draw zone lines across the page, for example;
%j: Occurs within a box, and marks the end of lines that pscomb should ignore;
%n: Marks the start of a new box (ending the previous box), and is followed on the same line by six characters that give the name of the box, as displayed on the screen while the file is being read and in the interactive menu;
%s: Start interpreting the input file. Lines at the beginning of the file, and after %e, are ignored until %s is encountered;
%t: Immediately followed by a number that gives the translation of the box along the page, in units of 0.01 mm. This enables pscomb to accumulate the proportion of a page that it has filled with boxes, and to determine whether it has enough room for the next one.
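The way pscomb uses these comments can be sketched as a line classifier that tracks the %s/%e and %i/%j states, records box names from %n and translations from %t, and reports which lines belong to a box. It assumes that the six-character name follows %n with no intervening space; the Scan structure, the limit of 64 boxes, and the function names are hypothetical, and this is not pscomb's source.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_BOXES 64

/* State kept while scanning a psimpoll output file line by line. */
typedef struct {
    int active;                 /* 1 after %s, 0 after %e */
    int ignoring;               /* 1 between %i and %j */
    int nboxes;                 /* boxes seen so far (%n lines) */
    char names[MAX_BOXES][7];   /* six-character box names */
    long shift[MAX_BOXES];      /* %t translation of each box, 0.01 mm */
    long total;                 /* sum of all translations read */
} Scan;

void scan_init(Scan *s)
{
    memset(s, 0, sizeof *s);
}

/* Returns 1 if line is ordinary PostScript belonging to a box and should
   be copied to the combined file, 0 if it is a marker or is ignored. */
int scan_line(Scan *s, const char *line)
{
    if (strncmp(line, "%s", 2) == 0) { s->active = 1; return 0; }
    if (strncmp(line, "%e", 2) == 0) { s->active = 0; return 0; }
    if (!s->active)
        return 0;                           /* before %s or after %e */
    if (strncmp(line, "%i", 2) == 0) { s->ignoring = 1; return 0; }
    if (strncmp(line, "%j", 2) == 0) { s->ignoring = 0; return 0; }
    if (s->ignoring)
        return 0;                           /* e.g. zone lines across the page */
    if (strncmp(line, "%n", 2) == 0) {
        if (s->nboxes < MAX_BOXES) {
            const char *p = line + 2;
            int k;
            for (k = 0; k < 6 && p[k] != '\0' && p[k] != '\n'; k++)
                s->names[s->nboxes][k] = p[k];
            s->names[s->nboxes][k] = '\0';
            s->nboxes++;
        }
        return 0;
    }
    if (strncmp(line, "%t", 2) == 0) {
        long t = strtol(line + 2, NULL, 10);
        if (s->nboxes > 0)
            s->shift[s->nboxes - 1] = t;
        s->total += t;
        return 0;
    }
    return s->nboxes > 0;                   /* inside a box: keep the line */
}
```

The running total is what allows a decision on whether the next box will fit on the current page of the combined file.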

pscomb produces a new output file by combining boxes from the input files with a new set of document structuring conventions and instructions for ending and starting pages, and by adding titles and general footnotes.

Known problems and limitations

(DOS only) pscomb will crash, complaining about calloc or realloc memory allocation problems, if the size of the input files plus the size of the program exceeds available conventional memory (normally about 500 kB). It may be possible to work around this by using pscomb on individual input files to reduce them to the minimum size necessary.
Smoothing is applied after the dataset has been printed to a file, so smoothed data cannot be output.
The program does not assume that it is being run on a computer that supports ASCII character coding. However, PostScript interpreters do assume that input files are ASCII. So, output from a non-ASCII computer might need to be translated before presentation to a PostScript printer.
The handling of output to screens assumes that screens are 25 lines high, with 80 characters per line. The program should work correctly with screens of other dimensions, but the appearance may be odd.
It is not possible to have more than two columns of numbers in sum columns.
Sediment columns cannot be plotted against age.
Script file names are associated with input file names. In future, they may be associated with output file names in order to connect a particular output file more directly with the commands used to create it.
The English-language `point' is used as the decimal separator. Arguably, psimpoll should be able to use the European `comma', depending on locale, but unfortunately this is not currently possible.


Archaeology and Palaeoecology | 42 Fitzwilliam St | Belfast BT9 6AX | Northern Ireland | tel +44 28 90 97 5136

Archaeology and Palaeoecology | The 14Chrono Centre | URL http://www.qub.ac.uk/arcpal/
