UMASS-Boston Electronic Field Guide Project

Key Author Instructions

This is a brief introduction to the use of Excel to make a multimedia polychotomous key from which we can generate several different kinds of representations, as illustrated in this application.

I. Generalities specialized to keys made by systematists or field naturalists

The production of the web-accessible key consists of two parts and some software connecting them:
a. An Excel file provided by the key author which contains a representation of the character states at each decision point and some optional strings representing the location of multi-media files illustrating the state. At terminal nodes the author will provide information that represents either just a Taxon name and perhaps its Taxonomic rank, or more generally, represents data for a query into an accompanying (or web-accessible) database.
b. An optional database that contains descriptions or other information about the result of keying out a taxon or group of taxa. For example, if the key is to genera or species in a particular family, the database might hold descriptive data sufficient to make up taxon descriptive pages. The database is optional because the key author might be content simply to produce a name as the result of traversing the key.

II. Rendering applications

After an author has produced a spreadsheet, various applications may turn it into a web-accessible key. Different applications might produce a different "look and feel", e.g. for different target audiences, or a single application might itself be able to produce several variants. For any given spreadsheet, however, all rendering applications produce something that represents the same set of decisions that the author expressed in the key. Conversely, we typically write a rendering application so that it applies to a very broad collection of key documents. For example, an author might wish to produce keys to various collections of taxa, but have the renderings share the same look and feel no matter what the keyed-out taxa are.

III. Key tree nodes.

Every node consists of two spreadsheet cells alone in a row, accompanied by an optional (though quite central) Excel comment on the second cell. The first cell contains character state information, typically in plain text in the language chosen by the author. All nodes beginning in a given column under the same parent are siblings in the tree. Typically this would mean they represent different states of the same character, but that need not be the case if the author wishes to represent, at a given level, states that do not exhaust all those of a given character. For example, a piece of the tree might have three siblings:

    flowers red
    flowers yellow
    leaves ovate

This might not be the same as

    flowers red
    flowers yellow
    flowers neither red nor yellow ...<children>

The reason the first need not be the same as the second is that in the first example, the author might wish to deal with a case where the states are difficult to determine (e.g. if there are no flowers). Allowing non-exclusive and non-exhaustive states at a given level implies that there may be several paths to the same taxon, so that strictly speaking we are not dealing with a tree but rather with a so-called "directed acyclic graph".

IV. Node structure.

As mentioned above, the first cell represents character state. We also refer to the text in this cell as the "node description." The second cell is somewhat arbitrary. We refer to it as the "node name", and what it represents is up to the author. Node names are optional, but key rendering applications will typically supply one internally for various uses, e.g. constructing "backward" links from children to parent nodes. Consequently, a meaningful node name can be of assistance in the design and maintenance of rendering applications.

Attached to the name cell is an optional, but extremely important, Excel comment, which must conform to the structure described in the section "Property lists" below. These comments allow rendering applications to enhance the utility of the rendered key by adding such things as multi-media illustrations of the character states. There could be property lists on the node descriptions also, but our present renderers make no use of anything but the lists on the name cell.

The children of a node should begin in the column immediately after their parent's, each one on a separate row. Any node with description text in column N+1 is a child of the nearest node "above" it that starts in column N. Of course, the higher a node is in the tree (i.e. the earlier the column it starts in), the more rows there will be between it and its siblings, because the descendants of its siblings intervene.
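The parent rule described above (a node starting in column N+1 belongs to the nearest node above it that starts in column N) can be sketched as follows. This is only an illustration, not the code of any EFG renderer: the names `Node` and `build_tree` are hypothetical, rows are assumed to have already been read from the spreadsheet as lists of cell text, and the root is assumed to start in the first column.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class Node:
    description: str   # first cell: character state text
    name: str          # second cell: optional node name ("" if absent)
    column: int        # 0-based column in which the node starts
    children: List["Node"] = field(default_factory=list)


def build_tree(rows: Sequence[Sequence[str]]) -> Node:
    """Build the key tree from rows of cell text, one node per row."""
    root: Optional[Node] = None
    # path[i] is the most recent node seen that starts in column i.
    path: List[Node] = []
    for row in rows:
        # A node starts at the first non-empty cell of its row.
        col = next((i for i, cell in enumerate(row) if cell.strip()), None)
        if col is None:
            continue  # skip blank rows
        name = row[col + 1].strip() if col + 1 < len(row) else ""
        node = Node(row[col].strip(), name, col)
        if root is None:
            if col != 0:
                raise ValueError("assumed: the root starts in the first column")
            root = node
            path = [node]
            continue
        if col == 0:
            raise ValueError("the root node must have no siblings")
        if col > len(path):
            raise ValueError(f"no parent in column {col - 1} above {node.description!r}")
        path[col - 1].children.append(node)  # nearest node above in column N
        del path[col:]                       # deeper nodes are no longer candidates
        path.append(node)
    if root is None:
        raise ValueError("empty spreadsheet")
    return root
```

A path to the same taxon may appear under several parents; this sketch simply records each occurrence as a separate node and does not merge them into a single graph vertex.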

V. Root node

The root node, represented in the first row of the spreadsheet, is a description of the key itself. It should have children but no siblings.

VI. Property lists

The syntax of all property lists is the same, but there is usually a difference between the properties on interior nodes (decision points, which usually represent character states, as discussed above) and those on leaf nodes, which usually represent the result of keying something out. In this section we discuss the syntax that applies to all property lists. In the discussion, < > surrounds variables for which the author will substitute something explicit.

In Excel, comments on a cell are made by use of the menu Insert->Comment, by the default key sequence Alt-i m, or by "Insert comment" on the right mouse button (in Windows). When this action is performed, a small yellow box pops up, typically with the user's name in bold on the first line. Our current key renderers ignore the first line, but it may be useful to leave it there to give some history of who edited the key. Subsequent lines must be entered in pairs:

    attribute=<attributeNameString>
    value=<attributeValueString>

There are three reserved characters in the <attributeNameString> and <attributeValueString>. These are ':', '#' and '/'. Use of the first two is described in the 'PLD Tree Reference Manual', a separate document. The third is described below. We sometimes call an attribute-value pair a 'property', and the entire Excel comment (excluding the aforementioned user name) the 'property list' of the node.
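As an illustration of the syntax just described, here is a minimal sketch of how a renderer might read a comment into a list of properties. It assumes that the comment text is already available as a string, that each attribute line is immediately followed by its value line, and that blank lines and leading indentation carry no meaning; the function name `parse_property_list` is hypothetical.

```python
from typing import List, Tuple


def parse_property_list(comment: str) -> List[Tuple[str, str]]:
    """Return the (attribute, value) pairs found in an Excel comment."""
    lines = [line.strip() for line in comment.splitlines()]
    # The first line (typically the user's name) is ignored.
    body = [line for line in lines[1:] if line]
    if len(body) % 2 != 0:
        raise ValueError("attribute/value lines must come in pairs")
    properties = []
    for attr_line, value_line in zip(body[0::2], body[1::2]):
        if not attr_line.startswith("attribute="):
            raise ValueError(f"expected 'attribute=...', got {attr_line!r}")
        if not value_line.startswith("value="):
            raise ValueError(f"expected 'value=...', got {value_line!r}")
        # Slice off the prefix only, so values may themselves contain '='.
        properties.append((attr_line[len("attribute="):],
                           value_line[len("value="):]))
    return properties
```

The reserved characters ':', '#' and '/' are passed through untouched here; interpreting them is left to later stages.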

VI.1. Property lists on interior nodes

In this document we limit ourselves to the simplest, but most common, usage of these attribute-value pairs: the case where the author desires to have renderers produce a multi-media key. In that case interior nodes will typically have a single property, whose <attributeNameString> is one of the words 'image', 'audio' or 'video' and whose <attributeValueString> is a representation of a file name at which the rendering application can locate the media file. It is best that this be somewhat "local", that is, independent of the exact location on her own computer where the author has stored the media files. For example:

    attribute=image
    value=images/thumb/mechanitis.jpg

or

    attribute=video
    value=video/hiRes/flight1.mpg
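One way a renderer might make use of such "local" values is to join them onto a media root that it chooses itself. The sketch below assumes '/' is used as the path separator, as in the examples above; the function name `resolve_media` and the media root shown in the usage note are hypothetical.

```python
from pathlib import PurePosixPath

MEDIA_KINDS = {"image", "audio", "video"}


def resolve_media(attribute: str, value: str, media_root: str) -> str:
    """Join a relative media file name from a property onto a media root."""
    if attribute not in MEDIA_KINDS:
        raise ValueError(f"not a media property: {attribute!r}")
    relative = PurePosixPath(value)
    if relative.is_absolute():
        # An absolute path would tie the key to one particular machine.
        raise ValueError(f"media path should be relative: {value!r}")
    return str(PurePosixPath(media_root) / relative)
```

For instance, `resolve_media("image", "images/thumb/mechanitis.jpg", "/srv/efg/media")` yields `/srv/efg/media/images/thumb/mechanitis.jpg`, so the same spreadsheet keeps working if the media files are moved as a whole.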

VI.2. Property lists on terminal nodes

Terminal, or 'leaf', nodes are nodes with no children. They represent the outcome of a decision made by following a particular path through the tree. Typically, then, they determine a single taxon or group of taxa. Often this group can be described by the result of a query to a database, or it might be as simple as just a taxon name. We do not actually have to distinguish those two cases, however: that is the job of the key rendering application, which must have available to it some knowledge of what is intended to happen at leaf nodes. So the simplest property list at a leaf node takes the form

    attribute=<dbFieldName>
    value=<dbFieldValue>

For example:

    attribute=Family
    value=Solanaceae

A clever renderer (*) could actually construct a query to the author's database of the form "Display the entry or entries with Family=Solanaceae". Further, the renderer could be quite sophisticated about what is meant by "Display the entry". For example, it could produce a beautiful taxon page or collection of taxon pages, or it could produce XML to be passed to a software program that is invoking the key in some fashion other than by human interaction. Alternatively, a renderer's behavior could be quite simple: it could ignore any database altogether and just indicate that the user has keyed out to Family=Solanaceae.

As a general principle, the author should make these terminal properties as "generic" as possible and leave it to the rendering application to decide exactly how to deal with them. However, if a database is associated with the key, even conceptually, it is good practice to supply something that would provide a simple database query. But it is bad practice to be overly explicit. For example, this is a legal property:

    attribute=FileMakerQuery
    value=FMPJS?-db=efg%5fstreams.fp5&-layid=5&-format=formvwcss.htm&-max=1&-token.0=25&-mode=browse&-op=bw&GENUS=Musculium&-lop=and&-find

and a rendering application could quite easily be made to deal with it correctly, but it would not survive a change of the database structure, much less of the database software itself.

If there are sub-keys for the current key, the author can link to them by following similar principles as above but using the key word "url" for the attribute name. For example:

    attribute=url
    value=subKeyName

If the author has static HTML pages as species pages, the address of such a page could be used as the value of the property above, but that will not survive changes to the URL where the static pages are located.
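The two leaf-node behaviors described in this section, a simple report of the property and a database lookup built from it, might be sketched as follows. Everything specific here is an assumption made for illustration: the `taxa` table, its `name` column, the list of allowed field names, and the use of SQLite rather than whatever database actually accompanies a key.

```python
import sqlite3
from typing import List, Optional, Tuple

# Hypothetical: the database fields a leaf property is allowed to name.
ALLOWED_FIELDS = ("Family", "Genus", "Species")


def lookup_taxa(conn: sqlite3.Connection, field: str, value: str) -> List[str]:
    """Return the names of all entries whose <field> equals <value>."""
    if field not in ALLOWED_FIELDS:
        raise ValueError(f"unknown database field: {field!r}")
    # The field name is checked against a whitelist above; the value is
    # passed as a bound parameter, never pasted into the SQL text.
    sql = f'SELECT name FROM taxa WHERE "{field}" = ? ORDER BY name'
    return [row[0] for row in conn.execute(sql, (value,))]


def render_leaf(prop: Tuple[str, str],
                conn: Optional[sqlite3.Connection] = None) -> str:
    """Render a leaf property with or without an accompanying database."""
    field, value = prop
    if conn is None:
        # Simple renderer: just report what the user keyed out to.
        return f"Keyed out to {field}={value}"
    return ", ".join(lookup_taxa(conn, field, value))
```

Because the property only names a field and a value, the same spreadsheet works with either behavior, which is the point of keeping terminal properties generic.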

VI.3. Structured property lists

Sometimes the author desires that several properties be considered in aggregate (or even have more complex relationships than simple aggregation). Excel comments are pure unstructured text, but internally we turn them into XML, which has substantial ability to structure them. Therefore we have to give an author a way to guide key renderers about this structure. The simplest case is illustrated here. For more complex cases, see the PLD Tree Reference Manual.

    attribute=efg:Structure#TaxonGroup
    value=efg:start
        attribute=Species
        value=Mechanitis polymnia
        attribute=Species
        value=Mechanitis lysimnia
    attribute=efg:Structure#TaxonGroup
    value=efg:end

In this example, the optional indentation is ignored; it is just an aid to readability. The above example might be used when a key cannot resolve beyond the two species mentioned. The strings efg:Structure, efg:start, and efg:end are reserved (see the PLD Tree Reference Manual). However, they do not place explicit requirements on a key rendering application. Rather, the key author must negotiate with the author of the key rendering application what is intended by aggregating the two properties in this way. The tag 'TaxonGroup' following the '#' is chosen by the author (**), and typically this tag will be used to help the tree renderer decide which among possibly several different structures is intended by the key author. Again, this would be part of the specification of a key rendering application negotiated between the key author and the author of the rendering application.
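The requirement that start and end markers match like nested parentheses can be checked with a stack. The sketch below takes a property list as (attribute, value) pairs and groups the properties between matching markers. It covers only the simple aggregation case shown above, the function name `group_properties` is hypothetical, and the authoritative rules are those of the PLD Tree Reference Manual.

```python
from typing import List, Tuple

STRUCTURE_PREFIX = "efg:Structure#"


def group_properties(props: List[Tuple[str, str]]) -> list:
    """Nest properties between matching efg:start / efg:end markers.

    Plain properties stay as (attribute, value) tuples; each group becomes
    a (tag, members) tuple whose members may themselves contain groups.
    """
    top: list = []
    stack = [("", top)]  # (tag, members) of every group currently open
    for attribute, value in props:
        if not attribute.startswith(STRUCTURE_PREFIX):
            stack[-1][1].append((attribute, value))
            continue
        tag = attribute[len(STRUCTURE_PREFIX):]
        if value == "efg:start":
            members: list = []
            stack[-1][1].append((tag, members))
            stack.append((tag, members))
        elif value == "efg:end":
            # The end tag must close the most recently opened group.
            if len(stack) == 1 or stack[-1][0] != tag:
                raise ValueError(f"unmatched efg:end for {tag!r}")
            stack.pop()
        else:
            raise ValueError(f"expected efg:start or efg:end, got {value!r}")
    if len(stack) > 1:
        raise ValueError(f"missing efg:end for {stack[-1][0]!r}")
    return top
```

For the example above, the result is a single group tagged 'TaxonGroup' containing the two Species properties; what a renderer then does with that group remains a matter of agreement between the key author and the renderer's author.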

(*) As, of course, are all of the EFG project key rendering applications.... :-)
(**) But the start and end keys must match in the same way as nested parentheses must. Early in January 2004 we expect to release a Visual Basic tool for Excel that enforces this and makes these annotations easier to produce.

$Id: KeyAuthorInstructions.html,v 1.2 2005/02/20 16:30:41 kasiedu Exp $