DOCUMENTATION PREPARATION PROGRAM 'EXTRACT_DOC'

INTRODUCTION

The program extract_doc prepares documentation files by extracting documentation sections either from Fortran or 'C' source code files or from an input file in a special pre-html format devised for use with this tool. When used with Fortran or 'C' source code files, the documentation sections must follow a particular convention. All documents produced have a similar overall layout. The output file is in 'html' format, 'tidytext' format or a (formatted) plain ascii text format. Index files may also be produced. The program automatically generates lists of sections and sub-sections in appropriate places; when 'html' output is requested, these lists contain links to the various sections and sub-sections, and the index files (one for the sections/sub-sections and one for the figures) also contain links to the appropriate places.

List of sections:

General Layout of a Document
Running the Program
Pre-html Documents
Documentation in Program Source Code

GENERAL LAYOUT OF A DOCUMENT

The general layout of a document is as follows:

CHAPTER 1 Title
1.1 Introduction
1.2 Section
1.2.1 Introduction
1.2.2 Sub-section
1.2.3 Sub-section
...
1.3 Section
1.3.1 Introduction
1.3.2 Sub-section
1.3.3 Sub-section
...
etc.

The chapter (or overall section) number is supplied when the program is run and may be omitted if the document stands on its own. For documentation from program source code, the sections are sections of routines (there may be several such sections in one source code file) and the sub-sections are individual routines. At the end of the introduction, a list of sections in the document is given (with links if using 'html' format) and at the end of the introduction to a section, a list of the sub-sections (or routines) is given (again with links in 'html' mode).

RUNNING THE PROGRAM

Introduction

This section describes how the program is run, the program options available and the nature of the index files produced.

List of subsections in this section:

Program Command Line
Program Option Switches
Index files

Program Command Line

The program is run using the following options:

extract_doc [-program] [-tidytext] [-ascii] [-width ncols] [-just] [-root idxroot ] [-section secnum ] [-docfile filename ] [-name chapname ] [-inline code ] [-footer filename ] < input_file > output_file
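
The command line above can also be assembled from a script. The following is a minimal Python sketch, not part of extract_doc itself: the function names build_command and run_extract_doc and their keyword arguments are hypothetical, and it assumes that an executable called extract_doc is on the search path. The switches are added in the order shown above; whether the program requires that order is not stated here.

```python
import subprocess

def build_command(program=False, output="html", width=None, just=False,
                  root=None, section=None, docfile=None, name=None,
                  inline=None, footer=None):
    """Return the argument list for one extract_doc run (hypothetical helper)."""
    args = ["extract_doc"]
    if program:
        args.append("-program")          # input is Fortran or 'C' source code
    if output == "tidytext":
        args.append("-tidytext")
    elif output == "ascii":
        args.append("-ascii")
    elif output != "html":               # html is the default, so it needs no switch
        raise ValueError("output must be 'html', 'tidytext' or 'ascii'")
    if width is not None:
        args += ["-width", str(width)]
    if just:
        args.append("-just")
    # Remaining switches each take one value and are omitted when unset.
    for flag, value in (("-root", root), ("-section", section),
                        ("-docfile", docfile), ("-name", name),
                        ("-inline", inline), ("-footer", footer)):
        if value is not None:
            args += [flag, str(value)]
    return args

def run_extract_doc(args, input_path, output_path):
    """Run the command with the input file on stdin and the output file on stdout."""
    with open(input_path, "rb") as src, open(output_path, "wb") as dst:
        subprocess.run(args, stdin=src, stdout=dst, check=True)
```

For example, build_command(program=True, section="4", root="chap4", docfile="chap4.html") gives the arguments for documenting a source code file as chapter 4 with index files; the file names here are illustrative only.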

Program Option Switches

The following option switches are available:

-program

Input file is from Fortran or 'C' source code; if not specified the input file is assumed to be in pre-html format.

-tidytext

The output is to be in TIDYTEXT format; the default output is HTML format.

-ascii

The output is to be in plain ascii format; the default output is HTML format.

-width ncols

ncols is the page width (number of columns) for the plain (ascii) text output option. It should also be the minimum number of columns which will be used by the 'tidytext' program for the tidytext output option. The default is 75 and the minimum allowed value is 25.

-just

Justify automatically formatted paragraphs to the right as well as the left for the plain (ascii) text output option.

-root idxroot

idxroot is the user supplied root file name for output index (.idx) and figures index (.ifg) files; if no root name is given, no index files will be written. For 'tidytext' or plain text output, an additional figures list file (.fls) will be written giving the figure names and the names of the files from which the figures are to be prepared.

-section secnum

secnum is the chapter or section number as a string. The internal section numbers will be appended to this string. If a string of '?' is given, then templates will be output for the section nos. (e.g. ? ?.?). If this keyword is not specified the sections will remain un-numbered. If the section number contains one or more decimal points then the style of the index files will be modified slightly to give less emphasis to the section headings.

e.g. -section 4 will give sections 4.1, 4.2 etc. and sub-sections 4.1.1, 4.1.2 etc.

-docfile filename

filename is the name that will be used for the created 'html' output file. It is only required if index files are being written, as it is needed to form the links from the index to the document. If it is not given, the index files will not contain links. It is ignored in 'tidytext' or plain text output mode.

-name chapname

If a section number was specified, the main title/heading is prepended by the string CHAPTER followed by the secnum string (or SECTION if secnum contains a decimal point). If -name is used, the user supplied string 'chapname' will be used instead.

e.g. -name appendix -section 3 will give APPENDIX 3: title string...
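
The numbering rules for -section and -name can be summarised in a short Python sketch. It is an illustration of the rules as described above, not the program's own code: the function names are hypothetical, the upper-casing of chapname is inferred from the APPENDIX example, and the '?' template case is not modelled.

```python
def main_heading_prefix(secnum, chapname=None):
    """Return the text placed before the main title, or '' if un-numbered."""
    if secnum is None:
        return ""                      # no -section given: no prefix
    if chapname is not None:
        word = chapname.upper()        # assumption: -name string is upper-cased
    elif "." in secnum:
        word = "SECTION"               # secnum contains a decimal point
    else:
        word = "CHAPTER"
    return f"{word} {secnum}"

def heading_number(secnum, section, subsection=None):
    """Append the internal section (and sub-section) number to secnum."""
    if secnum is None:
        return ""                      # sections remain un-numbered
    parts = [secnum, str(section)]
    if subsection is not None:
        parts.append(str(subsection))
    return ".".join(parts)
```

With secnum "4", the second sub-section of the first section is numbered by heading_number("4", 1, 2), and main_heading_prefix("3", "appendix") gives the prefix for the -name example above.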

-inline code

The code is either 'all' or 'none' to request that all images in an html output file are to be in-lined or that all are to be given external links. This overrides the individual choices specified in the input pre-html file. The option is ignored for 'tidytext' or plain text output.

-footer filename

filename is the name of a file containing text in 'html' format to be added at the end of an 'html' output file e.g. for adding a set of standard links and/or address data. The data in the file is ignored for 'tidytext' or plain text output.

Index files

If index files are requested, then two (or three in the case of 'tidytext' or plain text output) such files are produced. The first file contains the chapter title and the section and sub-section headings and the second contains a list of the figures. The third file, when present, gives a list of the figures with the names of the image files from which the figures are to be prepared for a 'tidytext' or plain text format document. If 'html' output is being used, then these index files will contain links to the appropriate sections/figures in the main documentation file, provided that the name of that file has been passed to the 'extract_doc' program as a program option. The index files are most likely to be of use in a multi-chapter document and it is suggested that, in such a case, a script file is composed which will prepare an overall index file by concatenating the individual index files.
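
A script of the kind suggested above might look like the following Python sketch. The function name and the file names are hypothetical; it assumes only that each chapter was processed with its own -root name, so that files such as chap1.idx and chap2.idx exist. The files are copied byte for byte, in the order given.

```python
from pathlib import Path

def concatenate_indexes(roots, suffix, overall_path):
    """Join <root><suffix> for each root, in the order given, into one file."""
    with open(overall_path, "wb") as out:
        for root in roots:
            # Each chapter's index file is named from its -root name.
            out.write(Path(str(root) + suffix).read_bytes())
```

For example, concatenate_indexes(["chap1", "chap2"], ".idx", "overall.idx") builds an overall sections index, and the same call with ".ifg" builds an overall figures index.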

PRE-HTML DOCUMENTS

Introduction

The pre-html file used as input to the 'extract_doc' program resembles 'html' in its use of tags. It enables a simple though restricted layout for the document but provides the bonus that the 'extract_doc' program will give lists, in appropriate places, of the sections/sub-sections present and, when used with the 'html' output format option, will automatically generate a set of links to these sections/sub-sections. It also enables the output of documents compatible in format with those extracted from program source code. The pre-html format enables figure handling to be defined for both 'html' and 'tidytext'/plain-text output cases.

List of subsections in this section:

Layout of a Pre-html Document
Markers in a Pre-html Document
Handling of Figures
Special Characters

Layout of a Pre-html Document

The document is laid out using the following items, tagged as shown. Items enclosed in ellipses indicate user supplied material, which may contain various 'html' tags and other items as described in more detail in the following section:

<.TITLE> ...title-string... </.TITLE>

<.AUTHOR>
...text...
</.AUTHOR>

<.INTRO>
...text...
</.INTRO>

<.SECT ...section-header...>
...text...
</.SECT>

<.SUBSECT ...subsection-header...>
...text...
</.SUBSECT>

<.SUBSECT>
...text...
</.SUBSECT>

...

<.SECT>
...text...
</.SECT>

...
etc.

The items are as follows:

Title

This item must be present. It defines a title string which will appear at the top of the document. In 'html' it is used as both the title and level 1 header.

Author

This item is optional. It is a section of text giving details of the document author(s). The section will be output after the title and before the introduction section.

Introduction

This item must be present. It is a section of text giving a general description of the subject matter of the document. The program will automatically append a list of the sections present in the document to this item and these will have links to the relevant sections when the output file is in 'html' format.

Section

One or more sections must be described. The section item consists of two parts, a short section header string within the tag and a body of text which gives a general description of the subject matter of the section. The section header string is used both as a section header in the output file (usually appended to a section number) and in the list of sections automatically generated at the end of the Introduction section. It will also be used in the index file if written. The text body is used to provide an introductory sub-section for the section in question and the program will automatically append a list of the sub-sections present in the section to which this item belongs. These will have links to the relevant sub-sections when the output file is in 'html' format.

Sub-section

One or more sub-sections must be present per section. The sub-section item consists of two parts, a short sub-section header string within the tag and a body of text. The sub-section header string is used both as a sub-section header in the output file (usually appended to a sub-section number) and in the list of sub-sections automatically generated at the end of the introductory sub-section for the current section. It will also be used in the index file if written.
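
As an illustration of the layout and of the rules for the individual items, the following Python sketch writes out a minimal pre-html document. The function name make_prehtml and its arguments are hypothetical, and the text bodies are copied as given (no escape sequences are substituted). It enforces the two rules stated above: at least one section, and at least one sub-section per section.

```python
def make_prehtml(title, intro, sections, author=None):
    """Build a pre-html document.

    sections is a list of (header, text, subsections) tuples, where
    subsections is a list of (header, text) pairs.
    """
    if not sections:
        raise ValueError("one or more sections must be described")
    lines = [f"<.TITLE> {title} </.TITLE>", ""]
    if author is not None:               # the author item is optional
        lines += ["<.AUTHOR>", author, "</.AUTHOR>", ""]
    lines += ["<.INTRO>", intro, "</.INTRO>", ""]
    for header, text, subsections in sections:
        if not subsections:
            raise ValueError("one or more sub-sections must be present per section")
        # Header strings go inside the opening tag; the text body follows it.
        lines += [f"<.SECT {header}>", text, "</.SECT>", ""]
        for sub_header, sub_text in subsections:
            lines += [f"<.SUBSECT {sub_header}>", sub_text, "</.SUBSECT>", ""]
    return "\n".join(lines)
```

A call such as make_prehtml("My Program", "General description.", [("Running", "How to run it.", [("Command Line", "Details.")])]) returns text that could be saved and passed to extract_doc as its input file.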

Markers in a Pre-html Document

The pre-html file used as input to the 'extract_doc' program resembles 'html' in its use of tags. It only allows a restricted set of 'html' codes to be used but in addition it uses some special tags. The 'extract_doc' special tag names start with a dot e.g. <.TITLE>. All tags are treated in a case insensitive manner.

Tags defining the basic items/sections of the document

<.TITLE>
This is followed by the title string which will be formatted by the 'html' browser or by 'tidytext'.
</.TITLE>
Terminates the title string.
<.AUTHOR>
The author section (if present) follows this tag.
</.AUTHOR>
Terminates the author section.
<.INTRO>
The introduction text body follows this tag.
</.INTRO>
Terminates the introduction.
<.SECT ...section-header...>
A section description text body follows this tag.
</.SECT>
Terminates the section description.
<.SUBSECT ...subsection-header...>
A sub-section text body follows this tag.
</.SUBSECT>
Terminates the sub-section text body.

Standard 'html' tags allowed within a text body

A text body is the text within the author item, the introduction item, a section item or a sub-section item. No tags are processed outside such items. Many of the tags, though valid 'html' tags, may only be given on separate lines in a pre-html file.

Tags which must be given on separate lines are as follows:

<P>, <PRE>, </PRE>, <UL>, </UL>, <OL>, </OL>, <DL>, </DL>, <HR>

Tags which must be given only at the start of a line are as follows:

<LI>, <DT>, <DD>

Tags which may be given within a line are as follows:

<A>, </A>, <B>, </B>, <I>, </I>

Special 'extract_doc' tags

A number of additional tags specific to the program 'extract_doc' may also be used within the text body. These are the following:

<.AL>, </.AL>: These tags introduce and end an alphabetically labelled list; items are introduced in the same manner as for an ordered list, using the <LI> tag. In 'tidytext' or plain text output the list items will be tagged with letters as opposed to numbers. In 'html' output the items will be treated as for any other ordered list.

<.SINGLES>, </.SINGLES>: These tags introduce and end a section of text in which each line in the input file is to be output to a single line. In 'html' output, the normal font is used for each line and it will be spaced in the usual manner; for 'tidytext' or plain text output, it will be equivalent to a pre-formatted/table section. Each of the tags must be given on a separate line.

<.HTML>, </.HTML>: These tags introduce and end a section which is in 'html' format and which is to be copied directly to the output file when an 'html' output file is being written. For a 'tidytext' or plain text output file, the section is ignored. Each of the tags must be given on a separate line. (Note the dot preceding the HTML in the tag; these are not the standard <HTML>, </HTML> tags which are not used in a pre-html file though they are used in the output 'html' file.)

<.TIDYTEXT>, </.TIDYTEXT>: These tags introduce and end a section which is in 'tidytext' format and which is to be copied directly to the output file when a 'tidytext' output file is being written. For an 'html' output file or a plain text output file, the section is ignored. Each of the tags must be given on a separate line.

<.ASCII>, </.ASCII>: These tags introduce and end a section which is in plain text format and which is to be copied directly to the output file when a plain ascii text output file is being written. For an 'html' output file or a 'tidytext' output file, the section is ignored. Each of the tags must be given on a separate line.

<.NEWPAGE>: This tag will force a new page in 'tidytext' output mode. It is ignored for an 'html' or plain text output file. The tag must be given on a separate line.

<.FIGURE ...figure-name...>, </.FIGURE>: These tags introduce and end a special section which gives details for figures to be included in the document. Details are given below. Each of the tags must be given on a separate line.

<.LINK "url" ...text...>: This tag allows additional links to be introduced into the document. The quotes around the URL are optional. The user supplied text is used as the reference for the link. For 'tidytext' or plain text output, only the reference text is output. For 'html' output an entry of the form <A HREF = "url">...text...</A> is created.
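
The treatment of the <.LINK> tag in the two kinds of output can be sketched in Python as follows. This is an illustration of the behaviour described above rather than the program's own parsing: the function name is hypothetical, and the pattern assumes that the URL contains no spaces and that the tag is complete on one line.

```python
import re

# <.LINK "url" text> with optional quotes; tag names are case insensitive.
LINK_TAG = re.compile(r'<\.LINK\s+"?([^"\s>]+)"?\s+([^>]*?)\s*>', re.IGNORECASE)

def render_links(line, html):
    """Replace each <.LINK> tag by an html anchor, or by its text alone."""
    def replace(match):
        url, text = match.group(1), match.group(2)
        if html:
            return f'<A HREF = "{url}">{text}</A>'
        return text                      # 'tidytext' or plain text output
    return LINK_TAG.sub(replace, line)
```

The URL used when trying this out, for instance http://example.org/doc, is a placeholder only.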

Handling of Figures

Documentation in 'tidytext' or plain text format has no direct provision for the inclusion of figures and one of the advantages of 'html' is the possibility of including figures directly or via links. The special figures section, enclosed with the tags <.FIGURE> and </.FIGURE>, enables the definition of figures in a pre-html document. Two formats of line are recognised within the section; these describe how a figure is to be handled for each of the possible output file types. Normally both should be given.

For 'html' the format of the line is as follows:

HTML: ...image_file_name... code

The name of the required image file is given followed by a code which is either INTERNAL or EXTERNAL (the latter assumed by default). If INTERNAL is given, then the image will be included in-line when an 'html' output document is prepared. When EXTERNAL, a link to the figure will be included instead. A figure string includes the word 'Figure', a figure number (appended to the chapter/section no. string if defined in the program command line) and the figure name from within the <.FIGURE> tag. There will be an anchor point to this Figure number string. When the figure is external, the figure name string will be highlighted as the hypertext link. Though each figure may be specified as internal or external via this mechanism, it is possible to override these choices globally via the program command line and to make all figures internal or all figures external.

For 'tidytext' or plain ascii text output, the format of the line is as follows:

TEXT: ...image_file_name... number/code

The name of a file containing the figure is followed either by an integer giving the number of blank lines to be left in the document for the figure to be added later or by the code END indicating that the figure is to be added at the end of the document. In both cases a figure string will be written (made up as in the 'html' case). If the END code option is used, then a line of the form '(at end of chapter)' is also output (the word chapter will be replaced by a lower case version of any chapname string defined via the program command line). For figures at the end of the document, new pages will be added and these will be annotated with the figure strings.

All figures are numbered in the sequence they are defined in the document. A list of figures is extracted for the figures index file if this was requested.

Special Characters

Some special characters in 'html' have to be represented by escape sequences. This practice is followed in a pre-html file with a limited number of such characters/escape-sequences being allowed. (Note that, in contrast, the characters are used directly when input is from program source code documentation and not their corresponding escape sequences). In pre-html, the following escape sequences are recognised:
   <   &lt;
   >   &gt;
   &   &amp;
   "   &quot;

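The substitution of the four escape sequences listed above can be sketched in Python as follows, for example when preparing text for inclusion in a pre-html file. The function names are hypothetical and the sketch is not taken from extract_doc. The ampersand is replaced first when escaping, and last when reversing, so that sequences are not substituted twice.

```python
# The four character/escape-sequence pairs recognised in pre-html.
ESCAPES = (("&", "&amp;"), ("<", "&lt;"), (">", "&gt;"), ('"', "&quot;"))

def escape_for_prehtml(text):
    """Replace each special character by its escape sequence."""
    for char, sequence in ESCAPES:       # '&' first, so new '&'s are not re-escaped
        text = text.replace(char, sequence)
    return text

def unescape_prehtml(text):
    """Replace each escape sequence by the character it stands for."""
    for char, sequence in reversed(ESCAPES):   # '&amp;' last
        text = text.replace(sequence, char)
    return text
```

Applying unescape_prehtml to the result of escape_for_prehtml returns the original text.
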
DOCUMENTATION IN PROGRAM SOURCE CODE

Introduction

The program 'extract_doc' may be used to extract documentation from program code source files. These may be Fortran code source files, 'C' code source files or 'C' code source files which also contain Fortran bindings and documentation of the Fortran Calls. In the descriptions below items surrounded by ellipses e.g. ...title... represent text supplied by the user.

List of subsections in this section:

Documentation Layout in a Source Code File
Summary of Documentation Item Codes
Outline of Fortran Documentation Sections
Outline of 'C' Documentation Sections
Description of Documentation Sections
'C' Routines with Fortran Interfaces

Documentation Layout in a Source Code File

The documentation sections in a program source code file are as follows:

Title

Introduction

Section order list (optional)

Section description

Routine description
Routine definition
Parameter description
Additional documentation (optional)
...
Routine description
Fortran call
Parameter description
Additional documentation (optional)
...
Section description
...
etc.

There may be several sections of routines and each section may contain any number of routines (up to program limits). The items following the routine description may be in any order (or even omitted) but, if present, will be output in the order given; there may be more than one set of additional documentation given. In some cases both Fortran and 'C' routine definitions and parameter descriptions may occur in the same file (i.e. a set of 'C' routines with Fortran interfaces) if required.

Summary of Documentation Item Codes

The following gives a summary of the codes which are used to introduce and terminate documentation items. In a Fortran source code file, only the Fortran codes may be used because of the way commenting is done. In a 'C' code file the 'C' codes will normally be used but some of the Fortran codes may also be appropriate.

Item-type                 Fortran-code(s)     'C'-codes

Title                     CD-Title:           /*-Title:
Introduction              CD-Intro:           /*-Intro:
Section order             CD-Section_order:   /*-Section_order:
Section description       CD-Section:         /*-Section:
Routine Description       CD-Routine:         /*-Routine:
Fortran definition        CD-Fortran:         /*-Fortran:
'C' definition            CD-C:               /*-C:*/
Parameters description    CD-Parameters:      /*-Parameters: or /*-Parameters:*/
Additional documentation  CD-Doc:             /*-Doc:
End of item               CD-end or CD-end:   -end*/ or /*end*/

The only documentation sections which need not be part of the code's comments are the Fortran and 'C' routine definitions. The program treats the codes in a case insensitive manner.

Outline of Fortran Documentation Sections

The documentation sections in a Fortran source code file are included as follows. A full description of the items is given below. All lines start with the comment character 'C'.

CD-Title: ...title-string...
CD-end

CD-Intro:
...text...
CD-end

CD-section_order:
...list-of-section-numbers...
CD-end

CD-Section: ...section-header...
...text...
CD-end

CD-Routine: ...routine-header...
...text...
CD-end

CD-Fortran:
...subroutine/function-definition...
CD-end

CD-Parameters:
...parameters-description...
CD-end

CD-doc:
...text...
CD-end

Outline of 'C' Documentation Sections

The documentation sections in a 'C' source code file are included as follows. A full description of the items is given below.

/*-Title: ...title-string...
-end*/

/*-Intro:
...text...
-end*/

/*-section_order:
...list-of-section-numbers...
-end*/

/*-Section: ...section-header...
...text...
-end*/

/*-Routine: ...routine-header...
...text...
-end*/

/*-C:*/
...function-definition...
/*end*/

/*-Parameters:
...parameters-description...
...routine-return...
-end*/

/*-Doc:
...text...
-end*/

Description of Documentation Sections

The following documentation items may be defined in program source code files and the way in which they will be treated is noted:

Title

This item must be present. It defines a title string which will appear at the top of the document. In 'html' it is used as both the title and level 1 header.

Introduction

This item must be present. It is a section of text giving a general description of the subject matter of the document. Paragraphs will be automatically formatted by the html browser or tidytext program or, in the case of plain text output, by the extract_doc program itself. Blank lines (ignoring the C comment character in Fortran) are used to indicate paragraph separators. The program will automatically append a list of the sections present in the document to this item and these will have links to the relevant sections when the output file is in 'html' format.

Section Order

This item is optional. It allows the sections of routines to be output in a different order from that in the input file. It consists of a list of the section numbers in the order they are to be output. For example, if there are four sections, then the section order list 3 2 1 4 will cause the third section to be output first, followed by the second, first and fourth. In the output document, the sections will be numbered in this re-arranged order. If the section order item is omitted, sections will be output in the order they occur in the input file.
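
The effect of a section order list can be sketched in Python as follows. The function name is hypothetical, and the check that every section number appears exactly once is an assumption made for the sketch; how the program treats an incomplete list is not stated here.

```python
def reorder_sections(sections, order_list):
    """Return the sections in the order given by a list such as '3 2 1 4'."""
    order = [int(token) for token in order_list.split()]
    # Assumption: each section number from 1 to N appears exactly once.
    if sorted(order) != list(range(1, len(sections) + 1)):
        raise ValueError("section order list must name each section once")
    # Section numbers count from 1 in input file order.
    return [sections[number - 1] for number in order]
```

With four sections and the list 3 2 1 4, the third input section becomes the first output section, as in the example above.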

Section Description

One or more sections must be described. The section item consists of two parts, a short section header string and a body of text which gives a general description of the routines included in the section. The section header string is used both as a section header in the output file (usually appended to a section number) and in the list of sections automatically generated at the end of the Introduction section. It will also be used in the index file if written. Paragraphs in the text body will be automatically formatted by the html browser or tidytext program or, in the case of plain text output, by the extract_doc program itself. Blank lines (ignoring the C comment character in Fortran) are used to indicate paragraph separators. The text body is used to provide an introductory sub-section for the section in question and the program will automatically append a list of the routines present in the section to which this item belongs. These will have links to the routine descriptions when the output file is in 'html' format.

Routine Description

One or more routines must be present per section. The routine item consists of two parts, a short routine header string and a body of text which gives a general description of the routine. The routine header string is used both as a sub-section header in the output file (usually appended to a sub-section number) and in the list of routines automatically generated at the end of the introductory sub-section for the current section. It will also be used in the index file if written. Paragraphs in the text body will be automatically formatted by the html browser or tidytext program or, in the case of plain text output, by the extract_doc program itself. Blank lines (ignoring the C comment character in Fortran) are used to indicate paragraph separators.

Routine Definition

This item applies to the last routine defined and is the routine definition. In 'C' this will be the actual function definition which will normally contain the parameter type declarations. In Fortran, it may be the actual subroutine/function definition line or may contain a set of commented lines describing the subroutine/function call. The text in this section is treated as pre-formatted both by an 'html' browser and by the 'tidytext' program.

Routine Parameters

This applies to the last routine defined and is a section in comments which describes the routine's parameters. The text in this section is treated as pre-formatted both by an 'html' browser and by the 'tidytext' program. Some standard form of layout should be used for such sections. In 'C', the routine's return value should also be described within this item. Note that there must be no comment sections within this parameters section. If the parameters are actual declarations (possibly followed by comments) then the parameters section should be bounded by lines containing the codes /*-Parameters:*/ and /*end*/.

Additional Routine Documentation

This is optional and will often not be needed. It applies to the last routine defined. It enables further sections of documentation to be supplied for the routine. The text within such sections will be treated as pre-formatted.

'C' Routines with Fortran Interfaces

These are handled as for the 'C' routines but there will be additional sections present describing the Fortran subroutine/function definition. These will be in a 'C' comment section using the Fortran item codes or may use the equivalent 'C' codes e.g.

/*
CD-Fortran:
...subroutine/function-definition...
CD-end

CD-Parameters:
...parameters-description...
CD-end

CD-doc:
...text...
CD-end
*/

/*-Fortran:
...subroutine/function-definition...
-end*/

/*-Parameters:
...parameters-description...
-end*/

/*-doc:
...text...
-end*/



John W. Campbell
CCLRC Daresbury Laboratory
Last update 30 Sep 1997