DOCUMENTATION PREPARATION PROGRAM 'EXTRACT_DOC'
The program extract_doc is used to prepare documentation files by extracting
documentation sections from Fortran or 'C' code source files or from an
input file in a special pre-html format devised for use with this tool.
When used with Fortran or 'C' program source code files, the documentation
sections must follow a particular convention. All documents produced have
a similar overall layout. The output files are either in 'html' format,
'tidytext' format or a (formatted) plain ascii text file format. Index files
may also be produced. The program automatically generates lists of sections
and subsections in appropriate places and when 'html' output is requested,
this will contain links to the various sections and sub-sections; the index
files (one for the sections/sub-sections and one for the figures) will also
contain links to the appropriate places.
List of sections:
General Layout of a Document
Running the Program
Pre-html Documents
Documentation in Program Source Code
The general layout of a document is as follows:
CHAPTER 1 Title
1.1 Introduction
1.2 Section
1.2.1 Introduction
1.2.2 Sub-section
1.2.3 Sub-section
...
1.3 Section
1.3.1 Introduction
1.3.2 Sub-section
1.3.3 Sub-section
...
etc.
The chapter (or overall section) number is supplied when the program is run
and may be omitted if the document stands on its own. For documentation from
program source code, the sections are sections of routines (there may be
several such sections in one source code file) and the sub-sections are
individual routines. At the end of the introduction, a list of sections
in the document is given (with links if using 'html' format) and at the
end of the introduction to a section, a list of the sub-sections (or routines)
is given (again with links in 'html' mode).
This section describes how the program is run, the program options available
and the nature of the index files produced.
List of subsections in this section:
Program Command Line
Program Option Switches
Index files
The program is run using the following command line:
extract_doc [-program] [-tidytext] [-ascii] [-width ncols] [-just]
[-root idxroot ] [-section secnum ] [-docfile filename ]
[-name chapname ] [-inline code ] [-footer filename ]
< input_file > output_file
The following option switches are available:
- -program
- Input file is from Fortran or 'C' source code; if not specified the
input file is assumed to be in pre-html format.
- -tidytext
- The output is to be in TIDYTEXT format; the default output is HTML format.
- -ascii
- The output is to be in plain ascii format; the default output is
HTML format.
- -width ncols
- ncols is the page width (number of columns) for the plain (ascii)
text output option. It should also be the minimum number of columns which
will be used by the 'tidytext' program for the tidytext output option. The
default is 75 and the minimum allowed value is 25.
- -just
- Justify automatically formatted paragraphs to the right as well as the left
for the plain (ascii) text output option.
- -root idxroot
- idxroot is the user supplied root file name for output index (.idx)
and figures index (.ifg) files; if no root name is given, no index files
will be written. For 'tidytext' or plain text output, an additional figures
list file (.fls) will be written giving the figure names and the names of
the files from which the figures are to be prepared.
- -section secnum
- secnum is the chapter or section number as a string. The internal
section numbers will be appended to this string. If a string of '?' is
given, then templates will be output for the section nos. (e.g. ? ?.?).
If this keyword is not specified the sections will remain un-numbered. If
the section number contains one or more decimal points then the style of
the index files will be modified slightly to give less emphasis to
the section headings.
e.g. -section 4 will give sections 4.1, 4.2 etc. and 4.1.1, 4.1.2 etc.
- -docfile filename
- filename is the name that will be used for the created 'html' output
file; it is only required if index files are being written, as
it is needed to form the data for links from the index to the document.
If it is not given, the index files will not contain links. It is ignored
in 'tidytext' or plain text output mode.
- -name chapname
- If a section number was specified, the main title/heading is prefixed
with the string CHAPTER followed by the secnum string (or SECTION if
secnum contains a decimal point). If -name is used,
the user supplied string 'chapname' will be used instead.
e.g. -name appendix -section 3 will give APPENDIX 3: title string...
- -inline code
- The code is either 'all' or 'none' to request that all images in an html
output file are to be in-lined or that all are to be given external links.
This overrides the individual choices specified in the input pre-html file.
The option is ignored for 'tidytext' or plain text output.
- -footer filename
- filename is the name of a file containing text in 'html' format
to be added at the end of an 'html' output file e.g. for adding a set
of standard links and/or address data. The data in the file is ignored
for 'tidytext' or plain text output.
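The numbering rules given for the -section and -name options can be sketched in 'C' as follows. This is only an illustration of the rules as described above, not the code used by 'extract_doc'; the function names format_section_number and format_main_heading are hypothetical, and the '?' template case is not handled.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build a number such as "4.1" or "4.1.2".
 * secnum is the -section string; sect and subsect are the internal
 * numbers (subsect == 0 means a section heading). With no -section
 * string the heading stays un-numbered, so the result is empty. */
void format_section_number(char *out, size_t n, const char *secnum,
                           int sect, int subsect)
{
    if (secnum == NULL)
        snprintf(out, n, "%s", "");
    else if (subsect == 0)
        snprintf(out, n, "%s.%d", secnum, sect);
    else
        snprintf(out, n, "%s.%d.%d", secnum, sect, subsect);
}

/* Hypothetical helper: build a main heading such as "APPENDIX 3: Title".
 * chapname is the -name string and may be NULL, in which case CHAPTER
 * is used, or SECTION if secnum contains a decimal point. */
void format_main_heading(char *out, size_t n, const char *chapname,
                         const char *secnum, const char *title)
{
    char label[32];
    size_t i;

    if (secnum == NULL) {            /* un-numbered: title only */
        snprintf(out, n, "%s", title);
        return;
    }
    if (chapname == NULL)
        chapname = strchr(secnum, '.') ? "SECTION" : "CHAPTER";
    for (i = 0; chapname[i] != '\0' && i < sizeof label - 1; i++)
        label[i] = (char)toupper((unsigned char)chapname[i]);
    label[i] = '\0';
    snprintf(out, n, "%s %s: %s", label, secnum, title);
}
```

With these helpers, a secnum of "4" gives "4.1" for the first section and "4.1.2" for its second sub-section, and a chapname of "appendix" with a secnum of "3" gives a heading starting "APPENDIX 3:".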
If index files are requested, then two (or three in the case of 'tidytext'
or plain text output) such files are produced. The first file contains the
chapter title and the section and sub-section headings and the second contains
a list of the figures. The third file, when present, gives a list of the
figures with the names of the image files from which the figures are to be
prepared for a 'tidytext' or plain text format document. If 'html' output
is being used, then these index files will contain links to the appropriate
sections/figures in the main documentation file provided that the name of
that file has been passed to the 'extract_doc' program as a program option.
The index files are most likely to be of use in a multi-chapter document and
it is suggested that, in such a case, a script file is composed which will
prepare an overall index file by concatenating the individual index files.
The pre-html file used as input to the 'extract_doc' program resembles 'html'
in its use of tags. It enables a simple though restricted layout for the
document but provides the bonus that the 'extract_doc' program will give
lists, in appropriate places, of the sections/sub-sections present and when
used with the 'html' output format option will generate automatically
a set of links to these sections/sub-sections. It also enables the output
of documents compatible in format with those extracted from program source
code. The pre-html format enables figure handling to be defined for both 'html'
and 'tidytext'/plain-text output cases.
List of subsections in this section:
Layout of a Pre-html Document
Markers in a Pre-html Document
Handling of Figures
Special Characters
A pre-html document is laid out using the following items, tagged as shown.
Items enclosed in ellipses indicate user supplied material which may
contain various 'html' tags and other items as described in more detail
in the following section:
<.TITLE> ...title-string... </.TITLE>
<.AUTHOR>
...text...
</.AUTHOR>
<.INTRO>
...text...
</.INTRO>
<.SECT ...section-header...>
...text...
</.SECT>
<.SUBSECT ...subsection-header...>
...text...
</.SUBSECT>
<.SUBSECT>
...text...
</.SUBSECT>
...
<.SECT>
...text...
</.SECT>
...
etc.
The items are as follows:
- Title
- This item must be present. It defines a title string which will appear
at the top of the document. In 'html' it is used as both the title and
level 1 header.
- Author
- This item is optional. It is a section of text giving details
of the document author(s). The section will be output after the title
and before the introduction section.
- Introduction
- This item must be present. It is a section of text giving a general
description of the subject matter of the document. The program will
automatically append a list of the sections present in the document to
this item and these will have links to the relevant sections when the output
file is in 'html' format.
- Section
- One or more sections must be described. The section item consists of
two parts, a short section header string within the tag and a body of text
which gives a general description of the subject matter of the section.
The section header string is used both as a section header in the output
file (usually appended to a section number) and in the list of sections
automatically generated at the end of the Introduction section. It will also
be used in the index file if written. The text body is used to provide an
introductory sub-section for the section in question and the program will
automatically append a list of the sub-sections present in the section
to which this item belongs. These will have links to the relevant sub-sections
when the output file is in 'html' format.
- Sub-section
- One or more sub-sections must be present per section. The sub-section
item consists of two parts, a short sub-section header string within the
tag and a body of text. The sub-section header string is used both as a
sub-section header in the output file (usually appended to a sub-section
number) and in the list of sub-sections automatically generated at the end of
the introductory sub-section for the current section. It will also be used
in the index file if written.
The pre-html file used as input to the 'extract_doc' program resembles 'html'
in its use of tags. It only allows a restricted set of 'html' codes to be
used but in addition it uses some special tags. The 'extract_doc' special
tag names start with a dot e.g. <.TITLE>. All tags are treated in
a case insensitive manner.
- Tags defining the basic items/section of the document
- <.TITLE>
- This is followed by the title string which will
be formatted by the 'html' browser or by 'tidytext'.
- </.TITLE>
- Terminates the title string.
- <.AUTHOR>
- The author section (if present) follows this tag.
- </.AUTHOR>
- Terminates the author section.
- <.INTRO>
- The introduction text body follows this tag.
- </.INTRO>
- Terminates the introduction.
- <.SECT ...section-header...>
- A section description text body follows this tag.
- </.SECT>
- Terminates the section description.
- <.SUBSECT ...subsection-header...>
- A sub-section text body follows this tag.
- </.SUBSECT>
- Terminates the sub-section text body.
- Standard 'html' tags allowed within a text body
- A text body is the text within the author item, the introduction item,
a section item or a sub-section item. No tags are processed outside such
items. Many of the tags, though valid 'html' tags, may only be given on
separate lines in a pre-html file.
Tags which must be given on separate lines are as follows:
<P>, <PRE>, </PRE>, <UL>, </UL>, <OL>,
</OL>, <DL>, </DL>, <HR>
Tags which must be given only at the start of a line are as follows:
<LI>, <DT>, <DD>
Tags which may be given within a line are as follows:
<A>, </A>, <B>, </B>, <I>, </I>
- Special 'extract_doc' tags
- A number of additional tags specific to the program 'extract_doc' may
also be used within the text body. These are the following:
<.AL>, </.AL>: For 'html' output these tags are treated as those for an
ordered list, and items are introduced in the same manner using the
<LI> tag. In 'tidytext' or plain text output the list items will be
tagged with letters as opposed to numbers.
<.SINGLES>, </.SINGLES>: These tags introduce and end a section
of text in which each line in the input file is to be output to a single
line. In 'html' output, the normal font is used for each line and it will
be spaced in the usual manner; for 'tidytext' or plain text output, it will
be equivalent to a pre-formatted/table section. Each of the tags must be
given on a separate line.
<.HTML>, </.HTML>: These tags introduce and end a section which
is in 'html' format and which is to be copied directly to the output file
when an 'html' output file is being written. For a 'tidytext' or plain
text output file, the section is ignored. Each of the tags must be given on
a separate line. (Note the dot preceding the HTML in the tag; these are not the
standard <HTML>, </HTML> tags which are not used in a pre-html
file though they are used in the output 'html' file.)
<.TIDYTEXT>, </.TIDYTEXT>: These tags introduce and end a section
which is in 'tidytext' format and which is to be copied directly to the output
file when a 'tidytext' output file is being written. For an 'html' output
file or a plain text output file, the section is ignored. Each of the tags
must be given on a separate line.
<.ASCII>, </.ASCII>: These tags introduce and end a section
which is in plain text format and which is to be copied directly to the output
file when a plain ascii text output file is being written. For an 'html'
output file or a 'tidytext' output file, the section is ignored. Each of
the tags must be given on a separate line.
<.NEWPAGE>: This tag will force a new page in 'tidytext' output mode.
It is ignored for an 'html' or plain text output file. The tag must be given
on a separate line.
<.FIGURE ...figure-name...>, </.FIGURE>: These tags introduce
and end a special section which gives details for figures to be included in
the document. Details are given below. Each of the tags must be given on a
separate line.
<.LINK "url" ...text...>: This tag allows additional links to be
introduced into the document. The quotes around the URL are optional. The
user supplied text is used as the reference for the link. For 'tidytext'
or plain text output, only the reference text is output. For 'html' output
an entry of the form <A HREF = "url">...text...</A> is created.
Documentation in 'tidytext' or plain text format has no direct provision for
the inclusion of figures and one of the advantages of 'html' is the
possibility of including figures directly or via links. The special figures
section, enclosed with the tags <.FIGURE> and </.FIGURE>,
enables the definition of figures in a pre-html document. Two formats of
line are recognised within the section; these describe how a figure is to be
handled for each of the possible output file types. Normally both should be
given.
For 'html' the format of the line is as follows:
HTML: ...image_file_name... code
The name of the required image file is given followed by a code which is
either INTERNAL or EXTERNAL (the latter assumed by default). If INTERNAL
is given, then the image will be included in-line when an 'html' output
document is prepared. When EXTERNAL, a link to the figure will be included
instead. A figure string includes the word 'Figure', a figure number
(appended to the chapter/section no. string if defined in the program
command line) and the figure name from within the <.FIGURE> tag.
There will be an anchor point to this Figure number string. When the figure
is external, the figure name string will be highlighted as the hypertext link.
Though each figure may be specified as internal or external via this mechanism,
it is possible to override these choices globally via the program command line
and to make all figures internal or all figures external.
For 'tidytext' or plain ascii text output, the format of the line is as
follows:
TEXT: ...image_file_name... number/code
The name of a file containing the figure is followed either by an integer
giving the number of blank lines to be left in the document for the figure
to be added later or the code END indicating that the figure is to be added
at the end of the document. In both cases a figure string will be written
(made up as in the 'html' case). If the END code option is used, then
a line of the form '(at end of chapter)' is also output (the word chapter
will be replaced by a lower case version of any chapname string
defined via the program command line). For figures at the end of the document,
new pages will be added and these will be annotated with the figure strings.
All figures are numbered in the sequence they are defined in the document.
A list of figures is extracted for the figures index file if this was
requested.
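As an illustration of the two line formats described above, the following 'C' sketch parses a single line from a figures section. It is not taken from 'extract_doc'; the names fig_kind, fig_line and parse_figure_line are hypothetical, and it assumes that the fields are separated by white space, that file names contain no spaces and that the keywords are matched without regard to case.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum fig_kind {
    FIG_INVALID,
    FIG_HTML_INTERNAL,
    FIG_HTML_EXTERNAL,
    FIG_TEXT_LINES,
    FIG_TEXT_END
};

struct fig_line {
    enum fig_kind kind;
    char file[256];     /* image file name */
    int blank_lines;    /* only meaningful for FIG_TEXT_LINES */
};

/* Case-insensitive comparison of two complete strings. */
static int equal_nocase(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Parse "HTML: file [INTERNAL|EXTERNAL]" or "TEXT: file number|END". */
struct fig_line parse_figure_line(const char *line)
{
    struct fig_line f;
    char key[8] = "", code[16] = "";
    int fields;

    memset(&f, 0, sizeof f);
    f.kind = FIG_INVALID;
    fields = sscanf(line, "%7s %255s %15s", key, f.file, code);
    if (fields < 2)
        return f;
    if (equal_nocase(key, "HTML:")) {
        /* EXTERNAL is assumed when no code is given */
        if (fields == 2 || equal_nocase(code, "EXTERNAL"))
            f.kind = FIG_HTML_EXTERNAL;
        else if (equal_nocase(code, "INTERNAL"))
            f.kind = FIG_HTML_INTERNAL;
    } else if (equal_nocase(key, "TEXT:") && fields == 3) {
        if (equal_nocase(code, "END")) {
            f.kind = FIG_TEXT_END;
        } else {
            char *end;
            long lines = strtol(code, &end, 10);
            if (end != code && *end == '\0' && lines >= 0 && lines <= 10000) {
                f.kind = FIG_TEXT_LINES;
                f.blank_lines = (int)lines;
            }
        }
    }
    return f;
}
```

For example, a line "TEXT: detector.ps 20" (a made-up file name) would be read as a request to leave 20 blank lines, while "HTML: detector.gif" with no code would be treated as an external figure.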
Some special characters in 'html' have to be represented by escape sequences.
This practice is followed in a pre-html file with a limited number of such
characters/escape-sequences being allowed. (Note that, in contrast, the
characters are used directly when input is from program source code
documentation and not their corresponding escape sequences). In pre-html,
the following escape sequences are recognised:
&lt;      <
&gt;      >
&amp;     &
&quot;    "
The program 'extract_doc' may be used to extract documentation from program
code source files. These may be Fortran code source files, 'C' code source
files or 'C' code source files which also contain Fortran bindings and
documentation of the Fortran calls. In the descriptions below items
surrounded by ellipses e.g. ...title... represent text supplied by the
user.
List of subsections in this section:
Documentation Layout in a Source Code File
Summary of Documentation Item Codes
Outline of Fortran Documentation Sections
Outline of 'C' Documentation Sections
Description of Documentation Sections
'C' Routines with Fortran Interfaces
The documentation sections in a program source code file are as follows:
Title
Introduction
Section order list (optional)
Section description
Routine description
Routine definition
Parameter description
Additional documentation (optional)
...
Routine description
Fortran call
Parameter description
Additional documentation (optional)
...
Section description
...
etc.
There may be several sections of routines and each section may contain any
number of routines (up to program limits). The items following the
routine description may be in any order (or even omitted) but, if present,
will be output in the order given; there may be more than one set of
additional documentation given. In some cases both Fortran and 'C' routine
definitions and parameter descriptions may occur in the same file (i.e. a
set of 'C' routines with Fortran interfaces) if required.
The following gives a summary of the codes which are used to introduce and
terminate documentation items. In a Fortran source code file, only the
Fortran codes may be used because of the way commenting is done. In a 'C'
code file the 'C' codes will normally be used but some of the Fortran codes
may also be appropriate.
Item-type                  Fortran-code(s)     'C'-codes
Title                      CD-Title:           /*-Title:
Introduction               CD-Intro:           /*-Intro:
Section order              CD-Section_order:   /*-Section_order:
Section description        CD-Section:         /*-Section:
Routine description        CD-Routine:         /*-Routine:
Fortran definition         CD-Fortran:         /*-Fortran:
'C' definition             CD-C:               /*-C:*/
Parameters description     CD-Parameters:      /*-Parameters: or /*-Parameters:*/
Additional documentation   CD-Doc:             /*-Doc:
End of item                CD-end or CD-end:   -end*/ or /*end*/
The only documentation sections which need not be part of the code's
comments are the Fortran and 'C' routine definitions. The program
treats the codes in a case insensitive manner.
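The case insensitive recognition of the item codes listed above might be sketched in 'C' as follows. This is an illustration only; the names item_type and classify_item are hypothetical, and the sketch assumes that a code always starts in the first column of a line.

```c
#include <ctype.h>
#include <stddef.h>

enum item_type {
    ITEM_NONE, ITEM_TITLE, ITEM_INTRO, ITEM_SECTION_ORDER, ITEM_SECTION,
    ITEM_ROUTINE, ITEM_FORTRAN, ITEM_C, ITEM_PARAMETERS, ITEM_DOC, ITEM_END
};

/* Return 1 if line starts with prefix, ignoring case. */
static int starts_with_nocase(const char *line, const char *prefix)
{
    while (*prefix != '\0') {
        if (tolower((unsigned char)*line) != tolower((unsigned char)*prefix))
            return 0;
        line++;
        prefix++;
    }
    return 1;
}

/* Classify a source line by the documentation item code it starts with. */
enum item_type classify_item(const char *line)
{
    static const struct { const char *name; enum item_type type; } items[] = {
        { "Title:", ITEM_TITLE },
        { "Intro:", ITEM_INTRO },
        { "Section_order:", ITEM_SECTION_ORDER },
        { "Section:", ITEM_SECTION },
        { "Routine:", ITEM_ROUTINE },
        { "Fortran:", ITEM_FORTRAN },
        { "C:", ITEM_C },
        { "Parameters:", ITEM_PARAMETERS },
        { "Doc:", ITEM_DOC }
    };
    size_t i;

    /* End-of-item codes: CD-end, CD-end:, -end*/ /* and */ /*end*/
    if (starts_with_nocase(line, "CD-end") ||
        starts_with_nocase(line, "-end*/") ||
        starts_with_nocase(line, "/*end*/"))
        return ITEM_END;

    /* Both the Fortran prefix "CD-" and the 'C' prefix are 3 characters. */
    if (starts_with_nocase(line, "CD-") || starts_with_nocase(line, "/*-"))
        line += 3;
    else
        return ITEM_NONE;

    for (i = 0; i < sizeof items / sizeof items[0]; i++)
        if (starts_with_nocase(line, items[i].name))
            return items[i].type;
    return ITEM_NONE;
}
```

Under these assumptions, a line beginning "cd-section_order:" is classified in the same way as one beginning "CD-Section_order:", and an ordinary Fortran comment line is not treated as a documentation item.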
The documentation sections in a Fortran source code file are included as
follows. A full description of the items is given below. All lines start
with the comment character 'C'.
CD-Title: ...title-string...
CD-end
CD-Intro:
...text...
CD-end
CD-section_order:
...list-of-section-numbers...
CD-end
CD-Section: ...section-header...
...text...
CD-end
CD-Routine: ...routine-header...
...text...
CD-end
CD-Fortran:
...subroutine/function-definition...
CD-end
CD-Parameters:
...parameters-description...
CD-end
CD-doc:
...text...
CD-end
The documentation sections in a 'C' source code file are included as
follows. A full description of the items is given below.
/*-Title: ...title-string...
-end*/
/*-Intro:
...text...
-end*/
/*-section_order:
...list-of-section-numbers...
-end*/
/*-Section: ...section-header...
...text...
-end*/
/*-Routine: ...routine-header...
...text...
-end*/
/*-C:*/
...function-definition...
/*end*/
/*-Parameters:
...parameters-description...
...routine-return...
-end*/
/*-Doc:
...text...
-end*/
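A short made-up example of a 'C' source file following the outline above may help to show how the items fit around real code. The routine vec_dot, its title and the layout used for the parameter descriptions are all invented for illustration; only the item codes come from this document.

```c
/*-Title: Vector Utility Routines
-end*/

/*-Intro:
A small set of routines for arithmetic on vectors of doubles.
-end*/

/*-Section: Products
Routines that combine two vectors into a single value.
-end*/

/*-Routine: vec_dot - Dot product of two vectors
Returns the sum of the products of corresponding elements.
-end*/

/*-C:*/
double vec_dot(const double *a, const double *b, int n)
/*end*/
/*-Parameters:
    a       (R)  First vector
    b       (R)  Second vector
    n       (R)  Number of elements in each vector

    Return: the dot product (0.0 when n is not positive)
-end*/
{
    double sum = 0.0;
    int i;

    for (i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```

The routine definition item is bounded by the two complete comments /*-C:*/ and /*end*/, so the definition line between them is still compiled as normal code, while the other items lie entirely within comments.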
The following documentation items may be defined in program source code
files and the way in which they will be treated is noted:
- Title
- This item must be present. It defines a title string which will appear
at the top of the document. In 'html' it is used as both the title and
level 1 header.
- Introduction
- This item must be present. It is a section of text giving a general
description of the subject matter of the document. Paragraphs will be
automatically formatted by the html browser or tidytext program or,
in the case of plain text output, by the extract_doc program itself. Blank
lines (ignoring the C comment character in Fortran) are used to indicate
paragraph separators. The program will automatically append a list of
the sections present in the document to this item and these will have
links to the relevant sections when the output file is in 'html' format.
- Section Order
- This item is optional. It allows the sections of routines to be output
in a different order from that in the input file. It consists of a list
of the section numbers in the order they are to be output. For example if
there are four sections, then the section order list 3 2 1 4 will cause the
third section to be output first, followed by the second, first and fourth.
In the output document, the sections will be numbered in this re-arranged
order. If the section order item is omitted, sections will be output
in the order they occur in the input file.
- Section Description
- One or more sections must be described. The section item consists of
two parts, a short section header string and a body of text which gives
a general description of the routines included in the section. The section
header string is used both as a section header in the output file (usually
appended to a section number) and in the list of sections automatically
generated at the end of the Introduction section. It will also be used
in the index file if written. Paragraphs in the text body will be
automatically formatted by the html browser or tidytext program or,
in the case of plain text output, by the extract_doc program itself. Blank
lines (ignoring the C comment character in Fortran) are used to indicate
paragraph separators. The text body is used to provide an introductory
sub-section for the section in question and the program will automatically
append a list of the routines present in the section to which this item
belongs. These will have links to the routine descriptions when the output
file is in 'html' format.
- Routine Description
- One or more routines must be present per section. The routine item
consists of two parts, a short routine header string and a body of text which
gives a general description of the routine. The routine header string is used both as
a sub-section header in the output file (usually appended to a sub-section
number) and in the list of routines automatically generated at the end of
the introductory sub-section for the current section. It will also be used
in the index file if written. Paragraphs in the text body will be
automatically formatted by the html browser or tidytext program or,
in the case of plain text output, by the extract_doc program itself. Blank
lines (ignoring the C comment character in Fortran) are used to indicate
paragraph separators.
- Routine Definition
- This item applies to the last routine defined and is the routine
definition. In 'C' this will be the actual function definition which
will normally contain the parameter type declarations. In Fortran,
it may be the actual subroutine/function definition line or may contain a
set of commented lines describing the subroutine/function call. The text in
this section is treated as pre-formatted both by an 'html' browser and by
the 'tidytext' program.
- Routine Parameters
- This applies to the last routine defined and is a section in comments
which describes the routine's parameters. The text in this section is treated
as pre-formatted both by an 'html' browser and by the 'tidytext' program.
Some standard form of layout should be used for such sections. In 'C',
the routine's return value should also be described within this item. Note
that there must be no comment sections within this parameters section. If
the parameters are actual declarations (possibly followed by comments) then
the parameters section should be bounded by lines containing the codes
/*-Parameters:*/ and /*end*/.
- Additional Routine Documentation
- This is optional and will often not be needed. It applies to the last
routine defined. It enables further sections of documentation to be
supplied for the routine. The text within such sections will be treated
as pre-formatted.
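The effect of the section order item described above can be illustrated with a small 'C' sketch which reads a list such as 3 2 1 4. The function name parse_section_order is hypothetical, and the treatment of repeated or out of range section numbers as errors is an assumption; this document does not say how 'extract_doc' handles such cases.

```c
#include <stdlib.h>

/* Parse a section order list such as "3 2 1 4".
 * nsect is the number of sections in the input file. On return,
 * order[k] holds the input-file number of the section which is output
 * (and numbered) as section k+1. Returns the number of entries read,
 * or -1 if an entry is not a number, is out of range or is repeated. */
int parse_section_order(const char *list, int nsect, int *order)
{
    int count = 0;
    const char *p = list;

    for (;;) {
        char *end;
        long value = strtol(p, &end, 10);
        int k;

        if (end == p)                   /* no further number found */
            break;
        if (value < 1 || value > nsect || count >= nsect)
            return -1;
        for (k = 0; k < count; k++)
            if (order[k] == (int)value)
                return -1;              /* section listed twice */
        order[count++] = (int)value;
        p = end;
    }
    while (*p == ' ' || *p == '\t' || *p == '\n')
        p++;
    return (*p == '\0') ? count : -1;   /* reject trailing non-numeric text */
}
```

For the list 3 2 1 4 with four sections, order[0] is 3, so the third section in the input file is output first and is numbered as the first section in the output document.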
'C' routines with Fortran interfaces are handled as for the 'C' routines but
there will be additional sections present describing the Fortran
subroutine/function definition. These will either be in a 'C' comment section
using the Fortran item codes or may use the equivalent 'C' codes, e.g.
/*
CD-Fortran:
...subroutine/function-definition...
CD-end
CD-Parameters:
...parameters-description...
CD-end
CD-doc:
...text...
CD-end
*/
or, using the equivalent 'C' codes:
/*-Fortran:
...subroutine/function-definition...
-end*/
/*-Parameters:
...parameters-description...
-end*/
/*-doc:
...text...
-end*/
John W. Campbell
CCLRC Daresbury Laboratory
Last update 30 Sep 1997