All the information in this quick guide has been taken from the Mosflm manual. For a more in-depth tutorial and further information, please see the manual, which is available on the web or at the station. Here we consider only the bare minimum of information needed to run Mosflm.
Data processing can be broken down into three steps: (1) determining the crystal orientation, cell parameters and possible space group, (2) generating the reflection lists and integrating the images, and (3) scaling and merging the data. The first two steps can be performed using Mosflm and the third using Scala. The INPUT files for Mosflm are (1) the image file and (2) a file containing the crystal orientation matrix (which can itself be generated by Mosflm). The OUTPUT files are (1) an MTZ file containing the integrated intensities, whose name is set by HKLOUT; (2) if autoindexing or refining cell parameters, a file containing the refined crystal orientation matrices, whose name is set by NEWMAT (the default name is NEWMAT); (3) a summary file containing a summary of the processing results, which can be input to xloggraph for graphical representation; and (4) if running interactively, the program output, which is written to the terminal window and to the file mosflm.lp.
Getting Started
To start the Mosflm program type 'mosflm'. This starts the program and produces the Mosflm control window (which contains information on the version number etc.). Information on the images, detector and other experimental parameters can now be entered manually using keywords such as those listed below. Mosflm does not have to be told whether the image to be displayed is binned or unbinned; it can display both. However, for binned images it still uses the unbinned beam centre position. Alternatively, the keywords can be put into a file, e.g. startup.com, which is run by typing @startup.com in the Mosflm window. This will cause the Mosflm display window to appear, which has an Image Display area together with a list of Processing Parameters (which can be changed), a Main Menu and a table of Output Parameters.
TITLE lysozyme test data
IMAGE lys1_1_001.img
BEAM 94.60 92.89
SCANNER ADSC
WAVE 0.979
DISTANCE 180
RESOLUTION 1.8 (if known)
MOSAIC (if known)
SYMM p422 (if known)
CELL 79.388 79.388 37.831 90 90 90 (if known)
GO
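A command file such as startup.com can also be written by a short script, which makes it easy to leave out keywords whose values are not yet known. The following is a minimal Python sketch and not part of Mosflm: the function name build_startup and the dictionary layout are hypothetical, and only keywords that appear in the listing above are used.

```python
def build_startup(params):
    """Return the text of a startup command file.

    `params` maps a keyword to its value (as a string). A value of None
    means "not known", so that keyword is left out of the file.
    """
    order = ["TITLE", "IMAGE", "BEAM", "SCANNER", "WAVE", "DISTANCE",
             "RESOLUTION", "MOSAIC", "SYMM", "CELL"]
    lines = []
    for key in order:
        value = params.get(key)
        if value is not None:
            lines.append(f"{key} {value}")
    lines.append("GO")  # the listing above ends with GO
    return "\n".join(lines) + "\n"


example = {
    "TITLE": "lysozyme test data",
    "IMAGE": "lys1_1_001.img",
    "BEAM": "94.60 92.89",
    "SCANNER": "ADSC",
    "WAVE": "0.979",
    "DISTANCE": "180",
    "RESOLUTION": "1.8",
    "MOSAIC": None,  # not known yet, so omitted
    "SYMM": None,    # not known yet, so omitted
    "CELL": None,    # not known yet, so omitted
}

text = build_startup(example)
print(text)
```

The returned text can be saved as startup.com and run by typing @startup.com in the Mosflm window, as described above.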
Autoindexing
Once the first image has been displayed, autoindexing should be run to determine the orientation matrix, spacegroup and unit cell. These parameters are normally determined from a single image. Autoindexing may be carried out interactively or in batch mode, but spacegroup selection can only be done interactively because the user needs to select the cell and spacegroup. To autoindex interactively, first click on FIND SPOTS. The parameters involved in spot finding (which can be altered) are Threshold, Rmin, Rmax, X Offset, Y Offset, Min X size, Max X size, Min Y size, Max Y size, Min. no. of pixels, X splitting and Y splitting, all of which have sensible defaults. The program searches for spots lying within the radial limits Rmin and Rmax (in mm) from the direct beam position. It also needs to determine the radial background. The area used for this lies in a direction at right angles to the rotation axis, avoiding the backstop shadow. Normally this area is centred on the direct beam position, but it can be moved using X Offset and Y Offset.
The positions of the found spots are displayed as red crosses. Only those to be used for autoindexing are displayed. MinI/sig(I) (default 20) can be used to change the number of spots. Check that the spots have been located correctly. Threshold, Max X size and Max Y size can be changed to influence spot finding. It is possible to Edit the found spots, e.g. to remove some which come from a satellite crystal. If autoindexing using more than one image, use READ IMAGE to read in the next image and then do FIND SPOTS. It is also possible to SELECT IMAGES for use in autoindexing.
After the spots have been located satisfactorily, select AUTOINDEXING from the Main Menu. The image will then be autoindexed and a filename is prompted for the output orientation matrix. Most of the questions asked during autoindexing can be defaulted. Autoindexing uses an FFT-based algorithm. The program gives a list of possible unit cells and space groups, sorted on the PENALTY of each solution, and the user has to choose one. Choose the solution with the highest possible symmetry that still has a reasonably low penalty. Once a choice has been made, autoindexing is repeated automatically, imposing the appropriate cell constraints. The final orientation matrix and cell parameters are written to the NEWMAT file. To test whether autoindexing has worked, PREDICT the pattern and see if it matches the observed spots. Small shifts in the pattern can be corrected using ADJUST. If there is a large difference, repeat the autoindexing using the updated direct beam parameters (updated automatically by using ADJUST).
If the spacegroup and cell have been given (manually input or via a command file), the cell parameters determined by autoindexing will be permuted to best match the input values. However, it is still necessary to select the solution from the list. If no spacegroup has been input, a list of choices is provided, sorted on penalty. The lower the penalty the better. It is possible to check the autoindexing by PREDICTING the spots on the current image and seeing how well they match the observed ones. If autoindexing fails, check that the direct beam coordinates are correct, try adjusting the THRESHOLD, MinI/sig(I) or the maximum cell length, or try reading in another image, finding spots and autoindexing again. Also try to avoid images looking down a principal zone.
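The rule of choosing the highest symmetry that still has a reasonably low penalty can be written down as a small selection routine. This is only a sketch under stated assumptions: the solution list is made up for illustration, the lattice labels and the ranking of crystal systems are a hypothetical encoding, and the penalty cutoff of 20 is an assumed value rather than one taken from the manual. The choice should still be confirmed by predicting the pattern.

```python
# Hypothetical ranking of crystal systems by symmetry, keyed on the first
# letter of a lattice label (a = triclinic ... c = cubic).
SYSTEM_RANK = {"a": 1, "m": 2, "o": 3, "t": 4, "h": 5, "c": 6}


def choose_solution(solutions, max_penalty=20):
    """Pick the highest-symmetry solution whose penalty is acceptable.

    `solutions` is a list of dicts with keys "number", "penalty" and
    "lattice". `max_penalty` is an assumed cutoff for "reasonably low".
    Returns None if no solution is below the cutoff.
    """
    acceptable = [s for s in solutions if s["penalty"] <= max_penalty]
    if not acceptable:
        return None
    # Highest symmetry first; among equal symmetry prefer the lower penalty.
    return max(acceptable,
               key=lambda s: (SYSTEM_RANK[s["lattice"][0]], -s["penalty"]))


# Made-up solution table, for illustration only.
solutions = [
    {"number": 1, "penalty": 0, "lattice": "aP"},
    {"number": 2, "penalty": 1, "lattice": "mP"},
    {"number": 3, "penalty": 2, "lattice": "oP"},
    {"number": 4, "penalty": 3, "lattice": "tP"},
    {"number": 5, "penalty": 148, "lattice": "cP"},
]

best = choose_solution(solutions)
print(best)
```

In this made-up table the cubic solution is rejected because of its large penalty, so the tetragonal solution is the one selected.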
Autoindexing is normally done interactively but the commands may be put into a file and this file run by typing, for example, @auto.com at the mosflm prompt.
TITLE Lysozyme test data
IMAGE lys_1_001.img
BEAM 94.60 92.89
SCANNER ADSC
WAVE 0.979
DISTANCE 180
RESOLUTION
MOSAIC (if known)
SYMM p422 (if known)
CELL 79.388 79.388 37.831 90 90 90 (if known)
GO
Further options are usually invoked via the menu of the X-windows interface.
Once autoindexing has been carried out, it is possible to estimate the mosaic spread. To do this, predict the spots with different values of the mosaic spread and see which value best fits the observed pattern of spots.
Strategy
Once the crystal orientation has been determined, it is possible to work out what rotation range is needed to collect an optimum dataset. The STRATEGY option allows a data collection strategy to be worked out semi-automatically. It requires the parameters normally needed to process a set of images: the crystal symmetry, orientation, distance, wavelength, detector type and direct beam position. The rotation range required for a complete dataset is determined from the crystal symmetry and orientation. First the program determines the phi value PHIZONE, which (for orthorhombic or lower symmetry) places an axis in the XZ plane or (for trigonal or higher symmetry) places the unique symmetry axis in the plane normal to the X-ray beam and containing the rotation axis. A reflection list corresponding to a total rotation of PHITOT starting at phi=PHIZONE is then generated. This list is compared to a list of all unique reflections for the spacegroup, and the completeness and multiplicity are calculated as a function of both rotation and resolution. If the total rotation angle to be collected and the number (up to 4) of discontinuous segments to be used are specified, the program will determine start and end phi values for each segment that give the highest possible completeness.
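The comparison of a generated reflection list with the list of all unique reflections can be illustrated with a few lines of Python. This is a sketch of the bookkeeping only, not of what Mosflm does internally: it assumes the generated reflections have already been reduced to the same unique set of indices, and the function name and the tiny reflection lists are made up for illustration.

```python
from collections import Counter


def completeness_and_multiplicity(observed, unique):
    """Compare a reflection list with the list of unique reflections.

    `observed` is a list of (h, k, l) tuples, already reduced to the
    unique set of indices (an assumption of this sketch); `unique` is
    the list of all unique reflections for the spacegroup.
    """
    unique = set(unique)
    if not unique:
        raise ValueError("the unique reflection list is empty")
    counts = Counter(hkl for hkl in observed if hkl in unique)
    # Fraction of the unique reflections measured at least once.
    completeness = len(counts) / len(unique)
    # Mean number of measurements per measured unique reflection.
    multiplicity = sum(counts.values()) / len(counts) if counts else 0.0
    return completeness, multiplicity


# Made-up lists, for illustration only.
unique = [(1, 0, 0), (1, 1, 0), (2, 0, 0), (2, 1, 0)]
observed = [(1, 0, 0), (1, 0, 0), (1, 1, 0),
            (2, 0, 0), (2, 0, 0), (2, 0, 0)]

completeness, multiplicity = completeness_and_multiplicity(observed, unique)
print(completeness, multiplicity)
```

Here three of the four unique reflections are measured, from six observations in total, giving a completeness of 0.75 and a multiplicity of 2.0.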
To run STRATEGY, select the STRATEGY option. Input is then given in the Mosflm window, initially at the MOSFLM prompt. If no previous data have been collected, enter the following keywords:
MOSFLM STRATEGY
MOSFLM GO
The program determines PHIZONE, generates a reflection list and a unique reflection list, merges these and then tells you what rotation range to collect to get a maximally complete dataset. It also gives the completeness of the data for the rotation range generated. STATS gives a breakdown as a function of rotation angle and resolution, and a breakdown of the anomalous data.
If there is not enough time to collect the full rotation range, it is possible to determine the best segments to collect to achieve maximum completeness (try two or three segments).
STRATEGY ROTATE 50 SEGMENTS 2
STRATEGY GO
This will find two 50 degree segments to give maximum completeness.
It is also possible to use STRATEGY to optimise data collection when some data has already been collected on the same or different crystals or to optimise anomalous data collection. To optimise the number of anomalous pairs, rather than completeness of unique data, include the subkeyword ANOMALOUS, e.g.
STRATEGY ROTATE 60 SEGMENTS 2 ANOMALOUS
After running STRATEGY it is also desirable to check that the oscillation angle is suitable. The maximum rotation angle that avoids too many overlaps can be found using TESTGEN.
STRATEGY TESTGEN
This will then describe the possible keywords. For example, if data collection was in two segments, from -15 to 15 and from 45 to 75 degrees, and you want no overlaps,
TESTGEN START -15 END 15
GO
This will calculate the maximum possible rotation angles for this range at intervals of 5. Then do,
TESTGEN START 45 END 75
GO
for the second segment.
To get back to the main menu type EXIT at the STRATEGY prompt.
Integrating the First Image
Once data collection has begun, it is a good idea to integrate the first image to check that the exposure is adequate. To get an indication of data quality, integrate the first image and see how the mean <I>/sigma<I> varies with resolution. Aim to have a ratio of at least 3.0 in the outermost resolution bin. If it is lower than this, consider collecting data to lower resolution or using a longer exposure. Select INTEGRATE and answer the questions. Set the centre and radius of the backstop shadow thus,
BACKSTOP CENTRE # # RADIUS #
The predicted pattern will then be displayed. If the pattern is correct, CONTINUE. If the pattern is not aligned with the spots, use ADJUST and follow the instructions to align the pattern with the spots. Now you can integrate the image. Check <I>/sigma<I>, and Rsym if you have fully recorded reflections. SDRATIO is a good guide and should be ~ 1-3.
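The check on the outermost resolution bin described in this section can be sketched as follows. The bin limits and ratios are invented for illustration and are not real Mosflm output; the function name is hypothetical, and the threshold of 3.0 is the one suggested in the text.

```python
def outer_shell_ok(bins, minimum=3.0):
    """Check the mean I/sigma in the outermost resolution bin.

    `bins` is a list of (d_min_in_angstrom, mean_i_over_sigma) pairs.
    The outermost bin is the one with the smallest d_min.
    Returns (passes, outer_bin).
    """
    outer = min(bins, key=lambda b: b[0])
    return outer[1] >= minimum, outer


# Invented values, for illustration only.
bins = [(3.6, 25.1), (2.5, 14.3), (2.1, 7.8), (1.8, 2.4)]

ok, outer = outer_shell_ok(bins)
if ok:
    print("Outer bin is fine:", outer)
else:
    print("Outer bin is weak:", outer,
          "- consider lower resolution or a longer exposure")
```

With these invented numbers the 1.8 Angstrom bin falls below 3.0, which is the situation where the text suggests collecting to lower resolution or exposing for longer.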
Obtaining Accurate Cell Parameters - Postrefinement
Unit cell parameters are refined as part of autoindexing, but not all the parameters will be well defined. Accurate cell parameters are best determined by postrefinement, for which it is necessary to have a number (at least two) of abutting images. Orthorhombic and lower symmetries need data from two orientations widely spaced in phi. Higher symmetries need only one block of data. If you have more than 15 degrees of data, use the REFINE CELL option (or POSTREF SEGMENT if running in the background) to get an accurate cell, and then do not refine it during integration. It is best to use three or four images in each wedge. Postrefinement gives very accurate cell parameters but has a relatively small radius of convergence, so two or three rounds of integration and refinement are often needed.
If you have less than 15 degrees of data, use the refined cell from autoindexing in the processing and try postrefining using an angular wedge of data. However, if this is unstable (large standard deviations or shifts from cycle to cycle), fix the cell parameters: the values from autoindexing, while they may be in error, will be sufficiently accurate to process a local region of reciprocal space.
For interactive postrefinement, obtain the crystal orientation by autoindexing, choose REFINE CELL and select the number of segments of data to use in refinement, the first image and the number of images to be used in each segment. When using data from two segments widely separated in phi the crystal orientation may have changed sufficiently that the orientation matrix for the first segment of data does not accurately predict the first image of the second segment. It may be necessary to derive a new orientation matrix for this image. Under these circumstances it is best to fix all cell parameters for the autoindexing to those determined for the first segment of data. It is necessary to have a realistic estimate of the mosaic spread before starting postrefinement.
The following commands can be put in the command file to do postrefinement in the background.
#
mosflm << eof > lyspost.log
TITLE postrefining lysozyme test data
BEAM 94.60 92.89
WAVE 0.979
SCANNER ADSC
DISTANCE 180
SYMM P422
MATRIX lys1.mat
MOSAIC 0.2
RESOLUTION 1.8
IDENT lys1
DIRECTORY /data/seg/test
POSTREF SEGMENT 3
PROCESS 1 3 [ANGLE 1.0 START 0.0]
RUN
PROCESS 43 45 [ANGLE 1.0 START 42.0]
RUN
MATRIX lys1b.mat
PROCESS 86 88 [ANGLE 1.0 START 87.0]
RUN
RUN
END
eof
#
This will do postrefinement in three segments, with a new orientation matrix being used for the last segment. This is necessary, for example, if the crystal has slipped during data collection. It is recommended that the final cell parameters are used to integrate all the images in the dataset, fixing the cell parameters in the postrefinement thus,
POSTREF FIX ALL
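When setting up the PROCESS lines for several segments, the START value has to be the phi angle at the beginning of the first image of that segment. A minimal Python sketch of that arithmetic is given here, assuming that images are numbered from 1, that the sweep is contiguous with a constant oscillation angle, and that the first image starts at a known phi. The function name is hypothetical, and the line is written in the "PROCESS n TO m" form without the square brackets shown in the examples. A sweep that was interrupted or restarted will not follow this rule, and the start angles must then be taken from the data collection records.

```python
def process_line(first_image, n_images, angle=1.0, phi_first=0.0):
    """Build a PROCESS line for one segment of a contiguous sweep.

    first_image : number of the first image in the segment (from 1)
    n_images    : number of images in the segment
    angle       : oscillation angle per image, in degrees
    phi_first   : phi at the start of image 1, in degrees
    """
    last_image = first_image + n_images - 1
    # Image n starts (n - 1) oscillation steps after image 1.
    start = phi_first + (first_image - 1) * angle
    return (f"PROCESS {first_image} TO {last_image} "
            f"START {start:.1f} ANGLE {angle:.1f}")


# Two wedges of three images each, widely spaced in phi.
for first in (1, 43):
    print(process_line(first, 3))
```

For a contiguous sweep of 1.0 degree images starting at phi = 0.0, image 43 therefore starts at 42.0 degrees.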
Integrating a Block of Images
Integrate a block of between five and ten images using the INTEGRATE menu option. Watch for warning messages in case some parameters or options need to be reset. It is also good to check the appearance of the standard profiles. Ensure that spots are resolved and that the peak is not spilling into the background. The PROFILE TOLERANCE parameters are crucial in determining the appearance of the standard profiles. Try to make sure no peaks are being averaged, and check that not too many reflections are being rejected as BAD SPOTS. By default the program refines both the cell parameters and the crystal orientation using postrefinement during integration of the images. It is preferable to determine accurate cell parameters prior to integration using REFINE CELL (interactive) or POSTREF SEGMENT (batch). The resulting cell parameters are then input using a CELL keyword and the cell is not refined during integration (use POSTREF FIX ALL). This refines the crystal orientation (and mosaic spread) but not the cell parameters. The update display button in the display window (found below the Processing Parameters) can be toggled on to examine the images after integration.
An example of the commands to process the first ten degrees of data might be:
TITLE Lysozyme test data
MATRIX lys_1.mat
SYMMETRY p422
MOSAIC 0.3
IDENT *
PROCESS 1 TO 10 [ANGLE 1.0 START 0.0]
DIRECTORY /data1/seg/test
EXTENSION .img
BEAM 94.60 92.89
BACKSTOP CENTRE 94 94 RADIUS 12
DISTANCE 180
WAVELENGTH 0.979
PLOT
RUN
Examine the results of postrefinement and check that the change in missetting angles is gradual. The cell parameters (if refined) should not really change; if they do, consider increasing the number of images used in POSTREFINEMENT. The refined angular residual should be about one tenth of the summed mosaic spread and beam divergence, although this will depend on the strength of the reflections included in refinement. If beam parameters are refined, check that they are stable.
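The one-tenth rule of thumb for the angular residual is simple arithmetic, sketched here in Python. The mosaic spread of 0.3 degrees is taken from the example command file above; the beam divergence of 0.1 degrees is an assumed value for illustration, and the function name is hypothetical.

```python
def expected_angular_residual(mosaic_deg, divergence_deg):
    """Rule-of-thumb angular residual: one tenth of mosaic + divergence."""
    return 0.1 * (mosaic_deg + divergence_deg)


mosaic = 0.3      # degrees, as in the example command file
divergence = 0.1  # degrees, assumed for illustration

expected = expected_angular_residual(mosaic, divergence)
print(f"Expected angular residual: about {expected:.2f} degrees")
```

A refined residual much larger than this estimate is a hint to look again at the cell parameters or the mosaic spread, bearing in mind that the value also depends on the strength of the reflections used.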
Integrating the Entire Dataset
If a small block of images can be integrated successfully, go on and integrate the entire dataset. This assumes that an accurate cell has already been obtained, so that no further refinement of cell parameters is required. An example of a command file, integrate.com, used to integrate in the background is shown below.
#
setenv SUMMARY lys1.sum
setenv HKLOUT lys1.mtz
mosflm << eof > lys1.log
TITLE Lysozyme test data
WAVE 0.979
SCANNER ADSC
BEAM 94.60 92.89
DISTANCE 180
SYMMETRY p422
MATRIX lys_1_postref.mat
MOSAIC 0.2
IDENT lys_1
DIRECTORY /data1/seg/test
POSTREF FIX ALL
PROCESS 1 TO 90 [START 0 ANGLE 1.0]
ADDPART
GO
RUN
END
eof
#
During data processing, fully recorded reflections over several images are added together to give a well-defined standard profile. The number of images is determined by the program (~5-10) but can be set by the BLOCK subkeyword on the PROCESS line. Partials from adjacent images can optionally be added using the ADDPART keyword. The program automatically determines the best measurement box parameters. It determines the spot size from spots in the centre of the image (parameters for this search are set by the keyword SPOT). This information is used to set the initial sizes for the overall dimensions NXS, NYS and the corner and rim parameters NC, NRX, NRY (all in pixels). Following detector parameter refinement using spots from the centre of the first image, the program then optimises the rim and corner parameters.
Program Output
The most useful output is probably the breakdown of I/sig(I) as a function of resolution. This will give an immediate idea of the quality of the data, particularly at the high resolution end. A mean I/sig(I) of 3.0 will give an R-merge of ~20-30%. If there are symmetry related fully recorded (or summed partial) reflections on a single image, statistics are also provided on the agreement between their intensities. Check:
(1) Standard profiles look OK (peak within peak region)
(2) Weighted residual is ~1.0
(3) For warning messages (end of logfile and in summary file)
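The rule of thumb in this section, that a mean I/sig(I) of 3.0 corresponds to an R-merge of about 20 to 30%, can be checked with a simple error model. The sketch below assumes that every measurement of a reflection has the same true intensity and the same Gaussian error sigma, in which case the expected R-merge for n measurements per reflection is sqrt(2/pi) * sqrt((n-1)/n) / (I/sigma). This is an idealised estimate, not a formula used by Mosflm or SCALA, and the function name is hypothetical.

```python
import math


def expected_rmerge(i_over_sigma, n_obs):
    """Expected R-merge for Gaussian errors and n_obs measurements.

    The deviation of one measurement from the mean of n_obs measurements
    has standard deviation sigma * sqrt((n_obs - 1) / n_obs), and the
    mean absolute value of a Gaussian is sqrt(2 / pi) times its
    standard deviation.
    """
    if n_obs < 2:
        raise ValueError("R-merge needs at least two measurements")
    return (math.sqrt(2.0 / math.pi)
            * math.sqrt((n_obs - 1) / n_obs)
            / i_over_sigma)


for n in (2, 4, 100):
    print(f"n = {n:3d}: R-merge of about {100 * expected_rmerge(3.0, n):.0f}%")
```

Under these assumptions the estimate runs from roughly 19% for two measurements per reflection to roughly 27% for very many, which is consistent with the range quoted above.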
The initial part of the LOGFILE gives information on the parameters used in processing, along with keywords to change these parameters. It is important to read this bit. Also check the standard profiles to ensure the background mask optimisation has worked ok (especially if you have a high degree of diffuse scatter or spots close together).
A graphical representation of the information in the SUMMARY file can be obtained by running xloggraph. The summary file contains information on the following:
IMAGE - Image number.
CCX, CCY, CCOM - Camera constants (mm/degrees). These should be constant throughout data collection.
DIST - Refined distance.
YSCALE - Relative scale factor in the scanner Y direction. This should be close to 1.00.
TILT, TWIST - Deviations from normal incidence on the detector. Should be close to zero (less than 20) and constant.
ROFF, TOFF - Not used for CCD detectors, so should be 0.00.
RESID - Rms positional residual after refinement of detector parameters. For strong images this should be 0.02 - 0.04. It can be higher for weak images or if partials are included in refinement. A residual of 0.15 is only one pixel. Residuals larger than 0.04 for a strong image are not good; it is likely that there is an error in the cell parameters, which should be refined using the POSTREFINEMENT options.
WRESID - Weighted residual, which should be close to unity. Larger values suggest errors in cell parameters or crystal orientation.
FULL, PART, OVER - Number of fully recorded, partially recorded and overloaded reflections measured on an image.
NEG - Number of reflections with negative (summation integration) intensity.
BAD - Number of badspots. Reflections are classed as badspots if they fail any of five criteria:
(1) BGRATIO > 3
(2) PKRATIO > 3.5
(3) Intensity negative or greater than 5 sigma
(4) Background too large
(5) Too few background pixels left after rejecting outliers.
I/SIG(I) - Two columns: (1) average I/SIG(I) for the whole dataset, (2) I/SIG(I) in the outermost resolution bin.
RSYM - R factor on intensities for symmetry related fully recorded (or summed partial) reflections on the same image.
NSYM - Number of reflections (not number of observations) included in Rsym.
SDRAT - Ratio of observed agreement between symmetry related reflection intensities to their estimated standard deviations. Should be 1.4 if there are two measurements for each reflection or close to unity if there are four or more. More useful than Rsym as it doesn't depend on intensity.
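The RESID guidance above is easier to judge when the residual is expressed in pixels. The sketch below assumes that RESID is reported in mm and uses a pixel size of 0.15 mm, which is the value implied by the statement that a residual of 0.15 is one pixel. Other detectors have other pixel sizes, so the constant is an assumption to be replaced as needed.

```python
PIXEL_SIZE_MM = 0.15  # assumed: implied by "a residual of 0.15 is one pixel"


def residual_in_pixels(resid_mm, pixel_size_mm=PIXEL_SIZE_MM):
    """Convert an rms positional residual from mm to pixels."""
    return resid_mm / pixel_size_mm


for resid in (0.02, 0.04, 0.15):
    print(f"RESID {resid:.2f} mm is {residual_in_pixels(resid):.2f} pixel")
```

On this scale the 0.02 - 0.04 range quoted for strong images is between about one eighth and one quarter of a pixel.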
Indicators of data quality are the I/SIG(I) and Rsym values as a function of resolution. However, you get better indications by looking at the results of merging measurements of symmetry related reflections using SCALA. Look at the standard deviation analysis at the end of the SCALA output. If SIGM is 1 you can't do any better. Inevitably there are errors which are not accounted for in the estimated standard deviation, so it is quite normal to have to boost the standard deviations by 20 - 30% (i.e. SDFAC of 1.2 - 1.3) to get a SIGM value of 1. The presence of a few outliers can destroy the SIGM analysis, so look at the monitored reflections for evidence of them. If the crystal has a high mosaic spread, pay attention to the partial bias analysis. If it's more than 1 - 3% the mosaic spread is probably wrong.
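The effect of SDFAC on SIGM can be illustrated with a simplified model. The sketch below treats SIGM as the rms of each deviation from the mean intensity divided by its estimated standard deviation, and treats SDFAC as a plain multiplier on the standard deviations. Both are assumptions made for illustration: SCALA's own standard deviation correction has further intensity-dependent terms that are not modelled here, and the deviations and sigmas are made-up numbers.

```python
import math


def sigm(deviations, sigmas, sdfac=1.0):
    """Rms of deviation / (sdfac * sigma) over all measurements."""
    ratios = [d / (sdfac * s) for d, s in zip(deviations, sigmas)]
    return math.sqrt(sum(r * r for r in ratios) / len(ratios))


# Made-up deviations (I - <I>) and estimated sigmas, for illustration.
deviations = [12.0, -9.0, 15.0, -13.0, 11.0]
sigmas = [10.0, 8.0, 12.0, 10.0, 9.0]

raw = sigm(deviations, sigmas)               # SIGM with unscaled sigmas
sdfac = raw                                  # multiplier that restores SIGM = 1
corrected = sigm(deviations, sigmas, sdfac)  # SIGM after boosting the sigmas

print(f"SIGM before: {raw:.2f}, SDFAC: {sdfac:.2f}, SIGM after: {corrected:.2f}")
```

In this model the required SDFAC is simply the raw SIGM, which for these made-up numbers falls inside the 1.2 - 1.3 range mentioned in the text. A single large outlier in the deviations would inflate the rms, which is why the monitored reflections should be inspected.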