Resources - Molecular Forecaster

Parameters

Preface

It is highly recommended to use the user interface of FORECASTER and VIRTUAL CHEMIST to generate the parameter files. This will ensure the necessary parameters are set.

Although the value of many parameters may be altered, default values should be used unless a specific system requires different settings. These parameters are essentially used by the developers for optimization and evaluation of the program. In general, modification of a specific value may not significantly improve or affect the accuracy but may result in longer docking runs.

Parameters common to many programs can be found in:

FORECASTER and VIRTUAL CHEMIST

Generic parameters common to most programs.

Main_Mode value

This parameter instructs FORECASTER to select a program. For example, for PREPARE, use prepare_protein. The description of each program below starts with this keyword.

Run_Mode value

This parameter instructs FORECASTER to select a mode for this program. The available modes are described below for each program that has more than one.

Forcefield forcefield_file.txt

Following this keyword, provide the name of the force field file to use (if a forcefield other than fitted_ff.txt is to be used). The format of this force field should be consistent with the required format for all Forecaster programs.

Default value is fitted_ff.txt if this keyword is not provided.
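As a rough illustration of the keyword-value layout described above, the following Python sketch assembles the three generic keywords into text for a parameter file. The one-keyword-per-line layout is inferred from the syntax lines in this section, and the helper name `generic_block` is hypothetical; the user interface remains the recommended way to produce real parameter files.

```python
def generic_block(main_mode, run_mode, forcefield=None):
    """Return parameter-file text for the generic keywords.

    Hypothetical helper: the Forcefield line is omitted when no file is
    given, in which case the documented default (fitted_ff.txt) applies.
    """
    lines = [f"Main_Mode {main_mode}", f"Run_Mode {run_mode}"]
    if forcefield is not None:
        lines.append(f"Forcefield {forcefield}")
    return "\n".join(lines) + "\n"


# Example: PREPARE in make_mol2 mode with the default force field.
text = generic_block("prepare_protein", "make_mol2")
print(text, end="")
```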

Generic parameters common to many programs dealing with macromolecules

Macromolecule [ protein | metalloprotein | RNA | DNA | metalloprotein_hb ]

This keyword defines the nature of the macromolecule for all programs dealing with macromolecules (e.g., PREPARE, PROCESS, FITTED).

- Metalloprotein: special parameters have been developed for zinc metalloenzymes, heme-containing enzymes and magnesium-containing enzymes.
- Metalloprotein_HB: this mode describes the metal coordination as a hydrogen bond in FITTED. Not as accurate as Metalloprotein, it is to be used for debugging/testing purposes.

Generic parameters common to many programs dealing with small molecules

Molecule molfilename

Provide the name for the molecule file (can contain more than one molecule). Supported formats are mol2 and sdf.

Input_Format [ mol2 | sd | sdf | fitted | amber ]

This parameter provides the format of the small molecule(s) that the program will process. If this parameter is not given, the format is assigned automatically from the file extension. All programs can read sdf and mol2 formats.
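The extension-based fallback can be pictured with a small Python sketch. The mapping below is an assumption made for illustration (it covers only the mol2, sdf and sd values listed for Input_Format), and `guess_input_format` is a hypothetical name, not a function of the package.

```python
import os

# Assumed mapping from file extension to an Input_Format value.
# Only extensions matching the mol2, sdf and sd options are covered.
EXTENSION_TO_FORMAT = {".mol2": "mol2", ".sdf": "sdf", ".sd": "sd"}


def guess_input_format(filename):
    """Return the Input_Format value implied by the extension, or None."""
    extension = os.path.splitext(filename)[1].lower()
    return EXTENSION_TO_FORMAT.get(extension)
```

A `None` result would be the cue to set Input_Format explicitly.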

This parameter provides the format of the molecule(s) that the program will output. If this parameter is not given, SMART will automatically assign the format based on the parameters. All programs can write mol2 format; most can write sdf format.

Assign_Bond_Order [ Y | N ]

This parameter instructs the program to check and clean the provided bond orders. Default is N.

Multi_Molecule_File [ Y | N ]

This parameter enables (or disables) the output of multiple molecules into a single file. Default is Y.

Programs

MATCH-UP

Generic parameters

Main_Mode prepare_protein

This parameter instructs FORECASTER to select a program. For MATCH-UP, use “prepare_protein”.

Run_Mode [ superpose | make_similar | alignement ]

Following the keyword, specify the run mode which should be applied.

- superpose: superposes the protein structures. This mode has been designed to superpose protein structures from a sequence alignment. As a result, it provides accurate superposition only for protein structures with high sequence similarity (i.e., mutants or the same protein co-crystallized with different ligands).
- make_similar: starts by superposing structures (see superpose mode) then ensures that the structures are identical throughout (other than conformation). For example, it will mutate residues that are different between structures and it will add water molecules to have the same number in all structures.
- alignment: computes a sequence alignment.

Protein <# proteins>
protein_file1.pdb chainID
protein_file2.pdb chainID
…

Following this keyword, specify the number of protein files to be processed.

On subsequent lines, list the protein filenames (PDB files only, one per line).

Next to each PDB filename, the chain ID is required. Recommended: All. It can be any combination of chains, such as A, AB, ABC, AD, or All (for everything).

Complete_Superposition [ Y | N ]

This parameter instructs MATCHUP to superpose each protein structure to all the other ones if there are more than 2.

Keep_All_Chains [ Y | N ]

This parameter instructs MATCHUP to print out all the chains after superposition, even those that were not considered when superposing.

Ligand_Include number-of-ligands
residue-name chain residue-number
residue-name chain residue-number
…

Following this keyword, provide the number of ligand residues to be considered (a ligand can be made up of more than one residue). Only one ligand per protein is allowed.

On subsequent lines, the residue name, chain and number are specified, one residue per line, as they appear in the PDB file (ex: TMC B 500). The superposition will focus on the residues around the ligand (i.e., the binding site).

Ligand_Cutoff ligand-cutoff

Protein residues within this cutoff (in Å) from any of the ligands are considered part of the binding site. Default is 6 Å.
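To show how the counted blocks above fit together, here is a minimal Python sketch that assembles a MATCH-UP parameter file as text. The file names `wild_type.pdb` and `mutant.pdb`, the helper names, and the order of the keywords are assumptions for illustration only; the count written after Protein and Ligand_Include is simply the number of lines that follow.

```python
def counted_block(keyword, entries):
    """Format 'keyword N' followed by its N entry lines."""
    return [f"{keyword} {len(entries)}"] + list(entries)


def matchup_parameters(proteins, ligand_residues, run_mode="superpose", cutoff=None):
    """Build MATCH-UP parameter text.

    proteins: list of (pdb_filename, chain_id) pairs.
    ligand_residues: list of 'residue-name chain residue-number' strings.
    """
    lines = ["Main_Mode prepare_protein", f"Run_Mode {run_mode}"]
    lines += counted_block("Protein", [f"{pdb} {chain}" for pdb, chain in proteins])
    lines += counted_block("Ligand_Include", ligand_residues)
    if cutoff is not None:
        # Omitting this line leaves the documented default (6 Å) in effect.
        lines.append(f"Ligand_Cutoff {cutoff}")
    return "\n".join(lines) + "\n"


# Hypothetical input files; TMC B 500 is the ligand example used in the text.
params = matchup_parameters(
    [("wild_type.pdb", "All"), ("mutant.pdb", "All")],
    ["TMC B 500"],
    cutoff=6,
)
print(params, end="")
```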

MUTATE

Generic parameters

Main_Mode prepare_protein

This parameter instructs FORECASTER to select a program. For MUTATE, use “prepare_protein”.

Run_Mode make_mol2

Following the keyword, specify the run mode which should be “make_mol2”.

Protein number-of-proteins
protein_file1.pdb
protein_file2.pdb
…

Following this keyword, specify the number of protein files to be processed.

On subsequent lines, list the protein filenames (PDB files only, one per line).

To reduce any residue name conflicts between protein structures, it is recommended to run one at a time.

Ligand_Include number-of-ligands
residue-name chain number
residue-name chain number
…

Following this keyword, provide the number of ligand residues to be considered (a ligand can be made up of more than one residue). Only one ligand per protein is allowed.

On subsequent lines, the residue name, chain and numbers are specified one per line as it appears in pdb (ex: TMC B 500).

Model value

If the PDB file is a set of NMR structures, this parameter instructs MUTATE to use a specific one.

Mutation number of residues to mutate
Residue names

Following the keyword, specify the number of residues to mutate. On the following lines list the residue names as illustrated below.

Mutation 3
ALA A 272
TRP A 311
SER A 313

MutatedAANum number of mutations per residue to mutate
Residue types

Following the keyword, specify the number of mutations for each residue. The example below indicates that the first residue listed will be mutated into 2 possible residues (Phe and Tyr), the second one into 3 (Arg, Ser and Thr) and the last into 1 (Cys).

MutatedAANum 2 3 1
PHE TYR
ARG SER THR
CYS
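Because the Mutation and MutatedAANum blocks must agree line by line, a quick consistency check can catch mismatched counts before a run. The Python sketch below is only an illustration of that bookkeeping; `check_mutation_blocks` is a hypothetical helper and is not part of MUTATE.

```python
def check_mutation_blocks(mutation_lines, counts, residue_type_lines):
    """Pair each residue under Mutation with its list of target residue types.

    mutation_lines: residues to mutate, e.g. 'ALA A 272'.
    counts: the integers written after MutatedAANum.
    residue_type_lines: one line of residue types per residue to mutate.
    """
    if len(counts) != len(mutation_lines):
        raise ValueError("MutatedAANum needs one count per residue under Mutation")
    if len(residue_type_lines) != len(counts):
        raise ValueError("one line of residue types is needed per count")
    plan = {}
    for target, count, types_line in zip(mutation_lines, counts, residue_type_lines):
        types = types_line.split()
        if len(types) != count:
            raise ValueError(f"{target}: expected {count} residue types, got {len(types)}")
        plan[target] = types
    return plan


# The example from the text: three residues, mutated into 2, 3 and 1 types.
plan = check_mutation_blocks(
    ["ALA A 272", "TRP A 311", "SER A 313"],
    [2, 3, 1],
    ["PHE TYR", "ARG SER THR", "CYS"],
)
```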

StatsLibrary [ BioCatalysisStats | DFGLibrary | filename ]

Provide the generic file names for the statistics file (conformational libraries of side-chain rotamers). Default is “BioCatalysisStats”.

StatsResolution [ 120 | 60 ]

The BioCatalysisStats and DFGLibrary libraries are provided in the Forecaster package with two levels of resolution (120 and 60). If you select one of these two libraries with the StatsLibrary keyword, provide the resolution using this keyword.

Optimize [ Y | N ]

Optimization of tautomers and water molecules can be performed (Y) or not (N). Default is Y.

Sequence_Alignement file name

MUTATE will identify the mutation to carry out from a sequence alignment (output file from MATCHUP).

Optimize_With_Ligand [ Y | N ]

Optimization of tautomers and water molecules can be performed considering (Y) the ligand or not (N). Default is Y.

Iterations number of iterations

Number of optimization iterations. Default is 10.

Protonate atom to protonate

This keyword is used to define atoms to be protonated by the program. If PREPARE does not assign the correct protonation state, the user can override PREPARE to force a given protonation state using this keyword.

Following this keyword, specify the residue name, chain, number and atom name as it appears in the pdb file. (ex: HIS A 58 NE2).

Deprotonate atom to deprotonate

This keyword is used to define atoms to be deprotonated by the program. If PREPARE does not assign the correct protonation state, the user can override PREPARE to force a given protonation state using this keyword. Default is Y.

Following this keyword, specify the residue name, chain, number and atom name as it appears in the pdb file. (ex: HIS A 58 NE2).

Hybridization atom to hybridize new hybridization

This keyword is used to define atoms whose hybridization the program should change. If PREPARE does not assign the correct hybridization state, the user can override PREPARE to force a given hybridization state using this keyword.

Following this keyword, specify the residue name, chain, number and atom name as they appear in the PDB file, followed by the new hybridization (ex: HIS A 58 NE2 sp2).

Protein_Reference_Structure 1
file.pdb

Following this keyword is the number of reference protein structure files used to compute the protein RMSD (deviation of the modeled mutant structure from the wild type structures). On the following lines is the protein file name.

NMA value

This parameter instructs MUTATE to apply the normal mode analysis (NMA) method. The value is the number of modes to compute.

NMA_springs value

This parameter provides MUTATE with the spring force constant used by the NMA method. A value of 0 indicates that the Hinsen equation will be used. Default is 0.

NMA_cutoff value

This parameter instructs MUTATE to add springs only between residues within this distance. Default is 10 Å.

NMA_stepsize value

This parameter instructs MUTATE to use a step size of this value. Default is 3 Å.

NMA_stepnbr value

This parameter instructs MUTATE to apply this number of steps. Default is 1.

NMA_mode [ Forward | TwoSided | Multistep | Combo | Cloud ]

This parameter tells MUTATE which method to use when moving atoms.

- TwoSided (default): .
- Multistep: .
- Combo: .
- Cloud: .

PREPARE

Generic parameters

Main_Mode prepare_protein

This parameter instructs FORECASTER to select a program. For PREPARE, use “prepare_protein”.

Following the Run_Mode keyword, specify the run mode, which can be:

- “make_mol2”: converts a PDB file to a mol2 file.
- “make_mol2_flexible”: converts a PDB file to a mol2 file and generates multiple conformations of flexible binding site residues.
- “nma”: reconstructs a protein structure from a complete PDB file and a backbone.
- “mutation”: superposes two mutants.
- “stats”: collects statistics on the protein side chains.
- “distribution”: analyses statistics regarding adjacent and interacting residues.
- “distribution2”: This method will read in from a previous run with “stats” in order to calculate statistics considering backbone conformation (alpha helix, beta sheet or loop) and identify major conformations.
- “distribution3”: This method will read the stats and compare to a dataset of conformations.

Protein number-of-protein-files
protein_file1.pdb chainID
protein_file2.pdb chainID
…

Following this keyword, specify the number of protein files to be processed.

On subsequent lines, list the protein filenames (PDB files only, one per line).

Next to each PDB filename, the chain ID is optional. Default is All. It can be any combination of chains, such as A, AB, ABC, AD, or All (for everything). In NMA mode, the first file is the complete PDB (template) and the following ones are the backbones used as targets.

Ligand_Include number-of-ligands
residue-name chain number
residue-name chain number
…

Following this keyword, provide the number of ligand residues to be considered (a ligand can be made up of more than one residue). Only one ligand per protein is allowed.

On subsequent lines, the residue name, chain and numbers are specified one per line as it appears in pdb (ex: TMC B 500).

Ligand_Exclude number-of-ligands
residue-name chain number
residue-name chain number
…

Following this keyword, provide the number of residues which should be excluded from the protein.

On subsequent lines, the residue name, chain and numbers are specified one per line as it appears in pdb (ex: TMC B 500).

Protein_Include number-of-protein-residues
protein-name-1 chain number
protein-name-2 chain number
…

Residue to be included in the protein mol2 file.

On the same line following this keyword, specify the number of protein residues.

On subsequent lines, the residue name, chain and numbers are specified one per line as it appears in pdb (ex: PTR A 201).

Can be used for protein residues that are not recognized automatically by the program as natural amino-acid residues.

Model value

If the PDB file is a set of NMR structures, this parameter instructs PREPARE to use a specific one.

Parameters for pdb to mol2 conversion

Mutate residue-name chain number new-type

Following this keyword, provide the name of a residue to be mutated and the type of amino acid it should be mutated into (ex: PHE A 352 TYR).

Delete residue-name chain number

Following this keyword, provide the name of a residue to be deleted. (ex: PHE A 352).

Mode [ fitted | normal ]

Mode of execution. In the fitted mode, only a maximum of 20 water molecules within 5 Å of the ligand are conserved in the protein mol2 file. In the normal mode, no water molecule deletion is performed.

Optimize [ Y | N ]

Optimization of tautomers and water molecules can be performed (Y) or not (N). Default is Y.

Optimize_With_Ligand [ Y | N ]

Optimization of tautomers and water molecules can be performed considering (Y) the ligand or not (N). Default is Y.

Iterations number-of-iterations

Number of optimization iterations. Default is 10.

Protonate atom-to-protonate

This keyword is used to define atoms to be protonated by the program. If PREPARE does not assign the correct protonation state, the user can override PREPARE to force a given protonation state using this keyword.

Following this keyword, specify the residue name, chain, number and atom name as it appears in the pdb file. (ex: HIS A 58 NE2).

Deprotonate atom-to-deprotonate

This keyword is used to define atoms to be deprotonated by the program. If PREPARE does not assign the correct protonation state, the user can override PREPARE to force a given protonation state using this keyword. Default is Y.

Following this keyword, specify the residue name, chain, number and atom name as it appears in the pdb file. (ex: HIS A 58 NE2).

Hybridization atom-to-hybridize new hybridization

This keyword is used to define atoms whose hybridization the program should change. If PREPARE does not assign the correct hybridization state, the user can override PREPARE to force a given hybridization state using this keyword.

Following this keyword, specify the residue name, chain, number and atom name as they appear in the PDB file, followed by the new hybridization (ex: HIS A 58 NE2, sp2).

Flexibility [ Y | N ]

New conformations will be generated for flexible side-chains within the active site. Only one pdb file should be used with the Protein keyword. Default is No.

Following this keyword, other keywords may be used (otherwise default values will be used): Max_Num_of_Flex, Num_Of_Confs and Num_Of_Mutations.

Flexibility_Mode [ Quick | Full ]

When new conformations are generated for flexible side-chains, they can be optimized (Full) or not (Quick). Default is Full.

Optimize_Adjacent [ Y | N ]

When new conformations are generated for flexible side-chains, the adjacent residues may be optimized (Y) or not (N). Default is N.

Max_Num_of_Flex number

Maximum number of flexible side-chains to be considered when the Flexibility parameter is used. Not all will be considered simultaneously (see the Num_Of_Flex keyword below). Default is 5.

Num_Of_Flex number

The number of simultaneously flexible side chains per protein conformation when the Flexibility parameter is used. Default is 2.

Num_Of_Confs number

The number of new protein conformations to be generated when the Flexibility keyword is used. Default is 5.

Random_Conformations [ Y | N ]

The new protein side chain conformations will be randomly selected from the library of conformers (Y) or selected based on probability (N). Default is N.

Particle_Water [ Y | N ] number-of-particles

Adds water molecules as single-particle waters to the PDB. On the same line, Yes should be followed by the number of particle waters to be added (10 is suggested). Default is No.

Water_Keep_Distance number

Distance between a water molecule and both the ligand and the protein required for that water molecule to be kept.

Keep_All_Water [ Y | N ]

If yes, all the water molecules are kept.

Remove_Chains [ Y | N ]

If specific chains are specified with the Protein parameter, the ones not listed will be removed in the protein structure generated by PREPARE.

Oxidize_Heme [ Y | N ]

This keyword instructs PREPARE to add an oxygen to the heme group. This is needed if the structure is to be used to predict sites of metabolism using IMPACTS. Default is N.

Add_Atom atom-to-be-added

This parameter instructs PREPARE to add an atom, specified as: residue name, chain name, residue number, atom name and new atom type (e.g., Cl, F). This is useful when self-docking of covalently bound molecules is carried out. For example, upon covalent bond formation an atom may be displaced (SN2 with an alkyl bromide) and the leaving group is no longer present in the structure. To redock the original molecule, the leaving group should be reinserted. For now only N, SMe, O, C, N, F, Cl and Br are possible.

Add_Bond bond

This parameter instructs PREPARE to add a bond (residue_name chain_name residue_number atom_name_#1 atom_name_#2). In some crystal structures, small molecules are distorted and PREPARE may not see some of these bonds.

Add_Multi_Atoms number-of-atoms
atom1-to-be-added
atom2-to-be-added
…

This parameter instructs PREPARE to add multiple atoms, each specified as: residue-name chain-name residue-number atom-name new-atom-type (e.g., C, O). This is useful when self-docking of covalently bound molecules is carried out. For example, upon covalent bond formation an atom may be displaced (SN2 with an alkyl bromide) and the leaving group is no longer present in the structure.

To redock the original molecule, the leaving group should be reinserted. For now only N, S, O, C, N, F, Cl and Br are possible. As an example, for 4ONN, we should add a tosyl group on C12.

Add_Multi_Atoms 11

BY1 B 201 C12 S1 1 sp3 – This adds the first atom (S1) with a single bond and an sp3 hybridization (although we understand this is not sp3, sp3 refers to tetrahedral)

BY1 B 201 S1 O2 2 sp2 – This adds a first oxygen on the S just added above with a double bond

BY1 B 201 S1 O3 2 sp2 – This adds a second oxygen

BY1 B 201 S1 C4 1 sp3 – This adds the first carbon of the aromatic ring

BY1 B 201 C4 C5 1 sp2 – This adds the second carbon of the aromatic ring

BY1 B 201 C5 C6 2 sp2 – This adds the third carbon of the aromatic ring

BY1 B 201 C6 C7 1 sp2 – This adds the fourth carbon of the aromatic ring

BY1 B 201 C7 C8 2 sp2 – This adds the fifth carbon of the aromatic ring

BY1 B 201 C8 C9 1 sp2 – This adds the sixth carbon of the aromatic ring

BY1 B 201 C9 C4 2c sp2 – As C4 has already been identified and 2c is given as the bond, this only closes the ring

BY1 B 201 C7 C10 1 sp3 – This adds the methyl carbon on the aromatic ring
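The bookkeeping in this example can be checked mechanically. The Python sketch below reads the block without its annotations and verifies that the declared count matches the number of lines, that every atom is attached to an atom that already exists, and that a ring-closure line (bond ending in `c`) joins two existing atoms. The field order (residue, chain, number, existing atom, new atom, bond, hybridization) is inferred from the annotated example above, and `check_add_multi_atoms` is a hypothetical helper, not part of PREPARE.

```python
EXAMPLE = """\
Add_Multi_Atoms 11
BY1 B 201 C12 S1 1 sp3
BY1 B 201 S1 O2 2 sp2
BY1 B 201 S1 O3 2 sp2
BY1 B 201 S1 C4 1 sp3
BY1 B 201 C4 C5 1 sp2
BY1 B 201 C5 C6 2 sp2
BY1 B 201 C6 C7 1 sp2
BY1 B 201 C7 C8 2 sp2
BY1 B 201 C8 C9 1 sp2
BY1 B 201 C9 C4 2c sp2
BY1 B 201 C7 C10 1 sp3
"""


def check_add_multi_atoms(text, existing_atoms):
    """Validate an Add_Multi_Atoms block and return the newly added atom names."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    keyword, declared = rows[0][0], int(rows[0][1])
    entries = rows[1:]
    if keyword != "Add_Multi_Atoms":
        raise ValueError("block must start with Add_Multi_Atoms")
    if declared != len(entries):
        raise ValueError(f"declared {declared} lines but found {len(entries)}")
    known = set(existing_atoms)
    added = []
    for _res, _chain, _num, anchor, new, bond, _hyb in entries:
        if anchor not in known:
            raise ValueError(f"{anchor} is used before it exists")
        if bond.endswith("c"):
            # Ring closure: no atom is created, both ends must already exist.
            if new not in known:
                raise ValueError(f"cannot close a ring on unknown atom {new}")
        else:
            known.add(new)
            added.append(new)
    return added


# C12 is the atom already present in the structure.
added = check_add_multi_atoms(EXAMPLE, existing_atoms=["C12"])
```

Note that the count after the keyword covers all eleven lines, including the ring-closure line that creates no new atom.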

Add_Multi_Bonds number-of-bonds
bond1-to-be-added
bond2-to-be-added
…

This parameter instructs PREPARE to add multiple bonds, each specified as: residue-name_#1 residue-chain_#1 residue-number_#1 atom-name_#1 residue-name_#2 residue-chain_#2 residue-number_#2 atom-name_#2 (e.g., SER A 203 OG PHO A 203 P1).

Split_Residue number-of-residues
residue1-to-be-split
residue2-to-be-split
…

This parameter instructs PREPARE to split a residue into several different residues. For example, the syntax:
Split_Residue 1
7 SGB A 203 SER PHO P1 C1 O1 O2 C2 C3 C4
would split one residue up in two components. 7 atoms (P1, C1, O1, O2, C2, C3, C4) from this residue (SGB A 203) would be split into SER A 203 and PHO A 203.
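The layout of a Split_Residue entry can be made explicit with a short Python sketch. It assumes, from the single example above, that each entry lists the atom count, the original residue (name, chain, number), two new residue names, and then the atom names; `parse_split_residue` is a hypothetical helper used only to illustrate the field order.

```python
def parse_split_residue(line):
    """Split one Split_Residue entry into its fields and check the atom count."""
    fields = line.split()
    atom_count = int(fields[0])
    residue = tuple(fields[1:4])   # (name, chain, number) of the residue to split
    new_names = fields[4:6]        # assumed: exactly two new residue names
    atoms = fields[6:]
    if len(atoms) != atom_count:
        raise ValueError(f"expected {atom_count} atom names, got {len(atoms)}")
    return {"residue": residue, "new_names": new_names, "atoms": atoms}


entry = parse_split_residue("7 SGB A 203 SER PHO P1 C1 O1 O2 C2 C3 C4")
```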

Parameters for statistics collection and analysis

Chain value

Statistics may be restricted to a single chain. Default is “all”.

Max_Bfactor value

Statistics may be restricted to a residue with a low B-factor. The upper limit can be set here. Default is 50.

Resolution value

Residues whose dihedral angles differ by less than this value are considered equivalent. Default is 0.
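One way to picture this equivalence test is the Python sketch below. It compares two side-chain conformations given as tuples of dihedral angles in degrees and treats angles as periodic, so that 179 and -179 are 2 degrees apart. The periodic handling and the strict "less than" comparison are assumptions made for illustration, and the function names are hypothetical.

```python
def angular_difference(a, b):
    """Smallest difference between two angles in degrees, in [0, 180]."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def equivalent(conf_a, conf_b, resolution):
    """True when every pair of dihedral angles differs by less than resolution."""
    return all(angular_difference(a, b) < resolution for a, b in zip(conf_a, conf_b))
```

Under this literal reading, a resolution of 0 means that no two conformations are merged.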

Data_File value

This parameter provides the name of the file in which conformations are predefined (for Run_Mode distribution3).

Parameters for protein reconstruction (NMA mode)

Mutate residue-name chain number new-type

Following this keyword, provide the name of a residue to be mutated and the type of amino acid it should be mutated into (ex: PHE A 352 TYR).

Delete residue-name chain number

Following this keyword, provide the name of a residue to be deleted. (ex: PHE A 352).

SPLASH'EM

Generic Parameters

Main_Mode prepare_protein

This parameter instructs FORECASTER to select a program. For SPLASH’EM, use “prepare_protein”.

Run_Mode make_mol2

Following the keyword, specify the run mode which can be:

“make_mol2”: converts a PDB file to a mol2 file.

Protein number-of-nucleic-acid-files
nucleic-acid-file1.pdb chainID
nucleic-acid-file2.pdb chainID
…

Following this keyword, specify the number of nucleic acid files to be processed.

On subsequent lines, list the file names (PDB files only, one per line).

Next to each PDB filename, the chain ID is optional. Default is All. It can be any combination of chains, such as A, AB, ABC, AD, or All (for everything). In NMA mode, the first file is the complete PDB (template) and the following ones are the backbones used as targets.

Ligand_Include number-of-ligands
residue-name chain number
residue-name chain number
…

Following this keyword, provide the number of ligand residues to be considered (a ligand can be made up of more than one residue). Only one ligand per protein is allowed.

On subsequent lines, the residue name, chain and numbers are specified one per line as it appears in pdb (ex: TMC B 500).

Ligand_Exclude number-of-ligands
residue-name chain number
residue-name chain number
…

Following this keyword, provide the number of residues which should be excluded from the protein.

On subsequent lines, the residue name, chain and numbers are specified one per line as it appears in pdb (ex: TMC B 500).

Protein_Include number-of-protein-residues
nucleic-acid-name-1 chain number
nucleic-acid-name-2 chain number
…

Residue to be included in the nucleic acid mol2 file.

On the same line following this keyword, specify the number of nucleic acid residues.

On subsequent lines, the residue name, chain and numbers are specified one per line as it appears in pdb (ex: PTR A 201).

Can be used for nucleic acid residues that are not recognized automatically by the program as natural nucleotides.

Model value

If the PDB file is a set of NMR structures, this parameter instructs SPLASH’EM to use a specific one.

Parameters for pdb to mol2 conversion

Mode [ fitted | normal ]

Mode of execution. In the fitted mode, only a maximum of 20 water molecules within 5 Å of the ligand are conserved in the protein mol2 file. In the normal mode, no water molecule deletion is performed.

Optimize [ Y | N ]

Optimization of tautomers and water molecules can be performed (Y) or not (N). Default is Y.

Optimize_With_Ligand [ Y | N ]

Optimization of tautomers and water molecules can be performed considering (Y) the ligand or not (N). Default is Y.

Iterations number-of-iterations

Number of optimization iterations. Default is 10.

Protonate atom-to-protonate

This keyword is used to define atoms to be protonated by the program. If SPLASHEM does not assign the correct protonation state, the user can override SPLASHEM to force a given protonation state using this keyword.

Following this keyword, specify the residue name, chain, number and atom name as it appears in the pdb file. (ex: HIS A 58 NE2).

Deprotonate atom-to-deprotonate

This keyword is used to define atoms to be deprotonated by the program. If SPLASHEM does not assign the correct protonation state, the user can override SPLASHEM to force a given protonation state using this keyword. Default is Y.

Following this keyword, specify the residue name, chain, number and atom name as it appears in the pdb file. (ex: HIS A 58 NE2).

Hybridization atom-to-hybridize new hybridization

This keyword is used to define atoms whose hybridization the program should change. If SPLASHEM does not assign the correct hybridization state, the user can override SPLASHEM to force a given hybridization state using this keyword.

Following this keyword, specify the residue name, chain, number and atom name as they appear in the PDB file, followed by the new hybridization (ex: HIS A 58 NE2, sp2).

Particle_Water [ Y | N ] number-of-particles

Adds water molecules as single-particle waters to the PDB. On the same line, Yes should be followed by the number of particle waters to be added (10 is suggested). Default is No.

Water_Keep_Distance number

Distance between a water molecule and both the ligand and the protein required for that water molecule to be kept.

Keep_All_Water [ Y | N ]

If yes, all the water molecules are kept.

Remove_Chains [ Y | N ]

If specific chains are specified with the Protein parameter, the ones not listed will be removed in the protein structure generated by SPLASHEM.

PROCESS

Generic parameters

Main_Mode process

This parameter instructs FORECASTER to select a program. For PROCESS, use “process”.

Run_Mode process

Following the keyword, specify the run mode which can only be “process”.

Input and Output

Protein number-of-protein-structures
protein_file_1.mol2
protein_file_2.mol2
…

Following this keyword, specify the number of protein files to be processed.

On subsequent lines, list the protein filenames (mol2 files only, one per line).

Binding_Site_Cav binding-site-file-name

Name of the file where to output the binding site cavity. If this keyword is not present ProCESS will not create a binding site cavity file.

Interaction_Sites interaction-site-filename

Name of the file where to output the interaction sites definitions. If this keyword is not present ProCESS will not create an interaction sites definition file.

Copy_To_My_Proteins [ Y | N ]

This instructs the program to save a copy of the generated files in the my_proteins folder.

Constraint_With_Residues value
residue-#1
residue-#2
…

This instructs the program to create a file containing geometrical constraints for use in docking with FITTED. Each residue entry should be a group name followed by an atom name, “side-chain” or “backbone” (ex.: GLU122 side-chain or ARG643 HH12). In some cases, proteins may be dimers (or other polymers) and may have two residues with the exact same name (one per chain). In this case, you may want to check the generated constraint file and edit it to ensure that only the residues in the binding site were printed out.

Cleaning protein structures

Assign_Group_Name [ Y | N ]

This keyword instructs PROCESS to clean the name of the residue to reflect its protonation state (e.g., HISE for HIS with a hydrogen on the epsilon N). Default is N.

Assign_Hydrogen_Name [ Y | N ]

This keyword instructs PROCESS to clean the names of the hydrogen atoms if they do not follow the PDB convention. Default is N.

Cofactor value
cofactor-name-#1
cofactor-name-#2
…

This keyword instructs PROCESS that some residues may be co-factors. The name should simply be the 3 letter code residue name.

Defining the binding site residues

AutoFind_Site [ Y | N ]

This parameter instructs PROCESS to find the binding site using the provided ligand. Default is Y. If N, you may want to use the parameter “Binding_Site” below.

Binding_Site number-of-residues
residue-#1_name
residue-#2-name
…

This keyword can be used to manually define the active site. (The active site can also be defined automatically by providing a ligand, see above.) On the same line following this keyword, specify the number of flexible residues. This list should be as exhaustive as possible to avoid missing any important residue defining the active site. On subsequent lines, the residue names or numbers (according to Find_Residues) are specified, one per line.

Find_Residues [ Name | Number ]

If “Binding_Site” is used, this keyword defines how ProCESS will identify the residues that make up the binding site. Name (default): search residues by group name. Number: search residues by group number.

Ligand number-of-ligands
ligand-#1.mol2
ligand-#2.mol2
…

Ligand file(s) (in MOL2 format) used to define the active site and its center. It should be in the same frame as the protein.

Ligand_Cutoff ligand-cutoff

Protein residues within this cutoff (in Å) from any of the ligands are considered part of the binding site. Default is 6 Å.
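The cutoff rule can be sketched in a few lines of Python. This is a simplified reading, not the PROCESS implementation: it assumes a residue belongs to the binding site as soon as any of its atoms lies within the cutoff of any ligand atom, and it treats "within" as less than or equal. The data layout and the name `binding_site_residues` are hypothetical.

```python
import math


def binding_site_residues(protein_atoms, ligand_atoms, cutoff=6.0):
    """Return residue identifiers with at least one atom within cutoff (Å).

    protein_atoms: list of (residue_id, (x, y, z)) tuples.
    ligand_atoms: list of (x, y, z) tuples, pooled over all ligands.
    """
    site = []
    for residue_id, xyz in protein_atoms:
        if residue_id in site:
            continue  # residue already selected through another atom
        if any(math.dist(xyz, ligand_xyz) <= cutoff for ligand_xyz in ligand_atoms):
            site.append(residue_id)
    return site


# Toy coordinates for illustration only.
protein = [
    ("GLU122", (0.0, 0.0, 0.0)),
    ("GLU122", (1.0, 0.0, 0.0)),
    ("ARG643", (10.0, 0.0, 0.0)),
    ("HIS58", (4.0, 0.0, 0.0)),
]
ligand = [(0.0, 0.0, 5.0)]
site = binding_site_residues(protein, ligand)
```

Raising the cutoff enlarges the site; with these toy coordinates a value of 7.0 would also pick up HIS58.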

Truncate [ Y | N | auto ]

Determines whether the protein will be truncated, keeping only residues within a given distance (see Cutoff_Truncate) of the binding site residues. Default is auto, in which case the protein structure is truncated keeping residues within the cutoff distance of the ligand rather than within the cutoff distance of the binding site residues.

Cutoff_Truncate cutoff

Any residue that does not have an atom within this distance (in Å) from an atom of a flexible residue or of the given ligand will be deleted from the protein file that ProCESS will output. Default is 9.

Defining the binding site cavity

Grid_Center grid-center

Specifically defines the center of the binding site (Cartesian coordinates). The default is to automatically find it using the center of a ligand.

Grid_Size x y z

Specifies the size of the box for the binding site. Default is 15 15 15.

Grid_Boundary [ Soft | Hard ]

Soft (default): When converting from the grid to spheres, the boundary of the box will be ignored (defined by Grid_Size) and spheres can include volume outside of the box. Hard: The active site cavity file will be constrained within the box defined by Grid_Size.

Grid_Resolution grid-resolution

Following this keyword is the resolution (Å) of the grid. Default is 1.5.

Grid_Sphere_Size grid-sphere-size

Specifies the size (Å) of a sphere used to trim the sides of the box to make it rounder. Default is 15.

Grid_Clash grid-clash-distance

If a protein atom is within this distance (Å) of a grid point, the point is removed from the grid. Default is 1.5.
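The interplay of Grid_Center, Grid_Size, Grid_Resolution and Grid_Clash can be illustrated with a small Python sketch. It is a simplified model under stated assumptions: the box is centred on the grid centre, points are spaced by the resolution along each axis, and a point is dropped when any protein atom is closer than the clash distance. The function names are hypothetical, and Grid_Sphere_Size and Grid_Boundary are not modelled.

```python
import math


def build_grid(center, size=(15.0, 15.0, 15.0), resolution=1.5):
    """Return grid points filling a box of the given size around center."""
    axes = []
    for c, s in zip(center, size):
        intervals = int(round(s / resolution))      # intervals along this axis
        start = c - intervals * resolution / 2.0    # keeps the box centred on c
        axes.append([start + i * resolution for i in range(intervals + 1)])
    return [(x, y, z) for x in axes[0] for y in axes[1] for z in axes[2]]


def remove_clashes(points, protein_atoms, clash=1.5):
    """Keep only the points with no protein atom closer than clash (Å)."""
    return [
        p for p in points
        if all(math.dist(p, atom) >= clash for atom in protein_atoms)
    ]


grid = build_grid((0.0, 0.0, 0.0))
free_points = remove_clashes(grid, [(0.0, 0.0, 0.0)], clash=2.0)
```

With the documented defaults, this layout gives 11 points per axis; a finer Grid_Resolution increases the point count cubically, which is one reason the defaults are recommended.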

Defining the interaction sites

XXX_Weight weight

This group of keywords (XXX being Hydrophobic, Metal, HBA or HBD) specifies the parameters for the assignment of pharmacophoric points. XXX_Weight is used to give weight to favourable XXX-type interactions. Default parameters are highly recommended.

Hydrophobic_Weight hydrophobic-site-weight

Defines the weight for hydrophobic interaction points. Default is 1.

Metal_Weight metal-site-weight

Defines the weight for metal interaction points. Default is 50.

HBA_Weight HBA-site-weight

Defines the weight for hydrogen bond acceptor interaction points. Default is 5.

HBD_Weight HBD-site-weight

Defines the weight for hydrogen bond donor interaction points. Default is 5.

Pharm_Polar_Softness polar-site-softness

If too many points are found, one can reduce this number by using this keyword, which defines the maximum distance (in Å) between two polar points to be merged. Default is 0.

Pharm_Nonpolar_Softness nonpolar-site-softness

If too many points are found, one can reduce this number by using this keyword, which defines the maximum distance (in Å) between two non-polar points to be merged. Default is 0.

Hydrophobic_Level level

Threshold on the van der Waals interaction between a probe at a grid point and hydrophobic carbons for the point to be considered hydrophobic. If the interaction is lower than this level, a hydrophobic point is added at this location. Default is -0.3.

Hydrophobic_Resolution resolution

Resolution of the grid used to compute van der Waals interaction with a probe to identify hydrophobic interaction sites. Default is 0.5.

Min_Weight minimum-weight

Minimum weight for a pharmacophoric point to be included in the final pharmacophore. Default is 0.5.

Num_of_IS number-of-interaction-sites

This determines the maximum number of interaction site (IS) beads in the interaction sites file. Default is 75.

RemoveSolventExposed [ Y | N ]

This parameter instructs PROCESS to scale the interaction site weights according to their solvent exposure. Default is Y.

Print_Protein_SDF [ Y | N ]

If this keyword is set to yes, the binding site created by ProCESS is written in SDF format. This routine keeps all atoms within 5 Å of the ligand plus all atoms within 4 bonds of these atoms. It can only be used if a putative ligand is co-crystallized with the protein.

Ligand_Influence_On_IS value

This keyword controls the weight that the ligand has on the interaction site beads created by ProCESS. The higher the value, the more influence the ligand has. If the value is set to 0, the interaction sites are created using the protein alone. Default is 10.0.

SMART

Generic parameters

Main_Mode smart

This parameter instructs FORECASTER to select a program. For SMART, use “smart”.

Run_Mode [ smart | keep2D ]

Following the keyword, specify the run mode which can be “smart” or “keep2D” for sdf formatted files.

SmallMolecule-Mode mode

This parameter instructs SMART to write the file in the selected format.

- fitted: SMART prepares molecules ready for docking with FITTED.
- metabolism: SMART prepares molecules ready for docking with IMPACTS.
- reduce: SMART prepares molecules and computes descriptors ready for filtering with REDUCE.
- ace: SMART prepares molecules (transition state structures) ready for ACE.
- compute_properties: SMART computes descriptors and writes a table of values.
- extract_fragments: Mode that compares extracted scaffolds from molecules. Similar to assign_chemotypes but outputting the fragments in the working directory with the name computed_fragments.mol2.
- gamess: SMART prepares a file ready for use with GAMESS.
- write_pains: SMART may add PAINS to the list of precomputed ones (use sdf 2D with Run_Mode keep2D).
- search_pains: SMART will identify PAINS in molecules.
- conformer_generation: Mode where one can generate conformers of an input molecule or library of molecules. Requires the input to be in 3D (mol2 format).
- admet_profiling: Mode where a molecule or library of molecules is analyzed from an ADMET point of view (Lipinski Ro5, Veber rules, sites of metabolism, key physico-chemical descriptors).
- assign_chemotypes: SMART extracts scaffolds and identifies molecules with similar ones.

Split_File number-of-molecules

This parameter enables SMART to split the output into multiple files, with the maximum number of molecules per file defined by this parameter. Default is a single file.
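
The splitting behaviour can be sketched in a few lines of Python. This is only an illustration of the chunking idea; the function name and the molecule names are hypothetical and say nothing about how SMART names its output files.

```python
def split_into_files(molecules, max_per_file):
    """Yield successive chunks holding at most max_per_file molecules."""
    for start in range(0, len(molecules), max_per_file):
        yield molecules[start:start + max_per_file]

# Five hypothetical molecules with Split_File 2 give three chunks.
chunks = list(split_into_files(["mol1", "mol2", "mol3", "mol4", "mol5"], 2))
print(chunks)
```

The last chunk is simply shorter when the library size is not a multiple of the requested value.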

Advanced parameters

Charging_Scheme [ DGH | DGH+ | OK | NM | MMFF | input | none ]

This parameter provides the method used to add atomic charges.

- DGH (Das Gupta-Huzinaga scheme),
- OK (Ohno-Klopman) and NM (Nishimoto-Magata) are based on the electronegativity equalization method by Rappé.
- “input” forces SMART to keep the charges (if any) already in the input file.
- MMFF will use the MMFF94 method.

SASA_Correction_Factor value

This parameter provides the weight of the solvent accessible surface area correction factor applied to the activation energies assigned to potential sites of metabolism (in metabolism mode). Default: 0.1

Nr_En_Act value

This parameter instructs SMART to consider only the top N most reactive sites of metabolism (in metabolism mode). Default: all

Print_CSV [ Y | N ]

This parameter instructs SMART to print the descriptors in CSV format when using the “compute_properties” mode (see above). Default: all

Print_Hydrogens [ Yes | No | Only_Polar ]

In sdf 2D format, SMART can print all hydrogens (“Yes”), no hydrogens (“No”) or only polar hydrogens (“Only_Polar”). Default: Yes

QUEMIST in SMART

QM_Method [ hf | dft ]

Following the keyword, specify the method to be used. DFT works only on Linux. For QM to be carried out in SMART, the mode (SmallMolecule-Mode) must be set to metabolism.

QUEMIST parameters values

For a complete list of parameters which can be used when computing QM calculations in SMART, see the list of QUEMIST parameters.

Compute_Fukui_Coeffs [ yes | no ]

This option allows the computation of several molecular and atomic descriptors within the conceptual DFT/HF framework.

GAMESS mode

The following set of keywords should follow GAMESS-US syntax. Only the groups are listed below, not all of their keywords. For example, the $CONTRL group includes the SCFTYP keyword (self-consistent field wave function), which can be assigned RHF or UHF. In this case, you should use $GROUP SCFTYP=UHF $END. This is by no means an exhaustive list; this mode is a routine to convert a mol2 or sdf formatted structure into a Z-matrix GAMESS-US formatted file. GAMESS-US is a quantum mechanics program developed at Iowa State University.

$BASIS value

$CONTRL value

$ZMAT value

$SYSTEM value

$STATP value

$SCF value

$PCM value

Fragmenting molecules

Max_Atoms_For_Fragmenting value

Above this value, the molecule will be fragmented and properties computed on fragments if QUEMIST is used. This keyword is also used in compare_fragments mode to identify whether the molecule is large enough to include a scaffold.

BondsFromFragment value

This keyword instructs the program to keep atoms up to this number of bonds away from the scaffolds.

FragmentsWithRings [ Y | N ]

This keyword instructs the program to keep only scaffolds with a ring or not.

FragmentsScaffold [ Y | N ]

If this parameter is set to no, atoms “BondsFromFragment” bonds away may be part of a rigid fragment (i.e., a ring). This fragment will be rebuilt. This is necessary to complete the valence if QUEMIST is to be used. In compare_fragments mode, use yes.

PAINS

Score_Pains value

When adding a new PAINS to the list of pre-computed PAINS, a score must be given. For more information on scores see J. Cheminform 2016, 8, 29. doi: 10.1186/s13321-016-0137-3.

Statistics

Statistics_and_Distributions number_of_descriptors_to_plot
descriptor_#1 minimum_value_#1 maximum_value_#1 number_of_bins_#1
descriptor_#2 minimum_value_#2 maximum_value_#2 number_of_bins_#2
…

If the SmallMolecule-Mode is set to compute_properties, it is possible to generate statistics and distributions of the descriptors computed by Forecaster. Bar plots of these distributions are written to files named descriptor-n_Distribution.png.

Available descriptors:

- Molecular_Weight
- HBD
- HBA
- FlogP
- logS
- polar_SASA
- Net_Charge
- Rotatable_Bonds
- Rings
- Fsp3
- Ionizable_Centers
- #N
- #O
- #S
- #X
- Heteroatoms
- Heavy_Atoms
- Michael_Acceptors
- tPSA
- SC
- O-
- N+
- nonpolar_SASA
- McGowan_Volume
- Molecular_VSA
- Molecular_Softness
- Ovality_Index
- Stereochemical_Complexity
- Aromatic_Proportion
- Molecular_Polarity
- Molecular_EN
- Molecular_Polarizability
- BBB_Permeator
- Molecular_Density
- Molecular_Hardness
- Wiener_3D_Index
- Geometric_Radius
- Geometric_Diameter
- Geometric_Shape_Coeff
- Geometric_Span
- Radius_of_Gyration
- Dipole_Moment
- Globularity_Index
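
To make the layout of the Statistics_and_Distributions block concrete, here is a hedged Python sketch that parses such a block and bins a list of values between the requested minimum and maximum. The parser, the equal-width binning and the choice to ignore out-of-range values are assumptions for illustration; they are not taken from the Forecaster source.

```python
def parse_statistics_block(lines):
    """Parse a Statistics_and_Distributions block into a list of dicts."""
    keyword, count = lines[0].split()
    if keyword != "Statistics_and_Distributions":
        raise ValueError("unexpected keyword: " + keyword)
    specs = []
    for line in lines[1:1 + int(count)]:
        name, low, high, bins = line.split()
        specs.append({"name": name, "min": float(low),
                      "max": float(high), "bins": int(bins)})
    return specs

def histogram(values, low, high, bins):
    """Count values per equal-width bin; values outside [low, high] are ignored."""
    counts = [0] * bins
    width = (high - low) / bins
    for value in values:
        if low <= value <= high:
            index = min(int((value - low) / width), bins - 1)
            counts[index] += 1
    return counts

block = ["Statistics_and_Distributions 2",
         "Molecular_Weight 0 500 5",
         "HBD 0 10 10"]
spec = parse_statistics_block(block)[0]
# Hypothetical molecular weights, for illustration only.
print(histogram([50, 150, 160, 500, 600], spec["min"], spec["max"], spec["bins"]))
```

In this sketch a value equal to the maximum falls into the last bin, so the example prints [1, 2, 0, 0, 1].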

ADMET profiling

Multi_Profiles [ yes | no ]

Instructs SMART to create and output an individual profile for each molecule in the library. If the keyword is set to no, all ADMET profiles will be printed out in the same file. Default: no.

Generating conformers (still under development)

Number_of_Conformers value

Instructs SMART to set the number of conformers to be created. Default: 10.

Sort_by_Energy [ yes | no ]

Whether the resulting conformers should be sorted in the output file relative to the optimized molecule from the input file. Default: yes.

Multi_Conformer_Output [ yes | no ]

Whether the resulting conformers should be printed in individual xyz files or all together. Default: no.

Stochastic_Seed [ yes | no ]

Whether the torsion randomization is done stochastically (random values from -180 to 180) or using discrete values (from -180 to 180 in increments of 15 degrees only). Default: no.
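
The two randomization schemes can be contrasted with a short Python sketch. The function name and the use of Python's random module are assumptions for this example; only the ranges and the 15 degree increment come from the description above.

```python
import random

def random_torsion(stochastic, rng):
    """Return a torsion angle in degrees.

    stochastic=True : any value drawn uniformly from -180 to 180.
    stochastic=False: one of the discrete values -180, -165, ..., 165, 180.
    """
    if stochastic:
        return rng.uniform(-180.0, 180.0)
    return float(rng.choice(range(-180, 181, 15)))

rng = random.Random(0)  # fixed seed so the sketch is reproducible
print(random_torsion(True, rng), random_torsion(False, rng))
```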

ConfSim_Threshold value

The threshold for considering any two conformers too close in energy to each other. Default: 0.05 kcal/mol.

HighEnergy_Threshold value

The threshold for considering any conformer to be too high in energy (and thus unrealistic). The value is relative to the optimized input mol. Default: 10.0 kcal/mol.
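
A minimal Python sketch of how the two thresholds could act together is given below. It assumes that conformers are compared by energy only and visited from lowest to highest energy; the function and argument names are hypothetical.

```python
def filter_conformers(energies, reference_energy,
                      confsim_threshold=0.05, high_energy_threshold=10.0):
    """Return the conformer energies (kcal/mol) kept after both filters."""
    kept = []
    for energy in sorted(energies):
        if energy - reference_energy > high_energy_threshold:
            continue  # too high in energy relative to the optimized input
        if any(abs(energy - other) < confsim_threshold for other in kept):
            continue  # too close in energy to a conformer already kept
        kept.append(energy)
    return kept

# Hypothetical energies relative to an optimized input at 0.0 kcal/mol.
print(filter_conformers([0.0, 0.02, 1.0, 12.0, 1.04], 0.0))
```

Here 0.02 and 1.04 are discarded as too close to 0.0 and 1.0, and 12.0 is discarded as too high, leaving [0.0, 1.0].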

MAPS

Generic parameters

Main_Mode make_peptides

This parameter instructs FORECASTER to select a program. For MAPS, use “make_peptides”.

Run_Mode make_peptides

Following the keyword, specify the run mode, which can only be “make_peptides”.

Sequence peptide-one-letter-code (e.g. AKTASVR )

Following this keyword, provide the peptide sequence (one-letter codes).

Peptide_Size size-of-peptide

Following this keyword, provide the size of the peptide to be converted. If no size is given, the entire peptide sequence provided above will be converted.

2D_or_3D [ 2D | 3D ]

This parameter instructs MAPS to generate the peptides either in 2D or 3D.

Phosphorylation [ none | monophosphorylation | complete ]

This parameter instructs MAPS that serine, tyrosine and threonine residues should be phosphorylated. Default: none

- none: no phosphorylation will be carried out.
- monophosphorylation: all the possible non-phosphorylated and mono-phosphorylated peptides will be generated.
- complete: all the serine, tyrosine and threonine residues will be phosphorylated.
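
The three phosphorylation options can be illustrated with a small Python enumeration. This is a sketch under the assumption that "monophosphorylation" means the unmodified peptide plus one peptide per single phosphorylated site; the function name and the output representation (tuples of 0-based positions) are hypothetical.

```python
PHOSPHO_RESIDUES = set("STY")  # serine, threonine, tyrosine

def phosphorylation_variants(sequence, mode="none"):
    """Return one tuple of phosphorylated positions per generated peptide."""
    sites = [i for i, residue in enumerate(sequence) if residue in PHOSPHO_RESIDUES]
    if mode == "none":
        return [()]
    if mode == "monophosphorylation":
        return [()] + [(i,) for i in sites]
    if mode == "complete":
        return [tuple(sites)]
    raise ValueError("unknown Phosphorylation mode: " + mode)

# AKTASVR contains one threonine (position 2) and one serine (position 4).
print(phosphorylation_variants("AKTASVR", "monophosphorylation"))
print(phosphorylation_variants("AKTASVR", "complete"))
```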

FITTED

Generic parameters

Main_Mode fitted | data_postprocessing

This parameter instructs FORECASTER to select a program. For FITTED, use “fitted”.

Run_Mode fitted | data_postprocessing

Following the keyword, specify the run mode which can only be “fitted”.

Input files

Protein number-of-files
input file #1
input file #2
…

Following this keyword is the number of protein structure files used as input (same protein, different conformations). This keyword is equivalent to Protein_Conformations. These protein files should be prepared using ProCESS prior to the actual docking. On the following lines are the protein file names, one per line, without the file extension (.mol2).

Ligand file-name

Name of the ligand file to be docked (in mol2 format). This ligand file should be prepared using SMART prior to the actual docking. The ligand file can contain a single molecule or multiple molecules (multi-mol2).
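
The "count on the keyword line, then one entry per line" convention used by Protein can be illustrated with a short Python helper that assembles a few keyword lines. This is only a sketch: the helper and the file names are hypothetical, a real parameter file needs more keywords than shown, and the user interface remains the recommended way to generate it.

```python
def build_fitted_keywords(proteins, ligand):
    """Return the text of a minimal keyword fragment (hypothetical helper)."""
    lines = ["Main_Mode fitted",
             "Run_Mode fitted",
             "Protein " + str(len(proteins))]
    lines.extend(proteins)             # one protein name per line, no .mol2 extension
    lines.append("Ligand " + ligand)   # ligand file in mol2 format
    return "\n".join(lines) + "\n"

# Hypothetical file names, for illustration only.
print(build_fitted_keywords(["protein_conf1", "protein_conf2"], "ligands.mol2"))
```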

Restart_At molecule name

If a job crashes or stops, it can be restarted. The ligand file (given with the Ligand keyword above) must include molecule names. The name of the molecule that was running when the program stopped can then be given with this keyword to instruct FITTED to skip all the molecules in the ligand file until it finds this one. It will then dock this molecule and all the ones appearing after it in the ligand file. Note that FITTED will append the results and output data to the files (file-results.txt and file.out) written previously by the job that stopped.
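
The skipping behaviour described for Restart_At amounts to the following Python sketch. The function name is hypothetical and raising an error for an unknown name is an assumption of this example, not documented FITTED behaviour.

```python
def molecules_to_dock(names, restart_at=None):
    """Return the molecule names that would be docked, in file order."""
    names = list(names)
    if restart_at is None:
        return names
    if restart_at not in names:
        raise ValueError("molecule not found in ligand file: " + restart_at)
    # Skip everything before the named molecule, then dock it and the rest.
    return names[names.index(restart_at):]

# Hypothetical molecule names, for illustration only.
print(molecules_to_dock(["lig_a", "lig_b", "lig_c", "lig_d"], restart_at="lig_c"))
```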

Ref number-of-files
file-#1.mol2
file-#2.mol2
…

Following this keyword is an integer stating how many reference files are used to calculate the root-mean-square deviation (RMSD) of the ligand heavy atoms. These ligand files should be in the same reference frame as the protein structure. The possible symmetric conformations of the ligand are calculated in silico. RMSD calculation can only be done when the ligand’s bioactive conformation is known (e.g. self-docking study). Two reference files may be needed in some instances where the ligand or protein active site is Cn symmetric (n ≥ 2). On the following line(s), the reference file(s) (in mol2 format) are listed, one per line. If this keyword is missing, no RMSD values will be computed.
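
For readers unfamiliar with the quantity, the sketch below computes a plain heavy-atom RMSD in Python and, when several reference files are given, reports the lowest value. Taking the minimum over references is an assumption of this example, and the in silico handling of ligand symmetry mentioned above is not reproduced.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered coordinate lists."""
    if len(coords_a) != len(coords_b):
        raise ValueError("both structures must have the same number of atoms")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

def best_rmsd(pose, references):
    """Lowest RMSD of a docked pose against several reference structures."""
    return min(rmsd(pose, reference) for reference in references)

# Hypothetical two-atom ligand, for illustration only.
pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
shifted = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(pose, shifted), best_rmsd(pose, [shifted, pose]))
```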

Binding_Site_Cav XXXX_BindSite.mol2

Following this keyword is the file defining the cavity present in the active site (a set of spheres prepared by ProCESS). If this parameter is missing, no cavity volume filter will be used (it is highly recommended to use both the Interaction_Sites and Binding_Site_Cav parameters).

Binding_Site number-of-residues
flex-residue-#1_name
flex-residue-#2_name
…

This keyword can be used to manually define the active site (the active site can also be defined automatically by PROCESS using a ligand). On the same line following this keyword, specify the number of flexible residues. This list should be as exhaustive as possible to avoid missing any important residue defining the active site. On subsequent lines, the residue names/numbers are specified, one per line.

Interaction_Sites XXXX_IS.mol2

Name of the file containing the interaction site description (prepared by ProCESS, mol2 format). If this parameter is missing, no interaction site filter will be used (it is highly recommended to use both Interaction_Sites and Binding_Site_Cav).

Pharmacophore pharmacophore_file.mol2

Name of the file containing the pharmacophore constraints on the ligands (prepared by ProCESS). Typically this keyword is used to ensure that the binding modes produced match this constraint, but it can be softened by setting Min_Constraint (see below). If this parameter is missing, no constraint will be used.

Protein_Reference_Structure number-of-files
file-#1.mol2
file-#2.mol2
…

Following this keyword is the number of reference protein structure files used to compute the protein RMSD (deviation of the modeled protein structure from the reference structures). On the following lines are the protein file names, one per line. These files will be used in addition to the Protein files listed before to calculate a root-mean-square deviation (RMSD) between the protein generated during a FITTED docking run and these reference files. Additional files may be needed if the protein has a symmetrical structure (e.g., HIV-1 protease). If this parameter is missing, the protein input files will be used as references.

Recompute_Descriptors [ yes | no ]

If set to yes, molecular properties (descriptors) will be recomputed on the docked poses.

Print_CSV [ yes | no ]

If set to yes, the different terms of the binding energy as well as descriptors will be saved in a csv file. Print_Energy_Full yes is required.

Run parameters

Mode [ Dock | VS | Cross_Docking | Score | Scorecyp | Local | SAR ]

- Dock (default): Normal docking run.
- VS: This mode is now deprecated in this new version. When selecting this mode, it automatically switches to the Dock mode.
- Cross_Docking: This mode will consider several protein conformations separately against a set of ligand in a cross-docking experiment.
- Score: Scores the ligand input structure in the provided orientation against all input proteins.
- Scorecyp: Scores the ligand input structure in the provided orientation against a CYP protein.
- Local: Performs a local search on the ligand input structure. The provided orientation/translation/conformation is used as a starting point and only slight modifications to the ligand conformation, orientation and translation are carried out.
- SAR: Performs a local search on the ligand input structure. The provided orientation/translation/conformation is used as a starting point and only slight modification to the ligand orientation and translation are carried out while a complete search of conformations is done.

Parameters [ Manual | Auto ]

This parameter instructs FITTED and IMPACTS to derive missing force field parameters automatically or not.

Flex_Type [ Rigid | Semiflex | Flex_water | Flex ]

- Rigid (default if only one protein structure is used): The ligand is docked onto one protein structure.
- Semiflex (default if more than one protein structure is used): The ligand is docked onto multiple protein structures (requires Protein ≥ 2). Proteins can be exchanged during the evolution but not the genes corresponding to side chains or water molecules (a more complete description of this mode is given in reference 1).
- Flex_water: The ligand is docked into multiple protein structures (requires Protein ≥ 2). Similar to Semiflex, except that each water molecule evolves independently.
- Flex: The ligand is docked onto multiple protein structures (requires Protein ≥ 2). The side chains and waters are allowed to be exchanged independently from the protein backbone.

Number_of_Runs number-of-runs

More than one run per ligand can be performed (the ligand may be docked several times to ensure a complete search). If this keyword is missing, the default value is 3 for Dock mode; for all other modes the default is 1.

Displaceable_Waters [ On | Off ]

Allows the user to turn off the displaceable waters. The default is on which allows displaceable waters.

Particle_Waters [ Y | N ]

Instructs the program to use particle waters (Needs to be previously added by PREPARE). Default is No.

Corner_Flap [ On | Off ]

Turns the corner flap conformational search for rings on or off. By default, it is set to Off.

Minimization_Algorithm [ Steepest | Conjugate | LBFGS ]

There are three available optimizers: steepest descent, conjugate gradient and LBFGS. The default is conjugate.

Conjugate gradient/LBFGS parameters

The default values for all the keywords described in this section are recommended.

GA_* or GI_* value

There are two sets of the following keywords: one for the parameters used during the generation of the initial population (GI_*; e.g., GI_MaxIter) and another one used during the evolution (GA_*; e.g., GA_MaxIter). The default values are recommended.

XX_MaxIter value

Maximum number of iterations. Once this number is reached the minimization is finished. The default is 20.

XX_StepSize value

Initial value of the step taken in the direction of the gradient during minimization. The default is 0.02.

XX_MaxStep value

Maximum step size allowed during minimization. The default is 1.

XX_EnergyBound value

Minimum energy difference between two molecules to be considered similar. The default is 1.0 for GI_EnergyBound and 0.001 for GA_EnergyBound.

XX_MaxSameEnergy value

Number of times that the same energy (defined by EnergyBound) can be repeated. The default is 3.

XX_MaxGrad value

Gradient convergence criteria. The default is 0.001.

XX_MaxCosine value

Maximum change in direction (cosine of this angle) accepted to run a conjugate gradient. If the direction is greater than this limit, steepest descent is used. The default is 1 (maximum change in direction is 180 degrees, equivalent to function turned off).

CGResetSD value

This parameter designates the number of conjugate gradient steps before the algorithm switches to steepest descent for a single step (i.e., resets the direction to the gradient). The default is -1 (function turned off).

Energy parameters

Score_Initial [ none | score | minimize ]

Scoring of the initial ligand binding mode.

- none (default): No scoring of the initial input structure is performed.
- score: Only the score of the initial input ligand is output.
- minimize: The score of the initial pose and the score of the energy minimized structure will be output.

VdWScale_1-4 value

Scaling factor for the 1,4 van der Waals interactions. The default is 1.0.

VdWScale_1-5 value

Scaling factor for the 1,5 van der Waals interactions. The default is 1.0.

E_VdWScale_Pro value

Scaling factor for the ligand-protein van der Waals interactions. The default is 1.0.

E_VdWScale_Wat value

Scaling factor for the ligand-water van der Waals interactions. The default value is set the same as E_VdWScale_Pro.

ElecScale_1-4 value

Scaling factor for the 1,4 electrostatic interactions. The default is 1.0.

ElecScale_1-5 value

Scaling factor for the 1,5 electrostatic interactions. The default is 1.0.

E_ElecScale_Pro value

Scaling factor for the ligand-protein electrostatic interactions. The default is 1.0.

E_ElecScale_Wat value

Scaling factor for the ligand-water electrostatic interactions. The default value is set the same as E_ElecScale_Pro.

E_HbondScale_Pro value

Scaling factor for the ligand-protein hydrogen bond interactions. The default is 1.0.

E_HbondScale_Wat value

Scaling factor for the ligand-water hydrogen bond interactions. The default value is set the same as E_HbondScale_Pro.

E_ElecScale_Metal value

Scaling factor for the ligand-metal electrostatic interactions. This type of interaction does not apply to specifically designed zinc and iron interactions used if the Macromolecule is set to Metalloprotein. The default is 1.0.

E_CoordinationScale_Metal value

Scaling factor for the ligand-metal electrostatic interactions. This type of interaction does not apply to specifically designed zinc and iron interaction used if the Macromolecule is set to Metalloprotein. The default is 1.0.

E_HBondScale_Metal value

Scaling factor for the ligand-metal hydrogen bond interactions. This type of interaction applies to the specifically designed zinc and iron interactions used if the Macromolecule is set to Metalloprotein_HB and to any specific catalytic hydrogen bonds in zinc metalloproteins (e.g., with neighboring Glu or His residues) if the Macromolecule is set to Metalloprotein. The default is 1.0.

Proton_Transfer_Energy value

Energy difference between the ligand/metalloprotein complex before and after proton transfer if any. The default is -4.0.

Charge_Transfer value

Charge transfer during a proton transfer if any. The default is 0.0.

Cutdist value

Cutoff distance (in Å) for the non-bond interactions with the protein. The default value is 9.

Switchdist value

Switching distance (in Å) for the non-bond interactions with the protein. The default value is 7.
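
The role of a switching distance and a cutoff distance can be illustrated with a generic switching function. The exact functional form used by FITTED is not given in this document, so the polynomial switch below (a form commonly used in molecular mechanics packages) is an assumption chosen only to show the idea: interactions are left untouched below Switchdist, smoothly scaled down between Switchdist and Cutdist, and ignored beyond Cutdist.

```python
def switching_factor(r, switchdist=7.0, cutdist=9.0):
    """Scale factor applied to a non-bond term at distance r (in Å).

    Returns 1 for r <= switchdist, 0 for r >= cutdist, and a smooth
    polynomial interpolation in between (assumed form, not from FITTED).
    """
    if r <= switchdist:
        return 1.0
    if r >= cutdist:
        return 0.0
    off2, on2, r2 = cutdist ** 2, switchdist ** 2, r ** 2
    return (off2 - r2) ** 2 * (off2 + 2.0 * r2 - 3.0 * on2) / (off2 - on2) ** 3

for distance in (5.0, 7.0, 8.0, 9.0, 10.0):
    print(distance, round(switching_factor(distance), 3))
```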

Cutdist_Wat value

Cutoff distance for the non-bond interactions with the water molecules. The default value is 1.20.

Switchdist_Wat value

Switching distance for the non-bond interactions with the water molecules. The default is 1.75.

Cutdist_PW value

Cutoff distance for the non-bond interactions with the particle water molecules. The default value is 1.20.

Switchdist_PW value

Switching distance for the non-bond interactions with the particle water molecules. The default is 1.75.

Solvation [ On | Off ]

Allows the user to turn off the calculation of the solvation energy. The default is On.

GB_Epsilon value

Generalized Born / surface area (GB/SA) is used to compute solvation changes upon binding. The dielectric constant value is required. The default is 78.0.

E_Entropy value

PROCESS identifies flexible residues and labels them. When FITTED reads these labels, it scales down interactions with these atoms to partially account for the entropy cost associated with the freezing of these flexible side chains. If the user wants to further increase this impact, a value greater than one should be used. A value of 0 turns off this effect. The default is 1.

Scoring parameters

The default values for all the keywords are highly recommended as they represent the scaling factors optimized for RankScore.
Please contact us if you need to change the keywords. RankScore is a set of scoring functions which are based on energy terms and other terms which are all scaled to better model the observed binding free energies. The weights of each term can be changed, although, as mentioned above, it is not recommended.

Several flavours of our scoring function have been developed over the years. The deep neural network program can also be used to train a decision model (active/inactive). By default, RankScore 7 is used unless particle waters are present. In the latter case, RankScore 5 is the default.

S_VdWScale_Pro value

Scaling factor for the ligand-protein van der Waals interaction relative to the weight in the selected scoring function. The default is 1.0 (keeps the original scoring weight).

S_ElecScale_Pro value

Scaling factor for the ligand-protein electrostatic interaction relative to the weight in the selected scoring function. The default is 1.0 (keeps the original scoring weight).

S_HBondScale_Pro value

Scaling factor for the ligand-protein hydrogen bond interaction relative to the weight in the selected scoring function. The default is 1.0 (keeps the original scoring weight).

S_VdWScale_Wat value

Scaling factor for the ligand-water van der Waals interaction relative to the weight in the selected scoring function. The default is 1.0 (keeps the original scoring weight).

S_ElecScale_Wat value

Scaling factor for the ligand-water electrostatic interaction relative to the weight in the selected scoring function. The default is 1.0 (keeps the original scoring weight).

S_HBondScale_Wat value

Scaling factor for the ligand-water hydrogen bond interaction relative to the weight in the selected scoring function. The default is 1.0 (keeps the original scoring weight).

S_ElecScale_Metal value

Scaling factor for the ligand-metal electrostatic interaction relative to the weight in the selected scoring function. The default is 0.0 (this term is ignored if Macromolecule is set to Metalloprotein).

S_HBondScale_Metal value

Scaling factor for the ligand-metal hydrogen bond interaction relative to the weight in the selected scoring function. The default is 1.0 (this term is essential if Macromolecule is set to Metalloprotein).

S_CoordinationScale_Metal value

Scaling factor for the ligand-metal coordination interaction relative to the weight in the selected scoring function. The default is 1.0 (this term is essential if Macromolecule is set to Metalloprotein).

Water_Loss_Entropy value

Entropy estimate for the displacement of a water molecule. The default is 1.0.

Water_Loss_Enthalpy value

Enthalpy estimate for the displacement of a water molecule. The default is 1.0.

Weight_Rot_Bonds value

Entropy estimate for the freezing of a bond upon ligand binding. This term includes the number of rotatable bonds but also some other factors such as hydrophobicity. The default is 1.0.

Weight_NRot_Bonds value

Entropy estimate for the freezing of a bond upon ligand binding. This term includes only the number of rotatable bonds. The default is 1.0.

Recompute_Descriptors [ Y | N ]

If set to yes, the descriptors will be recomputed on the docked poses. The default is N.

Initial population parameters

Pop_Size value

Population size for the genetic algorithm conformational search. When 10000 is given as value, automatic determination based on the ligand’s number of torsions is done. The default is automatic for rigid docking, 200 for flexible docking when keyword is omitted.

Resolution value

Resolution of the torsion rotation when randomly generating a new individual. The default is 120 degrees.

GI_Initial_E value

Any randomly generated individual will be discarded before being energy-minimized if its energy is greater than this value. The default is 1.0e10 kcal/mol.

GI_Minimized_E value

Any randomly generated individual will be discarded after being energy-minimized if the energy relative to the input conformation energy is greater than this energy. The default is 1000 kcal/mol.

Max_Steric_Clash_Flexible_Residue value

If two water molecules and/or flexible side chains are within this distance, they are considered clashing. The default is 1.5 Å.

Max_Num_Steric_Clashes value

If two water molecules and/or flexible side chains are within the distance defined with Max_Steric_Clash_Flexible_Residue (above), they are considered clashing. This parameter defines the acceptable number of clashes. The default is 0.

Min_MatchScore value

Minimum match of the interaction sites. This keyword is used only if an interaction site file is provided. If the Mode is set to Dock, Min_MatchScore is automatically calculated. The default is 20.

Min_PharmScore value

This keyword is used only if a pharmacophore file is provided. Minimum percent match of the pharmacophore. The default is 100.

Anchor_Atom atom number

Sequence number of the atom to be used as an anchor. This is used to identify the center of translation and rotation for the GA. If this keyword is not specified, the anchor is automatically set to the gravity center of the ligand.

Anchor_Coord x y z

Following this keyword must be the x, y and z coordinates of the protein active site center. If this keyword is not used, it is automatically set to the center of the protein active site defined by the active site (flexible) residues.

Max_Tx x

Max_Ty y

Max_Tz z

Maximum value for translation (in Å) in x, y, and z respectively. If the Mode is set to Local or SAR, the default is 0.2 Å for the three values, it is 5 Å otherwise.

Max_Rxy x

Max_Ryz y

Max_Rxz z

Maximum value for rotation (in degrees) around x, y, and z axes respectively during a mutation. If the Mode is set to Local or SAR, the default is 2 degrees for the three values, it is 30 degrees otherwise.

GI_Num_of_Trials value

Maximum number of successive unsuccessful trials before exiting. The default for Mode Dock is 10,000 and for Mode VS is 1,000.

Matching_Algorithm [ On | Off ]

This parameter instructs the program to turn the matching algorithm on or off. By default, it is set to On.

Num_of_Top_IS value

Number of top interaction sites considered by the matching algorithm: each interaction site triangle must contain at least one of these top sites. The default is 10.

Stringent_Triangles value

Factor by which the triangles are selected. The higher Stringent_Triangles is set, the more the matching algorithm will favor triangles that have not been used. The default value is 5.

Stringent_MS value

This parameter provides a weight factor used in the calculation of Min_MatchScore. The higher this value, the stricter Min_MatchScore becomes. The default value is 4.

Genetic algorithm parameters

Max_Gen value

Determine the maximum number of generations for the genetic algorithm. The default is 175.

Max_Gen_1 value

If after Max_Gen_1 generations none of the top poses has a score below the one specified by CutScore_1 or a MatchScore higher than Cutoff_MScore_1, the program exits. Otherwise, the program proceeds until it reaches Max_Gen_2. The default is Max_Gen.

CutScore_1 value

Upper bound score at Max_Gen_1 to further proceed with the docking run. If at least one individual within the top 3 has a score below CutScore_1, the program proceeds to Max_Gen_2. The default is -5.0.

Cutoff_MScore_1 value

Lower bound MatchScore at Max_Gen_1 to further proceed with the docking run. The default is 15.0.

Max_Gen_2 value

As for Max_Gen_1, if after Max_Gen_2 generations none of the top poses has a score below the one specified by CutScore_2, the program exits. Otherwise, the program proceeds until it reaches Max_Gen. The default is Max_Gen.

CutScore_2 value

Upper bound score at Max_Gen_2 to further proceed with the docking run. If at least one individual within the top 3 has a score below CutScore_2, the program proceeds to Max_Gen. The default is -7.5.

Cutoff_MScore_2 value

Lower bound MatchScore at Max_Gen_2 to further proceed with the docking run. The default is 20.0.
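
The checkpoint logic described for Max_Gen_1 can be summarized by the Python sketch below, which decides whether a run continues past a checkpoint. It is an illustration based on the Max_Gen_1 description; the function name and the representation of poses as (score, MatchScore) pairs are assumptions.

```python
def passes_checkpoint(top_poses, cut_score, cutoff_mscore):
    """Return True if the run should continue past the checkpoint.

    top_poses: (score, match_score) pairs of the best-ranked individuals
    (the text above mentions the top 3). The run continues when at least
    one pose has a score below cut_score or a MatchScore above cutoff_mscore.
    """
    return any(score < cut_score or match_score > cutoff_mscore
               for score, match_score in top_poses)

# Defaults quoted above for the first checkpoint: CutScore_1 = -5.0, Cutoff_MScore_1 = 15.0.
print(passes_checkpoint([(-4.0, 10.0), (-3.5, 12.0), (-3.0, 9.0)], -5.0, 15.0))  # exits
print(passes_checkpoint([(-6.2, 10.0), (-3.5, 12.0), (-3.0, 9.0)], -5.0, 15.0))  # continues
```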

Seed value

Select the starting point within the random number generator. If the same run is done with the same seed, the exact same result will be obtained. If a different seed is used, the GA will follow a different path. Changing the seed helps the developers to evaluate the convergence of a run. The default is 100.

Parent_Selection [ Random | Tournament | Islands ]

Method to select parents who will produce children. The default is Random.

- Random: parents are randomly selected from the pool of individuals.
- Tournament: parents are selected using a tournament in which a randomly selected set of candidate parents is ranked by binding energy and the best ones are selected.
- Islands: evolution takes place on separate islands and both parents must come from the same island.

Tournament_Size value

If the parameter Parent_Selection (above) is set to Tournament, the number of candidates per tournament must be given. The default is 2.
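
As an illustration of tournament selection, here is a minimal Python sketch. It assumes that the candidate with the lowest energy wins the tournament. The dictionary layout of an individual and the function name are hypothetical and do not reflect FITTED's internal data structures.

```python
import random


def tournament_select(population, tournament_size=2, rng=random):
    """Pick one parent: draw `tournament_size` candidates at random and
    return the one with the lowest energy (assumed to mean 'best')."""
    candidates = rng.sample(population, tournament_size)
    return min(candidates, key=lambda ind: ind["energy"])


rng = random.Random(100)  # fixed seed so the selection is reproducible
population = [{"id": i, "energy": e} for i, e in enumerate([-3.1, -7.4, -5.0, -1.2])]
parent_a = tournament_select(population, 2, rng)
parent_b = tournament_select(population, 2, rng)
```

A larger tournament size increases the selection pressure, since weak individuals are less likely to win against more competitors.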

Num_Of_Islands value

If the parameter Parent_Selection (above) is set to Islands, the number of islands must be given. The default is 5.

pLearn value

Probability of energy minimization of the parents at every generation. The default is 0.1.

pCross value

Probability of crossover at every generation. The default is 0.85.

pMut value

Probability of mutation at every generation. The default is 0.05.

pMutRot value

Probability of mutation of the orientation of the ligand at every generation. The default is 0.30.

pMutWat value

The maximum rate of mutation of the water at Max_Gen generations. The default is 0.35.

pElite value

The percentage of the best of the population to be directly passed on to the next generation. The default is 0.01.

pElite_Every_X_Gen value

pElite is applied every pElite_Every_X_Gen generations. The default is 2.

pElite_SSize value

The individual to be passed directly on to the next generation is selected randomly from the top pElite_SSize individuals of the population. The default is 10.

pOpt value

Probability of optimization of the ligand at every generation. The default is 0.20.

Evolution [ Steady_State | Metropolis | Elite ]

- Steady_State (default): During the evolution, out of a pair of two children and their 2 parents the two best will be saved.
- Metropolis: During the evolution, out of a pair of two children and their 2 parents two individuals will be saved following the Metropolis criterion. If the children are higher in energy they are checked to see if they have a high probability to exist at room temperature. If they do they are saved.
- Elite: During the evolution, the top pop_size individuals of the children and parents will be kept for the next generation.
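
The Metropolis option can be illustrated with the textbook acceptance rule. This is a sketch only: how FITTED applies the criterion to the group of two parents and two children is not described here, and the temperature (about 298 K) and energy units (kcal/mol) are assumptions. The function name is hypothetical.

```python
import math
import random

KT_ROOM = 0.593  # kcal/mol at about 298 K (assumed temperature and units)


def metropolis_accept(e_parent, e_child, kt=KT_ROOM, rng=random):
    """Accept a child that is lower in energy; otherwise accept it with
    Boltzmann probability exp(-(e_child - e_parent) / kt)."""
    if e_child <= e_parent:
        return True
    return rng.random() < math.exp(-(e_child - e_parent) / kt)
```

With this rule, a child slightly higher in energy than its parent is often kept, while a child much higher in energy is almost always rejected.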

GA_Num_of_Trials value

Maximum number of successive unsuccessful trials to create children. The default is 1000.

Covalent docking parameters

Covalent_Residue residue name

Following this keyword is the name of the residue with which the covalent inhibitor will react. Only CYS and SER are implemented in the current version (e.g., SER554).

Covalent_Ligand [ Only | Both ]

Controls the covalent docking. FITTED will automatically identify the aldehyde, boronate or nitrile groups (other groups will eventually be implemented) and assign the proper atom types when covalent poses are considered.

- Only (default): Only covalent poses will be considered
- Both: Covalent and non-covalent poses will be considered concomitantly.

Proton_Moved_To residue-name residue-number atom-name

The proton of a catalytic residue (e.g., serine hydroxyl group) will be moved to atom <atom_name> of residue <residue> (e.g., a neighboring histidine residue). Ex.: HISD 227 NE2

Max_Number_of_Warheads value

Any molecule with more than this number of warheads will be skipped. As a note, electrophiles such as epoxide count for 2 warheads (2 reactive carbon atoms).

Phosphonate_Group residue-name residue-number atom-name

If nucleophilic ligands are used, they can react with a phosphonate bound to a residue (e.g., reactivators of AChE). In that case, the phosphonate group should be given to FITTED. Ex.: Phosphonate_Group PHO P1

Output/convergence parameters

Diff_Avg_Best value

The absolute difference between the average energy of the population and the energy of the best individual of the population. If the calculated value is below Diff_Avg_Best, the population is considered to be converged. The default is 1.0.

Diff_N_Best value

The absolute difference in energy between the individual with the lowest energy and the individual ranked Diff_Number. If Diff_Number is defined the default value is 0.4.

Diff_Number value

The number of individuals to be used with Diff_N_Best. By default this criterion is not used.
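
The convergence criteria above can be sketched in Python as follows. This is an illustration under stated assumptions: lower energies are better, and meeting either criterion is treated as convergence (how FITTED combines the two is not specified here). The function name is hypothetical.

```python
def is_converged(energies, diff_avg_best=1.0, diff_number=None, diff_n_best=0.4):
    """energies: energies of the current population (lower is better)."""
    ranked = sorted(energies)
    best = ranked[0]
    average = sum(ranked) / len(ranked)
    # Criterion 1: the population average is close to the best individual.
    if abs(average - best) < diff_avg_best:
        return True
    # Criterion 2 (optional): only active when Diff_Number is given.
    if diff_number is not None and diff_number <= len(ranked):
        return abs(ranked[diff_number - 1] - best) < diff_n_best
    return False
```

For example, a population with energies -10.0, -9.8 and -9.5 has an average within 1.0 of the best individual and would be considered converged with the default Diff_Avg_Best.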

MaxSameEnergy_GA value

The maximum number of generations without any improvement in the best pose before the genetic algorithm exits. The default is the value given to Max_Gen.

Time_Limit value

Maximum time spent on the generation of an individual in the initial population (in seconds). The default is 10 seconds.

Time_Limit_Evolution value

Maximum time spent on the generation of an individual during the evolution (in seconds). The default is 10 seconds.

Print_Structures [ Final | Full | None ]

Controls the output of the structures during or at the end of the docking.

- Final (default): Only the final structures will be printed.
- Full: The structures (protein and ligand) will be printed during the run along with the final structures.
- None: No structures will be printed.

Print_Initial_Population [ Y | N ]

The initial population is output together with scores in CSV format. The default is N.

Print_Num_Structures value

Select how many of the top poses are printed as MOL2 files. The default is 1.

Print_CSV [ Y | N ]

This parameter allows the user to output the docking result information in CSV format.

All_Poses [ Y | N ]

Select whether a pose is printed out for every run (Y) or only the best over all the runs (N). The default is N.

Number_of_Best value

Select the number of individuals for which the score, energy and RMSD are printed during the run. The default is 10 in Mode Dock and 1 in Mode VS.

Print_Best_Every_X_Gen value

How often to print a summary of the run. The default is (Max_Gen + 1).

Print_Energy_Full [ Y | N ]

Controls the printout of the detailed energy contributions.

- Y (default): Print out a breakdown of the energy (bond energy, angle energy, etc.).
- N: only major terms (e.g., total energy, score) are printed out.

Printout_Residue_Interactions value
Residue name #1
Residue name #2

This parameter instructs FITTED to print out residue-specific interactions. The number of residues should be provided, followed by a list of residue names (one per line).

Printout_Key_Interactions [ Yes | No ]

This parameter instructs FITTED to print residue-specific interactions. Default = no.

Print_Internal_Energy_Full [ Y | N ]

In addition to Print_Energy_Full (above) which gives a full report on protein/ligand interaction, this parameter instructs FITTED to also output values of internal energy (bond, angle,…).

Printout_Pairwise_Interactions [ Yes | No ]

This parameter instructs FITTED to print atom-type pairwise interactions. Default = no.

Print_Proteins [ Y | N ]

Controls the printout of the protein conformation. While this is unnecessary in rigid protein docking, one may want to visualize the protein conformation generated by FITTED when flexible protein mode is selected.

Print_XYZ [ yes | no ]

The docked poses will be printed out in xyz format. Works only if the keyword “All_Poses” is set to no.

Data postprocessing parameters

Results_Files number-of-results-files-obtained-after-docking
name-of-results-file-1
name-of-results-file-2
…

If you ran a retrospective VS and you have results files for both actives and inactives, you must add the value “actives” or “inactives” after the name of the file. It is highly recommended to dock the actives and inactives separately for easier data processing.

Results_Mode type-of-docking

This parameter can have 4 values: covalent, non-covalent, cyp_inhibition, cyp_induction. Default = non-covalent.

Macromolecule protein-name-used-for-docking

This parameter can have 4 values: protein, metalloprotein, RNA, DNA. Default = protein/RNA/DNA.

Print_Energy_Full value-used-for-docking

This parameter can have 2 values: yes, no. Default = no

Compute_AUROC [ yes | no ]

If you ran a retrospective VS and you have files containing actives and inactives, you can choose to compute the AUROC. This keyword will also automatically compute enrichment factors, plot the true vs. false positives, and output the plot to the file AUROC.png.
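
For readers unfamiliar with these metrics, the following Python sketch shows one common way to compute them from the scores of actives and inactives. It is not the program's implementation. It assumes that a lower score is better, and the enrichment factor definition (active rate in the top fraction divided by the active rate overall) and the function names are choices made for this illustration.

```python
def auroc(active_scores, inactive_scores):
    """Probability that a random active is scored better (lower) than a
    random inactive; ties count as half."""
    wins = 0.0
    for a in active_scores:
        for i in inactive_scores:
            if a < i:
                wins += 1.0
            elif a == i:
                wins += 0.5
    return wins / (len(active_scores) * len(inactive_scores))


def enrichment_factor(active_scores, inactive_scores, fraction=0.1):
    """Ratio of the active rate in the top `fraction` of the ranked list
    to the active rate in the whole list."""
    ranked = sorted([(s, True) for s in active_scores] +
                    [(s, False) for s in inactive_scores])
    n_top = max(1, int(round(fraction * len(ranked))))
    actives_in_top = sum(1 for _, is_active in ranked[:n_top] if is_active)
    return (actives_in_top / n_top) / (len(active_scores) / len(ranked))
```

An AUROC of 0.5 corresponds to a random ranking, and a value of 1.0 means that every active is ranked ahead of every inactive.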

Print_AUROC [ yes | no ]

Use in conjunction with Compute_AUROC. If you desire to plot the AUROC, use this keyword.

Sort_By_Column_Heading column-heading

This parameter can have 3 values: RankScore, MatchScore, HybridScore.

Sort_By_Column_Number column-number

Sort by column number instead of column name.

Top_Compounds top-n-number-of-compounds

This parameter ensures that only the top N compounds are printed. Default = 1000

Score_Lower_Than score

This parameter ensures that only the compounds with a better score than the threshold selected will be considered. Default = 0.
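
The combined effect of the sorting and filtering keywords can be sketched as follows. This is an illustration, assuming that results are held as one dictionary per compound keyed by column heading, that a lower RankScore is better, and that the score threshold is applied before the top-N cut. The function name and data layout are hypothetical.

```python
def select_compounds(rows, sort_by="RankScore", top_compounds=1000, score_lower_than=0.0):
    """rows: list of dicts, one per docked compound, keyed by column heading."""
    kept = [r for r in rows if r[sort_by] < score_lower_than]
    kept.sort(key=lambda r: r[sort_by])  # most negative (best) first
    return kept[:top_compounds]


rows = [
    {"Name": "a", "RankScore": -7.1},
    {"Name": "b", "RankScore": 0.5},
    {"Name": "c", "RankScore": -9.3},
]
best = select_compounds(rows, top_compounds=2)
```

Here compound b is discarded by the score threshold and the remaining two are returned with the best score first.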

IMPACTS

Generic parameters

Main_Mode impacts

This parameter instructs FORECASTER to select a program. For IMPACTS, use “impacts”.

Run_Mode [ impacts | CYP_Inhibition | CYP_Induction ]

Following the keyword, specify the run mode which can be “impacts” or “all_cyps” if the seven available CYPs are to be processed.

Input files

Protein number-of-files
input file #1
input file #2
…

Following this keyword is the number of protein structure files used as input. These protein files should be prepared using ProCESS prior to the actual docking unless already predefined. On the following lines are the protein file names, one per line, without the file extension (.mol2).

For predefined CYPs, the files will be automatically found by IMPACTS in the package. Below is the list of file names to use:

- CYP1A2: 2HI4_pro
- CYP2C8: 2NNI_pro
- CYP2C9: 1R9O_pro
- CYP2C19: 4GQS_pro
- CYP2D6: 3QM4_pro
- CYP2E1: 3E6I_pro
- CYP3A4: 3NXUA_pro

Ligand2D file-name

Name of the ligand file (built in sketcher2D) used to first draw the molecule in 2D (in sdf format). If this file has been used to prepare the 3D structure using CONVERT, then it can be provided here. If the 3D structure comes from another source (different atom number,…), do not use this keyword.

Ligand file-name

Name of the ligand file to be docked (in MOL2 format). This ligand file should be prepared using SMART prior to the actual docking. The ligand file can contain a single molecule or multiple molecules (multi-mol2).

Binding_Site_Cav XXXX_BindSite.mol2

Following this keyword is the file defining the cavity present in the active site (a set of spheres prepared by ProCESS). If this parameter is missing, no cavity volume filter will be used (it is highly recommended to use both Interaction_Sites and Binding_site_cav parameters).

For predefined CYPs, the files will be automatically found by IMPACTS in the package. Below is the list of file names to use (if IMPACTS is parametrized to consider more than one CYP, the Heme residues would be provided in the same order as were the protein files above):

- CYP1A2: Binding_Site_Cav 1A2_BindSite.mol2
- CYP2C8: Binding_Site_Cav 2C8_BindSite.mol2
- CYP2C9: Binding_Site_Cav 2C9_BindSite.mol2
- CYP2C19: Binding_Site_Cav 2C19_BindSite.mol2
- CYP2D6: Binding_Site_Cav 2D6_BindSite.mol2
- CYP2E1: Binding_Site_Cav 2E1_BindSite.mol2
- CYP3A4: Binding_Site_Cav 3A4_BindSite.mol2

Interaction_Sites XXXX_IS.mol2

Name of the file containing the interaction site description (prepared by ProCESS). If this parameter is missing, no interaction site filter will be used. (It is highly recommended to use both Interaction_Sites and Binding_site_cav). For predefined CYPs, use “none”.

Heme_Residue heme-residue-name

This parameter instructs IMPACTS to consider this heme (may be more than one in the structure) for metabolism prediction. If IMPACTS is parametrized to consider more than one CYP, the Heme residues would be provided in the same order as were the protein files above but with the keyword Heme_Residue only for the first one.

- CYP1A2: Heme_Residue HEM900
- CYP2C8: Heme_Residue HEM500
- CYP2C9: Heme_Residue HEM500
- CYP2C19: Heme_Residue HEM501
- CYP2D6: Heme_Residue HEM502
- CYP2E1: Heme_Residue HEM500
- CYP3A4: Heme_Residue HEM508
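
The predefined file names and heme residues listed in this section can be gathered in a small lookup table. The Python sketch below builds the input-file keyword lines for a single predefined CYP. The table values are copied from the lists above, while the helper name and the idea of generating the lines with a script are assumptions made for illustration (the user interface remains the recommended way to create parameter files).

```python
# Values copied from the lists of predefined CYP files in this section.
PREDEFINED_CYPS = {
    "CYP1A2": ("2HI4_pro", "1A2_BindSite.mol2", "HEM900"),
    "CYP2C8": ("2NNI_pro", "2C8_BindSite.mol2", "HEM500"),
    "CYP2C9": ("1R9O_pro", "2C9_BindSite.mol2", "HEM500"),
    "CYP2C19": ("4GQS_pro", "2C19_BindSite.mol2", "HEM501"),
    "CYP2D6": ("3QM4_pro", "2D6_BindSite.mol2", "HEM502"),
    "CYP2E1": ("3E6I_pro", "2E1_BindSite.mol2", "HEM500"),
    "CYP3A4": ("3NXUA_pro", "3A4_BindSite.mol2", "HEM508"),
}


def impacts_input_lines(cyp):
    """Return the input-file keyword lines for one predefined CYP."""
    protein, cavity, heme = PREDEFINED_CYPS[cyp]
    return [
        "Protein 1",
        protein,
        f"Binding_Site_Cav {cavity}",
        "Interaction_Sites none",  # "none" is used for predefined CYPs
        f"Heme_Residue {heme}",
    ]


print("\n".join(impacts_input_lines("CYP2D6")))
```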

Run parameters

Mode [ Metabolism ]

This parameter instructs FITTED routines used in IMPACTS to run in IMPACTS mode.

Force_SOM [ Y | N ]

This parameter instructs IMPACTS to work only with the identified site of metabolism. These are designated by a “*” next to the atom name in the mol2 file.

Systematic_Search [ Y | N ]

This parameter instructs IMPACTS to investigate each site of metabolism one by one until they are all looked at.

Parameters [ Manual | Auto ]

This parameter instructs FITTED and IMPACTS to derive missing force field parameters automatically or not.

Lambda_Scale value

IMPACTS models the transition states as linear combinations of reactant and products. By default (Lambda_Scale=0.5), TS = 0.5 reactant + 0.5 product. A lower value would make the TS more reactant-like and a value greater than 0.5 more product-like.
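
The linear combination can be written as TS = (1 - Lambda_Scale) × reactant + Lambda_Scale × product. The Python sketch below applies it to a single numeric parameter. It assumes that the two weights always sum to 1, which is consistent with the default and with the reactant-like/product-like behavior described above but is not stated explicitly. The function name and the example values are hypothetical.

```python
def transition_state_value(reactant, product, lambda_scale=0.5):
    """Interpolate one parameter between its reactant and product values."""
    return (1.0 - lambda_scale) * reactant + lambda_scale * product


# Hypothetical parameter values, for illustration only.
midpoint = transition_state_value(1.0, 2.0)          # default, halfway
early_ts = transition_state_value(1.0, 2.0, 0.25)    # more reactant-like
```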

Flex_Type [ Rigid | Semiflex | Flex_water | Flex ]

- Rigid (default if only one protein structure is used): The ligand is docked onto one protein structure.
- Semiflex (default if more than one protein structure is used): The ligand is docked onto multiple protein structures (requires Protein ≥ 2). Proteins can be exchanged during the evolution but not the genes corresponding to side chains or water molecules (a more complete description of this mode is given in reference 1).
- Flex_water: The ligand is docked into multiple protein structures (requires Protein ≥ 2). Similar to Semiflex, except that each water molecule evolves independently.
- Flex: The ligand is docked onto multiple protein structures (requires Protein ≥ 2). The side chains and waters are allowed to be exchanged independently from the protein backbone.

Number_of_Runs number-of-runs

More than one run per ligand can be performed (the ligand may be docked several times to ensure a complete search). If this keyword is missing, the default value is 1.

Number_of_Runs_Per_Site number-of-runs-per-site-of-metabolism

This defines how many times a site can be selected in the final proposed docked metabolites. For example, if this is set to 1 and a site of metabolism is identified in the first run, it is removed from the list of potential sites of metabolism for the subsequent runs. The default value is 3.
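
The bookkeeping described for Number_of_Runs_Per_Site can be sketched in Python as follows. This is only an illustration of the rule as stated. The function name, the atom labels, and the way sites are tracked are hypothetical.

```python
from collections import Counter


def remaining_sites(candidate_sites, selected_so_far, runs_per_site=3):
    """Return the sites that may still be proposed in subsequent runs."""
    counts = Counter(selected_so_far)
    return [site for site in candidate_sites if counts[site] < runs_per_site]


# With Number_of_Runs_Per_Site = 1, a site found in the first run is dropped.
sites = ["C3", "C7", "N1"]
after_first_run = remaining_sites(sites, ["C3"], runs_per_site=1)
```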

Displaceable_Waters [ On | Off ]

Allows the user to turn off the displaceable waters. The default is on which allows displaceable waters.

Particle_Waters [ Y | N ]

Instructs the program to use particle waters (these must have been added beforehand by PREPARE). Default is No.

Corner_Flap [ On | Off ]

Turns the corner flap conformational search for rings on or off. By default, it is set to Off.

Conjugate gradient/LBFGS parameters

The default values for all the keywords described in this section are recommended.

Minimization_Algorithm [ Steepest | Conjugate | LBFGS ]

There are three available optimizers: steepest descent, conjugate gradient and LBFGS. The default is conjugate.

GA_* or GI_* value

There are two sets of the following keywords: one for the parameters used during the generation of the initial population (GI_*; e.g., GI_MaxIter) and another one used during the evolution (GA_*; e.g., GA_MaxIter). The default values are recommended.

XX_MaxIter value

Maximum number of iterations. Once this number is reached the minimization is finished. The default is 20.

XX_StepSize value

Initial value of the step taken in the direction of the gradient during minimization. The default is 0.02.

XX_MaxStep value

Maximum step size allowed during minimization. The default is 1.

XX_EnergyBound value

Minimum energy difference between two molecules to be considered similar. The default is 1.0 for GI_EnergyBound and 0.001 for GA_EnergyBound.

XX_MaxSameEnergy value

Number of times that the same energy (defined by EnergyBound) can be repeated. The default is 3.

XX_MaxGrad value

Gradient convergence criteria. The default is 0.001.

CGMaxCosine value

Maximum change in direction (cosine of this angle) accepted to run a conjugate gradient. If the direction is greater than this limit, steepest descent is used. The default is 1 (maximum change in direction is 180 degrees, equivalent to function turned off).

CGResetSD value

This parameter designates the number of conjugate gradient steps before the algorithm switches to steepest descent for a single step (i.e., resets the direction to the gradient). The default is -1 (function turned off).

Energy parameters

Score_Initial [ none | score | minimize ]

Scoring of the initial ligand binding mode.

- none (default): No scoring of the initial input structure is performed.
- score: Only the score of the initial input ligand is output.
- minimize: The score of the initial pose and the score of the energy minimized structure will be output.

VdWScale_1-4 value

Scaling factor for the 1,4 van der Waals interactions. The default is 1.0.

VdWScale_1-5 value

Scaling factor for the 1,5 van der Waals interactions. The default is 1.0.

E_VdWScale_Pro value

Scaling factor for the ligand-protein van der Waals interactions. The default is 1.0. For IMPACTS 0.5 is recommended.

E_VdWScale_Wat value

Scaling factor for the ligand-water van der Waals interactions. By default, it is set to the same value as E_VdWScale_Pro.

ElecScale_1-4 value

Scaling factor for the 1,4 electrostatic interactions. The default is 1.0.

ElecScale_1-5 value

Scaling factor for the 1,5 electrostatic interactions. The default is 1.0.

E_ElecScale_Pro value

Scaling factor for the ligand-protein electrostatic interactions. The default is 1.0. For IMPACTS 4.0 is recommended.

E_ElecScale_Metal value

Scaling factor for the ligand-metal electrostatic interactions. This type of interaction does not apply to specifically designed zinc and iron interaction used if the Macromolecule is set to Metalloprotein. The default is 1.0.

E_CoordinationScale_Metal value

Scaling factor for the ligand-metal electrostatic interactions. This type of interaction does not apply to specifically designed zinc and iron interaction used if the Macromolecule is set to Metalloprotein. The default is 1.0.

E_HBondScale_Metal value

Scaling factor for the ligand-metal electrostatic interactions. This type of interaction applies to specifically designed zinc and iron interactions used if the Macromolecule is set to Metalloprotein_HB and to any specific catalytic hydrogen bonds in zinc metalloprotein (ex.: with neighboring Glu or His residues) if the Macromolecule is set to Metalloprotein. The default is 1.0.

E_ElecScale_Wat value

Scaling factor for the ligand-water electrostatic interactions. The default value is set the same as E_ElecScale_Pro.

E_EactScale value

Scaling factor for the energy of activation. The default value is 17.

E_FukuiScale value

Scaling factor for the reactivity index (Fukui coefficient). The default value is 0.

E_HbondScale_Pro value

Scaling factor for the ligand-protein hydrogen bond interactions. The default is 1.0.

E_HbondScale_Wat value

Scaling factor for the ligand-water hydrogen bond interactions. The default value is set the same as E_HbondScale_Pro.

Cutdist value

Cutoff distance (in Å) for the non-bond interactions with the protein. The default value is 9.

Switchdist value

Switching distance (in Å) for the non-bond interactions with the protein. The default value is 7.

Cutdist_Wat value

Cutoff distance for the non-bond interactions with the water molecules. The default value is 1.20.

Switchdist_Wat value

Switching distance for the non-bond interactions with the water molecules. The default is 1.75.

Solvation [ On | Off ]

Allows the user to turn off the calculation of the solvation energy. The default is On.

GB_Epsilon value

Generalized Born / surface area (GB/SA) is used to compute solvation changes upon binding. The dielectric constant value is required. The default is 78.0.

E_Entropy value

PROCESS identifies flexible residues and labels them. When FITTED reads these labels, it scales down interactions with these atoms to account, approximately, for the entropy cost associated with freezing these flexible side chains. To further increase this effect, use a value greater than 1. A value of 0 turns off this effect. The default is 1.

Scoring parameters

The default values for all the keywords are highly recommended as they represent the scaling factors optimized for RankScore. Please contact us if you need to change the keywords.

Several flavours of our scoring function have been developed over the years. The deep neural network program can also be used to train a decision model (active/inactive). By default, RankScore 7 is used unless particle waters are present. In the latter case, RankScore 5 is the default.

S_Eact value

Weight of the energy of activation in the final score. The default is 1.00.

S_FukuiScale value

Weight of the reactivity of the site of metabolism as defined by the Fukui coefficient. The default is 1.00.

Initial population parameters

Pop_Size value

Population size for the genetic algorithm conformational search. When 10000 is given as value, automatic determination based on the ligand’s number of torsions is done. The default is automatic for rigid docking, 200 for flexible docking when keyword is omitted.

Resolution value

Resolution of the torsion rotation when randomly generating a new individual. The default is 120 degrees.

GI_Initial_E value

Any randomly generated individual will be discarded before being energy-minimized if greater than this energy. The default is 1.0 e10 kcal/mol.

GI_Minimized_E value

Any randomly generated individual will be discarded after being energy-minimized if the energy relative to the input conformation energy is greater than this energy. The default is 1000 kcal/mol.

Min_MatchScore value

Minimum match of the interaction sites. This keyword is used only if an interaction site file is provided. If the Mode is set to Dock, Min_MatchScore is automatically calculated. The default is 20.

Min_PharmScore value

This keyword is used only if a pharmacophore file is provided. Minimum percent match of the pharmacophore. The default is 100.

Max_Steric_Clash_Flexible_Residue value

If two water molecules and/or flexible side chains are within this distance, they are considered clashing. Default = 1.5 Å.

Max_Num_Steric_Clashes value

This parameter defines the acceptable number of clashes (see above for the definition of a clash). Default = 0.

Anchor_Atom atom-number

Sequence number of the atom to be used as an anchor. This is used to identify the center of translation and rotation for the GA. If this keyword is not specified, the anchor is automatically set to the gravity center of the ligand.

Anchor_Coor x y z

Following this keyword must be the x, y and z coordinates of the protein active site center. If this keyword is not used, it is automatically set to the center of the protein active site defined by the active site (flexible) residues.

Max_Tx x

Max_Ty y

Max_Tz z

Maximum value for translation (in Å) in x, y, and z respectively. The default is 5 Å for the three values.

Max_Rxy x

Max_Ryz y

Max_Rxz z

Maximum value for rotation (in degrees) around x, y, and z axes respectively during a mutation. The default is 30 degrees.

GI_Num_of_Trials value

Maximum number of successive unsuccessful trials before exiting. The default for Mode Dock is 10,000 and for Mode VS is 1,000.

Matching_Algorithm [ On | Off ]

Turns on or off the matching algorithm. By default, it is set to On. For IMPACTS, it is recommended to turn it off.

Num_of_Top_IS value

Number of top interaction sites considered: each interaction site triangle must contain at least one of them. The default is 10.

Stringent_Triangles value

Factor by which the triangles are selected. The higher Stringent_Triangles is set, the more the matching algorithm will favor triangles that have not been used. The default value is 5.

Stringent_MS value

Weight factor used in the calculation of Min_MatchScore. The higher this value, the stricter Min_MatchScore becomes. The default value is 4.

Genetic algorithm parameters

Max_Gen value

Determines the maximum number of generations for the genetic algorithm. The default is 175.

Max_Gen_1 value

If after Max_Gen_1 generations none of the top poses has a score below the one specified by CutScore_1 or a MatchScore higher than Cutoff_MScore_1, the program exits. Otherwise, the program proceeds until it reaches Max_Gen_2. Default = Max_Gen.

CutScore_1 value

Upper bound score at Max_Gen_1 to further proceed with the docking run. If at least one individual within the top 3 scores below CutScore_1, the program proceeds to Max_Gen_2. Default = -5.0.

Cutoff_MScore_1 value

Lower bound MatchScore at Max_Gen_1 to further proceed with the docking run. Default = 15.0.

Max_Gen_2 value

As for Max_Gen_1, if after Max_Gen_2 generations none of the top poses has a score below the one specified by CutScore_2, the program exits. Otherwise, the program proceeds until it reaches Max_Gen. Default = Max_Gen.

CutScore_2 value

Upper bound score at Max_Gen_2 to further proceed with the docking run. If at least one individual within the top 3 scores below CutScore_2, the program proceeds to Max_Gen. Default = -7.5.

Cutoff_MScore_2 value

Lower bound MatchScore at Max_Gen_2 to further proceed with the docking run. Default = 20.0.

Seed value

Select the starting point within the random number generator. If the same run is done with the same seed, the exact same result will be obtained. If a different seed is used, the GA will follow a different path. Changing the seed helps the developers to evaluate the convergence of a run. The default is 100.

Parent_Selection [ Random | Tournament | Islands ]

Method to select parents who will produce children. The default is Random.

- Random: parents are randomly selected from the pool of individuals.
- Tournament: parents are selected using a tournament in which a randomly selected set of candidates is ranked by binding energy and the best ones are selected.
- Islands: evolution takes place on separate islands and both parents must come from the same island.

Tournament_Size value

If the parameter Parent_Selection (above) is set to Tournament, a number of candidates per tournament must be given. Default = 2.

Num_Of_Islands value

If the parameter Parent_Selection (above) is set to Islands, a number of islands must be given. Default = 5.

pLearn value

Probability of energy minimization of the parents at every generation. The default is 0.1.

pCross value

Probability of crossover at every generation. The default is 0.85.

pMut value

Probability of mutation at every generation. The default is 0.05.

pMutRot value

Probability of mutation of the orientation of the ligand at every generation. The default is 0.30.

pMutWat value

The maximum rate of mutation of the water at Max_Gen generations. The default is 0.35.

pElite value

The percentage of the best of the population to be directly passed on to the next generation. The default is 0.01.

pElite_Every_X_Gen value

pElite is applied every pElite_Every_X_Gen generations. The default is 2.

pElite_SSize value

The individual to be passed directly on to the next generation is selected randomly from the top pElite_SSize individuals of the population. The default is 10.

pOpt value

Probability of optimization of the ligand at every generation. The default is 0.20.

Evolution [ Steady_State | Metropolis | Elite ]

- Steady_State (default): During the evolution, out of a pair of two children and their 2 parents the two best will be saved.
- Metropolis: During the evolution, out of a pair of two children and their 2 parents two individuals will be saved following the Metropolis criterion. If the children are higher in energy they are checked to see if they have a high probability to exist at room temperature. If they do they are saved.
- Elite: During the evolution, the top pop_size individuals of the children and parents will be kept for the next generation.

GA_Num_of_Trials value

Maximum number of successive unsuccessful trials to create children. The default is 1000.

Output/convergence parameters

Diff_Avg_Best value

The absolute difference between the average energy of the population and the energy of the best individual of the population. If the calculated value is below Diff_Avg_Best, the population is considered to be converged. The default is 1.0.

Diff_N_Best value

The absolute difference in energy between the individual with the lowest energy and the individual ranked Diff_Number. If Diff_Number is defined the default value is 0.4.

Diff_Number value

The number of individuals to be used with Diff_N_Best. By default this criterion is not used.

MaxSameEnergy_GA value

The maximum number of generations without any improvement in the best pose before the genetic algorithm exits. The default is the value given to Max_Gen.

Time_Limit value

Maximum time spent on the generation of an individual in the initial population (in seconds). The default is 10 seconds.

Time_Limit_Evolution value

Maximum time spent on the generation of an individual during the evolution (in seconds). The default is 10 seconds.

Print_Structures [ Final | Full | None ]

Controls the output of the structures during or at the end of the docking.

- Final (default): Only the final structures will be printed.
- Full: The structures (protein and ligand) will be printed during the run along with the final structures.
- None: No structures will be printed.

Print_Num_Structures value

Select how many of the top poses are printed as MOL2 files. The default is 1.

Print_Initial_Population [ Y | N ]

The initial population is output together with scores in CSV format. The default is N.

All_Poses [ Y | N ]

Select whether a pose is printed out for every run (Y) or only the best over all the runs (N). The default is N.

Number_of_Best value

Select the number of individuals for which the score, energy and RMSD are printed during the run. The default is 1.

Print_Best_Every_X_Gen value

How often to print a summary of the run. The default is (Max_Gen + 1).

Print_Energy_Full [ Y | N ]

Controls the printout of the detailed energy contributions.

- Y (default): Print out a breakdown of the energy (bond energy, angle energy, etc.).
- N: only major terms (e.g., total energy, score) are printed out.

Print_Proteins [ Y | N ]

Controls the printout of the protein conformation. While this is unnecessary in rigid protein docking, one may want to visualize the protein conformation generated by FITTED when flexible protein mode is selected.

Equivalency_SOM [ Y | N ]

This parameter allows IMPACTS to print out the carbon oxidized when hydrogen abstraction is found. If Y (default) is selected, the carbons and not the hydrogen atoms are listed as oxidized. Otherwise (N), the hydrogens are printed out.

CONVERT

Generic parameters

Main_Mode convert

This parameter instructs FORECASTER to select a program. For CONVERT, use “convert”.

Following the keyword, specify the run mode which can be “convert”.

- 2d_to_3d: instructs the program to convert a 2D structure into a 3D structure.
- 3d_to_2d: instructs the program to convert a 3D structure into a 2D structure.
- full_optimization: instructs the program to optimize the obtained 3D structure using a genetic algorithm (either from 2D or 3D).
- keep2D: instructs the program to keep the 2D structure but to carry out the other actions (eg, adding hydrogens, identifying chiral centers).
- scan: instructs the program to add functional groups to a scaffold using some of the keywords listed below (eg, Scan_Group, Scan_Atoms).
- scan_minimize: instructs the program to add functional groups to a scaffold using some of the keywords listed below (eg, Scan_Group, Scan_Atoms) and to optimize the structure using the energy-minimizer.
- isomerize: instructs the program to generate all the stereoisomers of a molecule. This function works only in 2D.
- add_hydrogens: instructs the program to add hydrogens to an input molecule.
- remove_hydrogens: instructs the program to remove hydrogens from a molecule.

Molecule molfilename

This parameter instructs CONVERT to read in the file identified by the file name.

Tautomers [ Y | N ]

This parameter instructs CONVERT to generate tautomers (if it finds some) or not. If this keyword is missing, the program will not generate tautomers.

Molecule_Name_Field value

In some sdf files, the names of the molecules are not at the top of the structural data but rather given in a field at the end of each molecule. In this case you may provide this descriptor to CONVERT.

Max_Num_Of_Atoms value

This parameter instructs CONVERT to disregard molecules with more than this number of atoms or bonds. Default = 150 (for both atoms and bonds).

Maximum_Protonation value

This parameter instructs CONVERT to deprotonate molecules that carry too many protons.

Solvent [ water | organic_solvent ]

If water is selected, CONVERT may add hydrogens to protonate nitrogens and ionize carboxylic acids (polar protic solvent). If organic_solvent is selected, amines and carboxylic acids will remain neutral (apolar aprotic solvent). If hydrogens have already been added, CONVERT will not change the ionization state.

Rotate x y z

This parameter instructs CONVERT to rotate a molecule. For example, Rotate 30 30 30 will rotate the molecule by 30 degrees around the x, y and z axes.

Center [ Y | N ]

When converting sdf to mol2 (adding hydrogens,…), the molecule is centered to 0,0,0 by default (Center yes). This function can be disabled using Center No.

Racemic [ Y | N ]

This parameter instructs CONVERT to generate all the stereoisomers of a molecule.

Scan_Group number-of-functional-groups
functional-group-#1
functional-group-#2
…

This parameter instructs CONVERT to add a functional group to a molecule. The allowed groups are:

SMe NMePh NMe2 CONHMe NHPh COOH OH OMe OAc NHMe NHAc Ph Me OPh SPh OPhCF3 PhCF3 OPhNMe2 PhNMe2 C%CH NMeAc SO2Me NHSO2Me CH=CH2 CH=CH-CH=CH2 CH=CH-CH=CH-CH=CH2 CH=CH-CH3 CH2CH=CH2 F Br I Indole CF3 Cl NO2 NH2 CN COOMe

The atom which will be modified should be identified using “Scan_Atoms” and “Scaffold_Type” keywords described below.

Scan_Atoms number-of-atoms-to-be-changed
atom-number-#1
atom-number-#2
…

This parameter identifies the atoms which will bear the additional functional group. For example, Scan_Atoms 3 would use the 3 atoms whose numbers are given on the following lines, remove the hydrogens connected to them and replace them in turn with the different groups listed with the keyword “Scan_Group”.

Scaffold_Type [ Aromatic | Aliphatic ]

This parameter helps CONVERT generate functionalized scaffolds with proper geometry.

Alternative [ Y | N ]

This parameter controls whether all the groups are added to the same molecule (N) or a single group is added at a time, producing the corresponding number of monofunctionalized molecules (Y). Default is Y.

Generic_Group value

This parameter instructs CONVERT to replace a generic group such as R or G by the groups defined using the keyword Scan_Group (above). If the group to replace is simply a hydrogen, ignore this keyword.

Input files

Protein number-of-files
input file #1
input file #2
…

Following this keyword is the number of protein structure files used as input. These protein files should be prepared using ProCESS prior to the actual docking unless already predefined. On the following lines are the protein file names, one per line, without the file extension (.mol2).

For predefined CYPs, the files will be automatically found by IMPACTS in the package. Below is the list of file names which should be used:

- CYP1A2: 2HI4_pro
- CYP2C8: 2NNI_pro
- CYP2C9: 1R9O_pro
- CYP2C19: 4GQS_pro
- CYP2D6: 3QM4_pro
- CYP2E1: 3E6I_pro
- CYP3A4: 3NXUA_pro

Ligand2D file-name

Name of the ligand file (built in sketcher2D) used to first draw the molecule in 2D (in sdf format). If this file has been used to prepare the 3D structure using CONVERT, then it can be provided here. If the 3D structure comes from another source (different atom number,…), do not use this keyword.

Ligand file-name

Name of the ligand file to be docked (in MOL2 format). This ligand file should be prepared using SMART prior to the actual docking. The ligand file can contain a single molecule or multiple molecules (multi-mol2).

Binding_Site_Cav XXXX_BindSite.mol2

Following this keyword is the file defining the cavity present in the active site (a set of spheres prepared by ProCESS). If this parameter is missing, no cavity volume filter will be used (it is highly recommended to use both the Interaction_Sites and Binding_Site_Cav parameters).

For predefined CYPs, the files will be automatically found by IMPACTS in the package. Below is the list of file names which should be used (if IMPACTS is parametrized to consider more than one CYP, the Heme residues would be provided in the same order as were the protein files above):

- CYP1A2: Binding_Site_Cav 1A2_BindSite.mol2
- CYP2C8: Binding_Site_Cav 2C8_BindSite.mol2
- CYP2C9: Binding_Site_Cav 2C9_BindSite.mol2
- CYP2C19: Binding_Site_Cav 2C19_BindSite.mol2
- CYP2D6: Binding_Site_Cav 2D6_BindSite.mol2
- CYP2E1: Binding_Site_Cav 2E1_BindSite.mol2
- CYP3A4: Binding_Site_Cav 3A4_BindSite.mol2

Interaction_Sites XXXX_IS.mol2

Name of the file containing the interaction site description (prepared by ProCESS). If this parameter is missing, no interaction site filter will be used (it is highly recommended to use both Interaction_Sites and Binding_Site_Cav). For predefined CYPs, use “none”.

Heme_Residue heme-residue-name

This parameter instructs IMPACTS to consider this heme (may be more than one in the structure) for metabolism prediction. If IMPACTS is parametrized to consider more than one CYP, the Heme residues would be provided in the same order as were the protein files above but with the keyword Heme_Residue only for the first one.

- CYP1A2: Heme_Residue HEM900
- CYP2C8: Heme_Residue HEM500
- CYP2C9: Heme_Residue HEM500
- CYP2C19: Heme_Residue HEM501
- CYP2D6: Heme_Residue HEM502
- CYP2E1: Heme_Residue HEM500
- CYP3A4: Heme_Residue HEM508

Run parameters

Mode [ Metabolism ]

This parameter instructs FITTED routines used in IMPACTS to run in IMPACTS mode.

Force_SOM [ Y | N ]

This parameter instructs IMPACTS to work only with the identified sites of metabolism. These are designated by a “*” next to the atom name in the mol2 file.

Systematic_Search [ Y | N ]

This parameter instructs IMPACTS to investigate each site of metabolism one by one until they are all looked at.

Parameters [ Manual | Auto ]

This parameter instructs FITTED and IMPACTS to derive missing force field parameters automatically or not.

Lambda_Scale value

IMPACTS models the transition states as linear combinations of reactant and products. By default (Lambda_Scale=0.5), TS = 0.5 reactant + 0.5 product. A lower value would make the TS more reactant-like and a value greater than 0.5 more product-like.
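
As a rough illustration of the linear combination described above, the sketch below blends a reactant value and a product value with the weight given by Lambda_Scale. The function name, the 0 to 1 range check and the example distances are assumptions made for illustration only; how IMPACTS applies the combination internally is not described here.

```python
def transition_state_value(reactant: float, product: float, lambda_scale: float = 0.5) -> float:
    """Linear combination TS = (1 - lambda) * reactant + lambda * product."""
    # Assumption: Lambda_Scale is taken to lie between 0 (pure reactant) and 1 (pure product).
    if not 0.0 <= lambda_scale <= 1.0:
        raise ValueError("lambda_scale is assumed to lie between 0 and 1")
    return (1.0 - lambda_scale) * reactant + lambda_scale * product


# Hypothetical distances (in Å) for one bond in the reactant and in the product.
r_reactant, r_product = 1.09, 2.50
for lam in (0.3, 0.5, 0.7):
    # Lower values give a more reactant-like TS, higher values a more product-like TS.
    print(lam, round(transition_state_value(r_reactant, r_product, lam), 3))
```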

Flex_Type [ Rigid | Semiflex | Flex_water | Flex ]

- Rigid (default if only one protein structure is used): The ligand is docked onto one protein structure.
- Semiflex (default if more than one protein structure is used): The ligand is docked onto multiple protein structures (requires Protein ≥ 2). Proteins can be exchanged during the evolution but not the genes corresponding to side chains or water molecules (a more complete description of this mode is given in reference 1).
- Flex_water: The ligand is docked into multiple protein structures (requires Protein ≥ 2). Similar to Semiflex, except that each water molecule evolves independently.
- Flex: The ligand is docked onto multiple protein structures (requires Protein ≥ 2). The side chains and waters are allowed to be exchanged independently from the protein backbone.

Number_of_Runs number-of-runs

More than one run per ligand can be performed (the ligand may be docked several times to ensure a complete search). If this keyword is missing, the default value is 1.

Number_of_Runs_Per_Site number-of-runs-per-site-of-metabolism

This defines how many times a site can be selected in the final proposed docked metabolites. For example, if this is set to 1 and a site of metabolism is identified in the first run, it is removed from the list of potential sites of metabolism for the subsequent runs. The default value is 3.

Displaceable_Waters [ On | Off ]

Allows the user to turn off the displaceable waters. The default is on which allows displaceable waters.

Particle_Waters [ Y | N ]

Instructs the program to use particle waters (these must have been added beforehand by PREPARE). Default is No.

Corner_Flap [ On | Off ]

Turns the corner flap conformational search for rings on or off. By default, it is set to Off.

Conjugate gradient/LBFGS parameters

The default values for all the keywords described in this section are recommended.

Minimization_Algorithm [ Steepest | Conjugate | LBFGS ]

There are three available optimizers: steepest descent, conjugate gradient and LBFGS. The default is conjugate.

GA_* or GI_* value

There are two sets of the following keywords: one for the parameters used during the generation of the initial population (GI_*; e.g., GI_MaxIter) and another one used during the evolution (GA_*; e.g., GA_MaxIter). The default values are recommended.

XX_MaxIter value

Maximum number of iterations. Once this number is reached the minimization is finished. The default is 20.

XX_StepSize value

Initial value of the step taken in the direction of the gradient during minimization. The default is 0.02.

XX_MaxStep value

Maximum step size allowed during minimization. The default is 1.

XX_EnergyBound value

Minimum energy difference between two molecules to be considered similar. The default is 1.0 for GI_EnergyBound and 0.001 for GA_EnergyBound.

XX_MaxSameEnergy value

Number of times that the same energy (defined by EnergyBound) can be repeated. The default is 3.

XX_MaxGrad value

Gradient convergence criterion. The default is 0.001.

CGMaxCosine value

Maximum change in direction (cosine of this angle) accepted to run a conjugate gradient. If the direction is greater than this limit, steepest descent is used. The default is 1 (maximum change in direction is 180 degrees, equivalent to function turned off).

CGResetSD value

This parameter designates the number of conjugate gradient steps before the algorithm switches to steepest descent for a single step (i.e., resets the direction to the gradient). The default is -1 (function turned off).

Energy parameters

Score_Initial [ none | score | minimize ]

Scoring of the initial ligand binding mode.

- none (default): No scoring of the initial input structure is performed.
- score: Only the score of the initial input ligand is output.
- minimize: The score of the initial pose and the score of the energy minimized structure will be output.

VdWScale_1-4 value

Scaling factor for the 1,4 van der Waals interactions. The default is 1.0.

VdWScale_1-5 value

Scaling factor for the 1,5 van der Waals interactions. The default is 1.0.

E_VdWScale_Pro value

Scaling factor for the ligand-protein van der Waals interactions. The default is 1.0. For IMPACTS 0.5 is recommended.

E_VdWScale_Wat value

Scaling factor for the ligand-water van der Waals interactions. By default, it is set to the same value as E_VdWScale_Pro.

ElecScale_1-4 value

Scaling factor for the 1,4 electrostatic interactions. The default is 1.0.

ElecScale_1-5 value

Scaling factor for the 1,5 electrostatic interactions. The default is 1.0.

E_ElecScale_Pro value

Scaling factor for the ligand-protein electrostatic interactions. The default is 1.0. For IMPACTS 4.0 is recommended.

E_ElecScale_Metal value

Scaling factor for the ligand-metal electrostatic interactions. This type of interaction does not apply to specifically designed zinc and iron interaction used if the Macromolecule is set to Metalloprotein. The default is 1.0.

E_CoordinationScale_Metal value

Scaling factor for the ligand-metal electrostatic interactions. This type of interaction does not apply to specifically designed zinc and iron interaction used if the Macromolecule is set to Metalloprotein. The default is 1.0.

E_HBondScale_Metal value

Scaling factor for the ligand-metal electrostatic interactions. This type of interaction applies to specifically designed zinc and iron interactions used if the Macromolecule is set to Metalloprotein_HB and to any specific catalytic hydrogen bonds in zinc metalloproteins (e.g., with neighboring Glu or His residues) if the Macromolecule is set to Metalloprotein. The default is 1.0.

E_ElecScale_Wat value

Scaling factor for the ligand-water electrostatic interactions. By default, it is set to the same value as E_ElecScale_Pro.

E_EactScale value

Scaling factor for the energy of activation. The default value is 17.

E_FukuiScale value

Scaling factor for the reactivity index (Fukui coefficient). The default value is 0.

E_HbondScale_Pro value

Scaling factor for the ligand-protein hydrogen bond interactions. The default is 1.0.

E_HbondScale_Wat value

Scaling factor for the ligand-water hydrogen bond interactions. The default value is set the same as E_HbondScale_Pro.

Cutdist value

Cutoff distance (in Å) for the non-bond interactions with the protein. The default value is 9.

Switchdist value

Switching distance (in Å) for the non-bond interactions with the protein. The default value is 7.

Cutdist_Wat value

Cutoff distance for the non-bond interactions with the water molecules. The default value is 1.20.

Switchdist_Wat value

Switching distance for the non-bond interactions with the water molecules. The default is 1.75.

Solvation [ On | Off ]

Allows the user to turn off the calculation of the solvation energy. The default is On.

GB_Epsilon value

Generalized Born / surface area (GB/SA) is used to compute solvation changes upon binding. The dielectric constant value is required. The default is 78.0.

E_Entropy value

PROCESS identifies flexible residues and labels them. When FITTED reads these labels, it scales down interactions with these atoms to account, to some extent, for the entropy cost associated with the freezing of these flexible side chains. If the user wants to further increase this impact, a value greater than one should be used. A value of 0 turns off this effect. The default is 1.

Scoring parameters

The default values for all the keywords are highly recommended as they represent the scaling factors optimized for RankScore.
Please contact us if you need to change the keywords.

Several flavors of our scoring function have been developed over the years. The deep neural network program can also be used to train a decision model (active/inactive). By default, RankScore 7 is used unless particle waters are present. In the latter case, RankScore 5 is the default.

S_Eact value

Weight of the energy of activation in the final score. The default is 1.00.

S_FukuiScale value

Weight of the reactivity of the site of metabolism as defined by the Fukui coefficient. The default is 1.00.

Initial population parameters

Pop_Size value

Population size for the genetic algorithm conformational search. When 10000 is given as value, automatic determination based on the ligand’s number of torsions is done. The default is automatic for rigid docking, 200 for flexible docking when keyword is omitted.

Resolution value

Resolution of the torsion rotation when randomly generating a new individual. The default is 120 degrees.

GI_Initial_E value

Any randomly generated individual will be discarded before being energy-minimized if its energy is greater than this value. The default is 1.0e10 kcal/mol.

GI_Minimized_E value

Any randomly generated individual will be discarded after being energy-minimized if the energy relative to the input conformation energy is greater than this energy. The default is 1000 kcal/mol.

Min_MatchScore value

Minimum match of the interaction sites. This keyword is used only if an interaction site file is provided. If the Mode is set to Dock, Min_MatchScore is automatically calculated. The default is 20.

Min_PharmScore value

Minimum percent match of the pharmacophore. This keyword is used only if a pharmacophore file is provided. The default is 100.

Anchor_Atom atom-number

Sequence number of the atom to be used as an anchor. This is used to identify the center of translation and rotation for the GA. If this keyword is not specified, the anchor is automatically set to the gravity center of the ligand.

Anchor_Coor x y z

Following this keyword must be the x, y and z coordinates of the protein active site center. If this keyword is not used, it is automatically set to the center of the protein active site defined by the active site (flexible) residues.

Max_Tx x

Max_Ty y

Max_Tz z

Maximum value for translation (in Å) in x, y, and z respectively. The default is 5 Å for the three values.

Max_Rxy x

Max_Ryz y

Max_Rxz z

Maximum value for rotation (in degrees) around x, y, and z axes respectively during a mutation. The default is 30 degrees.

GI_Num_of_Trials value

Maximum number of successive unsuccessful trials before exiting. The default for Mode Dock is 10,000 and for Mode VS is 1,000.

Matching_Algorithm [ On | Off ]

Turns on or off the matching algorithm. By default, it is set to On. For IMPACTS, it is recommended to turn it off.

Num_of_Top_IS value

Number of top interaction sites to consider: each interaction site triangle must contain at least one of them. The default is 10.

Stringent_Triangles value

Factor by which the triangles are selected. The higher Stringent_Triangles is set, the more the matching algorithm will favor triangles that have not been used. The default value is 5.

Stringent_MS value

Weight factor used in the calculation of Min_MatchScore. The higher this value, the stricter Min_MatchScore becomes. The default value is 4.

Genetic algorithm parameters

Max_Gen value

Determines the maximum number of generations for the genetic algorithm. The default is 175.

Seed value

Select the starting point within the random number generator. If the same run is done with the same seed, the exact same result will be obtained. If a different seed is used, the GA will follow a different path. Changing the seed helps the developers to evaluate the convergence of a run. The default is 100.

Parent_Selection [ Random | Tournament | Islands ]

Method to select parents who will produce children. The default is Random.

- Random: parents are randomly selected from the pool of individuals.
- Tournament: parents are selected using a tournament in which a randomly selected set of candidate parents is ranked by binding energy and the best ones are selected.
- Islands: evolution takes place on separate islands and both parents must come from the same island.
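
To make the Tournament option more concrete, here is a minimal sketch of a generic tournament selection. The function name, the tournament size and the assumption that a lower energy is better are illustrative choices, not a description of how FITTED implements this mode.

```python
import random


def tournament_select(population, energy, tournament_size=3, rng=random):
    """Pick one parent: draw candidates at random, keep the lowest-energy one."""
    # Assumption: candidates are drawn without replacement and ranked by energy,
    # with the lowest (most favorable) energy winning the tournament.
    candidates = rng.sample(population, tournament_size)
    return min(candidates, key=energy)


# Hypothetical population of pose labels with made-up binding energies.
energies = {"pose_a": -5.2, "pose_b": -7.9, "pose_c": -3.1, "pose_d": -6.4}
parent = tournament_select(list(energies), energies.get, tournament_size=2,
                           rng=random.Random(100))
print(parent)
```

A larger tournament size increases the selection pressure, since the best individuals are more likely to be among the candidates drawn.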

pLearn value

Probability of energy minimization of the parents at every generation. The default is 0.1.

pCross value

Probability of crossover at every generation. The default is 0.85.

pMut value

Probability of mutation at every generation. The default is 0.05.

pMutRot value

Probability of mutation of the orientation of the ligand at every generation. The default is 0.30.

pMutWat value

The maximum rate of mutation of the water at Max_Gen generations. The default is 0.35.

pElite value

The percentage of the best of the population to be directly passed on to the next generations. The default is 0.01.

pElite_Every_X_Gen value

pElite will be used every pElite_Every_X_Gen generations. The default is 2.

pElite_SSize value

The individual to be passed directly on to the next generation will be selected randomly from the top pElite_SSize individuals of the population. The default is 10.

pOpt value

Probability of optimization of the ligand at every generation. The default is 0.20.

Evolution [ Steady_State | Metropolis | Elite ]

- Steady_State (default): During the evolution, out of a pair of two children and their 2 parents the two best will be saved.
- Metropolis: During the evolution, out of a pair of two children and their 2 parents two individuals will be saved following the Metropolis criterion. If the children are higher in energy they are checked to see if they have a high probability to exist at room temperature. If they do they are saved.
- Elite: During the evolution, the top Pop_Size individuals of the children and parents will be kept for the next generation.
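
The Metropolis option can be pictured with the standard Metropolis acceptance test, sketched below. The acceptance formula exp(-ΔE/kT), the temperature of 298.15 K and the function name are assumptions based on the textbook criterion; the exact test used by FITTED is not specified here.

```python
import math
import random

# Gas constant in kcal/(mol*K) times an assumed room temperature of 298.15 K.
KT_ROOM = 0.0019872 * 298.15


def metropolis_accept(e_parent, e_child, kT=KT_ROOM, rng=random):
    """Return True if the child is kept according to the Metropolis criterion."""
    delta = e_child - e_parent
    if delta <= 0.0:
        # A child that is lower (or equal) in energy is always kept.
        return True
    # A higher-energy child is kept with Boltzmann probability exp(-delta / kT).
    return rng.random() < math.exp(-delta / kT)


# Hypothetical energies in kcal/mol.
print(metropolis_accept(-42.0, -43.5))  # lower in energy, always accepted
print(metropolis_accept(-42.0, -41.8, rng=random.Random(100)))
```

By contrast, Steady_State as described above simply keeps the two best of the four individuals, with no random acceptance step.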

GA_Num_of_Trials value

Maximum number of successive unsuccessful trials to create children. The default is 1000.

Output/convergence parameters

Diff_Avg_Best value

The absolute difference between the average energy of the population and the energy of the best individual of the population. If the calculated value is below Diff_Avg_Best, the population is considered to be converged. The default is 1.0.
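
A minimal sketch of this convergence test is given below. It assumes that the best individual is the one with the lowest energy and that the population is represented by a plain list of energies; both the function name and these choices are illustrative.

```python
def is_converged(energies, diff_avg_best=1.0):
    """Return True when |average energy - best energy| is below diff_avg_best."""
    # Assumption: the best individual is the one with the lowest energy.
    best = min(energies)
    average = sum(energies) / len(energies)
    return abs(average - best) < diff_avg_best


# Hypothetical populations (energies in kcal/mol).
print(is_converged([-10.0, -9.8, -9.5]))  # tightly clustered energies
print(is_converged([-10.0, -5.0, 0.0]))   # widely spread energies
```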

Diff_N_Best value

The absolute difference in energy between the individual with the lowest energy and the individual ranked Diff_Number. If Diff_Number is defined the default value is 0.4.

Diff_Number value

The number of individuals to be used with Diff_N_Best. By default, this criterion is not used.

MaxSameEnergy_GA value

The maximum number of generations without any improvement in the best pose before the genetic algorithm exits. The default is the value given to Max_Gen.

Time_Limit value

Maximum time spent on the generation of an individual in the initial population (in seconds). The default is 10 seconds.

Time_Limit_Evolution value

Maximum time spent on the generation of an individual during the evolution (in seconds). The default is 10 seconds.

Print_Structures [ Final | Full | None ]

Controls the output of the structures during or at the end of the docking.

- Final (default): Only the final structures will be printed.
- Full: The structures (protein and ligand) will be printed during the run along with the final structures.
- None: No structures will be printed.

Print_Num_Structures value

Select how many of the top poses are printed as MOL2 files. The default is 1.

Print_Initial_Population [ Y | N ]

The initial population is output together with scores in CSV format. The default is N.

All_Poses [ Y | N ]

Select whether a pose is printed out for every run (Y) or only the best over all the runs (N). The default is N.

Number_of_Best value

Select the number of individuals for which the score, energy and RMSD are printed during the run. The default is 1.

Print_Best_Every_X_Gen value

How often to print a summary of the run. The default is (Max_Gen + 1).

Print_Energy_Full [ Y | N ]

Controls the printout of the detailed energy contributions.

- Y (default): Print out a breakdown of the energy (bond energy, angle energy, etc.).
- N: only major terms (e.g., total energy, score) are printed out.

Print_Proteins [ Y | N ]

Controls the printout of the protein conformation. While this is unnecessary in rigid protein docking, one may want to visualize the protein conformation generated by FITTED when flexible protein mode is selected.

Equivalency_SOM [ Y | N ]

This parameter allows IMPACTS to print out the carbon oxidized when hydrogen abstraction is found. If Y (default) is selected, the carbons and not the hydrogen atoms are listed as oxidized. Otherwise (N), the hydrogens are printed out.

SELECT

Generic parameters

Main_Mode select

This parameter instructs FORECASTER to select a program. For SELECT, use “select”.

Run_Mode [ clustering | analogues | dissimilars ]

Following the keyword, specify the run mode, which can be “clustering”, “analogues” or “dissimilars”.

- “clustering”: cluster molecules by similarity and print out the most representative of each cluster.
- “analogues”: given a small set of small molecules (“hit molecules”), SELECT will extract those similar to the hit molecules from a large library.
- “dissimilars”: given a small set of small molecules (“hit molecules”), SELECT will extract those dissimilar to the hit molecules from a large library.

Following the keyword, specify the run mode, which can be one of the modes listed below. Although it looks identical to the parameter above, it has very distinct functions in the program and both should be used.

- “clustering”: cluster molecules by similarity and print out the most representative of each cluster.
- “diversity”: in this mode SELECT will compare a library to itself to assess diversity.
- “analogues”: given a small set of small molecules (“hit molecules”), SELECT will extract those similar to the hit molecules from a large library.
- “dissimilars”: given a small set of small molecules (“hit molecules”), SELECT will extract those dissimilar to the hit molecules from a large library.
- “write_fingerprint”: given a library of small molecules, SELECT will write the fingerprints of the library to a file with the name: output_filename + “_” + fingerprint + “.mfi.fpt”.
- “read_fingerprint”: given a library of small molecules (“hit molecules”), SELECT will compare two different libraries that have been fingerprinted with the same fingerprint and whose fingerprints have been written to files (see “write_fingerprint” above).

Verbose [ Y | N ]

In “clustering” mode, Verbose Y will provide information on all the clustering steps.

Min_Tanimoto value

In “clustering” and “analogues” modes, this parameter decides whether molecules are similar or not. If molecules have a Tanimoto coefficient below this value, clustering can further proceed (clustering mode) or the compound is not selected as an analogue (analogues mode).

Max_Tanimoto value

In clustering and analogues mode, this parameter decides whether molecules are similar or not. If molecules have a Tanimoto coefficient below this value, clustering can further proceed (clustering mode) or the compound is not selected as an analogue (analogues mode).

Fingerprint [ MACCS | ECFP4 | motifs ]

To compute similarity, each molecule is converted into a fingerprint (array of “bits”). Three methods are available: MACCS, ECFP4 and motifs (conjugated groups). As a note, similarity coefficients (Tanimoto coefficient) are usually higher with MACCS than with ECFP4.
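
For reference, the Tanimoto coefficient between two bit fingerprints is the number of bits set in both divided by the number of bits set in either. The sketch below represents each fingerprint as a set of "on" bit positions; this representation, the function name and the convention for two empty fingerprints are assumptions for illustration, not SELECT's internal format.

```python
def tanimoto(bits_a: set, bits_b: set) -> float:
    """Tanimoto coefficient: |A and B| / |A or B| for two sets of on-bits."""
    union = bits_a | bits_b
    if not union:
        # Convention assumed here: two empty fingerprints have similarity 0.
        return 0.0
    return len(bits_a & bits_b) / len(union)


# Hypothetical fingerprints given as positions of the bits that are set.
hit = {1, 4, 7, 9}
candidate = {1, 4, 8, 9}
print(tanimoto(hit, candidate))  # 3 shared bits out of 5 distinct bits
```

The value ranges from 0 (no bit in common) to 1 (identical fingerprints), which is the scale used by the Tanimoto-based parameters in this section.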

Number_Of_Clusters value

In “clustering”, clustering ends when this number of clusters is reached (or when no cluster has dissimilar molecules as defined above).

Analogues file-name

In “analogues” mode, this file contains the “hit molecules” as described above.

Dissimilars file-name

In “dissimilars” mode, this file contains the “hit molecules” as described above.

Print_Clusters [ Y | N ]

In “clustering” mode, one representative molecule of each cluster is output into a single file. If this parameter is used with the value Y, it instructs SELECT to print all the molecules of each cluster, one cluster per file.

Print_Max_Nber_Of_Molecules value

If Print_Clusters is selected (see above), SELECT will print a maximum of this number of molecules per cluster. If the cluster is larger, the other molecules are discarded.

Print_Min_Nber_Of_Molecules value

If Print_Clusters is selected (see above), SELECT will print a minimum of this number of molecules per cluster. If the cluster is smaller than this number, duplicates will be added.

Include_Hits_In_Cluster [ Y | N ]

This parameter instructs SELECT to include the hit molecules in the final library of analogues or dissimilars.

Library_Fingerprint library-name containing the .mfi.fpt extension

This keyword is used together with the read_fingerprint mode described above. It reads the library file that was written with the write_fingerprint mode.

Query_Molecules_Fingerprint_File value

This keyword is used together with the “SmallMolecule-mode read_fingerprint”. It reads the query file that was written with the write_fingerprint mode. This file will be compared to the one assigned by Library_Fingerprint. The value is the name of the file with the correct extension.

Tanimoto_Threshold value

The similarity threshold at which molecule information will be printed to the CSV file in the read_fingerprint mode. The value is in the range [0.0 – 1.0].

Dissimilar_Threshold value

The dissimilarity threshold at which molecule information will be printed to the CSV file in the read_fingerprint mode. The value is in the range [0.0 – 1.0].

REDUCE

Generic parameters

Main_Mode reduce

This parameter instructs FORECASTER to select a program. For REDUCE, use “reduce”.

Run_Mode reduce

Following the keyword, specify the run mode which can only be “reduce”.

Physico-chemical filters

Min_XXX value

Max_XXX value

This set of parameters instructs REDUCE to filter out molecules outside the specified range. The filters (replace XXX by the names) are:

- Charge – net charge of the molecule
- MW – molecular weight of the molecule
- Num_of_Atoms – number of atoms of the molecule
- HBD – number of hydrogen bond donors
- HBA – number of hydrogen bond acceptors
- Rings – number of rings
- NRot – number of rotatable bonds
- Ionizable – number of ionizable groups (e.g., pyridine)
- O – number of oxygen atoms
- N – number of nitrogen atoms
- S – number of sulphur atoms
- Hetero – number of heteroatoms
- Heavy – number of heavy atoms (all atoms other than hydrogens)
- Halogens – number of halogen atoms
- O(-) – number of negatively charged oxygen atoms (1 for a carboxylate)
- N(+) – number of positively charged nitrogen atoms
- logP – logP
- Fsp3 – fraction of sp3 carbons
- Stereogenic_Centers – number of stereogenic centers
- Stereogenic_Complexity – stereogenic complexity
- Aromatic_Density – aromatic density
- McGowan_Vol – molecular volume as defined by McGowan
- TPSA – topological polar surface area
- VSA – van der Waals surface area
- nonpolSASA – non-polar solvent accessible surface area
- polarSASA – polar solvent accessible surface area
- Mol_Density – molecular density (compactness)
- Mol_Polarity – molecular polarity
- Mol_Softness – molecular softness
- Mol_Hardness – molecular hardness
- Mol_Electronegativity – molecular electronegativity
- Mol_Polarizability – molecular polarizability
- 3D-Wiener_Index – 3D-Wiener index
- Geom_Radius – geometrical radius
- Geom_Diameter – geometrical diameter
- Geom_Shape_Coeff – geometrical shape coefficient
- Span – molecular span
- Rad_Gyration – radius of gyration
- Ovality – ovality of the molecule
- Globularity – globularity of the molecule
- Dipole_Moment – molecular dipole moment
- logS – logarithm of the solubility
- BBB – blood-brain-barrier penetration (0 or 1)
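
The Min_XXX / Max_XXX logic amounts to a range check on each selected property. The sketch below illustrates the idea on precomputed property values; the dictionary layout, the function name and the use of inclusive bounds are assumptions, and the property values shown are made up.

```python
def passes_filters(properties, limits):
    """Keep a molecule only if every listed property lies within its (min, max) range."""
    for name, (low, high) in limits.items():
        value = properties[name]
        # Assumption: bounds are inclusive; None means that the bound was not specified.
        if low is not None and value < low:
            return False
        if high is not None and value > high:
            return False
    return True


# Hypothetical limits, as would be expressed with Max_MW, Max_HBD, Min_logP and Max_logP.
limits = {"MW": (None, 500.0), "HBD": (None, 5), "logP": (-1.0, 5.0)}
molecule = {"MW": 342.4, "HBD": 2, "logP": 3.1}
print(passes_filters(molecule, limits))
```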

Functional group filters

Filter number-of-functional-groups

Optional number-of-functional-groups
functional-group-#1
functional-group-#2
…

If Filter is used, all of the functional group criteria must be fulfilled. If Optional is used, only one of them must be. For example, if Filter 2 is used, both filters must be passed. If Optional 2 is used, one of the two must be passed. The functional groups currently implemented are:

- acyl_chloride / alcohol / aldehyde
- alkene / alkyl_chloride / alkyl_bromide
- alkyl_iodide / amide / terminal_amide
- anhydride / amine / primary_amine
- secondary_amine / tertiary_amine / quat_ammonium
- aniline / aromatic / azide
- boronic_acid / boronate / carbamate
- carboxylic_acid / carboxylate / ester
- hydroxamic_acid / imine / isocyanate
- ketone / lactone / lactame
- michael_acceptor / nitrile / nitro
- oxime / silicon / sulphonamide
- sulfonyl_chloride / thiol / vinyl_bromide
- vinyl_chloride / vinyl_iodide / epoxide
- aziridine / beta-lactam / alpha_fluoroketone
- alpha_chloroketone
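
The difference between Filter (all criteria must be met) and Optional (at least one must be met) can be sketched as follows. The sketch assumes that a criterion is fulfilled when the functional group is present in the molecule and that the groups found in a molecule are already available as a set of names; both points, like the function name, are illustrative assumptions.

```python
def passes_group_filters(groups_in_molecule, required=(), optional=()):
    """Filter semantics: all required groups; Optional semantics: at least one optional group."""
    # Filter: every listed functional group criterion must be fulfilled.
    if not all(group in groups_in_molecule for group in required):
        return False
    # Optional: at least one of the listed criteria must be fulfilled (if any are listed).
    if optional and not any(group in groups_in_molecule for group in optional):
        return False
    return True


# Hypothetical molecule containing an amide and an aromatic ring.
found = {"amide", "aromatic"}
print(passes_group_filters(found, required=("amide", "aromatic")))    # both present
print(passes_group_filters(found, optional=("alcohol", "aromatic")))  # one of the two present
print(passes_group_filters(found, required=("amide", "alcohol")))     # alcohol missing
```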

FINDERS

Generic parameters

Main_Mode finders

This parameter instructs FORECASTER to select a program. For FINDERS, use “finders”.

Run_Mode [ finders | create | superpose3D | diverse | search_reagent ]

Following the keyword, specify the run mode. 5 modes are possible.

- finders: searches libraries of molecules for molecules matching a substructure.
- create: generates all the possible variants of a molecule bearing R groups (R can be R or H). This is used in CONSTRUCTS.
- superpose3D: superposes 2 molecules. Used by CONSTRUCTS to superpose catalysts, substrates and reagents onto a template.
- diverse: removes duplicates in a library.
- search_reagent: searches libraries of molecules for molecules matching a reaction scheme (e.g., before running REACT2D).

Molecule molfilename

This parameter instructs FINDERS to read in the file identified by the file name. Do not use with Run_Mode “search_reagent” (see below).

Molecule_2 library-filename

This parameter instructs FINDERS to read in the file identified by the file name. This may be an entire catalog of chemicals in sdf format with Run_Mode “search_reagent”.

Input_File_Template file-name

This parameter instructs FINDERS to read in the template file identified by the file name.

Substructure_File file-name

This parameter instructs FINDERS to read in the substructure file identified by the file name with Run_Mode finders.

Reaction_Scheme file-name

This parameter instructs FINDERS to read in the reaction scheme file identified by the file name with Run_Mode search_reagent. The format of this file must be rxn (sdf-derived format for reactions).

Filter number-of-filters
filter #1
filter #2
…

This keyword defines the number of filters that will be used. For a complete list of functional group filters, consult REDUCE parameters.

Max_MW value

In Run_Mode “search_reagent”, this keyword instructs FINDERS to keep only molecules with a molecular weight equal to or lower than this value.

Max_Num_Molecules value

In Run_Mode “search_reagent”, this keyword instructs FINDERS to keep only this maximum number of molecules. If too many molecules are found, FINDERS will call SELECT to select the most diverse.

Check_Stereochemistry [ Y | N ]

This keyword specifies whether FINDERS should match only the molecules with exactly the same stereochemistry as the template (Y) or ignore stereochemistry (N).

Group_1 number-of-groups
X1
X2
…

This keyword instructs FINDERS to replace the groups named X1, etc., on the first reagent in the reaction scheme by actual groups, which are defined by the Scan_Group_1 parameter given below. It is recommended to use the sketcher to prepare the reaction scheme, which will include all of this information, rather than using these keywords.

Scan_Group_1 number-of-groups-for-X1 number-of-groups-for-X2 …
X1
X2
…

This keyword instructs FINDERS to replace the groups named X1 etc by these groups. Allowed X groups are:

- Nitrogen protecting group: Boc Fmoc Cbz Hydrogen
- Oxygen protecting group: Bn Ac PMB TMS Hydrogen
- Ester protecting group: Bn Me Et tBu
- Leaving group: I Br Cl OTs OTf OMs OH

Group_2 number-of-groups
X1
X2
…

This keyword instructs FINDERS to replace the groups named X1, etc., on the second reagent in the reaction scheme by actual groups, which are defined by the Scan_Group_2 parameter given below. It is recommended to use the sketcher to prepare the reaction scheme, which will include all of this information, rather than using these keywords.

Scan_Group_2 number-of-groups-for-X1 number-of-groups-for-X2 …
X1
X2
…

This keyword instructs FINDERS to replace the groups named X1 etc by these groups. Allowed X groups are:

- Nitrogen protecting group: Boc Fmoc Cbz Hydrogen
- Oxygen protecting group: Bn Ac PMB TMS Hydrogen
- Ester protecting group: Bn Me Et tBu
- Leaving group: I Br Cl OTs OTf OMs OH

Charging_Scheme [ DGH | DGH+ | OK | NM | MMFF | input | none ]

This parameter provides the method used to add atomic charges. DGH (Das Gupta-Huzinaga scheme), OK (Ohno-Klopman) and NM (Nishimoto-Mataga) are based on the electronegativity equalization method by Rappé. “input” forces FINDERS to keep the charges (if any) already in the input file. MMFF will use the MMFF94 method.

REACT2D

Generic parameters

Main_Mode react2D

This parameter instructs FORECASTER to select a program. For REACT2D, use “react2d”.

Run_Mode react2D_create

Following the keyword, specify the run mode, which can only be “react2D_create”.

Reaction_Scheme file-name

This parameter instructs REACT2D to read in the reaction scheme file identified by the file name with Run_Mode search_reagent. The format of this file must be rxn (sdf-derived format for reactions).

Filter number-of-filters
filter-#1
filter-#2
…

This keyword defines the number of filters that will be used. For a complete list of functional group filters, consult REDUCE parameters.

Max_MW value

In Run_Mode “search_reagent”, this keyword instructs FINDERS to keep only molecules with a molecular weight equal to or lower than this value.

Max_Num_Molecules value

In Run_Mode “search_reagent”, this keyword instructs FINDERS to select only this maximum number of molecules. If too many molecules are found, FINDERS will call SELECT to select the most diverse.

Check_Stereochemistry [ Y | N ]

This keyword specifies whether FINDERS should match only the molecules with exactly the same stereochemistry as the template (Y) or ignore stereochemistry (N).

Group_1 number-of-groups
X1
X2
…

This keyword instructs FINDERS to replace the groups named X1, etc., on the first reagent in the reaction scheme by actual groups, which are defined by the Scan_Group_1 parameter given below. It is recommended to use the sketcher to prepare the reaction scheme, which will include all of this information, rather than using these keywords.

Scan_Group_1 number-of-groups-for-X1 number-of-groups-for-X2 …
X1
X2
…

This keyword instructs FINDERS to replace the groups named X1 etc by these groups. Allowed X groups are:

- Nitrogen protecting group: Boc Fmoc Cbz Hydrogen
- Oxygen protecting group: Bn Ac PMB TMS Hydrogen
- Ester protecting group: Bn Me Et tBu
- Leaving group: I Br Cl OTs OTf OMs OH

Group_2 number-of-groups
X1
X2
…

This keyword instructs FINDERS to replace the groups named X1, etc., on the second reagent in the reaction scheme by actual groups, which are defined by the Scan_Group_2 parameter given below. It is recommended to use the sketcher to prepare the reaction scheme, which will include all of this information, rather than using these keywords.

Scan_Group_2 number-of-groups-for-X1 number-of-groups-for-X2 …
X1
X2
…

This keyword instructs FINDERS to replace the groups named X1 etc by these groups. Allowed X groups are:

- Nitrogen protecting group: Boc Fmoc Cbz Hydrogen
- Oxygen protecting group: Bn Ac PMB TMS Hydrogen
- Ester protecting group: Bn Me Et tBu
- Leaving group: I Br Cl OTs OTf OMs OH

Charging_Scheme [ DGH | DGH+ | OK | NM | MMFF | input | none ]

This parameter provides the method used to add atomic charges. DGH (Das Gupta-Huzinaga scheme), OK (Ohno-Klopman) and NM (Nishimoto-Mataga) are based on the electronegativity equalization method by Rappé. “input” forces FINDERS to keep the charges (if any) already in the input file. MMFF will use the MMFF94 method.

CONSTRUCTS

Generic parameters

Main_Mode constructs

This parameter instructs FORECASTER to select a program. For CONSTRUCTS, use “constructs”.

Run_Mode constructs | orca_ace | renumbering

Following the keyword, specify the run mode, which can be “constructs”, “orca_ace” or “renumbering”.

- With “orca_ace”, CONSTRUCTS will look for the ORCA files used to optimize a given system. It will then generate force field parameters automatically without QUEMIST (see ORCA_Output and ORCA_Hessian).
- With “renumbering”, CONSTRUCTS will renumber atoms in QUEMIST-derived FF files and the corresponding mol2 files. This may be used if the atom numbering differs between the product and the reactants used to generate the template and force field parameters.
- For any other use, use “constructs”.

Molecule molfilename

This parameter instructs CONSTRUCTS to read in the file identified by the file name. In this field you should add your catalysts file.

Substrates molfilename

This parameter instructs CONSTRUCTS to read in the file identified by the file name. In this field you should add your substrates file.

Reagents molfilename

This parameter instructs CONSTRUCTS to read in the file identified by the file name. In this field you should add your reagents file.
If you are not using reagents, molfilename should be none.

Tautomers yes | no

This parameter instructs CONSTRUCTS to generate tautomers (if it finds any). Default = no.

Catalyst_Member residue-name

This parameter instructs CONSTRUCTS to assign the labels of atoms in the template file as being part of the catalyst. Generally, the name is CAT1.

Substrate_Member residue-name

This parameter instructs CONSTRUCTS to assign the labels of atoms in the template file as being part of the substrate. Generally, the name is REA1.

Reagent_Member residue-name

This parameter instructs CONSTRUCTS to assign the labels of atoms in the template file as being part of the reagent. Generally, the name is REA2.

Forcefield_Reactant number-of-files
file-reactant-#1.mol2
file-reactant-#2.mol2
…

This parameter instructs CONSTRUCTS to read the geometry-optimized reactant files for which customized force field parameters have been generated using QUEMIST.

Forcefield_Product number-of-files
file-product-#1.mol2
file-product-#2.mol2
…

This parameter instructs CONSTRUCTS to read the geometry-optimized product files for which customized force field parameters have been generated using QUEMIST.

ORCA_Output ORCA_output_filename

This parameter is used in orca_ace mode. CONSTRUCTS will look for the ORCA files used to optimize a given system. It will then generate parameters automatically without QUEMIST. See ORCA_Hessian parameter below.

ORCA_Hessian ORCA_Hessian_filename

This parameter is used in orca_ace mode. CONSTRUCTS will look for the ORCA files used to optimize a given system. It will then generate parameters automatically without QUEMIST. See ORCA_Output parameter above.

Number_of_TS_Templates number-of-templates
template-description-#1
template-description-#2
…

This parameter instructs CONSTRUCTS to read the number of TS templates and the type of templates it should generate. For more information on how to set up this keyword please follow the VIRTUAL CHEMIST tutorial.

Remove_Atoms template-number #-of-atoms-to-be-removed list-of-atoms-to-be-removed

This parameter instructs CONSTRUCTS to remove atoms from the product/reactant to make the template (e.g., to replace CH3 by R). For example, to remove 5 atoms (#3, #7, #9, #12 and #15) from template 1, the keyword would look like this:

Remove_Atoms 1 5 3 7 9 12 15
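
The layout of this line (template number, atom count, then the atom list) is shared by the atom-list keywords described in this section. As a minimal sketch, and not the program's own parser, the hypothetical helper `parse_atom_list` below reads such a line and checks that the declared count matches the number of atoms listed.

```python
def parse_atom_list(line):
    """Parse '<Keyword> <template> <count> <atom> <atom> ...' into its parts."""
    fields = line.split()
    keyword, template, count = fields[0], int(fields[1]), int(fields[2])
    atoms = [int(a) for a in fields[3:]]
    # The count on the line must agree with the number of atoms that follow.
    if len(atoms) != count:
        raise ValueError(f"{keyword}: expected {count} atoms, found {len(atoms)}")
    return keyword, template, atoms


print(parse_atom_list("Remove_Atoms 1 5 3 7 9 12 15"))
# ('Remove_Atoms', 1, [3, 7, 9, 12, 15])
```

A mismatch such as "Remove_Atoms 1 4 3 7 9 12 15" raises an error in this sketch, which is a convenient sanity check before launching a run.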

Remove_Bonds template-number #-of-bonds-to-be-removed bondType atom1 atom2

This parameter instructs CONSTRUCTS to remove bonds from the template product/reactant (prodOrReact: [product | reactant | both ]).

Add_Bonds template-number #-of-bonds-to-be-added prodOrReact atom1 atom2 prodOrReact_2 atom1_2 atom2_2….

This parameter instructs CONSTRUCTS to add a bond to the template product/reactant (prodOrReact: [product | reactant | both ]).

Add_Dummy_Bonds template-number #-of-bonds-to-be-added prodOrReact atom1 atom2 prodOrReact_2 atom1_2 atom2_2….

This parameter instructs CONSTRUCTS to add a dummy bond to the template product/reactant (prodOrReact: [product | reactant | both ]). This can be used to ensure that the catalyst remains in one piece when clipped down to a template. For example, when P-CH2-CH2-CH2-P is clipped to P-R R-P, a dummy bond between the two phosphorus atoms may be added.

Catalyst_Atoms template-number #-of-atoms-in-the-catalyst list-of-atoms-in-the-catalyst

This parameter instructs CONSTRUCTS to assign atoms from the template to the catalyst. For example, to assign 5 atoms (#3, #7, #9, #12 and #15) from template 1, the keyword would look like this:

Catalyst_Atoms 1 5 3 7 9 12 15

Substrate_Atoms template-number #-of-atoms-in-the-substrate list-of-atoms-in-the-substrate

This parameter instructs CONSTRUCTS to assign atoms from the template to the substrate. For example, to assign 5 atoms (#3, #7, #9, #12 and #15) from template 1, the keyword would look like this:

Substrate_Atoms 1 5 3 7 9 12 15

Reagent_Atoms template-number #-of-atoms-in-the-reagent list-of-atoms-in-the-reagent

This parameter instructs CONSTRUCTS to assign atoms from the template to the reagent. For example, to assign 5 atoms (#3, #7, #9, #12 and #15) from template 1, the keyword would look like this:

Reagent_Atoms 1 5 3 7 9 12 15

Flip_Ring template-number #-of-atoms-to-flip list-of-atoms-to-flip

This parameter instructs CONSTRUCTS to assign atoms from the template to a ring and then flip it. For example, to assign 5 atoms (#3, #7, #9, #12 and #15) from template 1 (e.g., #7 in the ring connected to #4, #9, #12 and #15), the keyword would look like this:

Flip_Ring 1 5 3 7 9 12 15

Number_of_TSs value

This parameter tells CONSTRUCTS how many TSs it should read when assembling TS structures.

Num_Of_Cores value

This parameter instructs CONSTRUCTS to split TSs in multiple files if ACE is to be run on multiple cores.

Renumbering_Atoms number_of_atoms_to_renumber
atom1_# new_atom1_#
atom2_# new_atom2_#
…

For ACE and CONSTRUCTS to work properly, the atom numbering in the reactant and product should be the same. If it is not, this parameter instructs CONSTRUCTS to renumber the atoms.

QUEMIST

Generic parameters

Main_Mode qm

This parameter instructs FORECASTER to select a program. For QUEMIST, use “qm”.

Run_Mode [ rhf | dft ]

Following the keyword, specify the run mode, either rhf (restricted Hartree-Fock) or dft (Density Functional Theory).

Molecule molfilename

Provide the name for the molecule file (can contain more than one molecule). Supported format: mol2.

QM_Method [ hf | dft ]

Following this keyword, provide the method to be used.

Charge value

Following this keyword, provide the net charge of the molecule. This keyword is optional. If not provided, the charge will be computed automatically.

Multiplicity value

Following this keyword, provide the multiplicity of the molecule. This keyword is optional. If not provided, the multiplicity will be computed automatically.

SCF and geometry optimization parameters

Geometry_Optimization [ yes | no ]

This parameter instructs QUEMIST whether to perform a geometry optimization (yes) or a single-point energy calculation (no). Be aware that geometry optimization may take a long time (it is done in Cartesian coordinates) and can only be done on Linux.

Geometry_Optimization_Method RFO | LBFGS

This parameter instructs QUEMIST to perform a geometry optimization using either the Rational Function Optimization (RFO) or the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm. Both algorithms are efficient, but we suggest RFO. Default = RFO.

Max_Opt_Steps value

This parameter defines the maximum number of geometry optimization steps. Default: 500.

Compute_Hessian [ yes | no ]

This parameter instructs QUEMIST to compute the Hessian. Be aware that a Hessian should only be computed in conjunction with a geometry optimization, that it may take a long time, and that it can only be done on Linux.

Max_SCF_Iter value

This parameter defines the maximum number of SCF iterations. Default: 150.

Integral_Threshold value

Following this keyword, provide the threshold for integral values. Do not change this value unless absolutely required. Default: 1e-10.

Energy_Threshold value

Following this keyword, provide the threshold on the energy change used for SCF convergence. Do not change this value unless absolutely required. Default: 1e-7.

DIIS_Threshold value

Following this keyword, provide the threshold on the DIIS error used for SCF convergence. Do not change this value unless absolutely required. Default: 1e-7.

RMSDP_Threshold value

Following this keyword, provide the threshold on the root-mean-square change of the density matrix used for SCF convergence. Do not change this value unless absolutely required. Default: 1e-7.

MAXDP_Threshold value

Following this keyword, provide the threshold on the maximum change of the density matrix used for SCF convergence. Do not change this value unless absolutely required. Default: 1e-7.

Damping_Factor value between 0 and 1

Following this keyword, provide the damping factor for building the density matrix in the first few iterations during SCF. Do not change this value unless absolutely required. A large damping factor increases the number of iterations necessary to reach convergence but provides stability to pathological convergence cases. Default = 0.70.
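
To see why a large damping factor slows convergence but stabilizes it, consider one common form of density damping, in which the density matrix used in the next iteration is a weighted mix of the previous and the newly built matrices. Whether QUEMIST uses exactly this formula is an assumption; the function `damp_density` is purely illustrative.

```python
def damp_density(p_old, p_new, damping=0.70):
    """Mix the previous and the newly built density matrices element-wise.

    damping is the weight kept on the previous matrix (assumed convention):
    a value close to 1 changes the density slowly, which damps oscillations
    but requires more iterations.
    """
    return [
        [damping * o + (1.0 - damping) * n for o, n in zip(row_old, row_new)]
        for row_old, row_new in zip(p_old, p_new)
    ]


# With damping = 0.70, only 30% of the new density enters the mixed matrix.
mixed = damp_density([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(mixed)
```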

Guess [ sad | core | gwh ]

Following this keyword, provide the method used to derive the initial guess. This option is useful only on Windows. On Linux, the initial guess is obtained from a STO-3G calculation followed by projection of the orbitals onto the actual basis set. Default: sad.

MP2_Correction [ yes | no ]

This parameter specifies whether an MP2 correction is applied to the final geometry. Be aware that this can only be done on Linux and is very time consuming (no frozen core implemented). Default: no.

Basis_Set value

This parameter is used to define the basis set to be used. A list of the available basis sets is available in the tutorial. Default: minix.dat.

Functional value

This parameter is used to define the functional to be used. A list of the available functionals is available in the tutorial. Default: PBE0.

No_DFT_Cores value

This parameter defines the number of cores on which the calculation will run.

Integral_Scheme huzinaga | obara-saika | hgp

This parameter defines the integral scheme used to compute integrals on Windows. Default = obara-saika.

ACE

Generic parameters

Main_Mode ace

This parameter instructs FORECASTER to select a program. For ACE, use “ace”.

Run_Mode ace

Following the keyword, specify the run mode which can only be “ace”.

Reactant reactant-file-name

Following the keyword, specify the file name which includes the reactant structure. It is recommended to use CONSTRUCTS to prepare it.

Product product-file-name

Following the keyword, specify the file name which includes the product structure. It is recommended to use CONSTRUCTS to prepare it.

Forcefield MM3.txt

Following the keyword, specify the FF to use. ACE uses MM3, and MM3.txt (provided) is used by default.

Forcefield_Reactant number-of-files
FF-parameter-file-#1
FF-parameter-file-#2
…

If you used QUEMIST to derive FF parameter files for the templates, and used these templates with CONSTRUCTS, the FF files should be listed here in the same order as they were used in CONSTRUCTS (CONSTRUCTS annotates each structure with the template number).

Forcefield_Product number-of-files
FF-parameter-file-#1
FF-parameter-file-#2
…

If you used QUEMIST to derive FF parameter files for the templates, and used these templates with CONSTRUCTS, the FF files should be listed here in the same order as they were used in CONSTRUCTS (CONSTRUCTS annotates each structure with the template number).

Forcefield_TS number-of-files
FF-parameter-file-#1
FF-parameter-file-#2
…

If you are using TS FFs (such as those derived using Q2MM), make sure they are in the proper ACE format and list them using this keyword.

Parameters [ auto | no ]

With auto, this keyword instructs ACE to derive parameters automatically if some are missing.

Number_of_Runs value

This keyword instructs ACE to carry out this number of runs on each structure.

Scan_Lambda [ Y | N ]

This parameter instructs ACE to compute structures with various lambda values. This applies to the starting structure and the final structures.

ForceDoubleBond [ Y | N ]

This parameter instructs ACE to force the double bond geometry in the TS structures.

SCF and geometry optimization parameters

Geometry_Optimization [ yes | no ]

This parameter instructs QUEMIST whether to perform a geometry optimization (yes) or a single-point energy calculation (no). Be aware that geometry optimization may take a long time (it is done in Cartesian coordinates) and can only be done on Linux.

Max_Opt_Steps value

This parameter defines the maximum number of geometry optimization steps. Default: 500.

Compute_Hessian [ yes | no ]

This parameter instructs QUEMIST to compute the Hessian. Be aware that a Hessian should only be computed in conjunction with a geometry optimization, that it may take a long time, and that it can only be done on Linux.

Max_SCF_Iter value

This parameter defines the maximum number of SCF iterations. Default: 150.

Integral_Threshold value

Following this keyword, provide the threshold for integral values. Do not change this value unless absolutely required. Default: 1e-10.

Energy_Threshold value

Following this keyword, provide the threshold on the energy change used for SCF convergence. Do not change this value unless absolutely required. Default: 1e-7.

DIIS_Threshold value

Following this keyword, provide the threshold on the DIIS error used for SCF convergence. Do not change this value unless absolutely required. Default: 1e-7.

RMSDP_Threshold value

Following this keyword, provide the threshold on the root-mean-square change of the density matrix used for SCF convergence. Do not change this value unless absolutely required. Default: 1e-7.

MAXDP_Threshold value

Following this keyword, provide the threshold on the maximum change of the density matrix used for SCF convergence. Do not change this value unless absolutely required. Default: 1e-7.

Guess [ sad | core | gwh ]

Following this keyword, provide the method used to derive the initial guess. This option is useful only on Windows. On Linux, the initial guess is obtained from a STO-3G calculation followed by projection of the orbitals onto the actual basis set. Default: sad.

MP2_Correction [ yes | no ]

This parameter specifies whether an MP2 correction is applied to the final geometry. Be aware that this can only be done on Linux and is very time consuming (no frozen core implemented). Default: no.

Basis_Set value

This parameter is used to define the basis set to be used. A list of the available basis sets is available in the tutorial. Default: minix.dat.

Functional value

This parameter is used to define the functional to be used. A list of the available functionals is available in the tutorial. Default: PBE0.

No_DFT_Cores value

This parameter defines the number of cores on which the calculation will run.

Defining the transition state

Lambda value atom-type

ACE computes the transition state as a linear combination of reactants and products. Lambda, ranging from 0 to 1, defines how early or late the TS is. This value is followed by an atom type, which defines the interactions that will be treated with this value. Thus, more than one lambda may be used to model an asynchronous reaction. The possible atom types are:

- * – Generic Lambda
- H1 – any H but NH/OH/SH
- H2 – H-O (alcohol)
- H3 – H-N (amine/imine)
- H4 – H-O-C=O (carboxylic acid)
- H5 – H-N-C=O (amide)
- H6 – H-S (thiol)
- H7 – H-N+ (ammonium)
- H8 – H-O-Csp2 (phenol or enol)
- C3* – Generic sp3 C
- C2* – Generic sp2 C
- CO* – sp2 C in carbonyl
- C2C2 – conjugated sp2 C
- C2AR – conjugated aromatic C
- C1* – generic sp C
- C2=O – Csp in C=C=O (ketene)
- C1== – C sp in C=C=C (allene)
- C2CO – C2* next to C=O and H
- C2OO – C sp2 next to CO and CO
- C2O2 – C sp2 next to CO and C2*
- C2O3 – C sp2 next to CO and C3*
- C2C2 – C sp2 next to C2 and H
- C2CC – C sp2 next to two sp2 C atoms
- C223 – C sp2 next to C2 and C3
- C2C3 – C sp2 next to C3 and H
- C233 – C sp2 next to C3 and C3
- N3* – generic sp3 N
- N2* – generic sp2 N
- N1* – generic sp N
- N2PR – N in pyrrole
- N2EN – Enamine/imine
- N2C+ – Iminium
- N34+ – N sp3 in ammonium
- N3SO – N in sulfonamide
- N2CO – N in amide
- N3-N – N in hydrazine
- N3O3 – N in hydroxylamine (R2N-OH)
- N4O2 – N in nitro
- N2PY – N sp2 in pyridine
- N2P+ – N sp2 in pyridinium
- N2N3 – center N in Azide (N3)
- N2OH – N sp2 in oxime (=N-OH)
- N2=N – N sp2 in azo group
- N2AZ – N sp2 in azoxy (=N-O)
- O3* – generic sp3 O
- O3CY – O sp3 in epoxide
- O3A2 – O sp3 in anhydride
- O3CO – O-H in carboxylic acids
- O3A – equatorial O in n-OsO4
- O3E – axial O in N-OsO4
- O2* – generic sp2 O
- O2CO – O carboxylate
- O207 – O=C in anhydride
- O202 – O=C in carboxylic acid
- O2A – equatorial O in n-OsO4
- O2E – axial O in N-OsO4
- S2=C – S in thioketone
- S2* – generic sp2 S
- Br* – Bromine
- Cl* – Chlorine
- F1* – Fluorine
- I1* – Iodine
- B3* – Boron trigonal
- B4* – Boron tetrahedral
- Osm – Osmium in N-OsO4
- Al* – Aluminium
- CU* – Copper
- ZN* – Zinc
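
The idea of a lambda-weighted linear combination, with atom-type-specific values falling back to the generic `*` value, can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, lambda = 0 is taken to mean reactant-like (early TS) and lambda = 1 product-like (late TS), and the mixing is applied to a single energy term. The actual functional form used by ACE is not given in this document.

```python
def ts_energy(e_reactant, e_product, lam):
    """Blend a reactant and a product energy term with a lambda in [0, 1].

    Assumed convention: lam = 0 gives the reactant term, lam = 1 the product term.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda must be between 0 and 1")
    return (1.0 - lam) * e_reactant + lam * e_product


def pick_lambda(atom_type, lambdas):
    """Use the lambda given for this atom type, else the generic '*' value."""
    return lambdas.get(atom_type, lambdas["*"])


# Two Lambda entries: a generic one and one for alcohol hydrogens (H2).
lambdas = {"*": 0.5, "H2": 0.3}
print(pick_lambda("H2", lambdas))    # 0.3
print(pick_lambda("C3*", lambdas))   # 0.5
print(ts_energy(10.0, 20.0, 0.5))    # 15.0
```

Using a different lambda for one atom type than for the others is what allows an asynchronous reaction to be described.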

Reaction conditions

Solvation [ final | all ]

This keyword instructs ACE to compute the solvation energy on the final structure (final) or after each optimization.

Solvation_factor_1 value

This keyword instructs ACE to scale the polar contribution of the solvation energy (GB/SA). Default: 1.

Solvation_factor_2 value

This keyword instructs ACE to scale the non-polar contribution of the solvation energy (GB/SA). Default: 1.

Epsilon value

This keyword provides the dielectric constant of the solvent.

Temperature_Celsius value

This keyword provides the temperature at which the reaction is performed (in Celsius).

Polarizable [ Full | Partial | No ]

Atomic charges can fluctuate in ACE. If No is selected, the charges from the MM3 FF will be used. If Full is used, the charges are recomputed every time a new conformation is generated and after every geometry optimization. If Partial is used, this is only done when generating the initial population. When the initial population is complete, average charges from the entire population are assigned to each atom.
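
The averaging step described for Partial can be sketched as a per-atom mean over the initial population. The helper name `average_charges` and the list-of-lists layout (one list of atomic charges per individual, atoms in the same order) are assumptions made for illustration.

```python
def average_charges(population_charges):
    """Average per-atom charges over every individual in the population.

    population_charges: one list of atomic charges per individual,
    with atoms in the same order in every list.
    """
    n_individuals = len(population_charges)
    return [sum(atom) / n_individuals for atom in zip(*population_charges)]


# Two individuals, two atoms each.
print(average_charges([[0.25, -0.25], [0.75, -0.75]]))  # [0.5, -0.5]
```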

Charging_Scheme [ DGH | OK | NM ]

This parameter provides the method used to add atomic charges. DGH (Das Gupta-Huzinaga scheme), OK (Ohno-Klopman) and NM (Nishimoto-Mataga) are based on the electronegativity equalization method by Rappé.

Rigid_Rings [ yes | rigid ]

This keyword instructs ACE to flip rings or not.

Convergence criteria

Maximum_number_of_same_pop value

This keyword instructs ACE that convergence is reached if no change in the population is found after this number of generations.

MaxSameEnergy_GA value

This keyword instructs ACE that convergence is reached if the average energy of the population has not changed by more than this value.

Time_Limit_Per_Run value

Maximum time, in seconds, allowed for ACE to complete a run. Default is 21600 (6 hours).

Time_Limit_Per_Generation value

Maximum time, in seconds, allowed for ACE to complete a generation (genetic algorithm). Default is 600 (10 minutes).

Information output

print_energy_full [ Y | N ]

This keyword instructs ACE to output all the energy terms rather than just the total potential energy.

Conjugate gradient/LBFGS parameters

The default values for all the keywords described in this section are recommended.

Minimization_Algorithm [ Steepest | Conjugate | LBFGS ]

There are three available optimizers: steepest descent, conjugate gradient and LBFGS. The default is conjugate.

GA_* or GI_* value

There are two sets of the following keywords: one for the parameters used during the generation of the initial population (GI_*; e.g., GI_MaxIter) and another one used during the evolution (GA_*; e.g., GA_MaxIter). The default values are recommended.

XX_MaxIter value

Maximum number of iterations. Once this number is reached the minimization is finished. The default is 20.

XX_StepSize value

Initial value of the step taken in the direction of the gradient during minimization. The default is 0.02.

XX_MaxStep value

Maximum step size allowed during minimization. The default is 1.

XX_EnergyBound value

Minimum energy difference between two molecules to be considered similar. The default is 1.0 for GI_EnergyBound and 0.001 for GA_EnergyBound.

XX_MaxSameEnergy value

Number of times that the same energy (defined by EnergyBound) can be repeated. The default is 3.

XX_MaxGrad value

Gradient convergence criteria. The default is 0.001.

MC_MaxIter value

Maximum number of iterations in the Monte Carlo search.

CGResetSD value

This parameter designates the number of conjugate gradient steps before the algorithm switches to steepest descent for a single step (i.e., resets the direction to the gradient). The default is -1 (function turned off).

Genetic algorithm and Monte Carlo

GI_Initial_E value

Any randomly generated individual will be discarded before being energy-minimized if its energy is greater than this value. The default is 1.0e10 kcal/mol.

GI_Minimized_E value

Any randomly generated individual will be discarded after being energy-minimized if the energy relative to the input conformation energy is greater than this energy. The default is 1000 kcal/mol.

max_gen_GA value

Determine the maximum number of generations for the genetic algorithm. The default is 150.

max_gen_MC value

Determine the maximum number of iterations for the Monte Carlo Search. The default is 150.

Resolution value

Resolution of the torsion rotation when randomly generating a new individual. The default is 6 degrees.

Seed value

Select the starting point within the random number generator. If the same run is done with the same seed, the exact same result will be obtained. If a different seed is used, the GA will follow a different path. Changing the seed helps the developers to evaluate the convergence of a run. The default is 100.
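
The effect of the seed can be illustrated with any seeded pseudo-random generator. The sketch below uses Python's `random` module, not the generator used by ACE, and draws torsion angles on a 6-degree grid to mimic the Resolution keyword. The function `random_torsions` and its arguments are hypothetical.

```python
import random


def random_torsions(seed, n_torsions=4, resolution=6):
    """Draw torsion angles (degrees) on a grid of the given resolution."""
    rng = random.Random(seed)       # independent generator started at this seed
    steps = 360 // resolution       # number of allowed values per torsion
    return [rng.randrange(steps) * resolution for _ in range(n_torsions)]


# The same seed always reproduces the same torsions.
print(random_torsions(100) == random_torsions(100))  # True
```

Repeating a run with several different seeds and comparing the final results is a simple way to judge whether the search has converged.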

pLearn value

Probability of energy minimization of the parents at every generation. The default is 0.1.

pCross value

Probability of crossover at every generation. The default is 0.85.

pMut value

Probability of mutation at every generation. The default is 0.05.

pOpt value

Probability of optimization of the ligand at every generation. The default is 0.20.

pMix value

Probability of mixing individuals from two different islands (5 islands). The default is 1 (free exchange between islands = no islands).

Evolution [ Steady_State | PSO ]

Steady_State (default): During the evolution, out of a pair of two children and their 2 parents the two best will be saved.

PSO: particle swarm optimization.
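
The Steady_State selection rule can be sketched in a few lines. This is a minimal illustration, assuming that "best" means lowest energy. The function name and the `energy` callback are hypothetical.

```python
def steady_state_survivors(parents, children, energy):
    """Keep the two best individuals out of two parents and their two children.

    energy: function returning the energy of an individual (lower is better,
    which is an assumption about how "best" is defined).
    """
    family = list(parents) + list(children)
    return sorted(family, key=energy)[:2]


energies = {"parent_a": -12.0, "parent_b": -8.5, "child_a": -13.1, "child_b": -7.0}
print(steady_state_survivors(["parent_a", "parent_b"],
                             ["child_a", "child_b"],
                             energies.get))
# ['child_a', 'parent_a']
```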

Pop_Size value

Population size for the genetic algorithm conformational search. Default: 100.

Print_Structures [ Final | Full | None ]

Controls the output of the structures during or at the end of the docking.

- Final (default): Only the final structures will be printed.
- Full: The structures will be printed during the run along with the final structures.
- None: No structures will be printed.

Print_Best_Every_X_Gen value

How often to print a summary of the run. The default is (Max_Gen + 1).

Number_of_Best value

Select how many individuals have their energy printed when the optimization is complete. The default is 1.

Print_Num_Structures value

Select how many of the top structures are printed as MOL2 files. The default is 1.

ANN

Generic parameters

Main_Mode ann

This parameter instructs FORECASTER to select a program. For ANN, use “ann”.

Run_Mode rankscore

Following the keyword, specify the run mode which can only be “rankscore”.

UTILITIES

Generic parameters

Main_Mode utilities

Instructs FORECASTER to select FILECONVERSION as the program.

Run_Mode [ fileconversion | split_library | plot_distributions ]

Following the keyword, specify the run mode, which can be “fileconversion”, “split_library” or “plot_distributions”.

Converting between file formats

Molecule molfilename.smi/sdf/mol2/xyz

This parameter instructs the program to read in the file identified by the file name.

Format [ xyz | sdf | mol2 ]

When using the fileconversion run mode, this parameter instructs the program to format the output accordingly. Currently, writing a SMILES file is unavailable.

Rotate_Torsion a1 a2 a3 a4 phi (in degrees)

This keyword adjusts the torsion defined by atoms a1, a2, a3 and a4 by the given value. It applies when the keyword file is prepared using the UI, where xyz coordinates can be saved in the converter box.

Adjust_Bond_Length a1 a2 value (in Angstrom)

This keyword adjusts the length of the bond between atoms a1 and a2 by the given value. It applies when the keyword file is prepared using the UI, where xyz coordinates can be saved in the converter box.

2D_Or_3D [ 2D | 3D ]

When using the fileconversion run mode, this parameter instructs the program to write the output structures in either 2D (sdf) or 3D (mol2) if the input is a SMILES file.

Molecule_Name_Field [ Before | After | none ]

When a SMILES file is used as input, this parameter tells the program whether the molecule name appears before or after the SMILES string, or is absent.
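
The three settings correspond to three possible line layouts in a SMILES file. The sketch below shows how such a line could be split. It assumes whitespace-separated fields and names without spaces, and `parse_smiles_line` is a hypothetical helper, not part of the program.

```python
def parse_smiles_line(line, name_field="After"):
    """Split one line of a SMILES file into (smiles, name).

    name_field mirrors the Molecule_Name_Field values: Before, After or none.
    """
    fields = line.split()
    if name_field == "none":
        return fields[0], None
    if name_field == "Before":
        return fields[1], fields[0]
    if name_field == "After":
        return fields[0], fields[1]
    raise ValueError(f"unknown Molecule_Name_Field value: {name_field}")


print(parse_smiles_line("CCO ethanol", "After"))   # ('CCO', 'ethanol')
print(parse_smiles_line("ethanol CCO", "Before"))  # ('CCO', 'ethanol')
print(parse_smiles_line("CCO", "none"))            # ('CCO', None)
```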

Splitting libraries of variable sizes

Input_Type [ sdf | mol2 ]

The format of the original library, which can be sdf or mol2.

Number_of_Files value

This parameter instructs the program to split the library into the number of files specified by the user.

Number_of_Mol_Per_File value

This parameter instructs the program to assign the user-defined number of molecules per file. The program will automatically detect the number of files it needs to create. NOTE: The keywords Number_of_Files and Number_of_Mol_Per_File are mutually exclusive – if one is chosen, the other one is automatically ignored.
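
To estimate how many files a given Number_of_Mol_Per_File value will produce, a ceiling division can be used. The sketch below assumes that the last file simply receives the remaining molecules, which is not stated in this documentation; files_needed is a hypothetical helper.

```python
import math


def files_needed(total_molecules, mol_per_file):
    """Number of output files for a fixed number of molecules per file.

    Assumption: the last file holds whatever molecules remain.
    """
    if mol_per_file < 1:
        raise ValueError("mol_per_file must be at least 1")
    return math.ceil(total_molecules / mol_per_file)


print(files_needed(10_500, 1_000))  # 11
```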

Type_of_Splitting [ sequential | randomized ]

This parameter instructs the program to split the original library in a sequential or randomized fashion. If sequential is chosen, the library will be split in the order in which the molecules appear in the file. If randomized is chosen, the library molecules will be randomly assigned to the split library files.
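
The difference between the two modes can be sketched in a few lines of Python. This is only an illustration of the idea, not the program's implementation: split_library, its arguments and the near-equal chunk sizes are assumptions.

```python
import random


def split_library(molecules, n_files, mode="sequential", seed=None):
    """Split a list of molecules into n_files chunks of near-equal size."""
    if n_files < 1:
        raise ValueError("n_files must be at least 1")
    items = list(molecules)
    if mode == "randomized":
        # Shuffle a copy so the input order is left untouched.
        random.Random(seed).shuffle(items)
    elif mode != "sequential":
        raise ValueError(f"unknown mode: {mode}")
    size, extra = divmod(len(items), n_files)
    chunks, start = [], 0
    for i in range(n_files):
        # The first 'extra' chunks receive one additional molecule.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


print(split_library(["m1", "m2", "m3", "m4", "m5"], 2))
print(split_library(["m1", "m2", "m3", "m4", "m5"], 2, mode="randomized", seed=0))
```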

Balance_Libraries [ yes | no ]

This parameter instructs the program to split the original library in a balanced fashion, by first calculating the molecular weight of each compound and then assigning compounds of varying molecular weights to the split library files.
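
One plausible way to obtain such a balanced split is to sort the compounds by molecular weight and deal them out in turn. The exact procedure used by the program is not described here, so the sketch below, including the helper balanced_split and the example weights, is an assumption for illustration only.

```python
def balanced_split(mol_weights, n_files):
    """Distribute molecules so that each file spans the molecular-weight range.

    mol_weights maps a molecule name to its molecular weight.
    Assumption: sort by weight, then assign round-robin to the files.
    """
    if n_files < 1:
        raise ValueError("n_files must be at least 1")
    ordered = sorted(mol_weights, key=mol_weights.get)
    files = [[] for _ in range(n_files)]
    for index, name in enumerate(ordered):
        files[index % n_files].append(name)
    return files


weights = {"m1": 180.2, "m2": 452.5, "m3": 310.4, "m4": 95.1}
print(balanced_split(weights, 2))  # [['m4', 'm3'], ['m1', 'm2']]
```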

POST-PROCESSING

Generic parameters

Main_Mode data_postprocessing

This parameter instructs FORECASTER to select a program. For POST-PROCESSING, use “data_postprocessing”.

Run_Mode [ data_postprocessing | auroc ]

Following the keyword, specify the run mode which can be “data_postprocessing” or “auroc”.

- With “data_postprocessing”, users can extract the “best” molecules from a large screening directly from the docking (FITTED) result files.
- With “auroc”, users can evaluate the accuracy of the docking program in retrospective studies. AUROC values may be computed and curves plotted.

Specific parameters

Result_Files number_of_files
file_name_#1 [ actives | inactives ]
file_name_#2 [ actives | inactives ]
…

This parameter enables users to indicate how many docking result files are to be analyzed, followed by the name of each file. If used in conjunction with Compute_AUROC (see below), use the actives/inactives designation (each file should contain either actives or inactives but not a mix).

Results_Mode [ covalent | non-covalent | cyp_inhibition | cyp_induction ]

This parameter informs the data processing algorithm that docking was carried out with this mode.

Macromolecule [ metalloprotein | protein | RNA | DNA ]

This parameter informs the data processing algorithm that docking was carried out with this type of macromolecule.

Print_Energy_Full atom1 atom2 distance

This parameter informs the data processing algorithm that docking was carried out with this level of output.

Sort_By_Column_Number column_number

This parameter instructs the program that this column is to be used to sort the molecules. This parameter cannot be used with Sort_By_Column_Heading.

Sort_By_Column_Heading column_heading

This parameter instructs the program that the column with this heading is to be used to sort the molecules. This parameter cannot be used with Sort_By_Column_Number.

Top_Compounds value

This parameter instructs the program to output this many “best” molecules.
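
The combined effect of a sort column and Top_Compounds can be pictured as a sort followed by a cut-off. The Python sketch below is a simplified illustration: the row layout, the 1-based column numbering and the convention that the lowest value is the best are all assumptions, and top_compounds is a hypothetical helper.

```python
def top_compounds(rows, column_number, top_n):
    """Return the top_n rows sorted by a numeric column.

    Assumptions: column_number is 1-based and lower values rank first.
    """
    ranked = sorted(rows, key=lambda row: float(row[column_number - 1]))
    return ranked[:top_n]


rows = [
    ["ligand_A", "-7.2"],
    ["ligand_B", "-9.1"],
    ["ligand_C", "-5.4"],
]
print(top_compounds(rows, column_number=2, top_n=2))
# [['ligand_B', '-9.1'], ['ligand_A', '-7.2']]
```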

Compute_AUROC [ Y | N ]

This parameter instructs the program to compute AUROC values.

Print_AUROC [ Y | N ]

Used in conjunction with Compute_AUROC, this parameter instructs the program to plot AUROC curves.
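
For readers unfamiliar with AUROC, the value can be understood as the probability that a randomly chosen active is ranked ahead of a randomly chosen inactive. The sketch below computes it directly from two lists of scores; it is a generic illustration rather than the program's own code, and it assumes that lower (more negative) scores are better. The function name auroc and the example scores are hypothetical.

```python
def auroc(active_scores, inactive_scores):
    """AUROC as the fraction of active/inactive pairs ranked correctly.

    Assumption: lower (more negative) scores are better; ties count as 0.5.
    """
    if not active_scores or not inactive_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for active in active_scores:
        for inactive in inactive_scores:
            if active < inactive:
                wins += 1.0
            elif active == inactive:
                wins += 0.5
    return wins / (len(active_scores) * len(inactive_scores))


# Five of the six active/inactive pairs are ranked correctly here.
print(auroc([-9.1, -8.0, -6.5], [-7.0, -5.2]))
```

A value of 1.0 means every active outranks every inactive, while 0.5 corresponds to a random ranking.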

Molecule_Name_Field [ Before | After | none ]

SMILES files may include the molecule name, which may be given before or after the SMILES string. The default is none.
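
The three settings can be illustrated with a small parser. This is a sketch only: it assumes that the name and the SMILES string are separated by whitespace on a single line, which is a common convention but is not specified here, and parse_smiles_line is a hypothetical helper.

```python
def parse_smiles_line(line, name_field="none"):
    """Split one line of a SMILES file into (smiles, name).

    Assumption: the name and the SMILES string are separated by whitespace.
    """
    tokens = line.split()
    if not tokens:
        raise ValueError("empty line")
    mode = name_field.lower()
    if mode == "none":
        return tokens[0], None
    if mode == "before":
        # Name first, SMILES string last.
        return tokens[-1], " ".join(tokens[:-1])
    if mode == "after":
        # SMILES string first, name afterwards.
        return tokens[0], " ".join(tokens[1:])
    raise ValueError(f"unknown Molecule_Name_Field value: {name_field}")


print(parse_smiles_line("CCO ethanol", "After"))   # ('CCO', 'ethanol')
print(parse_smiles_line("ethanol CCO", "Before"))  # ('CCO', 'ethanol')
print(parse_smiles_line("CCO"))                    # ('CCO', None)
```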

CREATE

Generic parameters

Main_Mode tbd

This parameter instructs FORECASTER to select CREATE as the program.

Run_Mode tbd

Following the keyword, specify the run mode.

DIVERSE

Generic parameters

Main_Mode diverse

This parameter instructs FORECASTER to select DIVERSE as the program.

Run_Mode diverse

Following the keyword, specify the run mode.

For additional parameters, consult the FINDERS parameters.
