DeepChemFeatureGenerator
- class mastml.feature_generators.DeepChemFeatureGenerator(featurize_df, deepchem_featurizer, **kwargs)
Bases: BaseGenerator
Class that is used to create molecular property features using the DeepChem and RDKit packages. A list of featurizers is given here: https://github.com/deepchem/deepchem/blob/master/deepchem/feat/molecule_featurizers
Note that some featurizers require additional packages to be installed (a ModuleNotFoundError is raised if they are missing). For this feature generator, only featurizers that take SMILES strings as input will work. These include RDKitDescriptors, Mol2VecFingerprint, AtomicCoordinates, CircularFingerprint, PubChemFingerprint, etc.
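Because a missing optional package only surfaces as a ModuleNotFoundError at featurization time, it can help to check for the packages up front. The following is a minimal sketch using only the standard library. The mapping from featurizer names to module names is an assumption made for illustration (it is not taken from this page), so check the DeepChem documentation for the actual requirements of each featurizer.

```python
from importlib.util import find_spec


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if find_spec(name) is None]


# ASSUMPTION: illustrative mapping of featurizers to optional packages.
# These module names are not stated on this page; verify them against DeepChem.
ASSUMED_REQUIREMENTS = {
    "RDKitDescriptors": ["rdkit"],
    "Mol2VecFingerprint": ["gensim", "mol2vec"],
    "PubChemFingerprint": ["pubchempy"],
}

for featurizer, modules in ASSUMED_REQUIREMENTS.items():
    missing = missing_modules(modules)
    status = "ok" if not missing else "missing: " + ", ".join(missing)
    print(f"{featurizer}: {status}")
```

`missing_modules` is a hypothetical helper, not part of MAST-ML or DeepChem. It only reports whether a module can be located, not whether its version is compatible.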
- Args:
featurize_df: (pd.DataFrame), dataframe containing a column of SMILES strings from which to generate molecular features
deepchem_featurizer: (string), name of the DeepChem featurizer to use. See https://github.com/deepchem/deepchem/blob/master/deepchem/feat/__init__.py for options. Use “RDKitDescriptors” to make molecular features based on the feature generation scheme in the RDKit package.
- **kwargs: additional keyword arguments to pass to the featurizer (see the docs at the GitHub link above). For example, for RDKitDescriptors, options include:
- use_fragment: bool, optional (default True)
If True, the return value includes the fragment binary descriptors like ‘fr_XXX’.
- ipc_avg: bool, optional (default True)
If True, the IPC descriptor is calculated with the avg=True option. Please see this issue: https://github.com/rdkit/rdkit/issues/1527.
- is_normalized: bool, optional (default False)
If True, the return value contains normalized features.
- use_bcut2d: bool, optional (default True)
If True, the return value includes the descriptors like ‘BCUT2D_XXX’.
- labels_only: bool, optional (default False)
Returns only the presence or absence of a group.
- If both labels_only and is_normalized are True, then is_normalized takes
precedence and labels_only will not be applied.
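The defaults and the precedence rule listed above can be summarized in a few lines of plain Python. This is only a sketch of the documented behaviour: `effective_rdkit_options` is a hypothetical helper written for illustration and is not part of MAST-ML, DeepChem, or RDKit.

```python
# Defaults as listed above for the RDKitDescriptors featurizer.
RDKIT_DEFAULTS = {
    "use_fragment": True,
    "ipc_avg": True,
    "is_normalized": False,
    "use_bcut2d": True,
    "labels_only": False,
}


def effective_rdkit_options(**kwargs):
    """Merge user kwargs with the documented defaults (hypothetical helper)."""
    unknown = set(kwargs) - set(RDKIT_DEFAULTS)
    if unknown:
        raise TypeError(f"unexpected options: {sorted(unknown)}")
    options = {**RDKIT_DEFAULTS, **kwargs}
    # Documented rule: if both flags are True, is_normalized takes precedence
    # and labels_only is not applied.
    if options["is_normalized"] and options["labels_only"]:
        options["labels_only"] = False
    return options


print(effective_rdkit_options(is_normalized=True, labels_only=True))
```

In practice these options would simply be passed as keyword arguments to the generator, which forwards them to the featurizer. The helper above only makes the interaction between the two flags explicit.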
- Methods:
- fit: pass through, copies input columns as pre-generated features
- Args:
X: (pd.DataFrame), input dataframe containing X data
y: (pd.Series), series containing y data
- transform: generate the molecular feature matrix from SMILES strings
- Args:
None.
- Returns:
X: (dataframe), output dataframe containing generated features
y: (series), output y data as series
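Putting the constructor arguments and the two methods together, a call sequence might look like the sketch below. It assumes that MAST-ML, DeepChem, RDKit, and pandas are installed, that a single-column dataframe is an acceptable `featurize_df` (the column name `smiles` is an arbitrary choice made here), and that `fit` accepts `y=None`. None of these details are confirmed by this page. The wrapper function `featurize_smiles` is hypothetical.

```python
# Example molecules: ethanol, benzene, acetic acid.
SMILES = ["CCO", "c1ccccc1", "CC(=O)O"]


def featurize_smiles(smiles, featurizer="RDKitDescriptors", **kwargs):
    """Hypothetical wrapper around DeepChemFeatureGenerator (sketch only)."""
    # Third-party imports are local so this module loads without them installed.
    import pandas as pd
    from mastml.feature_generators import DeepChemFeatureGenerator

    # ASSUMPTION: one column of SMILES strings; the column name is arbitrary.
    featurize_df = pd.DataFrame({"smiles": list(smiles)})

    generator = DeepChemFeatureGenerator(
        featurize_df=featurize_df,
        deepchem_featurizer=featurizer,
        **kwargs,  # forwarded to the featurizer, e.g. use_fragment=False
    )
    generator.fit(X=featurize_df, y=None)      # documented as a pass-through
    X_features, y_out = generator.transform()  # documented to return X and y
    return X_features, y_out
```

Calling `featurize_smiles(SMILES, use_fragment=False)` would then be expected to return a dataframe of generated features along with the y data given to `fit`.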
Methods Summary
fit([X, y])
transform([X])
Methods Documentation