PDStoolkit (Process Data Science toolkit) is a Python package whose modules are designed to make the development of data-science solutions for process systems engineering (PSE) faster and easier. The modules can be used to build tools for process monitoring, modeling, fault diagnosis, forecasting, etc. They are built on classical ML methods such as PCA, PLS, ICA, and GMM, which are already available in Sklearn. Therefore, in the majority of cases, the modules in the PDStoolkit package extend the base Sklearn classes to provide additional functionality that enables easy development of the aforementioned tools.
Why was PDStoolkit developed?
Although the base Sklearn modules are convenient for creating basic applications, they lack several functionalities needed for building advanced tools for PSE. For example, the ICA class does not have any function that provides fault detection and diagnosis metrics. While it is not difficult to custom-code these additional functionalities, not everybody has the requisite (underlying mathematical) know-how or the interest in learning the required details. PDStoolkit has been developed to fill this gap for process data citizens and scientists.
What does PDStoolkit currently have?
The modules in the package currently are:
- PDS_PCA: Principal Component Analysis for Process Data Science
    - This class is a child of sklearn.decomposition.PCA class
    - The following additional methods are provided
        - computeMetrics: computes monitoring indices (Q / SPE, T2) for the supplied data
        - computeThresholds: computes thresholds / control limits for the monitoring indices from training data
        - draw_monitoring_charts: draws monitoring charts for the training or test data
        - detect_abnormalities: detects if the observations are abnormal or normal samples
        - get_contributions: returns abnormality contributions for T2 and SPE for an observation sample
- PDS_PLS: Partial Least Squares for Process Data Science
    - This class is a child of sklearn.cross_decomposition.PLSRegression class
    - The following additional methods are provided
        - computeMetrics: computes monitoring indices (SPEx, SPEy, T2) for the supplied data
        - computeThresholds: computes the thresholds / control limits for the monitoring indices from training data
        - draw_monitoring_charts: draws the monitoring charts for the training or test data
        - detect_abnormalities: detects if the observations are abnormal or normal samples
- PDS_DPCA: Dynamic Principal Component Analysis for Process Data Science
    - This class is a child of sklearn.decomposition.PCA class
    - The following additional methods are provided
        - computeMetrics: computes monitoring indices (Q / SPE, T2) for the supplied data
        - computeThresholds: computes thresholds / control limits for the monitoring indices from training data
        - draw_monitoring_charts: draws the monitoring charts for the training or test data
        - detect_abnormalities: detects if the observations are abnormal or normal samples
- PDS_DPLS: Dynamic Partial Least Squares for Process Data Science
    - This class is a child of sklearn.cross_decomposition.PLSRegression class
    - The following additional methods are provided
        - computeMetrics: computes monitoring indices (SPEx, SPEy, T2) for the supplied data
        - computeThresholds: computes thresholds / control limits for the monitoring indices from training data
        - draw_monitoring_charts: draws the monitoring charts for the training or test data
        - detect_abnormalities: detects if the observations are abnormal or normal samples
- PDS_CVA: Canonical Variate Analysis for Process Data Science
    - This class is written from scratch
    - The following additional methods are provided
        - computeMetrics: computes monitoring indices (Ts2, Te2, Q) for the supplied data
        - computeThresholds: computes thresholds / control limits for the monitoring indices from training data
        - draw_monitoring_charts: draws the monitoring charts for the training or test data
        - detect_abnormalities: detects if the observations are abnormal or normal samples
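To make the monitoring indices listed above concrete, here is a minimal sketch of how T2 and SPE (Q) can be computed on top of sklearn.decomposition.PCA. It is not PDStoolkit's implementation: the class name MonitoringPCA, the method compute_indices, the synthetic data, and the empirical 99th-percentile control limits are all assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA


class MonitoringPCA(PCA):
    """Hypothetical PCA subclass that adds T2 and SPE (Q) monitoring indices."""

    def compute_indices(self, X):
        X = np.asarray(X, dtype=float)
        scores = self.transform(X)  # projections onto the retained components
        # Hotelling's T2: squared scores scaled by the variance of each component
        t2 = np.sum(scores**2 / self.explained_variance_, axis=1)
        # SPE / Q: squared norm of the part of X that the model does not reconstruct
        residuals = X - self.inverse_transform(scores)
        spe = np.sum(residuals**2, axis=1)
        return t2, spe


# Synthetic "normal operation" data: 5 measured variables driven by 2 latent factors
rng = np.random.default_rng(0)
mixing = np.array([[1.0, 1.0, 1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0, 1.0, 1.0]])
X_train = rng.normal(size=(500, 2)) @ mixing + 0.1 * rng.normal(size=(500, 5))

model = MonitoringPCA(n_components=2).fit(X_train)
t2_train, spe_train = model.compute_indices(X_train)

# Assumed control limits: empirical 99th percentile of the training indices
t2_limit = np.percentile(t2_train, 99)
spe_limit = np.percentile(spe_train, 99)

# A sample whose first variable is shifted, breaking the learned correlation structure
x_fault = X_train[:1].copy()
x_fault[0, 0] += 5.0
t2_fault, spe_fault = model.compute_indices(x_fault)
is_abnormal = (t2_fault > t2_limit) | (spe_fault > spe_limit)
print("abnormal:", bool(is_abnormal[0]))
```

Subclassing keeps fit, transform, and inverse_transform from Sklearn unchanged, so the added method only needs the fitted attributes. Percentile limits are used here purely for simplicity; limits derived from statistical distributions are a common alternative.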
More modules will be added to the package over time. Our plan is to provide modules for all the methodologies introduced in our books ‘Machine Learning in Python for Process Systems Engineering’ and ‘Machine Learning in Python for Dynamic Process Systems’.
Sample Usage
For illustration, we will build a PCA-based process monitoring model. The schematic below summarizes the underlying methodology and is explained in detail in the book.
The code snippet below gives a quick overview of how to develop the monitoring model and perform fault diagnosis using the PDStoolkit package. For complete code and results, check out the Jupyter notebook here. The package can be installed via pip install PDStoolkit.
# import package
from PDStoolkit import PDS_PCA
# fit PDS_PCA model
pca = PDS_PCA()
pca.fit_4_monitoring(data_train_normal, autoFindNLatents=True, method='statistical', alpha=0.01)
# fault detection and fault diagnosis on test data
pca.detect_abnormalities(data_test_normal, title='test data')
T2_contri, SPE_contri = pca.get_contributions(data_test_normal)
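For readers curious about what per-variable contributions look like, the sketch below computes them with plain Sklearn and NumPy. It is an illustration under stated assumptions, not the code behind get_contributions: the helper names spe_contributions and t2_contributions and the synthetic data are hypothetical. The SPE contribution of a variable is taken as its squared residual, and the T2 decomposition shown is one of several definitions in use; its terms sum to T2 but can be negative for individual variables.

```python
import numpy as np
from sklearn.decomposition import PCA


def spe_contributions(model, x):
    """Squared residual of each variable; the values sum to the sample's SPE."""
    x = np.asarray(x, dtype=float).reshape(1, -1)
    residual = x - model.inverse_transform(model.transform(x))
    return residual.ravel() ** 2


def t2_contributions(model, x):
    """One possible T2 decomposition: c_j = xc_j * (D xc)_j with D = P diag(1/var) P^T."""
    xc = np.asarray(x, dtype=float).ravel() - model.mean_
    P = model.components_.T  # shape (n_features, n_components)
    D = P @ np.diag(1.0 / model.explained_variance_) @ P.T
    return xc * (D @ xc)


# Synthetic training data: 5 variables driven by 2 latent factors (assumed for the demo)
rng = np.random.default_rng(1)
mixing = np.array([[1.0, 1.0, 1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0, 1.0, 1.0]])
X_train = rng.normal(size=(400, 2)) @ mixing + 0.1 * rng.normal(size=(400, 5))
pca_sketch = PCA(n_components=2).fit(X_train)

# Shift the fourth variable (index 3) of one sample to imitate a sensor fault
x_fault = X_train[0].copy()
x_fault[3] += 5.0

spe_c = spe_contributions(pca_sketch, x_fault)
t2_c = t2_contributions(pca_sketch, x_fault)
print("largest SPE contribution at variable index:", int(np.argmax(spe_c)))
```

Note that part of the fault is smeared onto correlated variables, so contribution plots point toward likely culprits rather than proving a root cause.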
We hope that you find the package useful, and we look forward to your feedback and suggestions.