K-talysticFlow (KAST) Documentation
K-atalystic Automated Screening Taskflow: Automated Deep Learning for Molecular Bioactivity Prediction
What is KAST?
K-talysticFlow (KAST) is an open-source pipeline that democratizes the use of deep learning for molecular bioactivity prediction in drug discovery and virtual screening workflows. KAST was developed at the Laboratory of Molecular Modeling (LMM-UEFS) to provide researchers with a reproducible, end-to-end solution, from data preparation to prediction, without requiring deep expertise in machine learning infrastructure.
The pipeline is built on DeepChem and TensorFlow, using Morgan/ECFP fingerprints as molecular descriptors and a MultitaskClassifier neural network trained from scratch on user-provided bioactivity data.
What can you use KAST for? Here are some examples:
Predict the bioactivity of small drug-like molecules against a biological target
Rank large compound libraries by predicted probability of activity
Train a custom deep learning model using your own active/inactive dataset
Evaluate model quality with ROC-AUC, enrichment factor, and cross-validation
Export ranked candidate lists for downstream experimental validation
KAST is a machine learning training and inference tool: it learns from your data and builds a target-specific model. It does not ship with pre-trained models for arbitrary targets.
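The ranking and export use cases listed above can be illustrated with a minimal sketch in plain Python. It is not KAST's own code: the function names, the CSV column names, and the example probabilities are hypothetical, and the probabilities are assumed to have been produced already by a trained model.

```python
import csv
import io


def rank_by_probability(predictions):
    """Sort (smiles, probability) pairs from most to least likely active."""
    return sorted(predictions, key=lambda pair: pair[1], reverse=True)


def export_ranked_csv(ranked, handle):
    """Write ranked predictions as CSV rows: rank, smiles, probability."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["rank", "smiles", "probability"])
    for rank, (smiles, prob) in enumerate(ranked, start=1):
        writer.writerow([rank, smiles, f"{prob:.3f}"])


# Hypothetical model outputs: SMILES string and predicted probability of activity.
predictions = [
    ("CCO", 0.12),
    ("c1ccccc1O", 0.87),
    ("CC(=O)Oc1ccccc1C(=O)O", 0.55),
]

ranked = rank_by_probability(predictions)

# An in-memory buffer stands in for a real output file in this sketch.
buffer = io.StringIO()
export_ranked_csv(ranked, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

In practice the buffer would be replaced by a file opened with `open(path, "w", newline="")`, and the top of the ranked list would be passed on for experimental validation.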
Quick Start
The fastest way to get started is to set up the Conda environment and launch the interactive menu:
conda env create -f environment.yml
conda activate ktalysticflow
python main.py
Then follow the step-by-step pipeline:
[1] Prepare Data: Clean and organize your SMILES dataset
[2] Featurize: Generate Morgan/ECFP fingerprints
[3] Train Model: Build your deep learning model from scratch
[4] Evaluate: ROC-AUC, cross-validation, enrichment factor
[5] Predict: Screen new molecules and export ranked results
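To make the metrics named in the Evaluate step concrete, here is a small plain-Python sketch of ROC-AUC and enrichment factor. It assumes binary labels (1 = active, 0 = inactive) and uses common textbook definitions; the exact definitions, cutoffs, and function names inside KAST may differ, and the labels and scores below are made up for illustration.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random active outscores a random
    inactive, counting ties as one half (pairwise, rank-based form)."""
    actives = [s for l, s in zip(labels, scores) if l == 1]
    inactives = [s for l, s in zip(labels, scores) if l == 0]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive")
    wins = 0.0
    for a in actives:
        for i in inactives:
            if a > i:
                wins += 1.0
            elif a == i:
                wins += 0.5
    return wins / (len(actives) * len(inactives))


def enrichment_factor(labels, scores, fraction):
    """Hit rate in the top-scoring `fraction` of the library divided by
    the hit rate of the whole library."""
    total_actives = sum(labels)
    if total_actives == 0:
        raise ValueError("need at least one active")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n_top = max(1, int(round(fraction * len(ranked))))
    hits_top = sum(label for _, label in ranked[:n_top])
    return (hits_top / n_top) / (total_actives / len(labels))


# Hypothetical test set: 3 actives among 10 molecules, with model scores.
labels = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

auc = roc_auc(labels, scores)
ef20 = enrichment_factor(labels, scores, fraction=0.2)
print(f"ROC-AUC = {auc:.3f}, EF@20% = {ef20:.3f}")
```

An enrichment factor above 1 means the top of the ranked list contains proportionally more actives than the library as a whole, which is the property that matters most when only a small fraction of a screened library can be tested experimentally. The pairwise loop is quadratic and only meant for illustration; a library routine would be the better choice on large datasets.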
About
KAST is developed and maintained at the Laboratory of Molecular Modeling (LMM-UEFS) by Késsia Souza Santos. Contributions, issues, and suggestions are welcome via the GitHub repository.
Funding: This project was developed with support from CNPq (undergraduate research scholarship, PIBIC/IC) and is currently continued under a CAPES graduate research scholarship (MSc).