Understanding Outputs

How to read and interpret KAST results.

Output Folder Structure

All results are saved inside your active workspace folder (workspaces/<your_workspace>/):

workspaces/<your_workspace>/
├── prepared_data/
│   └── invalid_smiles_report.txt        # Audit report of rejected SMILES
├── 01_train_set.csv                     # Training data
├── 01_test_set.csv                      # Test data
├── trained_model/                       # Trained neural network files
├── 4_0_evaluation_report.txt           # Main metrics
├── 4_1_cross_validation_results.txt    # Cross-val scores
├── 4_2_enrichment_factor_results.txt   # Enrichment analysis
├── 4_3_tanimoto_similarity_results.txt # Similarity analysis
├── 4_4_learning_curve_results.txt      # Learning curve data
├── <custom_filename>.csv               # Prediction results (default: 05_new_molecule_predictions.csv)
├── 4_0_roc_curve.png
├── 4_4_learning_curve.png
├── 4_2_enrichment_curve.png
├── 4_3_tanimoto_similarity_histogram.png
└── logs/
    └── kast_20251028.log               # Detailed execution log

CSV Files

`01_train_set.csv` & `01_test_set.csv`

Columns:

SMILES — Molecular structure
Label — 1 (active) or 0 (inactive)
Name — Compound name (if provided)

Example:

SMILES,Label,Name
CC(C)Cc1ccc(cc1)C(C)C(O)=O,1,ibuprofen
CN1C=NC2=C1C(=O)N(C(=O)N2C)C,1,caffeine
CCCCCCCCCCCCCCCC,0,hexadecane

`<custom_filename>.csv` (Default: `05_new_molecule_predictions.csv`)

Columns:

SMILES — Molecular structure
K-Score — Prediction score (0.0-1.0)
Predicted_Class — “Active” or “Inactive”

Example:

SMILES,K-Score,Predicted_Class
CCc1ccccc1O,0.94,Active
Cc1ccccc1C,0.92,Active
CCCc1ccccc1O,0.87,Active
Cc1ccc(cc1)C,0.45,Inactive
CCc1ccccc1,0.22,Inactive

Interpretation:

K-Score 0.9-1.0 → Very likely active
K-Score 0.7-0.9 → Likely active
K-Score 0.5-0.7 → Uncertain
K-Score 0.0-0.5 → Likely inactive

The K-Score is the predicted probability of the active class, P(active). In virtual screening workflows, probability-based scores are used mainly for ranking and prioritization rather than as absolute estimates, because discriminative power generally matters more than probability calibration for hit selection (Truchon & Bayly, 2007).
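
The interpretation bands above can be expressed as a small lookup. This is a minimal sketch, not KAST code: the function name `k_score_band` is hypothetical, and assigning each boundary value (0.5, 0.7, 0.9) to the higher band is an assumption, since the listed ranges share their endpoints.

```python
def k_score_band(score: float) -> str:
    """Map a K-Score to the interpretation bands listed above.

    Assumption: boundary values (0.5, 0.7, 0.9) fall in the higher band.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"K-Score must be in [0, 1], got {score}")
    if score >= 0.9:
        return "Very likely active"
    if score >= 0.7:
        return "Likely active"
    if score >= 0.5:
        return "Uncertain"
    return "Likely inactive"


# Scores taken from the example predictions file above
for s in (0.94, 0.87, 0.45, 0.22):
    print(f"{s:.2f} -> {k_score_band(s)}")
```

Because the score is mainly a ranking signal, the bands are best read as a rough guide for triage rather than calibrated probabilities.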

Metrics Files

`4_0_evaluation_report.txt`

Main evaluation metrics on the test set:

ROC-AUC Score: 0.87
Accuracy: 0.85
Sensitivity (Recall): 0.82
Specificity: 0.88
Precision: 0.86
F1-Score: 0.84

Interpretation:

Metric      | What It Means                                       | Good Value
ROC-AUC     | Ranking quality; 0.5 = random, 1.0 = perfect        | > 0.8
Accuracy    | % of all predictions that are correct               | > 80%
Sensitivity | % of true actives that are found                    | > 80%
Specificity | % of true inactives that are correctly rejected     | > 80%
Precision   | % of predicted actives that are truly active        | > 80%
F1-Score    | Harmonic mean of precision and sensitivity (recall) | > 0.8

The ROC-AUC is the recommended primary metric for evaluating binary classifiers in bioactivity prediction, as it is threshold-independent and robust to class imbalance (Hanley & McNeil, 1982). For imbalanced chemical datasets, F1-Score and Sensitivity are particularly important as complementary metrics (Jiang et al., 2025).
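
To make the definitions concrete, the threshold-based metrics in the report can all be derived from the four confusion-matrix counts. The sketch below uses the standard formulas; the function name `classification_metrics` and the counts are hypothetical and are not taken from a real KAST run.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Derive threshold-based metrics from confusion-matrix counts.

    tp/fn: actives predicted active/inactive
    tn/fp: inactives predicted inactive/active
    Note: no guard against empty classes (division by zero).
    """
    sensitivity = tp / (tp + fn)          # share of true actives found
    specificity = tn / (tn + fp)          # share of true inactives rejected
    precision = tp / (tp + fp)            # share of predicted actives that are real
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts for a 200-molecule test set (illustrative only)
metrics = classification_metrics(tp=82, fp=12, tn=88, fn=18)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike ROC-AUC, every value here depends on the classification threshold, so the numbers change if the cutoff on the K-Score changes.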

`4_1_cross_validation_results.txt`

5-fold cross-validation scores:

Fold 1: AUC=0.85, Accuracy=0.84
Fold 2: AUC=0.86, Accuracy=0.85
Fold 3: AUC=0.87, Accuracy=0.86
Fold 4: AUC=0.88, Accuracy=0.87
Fold 5: AUC=0.86, Accuracy=0.85

Mean AUC: 0.864 ± 0.011
Mean Accuracy: 0.854 ± 0.011

Interpretation:

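Low variation (± ~0.01) → Model is stable across folds
High variation (± ~0.1) → Model is unstable; performance depends strongly on the split
Large gap between CV score and test score → Possible overfitting or an unrepresentative split; inspect the data before trusting either number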

Cross-validation provides a less biased estimate of generalization performance than a single train/test split. In QSAR modeling, k-fold cross-validation is considered essential to assess model robustness and detect overfitting (Tropsha, 2010).
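
The summary lines in the report can be recomputed from the per-fold scores. The sketch below uses the fold values from the example above; whether KAST reports the sample or the population standard deviation is not stated here, so using the sample standard deviation (`statistics.stdev`) is an assumption. It does reproduce the ± 0.011 shown in the example.

```python
from statistics import mean, stdev

# Per-fold scores copied from the example report above
fold_auc = [0.85, 0.86, 0.87, 0.88, 0.86]
fold_acc = [0.84, 0.85, 0.86, 0.87, 0.85]


def summarize(scores):
    """Return (mean, sample standard deviation) of the fold scores."""
    return mean(scores), stdev(scores)


for name, scores in (("AUC", fold_auc), ("Accuracy", fold_acc)):
    m, s = summarize(scores)
    print(f"Mean {name}: {m:.3f} ± {s:.3f}")
```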

`4_2_enrichment_factor_results.txt`

How much better than random screening:

Enrichment Factor at 10%: 3.2x
Enrichment Factor at 20%: 2.1x
Enrichment Factor at 50%: 1.5x

Interpretation:

EF = 3.2x at 10% → The top-scoring 10% of molecules contains 3.2x as many actives as a random 10% sample
Higher EF → Better virtual screening tool

The Enrichment Factor (EF) at a given percentage quantifies the ability of a model to concentrate actives in the top-ranked fraction of a screened library relative to random selection. It is one of the most widely used metrics to evaluate practical virtual screening performance (Truchon & Bayly, 2007).
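
As an illustration of the definition, the enrichment factor is the hit rate in the top-ranked fraction divided by the hit rate in the whole library. The sketch below is not the KAST implementation: the function name `enrichment_factor`, the rounding of the cutoff, and the toy data are all assumptions.

```python
def enrichment_factor(scores, labels, fraction):
    """EF = (actives in top fraction / size of top fraction) / (all actives / N).

    scores: predicted scores, higher = more likely active
    labels: 1 for active, 0 for inactive
    fraction: screened fraction, e.g. 0.10 for the top 10%
    """
    n = len(scores)
    total_actives = sum(labels)
    if n == 0 or total_actives == 0:
        raise ValueError("Need at least one molecule and one active")
    # Assumption: round the cutoff to the nearest molecule, minimum 1
    n_top = max(1, round(n * fraction))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    actives_in_top = sum(label for _, label in ranked[:n_top])
    return (actives_in_top / n_top) / (total_actives / n)


# Toy library of 10 molecules with 3 actives (illustrative only)
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(enrichment_factor(scores, labels, 0.2))
```

The maximum possible EF at a fraction f is 1/f (10x at 10%, 2x at 50%), so values at different percentages are not directly comparable.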

`4_3_tanimoto_similarity_results.txt`

Molecular diversity metrics:

=== DESCRIPTIVE STATISTICS (TEST SET) ===
Mean similarity: 0.45
Standard deviation: 0.12

=== DESCRIPTIVE STATISTICS (TRAIN SET INTERNAL) ===
Mean similarity: 0.65
Standard deviation: 0.08

Interpretation:

Test Mean ≈ Train Mean → Test set is well within the chemical space of the training set.
Test Mean << Train Mean → Test set is exploring new chemical space.

Molecular similarity is computed using the Tanimoto coefficient over binary molecular fingerprints. Comparing the internal training similarity to the test-to-training similarity is a common way to assess whether both populations occupy the same chemical space.
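
For binary fingerprints, the Tanimoto coefficient is the number of bits set in both fingerprints divided by the number of bits set in either. The sketch below represents each fingerprint as a set of on-bit indices; the fingerprints are hypothetical, and returning 0.0 for two empty fingerprints is a convention chosen here. How KAST aggregates pairwise values into the reported mean is not shown.

```python
def tanimoto(bits_a: set, bits_b: set) -> float:
    """Tanimoto coefficient |A ∩ B| / |A ∪ B| for two sets of on-bit indices."""
    union = bits_a | bits_b
    if not union:
        return 0.0  # assumption: two empty fingerprints count as dissimilar
    return len(bits_a & bits_b) / len(union)


# Hypothetical fingerprints (indices of bits that are set)
fp_a = {1, 4, 7, 9}
fp_b = {1, 4, 8, 9, 12}
print(tanimoto(fp_a, fp_b))  # 3 shared bits out of 6 distinct bits
```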

`4_4_learning_curve_results.txt`

Model improvement with more data, using 5-fold cross-validation:

Training Size | Train AUC         | Val AUC
              | 0.7011 ± 0.0520   | 0.6800 ± 0.0610
              | 0.7832 ± 0.0310   | 0.7610 ± 0.0401
              | 0.8214 ± 0.0210   | 0.8123 ± 0.0305
              | 0.8500 ± 0.0150   | 0.8410 ± 0.0200
              | 0.8805 ± 0.0050   | 0.8715 ± 0.0080

Interpretation:

Val AUC increasing → Model improves with more data
Val AUC plateauing → More data won’t help much
High Variance (±) → Model is unstable at that dataset size

Learning curves are a standard diagnostic tool in machine learning to evaluate whether a model would benefit from additional training data or requires architectural changes (Ramsundar et al., 2019).
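
The two questions a learning curve answers (is validation AUC still rising, and how far apart are the train and validation curves) can be checked numerically. The sketch below uses the mean AUC values from the example table; the function name `learning_curve_summary` and the `plateau_tol` threshold of 0.01 are hypothetical choices, not KAST settings.

```python
# Mean AUC per training size, copied from the example table above
train_auc = [0.7011, 0.7832, 0.8214, 0.8500, 0.8805]
val_auc = [0.6800, 0.7610, 0.8123, 0.8410, 0.8715]


def learning_curve_summary(train, val, plateau_tol=0.01):
    """Summarize the end of a learning curve.

    plateau_tol: hypothetical cutoff; a last-step validation gain at or
    below it is treated as a plateau.
    """
    last_gain = val[-1] - val[-2]        # improvement from the final size step
    train_val_gap = train[-1] - val[-1]  # remaining gap between the curves
    return {
        "last_gain": last_gain,
        "train_val_gap": train_val_gap,
        "still_improving": last_gain > plateau_tol,
    }


summary = learning_curve_summary(train_auc, val_auc)
print(summary)
```

For the example values, the last step still adds about 0.03 validation AUC, which suggests the model has not yet plateaued.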

Plots

`4_0_roc_curve.png`

ROC (Receiver Operating Characteristic) curve showing model discrimination ability.

Interpretation:

Curve closer to top-left → Better model
Diagonal line → Random classifier
Area under curve (AUC) > 0.8 → Good
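
The area under the ROC curve equals the probability that a randomly chosen active receives a higher score than a randomly chosen inactive. The sketch below computes it directly from that definition by comparing every active/inactive pair, counting ties as half. It is meant for illustration on small data; the function name `roc_auc` and the toy scores are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (active, inactive) pairs ranked correctly.

    Ties count as 0.5. Runs in O(n_actives * n_inactives) time.
    """
    actives = [s for s, y in zip(scores, labels) if y == 1]
    inactives = [s for s, y in zip(scores, labels) if y == 0]
    if not actives or not inactives:
        raise ValueError("AUC needs at least one active and one inactive")
    wins = 0.0
    for a in actives:
        for i in inactives:
            if a > i:
                wins += 1.0
            elif a == i:
                wins += 0.5
    return wins / (len(actives) * len(inactives))


# Toy example: one active (0.3) is ranked below one inactive (0.8)
print(roc_auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))
```

A value of 0.5 corresponds to the diagonal line of a random classifier, and 1.0 to a curve that passes through the top-left corner.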

`4_4_learning_curve.png`

How AUC changes as the training set grows, with shaded bands showing the standard deviation across the 5 cross-validation folds.

Interpretation:

Curves converging → Model has learned most of the patterns available in the data
Persistent gap between curves → More data would likely help
Narrow shaded bands → Consistent, stable results across folds

`4_2_enrichment_curve.png`

Virtual screening performance across different screening percentages.

Interpretation:

Steep initial slope → Model ranks actives early, which is what virtual screening needs

`4_3_tanimoto_similarity_histogram.png`

Overlaid histograms of Tanimoto similarities for the training and test sets.

Interpretation:

Blue Distribution (Train): Internal similarity of the training set.
Red Distribution (Test): Similarity of the test set to the training set.
High Overlap: Test set molecules are highly similar to training molecules.
Shift to the Left: Test set molecules are structurally novel compared to the training set.

Log File

`logs/kast_YYYYMMDD.log`

Detailed execution log with timestamps and debug info.

Check the log if:

Something fails
You need execution details
You are debugging an issue

Quality Assessment

Good Results

AUC > 0.85
Accuracy > 85%
CV stability ± < 0.05
Learning curve converges
Clear enrichment factor (> 2x at 10%)

Acceptable Results

AUC 0.75-0.85
Accuracy 75-85%
CV stability ± 0.05-0.10
Model still learning with more data

Poor Results

AUC < 0.70
Accuracy < 70%
High CV variation (± > 0.15)
Enrichment factor < 1.5x
Check: data quality, class balance, duplicate molecules
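
Two of these checks, class balance and duplicate molecules, are easy to run on the training CSV. The sketch below is a starting point under stated assumptions: the function name `audit_training_rows` is hypothetical, the inline CSV stands in for `01_train_set.csv`, and duplicates are detected by exact string match only.

```python
import csv
import io
from collections import Counter

# Stand-in for 01_train_set.csv; rows are illustrative
sample_csv = """SMILES,Label,Name
CC(C)Cc1ccc(cc1)C(C)C(O)=O,1,ibuprofen
CN1C=NC2=C1C(=O)N(C(=O)N2C)C,1,caffeine
CCCCCCCCCCCCCCCC,0,hexadecane
CCCCCCCCCCCCCCCC,0,hexadecane_copy
"""


def audit_training_rows(rows):
    """Return (label counts, sorted list of SMILES that appear more than once)."""
    rows = list(rows)
    label_counts = Counter(row["Label"] for row in rows)
    smiles_counts = Counter(row["SMILES"] for row in rows)
    duplicates = sorted(s for s, n in smiles_counts.items() if n > 1)
    return label_counts, duplicates


label_counts, duplicates = audit_training_rows(
    csv.DictReader(io.StringIO(sample_csv))
)
print("Label counts:", dict(label_counts))
print("Duplicate SMILES:", duplicates)
```

To audit a real workspace, pass `csv.DictReader(open(path, newline=""))` instead of the inline sample. The same molecule can be written as different SMILES strings, so an exact-match check can miss duplicates unless the SMILES are canonicalized first with a cheminformatics toolkit.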

Exporting for Publication

CSV Export

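# Prediction results are already in CSV format.
# Open them in Excel, or load them in Python:
import pandas as pd

results = pd.read_csv('workspaces/<your_workspace>/<custom_filename>.csv')
# Rank by K-Score so the first rows are the most confident predicted actives
top_100 = results.sort_values('K-Score', ascending=False).head(100)
# index=False keeps the pandas row index out of the exported file
top_100.to_csv('top_100_predicted_actives.csv', index=False)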

Plot Export

Plots are saved automatically as high-resolution PNG files suitable for publication.

Report Generation

# Combine the evaluation and cross-validation reports into one file
cat workspaces/<your_workspace>/4_0_evaluation_report.txt \
    workspaces/<your_workspace>/4_1_cross_validation_results.txt \
    > publication_report.txt
