Troubleshooting

Common issues and solutions.

Installation Issues

Setup fails with “Conda not found”

Cause: Anaconda/Miniconda not installed or not in PATH

Solution:

Install Anaconda: anaconda.com/download
Make sure to check “Add Anaconda to PATH” during installation
Restart computer
Try setup again
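After the restart, you can confirm that Conda is actually visible on PATH before re-running setup. The following is a minimal sketch using only the Python standard library; the function name is illustrative and is not part of KAST.

```python
import shutil


def find_conda_on_path():
    """Return the full path of the conda executable on PATH, or None."""
    return shutil.which("conda")


if __name__ == "__main__":
    location = find_conda_on_path()
    if location is None:
        print("conda is not on PATH; reinstall it or add it to PATH.")
    else:
        print(f"conda found at: {location}")
```

If this prints a path but setup still fails, the problem is probably not PATH-related.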

setup.exe won’t run (Windows)

Cause: Blocked by Windows Defender or permissions

Solutions:

Try as Administrator:
- Right-click setup.exe → “Run as administrator”
Unblock file:
- Right-click setup.exe → Properties
- Check “Unblock” at bottom → Apply → OK
Verify location:
- Ensure setup.exe is in same folder as environment.yml
- Move to a simple path (avoid spaces): C:\KAST\

Use manual setup:

cd path\to\KAST
conda env create -f environment.yml -y

setup.sh won’t run (Linux)

Cause: Script not executable

Solution:

chmod +x setup.sh
./setup.sh

“Cannot find conda.exe” during setup

Cause: Conda not in standard location

Solution:

Find Conda:

where conda    # Windows
which conda    # Linux/Mac

If not found, manually specify path in setup or use manual setup
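If neither command finds Conda, it may still be installed outside PATH. The sketch below looks in a few folders under a base directory; the folder names (`miniconda3`, `anaconda3`) are assumed common defaults, not locations that KAST's setup is known to check, so adjust them to your machine.

```python
from pathlib import Path

# Assumed default install folder names; edit to match your installation.
CANDIDATE_DIRS = ["miniconda3", "anaconda3"]


def find_conda_installs(base_dirs):
    """Return conda executables found under the given base directories."""
    found = []
    for base in base_dirs:
        for name in CANDIDATE_DIRS:
            root = Path(base) / name
            # Windows keeps conda.exe in Scripts; Linux/Mac keep conda in bin.
            for relative in ("Scripts/conda.exe", "bin/conda"):
                candidate = root / relative
                if candidate.is_file() and candidate not in found:
                    found.append(candidate)
    return found


if __name__ == "__main__":
    # Search the current user's home directory only.
    for path in find_conda_installs([Path.home()]):
        print(path)
```

Any path it prints is a candidate to supply when specifying the Conda location manually.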

Dependency Issues

“ImportError: No module named ‘tensorflow’”

Cause: Dependencies not installed or environment not activated

Solution (Windows):

Click desktop shortcut “K-talysticFlow 1.0.0” (handles activation)

Or activate manually:

conda activate ktalysticflow
python bin/check_env.py

Solution (Linux):

conda activate ktalysticflow
python bin/check_env.py

If it still fails, recreate the environment:

python bin/check_env.py  # See which packages are missing
conda env remove -n ktalysticflow -y
conda env create -f environment.yml -y
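For a quick check that the activated environment can see the two modules named in these errors, something like the following may help. It is a simplified sketch, not a substitute for bin/check_env.py (whose checks are not shown here), and the function name is hypothetical.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Only the two modules mentioned in this section; extend as needed.
    print("Missing:", missing_modules(["tensorflow", "rdkit"]) or "none")
```

Run it with the `ktalysticflow` environment activated; a module listed as missing there points to an incomplete environment rather than an activation problem.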

“ImportError: No module named ‘rdkit’”

Cause: RDKit not installed (common on some systems)

Solution:

conda activate ktalysticflow
conda install -c conda-forge rdkit

“Cannot allocate memory” during featurization

Cause: Dataset too large or batch size too large

Solutions:

Reduce batch size in settings.py:

PARALLEL_BATCH_SIZE = 25000  # Reduce from 100000

Use fewer workers in settings.py:
```
N_WORKERS = 2  # Reduce from auto
```
Disable parallel processing:
```
ENABLE_PARALLEL_PROCESSING = False
```
Use subset of data:
- Test with first 10K molecules
- Check if it’s a data quality issue
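To test with the first 10K molecules, one option is to copy the first lines of the input file into a smaller file. This is a minimal sketch assuming one molecule per line; the function and any output file name you choose are illustrative, not KAST features.

```python
from itertools import islice


def write_subset(src_path, dst_path, n_lines=10_000):
    """Copy the first n_lines lines of src_path to dst_path; return the count."""
    written = 0
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in islice(src, n_lines):
            dst.write(line)
            written += 1
    return written
```

For example, `write_subset("data/actives.smi", "data/actives_10k.smi")` would write a hypothetical subset file that you then use in place of the full dataset for a trial run.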

Data Issues

“Invalid SMILES in file”

Cause: Malformed SMILES structures

Solution:

Validate SMILES using RDKit:

python -c "
from rdkit import Chem
with open('data/actives.smi') as f:
    for i, line in enumerate(f, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines instead of crashing on them
        smiles = fields[0]
        if Chem.MolFromSmiles(smiles) is None:
            print(f'Line {i}: Invalid SMILES: {smiles}')
"

Clean your SMILES file and try again
Use online tool: SMILES validation
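Cleaning the file can also be scripted. The sketch below writes only the lines that pass a validity check to a new file. It assumes the SMILES string is the first whitespace-separated field on each line, and the function names are illustrative rather than part of KAST. The validator is passed in as a parameter so the logic can be tried without RDKit; by default it uses `Chem.MolFromSmiles`, as in the command above.

```python
def rdkit_is_valid(smiles):
    """Default validator: True if RDKit can parse the SMILES string."""
    from rdkit import Chem  # imported lazily so this file loads without RDKit
    return Chem.MolFromSmiles(smiles) is not None


def clean_smiles_file(src_path, dst_path, is_valid=rdkit_is_valid):
    """Write only lines whose first field passes is_valid; return (kept, dropped)."""
    kept = dropped = 0
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            fields = line.split()
            if not fields:
                continue  # blank lines are skipped and not counted
            if is_valid(fields[0]):
                dst.write(line if line.endswith("\n") else line + "\n")
                kept += 1
            else:
                dropped += 1
    return kept, dropped
```

Writing to a separate output file keeps the original intact, so you can inspect what was dropped before replacing anything.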

“No molecules loaded from file”

Cause: File format wrong or empty file

Solution:

Check file exists: data/actives.smi and data/inactives.smi
Check format: One SMILES per line, not Excel format
Ensure file encoding is UTF-8 (not UTF-16, which some Windows editors label “Unicode”)
Try file with known-good SMILES and verify it works
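The first three checks above can be roughly automated. The following sketch reports whether a file is missing, empty, saved as UTF-16, or looks like a spreadsheet archive (`.xlsx` files are ZIP archives and start with the bytes `PK`). It is an illustration only, and KAST's own loader may apply different rules.

```python
import codecs
from pathlib import Path


def diagnose_smiles_file(path):
    """Return a list of human-readable problems found in a SMILES file."""
    path = Path(path)
    if not path.is_file():
        return [f"{path} does not exist"]
    raw = path.read_bytes()
    if not raw.strip():
        return [f"{path} is empty"]
    problems = []
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        problems.append("file looks like UTF-16; re-save it as UTF-8")
    elif raw.startswith(b"PK"):
        problems.append("file looks like an Excel/ZIP archive, not plain text")
    else:
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            problems.append("file is not valid UTF-8")
    return problems
```

An empty list means none of these specific problems was detected; it does not guarantee that the SMILES themselves are valid.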

“Data imbalance too large”

This is a warning, not an error, although a strong imbalance might affect the model. KAST handles it automatically.

If model performs poorly:

Try balancing active/inactive ratio closer to 1:1 or 1:5
Check data quality (duplicates, mislabeling)
Consider using different actives/inactives source
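To see where your dataset sits relative to the 1:1 to 1:5 range, you can count the molecules in each file. This is a small sketch assuming one SMILES per line; the function names are illustrative.

```python
def count_molecules(path):
    """Count non-blank lines, assuming one SMILES per line."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


def inactives_per_active(n_actives, n_inactives):
    """Return x for an active:inactive ratio of 1:x."""
    if n_actives == 0:
        raise ValueError("no active molecules to compare against")
    return n_inactives / n_actives
```

For example, `inactives_per_active(count_molecules("data/actives.smi"), count_molecules("data/inactives.smi"))` returns 5.0 for a 1:5 dataset.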

Runtime Issues

Pipeline crashes with “Out of Memory”

Solutions:

Reduce parallel batch size (see above)
Use fewer cores: N_WORKERS = 2
Disable parallel: ENABLE_PARALLEL_PROCESSING = False
Use smaller dataset (test with 10K molecules first)
Close other applications to free RAM

Featurization is very slow

Solutions:

Enable parallel processing (see Parallel Processing):

ENABLE_PARALLEL_PROCESSING = True
N_WORKERS = None  # Auto-detect

Check CPU usage:
- Windows: Task Manager → Performance tab
- Linux: top or htop command
- If not using all cores, verify parallel is enabled
Reduce dataset size for testing
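When judging CPU usage, it helps to know how many logical cores the machine exposes. The sketch below reports that number. How `N_WORKERS = None` chooses its worker count is not documented in this section, so treat the value only as a point of comparison.

```python
import os


def available_cores():
    """Return the number of logical CPUs Python can see (at least 1)."""
    return os.cpu_count() or 1


if __name__ == "__main__":
    print(f"Logical CPU cores: {available_cores()}")
```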

“Process finished with exit code 1”

This is a generic error; check the full output for details.

Solutions:

Scroll up in terminal to see actual error message
Check log file: workspaces/<your_workspace>/logs/kast_YYYYMMDD.log
Run python bin/check_env.py to verify dependencies
Try step individually: python bin/1_preparation.py
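Finding the relevant lines in a long log can be scripted. The sketch below picks the newest `kast_*.log` in a log folder (the YYYYMMDD date in the name sorts correctly as text) and lists lines containing a few keywords. The keywords are assumptions about what the log contains, not a documented KAST log format.

```python
from pathlib import Path


def latest_log(log_dir):
    """Return the newest kast_*.log in log_dir by file name, or None."""
    logs = sorted(Path(log_dir).glob("kast_*.log"))
    return logs[-1] if logs else None


def error_lines(log_path, keywords=("ERROR", "Traceback", "Exception")):
    """Return (line_number, text) pairs for lines containing any keyword."""
    hits = []
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if any(keyword in line for keyword in keywords):
                hits.append((number, line.rstrip("\n")))
    return hits
```

Pass your own workspace's logs folder to `latest_log`, then pass the result to `error_lines` to get a short list of places to start reading.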

Results Issues

Model performance is terrible (AUC < 0.60)

Possible causes & solutions:

| Problem | Check | Solution |
|---|---|---|
| Bad data quality | Look for duplicates, mislabeled molecules | Clean data, remove duplicates |
| Too much class imbalance | Active:Inactive ratio | Try 1:1 or 1:5 ratio |
| Insufficient data | < 100 molecules per class | Get more compounds |
| Wrong SMILES | Invalid structures | Validate SMILES with RDKit |
| Random seed issue | Different results each run | Check seed settings |
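The duplicate and mislabeling checks from the first row can be approximated with a short script. This sketch compares SMILES as plain strings, so two different spellings of the same molecule are not detected (canonicalizing with RDKit first would be stricter). The function names are illustrative.

```python
from collections import Counter


def read_smiles(path):
    """Return the first whitespace-separated field of each non-blank line."""
    with open(path, encoding="utf-8") as handle:
        return [line.split()[0] for line in handle if line.strip()]


def duplicates(smiles):
    """Return SMILES strings that occur more than once, sorted."""
    counts = Counter(smiles)
    return sorted(s for s, n in counts.items() if n > 1)


def overlap(actives, inactives):
    """Return SMILES strings present in both classes, sorted."""
    return sorted(set(actives) & set(inactives))
```

A molecule returned by `overlap` appears in both data/actives.smi and data/inactives.smi, which usually means one of the two labels is wrong.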

Training seems stuck (no output for 10 minutes)

This can be normal for large datasets!

Check if process is alive:

Watch CPU usage (should be active)
Check memory usage (shouldn’t max out)

If truly stuck:

Ctrl+C to cancel
Reduce dataset size and try again
Check logs: workspaces/<your_workspace>/logs/kast_*.log

Cross-validation scores very different from test AUC

Possible cause: overfitting or data issues

Solutions:

Check for duplicate molecules across folds
Verify data quality
Try with more training data
Check Learning Curve (Step 4.5)

Platform-Specific Issues

Windows: Shortcuts don’t work

Solution:

Delete broken shortcut
Re-run setup.exe
Or launch manually: run_kast.bat should be in the KAST folder
Double-click run_kast.bat to start

Getting Help

Can’t find answer here?

Check logs: workspaces/<your_workspace>/logs/kast_YYYYMMDD.log
Check FAQ for common questions
Verify environment: python bin/check_env.py
Test parallel setup: python bin/test_parallel_compatibility.py

Report issue on GitHub:

github.com/kelsouzs/KAST/issues
Include: OS, error message, steps to reproduce

Email support:

lmm@uefs.br
Include: Full error output, log file, dataset info if possible

Still Having Issues?

Provide this information:

OS and version (Windows 11, Ubuntu 20.04, etc)
Anaconda or Miniconda version
Full error message (copy-paste from terminal)
Log file content: cat workspaces/<your_workspace>/logs/kast_*.log
Dataset size and approximate molecule count
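Some of this information can be gathered automatically. The sketch below collects the OS description, the Python version, and the output of `conda --version` using only the standard library. The function names and report layout are illustrative, not a format the maintainers require.

```python
import platform
import shutil
import subprocess
import sys


def conda_version():
    """Return the output of `conda --version`, or None if unavailable."""
    conda = shutil.which("conda")
    if conda is None:
        return None
    try:
        result = subprocess.run([conda, "--version"], capture_output=True,
                                text=True, timeout=60)
    except (OSError, subprocess.SubprocessError):
        return None
    return (result.stdout or result.stderr).strip() or None


def support_report():
    """Collect basic facts to paste into an issue or support email."""
    return {
        "os": platform.platform(),
        "python": sys.version.split()[0],
        "conda": conda_version(),
    }


if __name__ == "__main__":
    for key, value in support_report().items():
        print(f"{key}: {value}")
```

Paste the output alongside the full error message and the log file content.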
