Troubleshooting
Common issues and solutions.
Installation Issues
Setup fails with “Conda not found”
Cause: Anaconda/Miniconda not installed or not in PATH
Solution:
Install Anaconda: anaconda.com/download
Make sure to check “Add Anaconda to PATH” during installation
Restart computer
Try setup again
setup.exe won’t run (Windows)
Cause: Blocked by Windows Defender or permissions
Solutions:
Try as Administrator:
Right-click setup.exe → “Run as administrator”
Unblock file:
Right-click setup.exe → Properties
Check “Unblock” at bottom → Apply → OK
Verify location:
Ensure setup.exe is in the same folder as environment.yml
Move to a simple path (avoid spaces):
C:\KAST\
Use manual setup:
cd path\to\KAST
conda env create -f environment.yml -y
setup.sh won’t run (Linux)
Cause: Script not executable
Solution:
chmod +x setup.sh
./setup.sh
“Cannot find conda.exe” during setup
Cause: Conda not in standard location
Solution:
Find Conda:
where conda    # Windows
which conda    # Linux/Mac
If not found, manually specify path in setup or use manual setup
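The same lookup can be scripted. The sketch below is only an illustration: it first checks PATH (what where/which do) and then a few typical install folders. The folder names anaconda3 and miniconda3 and the helper name find_conda are assumptions, not something the KAST setup is known to use.

```python
import os
import shutil
from pathlib import Path


def find_conda(extra_dirs=()):
    """Return the path of a conda executable, or None if none is found."""
    # Ask the OS the same question `where conda` / `which conda` asks.
    found = shutil.which("conda")
    if found:
        return found
    # Fall back to a few typical install folders (assumed, not exhaustive).
    home = Path.home()
    candidates = [home / "anaconda3", home / "miniconda3", *map(Path, extra_dirs)]
    if os.name == "nt":
        names = ["Scripts/conda.exe", "condabin/conda.bat"]
    else:
        names = ["bin/conda", "condabin/conda"]
    for base in candidates:
        for name in names:
            path = base / name
            if path.is_file():
                return str(path)
    return None


if __name__ == "__main__":
    print(find_conda() or "conda not found; use the manual setup instead")
```

If Conda lives somewhere unusual, pass that folder through extra_dirs, for example find_conda(extra_dirs=["D:/tools/conda"]).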
Dependency Issues
“ImportError: No module named ‘tensorflow’”
Cause: Dependencies not installed or environment not activated
Solution (Windows):
Click desktop shortcut “K-talysticFlow 1.0.0” (handles activation)
Or activate manually:
conda activate ktalysticflow
python bin/check_env.py
Solution (Linux):
conda activate ktalysticflow
python bin/check_env.py
If it still fails, check which packages are missing, then rebuild the environment:
python bin/check_env.py # See which packages are missing
conda env remove -n ktalysticflow -y
conda env create -f environment.yml -y
“ImportError: No module named ‘rdkit’”
Cause: RDKit not installed (common on some systems)
Solution:
conda activate ktalysticflow
conda install -c conda-forge rdkit
“Cannot allocate memory” during featurization
Cause: Dataset too large or batch size too large
Solutions:
Reduce batch size in settings.py:
PARALLEL_BATCH_SIZE = 25000  # Reduce from 100000
Use fewer workers in settings.py:
N_WORKERS = 2  # Reduce from auto
Disable parallel processing:
ENABLE_PARALLEL_PROCESSING = False
Use subset of data:
Test with first 10K molecules
Check if it’s a data quality issue
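To test with the first 10K molecules, a small helper can write a trimmed copy of an input file. This is a minimal sketch, assuming one molecule per line; the function name write_subset and any output filename are hypothetical and not part of KAST.

```python
from itertools import islice


def write_subset(src, dst, n=10_000):
    """Copy the first n non-empty lines of src to dst; return how many were written."""
    count = 0
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        # Skip blank lines so they do not count toward the limit.
        non_empty = (line for line in fin if line.strip())
        for line in islice(non_empty, n):
            # Make sure every molecule ends with exactly one newline.
            fout.write(line.rstrip("\n") + "\n")
            count += 1
    return count
```

For example, write_subset("data/actives.smi", "data/actives_subset.smi") keeps the original file untouched, so the full dataset can be restored once the memory settings are tuned.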
Data Issues
“Invalid SMILES in file”
Cause: Malformed SMILES structures
Solution:
Validate SMILES using RDKit:
python -c "
from rdkit import Chem
with open('data/actives.smi') as f:
    for i, line in enumerate(f):
        if not line.strip():
            continue  # skip blank lines instead of crashing on them
        smiles = line.split()[0]
        if Chem.MolFromSmiles(smiles) is None:
            print(f'Line {i+1}: Invalid SMILES: {smiles}')
"
Clean your SMILES file and try again
Or use an online SMILES validation tool
“No molecules loaded from file”
Cause: File format wrong or empty file
Solution:
Check file exists:
data/actives.smi and data/inactives.smi
Check format: One SMILES per line, not Excel format
Ensure file encoding is UTF-8 (not UTF-16, which some Windows editors label “Unicode”)
Try a file with known-good SMILES and verify it works
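The checks above can be automated with the standard library alone. The sketch below is illustrative, not KAST's own loader: check_smiles_file is a hypothetical helper that looks for a missing or empty file, a UTF-16 byte-order mark, and the signatures of Excel files.

```python
from pathlib import Path


def check_smiles_file(path):
    """Return a list of human-readable problems found in a .smi file."""
    p = Path(path)
    if not p.is_file():
        return [f"{path}: file not found"]
    raw = p.read_bytes()
    if not raw.strip():
        return [f"{path}: file is empty"]
    # A UTF-16 byte-order mark means the file was not saved as UTF-8.
    if raw[:2] in (b"\xff\xfe", b"\xfe\xff"):
        return [f"{path}: looks like UTF-16; re-save as UTF-8"]
    # .xlsx files are ZIP archives ("PK"); legacy .xls files are OLE containers.
    if raw[:2] == b"PK" or raw[:4] == b"\xd0\xcf\x11\xe0":
        return [f"{path}: looks like an Excel file; export as plain text"]
    try:
        raw.decode("utf-8-sig")
    except UnicodeDecodeError as exc:
        return [f"{path}: not valid UTF-8 ({exc.reason})"]
    return []


if __name__ == "__main__":
    for name in ("data/actives.smi", "data/inactives.smi"):
        print(name, check_smiles_file(name) or "looks OK")
```

An empty list only means the file is readable plain text; it does not say whether the SMILES themselves are chemically valid.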
“Data imbalance too large”
This is a warning, not an error, but a strong imbalance can affect model quality. KAST handles it automatically.
If model performs poorly:
Try balancing active/inactive ratio closer to 1:1 or 1:5
Check data quality (duplicates, mislabeling)
Consider using different actives/inactives source
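Before changing data sources, it can help to measure the ratio and count duplicates. The following is a rough sketch with hypothetical helper names (read_smiles, summarize). It compares SMILES as exact strings, so the same molecule written two different ways is not detected; canonicalizing with RDKit first would catch more.

```python
from collections import Counter


def read_smiles(path):
    """Return the first whitespace-separated token of each non-empty line."""
    with open(path, encoding="utf-8") as handle:
        return [line.split()[0] for line in handle if line.strip()]


def summarize(actives, inactives):
    """Report class ratio, exact-string duplicates, and cross-class conflicts."""
    # Every occurrence beyond the first counts as one duplicate.
    dup_a = sum(c - 1 for c in Counter(actives).values())
    dup_i = sum(c - 1 for c in Counter(inactives).values())
    # A SMILES present in both classes is a likely labeling error.
    conflicts = set(actives) & set(inactives)
    ratio = len(inactives) / len(actives) if actives else float("inf")
    return {
        "actives": len(actives),
        "inactives": len(inactives),
        "inactives_per_active": ratio,
        "duplicate_actives": dup_a,
        "duplicate_inactives": dup_i,
        "in_both_classes": sorted(conflicts),
    }
```

A typical call would be summarize(read_smiles("data/actives.smi"), read_smiles("data/inactives.smi")). An inactives_per_active value far above 5 suggests the imbalance is worth addressing.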
Runtime Issues
Pipeline crashes with “Out of Memory”
Solutions:
Reduce parallel batch size (see above)
Use fewer cores:
N_WORKERS = 2
Disable parallel:
ENABLE_PARALLEL_PROCESSING = False
Use smaller dataset (test with 10K molecules first)
Close other applications to free RAM
Featurization is very slow
Solutions:
Enable parallel processing (see Parallel Processing):
ENABLE_PARALLEL_PROCESSING = True
N_WORKERS = None  # Auto-detect
Check CPU usage:
Windows: Task Manager → Performance tab
Linux: top or htop command
If not using all cores, verify parallel is enabled
Reduce dataset size for testing
“Process finished with exit code 1”
Generic error — check full output for details.
Solutions:
Scroll up in terminal to see actual error message
Check log file:
workspaces/<your_workspace>/logs/kast_YYYYMMDD.log
Run python bin/check_env.py to verify dependencies
Try step individually:
python bin/1_preparation.py
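When the terminal output has scrolled away, the log file can be searched instead. The sketch below finds the newest kast_*.log in a logs folder and lists suspicious lines. The keywords are an assumption about what the log contains, and both function names are hypothetical.

```python
import glob
import os


def newest_log(log_dir):
    """Return the most recently modified kast_*.log in log_dir, or None."""
    logs = glob.glob(os.path.join(log_dir, "kast_*.log"))
    return max(logs, key=os.path.getmtime) if logs else None


def error_lines(log_path, keywords=("ERROR", "Traceback", "Exception")):
    """Return (line_number, text) pairs for lines containing any keyword."""
    hits = []
    # errors="replace" keeps the scan going even if the log has odd bytes.
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if any(word in line for word in keywords):
                hits.append((number, line.rstrip()))
    return hits
```

Point newest_log at the logs folder of your own workspace, then pass the result to error_lines. The line numbers make it easy to open the log at the right place and read the surrounding context.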
Results Issues
Model performance is terrible (AUC < 0.60)
Possible causes & solutions:
| Problem | Check | Solution |
|---|---|---|
| Bad data quality | Look for duplicates, mislabeled molecules | Clean data, remove duplicates |
| Too much class imbalance | Active:Inactive ratio | Try 1:1 or 1:5 ratio |
| Insufficient data | < 100 molecules per class | Get more compounds |
| Wrong SMILES | Invalid structures | Validate SMILES with RDKit |
| Random seed issue | Different results each run | Check seed settings |
Training seems stuck (no output for 10 minutes)
This can be normal for large datasets!
Check if process is alive:
Watch CPU usage (should be active)
Check memory usage (shouldn’t max out)
If truly stuck:
Ctrl+C to cancel
Reduce dataset size and try again
Check logs:
workspaces/<your_workspace>/logs/kast_*.log
Cross-validation scores very different from test AUC
Possible causes: overfitting or data issues
Solutions:
Check for duplicate molecules across folds
Verify data quality
Try with more training data
Check Learning Curve (Step 4.5)
Platform-Specific Issues
Windows: Shortcuts don’t work
Solution:
Delete broken shortcut
Re-run setup.exe
Or manually create:
run_kast.bat should be in KAST folder
Double-click run_kast.bat
Getting Help
Can’t find answer here?
Check logs:
workspaces/<your_workspace>/logs/kast_YYYYMMDD.log
Check FAQ for common questions
Verify environment:
python bin/check_env.py
Test parallel setup:
python bin/test_parallel_compatibility.py
Report issue on GitHub:
Include: OS, error message, steps to reproduce
Email support:
lmm@uefs.br
Include: Full error output, log file, dataset info if possible
Still Having Issues?
Provide this information:
OS and version (Windows 11, Ubuntu 20.04, etc.)
Anaconda or Miniconda version
Full error message (copy-paste from terminal)
Log file content:
cat workspaces/<your_workspace>/logs/kast_*.log
Dataset size and approximate molecule count
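Part of this information can be gathered automatically. The sketch below is one possible way to do it, not an official KAST tool: collect_report is a hypothetical helper that records the OS, the Python version, and the output of conda --version if Conda is on PATH.

```python
import platform
import shutil
import subprocess
import sys


def collect_report():
    """Gather basic environment facts for a bug report."""
    report = {
        "os": f"{platform.system()} {platform.release()}",
        "python": sys.version.split()[0],
        "conda": "not found",
    }
    conda = shutil.which("conda")
    if conda:
        try:
            result = subprocess.run(
                [conda, "--version"], capture_output=True, text=True, timeout=30
            )
            # Some conda versions print the version to stderr.
            report["conda"] = (result.stdout or result.stderr).strip() or "unknown"
        except (OSError, subprocess.SubprocessError):
            report["conda"] = "found, but version query failed"
    return report


if __name__ == "__main__":
    for key, value in collect_report().items():
        print(f"{key}: {value}")
```

Paste the printed lines into the report together with the full error message and the log file.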