Formation-Energy Prediction

Performance summary of the best models

PyCaret | Auto feature + model selection | Selection metrics: R² & Mean Absolute Error (MAE) | Practical targets | Best models: CatBoost, ExtraTrees

What: Built a strong AutoML baseline for formation energy using PyCaret to auto-select algorithms and features.

Why: Stress-tested AutoML on a thermodynamic target (formation energy) that has only weak direct descriptors, to gauge when AutoML is truly viable.

How: Featurized chemical formulas and Pymatgen structures via Matminer; tuned the PyCaret setup; compared model/feature sets by cross-validated (CV) MAE and R², prioritizing MAE for selection. Raw data (4.5k) came from the Materials Project.
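As a rough illustration of the selection step, the sketch below scores candidates by MAE and R² and ranks them with either metric as the primary criterion. It is a minimal standard-library sketch, not the project's PyCaret pipeline: the helper `rank_candidates`, the candidate names, and the toy numbers are all hypothetical, chosen only to show that the two criteria can rank the same candidates differently.

```python
from statistics import mean


def mae(y_true, y_pred):
    """Mean absolute error: average of |y - y_hat|."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rank_candidates(y_true, predictions, primary="mae"):
    """Score every candidate, then sort best-first by the primary metric.

    The other metric is used only as a tie-breaker.
    (Hypothetical helper, not part of PyCaret.)
    """
    scored = [
        {"name": name, "mae": mae(y_true, y_pred), "r2": r2(y_true, y_pred)}
        for name, y_pred in predictions.items()
    ]
    if primary == "mae":
        return sorted(scored, key=lambda s: (s["mae"], -s["r2"]))
    return sorted(scored, key=lambda s: (-s["r2"], s["mae"]))


# Toy held-out targets and predictions (made-up numbers, not project data).
y_true = [-2.0, -1.0, 0.0, 1.0, 2.0]
predictions = {
    # Exact on four points, one large miss.
    "candidate_a": [-2.0, -1.0, 0.0, 1.0, 0.5],
    # Off by 0.4 on every point.
    "candidate_b": [-1.6, -1.4, 0.4, 0.6, 2.4],
}

best_by_mae = rank_candidates(y_true, predictions, primary="mae")[0]
best_by_r2 = rank_candidates(y_true, predictions, primary="r2")[0]
print("MAE-first pick:", best_by_mae["name"])
print("R2-first pick:", best_by_r2["name"])
```

Because R² is built on squared errors, a single large miss costs candidate_a more under R² than under MAE, so the two criteria disagree on this toy data. In practice the predictions would come from cross-validation folds rather than one fixed list.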

Results: MAE-first selection beat R²-first on out-of-sample performance, yielding a compact model with lower error and clearer trade-offs.

GitHub