Formation-Energy Prediction
Performance summary of the best models
PyCaret
Automated feature + model selection
Selection Metrics: R² & Mean Absolute Error (MAE)
Practical targets
Best Models (CatBoost, ExtraTrees)
Built a strong AutoML baseline for formation-energy prediction, using PyCaret to automate algorithm and feature selection. Formation energy is a thermodynamic target with weak direct descriptors, so it served as a stress test of when AutoML is truly viable. Featurized chemical formulas and Pymatgen structures with Matminer, tuned the PyCaret setup, and compared model/feature sets by cross-validated MAE and R², prioritizing MAE for selection. Raw data (4.5k) came from the Materials Project. MAE-first selection beat R²-first selection on out-of-sample performance, yielding a compact model with lower error and clearer trade-offs.
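Why MAE-first and R²-first selection can disagree is easiest to see on a toy case. The sketch below is illustrative only: the target values, the two candidate names (model_a, model_b) and the helper rank_models are hypothetical and are not taken from the project. R² penalizes squared error, so a model with one large miss can lose on R² while still having the lower MAE.

```python
from statistics import mean


def mae(y_true, y_pred):
    """Mean absolute error."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rank_models(y_true, predictions, primary="mae"):
    """Rank candidate models best-first by the primary metric.

    The other metric is used only as a tie-breaker.
    """
    scores = {
        name: {"mae": mae(y_true, pred), "r2": r2(y_true, pred)}
        for name, pred in predictions.items()
    }
    if primary == "mae":
        key = lambda n: (scores[n]["mae"], -scores[n]["r2"])  # lower MAE first
    else:
        key = lambda n: (-scores[n]["r2"], scores[n]["mae"])  # higher R² first
    return sorted(scores, key=key), scores


# Toy targets (made up for illustration, not Materials Project data).
y_true = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]

predictions = {
    # model_a: off by 0.5 on every sample.
    "model_a": [t + 0.5 for t in y_true],
    # model_b: exact on all samples except one miss of 2.0.
    "model_b": y_true[:-1] + [y_true[-1] + 2.0],
}

mae_ranking, scores = rank_models(y_true, predictions, primary="mae")
r2_ranking, _ = rank_models(y_true, predictions, primary="r2")

print("MAE-first:", mae_ranking)
print("R2-first: ", r2_ranking)
```

Here model_b has half the MAE of model_a (0.25 vs 0.5) but the lower R², so the two selection rules pick different winners. Which rule is preferable depends on whether typical error or occasional large misses matter more for the application.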