Formation-Energy Prediction
Performance summary of the best models
PyCaret
Automated feature and model selection
Selection metrics: R² and mean absolute error (MAE)
Practical targets
Best models: CatBoost, ExtraTrees
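The two selection metrics can be written out as short functions. This is a minimal standard-library sketch; the names `mae` and `r2` are illustrative and are not PyCaret's API. MAE is in the target's own units (for Materials Project formation energies, eV/atom), so it reads directly as the typical prediction error. R² is unitless and measures the fraction of target variance the model explains.

```python
from statistics import fmean
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: the average |y - y_hat|, in the target's units."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    return fmean(abs(t - p) for t, p in zip(y_true, y_pred))


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    mean_true = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are equal")
    return 1.0 - ss_res / ss_tot


# Toy values only (not project data): one prediction is off by 1.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 4.0]
print(mae(y_true, y_pred))  # about 0.333
print(r2(y_true, y_pred))   # 0.5
```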
What: Built a strong AutoML baseline for formation-energy prediction, using PyCaret to select algorithms and features automatically.
Why: Stress-tested AutoML on formation energy, a thermodynamic target with weak direct descriptors, to gauge when AutoML is truly viable.
How: Featurized chemical formulas and pymatgen structures with Matminer; tuned the PyCaret setup; compared model and feature sets by cross-validated (CV) MAE and R², prioritizing MAE for selection. Data: 4.5k raw entries sourced from the Materials Project.
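Composition featurization turns a formula string into numeric features: atomic fractions, then fraction-weighted statistics of elemental properties. In Matminer this is typically done with `StrToComposition` followed by `ElementProperty.from_preset("magpie")`; the project's exact featurizers are not stated here. The pure-Python sketch below only illustrates the idea. It assumes flat formulas (no parentheses or hydrates), and `element_fractions`, `composition_features`, and the two-entry `PAULING_EN` table are hypothetical names introduced for illustration.

```python
from __future__ import annotations

import re
from collections import Counter

# One element symbol plus an optional amount, e.g. "Fe2", "O3", "Cl", "Si0.5".
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def element_fractions(formula: str) -> dict[str, float]:
    """Parse a flat formula (no parentheses, charges, or hydrates) into atomic fractions."""
    counts: Counter = Counter()
    pos = 0
    for match in _TOKEN.finditer(formula):
        if match.start() != pos:  # stray characters between tokens
            raise ValueError(f"cannot parse formula: {formula!r}")
        symbol, amount = match.groups()
        counts[symbol] += float(amount) if amount else 1.0
        pos = match.end()
    if pos != len(formula) or not counts:  # trailing junk or empty input
        raise ValueError(f"cannot parse formula: {formula!r}")
    total = sum(counts.values())
    return {element: n / total for element, n in counts.items()}


# Hypothetical, hand-typed lookup table (Pauling electronegativity) for
# illustration only; a real pipeline would use a full elemental-property set.
PAULING_EN = {"Fe": 1.83, "O": 3.44}


def composition_features(
    fractions: dict[str, float], table: dict[str, float]
) -> dict[str, float]:
    """Fraction-weighted mean plus min/max/range of one elemental property."""
    values = [table[element] for element in fractions]
    return {
        "mean": sum(frac * table[element] for element, frac in fractions.items()),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
    }


fractions = element_fractions("Fe2O3")
print(fractions)  # {'Fe': 0.4, 'O': 0.6}
print(composition_features(fractions, PAULING_EN))
```

Repeating this over many elemental properties yields the wide, composition-only feature table that AutoML then searches over.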
Results: MAE-first selection outperformed R²-first selection out of sample, yielding a compact model with lower error and clearer trade-offs.
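One reason the two criteria can disagree: on a fixed evaluation set, R² = 1 - MSE/Var(y), so ranking by R² is equivalent to ranking by mean squared error, which weights large misses more heavily than MAE does. The sketch below ranks candidate models by CV score under either metric. It is pure Python with contiguous k-fold splits and two toy models; `fit_mean`, `fit_linear`, `cross_validate`, and `rank` are hypothetical helpers, not PyCaret internals. In PyCaret the equivalent switch is the `sort` argument of `compare_models`, which defaults to `"R2"` for regression; `sort="MAE"` ranks by MAE instead.

```python
from __future__ import annotations

from statistics import fmean
from typing import Callable

Predictor = Callable[[float], float]
FitFn = Callable[[list, list], Predictor]


def fit_mean(xs: list[float], ys: list[float]) -> Predictor:
    """Baseline: always predict the training-set mean target."""
    mu = fmean(ys)
    return lambda x: mu


def fit_linear(xs: list[float], ys: list[float]) -> Predictor:
    """Ordinary least squares on a single feature: y ~ a*x + b."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    return lambda x: a * x + b


def cross_validate(fit: FitFn, xs: list[float], ys: list[float], k: int = 3) -> dict[str, float]:
    """Contiguous k-fold CV; returns the fold-averaged MAE and R^2."""
    n = len(xs)
    maes, r2s = [], []
    for i in range(k):
        lo, hi = i * n // k, (i + 1) * n // k
        predict = fit(xs[:lo] + xs[hi:], ys[:lo] + ys[hi:])  # train on the other folds
        test_y = ys[lo:hi]
        preds = [predict(x) for x in xs[lo:hi]]
        maes.append(fmean(abs(t - p) for t, p in zip(test_y, preds)))
        mean_t = fmean(test_y)
        ss_res = sum((t - p) ** 2 for t, p in zip(test_y, preds))
        ss_tot = sum((t - mean_t) ** 2 for t in test_y)
        if ss_tot == 0:
            raise ValueError("R^2 is undefined on a fold with constant targets")
        r2s.append(1.0 - ss_res / ss_tot)
    return {"MAE": fmean(maes), "R2": fmean(r2s)}


def rank(results: dict[str, dict[str, float]], by: str) -> list[str]:
    """Model names best-first: lower MAE is better, higher R^2 is better."""
    return sorted(results, key=lambda name: results[name][by], reverse=(by == "R2"))


# Toy data only (exactly linear, y = 2x + 1); not the Materials Project set.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * x + 1.0 for x in xs]
results = {
    "mean": cross_validate(fit_mean, xs, ys),
    "linear": cross_validate(fit_linear, xs, ys),
}
print(rank(results, by="MAE"))  # ['linear', 'mean']
print(rank(results, by="R2"))   # ['linear', 'mean']
```

On this exactly linear toy set both metrics agree; the rankings diverge when one model has lower typical error but a few larger misses, which is the situation where choosing by MAE first matters.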