Principal-level guidance for Python AI/ML backends, training pipelines, and inference services. Emphasizes data integrity, reproducibility, and production reliability.

| # | Area | Goal | Key practices |
|---|------|------|---------------|
| 1 | Data Quality & Leakage | Trust the data | Clean splits, lineage, leakage checks |
| 2 | Correctness & Reproducibility | Same inputs, same outputs | Versioned data, pinned deps, deterministic runs |
| 3 | Reliability & Resilience | Stable training and serving | Timeouts, retries, graceful degradation |
| 4 | Model Evaluation & Safety | Real-world performance | Offline + online eval, bias checks |
| 5 | Performance & Cost | Efficient training/inference | GPU utilization, batching, cost budgets |
| 6 | Observability & Monitoring | Fast detection | Drift, latency, error budgets |
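
As one illustration of the "leakage checks" item under Data Quality & Leakage, the sketch below flags entity IDs that appear in both the training and test splits. It is a minimal example and not a complete leakage audit: the function name `find_split_leakage` and the sample IDs are hypothetical, and it assumes each row carries a stable entity identifier (such as a user or document ID).

```python
from typing import Hashable, Iterable


def find_split_leakage(
    train_ids: Iterable[Hashable], test_ids: Iterable[Hashable]
) -> set:
    """Return the IDs present in both splits; an empty set means no overlap."""
    return set(train_ids) & set(test_ids)


# Hypothetical entity IDs, e.g. one per user or per document.
train_ids = ["u1", "u2", "u3", "u4"]
test_ids = ["u4", "u5"]

leaked = find_split_leakage(train_ids, test_ids)
if leaked:
    # In a pipeline this would more likely fail the run than print.
    print(f"Leakage: {len(leaked)} ID(s) in both splits: {sorted(leaked)}")
```

A check like this only catches exact ID overlap. Near-duplicates, temporal leakage, and target leakage through features need separate checks.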