model-criticism | Joshua C. Macdonald

model-criticism is a Python framework for structured model evaluation in scientific simulation studies. It uses model worlds with known ground truth to tune the components of an inference pipeline (scoring rules, observable weights, and model settings), with the aim that the tuned pipeline transfers reliably to real data, where the latent structure is unobserved. Evaluation is organized into hierarchical phases (discovery → refinement → benchmark).
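To make the idea of tuning against a model world concrete, here is a minimal sketch in plain Python. It is not the package's API: the function names (sample_crps, simulate_world, mean_crps), the Gaussian toy world, and the "ensemble spread" setting are all assumptions made for illustration. The sketch generates observations from a world whose true spread is known, scores several candidate settings with a sample-based CRPS estimate, and keeps the setting with the lowest mean score.

```python
import random


def sample_crps(samples, y):
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'| (lower is better)."""
    n = len(samples)
    term1 = sum(abs(x - y) for x in samples) / n
    term2 = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return term1 - 0.5 * term2


def simulate_world(rng, n_obs, true_sd=1.0):
    """Hypothetical model world: observations drawn from a known N(0, true_sd^2)."""
    return [rng.gauss(0.0, true_sd) for _ in range(n_obs)]


def mean_crps(spread, observations, rng, n_members=50):
    """Score one hypothetical pipeline setting (forecast ensemble spread)."""
    total = 0.0
    for y in observations:
        # Each forecast is an ensemble drawn with the candidate spread.
        ensemble = [rng.gauss(0.0, spread) for _ in range(n_members)]
        total += sample_crps(ensemble, y)
    return total / len(observations)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
observations = simulate_world(rng, n_obs=200)

# Candidate settings to compare; the true spread (1.0) is among them.
candidates = [0.25, 0.5, 1.0, 2.0, 4.0]
scores = {s: mean_crps(s, observations, rng) for s in candidates}
best_spread = min(scores, key=scores.get)
print(best_spread, scores[best_spread])
```

Because CRPS is a proper scoring rule, the setting closest to the true spread should score best in expectation, although a finite sample does not guarantee it. The known ground truth is what makes that check possible before the pipeline is applied to real data.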

Core capabilities:

Protocol-driven architecture (ModelWorld, Scorer, Observable)
Proper scoring rules (CRPS, energy, WIS, Brier) via scoringrules
Experimental design (full factorial, Latin hypercube, Morris screening)
Pareto front extraction and hypervolume indicators
Bayesian stacking of predictive distributions
Adaptive multi-objective optimization via optuna
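
As an illustration of one capability above, Pareto front extraction, the following is a minimal sketch in plain Python. It assumes every objective is minimized (for example, two scoring rules evaluated per configuration); the function name pareto_front and the example scores are hypothetical and do not reflect the package's actual interface.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: Sequence[Sequence[float]]) -> List[int]:
    """Return indices of non-dominated points, assuming all objectives are minimized."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Made-up (objective_1, objective_2) scores for five configurations.
scores = [(0.10, 0.90), (0.20, 0.40), (0.30, 0.50), (0.50, 0.10), (0.60, 0.60)]
front = pareto_front(scores)
print(front)  # → [0, 1, 3]
```

This pairwise check is quadratic in the number of configurations, which is fine for small designs; larger studies would typically rely on a dedicated routine.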

In development; pre-release. Source at jcm-sci/model-criticism.