Deep learning emulates atmospheric reanalyses with high fidelity, enabling increasingly well-calibrated ensemble weather forecasts at progressively longer lead times. To extend these gains to climate-relevant horizons, AI prediction systems must produce credible forced responses to drivers of interest (e.g., greenhouse gases, land-use change). We propose a minimal, testable framework for AI climate modeling: (i) represent external forcings explicitly and restrict them to physically appropriate state tendencies; and (ii) stress-test robustness in out-of-distribution regimes, including extremes and counterfactual trajectories. Using leading climate emulators and hybrid physics-AI models, we identify coupling and development challenges and compare scaling with resolution and effective complexity. AI models do not appear intrinsically more efficient than GPU-ported dynamical models once complexity is accounted for, yet they can directly predict target variables at the desired grid without integrating the full high-frequency, multivariate state. Diverse ML downscaling strategies can partially substitute for explicit fine-scale resolution when observations are available, paving the way towards inexpensive, local risk assessment across prediction horizons.

Author list: Tom Beucler (UNIL), David Neelin (UCLA), Hui Su (HKUST), Ignacio Lopez-Gomez (Google Research), Chris Bretherton, Oliver Watt-Meyer (AI2), Tapio Schneider, Costa Christopoulos (Caltech), Will Chapman (UC Boulder), Laure Zanna (NYU), Aditya Grover (UCLA), Adam Subel (NYU)
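To make principle (i) concrete, the following is a minimal sketch, not the authors' implementation: a toy emulator step in which an external forcing (a CO2-like scalar) enters as an explicit input and is restricted to contributing a tendency only to a physically appropriate variable (temperature), never to others (here, wind). All function names, the relaxation dynamics, and the toy radiative-forcing scaling are illustrative assumptions.

```python
import numpy as np

def learned_dynamics(state):
    # Stand-in for a trained network: a trivial relaxation of anomalies toward zero.
    return {k: -0.1 * v for k, v in state.items()}

def forcing_tendency(co2_ppm, state):
    # Explicit, restricted forcing: a CO2-dependent heating applied only to
    # temperature; the wind tendency from the forcing is identically zero.
    heating = 5.35 * np.log(co2_ppm / 280.0) * 0.01  # toy radiative-forcing scaling
    return {"temperature": heating * np.ones_like(state["temperature"]),
            "wind": np.zeros_like(state["wind"])}

def step(state, co2_ppm, dt=1.0):
    # One emulator step: learned dynamics plus the restricted forced tendency.
    dyn = learned_dynamics(state)
    frc = forcing_tendency(co2_ppm, state)
    return {k: state[k] + dt * (dyn[k] + frc[k]) for k in state}

state = {"temperature": np.array([0.5, 1.0]),  # anomalies
         "wind": np.array([5.0, -3.0])}
next_state = step(state, co2_ppm=560.0)  # doubled-CO2 counterfactual
```

Keeping the forcing pathway separate and restricted in this way is what makes counterfactual stress tests (principle (ii)) interpretable: any forced response must flow through the declared tendency, not through incidental correlations learned from the training distribution.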