The success of transformer-based foundation models on natural language and images has motivated their use in single-cell transcriptomics. In this talk, we will assess how pre-training dataset size and diversity affect the performance of single-cell foundation models. Using a corpus of 22.2 million cells, we pre-trained 400 models and evaluated them across more than 6,400 experiments. Our results show that current methods tend to plateau in performance with pre-training datasets that are only a fraction of the full corpus size, challenging the assumption that ever-larger datasets are required for optimal generalization. This will lead us into the second half of the talk, where we evaluate how training data composition affects model performance. Focusing on human hematopoiesis, we train and analyze deep generative models on a variety of training datasets, including cells from adult and developing tissues, disease states, and perturbation atlases. Here, we observe that (1) deep generative models generalize poorly to unseen cell types and (2) the addition of malignant or perturbed cells to healthy corpora does not consistently improve modeling of novel states. These findings highlight the nuanced roles of dataset size and heterogeneity, suggesting that strategic curation, rather than indiscriminate scaling, is key to optimizing single-cell foundation models.