Building energy simulation (BES) plays a significant role in the building sector, with applications such as architectural design, retrofit analysis, and the optimization of building operation and controls. There is a recognized need for model calibration to improve the credibility of these simulations, especially given the growing availability of building data and the promise of digital twins. However, BES calibration remains challenging due to the lack of clear guidelines and best practices. This study aims to provide a foundation for future research through a detailed systematic review of the vital aspects of BES calibration. Specifically, we conducted a meta-analysis and categorization of the simulation inputs and outputs, data type and resolution, key calibration methods, and calibration performance evaluation. The study also identifies the reproducibility of simulations as a critical issue and proposes an incremental approach to encourage reproducibility in future research.