The ever-changing data science landscape is fueling innovation in the built environment by providing new and more effective means of converting large raw data sets into value for professionals in the design, construction, and operation of buildings. The literature arising from this convergence has grown rapidly in recent years, making it difficult for traditional review approaches to cover all relevant papers. Therefore, this paper applies a natural language processing (NLP) method to provide an exhaustive and quantitative review. Approximately 30,000 scientific publications were retrieved through the Elsevier API and analyzed to extract the relationships between data sources, data science techniques, and building energy efficiency applications across the life cycle of buildings. The text-mining and NLP analysis reveals that data science techniques are applied mostly to operation-phase applications such as fault detection and diagnosis (FDD), while remaining under-explored in the design and commissioning phases. In addition, the review identifies further data science techniques that have yet to be investigated for various applications. For example, generative adversarial networks (GANs) have potential for facilitating parametric design, and transfer learning is a promising path toward wider application of optimal building operation.