Current building energy benchmarking systems categorize buildings into peer groups by static characteristics such as climate zone and building type, which cannot account for the wide variation in building operations. Benchmarking buildings with dissimilar operations against one another can produce misleading results. Smart meters provide an opportunity to capture the dynamic characteristics of building operations, but appropriate data mining techniques are needed to use their data for benchmarking. Accordingly, this paper proposes a framework that uses time-series energy consumption data to categorize buildings by their operations and to conduct energy benchmarking within each category. The proposed framework is based on 3-step K-means clustering and consists of two main parts: (1) operation quantification and (2) building categorization and benchmarking. The framework was tested on a dataset of 81 buildings in Singapore, and two baseline methods were implemented for comparison. The results show that the proposed framework successfully categorized the buildings by operational similarity and had a significant impact on the energy benchmarking results. The advantage of operation-based energy benchmarking is further demonstrated by examining two typical buildings for which the proposed framework disagreed with the baselines. Integrating building operations into energy benchmarking is necessary so that energy performance is evaluated more precisely and greater energy-saving potential can be uncovered.
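
The general idea of grouping buildings by operation and then benchmarking within each group can be illustrated with a minimal sketch. It is not the 3-step K-means procedure of the proposed framework: it is a single-pass simplification that clusters peak-normalized load profiles with plain K-means and ranks buildings by energy use intensity (EUI) among their cluster peers. The building names, load values, EUI figures, choice of k = 2, and all function names are hypothetical and chosen only for illustration.

```python
import random
from statistics import mean


def normalize(profile):
    """Scale a load profile by its peak so that shape, not magnitude, drives clustering."""
    peak = max(profile)
    return [v / peak for v in profile]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [mean(col) for col in zip(*members)]
    return labels


def benchmark_within_groups(labels, eui):
    """Rank each building by EUI among peers in its cluster (rank 1 = lowest EUI)."""
    ranks = {}
    for c in set(labels.values()):
        peers = sorted((n for n, lab in labels.items() if lab == c), key=eui.get)
        for i, name in enumerate(peers, start=1):
            ranks[name] = (c, i, len(peers))
    return ranks


# Hypothetical average daily load (kW) in six 4-hour blocks per building.
profiles = {
    "A": [20, 20, 100, 100, 60, 20],      # daytime-peaking operation
    "B": [50, 40, 180, 200, 100, 40],     # daytime-peaking operation
    "C": [90, 90, 100, 100, 95, 90],      # near-constant operation
    "D": [170, 180, 200, 190, 200, 180],  # near-constant operation
}
# Hypothetical annual EUI (kWh/m2/yr).
eui = {"A": 150, "B": 210, "C": 400, "D": 380}

names = list(profiles)
labels = dict(zip(names, kmeans([normalize(profiles[n]) for n in names], k=2)))
ranks = benchmark_within_groups(labels, eui)

for name in names:
    cluster, rank, size = ranks[name]
    print(f"{name}: cluster {cluster}, rank {rank} of {size}")
```

In this toy example, a pooled ranking by EUI alone would place building D near the bottom, whereas ranking within operational groups compares it only with the other near-constant building.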