Analysis of hourly electricity consumption to characterise a large number of customers using
clustering techniques
X. Cipriano, G. Mor
• 8,536 residential users with smart meters (half-hourly electricity consumption)
• The utility is called “El Gas” and is located in the town of Sòller, on the island of Mallorca.
• 12 months of half-hourly data are available.
CASE STUDY
Mallorca
• To identify the relevant indices or parameters of hourly consumption that define users’ electricity behaviour
• To characterise the main groups of dwellings according to their hourly load-profile patterns (clustering)
• To support decision-making on schemes and awareness campaigns, and to help model the energy behaviour of a very large number of dwellings
OBJECTIVES
– Monthly consumption
– Hourly consumption
– Results of other services
– Weather data (using the closest meteorological station available)
– Contracted tariff
– Contracted power
– Location
– Year of construction
– Type of construction (flat, attached, detached,…)
– Nº rooms, area, nº occupants, occupancy patterns, type of domestic appliances, type of HVAC systems, thermostat temperatures, glazed area of the dwelling, first/second residence,…
Possible
Impossible
DATA WE REALLY HAVE
AVAILABLE DATA
1. Data collection and pre-processing of half hourly data
2. Selection of relevant indicators/parameters
3. Calculation of degree of weather dependency (for all users)
4. Segmentation of groups of dwellings (clustering with SOM+K-means)
5. Load shape curves visualization and characterisation for each cluster
6. Automated saving tips for each user according to the clustering results
STEPS OF WORK
To highlight the load shapes that occur most frequently each month
• Energy consumption indicators: related to hourly consumption in different periods of the day
• User behaviour indicators: related to the timing and intensity of consumption
• Weather dependency indicators
• Complementary indicators: related to other existing data
PRE-DEFINITION OF INDICATORS
DATA PRE-PROCESSING
For each month of data, we defined a typical day:
• Emonth:
• Edaily: daily average of the month (kWh/day)
• En: hourly average by night [01:00 – 05:59] (kWh/h)
No significant differences between weekdays and weekends were found.
We defined 7 periods of consumption per day.
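As a minimal sketch of how the typical-day values could be derived from half-hourly readings, the snippet below computes Edaily and En for one month. The function name, the input layout (a list of timestamp and kWh pairs) and the convention that each timestamp marks the start of its half-hour interval are assumptions for illustration, not part of the original study.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def typical_day_indicators(readings):
    """readings: list of (timestamp, kWh) half-hourly values for one month.

    Returns (Edaily in kWh/day, En in kWh/h). Assumes each timestamp marks
    the start of its half-hour interval (an illustrative convention).
    """
    daily_totals = defaultdict(float)  # kWh per calendar day
    night_total = 0.0                  # kWh consumed in [01:00, 05:59]
    night_hours = 0.0                  # hours covered by night readings
    for ts, kwh in readings:
        daily_totals[ts.date()] += kwh
        if 1 <= ts.hour <= 5:
            night_total += kwh
            night_hours += 0.5         # each reading covers half an hour
    e_daily = sum(daily_totals.values()) / len(daily_totals)
    e_night = night_total / night_hours if night_hours else 0.0
    return e_daily, e_night


# Hypothetical data: two days at a constant 0.1 kWh per half hour.
start = datetime(2014, 1, 1)
sample = [(start + timedelta(minutes=30 * i), 0.1) for i in range(96)]
e_daily, e_night = typical_day_indicators(sample)
```

The other six daily periods would follow the same pattern once their hour ranges are fixed; only the night range is stated on the slide.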
DATA PRE-PROCESSING
ENERGY CONSUMPTION INDICATORS (E.I.)
Monthly calculation of:
• Ct: Contracted tariff
• P: Contracted power
• Ndb10: Number of days below the 10th percentile of consumption
• Htowd: Daily occupied hours on weekdays (number of hours with consumption > residual consumption)
• Htowe: Daily occupied hours on weekends
• Pmawe, Pmawd: Period of the day with maximum consumption (weekends/weekdays)
• Pmiwe, Pmiwd: Period of the day with minimum consumption (weekends/weekdays)
• Dma: Day with maximum daily consumption
• Dmi: Day with minimum daily consumption
• Hma: Time of maximum hourly consumption
• Nhma: Number of hours at the maximum hourly consumption
• Hmi: Time of minimum hourly consumption
• Nhmi: Number of hours at the minimum hourly consumption
• Hp1: First time when P > 1 kW
• Hp2: First time when P > 2 kW
• Hp3: First time when P reaches its maximum
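Some of the time-related indicators can be sketched for a single day as follows. This is an illustrative reading of the definitions, assuming a list of 24 hourly mean power values in kW (numerically equal to the hourly kWh); the function names and the example profile are hypothetical.

```python
def first_hour_above(hourly_kw, threshold_kw):
    """Return the first hour (0-23) whose power exceeds the threshold, or None."""
    for hour, power in enumerate(hourly_kw):
        if power > threshold_kw:
            return hour
    return None


def daily_time_indicators(hourly_kw):
    """Compute Hma, Nhma, Hmi, Nhmi, Hp1 and Hp2 for one day of hourly values."""
    max_power = max(hourly_kw)
    min_power = min(hourly_kw)
    return {
        "Hma": hourly_kw.index(max_power),   # first hour at the maximum
        "Nhma": hourly_kw.count(max_power),  # hours sharing the maximum
        "Hmi": hourly_kw.index(min_power),   # first hour at the minimum
        "Nhmi": hourly_kw.count(min_power),  # hours sharing the minimum
        "Hp1": first_hour_above(hourly_kw, 1.0),
        "Hp2": first_hour_above(hourly_kw, 2.0),
    }


# Hypothetical 24-hour profile (kW).
profile = [0.2] * 7 + [1.2, 0.8] + [0.5] * 10 + [2.5, 2.5, 1.5] + [0.3] * 2
indicators = daily_time_indicators(profile)
```

For a monthly indicator, these daily values would still need to be aggregated over the month; the slides do not state which aggregation was used.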
USER’S BEHAVIOUR INDICATORS
DATA PRE-PROCESSING
Indicators were calculated for each dwelling over monthly, seasonal, and yearly periods:
• MONTHLY: Mean value of the hourly average (or daily aggregate) of the daily periods during the month.
• SEASONALLY: Monthly average for each season. SUMMER: June-Sept., WINTER: Dec.-March, MIDSEASON: Apr.-May, Oct.-Nov.
• YEARLY: Monthly average for the whole year
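The seasonal and yearly aggregation can be sketched as a simple grouping of monthly values. The month-to-season mapping follows the definitions above; the function name, the season label "midseason" and the example values are illustrative assumptions.

```python
from statistics import mean

# Month number -> season, following the season definitions on the slide.
SEASON_OF_MONTH = {
    12: "winter", 1: "winter", 2: "winter", 3: "winter",
    6: "summer", 7: "summer", 8: "summer", 9: "summer",
    4: "midseason", 5: "midseason", 10: "midseason", 11: "midseason",
}


def aggregate_indicator(monthly_values):
    """monthly_values: dict {month number: monthly indicator value}.

    Returns (seasonal means as a dict, yearly mean).
    """
    by_season = {}
    for month, value in monthly_values.items():
        by_season.setdefault(SEASON_OF_MONTH[month], []).append(value)
    seasonal = {season: mean(values) for season, values in by_season.items()}
    return seasonal, mean(monthly_values.values())


# Hypothetical monthly indicator values (value equals the month number).
seasonal, yearly = aggregate_indicator({m: float(m) for m in range(1, 13)})
```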
TIME PERIOD OF CALCULATION
PRE-PROCESSING OF DATA
Procedure for selecting the relevant indicators:
1. We performed PCA on the UBIs to reduce the number of parameters. 5 PCs were obtained.
2. The k-means clustering technique (R software) was applied to these 5 PCs, calculated for all users. Clustering quality was poor (silhouette index = 0.3-0.32, DBI = 0.6-0.65).
3. Improvement with a prior SOM treatment followed by k-means clustering of the SOM prototypes: SI increased (0.45-0.5), with DBI around 0.7.
• We found it difficult to interpret the physical meaning of the PCA results when trying to characterise the groups obtained.
• We found that the periods of maximum and minimum consumption (Pmawe, Pmawd, Pmiwe, Pmiwd) were the most influential indicators in the PCA.
SELECTION OF USER’S BEHAVIOUR INDICATORS (UBI)
SELECTION OF RELEVANT INDICATORS
New group of indicators (Qualitative Indicators):
• We defined p1, p2, p3, p4, p5, p6, p7 as the ranking of the average hourly consumption of the daily periods over the month. p1 identifies the period with the highest consumption, and p7 the lowest.
• We can discard the Energy Indicators as relevant for clustering. For example, if p1 = 3, Enn is the largest value (kWh) compared with the rest of the E.I. (Ed, Ea, El, Em, En, Ee).
• New indicators are added: SD and the percentiles of consumption
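The ranking indicators p1 to p7 can be illustrated with a short sketch. It assumes the seven daily periods are numbered 1 to 7 and that a mean hourly consumption is available for each; the numbering, the function name and the example values are hypothetical.

```python
def rank_periods(period_means):
    """period_means: dict {period number: mean hourly consumption in kWh}.

    Returns [p1, ..., p7]: period numbers ordered from highest to lowest
    consumption. Ties keep the dictionary's insertion order.
    """
    return sorted(period_means, key=lambda p: period_means[p], reverse=True)


# Hypothetical monthly means for the 7 daily periods of one dwelling.
means = {1: 0.3, 2: 0.5, 3: 0.9, 4: 0.4, 5: 0.7, 6: 0.6, 7: 0.1}
ranking = rank_periods(means)
p1, p7 = ranking[0], ranking[-1]
```

Here p1 is 3 because period 3 has the highest mean, which shows why the ranking already carries the ordering information of the energy indicators.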
1. We start with 13 indicators: 7 qualitative indicators, 5 percentile indicators (5perc, 25perc, median, 75perc, 90perc), and hourly SD.
2. Eliminate missing values.
3. Min-max normalization is performed to scale the values within a predetermined range (0 to 1).
4. Correlation matrix calculation: high correlation if correl. > 0.8.
5. We finally select 6 indicators for clustering: p1, p2, p6, p7, SD, and Perc25.
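The normalization and correlation screening can be sketched as below. The slides give the 0.8 threshold but not the rule for choosing which of two correlated indicators to keep, so the greedy "keep the first one seen" rule here is an assumption, as are the function names and the example columns.

```python
from math import sqrt


def min_max(values):
    """Scale values linearly to the range 0 to 1 (constant columns map to 0)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def drop_correlated(columns, threshold=0.8):
    """Keep a column only if |r| <= threshold against every column kept so far.

    The greedy keep-first rule is an illustrative assumption.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical indicator columns for four dwellings.
raw = {
    "Perc25": [0.1, 0.2, 0.3, 0.4],
    "median": [0.2, 0.4, 0.6, 0.8],  # perfectly correlated with Perc25
    "SD": [0.5, 0.1, 0.4, 0.2],
}
scaled = {name: min_max(values) for name, values in raw.items()}
selected = drop_correlated(scaled)
```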
[Flowchart of the clustering procedure. Elements: U.B.I. indicators (p1, p2, p6, p7, SD, Perc25); SOM; prototypes; K-means clustering; decision “Are cohesion and dissimilitude assured?” (YES/NO); calculation of complementary indicators; second decision “Are cohesion and dissimilitude assured?” (YES/NO); baseload curves and comfort parameters of clusters]
CLUSTERING PROCEDURE
SEGMENTATION OF DWELLINGS
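The two-stage procedure (SOM first, then k-means on the SOM prototypes) can be sketched in plain Python as follows. The study itself used R; the grid size, learning-rate and radius schedules, seeds and all function names here are illustrative assumptions rather than the settings actually used.

```python
import math
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centres):
    """Index of the centre closest to the point."""
    return min(range(len(centres)), key=lambda i: squared_distance(point, centres[i]))


def train_som(data, rows, cols, epochs=20, seed=0):
    """Train a small rectangular SOM and return its prototype vectors."""
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    prototypes = [list(rng.choice(data)) for _ in grid]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for sample in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total_steps
            rate = 0.5 * (1.0 - frac)               # learning rate decays linearly
            radius = max(0.5, (max(rows, cols) / 2.0) * (1.0 - frac))
            bmu = nearest(sample, prototypes)       # best-matching unit
            for unit, pos in enumerate(grid):
                influence = math.exp(-squared_distance(pos, grid[bmu]) / (2.0 * radius ** 2))
                prototypes[unit] = [
                    w + rate * influence * (x - w)
                    for w, x in zip(prototypes[unit], sample)
                ]
            step += 1
    return prototypes


def k_means(points, k, iterations=50, seed=0):
    """Plain k-means; returns (centres, labels)."""
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        labels = [nearest(p, centres) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return centres, [nearest(p, centres) for p in points]


def cluster_dwellings(data, rows, cols, k):
    """Label each dwelling with the cluster of its best-matching SOM prototype."""
    prototypes = train_som(data, rows, cols)
    _, proto_labels = k_means(prototypes, k)
    return prototypes, [proto_labels[nearest(d, prototypes)] for d in data]


# Hypothetical normalized indicator vectors for 20 dwellings.
dwellings = [[0.01 * i, 0.0] for i in range(10)] + [[1.0 - 0.01 * i, 1.0] for i in range(10)]
prototypes, labels = cluster_dwellings(dwellings, rows=2, cols=2, k=2)
```

Clustering the prototypes instead of the raw dwellings means k-means works on a much smaller, smoothed set of points, which is one common motivation for this two-stage design.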
Are cohesion and dissimilitude assured?
Silhouette Index (Peter J. Rousseeuw, 1987): a graphical representation of how well each object lies within its cluster.
s(i) = a quantity between -1 and 1. A value near 1 indicates that the sample is assigned to the right cluster.
Davies-Bouldin Index (David L. Davies and Donald W. Bouldin, 1979): an internal evaluation scheme, in which the quality of the clustering is validated using quantities and features inherent to the dataset.
D(i) = a lower value means that the clustering is better.
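Both validation indices can be computed directly from the points and their cluster labels. The sketch below follows the standard definitions with Euclidean distance; the function names and the four example points are illustrative, and both functions assume at least two clusters.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_index(points, labels):
    """Mean of s(i) = (b - a) / max(a, b) over all points.

    a: mean distance to the other members of the point's own cluster.
    b: lowest mean distance to the members of any other cluster.
    Points alone in their cluster score 0.
    """
    clusters = sorted(set(labels))
    scores = []
    for i, p in enumerate(points):
        own = [euclidean(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(euclidean(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


def davies_bouldin_index(points, labels):
    """Mean over clusters of the worst (scatter_c + scatter_d) / centroid distance."""
    clusters = sorted(set(labels))
    centroids, scatter = {}, {}
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        scatter[c] = sum(euclidean(p, centroids[c]) for p in members) / len(members)
    worst = [
        max((scatter[c] + scatter[d]) / euclidean(centroids[c], centroids[d])
            for d in clusters if d != c)
        for c in clusters
    ]
    return sum(worst) / len(worst)


# Hypothetical example: two compact, well-separated clusters.
pts = [[0, 0], [0, 1], [10, 0], [10, 1]]
labs = [0, 0, 1, 1]
si = silhouette_index(pts, labs)
dbi = davies_bouldin_index(pts, labs)
```

For this well-separated example the silhouette is close to 1 and the Davies-Bouldin value is small, which is the pattern the cohesion and dissimilitude check looks for.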
[Figure: clustering result with clusters labelled 1 to 10]
CLUSTERING RESULTS: WINTER
SEGMENTATION OF DWELLINGS
[Figure: clustering result with clusters labelled 1 to 8]
CLUSTERING RESULTS: SUMMER
SEGMENTATION OF DWELLINGS
[Figure: clustering result with clusters labelled 1 to 9]
CLUSTERING RESULTS: MIDSEASON
SEGMENTATION OF DWELLINGS
CLUSTERING RESULTS: SUMMER
SEGMENTATION OF DWELLINGS
Energy Indicators
C4 shows the highest electricity consumption in all periods, as well as in daily and monthly consumption (700 kWh/month). It is followed by C1 (450 kWh/month) and C8 (250 kWh/month). C2, C5 and C7 have similar consumption (around 200 kWh/month), and C6 has almost none.
Except for C4, the clusters show small variations in all indicators, which means that the mean/median value is fairly representative of the group.
The median hourly SD is around 0.5 kWh (C1), 0.25 (C2, C3, C5, C7), 0 (C6), 0.8 (C4) and 0.3 (C8), following the same pattern as the energy indicators.
CLUSTERING RESULTS: SUMMER
SEGMENTATION OF DWELLINGS
Energy Indicators (mean values)
C8 (green): Users have their main consumption between afternoon and dinner (maximum in Ed), which accounts for 18% of daily consumption. The minimum is at night (En) and in the evening (Ee), with 7% and 11% of daily consumption.
C4 (red): Shows differences of around 2.5 kWh between the hourly consumption percentiles over the month. It has the highest standby consumption (Perc25), although also with large differences over the month. The main consumption is at noon and lunch time (Enn, El).
CLUSTERING RESULTS: SUMMER
SEGMENTATION OF DWELLINGS
Subgroups of dwellings within clusters (mean values)
• Clusters 1, 2, 3 and 8 have their main consumption between dinner (5) and evening (6).
• Clusters 4, 5, 6 and 7 have their main consumption between noon (2) and lunch (3).
• The lowest consumption is at night in all clusters except C7, whose minimum falls at noon-lunch (3), and C6, which has no consumption (0).
CLUSTERING RESULTS: SUMMER
SEGMENTATION OF DWELLINGS
Complementary Indicators (relations between daily mean consumption):
• aW1: Mean / Max: day with maximum/average.
• aW2: Min / Mean: day with minimum consumption/average.
• aW3: Mean (Weekends) / Mean (Weekdays)
Only C6 shows lower consumption on weekends. It has low daily and monthly consumption, concentrated on weekdays.
The days with maximum and minimum consumption are quite similar to the average day for all clusters, except C6, which has many “peaks”.
Relations between hourly mean consumptions of the month:
• aD7: Mean (Night period) / Mean
• aD8: Mean (Lunch period) / Mean
• aD9: Mean (Dinner period) / Mean
Most clusters have a mean night-time standby of around 50-55% of the total, except C7 (80%) and C6 (30%, but many with no consumption).
Lunch time has no impact, except in C4 (20% higher) and C6 (55% lower).
C1, C2, C3 and C8 increase their consumption at dinner time (30-40% higher).
Not well predicted, or not relevant
Daily total energy (electricity) use was plotted against the outside daily mean temperature. We use change-point models, which can capture the non-linear relation between heating and cooling energy use and outside temperature.
We selected the five-parameter change-point model (ASHRAE, 2001). We do not split weekdays and weekends, because their consumption is similar.
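In its usual form, a five-parameter change-point model combines a constant base load with a heating slope below one change-point temperature and a cooling slope above another. The function below is a sketch of that form; the parameter names and the numeric values are illustrative assumptions, not fitted results from the study.

```python
def five_parameter_model(temp, base, heat_slope, cool_slope, heat_cp, cool_cp):
    """Daily energy use (kWh) predicted from the daily mean temperature.

    base:        weather-independent consumption (kWh/day)
    heat_slope:  extra kWh per degree below the heating change point
    cool_slope:  extra kWh per degree above the cooling change point
    heat_cp, cool_cp: heating and cooling change-point temperatures
    """
    heating = heat_slope * max(0.0, heat_cp - temp)
    cooling = cool_slope * max(0.0, temp - cool_cp)
    return base + heating + cooling


# Hypothetical parameters: 5 kWh/day base, change points at 15 and 23 degrees.
cold_day = five_parameter_model(10.0, 5.0, 0.5, 0.8, 15.0, 23.0)
mild_day = five_parameter_model(20.0, 5.0, 0.5, 0.8, 15.0, 23.0)
hot_day = five_parameter_model(28.0, 5.0, 0.5, 0.8, 15.0, 23.0)
```

Between the two change points the prediction stays at the base load, which is what allows a single model to describe heating, cooling and weather-independent days together.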
WEATHER DEPENDENCY INDICATORS
COMPLEMENTARY INDICATORS
Best fit
We used an automated parameter-search algorithm to detect the best-fit lines: the R package “segmented” (Vito M. R. Muggeo, 2010), which provides estimates of the slopes and of the possibly multiple breakpoints. It is an iterative procedure (Muggeo, 2003) that needs starting values only for the breakpoint parameters.
PROCEDURE
1. Preliminary treatment: to avoid “noise”, a preliminary linear regression is fitted to the data. We performed two simple linear regressions, one for T > 23ºC and one for T < 15ºC.
2. If the magnitude of the slope exceeds 0.5 kWh/ºC for heating or cooling, the user is accepted for the change-point model; otherwise, the user is considered for the simplified monthly model (DD signature).
3. DD signature: the simplified model applied when the daily model is not clear enough. If the slope is > 1 and R2 > 0.6, the user is considered monthly weather dependent; otherwise, the user is not weather dependent.
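The screening steps above can be sketched as a small classifier. Several details are assumptions made for illustration: the DD signature is taken to be a regression of monthly consumption on monthly degree days, the cooling test uses a slope above +0.5 and the heating test a slope below -0.5, and all names and example data are hypothetical.

```python
def linear_fit(x, y):
    """Ordinary least squares; returns (slope, intercept, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0:
        return 0.0, my, 0.0
    slope = sxy / sxx
    r_squared = (sxy ** 2) / (sxx * syy) if syy else 0.0
    return slope, my - slope * mx, r_squared


def classify_weather_dependency(daily_temp, daily_kwh, monthly_dd, monthly_kwh):
    """Return 'daily', 'monthly' or 'none' following the three screening steps."""
    hot = [(t, e) for t, e in zip(daily_temp, daily_kwh) if t > 23.0]
    cold = [(t, e) for t, e in zip(daily_temp, daily_kwh) if t < 15.0]
    if len(hot) >= 2:
        slope, _, _ = linear_fit([t for t, _ in hot], [e for _, e in hot])
        if slope > 0.5:        # cooling: consumption rises with temperature
            return "daily"
    if len(cold) >= 2:
        slope, _, _ = linear_fit([t for t, _ in cold], [e for _, e in cold])
        if slope < -0.5:       # heating: consumption rises as temperature falls
            return "daily"
    slope, _, r_squared = linear_fit(monthly_dd, monthly_kwh)
    if slope > 1.0 and r_squared > 0.6:
        return "monthly"       # accepted by the simplified DD signature
    return "none"


# Hypothetical users.
temps = [24.0, 25.0, 26.0, 27.0, 28.0]
cooling_user = classify_weather_dependency(
    temps, [5.0 + 1.0 * (t - 23.0) for t in temps], [10, 50, 100], [100, 100, 100])
monthly_user = classify_weather_dependency(
    [10.0, 12.0, 25.0, 27.0], [5.0, 5.0, 5.0, 5.0],
    [10, 50, 100, 150], [120, 200, 300, 400])
flat_user = classify_weather_dependency(
    [10.0, 12.0, 25.0, 27.0], [5.0, 5.0, 5.0, 5.0],
    [10, 50, 100, 150], [100, 100, 100, 100])
```

Users labelled "daily" would then go on to the change-point fit; the sketch stops at the screening decision.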
CLUSTERING RESULTS: WEATHER DEPENDENCY
SEGMENTATION OF DWELLINGS
Summer: the daily model is only relevant in C4 and C1 (40%). Of those dwellings, only 16% are daily cooling dependent and 25% monthly dependent (41% total).
Winter: C8 and C5 increase to 27.5% daily and 27.5% monthly heating dependent (55% total).
For those users, R2 is around 0.55-0.75 for summer/winter. They represent 19% and 20% of total users respectively.
C6 in summer and C3 in winter show almost no weather dependency; for the other clusters it is around 20-25% of users.
[Figure panels: Summer, Winter]
Are these results in accordance with reality?
CLUSTERING RESULTS: WEATHER DEPENDENCY
SEGMENTATION OF DWELLINGS
Complementary Indicators (mean values)
Base load is calculated as the aggregated monthly consumption for non-air-conditioned days/months. Power is the hourly maximum power over the period, as supplied by the utility.
• The highest values of Power and Baseload, found in C1 and C4, are consistent with the other results.
• Neither the three postal codes nor the tariff types can be predicted by the clustering, because both are similarly distributed over the clusters.
• The only notable point is that in C4 the 2.1DHA tariff (P > 10 kW, with time discrimination from 23h to 13h) accounts for 30%. This may partially explain their higher consumption in summer.
Summer
• Relevant indices of usual behaviour have been identified.
• The clustering results are good and give a better understanding of the behaviour of the different user groups.
• Preliminary results on weather dependency, and on its “prediction” from the relevant indicators, have been obtained (improvement needed).
• Decision-making schemes can be implemented.
• Further development is needed to improve the clustering and to better integrate the energy indicators with other information.
Conclusions
BeeData Analytics
BeeData Analytics offers new products and business intelligence for energy distribution and commercialisation companies.
BeeData Analytics is a high-performance scalable system that permits the storage and analysis of non-homogeneous data of any type (energy-related or other):
- Scalable distributed data storage. Big Data applications supported by the Apache Hadoop software framework, HBase and Hive.
- Configurable communication API. This permits data to be acquired and transmitted bi-directionally using a standard protocol (RESTful web API).
- Advanced system for data analytics. Coupled to the principal open data mining tools and libraries (R, Pandas, Python).