Presentation of EMPOWERING project in the last Workshop of the IEA Annex 58

Analysis of hourly electricity consumption to characterise a large number of customers using clustering techniques

X. Cipriano, G. Mor

• 8,536 residential users with smart meters (half-hourly electricity consumption).

• The utility, “El Gas”, is located in the town of Sóller, on the island of Mallorca.

• Twelve months of half-hourly data are available.

CASE STUDY

Mallorca

• To identify the relevant indices or parameters, derived from hourly consumption, that can define users’ electricity behaviour.

• To characterise the main groups of dwellings according to their hourly load profiles (clustering).

• To support decision-making on schemes and awareness campaigns, and the modelling of the energy behaviour of large numbers of dwellings.

OBJECTIVES

– Monthly consumption
– Hourly consumption
– Results of other services
– Weather data (using the closest available meteorological station)
– Contracted tariff
– Contracted power
– Location
– Year of construction
– Type of construction (flat, attached, detached, …)
– Number of rooms, area, number of occupants, occupancy patterns, type of domestic appliances, type of HVAC systems, thermostat temperatures, glazed area of the dwelling, first/second residence, …

Possible

Impossible

DATA WE REALLY HAVE

AVAILABLE DATA

1. Data collection and pre-processing of half-hourly data

2. Selection of relevant indicators/parameters

3. Calculation of the degree of weather dependency (for all users)

4. Segmentation of groups of dwellings (clustering with SOM + k-means)

5. Visualisation and characterisation of load-shape curves for each cluster

6. Automated energy-saving tips for each user, based on the clustering results

STEPS OF WORK

To highlight the load shapes that occur most frequently each month

• Energy consumption indicators: related to hourly consumption in the different periods of the day.

• User behaviour indicators: related to when consumption occurs and how intense it is.

• Weather dependency indicators.

• Complementary indicators: related to other existing data.

PRE-DEFINITION OF INDICATORS

DATA PRE-PROCESSING

For each month of data, we defined a typical day:
• Emonth:
• Edaily: daily average of the month (kWh/day)
• En: hourly average at night [01:00–05:59] (kWh/h)

No significant differences were found between weekdays and weekends.

We defined 7 consumption periods per day.
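The typical-day construction can be sketched in Python as below. This is a minimal illustration, assuming one month of half-hourly readings given as (datetime, kWh) pairs; the function names are hypothetical, and only the night window [01:00–05:59] comes from the slides, so the other six periods would need their own (unstated) hour ranges.

```python
from collections import defaultdict
from datetime import datetime


def typical_day(readings: list[tuple[datetime, float]]) -> list[float]:
    """Typical day of one month: mean kWh for each clock hour 0..23.

    Each hour of each day sums its two half-hourly readings; the typical day
    averages those hourly totals over the days in which the hour appears.
    """
    hourly = defaultdict(float)  # (date, hour) -> kWh consumed in that hour
    for ts, kwh in readings:
        hourly[(ts.date(), ts.hour)] += kwh
    sums, counts = [0.0] * 24, [0] * 24
    for (_, hour), kwh in hourly.items():
        sums[hour] += kwh
        counts[hour] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def daily_average(readings: list[tuple[datetime, float]]) -> float:
    """Edaily: average daily consumption over the month (kWh/day)."""
    days = {ts.date() for ts, _ in readings}
    return sum(kwh for _, kwh in readings) / len(days)


def period_mean(profile: list[float], first_hour: int, last_hour: int) -> float:
    """Mean hourly consumption (kWh/h) over hours first_hour..last_hour inclusive."""
    hours = profile[first_hour:last_hour + 1]
    return sum(hours) / len(hours)
```

For example, `period_mean(typical_day(readings), 1, 5)` gives En, since hours 1 to 5 cover 01:00–05:59.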


ENERGY CONSUMPTION INDICATORS (E.I.)

Monthly calculation of:
• Ct: contracted tariff
• P: contracted power
• Ndb10: number of days below the 10th percentile of consumption
• Htowd: daily occupied hours on weekdays (number of hours in which consumption > residual consumption)
• Htowe: daily occupied hours on weekends
• Pmawe, Pmawd: period of the day with maximum consumption on weekends/weekdays
• Pmiwe, Pmiwd: period of the day with minimum consumption on weekends/weekdays
• Dma: day with maximum daily consumption
• Dmi: day with minimum daily consumption
• Hma: time of maximum hourly consumption
• Nhma: number of hours at the maximum hourly consumption
• Hmi: time of minimum hourly consumption
• Nhmi: number of hours at the minimum hourly consumption
• Hp1: first time when P > 1 kW
• Hp2: first time when P > 2 kW
• Hp3: first time when P reaches its maximum

USER’S BEHAVIOUR INDICATORS
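A minimal sketch of some of these indicators, computed from one 24-value hourly profile (kWh per hour, read as average kW). The function name and the `residual` argument are hypothetical: the slides do not say how residual consumption is estimated, so the caller supplies it.

```python
def behaviour_indicators(profile, residual):
    """Compute a subset of the UBIs from a 24-value hourly profile (hours 0..23).

    profile: hourly consumption in kWh per hour, i.e. average power in kW.
    residual: residual (standby) consumption level, supplied by the caller.
    """
    peak, low = max(profile), min(profile)

    def first_hour_above(threshold_kw):
        # First hour whose average power exceeds the threshold, or None.
        return next((h for h, p in enumerate(profile) if p > threshold_kw), None)

    return {
        "Hma": profile.index(peak),                       # hour of maximum consumption
        "Nhma": profile.count(peak),                      # hours at that maximum
        "Hmi": profile.index(low),                        # hour of minimum consumption
        "Nhmi": profile.count(low),                       # hours at that minimum
        "Hp1": first_hour_above(1.0),                     # first time P > 1 kW
        "Hp2": first_hour_above(2.0),                     # first time P > 2 kW
        "Htow": sum(1 for p in profile if p > residual),  # occupied hours
    }
```

Running it per weekday and weekend typical day would give the Htowd/Htowe split.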


Indicators were calculated for each dwelling over monthly, seasonal and yearly periods:

• MONTHLY: mean of the hourly averages (or daily totals) of the daily periods over the month.

• SEASONAL: average of the monthly values within each season. SUMMER: June–September; WINTER: December–March; MIDSEASON: April–May and October–November.

• YEARLY: average of the monthly values over the whole year.
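The seasonal and yearly aggregation can be sketched as follows, assuming each indicator is already available as one value per month. The season sets follow the slide (summer June–September, winter December–March, and April–May plus October–November grouped as midseason, matching the results slides); the function name is hypothetical.

```python
# Season membership by month number, as listed on the slide.
SEASONS = {
    "summer": {6, 7, 8, 9},
    "winter": {12, 1, 2, 3},
    "midseason": {4, 5, 10, 11},
}


def seasonal_and_yearly(monthly):
    """Average monthly indicator values per season and over the whole year.

    monthly: dict {month number (1-12): monthly value}; absent months are skipped.
    Seasons without any month return None.
    """
    result = {}
    for season, months in SEASONS.items():
        values = [monthly[m] for m in sorted(months) if m in monthly]
        result[season] = sum(values) / len(values) if values else None
    result["yearly"] = sum(monthly.values()) / len(monthly) if monthly else None
    return result
```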

TIME PERIOD OF CALCULATION

PRE-PROCESSING OF DATA

Procedure for selecting the relevant indicators:
1. We performed a PCA of the UBIs to reduce the number of parameters; 5 principal components (PCs) were obtained.
2. The k-means clustering technique (R software) was applied to these 5 PCs, calculated for all users, and the clustering quality was poor (silhouette index = 0.3–0.32, DBI = 0.6–0.65).
3. Applying a SOM first, followed by k-means clustering of the SOM prototypes, improved the results: the SI increased to 0.45–0.5, with a DBI of around 0.7.

• We found it difficult to interpret the physical meaning of the PCA results when trying to characterise the resulting groups.

• The periods of maximum and minimum consumption (Pmawe, Pmawd, Pmiwe, Pmiwd) were the most influential indicators in the PCA.
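Step 1 (PCA of the UBIs) can be sketched with NumPy's SVD. This is an illustration under assumptions, not the authors' R code: indicators are standardised before the decomposition (the slides do not say whether they were), and `pca_scores` is a hypothetical name. The returned scores are the kind of input k-means was applied to in step 2.

```python
import numpy as np


def pca_scores(X, n_components=5):
    """Project user indicators onto their first principal components.

    X: array of shape (n_users, n_indicators).
    Returns (scores, explained_variance_ratio) for the kept components.
    """
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0                      # leave constant columns unscaled
    Z = (X - X.mean(axis=0)) / std           # centre and standardise
    # Rows of Vt are the principal axes; squared singular values give variance.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:n_components].T
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]
```

Inspecting `Vt` (the loadings) is how one would see which indicators, such as the period-of-maximum ones, dominate each component.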

SELECTION OF USER’S BEHAVIOUR INDICATORS (UBI)

SELECTION OF RELEVANT INDICATORS

New group of indicators (qualitative indicators):
• We defined p1, p2, p3, p4, p5, p6 and p7 as the ranking of the daily periods by average hourly consumption over the month: p1 is the period with the highest consumption and p7 the one with the lowest.

• The energy indicators can therefore be rejected for clustering, because the ranking already captures them: if p1 = 3, for example, Enn is larger (kWh) than the other energy indicators (Ed, Ea, El, Em, En, Ee).

• New indicators are added: the hourly SD and the percentiles of consumption.
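A minimal sketch of the qualitative ranking and the dispersion indicators, assuming the seven period means of a month are given in period order (1–7). The helpers are hypothetical; SD is taken as the population standard deviation and Perc25 uses the inclusive quantile method, both assumptions since the slides do not specify them. The selected p1, p2, p6 and p7 are positions 0, 1, 5 and 6 of the returned ranking.

```python
import statistics


def period_ranking(period_means):
    """Return [p1, ..., p7]: period numbers (1-based) sorted from the highest
    to the lowest mean hourly consumption; ties keep the lower number first."""
    order = sorted(range(len(period_means)), key=lambda i: -period_means[i])
    return [i + 1 for i in order]


def dispersion_indicators(hourly_values):
    """Hourly SD (population) and 25th percentile (Perc25) of a month's readings."""
    sd = statistics.pstdev(hourly_values)
    perc25 = statistics.quantiles(hourly_values, n=4, method="inclusive")[0]
    return sd, perc25
```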



Indicator selection (flowchart):
1. We start with 13 indicators: 7 qualitative indicators, 5 percentile indicators (5perc, 25perc, median, 75perc, 90perc) and the hourly SD.
2. Eliminate missing values.
3. Min–max normalisation scales the values to a predetermined range (0 to 1).
4. Correlation matrix calculation: two indicators are considered highly correlated if their correlation is > 0.8.
5. We finally select 6 indicators for clustering: p1, p2, p6, p7, SD and Perc25.
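The selection steps can be sketched in plain Python. The greedy rule (keep an indicator only if its |correlation| with every indicator already kept is at most 0.8) is an assumption: the slides give the 0.8 threshold but not which member of a correlated pair is dropped, so here the result depends on the input order. All names are hypothetical.

```python
import math


def complete_rows(indicators):
    """Drop users that have a missing value (None) in any indicator.

    indicators: dict {name: list of values, one per user, same order}.
    """
    n = len(next(iter(indicators.values())))
    keep = [i for i in range(n) if all(col[i] is not None for col in indicators.values())]
    return {name: [col[i] for i in keep] for name, col in indicators.items()}


def min_max(column):
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in column]


def pearson(x, y):
    """Pearson correlation coefficient (0.0 if either series is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def drop_correlated(indicators, threshold=0.8):
    """Normalise each indicator and keep it only if |r| <= threshold with
    every indicator kept so far (greedy, so input order matters)."""
    kept = {}
    for name, values in indicators.items():
        scaled = min_max(values)
        if all(abs(pearson(scaled, other)) <= threshold for other in kept.values()):
            kept[name] = scaled
    return kept
```

Min–max scaling does not change Pearson correlations; it is applied here because the normalised values are what the clustering consumes.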



Flowchart:
• U.B.I. indicators: p1, p2, p6, p7, SD, Perc25.
• SOM, producing prototypes.
• K-means clustering of the prototypes.
• Decision: are cohesion and dissimilitude assured? (YES / NO)
• Calculation of complementary indicators, baseload curves and comfort parameters of clusters.

CLUSTERING PROCEDURE

SEGMENTATION OF DWELLINGS
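The two-level SOM + k-means scheme can be sketched with NumPy as below. This is an illustrative implementation, not the tooling used in the study: the 5×5 grid, the linear decay of learning rate and neighbourhood radius, and all function names are assumptions. Each dwelling inherits the k-means cluster of its best-matching SOM prototype.

```python
import numpy as np


def _nearest(A, B):
    """Index of the closest row of B for every row of A (Euclidean)."""
    return np.argmin(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2), axis=1)


def train_som(X, rows=5, cols=5, iters=2000, lr0=0.5, sigma0=None, seed=0):
    """Train a rectangular self-organising map; return its prototype vectors.
    Learning rate and neighbourhood radius decay linearly (assumed schedule)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(X)
    # Grid coordinates of each map unit, used by the neighbourhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    W = X[rng.integers(0, n, rows * cols)].copy()  # initialise from random samples
    sigma0 = sigma0 or max(rows, cols) / 2
    for t in range(iters):
        frac = t / iters
        lr = lr0 * (1 - frac)
        sigma = sigma0 * (1 - frac) + 0.5          # keeps sigma >= 0.5
        x = X[rng.integers(0, n)]
        bmu = np.argmin(((W - x) ** 2).sum(axis=1))  # best-matching unit
        d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
        h = np.exp(-d2 / (2 * sigma**2))           # Gaussian neighbourhood weights
        W += lr * h[:, None] * (x - W)
    return W


def kmeans(P, k, iters=100, seed=0):
    """Lloyd's k-means on the rows of P; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    C = P[rng.choice(len(P), size=k, replace=False)]
    for _ in range(iters):
        labels = _nearest(P, C)
        # Empty clusters keep their previous centroid.
        new_C = np.array([P[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                          for j in range(k)])
        if np.allclose(new_C, C):
            break
        C = new_C
    return C, _nearest(P, C)


def som_kmeans(X, k, **som_kwargs):
    """Cluster SOM prototypes with k-means, then label each sample via its BMU."""
    X = np.asarray(X, dtype=float)
    W = train_som(X, **som_kwargs)
    _, proto_labels = kmeans(W, k)
    return proto_labels[_nearest(X, W)]
```

Clustering the prototypes instead of the raw users smooths noise and shrinks the k-means input, which is one plausible reason the SOM step improved the silhouette index.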



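Silhouette index: a graphical representation of how well each object lies within its cluster (Peter J. Rousseeuw, 1987).

Are cohesion and dissimilitude assured?

s(i) = (b(i) − a(i)) / max{a(i), b(i)}, where a(i) is the mean distance from sample i to the other members of its cluster and b(i) is the lowest mean distance from i to the members of any other cluster. s(i) lies between −1 and 1; a value near 1 indicates that the sample is assigned to the right cluster.

Davies–Bouldin index (David L. Davies and Donald W. Bouldin, 1979): an internal evaluation scheme, in which the quality of the clustering is assessed using only quantities and features inherent to the dataset.

DB = (1/k) Σᵢ max_{j≠i} (Sᵢ + Sⱼ) / d(cᵢ, cⱼ), where k is the number of clusters, Sᵢ is the average distance of the members of cluster i to its centroid cᵢ, and d(cᵢ, cⱼ) is the distance between centroids; a lower value means a better clustering.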
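Both validity indices can be computed directly from their definitions; a NumPy sketch with Euclidean distances follows (function names are hypothetical). It assumes at least two clusters, and averaging s(i) over all samples gives the overall silhouette index.

```python
import numpy as np


def silhouette_values(X, labels):
    """s(i) = (b(i) - a(i)) / max(a(i), b(i)) for every sample.
    Samples in singleton clusters get s(i) = 0, following Rousseeuw."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    clusters = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue
        a = D[i, same].sum() / (same.sum() - 1)  # d(i, i) = 0 is excluded
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s


def davies_bouldin(X, labels):
    """DB = (1/k) * sum_i max_{j != i} (S_i + S_j) / d(c_i, c_j); lower is better."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    C = np.array([X[labels == c].mean(axis=0) for c in clusters])
    S = np.array([np.linalg.norm(X[labels == c] - C[j], axis=1).mean()
                  for j, c in enumerate(clusters)])
    k = len(clusters)
    worst = [max((S[i] + S[j]) / np.linalg.norm(C[i] - C[j]) for j in range(k) if j != i)
             for i in range(k)]
    return float(np.mean(worst))
```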

[Figure: winter clustering results, clusters labelled 1–10]

CLUSTERING RESULTS: WINTER


[Figure: summer clustering results, clusters labelled 1–8]

CLUSTERING RESULTS: SUMMER


[Figure: midseason clustering results, clusters labelled 1–9]

CLUSTERING RESULTS: MIDSEASON




Energy indicators

C4 shows the highest electricity consumption in all periods, as well as in daily and monthly consumption (700 kWh/month). It is followed by C1 (450 kWh/month), with C8 third (250 kWh/month). C2, C5 and C7 have similar consumption (around 200 kWh/month), and C6 has almost no consumption. Except for C4, the clusters show small variations in all indicators, which means that the mean/median value is fairly representative of each group. The median hourly SD is around 0.5 kWh (C1), 0.25 kWh (C2, C3, C5, C7), 0 (C6), 0.8 kWh (C4) and 0.3 kWh (C8), following the same pattern as the energy indicators.



Energy indicators (mean values)

C8 (green): users have their main consumption between afternoon and dinner (maximum in Ed), which accounts for 18% of daily consumption. The minimum is at night (En) and in the evening (Ee), with 7% and 11% of daily consumption, respectively.

C4 (red): the hourly consumption percentiles differ by around 2.5 kWh over the month. It has the greatest standby consumption (Perc25), although this also varies considerably over the month. The main consumption is at noon and lunch time (Enn, El).



Subgroups of dwellings within clusters (mean values)

• Clusters 1, 2, 3 and 8 have their main consumption between dinner (5) and evening (6).

• Clusters 4, 5, 6 and 7 have their main consumption between noon (2) and lunch (3).

• All clusters have their lowest consumption at night, except C7 and C6, whose lowest values fall at noon-lunch (3) and at zero (0) consumption, respectively.



Complementary indicators (relations between daily mean consumptions):
• aW1: Mean / Max, relating the average day to the day with maximum consumption
• aW2: Min / Mean, the day with minimum consumption relative to the average day
• aW3: Mean (weekends) / Mean (weekdays)

Only C6 shows lower consumption on weekends: it has low daily/monthly consumption, concentrated on weekdays.

For all clusters, the days with maximum and minimum consumption are quite similar to the average day, except for C6, which has many “peaks”.

Relations between hourly mean consumptions of the month:
• aD7: Mean (night period) / Mean
• aD8: Mean (lunch period) / Mean
• aD9: Mean (dinner period) / Mean

Most clusters have a mean night-time standby of around 50–55% of the overall mean, except C7 (80%) and C6 (30%, but with many dwellings with no consumption).

Lunch time has no noticeable impact, except in C4 (20% higher) and C6 (55% lower).

C1, C2, C3 and C8 increase their consumption at dinner time (30–40% higher).
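The complementary indicators can be sketched as below, assuming daily totals keyed by date and the month's 24 mean hourly values. aW1 is computed literally as Mean / Max, as written on the slide. The hour windows for night, lunch and dinner are passed in because the slides do not list them; the function name is hypothetical.

```python
from datetime import date


def complementary_indicators(daily: dict[date, float], hourly_profile: list[float],
                             night_hours, lunch_hours, dinner_hours) -> dict:
    """aW1-aW3 from daily totals (kWh) and aD7-aD9 from the 24 hourly means (kWh/h).
    The *_hours arguments are caller-chosen hour indices for each period."""
    values = list(daily.values())
    mean = sum(values) / len(values)
    weekend = [v for d, v in daily.items() if d.weekday() >= 5]  # Saturday, Sunday
    weekday = [v for d, v in daily.items() if d.weekday() < 5]
    hourly_mean = sum(hourly_profile) / len(hourly_profile)

    def period_ratio(hours):
        # Mean of the period's hours relative to the overall hourly mean.
        hours = list(hours)
        return (sum(hourly_profile[h] for h in hours) / len(hours)) / hourly_mean

    return {
        "aW1": mean / max(values),
        "aW2": min(values) / mean,
        "aW3": (sum(weekend) / len(weekend)) / (sum(weekday) / len(weekday)),
        "aD7": period_ratio(night_hours),
        "aD8": period_ratio(lunch_hours),
        "aD9": period_ratio(dinner_hours),
    }
```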

Not well predicted, or not relevant

Daily total electricity use was plotted against the daily mean outdoor temperature. We use change-point models, which can capture the non-linear relationship between heating and cooling energy use and outdoor temperature.

We selected the five-parameter change-point model (ASHRAE, 2001). We do not distinguish between weekdays and weekends, because their consumption is similar.
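The five-parameter change-point model can be written as E = b0 + b1·max(Th − T, 0) + b2·max(T − Tc, 0), with heating breakpoint Th ≤ cooling breakpoint Tc. The sketch below fits it by a simple grid search over breakpoint pairs, solving for the three coefficients by least squares at each pair. This is a swapped-in technique for illustration, not the iterative procedure of the R package “segmented”; the function name and the 0.5 °C grid step are assumptions.

```python
import numpy as np


def fit_5p_change_point(T, E, step=0.5):
    """Fit E = b0 + b1*max(Th - T, 0) + b2*max(T - Tc, 0) by grid search.

    For each candidate pair Th <= Tc (spaced by `step` degC inside the observed
    temperature range) the coefficients come from ordinary least squares;
    the pair with the smallest sum of squared errors is returned.
    """
    T = np.asarray(T, dtype=float)
    E = np.asarray(E, dtype=float)
    candidates = np.arange(T.min() + step, T.max(), step)
    best = None
    for i, th in enumerate(candidates):
        for tc in candidates[i:]:
            A = np.column_stack([np.ones_like(T),
                                 np.maximum(th - T, 0.0),   # heating term
                                 np.maximum(T - tc, 0.0)])  # cooling term
            coef, *_ = np.linalg.lstsq(A, E, rcond=None)
            sse = float(np.sum((E - A @ coef) ** 2))
            if best is None or sse < best[0]:
                best = (sse, coef, th, tc)
    sse, (b0, b1, b2), th, tc = best
    ss_tot = float(np.sum((E - E.mean()) ** 2))
    r2 = 1.0 - sse / ss_tot if ss_tot else 0.0
    return {"b0": b0, "b1": b1, "b2": b2, "Th": th, "Tc": tc, "r2": r2}
```

With this parametrisation, a positive b1 means consumption rises as temperature falls below Th (heating) and a positive b2 means it rises above Tc (cooling).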

WEATHER DEPENDENCY INDICATORS

COMPLEMENTARY INDICATORS

Best fit

We implemented an automated parameter search to detect the best-fit lines, using “segmented” (Vito M. R. Muggeo, 2010), an R package that provides estimates of the slopes and of the (possibly multiple) breakpoints. It is an iterative procedure (Muggeo, 2003) that needs starting values only for the breakpoint parameters.

PROCEDURE
1. Preliminary treatment: to avoid “noise”, a preliminary linear regression is fitted to the data. We performed two simple linear regressions, one for T > 23 °C and one for T < 15 °C.
2. If the slope exceeds 0.5 kWh/°C in absolute value (negative for heating, positive for cooling), the user is accepted for the change-point model; otherwise, the user is considered for the simplified monthly model (DD signature).
3. DD signature: the simplified model applied when the daily model is not clear. If the slope is > 1 and R² > 0.6, the user is considered monthly weather dependent; otherwise, the user is not weather dependent.
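The screening steps can be sketched in plain Python. The thresholds (15 °C, 23 °C, 0.5 kWh/°C, slope > 1, R² > 0.6) come from the procedure above; the function names, the minimum of two points per regression and the monthly degree-day input are assumptions, since the slides do not say how the degree days are computed.

```python
def linear_fit(x, y):
    """Ordinary least-squares slope, intercept and R^2 of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        return 0.0, my, 0.0  # no spread in x: no usable slope
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    r2 = 1 - ss_res / ss_tot if ss_tot else 0.0
    return slope, intercept, r2


def classify_user(daily_T, daily_E, monthly_dd, monthly_E,
                  slope_min=0.5, dd_slope_min=1.0, dd_r2_min=0.6):
    """Return 'daily model', 'monthly dependent' or 'not weather dependent'."""
    cold = [(t, e) for t, e in zip(daily_T, daily_E) if t < 15]
    hot = [(t, e) for t, e in zip(daily_T, daily_E) if t > 23]
    heating = cooling = False
    if len(cold) >= 2:
        heating = linear_fit(*zip(*cold))[0] < -slope_min  # E rises as T falls
    if len(hot) >= 2:
        cooling = linear_fit(*zip(*hot))[0] > slope_min    # E rises with T
    if heating or cooling:
        return "daily model"
    # Step 3: DD signature on monthly totals against monthly degree days.
    slope, _, r2 = linear_fit(monthly_dd, monthly_E)
    if slope > dd_slope_min and r2 > dd_r2_min:
        return "monthly dependent"
    return "not weather dependent"
```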


CLUSTERING RESULTS: WEATHER DEPENDENCY


Summer: the daily model is only relevant in C4 and C1 (40%). Of those dwellings, only 16% are daily cooling dependent and 25% monthly dependent (41% in total). Winter: in C8 and C5 this increases to 27.5% daily and 27.5% monthly heating dependent (55% in total). For these users, R² is around 0.55–0.75 in summer/winter. They represent 19% and 20% of all users, respectively. C6 in summer and C3 in winter are almost not weather dependent; for the other clusters, around 20–25% of users are weather dependent.

[Figure panels: Summer, Winter]

Are these results in accordance with reality?



Complementary indicators (mean values)

Base load is calculated as the aggregated monthly consumption for days/months without air conditioning. Power is the maximum hourly power over the period, as supplied by the utility.

• The highest values of Power and Baseload, found in C1 and C4, are consistent with the other results.
• Neither the three postal codes nor the tariff types can be predicted by the clustering, because both are similarly distributed across the clusters.
• The only remarkable point is that in C4, 30% of users have the 2.1DHA tariff (P > 10 kW, with time discrimination from 23h to 13h). This can partially explain their higher consumption in summer.

Summer

• Relevant indices of usual behaviour have been identified.

• Good clustering results, which give a better understanding of the behaviour of the different groups of users.

• Preliminary results on weather dependency, and on its “prediction” from the relevant indicators, have been obtained (further improvement is needed).

• Decision-making schemes can be implemented.

• Further development: improving the clustering and better integrating the energy indicators with other information.

Conclusions

BeeData Analytics

BeeData Analytics is a high-performance, scalable system that allows the storage and analysis of non-homogeneous data of any type (energy-related or other):

- Scalable distributed data storage: Big Data applications supported by the Apache Hadoop software framework, HBase and Hive.
- Configurable communication API: data can be acquired and transmitted bi-directionally using a standard protocol (RESTful web API).
- Advanced system for data analytics, coupled to the main open data-mining tools and libraries (R, Pandas, Python).

BeeData Analytics offers new products and business intelligence for energy distribution and commercialisation companies.
