demystifying gaussian distribution (2)

DEMYSTIFYING GAUSSIAN DISTRIBUTION

The aim of this excursion is to explore and explain, from an intuitive standpoint, one of the most ubiquitous patterns ever discovered. I will approach the subject in a logical yet free-ranging style, picking up ideas from various areas when and where required, to (hopefully) stimulate the reader's interest in the beauty of this concept. Though the treatment is intuitive (to the best of my ability), breaking the mysterious and scary-looking equation into tangible pieces, there will be some mathematical rigor at times (which can be skipped without losing the thread) for the mathematically (or rather curiously) inclined who want to see how the final result is reached. Even otherwise, please don't get scared or demotivated by symbols like $e$, $\infty$, $\sqrt{\ }$, $\pi$, $\mu$, $\sigma$, $\int$, $\theta$, … that you will frequently encounter in the pages to come! They are explained from an intuitive perspective to help you appreciate the entire journey, which is purely a result of my partially exhaustive research.

The Gaussian distribution, named after Carl Friedrich Gauss (often called the prince of mathematicians), is also known as the Normal distribution, owing to the frequency with which natural phenomena follow this pattern (hence "normal"), or as the Bell curve, since it resembles the shape of a bell.

What is the basic utility of any probability distribution? For that matter, what is probability? What is a distribution? How does it help us understand stuff?

The word probability is used when we speak of uncertainty, which arises from a lack of clarity, or what we refer to as noise: noise owing to too many factors that are not in our direct control. The sun rising in the east is a certainty, whereas expecting a 6 on the next throw of a die is a probability. Where there is no clarity, there is probability, and where there is complete clarity about the outcome, there is certainty.

The outcome of a head or a tail was a probability for as long as the tossed coin was in the air; as soon as it landed on the ground, it became a certainty (by showing either a head or a tail). Assigning values mathematically, we can say that the chance of an event happening lies between 0% (it never happens) and 100% (it certainly happens), which corresponds to all the values between 0 and 1 (including 0 and 1). So if the probability of an event happening is 0.56, it is equivalent to 56%, and if it is 1, it is 100%. The probabilities of all possible outcomes of any event should sum to 1. In the example of a coin, we have 2 outcomes, heads or tails, hence the individual probability weight assigned is 1/(number of possible states), which is 1/2 = 0.5. In the case of a die, there are 6 possible states and hence the individual probability weightage per outcome is 1/6 ≈ 0.1667. Since there are 6 outcomes (1, 2, 3, 4, 5, 6), summing the probability values gives 1/6 + 1/6 + 1/6 + 1/6 + 1/6 + 1/6 = 1. This logic holds good for any event with equally likely outcomes (there are more advanced concepts and interrelations which are beyond the scope of the present discussion).
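The coin and die arithmetic can be checked with a few lines of Python. This is only a minimal sketch: the helper name `uniform_probabilities` is made up for illustration, and it assumes every outcome is equally likely. Exact fractions are used so that the six weights of 1/6 add up to exactly 1 rather than to a rounded decimal.

```python
from fractions import Fraction

def uniform_probabilities(outcomes):
    """Give every outcome the same weight: 1 / (number of possible states)."""
    weight = Fraction(1, len(outcomes))
    return {outcome: weight for outcome in outcomes}

# Hypothetical examples matching the text: a fair coin and a fair die.
coin = uniform_probabilities(["heads", "tails"])
die = uniform_probabilities([1, 2, 3, 4, 5, 6])

print(float(coin["heads"]))      # weight of one coin face
print(round(float(die[6]), 4))   # weight of one die face, rounded
print(sum(die.values()))         # all six weights added together
```

Working with rounded decimals such as 0.16 would make the total fall short of 1, which is why the fractions are kept exact until the final print.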

Since we got clarity of what probability is, let’s speak about a distribution. Distribution refers to assignment of values over any given area. Area? We refer to area as a space within which we can accommodate the occurrence of a particular event. Let’s drill it down using an example.

In the above diagram, we see a snake (a jumping one!) which is about to cross a rectangular closed fence, which we will treat as a closed area. Let's equate this area to 1 (the sum of all individual probability values). This means the probability that the snake falls inside the fence is 1, and the probability that it falls outside the fence is 0.

Now, the area of 1 has been divided into 3 equal parts. If we want to assign a probability to each box, then by the above logic, since there are 3 boxes, it is 1/3 ≈ 0.333, which equals 33.3%. This means the probability of the snake falling into any one of the boxes 1, 2 or 3 is 33.3%. Observe that as we break the box down further, we get more information about the position of the snake. What is the probability of the snake falling in the area that includes box 1 and box 2? It is nothing but the sum of the individual probabilities, which is 1/3 + 1/3 ≈ 0.667 = 66.7%. This is what we refer to as cumulative probability (the overall probability arrived at by adding individual probabilities). Which point is equally distant from both extremities (the first and third boxes)? It is nothing but the centre of the second box.

When the snake puts in its normal (natural) effort to jump, it will mostly land in the second box (average snake)! Why? If it puts in extra effort, it lands in box 3 (excited snake), and with less effort, in box 1 (lazy snake!). Both box 1 and box 3 are exceptions in the snake's performance, whereas its natural potential makes it land mostly in box 2. In a given number of chances, most landings will be in box 2, which makes it a normal snake!

This is what we refer to as a central tendency, called the mean (or average). But in the above case, why is the probability equally divided among the three boxes? Because all the boxes created by dividing the rectangle are of equal size, and hence have equal area and equal probability weightage. This would mean the snake is equally likely to be lazy, normal or excited! So let's remodel the above example.

In the revised fence, the areas covered by the 3 boxes 1, 2 and 3 are not the same. Box 2 covers the maximum area and boxes 1 and 3 cover the minimum areas, so the shapes are not uniform, and hence neither the areas nor the probabilities are uniform. For convenience, let's assume that box 2 covers 50% of the area, which means that the probability of the snake falling into box 2, or rather being an average performer, is 0.5. Since the other 2 boxes together make up the remaining 50%, which is 25% each, the probability of the snake being lazy or excited is 0.25 each. Since the snake falls in the centre of the fence most of the time, the centre becomes the average potential of the snake.

Any miss from this centre towards either end (box 1 or 3) is considered a deviation in the snake's performance. It is a positive deviation if the snake crosses the centre and a negative deviation if the snake falls short of it. There can be many deviations, depending on the number of grid lines (various potentials of a snake) that separate the boxes.

In the above triangular fence, if we draw a central line from its peak as below, we get 2 perfectly symmetrical (mirror image) right-angled triangles, ∆ABC and ∆ACD. This means the area is divided into 2 halves, and hence each triangle has an area of 0.5.

Suppose we draw lines inside each of the 2 right-angled triangles to divide them. If I draw 9 equally spaced lines in ∆ABC, I can mirror the same number of lines in ∆ACD. So if I have 10 small boxed areas marked by 9 lines in one triangle, I get the same 10 small boxed areas in the other triangle (since both are mirror images of each other). Since there are 10 boxes on the left hand side of the central line AC, we refer to them as 10 negative deviations. Since there are 10 boxes on the right side of AC, we have 10 positive deviations. To refer to the deviations from a common perspective, neglecting the direction, we use the term standard deviation, which covers both the + and the – of a set of deviations. If I say 2 standard deviations, it means 2 boxes to the left of AC and 2 boxes to the right of AC.

In the above snake example, we only referred to a triangle with 3 boxes. What if we divide the triangle into many boxes? We see that the area occupied by the boxes decreases as they approach either of the ends B or D. The boxes close to AC cover the maximum area and hence the maximum probability, unlike boxes far away from AC, which cover small areas and hence less probability.

This means the chance of an event happening is highest at the centre (referred to as the maximum frequency) and decreases gradually towards B and D. This is what we call a distribution pattern.

The spread of probabilities from 0 to 1 across the entire triangle is what is referred to as a probability distribution. Since the above one is a triangle, we call it a triangular probability distribution. In the same way, based on the type of shape a distribution acquires, we have many other distributions each having different properties.
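The shrinking box areas of the triangular fence can be illustrated numerically. The sketch below makes its own assumptions, which are not taken from the original figure: the base BD runs from -1 to +1, the peak A has height 1 above the centre (so the total area is 1), and each half is cut into 10 equal-width boxes. The function names are hypothetical.

```python
def triangle_height(x):
    """Height of a symmetric triangle on [-1, 1] with peak 1 at x = 0 (total area 1)."""
    return max(0.0, 1.0 - abs(x))

def strip_area(left, right):
    """Area of the box between two grid lines on the same side of the centre (a trapezoid)."""
    return 0.5 * (triangle_height(left) + triangle_height(right)) * (right - left)

boxes_per_side = 10
width = 1.0 / boxes_per_side

# Boxes to the right of the central line AC, from the centre outwards.
right_side = [strip_area(i * width, (i + 1) * width) for i in range(boxes_per_side)]
# The left side is the mirror image of the right side.
left_side = list(reversed(right_side))

total = sum(left_side) + sum(right_side)
for i, area in enumerate(right_side, start=1):
    print(f"box {i} from the centre: probability {area:.3f}")
print("total probability:", round(total, 6))
```

The box next to AC carries the largest probability and the box at the far end the smallest, while both halves still add up to 1.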

In the above triangle, we see a steep, linear increase from B to A and then a steep, linear decrease from A to D. But most real-world events do not follow this perfectly linear trend, as they are constantly and gradually evolving processes. This is where we need to discuss one of our more obscure friends, 'e'. Yes, the mathematical constant, or rather the natural universal constant of growth, 'e'.

‘e’ refers to the exponential growth. But what is exponential? It refers to the base rate of growth that any naturally evolving entity can attain in a given time frame.

Whether bacteria double every 24 hours or money doubles in a year, what is basically happening is that, though the time frame is different (24 hours, 1 year, 1 second, etc.), the original is replicating itself, thereby creating 1 more.

This means 1 (original) + 1 (outcome of original) = 2, which is $(1 + 100\%/n)^n$ with $n = 1$, where $n$ refers to the number of time periods.

But is it so that the growth is happening so discretely in linear steps? Is it so that suddenly after 24 hours I see that the bacteria got doubled or money became doubled suddenly after a year?

No. It's a very gradual and continuous process. Let's dig in. If we look at money, say 100 Rs yields 100 Rs in 1 year, which is double. If I break the time frame into two periods of 6 months each, my 100 Rs will have earned 50 Rs by the end of the 6th month, and compounding over the two half-years gives a growth factor of

$$\left(1 + \frac{100\%}{2}\right)^2 = 2.25$$

Now these 50 Rs will start earning an additional interest of 100%, which is 50 Rs in a year or 25 Rs in 6 months. These 25 Rs will start earning 25 Rs in a year or 12.5 Rs in 6 months, and so on. This means that each of the outcomes from the original is continuously compounded. Let's increase the number of periods from 2 (6 months each) to 365 (1 day each), which becomes

$$\left(1 + \frac{1}{365}\right)^{365} \approx 2.714$$

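If we go on slicing the time period to the maximum possible extent, we arrive at a maximum compounded rate of return of 2.71828…, which is referred to as 'e'. The corresponding exponential function is

$$e^x = 1 + \frac{x}{1!} + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots, \qquad -\infty < x < \infty$$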
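The slicing argument can be tried out directly. The following is a small sketch, with the hypothetical helper `compounded_growth` standing for the growth factor of 1 unit at 100% yearly interest split into equal periods; the chosen period counts are arbitrary.

```python
import math

def compounded_growth(periods):
    """Growth factor of 1 unit after one year at 100% interest, compounded `periods` times."""
    return (1 + 1 / periods) ** periods

# 1 period (yearly), 2 (half-yearly), 12 (monthly), 365 (daily), then much finer slices.
for n in (1, 2, 12, 365, 1_000_000):
    print(f"{n:>9} periods -> {compounded_growth(n):.5f}")

print(f"math.e            -> {math.e:.5f}")
```

However finely the year is sliced, the growth factor never exceeds `math.e`; it only creeps closer to it.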

So natural growth follows an exponential pattern: bacterial growth, population growth, radioactive decay and many other natural processes can be modelled using $e^x$.

So $e^x$ is the exponential function that we will be dealing with to derive the normal distribution.

What is a function, by the way? A function shows a relationship between 2 entities, one independent and the other dependent. Say, for example, clouds cause rain: rain is an outcome of clouds and clouds are the cause of it. Since rain is dependent and cloud is independent (in the present framework of 2 entities), let's set cloud = x and rain = y; then x created y, i.e. clouds created rain. In mathematical notation, we write y = f(x) or rain = f(clouds), and we say that rain is a function of clouds.

Likewise, misery = f(desire). This means any change in the cause (desire) will have a direct impact on its outcome (misery).

So let's consider the exponential function $y = e^x$. Before we delve into what value x should take in this function, let's get back to the basic normal distributions and their nature.

If we look at the distributions above, with the Y axis showing the frequency of occurrence of a given variable and the X axis showing the spread of the data values, we find that for the mean of 5 the data is closely concentrated between 0 and 10, whereas for the mean of 20 it is spread widely between 10 and 30. This means distribution 1 has less deviation from its mean and distribution 2 has more, which stretches the area under its curve.

Though distribution 1 looks bigger than distribution 2, both actually cover the same area. It's just that distribution 2 is stretched horizontally on both ends, which increases the spread of data points between 10 and 30. We can observe that the shape of the distribution depends on the mean, which fixes the position of the peak on the X axis, and on the deviation, which fixes how wide the curve spreads and therefore how tall it can be on the Y axis. Imagine blowing up a balloon with a black spot on it and watching how the spot stretches as the balloon inflates, even though it is still the same spot!

In the same way as distribution 1 has $\mu = 5$, $\sigma = 5$ and distribution 2 has $\mu = 20$, $\sigma = 10$, there can be an infinite number of distributions with infinitely many values of $\mu$ and $\sigma$. Finding the area under each of these curves separately would be extremely laborious and add no value, hence we look for an approach that standardizes them by measuring each distribution on a common platform.

To do this, we consider the values $\mu = 0$ and $\sigma = 1$. In the above case, distribution 1 was plotted with various values of x on the X axis from 0 to 10. Since we know the values of $\mu$ and $\sigma$, we can standardize by subtracting $\mu$ from x and then dividing the result by $\sigma$. It becomes

$$Z = \frac{x - \mu}{\sigma}$$
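As a small illustration of the standardization step, the sketch below subtracts the mean and divides by the standard deviation. The function name `z_score` and the sample values (a mean of 20 and a deviation of 10) are assumptions chosen for the example, not figures confirmed by the plots.

```python
def z_score(x, mean, std_dev):
    """Distance of x from the mean, measured in units of the standard deviation."""
    return (x - mean) / std_dev

# Assumed illustrative distribution: mean 20, standard deviation 10.
mean, std_dev = 20.0, 10.0
for x in (0.0, 10.0, 20.0, 30.0, 40.0):
    print(f"x = {x:>5} -> z = {z_score(x, mean, std_dev):+.1f}")
```

Whatever the original mean and deviation are, the transformed values are expressed on the same scale: 0 at the mean, +1 one deviation to the right, -1 one deviation to the left.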

What we are doing is figuring out the distance of each point on the graph from its mean and then dividing it by the corresponding $\sigma$, which generates a standard number for each value on the X-axis, called the Z number. Once the values of x have been transformed in this way, they mostly range between -3 and +3. (There will be values with z > 3 and z < -3, but they account for only about 0.3% of the data points; they are treated as outliers, and for the sake of clarity we will define the range of z only between $\pm 3$, with a mean of 0.) This is what we call a Z-transformation. A very important point to note here is that we are converting all the normal patterns, with varying values of $\mu$ and $\sigma$, into a standard normal pattern; for non-normal patterns a different approach has to be taken (beyond the scope of this topic). As we discussed earlier, since we considered $\sigma$ as 1 here, -3 means 3 negative deviations from 0 (left hand side) and +3 means 3 positive deviations to the right hand side of 0. Ignoring the +/-, we say that the transformed data lies within 3 standard deviations of a transformed mean of 0.

This means that the entire area covered by the normal curve is divided into 6 parts, 3 on either side of the mean, which we refer to as 3 standard deviations. The curve is thereby transformed into a standard normal distribution with $\mu = 0$ and $\sigma = 1$.

The reason we emphasize only these 2 parameters, $\mu$ and $\sigma$, is that we can locate any point on the curve just by taking the mean as the central point of the curve and figuring out how distant the point is from that centre by looking at the deviation zone it falls under. Just try relating it to the snake/fence analogy to get better clarity.

If we look at the curve on the left hand side, we see that the values range from $-\infty$ to $+\infty$. This means that all the values on the number line, on both the positive and the negative side, are accommodated by the curve. Also notice that the two ends of the curve do not touch the x axis: as the values tend to $\pm\infty$, the curve gets ever closer to the axis but never reaches it.

Touching the axis would imply that the values are finite. Once we apply a Z-transformation, we work, for practical purposes, with the range -3 to +3 instead of $(-\infty, \infty)$.

Now let's try modelling a normal curve by using the function $e^x$. After the transformation, the values of x range from -3 to +3. So we model the growth function e using this range, as we know that the transformed values lie between these 2 limits.

When we substitute the values between -3 and +3 in the function $F(x) = e^x$, we get the below graph. We can clearly see that the values go up exponentially from $e^{-3}$ to $e^{3}$.

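Instead, if we use the function $e^{-x}$, we get the below graph. This is the reciprocal of the exponential function, falling from $e^{3}$ to $e^{-3}$, and is known as exponential decay (another interesting pattern! It should not be confused with the logarithm, which is the inverse function of the exponential).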

If we look at the above 2 graphs, we can infer that our normal curve is a combination of these 2 patterns: an increasing function that reaches its peak at the mean and then becomes a decreasing function.

Let’s raise the power of x to 2 making it a quadratic exponential function.

Now for $F(x) = e^{x^2}$, we evaluate the range from $x = -3$ to $x = +3$, where the function is minimum at $e^{0} = 1$ (at $x = 0$) and maximum at $e^{9} \approx 8103$ (at $x = \pm 3$). The plot of the graph is shown below.

Based on the above graph, now we can easily replicate the normal curve by changing the orientation of the above graph by adding a minus sign. Let’s see how this works.

For $F(x) = e^{-x^2}$, for the limits -3 and +3, the function takes values from $e^{-9}$ to $e^{0}$: it is $e^{-9} \approx 0.0001$ at x = -3, rises to $e^{0} = 1$ at x = 0, and falls again to $e^{-9}$ at x = +3, which gives us the pattern of an increasing and then decreasing function. The graph looks as below, which perfectly generates a normal or bell-shaped curve.
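The behaviour of the four candidate functions over the range -3 to +3 can be tabulated with a short script. This is only an illustrative sketch; the dictionary name `candidates` and the choice of integer sample points are assumptions made here.

```python
import math

# The four shapes discussed in the text, evaluated on the transformed range.
candidates = {
    "e^x": lambda x: math.exp(x),
    "e^-x": lambda x: math.exp(-x),
    "e^(x^2)": lambda x: math.exp(x * x),
    "e^(-x^2)": lambda x: math.exp(-x * x),
}

points = [-3, -2, -1, 0, 1, 2, 3]
table = {name: [f(x) for x in points] for name, f in candidates.items()}

for name, values in table.items():
    print(name.ljust(9), [f"{v:.4g}" for v in values])
```

Only the last row rises to 1 at x = 0 and falls away symmetrically on both sides, which is the bell shape we are after.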

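Actually, the above function $F(x) = e^{-x^2}$ should be rewritten as

$$F(x) = e^{-z^2}$$

as all values from -3 to +3 are transformed values of x from $(-\infty, \infty)$.

So, for $F(x) = e^{-z^2}$, by substituting $z = \frac{x - \mu}{\sigma}$, we get

$$F(x) = e^{-\left(\frac{x - \mu}{\sigma}\right)^2}$$

$$F(x) = e^{-(x - \mu)^2 / \sigma^2}$$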

The width of the bell curve, defined by $\sigma$, is half the distance between its inflection points. An inflection point of a normal curve is a point at which the curve changes its shape from concave downward (around the peak) to concave upward (towards the tails).

We can clearly see above the point (circled) from which the direction of the curve is changing. Remember that for a normal curve, the inflection point (on either side of the mean) perfectly coincides (explanation beyond the scope of this discussion) with the first standard deviation of the curve.

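Here, the two inflection points lie at the first standard deviation on either side, $+\sigma$ and $-\sigma$, which are 2 deviations apart. By dividing the exponent by 2, we adjust the width of the curve so that each inflection point sits exactly 1 deviation from the mean (without this adjustment, the inflection points of $e^{-z^2}$ would fall at $z = \pm 1/\sqrt{2}$ instead of $z = \pm 1$).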

The equation then becomes

$$F(x) = e^{-(x - \mu)^2 / (2\sigma^2)} \qquad \text{(A)}$$

This is the function that generates the normal curve. But as we discussed earlier, the present equation describes a normal curve with any given values of $\mu$ and $\sigma$. To standardize it, we substitute $\mu = 0$ and $\sigma = 1$. The equation then gets transformed to

$$F(x) = e^{-x^2/2}$$

This is the same as the earlier equation in z, except for the width adjustment.

The above function is just a normal curve, not yet a normal probability distribution function. Since probability lies between 0 and 1, with 1 as the maximum, the total area under the curve must equal 1, so we have to divide our function by its calculated area. If you remember the snake/fence analogy, we equated the entire area to 1 to evaluate probability.

Say, for example, the area of our fence is 50 units. To make it equal to 1, I need to divide it by 50 units so that $\frac{50}{50} = 1$. This process of converting a function into a probability function is known as normalizing, and the constant is known as a normalizing constant.

In the same way, we will evaluate the area of the above function (curve) within the limits $-\infty$ and $\infty$ and divide the function by that area so that the total equates to 1,

i.e. $$P(x) = \frac{e^{-x^2/2}}{\text{Area}(F(x))}, \qquad \int_{-\infty}^{\infty} P(x)\,dx = 1 \qquad \text{(B)}$$

where P(x) is the probability density function.

We will integrate the above function between the limits $\pm\infty$ to arrive at the area covered by the function:

$$A = \int_{-\infty}^{\infty} e^{-x^2/2}\,dx$$

To resolve the above integral, let's multiply the integral by itself (with variable 'y'). The equation then becomes

$$A^2 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(x^2 + y^2)/2}\,dx\,dy \qquad (1)$$

We switch from Cartesian to polar coordinates (details beyond the present scope), which describe any point on a 2-dimensional plane by its distance r from the origin and the angle θ it has traversed.

We get two main expressions, $x = r\cos\theta$ and $y = r\sin\theta$, which, by squaring and adding, give

$$x^2 + y^2 = r^2 \qquad \{\text{since } \cos^2\theta + \sin^2\theta = 1\}$$

By substituting it in Equation 1 and transforming the limits of the integral, we get

$$A^2 = \int_0^{2\pi}\int_0^{\infty} e^{-r^2/2}\, r\,dr\,d\theta \qquad (2)$$

(where 0 to $\infty$ is the range of the radius r, 0 to $2\pi$ is the total range of the angle θ, and dx dy is replaced by r dr dθ)

Let $u = r^2/2$, so that $du = r\,dr$

(after differentiating u with respect to r)

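Substituting $u = r^2/2$ and $du = r\,dr$ in Equation 2,

$$A^2 = \int_0^{2\pi}\int_0^{\infty} e^{-u}\,du\,d\theta$$

$$= 2\pi \left[-e^{-u}\right]_0^{\infty}$$

$$= 2\pi\,(0 + 1)$$

$$= 2\pi$$

$$A = \sqrt{2\pi}$$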
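The value of this area can be cross-checked numerically without any polar coordinates. The sketch below uses a plain midpoint rule, which is a swapped-in numerical technique and not the analytic method of the text; the helper name `integrate_midpoint`, the cut-off at ±10 and the number of steps are all assumptions made for the illustration (beyond ±10 the curve is so close to zero that the missing tails are negligible).

```python
import math

def integrate_midpoint(f, lower, upper, steps=100_000):
    """Approximate the area under f between lower and upper with thin rectangles."""
    width = (upper - lower) / steps
    return sum(f(lower + (i + 0.5) * width) for i in range(steps)) * width

# Area under e^(-x^2/2); the tails beyond +/-10 are ignored as negligible.
area = integrate_midpoint(lambda x: math.exp(-x * x / 2), -10.0, 10.0)

print("numerical area:", round(area, 6))
print("sqrt(2*pi):    ", round(math.sqrt(2 * math.pi), 6))
```

Both numbers agree to the printed precision, which supports using the square root of 2π as the normalizing constant.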

The result we obtained, $\sqrt{2\pi}$, is known as the normalizing constant for the exponential function; it converts the function into a probability distribution function. Substituting it in B gives

$$P(x) = \frac{e^{-x^2/2}}{\sqrt{2\pi}}$$

This is the formula for the standard normal distribution.

The non-transformed formula can be arrived at by dividing Equation A by the constant:

$$P(x) = \frac{e^{-(x - \mu)^2 / (2\sigma^2)}}{\sigma\sqrt{2\pi}}$$

Here we multiply $\sqrt{2\pi}$ by $\sigma$ to arrive at the area of any non-standard normal curve, as $\sigma$ is the measure of the distribution's width. This is how a plotted normal curve looks.
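To trace such a curve numerically, the density can be written as a small function. This is a minimal sketch: the name `normal_pdf` and its default arguments are chosen here for illustration, with the defaults corresponding to the standard normal case (mean 0, deviation 1).

```python
import math

def normal_pdf(x, mean=0.0, std_dev=1.0):
    """Height of the normal curve at x for a given mean and standard deviation."""
    z = (x - mean) / std_dev
    return math.exp(-0.5 * z * z) / (std_dev * math.sqrt(2 * math.pi))

# Heights of the standard normal curve at a few z values.
for z in (-3, -2, -1, 0, 1, 2, 3):
    print(f"z = {z:+d} -> height {normal_pdf(z):.4f}")

# A wider curve (assumed example: mean 20, deviation 10) is lower at its centre.
print("peak with deviation 10:", round(normal_pdf(20.0, 20.0, 10.0), 5))
```

Stretching the curve by a factor of 10 lowers the peak by the same factor, which is how the total area stays equal to 1.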

We can clearly see that beyond a z score of 3 there is only an extremely small portion of area, and for all practical purposes we mostly refer to z values from -3 to +3. Now that we have derived the normal probability distribution, our next goal is to find the area occupied by the curve, which of course we know is equal to 1. But within this curve, what if we want to find out how much area lies between the mean and the first standard deviation? Life would be easy if every distribution were linear, which is often not the case. Even so, it is important to understand simplicity to get the hang of complexity.

Let’s take an example to understand this better.

Look at the above distribution pattern. It's linear, without any curves. Finding the area is easy: we know it's a rectangle with 6 divisions (deviations) and the total area should come to 1, so each box (the area between two deviation grid lines) will be 1/6 ≈ 0.167. So the probability that a given point will fall into any one of the above boxes is 0.167 or 16.7%. What if I don't know which box it falls into, but want to know the probability of the point falling in a box above 0?

From the above, we know that there are 3 boxes after 0, and since each box carries a probability weightage of 1/6, for 3 boxes it will be 3 × 1/6 = 0.5 or 50%.

What is the probability that a given point will fall within the first standard deviation? Remember from what we have discussed that 1 standard deviation means 1 deviation on either side of zero, which is $\pm 1$. From the above, we know that there are 2 boxes between -1 and +1, hence the probability is nothing but the area of 2 boxes, which is 2 × 1/6 ≈ 0.333 or 33.3%.
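The box counting for this rectangular distribution can be written as one small function. The sketch assumes, as in the example, that the rectangle spans -3 to +3 with one box per unit; the name `rectangular_probability` and its parameters are hypothetical.

```python
def rectangular_probability(lower, upper, left_edge=-3.0, right_edge=3.0):
    """Share of a flat (rectangular) distribution lying between lower and upper."""
    # Clip the requested range to the fence itself.
    lo = max(lower, left_edge)
    hi = min(upper, right_edge)
    if hi <= lo:
        return 0.0
    return (hi - lo) / (right_edge - left_edge)

print("one box (0 to 1):     ", round(rectangular_probability(0, 1), 4))
print("above 0 (0 to 3):     ", round(rectangular_probability(0, 3), 4))
print("within -1 to +1:      ", round(rectangular_probability(-1, 1), 4))
```

Because the height is constant, the probability depends only on the width of the requested range, which is what makes this case so easy compared with a curve.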

What is the probability of a point falling within 0.5 standard deviations? Tricky? Not really. We know that 0.5 means $\pm 0.5\sigma$, which is 0.5 boxes on either side of 0 (since 1 deviation equals 1 box here), so the area covered will be 0.5 boxes × 2 = 1 box = 1/6 ≈ 0.167 = 16.7%.

Till now, we were only finding probabilities "in between". What about "at"? Say I want to know the probability of the point lying exactly at the 1st standard deviation. Let's think about how we define "area". In simple terms, area is always a product of 2 lengths. Whether we call them length, breadth or width is a matter of convention; together they give us the 2-dimensional picture of any object. The lines in the above case are any two grid lines in the distribution between which we measure the area. But what about a point? Does a point have any area? Let's see below for some important insights.

The first line we see is continuous, and as it is broken down further, it becomes discontinuous or discrete. What are these new terms, continuous and discrete? The meaning is inherent in the words themselves. Continuous means without any break in the "flow". A break is any pause or obstruction that interrupts the flow. If we look at the fourth line above, which is a dotted line, we see a regular pause (gap or space) at every single instant, which demarcates and gives an identity to every single point. This is what we term "discrete": each dot can take one and only one value. We can see that as we move up from this dotted line, the dots grow in size (with smaller spaces), and finally, when we eliminate the gaps between the dots, it becomes a "continuous" line.

We cannot identify any unique point on a continuous line; the entire line is a single entity. But we know that this continuous line is made up of discrete dots, which cease to be discrete beyond a threshold. What that threshold is, no one can actually quantify, though we can define it to an extent. Assume the above continuous line is made up of 100 discrete points or dots. My definition of a dot as discrete ceases to hold when I increase the magnification: the dot starts looking like a continuous line, which can be broken down into further dots, which when magnified again look like continuous lines, and the process goes on ad infinitum. You get transported from a normal level to a micro, nano, pico, femto level and so on, till you end up at the last atom (or rather the more fundamental quark) that made the line complete! Remember how a gigantic star looks like a twinkling spot from the earth, which itself is a negligible spot from the star's perspective! It's all relative. This is why, to overcome these granularity issues, the area of any single point on a continuous scale is taken to be 0.

One very important insight here is that dots integrate to form a line, and a line differentiates to give a dot. Sounds like calculus, doesn't it? Drops integrate to form an ocean, and an ocean differentiates to give a drop. In notation, integration is written with the symbol $\int$ and differentiation with $\frac{d}{dx}$.

Integration (Elongated s) means “sum of” which sums up all the values in a given range.

Differentiation means “differential of” which figures out the rate at which a dependent variable changes with respect to independent variable.

Remember we arrived at the result of $\sqrt{2\pi}$ by integrating (summing up) the area between $-\infty$ and $\infty$.

From the above example, integrating the dots gives the line (since we said 100 points make a line).

Summation here does not mean 1+2+3+…+100. It means adding the smallest fixed quantum "n" number of times, which is

1+1+1+… (100 times) = 100,

and differentiating the line gives back 1 (the basic possible elementary unit).

This means that when we break 1 continuous line made of 100 discrete dots into its smallest possible discrete unit (here, 1), we are differentiating it. Our normal distribution is a continuous distribution (remember the nature of exponential processes) and not a discrete one. Hence we always need to specify a search range (upper and lower limit) when it comes to finding the address of a point on the normal curve. This is the crux of figuring out probabilities for any continuous probability distribution.

The only challenge is that not all distributions are as straightforward as the above one. Since the above one is a rectangular distribution (with a straight line parallel to the X-axis), the frequency, which is the height from a point on the X axis up to the horizontal line, is always constant. In a distribution bounded by a curve, the frequency changes continuously, because the distance from the X axis to the curve keeps changing. We can no longer treat the shape as a regular polygon like a square or a rectangle to calculate its area; we need to resort to calculus.

Remember what we do when we have to figure out the probability of a variable falling within a specified number of standard deviations of a normal distribution?

We are asked to look into the Z-table (in the appendix of any fat statistics book), which gives the probabilities within any given range. I'll show you how this is done. To find the area within a normal curve, we translate the complex function into an equivalent polynomial form and calculate the probabilities accordingly. In this case, we use a Taylor polynomial expression, which I will derive, followed by an example of how it is used to evaluate the area of the curve within a specified range.

A Taylor series is used to translate a given smooth curve (here, the bell curve) about a given centre (the mean), represented by the complex function

$$\frac{e^{-x^2/2}}{\sqrt{2\pi}}$$

into its polynomial form, which can then be used to calculate the cumulative (total) area occupied by the curve within any specified interval. The advantage of converting it into a Taylor polynomial is that the entire function becomes a sum of simple powers of x, which can be easily integrated over a given interval. There is a slight error involved in converting a function to its truncated Taylor form, which has to be accounted for in the result.

From the below expression

$$\frac{e^{-x^2/2}}{\sqrt{2\pi}}$$

we know that $\sqrt{2\pi}$ is used as a normalizing constant (which means it makes the total area under the function equal to 1).

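Let's just look at the numerator, which is $e^{-x^2/2}$. It is of the form $e^{t}$, where $t = -x^2/2$. Remember how we arrived at the value of the exponential function $e^x$? For $x = 1$, it becomes $e^1 = e = 2.718$.

Now

$$e^x = 1 + \frac{x}{1!} + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots, \qquad -\infty < x < \infty$$

(from the discussion on the exponential function)

By substituting $x = 1$, we get

$$e = 1 + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \dots = 2.718$$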

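Now for $e^{-x^2/2}$, which is of the form $e^{t}$ (where $t = -x^2/2$),

$$e^{t} = 1 + \frac{t}{1!} + \frac{t^2}{2!} + \frac{t^3}{3!} + \frac{t^4}{4!} + \frac{t^5}{5!} + \frac{t^6}{6!} + \dots$$

$$= 1 + t + \frac{t^2}{2} + \frac{t^3}{6} + \frac{t^4}{24} + \frac{t^5}{120} + \frac{t^6}{720} + \dots$$

By substituting $t = -x^2/2$, we get

$$e^{-x^2/2} = 1 - \frac{x^2}{2} + \frac{x^4}{8} - \frac{x^6}{48} + \frac{x^8}{384} - \frac{x^{10}}{3840} + \frac{x^{12}}{46080} - \dots$$

which is an alternating series in even powers of x.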

For the sake of example, let's figure out the area occupied by the above polynomial form within the range 0 to 1. Here 0 represents the mean, which is the centre of the normal distribution, and 1 represents the 1st standard deviation from the mean. Since the curve is symmetrical (the shape on the left side of the mean is the mirror image of the shape on the right side, and vice versa), $1\sigma = \pm 1$ covers the same area on either side of 0.

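As I have explained before, the purpose of integration is to sum up the area (represented by a function) within a given range.

So we need to integrate the above polynomial within the limits 0 and 1, which is

$$\int_0^1 e^{-x^2/2}\,dx \approx \int_0^1 \left(1 - \frac{x^2}{2} + \frac{x^4}{8} - \frac{x^6}{48} + \frac{x^8}{384} - \frac{x^{10}}{3840} + \frac{x^{12}}{46080}\right) dx$$

$$= \left[x - \frac{x^3}{6} + \frac{x^5}{40} - \frac{x^7}{336} + \frac{x^9}{3456} - \frac{x^{11}}{42240} + \dots\right]_0^1$$

where $\int x^n\,dx = \frac{x^{n+1}}{n+1}$ (an integration operation)

$$= 1 - \frac{1}{6} + \frac{1}{40} - \frac{1}{336} + \frac{1}{3456} - \frac{1}{42240}$$

(after substituting $x = 1$; the lower limit $x = 0$ contributes nothing)

$$\approx 0.855623$$

The terms left out contribute an error of only about 0.000002 (the next term is $\frac{1}{599040}$).

This means

$$\int_0^1 e^{-x^2/2}\,dx \approx 0.855623$$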

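But we need to evaluate the above function including the normalization constant, which is $\frac{1}{\sqrt{2\pi}}$.

Hence

$$\frac{1}{\sqrt{2\pi}}\int_0^1 e^{-x^2/2}\,dx \approx \frac{1}{\sqrt{2 \times 3.14}} \times 0.855623$$

$$\approx 0.3989 \times 0.855623$$

$$\approx 0.3413$$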
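The term-by-term integration can be reproduced in a few lines. This sketch is an illustration under its own assumptions: the function name `taylor_area` and the choice of six terms (up to the term with 42240 in the denominator) are made here, and the reference value comes from Python's built-in error function `math.erf` rather than from a printed Z-table.

```python
import math

def taylor_area(upper, terms=6):
    """Integrate the Taylor polynomial of e^(-x^2/2) term by term from 0 to `upper`."""
    total = 0.0
    for k in range(terms):
        # k-th series term is (-1)^k * x^(2k) / (2^k * k!);
        # integrating it gives x^(2k+1) / (2k+1) times the same coefficient.
        coefficient = (-1) ** k / (2 ** k * math.factorial(k))
        total += coefficient * upper ** (2 * k + 1) / (2 * k + 1)
    return total

raw_area = taylor_area(1.0)                       # area before normalizing
probability = raw_area / math.sqrt(2 * math.pi)   # apply the normalizing constant
reference = 0.5 * math.erf(1 / math.sqrt(2))      # area from 0 to 1 via the error function

print("raw area:    ", round(raw_area, 6))
print("probability: ", round(probability, 4))
print("reference:   ", round(reference, 4))
```

Raising `terms` shrinks the truncation error further, and the same function can be called with other upper limits, although more terms are needed as the limit moves away from 0.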

This is nothing but the area covered by the curve from 0 to 1, which is the 1st deviation on the right hand side. Since it is a symmetrical curve, the area covered by the curve from 0 to -1 on the left hand side will be the same, hence the area covered by the 1st standard deviation will be

$$1\sigma = 0.3413 \text{ (right hand side of 0)} + 0.3413 \text{ (left hand side of 0)} = 0.6826 = 68.26\%$$

This means the first standard deviation of a normal curve covers 68.26% of the total area of the curve (with 34.13% on each side of the mean).

In the same way, if we calculate the integral of the curve within the limits 0 and 3, which is 3 deviations from 0, we get

$$\frac{1}{\sqrt{2\pi}}\int_0^3 e^{-x^2/2}\,dx \approx 0.4987 \approx 0.5$$

(this is the area on the right hand side of the mean)

Since the curve is symmetrical, the area on the left side will also be approximately 0.5, hence $3\sigma \approx 0.5 + 0.5 \approx 1$.

More precisely,

$$\frac{1}{\sqrt{2\pi}}\int_{-3}^{3} e^{-x^2/2}\,dx = 0.997$$

Remember that almost all the data points (99.7%) fall within $\pm 3\sigma$, but not all. Approximately 0.27% of the data points (about 0.135% on either side of the mean) fall beyond the 3rd standard deviation. This means

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-x^2/2}\,dx = 1$$

We can see that the maximum height a standard normal curve can attain (at the centre, the mean) is

$$\frac{1}{\sqrt{2\pi}} = 0.3989 \approx 0.4$$
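The areas within 1, 2 and 3 standard deviations, and the peak height, can be confirmed with Python's built-in error function. This is a sketch, not a Z-table replacement: the helper name `area_within` is made up here, and it relies on the standard identity that the area between -k and +k under the standard normal curve equals erf(k/√2).

```python
import math

def area_within(k):
    """Area under the standard normal curve between -k and +k standard deviations."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    inside = area_within(k)
    print(f"within {k} deviation(s): {inside:.4f}   outside: {1 - inside:.4f}")

peak = 1 / math.sqrt(2 * math.pi)
print("peak height at the mean:", round(peak, 4))
```

The first row prints 0.6827 rather than 0.6826 because the figure in the text was obtained by doubling the already rounded 0.3413.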

There is much more that can be said about the normal distribution (probably in one of my future excursions), but my prime aim in this paper was to present the genesis, evolution and utility of the normal curve in a simplified manner, and I hope that has been achieved.

Feedback most welcome at [email protected]

Regards,

Kalyan Sunkara
