Cracking Normal Distribution Using Excel
I had this idea when I had a lecturing assignment to deliver matriculation program on Statistics at the Graduate of Management Program, University of Indonesia. This matriculation program is an introduction for new students. I knew that some of the students already had at least basic statistics on their previous studies. Therefore, I tried to find some breakthrough to convey the concept with something that they may not have learned yet, particularly on distribution concept. I found an interesting Excel spreadsheet on vertex42.com. Then, I cracked it. When I presented this spreadsheet in front of the students, it seems quite fascinating for them. However, I think some of them might find it confusing and difficult. Some of them asked me to list the steps to build it. Therefore, I try to write down the steps of this exercise using Microsoft Excel 2007 as follows.
- Write the assumptions on normal distribution
- First assumption is the parameter; mean and standard deviation. In this example the mean is 20 and standard deviation is 5.
- The second assumptions are graph limits indicating standardized normal distribution limits (zmin and zmax). For instance, zmin is -3 and zmax is 3.
- The third assumptions are minimum and maximum value of random variables as cumulative probability value (xmin and xmax). For instance, xmin is 12 and xmax is 28. d. You may also generate random value for normal distribution using RAND(), and link this random number into NORMDIST function where NORMDIST(random value, mean, standard deviation, cumulative). Therefore, random normal distribution is =NORMDIST(RAND(), mean, standard deviation). Highlight the number cells with yellow color to indicate that one may change these numbers later.
- Develop a table and graphs that describe the standardized normal distribution value, z; random variable, X; probability density function, f(x); and cumulative distribution function, F(x). Let's assume that there are 41 classes, therefore we have 40 bins.
Calculating z value. The first z value is zmin, while the rests are calculated using (zmin–zmax)/40 + previous value. The last z value should be the zmax.
Calculating random variables, x. Using equation as follows:
Calculating the pdf and CDF. The pdf can be calculated using this formula
In our spreadsheet, we can use NORMDIST function where NORMDIST(x, mean, standard deviation, cumulative). Do not forget to lock the mean and standard deviation value using F4 button. For cumulative, we write FALSE to calculate the pdf and TRUE to calculate the CDF in separate cells respectively.
Draw a graph describing the pdf and CDF using CHART button. Click INSERT>SCATTERED (WITH SMOOTH LINE). On the graph, using your mouse, right click and select SELECT DATA>ADD DATA. There are two graph: the pdf and CDF. For pdf, insert the x values as series X values and insert the pdf values as series Y values. For the CDF, use the same steps and insert the CDF values as series Y values. On the graph, right click on CDF graph then click FORMAT DATA SERIES>SERIES OPTIONS>SECONDARY AXIS.
- Develop left tail distribution graph.
Develop a table comprises standardized normal distribution value, z; random variables, x, and probability density function, f(x). You may copy it from the previous table. We can use the same previous steps. The difference is that we use only one minimum value of z (zmin) and the way to random variable calculation. For the first cell of random variable, we can copy it from the previous table. However, for the rest, we can use the formula: (xmin-previous random variable value)/40 + previous random variable value. The graph can be draw using the following steps: a. Draw the pdf graph using input values from the previous table (Step 2d) b. Input left tail graph using the same step where series name is "Left Tail", series X values and Y values are random values, x, and pdf respectively from current table. c. On the graph, right-click on left tail curve, then click LAYOUT>ERROR BARS>MORE BAR OPTIONS>MINUS>NO CAP>PERCENTAGE: 100. Change the color as you like and click LINE STYLE>3 pt.
- Develop right tail distribution graph.
We use the similar steps as Step 3. For the first cells, we use the maximum value of z (zmax). For the first cell of random variable, we can copy it from the previous table (Step 3). For the next cells of random variable values, we can calculate using the formula: previous random variable value + (xmax– previous random variable value)/40. The graph can be draw using the same technique as stated in Step 3.
- Develop a graph depicting the area between right- and left-tail areas.
We can copy the table from table in Step 4. In this table, we change the first cell of random value with xmin. The other values of random variable and the pdf can be calculated using the same step as Step 4.
- Cumulative probability calculation. For step 3-5, we can calculate cumulative probability or area below the (tail) curves for each step respectively.
- Left-tail area: =NORMDIST(xmin, mean, standard deviation, TRUE)
- Right-tail: =1- NORMDIST(xmax, mean, standard deviation, TRUE) c. Areas between left and right tail =1- left-tail area -right-tail area.
- Make a scroll box for standardized normal distribution graph.
To draw this graph, we need to make cells that describe standardized value, z; random value, x, and the CDF for a point on CDF curve. Standardized value can be calculated as follows: zmin + input value/100 * (zmax–zmin). The input value is connected using cell link in SCROLL CONTROL BOX. To use this box, click DEVELOPER>INSERT>FORM BOX>SCROLL BOX. Draw the scroll box, and then click CELL LINK, link with the input value and PAGE CHANGE into 5
- Draw a standardized normal distribution graph that connected with scroll box.
Before we draw the graph, we need to make one more table with different number of bins, which are 81 bins. The standardized value is needed to calculate the first random value where x=z * sigma + mean. For the rest of random values, the calculation is random value in Step 7 – first random value in Step 8, divided by 80 and added by previous random value respectively. The graph can be draw as follows:
- Area graph. X and Y values are random values and the pdf from table in Step 8 respectively. Perform Step 3c to carry out the area under the curve.
- Density graph. X and Y values are random values and the pdf from table on Step 2. c. Cumulative graph. X and Y values are random values and the CDF from table on Step 2. d. Cumulative label. X values are the two random variables and Y is the CDF value from Step 8.
You can change perform any standard statistical procedures of normal distribution for example using changing scroll find the pdf and CDF. You may also use GOAL SEEK to find probability of left- or right-tail area. The spreadsheet can be download here. You can find brief explanation about normal distribution here. Enjoy!