Example Output File Master


A set of 150 measurements of sepal length, sepal width, petal length and petal width is used as an example of output from File Master. This "classic" data set will serve as the example set for several of the exercises in this course.

Measure                    Sepal L   Sepal W   Petal L   Petal W
Mean                          5.83      3.06      3.76      1.20
Variance                      0.70      0.19      3.12      0.58
Standard Deviation           0.837     0.436      1.77      0.76
Coefficient of Variation      0.14      0.14      0.47      0.63
Maximum                       7.90      4.40      6.90      2.50
Minimum                       4.30      2.00      1.00      0.10
Range                         3.60      2.40      5.90      2.40
% Variance                  15.24%     4.14%    67.98%    12.65%
Skewness                      0.32      0.31     -0.24     -0.10
Kurtosis                      2.35      3.14      1.58      1.64

The mean is one measure of central tendency -- the average value. Add up all of the values and divide by the number of values. The median is the "middle" value of the sorted data, and the mode is the most frequently occurring value.
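The three measures of central tendency can be sketched in a few lines of Python. The seven values below are hypothetical and are not taken from the example data set; the variable names are illustrative only.

```python
from statistics import median, mode

# Hypothetical sample of seven measurements (not from the example data set).
values = [4.9, 5.0, 5.0, 5.4, 5.8, 6.1, 6.3]

average = sum(values) / len(values)  # add up the values, divide by the count
middle = median(values)              # middle value of the sorted data
most_common = mode(values)           # most frequently occurring value

print(round(average, 2), middle, most_common)
```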

The variance is a measure of spread or dispersion about the mean. Units of the variance are the units of the variable squared. The standard deviation is the square root of the variance and has the same units as the raw data.
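As a minimal sketch, the variance and standard deviation can be computed directly from their definitions. The data are hypothetical, and the divisor n - 1 (the sample variance) is an assumption: the output above does not say which divisor File Master uses.

```python
import math

# Hypothetical measurements (not from the example data set).
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

n = len(values)
mean = sum(values) / n

# Sum of squared deviations about the mean, divided by n - 1.
# ASSUMPTION: sample variance; File Master's divisor is not stated.
variance = sum((x - mean) ** 2 for x in values) / (n - 1)

# The standard deviation is back in the units of the raw data.
std_dev = math.sqrt(variance)

print(round(variance, 3), round(std_dev, 3))
```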

The coefficient of variation is a measure of relative variability -- the ratio of the standard deviation to the mean. As a rough rule of thumb, values drawn from a normal distribution will have a coefficient of variation less than about 0.30. The converse does not hold: values less than 0.30 are not necessarily drawn from a normal distribution. The coefficient of variation has no units, which allows a comparison of variables measured in different units and of different orders of magnitude. In the table above, for example, sepal length has a larger absolute variance than petal width (0.70 versus 0.58) but a much smaller relative variation (0.14 versus 0.63).
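The coefficient of variation row can be checked from the means and standard deviations listed in the table. This is only a sketch of the arithmetic; the dictionary names are illustrative.

```python
# Means and standard deviations copied from the File Master table.
means = {"Sepal L": 5.83, "Sepal W": 3.06, "Petal L": 3.76, "Petal W": 1.20}
std_devs = {"Sepal L": 0.837, "Sepal W": 0.436, "Petal L": 1.77, "Petal W": 0.76}

# Coefficient of variation = standard deviation / mean (no units).
cv = {name: std_devs[name] / means[name] for name in means}

for name, value in cv.items():
    print(f"{name}: {value:.2f}")
```

Rounded to two decimals, the results agree with the Coefficient of Variation row of the table.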

The sum of the variances of all the variables in a matrix is sometimes termed the total information content of the matrix. % Variance shows how much each variable contributes to the total variance.
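The % Variance row follows from the Variance row. A short sketch using the rounded variances from the table is given here; because the inputs are rounded to two decimals, the percentages agree with the table only to within a few hundredths of a percent.

```python
# Variances copied from the File Master table (rounded to two decimals).
variances = {"Sepal L": 0.70, "Sepal W": 0.19, "Petal L": 3.12, "Petal W": 0.58}

# Total variance: the "total information content" of the matrix.
total_variance = sum(variances.values())

# Share of the total contributed by each variable, in percent.
percent_variance = {
    name: 100.0 * value / total_variance for name, value in variances.items()
}

for name, pct in percent_variance.items():
    print(f"{name}: {pct:.2f}%")
```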

Skewness is a measure of the symmetry of the distribution; symmetrical distributions (like the normal distribution) have a skewness of 0.0. If the long "tail" of an asymmetrical distribution extends toward large values, the skewness is positive.

Kurtosis is a measure of the "peakedness" of a distribution. A normal distribution has a kurtosis of 3.0.
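Skewness and kurtosis can both be written in terms of central moments. The sketch below uses the simple moment ratios m3 / m2^1.5 and m4 / m2^2, for which a normal distribution gives 0.0 and 3.0 respectively. Whether File Master uses exactly these formulas (or a bias-corrected variant) is an assumption, and the two data lists are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment: mean of (x - mean)**k, dividing by n."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n

def skewness(values):
    # ASSUMPTION: simple moment ratio; 0.0 for a symmetrical distribution.
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def kurtosis(values):
    # ASSUMPTION: non-excess kurtosis; 3.0 for a normal distribution.
    return central_moment(values, 4) / central_moment(values, 2) ** 2

# Hypothetical data: one symmetrical set, one with a tail of large values.
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
right_tailed = [1.0, 1.0, 2.0, 2.0, 3.0, 10.0]

print(skewness(symmetric), kurtosis(symmetric))
print(skewness(right_tailed))
```

The symmetrical set has a skewness of zero, while the set with a single large value in its tail has a positive skewness.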

Correlation Coefficients

          Sepal L   Sepal W   Petal L   Petal W
Sepal L     1.000    -0.114     0.855     0.801
Sepal W    -0.114     1.000    -0.428    -0.366
Petal L     0.855    -0.428     1.000     0.962
Petal W     0.801    -0.366     0.962     1.000

Correlation coefficients measure the "degree of linearity" between a pair of variables. The coefficient r ranges from 1.0 (a straight line with a positive slope) to -1.0 (a straight line with a negative slope). Values of r have no units and are useful for comparing variables measured in different units.

The correlation between Sepal L and Sepal W is equal to the correlation between Sepal W and Sepal L, so the matrix is symmetrical. The correlation between a variable and itself is 1.0 (that is, Sepal L plotted versus Sepal L would give a straight line with a positive slope).
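The properties described above (symmetry, a value of 1.0 for a variable against itself, and -1.0 for a perfect negative line) can be checked with a small sketch of the Pearson correlation coefficient. The function name pearson_r and the data are hypothetical, chosen only for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired measurements (not from the example data set).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

print(pearson_r(x, y), pearson_r(y, x))   # same value: the matrix is symmetrical
print(pearson_r(x, x))                    # a variable against itself
print(pearson_r(x, [-v for v in x]))      # perfect line with a negative slope
```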

The covariance is another measure of pair-wise variability, but it has units given by the product of the units of measurement of the two variables. The covariance equals the product of the two standard deviations times the correlation coefficient. Compute the covariance between Sepal L and Sepal W.
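As a sketch of the arithmetic, the same relationship is applied here to a different pair, Petal L and Petal W, using the standard deviations and correlation coefficient from the tables above; the Sepal L and Sepal W pair is left for the exercise. The variable names are illustrative only.

```python
# Values copied from the File Master output for Petal L and Petal W.
std_petal_l = 1.77      # standard deviation of Petal L
std_petal_w = 0.76      # standard deviation of Petal W
r_petal = 0.962         # correlation between Petal L and Petal W

# Covariance = (std dev of x) * (std dev of y) * (correlation coefficient).
cov_petal = std_petal_l * std_petal_w * r_petal

print(round(cov_petal, 3))
```

The result carries the units of Petal L multiplied by the units of Petal W.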
