DataDesk1

Ratio Formation - Part II

Percentages are ratios. Given A, B, and C, percentages are formed by dividing each element by the sum of the three elements and multiplying by 100. A matrix in which the row sums are constant is termed a closed matrix. If the row sums are not constant in a matrix, the matrix is said to be open.

For percentages the constant is 100.

That is, crudely, %A = 100*(A/(A+B+C)) or 100*(A/Row Sum). Write the expression for the ratio %A/%B in terms of the variables A, B, and C. Note that %A/%B = A/B, so percentage formation preserves the ratios of the original variables.
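Written out in full, the row sum (and the factor of 100) cancels from numerator and denominator, which is why the ratio is preserved. This assumes the row sum and B are nonzero:

```latex
\frac{\%A}{\%B}
  = \frac{100 \, A / (A + B + C)}{100 \, B / (A + B + C)}
  = \frac{A}{B},
  \qquad A + B + C \neq 0, \; B \neq 0
```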

Given a set of closed data (percentages), you know that the ratios of the percentages preserve the ratios of the original values. However, you cannot recover the original values from the percentages alone: infinitely many pairs of values give the same ratio (20/10, 100/50, 2/1, and so on).
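A minimal Python sketch of closure, assuming each row holds the raw values A, B, and C for one observation. The function name `close_rows` is hypothetical. Two rows that differ only by a positive scale factor produce identical percentages, so the ratios survive but the original magnitudes cannot be recovered.

```python
def close_rows(rows, total=100.0):
    """Convert each row of raw values to parts of `total` (percentages by default)."""
    closed = []
    for row in rows:
        row_sum = sum(row)
        closed.append([total * value / row_sum for value in row])
    return closed


# Two rows whose raw values differ by a factor of 5 (20:10:30 versus 100:50:150).
raw = [[20.0, 10.0, 30.0], [100.0, 50.0, 150.0]]
pct = close_rows(raw)

for r, p in zip(raw, pct):
    # The ratio survives closure: %A/%B equals A/B, and each row sums to 100.
    print(f"A/B = {r[0] / r[1]:.3f}   %A/%B = {p[0] / p[1]:.3f}   row sum = {sum(p):.1f}")

# Both rows give the same percentages, so the raw scale is lost.
print(pct[0])
print(pct[1])
```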

What about the other properties of the percentage variables? For example, what is the relationship between the variance of %A and the variance of A? The question for exploration is, therefore: does the percentage formation process change the statistical descriptors, both univariate and bivariate?

Using Data Desk, generate three normally distributed random variables with 1000 observations each. Label them Normal 1, Normal 2, and Normal 3.

    Let the mean of the first be 100 and the standard deviation 10.
    Let the mean of the second be 45 and the standard deviation 15.
    Let the mean of the third be 100 and the standard deviation 30.
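If you want to check the Data Desk results independently, the same three variables can be simulated with the Python standard library. This is a sketch under assumptions: the seed value 1 and the dictionary layout are arbitrary choices made for reproducibility, not part of the exercise.

```python
import random
import statistics

N = 1000
rng = random.Random(1)  # arbitrary fixed seed so the run can be repeated

# (mean, standard deviation) for Normal 1, Normal 2 and Normal 3
specs = {"Normal 1": (100, 10), "Normal 2": (45, 15), "Normal 3": (100, 30)}

# Draw N independent normal values for each variable.
data = {name: [rng.gauss(mu, sigma) for _ in range(N)]
        for name, (mu, sigma) in specs.items()}

for name, values in data.items():
    print(f"{name}: mean = {statistics.fmean(values):7.2f}, "
          f"sd = {statistics.stdev(values):6.2f}")
```

The sample means and standard deviations will be close to, but not exactly equal to, the requested values.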

Using Data Desk options, assess the strength of linear association between each pair of the three variables and prepare a table of the summary statistics you select. I would include the mean, the median, the variance, the standard deviation, and the skewness. Do the data appear to fit a normal distribution?

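Compare the statistics for the open set (the three original variables) with those for the closed set (the three variables converted to percentages). Generalize the results. Effective use of graphics can enhance the report. Pay particular attention to the correlation coefficients. Because the three original variables were generated independently, any nonzero correlations in the closed data set were induced by the process of forming percentages.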

Recall that you can test an observed correlation against a null value of 0.0, and that as N increases, the critical correlation needed to reject that null approaches 0.0. For studies with more than a few hundred pairs of observations (roughly 400 or more at the 0.05 level), the critical value falls below 0.10, so even a correlation as small as 0.10 is significantly different from 0.0. Is 0.0 necessarily the appropriate null value against which to test correlations between percentages? Comment on the widespread use of percentages in analyzing geological data.
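To put numbers on the sample-size argument, the sketch below approximates the critical |r| for a two-sided test of a null correlation of 0.0 using Fisher's z transformation, r_crit ≈ tanh(z / sqrt(N - 3)), where z is the standard normal quantile for the chosen significance level. This is an approximation (the exact test uses the t distribution with N - 2 degrees of freedom), and alpha = 0.05 is an assumed default. The function name `critical_r` is hypothetical. Comparing these thresholds with the correlations found for the closed data shows how easily closure-induced correlations pass a test against 0.0.

```python
import math
from statistics import NormalDist


def critical_r(n, alpha=0.05):
    """Approximate two-sided critical |r| for a null correlation of 0 (Fisher z)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.tanh(z / math.sqrt(n - 3))


for n in (30, 100, 400, 1000):
    print(f"N = {n:5d}: |r| must exceed about {critical_r(n):.3f}")
```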