DataDesk1

Ratio Formation

In tests of linearity, we tested the observed correlation against a target population correlation coefficient (rho) of 0.0. I think that this has a nice intuitive feel, as a plot of two uncorrelated variables gives a "buckshot pattern" that looks like "noise". However, are there cases where rho = 0 is not a satisfactory benchmark for a lack of linear association?

Using Data Desk, generate two uniformly distributed random variables with a correlation of 0.00. For variable A, let the mean be 100 and the standard deviation be 10. For variable B, let the mean be 25 and the standard deviation be 5. Generate a sample of 100 observations.

Compute means and standard deviations and see if the summary statistics generally agree with the model values.
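If you would like to cross-check the Data Desk output outside the program, the two steps above can be sketched in Python. This is only a sketch under my own assumptions: the seed is arbitrary, the helper name uniform_sample is made up for illustration, and the uniform limits are set to mean plus or minus sd * sqrt(3) so that the population standard deviation comes out as requested.

```python
import math
import random
import statistics

def uniform_sample(mean, sd, n, rng):
    """Draw n values from a uniform distribution with the given mean and SD."""
    # A uniform distribution on [lo, hi] has SD (hi - lo) / sqrt(12),
    # so the half-width that yields the requested SD is sd * sqrt(3).
    half_width = sd * math.sqrt(3.0)
    return [rng.uniform(mean - half_width, mean + half_width) for _ in range(n)]

rng = random.Random(1)  # arbitrary seed, chosen only for repeatability
A = uniform_sample(100.0, 10.0, 100, rng)  # model: mean 100, SD 10
B = uniform_sample(25.0, 5.0, 100, rng)    # model: mean 25, SD 5

# Compare the sample statistics with the model values.
for name, values in (("A", A), ("B", B)):
    print(name,
          "mean =", round(statistics.mean(values), 2),
          "sd =", round(statistics.stdev(values), 2))
```

With only 100 observations the sample values will not match the model values exactly; the point is to see whether they are in general agreement.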

Prepare a scatter diagram of A versus B and compute the correlation coefficient. Use Manip > Transform > New Derived Variable to create a variable A/B. Compute the matrix of correlation coefficients for A, B, and A/B. Recall that the parent variables A and B are essentially uncorrelated. Is this lack of correlation inherited by the pairs A/B versus A and A/B versus B?
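The derived variable and the correlation matrix can also be sketched in Python as a check on the Data Desk results. Again this is a sketch, not the Data Desk procedure: the seed and the helper names (uniform_sample, pearson_r, correlation_matrix) are my own, and the data are regenerated here so the example stands alone.

```python
import math
import random

def uniform_sample(mean, sd, n, rng):
    """Draw n uniform values with the given mean and SD."""
    half_width = sd * math.sqrt(3.0)
    return [rng.uniform(mean - half_width, mean + half_width) for _ in range(n)]

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Return {(row, col): r} for every pair of named columns."""
    return {(p, q): pearson_r(columns[p], columns[q])
            for p in columns for q in columns}

rng = random.Random(1)  # arbitrary seed
A = uniform_sample(100.0, 10.0, 100, rng)
B = uniform_sample(25.0, 5.0, 100, rng)
ratio = [a / b for a, b in zip(A, B)]  # the derived variable A/B

columns = {"A": A, "B": B, "A/B": ratio}
matrix = correlation_matrix(columns)

# Print the matrix row by row, in the order A, B, A/B.
for p in columns:
    print(f"{p:>4}", "  ".join(f"{matrix[(p, q)]:+.3f}" for q in columns))
```

Look at the A/B row: the entry for A versus B should be near zero, but the entries involving A/B need not be.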

Prepare scatter diagrams of A/B versus A and A/B versus B. They are quite different, aren't they?

This is one example of what Pearson termed spurious correlation. When a ratio formed from uncorrelated variables is correlated with one of its own components (that is, A is a part of A/B), a non-zero correlation may be induced. The greater the difference between the two coefficients of variation, the greater the magnitude of the spurious correlation.
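To get a feel for the sizes involved, a standard first-order (delta-method) approximation for a ratio N/D of uncorrelated variables gives r(N/D, N) of about V_N / sqrt(V_N^2 + V_D^2) and r(N/D, D) of about -V_D / sqrt(V_N^2 + V_D^2), where V denotes a coefficient of variation. The sketch below evaluates it for the model values of A and B. The function name is my own, and the approximation assumes small coefficients of variation, so treat the numbers as rough guides rather than exact population values.

```python
import math

def first_order_ratio_correlation(cv_num, cv_den):
    """Approximate (r(N/D, N), r(N/D, D)) for uncorrelated N and D.

    cv_num and cv_den are the coefficients of variation (SD / mean)
    of the numerator and the denominator.
    """
    scale = math.sqrt(cv_num ** 2 + cv_den ** 2)
    return cv_num / scale, -cv_den / scale

cv_a = 10.0 / 100.0  # coefficient of variation of A
cv_b = 5.0 / 25.0    # coefficient of variation of B

r_ratio_a, r_ratio_b = first_order_ratio_correlation(cv_a, cv_b)
print(f"approximate r(A/B, A) = {r_ratio_a:+.3f}")
print(f"approximate r(A/B, B) = {r_ratio_b:+.3f}")
```

For these model values the approximation gives about +0.45 for A/B versus A and about -0.89 for A/B versus B. Compare these with the correlations you observed in your sample of 100.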

Experiment with the ratio B/A and its correlation with A and with B. Generalize as to how to produce the strongest spurious correlation: is it when the variable with the greater coefficient of variation is in the numerator or in the denominator?
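The experiment with B/A can be sketched the same way. The helper ratio_correlations is hypothetical (my own name), and the seed is arbitrary. Change the means and standard deviations passed to uniform_sample to explore other pairs of coefficients of variation, keeping the denominator well away from zero.

```python
import math
import random

def uniform_sample(mean, sd, n, rng):
    """Draw n uniform values with the given mean and SD."""
    half_width = sd * math.sqrt(3.0)
    return [rng.uniform(mean - half_width, mean + half_width) for _ in range(n)]

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ratio_correlations(num, den):
    """Return (r(num/den, num), r(num/den, den))."""
    ratio = [p / q for p, q in zip(num, den)]
    return pearson_r(ratio, num), pearson_r(ratio, den)

rng = random.Random(1)  # arbitrary seed
A = uniform_sample(100.0, 10.0, 100, rng)  # coefficient of variation 0.10
B = uniform_sample(25.0, 5.0, 100, rng)    # coefficient of variation 0.20

r_ab_with_a, r_ab_with_b = ratio_correlations(A, B)  # ratio A/B
r_ba_with_b, r_ba_with_a = ratio_correlations(B, A)  # ratio B/A

print(f"A/B vs A: {r_ab_with_a:+.3f}   A/B vs B: {r_ab_with_b:+.3f}")
print(f"B/A vs A: {r_ba_with_a:+.3f}   B/A vs B: {r_ba_with_b:+.3f}")
```

Note both the sign and the magnitude of each correlation when you swap numerator and denominator.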

The same effect is observed in ratios of the type A/B versus C/B, that is, ratios with a common denominator. Find an example in the literature of the use of ratios of the type A/B versus A, or A/B versus B, or A/B versus C/B. Does it appear that the users of such diagrams are taking spurious correlation into account?
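The common-denominator case can be tried out with a third uncorrelated variable. The handout does not define C, so the values used here (uniform, mean 50, standard deviation 5) are purely my own assumption for illustration, as are the seed and the helper names.

```python
import math
import random

def uniform_sample(mean, sd, n, rng):
    """Draw n uniform values with the given mean and SD."""
    half_width = sd * math.sqrt(3.0)
    return [rng.uniform(mean - half_width, mean + half_width) for _ in range(n)]

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)  # arbitrary seed
A = uniform_sample(100.0, 10.0, 100, rng)
B = uniform_sample(25.0, 5.0, 100, rng)
C = uniform_sample(50.0, 5.0, 100, rng)  # assumed values; C is not in the handout

ratio_ab = [a / b for a, b in zip(A, B)]
ratio_cb = [c / b for c, b in zip(C, B)]

r_parents = pearson_r(A, C)               # numerators: generated independently
r_ratios = pearson_r(ratio_ab, ratio_cb)  # ratios sharing the denominator B

print(f"A vs C:     {r_parents:+.3f}")
print(f"A/B vs C/B: {r_ratios:+.3f}")
```

Although A and C are generated independently, dividing both by the same B can induce a sizeable positive correlation between the two ratios.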

A brief write-up of your experiments will suffice.