DataDesk1

The Central Limit Theorem

You should have gone through the Central Limit Demonstration material before beginning this exercise. Will your machine explode if you haven't? No, but I think this will make more sense if you follow the sequence.

  1. Manip > Generate Random Numbers. Generate a uniformly distributed set of ____ values. These values will range between 0 and 1, and there should be roughly equal numbers of values in each of the intervals 0 - 0.1, 0.1 - 0.2, 0.2 - 0.3, ..., 0.9 - 1.0. The mean of the set should be close to 0.500 (as should the median) and the standard deviation should be close to 0.289.

    Generate distributions with:

    50 values

    100 values

    1,000 values

    10,000 values

    For each distribution, plot a histogram. Scale the histogram so that each bar is 0.1 units wide (remember that the triangle on the histogram lets you change the scale easily). For each distribution, also plot a normal probability plot.
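The histogram scaling above can also be checked outside DataDesk. The following is a minimal Python sketch, not part of the DataDesk procedure; the function name `bin_counts` and the seed are arbitrary choices made for illustration. It counts how many values fall in each 0.1-wide interval for the four set sizes.

```python
import random


def bin_counts(values, width=0.1):
    """Count values in [0, width), [width, 2*width), ... for values in [0, 1]."""
    n_bins = round(1 / width)
    counts = [0] * n_bins
    for v in values:
        # min() keeps a value of exactly 1.0 in the last bin
        index = min(int(v * n_bins), n_bins - 1)
        counts[index] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(0)  # arbitrary seed, only for repeatability
    for n in (50, 100, 1000, 10000):
        values = [rng.random() for _ in range(n)]
        print(n, bin_counts(values))
```

With small sets the counts per bar are typically quite uneven; they tend to even out, in relative terms, as the number of values grows.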

    Using a table format, summarize the results for the four distributions. That is, compare the observed mean and standard deviation with the expected mean and standard deviation. (Remember that you can use Calc > Summaries > As Variables to obtain the summary statistics, and remember to select the variable first by clicking on it!)

    Summarize, in words, your results.
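As a cross-check on the table you build in DataDesk, here is a small Python sketch of the same comparison. It is an illustration under stated assumptions: it uses Python's own random number generator rather than DataDesk's, the seed is arbitrary, and `summarize_uniform` is a name invented here. The expected standard deviation of a uniform distribution on 0 to 1 is the square root of 1/12, about 0.289.

```python
import random
import statistics

# Theoretical values for a uniform distribution on [0, 1)
EXPECTED_MEAN = 0.5
EXPECTED_SD = (1 / 12) ** 0.5  # about 0.289


def summarize_uniform(n, seed=0):
    """Return (mean, sd) of n uniform random values; the seed is arbitrary."""
    rng = random.Random(seed)
    values = [rng.random() for _ in range(n)]
    return statistics.mean(values), statistics.stdev(values)


if __name__ == "__main__":
    print(f"{'n':>6} {'mean':>8} {'sd':>8} {'mean diff':>10} {'sd diff':>8}")
    for n in (50, 100, 1000, 10000):
        mean, sd = summarize_uniform(n)
        print(f"{n:>6} {mean:8.3f} {sd:8.3f} "
              f"{mean - EXPECTED_MEAN:10.3f} {sd - EXPECTED_SD:8.3f}")
```

The "diff" columns are observed minus expected; your DataDesk numbers will differ in detail because the random values differ.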

  2. The central limit theorem states that the theoretical sampling distribution of the mean of independent samples, each of size n, drawn from a population with mean mu and standard deviation sigma is approximately normal, with mean mu and standard deviation sigma divided by n^(1/2), the square root of the sample size.

    Therefore, if you were to sample a uniform distribution by drawing 100 values, computing their mean, and repeating this 100 times, the distribution of the 100 sample means should have mean mu (0.500) and a standard deviation of 0.0289 (0.289 divided by the square root of 100).
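The arithmetic in this step can be written out as a short Python sketch. The names `POPULATION_SD` and `standard_error` are chosen here for illustration and are not DataDesk terms.

```python
import math

# sd of a uniform distribution on [0, 1]: sqrt(1/12), about 0.289
POPULATION_SD = 1 / math.sqrt(12)


def standard_error(population_sd, n):
    """Expected standard deviation of the mean of n independent values."""
    return population_sd / math.sqrt(n)


if __name__ == "__main__":
    n = 100
    print(f"n = {n}: expected sd of sample means = "
          f"{standard_error(POPULATION_SD, n):.4f}")
```

The same function applies to any other sample size by changing `n`.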

    Click on the variable that contains the 10,000 uniformly distributed values. Use Manip > Sample. Select 1% (100) and repeat 100 times. Each value in the set of 10,000 has an equal chance of being selected. Do not compute sample indices - click that option to the off position. Select several of the resulting samples of 100 values and view their histograms. Convince yourself that the samples "look" like uniformly distributed values, keeping in mind that only 100 were drawn from the population of 10,000. In this case, mu (0.500) and sigma (0.289) are the target population parameters.

    Select all 100 samples by using Command-A (or Select All). Calc > Summaries > As Variables will give vectors of the sample means and standard deviations. Plot the histogram of the sample means and compare this distribution with the target population distribution. Note that the distribution of sample means is much narrower and that there is a "hump" in the distribution. Click on the vectors containing the sample means. Calc > Summaries > As Variables will then give you the mean of the sample means and the standard deviation of the sample means. Compare these two values with the expected values given above. What is the mean of the sample standard deviations? How does this value compare with the target population standard deviation?
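The sampling experiment in this step can be imitated with a short Python sketch. This is an illustration only: it assumes that Manip > Sample draws without replacement (which is what `random.sample` does), the seed is arbitrary, and `sample_summaries` is a name invented here.

```python
import random
import statistics


def sample_summaries(population, sample_size, repeats, rng):
    """Draw `repeats` samples and return (list of means, list of sds)."""
    means, sds = [], []
    for _ in range(repeats):
        # Sampling without replacement: an assumption about Manip > Sample
        sample = rng.sample(population, sample_size)
        means.append(statistics.mean(sample))
        sds.append(statistics.stdev(sample))
    return means, sds


if __name__ == "__main__":
    rng = random.Random(0)  # arbitrary seed
    population = [rng.random() for _ in range(10000)]
    means, sds = sample_summaries(population, sample_size=100,
                                  repeats=100, rng=rng)
    print(f"mean of sample means: {statistics.mean(means):.4f} (expected 0.500)")
    print(f"sd of sample means:   {statistics.stdev(means):.4f} (expected 0.0289)")
    print(f"mean of sample sds:   {statistics.mean(sds):.4f} (population 0.289)")
```

The list `means` plays the role of the vector of sample means in DataDesk, and `sds` the vector of sample standard deviations.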

    Draw 10 values (0.1% of the total number) and repeat 100 times. What should the mean of the sample means be? The standard deviation of the sample means? Is the agreement reasonable?
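To judge whether the agreement is reasonable, it can help to see observed and expected values side by side for both sample sizes. The Python sketch below is one way to do that, again assuming sampling without replacement and an arbitrary seed; `observed_vs_expected` is a hypothetical helper name. The expected value is computed from the standard deviation of the generated population itself.

```python
import math
import random
import statistics


def observed_vs_expected(population, n, repeats, rng):
    """Return (observed sd of sample means, sigma / sqrt(n)) for samples of size n."""
    means = [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]
    expected = statistics.pstdev(population) / math.sqrt(n)
    return statistics.stdev(means), expected


if __name__ == "__main__":
    rng = random.Random(0)  # arbitrary seed
    population = [rng.random() for _ in range(10000)]
    for n in (10, 100):
        observed, expected = observed_vs_expected(population, n, 100, rng)
        print(f"n = {n:>3}: observed {observed:.4f}, expected {expected:.4f}")
```

Rerunning with different seeds gives a feel for how much the observed value moves from one set of 100 samples to the next.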

    Write a short paragraph in which you summarize your results. Include insights gained from the demonstration as well as from DataDesk. Comment on problems that may arise if you draw only a small number (n) of values from a target population.