📚Study Guide: Inference for Quantitative Data: Means
Unit 7: Inference for Quantitative Data: Means
Inference for means follows the same four-step structure as proportions but uses t-distributions instead of the normal distribution because the population standard deviation is almost always unknown. You will construct one-sample t intervals and perform one-sample t tests, then extend these to two-sample procedures for comparing means from independent groups and paired t procedures for dependent data. The t-distribution has heavier tails than the normal distribution, accounting for extra uncertainty from estimating sigma with s. Degrees of freedom determine the exact shape of the t-distribution. Checking conditions--Random, Normal/Large Sample, and Independent--is essential for valid inference.
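To see why the heavier tails matter, the simulation below is a minimal sketch and not part of the original guide. It draws repeated samples from a standard normal population, computes the one-sample t statistic for each, and counts how often |t| lands beyond the z cutoff of 1.96. The function names, seed, and trial count are illustrative assumptions.

```python
import math
import random

def t_statistic(sample, mu):
    """One-sample t statistic: (x_bar - mu) / (s / sqrt(n))."""
    n = len(sample)
    x_bar = sum(sample) / n
    # Sample variance divides by n - 1, matching the usual definition of s.
    variance = sum((x - x_bar) ** 2 for x in sample) / (n - 1)
    return (x_bar - mu) / (math.sqrt(variance) / math.sqrt(n))

def tail_rate(n, cutoff=1.96, trials=20000, seed=1):
    """Fraction of simulated t statistics with |t| > cutoff when H0 is true."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for repeatability
    beyond = 0
    for _ in range(trials):
        sample = [rng.gauss(0, 1) for _ in range(n)]
        if abs(t_statistic(sample, 0)) > cutoff:
            beyond += 1
    return beyond / trials

small_n_rate = tail_rate(n=5)    # df = 4
large_n_rate = tail_rate(n=50)   # df = 49
print(small_n_rate, large_n_rate)
```

With n = 5 the rate should come out well above the nominal 5%, while with n = 50 it should sit close to 5%. That is the sense in which using z in place of t understates uncertainty for small samples, and in which t approaches z as df increases.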
Key Concepts
- t-Distribution: Similar to standard normal but with heavier tails; determined by degrees of freedom (df). As df increases, t approaches z.
- One-Sample t Interval: x_bar +/- t* * (s/sqrt(n)). Use when estimating a population mean with unknown sigma.
- One-Sample t Test: t = (x_bar - mu0) / (s/sqrt(n)). Compares sample mean to a hypothesized value.
- Two-Sample t Test: Compares means from two independent groups. The standard error can be pooled or unpooled; AP Statistics uses the unpooled version, with df taken from technology or from the conservative rule (the smaller of n1-1 and n2-1).
- Paired t Procedures: For dependent data, compute differences for each pair and perform a one-sample t analysis on the differences.
- Degrees of Freedom: For one sample, df = n-1. For two samples, use the smaller of n1-1 and n2-1 as a conservative approximation.
- Normal/Large Sample Condition: Population normally distributed or n >= 30; for smaller samples, check a graph for strong skewness or outliers.
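The one-sample interval and test statistic above can be sketched in a few lines of Python. This is an illustrative example under stated assumptions: the data values and the helper name `one_sample_t` are hypothetical, and the critical value t* is passed in by hand (2.776 is the 95% value for df = 4 from a standard t table) rather than computed.

```python
import math
import statistics

def one_sample_t(data, mu0, t_star):
    """Return df, standard error, t statistic, and confidence interval."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)          # sample SD (divides by n - 1)
    se = s / math.sqrt(n)               # standard error of x_bar
    t = (x_bar - mu0) / se              # test statistic for H0: mu = mu0
    margin = t_star * se                # margin of error for the interval
    return {
        "df": n - 1,
        "se": se,
        "t": t,
        "interval": (x_bar - margin, x_bar + margin),
    }

# Hypothetical sample: hours of sleep for 5 randomly chosen students.
sleep_hours = [6, 7, 8, 9, 10]
result = one_sample_t(sleep_hours, mu0=7, t_star=2.776)
print(result)
```

Because the interval and the test share the same standard error, computing them together makes it easy to check that a two-sided test at the 5% level and a 95% interval tell a consistent story: here the hypothesized value 7 falls inside the interval, and |t| is smaller than t*.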
Vocabulary
- Standard error: The estimated standard deviation of a sampling distribution, using sample statistics (e.g., s/sqrt(n)).
- Degrees of freedom: A parameter of the t-distribution related to sample size, reflecting the amount of information available.
- Paired data: Two measurements on the same individual or matched pairs, analyzed via differences.
- Pooled standard deviation: A weighted average of group standard deviations used when assuming equal population variances.
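To make the "weighted average" in the pooled standard deviation concrete, here is a minimal sketch comparing the pooled and unpooled standard errors from summary statistics. The function names and the numbers are hypothetical; the pooled formula shown is the standard one that weights each group's variance by its degrees of freedom.

```python
import math

def pooled_sd(s1, n1, s2, n2):
    """Pooled SD: group variances weighted by their degrees of freedom."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

def pooled_se(s1, n1, s2, n2):
    """Standard error of x_bar1 - x_bar2 assuming equal population variances."""
    return pooled_sd(s1, n1, s2, n2) * math.sqrt(1 / n1 + 1 / n2)

def unpooled_se(s1, n1, s2, n2):
    """Standard error of x_bar1 - x_bar2 with no equal-variance assumption."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

# Equal sample sizes: the two standard errors coincide.
equal_pooled = pooled_se(3, 10, 4, 10)
equal_unpooled = unpooled_se(3, 10, 4, 10)

# Unequal sample sizes and unequal SDs: the two can differ noticeably.
unequal_pooled = pooled_se(3, 5, 4, 20)
unequal_unpooled = unpooled_se(3, 5, 4, 20)
print(equal_pooled, equal_unpooled, unequal_pooled, unequal_unpooled)
```

The second comparison shows why the choice matters: when sample sizes and spreads both differ, the pooled standard error can be quite different from the unpooled one, so the equal-variance assumption should not be made casually.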
Formulas
- One-sample t: t = (x_bar - mu0) / (s/sqrt(n))
- One-sample CI: x_bar +/- t* * (s/sqrt(n))
- Two-sample SE: sqrt[ s1^2/n1 + s2^2/n2 ]
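- Two-sample t: t = (x_bar1 - x_bar2) / SE, for testing H0: mu1 = mu2
- Paired t: t = d_bar / (s_d/sqrt(n)), for testing H0: mu_d = 0, where d_bar is the mean of the differences, s_d is their standard deviation, and n is the number of pairs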
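The two-sample formulas can be worked through from summary statistics as in the sketch below. The numbers and the function name are hypothetical. The Welch-Satterthwaite df is included only for comparison, as an assumption about what technology typically reports for the unpooled test; the guide itself names only the conservative rule.

```python
import math

def two_sample_t(x_bar1, s1, n1, x_bar2, s2, n2):
    """Unpooled two-sample t statistic with two choices of df."""
    v1 = s1**2 / n1                      # variance contribution of group 1
    v2 = s2**2 / n2                      # variance contribution of group 2
    se = math.sqrt(v1 + v2)              # two-sample standard error
    t = (x_bar1 - x_bar2) / se           # test statistic for H0: mu1 = mu2
    df_conservative = min(n1 - 1, n2 - 1)
    # Welch-Satterthwaite approximation (not stated in the guide; shown for comparison).
    df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return se, t, df_conservative, df_welch

# Hypothetical summary statistics for two independent groups.
se, t, df_cons, df_welch = two_sample_t(80, 6, 9, 74, 8, 16)
print(se, t, df_cons, df_welch)
```

The conservative df is never larger than the Welch value, so it gives a slightly larger critical value and a slightly larger P-value. That is why it is described as conservative.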
Common Mistakes
- Using z instead of t when sigma is unknown; this understates uncertainty, giving intervals that are too narrow and P-values that are too small, especially for small samples.
- Treating paired data as independent; paired designs are analyzed with differences, not by comparing separate means.
- Forgetting to check the normality condition for small samples, leading to invalid t-procedures.
- Using pooled procedures on the AP exam without being explicitly instructed to; the default is the unpooled two-sample t.
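The cost of treating paired data as independent can be illustrated with a small sketch. The blood pressure readings below are made up for illustration, and the function names are hypothetical; the same numbers are analyzed once as pairs and once, incorrectly, as two independent samples.

```python
import math
import statistics

# Hypothetical systolic readings for the same 6 patients, before and after treatment.
before = [140, 152, 138, 160, 147, 155]
after = [135, 150, 136, 152, 145, 148]

def paired_t(x, y):
    """Correct analysis: one-sample t on the within-pair differences."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)
    return d_bar / (s_d / math.sqrt(n))

def independent_t(x, y):
    """Incorrect for this design: unpooled two-sample t ignoring the pairing."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se

t_paired = paired_t(before, after)
t_independent = independent_t(before, after)
print(round(t_paired, 2), round(t_independent, 2))
```

The paired statistic is roughly 3.9 while the two-sample statistic is roughly 0.9. Patient-to-patient variation is large compared with the treatment effect, and only the paired analysis removes it by working with differences.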
AP Exam Strategies
- Explicitly state that sigma is unknown as justification for using a t-procedure.
- For paired data, clearly define the difference and show that the analysis is one-sample on those differences.
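- When comparing means from two independent groups, use comparative language and name the parameter precisely: "We are 95% confident that the difference in the true means (mu1 - mu2) is between..." Reserve "true mean difference" for paired data.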
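- Check the normal condition with the method that fits the sample size: for n >= 30, cite the Central Limit Theorem; for smaller samples, sketch a graph of the data and state that it shows no strong skewness or outliers.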
Real-World Applications
- Medicine: Paired t-tests evaluate blood pressure changes before and after treatment on the same patients.
- Education: Two-sample t-tests compare test score means between teaching methods.
- Manufacturing: Confidence intervals estimate mean product dimensions to ensure they meet specifications.