📚 Study Guide: Inference for Categorical Data: Proportions
Unit 6: Inference for Categorical Data: Proportions
This unit launches formal statistical inference by constructing confidence intervals and performing hypothesis tests for population proportions. You will learn the four-step process for inference: STATE, PLAN, DO, CONCLUDE. For confidence intervals, you estimate p using p_hat and a critical z* value. For significance tests, you assess whether sample data provide convincing evidence against a null hypothesis. The conditions--Random, Large Counts, and 10%--must be checked and stated explicitly. Type I and Type II errors are introduced, connecting test decisions to real-world consequences. Mastery of proportions sets the stage for all subsequent inference units.
Key Concepts
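- Confidence Interval for p: p_hat +/- z* * sqrt[ p_hat(1-p_hat)/n ]. Gives a range of plausible values for the true proportion; the confidence level is the percentage of all possible samples for which this method produces an interval that captures p.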
- Hypothesis Test for p: H0: p = p0 vs Ha: p <, >, or != p0. Use p0 in the standard error for the test statistic.
- Test Statistic: z = (p_hat - p0) / sqrt[ p0(1-p0)/n ]. Measures how far the sample proportion is from the null value in standard errors.
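- p-value: The probability of obtaining a test statistic at least as extreme as the one observed, in the direction of Ha, assuming H0 is true. Small p-values indicate evidence against H0.
- Type I Error: Rejecting H0 when it is actually true. If H0 is true, the probability of this error equals the significance level alpha.
- Type II Error: Failing to reject H0 when it is false. Its probability, beta, depends on the true value of p, the sample size, and alpha.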
- Power: The probability of correctly rejecting a false null hypothesis; power = 1 - beta.
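To see how the test statistic, p-value, and power fit together, here is a minimal Python sketch using only the standard library. The function names (one_prop_z_test, power_greater) and the data (58 successes in 100 trials, a true proportion of 0.6) are hypothetical and chosen only for illustration. The power calculation assumes the normal approximation and a one-sided "greater than" alternative.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution: mean 0, sd 1


def one_prop_z_test(successes, n, p0, alternative="two-sided"):
    """Return (z, p_value) for H0: p = p0 using the normal approximation."""
    p_hat = successes / n
    se0 = sqrt(p0 * (1 - p0) / n)  # standard error uses p0, not p_hat
    z = (p_hat - p0) / se0
    if alternative == "less":
        p_value = STD_NORMAL.cdf(z)
    elif alternative == "greater":
        p_value = 1 - STD_NORMAL.cdf(z)
    elif alternative == "two-sided":
        # double the area in one tail beyond |z|
        p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")
    return z, p_value


def power_greater(p0, p_true, n, alpha=0.05):
    """Approximate power of the test of Ha: p > p0 when the true proportion is p_true."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    # smallest sample proportion that leads to rejecting H0
    cutoff = p0 + z_crit * sqrt(p0 * (1 - p0) / n)
    se_true = sqrt(p_true * (1 - p_true) / n)
    # probability that p_hat lands above the cutoff when p = p_true
    return 1 - STD_NORMAL.cdf((cutoff - p_true) / se_true)


# Hypothetical data: 58 successes in 100 trials, H0: p = 0.5 vs Ha: p > 0.5
z, p_value = one_prop_z_test(58, 100, 0.5, alternative="greater")
power = power_greater(p0=0.5, p_true=0.6, n=100)
print(f"z = {z:.2f}, p-value = {p_value:.4f}, power = {power:.3f}")
```

The Type II error probability for this hypothetical scenario is 1 minus the returned power. Increasing n or alpha in power_greater shows how each change raises power.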
Vocabulary
- Null hypothesis (H0): The claim being tested, usually a statement of no effect or no difference.
- Alternative hypothesis (Ha): The claim we seek evidence for, suggesting a difference or effect.
- Significance level (alpha): The threshold for deciding whether a p-value is small enough to reject H0.
- Margin of error: The distance from the sample statistic to each endpoint of a confidence interval (half the interval's width); it describes how far the estimate is expected to be from the true parameter at the stated confidence level.
- One-sided test: A test whose alternative hypothesis specifies a direction (less than or greater than).
- Two-sided test: A test whose alternative hypothesis does not specify a direction (not equal to).
Formulas
- CI: p_hat +/- z* * sqrt[ p_hat(1-p_hat)/n ]
- Test statistic: z = (p_hat - p0) / sqrt[ p0(1-p0)/n ]
- ME = z* * SE
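- Sample size for desired ME: n >= [z*^2 * p_hat(1-p_hat)] / ME^2, always rounded up to the next whole number; use p_hat = 0.5 if no prior estimate is available, since that value gives the largest (most conservative) n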
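The interval and sample-size formulas above can be sketched in Python with the standard library alone. The helper names (z_star, proportion_ci, sample_size) and the survey counts (312 of 600) are hypothetical, used only to illustrate the arithmetic.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_star(confidence):
    """Critical value z* that leaves (1 - confidence)/2 in each tail."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def proportion_ci(successes, n, confidence=0.95):
    """Return (lower, upper, margin_of_error) for a one-sample z interval for p."""
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)  # standard error uses p_hat for intervals
    me = z_star(confidence) * se        # ME = z* * SE
    return p_hat - me, p_hat + me, me


def sample_size(me, confidence=0.95, p_guess=0.5):
    """Smallest n with margin of error at most me; the result is rounded up."""
    return ceil(z_star(confidence) ** 2 * p_guess * (1 - p_guess) / me ** 2)


# Hypothetical survey: 312 of 600 respondents answer "yes"
low, high, me = proportion_ci(312, 600)
print(f"95% CI: ({low:.3f}, {high:.3f}), ME = {me:.3f}")

# Sample size for a margin of error of 0.03 at 95% confidence, with p_hat = 0.5
n_needed = sample_size(0.03)
print(f"n needed: {n_needed}")
```

Rounding up with ceil matters: rounding down would give a margin of error slightly larger than the target.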
Common Mistakes
- Using p_hat instead of p0 in the standard error when calculating the test statistic for a hypothesis test.
- Confusing the meaning of the confidence level; it refers to the method's success rate, not the probability that a specific interval contains p.
- Accepting the null hypothesis rather than failing to reject it; absence of evidence is not evidence of absence.
- Forgetting to check the Large Counts condition, or checking it with the wrong proportion: use n*p_hat and n(1-p_hat) for confidence intervals, and n*p0 and n(1-p0) for tests; both counts must be at least 10.
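The condition checks can be written as two small Python helpers. This is a sketch, not an official procedure: the function names and the example numbers are hypothetical, and the cutoff of 10 is the usual AP Statistics convention (some textbooks use other cutoffs).

```python
def large_counts_ok(n, p, threshold=10):
    """True when both the expected successes and failures reach the threshold.

    Pass p_hat for a confidence interval and p0 for a significance test.
    The default threshold of 10 is an assumed convention.
    """
    return n * p >= threshold and n * (1 - p) >= threshold


def ten_percent_ok(n, population_size):
    """True when the sample is at most 10% of the population."""
    return n <= 0.10 * population_size


# Hypothetical interval: n = 600, p_hat = 0.52 gives counts of 312 and 288
print(large_counts_ok(600, 0.52))

# Hypothetical test: n = 40, p0 = 0.10 gives only 4 expected successes
print(large_counts_ok(40, 0.10))

# Hypothetical population of 50,000 with a sample of 600
print(ten_percent_ok(600, 50_000))
```

The Random condition cannot be checked by computation; it has to be verified from the description of how the data were collected.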
AP Exam Strategies
- Follow the four-step process explicitly in FRQs: State hypotheses/parameter, Plan (name test and check conditions), Do (calculations), Conclude (interpretation in context).
- When stating conclusions, always refer to the population proportion p, not the sample proportion p_hat.
- If asked about errors, define both Type I and Type II in the specific context of the problem.
- For two-sided tests, find the tail probability beyond the observed test statistic and double it to get the p-value.
Real-World Applications
- Public Health: Testing whether the proportion of vaccinated individuals meets herd immunity thresholds.
- Marketing: Estimating the proportion of customers likely to purchase a new product from survey data.
- Elections: Pollsters use confidence intervals to report candidate support proportions with margins of error.