📚 Study Guide: Collecting Data
Unit 3: Collecting Data
Proper data collection methods are essential for valid statistical inference. This unit distinguishes between observational studies and experiments, emphasizing that only well-designed experiments can establish cause and effect. You will learn four sampling methods (simple random, stratified, cluster, and systematic) and how to recognize biases such as undercoverage, nonresponse, and response bias. Random selection supports generalizing results to a population, while random assignment supports causal claims. Key features of experimental design include control groups, placebos, blinding, and blocking. The AP exam tests your ability to identify study types, critique methodology, and explain how randomization reduces bias and how design choices such as blocking and stratification reduce variability.
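The two random steps in the overview do different jobs. As a minimal sketch (the function name `select_and_assign`, the 500-student roster, and the seed are hypothetical), random selection draws the sample from the population, and random assignment then splits the selected units into treatment groups:

```python
import random

def select_and_assign(population, n, seed=None):
    rng = random.Random(seed)
    # Random selection: an SRS of size n from the population.
    # This step is what justifies generalizing to the population.
    sample = rng.sample(population, n)
    # Random assignment: shuffle the selected units and split them in half.
    # This step is what justifies cause-and-effect conclusions.
    shuffled = sample[:]
    rng.shuffle(shuffled)
    half = n // 2
    return sample, shuffled[:half], shuffled[half:]

# Hypothetical roster of 500 students.
population = [f"student_{i}" for i in range(1, 501)]
sample, treatment, control = select_and_assign(population, 20, seed=1)
print(len(sample), len(treatment), len(control))  # 20 10 10
```

A study can have either step without the other. An experiment on volunteers uses random assignment but not random selection, so its causal conclusion applies only to subjects like those volunteers.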
Key Concepts
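- Observational Study vs Experiment: Observational studies record variables without influencing the responses; experiments deliberately impose treatments. Only well-designed experiments with random assignment can support cause-and-effect conclusions.
- Simple Random Sample (SRS): Every possible group of n individuals from the population has an equal chance of being the sample.
- Stratified Sampling: Divide the population into strata of similar individuals (homogeneous within each stratum), take a separate SRS from each stratum, and combine them. When the stratifying variable is related to the response, this gives less variable estimates than an SRS of the same size.
- Cluster Sampling: Divide the population into clusters, ideally each a heterogeneous mini-version of the population; randomly select some clusters and include every individual in them. Cheaper and more practical than an SRS, though estimates can be more variable when clusters are internally similar.
- Systematic Sampling: Choose a random starting point among the first k individuals on a list, then select every kth individual after that.
- Confounding Variable: A variable that is related to the explanatory variable and also affects the response variable, so their effects on the response cannot be separated. Confounding prevents cause-and-effect conclusions.
- Random Assignment: Assigning experimental units to treatments by chance. This tends to create roughly equivalent groups, balancing the effects of other variables across treatments, which is what permits causal inference.
- Blocking: Grouping experimental units that are similar with respect to a variable expected to affect the response, then randomly assigning treatments within each block. This reduces variability due to that variable, making treatment differences easier to detect.
- Placebo and Blinding: A placebo is an inactive treatment that controls for the placebo effect (improvement caused simply by believing one is being treated). In a single-blind experiment, subjects do not know which treatment they received; in a double-blind experiment, neither the subjects nor the people who interact with them and measure responses know.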
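The four sampling methods in Key Concepts can be sketched as follows. This is a minimal illustration, not a standard library API: the function names, the 100-student roster, and its division into 4 grade levels (strata) and 10 homerooms (clusters) are all hypothetical. Each homeroom mixes students from every grade, matching the ideal of heterogeneous clusters.

```python
import random

def simple_random_sample(population, n, rng):
    # Every possible subset of size n is equally likely to be chosen.
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    # strata: dict mapping stratum label -> list of members.
    # Take a separate SRS from every stratum, then combine them.
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    # clusters: dict mapping cluster label -> list of members.
    # Randomly choose whole clusters, then keep every member of each one.
    chosen = rng.sample(list(clusters), n_clusters)
    return [member for label in chosen for member in clusters[label]]

def systematic_sample(population, k, rng):
    # Random start among the first k positions, then every kth individual.
    start = rng.randrange(k)
    return population[start::k]

# Hypothetical roster of 100 students.
students = [f"S{i:03d}" for i in range(100)]
# Strata: 4 grade levels of 25 consecutive students each.
grades = {g: students[25 * g:25 * (g + 1)] for g in range(4)}
# Clusters: 10 homerooms of 10 students, each drawing from all grades.
homerooms = {h: students[h::10] for h in range(10)}

rng = random.Random(2024)
srs = simple_random_sample(students, 20, rng)
strat = stratified_sample(grades, 5, rng)
clus = cluster_sample(homerooms, 2, rng)
syst = systematic_sample(students, 5, rng)
print(len(srs), len(strat), len(clus), len(syst))  # 20 20 20 20
```

Note the contrast between the two middle functions: `stratified_sample` takes some individuals from every group, while `cluster_sample` takes every individual from some groups.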
Vocabulary
- Population: The entire group of individuals about which we want information.
- Sample: A subset of the population from which we collect data.
- Bias: A systematic error in choosing respondents or measuring responses that favors certain outcomes.
- Undercoverage: When some members of the population have less chance (or no chance) of being chosen for the sample, often because they are missing from the list used to select it.
- Nonresponse: When individuals chosen for the sample cannot be contacted or refuse to participate.
- Response bias: Systematic pattern of inaccurate answers to a survey question.
- Treatment: A specific condition applied to the individuals in an experiment.
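To see why bias is a systematic error rather than bad luck, here is a small simulation on a made-up population (all numbers, names, and the seed are hypothetical). 400 of 1,000 people are missing from the sampling frame, a form of undercoverage, so every sample drawn from that frame misses them.

```python
import random
from statistics import fmean

# Hypothetical population: 600 people with response 10 and 400 with response 30.
# The 400 are missing from the sampling frame (e.g., no listed phone number).
reachable = [10] * 600
unreachable = [30] * 400
population = reachable + unreachable
true_mean = fmean(population)  # 18.0

def average_estimate(frame, n, reps, rng):
    # Average of the sample means from many SRSs of size n drawn from the frame.
    return fmean(fmean(rng.sample(frame, n)) for _ in range(reps))

rng = random.Random(7)
fair = average_estimate(population, 50, 2000, rng)   # frame covers everyone
biased = average_estimate(reachable, 50, 2000, rng)  # frame misses 400 people
print(true_mean, round(fair, 2), biased)  # biased is always exactly 10.0
```

Taking more samples, or larger ones, from the same flawed frame would still average 10.0 rather than 18.0; only fixing the frame removes the bias.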
Formulas
- No computational formulas; focus on design principles and justification language.
Common Mistakes
- Confusing stratified sampling (sample some from all groups) with cluster sampling (sample all from some groups).
- Claiming causation from an observational study or survey without random assignment.
- Ignoring the distinction between random selection (generalizability) and random assignment (causation).
- Overlooking placebo effects and lack of blinding as threats to internal validity in experiments.
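One reason the stratified-versus-cluster distinction matters is that stratifying on a variable related to the response reduces variability. The sketch below uses a made-up population (4 strata of 250 values that differ a lot between strata but little within them; the names, sizes, and seed are hypothetical) and compares the spread of 2,000 SRS means with 2,000 stratified means, each based on a sample of 20.

```python
import random
from statistics import fmean, stdev

# Hypothetical population: 4 strata of 250 values. Values differ a lot
# between strata (10, 20, 30, 40) but only slightly within one (+0 to +4).
strata = {g: [10 * (g + 1) + (i % 5) for i in range(250)] for g in range(4)}
population = [x for members in strata.values() for x in members]

def srs_mean(rng, n=20):
    return fmean(rng.sample(population, n))

def stratified_mean(rng, n_per_stratum=5):
    # Strata are equal in size, so the plain mean of the combined
    # sample is a valid estimate of the population mean.
    sample = [x for members in strata.values()
              for x in rng.sample(members, n_per_stratum)]
    return fmean(sample)

rng = random.Random(42)
srs_sd = stdev([srs_mean(rng) for _ in range(2000)])
strat_sd = stdev([stratified_mean(rng) for _ in range(2000)])
print(round(srs_sd, 2), round(strat_sd, 2))
```

With these made-up numbers the SRS means have a standard deviation several times larger than the stratified means, because stratifying removes the large between-stratum differences from the sampling variability.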
AP Exam Strategies
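- When describing an experimental design, explicitly address comparison (at least two treatment groups, often including a control group), random assignment, control of other variables, and replication (enough experimental units in each group).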
- If asked to identify a sampling method, describe the process step-by-step and justify why it matches the named method.
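- When a variable is likely to affect the response, propose blocking on it: form blocks of similar subjects, then randomly assign treatments separately within each block. This reduces variability from that variable and keeps it from confounding the treatment comparison.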
- In FRQs, use precise language: "random assignment allows us to infer causation," while "random sampling allows generalization."
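Blocking followed by random assignment within each block (a randomized block design) can be sketched like this. The function `randomized_block_design`, the 12 subjects, blocking by sex, and the drug/placebo treatments are hypothetical choices for illustration.

```python
import random
from collections import defaultdict

def randomized_block_design(subjects, block_of, treatments, seed=None):
    """Randomly assign treatments separately within each block.

    subjects:   list of subject IDs
    block_of:   function mapping a subject ID to its block label
    treatments: list of treatment names
    """
    rng = random.Random(seed)
    # Step 1: form blocks of similar subjects.
    blocks = defaultdict(list)
    for s in subjects:
        blocks[block_of(s)].append(s)
    # Step 2: random assignment inside each block.
    assignment = {}
    for members in blocks.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        # Deal shuffled subjects to treatments in turn, so group sizes
        # within a block differ by at most one.
        for i, s in enumerate(shuffled):
            assignment[s] = treatments[i % len(treatments)]
    return assignment

# Hypothetical study: 12 subjects blocked by sex (6 F, 6 M),
# comparing a new drug with a placebo.
sex = {f"P{i:02d}": ("F" if i < 6 else "M") for i in range(12)}
plan = randomized_block_design(list(sex), sex.get, ["drug", "placebo"], seed=3)
for block in ("F", "M"):
    given = [t for s, t in plan.items() if sex[s] == block]
    print(block, given.count("drug"), given.count("placebo"))
# prints "F 3 3" then "M 3 3"
```

Because each block is split evenly, any difference between sexes cannot pile up in one treatment group, and the treatment comparison is made within groups of similar subjects.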
Real-World Applications
- Clinical Trials: Randomized controlled trials test drug efficacy using placebo and double-blind protocols.
- Political Polling: Stratified sampling ensures demographic representation in election surveys.
- Quality Control: Cluster sampling randomly selects entire batches from the production lines and inspects every item in each selected batch.