Unit 2: Exploring Two-Variable Data

Scatterplots, correlation, linear regression, and residual analysis

📚Study Guide: Exploring Two-Variable Data

When analyzing relationships between two quantitative variables, we use scatterplots, correlation, and linear regression. This unit teaches you to describe the direction, form, and strength of associations, compute and interpret the correlation coefficient r, and fit least-squares regression lines. You will learn to distinguish between explanatory and response variables, interpret the slope and intercept in context, and analyze residuals to assess the appropriateness of a linear model. Transformations to achieve linearity, such as logarithmic transformations, are also introduced. The AP exam emphasizes interpreting regression output and residual plots rather than performing extensive hand calculations.

Key Concepts

  • Scatterplots: Graphical display of two quantitative variables; describe direction (positive/negative), form (linear/nonlinear), and strength (weak/moderate/strong).
  • Correlation r: Measures the strength and direction of a linear association, ranging from -1 to 1. Correlation has no units and is unaffected by switching x and y.
  • Least-Squares Regression: Fits the line y_hat = a + bx that minimizes the sum of squared residuals.
  • Slope Interpretation: For every 1-unit increase in x, the predicted y changes by b units (an increase if b is positive, a decrease if b is negative).
  • Residuals: e = y - y_hat. Residual plots should show random scatter around zero for a linear model to be appropriate.
  • Coefficient of Determination r^2: The proportion of variability in y explained by the linear model with x.
  • Influential Points: Points that, if removed, significantly change the regression line, often extreme in the x-direction.
  • Transformations: Log or power transformations can linearize nonlinear relationships for modeling.
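
As a rough illustration of the transformation idea, the sketch below compares the correlation before and after taking the natural log of y. The data are made up (y = 3 * 2^x), and the helper name correlation is an assumption for this example, not part of any course material.

```python
import math
from statistics import mean

def correlation(xs, ys):
    """Pearson correlation r computed from deviations about the means."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up exponential data: y doubles with each 1-unit increase in x.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [3 * 2 ** x for x in xs]

r_raw = correlation(xs, ys)                         # curved relationship
r_log = correlation(xs, [math.log(y) for y in ys])  # ln(y) = ln(3) + x*ln(2), a line

print(f"r for (x, y):    {r_raw:.3f}")
print(f"r for (x, ln y): {r_log:.3f}")
```

Because ln(y) is an exact linear function of x here, the transformed correlation is essentially 1, while the raw correlation is noticeably lower. Real data will not linearize this cleanly, so a residual plot of the transformed data is still worth checking.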

Vocabulary

  • Explanatory variable: The variable that may explain or predict changes in the response variable (x-axis).
  • Response variable: The outcome variable measured in a study (y-axis).
  • Residual: The observed value minus the value predicted by the regression line (e = y - y_hat).
  • Least-squares regression line: The line that minimizes the sum of the squared residuals.
  • Influential point: A data point whose removal causes a substantial change in the regression equation.
  • Lurking variable: A variable not included in the analysis that may influence the relationship between the explanatory and response variables.
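
To make the idea of an influential point concrete, here is a minimal sketch that fits a least-squares line with and without one point that is extreme in the x-direction. The data set and the helper name least_squares are invented for this illustration.

```python
from statistics import mean

def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y_hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

# Made-up data: five points on y = 2x, plus one point far to the right
# that falls well below that pattern.
xs = [1, 2, 3, 4, 5, 20]
ys = [2, 4, 6, 8, 10, 5]

a_all, b_all = least_squares(xs, ys)              # fit with every point
a_trim, b_trim = least_squares(xs[:-1], ys[:-1])  # fit without the extreme point

print(f"all points:      y_hat = {a_all:.2f} + {b_all:.2f}x")
print(f"extreme removed: y_hat = {a_trim:.2f} + {b_trim:.2f}x")
```

In this constructed example, removing the single extreme point changes the slope from nearly flat to 2, which is the behavior the definition describes.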

Formulas

  • Regression line: y_hat = a + bx
  • Slope: b = r * (s_y / s_x)
  • Intercept: a = y_bar - b*x_bar
  • Residual: e = y - y_hat
  • r^2 = (explained variation) / (total variation)
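
The formulas above can be traced step by step in a short sketch. The data are hypothetical (hours studied and exam scores), and all variable names are chosen for this example only.

```python
from statistics import mean, stdev

xs = [1, 2, 3, 4, 5]        # hypothetical hours studied (explanatory)
ys = [52, 60, 61, 70, 77]   # hypothetical exam scores (response)

n = len(xs)
x_bar, y_bar = mean(xs), mean(ys)
s_x, s_y = stdev(xs), stdev(ys)   # sample standard deviations

# Correlation as the sum of z-score products divided by n - 1.
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(xs, ys)) / (n - 1)

b = r * (s_y / s_x)        # slope
a = y_bar - b * x_bar      # intercept
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]   # e = y - y_hat

total = sum((y - y_bar) ** 2 for y in ys)        # total variation in y
unexplained = sum(e ** 2 for e in residuals)     # variation left in the residuals
r_squared = (total - unexplained) / total        # explained / total

print(f"y_hat = {a:.2f} + {b:.2f}x, r = {r:.3f}, r^2 = {r_squared:.3f}")
```

With these made-up numbers the line works out to y_hat = 46 + 6x, the residuals sum to zero, and r^2 computed as explained over total variation matches the square of r.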

Common Mistakes

  • Assuming correlation implies causation; always consider lurking variables.
  • Interpreting the y-intercept when x = 0 lies outside the range of x-values in the data or has no meaning in context.
  • Using the regression line to predict far outside the observed x-range (extrapolation).
  • Concluding a linear model is good just because r is high without checking the residual plot for patterns.
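
The last mistake is easy to demonstrate. The sketch below uses deliberately curved, made-up data (y = x^2 for x from 1 to 10), fits a least-squares line, and prints the sign of each residual in order of x.

```python
import math
from statistics import mean

xs = list(range(1, 11))
ys = [x ** 2 for x in xs]   # deliberately curved, made-up data

x_bar, y_bar = mean(xs), mean(ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)

r = sxy / math.sqrt(sxx * syy)   # correlation
b = sxy / sxx                    # least-squares slope
a = y_bar - b * x_bar            # least-squares intercept

residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
signs = "".join("+" if e > 0 else "-" for e in residuals)

print(f"r = {r:.3f}")
print("residual signs in x order:", signs)
```

Here r is above 0.95, yet the residuals are positive at both ends and negative in the middle. That U-shaped pattern is the signal that a line is not an appropriate model, even though the correlation looks strong.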

AP Exam Strategies

  • When interpreting slope, always include "predicted" or "on average" because regression gives predictions, not exact values.
  • On FRQs, describe a residual plot by stating whether it shows a pattern: random scatter around zero supports a linear model, while a curved pattern suggests a nonlinear model or a transformation.
  • If asked for a prediction, substitute into y_hat = a + bx; if asked for a residual, compute observed minus predicted.
  • Report r^2 as a percentage when explaining how much variation is accounted for by the model.
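
A small sketch of the prediction and residual steps, using a hypothetical regression output (a = 12.5, b = 3.2, r^2 = 0.81) and a hypothetical observation. None of these numbers come from a real data set.

```python
def predict(a, b, x):
    """Predicted response from the line y_hat = a + b*x."""
    return a + b * x

def residual(observed_y, predicted_y):
    """Residual is observed minus predicted, in that order."""
    return observed_y - predicted_y

# Hypothetical regression output and one hypothetical observation.
a, b, r_squared = 12.5, 3.2, 0.81
x_new, y_observed = 10, 47.0

y_hat = predict(a, b, x_new)
e = residual(y_observed, y_hat)

# r^2 reported as a percentage in a context-style sentence.
sentence = (f"About {r_squared:.0%} of the variation in y is accounted for "
            f"by the linear model with x.")

print(f"predicted y at x = {x_new}: {y_hat:.1f}")
print(f"residual: {e:.1f}")
print(sentence)
```

A positive residual, as in this example, means the observed value is above the line, so the model underpredicted for that point.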

Real-World Applications

  • Economics: Regression models relate advertising spending to sales revenue for budget allocation.
  • Environmental Science: Scatterplots display the relationship between CO2 levels and global temperature anomalies.
  • Medicine: Linear models predict patient recovery time based on initial severity scores.

🎥Free Video Lessons: Exploring Two-Variable Data

AP Statistics Unit 2 Full Summary Review Video by Michael Porinchak - AP Statistics & AP Precalculus

AP Statistics Unit 2 Full Summary Review Video Part 2 by Michael Porinchak - AP Statistics & AP Precalculus

Top 10 Tips for AP Statistics Unit 2 Exploring Two Variable Data by Michael Porinchak - AP Statistics & AP Precalculus

📄Cheat Sheet: Exploring Two-Variable Data

Quick reference for Exploring Two-Variable Data. Print this out and review before the exam!

Exploring Two-Variable Data Cheat Sheet

Essential Formulas

  • y_hat = a + bx
  • b = r * (s_y / s_x)
  • a = y_bar - b*x_bar
  • Residual: e = y - y_hat
  • r^2 = explained / total variation

Key Definitions

  • Correlation r: strength/direction of linear association (-1 to 1)
  • Residual: observed - predicted
  • Influential point: a point, often extreme in the x-direction, whose removal changes the regression line substantially

Problem-Solving Steps

  1. Create scatterplot; describe direction, form, strength.
  2. Compute regression line or interpret given output.
  3. Examine residual plot for random scatter.
  4. Interpret slope and r^2 in context.

Calculator Tips

  • Enter the explanatory variable in L1 and the response variable in L2, then run LinReg(a+bx) to get the slope, intercept, r, and r^2.
  • Run DiagnosticOn (from the Catalog) so the TI-84 displays r and r^2.
  • Use RESID list after regression to plot residuals against L1.
