Scatterplots & Trend Lines on the Digital SAT
What a scatterplot tells you
A scatterplot shows individual data points on the xy-plane. Each point represents one observation: the x-coordinate is one measurement and the y-coordinate is another.
Positive correlation: as x increases, y tends to increase. Line of best fit slopes upward.
Negative correlation: as x increases, y tends to decrease. Line slopes downward.
No correlation: no clear trend; the points scatter randomly.
Strong vs weak: strong = points cluster tightly around the line. Weak = points are spread out.
The line of best fit
The line of best fit approximates all the data — roughly the same number of points sit above as below. You don't need to calculate it; the SAT shows it on the graph.
Critical distinction: the line gives predicted values, not actual data. When the SAT asks for the predicted y at a given x, read from the line, not from the nearest data point.
Residual = actual − predicted. This is the signed vertical distance from a data point to the line of best fit: positive when the point sits above the line, negative when it sits below.
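As a quick illustration of the residual formula, here is a minimal Python sketch. The function name `residual` and the sample readings are made up for this example; on the test you would read both values off the graph.

```python
def residual(actual: float, predicted: float) -> float:
    """Signed vertical gap between a data point and the line: actual minus predicted."""
    return actual - predicted


# Hypothetical readings at the same x-value:
# the data point is at y = 15 and the line of best fit is at y = 13.
print(residual(15, 13))   # 2  (point is above the line)

# Hypothetical point that sits below the line.
print(residual(10, 13))   # -3 (point is below the line)
```

The order matters: swapping the two arguments flips the sign, which is exactly the trap in answer choices like −2 vs 2.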
Estimating slope and intercept by eye
For most scatterplot questions, you don't need exact arithmetic — just a slope estimate that's good enough to eliminate wrong choices.
- Check the sign first. A line that rises from left to right has a positive slope; one that falls has a negative slope. This often eliminates about half the answer choices in seconds.
- Pick two clear points on the line (NOT data points — points on the line itself).
- Compute rise / run: the difference in y divided by the difference in x.
Reading the y-intercept: extend the line to x = 0. The y-coordinate there is the y-intercept. In context, this is usually the "starting value" when the input is zero.
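The slope and intercept steps above can be sketched in a few lines of Python. This is only an illustration: `line_through` is a hypothetical helper, and the two points are invented values standing in for points you would read off the line itself.

```python
def line_through(p1, p2):
    """Return (slope, y_intercept) of the line through two points on the trend line."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("vertical line: slope is undefined")
    slope = (y2 - y1) / (x2 - x1)   # rise / run
    intercept = y1 - slope * x1     # value of y where x = 0
    return slope, intercept


# Hypothetical points read from the line of best fit (not data points).
m, b = line_through((2, 11), (6, 23))
print(m, b)          # 3.0 5.0
print(m * 10 + b)    # 35.0, the predicted y at x = 10
```

Once you have the slope and intercept, any "predicted value" question is a single substitution into y = mx + b.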
Linear vs exponential models
The SAT frequently asks "which model best fits this data?"
- Linear data: as x increases by one, y changes by the same amount. The graph looks like a straight line.
- Exponential data: as x increases by one, y multiplies by the same factor. The graph curves up (growth) or down (decay) steeply.
Quick visual test: if the data points form a roughly straight line → linear. If they curve sharply or plateau → exponential. If the values double, triple, or halve over equal x-intervals → exponential growth/decay.
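The same test can be done with numbers instead of by eye: check whether consecutive differences or consecutive ratios stay constant. The sketch below assumes the y-values are listed at equally spaced x-values; the function name `classify` and the 5% tolerance are choices made for this example, not part of any SAT rule.

```python
import math


def classify(ys, rel_tol=0.05):
    """Label y-values taken at equally spaced x as 'linear', 'exponential', or 'neither'."""
    if len(ys) < 3:
        raise ValueError("need at least three values to compare changes")

    # Linear: each step adds (roughly) the same amount.
    diffs = [b - a for a, b in zip(ys, ys[1:])]
    if all(math.isclose(d, diffs[0], rel_tol=rel_tol) for d in diffs):
        return "linear"

    # Exponential: each step multiplies by (roughly) the same factor.
    if all(y != 0 for y in ys):
        ratios = [b / a for a, b in zip(ys, ys[1:])]
        if all(math.isclose(r, ratios[0], rel_tol=rel_tol) for r in ratios):
            return "exponential"

    return "neither"


print(classify([8, 10.5, 13, 15.5]))      # linear
print(classify([100, 200, 400, 800]))     # exponential
print(classify([1, 4, 9, 16]))            # neither
```

Real scatterplot data is noisy, so treat the tolerance as a rough guide rather than a strict cutoff.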
Solving with Desmos
Desmos turns scatterplot questions into one-step problems. Three techniques:
1. Plot data with a table
Type table, then enter x values in column 1 and y values in column 2. Desmos plots the points automatically.
2. Find the line of best fit with regression
Type y_1 ~ m·x_1 + b. Desmos returns the best-fit slope m and intercept b automatically. Faster than estimating by eye.
3. Test exponential fit
Type y_1 ~ a·b^(x_1) to fit an exponential model. Compare the residuals to the linear fit: whichever has smaller residuals is the better model.
For the full set of Desmos techniques across the entire test, see our Desmos & Test Tools guides.
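To see what "compare the residuals" means outside Desmos, here is a rough Python sketch. It is not Desmos's own algorithm: the linear fit is ordinary least squares, and the exponential fit is an approximation obtained by fitting a line to (x, ln y), which requires every y to be positive. The helper names are hypothetical, and the data is the bacteria set from the practice problems.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + b; returns (m, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, mean_y - m * mean_x


def fit_exponential(xs, ys):
    """Approximate fit of y = a * b**x via a line through (x, ln y); needs y > 0."""
    slope, intercept = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(intercept), math.exp(slope)


def sse(ys, predictions):
    """Sum of squared residuals (actual minus predicted)."""
    return sum((y - p) ** 2 for y, p in zip(ys, predictions))


xs = [0, 1, 2, 3]
ys = [100, 200, 400, 800]

m, b = fit_line(xs, ys)
a, base = fit_exponential(xs, ys)

linear_sse = sse(ys, [m * x + b for x in xs])
exp_sse = sse(ys, [a * base ** x for x in xs])

print("better model:", "exponential" if exp_sse < linear_sse else "linear")
```

For this data the exponential model's residuals are essentially zero while the linear model's are large, so the exponential fit wins.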
Common mistakes
Reading the data point when the question asks for the line
Predicted value = read from the LINE. Actual value = read from the DATA POINT. The residual is their difference. Misreading which one the question wants is the most common error.
Computing slope from two data points instead of two line points
Data points scatter around the line — the slope from two random data points isn't the slope of the line. Pick two points on the LINE itself for slope calculation.
Forgetting to check the sign of the slope first
If the line slopes down, the slope is negative. If you eliminate positive-slope answers first, you've usually narrowed it to one or two choices in seconds.
Confusing strong correlation with steep slope
Strong correlation = points tightly clustered around the line. Steep slope = line rises quickly. These are independent. A weak negative correlation can still have a steep line; a strong positive correlation can have a shallow line.
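A small numeric check makes the difference concrete. The sketch below computes the least-squares slope and the Pearson correlation coefficient r for two invented datasets; the function name `slope_and_r` and all data values are made up for illustration.

```python
import math


def slope_and_r(xs, ys):
    """Return (least-squares slope, Pearson correlation r) for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5, 6]

# Hypothetical data hugging a shallow line: strong correlation, small slope.
tight_shallow = [1.0, 1.3, 1.4, 1.7, 1.8, 2.1]

# Hypothetical data scattered around a steep line: weaker correlation, large slope.
loose_steep = [9, 2, 20, 8, 30, 15]

slope_tight, r_tight = slope_and_r(xs, tight_shallow)
slope_loose, r_loose = slope_and_r(xs, loose_steep)

print("tight, shallow:", round(slope_tight, 2), round(r_tight, 2))
print("loose, steep:  ", round(slope_loose, 2), round(r_loose, 2))
```

The first dataset has the smaller slope but the r value closer to 1, and the second has the larger slope but the weaker r, which is why steepness tells you nothing about strength.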
Practice problems
Six problems adapted from College Board released questions and internal Prepiii sets, each followed by its solution.
1. A scatterplot shows the actual y-value at x = 4 is 15. The line of best fit predicts a y-value of 13 at x = 4. What is the residual at x = 4?
- (A) −2
- (B) 2
- (C) 13
- (D) 15
Answer: (B) 2
Residual = actual − predicted = 15 − 13 = 2.
2. A scatterplot shows that as elevation increases, the high temperature tends to decrease across 8 mountain locations. Which statement best describes the association?
- (A) As elevation increases, temperature tends to increase.
- (B) As elevation increases, temperature tends to decrease.
- (C) As elevation decreases, temperature tends to decrease.
- (D) There is no association.
Answer: (B) As elevation increases, temperature tends to decrease.
The line of best fit slopes downward → negative association → as x (elevation) goes up, y (temperature) goes down. (C) has both quantities falling together, which describes a positive association and contradicts the trend.
3. A line of best fit has equation y = -4x + 100. At x = 10, the actual data point is at y = 65. What is the residual?
- (A) −5
- (B) 5
- (C) 60
- (D) 65
Answer: (B) 5
Predicted at x = 10: y = -4(10) + 100 = 60.
Residual = 65 − 60 = 5.
4. A scatterplot of bacteria population vs hours shows the data points: (0, 100), (1, 200), (2, 400), (3, 800). Which type of model best fits this data?
- (A) Linear, with slope 100
- (B) Linear, with slope 200
- (C) Exponential, with base 2
- (D) Exponential, with base 4
Answer: (C) Exponential, with base 2
Consecutive ratios: 200/100 = 2, 400/200 = 2, 800/400 = 2. Constant ratio → exponential. Base = 2 (population doubles each hour).
Linear would mean equal differences, not equal ratios. The differences here are 100, 200, 400 — not constant, so not linear.
5. The line of best fit for a dataset is y = 2.5x + 8. What does the y-intercept 8 represent if x is hours studied and y is exam score?
- (A) The score a student is predicted to get if they study 0 hours.
- (B) The number of additional points a student gets per hour of studying.
- (C) The maximum possible score on the exam.
- (D) The number of hours required to achieve a perfect score.
Answer: (A) The score a student is predicted to get if they study 0 hours.
Y-intercept = output when input is 0. At x = 0 (no studying), the predicted score is 8. (B) describes the slope (2.5 points per hour), not the y-intercept.
6. A scatterplot shows the cost (y, in dollars) of renting a venue based on the number of guests (x). The line of best fit passes through (20, 500) and (80, 2000). What is the predicted cost for 50 guests?
Answer: $1,250
Slope: (2000 - 500)/(80 - 20) = 1500/60 = 25 dollars per guest.
Using point-slope through (20, 500): y = 25(x - 20) + 500 = 25x - 500 + 500 = 25x.
At x = 50: y = 25(50) = 1250. Predicted cost: $1,250.
Frequently asked questions
What's the difference between a data point and a predicted value?
How do I find the residual on the SAT?
How do I tell if data is linear or exponential?
What does a strong correlation mean?
Do I need to calculate the equation of a line of best fit on the SAT?