You can examine the underlying ODS data object to better understand how PROC SGPLOT constructs the fit functions. The following step re-creates the plot and, this time, uses the ODS OUTPUT statement to save the data object in a SAS data set:
proc sgplot data=sashelp.gas;
   ods output sgplot=sg;
   pbspline y=nox x=eqratio / group=fuel smooth=0 nknots=5
            markerattrs=(size=3px) name='a';
   keylegend 'a' / location=inside position=topright across=1;
run;
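Before subsetting, it can help to see what the data object contains. The following step is a minimal sketch that assumes the preceding step ran and created the WORK.SG data set. It lists the variables in creation order, which should include the three manufactured PBSPLINE_ variables alongside Fuel, EqRatio, and NOx:

```sas
/* List the variables in the ODS output data set SG in creation order */
proc contents data=sg varnum;
run;
```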
The next steps create and display a subset of the data:
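data subset(drop=SORT_FUEL_RETAIN_ALL_);
   set sg;
   Obs = _n_;   /* keep the original observation number */
   by PBSPLINE_EQRATIO_NOX_GROUP_S__GP fuel;
   /* after observation 169, mark the scatter plot variables with underscores */
   if _N_ gt 169 then do; fuel = '_'; eqratio = ._; nox = ._; end;
   /* output the first and last observation in each group, plus observation 169 */
   if first.fuel or last.fuel or first.PBSPLINE_EQRATIO_NOX_GROUP_S__GP or
      last.PBSPLINE_EQRATIO_NOX_GROUP_S__GP or obs = 169 then output;
   /* right after the start of a group, output three missing rows as an ellipsis */
   if lag(first.fuel) or lag(first.PBSPLINE_EQRATIO_NOX_GROUP_S__GP) then do;
      call missing(of PBSPLI: Fuel EqRatio NOx obs);
      if _n_ gt 169 then do; fuel = '_'; eqratio = ._; nox = ._; end;
      output; output; output;
   end;
run;

proc print noobs; id obs; run;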
The results are shown in Output 24.6.13.
Output 24.6.13: Grouped Fit Function Data Object
| Obs | PBSPLINE_EQRATIO_NOX_GROUP_S___X | PBSPLINE_EQRATIO_NOX_GROUP_S___Y | PBSPLINE_EQRATIO_NOX_GROUP_S__GP | Fuel | EqRatio | NOx |
|---|---|---|---|---|---|---|
| 1 | 0.62500 | 0.7160 | 82rongas | 82rongas | 0.749 | 4.084 |
| . | . | . |  |  | . | . |
| . | . | . |  |  | . | . |
| . | . | . |  |  | . | . |
| 9 | 0.64692 | 22.8547 | 82rongas | 82rongas | 1.173 | 0.835 |
| 10 | 0.64966 | 24.2451 | 82rongas | 94%Eth | 0.993 | 2.593 |
| . | . | . |  |  | . | . |
| . | . | . |  |  | . | . |
| . | . | . |  |  | . | . |
| 34 | 0.71542 | 11.3943 | 82rongas | 94%Eth | 0.674 | 0.900 |
| 35 | 0.71816 | 10.4285 | 82rongas | Ethanol | 1.152 | 0.866 |
| . | . | . |  |  | . | . |
| . | . | . |  |  | . | . |
| . | . | . |  |  | . | . |
| 124 | 0.96202 | 5.0358 | 82rongas | Ethanol | 0.693 | 1.369 |
| 125 | 0.96476 | 4.9835 | 82rongas | Gasohol | 0.645 | 1.207 |
| . | . | . |  |  | . | . |
| . | . | . |  |  | . | . |
| . | . | . |  |  | . | . |
| 137 | 0.99764 | 4.1538 | 82rongas | Gasohol | 0.712 | 2.209 |
| 138 | 1.00038 | 4.0667 | 82rongas | Indolene | 1.224 | 0.537 |
| . | . | . |  |  | . | . |
| . | . | . |  |  | . | . |
| . | . | . |  |  | . | . |
| 159 | 1.05792 | 2.1655 | 82rongas | Indolene | 1.089 | 1.640 |
| 160 | 1.06066 | 2.0945 | 82rongas | Methanol | 0.598 | 0.204 |
| . | . | . |  |  | . | . |
| . | . | . |  |  | . | . |
| . | . | . |  |  | . | . |
| 169 | 1.08532 | 1.6622 | 82rongas | Methanol | 1.150 | 0.934 |
| 201 | 1.17300 | 0.8350 | 82rongas | _ | _ | _ |
| 202 | 0.67400 | 0.9081 | 94%Eth | _ | _ | _ |
| . | . | . |  | _ | _ | _ |
| . | . | . |  | _ | _ | _ |
| . | . | . |  | _ | _ | _ |
| 402 | 1.26700 | 0.4740 | 94%Eth | _ | _ | _ |
| 403 | 0.53500 | 0.4197 | Ethanol | _ | _ | _ |
| . | . | . |  | _ | _ | _ |
| . | . | . |  | _ | _ | _ |
| . | . | . |  | _ | _ | _ |
| 603 | 1.23200 | 0.6102 | Ethanol | _ | _ | _ |
| 604 | 0.64500 | 1.2262 | Gasohol | _ | _ | _ |
| . | . | . |  | _ | _ | _ |
| . | . | . |  | _ | _ | _ |
| . | . | . |  | _ | _ | _ |
| 804 | 1.12500 | 1.2454 | Gasohol | _ | _ | _ |
| 805 | 0.66500 | 1.5801 | Indolene | _ | _ | _ |
| . | . | . |  | _ | _ | _ |
| . | . | . |  | _ | _ | _ |
| . | . | . |  | _ | _ | _ |
| 1005 | 1.22400 | 0.5371 | Indolene | _ | _ | _ |
| 1006 | 0.59800 | 0.2068 | Methanol | _ | _ | _ |
| . | . | . |  | _ | _ | _ |
| . | . | . |  | _ | _ | _ |
| . | . | . |  | _ | _ | _ |
| 1206 | 1.21200 | 0.7228 | Methanol | _ | _ | _ |
The output displays the observations at the beginning and end of each group of data. Rows of missing values (periods) stand in, like ellipses, for the observations in between. The first 169 observations contain the scatter plot variables Fuel, EqRatio, and NOx. After that, underscores indicate that those values are ignored. In the actual data set, which is too large to print in this example, those values are missing, so observations 170 and beyond are excluded from the scatter plot. All observations contain interpolated coordinates for the six fit functions. The manufactured variable PBSPLINE_EQRATIO_NOX_GROUP_S__GP contains 201 copies of each of the six fuel values. The other manufactured variables, PBSPLINE_EQRATIO_NOX_GROUP_S___X and PBSPLINE_EQRATIO_NOX_GROUP_S___Y, provide the X and Y coordinates, respectively, for the curve for each fuel group.
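The structure described above can be checked against the full data object. The following steps are a sketch that assumes the SG data set from the earlier ODS OUTPUT statement still exists. The first step tabulates the group variable, which should show 201 observations for each of the six fuels. The second step counts nonmissing scatter plot values, which, given the description above, should be 169 for each variable:

```sas
/* Each fuel should appear 201 times in the manufactured group variable */
proc freq data=sg;
   tables PBSPLINE_EQRATIO_NOX_GROUP_S__GP;
run;

/* N counts the observations used by the scatter plot; NMISS counts the rest */
proc means data=sg n nmiss;
   var eqratio nox;
run;
```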
The results of the following step (which are not shown) show that 82rongas has only nine values:
proc freq data=sashelp.gas(where=(n(eqratio, nox) eq 2));
   tables fuel;
run;
Interpolation creates 201 interpolated values (200 line segments) that run from the minimum X value to the maximum X value in increments of (maximum - minimum) / 200. You can specify the MAXPOINTS= option in the REG and PBSPLINE statements to change the number of interpolated values. Interpolation enables splines that have too many knots and too few values, like the ones in Output 24.6.6 and Output 24.6.12, to vary substantially from the original data. The automatic smoothing in penalized B-splines often, but not always, prevents this variation. In most cases, but not all, it is helpful that ODS Graphics interpolates automatically. In the next section, you will see examples that do not use interpolation.
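The grid arithmetic can be reproduced in a short DATA step. This is a sketch for illustration, not a description of how ODS Graphics is implemented. The data set name GRID and the variables XMin, XMax, Step, and X are hypothetical, and the minimum and maximum (0.625 and 1.173) are the 82rongas X coordinates at observations 1 and 201 in Output 24.6.13. The second step reruns the plot with MAXPOINTS=51, an arbitrary value chosen only to show where the option goes:

```sas
/* Rebuild the 201-point interpolation grid for the 82rongas group */
data grid;
   XMin = 0.625;               /* X at Obs 1 in Output 24.6.13   */
   XMax = 1.173;               /* X at Obs 201 in Output 24.6.13 */
   Step = (XMax - XMin) / 200; /* 200 line segments              */
   do i = 0 to 200;            /* 201 grid points                */
      X = XMin + i * Step;
      output;
   end;
run;

/* Same plot, but with at most 51 interpolated points for each curve */
proc sgplot data=sashelp.gas;
   pbspline y=nox x=eqratio / group=fuel smooth=0 nknots=5 maxpoints=51
            markerattrs=(size=3px) name='a';
   keylegend 'a' / location=inside position=topright across=1;
run;
```

With these values, the step is 0.00274, so the ninth grid point (i = 8) is 0.625 + 8 × 0.00274 = 0.64692, which matches the X coordinate at observation 9 in Output 24.6.13.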