Statistical Graphics Using ODS

Interpolation

You can examine the underlying ODS data object to better understand how PROC SGPLOT constructs the fit functions. The following step creates the plot again and, this time, also writes the data object to an output data set named sg:

proc sgplot data=sashelp.gas;
   ods output sgplot=sg;
   pbspline y=nox x=eqratio / group=fuel smooth=0 nknots=5
                              markerattrs=(size=3px) name='a';
   keylegend 'a' / location=inside position=topright across=1;
run;
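Before subsetting, it can help to list the variables that the data object contains, because the manufactured names are long and are referenced verbatim in later steps. The following is a minimal sketch that assumes the sg data set from the preceding step exists; the manufactured variable names might differ in other SAS releases:

```sas
/* List the variables in the ODS output data set in creation order. */
/* Assumes the SG data set created by the preceding PROC SGPLOT step. */
proc contents data=sg varnum;
run;
```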

The next steps create and display a subset of the data:

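data subset(drop=SORT_FUEL_RETAIN_ALL_);
   set sg;
   Obs = _n_;   /* keep the original observation number */
   by PBSPLINE_EQRATIO_NOX_GROUP_S__GP fuel;
   /* Rows past 169 carry no scatter plot data; mark them with underscores */
   if _N_ gt 169 then do; fuel = '_'; eqratio = ._; nox = ._; end;
   /* Keep the first and last row of each BY group, plus row 169 */
   if first.fuel or last.fuel or first.PBSPLINE_EQRATIO_NOX_GROUP_S__GP or
      last.PBSPLINE_EQRATIO_NOX_GROUP_S__GP or obs = 169 then output;
   /* Right after the first row of a group, write three missing rows as an ellipsis */
   if lag(first.fuel) or lag(first.PBSPLINE_EQRATIO_NOX_GROUP_S__GP) then do;
      call missing(of PBSPLI: Fuel EqRatio NOx obs);
      if _n_ gt 169 then do; fuel = '_'; eqratio = ._; nox = ._; end;
      output; output; output;
   end;
run;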

proc print noobs; id obs; run;

The results are shown in Output 24.6.13.

Output 24.6.13: Grouped Fit Function Data Object

Obs   PBSPLINE_EQRATIO_NOX_GROUP_S___X  PBSPLINE_EQRATIO_NOX_GROUP_S___Y  PBSPLINE_EQRATIO_NOX_GROUP_S__GP  Fuel      EqRatio  NOx
1     0.62500                           0.7160                            82rongas                          82rongas  0.749    4.084
.     .                                 .                                                                             .        .
.     .                                 .                                                                             .        .
.     .                                 .                                                                             .        .
9     0.64692                           22.8547                           82rongas                          82rongas  1.173    0.835
10    0.64966                           24.2451                           82rongas                          94%Eth    0.993    2.593
.     .                                 .                                                                             .        .
.     .                                 .                                                                             .        .
.     .                                 .                                                                             .        .
34    0.71542                           11.3943                           82rongas                          94%Eth    0.674    0.900
35    0.71816                           10.4285                           82rongas                          Ethanol   1.152    0.866
.     .                                 .                                                                             .        .
.     .                                 .                                                                             .        .
.     .                                 .                                                                             .        .
124   0.96202                           5.0358                            82rongas                          Ethanol   0.693    1.369
125   0.96476                           4.9835                            82rongas                          Gasohol   0.645    1.207
.     .                                 .                                                                             .        .
.     .                                 .                                                                             .        .
.     .                                 .                                                                             .        .
137   0.99764                           4.1538                            82rongas                          Gasohol   0.712    2.209
138   1.00038                           4.0667                            82rongas                          Indolene  1.224    0.537
.     .                                 .                                                                             .        .
.     .                                 .                                                                             .        .
.     .                                 .                                                                             .        .
159   1.05792                           2.1655                            82rongas                          Indolene  1.089    1.640
160   1.06066                           2.0945                            82rongas                          Methanol  0.598    0.204
.     .                                 .                                                                             .        .
.     .                                 .                                                                             .        .
.     .                                 .                                                                             .        .
169   1.08532                           1.6622                            82rongas                          Methanol  1.150    0.934
201   1.17300                           0.8350                            82rongas                          _         _        _
202   0.67400                           0.9081                            94%Eth                            _         _        _
.     .                                 .                                                                   _         _        _
.     .                                 .                                                                   _         _        _
.     .                                 .                                                                   _         _        _
402   1.26700                           0.4740                            94%Eth                            _         _        _
403   0.53500                           0.4197                            Ethanol                           _         _        _
.     .                                 .                                                                   _         _        _
.     .                                 .                                                                   _         _        _
.     .                                 .                                                                   _         _        _
603   1.23200                           0.6102                            Ethanol                           _         _        _
604   0.64500                           1.2262                            Gasohol                           _         _        _
.     .                                 .                                                                   _         _        _
.     .                                 .                                                                   _         _        _
.     .                                 .                                                                   _         _        _
804   1.12500                           1.2454                            Gasohol                           _         _        _
805   0.66500                           1.5801                            Indolene                          _         _        _
.     .                                 .                                                                   _         _        _
.     .                                 .                                                                   _         _        _
.     .                                 .                                                                   _         _        _
1005  1.22400                           0.5371                            Indolene                          _         _        _
1006  0.59800                           0.2068                            Methanol                          _         _        _
.     .                                 .                                                                   _         _        _
.     .                                 .                                                                   _         _        _
.     .                                 .                                                                   _         _        _
1206  1.21200                           0.7228                            Methanol                          _         _        _


Only the observations at the beginning and end of each data group are displayed, along with observation 169. Rows of missing values serve as ellipses that stand in for the omitted observations. The first 169 observations contain the scatter plot variables Fuel, EqRatio, and NOx. After that, underscores indicate that those values are ignored. In the actual data set, which is too large to print in this example, observations 170 and beyond are excluded from the scatter plot because these variables are missing.

All 1,206 observations (6 groups × 201 points) contain interpolated coordinates for the six fit functions. The manufactured variable PBSPLINE_EQRATIO_NOX_GROUP_S__GP contains 201 copies of each of the six fuel values. The other manufactured variables, PBSPLINE_EQRATIO_NOX_GROUP_S___X and PBSPLINE_EQRATIO_NOX_GROUP_S___Y, provide the X and Y coordinates, respectively, of the curve for each fuel group.
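One way to check these counts yourself is to tabulate the group variable in the data object. The following is a minimal sketch that assumes the sg data set from the earlier PROC SGPLOT step still exists and that the manufactured variable names match the ones shown in Output 24.6.13 (they might differ in other SAS releases):

```sas
/* Each of the six fuel values should appear 201 times in the group variable. */
/* Assumes the SG data set created by the earlier PROC SGPLOT step.           */
proc freq data=sg;
   tables PBSPLINE_EQRATIO_NOX_GROUP_S__GP / nocum nopercent;
run;

/* Count nonmissing and missing values of the curve and scatter plot variables. */
proc means data=sg n nmiss;
   var PBSPLINE_EQRATIO_NOX_GROUP_S___X PBSPLINE_EQRATIO_NOX_GROUP_S___Y
       EqRatio NOx;
run;
```

If the data object is laid out as described, the frequency table shows 201 for each fuel, and the N column for the two curve coordinates shows 1,206.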

The following step, whose results are not shown, confirms that 82rongas has only nine observations with nonmissing EqRatio and NOx values:

proc freq data=sashelp.gas(where=(n(eqratio, nox) eq 2));
   tables fuel;
run;

Interpolation creates 201 values (200 line segments) for each group. The values run from the group's minimum X value to its maximum in increments of (maximum - minimum) / 200. You can specify the MAXPOINTS= option in the REG and PBSPLINE statements to change the number of interpolated values. Interpolation allows splines like the ones in Output 24.6.6 and Output 24.6.12 (splines that have too many knots and too few values) to vary substantially from the original data. The automatic smoothing in penalized B-splines often, but not always, prevents this variation. In most cases it is helpful that ODS Graphics interpolates automatically, but there are exceptions. In the next section, you will see examples that do not use interpolation.
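The grid arithmetic can be reproduced in a short DATA step. The following sketch rebuilds the X grid for the 82rongas curve by using the endpoints read from Output 24.6.13 (0.625 at Obs 1 and 1.173 at Obs 201). The data set name grid and the variable names xmin, xmax, step, and x are hypothetical and are not part of the ODS data object. The second step shows the MAXPOINTS= option; the value 51 is an arbitrary choice for illustration.

```sas
/* Sketch: rebuild the 201-point X grid for the 82rongas curve.     */
/* xmin and xmax are read from Output 24.6.13 (Obs 1 and Obs 201);  */
/* the data set and variable names here are hypothetical.           */
data grid;
   xmin = 0.625;
   xmax = 1.173;
   step = (xmax - xmin) / 200;      /* 200 equal line segments */
   do Obs = 1 to 201;               /* 201 grid points         */
      x = xmin + (Obs - 1) * step;
      output;
   end;
   format x 7.5;
run;

/* Show a few grid points for comparison with the data object. */
proc print data=grid(where=(Obs in (1, 9, 10, 201))) noobs;
   var Obs x;
run;

/* MAXPOINTS= changes the number of interpolated points per curve. */
proc sgplot data=sashelp.gas;
   pbspline y=nox x=eqratio / group=fuel smooth=0 nknots=5 maxpoints=51;
run;
```

The step size is 0.548 / 200 = 0.00274, so the printed grid values are 0.62500, 0.64692, 0.64966, and 1.17300. These agree with the X coordinates at observations 1, 9, 10, and 201 in Output 24.6.13.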

Last updated: December 09, 2022