Appendix B: Sashelp Data Sets
The Sashelp.JunkMail data set comes from a study that classifies whether an email is junk email (coded as 1) or not (coded as 0). The data were collected at Hewlett-Packard Labs and donated by George Forman. The data set contains 4,601 observations and 59 variables. The response variable, Class, is a binary indicator of whether an email is considered junk (1) or not (0). The variable Test flags each observation as training (0) or test (1). The remaining 57 variables are predictors that record the frequencies of some common words and characters and the lengths of uninterrupted sequences of capital letters in the emails. The following steps display information about the Sashelp.JunkMail data set and create Figure 56:
title 'Junk Email Data';
proc contents data=sashelp.JunkMail varnum;
ods select position;
run;
title 'The First Five Observations Out of 4,601';
proc print data=sashelp.JunkMail(obs=5);
run;
Figure 56: Junk Email Data
| Variable | Type | Len | Label |
|---|---|---|---|
| Test | Num | 8 | 0 - Training, 1 - Test |
| Make | Num | 8 | |
| Address | Num | 8 | |
| All | Num | 8 | |
| _3D | Num | 8 | 3D |
| Our | Num | 8 | |
| Over | Num | 8 | |
| Remove | Num | 8 | |
| Internet | Num | 8 | |
| Order | Num | 8 | |
| Mail | Num | 8 | |
| Receive | Num | 8 | |
| Will | Num | 8 | |
| People | Num | 8 | |
| Report | Num | 8 | |
| Addresses | Num | 8 | |
| Free | Num | 8 | |
| Business | Num | 8 | |
| Email | Num | 8 | |
| You | Num | 8 | |
| Credit | Num | 8 | |
| Your | Num | 8 | |
| Font | Num | 8 | |
| _000 | Num | 8 | 000 |
| Money | Num | 8 | |
| HP | Num | 8 | |
| HPL | Num | 8 | |
| George | Num | 8 | |
| _650 | Num | 8 | 650 |
| Lab | Num | 8 | |
| Labs | Num | 8 | |
| Telnet | Num | 8 | |
| _857 | Num | 8 | 857 |
| Data | Num | 8 | |
| _415 | Num | 8 | 415 |
| _85 | Num | 8 | 85 |
| Technology | Num | 8 | |
| _1999 | Num | 8 | 1999 |
| Parts | Num | 8 | |
| PM | Num | 8 | |
| Direct | Num | 8 | |
| CS | Num | 8 | |
| Meeting | Num | 8 | |
| Original | Num | 8 | |
| Project | Num | 8 | |
| RE | Num | 8 | |
| Edu | Num | 8 | |
| Table | Num | 8 | |
| Conference | Num | 8 | |
| Semicolon | Num | 8 | |
| Paren | Num | 8 | |
| Bracket | Num | 8 | |
| Exclamation | Num | 8 | |
| Dollar | Num | 8 | |
| Pound | Num | 8 | |
| CapAvg | Num | 8 | Capital Run Length Average |
| CapLong | Num | 8 | Capital Run Length Longest |
| CapTotal | Num | 8 | Capital Run Length Total |
| Class | Num | 8 | 0 - Not Junk, 1 - Junk |

The First Five Observations Out of 4,601

| Variable | Obs 1 | Obs 2 | Obs 3 | Obs 4 | Obs 5 |
|---|---|---|---|---|---|
| Test | 1 | 0 | 1 | 0 | 0 |
| Make | 0.00 | 0.21 | 0.06 | 0.00 | 0.00 |
| Address | 0.64 | 0.28 | 0.00 | 0.00 | 0.00 |
| All | 0.64 | 0.50 | 0.71 | 0.00 | 0.00 |
| _3D | 0 | 0 | 0 | 0 | 0 |
| Our | 0.32 | 0.14 | 1.23 | 0.63 | 0.63 |
| Over | 0.00 | 0.28 | 0.19 | 0.00 | 0.00 |
| Remove | 0.00 | 0.21 | 0.19 | 0.31 | 0.31 |
| Internet | 0.00 | 0.07 | 0.12 | 0.63 | 0.63 |
| Order | 0.00 | 0.00 | 0.64 | 0.31 | 0.31 |
| Mail | 0.00 | 0.94 | 0.25 | 0.63 | 0.63 |
| Receive | 0.00 | 0.21 | 0.38 | 0.31 | 0.31 |
| Will | 0.64 | 0.79 | 0.45 | 0.31 | 0.31 |
| People | 0.00 | 0.65 | 0.12 | 0.31 | 0.31 |
| Report | 0.00 | 0.21 | 0.00 | 0.00 | 0.00 |
| Addresses | 0.00 | 0.14 | 1.75 | 0.00 | 0.00 |
| Free | 0.32 | 0.14 | 0.06 | 0.31 | 0.31 |
| Business | 0.00 | 0.07 | 0.06 | 0.00 | 0.00 |
| Email | 1.29 | 0.28 | 1.03 | 0.00 | 0.00 |
| You | 1.93 | 3.47 | 1.36 | 3.18 | 3.18 |
| Credit | 0.00 | 0.00 | 0.32 | 0.00 | 0.00 |
| Your | 0.96 | 1.59 | 0.51 | 0.31 | 0.31 |
| Font | 0 | 0 | 0 | 0 | 0 |
| _000 | 0.00 | 0.43 | 1.16 | 0.00 | 0.00 |
| Money | 0.00 | 0.43 | 0.06 | 0.00 | 0.00 |
| HP | 0 | 0 | 0 | 0 | 0 |
| HPL | 0 | 0 | 0 | 0 | 0 |
| George | 0 | 0 | 0 | 0 | 0 |
| _650 | 0 | 0 | 0 | 0 | 0 |
| Lab | 0 | 0 | 0 | 0 | 0 |
| Labs | 0 | 0 | 0 | 0 | 0 |
| Telnet | 0 | 0 | 0 | 0 | 0 |
| _857 | 0 | 0 | 0 | 0 | 0 |
| Data | 0 | 0 | 0 | 0 | 0 |
| _415 | 0 | 0 | 0 | 0 | 0 |
| _85 | 0 | 0 | 0 | 0 | 0 |
| Technology | 0 | 0 | 0 | 0 | 0 |
| _1999 | 0.00 | 0.07 | 0.00 | 0.00 | 0.00 |
| Parts | 0 | 0 | 0 | 0 | 0 |
| PM | 0 | 0 | 0 | 0 | 0 |
| Direct | 0.00 | 0.00 | 0.06 | 0.00 | 0.00 |
| CS | 0 | 0 | 0 | 0 | 0 |
| Meeting | 0 | 0 | 0 | 0 | 0 |
| Original | 0.00 | 0.00 | 0.12 | 0.00 | 0.00 |
| Project | 0 | 0 | 0 | 0 | 0 |
| RE | 0.00 | 0.00 | 0.06 | 0.00 | 0.00 |
| Edu | 0.00 | 0.00 | 0.06 | 0.00 | 0.00 |
| Table | 0 | 0 | 0 | 0 | 0 |
| Conference | 0 | 0 | 0 | 0 | 0 |
| Semicolon | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 |
| Paren | 0.000 | 0.132 | 0.143 | 0.137 | 0.135 |
| Bracket | 0 | 0 | 0 | 0 | 0 |
| Exclamation | 0.778 | 0.372 | 0.276 | 0.137 | 0.135 |
| Dollar | 0.000 | 0.180 | 0.184 | 0.000 | 0.000 |
| Pound | 0.000 | 0.048 | 0.010 | 0.000 | 0.000 |
| CapAvg | 3.756 | 5.114 | 9.821 | 3.537 | 3.537 |
| CapLong | 61 | 101 | 485 | 40 | 40 |
| CapTotal | 278 | 1028 | 2259 | 191 | 191 |
| Class | 1 | 1 | 1 | 1 | 1 |
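
The Test and Class variables listed in Figure 56 suggest a natural first look at the data: how many junk and non-junk emails fall into the training and test roles. The following steps are a minimal sketch of that tabulation, followed by one possible way to split the data by role. The output data set names work.JunkTrain and work.JunkTest are illustrative assumptions and are not part of the Sashelp library:

```sas
/* Cross-tabulate the role flag against the response.        */
/* Test:  0 = training, 1 = test   (labels from Figure 56)   */
/* Class: 0 = not junk, 1 = junk                             */
title 'Junk Email Counts by Training/Test Role';
proc freq data=sashelp.JunkMail;
   tables Test*Class / nocol nopercent;
run;

/* Split the observations by role.                           */
/* work.JunkTrain and work.JunkTest are hypothetical names.  */
data work.JunkTrain work.JunkTest;
   set sashelp.JunkMail;
   if Test = 0 then output work.JunkTrain;
   else output work.JunkTest;
run;
title;
```

Writing both subsets in a single DATA step reads Sashelp.JunkMail only once; a WHERE= data set option on each modeling step would be an alternative that avoids creating copies.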
Last updated: December 09, 2022