The fractional hot-deck imputation (FHDI) method uses multiple donor units for a recipient unit. Each donor donates a fraction of the original weight of the recipient unit such that the sum of the fractional weights from all the donors is equal to the original weight of the recipient. The fraction of the recipient weight that a donor unit contributes to the recipient unit is known as the fractional weight. The donors are selected by using probability proportional to size (PPS) selection in which the two-stage FEFI weights are used as the size measure.
FHDI is useful for reducing the size of the imputed data when two-stage FEFI creates many imputed rows. FHDI follows the same imputation steps as those of two-stage FEFI, but FHDI selects a subset of second-stage donor cells from all possible second-stage donor cells for the imputation.
Similar to two-stage FEFI, variables that have many unique observed levels are grouped into imputation bins. The first imputation stage is performed for all categorical variables by using the FEFI method. The categorical variables include the character variables, the CLASS variables that you also specify in the VAR statement, and the variables that contain the imputation bins of the continuous variables.
The second imputation stage is performed for the continuous variables within each first-stage donor cell. Observations that contain missing values for any of the continuous items are considered to be the recipients, and observations that contain observed values for all items are the donors. The second-stage donor cells are defined by the unique vectors of the observed values for the continuous variables within the first-stage donor cells.
If you specify NDONORS=$n$, then $n$ second-stage donor cells are selected within each first-stage donor cell, provided that more than $n$ second-stage donor cells are available for the continuous variables in that donor cell. Second-stage donor cells are selected within each first-stage donor cell by using PPS selection with replacement, where the fractional weights of the second-stage donor cells are used as the size measures. If the number of second-stage donor cells in a first-stage donor cell is less than or equal to $n$, then all available second-stage donor cells are used to impute the missing values for the continuous variables in that first-stage donor cell.
If no second-stage selection is performed, then the second-stage fractional weights are the same as the fractional weights from two-stage FEFI. If a second-stage selection is performed, then the second-stage fractional weights are computed by multiplying the first-stage fractional weights by the number of times a second-stage donor cell is selected divided by the second-stage sample size.
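The weight rule in the preceding paragraph can be illustrated with a minimal sketch. The function name, the donor cell labels, and the first-stage weight of 0.6 are hypothetical and are not part of PROC SURVEYIMPUTE; the sketch covers only the case in which a second-stage selection is performed.

```python
def fhdi_second_stage_weights(first_stage_weight, selection_counts):
    """Fractional weights after selection within one first-stage donor cell.

    selection_counts maps each selected second-stage donor cell to the number
    of times it was drawn; the counts sum to the second-stage sample size.
    """
    n_draws = sum(selection_counts.values())
    return {cell: first_stage_weight * hits / n_draws
            for cell, hits in selection_counts.items()}


# Hypothetical first-stage cell with fractional weight 0.6 and three draws,
# in which donor cell "b" was drawn twice.
selected_weights = fhdi_second_stage_weights(0.6, {"a": 1, "b": 2})
```

Because the hit counts sum to the sample size, the selected donor cells together carry exactly the first-stage fractional weight of their cell.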
Imputation-adjusted replicate weights are computed by repeating both the first-stage and second-stage imputation in every replicate sample independently.
Selection of donor cells is not repeated in the creation of replicate samples. The donor cells that are selected in the full sample are retained in all replicate samples, and the replicate weights are adjusted to compensate for the selection of donor cells in both stages. If no selection is performed in either stage in the full sample (FEFI is used in both stages), then the replicate weights are not adjusted further.
The method is similar to Im, Kim, and Fuller (2015).
For more information about replication-weight adjustment, see the section Replicate Weight Adjustments for FHDI.
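Suppose you want to impute P items jointly. Let $\mathbf{x}_i = (x_{i1}, \ldots, x_{iP_1})$ be the response for $P_1$ items in unit i, and let $\mathbf{y}_i = (y_{i1}, \ldots, y_{iP_2})$ be the response for $P_2$ items in unit i, where $P_1 + P_2 = P$. Let $x_{ij}$ be categorical with $K_j$ levels for item j, and let $y_{ij}$ be continuous. Further assume that $c_{ij}$ contains the discretized levels (imputation bins) for $y_{ij}$, where $c_{ij}$ has $G_j$ levels. Define $\mathbf{z}_i = (\mathbf{x}_i, \mathbf{c}_i) = (z_{i1}, \ldots, z_{iP})$. Then $z_{ij}$ is categorical and has $M_j$ levels for item j (where $M_j = K_j$ for $j \le P_1$ and $M_j = G_{j - P_1}$ otherwise). Denote $\mathbf{z}_{i,\mathrm{obs}}$ as the observed part and $\mathbf{z}_{i,\mathrm{mis}}$ as the missing part of $\mathbf{z}_i$.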
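Let $\pi_{z_1 \cdots z_P}$ be the population proportion that falls in category $(z_1, \ldots, z_P)$. FEFI computes the fractional weights by using conditional probabilities of observing an imputed value in the data in which the observed levels for the nonmissing items in the recipient unit are equal to the observed levels for the same items in the donor units. For example, consider observation unit i in which items $z_{i1}$ and $z_{i2}$ are missing and items $z_{i3}$, $z_{i4}$, $\ldots$, and $z_{iP}$ are observed. The initial fractional weight for imputing $(z_{i1}, z_{i2})$ with $(z_1^*, z_2^*)$ is computed by using the estimated conditional probability of observing $(z_1, z_2) = (z_1^*, z_2^*)$ when $(z_3, \ldots, z_P) = (z_{i3}, \ldots, z_{iP})$ in the complete data. The initial conditional probabilities are estimated by

$$\hat{\pi}_{z_1^* z_2^* \mid z_{i3} \cdots z_{iP}} = \frac{\hat{\pi}_{z_1^* z_2^* z_{i3} \cdots z_{iP}}}{\sum_{z_1} \sum_{z_2} \hat{\pi}_{z_1 z_2 z_{i3} \cdots z_{iP}}}, \qquad \hat{\pi}_{z_1 \cdots z_P} = \frac{\sum_{k \in A_R} w_k \, I(z_{k1} = z_1, \ldots, z_{kP} = z_P)}{\sum_{k \in A_R} w_k}$$

where $\hat{\pi}_{z_1 \cdots z_P}$ is the estimated joint probability, $I(\cdot)$ is an indicator function, $w_k$ is the sampling weight for observation unit k, and $A_R$ is the set of indices for observation units without any missing items.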
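As a rough illustration of this estimate, the following sketch computes weighted conditional probabilities from complete cases only. The function name, the item names z1 to z3, and the four data rows are hypothetical. The total weight of the complete cases cancels in the ratio, so the sketch normalizes within the matching rows directly.

```python
def conditional_probs(complete_rows, weights, observed):
    """Estimate P(missing items | observed items) from complete cases.

    complete_rows: list of dicts that map item name -> categorical level
    weights:       sampling weights, one per complete row
    observed:      the recipient's observed items and their levels
    """
    mass = {}
    for row, w in zip(complete_rows, weights):
        # Keep only complete cases that agree with the recipient on its observed items.
        if all(row[item] == level for item, level in observed.items()):
            cell = tuple(level for item, level in sorted(row.items())
                         if item not in observed)
            mass[cell] = mass.get(cell, 0.0) + w
    total = sum(mass.values())
    # With no matching complete case there is no realization to impute from.
    return {cell: m / total for cell, m in mass.items()} if total else {}


rows = [
    {"z1": 0, "z2": 0, "z3": 1},
    {"z1": 0, "z2": 1, "z3": 1},
    {"z1": 0, "z2": 1, "z3": 1},
    {"z1": 1, "z2": 0, "z3": 0},
]
probs = conditional_probs(rows, [1.0, 1.0, 2.0, 1.0], observed={"z3": 1})
```

Each key of `probs` is one candidate realization of the missing items (z1, z2), and its value is the initial fractional weight that the realization would receive.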
The first-stage FEFI replaces the missing items in $\mathbf{z}_i$ by using observed values from all donor cells for observation unit i. The $l$th imputed value $\mathbf{z}_i^{*(l)}$ uses the observed values from the $l$th donor cell (realization), where $l$ ranges from 1 to $L_i$, the number of first-stage donor cells for unit i.
The second-stage FEFI imputes the missing values in the continuous variables $y_{ij}$, where $j = 1, \ldots, P_2$. The second-stage imputation is conditional on the imputed levels from the first-stage imputation. For observation unit i and first-stage donor cell $l$, the number of second-stage donor cells is equal to the number of unique combinations of the observed levels for the missing items in $\mathbf{y}_i$ among the responding units in the first-stage donor cell $l$. Unlike the first-stage imputation, the second-stage imputation does not depend on the observed levels for the nonmissing items in $\mathbf{y}_i$.
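The second-stage fractional weights are computed by using the estimated probability of observing an imputed value for continuous variables conditional on the first-stage imputed level. For example, consider observation unit i in which items $y_{i1}$ and $y_{i2}$ are missing. Assume that the $l$th first-stage donor cell containing observation units that have $z_1 = z_{i1}^{*(l)}$, $z_2 = z_{i2}^{*(l)}$, $\ldots$, and $z_P = z_{iP}^{*(l)}$ provides the $l$th imputed value for the first-stage imputation for observation unit i. The second-stage fractional weight for imputing $(y_{i1}, y_{i2})$ with $(y_1^*, y_2^*)$ in the first-stage donor cell $l$ is computed by using the estimated conditional probability of observing $(y_1, y_2) = (y_1^*, y_2^*)$ by using the nonmissing units in the $l$th first-stage donor cell.
The conditional probabilities are estimated by

$$\hat{\pi}_{y_1^* y_2^* \mid l} = \frac{\sum_{k \in A_R} w_k \, I(y_{k1} = y_1^*,\ y_{k2} = y_2^*,\ \mathbf{z}_k = \mathbf{z}_i^{*(l)})}{\sum_{k \in A_R} w_k \, I(\mathbf{z}_k = \mathbf{z}_i^{*(l)})}$$

where $w_k$ is the sampling weight for observation unit k and $A_R$ is the set of indices for observation units without any missing items.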
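The second-stage FEFI replaces the missing values in $\mathbf{y}_i$ when $\mathbf{z}_i = \mathbf{z}_i^{*(l)}$ by using all observed values for $\mathbf{y}$ from observation rows that contain $\mathbf{z} = \mathbf{z}_i^{*(l)}$. Each second-stage imputed value (configuration) defines a second-stage donor cell. The $m$th imputed value in the $l$th first-stage donor cell is $\mathbf{y}_i^{*(lm)}$, where $m$ ranges from 1 to $D_{il}$ (the number of second-stage donor cells), and the $l$th first-stage donor cell is defined by observation units that contain $z_1 = z_{i1}^{*(l)}$, …, $z_P = z_{iP}^{*(l)}$.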
The two-stage imputation-adjusted fractional weights are computed by multiplying the first-stage fractional weights by the second-stage fractional weights. The sampling weight for observation unit i is denoted by $w_i$. Let $w^*_{il}$ be the first-stage fractional weight for donor cell l, and let $w^{*(t)}_{il}$ be the first-stage fractional weight for donor cell l at the tth EM iteration. Let $w^*_{i,m \mid l}$ be the second-stage fractional weight, and let $w^*_{ilm} = w^*_{il} \, w^*_{i,m \mid l}$ be the two-stage fractional weight for first-stage donor cell $l$ and second-stage donor cell $m$. Thus, the imputation-adjusted weight for observation unit i, first-stage donor cell $l$, and second-stage donor cell $m$ is $w_i \, w^*_{ilm}$.
Replicate weights are adjusted for imputation by applying both the first-stage and the second-stage imputation in every replicate sample. Let $w_i^{(r)}$ be the unadjusted replicate weight for observation unit i in replicate sample r. The unadjusted replicate weights are created by using a replication method such as the bootstrap, BRR, or delete-1 jackknife. For more information about how the unadjusted replicate weights are created, see sections Unadjusted Bootstrap Replicate Weights, Unadjusted BRR Replicate Weights, and Unadjusted Jackknife Replicate Weights. Let $w^{*(r)}_{il}$ be the first-stage fractional replicate weight, $w^{*(r)}_{i,m \mid l}$ be the second-stage fractional replicate weight, and $w^{*(r)}_{ilm} = w^{*(r)}_{il} \, w^{*(r)}_{i,m \mid l}$ be the two-stage fractional replicate weight for replicate sample r, observation unit i, first-stage donor cell $l$, and second-stage donor cell $m$. Thus, the imputation-adjusted replicate weight for observation unit i, first-stage donor cell $l$, second-stage donor cell $m$, and replicate sample r is $w_i^{(r)} \, w^{*(r)}_{ilm}$.
The following assumptions are necessary:
- The conditional probability of observing an imputed value in the missing data is the same as the conditional probability of observing the value in the observed data in each imputation cell. For example, the conditional probability, $\pi_{z_1 z_2 \mid z_3 \cdots z_P}$, is the same for the observed data as it is for the data where $z_{i1}$ and $z_{i2}$ are missing.
- The conditional probability of observing a second-stage imputed value in the missing data is the same as the conditional probability of observing the value in the observed data in every first-stage donor cell. For example, the conditional probability in the first-stage donor cell $l$, $\pi_{y_1 y_2 \mid l}$, is the same for the observed data as it is for the data in which $y_{i1}$ and $y_{i2}$ are missing and $\mathbf{z}_i = \mathbf{z}_i^{*(l)}$.
- For every observation unit that contains missing items, at least one realization for the missing items is available in the complete data; otherwise the observation is not imputed.
- For variance estimation, at least two realizations for the missing items from two different PSUs must be available in the complete data for every observation unit that contains missing items. If the condition is not satisfied, the variance due to imputing the missing items in that observation is ignored.
The FHDI method first computes the fully efficient fractional weights by using an EM-by-weighting algorithm like that of Kim and Fuller (2013) to impute the missing values in $\mathbf{z}_i$. The missing values in $\mathbf{y}_i$ are imputed in the second-stage imputation. For the second-stage imputation, FEFI weights to impute $\mathbf{y}_i$ are computed independently in every imputed level of $\mathbf{z}_i^{*(l)}$, $l = 1, \ldots, L_i$, where $L_i$ is the number of first-stage donor cells. If the number of second-stage donor cells is greater than $n$ (where $n$ is the value of the NDONORS= option in the PROC SURVEYIMPUTE statement), then $n$ second-stage donor cells are selected by using PPS sampling in which the second-stage fractional weights are used as the size measure.
The following steps describe the FHDI technique. If you do not use the CELL statement to specify imputation cells, PROC SURVEYIMPUTE uses the entire data set as one imputation cell. If you specify imputation cells, then all the probabilities are computed by using observations from the same imputation cell as the recipient unit. To simplify notation, subscripts are not used for imputation cells in the following description. Imputation cells are defined for the first-stage imputation. Steps 1 to 5 describe the two-stage FEFI. Step 6 describes donor selection for FHDI.
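Step 1. Initialization: For each observation that has missing items, determine the number of first-stage donor cells. The first-stage donor cells are determined by using the number of unique combinations of observed levels in $\mathbf{z}$ for imputing the missing items in $\mathbf{z}_i$. Only the responding units in the imputation cell are used to determine the number of first-stage donor cells. Compute the initial fractional weight from donor cell l to unit i, $w^{*(0)}_{il}$:

$$w^{*(0)}_{il} = \frac{\hat{\pi}^{(0)}_{\mathbf{z}_i^{*(l)}}}{\sum_{l'=1}^{L_i} \hat{\pi}^{(0)}_{\mathbf{z}_i^{*(l')}}}$$

where $L_i$ is the number of first-stage donor cells, $\mathbf{z}_i^{*(l)}$ is the $l$th imputed value of $\mathbf{z}_i$, and

$$\hat{\pi}^{(0)}_{\mathbf{z}} = \frac{\sum_{k \in A_R} w_k \, I(\mathbf{z}_k = \mathbf{z})}{\sum_{k \in A_R} w_k}$$

Here $A_R$ is the set of indices for observation units without any missing items and $w_k$ is the sampling weight for unit k. The sum of the fractional weights over all the donor cells is 1 for every observation unit; that is, $\sum_{l=1}^{L_i} w^{*(0)}_{il} = 1$ for all i. The lth imputed row for unit i is created by keeping the observed items unchanged, replacing the missing items with the observed levels from the lth donor cell, and computing the fractional weight by $w^{*(0)}_{il}$. Only the complete observations (observations that have no missing items) are used to compute the fractional weights in this step. If unit i has no missing items, then $L_i = 1$ and $w^{*(0)}_{i1} = 1$. The initial FEFI data set contains all the observed units, the imputed rows for observations that have missing items, and the corresponding fractional weights.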
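Step 2. M-step: The tth maximization step (M-step) computes the joint probabilities by using the fractional weights from the (t–1)th expectation-step,

$$\hat{\pi}^{(t)}_{\mathbf{z}_i^{*(l)}} = \frac{\sum_{k \in A} \sum_{l'=1}^{L_k} w_k \, w^{*(t-1)}_{kl'} \, I(\mathbf{z}_k^{*(l')} = \mathbf{z}_i^{*(l)})}{\sum_{k \in A} w_k}$$

for all i, all l, and $t = 1, 2, \ldots$, where $A$ is the set of indices for all observation units. Note that for $t \ge 1$, $\hat{\pi}^{(t)}$ uses all observation units, including observations that have missing items and are imputed in the initialization step.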
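Step 3. E-step: The tth expectation step (E-step) computes the fractional weights by using the joint probabilities from the tth M-step. The tth fractional weight for unit i and donor cell l is given by

$$w^{*(t)}_{il} = \frac{\hat{\pi}^{(t)}_{\mathbf{z}_i^{*(l)}}}{\sum_{l'=1}^{L_i} \hat{\pi}^{(t)}_{\mathbf{z}_i^{*(l')}}}$$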
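Step 4. Repetition: The expectation maximization steps (EM-steps, steps 2 and 3) are repeated for $t = 1, 2, \ldots$ until the changes in fractional weights over all observation units between two successive EM-steps are negligible or the maximum number of EM repetitions is reached.
The maximum absolute difference convergence criterion, $\epsilon^{(t)}_{\mathrm{abs}}$, at step t is defined as

$$\epsilon^{(t)}_{\mathrm{abs}} = \max_{i,\,l} \left| w^{*(t)}_{il} - w^{*(t-1)}_{il} \right|$$

The maximum absolute relative difference convergence criterion, $\epsilon^{(t)}_{\mathrm{rel}}$, at step t is defined as

$$\epsilon^{(t)}_{\mathrm{rel}} = \max_{i,\,l} \frac{\left| w^{*(t)}_{il} - w^{*(t-1)}_{il} \right|}{w^{*(t-1)}_{il}}$$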
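The initialization and EM steps can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not the procedure's implementation: it uses the imputation bins (CX, CY) of the 18 units in Figure 15 with unit sampling weights, treats the whole data set as one imputation cell, and stops on the maximum absolute difference in the fractional weights. The function names, the iteration limit, and the tolerance are hypothetical.

```python
# Bins (CX, CY) for units 1 to 18 of Figure 15; None marks a missing bin.
bins = ([(0, 0)] * 4 + [(0, None)] + [(0, 1)] * 3 + [(1, 0)] * 4
        + [(1, 1), (None, 1), (1, 1), (1, 1), (1, 1), (None, None)])
sampling_weights = [1.0] * len(bins)


def matching_cells(row, cells):
    """Cells seen among complete cases that agree with the row's observed bins."""
    return [c for c in cells if all(r is None or r == v for r, v in zip(row, c))]


def em_fractional_weights(bins, weights, max_iter=500, tol=1e-10):
    # Initialization: joint probabilities from complete cases only.
    complete = [(b, w) for b, w in zip(bins, weights) if None not in b]
    total_complete = sum(w for _, w in complete)
    pi = {}
    for b, w in complete:
        pi[b] = pi.get(b, 0.0) + w / total_complete
    cells = sorted(pi)
    total = sum(weights)
    frac = None
    for _ in range(max_iter):
        # E-step: fractional weights proportional to the current joint probabilities.
        new_frac = []
        for row in bins:
            cand = matching_cells(row, cells)
            denom = sum(pi[c] for c in cand)
            new_frac.append({c: pi[c] / denom for c in cand})
        # M-step: re-estimate the joint probabilities from all rows, imputed rows included.
        pi = dict.fromkeys(cells, 0.0)
        for w, row_frac in zip(weights, new_frac):
            for c, f in row_frac.items():
                pi[c] += w * f / total
        # Maximum absolute difference between successive fractional weights.
        converged = frac is not None and max(
            abs(new_frac[i][c] - frac[i][c])
            for i in range(len(bins)) for c in new_frac[i]) < tol
        frac = new_frac
        if converged:
            break
    return frac


frac = em_fractional_weights(bins, sampling_weights)
unit5, unit18 = frac[4], frac[17]
```

In this sketch, `unit5[(0, 0)]` converges to about 0.536, which agrees with the sum of the three FracWt values for unit 5 with CY=0 in Figure 16.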
Step 5. Second-stage imputation: The second-stage imputation replaces the missing values in the continuous variables by using the observed values within each selected first-stage donor cell. This step is similar to step 1 but is applied in order to impute the continuous variables.
For a particular observation unit i, let $\mathbf{z}_i^{*(l)}$ be the $l$th donor cell from the first-stage imputation, where $l$ ranges from 1 to $L_i$. For each observation unit, i, the possible number of second-stage donor cells is equal to the number of unique combinations of the observed levels for the missing items in $\mathbf{y}_i$ from the responding units in the first-stage donor cell $l$.
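Let $\pi_{y_1 \cdots y_{P_2} \mid l}$ be the population proportion that falls in category $(y_1, \ldots, y_{P_2})$ within the first-stage donor cell $l$. Assume that it is possible to estimate the population categories from the observed sample. For example, the conditional probability, $\pi_{y_1 y_2 \mid l}$, is the same for the observed data as it is for the data in which $(y_{i1}, y_{i2})$ are missing. The conditional probabilities are estimated by

$$\hat{\pi}_{y_1^* y_2^* \mid l} = \frac{\hat{\pi}_{y_1^* y_2^*,\,l}}{\sum_{y_1} \sum_{y_2} \hat{\pi}_{y_1 y_2,\,l}}, \qquad \hat{\pi}_{y_1 y_2,\,l} = \frac{\sum_{k \in A_R} w_k \, I(y_{k1} = y_1,\ y_{k2} = y_2,\ \mathbf{z}_k = \mathbf{z}_i^{*(l)})}{\sum_{k \in A_R} w_k}$$

where $\hat{\pi}_{y_1 y_2,\,l}$ is the estimated joint probability, $I(\cdot)$ is an indicator function, $w_k$ is the observation weight for unit k, and $A_R$ is the set of indices for observation units without any missing items.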
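Let $\{\mathbf{y}_i^{*(l1)}, \ldots, \mathbf{y}_i^{*(lD_{il})}\}$ be all the observed combinations of the missing items in $\mathbf{y}_i$ in the sample within the first-stage donor cell $l$. Let $\mathbf{y}_i^{*(lm)}$ be the $m$th realization of these items in the sample. You must assume that at least one realization is available; otherwise, missing values in the continuous items for the observation are not imputed.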
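Compute the second-stage fractional weight from the second-stage donor cell $m$ conditional on the first-stage donor cell $l$ for unit i, $w^*_{i,m \mid l}$, by

$$w^*_{i,m \mid l} = \frac{\hat{\pi}_{\mathbf{y}_i^{*(lm)},\,l}}{\sum_{m'=1}^{D_{il}} \hat{\pi}_{\mathbf{y}_i^{*(lm')},\,l}}$$

where $D_{il}$ is the number of second-stage donor cells and

$$\hat{\pi}_{\mathbf{y},\,l} = \frac{\sum_{k \in A_R} w_k \, I(\mathbf{y}_{k(i)} = \mathbf{y},\ \mathbf{z}_k = \mathbf{z}_i^{*(l)})}{\sum_{k \in A_R} w_k}$$

Here $\mathbf{y}_{k(i)}$ denotes the values of complete unit k for the items that are missing in $\mathbf{y}_i$, and $A_R$ is the set of indices for observation units without any missing items. The sum of the second-stage fractional weights over all second-stage donor cells is 1 for every observation unit; that is, $\sum_{m=1}^{D_{il}} w^*_{i,m \mid l} = 1$ for all $l$ and i. The $m$th second-stage imputed row in the $l$th first-stage imputed row for unit i is created by keeping the observed items unchanged, replacing the missing items in $\mathbf{y}_i$ with the observed values from the $m$th second-stage donor cell, and computing the two-stage fractional weight by $w^*_{ilm} = w^*_{il} \, w^*_{i,m \mid l}$, where $w^*_{il}$ is the first-stage fractional weight for the first-stage donor cell $l$. The maximum number of donor cells for unit i is $\sum_{l=1}^{L_i} D_{il}$. Only the complete observations are used to compute the second-stage fractional weights.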
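A small sketch of the second-stage weights for a single first-stage donor cell follows. It uses the Y values of the complete units in the cell CX=0, CY=0 of Figure 15 (units 1 to 4, each with sampling weight 1) as donors for unit 5. The function name is hypothetical, and the first-stage weight of 0.536 is the rounded value implied by the FracWt column of Figure 16, not a value computed here.

```python
def second_stage_weights(donor_values, donor_weights):
    """Fractional weights of the unique donor values within one first-stage cell."""
    mass = {}
    for value, w in zip(donor_values, donor_weights):
        # Donors that share the same observed value form one second-stage donor cell.
        mass[value] = mass.get(value, 0.0) + w
    total = sum(mass.values())
    return {value: m / total for value, m in sorted(mass.items())}


# Y values of units 1 to 4 (CX=0, CY=0); the value -0.59 occurs twice.
cell_y = [-0.54, -0.77, -0.59, -0.59]
second = second_stage_weights(cell_y, [1.0] * 4)

first_stage = 0.536  # assumed: rounded first-stage weight for unit 5 and CY=0
two_stage = {y: first_stage * f for y, f in second.items()}
```

The resulting two-stage weights (0.134, 0.268, and 0.134) match the FracWt values for ImpIndex 1 to 3 of unit 5 in Figure 16.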
Step 6. Second-stage selection: Because all observed levels are used as the imputed values for missing items in the continuous variables in the previous step, the number of second-stage donor cells, $D_{il}$, is usually large. Suppose you want to use only $n$ second-stage donor cells (where $n$ is the value of the NDONORS= option in the PROC SURVEYIMPUTE statement). The second-stage selection chooses $n$ second-stage donor cells for every first-stage donor cell.
The selection step selects a random sample of second-stage donor cells of size $n$ for every recipient unit and every first-stage donor cell in which the number of second-stage donor cells is greater than $n$. Consider the set of all observations in which the values of the observed items are the same and they all have the same missing items. The index set $A_g$ contains the indices for all units that have the same missing pattern and the same observed values in the nonmissing items. Let $R_g$ be the number of units in $A_g$. Note that the number of donor cells, $D_{gl}$ (the common value of $D_{il}$ for $i \in A_g$), and the fractional weights for all $R_g$ recipient units indexed by $A_g$ are the same. If $D_{gl} > n$, then select a PPS sample with replacement of $n R_g$ donor cells from $D_{gl}$ donor cells by using the two-stage fractional weights of the donor cells as the size measure.
The donor selection algorithm can be described as follows:
1. Sort the second-stage donor cells by observed values of $\mathbf{y}$ within each first-stage donor cell.
2. Compute the cumulative sums of the fractional weights for the second-stage donor cells, and normalize the cumulative sums to 1.
3. Select a random number between 0 and 1, and divide it by the sample size, $n R_g$, where $n$ is the NDONORS= value and $R_g$ is the number of recipient units that share the selection.
4. Select a PPS sample with replacement that starts from the scaled random number and uses $1 / (n R_g)$ as the step length. For more information, see Särndal, Swensson, and Wretman (1992, p. 97).
5. Use a random permutation to sort recipient units in a random order.
6. Distribute the selected donor cells to the recipient units in a cyclic manner, such that the first selected donor cell is allocated to the first recipient unit, the second selected donor cell is allocated to the second recipient unit, the $(R_g + 1)$th selected donor cell is allocated to the first recipient unit, the $(R_g + 2)$th selected donor cell is allocated to the second recipient unit, and so on. Because of the random sorting in the previous step, the first recipient unit after the sorting is not the same as the first recipient unit in the input data.
The second-stage fractional weights, $w^*_{i,m \mid l}$, are used as the size measure in the PPS sampling. Only one PPS sampling is performed for all recipient units in one first-stage donor cell.
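The donor selection algorithm described above can be sketched as follows. This is a minimal illustration, not the procedure's code: the function names and the `start` and `shuffle` arguments are hypothetical and exist only to make the example reproducible. The data are the X values of the four complete units with CX=1 and CY=1 in Figure 15, with equal size measures, and the sample size of 6 assumes NDONORS=3 and two recipient units (14 and 18).

```python
import random
from bisect import bisect_left
from itertools import accumulate


def systematic_pps(cells, sizes, sample_size, start=None, rng=random):
    """Systematic PPS sample with replacement of `sample_size` donor cells."""
    order = sorted(range(len(cells)), key=lambda k: cells[k])   # sort donor cells
    total = sum(sizes)
    cum = list(accumulate(sizes[k] / total for k in order))     # normalized cumulative sums
    cum[-1] = 1.0                                                # guard against rounding
    u = rng.random() if start is None else start                # random number in [0, 1)
    # Start at u / sample_size and advance with step length 1 / sample_size.
    points = [(u + j) / sample_size for j in range(sample_size)]
    return [cells[order[bisect_left(cum, p)]] for p in points]


def distribute(selected, recipients, shuffle=True, rng=random):
    """Deal the selected donor cells to the recipient units cyclically."""
    units = list(recipients)
    if shuffle:
        rng.shuffle(units)                                       # random recipient order
    dealt = {unit: [] for unit in units}
    for j, cell in enumerate(selected):
        dealt[units[j % len(units)]].append(cell)
    return dealt


x_donors = [6.7, 2.9, 9.6, 10.0]   # X of units 13, 15, 16, 17
picked = systematic_pps(x_donors, [0.25] * 4, sample_size=6, start=0.6)
dealt = distribute(picked, ["unit14", "unit18"], shuffle=False)
```

With the assumed start value of 0.6 and unit 14 placed first, the sketch happens to reproduce the donors shown in Figure 16: X values 2.9, 6.7, and 10.0 for unit 14, and 6.7, 9.6, and 10.0 for unit 18. The random numbers and recipient order that the procedure actually used are not known, so other start values give other allocations.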
Alternatively, if you specify METHOD=FHDI(SELECTION=PPSPEROBS), then independent selection is performed for every recipient unit and every first-stage donor cell in which the number of second-stage donor cells is greater than $n$. Thus the procedure selects $R_g$ independent PPS samples with replacement of size $n$ for $R_g$ recipient units.
The imputation-adjusted weights for the second-stage donor cells are equal to $w_i \, w^*_{il} \, w^*_{i,m \mid l}$ for observation units when no second-stage selection is performed and equal to $w_i \, w^*_{il} \, d_{ilm} / n$ for observation units when the second-stage selection is performed, where $d_{ilm}$ is the number of times the second-stage donor cell $m$ is selected and $n$ is the second-stage sample size. Combining the first-stage and the second-stage imputation, the total number of donor cells for an observation unit is greater than or equal to $L_i$ and less than or equal to $n L_i$, where $L_i$ is the number of first-stage donor cells.
Step 7. Replicate weight adjustments: The unadjusted replicate weights are created as if all rows of data were observed data by using a replication procedure such as the delete-1 jackknife, BRR, or bootstrap. The unadjusted replicate weights for rows that contain imputed data are then adjusted for two-stage FEFI by replicating all steps for two-stage FEFI for each replicate sample.
Replication weights for FHDI rows are further adjusted to account for the selection of second-stage donor cells. Although every step in the two-stage FEFI procedure is applied in each replicate sample, the selection of second-stage donor cells is not repeated in FHDI. The fractional weights are adjusted instead.
For available adjustments in the replicate weights, see the section Replicate Weight Adjustments for FHDI.
The small data set shown in Figure 15 is used to illustrate the FHDI technique. The data set contains 18 observation units, and each unit has four items (X, CX, Y, and CY). The variable Unit contains the observation identification. Variables CX and CY contain the imputation bins for variables X and Y, respectively. In this example, X and CX are missing for units 14 and 18, and Y and CY are missing for units 5 and 18.
Figure 15: Sample Data with Missing Items
| Unit | X | CX | Y | CY |
|---|---|---|---|---|
| 1 | 0.3 | 0 | -0.54 | 0 |
| 2 | 0.2 | 0 | -0.77 | 0 |
| 3 | 1.7 | 0 | -0.59 | 0 |
| 4 | 1.7 | 0 | -0.59 | 0 |
| 5 | 1.0 | 0 | . | . |
| 6 | 1.8 | 0 | -0.03 | 1 |
| 7 | 2.0 | 0 | 0.95 | 1 |
| 8 | 1.9 | 0 | 0.78 | 1 |
| 9 | 6.7 | 1 | -0.15 | 0 |
| 10 | 6.0 | 1 | -1.01 | 0 |
| 11 | 3.3 | 1 | -1.86 | 0 |
| 12 | 7.3 | 1 | -0.21 | 0 |
| 13 | 6.7 | 1 | 0.80 | 1 |
| 14 | . | . | 1.23 | 1 |
| 15 | 2.9 | 1 | 0.65 | 1 |
| 16 | 9.6 | 1 | 0.95 | 1 |
| 17 | 10.0 | 1 | 0.13 | 1 |
| 18 | . | . | . | . |
The following statements request joint imputation of X and Y by using the FHDI method. The two CLEVVAR= options specify variables CX and CY to contain the imputation bins for variables X and Y, respectively. These statements also request imputation-adjusted replicate weights for the jackknife replication method. The OUTPUT statement stores the imputed values in the data set ImputedD3 and stores the jackknife coefficients in the data set OJKC. The FRACTIONALWEIGHTS= option in the OUTPUT statement saves the fractional weights in the ImputedD3 data set. The SEED= option specifies a seed for the random number generator, and the NDONORS=3 option requests three second-stage donor cells for each first-stage FEFI level.
```sas
proc surveyimpute data=Example method=fhdi seed=8943028 ndonors=3;
   var X (clevvar=CX) Y (clevvar=CY);
   output out=ImputedD3 fractionalweights=FracWt outjkcoefs=OJKC;
run;
```
The procedure first imputes the missing values in CX, CY, X, and Y by using the two-stage FEFI, as described in the section Example of Two-Stage FEFI. The two-stage FEFI imputed values along with the imputation-adjusted weights and the imputation-adjusted replicate weights are shown in Figure 14.
FHDI then selects three second-stage donor cells in each first-stage FEFI cell that contains more than three second-stage donor cells. The imputed data set after the FHDI is displayed in Figure 16.
Figure 16: Fractional Hot-Deck Imputation Using Three Donors
| Unit | ImpIndex | ImpWt | FracWt | X | CX | Y | CY |
|---|---|---|---|---|---|---|---|
| 1 | 0 | 1.0000 | 1.0000 | 0.3 | 0 | -0.54 | 0 |
| 2 | 0 | 1.0000 | 1.0000 | 0.2 | 0 | -0.77 | 0 |
| 3 | 0 | 1.0000 | 1.0000 | 1.7 | 0 | -0.59 | 0 |
| 4 | 0 | 1.0000 | 1.0000 | 1.7 | 0 | -0.59 | 0 |
| 5 | 1 | 0.1340 | 0.1340 | 1.0 | 0 | -0.77 | 0 |
| 5 | 2 | 0.1340 | 0.1340 | 1.0 | 0 | -0.54 | 0 |
| 5 | 3 | 0.2680 | 0.2680 | 1.0 | 0 | -0.59 | 0 |
| 5 | 4 | 0.1547 | 0.1547 | 1.0 | 0 | -0.03 | 1 |
| 5 | 5 | 0.1547 | 0.1547 | 1.0 | 0 | 0.78 | 1 |
| 5 | 6 | 0.1547 | 0.1547 | 1.0 | 0 | 0.95 | 1 |
| 6 | 0 | 1.0000 | 1.0000 | 1.8 | 0 | -0.03 | 1 |
| 7 | 0 | 1.0000 | 1.0000 | 2.0 | 0 | 0.95 | 1 |
| 8 | 0 | 1.0000 | 1.0000 | 1.9 | 0 | 0.78 | 1 |
| 9 | 0 | 1.0000 | 1.0000 | 6.7 | 1 | -0.15 | 0 |
| 10 | 0 | 1.0000 | 1.0000 | 6.0 | 1 | -1.01 | 0 |
| 11 | 0 | 1.0000 | 1.0000 | 3.3 | 1 | -1.86 | 0 |
| 12 | 0 | 1.0000 | 1.0000 | 7.3 | 1 | -0.21 | 0 |
| 13 | 0 | 1.0000 | 1.0000 | 6.7 | 1 | 0.80 | 1 |
| 14 | 1 | 0.1547 | 0.1547 | 1.8 | 0 | 1.23 | 1 |
| 14 | 2 | 0.1547 | 0.1547 | 1.9 | 0 | 1.23 | 1 |
| 14 | 3 | 0.1547 | 0.1547 | 2.0 | 0 | 1.23 | 1 |
| 14 | 4 | 0.1787 | 0.1787 | 2.9 | 1 | 1.23 | 1 |
| 14 | 5 | 0.1787 | 0.1787 | 6.7 | 1 | 1.23 | 1 |
| 14 | 6 | 0.1787 | 0.1787 | 10.0 | 1 | 1.23 | 1 |
| 15 | 0 | 1.0000 | 1.0000 | 2.9 | 1 | 0.65 | 1 |
| 16 | 0 | 1.0000 | 1.0000 | 9.6 | 1 | 0.95 | 1 |
| 17 | 0 | 1.0000 | 1.0000 | 10.0 | 1 | 0.13 | 1 |
| 18 | 1 | 0.0667 | 0.0667 | 0.2 | 0 | -0.77 | 0 |
| 18 | 2 | 0.0667 | 0.0667 | 0.3 | 0 | -0.54 | 0 |
| 18 | 3 | 0.1334 | 0.1334 | 1.7 | 0 | -0.59 | 0 |
| 18 | 4 | 0.0770 | 0.0770 | 1.8 | 0 | -0.03 | 1 |
| 18 | 5 | 0.0770 | 0.0770 | 1.9 | 0 | 0.78 | 1 |
| 18 | 6 | 0.0770 | 0.0770 | 2.0 | 0 | 0.95 | 1 |
| 18 | 7 | 0.0784 | 0.0784 | 3.3 | 1 | -1.86 | 0 |
| 18 | 8 | 0.0784 | 0.0784 | 6.7 | 1 | -0.15 | 0 |
| 18 | 9 | 0.0784 | 0.0784 | 7.3 | 1 | -0.21 | 0 |
| 18 | 10 | 0.0889 | 0.0889 | 6.7 | 1 | 0.80 | 1 |
| 18 | 11 | 0.0889 | 0.0889 | 9.6 | 1 | 0.95 | 1 |
| 18 | 12 | 0.0889 | 0.0889 | 10.0 | 1 | 0.13 | 1 |
The FHDI is described as follows:
Observation unit 1 has no missing value. Therefore, the ImpIndex value is 0; the FracWt value is 1; and the values of X, CX, Y, and CY are the same as the observed values for observation unit 1 in Figure 16. Because all observation units have a weight of 1, the fractional weights (FracWt) and the imputation-adjusted weights (ImpWt) are the same for all rows.
Observation unit 5 has missing values in Y and CY. The variable CY has two imputed levels (0 and 1) from the first-stage imputation. The observed level of CX for observation unit 5 is 0.
There are six rows for observation unit 5 after two-stage FEFI (Figure 14). ImpIndex values 1 to 3 contain imputed values when CY=0, and ImpIndex values 4 to 6 contain imputed values when CY=1. Because there are only three rows for each imputed level of CY and NDONORS=3 is specified in the PROC SURVEYIMPUTE statement, no selection is performed for FHDI for observation unit 5. Thus, all six imputed rows from FHDI (Figure 16) are the same as the six imputed rows from two-stage FEFI for observation unit 5 (Figure 14).
Observation unit 14 has missing values in X and CX. The variable CX has two imputed levels (0 and 1) from the first-stage imputation. The observed level of CY for observation unit 14 is 1. There are seven rows for observation unit 14 after two-stage FEFI (Figure 14).
ImpIndex values 1 to 3 for observation unit 14 contain imputed values for X when CX=0. Because there are only three rows for CX=0 and NDONORS=3 is specified in the PROC SURVEYIMPUTE statement, no selection is performed for FHDI for observation unit 14 and CX=0.
ImpIndex values 4 to 7 for observation unit 14 contain imputed values for X when CX=1. Because the number of imputed rows for observation unit 14 and CX=1 is greater than 3, the procedure selects three donor cells by using a PPS sample with replacement. Donor cells that have imputed X values of 2.9, 6.7, and 10.0 are selected. Each row is assigned a second-stage fractional weight of 1/3. The second-stage fractional weight is then multiplied by the first-stage FEFI weight (0.54) for observation unit 14 when CX=1 to obtain the FHDI weight.
Although three donor cells are selected for observation unit 14 when CX=1, the procedure selects six donor cells from four donor cells for observation units 14 and 18 together by using one PPS sample with replacement. Note that observation unit 14 for ImpIndex values 4 to 7 and observation unit 18 for ImpIndex values 11 to 14 (in Figure 14) have the same levels for the first-stage imputation variables CX (=1) and CY (=1).
Thus, there are six rows for observation unit 14 after FHDI in Figure 16.
Observation unit 18 has missing values in all four variables, X, CX, Y, and CY. The variable CX has two imputed levels (0 and 1) and the variable CY has two imputed levels (0 and 1) from the first-stage imputation. There are 14 rows for observation unit 18 after two-stage FEFI (Figure 14).
ImpIndex values 1 to 3 for observation unit 18 after two-stage FEFI contain three imputed values for X and Y when CX=0 and CY=0 (Figure 14). Because there are only three imputed rows for CX=0 and CY=0 and NDONORS=3 is specified in the PROC SURVEYIMPUTE statement, no selection is performed for FHDI for observation unit 18 when CX=0 and CY=0. Similarly, no selection is performed for FHDI for observation unit 18 when CX=0 and CY=1. Thus, all rows for observation unit 18 and ImpIndex values 1 to 6 from FHDI (Figure 16) are the same as the rows for observation unit 18 and ImpIndex values 1 to 6 from two-stage FEFI (Figure 14).
Three donor cells are selected from four donor cells for observation unit 18 and ImpIndex values 7 to 10, where CX=1 and CY=0. A PPS sample with replacement is used, where the fractional weights of the donor cells are used as the size measure. The selected donor cells are (3.3, –1.86), (6.7, –0.15), and (7.3, –0.21) for (X, Y). Each row is assigned a second-stage fractional weight of 1/3. The second-stage fractional weight is then multiplied by the first-stage FEFI weight (0.24) for the observation unit 18 in which CX=1 and CY=0 in order to obtain the FHDI weight.
Similarly, donor cells (6.7, 0.80), (9.6, 0.95), and (10.0, 0.13) are selected for (X, Y) for the observation unit 18 in which CX=1 and CY=1. Each row is assigned a second-stage fractional weight of 1/3. The second-stage fractional weight is then multiplied by the first-stage FEFI weight (0.27) for the observation unit 18 in which CX=1 and CY=1 in order to obtain the FHDI weight.
Thus, there are 12 imputed rows after FHDI for observation unit 18.
The resulting data set has 39 rows: 15 rows for fully observed units (ImpIndex=0), six rows for unit 5, six rows for unit 14, and 12 rows for unit 18. The sum of the fractional weights is 1 for all observation units.
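The row counts and weight totals stated above can be checked directly against Figure 16. The following sketch transcribes the FracWt column for the three imputed units; the variable names are hypothetical. Because the published values are rounded to four decimals, the sums are compared with a small tolerance.

```python
# FracWt values for the imputed units, transcribed from Figure 16.
frac_wt = {
    5:  [0.1340, 0.1340, 0.2680, 0.1547, 0.1547, 0.1547],
    14: [0.1547, 0.1547, 0.1547, 0.1787, 0.1787, 0.1787],
    18: [0.0667, 0.0667, 0.1334, 0.0770, 0.0770, 0.0770,
         0.0784, 0.0784, 0.0784, 0.0889, 0.0889, 0.0889],
}

n_complete = 15                                    # units with ImpIndex=0
total_rows = n_complete + sum(len(v) for v in frac_wt.values())
weight_sums = {unit: sum(v) for unit, v in frac_wt.items()}

# The last three rows of unit 14 belong to CX=1, so their sum is the
# first-stage FEFI weight for that cell (reported as 0.54 in the text).
first_stage_14_cx1 = sum(frac_wt[14][3:])
```

Each entry of `weight_sums` is within rounding error of 1, and `first_stage_14_cx1` is about 0.536, which is three times the FHDI weight of 0.1787.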
The imputation-adjusted replicate weights are computed by applying the first-stage imputation, second-stage imputation, and neighbor adjustment (as discussed in the section Replicate Weight Adjustments for FHDI) independently in each replicate sample. The imputed data set along with the first four imputation-adjusted replicate weights is displayed in Figure 17.
Figure 17: Fractional Hot-Deck Imputation with Imputation-Adjusted Replicate Weights
| Unit | ImpIndex | ImpWt | FracWt | X | CX | Y | CY | ImpRepWt_1 | ImpRepWt_2 | ImpRepWt_3 | ImpRepWt_4 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 1.0000 | 1.0000 | 0.3 | 0 | -0.54 | 0 | 0 | 1.0588 | 1.0588 | 1.0588 |
| 2 | 0 | 1.0000 | 1.0000 | 0.2 | 0 | -0.77 | 0 | 1.0588 | 0 | 1.0588 | 1.0588 |
| 3 | 0 | 1.0000 | 1.0000 | 1.7 | 0 | -0.59 | 0 | 1.0588 | 1.0588 | 0 | 1.0588 |
| 4 | 0 | 1.0000 | 1.0000 | 1.7 | 0 | -0.59 | 0 | 1.0588 | 1.0588 | 1.0588 | 0 |
| 5 | 1 | 0.1340 | 0.1340 | 1.0 | 0 | -0.77 | 0 | 0.1637 | 0 | 0.1637 | 0.1637 |
| 5 | 2 | 0.1340 | 0.1340 | 1.0 | 0 | -0.54 | 0 | 0 | 0.1637 | 0.1637 | 0.1637 |
| 5 | 3 | 0.2680 | 0.2680 | 1.0 | 0 | -0.59 | 0 | 0.3274 | 0.3274 | 0.1637 | 0.1637 |
| 5 | 4 | 0.1547 | 0.1547 | 1.0 | 0 | -0.03 | 1 | 0.1893 | 0.1893 | 0.1893 | 0.1893 |
| 5 | 5 | 0.1547 | 0.1547 | 1.0 | 0 | 0.78 | 1 | 0.1893 | 0.1893 | 0.1893 | 0.1893 |
| 5 | 6 | 0.1547 | 0.1547 | 1.0 | 0 | 0.95 | 1 | 0.1893 | 0.1893 | 0.1893 | 0.1893 |
| 6 | 0 | 1.0000 | 1.0000 | 1.8 | 0 | -0.03 | 1 | 1.0588 | 1.0588 | 1.0588 | 1.0588 |
| 7 | 0 | 1.0000 | 1.0000 | 2.0 | 0 | 0.95 | 1 | 1.0588 | 1.0588 | 1.0588 | 1.0588 |
| 8 | 0 | 1.0000 | 1.0000 | 1.9 | 0 | 0.78 | 1 | 1.0588 | 1.0588 | 1.0588 | 1.0588 |
| 9 | 0 | 1.0000 | 1.0000 | 6.7 | 1 | -0.15 | 0 | 1.0588 | 1.0588 | 1.0588 | 1.0588 |
| 10 | 0 | 1.0000 | 1.0000 | 6.0 | 1 | -1.01 | 0 | 1.0588 | 1.0588 | 1.0588 | 1.0588 |
| 11 | 0 | 1.0000 | 1.0000 | 3.3 | 1 | -1.86 | 0 | 1.0588 | 1.0588 | 1.0588 | 1.0588 |
| 12 | 0 | 1.0000 | 1.0000 | 7.3 | 1 | -0.21 | 0 | 1.0588 | 1.0588 | 1.0588 | 1.0588 |
| 13 | 0 | 1.0000 | 1.0000 | 6.7 | 1 | 0.80 | 1 | 1.0588 | 1.0588 | 1.0588 | 1.0588 |
| 14 | 1 | 0.1547 | 0.1547 | 1.8 | 0 | 1.23 | 1 | 0.1656 | 0.1656 | 0.1656 | 0.1656 |
| 14 | 2 | 0.1547 | 0.1547 | 1.9 | 0 | 1.23 | 1 | 0.1656 | 0.1656 | 0.1656 | 0.1656 |
| 14 | 3 | 0.1547 | 0.1547 | 2.0 | 0 | 1.23 | 1 | 0.1656 | 0.1656 | 0.1656 | 0.1656 |
| 14 | 4 | 0.1787 | 0.1787 | 2.9 | 1 | 1.23 | 1 | 0.1873 | 0.1873 | 0.1873 | 0.1873 |
| 14 | 5 | 0.1787 | 0.1787 | 6.7 | 1 | 1.23 | 1 | 0.1873 | 0.1873 | 0.1873 | 0.1873 |
| 14 | 6 | 0.1787 | 0.1787 | 10.0 | 1 | 1.23 | 1 | 0.1873 | 0.1873 | 0.1873 | 0.1873 |
| 15 | 0 | 1.0000 | 1.0000 | 2.9 | 1 | 0.65 | 1 | 1.0588 | 1.0588 | 1.0588 | 1.0588 |
| 16 | 0 | 1.0000 | 1.0000 | 9.6 | 1 | 0.95 | 1 | 1.0588 | 1.0588 | 1.0588 | 1.0588 |
| 17 | 0 | 1.0000 | 1.0000 | 10.0 | 1 | 0.13 | 1 | 1.0588 | 1.0588 | 1.0588 | 1.0588 |
| 18 | 1 | 0.0667 | 0.0667 | 0.2 | 0 | -0.77 | 0 | 0.0764 | 0 | 0.0764 | 0.0764 |
| 18 | 2 | 0.0667 | 0.0667 | 0.3 | 0 | -0.54 | 0 | 0 | 0.0764 | 0.0764 | 0.0764 |
| 18 | 3 | 0.1334 | 0.1334 | 1.7 | 0 | -0.59 | 0 | 0.1528 | 0.1528 | 0.0764 | 0.0764 |
| 18 | 4 | 0.0770 | 0.0770 | 1.8 | 0 | -0.03 | 1 | 0.0884 | 0.0884 | 0.0884 | 0.0884 |
| 18 | 5 | 0.0770 | 0.0770 | 1.9 | 0 | 0.78 | 1 | 0.0884 | 0.0884 | 0.0884 | 0.0884 |
| 18 | 6 | 0.0770 | 0.0770 | 2.0 | 0 | 0.95 | 1 | 0.0884 | 0.0884 | 0.0884 | 0.0884 |
| 18 | 7 | 0.0784 | 0.0784 | 3.3 | 1 | -1.86 | 0 | 0.0882 | 0.0882 | 0.0882 | 0.0882 |
| 18 | 8 | 0.0784 | 0.0784 | 6.7 | 1 | -0.15 | 0 | 0.0882 | 0.0882 | 0.0882 | 0.0882 |
| 18 | 9 | 0.0784 | 0.0784 | 7.3 | 1 | -0.21 | 0 | 0.0882 | 0.0882 | 0.0882 | 0.0882 |
| 18 | 10 | 0.0889 | 0.0889 | 6.7 | 1 | 0.80 | 1 | 0.0999 | 0.0999 | 0.0999 | 0.0999 |
| 18 | 11 | 0.0889 | 0.0889 | 9.6 | 1 | 0.95 | 1 | 0.0999 | 0.0999 | 0.0999 | 0.0999 |
| 18 | 12 | 0.0889 | 0.0889 | 10.0 | 1 | 0.13 | 1 | 0.0999 | 0.0999 | 0.0999 | 0.0999 |
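The replicate weights of the fully observed units in Figure 17 are consistent with ordinary delete-1 jackknife weights. The sketch below assumes a single stratum in which every unit is its own PSU, so the function is an illustration of that standard rule rather than the procedure's code. It also checks that the imputed rows of unit 5 in the first replicate add up to the unit's unadjusted replicate weight.

```python
n_units = 18


def jackknife_weight(unit, deleted, weight=1.0, n=n_units):
    """Unadjusted delete-1 jackknife weight (assumes one stratum, one unit per PSU)."""
    return 0.0 if unit == deleted else weight * n / (n - 1)


# ImpRepWt_1 for the six imputed rows of unit 5, transcribed from Figure 17.
unit5_rep1 = [0.1637, 0.0, 0.3274, 0.1893, 0.1893, 0.1893]
unit5_rep1_total = sum(unit5_rep1)
```

In the first replicate, unit 1 is deleted, so the row of unit 5 that borrows Y=-0.54 from unit 1 (ImpIndex=2) receives a replicate weight of 0 and the remaining rows share the weight 18/17 ≈ 1.0588.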