Simulation studies
Replace observed data
In simulation studies, a common task is fitting the same model to many different datasets. It would be a waste of resources to reconstruct the complete model for each dataset. We therefore provide the function replace_observed to change the observed part of a model, without necessarily reconstructing the other parts.
For SemLoss terms, replace_observed() constructs the new loss by passing the new observed data, the current implied state, and the current loss (as refloss) to the appropriate loss constructor. The new loss term therefore shares the implied state with the original one, as well as loss-specific settings and, potentially, the internal state.
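The sharing described above can be pictured with a package-free mock. The types `ToyImplied` and `ToyLoss` and the function `toy_replace_observed` are hypothetical stand-ins, not part of StructuralEquationModels.jl. They only show that the rebuilt loss keeps the settings of `refloss` and references the *same* implied object (checked with `===`), while holding new observed data.

```julia
# Hypothetical stand-ins; NOT the StructuralEquationModels.jl types.
mutable struct ToyImplied
    Σ::Matrix{Float64}          # model-implied covariance, updated during fitting
end

struct ToyLoss
    observed::Matrix{Float64}   # observed data of this loss term
    implied::ToyImplied         # implied state, possibly shared with other losses
    tol::Float64                # a loss-specific setting
end

# Rebuild a loss with new data, reusing the implied state and settings of `refloss`.
toy_replace_observed(refloss::ToyLoss, newdata::Matrix{Float64}) =
    ToyLoss(newdata, refloss.implied, refloss.tol)

imp = ToyImplied(zeros(2, 2))
loss_1 = ToyLoss(rand(30, 2), imp, 1e-8)
loss_2 = toy_replace_observed(loss_1, rand(45, 2))

loss_2.implied === loss_1.implied   # true: the same object in memory
size(loss_2.observed)               # (45, 2)
```

Because the implied object is shared, updating it while fitting one model also changes it for the other. This is the reason for the copying advice in the Multithreading section.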
For the model from the "A first model" tutorial, you would use it as follows:
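using StructuralEquationModels
# `partable` is the parameter table defined in "A first model"
data = example_data("political_democracy")
data_1 = data[1:30, :]   # first 30 observations
data_2 = data[31:75, :]  # remaining 45 observations
model = Sem(
    specification = partable,
    data = data_1
)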
model_updated = replace_observed(model, data_2)
Structural Equation Model
- Loss Functions
> SemML
- observed: SemObservedData
- implied: RAM
replace_observed accepts a data matrix, a DataFrame, or a ready-made SemObserved object, and works for multigroup/collection models too (pass a Dict mapping term ids to data, or a DataFrame together with a semterm_column). See the API below for all signatures.
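In a multigroup simulation, the generated data often come as one matrix plus a group label per row. A minimal sketch for turning them into the `Dict{Symbol}` form, assuming the term ids of your model are `:g1` and `:g2` (hypothetical names; use the ids of your own model) and that the columns are presumably in the same order as the original data. The helper `split_by_group` is likewise hypothetical and uses only Base Julia.

```julia
# Hypothetical helper: split the rows of `data` by the labels in `group`
# into a Dict mapping each term id (Symbol) to its data matrix.
function split_by_group(data::AbstractMatrix, group::AbstractVector{Symbol})
    size(data, 1) == length(group) ||
        throw(DimensionMismatch("need one group label per row"))
    return Dict(g => data[group .== g, :] for g in unique(group))
end

data = rand(75, 3)
group = [i <= 30 ? :g1 : :g2 for i in 1:75]
groups = split_by_group(data, group)

size(groups[:g1])  # (30, 3)
size(groups[:g2])  # (45, 3)
# With StructuralEquationModels.jl loaded and a multigroup `model`:
# model_updated = replace_observed(model, groups)
```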
Recomputing observed-dependent state
Some loss terms cache quantities that are derived from the observed data. The most prominent example is weighted least squares (SemWLS), whose weight matrix defaults to the GLS weights computed from the observed covariance matrix. By default, replace_observed recomputes these quantities from the new data, which is what you want in most simulation studies:
model_wls = Sem(
specification = partable,
data = data_1,
implied = RAMSymbolic,
vech = true,
loss = SemWLS,
)
# weight matrix recomputed from `data_2` (default)
model_wls_updated = replace_observed(model_wls, data_2)
Structural Equation Model
- Loss Functions
> SemWLS
- observed: SemObservedData
- implied: RAMSymbolic
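To see why the weight matrix depends on the data: the normal-theory GLS weight matrix for the half-vectorized sample covariance S is W = ½ Dᵀ (S⁻¹ ⊗ S⁻¹) D, where D is the duplication matrix with vec(S) = D vech(S). The sketch below computes this textbook formula in plain Julia; the helper names are hypothetical, the input matrices are made-up illustrative values, and it is an assumption that the package's default computation matches it exactly.

```julia
using LinearAlgebra

# Hypothetical helper: duplication matrix D with vec(S) == D * vech(S)
# for a symmetric p×p matrix S.
function duplication_matrix(p::Int)
    D = zeros(p^2, p * (p + 1) ÷ 2)
    k = 0
    for j in 1:p, i in j:p           # walk the lower triangle column by column
        k += 1
        D[(j - 1) * p + i, k] = 1.0  # position of S[i, j] in vec(S)
        D[(i - 1) * p + j, k] = 1.0  # position of S[j, i] in vec(S)
    end
    return D
end

# Half-vectorization: lower triangle of S stacked column by column.
vech(S::AbstractMatrix) = [S[i, j] for j in 1:size(S, 2) for i in j:size(S, 1)]

# Normal-theory GLS weight matrix for the vech'd sample covariance S.
function gls_weights(S::AbstractMatrix)
    D = duplication_matrix(size(S, 1))
    Sinv = inv(S)
    return 0.5 * D' * kron(Sinv, Sinv) * D
end

S1 = [2.0 0.5; 0.5 1.0]   # covariance of one replication (illustrative values)
S2 = [1.0 0.2; 0.2 3.0]   # covariance of another replication
W1 = gls_weights(S1)
W2 = gls_weights(S2)
size(W1)   # (3, 3)
```

Since W1 and W2 differ, fitting a replication with the weights of the original data or with weights recomputed from its own data generally gives different estimates.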
If instead you want to keep the original observed-dependent state (e.g., to fit every replication with the same fixed weight matrix), pass recompute_observed_state = false:
model_wls_fixed = replace_observed(model_wls, data_2; recompute_observed_state = false)
Structural Equation Model
- Loss Functions
> SemWLS
- observed: SemObservedData
- implied: RAMSymbolic
Loss terms without observed-dependent caches (such as SemML) ignore this keyword.
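The keyword semantics can be mimicked with two hypothetical loss types, one that caches a quantity derived from the data and one that caches nothing. `CachedLoss`, `PlainLoss`, and `toy_replace` are illustration-only names, not the package's API (for the real method, see the Custom loss functions chapter); here the cached quantity is simply the inverse sample covariance.

```julia
using LinearAlgebra, Statistics

# Hypothetical loss that caches a quantity derived from the observed data
# (the inverse sample covariance, standing in for e.g. a weight matrix).
struct CachedLoss
    data::Matrix{Float64}
    W::Matrix{Float64}     # observed-dependent cache
end
CachedLoss(data::Matrix{Float64}) = CachedLoss(data, inv(cov(data)))

# Hypothetical loss without observed-dependent caches.
struct PlainLoss
    data::Matrix{Float64}
end

# Either recompute the cache from the new data (default) or keep the old one.
toy_replace(loss::CachedLoss, newdata::Matrix{Float64}; recompute_observed_state::Bool = true) =
    recompute_observed_state ? CachedLoss(newdata) : CachedLoss(newdata, loss.W)

# Nothing to recompute: the keyword is accepted and ignored.
toy_replace(::PlainLoss, newdata::Matrix{Float64}; recompute_observed_state::Bool = true) =
    PlainLoss(newdata)

old = CachedLoss([1.0 2.0; 2.0 1.0; 3.0 5.0; 4.0 3.0])
new_data = [1.0 0.0; 0.0 1.0; 2.0 3.0; 5.0 1.0]

fixed = toy_replace(old, new_data; recompute_observed_state = false)  # keeps old.W
fresh = toy_replace(old, new_data)                                    # W from new_data
```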
If you run simulation studies with a custom loss function, see Supporting replace_observed in the Custom loss functions chapter. It explains how replace_observed rebuilds a loss term, when the default behavior is enough, and how to implement a custom replace_observed method (with recompute_observed_state) if your loss caches quantities derived from the observed data.
Multithreading
This is only relevant when you are planning to fit updated models in parallel
Models generated by replace_observed may share objects in memory (e.g., some parts of model and model_updated are the same objects). Therefore, fitting both of these models in parallel leads to race conditions, which can corrupt the results or crash your Julia session. To avoid these problems, copy the model with deepcopy (for example, the result of replace_observed) so that the models fitted in parallel do not share memory.
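What "sharing objects in memory" means, and why `deepcopy` helps, can be checked in plain Julia. `Workspace`, `ToyModel`, and `evaluate!` are hypothetical stand-ins for a model's mutable internal state: two models holding the same workspace would overwrite each other's buffer if evaluated on different threads, whereas `deepcopy` gives each model its own buffer.

```julia
# Hypothetical stand-ins for a model and its mutable internal buffers.
struct Workspace
    buffer::Vector{Float64}   # overwritten during each evaluation
end

struct ToyModel
    data::Vector{Float64}
    ws::Workspace
end

# Writes into the model's workspace, then reads it back.
function evaluate!(m::ToyModel)
    m.ws.buffer .= m.data .^ 2
    return sum(m.ws.buffer)
end

shared = Workspace(zeros(3))
m1 = ToyModel([1.0, 2.0, 3.0], shared)
m2_aliased = ToyModel([4.0, 5.0, 6.0], shared)   # shares m1's buffer
m2 = deepcopy(m2_aliased)                        # gets its own buffer

m2_aliased.ws.buffer === m1.ws.buffer   # true: a data race if run in parallel
m2.ws.buffer === m1.ws.buffer           # false: safe to run in parallel

models = [m1, m2]
results = zeros(2)
Threads.@threads for i in 1:2
    results[i] = evaluate!(models[i])
end
results   # [14.0, 77.0]
```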
Taking into account the warning above, fitting multiple models in parallel becomes as easy as:
model1 = Sem(
specification = partable,
data = data_1
)
model2 = deepcopy(replace_observed(model, data_2))
models = [model1, model2]
fits = Vector{SemFit}(undef, 2)
Threads.@threads for i in 1:2
fits[i] = fit(models[i])
end
API
StructuralEquationModels.replace_observed — Function
replace_observed(model::Sem, observed::SemObserved)
replace_observed(model::Sem, data::AbstractDict{Symbol})
replace_observed(model::Sem, data::AbstractDataFrame; [semterm_column])
replace_observed(loss::SemLoss, observed::SemObserved)
replace_observed(loss::SemLoss, data::Union{AbstractMatrix, DataFrame})
Construct a new SEM model or SEM loss with replaced observed data.
The SEM structure (implied covariance, loss type) is preserved; only the observed data is swapped. The new loss terms preserve the configuration and share the implied state with the loss terms of the original SEM model.
Keyword arguments:
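recompute_observed_state::Bool = true: whether loss terms recompute observed-dependent caches (e.g., the SemWLS weight matrix) from the new data; if false, they keep the caches of the original loss terms. Losses without such caches ignore this argument.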
Single-term models
Pass a SemObserved object, a data matrix, or a DataFrame:
replace_observed(model, new_data_matrix)
replace_observed(model, new_sem_observed)
replace_observed(model, new_df)
Multi-term models
Pass a Dict{Symbol} mapping term ids to data or SemObserved objects:
replace_observed(model, Dict(:g1 => data1, :g2 => data2))
Or pass a DataFrame with a semterm_column identifying the group:
replace_observed(model, new_df; semterm_column = :group)