Simulation studies

Replace observed data

In simulation studies, a common task is fitting the same model to many different datasets. It would be a waste of resources to reconstruct the complete model for each dataset. We therefore provide the function replace_observed to change the observed part of a model, without necessarily reconstructing the other parts.

For SemLoss terms, replace_observed() constructs the new loss by passing the new observed data, the current implied state, and the current loss (as refloss) to the appropriate loss constructor. The new loss term therefore shares the implied state with the original one, inherits its loss-specific settings, and may also share its internal state.
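To make the sharing concrete, here is a toy sketch in plain Julia. The types ToyImplied and ToyLoss and the function toy_replace_observed are hypothetical stand-ins, not part of StructuralEquationModels.jl. They only mimic the behaviour described above: the new loss gets new data, reuses the implied object, and copies its settings from the reference loss.

```julia
# Hypothetical stand-ins; these are NOT the StructuralEquationModels.jl types.
mutable struct ToyImplied
    Σ::Matrix{Float64}          # model-implied covariance, updated during fitting
end

struct ToyLoss
    observed::Matrix{Float64}   # the data this loss term is fitted to
    implied::ToyImplied         # implied state (may be shared between losses)
    tolerance::Float64          # a loss-specific setting
end

# Mimics the described behaviour: new data, the SAME implied object,
# and settings taken from the reference loss `refloss`.
toy_replace_observed(refloss::ToyLoss, data::Matrix{Float64}) =
    ToyLoss(data, refloss.implied, refloss.tolerance)

old_loss = ToyLoss(rand(30, 3), ToyImplied(zeros(3, 3)), 1e-8)
new_loss = toy_replace_observed(old_loss, rand(45, 3))

new_loss.implied === old_loss.implied      # true: one object in memory
new_loss.tolerance == old_loss.tolerance   # true: setting carried over

old_loss.implied.Σ[1, 1] = 2.0             # mutate through the old loss...
new_loss.implied.Σ[1, 1]                   # ...and the new loss sees 2.0 too
```

The last two lines show why shared state matters: any update made while fitting one model is visible to the other.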

For the "A first model" example, you would use it as:

data = example_data("political_democracy")

data_1 = data[1:30, :]

data_2 = data[31:75, :]

# `partable` is the parameter table from the "A first model" example
model = Sem(
    specification = partable,
    data = data_1
)

model_updated = replace_observed(model, data_2)
Structural Equation Model
- Loss Functions
  > SemML
    - observed:    SemObservedData 
    - implied:     RAM 

Multithreading

Thread safety

This is only relevant when you plan to fit updated models in parallel.

Models generated by replace_observed may share objects in memory with the model they were created from (e.g. some parts of model and model_updated are the same objects in memory). Fitting both of these models in parallel can therefore lead to race conditions, possibly crashing your computer. To avoid these problems, deepcopy the updated model so that it no longer shares any objects with the original.

Taking into account the warning above, fitting multiple models in parallel becomes as easy as:

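model1 = Sem(
    specification = partable,
    data = data_1
)

# deepcopy ensures that model2 shares no objects in memory with model1
model2 = deepcopy(replace_observed(model1, data_2))

models = [model1, model2]
fits = Vector{SemFit}(undef, 2)

# Julia must be started with more than one thread (e.g. `julia --threads=2`)
# for this loop to actually run in parallel
Threads.@threads for i in 1:2
    fits[i] = fit(models[i])
end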
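The same pattern can be illustrated without the package. In this toy sketch, ToyModel, toy_replace_data and toy_fit! are hypothetical stand-ins, not StructuralEquationModels.jl types or functions: toy_replace_data shares the workspace of the template model, much as replace_observed may share objects, and deepcopy gives every threaded task its own workspace.

```julia
using Base.Threads

# Hypothetical stand-in for a model; NOT a StructuralEquationModels.jl type.
struct ToyModel
    data::Vector{Float64}
    workspace::Vector{Float64}   # scratch memory overwritten by every fit
end

# Hypothetical analogue of replace_observed: new data, same workspace object.
toy_replace_data(m::ToyModel, data::Vector{Float64}) = ToyModel(data, m.workspace)

# Toy "fit": writes into the workspace, then reads it back.
function toy_fit!(m::ToyModel)
    m.workspace .= 2 .* m.data
    return sum(m.workspace)
end

template = ToyModel(zeros(3), zeros(3))

# Without deepcopy, all four models would write to template.workspace concurrently.
models = [deepcopy(toy_replace_data(template, fill(Float64(i), 3))) for i in 1:4]

results = Vector{Float64}(undef, length(models))
@threads for i in eachindex(models)
    results[i] = toy_fit!(models[i])
end

results   # [6.0, 12.0, 18.0, 24.0]
```

Preallocating results and writing each entry from exactly one task mirrors the fits vector above; the deepcopy is what keeps the tasks from touching the same workspace.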

API

StructuralEquationModels.replace_observed (Function)
replace_observed(model::Sem, observed::SemObserved)
replace_observed(model::Sem, data::AbstractDict{Symbol})
replace_observed(model::Sem, data::AbstractDataFrame; [semterm_column])
replace_observed(loss::SemLoss, observed::SemObserved)
replace_observed(loss::SemLoss, data::Union{AbstractMatrix, DataFrame})

Construct a new SEM model or SEM loss with replaced observed data.

The SEM structure (implied covariance, loss type) is preserved; only the observed data is swapped. The new loss terms preserve the configuration and share the implied state with the loss terms of the original SEM model.

Keyword arguments:

  • recompute_observed_state::Bool = true: whether loss terms should recompute their caches that depend on the observed data. Losses without such caches ignore this argument.

Single-term models

Pass a SemObserved object, a data matrix, or a DataFrame:

replace_observed(model, new_data_matrix)
replace_observed(model, new_sem_observed)
replace_observed(model, new_df)

Multi-term models

Pass a Dict{Symbol} mapping term ids to data or SemObserved objects:

replace_observed(model, Dict(:g1 => data1, :g2 => data2))

Alternatively, pass a single DataFrame and set semterm_column to the column that identifies the term (group) of each row:

replace_observed(model, new_df; semterm_column = :group)
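If your data are a plain matrix with a separate vector of group labels, you can build the Dict form yourself. The helper split_by_group below is hypothetical and not part of the package; it assumes one Symbol term id per row and returns a Dict{Symbol, Matrix{Float64}} of the kind the Dict{Symbol} method accepts. Whether the semterm_column method performs exactly this split internally is an assumption.

```julia
# Hypothetical helper, not part of StructuralEquationModels.jl:
# split the rows of `X` into one matrix per term id.
function split_by_group(X::AbstractMatrix, groups::AbstractVector{Symbol})
    size(X, 1) == length(groups) ||
        throw(DimensionMismatch("need exactly one group label per row"))
    return Dict(g => X[groups .== g, :] for g in unique(groups))
end

X = [1.0 2.0; 3.0 4.0; 5.0 6.0; 7.0 8.0]
groups = [:g1, :g2, :g1, :g2]
parts = split_by_group(X, groups)
# parts[:g1] == [1.0 2.0; 5.0 6.0] and parts[:g2] == [3.0 4.0; 7.0 8.0]

# replace_observed(model, parts)   # would then use the Dict{Symbol} method
```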