Our Concept of a Structural Equation Model

In our package, every Structural Equation Model (Sem) consists of four parts:

SEM concept

Those parts are interchangable building blocks (like 'Legos'), i.e. there are different pieces available you can choose as the 'observed' slot of the model, and stick them together with other pieces that can serve as the 'imply' part.

The 'observed' part is for observed data, the imply part is what the model implies about your data (e.g. the model implied covariance matrix), the loss part compares the observed data and implied properties (e.g. weighted least squares difference between the observed and implied covariance matrix) and the optimizer part connects to the optimization backend (e.g. the type of optimization algorithm used).

For example, to build a model for maximum likelihood estimation with the NLopt optimization suite as a backend you would choose SemML as a loss function and SemOptimizerNLopt as the optimizer.

As you can see, a model can have as many loss functions as you want it to have. We always optimize over their (weighted) sum. So to build a model for ridge regularized full information maximum likelihood estimation, you would choose two loss functions, SemFIML and SemRidge.

In julia, everything has a type. To make more precise which objects can be used as the different building blocks, we require them to have a certain type:

SEM concept typed

So everything that can be used as the 'observed' part has to be of type SemObserved.

Here is an overview on the available building blocks:

SemObservedSemImplySemLossFunctionSemOptimizer
SemObservedDataRAMSemMLSemOptimizerOptim
SemObservedCovarianceRAMSymbolicSemWLSSemOptimizerNLopt
SemObservedMissingImplyEmptySemFIML
SemRidge
SemConstant

The rest of this page explains the building blocks for each part. First, we explain every part and give an overview on the different options that are available. After that, the API - model parts section serves as a reference for detailed explanations about the different options. (How to stick them together to a final model is explained in the section on Model Construction.)

The observed part aka SemObserved

The 'observed' part contains all necessary information about the observed data. Currently, we have three options: SemObservedData for fully observed datasets, SemObservedCovariance for observed covariances (and means) and SemObservedMissing for data that contains missing values.

The imply part aka SemImply

The imply part is what your model implies about the data, for example, the model-implied covariance matrix. There are two options at the moment: RAM, which uses the reticular action model to compute the model implied covariance matrix, and RAMSymbolic which does the same but symbolically pre-computes part of the model, which increases subsequent performance in model fitting (see Symbolic precomputation). There is also a third option, ImplyEmpty that can serve as a 'placeholder' for models that do not need an imply part.

The loss part aka SemLoss

The loss part specifies the objective that is optimized to find the parameter estimates. If it contains more then one loss function (aka SemLossFunction)), we find the parameters by minimizing the sum of loss functions (for example in maximum likelihood estimation + ridge regularization). Available loss functions are

  • SemML: maximum likelihood estimation
  • SemWLS: weighted least squares estimation
  • SemFIML: full-information maximum likelihood estimation
  • SemRidge: ridge regularization

The optimizer part aka SemOptimizer

The optimizer part of a model connects to the numerical optimization backend used to fit the model. It can be used to control options like the optimization algorithm, linesearch, stopping criteria, etc. There are currently two available backends, SemOptimizerOptim connecting to the Optim.jl backend, and SemOptimizerNLopt connecting to the NLopt.jl backend. For more information about the available options see also the tutorials about Using Optim.jl and Using NLopt.jl, as well as Constrained optimization.

What to do next

You now have an understanding about our representation of structural equation models.

To learn more about how to use the package, you may visit the remaining tutorials.

If you want to learn how to extend the package (e.g., add a new loss function), you may visit Extending the package.

API - model parts

observed

StructuralEquationModels.SemObservedType

Supertype of all objects that can serve as the observed field of a SEM. Pre-processes data and computes sufficient statistics for example. If you have a special kind of data, e.g. ordinal data, you should implement a subtype of SemObserved.

source
StructuralEquationModels.SemObservedDataType

For observed data without missings.

Constructor

SemObservedData(;
    specification,
    data,
    meanstructure = false,
    obs_colnames = nothing,
    kwargs...)

Arguments

  • specification: either a RAMMatrices or ParameterTable object (1)
  • data: observed data
  • meanstructure::Bool: does the model have a meanstructure?
  • obs_colnames::Vector{Symbol}: column names of the data (if the object passed as data does not have column names, i.e. is not a data frame)

Extended help

Interfaces

  • n_obs(::SemObservedData) -> number of observed data points

  • n_man(::SemObservedData) -> number of manifest variables

  • get_data(::SemObservedData) -> observed data

  • obs_cov(::SemObservedData) -> observed.obs_cov

  • obs_mean(::SemObservedData) -> observed.obs_mean

  • data_rowwise(::SemObservedData) -> observed data, stored as vectors per observation

Implementation

Subtype of SemObserved

Remarks

(1) the specification argument can also be nothing, but this turns of checking whether the observed data/covariance columns are in the correct order! As a result, you should only use this if you are sure your observed data is in the right format.

Additional keyword arguments:

  • spec_colnames::Vector{Symbol} = nothing: overwrites column names of the specification object
  • compute_covariance::Bool ) = true: should the covariance of data be computed and stored?
  • rowwise::Bool = false: should the data be stored also as vectors per observation
source
StructuralEquationModels.SemObservedCovarianceType

For observed covariance matrices and means.

Constructor

SemObservedCovariance(;
    specification,
    obs_cov,
    obs_colnames = nothing,
    meanstructure = false,
    obs_mean = nothing,
    n_obs = nothing,
    kwargs...)

Arguments

  • specification: either a RAMMatrices or ParameterTable object (1)
  • obs_cov: observed covariance matrix
  • obs_colnames::Vector{Symbol}: column names of the covariance matrix
  • meanstructure::Bool: does the model have a meanstructure?
  • obs_mean: observed mean vector
  • n_obs::Number: number of observed data points (necessary for fit statistics)

Extended help

Interfaces

  • n_obs(::SemObservedCovariance) -> number of observed data points

  • n_man(::SemObservedCovariance) -> number of manifest variables

  • obs_cov(::SemObservedCovariance) -> observed covariance matrix

  • obs_mean(::SemObservedCovariance) -> observed means

Implementation

Subtype of SemObserved

Remarks

(1) the specification argument can also be nothing, but this turns of checking whether the observed data/covariance columns are in the correct order! As a result, you should only use this if you are sure your covariance matrix is in the right format.

Additional keyword arguments:

  • spec_colnames::Vector{Symbol} = nothing: overwrites column names of the specification object
source
StructuralEquationModels.SemObservedMissingType

For observed data with missing values.

Constructor

SemObservedMissing(;
    specification,
    data,
    obs_colnames = nothing,
    kwargs...)

Arguments

  • specification: either a RAMMatrices or ParameterTable object (1)
  • data: observed data
  • obs_colnames::Vector{Symbol}: column names of the data (if the object passed as data does not have column names, i.e. is not a data frame)

Extended help

Interfaces

  • n_obs(::SemObservedMissing) -> number of observed data points

  • n_man(::SemObservedMissing) -> number of manifest variables

  • get_data(::SemObservedMissing) -> observed data

  • data_rowwise(::SemObservedMissing) -> observed data as vector per observation, with missing values deleted

  • patterns(::SemObservedMissing) -> indices of non-missing variables per missing patterns

  • patterns_not(::SemObservedMissing) -> indices of missing variables per missing pattern

  • rows(::SemObservedMissing) -> row indices of observed data points that belong to each pattern

  • pattern_n_obs(::SemObservedMissing) -> number of data points per pattern

  • pattern_nvar_obs(::SemObservedMissing) -> number of non-missing observed variables per pattern

  • obs_mean(::SemObservedMissing) -> observed mean per pattern

  • obs_cov(::SemObservedMissing) -> observed covariance per pattern

  • em_model(::SemObservedMissing) -> EmMVNModel that contains the covariance matrix and mean vector found via optimization maximization

Implementation

Subtype of SemObserved

Remarks

(1) the specification argument can also be nothing, but this turns of checking whether the observed data/covariance columns are in the correct order! As a result, you should only use this if you are sure your observed data is in the right format.

Additional keyword arguments:

  • spec_colnames::Vector{Symbol} = nothing: overwrites column names of the specification object
source

imply

StructuralEquationModels.SemImplyType

Supertype of all objects that can serve as the imply field of a SEM. Computed model-implied values that should be compared with the observed data to find parameter estimates, e. g. the model implied covariance or mean. If you would like to implement a different notation, e.g. LISREL, you should implement a subtype of SemImply.

source
StructuralEquationModels.RAMType

Model implied covariance and means via RAM notation.

Constructor

RAM(;
    specification,
    meanstructure = false,
    gradient = true,
    kwargs...)

Arguments

  • specification: either a RAMMatrices or ParameterTable object
  • meanstructure::Bool: does the model have a meanstructure?
  • gradient::Bool: is gradient-based optimization used

Extended help

Implementation

Subtype of SemImply.

RAM notation

The model implied covariance matrix is computed as

\[ \Sigma = F(I-A)^{-1}S(I-A)^{-T}F^T\]

and for models with a meanstructure, the model implied means are computed as

\[ \mu = F(I-A)^{-1}M\]

Interfaces

  • identifier(::RAM)-> Dict containing the parameter labels and their position

  • n_par(::RAM) -> Number of parameters

  • Σ(::RAM) -> model implied covariance matrix

  • μ(::RAM) -> model implied mean vector

RAM matrices for the current parameter values:

  • A(::RAM)
  • S(::RAM)
  • F(::RAM)
  • M(::RAM)

Jacobians of RAM matrices w.r.t to the parameter vector θ

  • ∇A(::RAM) -> $∂vec(A)/∂θᵀ$
  • ∇S(::RAM) -> $∂vec(S)/∂θᵀ$
  • ∇M(::RAM) = $∂M/∂θᵀ$

Vector of indices of each parameter in the respective RAM matrix:

  • A_indices(::RAM)
  • S_indices(::RAM)
  • M_indices(::RAM)

Additional interfaces

  • F⨉I_A⁻¹(::RAM) -> $F(I-A)^{-1}$
  • F⨉I_A⁻¹S(::RAM) -> $F(I-A)^{-1}S$
  • I_A(::RAM) -> $I-A$
  • has_meanstructure(::RAM) -> Val{Bool} does the model have a meanstructure?

Only available in gradient! calls:

  • I_A⁻¹(::RAM) -> $(I-A)^{-1}$
source
StructuralEquationModels.RAMSymbolicType

Subtype of SemImply that implements the RAM notation with symbolic precomputation.

Constructor

RAMSymbolic(;specification,
    vech = false,
    gradient = true,
    hessian = false,
    approximate_hessian = false,
    meanstructure = false,
    kwargs...)

Arguments

  • specification: either a RAMMatrices or ParameterTable object
  • meanstructure::Bool: does the model have a meanstructure?
  • gradient::Bool: is gradient-based optimization used
  • hessian::Bool: is hessian-based optimization used
  • approximate_hessian::Bool: for hessian based optimization: should the hessian be approximated
  • vech::Bool: should the half-vectorization of Σ be computed (instead of the full matrix) (automatically set to true if any of the loss functions is SemWLS)

Extended help

Implementation

Subtype of SemImply.

Interfaces

  • identifier(::RAMSymbolic)-> Dict containing the parameter labels and their position

  • n_par(::RAMSymbolic) -> Number of parameters

  • Σ(::RAMSymbolic) -> model implied covariance matrix

  • μ(::RAMSymbolic) -> model implied mean vector

Jacobians (only available in gradient! calls)

  • ∇Σ(::RAMSymbolic) -> $∂vec(Σ)/∂θᵀ$

  • ∇μ(::RAMSymbolic) -> $∂μ/∂θᵀ$

  • ∇Σ_function(::RAMSymbolic) -> function to overwrite ∇Σ in place, i.e. ∇Σ_function(∇Σ, θ). Normally, you do not want to use this but simply query ∇Σ(::RAMSymbolic).

Hessians The computation of hessians is more involved, and uses the "chain rule for hessian matrices". Therefore, we desribe it at length in the mathematical appendix of the online documentation, and the relevant interfaces are omitted here.

Additional interfaces

  • has_meanstructure(::RAMSymbolic) -> Val{Bool} does the model have a meanstructure?

RAM notation

The model implied covariance matrix is computed as

\[ \Sigma = F(I-A)^{-1}S(I-A)^{-T}F^T\]

and for models with a meanstructure, the model implied means are computed as

\[ \mu = F(I-A)^{-1}M\]

source
StructuralEquationModels.ImplyEmptyType

Empty placeholder for models that don't need an imply part. (For example, models that only regularize parameters.)

Constructor

ImplyEmpty(;specification, kwargs...)

Arguments

  • specification: either a RAMMatrices or ParameterTable object

Examples

A multigroup model with ridge regularization could be specified as a SemEnsemble with one model per group and an additional model with ImplyEmpty and SemRidge for the regularization part.

Extended help

Interfaces

  • identifier(::RAMSymbolic)-> Dict containing the parameter labels and their position
  • n_par(::RAMSymbolic) -> Number of parameters

Implementation

Subtype of SemImply.

source

loss functions

StructuralEquationModels.SemLossType
SemLoss(args...; loss_weights = nothing, ...)

Constructs the loss field of a SEM. Can contain multiple SemLossFunctions, the model is optimized over their sum. See also SemLossFunction.

Arguments

  • args...: Multiple SemLossFunctions.
  • loss_weights::Vector: Weights for each loss function. Defaults to unweighted optimization.

Examples

my_ml_loss = SemML(...)
my_ridge_loss = SemRidge(...)
my_loss = SemLoss(SemML, SemRidge; loss_weights = [1.0, 2.0])
source
StructuralEquationModels.SemMLType

Maximum likelihood estimation.

Constructor

SemML(;observed, meanstructure = false, approximate_hessian = false, kwargs...)

Arguments

  • observed::SemObserved: the observed part of the model
  • meanstructure::Bool: does the model have a meanstructure?
  • approximate_hessian::Bool: if hessian-based optimization is used, should the hessian be swapped for an approximation

Examples

my_ml = SemML(observed = my_observed)

Interfaces

Analytic gradients are available, and for models without a meanstructure, also analytic hessians.

Extended help

Implementation

Subtype of SemLossFunction.

source
StructuralEquationModels.SemFIMLType

Full information maximum likelihood estimation. Can handle observed data with missings.

Constructor

SemFIML(;observed, specification, kwargs...)

Arguments

  • observed::SemObservedMissing: the observed part of the model
  • specification: either a RAMMatrices or ParameterTable object

Examples

my_fiml = SemFIML(observed = my_observed, specification = my_parameter_table)

Interfaces

Analytic gradients are available.

Extended help

Implementation

Subtype of SemLossFunction.

source
StructuralEquationModels.SemWLSType

Weighted least squares estimation.

Constructor

SemWLS(;
    observed,
    meanstructure = false,
    wls_weight_matrix = nothing,
    wls_weight_matrix_mean = nothing,
    approximate_hessian = false,
    kwargs...)

Arguments

  • observed: the SemObserved part of the model
  • meanstructure::Bool: does the model have a meanstructure?
  • approximate_hessian::Bool: should the hessian be swapped for an approximation
  • wls_weight_matrix: the weight matrix for weighted least squares. Defaults to GLS estimation ($0.5*(D^T*kron(S,S)*D)$ where D is the duplication matrix and S is the inverse of the observed covariance matrix)
  • wls_weight_matrix_mean: the weight matrix for the mean part of weighted least squares. Defaults to GLS estimation (the inverse of the observed covariance matrix)

Examples

my_wls = SemWLS(observed = my_observed)

Interfaces

Analytic gradients are available, and for models without a meanstructure, also analytic hessians.

Extended help

Implementation

Subtype of SemLossFunction.

source
StructuralEquationModels.SemRidgeType

Ridge regularization.

Constructor

SemRidge(;α_ridge, which_ridge, n_par, parameter_type = Float64, imply = nothing, kwargs...)

Arguments

  • α_ridge: hyperparameter for penalty term
  • which_ridge::Vector: Vector of parameter labels (Symbols) or indices that indicate which parameters should be regularized.
  • n_par::Int: number of parameters of the model
  • imply::SemImply: imply part of the model
  • parameter_type: type of the parameters

Examples

my_ridge = SemRidge(;α_ridge = 0.02, which_ridge = [:λ₁, :λ₂, :ω₂₃], n_par = 30, imply = my_imply)

Interfaces

Analytic gradients and hessians are available.

Extended help

Implementation

Subtype of SemLossFunction.

source
StructuralEquationModels.SemConstantType

Constant loss term. Can be used for comparability to other packages.

Constructor

SemConstant(;constant_loss, kwargs...)

Arguments

  • constant_loss::Number: constant to add to the objective

Examples

    my_constant = SemConstant(constant_loss = 42.0)

Interfaces

Analytic gradients and hessians are available.

Extended help

Implementation

Subtype of SemLossFunction.

source

optimizer

StructuralEquationModels.SemOptimizerType

Supertype of all objects that can serve as the optimizer field of a SEM. Connects the SEM to its optimization backend and controls options like the optimization algorithm. If you want to connect the SEM package to a new optimization backend, you should implement a subtype of SemOptimizer.

source
StructuralEquationModels.SemOptimizerOptimType

Connects to Optim.jl as the optimization backend.

Constructor

SemOptimizerOptim(;
    algorithm = LBFGS(), 
    options = Optim.Options(;f_tol = 1e-10, x_tol = 1.5e-8), 
    kwargs...)

Arguments

  • algorithm: optimization algorithm.
  • options::Optim.Options: options for the optimization algorithm

Usage

All algorithms and options from the Optim.jl library are available, for more information see the Optim.jl online documentation.

Examples

my_optimizer = SemOptimizerOptim()

# hessian based optimization with backtracking linesearch and modified initial step size
using Optim, LineSearches

my_newton_optimizer = SemOptimizerOptim(
    algorithm = Newton(
        ;linesearch = BackTracking(order=3), 
        alphaguess = InitialHagerZhang()
    )
)

Extended help

Interfaces

  • algorithm(::SemOptimizerOptim)
  • options(::SemOptimizerOptim)

Implementation

Subtype of SemOptimizer.

source
StructuralEquationModels.SemOptimizerNLoptType

Connects to NLopt.jl as the optimization backend.

Constructor

SemOptimizerNLopt(;
    algorithm = :LD_LBFGS,
    options = Dict{Symbol, Any}(),
    local_algorithm = nothing, 
    local_options = Dict{Symbol, Any}(), 
    equality_constraints = Vector{NLoptConstraint}(), 
    inequality_constraints = Vector{NLoptConstraint}(), 
    kwargs...)

Arguments

  • algorithm: optimization algorithm.
  • options::Dict{Symbol, Any}: options for the optimization algorithm
  • local_algorithm: local optimization algorithm
  • local_options::Dict{Symbol, Any}: options for the local optimization algorithm
  • equality_constraints::Vector{NLoptConstraint}: vector of equality constraints
  • inequality_constraints::Vector{NLoptConstraint}: vector of inequality constraints

Example

my_optimizer = SemOptimizerNLopt()

# constrained optimization with augmented lagrangian
my_constrained_optimizer = SemOptimizerNLopt(;
    algorithm = :AUGLAG,
    local_algorithm = :LD_LBFGS,
    local_options = Dict(:ftol_rel => 1e-6),
    inequality_constraints = NLoptConstraint(;f = my_constraint, tol = 0.0),
)

Usage

All algorithms and options from the NLopt library are available, for more information see the NLopt.jl package and the NLopt online documentation. For information on how to use inequality and equality constraints, see Constrained optimization in our online documentation.

Extended help

Interfaces

  • algorithm(::SemOptimizerNLopt)
  • local_algorithm(::SemOptimizerNLopt)
  • options(::SemOptimizerNLopt)
  • local_options(::SemOptimizerNLopt)
  • equality_constraints(::SemOptimizerNLopt)
  • inequality_constraints(::SemOptimizerNLopt)

Implementation

Subtype of SemOptimizer.

source