Our Concept of a Structural Equation Model

In our package, a structural equation model (a Sem) is built from one or more loss terms. Fitting the model means finding the parameters that minimize the (weighted) sum of all of its loss terms. This simple idea is remarkably general: within the same structure it covers a single SEM fit by maximum likelihood, a regularized SEM (e.g. maximum likelihood plus a ridge penalty), and multigroup models (one SEM term per group).

[Figure: SEM concept]

A loss term is anything of type AbstractLoss — a function that maps the model parameters to a number that should be minimized. There are two kinds of loss terms:

SEM loss functions (SemLoss), such as SemML, SemWLS and SemFIML, measure how well the model explains the data. To do so, each SemLoss bundles its own observed part (the data) and implied part (what the model implies about the data). They are the heart of a SEM.
Other loss functions, such as the regularization terms SemRidge and SemConstant, depend only on the parameters and therefore need neither an observed nor an implied part.

Because a model is just a (weighted) sum of loss terms, you can freely combine them. For example, ridge-regularized full information maximum likelihood estimation is a model with two loss terms, a SemFIML term and a SemRidge term. A two-group model is a model with two SemML terms, one per group, weighted by the respective sample sizes.
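The "weighted sum of loss terms" idea can be sketched in a few lines of plain Julia. This is only a toy illustration and does not use the package's API: the names `ToyLoss`, `ToyQuadratic`, `ToyRidge`, `evaluate`, and `toy_objective` are hypothetical and were introduced just for this example.

```julia
# Toy illustration only: these types and functions are hypothetical and are
# NOT part of the package's API.
abstract type ToyLoss end

# Stand-in for a data-dependent loss: squared distance to a target vector.
struct ToyQuadratic <: ToyLoss
    target::Vector{Float64}
end

# Stand-in for a ridge penalty: α times the sum of squared parameters.
struct ToyRidge <: ToyLoss
    α::Float64
end

evaluate(l::ToyQuadratic, θ) = sum(abs2, θ .- l.target)
evaluate(l::ToyRidge, θ) = l.α * sum(abs2, θ)

# The model objective is the weighted sum of all loss terms.
toy_objective(terms, weights, θ) =
    sum(w * evaluate(t, θ) for (t, w) in zip(terms, weights))

θ = [1.0, 2.0]
terms = [ToyQuadratic([0.0, 0.0]), ToyRidge(0.5)]
weights = [1.0, 1.0]
value = toy_objective(terms, weights, θ)  # 1.0 * 5.0 + 1.0 * (0.5 * 5.0)
```

Adding a further term (for instance, a second group's loss with its own weight) only means appending to `terms` and `weights`; the objective function itself does not change.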

All models are subtypes of AbstractSem. The default Sem computes the weighted sum of its loss terms together with their (analytic) gradients. SemFiniteDiff is an alternative that approximates the gradient with finite differences, which is useful for loss functions that do not provide an analytic gradient.
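To illustrate the difference between an analytic gradient and a finite-difference approximation, here is a minimal sketch in plain Julia. It is not the package's implementation; the function names are hypothetical, and the use of central differences with step size `h = 1e-6` is an assumption made for this example.

```julia
# Toy illustration only: not the package's implementation.
f(θ) = sum(abs2, θ) + θ[1] * θ[2]                  # example objective
analytic_grad(θ) = [2θ[1] + θ[2], 2θ[2] + θ[1]]    # its exact gradient

# Central finite differences: perturb one coordinate at a time.
function finite_diff_grad(f, θ; h = 1e-6)
    g = similar(θ)
    for i in eachindex(θ)
        e = zeros(length(θ))
        e[i] = h
        g[i] = (f(θ .+ e) - f(θ .- e)) / (2h)
    end
    return g
end

θ = [1.0, -2.0]
g_fd = finite_diff_grad(f, θ)   # approximation
g_an = analytic_grad(θ)         # exact: [0.0, -3.0]
```

The finite-difference version needs two objective evaluations per parameter, which is why an analytic gradient is usually preferable when one is available.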

The parts of a SEM loss

Each SEM loss function (SemLoss) is itself composed of interchangeable building blocks (like 'Legos'): an observed part and an implied part. To make precise which objects can play each role, we require them to have a certain type:

[Figure: SEM concept, with types]

So everything that can serve as the observed part has to be of type SemObserved, everything that can serve as the implied part has to be of type SemImplied, and the loss function that combines them is a SemLoss. To fit the model, you additionally choose a SemOptimizer; it connects to the numerical optimization backend but is not itself part of the model.

Here is an overview of the available building blocks:

`SemObserved`	`SemImplied`	`AbstractLoss`	`SemOptimizer`
`SemObservedData`	`RAM`	`SemML`	`:Optim`
`SemObservedCovariance`	`RAMSymbolic`	`SemWLS`	`:NLopt`
`SemObservedMissing`	`ImpliedEmpty`	`SemFIML`	`:Proximal`
		`SemRidge`
		`SemConstant`

The rest of this page explains each building block and its available options. The API - model parts section at the end serves as a detailed reference. How to combine the building blocks into a final model is explained in the section on Model Construction.

The observed part aka `SemObserved`

The observed part contains all necessary information about the observed data, and pre-computes the statistics a loss function needs from it — for example the observed covariance matrix, or the different patterns of missingness used for full information maximum likelihood (FIML) estimation. Currently, we have three options: SemObservedData for fully observed datasets, SemObservedCovariance for observed covariances (and means) and SemObservedMissing for data that contains missing values.
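As a rough illustration of the kind of statistics an observed part might pre-compute, the following sketch derives a mean vector and covariance matrix from a small made-up dataset using Julia's `Statistics` standard library. The data and variable names are invented for this example, and the sketch makes no claim about how the package computes or stores these quantities (for instance, which denominator it uses for the covariance).

```julia
using Statistics

# Made-up data: 5 observations (rows) of 2 variables (columns).
data = [1.0 2.0;
        2.0 4.1;
        3.0 5.9;
        4.0 8.2;
        5.0 9.8]

n_obs    = size(data, 1)                # number of observations
obs_mean = vec(mean(data; dims = 1))    # sample mean of each variable
obs_cov  = cov(data)                    # sample covariance (n - 1 denominator)
```

Computing these once up front means a loss function can reuse them at every iteration instead of revisiting the raw data.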

The implied part aka `SemImplied`

The implied part defines how the model-implied statistics (for example, the model-implied covariance matrix and mean vector) are computed from the parameters. There are two main options at the moment: RAM, which uses the reticular action model to compute the model-implied covariance matrix, and RAMSymbolic, which does the same but symbolically pre-computes part of the model, making subsequent model fitting faster (see Symbolic precomputation). A third option, ImpliedEmpty, can serve as a placeholder for loss terms that do not need an implied part.
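For readers unfamiliar with the reticular action model, the sketch below computes a model-implied covariance matrix with the textbook RAM formula Σ = F (I − A)⁻¹ S (I − A)⁻ᵀ Fᵀ for a one-factor model with three indicators. The parameter values are made up, the matrix names A, S, and F follow common RAM notation, and the code uses only Julia's `LinearAlgebra` standard library; it is not the package's `RAM` implementation.

```julia
using LinearAlgebra

# One latent factor η measured by three indicators x1, x2, x3.
# Variable order in all matrices: x1, x2, x3, η.
λ = [1.0, 0.8, 0.6]    # loadings (made-up values)
ε = [0.5, 0.4, 0.3]    # residual variances (made-up values)
φ = 2.0                # latent variance (made-up value)

A = zeros(4, 4)        # directed effects: A[i, j] is the effect of j on i
A[1:3, 4] .= λ
S = Diagonal([ε; φ])   # undirected effects: variances and covariances
F = [Matrix(1.0I, 3, 3) zeros(3, 1)]   # filter that selects observed variables

B = inv(I - A)
Σ = F * B * S * B' * F'   # model-implied covariance of x1, x2, x3
```

For this simple model the result equals φ·λλᵀ plus the diagonal matrix of residual variances, which is a convenient check on the formula.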

The loss functions aka `AbstractLoss`

The loss terms specify the objective that is minimized to find the parameter estimates; a model minimizes the (weighted) sum of all its loss terms. SEM loss functions (SemLoss) compare what the model implies to the observed data, while regularization terms depend only on the parameters. Available loss functions are

SemML: maximum likelihood estimation
SemWLS: weighted least squares estimation
SemFIML: full-information maximum likelihood estimation
SemRidge: ridge regularization
SemConstant: adds a constant to the objective
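To give a feel for what such loss functions compute, here is a sketch of the textbook maximum likelihood discrepancy, log|Σ| + tr(SΣ⁻¹) − log|S| − p, together with a plain ridge penalty. The function names `ml_discrepancy` and `ridge_penalty` are hypothetical, and the package's SemML and SemRidge may differ in scaling, constants, or which parameters are penalized.

```julia
using LinearAlgebra

# Textbook ML discrepancy between an observed covariance S and a
# model-implied covariance Σ (p = number of observed variables).
function ml_discrepancy(Σ, S)
    p = size(S, 1)
    return logdet(Σ) + tr(S / Σ) - logdet(S) - p
end

# Ridge penalty: α times the sum of squared (penalized) parameters.
ridge_penalty(θ, α) = α * sum(abs2, θ)

S = [2.0 0.5; 0.5 1.0]
fit_perfect = ml_discrepancy(S, S)                    # zero when Σ equals S
fit_worse   = ml_discrepancy([2.0 0.0; 0.0 1.0], S)   # positive under misfit
penalty     = ridge_penalty([1.0, -2.0], 0.1)         # 0.1 * (1 + 4)
```

The discrepancy is zero when the implied covariance reproduces the observed one and grows as the two diverge, whereas the ridge penalty depends on the parameters alone.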

The optimizer aka `SemOptimizer`

The optimizer connects to the numerical optimization backend used to fit the model. It is not part of the model itself, but is chosen when fitting (see Model fitting). It can be used to control options like the optimization algorithm, linesearch, stopping criteria, etc. There are currently three available engines (i.e., backends used to carry out the numerical optimization), namely :Optim, connecting to the Optim.jl backend; :NLopt, connecting to the NLopt.jl backend; and :Proximal, connecting to ProximalAlgorithms.jl. For more information about the available options, see the tutorials on Using Optim.jl and Using NLopt.jl, as well as Constrained optimization and Regularization.
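The separation between "what is minimized" and "how it is minimized" can be illustrated with a toy optimizer. The sketch below is a fixed-step gradient descent written in plain Julia; `toy_fit` and its keyword arguments are hypothetical and merely stand in for a real backend, whose algorithms and options are considerably more sophisticated.

```julia
# Toy illustration only: a fixed-step gradient descent standing in for a
# real optimization backend. Not the package's API.
function toy_fit(grad, θ0; step = 0.1, tol = 1e-8, maxiter = 10_000)
    θ = copy(θ0)
    for iter in 1:maxiter
        g = grad(θ)
        # Stopping criterion: gradient norm below the tolerance.
        if sqrt(sum(abs2, g)) < tol
            return θ, iter
        end
        θ .-= step .* g
    end
    return θ, maxiter
end

# Objective: squared distance to `target`; its gradient is 2(θ - target).
target = [1.0, -2.0]
grad(θ) = 2 .* (θ .- target)

θ_hat, n_iter = toy_fit(grad, [0.0, 0.0])
```

Here `step`, `tol`, and `maxiter` play the role of optimizer options: changing them alters how the minimum is found, but not the objective being minimized.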

What to do next

You now have an understanding of our representation of structural equation models.

To learn more about how to use the package, you may visit the remaining tutorials.

If you want to learn how to extend the package (e.g., add a new loss function), you may visit Extending the package.

API - model parts

observed

StructuralEquationModels.SemObserved — Type

abstract type SemObserved

Supertype of all objects that can serve as the observed field of a SEM. Subtypes typically pre-process the data and compute sufficient statistics. If you have a special kind of data, you should implement a subtype of SemObserved.