ruk·si

# R - Formula Objects

Updated at 2017-10-16 16:59

Formulae convey a relationship among a set of variables. We can define a formula without having any data loaded.

``````
~ x
# formula that defines a single independent variable, "x", pretty useless

y ~ x
# one dependent variable, translates to "y" depends on "x".
``````
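
A formula is an ordinary R object, so it can be inspected before any data exists. The following is a minimal sketch using base R only; the names `f` and `g` are chosen here for illustration.

```r
# Build a formula with no data loaded; x and y need not exist as variables.
f <- y ~ x

class(f)      # "formula"
all.vars(f)   # "y" "x"
length(f)     # 3: the `~` call, the left side, the right side
f[[2]]        # the left-hand side, the symbol y
f[[3]]        # the right-hand side, the symbol x

# A one-sided formula has no left-hand side.
g <- ~ x
length(g)     # 2
```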

`~` creates a formula object. Different libraries use formulas differently, but the original intent was to let you specify which variables the left side depends on.
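
As a rough illustration of how the same notation is read differently, here is a sketch with two functions from the `stats` package and the built-in `mtcars` dataset. The choice of functions and columns is an assumption made for this example, not something the note above prescribes.

```r
# lm() reads the formula as "model mpg as a linear function of wt".
fit <- lm(mpg ~ wt, data = mtcars)
names(coef(fit))   # intercept plus one coefficient for wt

# aggregate() reads the formula as "summarise mpg within each level of cyl".
means <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
means
```

In both calls the left side is the outcome, but only `lm()` treats the right side as a model specification; `aggregate()` treats it as a grouping variable.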

``````
# left of ~ is the dependent variable, the "outcome" or "result"
# right of ~ are the independent/predictor/covariate variables

myFormula <- Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width
myFormula
# Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width
# you would read this as "Species depends on Sepal.Length, Sepal.Width..."

allFormula <- Species ~ .
allFormula
# Species ~ .
# . in a formula translates to "all columns of the data not otherwise in the formula"
# you would read this as "Species depends on all the other variables."
``````
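
The `.` only gets a concrete meaning once a data frame is supplied. A small sketch of that expansion, assuming the built-in `iris` dataset and the `terms()` function from `stats`:

```r
# Expand the dot against the columns of iris.
expanded <- terms(Species ~ ., data = iris)

# The predictors that "." stands for in this data frame.
attr(expanded, "term.labels")

# The same information as a fully written-out formula.
formula(expanded)
```
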
• An expression of the form `y ~ model` is interpreted as a specification that the response `y` is modeled by a linear predictor specified symbolically by `model`.
• `+` operator is used to separate terms in a model.
• `:` operator joins variable and factor names within a term; such a term, e.g. `a:b`, is interpreted as the interaction of all the variables and factors in it.
• `*` operator denotes factor crossing: `a*b` interpreted as `a+b+a:b`.
• `^` operator indicates crossing to the specified degree: `(a+b+c)^2` is identical to `(a+b+c)*(a+b+c)`.
• `%in%` operator indicates that the terms on its left are nested within those on the right: `a + b %in% a` expands to the formula `a + a:b`.
• `-` operator removes the specified terms: `(a+b+c)^2 - a:b` is identical to `a + b + c + b:c + a:c`.
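
Several of the identities in this list can be checked symbolically, without any data, by asking `terms()` for the expanded term labels. This is a minimal sketch; `term_labels` and `same_terms` are hypothetical helpers defined here for illustration, and the comparison ignores the order of terms.

```r
# Hypothetical helper: list the expanded term labels of a formula.
term_labels <- function(f) attr(terms(f), "term.labels")

# Hypothetical helper: do two formulas expand to the same set of terms?
same_terms <- function(f1, f2) setequal(term_labels(f1), term_labels(f2))

# Print a few expansions to see what the operators produce.
term_labels(y ~ a * b)
term_labels(y ~ (a + b + c)^2)
term_labels(y ~ (a + b + c)^2 - a:b)

# Check the equivalences stated in the list above.
same_terms(y ~ a * b, y ~ a + b + a:b)
same_terms(y ~ (a + b + c)^2, y ~ (a + b + c) * (a + b + c))
same_terms(y ~ (a + b + c)^2 - a:b, y ~ a + b + c + b:c + a:c)
```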