Formula Objects

Updated at 2017-10-16 16:59

In R, a formula expresses a relationship among a set of variables. A formula can be defined without any data loaded.

~ x
# one-sided formula with a single independent variable, "x"; rarely useful on its own

y ~ x
# one dependent variable and one independent variable; read it as "y depends on x"

The ~ operator creates a formula object. Different libraries use formula objects in different ways, but the original intent was to let you specify which variables the left side depends on.
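A minimal sketch of inspecting a formula object with base R only. The names one_sided and two_sided are made up for illustration; no data needs to be loaded, because the formula stores unevaluated symbols.

```r
# Formulas can be created and inspected with no data loaded.
one_sided <- ~ x
two_sided <- y ~ x

class(two_sided)      # "formula"
length(one_sided)     # 2: the ~ operator and the right-hand side
length(two_sided)     # 3: the ~ operator, the left-hand side, the right-hand side

two_sided[[2]]        # left-hand side, the symbol y
two_sided[[3]]        # right-hand side, the symbol x
all.vars(two_sided)   # "y" "x"
```

Because the pieces are plain symbols, x and y do not have to exist until a function evaluates the formula against some data.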

# left of ~ is the dependent variable, the "outcome" or "result"
# right of ~ are the independent/predictor/covariate variables

myFormula <- Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width
myFormula
# Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width
# you would read this as "Species depends on Sepal.Length, Sepal.Width..."
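As one hedged illustration of how a function might consume such a formula, the sketch below passes it to base R's model.frame(). It assumes the variable names refer to columns of R's built-in iris dataset, which the text above does not state explicitly.

```r
# Assumption: the variable names refer to columns of the built-in iris dataset.
myFormula <- Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width

# model.frame() evaluates the formula against the data and keeps only
# the referenced columns, with the dependent variable first.
mf <- model.frame(myFormula, data = iris)

names(mf)
# "Species" "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
dim(mf)
# 150 5
```

Modeling functions typically perform a step like this internally, which is why the same formula can be reused across different libraries.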

allFormula <- Species ~ .
allFormula
# Species ~ .
# . in a formula translates to "all variables in the data not already used in the formula"
# you would read this as "Species depends on all the other variables."
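The dot only gains a meaning once data is supplied. A small sketch, again assuming the built-in iris dataset, that uses terms() to expand it:

```r
allFormula <- Species ~ .

# With data supplied, terms() replaces . by every column not already used.
expanded <- terms(allFormula, data = iris)
attr(expanded, "term.labels")
# "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"

# Without data there is nothing to expand . into, so terms() signals an error;
# tryCatch() captures the message instead of stopping the script.
no_data <- tryCatch(terms(allFormula), error = function(e) conditionMessage(e))
```

For iris, the expanded term labels match the right-hand side of the longer formula written out by hand.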

  • An expression of the form y ~ model is interpreted as a specification that the response y is modeled by a linear predictor specified symbolically by model.
  • The + operator separates the terms of a model.
  • The : operator joins variable and factor names within a term; such a term is interpreted as the interaction of all the variables and factors appearing in it.
  • The * operator denotes factor crossing: a*b is interpreted as a + b + a:b.
  • The ^ operator indicates crossing to the specified degree: (a+b+c)^2 is identical to (a+b+c)*(a+b+c), which expands to the main effects of a, b and c together with their second-order interactions.
  • The %in% operator indicates that the terms on its left are nested within those on its right: a + b %in% a expands to a + a:b.
  • The - operator removes the specified terms: (a+b+c)^2 - a:b is identical to a + b + c + b:c + a:c.
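The expansions in the list above can be checked without any data by asking terms() for the term labels. In this sketch, term_labels is a hypothetical helper defined here for illustration, not a base R function.

```r
# Hypothetical helper: return the expanded term labels of a formula.
term_labels <- function(f) attr(terms(f), "term.labels")

term_labels(~ a * b)
# "a" "b" "a:b"

term_labels(~ (a + b + c)^2)
# "a" "b" "c" "a:b" "a:c" "b:c"

term_labels(~ (a + b + c)^2 - a:b)
# "a" "b" "c" "a:c" "b:c"
```

terms() orders the labels by interaction degree, so the order can differ from the way a formula was typed even though the set of terms is the same.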
