# Q-Step Glossary

**Causation** -- The idea that a change in an independent variable changes the value of a dependent variable.

**Code** -- A set of rules or instructions to tell your computer what you want it to do.

**Computer lab** -- A room offering access to computing resources, where students will learn how to write code in R to analyse data to answer social scientific questions.

**Constant** -- An unknown value that does not vary.

**Correlation** -- A measure of linear association between two variables. Correlation is not the same as causation.
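
As a rough illustration of the definition above, the Pearson correlation coefficient can be computed by hand. This is a minimal sketch written in Python for illustration only (the course itself works in R); the function name `pearson_correlation` and the education and income figures are hypothetical, made up for this example.

```python
from math import sqrt


def pearson_correlation(x, y):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from the means (unnormalised covariance)
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Sums of squared deviations (unnormalised variances)
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: years of education and income (in thousands)
education = [10, 12, 14, 16, 18]
income = [20, 24, 30, 33, 41]

r = pearson_correlation(education, income)
print(round(r, 3))
```

The result always lies between -1 and 1. A value close to 1 indicates a strong positive linear association, but on its own it says nothing about whether education causes income.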

**Data** -- Facts from which conclusions can be drawn.

**Dependent variable** -- The variable whose values are supposed to be explained by changes in the independent variables.

**Error** -- The difference between an observed value and the predicted value of the dependent variable.

**Estimator** -- A rule for calculating an estimate of a parameter. For instance, OLS is an estimator for the regression coefficients.

**Independent variable** -- The variable that is supposed to explain changes in the dependent variable.

**Linear regression model** -- A statistical model that represents a continuous dependent variable as a linear function of the parameters (regression coefficients) and the independent variables, plus an error term that follows a normal distribution. The regression coefficients are usually estimated via OLS. For instance, one might say that income depends on education plus chance events. Expressed as a linear regression model, this would mean that income = a + b*education + e, where a is a constant, b is the regression coefficient of education, and e is the error term.
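
The income example above can also be written out for each unit. In this sketch the subscript i (indexing units) and the error variance σ² are notational assumptions not used elsewhere in this glossary:

```latex
\text{income}_i = a + b \cdot \text{education}_i + e_i,
\qquad e_i \sim N(0, \sigma^2)
```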

**Model (statistical)** -- An idealised representation of the process that generated the values of the dependent variable.

**Normal distribution** -- A symmetric, bell-shaped distribution.

**OLS (ordinary least squares)** -- An estimator that minimises the sum of squared errors.
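
To make the idea of minimising the sum of squared errors concrete, here is a minimal sketch for the case of one independent variable, using the standard closed-form OLS solution. It is written in Python for illustration only (the course itself works in R); the function names `ols_fit` and `sum_squared_errors` and the data are hypothetical.

```python
def ols_fit(x, y):
    """Return (a, b) for the line y = a + b*x that minimises the sum of squared errors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx            # slope (regression coefficient)
    a = mean_y - b * mean_x  # intercept (constant)
    return a, b


def sum_squared_errors(x, y, a, b):
    """Sum of squared differences between observed and predicted values."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data: years of education and income (in thousands)
education = [10, 12, 14, 16, 18]
income = [20, 24, 30, 33, 41]

a, b = ols_fit(education, income)
sse = sum_squared_errors(education, income, a, b)
print(a, b, sse)
```

Any other choice of a and b gives a sum of squared errors at least as large as `sse`, which is the sense in which OLS is the "least squares" estimator.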

**Parameter** -- An unknown quantity that describes a population or a model, such as the population mean.

**R** -- A free software environment for statistical computing and graphics.

**Regression coefficient** -- An estimate of how much the dependent variable changes, on average, when the independent variable increases by one unit.

**Statistical distribution** -- A function showing all possible values of a variable and their relative frequencies.

**Unit** -- A member of a population. For instance, a voter (unit) is a member of the electorate (population).

**Variable** -- A numerical value that can differ across units. For instance, a voter's age (variable) can be 18 years, 19 years, and so on.