B Terminology

B.1 Terms in Statistics

Bar chart

A graph used to display summary statistics such as the mean (in the case of a scale variable) or the frequency (in the case of a nominal variable).

Boxplot

a visual representation of data that shows central tendency (usually the median) and spread (usually the interquartile range) of a numeric variable for one or more groups; boxplots are often used to compare the distribution of a continuous variable across several groups

Case / Observation

A case is the unit of analysis; one person or other entity. In psychology, this is normally the data deriving from a single participant. In some research, the cases will not be people. For example, we may be interested in the average academic attainment for pupils from different schools. Here, the cases would be the schools. In R, a single row of data in a data frame represents a case.

Categorical variable

variable measured in categories; there are two types of categorical variables: ordinal variables have categories with a logical order (e.g., Likert scales), while nominal variables have categories with no logical order (e.g., religious affiliation)

Data

A set of values. A data set is typically made up of a number of variables. In quantitative research, data are numeric.

Descriptive statistics

Procedures that allow you to describe data by summarising, displaying or illustrating them. Often used as a general term for summary descriptive statistics: measures of central tendency and measures of dispersion. Graphs are descriptive statistics used to illustrate the data.

Frequency/ies

The number of times a particular value of a variable occurs.
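
As a small illustration, frequencies can be counted in R with table(). This is only a sketch; the vector `answers` is made up for the example.

```r
# Hypothetical answers to a yes/no question
answers <- c("yes", "no", "yes", "yes", "no")

# Count how often each value occurs
freq <- table(answers)
freq

# Frequency of a single value
freq[["yes"]]

# Relative frequencies (proportions)
prop.table(freq)
```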

Histogram

a visual display of data used to examine the distribution of a numeric variable

Line graph

a visual display of data often used to examine the relationship between two continuous variables or for something measured over time

Missing values

A data set may be incomplete, for example, if some observations or measurements failed or if participants didn’t respond to some questions. It is important to distinguish these missing data points from valid data. In R, a missing data point is represented by the reserved value NA. Codes that the user has chosen to mark missing data (user missing, e.g., 999) are not recognised as missing automatically and have to be recoded to NA.
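
A minimal sketch of how missing values behave in R. The vector `scores` and the code 999 are assumptions made up for this example, standing in for a user-defined missing code.

```r
# Hypothetical test scores; 999 is an assumed user-defined code for "missing"
scores <- c(12, 15, 999, 9, NA)

# Recode the user-defined code to NA so that R treats it as missing
scores[which(scores == 999)] <- NA

is.na(scores)               # TRUE where a value is missing
sum(is.na(scores))          # number of missing values
mean(scores)                # NA, because missing values propagate
mean(scores, na.rm = TRUE)  # mean of the valid values only
```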

Nominal data

Data collected at a level of measurement that yields nominal data (nominal just means ‘named’), also referred to as ‘categorical data,’ where the value does not imply anything other than a label; for example, 1 = male and 2 = female.

Observation / Case

An observation is the unit of analysis; one person or other entity. In psychology, this is normally the data deriving from a single participant. In some research, the observations will not be people. For example, we may be interested in the average academic attainment for pupils from different schools. Here, the observations would be the schools. In R, a single row of data in a data frame represents an observation.

Participant

People who take part in an experiment or research study. Previously, the word ‘subject’ was used, and still is in many statistics books.

Population

The total set of all possible scores for a particular variable.

Quantitative data

A term used to describe numeric data measured at any of the four levels of measurement. Sometimes, though, data measured on nominal scales are described as ‘qualitative data’ instead.

Sample

A subset of observations from some population that is often analyzed to learn about the population sampled.

Scatterplot

a graph that shows one dot for each observation in the data set

Summary statistics

used to provide an overview of the characteristics of a sample; this typically includes measures of central tendency and spread for numeric variables and the frequencies and percentages of categorical variables

Statistics

A general term for procedures for summarising or displaying data (descriptive statistics) and for analysing data (inferential statistical tests).

Variable

a measured characteristic of some entity (e.g., income, years of education, sex, height, blood pressure, smoking status, etc.). A variable in R is represented by a column in a data frame.

B.2 Terms in R

Argument

information input into a function that controls how the function behaves

Assigning

assigning a value to an object is done by using a left-arrow (<-), with the arrow separating the name of the object on the left from the expression itself on the right: object_name <- expression
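
For illustration only, with object names invented for the example:

```r
# Assign the value of an expression to an object
height_cm <- 170 + 5

# An existing object can be used in a new expression
height_m <- height_cm / 100

# Typing the name of an object prints its value
height_m
```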

Character

a basic data type in R that comprises things that cannot be used in mathematical operations; often, character variables are names, addresses, zip codes, or other similar values

Comment

text included in code but not executed; in R, a comment starts with a hash sign (#) and is often used to explain the code

Constants

Constants, as the name suggests, are entities whose value cannot be altered. Basic types of constant are double constants, integer constants, logical constants and character constants.

csv

a file extension indicating that the file contains comma separated values or semicolon separated values

Data frame

an object type in R that holds data with values in rows and columns with rows treated as observations and columns treated as variables
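
A small sketch of a data frame, using made-up data. The object `pupils` and its columns are assumptions for the example.

```r
# Three observations (rows) of three variables (columns)
pupils <- data.frame(
  name   = c("Ann", "Ben", "Cas"),
  age    = c(12, 13, 12),
  passed = c(TRUE, FALSE, TRUE)
)

nrow(pupils)   # number of observations
ncol(pupils)   # number of variables
pupils[2, ]    # the second observation
pupils$age     # the variable age
```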

Data management

the procedures used to prepare the data for analysis; data management often includes recoding variables, ensuring that missing values are treated properly, checking and fixing data types, and other data-cleaning procedures

Data types

in R, these include numeric (double, integer), character, logical; the data type suggests how a variable was measured and recorded or recoded, and different analytic strategies are used to manage and analyze different variable types
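
As a quick check, typeof() reports the data type of a value. The values below are arbitrary examples.

```r
typeof(3.14)      # "double"
typeof(3L)        # "integer" (the L suffix marks a whole number)
typeof("Leiden")  # "character"
typeof(TRUE)      # "logical"

# Both double and integer count as numeric
is.numeric(3.14)  # TRUE
is.numeric(3L)    # TRUE
```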

Expression

An expression is an instruction to perform a particular task. An expression is any sequence of R constants, object names, operators, function calls, and parentheses. An expression has a type as well as a value.

Factor

A categorical variable and its value labels. Value labels may be nothing more than “1,” “2,”…, if not assigned explicitly. More formally, a type of object that represents a categorical variable. It stores its labels in its levels attribute.
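
A minimal sketch of creating a factor with and without explicit value labels. The codes and labels are invented for the example.

```r
# Hypothetical numeric codes for smoking status
smoking_code <- c(1, 2, 2, 1, 3)

# Attach value labels to the codes
smoking <- factor(smoking_code,
                  levels = c(1, 2, 3),
                  labels = c("never", "former", "current"))

levels(smoking)      # the labels are stored in the levels attribute
as.integer(smoking)  # the underlying integer codes
table(smoking)       # frequencies per category

# Without explicit labels, the levels are the values as character strings
levels(factor(smoking_code))  # "1" "2" "3"
```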

Function

a set of machine-readable instructions to perform a task in R; often, the task is to conduct some sort of data management or analysis, but there are also functions that exist just for fun.

Index

The order number of a variable in a data set or the subscript of a value in an object. The number of the component in a list or data frame, or of an element in a vector.

Integer

a similar data type to numeric, but containing only whole numbers

Length

The number of observations/cases in a variable, including missing values, or the number of variables in a data set. For vectors, it is the number of its elements (including NAs). For lists or data frames, it is the number of its components.
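
To illustrate the difference between a vector and a data frame, here is a short sketch with made-up objects `x` and `df`:

```r
# For a vector, length() counts the elements, including NA
x <- c(4, NA, 7)
length(x)   # 3

# For a data frame, length() counts the columns (variables)
df <- data.frame(id = 1:4, score = c(10, 12, NA, 9))
length(df)  # 2
nrow(df)    # 4, the number of observations
```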

Levels

The values that a categorical variable can have. Actually stored as a part of the factor itself in what appears to be a very short character variable (even when the values themselves are numbers).

List

A set of objects of any class. Can contain vectors, data frames, matrices and even other lists.
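
A small sketch of a list holding objects of different kinds. The names are assumptions for the example.

```r
# A list can mix vectors, single values and other lists
results <- list(
  scores = c(7, 8, 9),
  label  = "pilot",
  nested = list(a = 1)
)

length(results)         # 3 components
results$label           # access a component by name
results[["scores"]][2]  # second element of the scores vector
```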

Matrix

A data set that must contain only one type of variable, e.g. all numeric or character. More formally, a two-dimensional array; that is, a vector with a dim attribute of length 2. Information, or data elements, stored in a rectangular format with rows and columns.
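
The following sketch, with arbitrary values, shows a matrix and what happens when types are mixed:

```r
# A 2 x 3 matrix; values are filled in column by column
m <- matrix(1:6, nrow = 2, ncol = 3)
dim(m)    # 2 3
m[2, 3]   # element in row 2, column 3

# Combining numbers and text forces everything to character
mixed <- cbind(c(1, 2), c("a", "b"))
typeof(mixed)  # "character"
```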

NA

the R placeholder for missing values, often translated as “not available.”

NaN

A missing value. Stands for Not a Number. Something that is undefined mathematically such as zero divided by zero.
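
A brief illustration of how NaN arises and how it relates to NA:

```r
0 / 0          # NaN
is.nan(0 / 0)  # TRUE
is.na(0 / 0)   # TRUE: NaN also counts as missing
is.nan(NA)     # FALSE: NA is missing, but not NaN
```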

NULL

An object you can use to drop variables or values. E.g. mydata$x <- NULL drops the variable x from the data set mydata. Assigning NULL to a component of a list or data frame removes that component.
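
A sketch of what assigning NULL does in two situations. The objects `mydata` and `z` are made up for the example.

```r
# Assigning NULL to a column removes it from the data frame
mydata <- data.frame(x = 1:3, y = c("a", "b", "c"))
mydata$x <- NULL
names(mydata)  # "y"

# Assigning NULL to a whole object keeps the name; its value is now NULL
z <- 5
z <- NULL
exists("z")    # TRUE
is.null(z)     # TRUE
# rm(z) would remove the object itself from the workspace
```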

Numeric

A variable that contains only numbers. It can be of type double or integer.

Object

information stored in R; data analysis and data management are then performed on these stored objects. Includes data frames, vectors, factors, matrices, arrays, lists and functions.

Operators

An operator is a symbol that tells R to perform a specific mathematical, logical, or other manipulation. R is rich in built-in operators and provides the following types: arithmetic operators, relational operators, logical operators, assignment operators, and miscellaneous operators.

Package

a collection of functions and datasets for use in R that usually has a specific purpose, such as conducting partial correlation analyses (ppcor package)

Precedence of operations

the order in which mathematical operations should be performed when solving an equation: parentheses, exponents, multiplication, division, addition, and subtraction (PEMDAS)
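
A few arbitrary expressions illustrate how R applies this order:

```r
2 + 3 * 4     # 14: multiplication before addition
(2 + 3) * 4   # 20: parentheses first
2 * 3^2       # 18: exponent before multiplication
-2^2          # -4: exponent before the minus sign
10 - 4 - 3    # 3: equal precedence is evaluated left to right
```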

Recycling rules

If one tries to add two structures with a different number of elements, the shorter one is recycled to the length of the longer one. For instance, if you add c(1, 2, 3) to a six-element vector, you will really add c(1, 2, 3, 1, 2, 3). If the length of the longer vector is not a multiple of the length of the shorter one, a warning is given.
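
The recycling rule can be sketched as follows. The vectors are arbitrary, and tryCatch() is used here only to capture the warning text instead of printing it.

```r
short <- c(1, 2, 3)
long  <- c(10, 20, 30, 40, 50, 60)

# short is recycled to c(1, 2, 3, 1, 2, 3)
long + short  # 11 22 33 41 52 63

# Lengths 3 and 2: the longer length is not a multiple of the shorter one,
# so R gives a warning; here the warning message is captured as text
result <- tryCatch(
  c(1, 2, 3) + c(10, 20),
  warning = function(w) conditionMessage(w)
)
result
```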

RMarkdown file

R Markdown provides an authoring framework for data science. You can use a single R Markdown file to both (1) save and execute code and (2) generate high-quality reports that can be shared with an audience.

sav

the file extension for a data file saved in a format for the Statistical Package for Social Sciences (SPSS) statistical software

Script file

a text file in R similar to something written in the Notepad text editor on a Windows computer or the TextEdit text editor on a Mac computer; it is saved with a .R file extension

Vector

A vector is a one-dimensional, homogeneous data structure. It can exist on its own in memory or it can be part of a data frame. More formally, a set of values that have the same base type. A vector can be of type character, logical, integer or double.
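
A short sketch with invented values, showing that all elements of a vector share one type:

```r
# An integer vector
ages <- c(23L, 31L, 45L)
typeof(ages)  # "integer"
ages[2]       # second element

# Mixing types coerces every element to a common type
mixed <- c(1, "two", TRUE)
typeof(mixed)  # "character"
mixed          # "1" "two" "TRUE"
```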

Working directory

R uses a working directory, where R will look, by default, for files you ask it to load. It is also where, by default, any files you write to disk will go.
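
As a sketch, the working directory can be inspected and changed with getwd() and setwd(). The example uses R's temporary directory and a made-up file name so that it does not touch your own files.

```r
# Remember the current working directory
old_wd <- getwd()

# Switch to R's temporary directory for this example
setwd(tempdir())

# A file written without a path ends up in the working directory
write.csv(data.frame(a = 1:2), "example.csv", row.names = FALSE)
file.exists("example.csv")  # TRUE

# Switch back to the original working directory
setwd(old_wd)
```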

Workspace

A temporary work area in which all R computation happens. Data that exists there will vanish if not saved to your hard drive before quitting R. More formally, the area of your computer’s main memory where R does all its work. Data must be loaded into it from files, and packages must be loaded into it from the library, before you can use either.

B.3 Terms in Statistics and R

Terms in statistics and R

Terms in Statistics                           Terms in R
dataset, sample                               data frame
observation                                   rows in a data frame
variable                                      columns in a data frame
categorical variable, qualitative variable    factor
  • nominal variable
  • ordinal variable
numeric variable, quantitative variable       numeric vector
  • continuous variable                         • double vector
  • discrete variable                           • integer vector