B Terminology
B.1 Terms in Statistics
- Bar chart
A graph used to display summary statistics such as the mean (in the case of a scale variable) or the frequency (in the case of a nominal variable).
- Boxplot
a visual representation of data that shows central tendency (usually the median) and spread (usually the interquartile range) of a numeric variable for one or more groups; boxplots are often used to compare the distribution of a continuous variable across several groups
- Case / Observation
A case is the unit of analysis; one person or other entity. In psychology, this is normally the data deriving from a single participant. In some research, the cases will not be people. For example, we may be interested in the average academic attainment for pupils from different schools. Here, the cases would be the schools. In R, a single row of data in a data frame represents a case.
- Categorical variable
variable measured in categories; there are two types of categorical variables: ordinal variables have categories with a logical order (e.g., Likert scales), while nominal variables have categories with no logical order (e.g., religious affiliation)
- Data
A set of values. A data set is typically made up of a number of variables. In quantitative research, data are numeric.
- Descriptive statistics
Procedures that allow you to describe data by summarising, displaying or illustrating them. Often used as a general term for summary descriptive statistics: measures of central tendency and measures of dispersion. Graphs are descriptive statistics used to illustrate the data.
- Frequency/ies
The number of times a particular value of a variable occurs.
- Histogram
a visual display of data used to examine the distribution of a numeric variable
- Line graph
a visual display of data often used to examine the relationship between two continuous variables or for something measured over time
- Missing values
A data set may be incomplete, for example, if some observations or measurements failed or if participants didn’t respond to some questions. It is important to distinguish these missing data points from valid data. Missing values are the values R has reserved for each variable to indicate that a data point is missing. These missing values can either be specified by the user (user missing) or automatically set by R (`NA`).
- Nominal data
Data measured at the nominal level (nominal just means ‘named’), also referred to as ‘categorical data’, where the value does not imply anything other than a label; for example, 1 = male and 2 = female.
- Observation / Case
An observation is the unit of analysis; one person or other entity. In psychology, this is normally the data deriving from a single participant. In some research, the observations will not be people. For example, we may be interested in the average academic attainment for pupils from different schools. Here, the observations would be the schools. In R, a single row of data in a data frame represents an observation.
- Participant
People who take part in an experiment or research study. Previously, the word ‘subject’ was used, and still is in many statistics books.
- Population
The total set of all possible scores for a particular variable.
- Quantitative data
Numeric data measured on any of the four levels of measurement. Sometimes, though, the term ‘qualitative data’ is used to describe data measured with nominal scales.
- Sample
A subset of observations from some population that is often analyzed to learn about the population sampled.
- Scatterplot
a graph that shows one dot for each observation in the data set
- Summary statistics
used to provide an overview of the characteristics of a sample; this typically includes measures of central tendency and spread for numeric variables and the frequencies and percentages of categorical variables
- Statistics
A general term for procedures for summarising or displaying data (descriptive statistics) and for analysing data (inferential statistical tests).
- Variable
a measured characteristic of some entity (e.g., income, years of education, sex, height, blood pressure, smoking status, etc.); a variable in R is represented by a column in a data frame.
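
Several of the terms above (case, variable, frequency, missing value, summary statistics) can be illustrated with a minimal sketch in R. The data frame `survey` and its columns are hypothetical and invented purely for illustration.

```r
# Hypothetical data: each row is one case (observation), each column one variable.
survey <- data.frame(
  id    = 1:6,
  group = c("a", "b", "a", "c", "b", "a"),  # nominal (categorical) variable
  score = c(12, 15, NA, 9, 15, 11)          # numeric variable with one missing value
)

nrow(survey)   # number of cases: 6
ncol(survey)   # number of variables: 3

# Frequencies: how often each value of a categorical variable occurs
group_freq <- table(survey$group)
group_freq

# Missing values are marked NA and can be counted with is.na()
n_missing <- sum(is.na(survey$score))   # 1

# Summary statistics: central tendency and spread, ignoring the missing value
score_mean   <- mean(survey$score, na.rm = TRUE)    # 12.4
score_median <- median(survey$score, na.rm = TRUE)  # 12
score_iqr    <- IQR(survey$score, na.rm = TRUE)     # 4
```

Without `na.rm = TRUE`, functions such as `mean()` return `NA` whenever the variable contains a missing value.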
B.2 Terms in R
- Argument
information input into a function that controls how the function behaves
- Assigning
assigning a value to an object is done by using a left-arrow (`<-`), with the arrow separating the name of the object on the left from the expression itself on the right: `object_name <- expression`
- Character
a basic data type in R that comprises things that cannot be used in mathematical operations; often, character variables are names, addresses, zip codes, or other similar values
- Comment
Statements included in code but not executed; in R, a comment is introduced by a hash sign (`#`) and is often used to clarify the code
- Constants
Constants, as the name suggests, are entities whose value cannot be altered. Basic types of constant are double constants, integer constants, logical constants and character constants.
- csv
a file extension indicating that the file contains comma separated values or semicolon separated values
- Data frame
an object type in R that holds data with values in rows and columns with rows treated as observations and columns treated as variables
- Data management
the procedures used to prepare the data for analysis; data management often includes recoding variables, ensuring that missing values are treated properly, checking and fixing data types, and other data-cleaning procedures
- Data types
in R, these include numeric (double, integer), character, logical; the data type suggests how a variable was measured and recorded or recoded, and different analytic strategies are used to manage and analyze different variable types
- Expression
An expression is an instruction to perform a particular task. An expression is any sequence of R constants, object names, operators, function calls, and parentheses. An expression has a type as well as a value.
- Factor
A categorical variable and its value labels. Value labels may be nothing more than “1,” “2,”…, if not assigned explicitly. More formally, a type of object that represents a categorical variable. It stores its labels in its levels attribute.
- Function
a set of machine-readable instructions to perform a task in R; often, the task is to conduct some sort of data management or analysis, but there are also functions that exist just for fun.
- Index
The order number of a variable in a data set or the subscript of a value in an object. The number of the component in a list or data frame, or of an element in a vector.
- Integer
a similar data type to numeric, but containing only whole numbers
- Length
The number of observations/cases in a variable, including missing values, or the number of variables in a data set. For vectors, it is the number of its elements (including NAs). For lists or data frames, it is the number of its components.
- Levels
The values that a categorical variable can have. Actually stored as a part of the factor itself in what appears to be a very short character variable (even when the values themselves are numbers).
- List
A set of objects of any class. Can contain vectors, data frames, matrices and even other lists.
- Matrix
A data set that must contain only one type of variable, e.g. all numeric or character. More formally, a two-dimensional array; that is, a vector with a dim attribute of length 2. Information, or data elements, stored in a rectangular format with rows and columns.
- NA
the R placeholder for missing values, often translated as “not available.”
- NaN
A missing value. Stands for Not a Number. Something that is undefined mathematically such as zero divided by zero.
- NULL
An object you can use to drop variables or values. E.g., `mydata$x <- NULL` drops the variable `x` from the data set `mydata`. Assigning `NULL` to a component of a list or data frame deletes that component.
- Numeric
A variable that contains only numbers. This can be double or integer.
- Object
information stored in R; data analysis and data management are then performed on these stored objects. Includes data frames, vectors, factors, matrices, arrays, lists and functions.
- Operators
An operator is a symbol that tells R to perform a specific mathematical, logical, or other manipulation. R is rich in built-in operators and provides the following types: arithmetic operators, relational operators, logical operators, assignment operators, and miscellaneous operators.
- Package
a collection of functions and datasets for use in R that usually has a specific purpose, such as conducting partial correlation analyses (ppcor package)
- Precedence of operations
the order in which mathematical operations should be performed when solving an equation: parentheses, exponents, multiplication, division, addition, and subtraction (PEMDAS)
- Recycling rules
If one tries to add two structures with a different number of elements, then the shortest is recycled to the length of the longest. That is, if for instance you add `c(1, 2, 3)` to a six-element vector, then you will really add `c(1, 2, 3, 1, 2, 3)`. If the length of the longer vector is not a multiple of the shorter one, a warning is given.
- RMarkdown file
RMarkdown provides an authoring framework for data science. You can use a single R Markdown file to both 1) save and execute code and 2) generate high-quality reports that can be shared with an audience.
- sav
the file extension for a data file saved in a format for the Statistical Package for Social Sciences (SPSS) statistical software
- Script file
a text file in R similar to something written in the Notepad text editor on a Windows computer or the TextEdit text editor on a Mac computer; it is saved with a `.R` file extension
- Vector
A vector is a one-dimensional, homogeneous data structure. It can exist on its own in memory or it can be part of a data frame. More formally, a set of values that have the same base type. A vector can hold character, logical, integer or double values.
- Working directory
R uses a working directory, where R will look, by default, for files you ask it to load. It is also where, by default, any files you write to disk will go.
- Workspace
A temporary work area in which all R computation happens. Data that exists there will vanish if not saved to your hard drive before quitting R. More formally, the area of your computer’s main memory where R does all its work. Data must be loaded into it from files, and packages must be loaded into it from the library, before you can use either.
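
A few of the R terms defined above (assigning, data types, recycling, index, length, factor levels, `NA`/`NaN`, and `NULL`) can be tried out in a short sketch. Apart from `mydata` and `x`, which appear in the NULL entry, the object names below are assumptions made for illustration.

```r
# Assigning: object name on the left, expression on the right
v <- c(10, 20, 30, 40, 50, 60)   # a numeric (double) vector

# Data types of some constants
typeof(1)      # "double"
typeof(1L)     # "integer"
typeof("a")    # "character"
typeof(TRUE)   # "logical"

# Recycling: the shorter vector is repeated to match the longer one
recycled <- v + c(1, 2, 3)       # same as v + c(1, 2, 3, 1, 2, 3)
recycled                         # 11 22 33 41 52 63

# Index and length
v[2]                             # second element: 20
length(v)                        # number of elements: 6

# Factor: a categorical variable whose labels are stored as levels
f <- factor(c("low", "high", "low"))
levels(f)                        # "high" "low" (sorted alphabetically by default)

# NA marks a missing value; NaN is an undefined numeric result
is.na(NA)                        # TRUE
is.nan(0 / 0)                    # TRUE
is.na(0 / 0)                     # TRUE: is.na() also flags NaN

# NULL: assigning NULL to a data frame column drops that column
mydata <- data.frame(x = 1:3, y = c("a", "b", "c"))
mydata$x <- NULL
names(mydata)                    # "y"
```

Adding a vector whose length does not divide the length of the longer one, for example `v + c(1, 2, 3, 4)`, still returns a result but also raises the warning described under Recycling rules.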
B.3 Terms in Statistics and R
Terms in Statistics | Terms in R |
---|---|