Regression and Correlation Flash Cards: key terms and definitions for quick review
Regression analysis: A statistical method used to model the relationship between a dependent variable and one or more independent variables.
Correlation: A statistical measure that describes the strength and direction of a relationship between two variables.
Simple linear regression: A regression model that assumes a linear relationship between the dependent variable and a single independent variable.
Multiple linear regression: A regression model that assumes a linear relationship between the dependent variable and multiple independent variables.
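The simple linear regression card can be made concrete with a minimal sketch that fits y = b0 + b1*x by ordinary least squares using only the standard library (the data values and the names b0, b1 are illustrative, not from the cards):

```python
# Minimal sketch: ordinary least squares for simple linear regression.

def fit_simple_ols(x, y):
    """Return intercept b0 and slope b1 minimizing the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance(x, y) / variance(x), both un-normalized here
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Made-up data that is roughly linear with slope near 2
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.0, 9.9]
b0, b1 = fit_simple_ols(x, y)
```

Multiple linear regression extends the same idea to several predictors, where the coefficients are usually found with matrix algebra rather than this closed form.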
Correlation coefficient: A numerical measure that quantifies the strength and direction of the linear relationship between two variables, ranging from -1 to +1.
Scatter plot: A graphical representation of the relationship between two variables, where each data point is plotted on a Cartesian plane.
Residual analysis: The examination of the differences between observed and predicted values in a regression model to assess the model's fit.
Regression interpretation: The process of analyzing the coefficients, p-values, and other statistics in a regression model to draw conclusions about the relationships between variables.
Regression assumptions: The conditions that must be met for regression analysis to produce valid and reliable results, including linearity, independence, homoscedasticity, and normality of the residuals.
Regression diagnostics: The process of evaluating the assumptions and checking for potential issues in a regression model, such as multicollinearity and influential observations.
Correlation vs. causation: The distinction between a correlation, which indicates a relationship between variables, and causation, which implies that one variable directly affects the other.
Coefficient of determination (R-squared): A measure that represents the proportion of the variance in the dependent variable that can be explained by the independent variables in a regression model.
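The coefficient of determination can be computed directly from observed values and model predictions as 1 minus the ratio of residual to total sum of squares (the names y and y_hat and the data are illustrative):

```python
# R-squared = 1 - SS_residual / SS_total.

def r_squared(y, y_hat):
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)              # total
    return 1 - ss_res / ss_tot

y     = [1.0, 2.0, 3.0, 4.0]   # observed
y_hat = [1.1, 1.9, 3.2, 3.8]   # predicted by some model
r2 = r_squared(y, y_hat)
```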
Outliers: Extreme values that deviate significantly from the other data points, potentially influencing the results of a regression analysis.
Heteroscedasticity: A violation of the assumption of homoscedasticity, where the variability of the residuals differs across the range of the independent variable.
Multicollinearity: A situation where two or more independent variables in a regression model are highly correlated, leading to issues with the interpretation and estimation of coefficients.
Interaction effect: The combined effect of two or more independent variables on the dependent variable, which is not simply the sum of their individual effects.
Standard error: A measure of the variability of the estimated regression coefficients, indicating the precision of the estimates.
Null hypothesis: The hypothesis that there is no relationship between the independent and dependent variables in a regression model.
Alternative hypothesis: The hypothesis that there is a relationship between the independent and dependent variables in a regression model.
Confidence interval: A range of values within which the true population parameter is likely to fall with a certain level of confidence.
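A minimal sketch of a confidence interval for a population mean, using the normal-approximation critical value z = 1.96 for 95% confidence (for very small samples a t critical value would be more appropriate; the sample data are made up):

```python
import math
import statistics

def mean_ci(sample, z=1.96):
    """Approximate 95% CI for the mean: estimate +/- z * standard error."""
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return m - z * se, m + z * se

lo, hi = mean_ci([4.8, 5.1, 4.9, 5.2, 5.0, 4.9, 5.1, 5.0])
```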
P-value: A measure of the strength of evidence against the null hypothesis: the probability of obtaining results at least as extreme as those observed if the null hypothesis were true.
Type I error: The error of rejecting the null hypothesis when it is actually true; a false positive result.
Type II error: The error of failing to reject the null hypothesis when it is actually false; a false negative result.
Confounding variable: An extraneous variable that is related to both the independent and dependent variables, leading to a spurious relationship.
Control variable: A variable that is included in a regression model to control for its potential influence on the relationship between the independent and dependent variables.
Homoscedasticity: The assumption that the variability of the residuals is constant across the range of the independent variable.
Independence: The assumption that the observations in a regression model are independent of each other, with no systematic relationship or influence.
Normality: The assumption that the residuals in a regression model are normally distributed, allowing for valid statistical inference.
Linearity: The assumption that the relationship between the independent and dependent variables in a regression model can be adequately represented by a straight line.
R-squared: A measure of the proportion of the total variation in the dependent variable that is explained by the independent variables in a regression model.
Adjusted R-squared: A modified version of R-squared that adjusts for the number of independent variables in a regression model, penalizing predictors that do not improve the fit.
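The adjustment is a simple formula: adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of independent variables. A short sketch with made-up values shows the penalty:

```python
# Adjusted R-squared: same raw R-squared, but more predictors lower it.

def adjusted_r2(r2, n, p):
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Same R2 = 0.90 and n = 30 observations, different model sizes
a1 = adjusted_r2(0.90, n=30, p=2)   # small model
a2 = adjusted_r2(0.90, n=30, p=10)  # larger model, bigger penalty
```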
F-test: A statistical test that compares the overall fit of a regression model against the null hypothesis of no relationship between the independent and dependent variables.
Durbin-Watson test: A test for the presence of autocorrelation in the residuals of a regression model, indicating whether there is a systematic relationship between successive residuals.
Variance inflation factor (VIF): A measure of multicollinearity that quantifies how much the variance of an estimated regression coefficient is inflated by correlation among the independent variables.
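For the special case of exactly two predictors, the R² from regressing one predictor on the other equals their squared correlation, so VIF = 1/(1 - r²); with more predictors a full auxiliary regression is needed. A sketch with made-up, nearly collinear data:

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x1 = [1, 2, 3, 4, 5]
x2 = [1.1, 2.2, 2.9, 4.1, 5.0]  # nearly identical to x1
vif = 1 / (1 - pearson_r(x1, x2) ** 2)  # far above the common cutoff of 10
```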
Standardized residuals: The residuals of a regression model transformed to have a mean of zero and a standard deviation of one, allowing for easier interpretation and comparison.
Cook's distance: A measure of the influence of each observation on the regression coefficients, indicating how much the coefficients would change if the observation were removed.
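The idea behind influence measures like Cook's distance can be illustrated by refitting the model with each observation left out in turn and checking how much the slope moves. This is a simplified leave-one-out sketch with made-up data, not the exact Cook's distance formula:

```python
def fit_slope(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

x = [1, 2, 3, 4, 10]
y = [1.0, 2.1, 2.9, 4.2, 1.0]  # last point sits far off the trend

full = fit_slope(x, y)
# Absolute slope change when each observation is dropped in turn
changes = [abs(full - fit_slope(x[:i] + x[i+1:], y[:i] + y[i+1:]))
           for i in range(len(x))]
most_influential = changes.index(max(changes))
```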
Statistical inference: The process of using sample data to make inferences about population parameters, such as testing the significance of regression coefficients.
Confounding bias: A bias that occurs when the relationship between the independent and dependent variables is distorted by the presence of a confounding variable.
Selection bias: A bias that occurs when the sample used in a study is not representative of the population, leading to inaccurate or misleading results.
Cross-validation: A technique used to assess the performance of a regression model by splitting the data into training and testing sets, allowing evaluation of the model's predictive ability.
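A minimal holdout-validation sketch: fit on a training split, then measure prediction error on a held-out test split (full k-fold cross-validation repeats this over k rotating splits; the data are made up):

```python
def fit_simple_ols(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
          / sum((a - mx) ** 2 for a in x))
    return my - b1 * mx, b1

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 14.0, 16.2]

# Train on the first 6 points, test on the last 2
train_x, test_x = x[:6], x[6:]
train_y, test_y = y[:6], y[6:]
b0, b1 = fit_simple_ols(train_x, train_y)
# Mean squared error on the unseen test points
mse = sum((yi - (b0 + b1 * xi)) ** 2
          for xi, yi in zip(test_x, test_y)) / len(test_x)
```

A large gap between training error and test error is the practical signature of the overfitting card below.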
Overfitting: A situation where a regression model is too complex and captures noise or random fluctuations in the data, resulting in poor generalization to new data.
Underfitting: A situation where a regression model is too simple and fails to capture the underlying patterns or relationships in the data, resulting in poor predictive performance.
Collinearity: A situation where two or more independent variables in a regression model are highly correlated, making it difficult to distinguish their individual effects on the dependent variable.
Confidence level: The proportion of confidence intervals, constructed by the same procedure across repeated samples, that would contain the true population parameter, often expressed as a percentage.
Hypothesis: A statement or assumption about the relationship between variables, which can be tested using statistical methods.
Significance level: The threshold used to determine whether a result is statistically significant, typically set at 0.05 or 0.01.
Statistical power: The probability of correctly rejecting the null hypothesis when it is false, indicating the ability of a statistical test to detect a true relationship.
Sampling distribution: The distribution of a statistic, such as the mean or a regression coefficient, calculated from many samples drawn from the same population.
Central limit theorem: A fundamental result stating that the sampling distribution of the mean approaches a normal distribution as the sample size increases, regardless of the shape of the population distribution.
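The central limit theorem card can be seen in a short simulation: sample means from a decidedly non-normal (uniform) population cluster around the population mean, with a spread that shrinks roughly as 1/sqrt(n) as the sample size grows (the trial counts here are arbitrary illustration choices):

```python
import random
import statistics

random.seed(0)  # deterministic for reproducibility

def sample_means(n, trials=2000):
    """Means of `trials` samples, each of size n, from Uniform(0, 1)."""
    return [statistics.mean(random.uniform(0, 1) for _ in range(n))
            for _ in range(trials)]

# The spread of sample means shrinks as sample size n increases
spread_small = statistics.stdev(sample_means(4))
spread_large = statistics.stdev(sample_means(64))
```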