Data Science Products and Services Website

Distribution Statistics Terms

Mean, Variance, and Deviation

Several statistical measures help describe how the values of a variable are distributed in a data set.

According to Online Statistics Education (2019), the Mean is the average of the scores (Sum / Count) and marks the center of the distribution. The Variance measures the spread, or dispersion, of the scores around the mean; it is "the average squared difference of the scores from the mean" (para. 7-8). Subtracting the mean from a score gives that score's Deviation from the Mean, and multiplying the deviation by itself gives the Squared Deviation; the Variance is the average of these squared deviations. When the scores are a sample used to estimate the variance of a larger population, the sum of squared deviations is divided by N - 1 instead of N. The Standard Deviation is the square root of the Variance. The Range, another measure of variability, is the largest value (Maximum) minus the smallest value (Minimum).
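The definitions above can be checked with a short Python sketch that uses only the standard library. The function name `describe` and the sample scores are illustrative assumptions, not part of the cited source; the sketch reports both the population variance (divide by N) and the sample variance (divide by N - 1).

```python
import math

def describe(scores):
    """Return location and spread measures for a list of numeric scores (hypothetical helper)."""
    n = len(scores)
    mean = sum(scores) / n                        # Mean = Sum / Count
    deviations = [x - mean for x in scores]       # each score minus the mean
    squared = [d * d for d in deviations]         # squared deviations
    population_variance = sum(squared) / n        # average squared deviation
    sample_variance = sum(squared) / (n - 1)      # estimate of a larger population's variance
    return {
        "mean": mean,
        "deviations": deviations,
        "population_variance": population_variance,
        "sample_variance": sample_variance,
        "population_sd": math.sqrt(population_variance),  # square root of the variance
        "range": max(scores) - min(scores),       # Maximum minus Minimum
    }

scores = [2, 4, 4, 4, 5, 5, 7, 9]
stats = describe(scores)
print(stats["mean"])                 # 5.0
print(stats["population_variance"])  # 4.0
print(stats["population_sd"])        # 2.0
print(stats["range"])                # 7
```

For these scores the deviations are -3, -1, -1, -1, 0, 0, 2, and 4, whose squares sum to 32, so the population variance is 32 / 8 = 4 and the standard deviation is 2.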

Skew and Kurtosis

A variable's distribution can also be described with measures such as Skew and Kurtosis. According to Lane (2019), a distribution is skewed if one tail extends farther than the other. A long tail pulls the Mean away from the Median; in a highly skewed distribution, for example, the Mean can be more than twice as high as the Median. A distribution with Positive Skew has a longer right tail, and a distribution with Negative Skew has a longer left tail. Kurtosis describes how heavy, or "fat," a distribution's tails are compared with those of a normal distribution: a distribution with fat tails is called leptokurtic, and one with thin tails is called platykurtic. When kurtosis is reported as excess kurtosis (kurtosis minus 3), a normal distribution has a kurtosis of 0, so a clearly nonzero value indicates that the distribution is not normal.
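As a rough illustration, the sketch below computes moment-based skewness and excess kurtosis in plain Python. These are the simple (biased) moment formulas; statistical packages may apply small-sample corrections, so their results can differ slightly. The function names and the two small data sets are hypothetical.

```python
import statistics

def central_moment(xs, k):
    """k-th central moment: the average of (x - mean) ** k."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** k for x in xs) / len(xs)

def skewness(xs):
    """Moment-based skewness g1 = m3 / m2**1.5; positive means a longer right tail."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5

def excess_kurtosis(xs):
    """Moment-based excess kurtosis g2 = m4 / m2**2 - 3; near 0 for large normal samples."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 10]  # one value stretches the right tail

print(skewness(symmetric))          # 0.0
print(skewness(right_tailed) > 0)   # True (positive skew)
print(excess_kurtosis(symmetric))   # about -1.3: thinner tails than normal (platykurtic)
# The long right tail pulls the mean above the median.
print(statistics.mean(right_tailed) > statistics.median(right_tailed))  # True
```

Negating every value in `right_tailed` mirrors the distribution and flips the sign of the skewness, giving a left-tailed (negatively skewed) example.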

The featured image shows that the Economic State Target variable of the Social Security and Medicare Trust Funds Deficit Concern Project (See Menu Portfolio Link) does not have a normal distribution, because its kurtosis (tail weight) differs from that of a normal distribution. The target variable's distribution also has Negative (left-tail) Skew.

Outliers

The outliers of a data set are the values farthest away from the other data values. According to Kalla (2009), outliers (e.g., in long-tailed distributions) cannot be handled well by certain statistical estimators. The Mean is sensitive to outliers because a single extremely high or low value shifts it. The Median, however, is unaffected by extreme values as long as the central values do not change. Investigating the cause of the outliers may show that they belong to a different population. "For example, if out of 1000 data points, 5 points are at a distance of four times the standard deviation or more, then these outliers need to be examined" (para. 9).
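Kalla's rule of thumb and the contrast between the Mean and the Median can be sketched as follows. `far_outliers` is a hypothetical helper that measures distance in population standard deviations (`statistics.pstdev`), and the 1000-point data set is synthetic, built only to mirror the quoted example.

```python
import statistics

def far_outliers(values, k=4.0):
    """Return the values lying at least k (population) standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) >= k * sd]

# Synthetic data mirroring the quoted example: 995 typical points plus 5 far-away points.
data = [9.0, 11.0] * 497 + [10.0] + [30.0] * 5
print(len(data))           # 1000
print(far_outliers(data))  # [30.0, 30.0, 30.0, 30.0, 30.0]

# The mean moves when one value becomes extreme; the median does not,
# because the central values are unchanged.
clean = [9.0, 10.0, 11.0, 10.0, 10.0]
dirty = clean[:-1] + [1000.0]
print(statistics.mean(clean), statistics.median(clean))  # 10.0 10.0
print(statistics.mean(dirty), statistics.median(dirty))  # 208.0 10.0
```

Note that the outliers themselves inflate the standard deviation used in the test, so in small data sets a single extreme value may never reach four standard deviations; the rule of thumb is better suited to large samples like the one in the quote.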

Collinearity

According to Statistics How To (2019), data points are collinear if a single straight line, described by one linear equation, passes through all of them. The target variable chart below shows that the Economic State Target variable's data points lie approximately along such a line, indicated by the blue regression line.
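A minimal sketch of the idea: an exact collinearity test based on the 2-D cross product, plus an ordinary least-squares line as a stand-in for the chart's blue regression line. Both function names are hypothetical, and the assumption that the chart uses ordinary least squares is not stated in the source.

```python
def are_collinear(points, tol=1e-9):
    """True if every point lies on the line through the first two points (assumed distinct)."""
    (x0, y0), (x1, y1) = points[0], points[1]
    for x, y in points[2:]:
        # 2-D cross product of (p1 - p0) and (p - p0); zero means p is on that line.
        cross = (x1 - x0) * (y - y0) - (y1 - y0) * (x - x0)
        if abs(cross) > tol:
            return False
    return True

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

on_line = [(0, 1), (1, 3), (2, 5), (3, 7)]   # every point satisfies y = 2x + 1
off_line = [(0, 1), (1, 3), (2, 6)]
print(are_collinear(on_line))    # True
print(are_collinear(off_line))   # False
xs, ys = zip(*on_line)
print(fit_line(xs, ys))          # (1.0, 2.0)
```

Real data points rarely satisfy one linear equation exactly, so a fitted regression line summarizes how closely they follow a straight-line trend rather than proving exact collinearity.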

The target variable chart above also shows that the deficit values were much lower between 2001 and 2007 and started to rise in 2008. The extreme values, or outliers, lie beyond the standard deviation (2001 and 2009-2012).

According to the American Society for Quality (2019), the Law of Variation "is defined as the difference between an ideal and an actual situation" (para. 1); stakeholders often experience such variation as a departure from the ideal situation they originally expected (e.g., in expected outcomes or production quality).
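Flagging the years whose values fall outside a band of the mean plus or minus k standard deviations can be sketched as follows. The helper `years_outside_band`, the choice k = 1, and the toy series are all assumptions for illustration; the values are not the project's deficit data.

```python
import statistics

def years_outside_band(series, k=1.0):
    """Return the years whose value lies more than k population SDs from the mean."""
    values = list(series.values())
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sorted(year for year, v in series.items() if abs(v - mean) > k * sd)

# Hypothetical toy series for illustration only, not the project's deficit data.
toy = {2001: 1.0, 2002: 3.0, 2003: 3.0, 2004: 3.0, 2005: 3.0, 2006: 9.0}
print(years_outside_band(toy))         # [2001, 2006]
print(years_outside_band(toy, k=2.0))  # [2006]
```

Raising `k` narrows the flagged set to the most extreme years, which is one way to separate ordinary variation from values worth investigating.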

However, the non-normal distribution is "negotiable" for historical, continuous, quantitative economic values gathered between 2001 and 2016, a period in which the U.S. Federal Government declared various national events with economic causes and effects (e.g., national disasters, recessions, and unemployment).

The Social Security and Medicare Trust Funds Deficit Concern Project (See Menu Portfolio Link) includes information, visualizations, and interpretations that clarify the effects the various economic component variables have on the U.S. Federal Government's deficit, the Economic State Target variable.

References

American Society for Quality. (2019). What is the Law of Variation? Retrieved from https://asq.org/quality-resources/variation

Kalla, S. (2008-2019). Statistical outliers. Retrieved from https://explorable.com/statistical-outliers

Lane, D. M. (n.d.). Online Statistics Education: An interactive multimedia course of study. Retrieved from http://onlinestatbook.com/2/index.html

Lane, D. M. (n.d.). Shapes of distributions. Retrieved from http://onlinestatbook.com/2/summarizing_distributions/shapes.html

Statistics How To. (2019). Collinear definition: What is collinearity? Retrieved from https://www.statisticshowto.datasciencecentral.com/collinear/
