Oracle Coefficient of Correlation - CORR The query table [1] is setup to show the analysis of CORR function with all data (no null values in any row) and analysis is shown in section [1.1]. The query table [2] is setup with null value and analysis is shown in section [2.1], which explains the way Oracle handles null value in evaluating the CORR function.
-- [1] Query Table with no null -- value WITH q_stat AS (SELECT 1 x, 2 y FROM DUAL UNION SELECT 2 x, 6 y FROM DUAL UNION SELECT 9 x, 3 y FROM DUAL UNION SELECT 4 x, 7 y FROM DUAL) SELECT STDDEV_POP(x) sd_x, STDDEV_POP(y) sd_y, COVAR_POP(x ,y) covar_pop, CORR(x, y) corr_xy FROM q_stat;
CORR(x, y) = COVAR_POP(x, y)/(STDDEV_POP(x) * STDDEV_POP(y)) COVAR_POP(x, y) = (SUM(x * y) - SUM(y) * SUM(x)/n)/n STDDEV_POP(x) = √(SUM((Xi - XAvg)2)/n) STDDEV_POP(y) = √(SUM((yi - yAvg)2)/n) COVAR_POP(x, y) = (SUM(1x2+2x6+9x3+4x7) - (2+6+3+7) x (1+2+9+4)/4)/4 = (69 - 288/4)/4 = -0.75 STDDEV_POP(x) = √(((1-4)2+(2-4)2+(9-4)2+(4-4)2)/4) = √((9+4+25+0)/4) = √(38/4) = 3.0822 STDDEV_POP(y) = √(((2-4.5)2+(6-4.5)2+(3-4.5)2+(7-4.5)2)/4) = √((6.25+2.25+2.25+6.25)/4) = √(17/4) = 2.0615 CORR(x, y) = -0.75/(3.0822 x 2.0615) = -0.1180
-- [2] Query Table with null -- value WITH q_stat AS (SELECT 1 x, 2 y FROM DUAL UNION SELECT 2 x, 6 y FROM DUAL UNION SELECT 9 x, NULL y FROM DUAL UNION SELECT 4 x, 7 y FROM DUAL) SELECT STDDEV_POP(x) sd_x, STDDEV_POP(y) sd_y, COVAR_POP(x ,y) covar_pop, CORR(x, y) corr_xy FROM q_stat;
-- In the computation of CORR function the row with null value is ignored -- in the computation as shown below and the count (row count) is three. -- The average value of x = (1+2+4)/3 = 2.3333 and for -- y = (6+2+7)/3 = 5 -- The value shown in SD_X of 3.0822 is based on all four values of x -- is just for - "Reference Only". -- The CORR function evaluates Standard Deviation SD(x) = 1.2472 as -- shown below ignoring the x value in the null value row -- The third row is also ignored in SD(y) computation as shown below. COVAR_POP(x,y) = (SUM(x * y) - SUM(y) * SUM(x)/n)/n = (SUM(1x2+2x6+null+4x7) - (2+6+null+7) x (1+2+null+4)/3)/3 = (42 - 15 x (7/3))/3 = (42 - 35)/3 = 2.3333 STDDEV_POP(x) = √(SUM((Xi - XAvg)2)/n) = √(((1-2.333)2+(2-2.333)2+ Null +(4-2.333)2)/3) = 1.2472 STDDEV_POP(y) = √(SUM((yi - yAvg)2)/n) = √(((2-5)2+(6-5)2+ Null +(7-5)2)/3) = √((9+1+4)/3) = √(14/3) = 2.1602 CORR(x, y) = COVAR_POP(x, y)/(STDDEV_POP(x) * STDDEV_POP(y)) = 2.333/(1.2472 x 2.1602) = 0.8660
72971