Oracle Coefficient of Correlation - CORR
The query table [1] is setup to show the analysis of CORR function with all data
(no null values in any row) and analysis is shown in section [1.1].  The query 
table [2] is setup with null value and analysis is shown in section [2.1], which 
explains the way Oracle handles null value in evaluating the CORR function.
Oracle Coefficient of Correlation - CORR
-- [1] Query Table with no null
-- value

WITH q_stat AS 
(SELECT 1 x, 2 y  FROM DUAL UNION
 SELECT 2 x, 6 y  FROM DUAL UNION
 SELECT 9 x, 3 y  FROM DUAL UNION
 SELECT 4 x, 7 y  FROM DUAL)
SELECT
 STDDEV_POP(x) sd_x, 
 STDDEV_POP(y) sd_y,
 COVAR_POP(x ,y) covar_pop,
 CORR(x, y) corr_xy
FROM q_stat;
Oracle Coefficient of Correlation - CORR
[1.1] Oracle Coefficient of Correlation - CORR Analysis
CORR(x, y) = COVAR_POP(x, y)/(STDDEV_POP(x) * STDDEV_POP(y))

COVAR_POP(x, y) = (SUM(x * y) - SUM(y) * SUM(x)/n)/n

STDDEV_POP(x) = √(SUM((Xi - XAvg)2)/n)

STDDEV_POP(y) = √(SUM((yi - yAvg)2)/n)

COVAR_POP(x, y) = (SUM(1x2+2x6+9x3+4x7) - (2+6+3+7) x (1+2+9+4)/4)/4
                = (69 - 288/4)/4 = -0.75

STDDEV_POP(x) = √(((1-4)2+(2-4)2+(9-4)2+(4-4)2)/4) = √((9+4+25+0)/4) 
              = √(38/4) = 3.0822

STDDEV_POP(y) = √(((2-4.5)2+(6-4.5)2+(3-4.5)2+(7-4.5)2)/4) 
              = √((6.25+2.25+2.25+6.25)/4) = √(17/4) = 2.0615

CORR(x, y) = -0.75/(3.0822 x 2.0615) = -0.1180
 
Oracle - CORR with Null Value Data
-- [2] Query Table with null 
-- value

WITH q_stat AS 
(SELECT 1 x,    2 y  FROM DUAL
 UNION
 SELECT 2 x,    6 y  FROM DUAL
 UNION
 SELECT 9 x, NULL y  FROM DUAL
 UNION
 SELECT 4 x,    7 y  FROM DUAL)
SELECT
 STDDEV_POP(x) sd_x, 
 STDDEV_POP(y) sd_y,
 COVAR_POP(x ,y) covar_pop,
 CORR(x, y) corr_xy
FROM q_stat;
Oracle Coefficient of Correlation - CORR With Null Value Data
[2.1] Oracle Coefficient of Correlation - CORR Analysis With Null Data
-- In the computation of CORR function the row with null value is ignored
-- in the computation as shown below and the count (row count) is three. 
-- The average value of x = (1+2+4)/3 = 2.3333 and for 
-- y = (6+2+7)/3 = 5

-- The value shown in SD_X of 3.0822 is based on all four values of x
-- is just for - "Reference Only".

-- The CORR function evaluates Standard Deviation SD(x) = 1.2472 as 
-- shown below ignoring the x value in the null value row
-- The third row is also ignored in SD(y) computation as shown below.

COVAR_POP(x,y) = (SUM(x * y) - SUM(y) * SUM(x)/n)/n
               = (SUM(1x2+2x6+null+4x7) - (2+6+null+7) x (1+2+null+4)/3)/3
               = (42 - 15 x (7/3))/3 = (42 - 35)/3 = 2.3333

STDDEV_POP(x) = √(SUM((Xi - XAvg)2)/n)
              = √(((1-2.333)2+(2-2.333)2+ Null +(4-2.333)2)/3)
              = 1.2472

STDDEV_POP(y) = √(SUM((yi - yAvg)2)/n)
              = √(((2-5)2+(6-5)2+ Null +(7-5)2)/3) 
              = √((9+1+4)/3) = √(14/3) = 2.1602

CORR(x, y) = COVAR_POP(x, y)/(STDDEV_POP(x) * STDDEV_POP(y))
           = 2.333/(1.2472 x 2.1602)
           = 0.8660


Oracle Covariance
Oracle Standard Deviation

Oracle registered trademark of Oracle Corporation.

Last Revised On: September 20, 2014

  1275