
Compare Two Partitions Of Table In Hive

I need to compare data changes between two partitions of a table in Hive. Specifically, I have two partitions (ptn_dt='01-31-2019' and ptn_dt='02-28-2019'). Each partition contains the fo

Solution 1:

You can use conditional aggregation. This puts the comparisons in the same row:

SELECT active_indicator,
       SUM(CASE WHEN ptn_dt = '2019-01-31' THEN 1 ELSE 0 END),
       SUM(CASE WHEN ptn_dt = '2019-02-28' THEN 1 ELSE 0 END)
FROM table_name
WHERE ptn_dt IN ('2019-01-31', '2019-02-28')
GROUP BY active_indicator;
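
To see what this returns, here is a minimal sketch that runs the same query against an in-memory SQLite table from Python instead of Hive. The sample rows, the 'Y'/'N' indicator values, and the jan_count/feb_count aliases are assumptions made for illustration; only table_name, num_key, active_indicator and ptn_dt come from the answer.

```python
import sqlite3

# Hypothetical sample rows: (num_key, active_indicator, ptn_dt).
ROWS = [
    (1, "Y", "2019-01-31"), (2, "Y", "2019-01-31"), (3, "N", "2019-01-31"),
    (1, "Y", "2019-02-28"), (2, "N", "2019-02-28"), (3, "N", "2019-02-28"),
]


def counts_by_indicator(rows):
    """Count rows per active_indicator, one column per partition."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table_name "
        "(num_key INTEGER, active_indicator TEXT, ptn_dt TEXT)"
    )
    conn.executemany("INSERT INTO table_name VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT active_indicator,
               SUM(CASE WHEN ptn_dt = '2019-01-31' THEN 1 ELSE 0 END) AS jan_count,
               SUM(CASE WHEN ptn_dt = '2019-02-28' THEN 1 ELSE 0 END) AS feb_count
        FROM table_name
        WHERE ptn_dt IN ('2019-01-31', '2019-02-28')
        GROUP BY active_indicator
        ORDER BY active_indicator
        """
    ).fetchall()
    conn.close()
    return result


print(counts_by_indicator(ROWS))  # [('N', 1, 2), ('Y', 2, 1)]
```

Each output row holds one indicator value with its January and February counts side by side, which is what makes the two partitions easy to compare.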

Or, to get one row per indicator and partition, you could use:

SELECT active_indicator, ptn_dt, COUNT(*)
FROM table_name
WHERE ptn_dt IN ('2019-01-31', '2019-02-28')
GROUP BY active_indicator, ptn_dt;

EDIT:

Based on your comment, use lag() to pair each num_key's active_indicator with its value in the earlier partition, then count every combination:

select prev_active_indicator, active_indicator, count(*)
from (select t.*,
             lag(active_indicator) over (partition by num_key order by ptn_dt) as prev_active_indicator
      from table_name t
      where ptn_dt IN ('2019-01-31', '2019-02-28')
     ) t
where ptn_dt = '2019-02-28'
group by prev_active_indicator, active_indicator;
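
As a rough check of how the lag() query behaves, the sketch below runs it on toy data in an in-memory SQLite database from Python. It assumes SQLite 3.25 or later (for window functions) rather than Hive, and the sample rows and 'Y'/'N' values are made up for illustration. Key 4 exists only in the later partition and key 5 only in the earlier one, to show the edge cases.

```python
import sqlite3

# Hypothetical sample rows: (num_key, active_indicator, ptn_dt).
ROWS = [
    (1, "Y", "2019-01-31"), (2, "Y", "2019-01-31"), (3, "N", "2019-01-31"),
    (5, "Y", "2019-01-31"),  # present only in the earlier partition
    (1, "Y", "2019-02-28"), (2, "N", "2019-02-28"), (3, "N", "2019-02-28"),
    (4, "Y", "2019-02-28"),  # present only in the later partition
]


def transition_counts(rows):
    """Count (previous indicator, current indicator) pairs per num_key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table_name "
        "(num_key INTEGER, active_indicator TEXT, ptn_dt TEXT)"
    )
    conn.executemany("INSERT INTO table_name VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT prev_active_indicator, active_indicator, COUNT(*)
        FROM (SELECT t.*,
                     LAG(active_indicator) OVER
                         (PARTITION BY num_key ORDER BY ptn_dt) AS prev_active_indicator
              FROM table_name t
              WHERE ptn_dt IN ('2019-01-31', '2019-02-28')
             ) t
        WHERE ptn_dt = '2019-02-28'
        GROUP BY prev_active_indicator, active_indicator
        ORDER BY prev_active_indicator, active_indicator
        """
    ).fetchall()
    conn.close()
    return result


for row in transition_counts(ROWS):
    print(row)
# (None, 'Y', 1)  key 4: no row in the earlier partition
# ('N', 'N', 1)   key 3
# ('Y', 'N', 1)   key 2
# ('Y', 'Y', 1)   key 1
```

On this data, a key that appears only in the later partition shows up with a NULL previous value, while a key that appears only in the earlier partition (key 5) is dropped by the outer filter on ptn_dt. If those disappearing keys matter, a different approach such as a full outer join of the two partitions would be needed.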
