Compare Two Partitions Of Table In Hive
I need to compare data changes in two partitions of a table in Hive. Specifically, I have two partitions (ptn_dt='01-31-2019' and ptn_dt='02-28-2019'). Each partition contains the fo
Solution 1:
You can use conditional aggregation. This puts the counts for both partitions in the same row, one row per active_indicator:
SELECT active_indicator,
       SUM(CASE WHEN ptn_dt = '2019-01-31' THEN 1 ELSE 0 END) AS cnt_2019_01_31,
       SUM(CASE WHEN ptn_dt = '2019-02-28' THEN 1 ELSE 0 END) AS cnt_2019_02_28
FROM table_name
WHERE ptn_dt IN ('2019-01-31', '2019-02-28')
GROUP BY active_indicator;
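To sanity-check the conditional-aggregation query without a Hive cluster, here is a minimal sketch that runs the same SQL against an in-memory SQLite table using Python's standard sqlite3 module. SQLite is only a stand-in for Hive, ptn_dt is an ordinary column rather than a real partition, and the rows in ROWS, the column aliases, and the helper counts_side_by_side are hypothetical. The question writes the partitions as '01-31-2019' and '02-28-2019' while the query uses '2019-01-31' and '2019-02-28'; the literals have to match exactly how ptn_dt is stored, because it is compared as a string.

```python
import sqlite3

# Hypothetical sample rows: (num_key, active_indicator, ptn_dt).
ROWS = [
    (1, "Y", "2019-01-31"), (2, "Y", "2019-01-31"), (3, "N", "2019-01-31"),
    (1, "Y", "2019-02-28"), (2, "N", "2019-02-28"), (3, "N", "2019-02-28"),
    (4, "Y", "2019-02-28"),
]

# Conditional aggregation: one row per active_indicator, one count column per partition.
# ORDER BY is added only to make the output deterministic.
QUERY = """
SELECT active_indicator,
       SUM(CASE WHEN ptn_dt = '2019-01-31' THEN 1 ELSE 0 END) AS cnt_2019_01_31,
       SUM(CASE WHEN ptn_dt = '2019-02-28' THEN 1 ELSE 0 END) AS cnt_2019_02_28
FROM table_name
WHERE ptn_dt IN ('2019-01-31', '2019-02-28')
GROUP BY active_indicator
ORDER BY active_indicator
"""


def counts_side_by_side(rows):
    """Load rows into an in-memory SQLite table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE table_name (num_key INTEGER, active_indicator TEXT, ptn_dt TEXT)"
        )
        conn.executemany("INSERT INTO table_name VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in counts_side_by_side(ROWS):
        print(row)  # ('N', 1, 2) then ('Y', 2, 2)
```

Each output row reads (indicator, count in '2019-01-31', count in '2019-02-28'), so a change between partitions shows up as two different numbers on the same row.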
Or, to put each partition's count in its own row:
SELECT active_indicator, ptn_dt, COUNT(*)
FROM table_name
WHERE ptn_dt IN ('2019-01-31', '2019-02-28')
GROUP BY active_indicator, ptn_dt;
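A minimal sketch of the row-per-partition variant, again using Python's sqlite3 module with SQLite standing in for Hive. The sample rows and the helper counts_per_partition are hypothetical, and the ORDER BY exists only to make the output deterministic, since GROUP BY by itself does not guarantee row order.

```python
import sqlite3

# Hypothetical sample rows: (num_key, active_indicator, ptn_dt).
ROWS = [
    (1, "Y", "2019-01-31"), (2, "Y", "2019-01-31"), (3, "N", "2019-01-31"),
    (1, "Y", "2019-02-28"), (2, "N", "2019-02-28"), (3, "N", "2019-02-28"),
    (4, "Y", "2019-02-28"),
]

# One output row per (active_indicator, ptn_dt) pair instead of one column per partition.
QUERY = """
SELECT active_indicator, ptn_dt, COUNT(*)
FROM table_name
WHERE ptn_dt IN ('2019-01-31', '2019-02-28')
GROUP BY active_indicator, ptn_dt
ORDER BY ptn_dt, active_indicator
"""


def counts_per_partition(rows):
    """Load rows into an in-memory SQLite table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE table_name (num_key INTEGER, active_indicator TEXT, ptn_dt TEXT)"
        )
        conn.executemany("INSERT INTO table_name VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in counts_per_partition(ROWS):
        print(row)
```

This shape is handy when more partitions may be added later, since it needs no new column per date; the side-by-side form is easier to read for exactly two.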
EDIT:
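Based on your comment, use lag() to bring each num_key's active_indicator from the earlier partition onto its row in the later one. For all combinations of previous and current values: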
select prev_active_indicator, active_indicator, count(*)
from (select t.*,
             -- active_indicator of the same num_key in the previous partition
             lag(active_indicator) over (partition by num_key order by ptn_dt) as prev_active_indicator
      from table_name t
      where ptn_dt IN ('2019-01-31', '2019-02-28')
     ) t
-- keep only the later partition, so each row is one (before, after) pair
where ptn_dt = '2019-02-28'
group by prev_active_indicator, active_indicator;
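A minimal sketch of the lag() approach, using Python's sqlite3 module with SQLite (3.25 or newer, which added window functions) standing in for Hive. The sample rows and the helper transitions are hypothetical, and the sketch assumes one row per num_key per partition; with duplicates, lag() could pick up a row from the same partition. Each output row is (value in '2019-01-31', value in '2019-02-28', number of keys). A key present only in '2019-02-28' gets NULL (None in Python) as its previous value, and a key present only in '2019-01-31' is dropped by the outer filter.

```python
import sqlite3

# Hypothetical sample rows: (num_key, active_indicator, ptn_dt).
# Key 2 flips Y -> N; key 4 exists only in the later partition.
ROWS = [
    (1, "Y", "2019-01-31"), (2, "Y", "2019-01-31"), (3, "N", "2019-01-31"),
    (1, "Y", "2019-02-28"), (2, "N", "2019-02-28"), (3, "N", "2019-02-28"),
    (4, "Y", "2019-02-28"),
]

# LAG pairs each key's later value with its earlier one; ORDER BY is for determinism only.
QUERY = """
SELECT prev_active_indicator, active_indicator, COUNT(*)
FROM (SELECT t.*,
             LAG(active_indicator) OVER (PARTITION BY num_key ORDER BY ptn_dt)
                 AS prev_active_indicator
      FROM table_name t
      WHERE ptn_dt IN ('2019-01-31', '2019-02-28')) t
WHERE ptn_dt = '2019-02-28'
GROUP BY prev_active_indicator, active_indicator
ORDER BY prev_active_indicator, active_indicator
"""


def transitions(rows):
    """Load rows into an in-memory SQLite table and count (before, after) pairs."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE table_name (num_key INTEGER, active_indicator TEXT, ptn_dt TEXT)"
        )
        conn.executemany("INSERT INTO table_name VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in transitions(ROWS):
        print(row)  # (None, 'Y', 1), ('N', 'N', 1), ('Y', 'N', 1), ('Y', 'Y', 1)
```

Rows where the two indicator values differ (here ('Y', 'N', 1)) are the keys whose status changed between the partitions.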