hadoop - SQL/HQL Count Total Not Working on Join -
i'm creating recommender using simple probability formulas. i'm working million song dataset large dataset (189m rows , 3 columns) hadoop cluster, using hive. here sample of initial dataset:
million_song - initial dataset user song play_count c3fb2 soxoq 18 c3fb2 sozvc 1 6041e sobhn 3 6041e sobxt 5
but first created table initial dataset, sample show here:
million_both - adds occurrence of 2 songs listened same user driver_song_id also_song_id play_count soxoq sobxt 1642 sobhn sobxt 2168 sobxt sozvc 1742
distinct users (i want value displayed in rows in fifth column of final output) select count(distinct users) million_song; # 139,738,054
i'm trying add these 2 tables above together, , issue can't correct count_n_users, should same # each row in table. here code: select a.driver_song_id, a.also_song_id, a.play_count, count(distinct b.user), sum(distinct b.user) count_n_users (select driver_song_id, also_song_id, play_count million_both) join (select user, song, play_count expp_team.million_song) b on a.driver_song_id = b.song group a.driver_song_id, a.also_song_id, a.play_count;
the issue cannot correct count_n_users, of right it's overstating value.
this i'd output like:
driver_song_id also_song_id play_count c(b.user) count_n_users soxoq sobxt 1643 463 139,738,054 sobhn sobxt 2168 483 139,738,054 sobxt sozvc 1742 725 139,738,054
if query returning of correct output except column , column should have same value on rows of result set, , can returned running query counts distinct users million_song
table can add cross join
query grabs total count. avoids issue group by
you're experiencing.
select a.driver_song_id, a.also_song_id, a.play_count, count(distinct b.user), c.count_n_users (select driver_song_id, also_song_id, play_count million_both) join (select user, song, play_count expp_team.million_song) b on a.driver_song_id = b.song cross join (select count(distinct users) count_n_users million_song) c group a.driver_song_id, a.also_song_id, a.play_count;
Comments
Post a Comment