PostgreSQL group by bug on Unicode strings? -
i have weird thing happening, noticed group (word)
wasn't grouping word if word utf-8 string. in same query, cases it's been grouped correctly, , cases hasn't. wonder if knows what's that?
select *,count(*) on (partition md5(word)) k ( select word,count(*) n :tmpwl group 1 ) order 1,2 limit 12; /* gives: word | n | k ------+---+--- いい | 1 | 1 くず | 1 | 1 ごみ | 1 | 1 さま | 1 | 1 さん | 1 | 1 へま | 1 | 1 まめ | 1 | 1 よく | 1 | 1 ろく | 1 | 1 ネガ | 1 | 2 -- heck? ネガ | 1 | 2 パス | 1 | 1 */
note following workaround works fine:
select word,n,count(*) on (partition md5(word)) k ( select md5(word),max(word) word,count(*) n :tmpwl group 1 ) order 1,2 limit 12; /* gives: word | n | k ------+---+--- いい | 1 | 1 くず | 1 | 1 ごみ | 1 | 1 さま | 1 | 1 さん | 1 | 1 へま | 1 | 1 まめ | 1 | 1 よく | 1 | 1 ろく | 1 | 1 ネガ | 2 | 1 パス | 1 | 1 プア | 1 | 1 */
the version postgresql 8.2.14 (greenplum database 4.0.4.0 build 3 single-node edition) on x86_64-unknown-linux-gnu, compiled gcc gcc.exe (gcc) 4.1.1 compiled on nov 30 2010 17:20:26.
the source table :tmpwl
:
\d :tmpwl table "pg_temp_25149.pdtmp_foo706453357357532" column | type | modifiers ----------+---------+----------- baseword | text | word | text | value | integer | lexicon | text | nalts | bigint | distributed by: (word)
Comments
Post a Comment