PostgreSQL group by bug on Unicode strings? -

i have weird thing happening, noticed group (word) wasn't grouping word if word utf-8 string. in same query, cases it's been grouped correctly, , cases hasn't. wonder if knows what's that?

select *,count(*) on (partition md5(word)) k (   select word,count(*) n   :tmpwl   group 1 ) order 1,2 limit 12; /* gives:  word | n | k  ------+---+---  いい | 1 | 1  くず | 1 | 1  ごみ | 1 | 1  さま | 1 | 1  さん | 1 | 1  へま | 1 | 1  まめ | 1 | 1  よく | 1 | 1  ろく | 1 | 1  ネガ | 1 | 2   -- heck?  ネガ | 1 | 2  パス | 1 | 1 */

note following workaround works fine:

select word,n,count(*) on (partition md5(word)) k (   select md5(word),max(word) word,count(*) n   :tmpwl   group 1 ) order 1,2 limit 12; /* gives:  word | n | k  ------+---+---  いい | 1 | 1  くず | 1 | 1  ごみ | 1 | 1  さま | 1 | 1  さん | 1 | 1  へま | 1 | 1  まめ | 1 | 1  よく | 1 | 1  ろく | 1 | 1  ネガ | 2 | 1  パス | 1 | 1  プア | 1 | 1 */

the version postgresql 8.2.14 (greenplum database 4.0.4.0 build 3 single-node edition) on x86_64-unknown-linux-gnu, compiled gcc gcc.exe (gcc) 4.1.1 compiled on nov 30 2010 17:20:26.

the source table :tmpwl:

\d :tmpwl table "pg_temp_25149.pdtmp_foo706453357357532"   column  |  type   | modifiers  ----------+---------+-----------  baseword | text    |   word     | text    |   value    | integer |   lexicon  | text    |   nalts    | bigint  |  distributed by: (word)

Search This Blog

Bready

PostgreSQL group by bug on Unicode strings? -

Comments

Post a Comment

Popular posts from this blog

monitor web browser programmatically in Android? -

Shrink a YouTube video to responsive width -

c# - Using multiple datasets in RDLC -