java.util.UUID.compareTo() considered harmful

Ed Anuff, April 2, 2011

Java’s UUID class compares UUIDs using signed comparisons, which for some values produces the opposite of the result you might expect and is incompatible with other languages.  If you’re writing an application that compares and sorts UUIDs, you should use an alternate UUID library for the comparison function or roll your own.

This came up while looking at how UUIDs are compared by Cassandra’s LexicalUUIDType versus its TimeUUIDType, to see whether the two could be combined.  That led to an examination of RFC 4122, the IETF’s standard for UUIDs, which formalized conventions and specifications from previous standards.  According to RFC 4122, the component fields of a UUID are compared as unsigned values.  During testing, it became clear that Java’s UUID class differed in its comparison results for certain randomly generated UUIDs.  Looking at the source, it was immediately clear that the UUID class performs a signed comparison of the UUID as two long values.

Does this really matter?  For many applications, it probably doesn’t.  Most people are not relying on the sort order of UUIDs, and sorting is usually performed solely inside a database.  As long as the sort order in that database is consistent, it doesn’t matter if it differs from how a client accessing that database might sort the UUID values in whatever language it’s written in.

Here are some simple examples that demonstrate how Python and Perl compare UUIDs versus how Java does it:

Python:

>>> from uuid import UUID
>>> uuid1 = UUID('20000000-0000-4000-8000-000000000000')
>>> uuid1
UUID('20000000-0000-4000-8000-000000000000')
>>> uuid2 = UUID('E0000000-0000-4000-8000-000000000000')
>>> uuid2
UUID('e0000000-0000-4000-8000-000000000000')
>>> uuids = [uuid2,uuid1]
>>> uuids
[UUID('e0000000-0000-4000-8000-000000000000'), UUID('20000000-0000-4000-8000-000000000000')]
>>> uuids.sort()
>>> uuids
[UUID('20000000-0000-4000-8000-000000000000'), UUID('e0000000-0000-4000-8000-000000000000')]
>>>

Perl:

#!perl
use Data::UUID;

$ug    = Data::UUID->new;
$uuid1 = $ug->from_string("20000000-0000-4000-8000-000000000000");
$uuid2 = $ug->from_string("E0000000-0000-4000-8000-000000000000");

print $ug->to_string( $uuid1 ) , "\n";
print $ug->to_string( $uuid2 ) , "\n";

# compare() returns -1, 0, or 1
$res   = $ug->compare($uuid1, $uuid2);
print "$res\n";

This example outputs:

$ perl ./uuidtest.pl
20000000-0000-4000-8000-000000000000
E0000000-0000-4000-8000-000000000000
-1

Java:

package test;

import java.util.UUID;

public class UUIDTest {

    public static void main(String[] args) {
        UUID uuid1 = UUID.fromString("20000000-0000-4000-8000-000000000000");
        UUID uuid2 = UUID.fromString("E0000000-0000-4000-8000-000000000000");
        System.out.println(uuid1);
        System.out.println(uuid2);
        System.out.println(uuid1.compareTo(uuid2));
    }

}

This example outputs:

$ java test.UUIDTest
20000000-0000-4000-8000-000000000000
e0000000-0000-4000-8000-000000000000
1

In the Perl and Java examples, a comparison value of ‘1’ means uuid1 is greater than uuid2, and ‘-1’ means uuid1 is less than uuid2.  Perl therefore agrees with Python’s sort in placing uuid1 first, while Java places it last.

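These examples use simple version 4 (random) UUIDs whose most significant bytes (0x20 and 0xE0) are chosen so that they compare differently as signed and unsigned values: 0xE0 has its high bit set, so the most significant long of uuid2 is negative when interpreted as signed.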
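To make the signed-versus-unsigned difference concrete, here is a small sketch that compares only the most significant 64 bits of the two example UUIDs both ways.  The class and method names are made up for illustration, and it assumes Java 8 or later, since Long.compareUnsigned did not exist when this post was written.

```java
import java.util.UUID;

// Hypothetical demo class; the names are illustrative, not from any library.
final class SignedVsUnsignedDemo {

    // Compare the high 64 bits as signed longs.
    static int signedHigh(UUID a, UUID b) {
        return Long.compare(a.getMostSignificantBits(), b.getMostSignificantBits());
    }

    // Compare the same 64 bits as unsigned values (requires Java 8+).
    static int unsignedHigh(UUID a, UUID b) {
        return Long.compareUnsigned(a.getMostSignificantBits(), b.getMostSignificantBits());
    }

    public static void main(String[] args) {
        UUID uuid1 = UUID.fromString("20000000-0000-4000-8000-000000000000");
        UUID uuid2 = UUID.fromString("E0000000-0000-4000-8000-000000000000");

        // The high bit of 0xE0 is set, so the signed long is negative.
        System.out.println(uuid2.getMostSignificantBits() < 0);          // true

        // Signed: uuid1 (positive) sorts after uuid2 (negative).
        System.out.println(Integer.signum(signedHigh(uuid1, uuid2)));    // 1

        // Unsigned: 0x20... is smaller than 0xE0..., matching Python and Perl.
        System.out.println(Integer.signum(unsignedHigh(uuid1, uuid2)));  // -1
    }
}
```

Note that both example UUIDs also have the high bit set in their least significant long (the 0x80 variant byte), so a signed comparison can misorder that half as well.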

This is now Bug ID 7025832 in the Java Bug Database, but it is marked Will Not Fix because the behavior has been present for over a decade and changing it would break countless applications.
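Since compareTo() is not going to change, one way to roll your own is a separate Comparator that treats both 64-bit halves as unsigned.  The following is only a sketch under a few assumptions: the class name UnsignedUuidOrder is hypothetical, it relies on Long.compareUnsigned from Java 8 or later, and it orders UUIDs by their 128-bit unsigned value, which is how the Python example above sorts them.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.UUID;

// Hypothetical helper; not part of the JDK or any UUID library.
final class UnsignedUuidOrder {

    // Orders UUIDs by their 128-bit value, treating each half as unsigned.
    static final Comparator<UUID> COMPARATOR = (a, b) -> {
        int high = Long.compareUnsigned(a.getMostSignificantBits(),
                                        b.getMostSignificantBits());
        if (high != 0) {
            return high;
        }
        // High halves are equal, so the low halves decide.
        return Long.compareUnsigned(a.getLeastSignificantBits(),
                                    b.getLeastSignificantBits());
    };

    public static void main(String[] args) {
        List<UUID> uuids = new ArrayList<>();
        uuids.add(UUID.fromString("E0000000-0000-4000-8000-000000000000"));
        uuids.add(UUID.fromString("20000000-0000-4000-8000-000000000000"));

        // Pass the comparator explicitly instead of using natural ordering.
        uuids.sort(COMPARATOR);

        // Prints [20000000-0000-4000-8000-000000000000, e0000000-0000-4000-8000-000000000000]
        System.out.println(uuids);
    }
}
```

Passing the comparator explicitly to sort(), TreeMap, and similar APIs leaves UUID’s natural ordering alone, so existing code that depends on the signed behavior keeps working.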
