[Qt-interest] QUuid with QUuid::Time

Scott Aron Bloom Scott.Bloom at onshorecs.com
Wed Jun 16 04:38:12 CEST 2010


Does that mean you are storing every GUID used in your system in your client side app?

It would seem to me that this is going to be grossly expensive for large databases...

At a quick glance, I don't think it's that efficient either... you are doing 32 character comparisons (1 per char), each worth log2(16) bits (16 being the number of valid chars), so on average 32 * 4 = 128 comparisons per lookup.

My analysis may be wrong, please correct me if it is... I won't take it personally at all :)

Why not just treat the GUID as a 128-bit (32 nibbles = 128 bits) integer? You could have up to 2^128 GUIDs in your table before a lookup averages that many (128) comparisons...

That's why GUIDs make great index keys in a SQL server when the server supports them as a native integer-based type (SQL Server, MySQL, etc.)... Clearly, if it's storing them as a string, you're SOL.
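The integer-vs-string comparison argument can be sketched like this; `Guid128` is a hypothetical two-word layout chosen for illustration, not Qt's actual QUuid storage:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical 128-bit GUID stored as two 64-bit words.
struct Guid128 {
    std::uint64_t hi;
    std::uint64_t lo;
};

// One "integer" comparison: at most two word compares,
// independent of the textual length of the GUID.
inline bool guidLess(const Guid128 &a, const Guid128 &b) {
    if (a.hi != b.hi)
        return a.hi < b.hi;
    return a.lo < b.lo;
}

// The string form touches up to 32 hex characters instead.
inline bool hexLess(const char *a, const char *b) {
    return std::strcmp(a, b) < 0;
}
```

With the integer form, a binary search over N rows costs about log2(N) of these cheap compares, which is where the "2^128 entries before 128 comparisons" figure comes from.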

My suggestion, and what I have done in the past....

When creating a new item, the insert sproc that does so has the ID created as an output parameter. In the SQL Server world, your insert sproc calls a "generate id" sproc, which takes in the table name; it then loops, creating a new GUID until that GUID is not in the table.

If you are not concerned about possible duplication, you simply return the GUID without a loop...
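A minimal client-side sketch of that generate-and-check loop, using a `std::set` to stand in for the table lookup and a caller-supplied generator (in Qt that generator would be `QUuid::createUuid()`); the 64-bit `Guid` alias is a simplification for the sketch:

```cpp
#include <cstdint>
#include <functional>
#include <set>

using Guid = std::uint64_t;  // narrowed for the sketch; real GUIDs are 128-bit

// Loop until a freshly generated GUID is not already in the table,
// then record it and hand it back (the sproc's output parameter).
Guid generateUniqueId(std::set<Guid> &table,
                      const std::function<Guid()> &newGuid) {
    Guid id;
    do {
        id = newGuid();
    } while (table.count(id) != 0);
    table.insert(id);
    return id;
}
```

In the database, the loop and the insert would run inside one transaction so no one else can claim the ID between the check and the insert.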

Scott





-----Original Message-----
From: Jason H [mailto:scorp1us at yahoo.com] 
Sent: Tuesday, June 15, 2010 7:08 PM
To: Scott Aron Bloom; qt-interest at trolltech.com
Subject: Re: [Qt-interest] QUuid with QUuid::Time

You bring up a number of options that I've explored. And really the easiest way is to sequentially number everything in the domain, usually using a record ID or sequence from a relational database that will be ACID compliant.

The other one is to guess and check. There is a data structure that can do this efficiently. It is a 36-way tree (A-Z, 0-9) that indexes by character:
tree['A']['B']['C']... 
until the last character of the GUID is reached, where the entry is either NULL or not. If it is not null, the GUID is in use; if it is null, it is unused, so set the pointer to non-null. It's like a super-efficient way to store a dictionary, provided there is significant overlap. Otherwise, just use a sparse index on a database column.
Then roll that up into a REST service.
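A rough sketch of that tree, assuming hex GUID text, so 16 children per node rather than the 36-way A-Z/0-9 version described above (which works the same way); `checkAndInsert` is a name invented for the sketch:

```cpp
#include <cctype>
#include <memory>
#include <string>

// Character-indexed tree node: one child per hex digit, plus a flag
// set at the node reached by the GUID's last character.
struct TrieNode {
    std::unique_ptr<TrieNode> child[16];
    bool used = false;
};

inline int hexIndex(char c) {
    c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return (c >= '0' && c <= '9') ? c - '0' : c - 'a' + 10;
}

// Walks tree[c0][c1][c2]... and returns true if the GUID was already
// present; otherwise marks it as used and returns false.
bool checkAndInsert(TrieNode &root, const std::string &guid) {
    TrieNode *node = &root;
    for (char c : guid) {
        if (c == '-')  // skip the usual GUID separators
            continue;
        int i = hexIndex(c);
        if (!node->child[i])
            node->child[i] = std::make_unique<TrieNode>();
        node = node->child[i].get();
    }
    if (node->used)
        return true;
    node->used = true;
    return false;
}
```

The space saving only materializes when many stored GUIDs share prefixes, which is the "significant overlap" caveat above.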

The last and easiest is to ignore it and handle the dupes.





----- Original Message ----
From: Scott Aron Bloom <Scott.Bloom at onshorecs.com>
To: qt-interest at trolltech.com
Sent: Tue, June 15, 2010 8:17:20 PM
Subject: Re: [Qt-interest] QUuid with QUuid::Time

Time to failure is a statistical notion: the probability that, over a given span of time, you will see a failure...

Given a failure rate of 1 failure in 10 days... yes, absolutely, you are correct that on any given day you have a 1 in 10 chance of failure...

However, if you ask, over a 10-day span, what are the chances that it fails at least once, the answer is about 65% (1 - 0.9^10), and it keeps climbing from there.

For my numbers, where the failure rate is 1 in 30 million and you are doing 10k attempts per day, you have roughly a 1 in 3000 chance of failure per day... Over a 3000-day period, the chance of at least one failure is about 63% (1 - 1/e), and it approaches 100% as the span grows... I was not implying that it's hit on day 3000, and I apologize if I gave that impression...
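The cumulative figure follows from 1 - (1 - p)^n, the chance of at least one failure in n independent trials; plugging in the thread's numbers (p = 1/3000 per day, n = 3000 days) gives about 63%, i.e. 1 - 1/e:

```cpp
#include <cmath>

// Probability of at least one failure in n independent trials,
// each with per-trial failure probability p.
double atLeastOneFailure(double p, double n) {
    return 1.0 - std::pow(1.0 - p, n);
}
```

Over 30,000 days the same formula gives better than 99.99%, which is where "approaches 100%" comes from.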

Note, your method of namespace + increasing integer does not scale. The integer creation becomes a blocking operation. If 2 systems are running and only occasionally need to create a new record, you are safe, not a big deal... However, if you have 10,000 transactions all trying to create a new identifier, the server has to block 9,999 of them at a time to create that unique ID... Very expensive... Using a pseudo-random UUID, you can still do pure number lookups, and insertions don't take a lock for unique IDs, which is a major, major advantage.

If I were designing a system where the 1-in-30M failure rate was not acceptable, I would probably regionalize the servers and do a nightly merge of the records, which would handle the issue of duplication... and have my search systems able to search the multiple servers plus the centralized one...

Scott


-----Original Message-----
From: qt-interest-bounces at trolltech.com [mailto:qt-interest-bounces at trolltech.com] On Behalf Of Jeffrey Brendecke
Sent: Tuesday, June 15, 2010 4:30 PM
To: qt-interest at trolltech.com
Subject: Re: [Qt-interest] QUuid with QUuid::Time


> If you are only doing 10k transactions/day, and the transaction id safety
> is 1 in 30 million (not 1 in 100 as you imply), its gonna take 3000 days
> for your duplicate to hit...
This is not applicable. The given probability does not assure you of anything: 
it is just a likelihood estimate based on assumptions that may or may not 
hold. Take a look at a text on statistics and probability distributions to 
see what I mean. The problem could hit today, tomorrow, in 3000 days, or 
never. You don't know, and if the assumptions behind the estimate don't 
hold anyway, the problem may occur more often than you think.

The solution is simple: namespace + increasing integer.

Each identifier could consist of a namespace for the location where the 
integer is generated, plus an incrementally increasing integer, e.g., 
generated from a sequence in a PostgreSQL database. Then one only has to 
manage a few namespaces, one for each ID-generating location, and no one 
will collide with anyone, even if the dual-key IDs are thrown together:

asia0
asia1
asia2
europe0
asia3
africa0
europe1
asia4
africa1
europe2

and so on ...
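A toy sketch of that scheme (names invented for illustration; in production each counter would be a database sequence such as PostgreSQL's `nextval()`, not an in-memory map):

```cpp
#include <map>
#include <string>

// Namespace + counter scheme: each generating site keeps its own
// monotonically increasing integer, so IDs from different sites can
// never collide even when thrown together.
class NamespacedIdGenerator {
public:
    std::string next(const std::string &ns) {
        return ns + std::to_string(counters_[ns]++);
    }

private:
    std::map<std::string, long long> counters_;
};
```

Successive calls with "asia", "asia", "europe" yield asia0, asia1, europe0, matching the interleaved list above.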

Jeffrey Brendecke

Managing Director
icanetix Software Systems and Consulting GmbH
Untere Hagenstrasse 26
91217 Hersbruck
Germany

Commercial Registry B 25317 Nuremberg, Germany
VAT-ID: DE250213502

Tel: +49  9151 / 90 87 57-6
Fax: +49  9151 / 90 87 57-9
www.icanetix.com


--------------------
Date: Wednesday 16 June 2010 01:08
From: "Scott Aron Bloom" <Scott.Bloom at onshorecs.com>
To: "Qt Interest" <qt-interest at trolltech.com>
Cc: 
Subject: Re: [Qt-interest] QUuid with QUuid::Time
--------------------

> -----Original Message-----
> From: qt-interest-bounces at trolltech.com
> [mailto:qt-interest-bounces at trolltech.com] On Behalf Of Constantin Makshin
> Sent: Tuesday, June 15, 2010 8:49 AM
> To: Qt Interest
> Subject: Re: [Qt-interest] QUuid with QUuid::Time
>
> On Tuesday 15 June 2010 18:58:04 Oliver.Knoll at comit.ch wrote:
> > Constantin Makshin wrote on Monday, June 14, 2010 9:50 PM:
> > > IMHO, even 7 duplicates per 15 million values (~0.00005% chance of
> > > collision) is [more than] acceptable result for most, if not all,
> > > applications. :)
> >
> > But can we agree upon the following:
> >
> > Given propability p1 of a "UUID collision", as suggested to be
> > "infinitely small" by descriptions such as:
> >
> >   "For example, consider the observable universe, which contains about
> > 5×10^22 stars; every star could then have 6.8×10^15 universally unique
> > GUIDs."
> >
> > and given probability p2 = ~0.00005% of a real world UUID generator
> > implementation, that there are, well, "light years" (as to stay in the
> > figurative speech of the
> > http://en.wikipedia.org/wiki/Globally_Unique_Identifier article) between
> > p1 and p2?
>
> Of course, 0.00005% is much more than theoretical 1/2^128 (~3e-37%), but
> still [generally] acceptable.
>
> > > I agree with one of the posters from that discussion:
> > > "chances are so low that you really should stress about something
> > > else- [...]"
> >
> > I also generally agree on that :) It is always a question of "how bad"
> > the impact of a duplicate really is and whether it is worth bothering.
> > But for example if - and I stress the IF here - you'd rely on a UUID for
> identifying financial transactions, it could turn out disastrous... Or
> > would you trust a bank which would officially state that "our online
> > banking/transactions are (technically) 0.99995% safe"? ;)
>
> I'd *never* trust a bank with 0.99995% "transaction safety". ~1 successful
> transaction per 100 attempts would quickly bankrupt anyone. :)
>
> > Cheers, Oliver
>
> _______________________________________________
> Qt-interest mailing list
> Qt-interest at trolltech.com
> http://lists.trolltech.com/mailman/listinfo/qt-interest
>
> -------------------
>
> Of course you need to take into account the likelihood of collision WRT any
> data safety...
>
> If you are only doing 10k transactions/day, and the transaction id safety
> is 1 in 30 million (not 1 in 100 as you imply), its gonna take 3000 days
> for your duplicate to hit...
>
> I would worry at that scale... since if I'm working on a large server-side app,
> hopefully it will be in production for 8 years or so...
>
> But large scale or critical applications always have other issues that come
> up...
>
> However, for a smaller client side app, where you are only creating 100-500
> unique transactions a day, the 30,000-60,000 days it's gonna take to hit
> the bug is something I can live with, and I'd focus on other, more likely
> failure scenarios.
>
> Scott
>