Jump to content
thirty bees forum

Recommended Posts

Posted

Yeah you are right with the id_order. I forgot, that order volume can be derived from it. I am clearly in first group...

But cant you just use a higher auto_increment value than 1?

Posted

First of all, there are more database tables which primary key can be used to deduce interesting business information: guest, customer, connection, order, order_invoice,...

If you want to implement some primary key randomizer, you should probably do this for all of these tables.

One possible solution to tackle this is to create override for ObjectModel::add method. Something like this should do the trick:

    public function add($autoDate = true, $nullValues = false)
    {
        if (!isset($this->id) || !$this->id) {
            $clazz = get_class($this);
            if (in_array($clazz, ['Order', 'Customer', 'Guest'])) {
                $definition = static::getDefinition($clazz);

                // generate random number that can fit into signed int(11) and doesn't exists yet
                do {
                    $rnd = rand(1, 2147483647);
                    $sql = (new DbQuery())
                        ->select('1')
                        ->from($definition['table'])
                        ->where($definition['primary'] . '=' . $rnd);
                } while (Db::getInstance()->getValue($sql));

                // use random number as a new ID
                $this->id = $rnd;
                $this->force_id = true;
            }
        }
        return parent::add($autoDate, $nullValues);
    }

 

  • Like 1
Posted
10 minutes ago, wakabayashi said:

But cant you just use a higher auto_increment value than 1?

It is possible to increase the next value that will be used by auto_increment manually with an SQL statement like `ALTER TABLE ... AUTO_INCREMENT = N`. However each `id_order` will still increment by 1.

I could add some code that boosts the auto_increment value by some random number between 100 and 200 per day. It's not an ideal solution but it's better than just purely sequential increases over time.

I've been testing random id_order generation and the RAND() function of MariaDB doesn't seem very random. I'm seeing regular collisions after even 7000 or 8000 random number generations from a pool of 10^12 numbers. That shouldn't be happening.

Posted
12 minutes ago, datakick said:

One possible solution to tackle this is to create override for ObjectModel::add method. Something like this should do the trick:

That's f'in genius. That's the difference between someone like me (who knows just enough to be dangerous) and someone like you who knows how TB works.

Posted
3 hours ago, dynambee said:

I've been testing random id_order generation and the RAND() function of MariaDB doesn't seem very random. I'm seeing regular collisions after even 7000 or 8000 random number generations from a pool of 10^12 numbers. That shouldn't be happening.

I found out why these collisions were happening.

The MariaDB RAND() function returns a double with 16 digits of precision. This means that it's not really suitable for generating long integers with a single call, but it can be done with two calls. So instead of `FLOOR(RAND()*999999999999)+1` it has to be `(FLOOR(RAND()*1000000)+1) * (FLOOR(RAND()*999999)+1)`.

@datakick I don't know how good the PHP random function is for large integers. Would it be necessary to do something like:

$rnd = rand(1, ‭46341) * rand(1, ‭46340);

or does PHP do a good enough job on it's own with an 11 digit integer?

 

Posted
9 hours ago, dynambee said:

I don't know how good the PHP random function is for large integers. Would it be necessary to do something like:


$rnd = rand(1, ‭46341) * rand(1, ‭46340);

or does PHP do a good enough job on it's own with an 11 digit integer?

I believe php rand() function returns number with uniform distribution. But even if it didn't it's not really an issue. My snippet above checks if record with the same ID already exists. If so, then another random ID is generated. The worst case scenario - this adds a little bit overhead.

Posted
1 hour ago, datakick said:

I believe php rand() function returns number with uniform distribution. But even if it didn't it's not really an issue. My snippet above checks if record with the same ID already exists. If so, then another random ID is generated. The worst case scenario - this adds a little bit overhead.


I thought MariaDB's RAND() function would also provide good levels of randomness but for large integers it starts to fall apart.

MariaDB RAND() examples:

When using `(FLOOR(RAND()*46341)+1) * (FLOOR(RAND()*46340)+1)` to generate random numbers between 1 and 2,147,441,940 I see a collision rate of 0.3% after generating 500,000 random numbers. After 1 million random numbers the collision rate is about 0.6%. These sorts of collision rates are fine, and I ran the test up to about 9.8 million randoms where the collision rate got up to about 5.7%. I don't expect to have any single TB table with anywhere near that many rows.

However when using `FLOOR(RAND()*2147483647)+1` to generate random numbers between 1 and 2,147,483,647 the collision rate is already at 3.7% after generating 500,000 random numbers. After 1 million random numbers the collision rate is around 8.5%. This collision rate is incredibly high considering that less than 0.05% of the number range has been used.

In both cases the collision rates are roughly repeatable across multiple test runs.

The reason for these big differences is that pseudo-random number generators typically create random numbers based on a generated DOUBLE value with 16 decimals of precision. Bigger random numbers approach the limits of this precision and this causes a lot more collisions. Multiplying two smaller numbers together to give the same total range of numbers fixes the problem.

Why does this matter even when checking for duplicate IDs? There is a time gap between the duplicate check and when the row is added. With a very low collision rate this time gap isn't too much of a concern. However with a high collision rate it becomes more likely that a collision could occur with two rows being inserted at around the same time. It's an edge case but not so much of an edge case that it shouldn't be a concern on a busy site.

Posted

For collision to happen, there would

1) have to be 2 php processes trying to insert new entry into the same table at the same time (or in the timespan of ~10ms)

2) and both php processes would have to generate the same random number

Even the first condition is very unlikely to happen. But probability of both two condition to be met at the same time is effectively zero. You could run your store for thousands of years without this to ever happen. 

Posted

I agree, the chance of a collision is low.

That said, is there any downside to using `(FLOOR(RAND()*46341)+1) * (FLOOR(RAND()*46340)+1)` instead of `FLOOR(RAND()*2147483647)+1`? My testing with MariaDB shows only benefits. Increased randomness, far fewer collisions, and no performance penalty that I can detect.

Posted
1 hour ago, dynambee said:

I agree, the chance of a collision is low.

That said, is there any downside to using `(FLOOR(RAND()*46341)+1) * (FLOOR(RAND()*46340)+1)` instead of `FLOOR(RAND()*2147483647)+1`? My testing with MariaDB shows only benefits. Increased randomness, far fewer collisions, and no performance penalty that I can detect.

I believe that product of two random number with uniform distribution results in a number that does not have uniform distribution, which means it leads to more collisions.

In your case, mariadb rand() probably doesn't return number with uniform distribution, so product of two rand numbers might be useful. In my case, it's not (assuming php rand generates number with uniform distribution, as my quick test suggests)

Posted
36 minutes ago, datakick said:

I believe that product of two random number with uniform distribution results in a number that does not have uniform distribution, which means it leads to more collisions.

I don't think that multiplying two properly generated pseudo-randoms together will see any decrease in randomness. Mathematically I don't see a reason why it would, but perhaps I am missing something.
 

38 minutes ago, datakick said:

In my case, it's not (assuming php rand generates number with uniform distribution, as my quick test suggests)

It seems that with PHP 7.1 and newer the rand() function is fine as it is now just an alias of mt_rand(). PHP 7.1 also saw mt_rand() fixed to use the correct Mersenne Twister Algorithm. So with PHP 7.1 and newer using rand() or mt_rand() shouldn't cause problems, even if generating a lot of extremely large numbers.

For older versions of PHP I think generating substantial quantities of large pseudo-random numbers will suffer from the same sort of lack of randomness that I have seen with MariaDB.

Posted
18 minutes ago, dynambee said:

I don't think that multiplying two properly generated pseudo-randoms together will see any decrease in randomness. Mathematically I don't see a reason why it would, but perhaps I am missing something.
 

Imagine dice throwing. Every side (1..6) has the same probability -- uniform distribution. But if you throw 2 dice, then probability of multiplied number is different:

total 1 -- probability 1/36 -- there is only one combination (1 * 1) that results in 1

total 2 -- probability 2/36 -- two combinations (1 * 2) and (2 * 1) 

total 4 -- probability 3/36 --- (1 * 4), (4 * 1), (2 * 2)

total 7 -- probability 0 -- there is no combination that would result in 7 (or 11, 13, 17, 19, 23, 29, 31)

As you can see, probability distribution changed significantly

 

Posted

Haha learnt something about stats here 💪 Thats interesting 🙂

@dynambee what was actually your idea behind the multiplication of two numbers? In my naive way of thinking, I would just create a random number and check if it's already exits. If it does I redo the process until I got a new unique number. As far as I see, thats also @datakick way of doing it.

@datakick is the needed Db::getInstance relevant in this case? I always wonder, how much such a query affect the speed. Lesley once told me, that when trying to improve speed an a site, one should try to reduce DB queries where possible.

 

Posted
8 minutes ago, wakabayashi said:

@datakick is the needed Db::getInstance relevant in this case? I always wonder, how much such a query affect the speed. Lesley once told me, that when trying to improve speed an a site, one should try to reduce DB queries where possible.

Db::getInstance() calls doesn't add any measurable overhead. Once the connection is established this method returns existing instance from static pool. Of course, it's possible to extract it to variable like this to save couple of micro/nano seconds per loop

$conn = Db::getInstance();               
do {
...
} while ($conn->getValue($sql));

 

  • Thanks 1
Posted
1 hour ago, datakick said:

As you can see, probability distribution changed significantly

Yes, I see your point. Perhaps I'll make time to test a pre-PHP7.1 version of rand() and see how it performs vs the MariaDB rand() function. If it's similar then multiplying two numbers together, while not ideal, will still be far better than using a rand() function that generates a high number of collisions. (I have tested this multiple times with the MariaDB function.)

Posted
1 hour ago, wakabayashi said:

@dynambee what was actually your idea behind the multiplication of two numbers? In my naive way of thinking, I would just create a random number and check if it's already exits. If it does I redo the process until I got a new unique number. As far as I see, thats also @datakick way of doing it.

The problem is that not all rand() functions are equal. The rand() function in PHP 7.1 is very good. Excellent, even. However the rand() function in earlier versions of PHP (even 7.0) is not so good. Exactly how bad it is I am not sure. Earlier versions seem worse than later versions and in some cases it seems to depend on the underlying OS, too. All that is fixed with 7.1 though as it uses the Mersenne Twister Random Number Generator.

What I am sure of is that the rand() function in MariaDB 10.2 is pretty terrible at generating a lot of large random numbers, far worse than I would have ever expected. For a table with a standard INT(11) ID column, generating a new random ID for the table had an 8.5% collision rate after only 1 million inserted rows. That's shockingly bad. Less than 0.05% of the available INT(11) numbers are in use with 1mil rows but 8.5% of the newly generated "random" numbers are collisions? Something is very wrong. (After 2.4mil rows rand() had a collision rate of 22.8%! 0.11% of numbers in use and 22.8% collisions... Not good.)

Changing to using two smaller numbers multiplied together brought the MariaDB collision rate in the same scenario down to 0.6%. This is still higher than expected but is pretty acceptable given the use case being discussed in this thread.

Anyway. for anyone using PHP 7.1 or newer none of this is an issue. For anyone using PHP 7.0 or lower it's still not clear, to me, if rand() is a suitable way to generate random IDs for table inserts.

Posted

@dynambee as it brought collision down rate down, there seems to be a point in it. From a logical stand point, it doesn't make sense to me. But I have to admit, that I am not a pro in statiscal methods. 

In general I could not imagine a better way, than doing once rand() and check, if its unique. If the collision rate is exceptional high, it indicates, that this is not really a normal distribution. But as you say since php 7.1 is fine, why dont you just use it then? If you are below PHP 7.1 you can go with mt_rand() direct!?

Posted (edited)
4 hours ago, wakabayashi said:

In general I could not imagine a better way, than doing once rand() and check, if its unique.

Yes, if the rand() function operates properly you would be correct.
 

4 hours ago, wakabayashi said:

If the collision rate is exceptional high, it indicates, that this is not really a normal distribution.

Also correct. Unfortunately when generating big numbers many rand() functions do not work properly. There are various workarounds for this but the best solution is a properly working rand() function.
 

4 hours ago, wakabayashi said:

But as you say since php 7.1 is fine, why dont you just use it then?

I do run PHP 7.1. I only discovered a short time ago that the rand() function in PHP 7.1 has been greatly improved and now uses a well known random number generator.

So for my particular situation calling rand() or mt_rand() is likely to work very well. However anyone else visiting this thread in the future needs to know that rand() with earlier versions of PHP may have undesirable results for creating random table IDs.
 

Quote

If you are below PHP 7.1 you can go with mt_rand() direct!?

Unfortunately below 7.1 mt_rand() also doesn't work properly. I think it is still better than rand() but there was a flaw in the implementation of mt_rand() in PHP versions prior to 7.1.

Edited by dynambee
Posted
1 hour ago, dynambee said:

Unfortunately below 7.1 mt_rand() also doesn't work properly. I think it is still better than rand() but there was a flaw in the implementation of mt_rand() in PHP versions prior to 7.1.

Oh ok, didn't know this things. I thought inbuilt php functions would be pretty good ^^

Posted (edited)
On 10/17/2019 at 2:03 AM, datakick said:

 


    public function add($autoDate = true, $nullValues = false)
    {
        if (!isset($this->id) || !$this->id) {
            $clazz = get_class($this);
            if (in_array($clazz, ['Order', 'Customer', 'Guest'])) {
                $definition = static::getDefinition($clazz);

                // generate random number that can fit into signed int(11) and doesn't exists yet
                do {
                    $rnd = rand(1, 2147483647);
                    $sql = (new DbQuery())
                        ->select('1')
                        ->from($definition['table'])
                        ->where($definition['primary'] . '=' . $rnd);
                } while (Db::getInstance()->getValue($sql));

                // use random number as a new ID
                $this->id = $rnd;
                $this->force_id = true;
            }
        }
        return parent::add($autoDate, $nullValues);
    }

 

 

I have spent the last few hours working with this and doing a lot of testing. I discovered that the Cart ID is also exposed easily during the payment process when people are sent back from PayPal after completing payment. It's also visible when using the site if you use Chrome's console. Adding 'Cart' to the list of classes in the above code fixed that problem and the Cart IDs are now random.

Another issue I came across is that while the order reference column is defined in the TB database as VARCHAR(100) the length of reference numbers is still limited to 9 characters by the TB code. I reported this as a bug on github.

Next issue is that ideally I would like the id_order, order reference #, invoice #, and delivery slip # to all match as it makes things easier for customers if everything is the same.

I found this nice little module (called "modrefchange") that seems like it might do exactly that but there seems to be one problem.

I set the modrefchange module use the Order ID for the order reference #, invoice #, and delivery slip #. It successfully uses the order ID for both the invoice # and the delivery slip # but the order reference # is being overridden with a different random number.

So I guess my next (and hopefully last!) question about this is... Is there a way I can generate random `id_order` fields but not generate random `reference` fields? That way the module can use the `id_order` value for the reference, invoice, and delivery numbers.

Edited by dynambee
  • 2 weeks later...
Posted (edited)

No idea about all these numbers,  sorry i cant help  -  but maybe a shop system should be a shop system and you have to use an separate software at your office computer for such things?  I mean, tb should be a perfect front-end for customers, but all the other things should be done separately in a safe structure and at a software, which is made for  🙂  

Edited by DRMasterChief
Posted
8 hours ago, DRMasterChief said:

No idea about all these numbers,  sorry i cant help  -  but maybe a shop system should be a shop system and you have to use an separate software at your office computer for such things?  I mean, tb should be a perfect front-end for customers, but all the other things should be done separately in a safe structure and at a software, which is made for  🙂  

There are two reasons why the numeric system is important:

  1. It's not ideal for a new shop to start from customer #1 and order #1. It does not inspire trust, even if the business behind the shop has been in existence for many years.
     
  2. It is easy for competitors to learn a lot about a shop if customer, cart, guest, order, invoice numbers are sequential. For example, I can see how many customers you have. How many orders you have received. How many orders over a period of time. How many new customers you have collected over a period of time. How many abandoned carts you have over a period of time. From this information I can extrapolate a lot about your business, including your likely monthly and yearly sales. As I said much earlier on, some people really don't care about this but personally I do.
Posted

Yes, i know about that and i also do agree with you.  What i meant is, that you can easily alter the order number to an matching level when you start with the shop.  Then you can use the numeric order module to change the order no. as you like. 

And then, i meant, you should use an software at a computer station in your office to create delivery notes, invoices and so on.  With this, you dont have the problem with such numbers to have them in an online store and competitors do not have access to this.  We run  'JTL Wawi' for this, but i think it is not available in English.

Maybe you can also run a cronjob or change the numeric order module in a way, that e.g. every few hours a random value is added?  This works for us.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...