
[zentool] images directories and database cleaner



Posted (edited)

Hello,

I just finished this module from personal scripts. I think it might help people using TB, especially after a migration from PrestaShop or other e-commerce systems.

It sometimes happens that old websites end up with a huge image directory. This module checks the presence on the server of each image assigned to a product, and deletes from the database any assigned image that is no longer available on the server.

It also recursively deletes all images or files located inside the /img/p directories that have nothing to do there (without the need to delete all thumbnails and image files from the back office and recreate them).
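To give an idea of the file check, here is a very simplified sketch (not the module's actual code; the extension whitelist and the keep-index.php rule are only illustrative, and it assumes the thirty bees context where _PS_PROD_IMG_DIR_ is defined):

    <?php
    // Simplified sketch: walk img/p recursively and delete any file that is
    // neither a product image nor an index.php. The whitelist is illustrative.
    $allowed = ['jpg', 'jpeg', 'png', 'gif'];

    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator(_PS_PROD_IMG_DIR_, FilesystemIterator::SKIP_DOTS)
    );

    foreach ($files as $file) {
        $ext = strtolower($file->getExtension());
        if ($file->isFile()
            && $file->getFilename() !== 'index.php'
            && !in_array($ext, $allowed)
        ) {
            unlink($file->getPathname()); // the file has nothing to do there
        }
    }

The real module does more checks than this, of course.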

I used to rely on Presta Manager for this task, but it took such a big amount of time and commitment to finish... not really satisfying, so that is why I made my own and now want to share it with the TB community.

 

Please tell me if it works for you, if you have ideas to improve it, corrections to the code, or anything else related to this module.

It might also be worth including in the "TB cleaner" module... what do you think?

I didn't put it on GitHub; I prefer to expose it here first.

Thank you in advance for your bug reports or reviews.

Zen.

 

zentool_images.zip

Edited by zen
Posted

Nice work!

Did you notice the class ShopMaintenance yet? Its method run() gets executed periodically (triggered by Ajax requests) and is designed to do exactly such cleaning jobs. Removing obsolete images would be a nice fit there.

One problem not solved yet: how to search huge file trees in time-limited chunks. No matter how big the tree is, the search should return after three seconds and pick up next time where it left off on the previous run.
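The pattern could look roughly like this (just a sketch; the function name and configuration key are invented, nothing like this exists in core yet):

    <?php
    // Sketch of a time-limited chunk: process entries for at most three
    // seconds, then remember the position for the next run.
    function scanImageTreeChunk()
    {
        $start  = microtime(true);
        $offset = (int) Configuration::get('ZENTOOL_SCAN_OFFSET'); // resume point

        $files = new RecursiveIteratorIterator(
            new RecursiveDirectoryIterator(_PS_PROD_IMG_DIR_, FilesystemIterator::SKIP_DOTS)
        );

        $index = 0;
        foreach ($files as $file) {
            if ($index++ < $offset) {
                continue; // already handled on a previous run
            }

            // ... inspect $file here, delete it if it doesn't belong ...

            if (microtime(true) - $start > 3.0) {
                Configuration::updateValue('ZENTOOL_SCAN_OFFSET', $index);
                return false; // budget exhausted, not finished yet
            }
        }

        Configuration::updateValue('ZENTOOL_SCAN_OFFSET', 0); // full pass complete
        return true;
    }

Skipping by counting entries is admittedly O(n) on each run, but it keeps the sketch simple.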

If you could shape your code to make it fit into ShopMaintenance, I'd happily add this to the thirty bees core.

Posted (edited)

Thank you Traumflug for pointing to this class, which I didn't know existed. But as you said, it is triggered by Ajax in the back office... and this deep directory search uses quite some resources on the server.

19 hours ago, Traumflug said:

One problem not solved yet: how to search huge file trees in time-limited chunks. No matter how big the tree is, the search should return after three seconds and pick up next time where it left off on the previous run.

Mission accomplished for one part of the script, and I also spent some time figuring out how to report the progress of each script run in the logs panel.

Here is a first attempt on one part of the script, and it actually works well on my server. It is the part that recursively checks ALL files on the server in 'img/p', verifies whether the extension is valid and whether the file should be there or not; if not, the file is deleted from the server.

I had to insert data into the configuration table in order to restart at the same point and to test whether the task has already been done today. Also, if more than one user starts the script, it will be finished by the combined Ajax requests, each one starting at the break point of the previous 3-second chunk.

My testing shop contains 88,252 images on the server, and it needed 3 requests of 3 seconds each to complete the task. I don't store the results of the 2 SQL queries at the beginning of the function, and they take 0.5 seconds each time here. Should I store them, and if yes, how?

Edited by zen
I put it on GitHub with the 2 functions in one.
Posted (edited)

I added it on GitHub in the ShopMaintenance class

... not yet... I made a mess between the two scripts... have to do better

Edited by zen
Posted

Maintenance tasks obviously require server resources, that's true. That's why ShopMaintenance runs these tasks only once a day and why they should get split into many small tasks. Not running them automatically means, however, that most merchants never do this. I've seen a client shop (migrated from PS) with no less than 1.2 million stale image directories; just the directory and index.php inside, but no JPEG.

For saving query results between requests, one can store the result in a cache file, using PHP's var_export() for saving and require or require_once for restoring, like done e.g. here: https://github.com/thirtybees/coreupdater/blob/master/classes/GitUpdate.php#L146-L173 . This isn't always faster, though; reading a file with 50,000 lines also takes a noticeable fraction of a second. To be less resource hungry, limiting queries to chunks of 100 or 1,000 rows might be an option: http://www.mysqltutorial.org/mysql-limit.aspx
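In sketch form (file path and query are only examples):

    <?php
    // Sketch: cache a query result between requests with var_export(),
    // restore it with require. Path and query are only examples.
    $cacheFile = _PS_CACHE_DIR_ . 'zentool_images.php';

    if (is_readable($cacheFile)) {
        $rows = require $cacheFile; // restore the result of a previous request
    } else {
        $rows = Db::getInstance()->executeS(
            'SELECT `id_image`, `id_product` FROM `' . _DB_PREFIX_ . 'image`'
            // ...or fetch in chunks instead: LIMIT 1000 OFFSET <n>
        );
        file_put_contents(
            $cacheFile,
            '<?php return ' . var_export($rows, true) . ';'
        );
    }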

Posted
1 hour ago, Traumflug said:

 I've seen a client shop (migrated from PS) with no less than 1.2 million stale image directories; just the directory and index.php inside, but no JPEG. 

Yes... that is something I didn't check in the module... so you mean it would be a good idea to erase all directories (with just an index.php inside) that do not contain any image files or sub-directories?

Let me think about this, yes... I'll try to find a way to do that.

I'll also test the PHP cache file... or caching the MySQL query might be a good option too; I have to test whether it improves the speed.

But first I'll finish the commit on GitHub properly, because by trying to go too fast I thought one function only would be better... it is not. Better to do it like in the module, in 2 steps:

- clean the directories and stray files

- clean the database of product images that cannot be found on the server...

Each task gets a 3-second max execution time and restarts where it stopped... if the first clean is over, the script will launch the second one, and all that only once per day, something like the sketch below. Does that sound logical?
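A rough sketch of that schedule (the ZENTOOL_* keys and the two chunk functions are invented names for the two steps above; each chunk function returns true when its task is complete):

    <?php
    // Sketch of the daily two-step schedule.
    function runDailyCleaning()
    {
        $today = date('Y-m-d');
        if (Configuration::get('ZENTOOL_LAST_RUN') === $today) {
            return; // both tasks already completed today
        }

        if (Configuration::get('ZENTOOL_PHASE') !== 'database') {
            // Step 1: clean the directories and stray files.
            if (cleanDirectoriesChunk()) {
                Configuration::updateValue('ZENTOOL_PHASE', 'database');
            }
        } else {
            // Step 2: clean the database of images missing on the server.
            if (cleanDatabaseChunk()) {
                Configuration::updateValue('ZENTOOL_PHASE', 'directories');
                Configuration::updateValue('ZENTOOL_LAST_RUN', $today);
            }
        }
    }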

I hope to finish it soon.

Posted
21 hours ago, zen said:

so you mean it would be a good idea to erase all directories (with just an index.php inside) that do not contain any image files or sub-directories?

I fear it's not that simple. Each image has a database entry, and each image in the database is connected to a product. A first round of cleaning would be to delete all images in the database not connected to a product. A second round would delete all image directories on disk which have no corresponding database entry. A third round would delete image types no longer needed, as you do already. And I hope I didn't forget anything in this description.

Cleaning the database should be done not with handcrafted SQL, but by using the PHP classes as far as possible. Database storage might change a bit in the future; using the classes keeps the code future-proof.
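The first round could take roughly this shape (the JOIN is only an example; the deletion itself goes through the Image class):

    <?php
    // Sketch of round one: find images whose product no longer exists,
    // then delete them through the Image class rather than raw SQL.
    $orphans = Db::getInstance()->executeS(
        'SELECT i.`id_image`
         FROM `' . _DB_PREFIX_ . 'image` i
         LEFT JOIN `' . _DB_PREFIX_ . 'product` p
            ON p.`id_product` = i.`id_product`
         WHERE p.`id_product` IS NULL'
    );

    foreach ($orphans as $row) {
        $image = new Image((int) $row['id_image']);
        $image->delete(); // removes database rows and the files on disk
    }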

Your idea of scheduling is fine. It's the simple part anyway, as long as the code can execute in 3-second chunks.

Posted

Forgot a fourth round: images properly connected in the database, but with the original image on disk missing. These report an error on thumbnail regeneration and should get removed from the corresponding product, along with a message to the merchant.
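Roughly like this (a sketch, not tested code; the log severity and message are only examples):

    <?php
    // Sketch of round four: database images whose original file is missing
    // on disk. Remove them and leave a note for the merchant in the logs.
    foreach (Image::getAllImages() as $row) {
        $image = new Image((int) $row['id_image']);
        if (!file_exists(_PS_PROD_IMG_DIR_ . $image->getImgPath() . '.jpg')) {
            $image->delete();
            Logger::addLog(sprintf(
                'Removed image %d from product %d, original file missing.',
                (int) $row['id_image'],
                (int) $row['id_product']
            ), 2); // severity 2 = warning, shows up in the logs panel
        }
    }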
