thirty bees forum

Datakick module: how to manage image duplication? (huge img/p dir)



Posted (edited)

Hello, I'm using the datakick module to import products hourly.

I noticed that the img/p directory contains more than 1.5 million files, which seems like far too many.

This shop has 42k products (12k in stock), and 99.99% of the products have a single image.

Right now img/p contains 1,526,743 files.

Products (with their images) are imported automatically every hour with @datakick's module v2.1.9.

I think the issue is caused by one of the import jobs, which every hour also re-imports images for:

  • products that don't have an image, because the source sometimes adds images a few hours later
  • products imported during the last 2 days, because the source adds or changes images after 24 to 36 hours.

I'm using the import mode "replace existing images".

I monitored the number of files and noticed that it changes even when the import job is not importing new products.

I also monitored a specific product and can confirm that the import mode "replace existing images" re-imports the image as I want and replaces the product image, but the old image stays on the filesystem.
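
For anyone who wants to repeat this kind of check, here is a minimal sketch of how the monitoring could be done: take a snapshot of img/p before and after an import run and compare the two. The function names `snapshot` and `diff_snapshots` are made up for illustration and are not part of datakick or thirty bees; the only assumption is that the script can read the img/p directory.

```python
import os

def snapshot(root):
    """Return the set of file paths under `root`, relative to `root`."""
    paths = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            paths.add(os.path.relpath(full, root))
    return paths

def diff_snapshots(before, after):
    """Return (added, removed) as sorted lists of relative paths."""
    added = sorted(after - before)
    removed = sorted(before - after)
    return added, removed

# Hypothetical usage: run once before and once after an import job.
# before = snapshot("/path/to/shop/img/p")
# ... import job runs ...
# after = snapshot("/path/to/shop/img/p")
# added, removed = diff_snapshots(before, after)
# print(len(after), "files;", len(added), "added;", len(removed), "removed")
```

If a run that only replaces images reports files added but none removed, that matches the behaviour described above: new files are written while the old ones stay in place.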

Is this the intended behaviour or a bug?

If it's not a bug, do you have any suggestions on how to clean duplicates out of the img/p dir?

Thank you

Edited by Beeta
Posted

If your newly imported products have new IDs (and they should), they create new images. Thirty bees does not care whether an image is identical to another image on the filesystem (and it should not care).

The issue is that when old products are deleted, their images stay on the filesystem, which should not be the case.

I can recommend the Tidy module to deal with those unlinked old images (it won't clean 'duplicates', but it will clean old unneeded images): https://codecanyon.net/item/prestashop-tidy/18965736

It can also do many more things.

Just always make backups.

Posted
47 minutes ago, the.rampage.rado said:

If your newly imported products have new IDs (and they should), they create new images. Thirty bees does not care whether an image is identical to another image on the filesystem (and it should not care).

The issue is about re-importing images over existing products.

About the Tidy module: AFAIK the TB devs changed something in image management, and using PS modules for image cleaning was not recommended.

Posted

I don't know your import setup, but if you are reusing the products (importing to update prices, etc.), why don't you simply skip the image import on subsequent runs?

Regarding Tidy: yes, I too was concerned about this after the image rewrite, but it is working as expected. It detects the unlinked images and offers them for deletion:
 

Delete unused image files (preview)
Unused images found and deleted: 9947
/domain.com/img/p/9/8/3/6/9836-cart_default2x.webp - unused
/domain.com/img/p/9/8/3/6/9836-backoffice_product_medium2x.webp - unused
/domain.com/img/p/9/8/3/6/9836-medium_default2x.webp - unused
/domain.com/img/p/9/8/3/6/9836-large_default2x.webp - unused
/domain.com/img/p/9/8/3/6/9836-home_default2x.webp - unused
/domain.com/img/p/9/8/3/6/9836-small_default2x.webp - unused
/domain.com/img/p/9/8/3/0/9830-large_default2x.webp - unused
/domain.com/img/p/9/8/3/0/9830-small_default2x.webp - unused
/domain.com/img/p/9/8/3/0/9830-home_default2x.webp - unused
/domain.com/img/p/9/8/3/0/9830-medium_default2x.webp - unused
/domain.com/img/p/9/8/3/0/9830-backoffice_product_medium2x.webp - unused
/domain.com/img/p/9/8/3/0/9830-cart_default2x.webp - unused
/domain.com/img/p/9/8/3/3/9833-large_default2x.webp - unused
/domain.com/img/p/9/8/3/3/9833-backoffice_product_medium2x.webp - unused
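
For readers who want to see roughly what such a check involves, here is a dry-run sketch, not what Tidy actually does. It relies on the naming visible in the listing above, where each file name starts with the image ID (for example `9836-cart_default2x.webp`). It assumes you have exported the IDs that are still in use (for instance the `id_image` column of the shop's image table, which is an assumption about the database layout) into a Python set. The names `image_id_from_name` and `find_unused` are hypothetical. The script only lists files and deletes nothing.

```python
import os
import re

# Leading digits of a name such as "9836-cart_default2x.webp" or "9836.jpg".
IMAGE_ID = re.compile(r"^(\d+)(?:-|\.)")

def image_id_from_name(filename):
    """Return the numeric image ID a file name starts with, or None."""
    match = IMAGE_ID.match(filename)
    return int(match.group(1)) if match else None

def find_unused(root, live_ids):
    """List files under `root` whose image ID is not in `live_ids`.

    Files without a leading numeric ID (index.php, default placeholder
    images, and so on) are skipped rather than reported.
    """
    unused = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            image_id = image_id_from_name(name)
            if image_id is not None and image_id not in live_ids:
                unused.append(os.path.join(dirpath, name))
    return sorted(unused)

# Hypothetical usage, with live_ids exported from the database beforehand:
# for path in find_unused("/path/to/shop/img/p", live_ids):
#     print(path, "- unused")
```

Review the output and take a backup before removing anything; the ID export has to be complete and recent, otherwise images that are still in use will be reported as unused.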

 

Posted
On 2/18/2025 at 1:17 PM, the.rampage.rado said:

I don't know your import setup, but if you are reusing the products (importing to update prices, etc.), why don't you simply skip the image import on subsequent runs?

Because that is the opposite of what I want to do.

In @datakick's module I have a cron job that imports new products (images included) every hour. The imported products may or may not have images.

I have another import job that updates only the images of products imported during the last 48 hours, because the source very often adds or changes images 24 to 48 hours after creating a product.

So a product imported "now" is usually going to be updated 48 times during the next 48 hours (only replacing its image).
If the product already has an image, the image gets updated 48 times. Having monitored the filesystem, I can confirm that in this case the import mode named "replace existing images" re-imports the image as I want and replaces the product image as I want, but leaves the old image on the filesystem. I think this is the bug that is filling my volumes.

This also happens in a third cron job, where I put products that were out of stock back in stock and update prices and images at the same time, because the source I get products from sometimes changes prices and images.
