When extracting images from a website, you will sometimes find that the images are in a carousel: a large image that changes when you click on a thumbnail of another image. Below are two examples of image carousels, one from eBay and one from Etsy:
This tutorial will show you how to scrape all of the images from one of these carousels.
Example 1: Selecting Larger Images
For this part of the tutorial, you can follow along on the Etsy Wall Hangings section.
1) Click on the "New Project" button on the ParseHub home screen, and enter the URL of the website you will be scraping. Then, click on the "Start project on this URL" button.
2) Using the Select command (selection1) that is automatically generated for you, click on the first product name on the page.
The product's name will get highlighted in green, while similar elements will be highlighted in yellow. Keep clicking on the yellow elements until all products have been selected on the page.
You can also rename your command by double-clicking on it. For this example, I've named the command "product".
3) To click into each listing page and access the image carousel, click on the + button next to "Select product" and choose a Click command.
4) In the pop-up that appears, answer "No" when asked whether this is a next page button, and create a new template. I renamed the template "product_details" in this example.
5) Now that we are on a product page with an image carousel, select the large image that appears on the page using the automatically generated Select command. I renamed the command to "image", but you can give it any name you like.
6) Next, switch to Browse mode by using the "Browse" toggle next to the project's Settings tab. Click on the right arrow on the large image to change the display photo.
7) Switch back to Select mode. Using the same "Select image" command from before, click on the second image to select it. This should select all of the images within the carousel.
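Once the project has run, the carousel images will be grouped under each product in the exported results. The snippet below is a minimal sketch of how such an export could be flattened into a list of image URLs per product. The key names "product", "name", "image" and "image_url" are assumptions that mirror the command names used in this example, and the sample data is made up; check your own export for the actual structure.

```python
import json

# Hypothetical export: the keys "product", "name", "image" and "image_url"
# mirror the command names used in this example and are assumptions.
SAMPLE_EXPORT = """
{
  "product": [
    {
      "name": "Macrame wall hanging",
      "image": [
        {"image_url": "https://example.com/img/a1.jpg"},
        {"image_url": "https://example.com/img/a2.jpg"},
        {"image_url": "https://example.com/img/a1.jpg"}
      ]
    },
    {"name": "Woven tapestry", "image": []}
  ]
}
"""


def images_by_product(export):
    """Map each product name to its carousel image URLs, without duplicates."""
    result = {}
    for product in export.get("product", []):
        urls = []
        for entry in product.get("image", []):
            url = entry.get("image_url")
            # Keep the first occurrence of each URL and preserve the order.
            if url and url not in urls:
                urls.append(url)
        result[product.get("name", "")] = urls
    return result


carousel_images = images_by_product(json.loads(SAMPLE_EXPORT))
for name, urls in carousel_images.items():
    print(name, len(urls))
```

Dropping duplicates is useful here because a carousel can show the same image both as the large display photo and as one of its slides.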
Example 2: Resizing Thumbnail Images
For this section of the tutorial, you can follow along on the eBay Cell Phone Accessories page.
1) Following the same steps as in the Etsy example, select all of the products and click into the product details pages.
2) Using the automatically generated Select command, select all of the thumbnail images below the main image. I've renamed the command "image" in this project as well.
3) On most websites, the image URLs of the thumbnails are the same as those of their full sized counterparts, except for the part that determines the size the image is displayed in. To scrape the full sized images, we are going to change the part of the URL that resizes the images.
To determine what we need to change in our image URLs, create a new Select command by clicking on the + button next to "Select page".
4) Select the full sized image using this command, and then take a look at your data preview at the bottom of the page. You should see what part of the URLs we need to change in order to extract the full sized images.
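If the difference between the two URLs is hard to spot by eye, it can also be found programmatically. The following is a minimal sketch, not part of ParseHub: the function name `differing_part` and the example.com URLs are made up for illustration, and it assumes the two URLs differ in only one place.

```python
import os


def differing_part(thumbnail_url, full_url):
    """Return the (thumbnail, full size) substrings where the two URLs differ."""
    # os.path.commonprefix compares character by character.
    prefix_len = len(os.path.commonprefix([thumbnail_url, full_url]))
    thumb_rest = thumbnail_url[prefix_len:]
    full_rest = full_url[prefix_len:]
    # Strip the shared ending by comparing the reversed remainders.
    suffix_len = len(os.path.commonprefix([thumb_rest[::-1], full_rest[::-1]]))
    if suffix_len:
        thumb_rest = thumb_rest[:-suffix_len]
        full_rest = full_rest[:-suffix_len]
    return thumb_rest, full_rest


# Hypothetical URLs that follow the size pattern used in this example.
thumb = "https://example.com/images/g/abc123/s-l64.jpg"
full = "https://example.com/images/g/abc123/s-l300.jpg"
print(differing_part(thumb, full))  # ('64', '300')
```

The minimal difference here is only the number. When writing the replacement, it is usually safer to include some surrounding text as well (such as "s-l" and ".jpg"), so that the same digits appearing elsewhere in the URL are not changed by accident.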
5) To fix these URLs, click on the "Extract image" command. We will use the replace function available within ParseHub to swap the part of the URL that we do not want for the part that we do want. In this example, we will replace the text "s-l64.jpg" with "s-l300.jpg". You can add the replace function immediately after '$e.prop("src")' in the Extract command options.
6) Now that the correct image URLs are being extracted, we can delete selection1. In the end, your project should look like this:
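The same substitution can be applied outside ParseHub, for example to thumbnail URLs that have already been exported. This is a minimal sketch in Python rather than ParseHub's own expression syntax; the function name `to_full_size` and the example.com URLs are hypothetical, and the "s-l64.jpg" and "s-l300.jpg" markers are taken from the example above.

```python
def to_full_size(url, thumb_suffix="s-l64.jpg", full_suffix="s-l300.jpg"):
    """Swap the thumbnail size marker at the end of the URL for the full size one."""
    if url.endswith(thumb_suffix):
        return url[: -len(thumb_suffix)] + full_suffix
    # Leave URLs that do not end in the thumbnail marker untouched.
    return url


# Hypothetical thumbnail URLs that follow the pattern from this example.
thumbnail_urls = [
    "https://example.com/images/g/abc123/s-l64.jpg",
    "https://example.com/images/g/def456/s-l64.jpg",
    "https://example.com/images/g/ghi789/s-l300.jpg",
]
full_size_urls = [to_full_size(url) for url in thumbnail_urls]
for url in full_size_urls:
    print(url)
```

Unlike a plain text replace, this sketch only changes the marker when it appears at the very end of the URL, which avoids altering a URL that happens to contain the same text earlier on.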