This tutorial will show you:
2. How to scrape Etsy.com categories. To follow along, start a new project in ParseHub on https://www.etsy.com/ca/c/jewelry/necklaces?ref=catcard-1217-216044426.
2. How to add pagination to a project where you're already extracting multiple elements and their details from another page.
3. How to get a full set of data from an eCommerce website.
Note: You might not be able to get all the data you need, as ParseHub's Free tier only offers a maximum of 200 pages per run. If you need to scrape more, consider upgrading to one of our premium or enterprise plans!
Building a paginating web scraper
1. Click the + button to the right of the "Select page" command. From the toolbox that appears, choose the "Select" tool.
2. Click the "Next" button on the page to select it. It will be highlighted in green when selected.
3. Rename the "Select & Extract selection1" command by clicking its name and typing "button".
4. Click the + button on the "Select & Extract button" command and choose the "Click" tool from the toolbox.
5. ParseHub will ask whether this is a "next page" button. Click "Yes"; the command then defaults to repeating the current template. Click "Repeat Current Template".
6. To stop pagination at a specific page, change the Click command's max depth option from 0 (unlimited pages) to another value. For example, to click the "Next" button twice and scrape 3 pages of results in total (including the first page), set max depth to 2.
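ParseHub sets this up through its interface rather than code, but the max depth arithmetic can be sketched in plain Python. The function below is a hypothetical helper, not part of ParseHub: it only simulates a results listing with a known number of pages, and it assumes, as described in step 6, that max depth counts clicks on the "Next" button and that 0 means unlimited.

```python
from typing import List


def pages_visited(total_pages: int, max_depth: int = 0) -> List[int]:
    """Simulate a "Click Next, then Repeat Current Template" loop.

    Hypothetical helper, not part of ParseHub. total_pages is how many
    results pages the site has; max_depth is the number of "Next" clicks
    allowed, where 0 means no limit.
    """
    visited = [1]  # the first results page is scraped before any click
    clicks = 0
    while visited[-1] < total_pages:  # a further page exists
        if max_depth and clicks >= max_depth:
            break  # click limit reached, stop paginating
        clicks += 1  # "click" the Next button
        visited.append(visited[-1] + 1)
    return visited


print(pages_visited(total_pages=10, max_depth=2))  # [1, 2, 3]
print(len(pages_visited(total_pages=10)))           # 10 (unlimited)
```

In other words, a max depth of n scrapes at most n + 1 pages, and fewer if the site runs out of pages first.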
Troubleshooting: Prevent Infinite Loops
After adding the "Click" tool, make sure you have not created an infinite loop in the project.
On some websites, the "Next" button is still present on the last page of results even though it is disabled and cannot be clicked. Because ParseHub still selects it, the project keeps paginating even when there are no pages left.
First, check whether the "Next" button is still selected on the last page: switch to Browser Mode in ParseHub and go to the last page of results.
Click the "Select button" node and check its selection count. It should show "(0)". If it shows "(1)", the "Next" button is still being selected on the last page.
To prevent this infinite loop, add a condition that skips the "Next" button when it is disabled. The exact condition depends on how each website marks a disabled button, so the same conditional will not work on every site.
Right-click the "Next" button on the last page and choose Inspect Element. Look through the HTML for an attribute that is unique to the button when it is disabled.
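If the markup is long, it can help to compare the "Next" button's HTML on an earlier page with its HTML on the last page. The sketch below does that with Python's standard html.parser module. The helper names and the two HTML snippets are hypothetical examples, not Etsy's actual markup; paste in the HTML you copied from the inspector instead.

```python
from html.parser import HTMLParser
from typing import Dict


class _FirstTagAttrs(HTMLParser):
    """Collects the attributes of the first start tag in an HTML snippet."""

    def __init__(self) -> None:
        super().__init__()
        self.attrs: Dict[str, str] = {}
        self.seen = False

    def handle_starttag(self, tag, attrs):
        if not self.seen:
            self.seen = True
            # Boolean attributes such as `disabled` have a value of None.
            self.attrs = {name: value or "" for name, value in attrs}


def first_tag_attrs(html: str) -> Dict[str, str]:
    parser = _FirstTagAttrs()
    parser.feed(html)
    parser.close()
    return parser.attrs


def disabled_markers(enabled_html: str, disabled_html: str) -> Dict[str, str]:
    """Return attributes, or space-separated tokens, found only on the disabled button."""
    enabled = first_tag_attrs(enabled_html)
    disabled = first_tag_attrs(disabled_html)
    markers: Dict[str, str] = {}
    for name, value in disabled.items():
        if name not in enabled:
            markers[name] = value  # attribute only appears when disabled
        else:
            extra = set(value.split()) - set(enabled[name].split())
            if extra:
                markers[name] = " ".join(sorted(extra))
    return markers


# Hypothetical markup for illustration only.
enabled = '<a class="btn next" href="?page=2">Next</a>'
disabled = '<a class="btn next disabled" aria-disabled="true">Next</a>'
print(disabled_markers(enabled, disabled))
# {'class': 'disabled', 'aria-disabled': 'true'}
```

Any attribute reported here is a candidate for the condition; a class token is usually the easiest to test for.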
Add a conditional command directly after the "Select button" node and enter !$e.prop("class").toLowerCase().contains("disabled") in the condition text box. The expression is true only when the button's class, lowercased, does not contain "disabled", so a disabled button is skipped.
If the marker you found is in a different attribute (for example, name="disabled"), replace "class" in the expression with that attribute's name.
If the disabled button is marked with a different value (for example, class="button_off"), replace "disabled" in the expression with that value.
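The logic of the condition can be sketched in Python to see how it ends pagination. The names below (should_click_next, crawl, and the dictionaries standing in for the button's attributes) are hypothetical; ParseHub evaluates the real expression itself against the selected element $e. Like the original expression, the sketch lowercases the attribute value and checks for a substring.

```python
from typing import Dict, List, Optional


def should_click_next(button: Optional[Dict[str, str]],
                      attr_name: str = "class",
                      marker: str = "disabled") -> bool:
    """Python mirror of !$e.prop("class").toLowerCase().contains("disabled").

    button is the selected Next button's attributes, or None when the
    selection shows "(0)". Returns True only when the button should be clicked.
    """
    if button is None:
        return False  # nothing selected, so pagination stops on its own
    value = button.get(attr_name, "")
    return marker.lower() not in value.lower()


def crawl(next_buttons: List[Optional[Dict[str, str]]], **condition) -> List[int]:
    """Walk simulated pages; next_buttons[i] describes the Next button on page i + 1."""
    visited: List[int] = []
    page = 0
    while True:
        visited.append(page + 1)
        if not should_click_next(next_buttons[page], **condition):
            break  # condition failed: do not click, end pagination
        page += 1
    return visited


# Last page still shows a Next button, but with a "disabled" class.
pages = [{"class": "next"}, {"class": "next"}, {"class": "next Disabled"}]
print(crawl(pages))  # [1, 2, 3]

# Variants from the text: a name attribute, or a different class value.
print(crawl([{"name": "next"}, {"name": "disabled"}], attr_name="name"))  # [1, 2]
print(should_click_next({"class": "button_off"}, marker="button_off"))  # False
```

Without the check, the simulated last page would keep "clicking" a button that never leads anywhere, which is the infinite loop described above.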
Download this Project
You can download the project that we just created here: Etsy.phj
To open the project in your account, open ParseHub, go to My Projects, click Import Project and select the file. Note that this project only works on Etsy.