EXAMPLE: Scrape Etsy

This tutorial will show you:

1. How to scrape a section of Etsy.com. To follow along, start a new project in ParseHub on https://www.etsy.com/ca/c/jewelry/necklaces?ref=catcard-1217-216044426.

2. How to add pagination to a project where you're already extracting multiple elements.

3. How to get a full set of data from an eCommerce website.

Note: You might not be able to get all the data you need, as ParseHub's Free tier only offers a maximum of 200 pages per run. If you need to scrape more, consider upgrading to one of our premium or enterprise plans!

Building a paginating web scraper

1. Click on the + button located to the right of the "Select page" command. From the tool box that appears, choose the "Select" tool.

2. Click on the "Next" button on the page to select it. It will be highlighted in green when selected.

3. Rename the "Select & Extract selection1" command by clicking on the text and typing in "button".

4. Click on the + button on the "Select & Extract button" command. Choose the "Click" tool from the tool box.

5. You also have to tell ParseHub which template to use on the new page. In the pop-up window, make sure the same template is selected in the dropdown. In this example, this is main_template.
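To make the effect of these steps concrete, here is a minimal Python sketch of what the finished project does: it applies the same template to each page, then follows the "Next" button until there is none left. This is only an illustration, not ParseHub's implementation. The PAGES data, the run function and the item names are hypothetical; only the name main_template comes from the steps above.

```python
# Hypothetical in-memory "site": each page has some items and, optionally,
# the index of the page that its "Next" button leads to.
PAGES = [
    {"items": ["necklace A", "necklace B"], "next": 1},
    {"items": ["necklace C"], "next": 2},
    {"items": ["necklace D", "necklace E"], "next": None},  # last page: no "Next" button
]


def main_template(page):
    """Stand-in for main_template: extract the items listed on one page."""
    return list(page["items"])


def run(pages, start=0):
    """Apply the same template to every page reachable through "Next"."""
    results = []
    current = start
    while current is not None:
        page = pages[current]
        results.extend(main_template(page))  # extract with the same template
        current = page["next"]               # "Click" on "Next" if it was selected
    return results


all_items = run(PAGES)
print(all_items)
```

The loop ends only because the last page has no "Next" button to select. If the selection still matched something on the last page, the loop would never stop.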

 

Troubleshooting: Prevent Infinite Loops

After adding the "Click" tool, make sure that you have not created an infinite loop in the project.
On some websites, the "next" button is still visible on the last page of the results, although it is disabled and not clickable. This causes ParseHub to keep paginating even though there are no more results.

First, we need to make sure that the "next" button is not available on the last page: switch to the Browser Mode on ParseHub and go to the last page of the results.

Click on the "Select button" node and make sure that the selection node shows "(0)". If it shows "(1)", this means that the "next" button is still being selected on the last page.

To prevent this infinite loop, add a condition that skips the "next" button if it is disabled. The exact condition may differ from one website to another.

Right-click on the next button on the last page and press Inspect Elements. Look through the HTML of the next button and find an attribute that is unique to the button when it is disabled.
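One way to spot the distinguishing attribute is to compare the button's HTML on an ordinary page with its HTML on the last page. The sketch below does this in Python with the standard-library html.parser module. The two HTML snippets are hypothetical and are not Etsy's actual markup, and the helper names attrs_of and changed_attrs are made up for this example.

```python
from html.parser import HTMLParser


class FirstTagAttrs(HTMLParser):
    """Record the attributes of the first start tag that is parsed."""

    def __init__(self):
        super().__init__()
        self.attrs = None

    def handle_starttag(self, tag, attrs):
        if self.attrs is None:
            self.attrs = dict(attrs)


def attrs_of(html):
    """Return the attributes of the first element in an HTML snippet."""
    parser = FirstTagAttrs()
    parser.feed(html)
    parser.close()
    return parser.attrs or {}


def changed_attrs(enabled_html, disabled_html):
    """Map each attribute that differs to its (enabled, disabled) values."""
    before, after = attrs_of(enabled_html), attrs_of(disabled_html)
    return {
        name: (before.get(name), after.get(name))
        for name in sorted(set(before) | set(after))
        if before.get(name) != after.get(name)
    }


# Hypothetical markup copied from the Inspect Elements panel.
enabled = '<a class="btn next" href="?page=2">Next</a>'
disabled = '<a class="btn next disabled" href="?page=2">Next</a>'

diff = changed_attrs(enabled, disabled)
print(diff)
```

In this made-up case only the class attribute changes, and the extra word in it is "disabled", so that is the text the condition should look for.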

 

You should add a conditional command right after the Select button node and enter !$e.prop("class").toLowerCase().contains("disabled") in the condition text box.

If the HTML you found is in a different attribute (for example, name="disabled"), make sure to change the expression to use this attribute name instead of class.

If the HTML you found has a different unique value for the final page's button (for example, class="button_off"), make sure to replace disabled in the expression with this value.
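For readers who want to reason about the condition outside ParseHub, the following Python sketch mirrors the logic of the expression and of the two variations above. It is an approximation for illustration only: should_click and its parameters are hypothetical names, and the button is represented as a plain dictionary of attributes rather than a ParseHub element.

```python
def should_click(attributes, attribute="class", marker="disabled"):
    """Return True when the button does not look disabled.

    attributes: dict of the button's HTML attributes (hypothetical input).
    attribute:  the attribute to inspect, "class" by default.
    marker:     the text that only appears when the button is disabled.
    """
    value = attributes.get(attribute) or ""  # a missing attribute counts as empty
    return marker.lower() not in value.lower()


# Default case: look for "disabled" in the class attribute.
print(should_click({"class": "btn next"}))
print(should_click({"class": "btn next Disabled"}))

# The marker lives in a different attribute, e.g. name="disabled".
print(should_click({"name": "disabled"}, attribute="name"))

# The site uses a different marker, e.g. class="button_off".
print(should_click({"class": "button_off"}, marker="button_off"))
```

Only the first call yields True. In the other three the button is treated as disabled, so it is skipped and pagination stops.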

 

Download this Project

You can download the project that we just created here: Etsy.phj

To open the project in your account, open ParseHub, go to My Projects, click on Import Project and select the file. Note that this project will work on the Etsy website only. 

 

 
