Often, a website will have a Load More button at the bottom of its list of results which loads extra items onto the page if you click on it.
This tutorial will show you what to do if you need to scrape items that only show up after you click on the Load More button some number of times. The project sets a custom value based on whether or not any products were selected, and then, based on that value, either clicks on the "load more" button or ends the run.
I'll be using the JJ Buckley website as an example, so you can follow along.
Clicking on a "load more" button
- Click on the + button next to "Select page", click on advanced, and then choose an Extract command. Name this command "listingValue", then clear the contents of the extract configuration box and type 0 to set the value of "listingValue" to 0. We will use this value later on to check whether we need to click on the "load more" button or not.
- Click on the + button next to "Select page" and choose a Select command. Move your cursor onto the first product and press CTRL+1 (CMD+1 on Mac) to zoom out on the selection. Once the whole container of the product is highlighted, click on the first product to select the container. The other products will be highlighted in yellow; click on the next one as well to select all the products. You may need to do this multiple times to select all of the products.
- By selecting all the products, ParseHub creates a Begin New Entry (list) node, which is hidden under the Select container command, and extracts the text within the container (name). If you are not interested in the name, you can hover over the extract node and remove it by clicking on the x button.
- Expand the Begin New Entry command by clicking on the list icon beside "Select product". We can now select and extract the data from each listing by clicking on the + button next to Begin New Entry (listing) and choosing Relative Select. Click on the container that we selected earlier and use the arrow to select any element you want to extract.
- Hover over the "Select product" command and hold the Shift key. Click on the + button that appears and choose an Extract command. Rename this command to "remove", and choose "Delete element from page" from the Extract dropdown menu.
- Hover over the "Select product" command and hold the Shift key. Click on the + button that appears and choose an Extract command. Rename this command "listingValue" (same as above), then clear the contents of the extract configuration box and type 1. This command will execute and set the value of listingValue to 1 once all the products from the list are extracted and removed.
Make sure that your Extract commands are not nested within your Begin New Entry command! Otherwise the "listingValue" variable that you are setting equal to 1 will belong to a new scope, and you will not be able to refer to it as "listingValue" outside of that scope. The commands should be in line with the Begin New Entry command in your command structure.
- Click on the + button next to "Select page", click on advanced, and choose a Conditional command. In the conditional command just write listingValue. This condition will be true if listingValue equals 1 (i.e. when the extracting and removing commands from the previous steps have run).
- Click on the + button next to your newly created conditional command, and choose a Select command from the toolbox. Use it to select the "load more" button on the website.
- Click on the + button next to the "Select loadMore" command and add a Click command. In the configuration window that appears, choose "Yes" when asked if this is a next page button, set the repeats to 0, and repeat the current template. If listingValue is 1, the conditional command will proceed and click on the "load more" button to load more products.
Now ParseHub will select, extract, and remove the data from the HTML and click on the "load more" button to load more listings.
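If it helps to see the control flow written out, here is a minimal sketch in plain Python. It is not ParseHub code and does not use any ParseHub API; `FakePage`, `click_load_more`, and `scrape` are hypothetical names made up for this illustration, and the "page" is just a list of product names that grows each time the button is clicked.

```python
class FakePage:
    """Hypothetical stand-in for a page with a "load more" button."""

    def __init__(self, batches):
        # Copy the batches so the caller's lists are left untouched.
        self._pending = [list(batch) for batch in batches]
        # The first batch is visible as soon as the page loads.
        self.visible = self._pending.pop(0) if self._pending else []

    def click_load_more(self):
        # Append the next batch of products to the page, if any remain.
        if self._pending:
            self.visible.extend(self._pending.pop(0))


def scrape(page):
    """Mimic the template: extract, remove, set the flag, then click."""
    results = []
    while True:
        listing_value = 0                # Extract "listingValue" = 0
        if page.visible:                 # "Select product" matched something
            for product in page.visible:
                results.append(product)  # Begin New Entry + extraction
            page.visible.clear()         # Extract "remove": delete from page
            listing_value = 1            # Extract "listingValue" = 1
        if not listing_value:            # Conditional: listingValue
            break                        # nothing was selected, so the run ends
        page.click_load_more()           # Click, then repeat the template
    return results


page = FakePage([["Wine A", "Wine B"], ["Wine C"], ["Wine D"]])
print(scrape(page))
```

Removing the products after extracting them is what keeps each pass from collecting the same items twice, and the flag stays at 0 on the first pass where no products are found, which is what stops the loop.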