Click on many links to scrape multiple pages

We will use and the Necklaces section as an example website for both of the navigation options.

To follow along you can visit:

1. If you are not already in select mode, click the + button to the right of the "Select page" command and choose the "Select" tool from the tool menu.

2. Click on any URL on the page, or on any text that has a link behind it.

In our project, we will click on a product title, because its link leads to the product details page.

3. Click on a second URL on the page, or on a second piece of text similar to the one you already clicked on. In our example, we will click on another product title.

4. The text and the URL behind each element will be extracted for you automatically.

Because you selected several similar elements on the page, all of the selections are also placed into a new entry. In Excel, the new entry becomes a sheet with two columns that contain the extracted name and URL of each product.

5. Click the + button on the "Begin new entry in selection1" command. You can also rename this command by clicking on its text and typing in "products".

6. From the toolbox, choose the "Click" tool. The Click command lets you go to another page or click through any link that opens a new page.

7. You also have to tell ParseHub which template to use on the new page. In our project, the new page looks completely different from the product listings page, so you have to create a new template.

8. In the "template name" text box, type in a name such as "product_details" and click "Create New Template". ParseHub will automatically open the new template and navigate to the product details page behind the link.

Note: When ParseHub actually runs and scrapes your data, it will visit the page behind every link that you selected. You do not need to click on each link on the page individually.

9. Select and extract any data from the product details page as you normally would with ParseHub. 

10. Check your work by running once on the server or by performing a Test Run.

Note: You won't see every page's data in the results pane at the bottom, because ParseHub only shows the data for the currently active template.
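
If you prefer to think in code, the steps above can be pictured roughly as the Python sketch below. It only illustrates the idea and is not how ParseHub works internally. The URLs, the `price` field, and the names `FAKE_SITE`, `fetch`, `main_template`, and `product_details_template` are all made up for this example, and a small in-memory dictionary stands in for the real website so the sketch runs without a network connection.

```python
# Hypothetical in-memory stand-in for a website: URL -> page content.
# None of these URLs or fields come from a real project.
FAKE_SITE = {
    "https://example.com/necklaces": {
        "links": [
            {"name": "Silver Necklace", "url": "https://example.com/p/silver"},
            {"name": "Gold Necklace", "url": "https://example.com/p/gold"},
        ]
    },
    "https://example.com/p/silver": {"price": "19.99"},
    "https://example.com/p/gold": {"price": "49.99"},
}


def fetch(url):
    """Return the fake page for a URL (a real scraper would download it)."""
    return FAKE_SITE[url]


def product_details_template(page):
    """Second template: extract data from one product details page."""
    return {"price": page["price"]}


def main_template(start_url):
    """First template: select every similar link, then 'click' each one."""
    products = []
    for link in fetch(start_url)["links"]:
        # One new entry per selected link, holding its text and URL.
        entry = {"name": link["name"], "url": link["url"]}
        # The "click" step: open the linked page and apply the other template.
        entry.update(product_details_template(fetch(link["url"])))
        products.append(entry)
    return {"products": products}


results = main_template("https://example.com/necklaces")
for row in results["products"]:
    print(row["name"], row["url"], row["price"])
```

Each dictionary in `results["products"]` corresponds to one entry, that is, one row of the sheet, and the loop shows why selecting two similar links is enough: the same click and the same template are applied to every matching link.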
