If you have set up pagination using a Next page button in your Parsehub project, but your project scrapes fewer pages than expected or returns repeated data, the cause is likely how the pagination was set up: the Next page button is not being selected correctly on every page.
This tutorial shows how to fix the most common causes of this problem.
Next and Previous buttons are both selected
If you find that you are only properly scraping the first 2 pages of your results, Parsehub is most likely selecting both the Next and Previous buttons on all pages after the first.
Even though your pagination may look correctly set up on page 1, if you visit page 2 (or any later page) you will see that Parsehub is selecting two pagination elements.
As a result, Parsehub clicks the Previous page button first and returns to page 1 instead of going on to page 3. Usually there is a simple solution for this.
1. Switch to browse mode and navigate to page 2.
2. If Parsehub is selecting both pagination elements, delete your existing pagination select command.
3. Add a new select command and select only the Next page button.
4. Switch back into browse mode and navigate to pages 1 and 3 to make sure the command selects only the Next page button on all three pages. If it does, you can run your project and get data from every available page.
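The difference between selecting both buttons and selecting only Next can be illustrated with a small JavaScript model. This is a simplified sketch, not how Parsehub works internally: it assumes the selection returns the matched buttons in page order and that the click follows the first match. The helpers `paginationButtons` and `crawl` are hypothetical.

```javascript
// Simplified model (assumption): page N shows a Previous button (except on
// page 1) and a Next button (except on the last page), in that order.
function paginationButtons(page, lastPage) {
  const buttons = [];
  if (page > 1) buttons.push({ label: "Previous", target: page - 1 });
  if (page < lastPage) buttons.push({ label: "Next", target: page + 1 });
  return buttons;
}

// Follow the first matched button up to maxClicks times and record the
// pages visited. selectOnlyNext mimics a selection that matches Next only.
function crawl(lastPage, maxClicks, selectOnlyNext) {
  const visited = [1];
  let page = 1;
  for (let i = 0; i < maxClicks; i++) {
    let matches = paginationButtons(page, lastPage);
    if (selectOnlyNext) matches = matches.filter((b) => b.label === "Next");
    if (matches.length === 0) break; // nothing left to click, so stop
    page = matches[0].target; // assumption: the first match is clicked
    visited.push(page);
  }
  return visited;
}

console.log(crawl(5, 6, false).join(", ")); // 1, 2, 1, 2, 1, 2, 1
console.log(crawl(5, 6, true).join(", "));  // 1, 2, 3, 4, 5
```

In this model, selecting both buttons bounces between pages 1 and 2 (repeated data from only two pages), while selecting only Next reaches every page and stops on the last one.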
Sometimes, when both pagination buttons are present, it is impossible to select only the Next page button: both become highlighted even when you click only the Next page button. In this case, please refer to the solution in the next section.
Parsehub can't select the Next button on every page
In some cases, Parsehub can only select the Next page button based on its position in the pagination bar, which can change from page to page.
You can tell this is happening when the Parsehub selection node identifies the button by its position rather than by what it is. For example, Parsehub may recognize the Next page button as the 10th button in the pagination bar, rather than as a Next button. On many websites, the number of buttons in the bar changes from page to page, so a selection like this is not guaranteed to work on every page.
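Why a position-based selection breaks can be shown with a short JavaScript sketch. The two pagination bars below are hypothetical (it is assumed that page 1 has no Previous button, so every later button shifts by one place), and `selectByPosition` is a hypothetical stand-in for a selection node that picks "the 10th button".

```javascript
// Hypothetical pagination bars from two pages of the same site.
const page1Bar = ["1", "2", "3", "4", "5", "6", "7", "8", "9", "Next"];
const page5Bar = ["Previous", "1", "2", "3", "4", "5", "6", "7", "8", "9", "Next"];

// Position-based selection: take the button at a fixed 1-based position.
function selectByPosition(bar, position) {
  return bar[position - 1];
}

console.log(selectByPosition(page1Bar, 10)); // Next
console.log(selectByPosition(page5Bar, 10)); // 9
```

The same "10th button" rule lands on Next on page 1 but on a page-number link on page 5, which is why matching on the button's text is more reliable than matching on its position.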
In cases like this, you can use a different solution:
1. Using a new select command, select every pagination button on the page.
2. Delete the Begin new entry command that is generated automatically.
3. Add a conditional command.
4. Use this conditional command to check for text that appears only on the Next pagination button. For a button labelled "Next", checking that its text includes the string "next" is enough. To perform this check, you can use the JavaScript includes() function, together with the toLowerCase() function so that the conditional is not case-sensitive.
5. Add a click command by clicking the plus sign next to the conditional.
6. Configure this click as you would a normal pagination click. Your project will now select all pagination buttons on every page but click only the button that says "Next", allowing you to get data from every page.
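The logic of the conditional in step 4 can be sketched in plain JavaScript. In Parsehub you would type only the boolean expression into the conditional command; how that expression refers to the selected button's text is not covered in this tutorial, so the `buttonText` parameter and the sample labels below are hypothetical stand-ins.

```javascript
// True only for a button whose text contains "next", ignoring case,
// as in step 4 (toLowerCase() first, then includes()).
function isNextButton(buttonText) {
  return buttonText.toLowerCase().includes("next");
}

// Hypothetical texts of all the pagination buttons selected in step 1.
const buttons = ["« Previous", "1", "2", "3", "NEXT »"];

for (const text of buttons) {
  if (isNextButton(text)) {
    console.log(`click: ${text}`); // stands in for the click command from step 5
  }
}
// prints: click: NEXT »
```

Because includes() matches substrings, any other button whose text happens to contain "next" would also pass; on such sites, comparing the trimmed, lower-cased text to "next" exactly is a stricter alternative.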