Some websites ask you to solve a Captcha in order to access their data.
Please note that, at this time, ParseHub can only solve Captchas that show an image to be transcribed into characters. ParseHub cannot currently solve reCaptcha v2.
In this article, we will show you how to add a Captcha solver to your template in order to scrape Captcha-enabled websites.
Websites that generate a Captcha image fall into two groups. The first group shows the Captcha image on each page (this article's example), and you need to solve the Captcha to access the data. The second group shows a Captcha only when it detects ParseHub as a bot sending the requests. These websites send the Captcha at random during the run on our servers (a good example is Amazon), so you might not notice it while testing the project locally.
After running the project on our servers, you can see the Captcha image in the server snapshots (available on the run page). To add the Captcha solver for these websites, first open the server snapshot that shows the Captcha image (similar to the image below) from the run page. Once the server snapshot is open in a browser tab, navigate to any of your templates and follow the steps below to add the Captcha solver.
1. If not already in select mode, click the + button to the right of the "Select page" command and choose the "Select" tool from the menu.
2. Select the Captcha image. You can rename this selection to "image".
3. Click on the + button next to the "Select & Extract image" command, then click on Advanced options and choose the "Extract" command.
In the Extract text box, remove the $e.prop("src") and enter the Captcha solution command:
This is an internal function which will solve the Captcha during the run automatically.
You can also rename the Extract command to "captcha".
4. Please note that the Captcha solver will not run automatically while you are building the project or doing a test run. During the test run, ParseHub asks you to answer the Captcha manually in order to proceed with the test run. However, once the project runs on ParseHub servers, the Captcha solver will work properly.
Now that you added the Captcha solver, you can choose the answer field and enter the solution via a ParseHub expression.
Choose the + button next to the "Select page" command and choose the Select tool. Select the answer field. An "Input" command will be created automatically. Change the Input format to "expression" in the drop-down menu that appears at the bottom of the command, and enter "captcha" without quotation marks. This value is the solution from the Captcha solver, which was extracted as "captcha" in the previous step.
5. Normally there is a submit button on the page that you can select to submit the Captcha solution.
Choose the + button next to the "Select page" command and choose the Select tool. Select the "Submit" button.
While building the project, you must enter the Captcha solution manually. Before adding the next command, please switch to "Browse" mode by clicking on the green "Select" button at the top of the template. Next, enter the Captcha solution manually on the website.
6. Click on the + button next to the "Select & Extract submit" command and choose the "Click" command.
The Click command's configuration pop-up will appear. You can either repeat the same template or create a new template if the website loads the results on a different page.
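As a recap, the order of the commands built in the steps above can be sketched in Python. This is only an illustration, not ParseHub's project format or API: the `Command` class, the `unresolved_inputs` helper, and the selection name "answer" are hypothetical, while "image", "captcha", and "submit" are the names used in this article. The sketch checks the one ordering rule that matters here: the Extract command named "captcha" must come before the Input command whose expression refers to it.

```python
from dataclasses import dataclass


@dataclass
class Command:
    """Hypothetical stand-in for one command in a template (not a ParseHub type)."""
    kind: str             # "select", "extract", "input", or "click"
    name: str = ""        # name of the selection or extraction
    expression: str = ""  # for "input": name of the extracted value to type in


# The commands from this article, in the order they are added.
# "answer" is an assumed name for the answer-field selection.
TEMPLATE = [
    Command("select", name="image"),         # step 2: select the Captcha image
    Command("extract", name="captcha"),      # step 3: Extract renamed to "captcha"
    Command("select", name="answer"),        # select the answer field
    Command("input", expression="captcha"),  # Input set to the expression "captcha"
    Command("select", name="submit"),        # step 5: select the Submit button
    Command("click"),                        # step 6: Click command
]


def unresolved_inputs(commands):
    """Return expressions used by Input commands before an Extract defines them."""
    defined = set()
    missing = []
    for cmd in commands:
        if cmd.kind == "extract":
            defined.add(cmd.name)
        elif cmd.kind == "input" and cmd.expression not in defined:
            missing.append(cmd.expression)
    return missing


print(unresolved_inputs(TEMPLATE))  # prints [] because "captcha" is extracted first
```

If the Extract command were missing or renamed to something other than "captcha", the helper would report "captcha" as unresolved, which mirrors why the Input expression must match the Extract command's name exactly.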
If you need more help with your project, please email us at firstname.lastname@example.org. We would be happy to help you.