Many websites require you to fill out a form before you can access their data. For example, you may want to get data from behind a log-in, search for bookings, or enter search terms into an "advanced search" type form, as we will demonstrate below.
In this tutorial, we will complete the "Advanced Job Search" form on Indeed.ca as an example.
This form has three types of fields:
- Input fields, which allow you to type in keywords (Job Titles, Companies and Location)
- Checkboxes, which allow you to select one or more options (such as "Exclude staffing agencies")
- Drop-down lists, which allow you to choose an option from a predetermined list (Job Type)
1. To begin, open ParseHub and click on "New Project" on the Home screen. We will be using the URL https://ca.indeed.com/advanced_search, which you can type into the field if you would like to follow along. Click on "Start project on this URL" to create your project.
2. ParseHub will already be in select mode with an "Empty selection1" command.
3. To input the job title, click on the "With all of these words" field. The selection will automatically default to an Input command, where you can type in the job title you want to search for. You can also optionally click on the command where it says "selection1" and rename it to something more descriptive, such as "jobTitle".
You can repeat this step for other fields in which you need to input data, such as the "From this company" and "Location" fields. To summarize:
- Click on the "+" button next to "Select page"
- Choose a Select command
- Select the field, which will automatically default to an Input command
- Input the text you would like to enter into that field
Our project should look similar to this:
If we want to search for several keywords (e.g. search for one job, scrape all of the results and then repeat the search for another), we can follow the instructions in this tutorial to create a JSON list of those words, or this tutorial to copy a list of those words from Google Sheets.
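As a rough illustration of what such a JSON list of keywords might look like, the short Python sketch below cleans up a list of job titles and prints it as JSON. The key name "keywords", the helper function names and the example job titles are assumptions made for this sketch, not names required by ParseHub; the linked tutorial describes the exact format to use.

```python
import json


def build_keyword_list(raw_keywords):
    """Return trimmed, non-empty keywords with case-insensitive duplicates removed."""
    seen = set()
    cleaned = []
    for keyword in raw_keywords:
        term = keyword.strip()
        lowered = term.lower()
        if term and lowered not in seen:
            seen.add(lowered)
            cleaned.append(term)
    return cleaned


def keywords_to_json(raw_keywords, key="keywords"):
    """Serialize the cleaned keywords under a single key (the key name is hypothetical)."""
    return json.dumps({key: build_keyword_list(raw_keywords)}, indent=2)


if __name__ == "__main__":
    # Example job titles only; replace them with your own search terms.
    print(keywords_to_json(["Data Analyst", " Web Developer ", "data analyst", ""]))
```

Removing blank entries and duplicates before building the list avoids running the same search twice.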
4. To select a checkbox field, click on the "+" sign next to "Select page", choose a Select command and select the box you would like to tick. In this case, we will be selecting "Exclude staffing agencies".
You can optionally click on "selection1" to rename it to something more descriptive, such as "jobSearchExclusion". Then click on the "+" button next to "Select & Extract jobSearchExclusion" and choose a Click command. When asked if this is a "next page" button, click on "No". Since we will be performing more actions on this page, we can select the "Continue executing the current template" option and click on "Stay on Current Template".
You can repeat this step if you would like to check any of the other boxes.
5. For the drop-down list, you can follow the instructions in the first section of the drop-down options tutorial: click on the "+" button next to "Select page", choose a Select command and click on the drop-down, then click a second time on the option you would like to choose. In this case, we will choose "Full-time" for the Job Type. You can optionally rename your "selection1" to "jobType" and the selection for the "Job Type" option to "Full_time".
You could also have ParseHub iterate through all of the items on the list, clicking into the results to extract data and returning to the next item on the list by following the instructions in the second section of the drop-down options tutorial.
6. To submit the form, click on the "+" sign next to "Select page", choose a Select command and select the "Find Jobs" button. You can optionally rename the selection to "searchButton". Next to "Select & Extract searchButton", click on the "+" sign and choose a Click command.
Because our search button will redirect us to a page of results, click on "No" again when the Click command pop-up asks if this is a "next page" button. This time, choose the "Create New Template" option, which you can rename to "results", and click on "Create New Template".
7. The project will redirect you to the results page and open the new "results" template, where you can begin to choose what data you would like to extract from each result.
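The form-filling steps above can be summarized as a small plain-data outline. The Python sketch below is only a planning aid under stated assumptions: the `Step` class, its fields and the helper functions are hypothetical and do not represent ParseHub's project file format, and "Data Analyst" is just an example job title. The selection names are the optional ones suggested in this tutorial.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    selection: str                 # name given to the Select command
    action: str                    # what is done with the selected element
    value: Optional[str] = None    # text typed or option chosen, if any
    goes_to: Optional[str] = None  # new template opened by a click, if any


# Outline of the main template, in the order used in this tutorial.
FORM_STEPS: List[Step] = [
    Step("jobTitle", "input", value="Data Analyst"),  # example value only
    Step("jobSearchExclusion", "click"),              # stay on current template
    Step("jobType", "choose option", value="Full-time"),
    Step("searchButton", "click", goes_to="results"),
]


def describe(steps: List[Step]) -> List[str]:
    """Render each step as one readable line."""
    lines = []
    for number, step in enumerate(steps, start=1):
        line = f"{number}. Select {step.selection} -> {step.action}"
        if step.value is not None:
            line += f" ({step.value})"
        if step.goes_to is not None:
            line += f", then open template '{step.goes_to}'"
        lines.append(line)
    return lines


def submit_is_last(steps: List[Step]) -> bool:
    """Check that only the final step leaves the current template."""
    if not steps:
        return False
    fields_stay = all(step.goes_to is None for step in steps[:-1])
    return fields_stay and steps[-1].goes_to is not None


if __name__ == "__main__":
    print("\n".join(describe(FORM_STEPS)))
    print("Submit step is last:", submit_is_last(FORM_STEPS))
```

The `submit_is_last` check mirrors the rule used throughout the tutorial: every field step stays on the current template, and only the "Find Jobs" click creates a new one.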
There are multiple tutorials which can teach you how to scrape data from results pages, including this tutorial on scraping product details, this tutorial on scraping directories and this video tutorial on scraping directories.
If you have any questions regarding your own project, you can always contact us at hello@parsehub.com.