Many websites require you to fill out a form before you can access their data. For example, you may want to get data from behind a log-in, search for bookings or enter search terms into an "advanced search" form, as we will demonstrate below.
In this tutorial we will complete the "Advanced Job Search" form on Monster.com as an example.
This form has three types of fields:
- Input fields which allow you to type in keywords (Job Titles, Companies and Location)
- Checklists which allow you to select one or more options (Job Type)
- Drop-down lists which allow you to choose an option from a predetermined list (Posting Date)
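Before building the project, it can help to write down which fields you plan to fill in and which commands each one will need. The sketch below is only a planning aid written in Python, not a ParseHub feature or API: the `FormField` class, the `plan` function and the "Data Analyst" keyword are hypothetical names and values chosen for illustration, and the command lists simply summarize the steps described in this tutorial.

```python
from dataclasses import dataclass

# Commands this tutorial uses for each kind of field.
# This is a summary for planning purposes, not a ParseHub API.
COMMANDS_BY_KIND = {
    "input": ["Select", "Input"],
    "checkbox": ["Select", "Click"],
    "dropdown": ["Select (drop-down)", "Select (option)"],
}


@dataclass
class FormField:
    label: str  # label shown on the form
    kind: str   # "input", "checkbox" or "dropdown"
    value: str  # text to type in, or option to choose


def plan(fields):
    """Return one readable line per field listing the commands to add."""
    lines = []
    for field in fields:
        commands = " + ".join(COMMANDS_BY_KIND[field.kind])
        lines.append(f"{field.label}: {commands} -> {field.value}")
    return lines


fields = [
    FormField("Job Titles", "input", "Data Analyst"),  # hypothetical keyword
    FormField("Job Type", "checkbox", "Full Time"),
    FormField("Posting Date", "dropdown", "Last 7 Days"),
]

for line in plan(fields):
    print(line)
```

Each printed line maps one form field to the commands that the numbered steps in this tutorial add for it.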
1. To begin, open ParseHub and click on "New Project" on the Home screen. We will be using the URL https://www.monster.com/jobs/advanced-search, which you can type into the field if you would like to follow along. Click on "Start project on this URL" to create your project.
2. ParseHub will already be in select mode with an "Empty selection1" command. However, if this is not the case, click on the "+" button next to "Select page" and choose a Select command.
3. To input the job title, click on the Job Titles field. The command will automatically default to an Input command, where you can type in the job title you want to search for. You can also optionally click on the command where it says "selection1" and rename it to something more descriptive, such as "jobTitle".
You can repeat this step for other fields in which you need to input data, such as the "Companies" and "Location" fields. To summarize:
- Click on the "+" button next to "Select page"
- Choose a Select command
- Select the field, which will automatically default to an Input command
- Input the text you would like to enter into that field
Our project should look similar to this:
If we wanted to search for several keywords (e.g. search for one job, scrape all of the results and then repeat the search for another), we could follow the instructions in this tutorial to create a JSON list of those words, or this tutorial to copy a list of those words from Google Sheets.
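As a rough illustration of what a JSON list of search keywords could look like, the sketch below builds one in Python. The `keywords` key, the `build_keyword_json` function and the example job titles are assumptions made for this example; the linked tutorial describes the exact format your project should use.

```python
import json


def build_keyword_json(keywords):
    """Return a JSON string holding a cleaned, de-duplicated keyword list."""
    seen = set()
    cleaned = []
    for word in keywords:
        word = word.strip()
        # Skip blank entries and case-insensitive duplicates.
        if word and word.lower() not in seen:
            seen.add(word.lower())
            cleaned.append(word)
    # "keywords" is an assumed key name; rename it to match your project.
    return json.dumps({"keywords": cleaned})


# Hypothetical job titles to search for, one search per keyword.
print(build_keyword_json(["Data Analyst", " Web Developer ", "data analyst", ""]))
```

Trimming whitespace and removing duplicates up front avoids running the same search twice when the list is pasted in from another source.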
4. For the "Job Type" field, which includes several checkboxes, click on the "+" sign next to "Select page", choose a Select command and select the box you would like to tick. In this case we will be selecting "Full Time".
You can optionally click on "selection1" to rename it to something more descriptive, such as "jobType". Then click on the "+" button next to "Select & Extract jobType" and choose a Click command. When asked if this is a "next page" button, click on "No". Since we will be performing more actions on this page, we can select the "Continue executing the current template" option and click on "Stay on Current Template".
You can repeat this step if you would like to check any of the other boxes.
5. For the drop-down list, you can follow the instructions in the first section of the drop-down options tutorial. Click on the "+" button next to "Select page", choose a Select command and click on the drop-down, then click a second time on the option you would like to choose. In this case we will choose "Last 7 Days". You can optionally rename "selection1" to "postingDate" and the selection for "Last 7 Days" to "last7Days".
You could also have ParseHub iterate through all of the items on the list, clicking into the results to extract data and returning to the next item on the list by following the instructions on the second section of the drop-down options tutorial.
6. To submit the form, click on the "+" sign next to "Select page", choose a Select command and select the "Search for Jobs" button. You can optionally rename "selection1" to "searchButton". Next to "Select & Extract searchButton", click on the "+" sign and choose a Click command.
Because our search button will redirect us to a page of results, when the Click command pop-up loads, click on "No" again when asked if this is a "next page" button and this time choose "Create New Template", which you can rename to "results". Click on "Create New Template".
7. The project will redirect you to the results page and open the new "results" template, where you can begin to choose what data you would like to extract from each result.
There are multiple tutorials which can teach you how to scrape data from results pages, including this tutorial on scraping product details, this tutorial on scraping directories and this video tutorial on scraping directories.
If you have any questions regarding your own project, you can always contact us at hello@parsehub.com.