Welcome! In this video tutorial I'm going to get data from behind a log-in using ParseHub.
Creating your project
To begin, click on the "New Project" button when you first open ParseHub and enter the URL of the website that you would like to scrape data from.
For this example, we'll be using the Zoocasa website at www.zoocasa.com. Click on "Start project on this URL".
This website asks us if we would like to share our location with the site. As this isn't needed for our project, we can simply X out of this message.
The ParseHub tool
There are three main areas on the ParseHub tool:
- On the left-hand side is where you have your commands and your settings
- In the centre is where you have the interactive view of the website
- And at the very bottom is where you can preview data in either CSV or JSON
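To give a feel for how the two preview formats differ, here is a minimal Python sketch that writes the same records as JSON and as CSV. The rows and field names are invented for illustration and are not real Zoocasa data or ParseHub's exact output.

```python
import csv
import io
import json

# Hypothetical rows: the field names and values are made up for this sketch.
rows = [
    {"name": "Listing A", "price": "500000"},
    {"name": "Listing B", "price": "650000"},
]


def to_json(records):
    """Return the records as a JSON array of objects."""
    return json.dumps(records, indent=2)


def to_csv(records):
    """Return the records as CSV text with one header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


print(to_json(rows))
print(to_csv(rows))
```

JSON keeps each record as a named object, while CSV flattens the same records into one header row followed by one line per record.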
When ParseHub loads, it's automatically in Select mode. Select mode allows us to interact with elements on the website. You can also toggle this to Browse mode, which allows you to click on the website as you would in any other browser. For our project, we'll be using Select mode so that we can select elements on the page that we want to interact with.
By default, your project already contains an empty Select command. However, if this is not the case, you can always click on the "+" sign next to "Select page" and choose a Select command.
Logging into a website
The first thing that we're going to do is select the "Sign In" button that appears in the top right-hand corner. The default name for the selection is "selection1". However, we can always change this to something more descriptive such as "signInButton". The action that we want to take on the Sign In button that we selected is to click on it. Therefore, we'll click on the "+" sign next to "Select & Extract signInButton" and we'll choose a Click command.
As you can see, the Click command automatically clicked on the "Sign In" button and brought up the pop-up that allows us to log into the website. There's also a pop-up here that asks us what we want to do once the element has been clicked on. In this case, we would like to "Continue executing the current template" once the element has been clicked, because there are more actions that we want to take on this same page. Click on "Stay on Current Template".
The actions that we want to take now are to input our email address and password and then click on the "Sign In" button. The first thing we'll do is click on the "+" sign that appears next to "Select page" and choose another Select command. We can use this Select command to click on the email field. ParseHub has automatically detected that this is an input field and added an Input command for us. We can now type our email address into this Input command. You'll notice that as I typed in my email address, it automatically appeared on the website as well. The selection has automatically been named "selection1" again. However, we can always double-click on it and type in something more descriptive such as "email".
We're going to repeat the same actions for the password field. First, click on the "+" sign next to "Select page", choose a Select command and click on the password field. Then, type your password into the Input command. The password will automatically appear in the password field on the website and, once again, we can rename our selection to something more descriptive such as "password".
We're going to choose another Select command from the "+" sign next to "Select page" and use that to select the "Sign In" button, which we can rename to "signInButton". Because we want to click on this button, we'll click on the "+" sign next to "Select & Extract signInButton" and choose the Click command.
You'll notice once again that ParseHub has clicked on the "Sign In" option and we're now signed in to our account on the website. You can see this from the "My Account" section in the top right-hand corner of the website.
We also have this pop-up that asks us what we want to do once we've clicked on the button. In this case we're going to create a new template. We can name our template something such as "post_login". We're creating a new template because it's more organised to have your login actions on one template and any actions you take after you log in on another template. Click on "Create New Template".
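To recap the log-in steps in one place, here is a minimal Python sketch that models them as plain data. This is not ParseHub's project format or API: the `Step` class, the `describe` and `templates_visited` helpers, the starting template name "login" and the placeholder credentials are all assumptions made for illustration. Only the selection names and the "post_login" template name come from the walkthrough above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One Select command plus the action attached to it (hypothetical model)."""
    select: str              # name given to the selection
    action: str              # "click" or "input"
    value: str = ""          # text typed by an Input command
    next_template: str = ""  # template to continue on after a click; "" means stay


# The four steps from the tutorial, with placeholder credentials.
LOGIN_STEPS = [
    Step("signInButton", "click"),                        # opens the log-in pop-up
    Step("email", "input", value="you@example.com"),      # placeholder email
    Step("password", "input", value="<your password>"),   # placeholder password
    Step("signInButton", "click", next_template="post_login"),
]


def describe(steps):
    """Return one readable line per step."""
    lines = []
    for i, s in enumerate(steps, 1):
        if s.action == "input":
            lines.append(f"{i}. Select {s.select} -> Input {s.value!r}")
        else:
            target = s.next_template or "current template"
            lines.append(f"{i}. Select {s.select} -> Click, continue on {target}")
    return lines


def templates_visited(steps, start="login"):
    """List the templates the run passes through, in order."""
    visited = [start]
    for s in steps:
        if s.action == "click" and s.next_template:
            visited.append(s.next_template)
    return visited


for line in describe(LOGIN_STEPS):
    print(line)
print(templates_visited(LOGIN_STEPS))
```

Only the last click names a new template, which mirrors the choice between staying on the current template for the first click and creating "post_login" for the second.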
ParseHub has now created our new post_login template with an empty Select command. From here, there are many different actions that we could take. We could, for example, click on any of the menu items, go to our account or search for a location. The actions on your post_login template will depend on what goal you have in mind for your ParseHub project.
Need more help?
You'll find plenty of other tutorials on our Help Centre and can always contact us at firstname.lastname@example.org with any questions that you have about your own project.
Please also see this written tutorial on getting data from behind a log-in with the Quora website as an example.