With ParseHub's conditional command (aka an if command) you can set a condition that will:
- Filter your results while the website is being scraped, instead of having to do extra processing after the data is collected
- Scrape only every other data point
- Scrape data only on certain pages (ex. categories, dropdown menu options, etc.)
- Scrape reviews with a minimum rating of some number
- Scrape a list only if it isn't too big or too small
- Scrape specific values from a data table and exclude others
How to use the Conditional Command
1. Add a Select command to your project by clicking on the + button on a command that is already in your instructions.
2. Select the element on the page that you want to apply the if statement to. You can also select multiple elements. The conditional command can go after the selection command, after a new entry command, or after the automatically created extraction commands, depending on how you want your finished data to look.
3. Click on the + button of the newly created selection command.
4. Choose the "Conditional" tool from the tool box.
5. Enter your expression in the "Expression" text box.
6. Your expression applies to whatever selection you made before adding the conditional command, in other words, the current element.
7. Test your expression by doing a Test Run on the template with the debugger.
Objects Available in ParseHub
ParseHub exposes four objects that are available in all expressions. These are essentially variables that you can check against each other, or against numbers or strings.
ParseHub can also use previously extracted data as variables.
$element or $e represents the current element, and has the following properties:
- $e.text - gets the inner text or value (depending on what type of element is being selected)
- $e.outerHTML - gets the outer HTML of the element
- $e.prop - a function that behaves like the jQuery prop function. This lets you compare attributes of an HTML tag that aren't immediately visible on a page, for instance the star rating of some products
- $e.css - a function that behaves like the jQuery css function
- $e.parentProp - a function that behaves like the jQuery prop function. If the result is undefined, it applies prop to the parent element instead, and keeps moving up until it gets a result or reaches the html element
- $e.parent - gets the parent element, which has all the same properties as $e
$location has the following properties:
- $location.href - gets the current url of the page
$selection has the following properties:
- $selection.length - gets the length (the number of selected elements) of the parent Select command of this Conditional
- $selection.index - gets the index of the current element in the closest parent selection command
$date.getDate() gets information about the current date and time:
- You can see information about how to extract all different kinds of date and time data here.
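To make the objects above more concrete, here is a minimal sketch in plain JavaScript that imitates them with hand-made stand-ins. ParseHub supplies the real objects while a project runs. Every value below (the URL, the text, the title attribute, the index) is invented for illustration, and prop is mocked as a simple lookup rather than the real jQuery-style function.

```javascript
// Hand-made stand-ins for the objects ParseHub provides (all values are hypothetical).
const $e = {
  text: "4.5 stars",
  // Mocked attribute lookup; the real $e.prop behaves like jQuery's prop.
  prop: (name) => ({ title: "4.5", className: "rating" })[name],
};
const $location = { href: "https://example.com/products?page=2" };
const $selection = { length: 12, index: 3 };

// Each value is the kind of expression you might type into a Conditional.
const checks = {
  onProductsPage: $location.href.includes("/products"),
  notFirstItem: $selection.index > 0,
  listIsSmall: $selection.length < 20,
  ratingAboveThree: parseInt($e.prop("title"), 10) > 3,
};

console.log(checks); // every check is true for these sample values
```

With these sample values each expression evaluates to true, so the commands nested under the Conditional would run.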
You can combine these objects and values with the following operators:
- x == y - proceed if the values are equal
- x != y - proceed if the values are not equal
- x > y - proceed if the first number is greater than the second
- x < y - proceed if the first number is smaller than the second
- x && y - proceed if both x and y are true
- x || y - proceed if either x or y is true, or both are true
There are some other useful functions you might want to learn too:
- x.includes("y") - proceed if the string "x" includes "y" anywhere in it
- toLowerCase(x) - you can't use this on its own as a Conditional, but if you need to compare two strings where one has uppercase letters (like a page title) and the other is lowercase (like a url), use this function to make all the letters lowercase first
- parseInt(x) - you can't use this on its own as a Conditional, but if you need to do math on some extracted text, pass the text as x to this function to turn it into a number
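The operators and functions above can be tried out in plain JavaScript. This is only a sketch: the function name evaluate and the sample title, URL, and price are assumptions, and because plain JavaScript has no global toLowerCase(x), a one-line stand-in is defined for it here.

```javascript
// Stand-in for the toLowerCase(x) function listed above (not built into JavaScript).
const toLowerCase = (x) => String(x).toLowerCase();

// Hypothetical helper that evaluates a few sample conditions.
function evaluate(title, url, priceText) {
  const titleInUrl = url.includes(toLowerCase(title)); // case-insensitive match
  const affordable = parseInt(priceText, 10) < 100;    // text converted to a number
  return {
    titleInUrl,
    affordable,
    both: titleInUrl && affordable,   // true only when both are true
    either: titleInUrl || affordable, // true when at least one is true
  };
}

console.log(evaluate("Laptops", "https://example.com/laptops", "250"));
// titleInUrl: true, affordable: false, both: false, either: true
```

In plain JavaScript, parseInt reads digits from the start of the text, so a value such as "$250" gives NaN and every comparison against it is false. If your extracted text starts with a currency symbol, it may need cleaning up before you compare it.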
Here are six quick examples of how you might use the Conditional command. The quick examples show what you need to enter in the conditional itself; further below, there is a longer, step-by-step explanation.
1. Keep only the "Next" link from a selection of pagination elements:
toLowerCase($e.text) == "next"
2. Filter out any results that contain the word "conditional":
3. Filter the selection of ratings to those higher than 3:
$e.text > 3
or, if the rating is in an HTML element...
4. Check if the list "categories" in your scope has a length of less than 10. In other words, check whether you've extracted fewer than 10 categories so far:
categories.length < 10
5. Filter the selection for elements whose text is a number between 200 and 500:
parseInt($e.text) > 200 && parseInt($e.text) < 500
6. Get only the even-numbered elements in a list:
$selection.index % 2 == 0
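The quick examples can be checked outside ParseHub with a small simulation. This is a sketch under stated assumptions: the keep helper is hypothetical and only imitates running a Conditional once per selected element, toLowerCase is a stand-in for the ParseHub function, the sample texts are invented, and the index is assumed to start at 0.

```javascript
// Stand-in for ParseHub's toLowerCase(x) function.
const toLowerCase = (x) => String(x).toLowerCase();

// Hypothetical helper: evaluates a condition once per element and keeps the
// elements for which it is true. Index is assumed to be zero-based.
function keep(texts, condition) {
  return texts.filter((text, index) =>
    condition({ text }, { index, length: texts.length })
  );
}

const nextOnly = keep(["Prev", "1", "2", "Next"],
  ($e) => toLowerCase($e.text) == "next");
const highRatings = keep(["2", "3", "4", "5"],
  ($e) => $e.text > 3);
const inRange = keep(["150", "200", "350", "500", "620"],
  ($e) => parseInt($e.text) > 200 && parseInt($e.text) < 500);
const evenIndexes = keep(["a", "b", "c", "d", "e"],
  ($e, $selection) => $selection.index % 2 == 0);

console.log(nextOnly);    // [ 'Next' ]
console.log(highRatings); // [ '4', '5' ]
console.log(inRange);     // [ '350' ]
console.log(evenIndexes); // [ 'a', 'c', 'e' ]
```

Note that > and < are strict, so the values 200 and 500 themselves are left out of the range in example 5.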
Long Example: Going to the next page only if the button has "Next Page" text on it.
Only navigate to the next page if the button has the text "Next Page" on it. This example is useful if you have a button that dynamically changes to "Back" after the last few pages are visited.
1. Click on the + button located to the right of the "Select page" command.
2. Choose the "Select" tool from the tool box.
3. Click on the button that says "Next Page".
4. Rename "selection1" to "button" by clicking on the text of the selection command.
5. Click on the + button of the "button" command.
6. Choose the "Conditional" tool from the tool box.
7. In the "Expression" text box type in $e.text.toLowerCase().contains("next").
This means "if the text of the selected button element contains the word 'next', in any mix of upper and lower case, perform the command that comes after the conditional command." If you're using a different page, make sure that the text in the conditional expression matches the text on the button.
8. Click on the + button of the newly created conditional command.
9. Choose the "Click" tool from the tool box. The Click command lets you go to another page or click through any link that opens a new page.
10. Continue the normal steps for pagination described in this tutorial.
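The logic of this long example can be sketched in plain JavaScript. The function names and the list of button labels are hypothetical, and the sketch uses includes (listed among the functions earlier) because plain JavaScript strings have no contains method. It only simulates the decision ParseHub makes on each page; it does not click anything.

```javascript
// Hypothetical helper: true when the Click command under the Conditional should run.
function shouldClickNext(buttonText) {
  const $e = { text: buttonText };
  return $e.text.toLowerCase().includes("next");
}

// Simulates paginating through a site. Each entry in `labels` is the button
// text found on one page; the run stops at the first label without "next".
function countPagesVisited(labels) {
  let pagesVisited = 1; // the starting page
  for (const label of labels) {
    if (!shouldClickNext(label)) break;
    pagesVisited += 1;
  }
  return pagesVisited;
}

console.log(countPagesVisited(["Next Page", "Next Page", "NEXT PAGE", "Back"])); // 4
```

Because the text is lowercased before the check, "NEXT PAGE" still counts as a match, while "Back" stops the run.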