Extract

The extract command extracts data from the current element.

When you use the select command, ParseHub automatically extracts the element's text, URL, and image source (img src) for you, whenever these are available.

Command Options

Modify your extraction

In addition to any attributes present on the selected elements, the following data can be easily extracted:

  • Text - The human-readable text in an element.
  • HTML - The outer HTML or inner HTML of the selected element.
  • PageUrl - The URL of the page that is currently being processed. This does not depend on the current element.
  • id Attribute
  • class Attribute
  • the current Date

If you'd like to extract something more complicated, you can use the text box at the right to evaluate an arbitrary expression, and extract the result into the current scope.

Use regular expressions

This option applies a regular expression (regex) to the data before extraction. The value that will be extracted is the first capture group of the expression, that is, the part matched by the first pair of parentheses.

For example, if the original extracted text is Price: $99,

  • the regex .*?(\d+) will extract 99 into the scope. (With a greedy .* in front, as in .*(\d+), the group would capture only the final 9.)
  • the regex (.*): will extract Price into the scope.

Regular expressions use the JavaScript syntax. You can see a handy cheatsheet here.
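To try a pattern before entering it in ParseHub, you can reproduce the "first capture group" behavior in plain JavaScript. The snippet below is only a sketch: firstGroup is a hypothetical helper written for this illustration, not a ParseHub function, and it assumes that ParseHub returns the first capture group of the first match.

```javascript
// Hypothetical helper (not part of ParseHub): returns the first capture
// group of `pattern` applied to `text`, or null when there is no match.
function firstGroup(text, pattern) {
  const match = new RegExp(pattern).exec(text);
  return match && match[1] !== undefined ? match[1] : null;
}

const text = "Price: $99";

// Everything before the colon.
const label = firstGroup(text, "(.*):");        // "Price"

// Lazy .*? stops as early as possible, leaving both digits for the group.
const amount = firstGroup(text, ".*?(\\d+)");   // "99"

// Greedy .* consumes as much as it can, leaving only one digit for the group.
const lastDigit = firstGroup(text, ".*(\\d+)"); // "9"

// No match at all yields null.
const missing = firstGroup(text, "(\\d+) USD"); // null

console.log(label, amount, lastDigit, missing);
```

The difference between the second and third patterns is the usual reason a regex extracts fewer characters than expected: a greedy quantifier in front of the group takes everything it can, so prefer the lazy form (.*?) or drop the prefix entirely.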
