The extract command extracts data from the current element.
ParseHub automatically extracts, when possible, the text, url and img src of an element for you when you use the select command.
Command Options
Modify your extraction
In addition to any attributes present on the selected elements, the following data can be easily extracted:
- Text - The human-readable text in an element
- Full HTML - The outer html of the selected element
- Inner HTML - The inner html of the selected element
- PageUrl - The url of the page that is currently being processed. This does not depend on the current element
- id Attribute
- class Attribute
- Today's Date - the current Date
- Solve Captcha - use this to get the text from a Captcha (full instructions here)
- Download to S3 - Downloads a file to Amazon S3
- Screenshot to S3 - Takes a screenshot of the selection to extract to Amazon S3
- Download to Dropbox - Downloads a file to Dropbox
- Screenshot to Dropbox - Takes a screenshot of the selection to extract to Dropbox
- Delete elements from page - Deletes elements from the HTML of the page on the browser
- Other HTML attributes relevant to the selection
If you'd like to extract something more complicated, you can use the text box at the right to evaluate an arbitrary expression, and extract the result into the current scope.
Use regular expressions
This will apply a regular expression (regex) to the data before extraction. The value that will be extracted is the first matching group of the expression.
For example, if the original extracted text is Price: $99
,
- the regex
.*(\d+)
will extract99
into the scope. - the regex
(.*):
will extractPrice
into the scope.
Regular expressions use the JavaScript syntax. You can see a handy cheatsheet here.