Select

The select command is used to select elements on the page. Pressing the + button and clicking Select will create a new select command beneath the current command.

Hovering over an element will highlight it blue. Clicking on a highlighted element will select it and turn it green. Clicking on further elements on the page will add them to the selection. While the select command is active, a number in brackets to its right shows how many elements on the page are currently selected.

Selection commands are labeled, and you can edit these labels by clicking on them in the template details pane. The labels can be used with the Conditional and Jump tools for fine-grained control over execution flow. They also serve as reminders of which elements are being selected by that node.

A related command is relative select, which allows you to select elements relative to the current element (see below).

Internal representation

When you select an element, ParseHub figures out a concise pattern by which to represent that element (if you're familiar with XPath, it's kind of like that, but more powerful). When you add to your selection by clicking more yellow-highlighted elements, you are giving the select command additional samples so that ParseHub can make a better decision about the right pattern to use.

The select command is not tied to the particular page that you created the selection on. So when a selection is executed on a different page, ParseHub will automatically figure out which elements are in that selection based on the pattern. You can select extra elements across multiple pages to train the selection!

You may be wondering: what happens if I haven't given enough samples? This is a well-known problem for those who have written crawlers themselves. Your code works on the pages that you've tried, but not on some that you haven't. For now, you simply have to find enough samples to train ParseHub. We are working on a way to automatically detect ambiguities so that ParseHub can figure out when it hasn't been trained enough.

Execution flow

You can think of a select command as a loop over the elements it selects. For each element in the set of selected elements, it sets that element as the current element, then executes all of the select command's children in order with that current element.

The current element determines what some commands, such as Extract and Click, will read or interact with.

If you nest two selections, the children of the inner selection will be executed once for each element of the inner selection, for each element of the outer selection. That is, if each selection has 10 elements, the children of the inner selection will be executed 100 (10 x 10) times.
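
The loop behaviour described above can be sketched in Python. This is only an illustration of the idea, not ParseHub's actual implementation: the dictionary-based command format, the execute function and the row/cell names are all made up for this example.

```python
def execute(command, current, log):
    """Run one (hypothetical) command with `current` as the current element."""
    if command["type"] == "select":
        # A select command loops over the elements it matches and runs
        # every child once per element, with that element as current.
        for element in command["match"](current):
            for child in command["children"]:
                execute(child, element, log)
    elif command["type"] == "extract":
        # Extract reads from whatever the current element is.
        log.append(current)


outer_rows = [f"row{i}" for i in range(10)]
inner_cells = [f"cell{j}" for j in range(10)]

# Two nested selections with a single Extract child at the bottom.
template = {
    "type": "select",
    "match": lambda current: outer_rows,
    "children": [{
        "type": "select",
        "match": lambda current: [f"{current}/{cell}" for cell in inner_cells],
        "children": [{"type": "extract"}],
    }],
}

log = []
execute(template, None, log)
print(len(log))  # 100: the Extract child ran 10 x 10 times
```

Each entry in log is the current element seen by one run of the Extract child, so the first entry is "row0/cell0" and the last is "row9/cell9".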

Modifiers

Alt - Remove

After you have added sample elements to your selection, ParseHub may have selected too many elements. In this case you can hold down the alt key and click the elements that you do not want to be captured by your selection. ParseHub will figure out a pattern that captures all the elements you wanted and none of the elements you didn't.

The alt key may also be used to remove an element that was added by mistake.
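
To make the idea of wanted and unwanted samples concrete, here is a minimal Python sketch. It is not ParseHub's algorithm, which is not described here. It assumes a toy model in which an element is just a tag plus a list of classes, and a pattern is a tag plus a set of required classes. The names candidate_patterns, matches and infer_pattern are hypothetical.

```python
from itertools import combinations


def candidate_patterns(element):
    """Yield (tag, required_classes) patterns, most general first."""
    classes = sorted(element["classes"])
    for size in range(len(classes) + 1):
        for combo in combinations(classes, size):
            yield (element["tag"], frozenset(combo))


def matches(pattern, element):
    """True if the element has the pattern's tag and all required classes."""
    tag, required = pattern
    return element["tag"] == tag and required <= set(element["classes"])


def infer_pattern(wanted, unwanted):
    """Return the first pattern matching every wanted and no unwanted element."""
    for pattern in candidate_patterns(wanted[0]):
        if (all(matches(pattern, e) for e in wanted)
                and not any(matches(pattern, e) for e in unwanted)):
            return pattern
    return None


page = [
    {"tag": "a", "classes": ["product", "link"]},
    {"tag": "a", "classes": ["product", "link"]},
    {"tag": "a", "classes": ["nav", "link"]},
]

# With only positive samples, the most general pattern also matches the nav link.
too_broad = infer_pattern(page[:2], [])
# Marking the nav link as unwanted (the alt-click) forces a narrower pattern.
narrowed = infer_pattern(page[:2], [page[2]])
print(too_broad)
print(narrowed)
```

In this toy model the first call settles on "any a element", while the second is forced to require the product class, which is the effect that an alt-click is meant to have.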

Ctrl - Zoom

By default, ParseHub restricts the elements that can be selected to those that have text inside them. Sometimes that is not desired (e.g. you'd like to extract the CSS class of an element which has no text). You can hold the ctrl key and scroll up or down (1 or 2) to "zoom" to the right element.

After zooming to the right level, there may be one or more "potential" highlighted elements. These are the elements that you may now hover over and click on to select them.

Alt deselecting can be used in combination with ctrl zooming, in case you zoom to the wrong element.

Command options

Selection Node

If you press the "Edit" button here, you can choose to manually input a path to elements you want to select on the page, using either CSS or XPath. By selecting "Advanced...", you can also see how ParseHub is selecting the current element with its own code.

You should use this function only if you understand CSS or XPath, and only when ParseHub's own selection, even with zooming, can't find the element you need.
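
If you have not written XPath before, the following Python sketch shows what such a path looks like. It uses Python's standard xml.etree.ElementTree module, which supports only a subset of XPath, so it may not behave exactly like the paths you enter in ParseHub. The markup and class names are made up for this example.

```python
import xml.etree.ElementTree as ET

# A made-up page fragment; it must be well-formed XML for ElementTree.
page = ET.fromstring("""
<html><body>
  <div class="product"><span class="price">10</span><img class="thumb" /></div>
  <div class="product"><span class="price">25</span><img class="thumb" /></div>
  <div class="footer"><span>Contact</span></div>
</body></html>
""")

# Select every price span that sits directly inside a product div.
prices = page.findall(".//div[@class='product']/span[@class='price']")
price_texts = [p.text for p in prices]

# Select elements that contain no text at all, such as images.
thumbs = page.findall(".//img[@class='thumb']")

print(price_texts)   # ['10', '25']
print(len(thumbs))   # 2
```

The second path picks out elements with no text inside them, which is the kind of element the default point-and-click selection does not offer until you zoom.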

Wait for elements to appear

Normally, a select command is skipped when its pattern matches no elements on the page (or rather, it loops over 0 elements). With this option, if no elements are matched, the command will keep trying for up to 60 seconds. This is useful when, for example, an element is created in response to an AJAX call.
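
The retry behaviour can be pictured as a polling loop with a deadline. The sketch below is an assumption about the general technique, not ParseHub's code. The wait_for_elements function, its poll_interval parameter and the fake_find helper are hypothetical. Only the 60 second limit comes from the description above.

```python
import time


def wait_for_elements(find_elements, timeout=60.0, poll_interval=0.5):
    """Call find_elements() until it returns a non-empty list or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        elements = find_elements()
        if elements:
            return elements
        if time.monotonic() >= deadline:
            return []  # still nothing: the command loops over 0 elements
        time.sleep(poll_interval)


# Simulate an element that only appears on the third look at the page,
# as if it had been created by an AJAX response.
calls = {"n": 0}


def fake_find():
    calls["n"] += 1
    return ["#result"] if calls["n"] >= 3 else []


found = wait_for_elements(fake_find, timeout=5.0, poll_interval=0.01)
print(found)
```

Here found ends up as ["#result"] after three attempts. If the element never appears, the function gives up once the timeout has passed and returns an empty list.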
