Navigate to a list of URLs (using an ID value in the URL)

Sometimes, you need to scrape data from a series of pages with similar URLs. This tutorial shows you how to navigate to, and scrape data from, a list of URLs when you have a list of changing ID numbers or extensions.

If you have a list of full URLs, you can follow this tutorial. If you need to input a list of keywords into a text box on the page, follow this one.

In this tutorial, I'll be scraping stock data from http://www.barchart.com/stocks/sp500.php.

Setting up the list of IDs to input

1. While working on a project, click the "Settings" button at the top of the left-side commands tab.

 

2. In the Starting Values text box, enter the list of IDs that you need to insert into the URL. The list must be in JSON format.

 

I used this JSON object to get the top 25 stock tickers of the S&P 500 Index: {"stocks":["XOM","GE","MSFT","BP","C","PG","WMT","PFE","HBC","TM","JNJ","BAC","AIG","TOT","NVS","MO","GSK","MTU","JPM","RDS.A","CVX","SNY","VOD","INTC","IBM"]} 
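Before pasting a long list into the Starting Values box, it can help to generate and check the JSON outside ParseHub. The following is a minimal Python sketch, not part of ParseHub itself; the variable names are my own, and the only thing taken from the tutorial is the "stocks" key and the ticker list.

```python
import json

# Tickers copied from the JSON object above.
tickers = [
    "XOM", "GE", "MSFT", "BP", "C", "PG", "WMT", "PFE", "HBC", "TM",
    "JNJ", "BAC", "AIG", "TOT", "NVS", "MO", "GSK", "MTU", "JPM", "RDS.A",
    "CVX", "SNY", "VOD", "INTC", "IBM",
]

# Build the JSON text; the "stocks" key is the list name used later in the Loop command.
starting_values = json.dumps({"stocks": tickers}, separators=(",", ":"))

# Parse it back to confirm it is valid JSON and that no ticker was lost.
parsed = json.loads(starting_values)
print(len(parsed["stocks"]), "tickers")
print(starting_values)
```

The printed JSON text is what you would paste into the Starting Values box. If json.loads raises an error on text you typed by hand, look for a missing quote or comma.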

Navigating to each URL

3. Click the "Commands" button at the top of the left-hand tab.

4. On the main_template, click the plus button to the right of the Select page command. Add a Loop command from the "Advanced" menu.

 

5. Input "for each stock in stocks". This will cause ParseHub to perform the subsequent commands for each of the 25 stock tickers in the stocks list of the Starting Values.

 

6. Click the plus button to the right of the new Loop command. Add a Go to Template command from the "Advanced" menu.

 

7. Input the following expression into the Go to URL box in the pop-up that appears: "http://www.barchart.com/quotes/stocks/"+stock

 

This appends the "stock" variable, which we defined in the Loop command, to the end of the quoted URL.

You can use this for any URL. If the ID you need to input is in the middle of a URL, format it like this: "http://firstpartoftheurl.com/"+item+"/second_part"

8. Choose to Go to a new template, and input a name such as stock_template. Then, press the green button to add the command.

9. On the new page that loads, select any data you need to scrape. ParseHub will collect this data from each page that has one of the IDs in its URL. Make sure to add a Begin New Entry command at the start of this template, however.
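
To preview which pages the Loop command and the URL expression from steps 5 to 7 will visit, you can reproduce the same string concatenation in Python. This is only an illustrative sketch: ParseHub evaluates the expression internally, and the build_urls helper and its prefix and suffix parameters are hypothetical names of my own.

```python
BASE_URL = "http://www.barchart.com/quotes/stocks/"


def build_urls(ids, prefix=BASE_URL, suffix=""):
    """Return one URL per ID, mirroring the expression prefix + item + suffix."""
    return [prefix + item + suffix for item in ids]


# Equivalent of "for each stock in stocks" with the ID at the end of the URL.
stocks = ["XOM", "GE", "MSFT"]
urls = build_urls(stocks)
for url in urls:
    print(url)

# Equivalent of an ID in the middle of the URL, using the placeholder address from step 7.
middle = build_urls(["42"], "http://firstpartoftheurl.com/", "/second_part")
print(middle[0])
```

Opening one or two of the printed URLs in a browser is a quick way to confirm the pattern is right before running the full project.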