With ParseHub, you can navigate between links and categories on a website automatically. Sometimes, you may want to add hundreds of links directly into ParseHub, instead of selecting links on the website.
You can add a list of URLs in JSON format to the "Starting Value" of the project in the "Settings" tab.
Follow the instructions below to enter a list of URLs into your project.
1. Open your project using any page as the URL, such as the homepage for your website.
2. Go to the "Settings" menu in the project.
3. Add the list of URLs you would like to crawl to the "Starting Value" textbox. There are two possible options:
- Use the "Import from CSV/JSON" option to import a list of URLs from a CSV file
- If you are familiar with JSON, paste your JSON directly into the textbox
If you are importing from CSV, ensure that your CSV file has the name of the list (such as "urls") as the column header, with each URL in a separate row.
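For illustration, a CSV of that shape could be generated with a short Python script. This is only a sketch: the function name `urls_to_csv` and the example.com URLs are placeholders, not part of ParseHub.

```python
import csv
import io

def urls_to_csv(urls, header="urls"):
    """Return CSV text: one header row, then one URL per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([header])   # the column header becomes the list name
    for url in urls:
        writer.writerow([url])  # each URL goes in its own row
    return buffer.getvalue()

# Placeholder URLs, used only to show the layout.
example_urls = [
    "https://www.example.com/product/1",
    "https://www.example.com/product/2",
]
csv_text = urls_to_csv(example_urls)
print(csv_text)
```

Saving `csv_text` to a file such as `urls.csv` (a hypothetical name) gives a single-column file with "urls" in the first row.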
Click on the "Import from CSV/JSON" option next to "Starting Value" and choose your CSV file. A file structured as described above should import as a list named after its column header.
If you are pasting JSON directly into the "Starting Value" textbox, you can format it as follows:
- The list name "urls" can be renamed to anything you want, such as "links", or something more descriptive like "shoes" or "brands".
- You can enter as many links in the structure as you want. We have 3 in the example below, but you can keep adding more links inside quotation marks and separated by a comma.
{
"urls": [
"https://www.walmart.ca/en/ip/hp-stream-14-cb110ca-14-inch-laptop-white-intel-celeron-n4000-intel-uhd-600-4gb-ram-64gb-emmc-windows-10-s-4jc81uaabl/6000198793458",
"https://www.walmart.ca/en/ip/hp-17-by0002ca-173-laptop-natural-silver-and-ash-silver-core-i5-8250u-intel-uhd-graphics-620-8gb-ddr4-1-tb-5400-rpm-sata-windows-10-home-4bq83uaabl/6000198528157",
"https://www.walmart.ca/en/ip/acer-aspire-3-156-laptop-amd-e2-9000-amd-radeon-r2-graphics-8-gb-ddr4-1-tb-hard-drive-windows-10-home-nxgnvaa019/6000197843008"
]
}
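Before pasting, it can help to check that the text is valid JSON, since a missing quotation mark or a trailing comma is easy to overlook in a long list. The following is a minimal sketch using Python's standard `json` module. The function name `check_starting_value` is hypothetical, and the check assumes the starting value holds only lists of URL strings, as in the example above.

```python
import json

def check_starting_value(text):
    """Parse the text and return {list_name: url_count}.

    Raises ValueError if the text is not valid JSON or does not have
    the assumed shape (an object whose values are lists of strings).
    """
    data = json.loads(text)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("top level must be a JSON object")
    counts = {}
    for name, value in data.items():
        if not isinstance(value, list) or not all(isinstance(u, str) for u in value):
            raise ValueError(f"{name!r} must be a list of quoted strings")
        counts[name] = len(value)
    return counts

# Placeholder URLs, used only to show the check.
sample = '{"urls": ["https://www.example.com/a", "https://www.example.com/b"]}'
print(check_starting_value(sample))
```

If the function returns without raising an error, the text at least parses as JSON and every list contains only quoted strings.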
8. Click on the + button on the right side of the "For each item in urls" command.
9. From the tool box, choose the "Begin New Entry" tool. Now the results for each URL will go into a separate row in CSV and a separate object in JSON. If you didn't use the "Begin New Entry" tool anywhere in your project, the result scraped for each URL would overwrite the previous one.
10. Rename the "list1" command to something else, like "links". Make sure not to give the Begin New Entry command the same name as the list that holds your URLs. The Begin New Entry command should have a unique name.
11. Now you have the option to extract each URL in your list along with the related results. This step will add a new column for the links that you provided as the starting value.
Click on the + button on the right side of the "links" command (or the "list1" command if you didn't rename it) and choose the Extract command. Replace $location.href with item, and rename the Extract command to "link". Your final results will then include a column containing the link (URL) associated with each result.
12. Click on the + button on the right side of the "links" command and choose the "Go To Template" command from the tool box. The Go To Template command lets you specify which URL you want to go to and which type of page you want to open.
13. On the pop up window, choose the "Go to URL" option instead of the "Stay on the Same Page" option.
14. In the text box type in "link" without quotation marks, assuming you didn't name it something else.
It is recommended to use "link" instead of "item" because ParseHub can only parallelize your project if the command refers to a list created within ParseHub (that is, your extracted "links" list) rather than to an uploaded list of "items" from your CSV/JSON.
15. In the "Create New Template" text box type in the name of a new template you want to open for each link - such as "results". Click on "Create New Template". You should now be taken to the first url in your JSON list and a new template should be created for you.
17. On this new template continue making commands that will be applied to each of the urls in your list in turn.
Once your loop template is complete, you will see a preview of your links in the preview pane. These URLs will not appear in your data set; they only represent your starting value. Since these URLs are static, you will need to scroll down in the preview pane to view the data that is being scraped during a test run.
Bonus Tip:
If you need to run your project using a different set of URLs, you do not need to use a new project! Simply update your list of URLs in Excel, save it, and then re-import the file into your Starting Value.
Bonus Tip 2:
If you would like to paste your JSON list directly into the Starting Value box and your list of links is in Excel, you can easily convert it into JSON with the handy Mr. Data Converter tool.
1. Copy your list of links from Excel or any other text file.
2. Go to Mr. Data Converter.
3. In the first box enter all of your links. Make sure to type in a heading name at the top of the column such as "urls".
4. From the dropdown select "JSON - column arrays".
5. Copy and paste your finished JSON into the "Starting Value" textbox in the project's "Settings" tab.
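If you prefer not to use an online converter, the same conversion can be sketched in a few lines of Python. This is an illustrative alternative, not a ParseHub feature: the function name `links_to_json` is hypothetical, the URLs are placeholders, and it assumes the links were copied one per line.

```python
import json

def links_to_json(pasted_text, list_name="urls"):
    """Turn one-link-per-line text into a JSON object holding a single list."""
    # Drop blank lines and surrounding whitespace left over from copying.
    links = [line.strip() for line in pasted_text.splitlines() if line.strip()]
    return json.dumps({list_name: links}, indent=2)

# Placeholder links, as they might look when copied from a spreadsheet column.
pasted = """https://www.example.com/a
https://www.example.com/b
"""
print(links_to_json(pasted))
```

The printed text has the same structure as the "urls" example earlier in this article and can be pasted into the "Starting Value" textbox.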