In this tutorial, we will cover the basics of ParseHub by building a project that scrapes a single web page. By the end, you should be able to download data in CSV or JSON format from one page of a website, in this case Society6. You can find information on scraping multiple pages in our Help Centre.
Step 1. Download the ParseHub Desktop App
Download ParseHub from the ParseHub website and follow the installation instructions on the download page. ParseHub is available for Windows, Mac and Linux.
Step 2. Open Website & Start New Project
1. Open the ParseHub Desktop application
2. Click on "New Project" [No "New Project" button? Check out our troubleshooting guide]
3. Enter https://society6.com/tapestries into the text box
4. Click "Start project on this URL" below the text box
Step 3. Select & Extract All of the Product Titles
1. Click the "+" button to the right of the "Select page" command and choose the Select tool from the toolbox. This tool selects elements on the page and automatically extracts common data, such as text and URLs.
2. Click the first product name on the page. It will be highlighted in green to show that it has been selected for extraction.
3. The other product names will be highlighted in yellow, showing that ParseHub has identified them as similar elements. Click the second product name on the page. All of the product names should now be highlighted in green, meaning they will all be selected and extracted.
4. Click the "Select selection1" command in the sidebar and rename "selection1" to "Product".
5. You will notice that three new commands have been created in the sidebar:
- All of the selections were put in a new entry (hidden under the list icon). This entry creates an empty row in Excel, or an empty scope in JSON, for each selection. The data you extract from each selection, along with any other commands you add, is repeated for every selection and fills that selection's row or scope.
- The text of every selection was extracted automatically into that selection's scope in JSON, or into its row in Excel (creating a new column).
- The URL of each selection was also extracted. In Excel, it goes into the same row as the text, in a new column beside the "name" column; in JSON, it is added to the same scope as the text ("name").
6. Click the "Extract name" command and rename "name" to "title".
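To make the row/scope correspondence in item 5 more concrete, here is a minimal Python sketch that flattens JSON results into CSV rows: each object (scope) in the list becomes one row, and each key (one per extract command) becomes one column. It assumes the JSON holds a list named after the renamed selection ("Product") containing objects with "title" and "url" keys; that shape, the helper name `scopes_to_csv`, and the sample values are hypothetical illustrations, not a documented ParseHub export format.

```python
import csv
import io
import json

# Hypothetical sample, NOT real ParseHub output: one list named after the
# renamed selection ("Product"), one object (scope) per product, and one key
# per extract command ("title" and "url"). The values are placeholders.
SAMPLE_JSON = """
{"Product": [
  {"title": "Example Tapestry A", "url": "https://example.com/product-a"},
  {"title": "Example Tapestry B", "url": "https://example.com/product-b"}
]}
"""


def scopes_to_csv(results: dict, list_name: str) -> str:
    """Write each scope in results[list_name] as one CSV row."""
    scopes = results.get(list_name, [])
    # Every key seen in any scope becomes a column, in first-seen order.
    columns: list[str] = []
    for scope in scopes:
        for key in scope:
            if key not in columns:
                columns.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for scope in scopes:
        writer.writerow(scope)  # keys missing from a scope become empty cells
    return buffer.getvalue()


print(scopes_to_csv(json.loads(SAMPLE_JSON), "Product"))
```

Running it prints a "title,url" header followed by one row per product. ParseHub already produces the CSV file for you; the sketch only illustrates how entries, rows and scopes line up.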
Your CSV/Excel sample results should look like this:
Your JSON sample results should look like this:
Step 4. Select & Extract All Product Prices
1. Click the "+" button to the right of the "Select Product" command and choose the "Relative Select" tool from the toolbox. This tool lets you link data that is already selected on the page to any other data you want to attach to it.
This tool works best on websites that list businesses, products, hotels and so on, where you need to pair each main business or product title with other repeated data on the page, such as phone numbers or product prices.
2. Click the first title on the page, then click its price. You should see an arrow from that title to its price.
3. If not every product has an arrow from its title to its price, refine the selection: click the second title on the page, then click the price that belongs to it. You should now see arrows from every title to its corresponding price.
4. All of the prices will be extracted automatically.
5. Click the relative "selection1" command and rename it to "price".
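If a title never gets its arrow in item 3, that product ends up without a price in your results. One way to catch this after downloading is a quick check like the minimal Python sketch below. It assumes JSON results with a "Product" list whose objects carry "title" and, when extracted, "price" keys; that shape, the helper name `products_missing_price`, and the sample values are hypothetical.

```python
import json

# Hypothetical sample, NOT real ParseHub output: after the Relative Select,
# each "Product" scope is expected to hold a "price" key beside "title" and "url".
# The second product deliberately has no price, as if its arrow were missing.
SAMPLE_JSON = """
{"Product": [
  {"title": "Example Tapestry A", "url": "https://example.com/product-a", "price": "$10.00"},
  {"title": "Example Tapestry B", "url": "https://example.com/product-b"}
]}
"""


def products_missing_price(results: dict, list_name: str = "Product") -> list[str]:
    """Return the titles of scopes in which no price was extracted."""
    missing = []
    for scope in results.get(list_name, []):
        price = scope.get("price")
        # An absent key and a blank string both count as "no price".
        if price is None or not str(price).strip():
            missing.append(scope.get("title", "<untitled>"))
    return missing


print(products_missing_price(json.loads(SAMPLE_JSON)))
```

An empty list means every product was paired with a price; any titles returned point to products whose arrow is worth rechecking in the app.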
Your CSV/Excel sample results should look like this:
Your JSON sample results should look like this:
Step 5. Run the Project
1. Click the "Get Data" button at the bottom of the page.
2. Click the "Run" button.
3. Click the "Save and Run" button.
Step 6. Download Data (CSV & JSON)
1. Wait a few seconds while ParseHub scrapes the website and fetches your data.
2. While you wait, you will see the text "Data is being collected". Click the "Refresh Now" button to check for results sooner.
3. When the data is ready, the "CSV" and "JSON" buttons will turn bright green. Click either button to download your results in that format.
You will also receive an email when your run has finished; you can download your data from the link in that email.
Step 7. Connect to API [Advanced/Optional]
To connect to the API, follow the instructions in our API reference.
To find your API Key:
1. You can get your API key from the run page itself, or by clicking the profile button in the top right corner of the app and going to "Account".
To find your Project Token:
1. You can find your Project Token on the run page itself, or by going back to your project and clicking the "Settings" tab.
Step 8. What's next?
To do more, such as clicking into each result and moving on to other pages, see the extended version of this tutorial, which is part of the ParseHub 101 tutorials for learning how to use ParseHub.
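As a starting point for the optional API step above, here is a minimal Python sketch using only the standard library. The base URL, the endpoint paths (`/projects/{token}/run` and `/projects/{token}/last_ready_run/data`) and the `api_key` and `format` parameters are assumptions based on the v2 API reference; confirm them there before relying on them. The helper names and the placeholder token and key are hypothetical.

```python
import gzip
import json
import urllib.parse
import urllib.request

# Assumed base URL of the ParseHub v2 REST API; check the API reference.
API_BASE = "https://www.parsehub.com/api/v2"


def run_request(project_token: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that starts a new run."""
    body = urllib.parse.urlencode({"api_key": api_key}).encode("ascii")
    return urllib.request.Request(
        f"{API_BASE}/projects/{project_token}/run", data=body, method="POST"
    )


def last_ready_data_url(project_token: str, api_key: str, fmt: str = "json") -> str:
    """Build the URL for the data of the project's most recent finished run."""
    if fmt not in ("json", "csv"):
        raise ValueError("fmt must be 'json' or 'csv'")
    query = urllib.parse.urlencode({"api_key": api_key, "format": fmt})
    return f"{API_BASE}/projects/{project_token}/last_ready_run/data?{query}"


def fetch_last_ready_json(project_token: str, api_key: str) -> dict:
    """Download and decode the latest JSON results (makes a network call)."""
    with urllib.request.urlopen(last_ready_data_url(project_token, api_key)) as resp:
        raw = resp.read()
    # The response may be gzip-compressed and urllib does not undo that,
    # so decompress when the gzip magic bytes are present.
    if raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)
    return json.loads(raw.decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical placeholders: use the values from your run page or settings.
    PROJECT_TOKEN = "YOUR_PROJECT_TOKEN"
    API_KEY = "YOUR_API_KEY"
    print(last_ready_data_url(PROJECT_TOKEN, API_KEY))
```

The sketch only builds URLs and requests when run as a script. To actually start a run, pass `run_request(...)` to `urllib.request.urlopen`; to pull your results, call `fetch_last_ready_json` with your real Project Token and API key.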