With ParseHub, you can scrape information from many vehicle dealership websites for cars, vans, trucks, motorcycles and more.
In this tutorial, you will learn how to scrape details such as price, mileage or VIN from each of the listings on a car dealership website. To demonstrate, we will scrape data from https://www.autotrader.com/.
Scrape data from the listings page
1. Open ParseHub, click on "New Project" and enter the URL you would like to scrape data from. In this case, we are using this URL, which already specifies a location. Click on "Start project on this URL".
Please note that you can also set your project to click into certain categories and/or search for one or multiple locations.
2. Once the website has loaded, a Select command will already be created for you. The Select command allows you to select elements on the page.
3. Use your Select command to click on the first listing name. It will be highlighted in green to show that it has been selected.
4. Similar elements will be highlighted in yellow. Click on the second listing name and you should see every listing name highlighted in green and the number of selections on the left-hand side. If this is not the case, click on any unselected listing names until all of them are selected.
5. You can click on "selection1" to rename it to something such as "Listing".
6. You have the option to select other information that appears on the listing page by using Relative Select commands. Relative Select commands relate one piece of data to another: for example, the listing name to the listing price, or the listing name to the listing location. To add a Relative Select command, click on the + sign next to the "Listing" command and choose "Relative Select".
7. Click on the first listing name and then click on the first listing price to relate the two. You should see an arrow going from each listing name to its associated price. You can click on "selection1" and rename it to "price".
8. You can preview your data in the bottom panel. Your project's JSON preview should look like this:
And your project's CSV/Excel preview should look like this:
If you are not interested in scraping the Listing_url or the Listing_price_url (these are extracted by default if you select links), you can:
- Click on the x that appears when hovering over "Extract url" to remove the Listing_url
- Click on the + sign next to "Relative price", go to "Advanced" and choose an Extract command to remove the Listing_price_url
9. Currently, the project will scrape all listing names and prices from the first page only. To have it click through to the next page, click on the + sign next to "Select page", choose a Select command, and use it to click on the "next" button. You can rename your selection to "next".
10. Click on the + sign next to "Select & Extract next" and choose a Click command.
11. The pop-up that appears will ask you if this is a "next page" button. Since it is, click on "Yes". The option "Repeat Current Template" should be selected by default; click on "Repeat Current Template" to confirm.
Your project is now set to scrape all listing names and prices from every results page for your URL.
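ParseHub handles all of this for you, but it can help to see the logic the project now follows. The Python sketch below is a rough model under stated assumptions: the in-memory `PAGES` structure, the page names and the field names are hypothetical stand-ins for real web pages, not anything ParseHub exposes. It selects every listing on a page, looks up each price relative to its own listing (so a listing without a price gets an empty value instead of borrowing the next listing's price), then follows the "next" link and repeats the same template until no next page remains.

```python
from typing import Optional

# Hypothetical results pages: each holds listing cards plus the URL behind the
# "next" button (None on the last page). Listing B has no price on purpose.
PAGES: dict[str, dict] = {
    "page1": {
        "cards": [{"name": "Listing A", "price": "$1,000"}, {"name": "Listing B"}],
        "next": "page2",
    },
    "page2": {
        "cards": [{"name": "Listing C", "price": "$3,000"}],
        "next": None,
    },
}


def scrape_page(page: dict) -> list[dict]:
    """Select each listing, then look up its price relative to that listing."""
    return [{"Listing": card["name"], "price": card.get("price")} for card in page["cards"]]


def scrape_all(start: str, max_pages: int = 100) -> list[dict]:
    """Repeat the same template on each page while a "next" link exists."""
    rows: list[dict] = []
    url: Optional[str] = start
    for _ in range(max_pages):  # guard against pagination that never ends
        if url is None:
            break
        page = PAGES[url]
        rows.extend(scrape_page(page))
        url = page["next"]
    return rows


for row in scrape_all("page1"):
    print(row)
```

Scoping the price lookup to each listing is the point of the Relative Select: pairing a list of all names with a separate list of all prices would misalign every row after the first listing that is missing a price.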
Scrape data from within each listing
If you would like to click on each listing to scrape data from within that listing's page, you can follow the instructions below.
1. Click on the + sign next to the "Listing" command and choose a Click command.
2. The pop-up will ask you again whether this is a "next page" button. This time, click on "No". You will be prompted to "Create New Template", which you can name something such as "listing_details". This template will specify the information you would like to extract from each individual listing's page.
3. This should automatically open the first listing page and your new listing_details template on the left-hand side.
4. Within this template, you can use a new Select command for each piece of data that you would like to extract. For each piece of data (e.g. mileage or photos), click on the + sign next to "Select page", choose a Select command and click on that information. For example, you could extract the price of the vehicle.
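Conceptually, a Click command that opens a new template works like a nested loop: for each listing on the results page, open its page, run the listing_details template, and attach the results to that listing. A minimal Python sketch, where the in-memory pages, the relative URLs and the field names are hypothetical stand-ins rather than real autotrader.com data or a ParseHub API:

```python
# Hypothetical results page: listing names mapped to their detail-page URLs.
RESULTS_PAGE = {"Listing A": "/a", "Listing B": "/b"}

# Hypothetical detail pages, as the listing_details template would see them.
DETAIL_PAGES = {
    "/a": {"price": "$1,000", "mileage": "50,000 mi"},
    "/b": {"price": "$2,000", "mileage": "20,000 mi"},
}


def listing_details(url: str, fields: list[str]) -> dict:
    """Stand-in for the listing_details template: one Select per field."""
    page = DETAIL_PAGES[url]
    # A field that is not on this page comes back as None.
    return {field: page.get(field) for field in fields}


def scrape_with_details(fields: list[str]) -> list[dict]:
    """Click into each listing and merge its details into one record."""
    records = []
    for name, url in RESULTS_PAGE.items():
        record = {"name": name}
        record.update(listing_details(url, fields))
        records.append(record)
    return records


print(scrape_with_details(["price", "mileage"]))
```

Because each detail record is attached to the listing it came from, the output keeps one row per listing no matter how many fields the detail template extracts.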
Click to reveal more details on a template
For some websites, you may need to click on an element to view more information. For example, the listing page on autotrader.com has a table organized into tabs, each of which reveals more vehicle details when clicked. To have ParseHub scrape the information from each tab:
1. Click on the + sign next to "Select page", choose a Select command and use it to select the second tab, "Interior".
2. Click on the + sign next to your selection and choose a Click command. In the pop-up, choose "No" when asked if this is a "next page" button and then select the option to "Continue executing the current template".
Scrape unordered vehicle specifications
Vehicle tech specs commonly appear in different orders, depending on what information is available for each vehicle. For example, "Trim" may be the first specification for one vehicle but the third for another. To resolve this issue, follow the instructions in this tutorial.
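The underlying idea is to match each value to its label rather than to its position in the table. A minimal Python sketch, assuming the specs have already been scraped as label/value pairs; the spec labels, values and the `WANTED` list are hypothetical and only for illustration:

```python
# Hypothetical spec tables for two vehicles; the order differs between them.
vehicle_1 = [("Trim", "LX"), ("Engine", "2.0L I4"), ("Drivetrain", "FWD")]
vehicle_2 = [("Engine", "3.5L V6"), ("Drivetrain", "4WD"), ("Trim", "XLT")]

# The columns we want in the output, in a fixed order.
WANTED = ["Trim", "Engine", "Drivetrain", "Transmission"]


def specs_by_label(pairs: list[tuple[str, str]]) -> dict:
    """Key each spec by its label so its position in the table no longer matters."""
    lookup = {label.strip(): value.strip() for label, value in pairs}
    # Specs missing for this vehicle come back as None instead of shifting columns.
    return {label: lookup.get(label) for label in WANTED}


for pairs in (vehicle_1, vehicle_2):
    print(specs_by_label(pairs))
```

Every vehicle ends up with the same columns in the same order, so "Trim" lands in the Trim column whether it was listed first or third.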
Note that each website will be slightly different, so some of the suggestions for individual listings above may not apply to your car dealership website. If you run into any trouble, please feel free to contact us for support.