Amazon Search Result Scraper using Python

Many of you have heard the term web scraping. In this post we'll see what web scraping is and how to perform it using Python. Along the way we'll build a program that scrapes the contents of Amazon.com's search results, and we'll see how to store those results in JSON (JavaScript Object Notation) format.

What is Web Scraping?

Web scraping is a way of extracting data from websites. The data can be anything on the site: images, videos, useful text and so on.
Python has many libraries that can be used for web scraping, such as requests, BeautifulSoup, Selenium and RPA.
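
To make the idea concrete, here is a minimal sketch that pulls the image links out of a small HTML string. It uses only the standard library's html.parser rather than any of the libraries listed above, and the sample markup and the class name ImageCollector are made up for illustration.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collects the src attribute of every <img> tag it sees."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


# Hypothetical page content, standing in for a downloaded web page
sample_html = '<div><img src="a.png" alt="A"><p>text</p><img src="b.png"></div>'

collector = ImageCollector()
collector.feed(sample_html)
print(collector.sources)
```

A real scraper does the same thing at a larger scale: load a page, walk its elements, and keep the attributes or text it cares about.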

Requirements and Installations:-

For this Amazon Search Result Scraper project we need the following things:

Selenium library:- Open your terminal or command prompt and run "pip install selenium" to install Selenium.
Web driver:- The second thing we need is the web driver for the browser you are using. I am using Google Chrome, so I'll use the Chrome web driver.
If you are using another browser, you need the web driver for that browser; a quick Google search will find it.

You have to pick the web driver that matches your browser version. For example:

My Chrome browser version is 84.0.4147, so I will select web driver version 84.0.4147.30.

Download it and rename it to "chromedriver.exe".
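
The version check can be sketched as a small helper. This is only an illustration: it assumes that agreement on the first three version components (major, minor, build) is what counts as a match, which is how the example versions above line up. The function name same_build and the mismatching version string are made up.

```python
def same_build(browser_version, driver_version):
    """Return True when the major, minor and build numbers agree."""
    browser_parts = browser_version.split(".")[:3]
    driver_parts = driver_version.split(".")[:3]
    # Require all three components so a bare "84" does not count as a match
    return len(browser_parts) == 3 and browser_parts == driver_parts


# Versions from the example above
print(same_build("84.0.4147", "84.0.4147.30"))
# Made-up driver version that should not match
print(same_build("84.0.4147", "85.0.0.0"))
```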

Now make a folder named "Amazon Scraper". Inside it, create a file named Web_Scraper.py and a folder named "chromedriver", then put your chromedriver.exe in the chromedriver folder.
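
Before running the scraper it can help to confirm that the driver sits where the code will look for it. The sketch below rebuilds the folder layout in a throwaway temporary directory and checks for the file; the helper name driver_path is made up, and the empty file only stands in for the real chromedriver.exe.

```python
import os
import tempfile


def driver_path(project_dir):
    """Build the expected chromedriver.exe path inside the project folder."""
    return os.path.join(project_dir, "chromedriver", "chromedriver.exe")


# Recreate the layout in a throwaway directory to try the check out
with tempfile.TemporaryDirectory() as tmp:
    project = os.path.join(tmp, "Amazon Scraper")
    os.makedirs(os.path.join(project, "chromedriver"))
    open(os.path.join(project, "Web_Scraper.py"), "w").close()

    path = driver_path(project)
    found_before = os.path.isfile(path)  # driver not copied in yet
    open(path, "w").close()              # empty stand-in for the real driver
    found_after = os.path.isfile(path)

print(found_before, found_after)
```

In the real project you would call os.path.isfile(driver_path(os.getcwd())) from inside the Amazon Scraper folder.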

Project Folder Structure

Coding:-

Now that everything is set up, we can start coding. Open the Web_Scraper.py file in your favorite code editor; I am using Jupyter Notebook.

Step 1:- Import the required libraries and modules

import selenium
from selenium import webdriver
import time
import os
from selenium.webdriver.common import keys
import json

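Step 2:- Initialize the web driver with the site Amazon.com

# Initializing the Chrome driver from the chromedriver folder
chrome_driver = os.path.join(os.getcwd(), "chromedriver", "chromedriver.exe")
# Selenium 3 style call; recent Selenium 4 releases expect
# webdriver.Chrome(service=Service(chrome_driver)) instead,
# with Service imported from selenium.webdriver.chrome.service
driver = webdriver.Chrome(chrome_driver)
driver.get("https://www.Amazon.com")

The above code opens the Amazon.com website in a Chrome window controlled by the driver.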

Step 3:- Whenever we want to search for a product, we type its name into the search box and press the search button, and the results are then displayed in a particular area of the web page. We'll grab these elements in the program using the code below.

grabbing the search box

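# Selenium 3 call; Selenium 4.3+ removed it in favor of
# driver.find_element(By.ID, "twotabsearchtextbox"),
# with By imported from selenium.webdriver.common.by
search_box = driver.find_element_by_id("twotabsearchtextbox")
# Type the product name; the trailing "\n" presses Enter to run the search
search_box.send_keys(input("Enter the Product to search")+"\n")

When you run the above code, it asks you to enter the product to search for. Let's say we want to search for smartphones: the moment you press Enter after typing the product name, the browser shows the results for smartphones.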

search results

Step 4:- Now use Inspect Element again and locate the area in which the search results are displayed. The results are displayed in elements with the "s-latency-cf-section" class, so we'll grab those elements.

# Selenium 3 call; on Selenium 4.3+ use driver.find_elements(By.CLASS_NAME, "s-latency-cf-section")
results = driver.find_elements_by_class_name("s-latency-cf-section")

Step 5:- First we grab the links of all the images present in the results. Use the code below to achieve that.

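# Finding the images of the products
# (Selenium 3 call; on Selenium 4.3+ use driver.find_elements(By.TAG_NAME, 'img'))
images = driver.find_elements_by_tag_name('img')
image_list = []
for each_image in images:
    # get_attribute returns None when the alt attribute is missing
    alt_text = each_image.get_attribute('alt')
    if alt_text and alt_text.strip():
        image_list.append(each_image.get_attribute('src'))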
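
The filtering rule in that loop (keep an image only when its alt text is not blank) can be tried out without a browser. In this sketch plain dictionaries stand in for the image elements; the function name pick_image_links and all sample values are made up.

```python
def pick_image_links(images):
    """Keep the src of every image whose alt text is not blank."""
    links = []
    for image in images:
        # A missing alt attribute is treated the same as an empty one
        alt = image.get("alt") or ""
        if alt.strip():
            links.append(image.get("src"))
    return links


# Made-up attribute values standing in for real <img> elements
sample_images = [
    {"alt": "Phone A", "src": "https://example.com/a.jpg"},
    {"alt": "   ", "src": "https://example.com/spacer.gif"},
    {"alt": None, "src": "https://example.com/logo.png"},
    {"alt": "Phone B", "src": "https://example.com/b.jpg"},
]

print(pick_image_links(sample_images))
```

Images with blank or missing alt text are usually decorative (spacers, icons), which is why the filter drops them.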

Products in the results are of two types: either a normal product or a sponsored product.

Attribute sequence for Sponsored product:-

Sponsored
name
ratings
price

Attribute sequence for Normal Product:-

name
ratings
price

Therefore we'll use the code below to store the information of each product in a dictionary.

# The results will be stored in data
data = {'products': []}
i = 0
for each_result in results[1:]:
    each_detail = {}

    try:
        each_result = each_result.text.split('\n')
        if each_result != ['']:
            if each_result[0] == "Sponsored":
                # Sponsored: label, name, ratings, then price from the next two lines
                each_detail['image_url'] = image_list[i]
                each_detail['name'] = each_result[1]
                each_detail['price'] = each_result[3] + each_result[4]
                each_detail['ratings'] = each_result[2]
                data['products'].append(each_detail)
            else:
                # Normal: name, ratings, then price from the next two lines
                each_detail['image_url'] = image_list[i]
                each_detail['name'] = each_result[0]
                each_detail['price'] = each_result[2] + each_result[3]
                each_detail['ratings'] = each_result[1]
                data['products'].append(each_detail)

        i += 1
    except Exception:
        # Skip results that do not follow the expected layout
        pass

print(data)
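
The parsing rule inside the loop above can also be isolated as a small function, which makes it possible to test without a browser. This is a sketch, not the code from the project: the function name parse_result and the sample texts are made up, and it leaves out the image URL because that comes from a separate list.

```python
def parse_result(text):
    """Turn one result's text into a dict, or None if it cannot be parsed."""
    lines = text.split("\n")
    if lines == [""]:
        return None
    # Sponsored results carry one extra leading line
    offset = 1 if lines[0] == "Sponsored" else 0
    if len(lines) < offset + 4:
        return None
    return {
        "name": lines[offset],
        "ratings": lines[offset + 1],
        # The price is joined from two consecutive lines, as in the loop above
        "price": lines[offset + 2] + lines[offset + 3],
    }


# Made-up result texts in the two layouts described earlier
sponsored_text = "Sponsored\nPhone A\n4.5 out of 5 stars\n$199\n99"
normal_text = "Phone B\n4.1 out of 5 stars\n$89\n00"

print(parse_result(sponsored_text))
print(parse_result(normal_text))
print(parse_result(""))
```

Returning None for text that does not fit either layout plays the same role as the try/except in the loop: such results are simply skipped.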

Step 6:- Now we'll store the result in a "search_result.json" file and use the code below to stop the web driver.

driver.quit()
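
Writing the dictionary to search_result.json can be sketched with the json module imported in Step 1. The sample entry below is made up and only mirrors the shape of the data dictionary built in Step 5; the file is written to a temporary directory here, whereas in the project you would write it next to Web_Scraper.py.

```python
import json
import os
import tempfile

# Made-up stand-in for the dictionary built in Step 5
data = {"products": [{"image_url": "https://example.com/a.jpg",
                      "name": "Phone A",
                      "price": "$19999",
                      "ratings": "4.5 out of 5 stars"}]}

out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "search_result.json")

# indent=4 keeps the file readable when opened in an editor
with open(out_path, "w", encoding="utf-8") as fh:
    json.dump(data, fh, indent=4)

# Read the file back to confirm it round-trips
with open(out_path, encoding="utf-8") as fh:
    loaded = json.load(fh)
print(loaded == data)
```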

Demonstration:- Watch this video to see how it works.

Project Link:-

https://github.com/SurajGuptaRavi/Amazon-Web-Scraper

Thank You For Reading
