How to Scrape Amazon Product Data with Python in 2026
Learn how to extract Amazon product data including titles, prices, reviews, and ratings using Python. Complete tutorial with code examples.
Amazon is the world’s largest online marketplace with over 350 million products. Whether you’re building a price comparison tool, doing competitive research, or feeding an analytics pipeline, Amazon product data is incredibly valuable. But extracting it programmatically is one of the hardest web scraping challenges out there.
In this guide, you’ll learn how to scrape Amazon product data reliably using Python and the FineData API, from a single product page to thousands of listings at scale.
Why Scraping Amazon Is So Challenging
Amazon invests heavily in anti-bot technology. If you’ve tried scraping Amazon with a simple requests.get(), you’ve likely seen one of these:
- CAPTCHA pages — Amazon serves CAPTCHAs aggressively to suspected bots
- IP bans — Datacenter IPs get blocked within a few dozen requests
- Dynamic content — Product details, reviews, and pricing are loaded via JavaScript
- Request fingerprinting — Amazon inspects TLS fingerprints, headers, and browser characteristics
- Rate limiting — Even with rotating proxies, too many requests trigger throttling
A naive approach might work for 10 requests, but it will fail at any meaningful scale. Let’s build something that actually works.
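For reference, here's roughly what that naive approach looks like. Treat it as a sketch of what not to do: the exact failure mode varies, but you'll typically see a 503, or a 200 whose body is a CAPTCHA interstitial.

import requests

# The naive approach: works for a handful of requests, then gets blocked
resp = requests.get(
    "https://www.amazon.com/dp/B0BSHF7WHW",
    headers={"User-Agent": "Mozilla/5.0"},
)
print(resp.status_code)                 # often 503 once Amazon flags you
print("captcha" in resp.text.lower())   # frequently True even on a 200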
Setting Up Your Environment
First, install the dependencies:
pip install requests beautifulsoup4
You’ll also need a FineData API key. Sign up at finedata.ai and grab your key from the dashboard.
import requests
from bs4 import BeautifulSoup
import json
import time

FINEDATA_API_KEY = "fd_your_api_key"
FINEDATA_URL = "https://api.finedata.ai/api/v1/scrape"

def scrape_page(url, use_js=False):
    """Fetch a page through FineData's API."""
    response = requests.post(
        FINEDATA_URL,
        headers={
            "x-api-key": FINEDATA_API_KEY,
            "Content-Type": "application/json"
        },
        json={
            "url": url,
            "use_js_render": use_js,
            "tls_profile": "chrome124",
            "use_residential": True,
            "timeout": 30
        }
    )
    response.raise_for_status()
    return response.json()
We’re using use_residential: True because Amazon blocks most datacenter IPs. Residential proxies rotate through real consumer IP addresses, which Amazon treats as legitimate traffic.
Extracting Product Data from a Single Page
Let’s start with the core task: extracting structured data from an Amazon product page.
def parse_product_page(html):
    """Extract product details from Amazon product page HTML."""
    soup = BeautifulSoup(html, "html.parser")
    product = {}

    # Product title
    title_el = soup.select_one("#productTitle")
    product["title"] = title_el.get_text(strip=True) if title_el else None

    # Price — Amazon uses multiple price containers
    price_el = (
        soup.select_one(".a-price .a-offscreen")
        or soup.select_one("#priceblock_ourprice")
        or soup.select_one("#priceblock_dealprice")
        or soup.select_one(".a-price-whole")
    )
    product["price"] = price_el.get_text(strip=True) if price_el else None

    # Rating (e.g., "4.5 out of 5 stars")
    rating_el = soup.select_one("#acrPopover .a-icon-alt")
    if rating_el:
        rating_text = rating_el.get_text(strip=True)
        try:
            product["rating"] = float(rating_text.split(" ")[0])
        except ValueError:  # non-numeric text, e.g. on localized pages
            product["rating"] = None
    else:
        product["rating"] = None

    # Number of reviews (e.g., "1,234 ratings")
    reviews_el = soup.select_one("#acrCustomerReviewText")
    if reviews_el:
        reviews_text = reviews_el.get_text(strip=True)
        try:
            product["review_count"] = int(
                reviews_text.split(" ")[0].replace(",", "")
            )
        except ValueError:
            product["review_count"] = None
    else:
        product["review_count"] = None

    # Availability
    avail_el = soup.select_one("#availability span")
    product["availability"] = (
        avail_el.get_text(strip=True) if avail_el else None
    )

    # Product images: URLs are the keys of the data-a-dynamic-image JSON
    images = []
    img_block = soup.select_one("#imgTagWrapperId img")
    if img_block and img_block.get("data-a-dynamic-image"):
        img_data = json.loads(img_block["data-a-dynamic-image"])
        images = list(img_data.keys())
    product["images"] = images

    # Feature bullets
    bullets = soup.select("#feature-bullets .a-list-item")
    product["features"] = [
        b.get_text(strip=True) for b in bullets
        if b.get_text(strip=True)
    ]

    return product
Now put it together:
def scrape_amazon_product(asin):
    """Scrape a single Amazon product by ASIN."""
    url = f"https://www.amazon.com/dp/{asin}"
    result = scrape_page(url, use_js=True)
    html = result["body"]
    product = parse_product_page(html)
    product["asin"] = asin
    product["url"] = url
    return product

# Example usage
product = scrape_amazon_product("B0BSHF7WHW")
print(json.dumps(product, indent=2))
We enable JavaScript rendering (use_js=True) because Amazon loads pricing and availability dynamically. Without it, you’ll often get incomplete data.
Handling Product Variations
Many Amazon products have variations — different sizes, colors, or configurations. Switching between them in the browser fires AJAX requests, but the mapping from variation ASINs to their attributes ships as a JavaScript object embedded in the page source, so you can extract it directly from the HTML you already have.
import re

def extract_variations(html):
    """Extract product variation data from page source."""
    variations = []
    # Amazon embeds variation data in a JS object
    pattern = r'"dimensionValuesDisplayData"\s*:\s*(\{[^}]+\})'
    match = re.search(pattern, html)
    if match:
        try:
            dim_data = json.loads(match.group(1))
            for asin, values in dim_data.items():
                variations.append({
                    "asin": asin,
                    "attributes": values
                })
        except json.JSONDecodeError:
            pass
    return variations
For a complete picture, you’d scrape each variation’s ASIN separately — but this gives you the list of available variations without extra requests.
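If you do need the full picture, here's a minimal sketch of that follow-up pass, reusing scrape_page, parse_product_page, and scrape_amazon_product from earlier. It costs one extra request per variation, so pace it accordingly.

def scrape_all_variations(asin):
    """Scrape a product page plus each of its variation ASINs."""
    url = f"https://www.amazon.com/dp/{asin}"
    result = scrape_page(url, use_js=True)
    html = result["body"]
    products = [parse_product_page(html)]
    for var in extract_variations(html):
        if var["asin"] != asin:  # skip the ASIN we already fetched
            products.append(scrape_amazon_product(var["asin"]))
            time.sleep(1)  # pace the follow-up requests
    return products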
Scraping Search Results for Product Discovery
Often you don’t have specific ASINs — you want to discover products by searching Amazon. Here’s how to scrape Amazon search results:
def scrape_amazon_search(query, max_pages=3):
    """Scrape Amazon search results for a given query."""
    all_products = []
    for page in range(1, max_pages + 1):
        url = (
            f"https://www.amazon.com/s"
            f"?k={query.replace(' ', '+')}&page={page}"
        )
        result = scrape_page(url, use_js=True)
        html = result["body"]
        soup = BeautifulSoup(html, "html.parser")
        items = soup.select('[data-component-type="s-search-result"]')
        for item in items:
            product = {}

            # ASIN from data attribute
            product["asin"] = item.get("data-asin", "")

            # Title
            title_el = item.select_one("h2 a span")
            product["title"] = (
                title_el.get_text(strip=True) if title_el else None
            )

            # Price — whole and fractional parts live in separate elements
            price_whole = item.select_one(".a-price-whole")
            price_frac = item.select_one(".a-price-fraction")
            if price_whole:
                price_str = price_whole.get_text(strip=True).rstrip(".")
                if price_frac:
                    price_str += "." + price_frac.get_text(strip=True)
                product["price"] = float(price_str.replace(",", ""))
            else:
                product["price"] = None

            # Rating
            rating_el = item.select_one(".a-icon-alt")
            if rating_el:
                try:
                    product["rating"] = float(
                        rating_el.get_text().split(" ")[0]
                    )
                except ValueError:
                    product["rating"] = None
            else:
                product["rating"] = None

            # Review count
            reviews_el = item.select_one(
                '[aria-label*="stars"] + span .a-size-base'
            )
            if reviews_el:
                text = reviews_el.get_text(strip=True).replace(",", "")
                product["review_count"] = int(text) if text.isdigit() else None
            else:
                product["review_count"] = None

            product["url"] = (
                f"https://www.amazon.com/dp/{product['asin']}"
            )
            all_products.append(product)

        # Be polite — add a delay between pages
        time.sleep(2)
    return all_products

# Search for wireless earbuds
results = scrape_amazon_search("wireless earbuds", max_pages=2)
print(f"Found {len(results)} products")
for p in results[:5]:
    print(f"  {(p['title'] or '')[:60]}... — ${p['price']}")
Scaling to Thousands of Products
When you need to scrape hundreds or thousands of product pages, sequential requests are too slow. FineData supports batch scraping to parallelize the work:
def scrape_products_batch(asins, batch_size=20):
    """Scrape multiple products using FineData's batch endpoint."""
    all_products = []
    for i in range(0, len(asins), batch_size):
        batch = asins[i:i + batch_size]
        urls = [f"https://www.amazon.com/dp/{asin}" for asin in batch]

        response = requests.post(
            "https://api.finedata.ai/api/v1/batch",
            headers={
                "x-api-key": FINEDATA_API_KEY,
                "Content-Type": "application/json"
            },
            json={
                "urls": urls,
                "use_js_render": True,
                "use_residential": True
            }
        )
        response.raise_for_status()
        batch_result = response.json()

        # Poll until the batch finishes, then collect successful jobs
        batch_id = batch_result["batch_id"]
        while True:
            status_resp = requests.get(
                f"https://api.finedata.ai/api/v1/batch/{batch_id}",
                headers={"x-api-key": FINEDATA_API_KEY}
            )
            status = status_resp.json()
            if status["status"] == "completed":
                for job in status["results"]:
                    if job["status"] == "completed":
                        product = parse_product_page(job["body"])
                        product["url"] = job["url"]
                        all_products.append(product)
                break
            time.sleep(5)

        print(f"Processed {min(i + batch_size, len(asins))}/{len(asins)}")
    return all_products
This processes 20 URLs at a time in parallel, dramatically reducing total scraping time.
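A natural way to combine the pieces: discover ASINs with the search scraper, then feed them to the batch endpoint. For example:

# Discover products via search, then fetch their detail pages in batches
results = scrape_amazon_search("wireless earbuds", max_pages=2)
asins = [p["asin"] for p in results if p["asin"]]
products = scrape_products_batch(asins, batch_size=20)
print(f"Scraped {len(products)} product pages")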
Best Practices for Amazon Scraping
1. Use Residential Proxies
Amazon’s anti-bot system is sophisticated enough to detect most datacenter IP ranges. Residential proxies are essential for consistent results. FineData’s use_residential flag handles this automatically.
2. Enable JavaScript Rendering
Amazon dynamically loads prices, availability, and review data. Always use use_js_render: True for product pages to get complete data.
3. Respect Rate Limits
Even with rotating residential proxies, hammering Amazon with hundreds of requests per second will trigger blocks. Add delays between requests (1-3 seconds for sequential, or use batch endpoints that handle pacing for you).
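A randomized delay also reads as less mechanical than a fixed interval. A minimal sketch:

import random

def polite_sleep(base=1.0, jitter=2.0):
    """Sleep for base plus up to `jitter` extra seconds (1-3s by default)."""
    time.sleep(base + random.uniform(0, jitter))

# Between sequential product requests:
for asin in ["B0BSHF7WHW"]:  # example ASIN from earlier in this guide
    product = scrape_amazon_product(asin)
    polite_sleep()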
4. Handle Edge Cases
Amazon product pages aren’t uniform. Some products have:
- Multiple sellers with different prices (Buy Box vs. other offers)
- Subscribe & Save pricing
- Lightning deals with countdown timers
- Out-of-stock items with no price
Build your parser to gracefully handle missing elements rather than crashing.
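One pattern that helps: a tiny helper that returns None when a selector misses, so a layout change degrades to a missing field instead of an exception. A sketch:

def safe_text(soup, selector):
    """Return stripped text for the first match, or None if absent."""
    el = soup.select_one(selector)
    return el.get_text(strip=True) if el else None

# e.g. inside parse_product_page:
# product["title"] = safe_text(soup, "#productTitle")
# product["availability"] = safe_text(soup, "#availability span")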
5. Cache Results
Product data doesn’t change every second. Cache results for at least 15-30 minutes to avoid unnecessary requests and token usage.
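A minimal in-memory TTL cache keyed by URL is enough for a single process; swap in Redis or SQLite for anything long-running. A sketch wrapping the scrape_page function from earlier:

_cache = {}  # url -> (fetched_at, result)
CACHE_TTL = 30 * 60  # 30 minutes, per the guideline above

def scrape_page_cached(url, use_js=False):
    """Return a fresh cached result if available, else scrape and cache."""
    now = time.time()
    hit = _cache.get(url)
    if hit and now - hit[0] < CACHE_TTL:
        return hit[1]
    result = scrape_page(url, use_js=use_js)
    _cache[url] = (now, result)
    return result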
Token Cost Estimation
Here’s what a typical Amazon scraping session costs in FineData tokens:
| Operation | Tokens per Request | Notes |
|---|---|---|
| Base request | 1 | Always charged |
| JS rendering | +5 | Needed for Amazon |
| Residential proxy | +3 | Recommended for Amazon |
| Total per page | 9 | |
For 1,000 product pages, that’s 9,000 tokens. If you encounter CAPTCHAs (rare with residential proxies), add 10 tokens per CAPTCHA solve.
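As a sanity check, the same arithmetic as a helper (token figures are the ones from the table above):

def estimate_tokens(pages, js=True, residential=True, captcha_solves=0):
    """Estimate FineData token cost from the per-request figures above."""
    per_page = 1 + (5 if js else 0) + (3 if residential else 0)
    return pages * per_page + captcha_solves * 10

print(estimate_tokens(1000))  # 9000 tokens for 1,000 pages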
Storing Your Data
Once you’ve scraped product data, you’ll want to store it. Here’s a quick example using SQLite:
import sqlite3

def init_db():
    conn = sqlite3.connect("amazon_products.db")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS products (
            asin TEXT PRIMARY KEY,
            title TEXT,
            price REAL,
            rating REAL,
            review_count INTEGER,
            availability TEXT,
            features TEXT,
            scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        )
    """)
    conn.commit()
    return conn

def save_product(conn, product):
    # Note: parse_product_page returns price as a string (e.g. "$29.99").
    # Strip currency symbols and cast to float before inserting if you
    # want the REAL column to behave in numeric queries.
    conn.execute("""
        INSERT OR REPLACE INTO products
        (asin, title, price, rating, review_count, availability, features)
        VALUES (?, ?, ?, ?, ?, ?, ?)
    """, (
        product.get("asin"),
        product.get("title"),
        product.get("price"),
        product.get("rating"),
        product.get("review_count"),
        product.get("availability"),
        json.dumps(product.get("features", []))
    ))
    conn.commit()
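Tying it together with the functions from earlier:

# Scrape a product and persist it
conn = init_db()
product = scrape_amazon_product("B0BSHF7WHW")
save_product(conn, product)
conn.close()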
For a more complete data pipeline with scheduling and alerts, check out our guide on building a price monitoring tool.
Key Takeaways
- Amazon is one of the hardest sites to scrape due to aggressive anti-bot measures, CAPTCHAs, and dynamic content loading.
- Residential proxies and JavaScript rendering are essential for reliable Amazon scraping.
- Structure your scraper to handle missing elements gracefully — Amazon product pages vary significantly.
- Use batch scraping to parallelize requests when working with large product lists.
- Cache results and add delays between requests to stay under the radar and minimize token usage.
- Store scraped data in a database for analysis and tracking over time.
Ready to start scraping Amazon data? Sign up for FineData and get free tokens to try it out. For more advanced patterns, check out our tutorial on handling CAPTCHAs and our API documentation.