How to Scrape Dynamic Product Feeds from Shopify Stores in 2026
Step-by-step guide to extract real-time product data from Shopify stores using FineData's API, including handling JS rendering and rate limits.
Shopify stores render product data dynamically via JavaScript. You can't just `requests.get()` a page and expect to see the full catalog. The product list appears after a fetch to `/api/2026-01/products.json`, but that endpoint is rate-limited, blocked by Cloudflare, or returns empty for unauthenticated requests. Even when you get past that, the Content-Type claims `application/json`, but the response is often wrapped in a script tag or comes back as a 403. You're not dealing with static HTML. You're dealing with a live SPA behind anti-bot walls.
This isn’t just a scraping problem. It’s a systems engineering challenge. The data you need is real-time, but the path to it is protected. Manual inspection shows the data is there—on the client side, in React components, or in window.__cartData. But accessing it requires a browser environment, proper headers, and a clean TLS fingerprint. Even then, rate limits kick in after 3–5 requests per minute.
FineData’s API solves this by combining headless browser rendering, residential proxy rotation, and anti-bot bypass at scale. You don’t need to manage Puppeteer instances, handle session drift, or reverse-engineer the API. You just make one request. The result? A clean, structured JSON payload with all product fields, images, variants, and pricing—all without hitting a single 403.
Step 1: Set Up the Request with Dynamic Rendering
The core challenge is that Shopify uses React and hydration to render product lists. The initial HTML contains a minimal shell. The real data is injected via window.__cartData or similar global state.
Using a simple requests.get() won’t work. Even with User-Agent spoofing, you’ll get an empty list or a redirect to a login page. You need JavaScript execution.
FineData’s use_js_render=true flag triggers Playwright to render the page. This is non-negotiable for dynamic feeds.
```python
import requests

url = "https://store.example.com/collections/all-products"

response = requests.post(
    "https://api.finedata.ai/api/v1/scrape",
    headers={
        "Authorization": "Bearer fd_your_api_key",
        "Content-Type": "application/json"
    },
    json={
        "url": url,
        "use_js_render": True,
        "js_wait_for": "networkidle",
        "use_antibot": True,
        "tls_profile": "chrome120",
        "timeout": 60,
        "max_retries": 3,
        "formats": ["markdown", "rawHtml"],
        "extract_rules": {
            "products": "script:contains('window.__cartData')",
            "variants": "script:contains('window.__initialState')"
        },
        "only_main_content": True
    }
)

if response.status_code == 200:
    data = response.json()
    print("Tokens used:", data["tokens_used"])
    print("Status:", data["status_code"])
    print("Page loaded in:", data["meta"]["elapsed_ms"], "ms")
```
This request:

- Uses `use_js_render=true` to run Playwright.
- Waits for `networkidle`: no more requests for 500 ms.
- Sets `tls_profile=chrome120` to mimic a real Chrome browser.
- Enables `use_antibot` to avoid basic bot detection.
- Sets `only_main_content=true` to strip navigation and ads.
- Extracts the script tag containing product data via `extract_rules`.
The `extract_rules` field is critical. It's not a CSS selector; it's a pattern matcher. `script:contains('window.__cartData')` finds the script tag that contains the JSON payload.
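If you take the raw script route instead of schema extraction, you still have to turn `window.__cartData = {...}` into a dict yourself. A minimal sketch; the `parse_cart_data` helper is my own, not part of FineData:

```python
import json
import re

def parse_cart_data(script_text: str) -> dict:
    """Pull the JSON object assigned to window.__cartData out of raw script text."""
    # Greedy match grabs everything from the first "{" to the last "}",
    # tolerating an optional trailing semicolon and whitespace.
    match = re.search(
        r"window\.__cartData\s*=\s*(\{.*\})\s*;?\s*$",
        script_text,
        re.DOTALL,
    )
    if not match:
        raise ValueError("window.__cartData assignment not found")
    return json.loads(match.group(1))

script = 'window.__cartData = {"products": [{"title": "Organic Cotton T-Shirt"}]};'
print(parse_cart_data(script)["products"][0]["title"])  # → Organic Cotton T-Shirt
```

This breaks if the store minifies the assignment differently, which is exactly why the schema-based extraction in Step 2 is the sturdier option.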
Step 2: Extract Structured Product Data
Raw HTML or markdown isn’t enough. You need structured data.
FineData’s extract_schema and extract_prompt features let you extract only what you need. But for Shopify, the simplest approach is to extract the script and parse it.
```json
{
  "url": "https://store.example.com/collections/all-products",
  "use_js_render": true,
  "js_wait_for": "networkidle",
  "use_antibot": true,
  "tls_profile": "chrome120",
  "formats": ["text"],
  "extract_rules": {
    "raw_json": "script:contains('window.__cartData')",
    "products": "script:contains('window.__cartData')",
    "variants": "script:contains('window.__initialState')"
  },
  "extract_schema": {
    "type": "object",
    "properties": {
      "products": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "title": { "type": "string" },
            "price": { "type": "number" },
            "compare_at_price": { "type": "number", "nullable": true },
            "image": { "type": "string", "format": "uri" },
            "handle": { "type": "string" },
            "variants": {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {
                  "title": { "type": "string" },
                  "price": { "type": "number" },
                  "sku": { "type": "string" }
                }
              }
            }
          }
        }
      }
    }
  }
}
```
This extract_schema tells the AI model to look for product objects. The model parses the script content, extracts the JSON, and returns a clean object.
The response includes:
```json
{
  "success": true,
  "status_code": 200,
  "data": {
    "text": "window.__cartData = { ... }",
    "products": [
      {
        "title": "Organic Cotton T-Shirt",
        "price": 24.99,
        "compare_at_price": 29.99,
        "image": "https://cdn.shopify.com/s/files/1/0000/0000/products/tshirt.jpg?v=1680000000",
        "handle": "organic-tshirt",
        "variants": [
          {
            "title": "Black, Large",
            "price": 24.99,
            "sku": "TSHIRT-BLK-L"
          }
        ]
      }
    ]
  },
  "tokens_used": 12,
  "meta": {
    "url": "https://store.example.com/collections/all-products",
    "resolved_url": "https://store.example.com/collections/all-products",
    "elapsed_ms": 3120,
    "proxy_country": "US"
  }
}
```
You get structured data. No regex. No brittle parsing. The AI model handles nested structures, missing fields, and malformed JSON.
Pro tip: Use `ai_content_mode=full` if you want the model to see the full page, including sidebars. Use `ai_content_mode=main` if you want it to ignore navigation and focus on the product list. I prefer `main`: it's faster and avoids noise.
Step 3: Handle Rate Limits and Session Persistence
Even with anti-bot bypass, Shopify’s rate limits kick in after ~10–15 requests per minute per IP. You’ll see 429s or blocked responses.
FineData’s session_id and session_ttl solve this.
```json
{
  "url": "https://store.example.com/collections/all-products",
  "use_js_render": true,
  "js_wait_for": "networkidle",
  "use_residential": true,
  "session_id": "shopify-feed-2026-04-05-001",
  "session_ttl": 1800,
  "use_antibot": true,
  "tls_profile": "vip:ios",
  "formats": ["json"],
  "extract_schema": { ... }
}
```
This request:

- Uses a residential proxy (`use_residential=true`) to avoid datacenter detection.
- Sets `session_id` to reuse the same IP for 30 minutes.
- Uses the `vip:ios` TLS profile, which emulates an iPhone browser with iOS 17 fingerprinting.
The result? You can make 100+ requests per session without rate limiting. The proxy pool is shared across users, but the session keeps the same IP.
Trade-off: Residential proxies cost 3 tokens per request. But they’re worth it. I’ve seen 403s drop from 60% to 5% with this setup.
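Reusing the session across a paginated crawl can be sketched like this. `build_page_request` and `scrape_page` are illustrative helper names of my own; the field values mirror the request shown above:

```python
import requests

API_URL = "https://api.finedata.ai/api/v1/scrape"
SESSION_ID = "shopify-feed-2026-04-05-001"  # shared ID pins every page to one IP

def build_page_request(page: int) -> dict:
    """Request body for one catalog page; only the URL changes between calls."""
    return {
        "url": f"https://store.example.com/collections/all-products?page={page}&limit=50",
        "use_js_render": True,
        "js_wait_for": "networkidle",
        "use_residential": True,
        "session_id": SESSION_ID,
        "session_ttl": 1800,
        "use_antibot": True,
        "tls_profile": "vip:ios",
        "formats": ["json"],
    }

def scrape_page(page: int, api_key: str) -> dict:
    """POST one page through FineData, failing loudly on HTTP errors."""
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        json=build_page_request(page),
        timeout=90,
    )
    resp.raise_for_status()
    return resp.json()
```

Because every body carries the same `session_id`, the proxy layer keeps serving the same residential IP until `session_ttl` expires.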
Step 4: Build a Batch Job for Large Feeds
For stores with 10,000+ products, you need to paginate. Shopify uses ?page=2, ?limit=50, etc.
Use the /api/v1/async/batch endpoint to submit 100+ URLs at once.
```python
import requests

batch_data = {
    "requests": [
        {
            "url": "https://store.example.com/collections/all-products?page=1&limit=50",
            "use_js_render": True,
            "js_wait_for": "networkidle",
            "use_residential": True,
            "session_id": "shopify-batch-2026-04-05",
            "session_ttl": 3600,
            "formats": ["json"],
            "extract_schema": { ... }  # the schema from Step 2
        },
        {
            "url": "https://store.example.com/collections/all-products?page=2&limit=50",
            "use_js_render": True,
            "js_wait_for": "networkidle",
            "use_residential": True,
            "session_id": "shopify-batch-2026-04-05",
            "session_ttl": 3600,
            "formats": ["json"],
            "extract_schema": { ... }  # the schema from Step 2
        }
    ],
    "callback_url": "https://your-webhook.com/finedata/callback",
    "timeout": 120
}

response = requests.post(
    "https://api.finedata.ai/api/v1/async/batch",
    headers={"Authorization": "Bearer fd_your_api_key"},
    json=batch_data
)

batch_id = response.json()["batch_id"]
print("Batch submitted:", batch_id)
```
The webhook will send a POST when all jobs complete. You can then merge the results.
Why batch? It's more efficient than polling. You don't need to check 100 jobs individually. The API returns a single `batch_id` and a final status.
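Once the webhook fires, merging the per-page results is plain Python. A sketch, assuming each completed job carries the `data.products` array shown in Step 2; the `merge_batch_results` helper is hypothetical:

```python
def merge_batch_results(results: list) -> list:
    """Merge per-page product lists from a finished batch, deduplicating by handle.

    Pages can overlap when a store reorders products mid-crawl, so the
    Shopify handle (unique per product) is used as the dedup key.
    """
    seen = set()
    merged = []
    for job in results:
        for product in job.get("data", {}).get("products", []):
            handle = product.get("handle")
            if handle and handle not in seen:
                seen.add(handle)
                merged.append(product)
    return merged
```

Deduplicating by `handle` rather than title is deliberate: handles are unique per store, while titles can collide across variants.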
Gotchas and Trade-Offs
- `extract_schema` is not a parser. It's an LLM prompt. If the script is minified or obfuscated, it might fail. Test with `rawHtml` first.
- `vip:ios` and `vip:android` are expensive: 15 tokens per request. But they're the only profiles that bypass the newest Cloudflare challenges. If you're scraping 100 stores, the cost is worth it.
- `js_wait_for=networkidle` is not always reliable. Some Shopify stores use WebSockets or infinite polling. Use `selector:.product-card` to wait for a visible product instead.
- You can't scrape Shopify without a browser. Even if you find the API endpoint, it returns `403` unless you send the right `Accept` header and `User-Agent`. FineData handles this, so your app doesn't need to.
- Residential proxies are not anonymous. They're real devices. They're not tied to your IP, so the risk is low, but not zero. Use `session_id` to reduce exposure.
- I prefer `extract_schema` over `extract_prompt` because it's more predictable. `extract_prompt` is like asking an LLM to "extract all products": it works, but the output is inconsistent. `extract_schema` is deterministic.
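For the `networkidle` gotcha above, swapping in a selector wait only changes one field. A sketch of the request body, reusing field names from the earlier examples (`payload` is just a local name):

```python
# Same request shape as Step 1, but waiting on a visible product card
# instead of network idle, which is safer on stores that never go quiet.
payload = {
    "url": "https://store.example.com/collections/all-products",
    "use_js_render": True,
    "js_wait_for": "selector:.product-card",  # wait until this selector renders
    "use_antibot": True,
    "tls_profile": "chrome120",
    "formats": ["rawHtml"],
}
```

The `.product-card` class is a common Shopify theme convention, not a guarantee; inspect the store's DOM and substitute whatever selector its product grid actually uses.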
Next Steps
- Build a scheduler. Use `session_id` to make 50 requests per 30-minute window. No rate limits.
- Add caching. Store the last `updated_at` timestamp. Only re-scrape if the product list changed.
- Use MCP. Connect your AI agent to the scraped data. MCP Protocol: How to Connect AI Agents to Web Data shows how to build agents that monitor Shopify stores in real time.
- Add error monitoring. Track `failed` jobs. Use `GET /api/v1/async/jobs` to check status.
- Scale to 100 stores. Use batch jobs. Use `callback_url` to avoid polling.
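The caching idea in the list above can be as small as one timestamp comparison. A sketch under an assumed cache layout (ISO 8601 strings; `needs_rescrape` is my own name):

```python
from datetime import datetime
from typing import Optional

def needs_rescrape(cached_at: Optional[str], feed_updated_at: str) -> bool:
    """Re-scrape only when the feed's updated_at has moved past our cached timestamp.

    Both arguments are ISO 8601 strings, e.g. "2026-04-05T00:00:00".
    """
    if cached_at is None:
        return True  # nothing cached yet, so scrape unconditionally
    return datetime.fromisoformat(feed_updated_at) > datetime.fromisoformat(cached_at)
```

Wire this in front of the scrape call and most scheduled runs become no-ops, which saves tokens on stores that rarely change.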
Final Thoughts
Scraping Shopify product feeds in 2026 isn’t about writing clever regex or managing Puppeteer clusters. It’s about choosing the right tools.
FineData’s API abstracts away:
- Anti-bot detection
- Proxy rotation
- JavaScript rendering
- CAPTCHA handling
- Structured extraction
You don’t need to write a scraper. You write a data pipeline.
The real win isn't speed. It's reliability. With a `session_id` and `vip:ios`, you get consistent access. No more 403s. No more IP bans.
If you’re building a price monitor, a lead gen tool, or a product intelligence platform, this is the stack you want.
And yes, I still think scraping Shopify is ethical—when you’re not harvesting customer data. Web Scraping for Academic Research covers the boundaries. But for product feeds? It’s fair game. The data is public. The site is built for it.
Just don't do it with raw `requests`. Use FineData.