How to Scrape JavaScript-Heavy Websites and SPAs
Learn why traditional scrapers fail on SPAs and how to scrape React, Vue, and Angular sites using JavaScript rendering and wait strategies.
The modern web runs on JavaScript. React, Vue, Angular, Next.js, Nuxt — the majority of websites built in the last five years render their content dynamically in the browser. When you fetch these pages with a standard HTTP request, you get an empty shell: a `<div id="root"></div>` and a bunch of `<script>` tags.
This is the single biggest challenge in web scraping today. Here’s how to solve it.
Why Traditional Scrapers Fail on SPAs
When you make a request with Python’s requests library, here’s what happens:
```python
import requests
from bs4 import BeautifulSoup

response = requests.get("https://some-react-app.com/products")
soup = BeautifulSoup(response.text, "html.parser")

products = soup.select(".product-card")
print(len(products))  # 0 — nothing found
```
The response contains something like:
```html
<!DOCTYPE html>
<html>
  <head><title>Products</title></head>
  <body>
    <div id="root"></div>
    <script src="/static/js/bundle.js"></script>
  </body>
</html>
```
All the actual content — product cards, prices, images — gets injected by JavaScript after the page loads. The `requests` library downloads the HTML but never executes the JavaScript.
This affects a huge number of modern sites:
- React / Next.js — Most e-commerce stores, dashboards, SaaS products
- Vue / Nuxt — News sites, marketplaces, booking platforms
- Angular — Enterprise applications, government portals
- Svelte / SvelteKit — Newer sites and tools
- Any site using client-side rendering (CSR)
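Before reaching for a renderer, it’s worth confirming that a page really is client-rendered. A minimal stdlib-only sketch (the 200-character threshold and the mount-point IDs `root`, `app`, and `__next` are heuristics of ours, not a standard):

```python
import re

def looks_client_rendered(html: str) -> bool:
    """Rough heuristic: a near-empty body plus a framework mount
    point (#root, #app, #__next) suggests client-side rendering."""
    # Drop script/style contents, then strip all remaining tags
    text = re.sub(r"<(script|style)\b.*?</\1>", "", html, flags=re.S | re.I)
    text = re.sub(r"<[^>]+>", " ", text)
    visible = "".join(text.split())
    has_mount = bool(re.search(r'id=["\'](root|app|__next)["\']', html, re.I))
    # The 200-character cutoff is arbitrary; tune it per site
    return has_mount and len(visible) < 200
```

Run it against `response.text` from a plain GET: an empty shell like the one above returns `True`, while a fully server-rendered page with real body text returns `False`.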
The Traditional Solution: Headless Browsers
The classic approach is to run a headless browser — a real browser without a visible window:
```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("--headless")
driver = webdriver.Chrome(options=options)

driver.get("https://some-react-app.com/products")

# Wait for content to load
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, ".product-card"))
)

products = driver.find_elements(By.CSS_SELECTOR, ".product-card")
print(len(products))  # Now we get results

driver.quit()
```
This works, but it comes with significant drawbacks:
- Resource intensive — each browser instance uses 200-500 MB of RAM
- Slow — pages take 3-10 seconds to fully render
- Fragile — browser crashes, memory leaks, and zombie processes
- Detectable — sites can detect Selenium via `navigator.webdriver` and other fingerprints
- Hard to scale — running 50 concurrent browsers needs serious infrastructure
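The scaling drawback is easy to quantify. A back-of-envelope estimate, assuming a mid-range 300 MB per instance from the figures above:

```python
# Rough cost of scaling headless browsers, taking the middle
# of the 200-500 MB per-instance range cited above
instances = 50
ram_per_instance_mb = 300
total_gb = instances * ram_per_instance_mb / 1024
print(f"{total_gb:.1f} GB of RAM just for browser processes")
```

That is before accounting for CPU load during rendering, restart logic for crashed instances, and proxy rotation.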
How FineData’s JS Rendering Works
FineData handles JavaScript rendering on its infrastructure, so you don’t need to manage headless browsers. You send a request with `use_js_render` set to `true`, and FineData:
- Loads the page in a real browser environment
- Executes all JavaScript (React, Vue, etc.)
- Waits for the content to finish rendering
- Returns the fully-rendered HTML
```python
import requests
from bs4 import BeautifulSoup

FINEDATA_API_KEY = "fd_your_api_key"

response = requests.post(
    "https://api.finedata.ai/api/v1/scrape",
    headers={
        "x-api-key": FINEDATA_API_KEY,
        "Content-Type": "application/json"
    },
    json={
        "url": "https://some-react-app.com/products",
        "use_js_render": True,
        "timeout": 30
    }
)

data = response.json()
soup = BeautifulSoup(data["body"], "html.parser")

products = soup.select(".product-card")
print(len(products))  # Works — JS was rendered server-side
```
No browser management, no Selenium drivers, no memory leaks. The rendered HTML comes back in the API response just like a regular HTTP request.
Wait Strategies: Getting the Timing Right
The trickiest part of JS rendering is knowing when the page is “done.” SPAs load data asynchronously — the initial HTML renders, then API calls fetch product data, which then gets rendered into the DOM. You need to wait for all of this to complete.
FineData supports several wait strategies:
Network Idle (Default)
```json
{
  "url": "https://example.com",
  "use_js_render": true,
  "js_wait_for": "networkidle"
}
```
This waits until there are no more network requests for 500ms. It’s the safest default — most SPAs load data immediately on page render, and once those API calls finish, the content is ready.
Best for: Most SPAs, e-commerce sites, dashboards
DOM Content Loaded
```json
{
  "url": "https://example.com",
  "use_js_render": true,
  "js_wait_for": "domcontentloaded"
}
```
Returns as soon as the initial HTML is parsed, without waiting for stylesheets, images, or subframes. This is faster but may miss dynamically loaded content.
Best for: Server-side rendered pages (Next.js SSR, Nuxt SSR) where the content is in the initial HTML but some JS enhancement runs after
Selector-Based Waiting
```json
{
  "url": "https://example.com/products",
  "use_js_render": true,
  "js_wait_for": "selector:.product-card"
}
```
This waits until a specific CSS selector appears in the DOM. It’s the most precise strategy — you’re telling FineData exactly what element signals that the page is ready.
Best for: Pages where you know the exact element that indicates content has loaded
Full Page Load
```json
{
  "url": "https://example.com",
  "use_js_render": true,
  "js_wait_for": "load"
}
```
Waits for the `window.onload` event, which fires after all resources (images, stylesheets, iframes) have finished loading.
Best for: Pages where images or iframes contain important data
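The four strategies above can be wrapped in a small helper. Note that `build_render_payload` and `fetch_rendered` are our own convenience functions, not part of any FineData SDK; the endpoint and field names are the ones used throughout this article:

```python
import requests

FINEDATA_API_KEY = "fd_your_api_key"  # placeholder, as in the examples above

VALID_WAITS = {"networkidle", "domcontentloaded", "load"}

def build_render_payload(url, wait="networkidle", timeout=30):
    """Assemble a request body for one of the wait strategies above.
    'selector:<css>' is the only free-form strategy."""
    if wait not in VALID_WAITS and not wait.startswith("selector:"):
        raise ValueError(f"unknown wait strategy: {wait!r}")
    return {"url": url, "use_js_render": True, "js_wait_for": wait, "timeout": timeout}

def fetch_rendered(url, wait="networkidle"):
    """POST to the scrape endpoint and return the rendered HTML body."""
    resp = requests.post(
        "https://api.finedata.ai/api/v1/scrape",
        headers={"x-api-key": FINEDATA_API_KEY, "Content-Type": "application/json"},
        json=build_render_payload(url, wait),
    )
    resp.raise_for_status()
    return resp.json()["body"]
```

For example, `fetch_rendered("https://example.com/products", "selector:.product-card")` combines rendering with selector-based waiting in one call.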
Handling Common SPA Patterns
Infinite Scroll
Many modern sites use infinite scroll instead of pagination. The content loads as you scroll down. To scrape these, you need to simulate scrolling:
With FineData, use JS rendering with the `networkidle` wait strategy. The initial load typically brings the first batch of items. For subsequent pages, look for the underlying API endpoints:
```python
import json
import requests

def scrape_infinite_scroll_api(base_api_url, pages=5):
    """
    Instead of scrolling, hit the underlying API directly.
    Most infinite scroll sites fetch from a paginated API.
    """
    all_items = []
    for page in range(1, pages + 1):
        api_url = f"{base_api_url}?page={page}&limit=20"
        response = requests.post(
            "https://api.finedata.ai/api/v1/scrape",
            headers={
                "x-api-key": FINEDATA_API_KEY,
                "Content-Type": "application/json"
            },
            json={
                "url": api_url,
                "use_js_render": False,  # API returns JSON, no JS needed
                "timeout": 30
            }
        )
        data = response.json()
        # The body will contain the raw API JSON response
        items = json.loads(data["body"])
        all_items.extend(items)
    return all_items
```
Pro tip: Open your browser’s DevTools Network tab, scroll the page, and watch for XHR/Fetch requests. You’ll often find a clean JSON API behind the infinite scroll UI. Scraping the API directly is faster, cheaper (no JS rendering needed), and more reliable.
Lazy-Loaded Content
Some sites delay loading certain sections until the user scrolls to them. This is common for images and below-the-fold content:
```python
# Use selector-based waiting for the specific content you need
response = requests.post(
    "https://api.finedata.ai/api/v1/scrape",
    headers={
        "x-api-key": FINEDATA_API_KEY,
        "Content-Type": "application/json"
    },
    json={
        "url": "https://example.com/product",
        "use_js_render": True,
        "js_wait_for": "selector:.review-section",
        "timeout": 45
    }
)
```
Client-Side Routing
SPAs often use client-side routing (React Router, Vue Router). URLs like /products/123 don’t correspond to actual server paths — they’re handled by JavaScript. The good news: FineData’s JS rendering handles this automatically. Just pass the full URL and the SPA’s router will navigate to the correct view.
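A sketch of what this looks like in practice, using hypothetical detail routes (`/products/<id>` is an assumed URL scheme, not a real site):

```python
# Hypothetical SPA routes served by a client-side router; a plain GET
# to any of these paths returns the same empty shell
base = "https://store.example.com"
product_ids = [101, 102, 103]
payloads = [
    {"url": f"{base}/products/{pid}", "use_js_render": True, "js_wait_for": "networkidle"}
    for pid in product_ids
]
# With JS rendering enabled, the router navigates to each view before
# the HTML snapshot is taken, so every URL yields its own content
```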
Comparison: Selenium vs Playwright vs FineData
Here’s how the approaches stack up for scraping JavaScript-heavy sites:
| Factor | Selenium | Playwright | FineData API |
|---|---|---|---|
| Setup time | 30+ min | 15 min | 2 min |
| RAM per page | 200-500 MB | 150-300 MB | 0 (server-side) |
| Anti-bot bypass | Poor | Moderate | Built-in |
| Concurrent pages | 5-10 (local) | 10-20 (local) | 100+ |
| TLS fingerprinting | Detectable | Less detectable | Chrome-identical |
| Maintenance | High | Moderate | None |
| Cost | Infrastructure | Infrastructure | Per-request tokens |
When to use Selenium/Playwright:
- You need to interact with pages (fill forms, click buttons, navigate flows)
- You’re scraping a small number of pages (<100/day) and already have the infrastructure
- You need to capture screenshots or PDFs
When to use FineData:
- You need rendered HTML at scale (hundreds to thousands of pages)
- Anti-bot protection is present
- You don’t want to manage browser infrastructure
- You need residential proxies and CAPTCHA solving alongside JS rendering
Real-World Example: Scraping a React E-Commerce Store
Let’s put it all together with a practical example — scraping a React-based product catalog:
```python
import requests
from bs4 import BeautifulSoup

FINEDATA_API_KEY = "fd_your_api_key"

def scrape_react_store(category_url):
    """Scrape products from a React-based e-commerce store."""
    # Step 1: Get the rendered category page
    response = requests.post(
        "https://api.finedata.ai/api/v1/scrape",
        headers={
            "x-api-key": FINEDATA_API_KEY,
            "Content-Type": "application/json"
        },
        json={
            "url": category_url,
            "use_js_render": True,
            "js_wait_for": "selector:[data-testid='product-grid']",
            "tls_profile": "chrome124",
            "timeout": 30
        }
    )
    data = response.json()
    soup = BeautifulSoup(data["body"], "html.parser")

    # Step 2: Extract products
    products = []
    for card in soup.select("[data-testid='product-card']"):
        product = {
            "name": card.select_one("h3").get_text(strip=True),
            "price": card.select_one("[data-testid='price']").get_text(strip=True),
            "image": card.select_one("img").get("src"),
            "link": card.select_one("a").get("href"),
        }
        products.append(product)

    # Step 3: Check for next page
    next_btn = soup.select_one("[data-testid='next-page']")
    has_next = next_btn is not None and "disabled" not in next_btn.get("class", [])

    return products, has_next

# Scrape multiple pages
all_products = []
page = 1
while True:
    url = f"https://store.example.com/electronics?page={page}"
    products, has_next = scrape_react_store(url)
    all_products.extend(products)
    print(f"Page {page}: {len(products)} products")
    if not has_next or page >= 10:
        break
    page += 1

print(f"\nTotal: {len(all_products)} products")
```
Token Costs for JS Rendering
JavaScript rendering adds 5 tokens per request on top of the base cost:
| Configuration | Tokens | Use Case |
|---|---|---|
| Base only | 1 | Static HTML sites |
| Base + JS render | 6 | SPAs, React/Vue sites |
| Base + JS + residential | 9 | Protected SPAs |
| Base + JS + residential + CAPTCHA | 19 | Heavily protected sites |
For most SPA scraping, 6 tokens per request (base + JS rendering) is all you need.
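The per-feature increments can be read off the table: JS rendering adds 5 tokens, residential proxies add 3, and CAPTCHA solving adds 10 (the latter two inferred from the row totals). A small cost estimator based on those numbers:

```python
# Per-request token increments derived from the pricing table above
BASE, JS_RENDER, RESIDENTIAL, CAPTCHA = 1, 5, 3, 10

def estimate_tokens(n_requests, js_render=False, residential=False, captcha=False):
    """Total token cost for a batch of identically configured requests."""
    per_request = BASE
    per_request += JS_RENDER if js_render else 0
    per_request += RESIDENTIAL if residential else 0
    per_request += CAPTCHA if captcha else 0
    return per_request * n_requests

print(estimate_tokens(1000, js_render=True))  # 6000 tokens for 1,000 SPA pages
```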
Key Takeaways
- Standard HTTP requests return empty HTML for JavaScript-heavy sites — you need a browser to render the content.
- Headless browsers (Selenium, Playwright) work but are resource-intensive, hard to scale, and easy to detect.
- FineData’s JS rendering handles browser execution server-side, returning fully-rendered HTML via a simple API call.
- Choose the right wait strategy: `networkidle` for most cases, `selector:...` when you know exactly what to wait for.
- Look for underlying APIs behind infinite scroll and dynamic content — scraping the API directly is faster and cheaper.
- JS rendering costs 5 extra tokens per request — a fraction of what running your own browser infrastructure costs.
For sites with additional anti-bot protection beyond JavaScript rendering, check out our guides on handling CAPTCHAs and bypassing Cloudflare.