FlareSolverr Guide - Bypassing Cloudflare Made Simple | ScrapeOps (2024)

FlareSolverr is a Python package designed to help you bypass Cloudflare's anti-bot protection.

In this guide we're going to walk through how to set up and use FlareSolverr, including:

  • How To Install Docker
  • Use FlareSolverr With Our Scrapers
  • Option 1: Send All Requests To FlareSolverr
  • Option 2: Use FlareSolverr To Only Retrieve Valid Cloudflare Cookies
  • Controlling FlareSolverr Sessions
  • Making POST Requests With FlareSolverr
  • Alternatives To FlareSolverr

For other methods of bypassing Cloudflare, check out our How to Bypass Cloudflare guide.

Warning

Cloudflare is continuously changing and upgrading its anti-bot protection systems, making it harder for web scrapers and libraries like FlareSolverr to bypass its detection methods. As a result, open source anti-bot bypassing libraries like FlareSolverr can often go out of date and stop working.

What Is FlareSolverr?

FlareSolverr is a proxy server that you can use to bypass Cloudflare's anti-bot protection so you can scrape data from websites that have deployed their content on Cloudflare's CDN.

FlareSolverr allows your scrapers to get past Cloudflare's anti-bot challenge pages.

Cloudflare uses numerous browser fingerprinting challenges and checks to determine if a request is coming from a real user or a scraper/bot.

When run, FlareSolverr starts a server that uses Python Selenium with undetected-chromedriver to solve Cloudflare's JavaScript and browser fingerprinting challenges by impersonating a real web browser.

FlareSolverr opens the target URL with a Selenium browser and waits until the Cloudflare challenge is solved, before returning the HTML and cookies Cloudflare returns to the browser.

You can then use these cookies to bypass Cloudflare using other HTTP clients like Python Requests.

How To Install Docker

As the easiest way to set up FlareSolverr is with Docker (the image already contains the Chromium browser), we will first show you how to install Docker.

If you don't have Docker installed already, then use one of the following links to install Docker:

Download the Docker installation package and follow the instructions. Your computer may need to restart after installation.

After installation, if Docker isn't running then click the Docker Desktop icon. You can check that Docker is installed by running the following command in your command line:


docker

If it is recognized then you should be good to go.

Install & Run FlareSolverr

Next we need to get FlareSolverr up and running.

1. Download FlareSolverr

First we need to download the FlareSolverr Docker image, which we can do by running the following command on Windows or macOS:


docker pull flaresolverr/flaresolverr

Or on a Linux machine:


sudo docker pull flaresolverr/flaresolverr

If everything has worked correctly, when you open Docker Desktop you should see the flaresolverr/flaresolverr image on the Images tab (or the ghcr.io/flaresolverr/flaresolverr image, depending on which option you used).

2. Run FlareSolverr

To run FlareSolverr, we need to run the following command in our command line again.

For Windows and macOS:


docker run -d \
--name=flaresolverr \
-p 8191:8191 \
-e LOG_LEVEL=info \
--restart unless-stopped \
ghcr.io/flaresolverr/flaresolverr:latest

For Linux:


sudo docker run -d \
--name=flaresolverr \
-p 8191:8191 \
-e LOG_LEVEL=info \
--restart unless-stopped \
ghcr.io/flaresolverr/flaresolverr:latest

To check that FlareSolverr is running correctly, go to http://localhost:8191/ and you should get a response like this:


{
"msg": "FlareSolverr is ready!",
"version": "3.0.2",
"userAgent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36"
}

If you do, then the FlareSolverr server is up and running correctly.
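The same check can be scripted, which is handy before starting a scraping job. The following is a minimal sketch using only the Python standard library; it assumes the default port mapping from the docker run command above, and the helper names is_ready and check_server are illustrative, not part of FlareSolverr.

```python
import json
import urllib.request

# Assumes the default 8191 port mapping used in the docker run command.
FLARESOLVERR_URL = "http://localhost:8191/"


def is_ready(payload):
    """Return True if the index response carries the ready message shown above."""
    return payload.get("msg") == "FlareSolverr is ready!"


def check_server(url=FLARESOLVERR_URL, timeout=5.0):
    """Fetch the index page and report whether FlareSolverr says it is ready."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return is_ready(json.load(resp))
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a body that is not JSON.
        return False


if __name__ == "__main__":
    print("ready" if check_server() else "not reachable")
```

If the script prints "not reachable", check that the container is running and that port 8191 is published.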

Use FlareSolverr With Our Scrapers

When running, FlareSolverr provides a simple HTTP server that we can send the URLs we want to scrape to. FlareSolverr then makes the request with a Selenium browser and undetected-chromedriver, solves the Cloudflare challenge, and returns the HTML response and the cookies.

Here is an example of using FlareSolverr to scrape PetsAtHome.com, a Cloudflare protected website:


curl -L -X POST 'http://localhost:8191/v1' \
-H 'Content-Type: application/json' \
--data-raw '{
"cmd": "request.get",
"url":"https://www.petsathome.com/",
"maxTimeout": 60000
}'

Here is the same example using Python Requests instead of cURL to send the request to FlareSolverr:


import requests

post_body = {
"cmd": "request.get",
"url":"https://www.petsathome.com/",
"maxTimeout": 60000
}

response = requests.post('http://localhost:8191/v1', headers={'Content-Type': 'application/json'}, json=post_body)

print(response.json())

The response should look something like this:


{
"status": "ok",
"message": "Challenge solved!",
"solution": {
"url": "https://www.petsathome.com/",
"status": 200,
"cookies": [
{
"domain": "www.petsathome.com",
"httpOnly": false,
"name": "WC_MOBILEDEVICEID",
"path": "/",
"secure": false,
"value": "0"
},
{
"domain": ".petsathome.com",
"expiry": 1673531559,
"httpOnly": false,
"name": "FPLC",
"path": "/",
"secure": true,
"value": "k03jwEFLbwxG2InqkF8yDy5%2BxWFeypsVETpfQGAFNO9M33HudoClDsp%2FY9BH89yLrGpQRLYL2WCgOkBrWRwdcK%2BycvG8%2F3m3SjDu3ZDXXHodwcxEhm4fQo7x8G%2BMrw%3D%3D"
},
...
],
"userAgent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36",
"headers": {},
"response": "<html><head>...</head><body>...</body></html>"
},
"startTimestamp": 1673459546891,
"endTimestamp": 1673459560345,
"version": "3.0.2"
}

As you can see, FlareSolverr has successfully solved the Cloudflare challenge ("message": "Challenge solved!") and returned the cookies and HTML response from the website.
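When experimenting, it can help to reduce this large JSON reply to the handful of fields you usually care about. The sketch below assumes only the field names visible in the example response above; the summarize helper is a hypothetical name introduced for illustration.

```python
def summarize(result):
    """Condense a FlareSolverr reply into a small dict of the key facts."""
    solution = result.get("solution") or {}
    # Timestamps in the example response are in milliseconds.
    elapsed_ms = result.get("endTimestamp", 0) - result.get("startTimestamp", 0)
    return {
        "solved": result.get("status") == "ok",
        "http_status": solution.get("status"),
        "cookie_names": [c["name"] for c in solution.get("cookies", [])],
        "solve_seconds": elapsed_ms / 1000,
        "html_length": len(solution.get("response", "")),
    }


if __name__ == "__main__":
    example = {
        "status": "ok",
        "solution": {
            "status": 200,
            "cookies": [{"name": "WC_MOBILEDEVICEID", "value": "0"}],
            "response": "<html></html>",
        },
        "startTimestamp": 1673459546891,
        "endTimestamp": 1673459560345,
    }
    print(summarize(example))
```

With the timestamps from the example response, the challenge took roughly 13.5 seconds to solve, which gives a sense of why routing every request through the browser is slow.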

This functionality gives us two ways to use FlareSolverr:

  • Option 1: Send all requests via FlareSolverr and let it deal with any Cloudflare challenges.
  • Option 2: Use FlareSolverr to retrieve valid Cloudflare cookies that we can then use with other HTTP clients like Python Requests.

Option 1 is the simpler of the two, as you can just send the URLs you want to scrape to your FlareSolverr server and let it deal with Cloudflare. However, as browsers are memory and bandwidth intensive, this approach can be unreliable and expensive when done at scale.

Option 2 is a bit trickier, but it is a more reliable and cost-effective approach if you intend to scrape at scale.

We will run through how to use both options below.

Option 1: Send All Requests To FlareSolverr

The first option is to send all the URLs you want to scrape to FlareSolverr and have it manage bypassing Cloudflare and handling the session cookies for you.


import requests

url_list = [
    'https://www.petsathome.com/',
    'https://www.petsathome.com/',
]

for url in url_list:
    post_body = {
        "cmd": "request.get",
        "url": url,
        "maxTimeout": 60000
    }

    ## Send Request To FlareSolverr
    response = requests.post('http://localhost:8191/v1', headers={'Content-Type': 'application/json'}, json=post_body)
    if response.status_code == 200:
        json_response = response.json()
        if json_response.get('status') == 'ok':
            html = json_response['solution']['response']
            ## ...parse data from response
            print('Success')

This will work, but it makes your scraper slower, more expensive and less reliable to run, as every request goes through the Selenium browser.
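If you do go with this option, wrapping the call in a small function makes failures easier to handle. The following is a sketch rather than a definitive implementation: it uses the standard library instead of Python Requests so it has no dependencies, and the names post_json, fetch_html and scrape_all are hypothetical helpers. The send parameter exists only so the transport can be swapped out, for example in tests.

```python
import json
import urllib.request

# Assumes FlareSolverr is listening on the default port used in this guide.
FLARESOLVERR_API = "http://localhost:8191/v1"


def post_json(url, payload, timeout=70.0):
    """POST a JSON body and return the decoded JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


def fetch_html(url, send=post_json, max_timeout_ms=60000):
    """Return the page HTML, or None if FlareSolverr did not report success."""
    payload = {"cmd": "request.get", "url": url, "maxTimeout": max_timeout_ms}
    try:
        result = send(FLARESOLVERR_API, payload)
    except (OSError, ValueError):
        # Network failure, HTTP error status, or a non-JSON body.
        return None
    if result.get("status") != "ok":
        return None
    return result["solution"]["response"]


def scrape_all(urls, send=post_json):
    """Fetch each distinct URL once, keeping the original order."""
    pages = {}
    for url in dict.fromkeys(urls):  # drops duplicates, preserves order
        pages[url] = fetch_html(url, send=send)
    return pages
```

Calling scrape_all(url_list) returns a dict of URL to HTML (or None for failures), so the parsing step can skip pages that were not solved.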

Option 2: Use FlareSolverr To Only Retrieve Valid Cloudflare Cookies

The other option is to use FlareSolverr only to retrieve valid Cloudflare cookies after passing the Cloudflare challenge, and then use these cookies with another HTTP client to scrape the subsequent pages.

This is the recommended way of using FlareSolverr, but it is a bit trickier.


import requests

post_body = {
    "cmd": "request.get",
    "url": "https://www.petsathome.com/",
    "maxTimeout": 60000
}

response = requests.post('http://localhost:8191/v1', headers={'Content-Type': 'application/json'}, json=post_body)

if response.status_code == 200:
    json_response = response.json()
    if json_response.get('status') == 'ok':

        ## Get Cookies & Clean
        cookies = json_response['solution']['cookies']
        clean_cookies_dict = {cookie['name']: cookie['value'] for cookie in cookies}

        ## Get User-Agent
        user_agent = json_response['solution']['userAgent']

        ## Make normal request
        headers = {"User-Agent": user_agent}

        response = requests.get("https://www.petsathome.com/", headers=headers, cookies=clean_cookies_dict)
        if response.status_code == 200:
            ## ...parse data from response
            print('Success')


Here we make the request with FlareSolverr to retrieve valid Cloudflare cookies after passing the Cloudflare challenge, then we:

  1. Extract & clean the valid Cloudflare cookies
  2. Extract the user-agent FlareSolverr used to get the cookies
  3. Make a new request with Python Requests using the FlareSolverr cookies & user-agent to avoid triggering the Cloudflare challenge
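The cookie cleaning in step 1 can be made a little more defensive. Some cookies in the FlareSolverr response carry an expiry timestamp (in seconds), so a cookie set that has been stored for a while may contain stale entries. The sketch below drops those before building the dict passed to Python Requests; usable_cookies and cookie_header are hypothetical helper names, and the assumption that expiry is a Unix timestamp in seconds is based on the example response earlier in this guide.

```python
import time


def usable_cookies(cookies, now=None):
    """Map cookie names to values, skipping cookies whose expiry has passed."""
    now = time.time() if now is None else now
    return {
        c["name"]: c["value"]
        for c in cookies
        # Cookies without an "expiry" key are session cookies and are kept.
        if c.get("expiry") is None or c["expiry"] > now
    }


def cookie_header(cookie_dict):
    """Render a name/value dict as a Cookie header value for other HTTP clients."""
    return "; ".join(f"{name}={value}" for name, value in cookie_dict.items())


if __name__ == "__main__":
    sample = [
        {"name": "WC_MOBILEDEVICEID", "value": "0"},
        {"name": "FPLC", "value": "abc", "expiry": 1673531559},
    ]
    print(cookie_header(usable_cookies(sample)))
```

The dict returned by usable_cookies can be passed straight to the cookies argument of requests.get, while cookie_header is useful for clients that only accept raw headers.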

User-Agents & IP Addresses

The Cloudflare cookies are linked to the user-agent and IP address that FlareSolverr used when solving the Cloudflare challenge so you need to make sure to use the same user-agent and IP address when making subsequent requests with a different HTTP client.
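One way to keep the user-agent and IP address aligned is to build both requests from the same inputs. The sketch below is an illustration under assumptions: it relies on the optional proxy parameter described in the FlareSolverr README (an object with a url key), which you should verify against the version you are running, and the helper names are hypothetical.

```python
def flaresolverr_payload(url, proxy_url=None, max_timeout_ms=60000):
    """Build the request.get command, optionally routed through a proxy."""
    payload = {"cmd": "request.get", "url": url, "maxTimeout": max_timeout_ms}
    if proxy_url:
        # Assumption: the "proxy" parameter as documented in the FlareSolverr README.
        payload["proxy"] = {"url": proxy_url}
    return payload


def follow_up_kwargs(solution, proxy_url=None):
    """Keyword arguments for requests.get that reuse the solver's identity."""
    kwargs = {
        "headers": {"User-Agent": solution["userAgent"]},
        "cookies": {c["name"]: c["value"] for c in solution["cookies"]},
    }
    if proxy_url:
        # Same exit IP as the one FlareSolverr used to solve the challenge.
        kwargs["proxies"] = {"http": proxy_url, "https": proxy_url}
    return kwargs
```

After sending flaresolverr_payload(...) to FlareSolverr, the follow-up call would look like requests.get(url, **follow_up_kwargs(json_response['solution'], proxy_url)), so both requests share the same user-agent, cookies and proxy.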

Controlling FlareSolverr Sessions

FlareSolverr V2 allowed you to control the creation and destruction of browser sessions, meaning that you could have multiple browser sessions running on the FlareSolverr server at once.

This meant you could keep different browser sessions for different websites, proxy settings, etc. and then choose whichever one you wanted to use.

For example, say you are scraping a Cloudflare protected website with a pool of proxy IP addresses. In this case, you need a unique set of Cloudflare cookies for each IP address you use, as otherwise Cloudflare could block the requests because the IP address keeps changing for a single cookie.

However, as of the FlareSolverr V3 release used in this guide (3.0.2), this functionality has not been reimplemented since the refactor, so it is currently unavailable.
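For reference, the V2-style session workflow looked roughly like the sketch below. The command names sessions.create and sessions.destroy and the session parameter are taken from the FlareSolverr V2 README; whether your installed version accepts them is something you would need to check, so treat this as an illustration of the idea rather than working code for V3.0.2. The function names are hypothetical.

```python
def create_session(session_id):
    """Command asking FlareSolverr (V2 API) to start a named browser session."""
    return {"cmd": "sessions.create", "session": session_id}


def get_with_session(url, session_id, max_timeout_ms=60000):
    """request.get command that reuses an existing session and its cookies."""
    return {
        "cmd": "request.get",
        "url": url,
        "session": session_id,
        "maxTimeout": max_timeout_ms,
    }


def destroy_session(session_id):
    """Command asking FlareSolverr (V2 API) to close the session's browser."""
    return {"cmd": "sessions.destroy", "session": session_id}
```

Each dict would be sent as the JSON body of a POST to http://localhost:8191/v1, in the same way as the request.get examples earlier, with one session per proxy IP address.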

Making POST Requests With FlareSolverr

FlareSolverr also allows you to make POST requests to Cloudflare protected websites if you need to retrieve valid Cloudflare cookies from POST endpoints.

To do so, use request.post instead of request.get in the cmd field of the FlareSolverr request body, and put any POST data you need to send in the postData field:


import requests

## postData must be an application/x-www-form-urlencoded string
POST_DATA = "a=b&c=d"

post_body = {
    "cmd": "request.post",
    "url": "https://www.example.com/POST",
    "postData": POST_DATA,
    "maxTimeout": 60000
}

response = requests.post('http://localhost:8191/v1', headers={'Content-Type': 'application/json'}, json=post_body)

print(response.json())

The postData value must be a string in application/x-www-form-urlencoded format, e.g. a=b&c=d.
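Rather than writing the form string by hand, you can let the standard library encode it, which also takes care of escaping spaces and special characters. This is a small sketch; build_post_command is a hypothetical helper name.

```python
from urllib.parse import urlencode


def build_post_command(url, form_fields, max_timeout_ms=60000):
    """Build a request.post command with form_fields encoded as postData."""
    return {
        "cmd": "request.post",
        "url": url,
        # urlencode produces an application/x-www-form-urlencoded string.
        "postData": urlencode(form_fields),
        "maxTimeout": max_timeout_ms,
    }


if __name__ == "__main__":
    command = build_post_command("https://www.example.com/POST", {"a": "b", "c": "d"})
    print(command["postData"])  # a=b&c=d
```

The resulting dict can be passed as the json argument of requests.post, exactly like post_body in the example above.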

Alternatives To FlareSolverr

FlareSolverr is a powerful tool. However, open source solutions like FlareSolverr often go out of date and stop working due to Cloudflare updates.

Cloudflare and other anti-bot providers monitor the web for open source anti-bot bypassing tools and often develop fixes that detect or block them within a couple of months.

So if you are thinking of using FlareSolverr to reliably bypass Cloudflare on a website, an alternative is to use smart proxies that develop and maintain their own private anti-bot bypasses.

These are typically more reliable, as it is harder for anti-bot companies like Cloudflare to develop detections for private bypasses. They are built by proxy companies who are financially motivated to stay one step ahead of anti-bot companies and to fix their bypasses as soon as they stop working.

One of the best options is the ScrapeOps Proxy Aggregator, as it integrates over 20 smart proxy providers into the same proxy API and finds the best/cheapest proxy provider for your target domains.

You can activate ScrapeOps' Anti-Bot Bypasses by simply adding the bypass flag to your API request.

For example, in the code below we use the Cloudflare bypass by adding bypass=cloudflare_level_1 to the request:


import requests

response = requests.get(
url='https://proxy.scrapeops.io/v1/',
params={
'api_key': 'YOUR_API_KEY',
'url': 'http://example.com/', ## Cloudflare protected website
'bypass': 'cloudflare_level_1',
},
)

print('Body: ', response.content)

Tip

Cloudflare is the most common anti-bot system being used by websites today, and bypassing it depends on which security settings the website has enabled.

To combat this, we offer 3 different Cloudflare bypasses designed to solve the Cloudflare challenges at each security level.

Security Level | Bypass             | API Credits | Description
Low            | cloudflare_level_1 | 10          | Use to bypass Cloudflare protected sites with low security settings enabled.
Medium         | cloudflare_level_2 | 35          | Use to bypass Cloudflare protected sites with medium security settings enabled. On large plans the credit multiple will be increased to maintain a flat rate of $3.50 per thousand requests.
High           | cloudflare_level_3 | 50          | Use to bypass Cloudflare protected sites with high security settings enabled. On large plans the credit multiple will be increased to maintain a flat rate of $4 per thousand requests.

The advantage of taking this approach is that you can use your normal HTTP client and don't have to worry about:

  • Fortifying headless browsers
  • Managing numerous headless browser instances & dealing with memory issues
  • Reverse engineering the anti-bot protection systems

All of this is managed within the ScrapeOps Proxy Aggregator.

You can get a ScrapeOps API key with 1,000 free API credits by signing up here.
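To get a feel for how far a credit balance goes at each security level, the arithmetic can be sketched as follows. The credit costs are the ones listed in the table above; requests_affordable is a hypothetical helper, and the calculation ignores the adjusted multiples that the table mentions for large plans.

```python
# Credits charged per request, as listed in the bypass table above.
CREDITS_PER_REQUEST = {
    "cloudflare_level_1": 10,
    "cloudflare_level_2": 35,
    "cloudflare_level_3": 50,
}


def requests_affordable(credits, bypass):
    """Number of whole requests a credit balance covers for a given bypass."""
    return credits // CREDITS_PER_REQUEST[bypass]


if __name__ == "__main__":
    for bypass in CREDITS_PER_REQUEST:
        print(bypass, requests_affordable(1000, bypass))
```

With 1,000 credits this works out to 100 requests at level 1, 28 at level 2 and 20 at level 3, so it is worth starting with the lowest level that works for your target site.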

More Scrapy Tutorials

In this guide we've introduced you to the fundamental functionality of FlareSolverr and how to use it in your own projects.

However, if you would like to learn more about FlareSolverr then check out the official documentation here.

If you would like to learn more about different JavaScript rendering options for Scrapy, then be sure to check out our other guides:

  • Scrapy JS Rendering Libraries Compared
  • Scrapy Playwright Guide
  • Scrapy Selenium Guide

If you would like to learn more about Scrapy in general, then be sure to check out The Scrapy Playbook.

Author: Twana Towne Ret