Cloudflare has become a very interesting tool for SEOs who want to improve web performance, as it enables you to:

  • Create a cached version of your site that is served from the location nearest to where each request is made.
  • Write and execute JS on Cloudflare's network through Cloudflare Workers, which can intercept and modify requests, cache content, combine third-party scripts and more. If you do not know how to set up Cloudflare Workers, you can check this article written by JC Chouinard that will walk you through the process.
  • Improve the security of your site, as it handles DDoS attacks and blocks aggressive scraping that could bring your site down.

Due to all these advantages, the use of Cloudflare has skyrocketed lately, and between 15% and 20% of sites already use it.

In today’s post, we are going to learn how to use the Cloudflare GraphQL Analytics API with Python to get data that helps us profile the hits that reach Cloudflare. The report that we are going to generate is available on the free plan of Cloudflare, but other, more specific reports require an upgraded plan.

1.- Fetching the account information

In order to authenticate against the Cloudflare API we will need two things: the email address that is attached to your Cloudflare account and an API key (I use the Global API key). Initially, we can request the account information so that we can extract some data from the API, such as the website Zone ID that we will use later on to extract the Cloudflare data of the hits for a specific site.

The email address and the Global API key are sent in the request headers to authenticate successfully.

import requests

headers = {
    'X-Auth-Email': '<your-email-address>',
    'X-Auth-Key': '<Global API key>',
    'Content-Type': 'application/json'
}

response = requests.request(
    'GET',
    'https://api.cloudflare.com/client/v4/zones',
    headers=headers
)


data = response.json()
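
As a minimal sketch of how the Zone ID could be pulled out of that response, the helper below maps each zone name to its ID. It assumes the response follows the usual v4 envelope, with a "result" list whose items carry "name" and "id" keys; the function name and the sample payload are hypothetical and only there for illustration.

```python
def zone_ids_by_name(zones_payload):
    """Map each zone name to its zone ID from a parsed /zones response."""
    # Assumption: the payload has a "result" list whose items
    # carry "name" and "id" keys.
    results = zones_payload.get("result") or []
    return {zone["name"]: zone["id"] for zone in results}


# Hypothetical payload, trimmed down to the fields used here
sample_payload = {
    "success": True,
    "result": [
        {"id": "0123456789abcdef0123456789abcdef", "name": "example.com"},
        {"id": "fedcba9876543210fedcba9876543210", "name": "example.org"},
    ],
}

zone_ids = zone_ids_by_name(sample_payload)
print(zone_ids["example.com"])
```

With the real response you would call zone_ids_by_name(data) and pick the ID of the site you want to analyse.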

2.- Making the request to the Analytics API

The request to the Analytics API needs to be made to a different endpoint with an HTTP POST request. The metrics that we are interested in are sent as a GraphQL query. The GraphQL Analytics API that we are going to query replaced the deprecated Cloudflare Analytics API and returns the very same output.

We are going to use the report called httpRequests1hGroups, which is supported by the free version of Cloudflare and returns the statistics of the hits that have been made to our site for a maximum time range of 259200 seconds (3 days) per query. It can be interesting to fetch this data and store it somewhere else, because the API only gives access to data that is not older than 262800 seconds (73 hours, around 3 days).
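
To make those two limits concrete, here is a small sketch that builds the widest window ending at a given moment and checks whether a window respects both limits. The constant and function names are hypothetical, and the timestamp format simply mirrors the one used in the query filters of this post.

```python
from datetime import datetime, timedelta, timezone

MAX_RANGE = timedelta(seconds=259200)  # longest time range per query
MAX_AGE = timedelta(seconds=262800)    # oldest data that can still be requested
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def latest_window(now):
    """Return (start, end) timestamps for the widest window ending at `now`."""
    start = now - MAX_RANGE
    return start.strftime(TIMESTAMP_FORMAT), now.strftime(TIMESTAMP_FORMAT)


def is_queryable(start, end, now):
    """Check that a window respects both the range and the age limit."""
    return start < end and end - start <= MAX_RANGE and now - start <= MAX_AGE


now = datetime(2021, 10, 28, 20, 0, tzinfo=timezone.utc)
print(latest_window(now))
print(is_queryable(now - timedelta(hours=74), now - timedelta(hours=2), now))
```

In a real run you would pass datetime.now(timezone.utc) instead of the fixed date used here.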

When making the request, we can select if we would like to receive grouped data for that time range or divided by hours. If you would like to split it by hours, you would need to add the dimension datetime to your query and iterate over the hours when you receive the JSON response object. In our case, for simplicity’s sake, we are going to make the request with the grouped data.
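
If you go for the hourly split, the iteration could look like the sketch below. It assumes that a "dimensions { datetime }" block was added inside httpRequests1hGroups in the query, so that every group in the response carries its own hour; the function name and the sample payload are made up for illustration.

```python
def requests_per_hour(payload):
    """Return (hour, requests) pairs sorted by hour from a parsed response."""
    groups = payload["data"]["viewer"]["zones"][0]["httpRequests1hGroups"]
    rows = [
        (group["dimensions"]["datetime"], group["sum"]["requests"])
        for group in groups
    ]
    # ISO 8601 timestamps in the same timezone sort correctly as plain strings
    return sorted(rows)


# Hypothetical response, trimmed down to the fields used here
sample_payload = {
    "data": {"viewer": {"zones": [{"httpRequests1hGroups": [
        {"dimensions": {"datetime": "2021-10-27T23:00:00Z"}, "sum": {"requests": 40}},
        {"dimensions": {"datetime": "2021-10-27T22:00:00Z"}, "sum": {"requests": 25}},
    ]}]}}
}

for hour, hits in requests_per_hour(sample_payload):
    print(hour, hits)
```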

In the code below, you will need to enter your email address and Global API key again, plus the Zone ID of the site that you would like to extract the data from and the start and end dates of the time range that you would like to check.

import requests


headers = {
    'X-Auth-Email': '<your-email-address>',
    'X-Auth-Key': '<GLOBAL API KEY>',
    'Content-Type': 'application/json'
}


data = """{
  viewer {
    zones(filter: {zoneTag: "<website zone ID>"}) {
      httpRequests1hGroups( limit: 100, filter: {datetime_geq: "2021-10-27T22:00:00Z", datetime_lt: "2021-10-28T20:02:00Z"}) {

        sum {
          browserMap {
            pageViews
            uaBrowserFamily
          }
          bytes
          cachedBytes
          cachedRequests
          contentTypeMap {
            bytes
            requests
            edgeResponseContentTypeName
          }
          clientSSLMap {
            requests
            clientSSLProtocol
          }
          countryMap {
            bytes
            requests
            threats
            clientCountryName
          }
          encryptedBytes
          encryptedRequests
          ipClassMap {
            requests
            ipType
          }
          pageViews
          requests
          responseStatusMap {
            requests
            edgeResponseStatus
          }
          threats
          threatPathingMap {
            requests
            threatPathingName
          }
        }
        uniq {
          uniques
        }
      }
    }
  }
}"""

response = requests.request(
    'POST',
    'https://api.cloudflare.com/client/v4/graphql',
    headers=headers,
    json={'query': data}
)
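
If you run this for several sites or dates, it can be handy to fill the Zone ID and the dates into the query from Python instead of editing the string by hand. The sketch below uses string.Template from the standard library for that; the template is a shortened version of the query with only two metrics, and the names QUERY_TEMPLATE and build_query are hypothetical.

```python
from string import Template

# Shortened version of the query; $zone_id, $start and $end are placeholders
QUERY_TEMPLATE = Template("""{
  viewer {
    zones(filter: {zoneTag: "$zone_id"}) {
      httpRequests1hGroups(limit: 100, filter: {datetime_geq: "$start", datetime_lt: "$end"}) {
        sum {
          requests
          pageViews
        }
      }
    }
  }
}""")


def build_query(zone_id, start, end):
    """Fill the zone ID and the time range into the query template."""
    return QUERY_TEMPLATE.substitute(zone_id=zone_id, start=start, end=end)


query = build_query("abc123", "2021-10-27T22:00:00Z", "2021-10-28T20:00:00Z")
print(query)
```

Template is used here rather than str.format because the GraphQL query is full of curly braces, which format would try to interpret.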

Once the request is made successfully, we can take a look at it and see what data we have got and how we can parse the JSON object that we receive with the response.
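
Before parsing, it is worth checking that the response actually contains data. The sketch below assumes the standard GraphQL response shape, where problems are reported in an "errors" list of objects with a "message" key; the helper name and the sample payloads are hypothetical.

```python
def extract_groups(payload):
    """Return the httpRequests1hGroups list or raise if the query failed."""
    errors = payload.get("errors")
    if errors:
        messages = "; ".join(error.get("message", "unknown error") for error in errors)
        raise RuntimeError("GraphQL query failed: " + messages)

    zones = payload["data"]["viewer"]["zones"]
    if not zones:
        raise LookupError("No zone matched the zoneTag filter")
    return zones[0]["httpRequests1hGroups"]


# Hypothetical payloads, one successful and one failed
ok_payload = {
    "data": {"viewer": {"zones": [{"httpRequests1hGroups": [{"sum": {"requests": 950}}]}]}},
    "errors": None,
}
failed_payload = {"data": None, "errors": [{"message": "time range is too large"}]}

print(extract_groups(ok_payload))
```

With the real response you would call extract_groups(response.json()).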

3.- What metrics have we got?

In the response we can find: the number of pageviews, the number of requests, the number of encrypted bytes, the number of encrypted requests, the number of bytes that have been served, the number of cached bytes that have been served and the number of cached requests. This can give us an idea about how well our cache policy is working and the volume of requests and bytes that are being handled by Cloudflare.

In addition, we can also find a response code map, an IP class map, a country map, a content type map, a browser map and a client SSL map. Even if most of this data can be found in the Cloudflare user interface, it can still be worth querying it with Python to store it somewhere else for further analysis, as the interface only keeps it for the previous 30 days.
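
As one possible way of storing the totals somewhere else, the sketch below appends one CSV row per time window. The field list, the function name and the sample numbers are assumptions for illustration, and an in-memory buffer stands in for a real file.

```python
import csv
import io

# Totals kept for each window, in the order of the CSV columns
FIELDS = [
    "pageViews", "requests", "bytes", "cachedBytes",
    "cachedRequests", "encryptedBytes", "encryptedRequests",
]


def append_totals(csv_file, window_start, totals):
    """Write one row with the window start followed by the totals."""
    writer = csv.writer(csv_file)
    writer.writerow([window_start] + [totals.get(field, 0) for field in FIELDS])


# Made-up totals shaped like the "sum" object of the response
sample_totals = {
    "pageViews": 120, "requests": 950, "bytes": 4000000, "cachedBytes": 2500000,
    "cachedRequests": 570, "encryptedBytes": 3900000, "encryptedRequests": 940,
}

buffer = io.StringIO()
append_totals(buffer, "2021-10-27T22:00:00Z", sample_totals)
print(buffer.getvalue())
```

To persist the rows you could replace the buffer with open("cloudflare_totals.csv", "a", newline=""), where the file name is just an example.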

3.1.- Pageviews, requests, bytes, encryption and cache

With the keys below we can extract the totals listed above: pageviews, requests, encrypted bytes, encrypted requests, bytes served, cached bytes served and cached requests.

# Parse the JSON once and keep a reference to the aggregated totals
totals = response.json()["data"]["viewer"]["zones"][0]["httpRequests1hGroups"][0]["sum"]

pageviews = totals["pageViews"]
requests_cf = totals["requests"]
encrypted_bytes = totals["encryptedBytes"]
encrypted_requests = totals["encryptedRequests"]
bytes_cf = totals["bytes"]
cached_bytes = totals["cachedBytes"]
cached_requests = totals["cachedRequests"]
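
To get an idea of how well the cache policy is working, we can turn these totals into ratios. This is a small sketch with made-up numbers; the function name is hypothetical and the input is assumed to be shaped like the "sum" object of the response.

```python
def cache_ratios(totals):
    """Return the share of requests and bytes that were served from cache."""
    def share(part, whole):
        # Avoid dividing by zero when there was no traffic at all
        return part / whole if whole else 0.0

    return {
        "requests": share(totals["cachedRequests"], totals["requests"]),
        "bytes": share(totals["cachedBytes"], totals["bytes"]),
    }


# Made-up totals for illustration
sample_totals = {
    "requests": 950, "cachedRequests": 570,
    "bytes": 4000000, "cachedBytes": 2500000,
}

ratios = cache_ratios(sample_totals)
print("Cached requests: {:.1%}".format(ratios["requests"]))
print("Cached bytes: {:.1%}".format(ratios["bytes"]))
```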

3.2.- Response Status Map

With the code below we can extract the different status codes and how many requests each one has gotten. Moreover, with Matplotlib we can plot a bar chart that will display this info in a more visual way.

import matplotlib.pyplot as plt

# Without dimensions the data comes back as a single group, so we read index 0
status_map = response.json()["data"]["viewer"]["zones"][0]["httpRequests1hGroups"][0]["sum"]["responseStatusMap"]
response_codes = [str(x["edgeResponseStatus"]) for x in status_map]
# Named request_counts so that the requests library is not overwritten
request_counts = [x["requests"] for x in status_map]

fig, ax = plt.subplots()
ax.bar(response_codes, request_counts)

# Write the number of requests above each bar
for x, y in zip(response_codes, request_counts):
    ax.annotate(str(y), (x, y), textcoords="offset points", xytext=(0, 10), ha='center')

plt.show()
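
When there are many different status codes, it can be easier to read them grouped by class (2xx, 3xx, 4xx, 5xx). The sketch below does that grouping, assuming each entry has the edgeResponseStatus and requests keys requested in the query; the function name and the sample data are made up.

```python
from collections import Counter


def requests_by_status_class(status_map):
    """Sum the requests of each status code class, e.g. 200 and 204 into 2xx."""
    totals = Counter()
    for entry in status_map:
        status_class = "{}xx".format(int(entry["edgeResponseStatus"]) // 100)
        totals[status_class] += entry["requests"]
    return dict(totals)


# Made-up entries shaped like the responseStatusMap of the response
sample_status_map = [
    {"edgeResponseStatus": 200, "requests": 800},
    {"edgeResponseStatus": 204, "requests": 10},
    {"edgeResponseStatus": 301, "requests": 50},
    {"edgeResponseStatus": 404, "requests": 30},
    {"edgeResponseStatus": 500, "requests": 5},
]

print(requests_by_status_class(sample_status_map))
```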

3.3.- Browser Map

We can get the browser map and plot a bar chart with this piece of code:

import matplotlib.pyplot as plt

# Without dimensions the data comes back as a single group, so we read index 0
browser_map = response.json()["data"]["viewer"]["zones"][0]["httpRequests1hGroups"][0]["sum"]["browserMap"]
browser = [str(x["uaBrowserFamily"]) for x in browser_map]
# Named browser_pageviews so that the pageviews total is not overwritten
browser_pageviews = [x["pageViews"] for x in browser_map]

fig, ax = plt.subplots()
ax.bar(browser, browser_pageviews)

# Write the number of pageviews above each bar
for x, y in zip(browser, browser_pageviews):
    ax.annotate(str(y), (x, y), textcoords="offset points", xytext=(0, 10), ha='center')

plt.show()

3.4.- Client SSL Map

We can get the client SSL Map and plot a bar chart with the code below:

import matplotlib.pyplot as plt

# Without dimensions the data comes back as a single group, so we read index 0
ssl_map = response.json()["data"]["viewer"]["zones"][0]["httpRequests1hGroups"][0]["sum"]["clientSSLMap"]
ssl_protocol = [str(x["clientSSLProtocol"]) for x in ssl_map]
# Named request_counts so that the requests library is not overwritten
request_counts = [x["requests"] for x in ssl_map]

fig, ax = plt.subplots()
ax.bar(ssl_protocol, request_counts)

# Write the number of requests above each bar
for x, y in zip(ssl_protocol, request_counts):
    ax.annotate(str(y), (x, y), textcoords="offset points", xytext=(0, 10), ha='center')

plt.show()

3.5.- IP Class Map

As in the previous sections, we can get the IP Class Map and plot a chart:

import matplotlib.pyplot as plt

# Without dimensions the data comes back as a single group, so we read index 0
ip_class_map = response.json()["data"]["viewer"]["zones"][0]["httpRequests1hGroups"][0]["sum"]["ipClassMap"]
type_ip = [str(x["ipType"]) for x in ip_class_map]
# Named request_counts so that the requests library is not overwritten
request_counts = [x["requests"] for x in ip_class_map]

fig, ax = plt.subplots()
ax.bar(type_ip, request_counts)

# Write the number of requests above each bar
for x, y in zip(type_ip, request_counts):
    ax.annotate(str(y), (x, y), textcoords="offset points", xytext=(0, 10), ha='center')

plt.show()
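
Since all these maps follow the same pattern (a list of entries with one label and one or more metrics), the extraction can be wrapped in a small helper. This is only a sketch: the function name is hypothetical and the sample data is made up, and the returned lists are meant to be passed to the plotting calls used in this post.

```python
def map_series(totals, map_name, label_key, value_key):
    """Return the labels and values of one of the maps as two parallel lists."""
    entries = totals.get(map_name, [])
    labels = [str(entry[label_key]) for entry in entries]
    values = [entry[value_key] for entry in entries]
    return labels, values


# Made-up totals shaped like the "sum" object of the response
sample_totals = {
    "ipClassMap": [
        {"ipType": "noRecord", "requests": 900},
        {"ipType": "searchEngine", "requests": 50},
    ]
}

labels, values = map_series(sample_totals, "ipClassMap", "ipType", "requests")
print(labels, values)
```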

3.6.- Country Map

With the piece of code below we will get the country map data and plot a chart with two Y axes, one for the requests and the other one for the bytes.

import matplotlib.pyplot as plt

# Without dimensions the data comes back as a single group, so we read index 0
country_map = response.json()["data"]["viewer"]["zones"][0]["httpRequests1hGroups"][0]["sum"]["countryMap"]
country = [str(x["clientCountryName"]) for x in country_map]
bytes_request = [x["bytes"] for x in country_map]
# Named request_counts so that the requests library is not overwritten
request_counts = [x["requests"] for x in country_map]

fig, ax = plt.subplots()

# Left Y axis: bytes served per country
ax.plot(country, bytes_request, color="red", marker="o")
ax.set_ylabel("Bytes", color="red", fontsize=14)

# Right Y axis: requests per country, sharing the same X axis
ax2 = ax.twinx()
ax2.plot(country, request_counts, color="blue", marker="o")
ax2.set_ylabel("Requests", color="blue", fontsize=14)

plt.show()
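
If the site receives hits from many countries, the chart can get crowded. A possible workaround, sketched below with made-up data and a hypothetical function name, is to keep only the countries with the most requests before plotting.

```python
def top_countries(country_map, limit=10):
    """Return the entries with the most requests, highest first."""
    ranked = sorted(country_map, key=lambda entry: entry["requests"], reverse=True)
    return ranked[:limit]


# Made-up entries shaped like the countryMap of the response
sample_country_map = [
    {"clientCountryName": "ES", "requests": 120, "bytes": 600000, "threats": 0},
    {"clientCountryName": "US", "requests": 400, "bytes": 2100000, "threats": 2},
    {"clientCountryName": "DE", "requests": 90, "bytes": 450000, "threats": 0},
]

for entry in top_countries(sample_country_map, limit=2):
    print(entry["clientCountryName"], entry["requests"])
```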

3.7.- Content Type Map

Last but not least, we can get the content type data and plot another graph with two Y axes, one for the number of requests and the other one for the number of bytes.

import matplotlib.pyplot as plt

# Without dimensions the data comes back as a single group, so we read index 0
content_type_map = response.json()["data"]["viewer"]["zones"][0]["httpRequests1hGroups"][0]["sum"]["contentTypeMap"]
content_type = [str(x["edgeResponseContentTypeName"]) for x in content_type_map]
bytes_request = [x["bytes"] for x in content_type_map]
# Named request_counts so that the requests library is not overwritten
request_counts = [x["requests"] for x in content_type_map]

fig, ax = plt.subplots()

# Left Y axis: bytes served per content type
ax.plot(content_type, bytes_request, color="red", marker="o")
ax.set_ylabel("Bytes", color="red", fontsize=14)

# Right Y axis: requests per content type, sharing the same X axis
ax2 = ax.twinx()
ax2.plot(content_type, request_counts, color="blue", marker="o")
ax2.set_ylabel("Requests", color="blue", fontsize=14)

plt.show()

That is all, folks! I hope this article helps you get started with the Cloudflare Analytics API and Python.