Analyzing where our competitors run campaigns to build brand awareness and drive traffic to their websites, and how they monetize that traffic, is key to strategizing properly, investing our budget in the most effective way and getting a grasp of how the industry works.

Fortunately, we can get an idea of where our competitors are running campaigns by checking the web technologies in their page code. For instance, if a website has a Facebook pixel installed, it is very likely that they are running paid social campaigns on Facebook and/or Instagram, as this pixel is used to evaluate the performance of those campaigns.

In today’s post I am going to show you how you can spy on your competitors by using the Wappalyzer API and Python to detect, in bulk, the web technologies that they are using.

1.- What is Wappalyzer?

Most of you might already know Wappalyzer as a tool for getting insights into the web technologies that a website makes use of. In fact, they have a very handy Google Chrome extension with which you can check a page’s technologies while browsing, in a very similar way to Builtwith. By the way, my Python friend Greg Bernhardt wrote an article about using Python with the Builtwith API some months ago that you might like too if you are interested in Wappalyzer.

It turns out that Wappalyzer also has an API with 50 free tokens per month, and I took advantage of them to play around a little and write a script that compares the technologies that a set of websites are using by means of a heatmap.

2.- Making our first request

Making a request is very easy, as we will only need the requests library and the API endpoint: https://api.wappalyzer.com/lookup/v2/. Our API key will be sent through the headers, but first we will need to sign up and get our API key from the account section on Wappalyzer’s website.

Apart from the web technologies, we can also get other data from Wappalyzer such as email addresses, social network accounts, the country of the IP address, meta titles and meta descriptions, phone numbers, et cetera. For that we will need to pass the sets that we would like to get as a parameter. In my case I requested all of them, as each request spends only one token regardless of the number of sets requested.

import requests

url = 'https://api.wappalyzer.com/lookup/v2/?urls=' + '<your url>' + '&sets=email,phone,contact,social,meta,locale'
headers = {'x-api-key': '<your API key>'}
r = requests.get(url, headers=headers)
response_file = r.json()

The API response will be a JSON array with one object per URL, containing all the available information for that website. This means that in some cases a key might be missing if that data is not available (a safe way to handle missing keys is shown after the list of keys below).
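If you want to inspect everything the API returned before picking specific fields, you can pretty-print the whole response. This is just a quick sketch that reuses the response_file variable from the request above:

import json

# Dump the full response so you can see which keys Wappalyzer returned for this URL
print(json.dumps(response_file, indent=2))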

Some of the main keys are:

  • response_file[0]["technologies"]: this key will return a list with the web page technologies.
  • response_file[0]["twitter"], response_file[0]["instagram"], response_file[0]["facebook"], response_file[0]["youtube"], response_file[0]["github"], response_file[0]["pinterest"], response_file[0]["linkedin"] and response_file[0]["tiktok"]: these keys will return the social media profiles of that website if any are found.
  • response_file[0]["language"]: this key will return the page language if it is available.
  • response_file[0]["ipCountry"]: this key will return the country of the IP address if it is available.
  • response_file[0]["email"] and response_file[0]["phone"]: these keys will return the email addresses and/or phone numbers if any are found. They can be used for prospecting purposes.
  • response_file[0]["title"] and response_file[0]["description"]: these keys will return the meta title and the meta description of that page.
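Because some of these keys can be missing, it is safer to read them with .get() instead of indexing them directly. A minimal sketch, assuming response_file holds the parsed response from the request above:

site = response_file[0]

# .get() returns a default value instead of raising a KeyError when a key is missing
technologies = site.get("technologies", [])
language = site.get("language", "unknown")
emails = site.get("email", [])

print([tech["name"] for tech in technologies])
print(language, emails)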

3.- Creating a heatmap to analyze the competitors

Although Wappalyzer can give us a lot of information about the pages, in my opinion the most interesting and accurate data is the data about web page technologies. We can use a for loop to retrieve the web technologies from many websites in bulk and then create a heatmap to analyze which technologies most of these websites are using.

First we need to iterate over a list containing the websites that we would like to check:

import requests

# List of the websites that you would like to check
list_websites = ['<website 1>', '<website 2>', '<website 3>']

list_response = []
for iteration in list_websites:
    url = 'https://api.wappalyzer.com/lookup/v2/?urls=' + iteration + '&sets=email,phone,contact,social,meta,locale'
    headers = {'x-api-key': '<your API key>'}
    r = requests.get(url, headers=headers)
    list_response.append(r.json())
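As a side note, since the free plan only includes 50 tokens per month, you might prefer to discard failed requests rather than storing broken responses. A hedged sketch (my own addition, not part of the original script) of how the last two lines inside the loop could be replaced:

    r = requests.get(url, headers=headers)
    # Only keep the response if the API answered successfully
    if r.status_code == 200:
        list_response.append(r.json())
    else:
        print('Skipping ' + iteration + ': status code ' + str(r.status_code))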

We have stored the JSON responses returned by the API in a list, and now we will iterate over them to check whether the web technologies that we are interested in are present or not:

# The first element of the list will be used later as the column headers
list_technologies = [["URL", "Twitter Ads", "Facebook Ads", "Outbrain", "Criteo", "Google AdSense", "Google Publisher", "AMP", "Google Optimize"]]
for x in list_response:
    # Flags for each technology: 0 = not found, 1 = found
    twitter_ads = 0
    facebook_ads = 0
    outbrain = 0
    criteo = 0
    google_adsense = 0
    google_publisher = 0
    amp = 0
    google_optimize = 0
    for z in x[0]["technologies"]:
        if z["name"] == "Twitter Ads":
            twitter_ads = 1
        if z["name"] == "Facebook Pixel":
            facebook_ads = 1
        if z["name"] == "Outbrain":
            outbrain = 1
        if z["name"] == "Criteo":
            criteo = 1
        if z["name"] == "Google AdSense":
            google_adsense = 1
        if z["name"] == "Google Publisher Tag":
            google_publisher = 1
        if z["name"] == "AMP":
            amp = 1
        if z["name"] == "Google Optimize":
            google_optimize = 1

    list_technologies.append([x[0]["url"], twitter_ads, facebook_ads, outbrain, criteo, google_adsense, google_publisher, amp, google_optimize])

We assign the value 1 if the technology is present on that website and 0 if it is not. After this, we transform the list into a dataframe, using the first element as the column headers.

from pandas import DataFrame

df = DataFrame(list_technologies[1:], columns=list_technologies[0])

Finally, we use Seaborn to plot it as a heatmap:

import matplotlib.pyplot as plt
import seaborn as sb


fig, ax = plt.subplots(figsize=(20, 15))
sb.set(font_scale=2)
sb.heatmap(df.iloc[:, 1:], cmap="bwr_r", yticklabels=df["URL"])
plt.show()

The heatmap might look like an F.C. Barcelona t-shirt 😁 :

At first glance we can see that many of these websites have the Facebook and Twitter pixels installed, and all of them might be using Criteo to retarget the traffic that visits their websites. When it comes to monetization, all of them use Google AdSense and Google Publisher Tag, and three of them also use Outbrain. Google Optimize is used on half of the websites and all of them have an AMP version.
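If you prefer numbers to colors, you can also sum each column of the dataframe to count how many of the analyzed websites use each technology. A quick sketch, reusing the df dataframe created above:

# Count how many websites use each technology (every column after URL is a 0/1 flag)
technology_counts = df.iloc[:, 1:].sum().sort_values(ascending=False)
print(technology_counts)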

Did you find it useful for getting a grasp of your competitors’ strategy very fast?