Creating and updating a disavow file is usually a time-consuming and tedious activity, but unfortunately it needs to be done: your website's SEO performance can suffer if spammy links are not correctly disavowed. Although I personally believe that Google's algorithm has made a lot of progress at detecting spammy links and ignoring them, it can still be deceived, and for this reason disavowing is still important.
In today's post I will show you how to partially automate the creation of the disavow file in a very easy way with Semrush and Python.
1.- Extracting the link “toxicity” score with Semrush
With Semrush we can run a backlinks audit to assign a toxicity score to each link, which indicates how likely that link is to be considered toxic. The higher the toxicity score, the more toxic the link.
To audit your backlinks profile you will need to go to the Backlink Audit section in the sidebar and create a specific project for the domain that you want to audit. In my case, I will audit my own personal domain: danielherediamejias.com.
Once the audit has run, the Audit tab gives us access to each link, grouped by domain and with its toxicity score.
We will export this report as an Excel file so that we can start to manipulate it with Python and create our disavow file!
2.- Selecting the harmful links
First, we will import the Excel file with Pandas and convert the dataframe into a list. This list will contain one list per backlink with four metrics: URL, anchor text, authority score, and toxicity score.
import pandas as pd

df = pd.read_excel('yourfilefromsemrush.xlsx')
list_df = df.values.tolist()
After this, I will iterate over the list and split each backlink's URL to isolate the domain from the rest of the URL, as I will disavow the whole domain when I write the disavow file. At the end of each iteration, I append the domain to that backlink's list. As you can see below, to split the URL I am using the tldextract library.
from tldextract import extract

for x in list_df:
    # x[0] is the backlink URL; extract() returns (subdomain, domain, suffix)
    tsd, td, tsu = extract(x[0])
    domain = td + "." + tsu
    x.append(domain)
Now that we have the domains, we can start the filtering process to include in the disavow file those URLs with a high toxicity score. The toxicity score threshold can vary depending on the industry or how much risk you are willing to take. In my case, I set the toxicity score threshold at 50.
So we iterate over the list and, if a link's toxicity score is higher than 50, we append its domain to a new list that will be used to create the disavow file.
threshold = 50
list_disavow = []

for x in list_df:
    if x[3] > threshold:  # x[3] is the toxicity score
        list_disavow.append(x[4])  # x[4] is the domain we appended earlier
In addition, when I was going over the domains linking to mine, I noticed lots of domains with a ".pw" ccTLD from Palau coming from spammy subdirectories, and even though their toxicity scores are lower than 50 (44, to be precise), I will add them to the disavow file.
So basically, I am going to iterate again over the master list and append to the disavow list those domains with a ".pw" ccTLD. Finally, I will use dict.fromkeys to remove the duplicate elements from the list.
for x in list_df:
    if x[4].endswith(".pw"):  # x[4] is the domain
        list_disavow.append(x[4])

list_disavow = list(dict.fromkeys(list_disavow))
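The dict.fromkeys trick removes repeats while keeping the first-seen order, which keeps the disavow file stable between runs. A quick illustration with made-up domains:

```python
# dict.fromkeys keeps only the first occurrence of each key,
# and dicts preserve insertion order (guaranteed since Python 3.7)
dupes = ["spam-links.pw", "example.com", "spam-links.pw", "example.com"]
deduped = list(dict.fromkeys(dupes))
print(deduped)  # ['spam-links.pw', 'example.com']
```

Unlike converting to a set and back, this approach never reorders the domains.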
Finally, now that we have the domains that we would like to disavow, we can create our disavow file:
with open("disavow_file.txt", "w") as f:
    for x in list_disavow:
        f.write("domain:" + x + "\n")
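Putting the pieces together, here is a minimal end-to-end sketch of the filtering logic. The sample rows and the threshold of 50 are illustrative stand-ins; in the real workflow the rows come from the Semrush export and the domain column is produced by tldextract as shown above:

```python
# Each row mimics the Semrush export after the domain has been appended:
# [URL, anchor text, authority score, toxicity score, domain]
# (hypothetical sample data, not real Semrush output)
sample_rows = [
    ["https://spam-links.pw/post/1", "cheap seo", 3, 80, "spam-links.pw"],
    ["https://blog.example.com/a", "nice article", 45, 10, "example.com"],
    ["https://other-links.pw/b", "click here", 5, 44, "other-links.pw"],
]

threshold = 50
list_disavow = []

for row in sample_rows:
    # Keep domains whose toxicity score exceeds the threshold,
    # or whose ccTLD is ".pw", as described above
    if row[3] > threshold or row[4].endswith(".pw"):
        list_disavow.append(row[4])

# Remove duplicates while preserving order
list_disavow = list(dict.fromkeys(list_disavow))

for domain in list_disavow:
    print("domain:" + domain)
```

Running this prints one "domain:" line per harmful domain, exactly what the disavow file format expects.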
This will create a disavow file which looks like:
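With some hypothetical domains, disavow_file.txt would contain one line per disavowed domain, for example:

```
domain:spam-links.pw
domain:other-links.pw
```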
The only thing left is uploading the disavow file to Google's disavow links tool. That would be it!
You will only need tldextract and Pandas.
You will learn how to use Semrush and Python to create a disavow file in an almost fully automated way.
The process is quite fast: only around 5-10 minutes.