In today’s post I am going to show you how to use Google Alerts with Python and how to set up an automated workflow to reach out to websites that mention your brand, or a term closely related to your business, without linking to your site.

Basically, what we are going to do in this post is:

  • Learning how to install the library google-alerts for Python.
  • Setting up some alerts and parsing the RSS feed which is generated with the matches.
  • Downloading the matches as an Excel file.
  • Scraping the URLs and searching for a contact URL or an email address to make contact with these sites and ask for a link.

Does this automated workflow sound interesting? Let’s get started then!

1.- Installing google-alerts for Python

First of all, we will need to install google-alerts for Python and seed our Google Alerts session. The command that we need to run in our terminal to install google-alerts is:

pip install google-alerts

After this, we will need to input our email address and our password by running the command:

google-alerts setup --email <your-email-address> --password '<your-password>'

Finally, to seed the Google Alerts session we will need to download version 84 of Chrome Driver and version 84 of Google Chrome (be careful not to replace your current version of Google Chrome when downloading and installing version 84). Unfortunately, this is necessary because the library has not been updated since 2020 and is not compatible with newer versions of Google Chrome and Chrome Driver.

Once both Chrome Driver v84 and Google Chrome v84 are installed, we can run the following command to seed our Google Alerts session:

google-alerts seed --driver /tmp/chromedriver --timeout 60

This command will open a Selenium webdriver session to log us into Google Alerts.

2.- Creating our first alert

Once the session is seeded, we can use a Jupyter notebook and Python to play around. We will first need to authenticate:

from google_alerts import GoogleAlerts

ga = GoogleAlerts('<your_email_address>', '<your password>')
ga.authenticate()

When the authentication is completed, we can create our first alert, for example for the term Barcelona in Spain:

ga.create("Barcelona", {'delivery': 'RSS', "language": "es", 'monitor_match': 'ALL', 'region' : "ES"})

If the alert is created successfully, it will return an object specifying the term, the language, the region, the match type and the RSS link for that alert.

Sadly, I have not been able to create an alert that monitors a term across all countries: if I leave the language and region arguments empty, it sets the USA and English as the default region and language.
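
One possible workaround is to create one alert per language and region pair. The sketch below only builds the option dictionaries: the helper name build_alert_options and the list of pairs are hypothetical, the option keys are the ones used in the ga.create call above, and whether a given language or region code is accepted by the library is an assumption you should check. The ga.create call is left as a comment because it needs an authenticated session.

```python
def build_alert_options(pairs, delivery="RSS", monitor_match="ALL"):
    """Return one options dict per (language, region) pair."""
    return [
        {
            "delivery": delivery,
            "language": language,
            "monitor_match": monitor_match,
            "region": region,
        }
        for language, region in pairs
    ]


# Hypothetical selection of markets to monitor
pairs = [("es", "ES"), ("en", "US"), ("fr", "FR")]
options_list = build_alert_options(pairs)

for options in options_list:
    print(options)
    # With an authenticated session, each alert could then be created with:
    # ga.create("Barcelona", options)
```

Each alert created this way has its own RSS feed, so the feeds would need to be parsed one by one.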

If at some point we lose track of the alerts that are active, we can list them with:

ga.list()

And if we would like to delete an alert which is redundant or no longer useful, we can do so by passing its monitor_id:

ga.delete("monitor_id")
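
To avoid copying the monitor_id by hand, a small helper can look it up by term. This is a sketch that assumes ga.list() returns a list of dictionaries with term and monitor_id keys; the helper name find_monitor_ids and the sample data are hypothetical.

```python
def find_monitor_ids(alerts, term):
    """Return the monitor_id of every alert whose term matches (case-insensitive)."""
    return [
        alert["monitor_id"]
        for alert in alerts
        if alert.get("term", "").lower() == term.lower()
    ]


# Hypothetical data shaped like the assumed output of ga.list()
sample_alerts = [
    {"term": "Barcelona", "monitor_id": "id-1", "region": "ES"},
    {"term": "Madrid", "monitor_id": "id-2", "region": "ES"},
]

ids_to_delete = find_monitor_ids(sample_alerts, "barcelona")
print(ids_to_delete)

# With a real session, the matches could then be removed with:
# for monitor_id in ids_to_delete:
#     ga.delete(monitor_id)
```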

3.- Parsing the RSS feed

In order to parse the RSS feed we will use requests and BeautifulSoup, and we will extract the ID, the title, the publication date, the update date, the URL and the abstract for each alert. This data is structured as an XML file.

import requests
from bs4 import BeautifulSoup as Soup

r = requests.get('<your RSS feed>')
soup = Soup(r.text,'xml')

# The first <id>, <title>, <updated> and <link> describe the feed itself, so we skip them
id_alert = [x.text for x in soup.find_all("id")[1:]]
title_alert = [x.text for x in soup.find_all("title")[1:]]
published_alert = [x.text for x in soup.find_all("published")]
update_alert = [x.text for x in soup.find_all("updated")[1:]]
# Each URL is kept in its own list so that more data can be appended to it later
link_alert = [[x["href"].split("url=")[1].split("&ct=")[0]] for x in soup.find_all("link")[1:]]
content_alert = [x.text for x in soup.find_all("content")]

# link_alert[x][0] stores the URL itself rather than a one-element list
compiled_list = [[id_alert[x], title_alert[x], published_alert[x], update_alert[x], link_alert[x][0], content_alert[x]] for x in range(len(id_alert))]

With this piece of code we will get an individual list for each field and a compiled list with all the fields for each alert.
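
The string splitting used to extract the target URL depends on the exact order of the query parameters. As an alternative sketch, the feed can be parsed with the standard library only, reading the url query parameter with urllib.parse. The sample feed below is made up for illustration and only mimics the structure assumed in this post (an Atom feed whose entries link through a Google redirect URL); the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlparse

ATOM = {"atom": "http://www.w3.org/2005/Atom"}

# Made-up feed that imitates the assumed structure of a Google Alerts feed
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <id>feed-id</id>
  <title>Google Alert - Barcelona</title>
  <link href="https://www.google.com/alerts/feeds/1/2" rel="self"/>
  <updated>2021-01-01T10:00:00Z</updated>
  <entry>
    <id>entry-id-1</id>
    <title type="html">News about Barcelona</title>
    <link href="https://www.google.com/url?rct=j&amp;sa=t&amp;url=https://example.com/news%3Fid%3D1&amp;ct=ga"/>
    <published>2021-01-01T09:00:00Z</published>
    <updated>2021-01-01T09:00:00Z</updated>
    <content type="html">A short abstract.</content>
  </entry>
</feed>"""


def target_url(google_link):
    """Return the value of the url parameter, or the link unchanged if it is absent."""
    query = parse_qs(urlparse(google_link).query)
    return query.get("url", [google_link])[0]


def parse_alert_feed(xml_text):
    """Return one dictionary per <entry> of the Atom feed."""
    root = ET.fromstring(xml_text)
    rows = []
    for entry in root.findall("atom:entry", ATOM):
        link = entry.find("atom:link", ATOM)
        href = link.get("href", "") if link is not None else ""
        rows.append({
            "id": entry.findtext("atom:id", default="", namespaces=ATOM),
            "title": entry.findtext("atom:title", default="", namespaces=ATOM),
            "published": entry.findtext("atom:published", default="", namespaces=ATOM),
            "updated": entry.findtext("atom:updated", default="", namespaces=ATOM),
            "link": target_url(href),
            "content": entry.findtext("atom:content", default="", namespaces=ATOM),
        })
    return rows


rows = parse_alert_feed(SAMPLE_FEED)
print(rows[0]["link"])
```

Because only the entry elements are read, there is no need to skip the first id, title, updated and link of the feed.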

If we would like to, we can download the alerts as an Excel file with Pandas:

import pandas as pd
 
df = pd.DataFrame(compiled_list, columns = ["ID", "Title", "Published on", "Updated on", "Link", "Content"])
df.to_excel('new_alerts.xlsx', header=True, index=False)

This will create an Excel document called new_alerts.xlsx with one row per alert and one column per field.

4.- Reaching out to the sites

From my point of view, using Google Alerts with Python can be especially useful for automating outreach to sites that mention a brand or a specific term closely related to a brand or product. With Python, we can iterate over the list of URLs, scrape them and try to find a contact page or an email address to contact these sites. If an email address is found, even sending the email can be automated with Python or any other outreach tool.

We can use this piece of code to find strings that contain “@” (very likely email addresses) and links to contact pages. The filter that leaves out strings which contain “@” but are not email addresses can be polished; for now I only exclude the strings which are PNG images:

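import re

for iteration in link_alert:

    # A timeout avoids waiting forever for a site that does not respond
    request_link = requests.get(iteration[0], timeout=10)
    soup = Soup(request_link.text, 'html.parser')

    # Some pages have no <body>, so we fall back to the whole document
    body = (soup.find("body") or soup).text
    match = [x for x in re.findall(r'[\w.+-]+@[\w-]+\.[\w.-]+', body) if ".png" not in x]

    # Only anchors that have an href attribute can be used as contact URLs
    contact_urls = []
    links = soup.find_all("a", href=True)
    for y in links:
        if "contact" in y.text.lower():
            contact_urls.append(y["href"])

    # After this, each item is [URL, list of emails, list of contact URLs]
    iteration.append(match)
    iteration.append(contact_urls)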
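
The filter could be polished along these lines. This is only a sketch: the helper names, the list of excluded extensions and the sample inputs are assumptions, and urljoin is used because contact links are often relative paths that need the page URL to become usable.

```python
import re
from urllib.parse import urljoin

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
# Assumed list of extensions that reveal a false positive such as logo@2x.png
ASSET_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp")


def extract_emails(text):
    """Return unique, lowercased email-like strings, skipping asset file names."""
    found = []
    for candidate in EMAIL_RE.findall(text):
        # The pattern can swallow a sentence-ending dot, so we strip it
        candidate = candidate.rstrip(".").lower()
        if candidate.endswith(ASSET_EXTENSIONS):
            continue
        if candidate not in found:
            found.append(candidate)
    return found


def absolute_urls(page_url, hrefs):
    """Resolve relative hrefs against the URL of the page they were found on."""
    return [urljoin(page_url, href) for href in hrefs]


sample_text = "Write to Info@Example.com or info@example.com. Logo: logo@2x.png"
emails = extract_emails(sample_text)
contact_pages = absolute_urls("https://example.com/blog/post", ["/contact", "contact-us.html"])

print(emails)
print(contact_pages)
```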

Lastly, we can iterate over the list of email addresses and use a piece of code that I published in this article about what to do with your outputs when running Python scripts, which uses the email package and smtplib to send emails with a message like:

from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
import smtplib

# Placeholder for the URL of the article that mentions the brand
URL = "<URL of the article>"

# We enter the password, the email address and the recipient for the email
msg = MIMEMultipart()
password = '<your email address password>'
msg['From'] = "<your email address>"
msg['To'] = "<Receiver email address>"

# Here we set the subject and the message. If we send HTML we can include tags
msg['Subject'] = "Daniel Heredia - Thank you so much!"
message = "<p>Dear Sir or Madam,</p><br><br><p>I would like to thank you for mentioning my brand in your article: " + URL + " and I would like to ask whether it would be possible to include a link pointing to my website https://www.danielherediamejias.com so that users who are interested in my brand can get to know more about me.</p><br><br><p>Thank you so much in advance!</p>"

# It attaches the message and its format, in this case, HTML
msg.attach(MIMEText(message, 'html'))

# It creates the server instance from where the email is sent
server = smtplib.SMTP('smtp.gmail.com', 587)
server.starttls()

# Login credentials for sending the mail
server.login('<your email address>', password)

# Send the message via the server
server.sendmail(msg['From'], msg['To'], msg.as_string())
server.quit()
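
To send one message per site, the message building can be wrapped in a function that receives the recipient and the article URL. The sketch below only builds the messages and does not send anything; the function name build_outreach_message, the example addresses and the shortened wording are hypothetical.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from html import escape


def build_outreach_message(sender, recipient, article_url, website):
    """Build an HTML email that thanks a site for a mention and asks for a link."""
    msg = MIMEMultipart()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Thank you so much!"
    body = (
        "<p>Dear Sir or Madam,</p>"
        f"<p>Thank you for mentioning my brand in your article: {escape(article_url)}. "
        f"Would it be possible to include a link pointing to {escape(website)}?</p>"
        "<p>Thank you so much in advance!</p>"
    )
    msg.attach(MIMEText(body, "html"))
    return msg


# Hypothetical (email address, article URL) pairs collected in the previous step
contacts = [("editor@example.com", "https://example.com/news?id=1")]

messages = [
    build_outreach_message("me@example.org", email, url, "https://www.danielherediamejias.com")
    for email, url in contacts
]
print(messages[0]["To"])

# Each message could then be sent through an open SMTP session with:
# server.sendmail(msg["From"], msg["To"], msg.as_string())
```

Keeping the sending step separate makes it possible to review the generated messages before any email goes out.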

Unfortunately I am not a very creative person, so I guess that the message could be much more appealing! That is all folks, I hope that you found this article interesting!