In today’s post I am going to show you an easy trick that can be very helpful if you run a highly dynamic website, one that changes often and quickly, and you need to make sure Googlebot finds and crawls your pages as soon as possible.
For that purpose, we are going to use the “Ping” functionality, a little-known feature that asks Google to crawl a sitemap. To make use of it, you only need to meet these two requirements:
- The sitemap must already be available on Google Search Console.
- Do not overuse this feature if your sitemaps are unchanged.
For more information about this feature, you can check this page.
1.- Creating the Python request
Creating the Python request is very easy, as we only need to send an HTTP GET request to this endpoint: http://www.google.com/ping?sitemap=<complete_url_of_sitemap>. The URL of the sitemap is passed as the sitemap query parameter of the GET request, as shown in the example below.
To make the request, we can use the module urllib.request as shown in the code below:
```python
import urllib.request

url = "http://www.google.com/ping?sitemap=https://www.example.com/sitemap.xml"
response = urllib.request.urlopen(url)
```
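One caveat: the sitemap URL is itself a query-string value, so if it ever contains characters such as “&” or “?”, it is safer to percent-encode it first with urllib.parse.quote. A small sketch (the build_ping_url helper is just an illustrative name, not part of any API):

```python
from urllib.parse import quote

PING_ENDPOINT = "http://www.google.com/ping?sitemap="

def build_ping_url(sitemap_url: str) -> str:
    # Percent-encode the sitemap URL so characters inside it cannot be
    # misread as extra query parameters of the ping endpoint.
    return PING_ENDPOINT + quote(sitemap_url, safe="")

print(build_ping_url("https://www.example.com/sitemap.xml"))
# http://www.google.com/ping?sitemap=https%3A%2F%2Fwww.example.com%2Fsitemap.xml
```

For a plain URL like the one above the encoding is not strictly required, but it keeps the request correct for any sitemap location.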
If we would like to get some sort of confirmation that our sitemap has been submitted correctly, we can use BeautifulSoup to scrape the response page that Google returns.
The message indicating that the sitemap has been correctly submitted and received is wrapped in an H2 tag, so with Beautiful Soup we can scrape that H2 header and print it to our console:
```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(response.read(), "html.parser")
print(soup.find("h2").text)
```
This would print a message like “Sitemap notification received” if the request was made successfully.
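Besides scraping the confirmation page, the HTTP status code itself is a useful signal: a 200 means the request was accepted, while error responses and network failures can be treated as a failed ping. A minimal helper along those lines (ping_sitemap is my own illustrative name):

```python
import urllib.request
from urllib.error import HTTPError, URLError

def ping_sitemap(ping_url: str) -> bool:
    """Return True when the ping endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(ping_url, timeout=10) as response:
            return response.status == 200
    except (HTTPError, URLError):
        # 4xx/5xx responses and network errors both count as a failed ping.
        return False
```

Returning a boolean instead of printing makes the script easier to reuse later from a scheduler.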
Finally, if we put all the code together, it looks like:
```python
import urllib.request
from bs4 import BeautifulSoup

try:
    url = "http://www.google.com/ping?sitemap=https://www.example.com/sitemap.xml"
    response = urllib.request.urlopen(url)
    soup = BeautifulSoup(response.read(), "html.parser")
    print(soup.find("h2").text)
except Exception as e:
    print(e)
```
2.- Scheduling the Python script
As mentioned at the beginning of the article, the Ping functionality is especially useful if the pages of your website change often and quickly. Otherwise, if you only want to ping a sitemap occasionally, you can simply use your browser to send the HTTP GET request.
However, if your website changes frequently, you can create a Python script and schedule it to run, for instance, daily. For that you can use cron jobs, or, if you do not have a server from which to execute Python scripts, you can use an AWS Lambda function triggered on a schedule with CloudWatch. In this post I explain how you can set up AWS Lambda to run your Python scripts.
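For the Lambda route, the whole script fits in a single handler. A minimal sketch, assuming the sitemap URL is supplied through an environment variable called SITEMAP_URL (the variable name and the build_ping_url helper are my own choices here, not AWS conventions):

```python
import os
import urllib.request

PING_ENDPOINT = "http://www.google.com/ping?sitemap="

def build_ping_url(sitemap_url: str) -> str:
    # Concatenate the ping endpoint and the sitemap URL, as in the script above.
    return PING_ENDPOINT + sitemap_url

def lambda_handler(event, context):
    # SITEMAP_URL would be configured on the Lambda function itself;
    # the name is an example, not an AWS requirement.
    sitemap = os.environ["SITEMAP_URL"]
    with urllib.request.urlopen(build_ping_url(sitemap)) as response:
        return {"statusCode": response.status}
```

You would then point a scheduled CloudWatch Events/EventBridge rule (for example a daily rate expression) at this function so the ping fires automatically.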
So that’s all, folks. I hope you find this easy trick helpful!