In today’s post I am going to show you how to speed up the process of setting up page-to-page redirects with Python and htaccess. The process that we are going to follow is:
- Importing from an Excel file the list of pages to be redirected, their destination pages and the response code of each redirect.
- Iterating over the list of pages to be redirected and formatting it for the htaccess file.
- Exporting the htaccess rule as a txt file.
- Testing the rule with an htaccess validator to make sure that it is not going to break our site.
Does it sound interesting? Let’s get started then!
1.- Importing the redirects
First of all, we need to import the list of redirects that I have created in an Excel file. The first column contains the page to be redirected, the second column the page it should be redirected to, and the third column the response code for that redirect.
To import this list of URLs into our notebook we will use the read_excel function from Pandas and then transform the DataFrame into a list:
import pandas as pd

df = pd.read_excel('your_filename.xlsx', header=None)
list_pages = df.values.tolist()
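Before formatting anything, it can be worth checking that every imported row has the expected shape. The sketch below is only an assumption about what a valid row means here (three cells, an old URL with a path, a non-empty destination and a 3xx status code). The function name `check_rows`, the set `ALLOWED_CODES` and the example.com rows are hypothetical and not part of the original workflow.

```python
from urllib.parse import urlparse

# Redirect status codes accepted by this sketch (an assumption, adjust as needed).
ALLOWED_CODES = {301, 302, 303, 307, 308}

def check_rows(rows):
    """Return a list of (row_index, problem) tuples for rows that look wrong."""
    problems = []
    for index, row in enumerate(rows):
        if len(row) != 3:
            problems.append((index, "expected 3 cells"))
            continue
        old_url, new_url, code = row
        # The old URL must carry a path, otherwise there is nothing to redirect.
        if not urlparse(str(old_url)).path.startswith("/"):
            problems.append((index, "old URL has no path"))
        if not str(new_url).strip():
            problems.append((index, "empty destination"))
        try:
            if int(code) not in ALLOWED_CODES:
                problems.append((index, "unexpected status code"))
        except (TypeError, ValueError):
            problems.append((index, "status code is not a number"))
    return problems

# Hypothetical rows in the same shape as list_pages.
sample_rows = [
    ["https://example.com/old-page/", "https://example.com/new-page/", 301],
    ["https://example.com/old-2", "", 200],
]
print(check_rows(sample_rows))
```

An empty list means that every row passed these checks; anything else points at the row index that needs a second look in the Excel file.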
2.- Formatting for the htaccess file
After importing the list of URLs, we will now iterate over them and format them for the htaccess file. In order to format them, we need to take into consideration some specifications:
- We need to extract the relative path: we will use the urlparse function from urllib.parse and read the path attribute of its result.
- We need to write a line for each redirect with the pattern: “Redirect ” + Response code + old path + new path.
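- We need to open and close the htaccess rule by adding “<IfModule mod_rewrite.c>” and “RewriteEngine On” at the beginning of the rule and “</IfModule>” at the end. Note that Redirect is a mod_alias directive, so it does not need RewriteEngine to be on; the wrapper simply means that the redirects are only applied when mod_rewrite is available.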
Translated to Python this would look like:
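from urllib.parse import urlparse

# Each row x holds [old URL, new URL, response code].
text = "<IfModule mod_rewrite.c>\nRewriteEngine On\n"
for x in list_pages:
    old_path = urlparse(x[0]).path
    new_path = urlparse(x[1]).path
    text = text + "Redirect " + str(x[2]) + " " + old_path + " " + new_path + "\n"
text = text + "\n</IfModule>"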
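To see what each piece contributes, here is a minimal sketch of the same formatting step wrapped in a function. `format_redirects` and the example.com row are hypothetical; the row layout (old URL, new URL, status code) follows the Excel file described earlier. Unlike the loop in the post, this version does not leave a blank line before the closing tag.

```python
from urllib.parse import urlparse

def format_redirects(rows):
    """Build the htaccess block from rows of [old_url, new_url, status_code]."""
    lines = ["<IfModule mod_rewrite.c>", "RewriteEngine On"]
    for old_url, new_url, code in rows:
        old_path = urlparse(old_url).path  # path part only, e.g. "/old-page/"
        new_path = urlparse(new_url).path
        lines.append(f"Redirect {code} {old_path} {new_path}")
    lines.append("</IfModule>")
    return "\n".join(lines) + "\n"

# Hypothetical input row.
rows = [["https://example.com/old-page/", "https://example.com/new-page/", 301]]
print(format_redirects(rows))
```

Collecting the lines in a list and joining them once keeps the loop body short and avoids repeated string concatenation, which is mostly a readability choice for lists of this size.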
Another interesting task that we can easily do with Python is creating page-to-page redirects for expired pages that we want to redirect to their parent directories. For example, on an e-commerce site we might be interested in redirecting expired product pages to their category pages.
With the piece of code below we take the page from the first column of the Excel file and generate a redirect pointing to its parent directory. It takes into account whether the page ends with a trailing slash, so that the redirect target ends with a trailing slash or not accordingly.
Note: as we only need the page to be redirected and the response code, in this case we use an Excel file with only two columns. The first one contains the pages to be redirected and the second one contains the response codes.
from urllib.parse import urlparse

# Each row x holds [page to redirect, response code].
text = "<IfModule mod_rewrite.c>\nRewriteEngine On\n"
for x in list_pages:
    path = urlparse(x[0]).path
    text2 = "/"
    # Drop the last segment: [1:-2] if the path ends with a slash, [1:-1] if it doesn't.
    if path.endswith("/"):
        for i in path.split("/")[1:-2]:
            text2 += i + "/"
    else:
        for i in path.split("/")[1:-1]:
            text2 += i + "/"
        # Remove the trailing slash, falling back to the root for top-level pages.
        text2 = text2[0:-1] or "/"
    text = text + "Redirect " + str(x[1]) + " " + path + " " + text2 + "\n"
text = text + "\n</IfModule>"
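The parent-directory logic can also be isolated in a small helper, which makes it easier to try on a few URLs before generating the whole rule. This is a sketch rather than the code used in the post: `parent_path` is a hypothetical name, the example.com URLs are made up, and the helper additionally drops empty segments (such as those produced by a double slash), which the loop in the post does not do.

```python
from urllib.parse import urlparse

def parent_path(url):
    """Return the parent directory of a URL's path, keeping its trailing-slash style."""
    path = urlparse(url).path
    trailing_slash = path.endswith("/")
    # Ignore empty segments produced by leading, trailing or doubled slashes.
    segments = [segment for segment in path.split("/") if segment]
    parent = "/" + "/".join(segments[:-1])
    if trailing_slash and not parent.endswith("/"):
        parent += "/"
    return parent

print(parent_path("https://example.com/shoes/red-sneaker/"))  # /shoes/
print(parent_path("https://example.com/shoes/red-sneaker"))   # /shoes
print(parent_path("https://example.com/red-sneaker"))         # /
```

Top-level pages fall back to the site root here; whether that is the right destination for an expired page is a decision for each site.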
3.- Exporting as a txt file
Finally, we export the rule as a txt file, from which we can copy and paste it into our htaccess file:
with open("file.txt", "w") as output:
    output.write(text)
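The exported file contains the opening lines of the rule, one Redirect line per row of the Excel file and the closing “</IfModule>” line.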
4.- Testing with an htaccess validator
Before copying and pasting this rule, it is advisable to test it with an htaccess validator, because the htaccess file is a sensitive file and a mistake in it can break your website. We will run the rule through an htaccess validator to make sure that it is free of syntax errors.
If the htaccess validator returns: “Test Results: Syntax checks out ok!” we can move forward with pasting the rule at the bottom of our htaccess file.
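A quick local check can also catch formatting slips in the generated lines before they reach the validator. The sketch below is not a replacement for it: the regular expression is a simplifying assumption about what the generated lines should look like (for instance, it rejects paths containing spaces), and `find_bad_lines` is a hypothetical helper.

```python
import re

# Assumed shape of a generated line: "Redirect <3xx code> /old/path /new/path".
REDIRECT_LINE = re.compile(r"^Redirect 3\d\d /\S* /\S*$")
WRAPPER_LINES = {"<IfModule mod_rewrite.c>", "RewriteEngine On", "</IfModule>"}

def find_bad_lines(text):
    """Return (line_number, line) pairs that are neither wrapper nor redirect lines."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped in WRAPPER_LINES:
            continue
        if not REDIRECT_LINE.match(stripped):
            bad.append((number, line))
    return bad

# Hypothetical rule with one malformed line (missing destination).
example = (
    "<IfModule mod_rewrite.c>\n"
    "RewriteEngine On\n"
    "Redirect 301 /old-page/ /new-page/\n"
    "Redirect 301 /broken-line\n"
    "</IfModule>\n"
)
print(find_bad_lines(example))
```

Any line reported here usually traces back to an empty or malformed cell in the Excel file, which is quicker to fix at the source than in the htaccess file.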
That is all folks, I hope that you found this article interesting!