When we use Python scripts to scrape or get data from some APIs we have several options about how we want this output to be delivered. On this post, I am going to walk you through five alternatives about how this data can be delivered so that you can find the perfect output for your needs in order to process this data as best as possible to make the most educated decisions.

These five alternatives are downloading the data in CSV or Excel files, uploading this data onto Google Sheets, store the data into a SQL table, sending a notification by using a Gmail account or downloading a graph as a JPEG, PNG or PDF format. In some cases, you might be interested in using different options at the same time like storing the data into a SQL database and sending an email notification if there is any value which is not desirable. Let´s get this started!

1.- CSV or Excel Files

There are several libraries with Python which can enable us to import our data very easily as CSV or Excel files. The most famous libraries are CSV, Pandas and Openpyxl. In the next examples I show you how to import some data as Excel or CSV files by using these three libraries.

1.1.- CSV Library

import  csv
yourlist = [1,2,3,4]

with open("<filename>.csv","w") as f:
    wr = csv.writer(f,delimiter="\n")
    wr.writerow(yourlist)

1.2.- Pandas Library

import pandas as pd
yourlist = [1,2,3,4]

df = pd.DataFrame(yourlist)
df.to_csv('<filename>.csv', header=False, index=False) #If you want to import to CSV file
df.to_excel('<filename>.xlsx', header=False, index=False) #If you want to import to Excel file

1.3.- Openpyxl

from openpyxl import Workbook
yourlist = [1,2,3,4]

wb = Workbook()
sheet = wb.active

for i, value in enumerate(yourlist):
    sheet.cell(column=1, row=i+1, value=value)
        
wb.save('<filename>.xlsx')

2.- Google Sheets

You can also import your data into Google Sheets, however, first if you want to upload the data into a Google Sheet you need to:

  • Get your credentials in JSON format. You can get these credentials easily on this page, clicking on the “Enable the Google Sheets API” button. These credentials are meant to be stored in the directory from which you are going to run your Python script.
  • Import first your data as a CSV file. If you want to upload your data into a Google Sheet, you will first need to save it as a CSV file and from the CSV file you will be able to convert it into Google Sheet. (To import into CSV file you can use Pandas or CSV library as shown above).

The code which is shown below would convert a CSV file into a Google Sheet:

#Set up your credentials
creds = ServiceAccountCredentials.from_json_keyfile_name('<json filename with your credentials>', ['https://www.googleapis.com/auth/drive'])

#Now build our api object
drive_api = build('drive', 'v3', credentials=creds)

#We have to make a request hash to tell the google API what we're giving it
body = {'name': <CSV filename>, 'mimeType': 'application/vnd.google-apps.document'}

#Now create the media file upload object and tell it what file to upload, in this case CSV
media = MediaFileUpload(<CSV filename>, mimetype = 'text/csv')

#Now we're doing the actual post, creating a new file of the uploaded type
fiahl = drive_api.files().create(body=body, media_body=media).execute()
id = str(fiahl.get('id'))

#We share the file with the email address which is provided
drive_api.permissions().create(body={"role":"writer", "type":"user", "emailAddress" : "<email address>"}, fileId=id).execute()

3.- Store into a SQL Table

With Pymysql you can store the data that you have scraped or gotten from an API into a SQL database. This is an extremely useful option as we can connect the database with visualization tools, we can make use of the database to download this data in CSV files or/and we can compare the data with previous queries that were done to keep track of the evolution.

Using Pymysql is quite easy, although previously you will need to:

  • Create a database and get the password and the username so that you can access to the host when you want to store the data.
  • Grant access to the IP from which the data is going to be stored.
  • It is recommendable to create the table and the columns where the data is going to be stored in advance so that you can set properly the different features for each variable (mainly type).

Once we have gone through these previous steps, storing the data in the database is very easy with the command “INSERT INTO” as shown below:

import pymysql
yourlist = [1,2,3,4]
#We need to provide the host API, username, the password and the database name
db = pymysql.connect(host="<Host IP>", user = "<username>", passwd = "<password>", db = "<database name>")
cur = db.cursor()

cur.execute("""INSERT INTO <tablename> (column1, column2, column3, column4) VALUES (%s,%s, %s, %s)""",(yourlist[0], yourlist[1], yourlist[2], yourlist[3]))
db.commit()

db.close()  

4.- Sending an Email

Perhaps you do not need to store the data, you only need to receive a notification or alert in case that there is a value which is not as expected. In such a case, you can use Encoders library to send an email from a Gmail account. For this purpose, it is recommendable to create a new email address and allow less secure apps to ON. Once it is done, sending the email can be done with a code similar to the one below:

from email import encoders
from email.message import Message
from email.mime.audio import MIMEAudio
from email.mime.base import MIMEBase
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.mime.text import MIMEText
import smtplib 

#We enter the password, the email adress and the subject for the email
msg = MIMEMultipart()
password = '<your email address password>'
msg['From'] = "<your email address>"
msg['To'] = "<Receiver email address>"

#Here we set the message. If we send an HTML we can include tags
msg['Subject'] = "<your subject>"
message = "<h1>"Title"</h1><p>Paragraph</p>

#It attaches the message and its format, in this case, HTML
msg.attach(MIMEText(message, 'html'))

#It creates the server instance from where the email is sent
server = smtplib.SMTP('smtp.gmail.com: 587')
server.starttls()

#Login Credentials for sending the mail
server.login('<your email address>', password)

# send the message via the server.
server.sendmail(msg['From'], msg['To'], msg.as_string())
server.quit()

Sending an email to alert you works very well together with conditional clauses in case that an undesired or unexpected value is found during the data extraction process.

5.- Downloading Graphs

Finally, it could be possible that like in the previous case where an email is sent, we are not interested in the data itself, we just want to manipulate the data and get a processed output like a graph. In such a case, we can also download a PDF, PNG or JPEG file with of our interest. In the next example, we make use of Matplotlib to generate an image and download it.

import matplotlib.pyplot as plt

labels = ["Car", "Motorbike", "Train", "Plane"]
newlist = [30,40,20,10]

plt.pie(newlist, autopct='%1.f%%', labels = labels, shadow=True)
fig1 = plt.gcf()
fig1.savefig('<filename>.jpeg', dpi=100) #We save the image in a JPEG format, although we can also use the suffixes PDF or PNG to download the image in a different format.