Instagram is currently one of the most important social networks in the world, especially in Western countries. With over 1 billion monthly active users and 500 million daily active users, Instagram represents a great opportunity for brands to connect with potential customers, improve their brand awareness and visibility, and build customer loyalty. Beyond that, Instagram can also be a great place to find interesting allies in the form of brand ambassadors, influencer collaborations and business partners, or to generate sales opportunities.

In this post I am going to show you how to take maximum advantage of Instagram to boost your business performance by automating the most time-consuming and burdensome tasks: analyzing the most successful publications from competitors, tracking competitors' stories, and scraping Instagram profiles to extract data such as the number of followers, the number of posts, or an email address if one appears in the biography, so that you can find the right people to give visibility to your brand, partner with you or create sales opportunities.

For this purpose, I am going to walk you through the command-line application called instagram-scraper and some Python scripts that I have created to manipulate the data returned by instagram-scraper in JSON format, or simply to change a bit how the output is delivered. Let's get started!

1.- Getting instagram-scraper installed and first commands

Installing instagram-scraper is quite straightforward: you only need to run the following command in your terminal:

pip install instagram-scraper

Easy, right? Now that it is installed, you can run your first account scrape with one of the following commands:

instagram-scraper <username> #Scraping is done anonymously. 
instagram-scraper <username> -u <your username> -p <your password> #You log into your account and scrape from there. Useful for private accounts. 

With the commands above you can scrape the posts from an Instagram account's feed. For example, I ran the command “instagram-scraper psg”, which downloaded all the images and videos on PSG's Instagram account.

If you would like to scrape the posts that have been uploaded with a specific hashtag, you can use this command:

instagram-scraper <hashtag> --tag #Important! Don't include # when you enter the hashtag.

2.- Using Arguments to take instagram-scraper to the next level

If you already think that instagram-scraper is cool, you have not seen anything yet! The commands above are the foundations that we will combine with arguments to take this tool to the next level. In this post I am going to go over the juiciest arguments from my point of view, but you can check all of them on the instagram-scraper GitHub page.

  1. Bulk scraping by entering the Instagram accounts in a txt file: with the argument -f you can point to a txt file containing the Instagram accounts that you would like to scrape. To make this file readable, separate the accounts with new lines, commas, semicolons or whitespace. Command example: instagram-scraper -f instagram_users.txt
  2. Stories downloading: if you would like to download the stories from an account (highlights and the daily stories), you would need to use the following command: instagram-scraper -u <your username> -p <your password> -t story. In this case anonymous scraping is not allowed; you can only scrape stories while logged in.
  3. Download the number of likes and comments: with the basic query instagram-scraper <user_name> you only download the images and videos, and no metadata is retrieved. If the argument --media-metadata is used, then apart from the media resources you will get a JSON file giving you access to information such as the number of likes, the number of comments, the owner ID… The command would be: instagram-scraper <user_name> --media-metadata, or, if you are only interested in the JSON file, you can run: instagram-scraper <user_name> --media-metadata --media-types none.
  4. Download comments from the posts: the comments on the posts can also be downloaded with the argument --comments. For instance: instagram-scraper <user_name> --comments.
  5. Others: instagram-scraper also has other secondary arguments that can be helpful to use proxies, limit the number of posts to scrape, filter by location or choose a destination folder for the final output.
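As an illustration, several of these arguments can be combined in a single call. The flags sketched below (-m for a maximum number of posts, -d for the destination folder, --proxies for a proxy mapping) come from the project's README; double-check them against the version you have installed:

```shell
# Scrape at most 50 posts, with their metadata, into a custom folder.
instagram-scraper psg --media-metadata -m 50 -d ./psg_media

# Route the requests through proxies (JSON mapping, as documented in the README).
instagram-scraper psg --proxies '{"http": "http://<proxy>:8080", "https": "https://<proxy>:8080"}'
```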

3.- Some applications and Python scripts

3.1.- Tracking competitors

You might be interested in tracking what your competitors are posting on Instagram on a daily basis, to make sure that you do not miss any chance to connect with your followers, react to a promotion or post that a competitor has uploaded, or simply follow closely the strategy they use to engage their audience.

For this purpose, you could schedule an instagram-scraper command to be run daily and store the new publications or stories in a folder. If you would like to store all the images and/or videos in the same folder, you can simply run:

instagram-scraper <username> -u <your username> -p <your password> -t story --latest #For stories
instagram-scraper <username> --latest #Publication feeds

The --latest parameter will look for the most recent file in the destination folder and will only store the resources that were uploaded after that date. However, if you would like to have a specific folder for each day, you can run a Python script that, after the standard command instagram-scraper <username> -u <your username> -p <your password> -t story, checks whether each media resource was downloaded on that day and stores today's files in a folder dedicated to that day. Below you can find an example:

import os
import os.path
import time

directory = "psg"

def day_stamp(t=None):
    #time.ctime() returns e.g. "Mon Oct 12 07:28:20 2020"; we keep weekday, month, day and year.
    c = time.ctime(t) if t is not None else time.ctime()
    return c[0:10] + " " + c[-4:]

today = day_stamp()

for name in os.listdir(directory): #Here we list the files in the folder and get their modification date.
    path = os.path.join(directory, name)
    modified = day_stamp(os.path.getmtime(path))
    if modified != today: #We compare today's date with the file date and delete the file if they do not match.
        os.remove(path)
        print("removed", name)

os.rename(directory, directory + " " + today) #We rename the folder so it includes the date.
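If you save the script above as, say, daily_cleanup.py (a hypothetical name), the whole routine can be scheduled with cron so that it runs unattended every morning:

```shell
# Crontab entry: every day at 06:00, download the latest stories,
# then keep only today's files and date-stamp the folder.
0 6 * * * instagram-scraper psg -u <your username> -p <your password> -t story --latest && python daily_cleanup.py
```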

3.2.- Finding partners, influencers or sales opportunities

Something that can be very useful to find partners, influencers or sales opportunities is running a command to find posts that use a specific hashtag related to your sector, scraping the information from those posts, and finally scraping the profiles behind them to get the number of posts, followers and following users, plus the biography and any email address it contains.

First, we need to iterate through the JSON file obtained from scraping the tag, which contains all the posts using that hashtag. Just as a reminder, the command would be: instagram-scraper <tag> --media-metadata --tag --media-types none. One of the variables that these posts return is the owner ID, which we can use to associate each post with an Instagram account. The code below gets the owner IDs from the posts (if you would like to get the number of likes and comments, you would need to find the specific keys, iterate through the elements of the JSON file and store the variables of interest).

import json

with open(<your_JSON_file>) as json_file:
    data = json.load(json_file) #JSON file is loaded.

listowners = []
for x in range(len(data["GraphImages"])):
    listowners.append(data["GraphImages"][x]["owner"]["id"]) #We iterate through the JSON file and store the owner IDs.

listowners = list(dict.fromkeys(listowners)) #We remove duplicate owner IDs, just in case.

Now that we have a list with the owner IDs of the posts under a specific hashtag, we need to associate each owner ID with its Instagram account. For this, we can make use of https://i.instagram.com/api/v1/users/<Owner_ID>/info/. Translated into Python code, it would look like:

import json

from bs4 import BeautifulSoup
import cloudscraper

headers = {'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_3 like Mac OS X) AppleWebKit/603.3.8 (KHTML, like Gecko) Mobile/14G60 Instagram 12.0.0.16.90 (iPhone9,4; iOS 10_3_3; en_US; en-US; scale=2.61; gamut=wide; 1080x1920)'} #We need to use the Instagram app User-Agent to be able to access this URL.
parser = 'html.parser'
scraper = cloudscraper.create_scraper()

listaccounts = []
for x in listowners:
    html = scraper.get("https://i.instagram.com/api/v1/users/" + str(x) + "/info/", headers=headers) #We request the URL.
    soup = BeautifulSoup(html.text, parser).get_text() #We extract the text returned by the request.
    jsondata = json.loads(soup) #We parse that text as JSON.
    listaccounts.append(jsondata["user"]["username"]) #We append the usernames.
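Looping over many owner IDs in a row can run into Instagram's rate limiting, so the requests in the loop above may occasionally fail. A small generic retry helper (my own sketch, not part of instagram-scraper) can wrap each request:

```python
import time

def fetch_with_retry(fetch, retries=3, delay=1.0):
    """Call fetch() up to `retries` times, waiting longer after each failure."""
    for attempt in range(retries):
        try:
            return fetch()
        except Exception:
            if attempt == retries - 1:
                raise  # Give up after the last attempt.
            time.sleep(delay * (attempt + 1))  # Simple linear backoff between attempts.
```

Inside the loop you would then call fetch_with_retry(lambda: scraper.get(...)) instead of calling scraper.get directly.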

Finally, now that we have the usernames, we can scrape some information from these Instagram profiles by using the parameter “?__a=1” on the Instagram URL. In the code below we scrape the biography, followers, following, category and whether it is a business account, and we use a regex to extract an email address if one is present in the biography.

import re

html = scraper.get("https://www.instagram.com/<username>/?__a=1", headers=headers) #We request this URL, which returns the Instagram profile as structured data.
soup = BeautifulSoup(html.text, parser).get_text()
jsondata = json.loads(soup)

#Keys for each of the variables mentioned above
biography = jsondata["graphql"]["user"]["biography"]
externalurl = jsondata["graphql"]["user"]["external_url"]
followers = jsondata["graphql"]["user"]["edge_followed_by"]["count"]
following = jsondata["graphql"]["user"]["edge_follow"]["count"]
businessaccount = jsondata["graphql"]["user"]["is_business_account"]
category = jsondata["graphql"]["user"]["overall_category_name"]
emails = re.findall(r"[a-z0-9.\-+_]+@[a-z0-9.\-+_]+\.[a-z]{2,}", soup, re.I) #We match any TLD, not only .com.
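To see the email extraction in isolation, here is a quick standalone check on a made-up biography string, using a pattern that accepts any TLD:

```python
import re

EMAIL_RE = r"[a-z0-9.\-+_]+@[a-z0-9.\-+_]+\.[a-z]{2,}"

bio = "Travel blogger | collabs: hello.world@example.org | Madrid"  # Hypothetical biography text.
emails = re.findall(EMAIL_RE, bio, re.I)
print(emails)  # ['hello.world@example.org']
```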

Disclaimer: if you are going to use these email addresses to contact users or as an audience for Google Ads or Paid Social campaigns, you need to be careful, as the European Union has a special piece of legislation called the GDPR which regulates how personal information is collected and processed.

3.3.- Analyzing competitors’ best-performing posts

Learning from competitors and/or success stories can be a good exercise, as you will discover which creatives and formats are the most engaging and you can orient your strategy accordingly. As a reminder, the instagram-scraper command that returns the JSON data for the posts published by the Instagram account of your interest is: instagram-scraper <user_name> --media-metadata --media-types none.

In the example below, we will iterate through the JSON file, get the media URL (which includes the filename), the number of likes and the number of comments, and store everything in a CSV file where the analysis can be done more easily.

import json
import pandas as pd

with open('psg.json') as json_file:
    data = json.load(json_file) #We load the JSON file.

listposts = []
for x in range(len(data["GraphImages"])): #We iterate through the JSON file and get the variables through their keys.
    try:
        media = data["GraphImages"][x]["display_url"]
        likes = data["GraphImages"][x]["edge_media_preview_like"]["count"]
        comments = data["GraphImages"][x]["edge_media_to_comment"]["count"]
        listposts.append([media, likes, comments])
    except KeyError: #We skip posts that are missing any of these keys.
        continue

df = pd.DataFrame(listposts, columns=["Media", "Likes", "Comments"])
df.to_csv('TestJason.csv', index=False) #We store the list in a CSV file by using Pandas.
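Once the data is in a DataFrame, the "best performers" question can be answered directly in pandas. The sketch below uses a hypothetical sample in the same shape as the scraped data, and defines engagement simply as likes plus comments (my own simplification):

```python
import pandas as pd

# Hypothetical sample in the same shape as the scraped data.
df = pd.DataFrame(
    [["url1", 120, 4], ["url2", 90, 30], ["url3", 300, 12]],
    columns=["Media", "Likes", "Comments"],
)

df["Engagement"] = df["Likes"] + df["Comments"]
top = df.sort_values("Engagement", ascending=False)  # Best-performing posts first.
print(top["Media"].tolist())  # ['url3', 'url1', 'url2']
```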