In today’s post I am going to show you how to easily scrape the posts published on a public Facebook page, how to perform a sentiment analysis based on sentiment magnitude and sentiment attitude using the Google NLP API, and how to download this data into an Excel file.

Finally, I am going to explain how you can calculate the correlation between different variables, so that you can measure the impact of sentiment attitude or sentiment magnitude on, for instance, “Likes”. From my point of view this can be very useful, as it lets you understand which tone of voice and which types of posts work best in a given community.

However, it is important to know how to interpret this data correctly, as:

  1. Correlation needs statistical significance: for this reason we will also calculate the p-value.
  2. Correlation does not mean causation: there could be many other factors, not considered here, causing such an impact.

In order to scrape the Facebook posts, perform the sentiment analysis, download the data into an Excel file and calculate the correlations, we will use the following Python modules:

  1. Facebook-scraper: to scrape the posts on a Facebook page.
  2. Google NLP API: to do the sentiment analysis in terms of magnitude and attitude.
  3. Pandas: to download the data into an Excel file.
  4. Scipy: to calculate the Pearson correlation between variables.

Having said this, let’s get started!

1.- Scraping posts on Facebook

Scraping posts from Facebook pages with the facebook-scraper Python module is very easy. You only need to install the module and run the code below:

from facebook_scraper import get_posts

listposts = []
for post in get_posts("anyfacebookpage", pages=2):
    print(post['text'][:50])  # print the first 50 characters of each post
    listposts.append(post)    # store the full post dictionary for later analysis

You will need to replace “anyfacebookpage” with the name of the page you are interested in scraping and set the number of pages you would like to scrape (in my example I only use 2). This piece of code will print the beginning of each post and append each post, as a dictionary of its metrics, to a list.

The metrics that each dictionary comprises are:

  1. Post ID: the key for this metric is “post_id“.
  2. Text: the key for this metric is “text“.
  3. Publication Time: the key for this metric is “time“.
  4. Image: the key for this metric is “image” and it will return the link of the main image of the post.
  5. Video: the key for this metric is “video” and it will return the link of the main video of the post.
  6. Video Thumbnail: the key for this metric is “video_thumbnail“.
  7. Video ID: the key for this metric is “video_id“.
  8. Number of likes: the key for this metric is “likes”.
  9. Number of comments: the key for this metric is “comments”.
  10. Number of shares: the key for this metric is “shares”.
  11. Post URL: the key for this metric is “post_url”.
  12. Link: the key for this metric is “link”.
  13. Images: if there are several images, this variable will store a list with all the images links. The key for this metric is “images”.
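As a quick illustration, this is how you could read some of these keys for the first scraped post (key names as listed above; fields that are not present in a given post may be None):

first = listposts[0]
print(first["post_id"])
print(first["time"])  # publication time as a datetime object
print(first["likes"], first["comments"], first["shares"])
print(first["post_url"])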

2.- Doing the sentiment analysis

After scraping as many posts as you wish, we will perform the sentiment analysis with the Google NLP API. In order to use the Google NLP API, you will first need to create a project, enable the Natural Language service and get your key. You can find some information about how to set up your project on this link.

Once you have set up the NLP API project correctly, you can start using its different modules. With the code below we will perform the sentiment analysis for each of the publications that were scraped from the Facebook page and add to each post’s dictionary two new keys with its magnitude and attitude scores.

import os
from google.cloud import language_v1
from google.cloud.language_v1 import enums, types

# Point the client at your service account key file
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = "yourNLPAPIkey"
client = language_v1.LanguageServiceClient()

for post in listposts:

    try:
        # Build a plain-text document from the post and request its sentiment
        document = types.Document(content=post["text"], type=enums.Document.Type.PLAIN_TEXT)
        sentiment = client.analyze_sentiment(document=document).document_sentiment
        post["score"] = round(sentiment.score, 4)          # attitude: -1 (negative) to 1 (positive)
        post["magnitude"] = round(sentiment.magnitude, 4)  # emotional strength: 0 and upwards

    except Exception as e:
        # If the analysis fails (e.g. an unsupported language), default both scores to 0
        print(e)
        post["score"] = 0
        post["magnitude"] = 0

You will need to replace “yourNLPAPIkey” with the path where your NLP API key file is stored. Now that we have the sentiment and magnitude scores, let’s download all the data into an Excel file with Pandas.

import pandas as pd

# Turn the list of post dictionaries into a DataFrame and export it to Excel
df = pd.DataFrame(listposts)
df.to_excel('<filename>.xlsx', header=True, index=False)

You will only need to substitute <filename> with the name that you want to give to your Excel file. My Excel file with 18 posts scraped from the FC Barcelona official Facebook page looks like this:

For some of the posts, the NLP API has not been able to calculate the magnitude and attitude scores because they were written in Catalan, and unfortunately its model does not support the Catalan language yet.
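By the way, if you would like to double-check the exported file without opening Excel, you can read it back with Pandas (assuming you named the file facebookposts.xlsx):

df = pd.read_excel('facebookposts.xlsx')
print(df.head())  # preview the first five rows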

When interpreting and analyzing the magnitude and attitude scores, it is important to know that:

  • Magnitude score measures how EMOTIONAL the text is. Scores between 0 and 1 convey little or no emotion, scores between 1 and 2 convey low emotion, and scores higher than 2 convey high emotion.
  • Attitude score measures whether a text is Positive, Negative or Neutral. Results under 0 convey a negative attitude and results over 0 convey a positive attitude. Obviously, the closer to 1 or -1 the score is, the stronger the positive or negative attitude, whereas the closer to 0 the score is, the more neutral the attitude.
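If you find it useful, here is a small helper that turns a pair of scores into the labels described above. The ±0.25 neutral band is my own arbitrary choice, not something defined by the API, so adjust it to taste:

def interpret_sentiment(score, magnitude):
    # Attitude: negative below 0, positive above 0; values close to 0 are neutral
    if score <= -0.25:
        attitude = "negative"
    elif score >= 0.25:
        attitude = "positive"
    else:
        attitude = "neutral"
    # Magnitude: 0-1 little or no emotion, 1-2 low emotion, above 2 high emotion
    if magnitude <= 1:
        emotion = "little or no emotion"
    elif magnitude <= 2:
        emotion = "low emotion"
    else:
        emotion = "high emotion"
    return attitude, emotion

print(interpret_sentiment(0.8, 2.5))  # ('positive', 'high emotion')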

3.- Calculating correlations

Finally, to make our analysis more complete and understand the relationships between variables, we will calculate the Pearson correlations and p-values for the different metrics.

This can be an interesting analysis, as it lets you understand whether, for instance, the community you are analyzing responds better when the published post is very emotional or when it is emotionally neutral, and whether it prefers posts with a negative or a positive attitude.

Does it make sense to think that users on Facebook respond better to negative news than to positive news, or that users interact much more with a brand when the post is highly emotional? These are the sorts of hypotheses you can test with this technique.

To run our example, we will create lists with the likes, magnitude scores and attitude scores with the code below, and then calculate their correlations and p-values:

from scipy import stats

listlikes = []
listscore = []
listmagnitude = []

# Collect the metrics we want to correlate into parallel lists
for post in listposts:
    listlikes.append(post["likes"])
    listscore.append(post["score"])
    listmagnitude.append(post["magnitude"])

# Pearson correlation and p-value for each pair of variables
correlation, p_value = stats.pearsonr(listmagnitude, listlikes)          # magnitude vs. likes
correlation_score, p_value_score = stats.pearsonr(listscore, listlikes)  # attitude vs. likes

The correlation between magnitude scores and likes for the FC Barcelona posts is 0.006, and between attitude scores and likes it is 0.10. This means that how emotional a post is has almost no impact on how it performs, while a positive attitude has a slightly positive impact on the number of likes.

However, in both cases the p-value is very high, 0.67 and 0.97, so at least with the small sample of FC Barcelona posts that I have scraped there is no statistical significance, and the correlation could be due to random chance. The lower the p-value is, the higher the statistical significance.
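For instance, a common (though somewhat arbitrary) convention is to treat p < 0.05 as statistically significant; a quick check for the magnitude correlation could look like this:

alpha = 0.05  # conventional significance threshold (a convention, not a hard rule)
if p_value < alpha:
    print(f"Significant: correlation = {correlation:.3f} (p = {p_value:.3f})")
else:
    print(f"Not significant (p = {p_value:.3f}): the correlation may be due to random chance")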