The previous post covered extracting YouTube channel data with the YouTube Data API v3, including the channel title, channel ID, channel videos, video titles, comments, likes, and view counts.
Continuing from there, this post looks at the comments viewers leave on particular videos or channels. Once you have extracted the comments, whether from a single video, from videos in a particular category, or from a whole channel, you can analyze them further using the likes on each comment and sentiment analysis of the comment text.
This helps you understand how people react to a channel or its videos, and gives a rough signal of community acceptance. If you want to go deeper, you can also look for a relationship between comments and views, and estimate how engaged people are with your videos.
Getting Started with YouTube API
The last post covered activating the YouTube API from the Google console. If you missed it, read that post first to generate your API key. The difference this time is that the earlier post covered only API activation, without setting up the OAuth 2.0 consent screen.
OAuth 2.0 is used for authentication and authorization when the API makes changes on a channel, e.g. replying to a comment or deleting/uploading a video on YouTube directly from code.
Setting up OAuth 2.0
Some of the steps are the same as for activating the API key in the previous post; for the new steps, follow the instructions below. You can also refer to this or to the official documentation.
While you are on the APIs & Services dashboard enabling the YouTube API, first click on the OAuth consent screen, as shown in the following figure.
Google Developer Dashboard
Clicking it takes you to the following page. Fill in the application name and the email address linked to your Google account, then save the details.
OAuth 2.0 Consent Screen
The next step is to create credentials: select Create Credentials, then OAuth client ID.
Credentials Page - click on OAuth client ID
Select Desktop app as the Application type from the dropdown, enter the project name, and click Create.
Create an OAuth client ID
The final step is to download the JSON file by clicking the download icon in the OAuth 2.0 client ID section, as in the figure below. Rename it to client_secret.json and save it in the same directory as your code.
Download - OAuth Client ID JSON file
Extract YouTube Comments
Link Google Colab with Google Drive.
Mount Google Drive in Google Colab.
Install the required libraries. Last time you used the Google API Python client to access YouTube data and the pandas library for the analysis. This time you also need some additional libraries, including demoji and langdetect, both of which are used later in this post.
Follow the code below for installation.
Code illustration: 1
Import the installed packages
Code illustration: 2
Restrict Access and set YouTube Parameters
First, specify the path to the credentials file, named “client_secret.json”. Then restrict the API's access to YouTube only by specifying the scopes, and set the YouTube parameters.
Code illustration: 3
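As a rough idea of what this cell can look like, here is a minimal sketch. The variable names and the exact scope are assumptions, not taken from the original screenshot; youtube.force-ssl is one of the YouTube scopes that permits reading and managing comments.

```python
# Path to the OAuth client file downloaded from the Google console
# (assumed to sit next to the notebook or script).
CLIENT_SECRETS_FILE = "client_secret.json"

# Restrict the token to YouTube only. The specific scope is an assumption:
# youtube.force-ssl allows viewing and managing comments over HTTPS.
SCOPES = ["https://www.googleapis.com/auth/youtube.force-ssl"]

# Parameters that identify the API to build a client for.
API_SERVICE_NAME = "youtube"
API_VERSION = "v3"
```

Keeping these values as constants at the top of the notebook makes it easy to narrow or change the scope later without touching the rest of the code.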
Build the service and get the access token
Follow the code below to build the service that lets you use the API to extract YouTube comments. After you run the cell, click the URL in the cell output, get the access token, and continue.
Code illustration: 4
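A hedged sketch of building the service is shown here. The function name build_youtube_service is hypothetical. The copy-and-paste flow described above matches the older run_console helper, which, as far as I am aware, recent google-auth-oauthlib releases no longer include, so this sketch assumes run_local_server instead. That helper opens a browser on the machine running the code, so it suits a local setup better than a hosted notebook.

```python
def build_youtube_service(client_secrets_file, scopes):
    """Run the OAuth 2.0 installed-app flow and return a YouTube API client.

    Third-party imports are kept inside the function so the module can be
    loaded even where the Google libraries are not installed yet.
    """
    from google_auth_oauthlib.flow import InstalledAppFlow
    from googleapiclient.discovery import build

    # Load the client ID and secret and limit the token to the given scopes.
    flow = InstalledAppFlow.from_client_secrets_file(client_secrets_file, scopes)

    # Assumption: run_local_server replaces the older console-based flow.
    credentials = flow.run_local_server(port=0)

    # Build a client for the YouTube Data API v3 with those credentials.
    return build("youtube", "v3", credentials=credentials)
```

A call such as build_youtube_service("client_secret.json", scopes) returns the object on which the search and commentThreads requests are made.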
Perform YouTube Search on query
Set the query for the YouTube search by providing the title of the video whose comments you want to extract.
Then run the code below to perform the search. The response, which contains the snippet of each matching video, is stored in a dictionary once the list request is executed, as in the figure below.
Code illustration: 5
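The search step can be sketched roughly as follows. The helper name search_videos and its max_results default are assumptions; youtube stands for the service object built in the previous step.

```python
def search_videos(youtube, query, max_results=5):
    """Search YouTube for videos matching `query` and return the raw response.

    `youtube` is a client for the YouTube Data API v3. The response is a
    dictionary whose "items" list holds one entry per matching video.
    """
    request = youtube.search().list(
        part="snippet",          # return the basic details of each result
        q=query,                 # the search text, e.g. a video title
        type="video",            # skip channels and playlists
        maxResults=max_results,
    )
    return request.execute()
```

Limiting type to "video" means every item carries a videoId, which the next step relies on.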
In the screen below you can check how the data is extracted. It contains only the video's basic details, such as the video ID, the channel, and the description.
Code illustration: 6
Extract Video Details - videoId, channelId, title, description
Run the cell below to extract the video details from the snippet data. Since you only need the details of one video, just take the first entry from the list.
Code illustration: 7, Code Credit
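For illustration, picking the first result out of a search response might look like this. The function name first_video_details is hypothetical, and the sketch assumes the search was limited to videos, so that each item has an id.videoId field.

```python
def first_video_details(search_response):
    """Return the details of the first video in a search response, or None."""
    items = search_response.get("items", [])
    if not items:
        return None  # the search matched nothing

    item = items[0]
    snippet = item["snippet"]
    return {
        "videoId": item["id"]["videoId"],
        "channelId": snippet["channelId"],
        "title": snippet["title"],
        "description": snippet["description"],
    }
```

Returning None for an empty result makes it easy to stop early instead of failing with an IndexError later on.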
Extract Video details - comments
Now let’s move to the next step: collecting the comments. To extract the comments from a video, use the commentThreads resource of the YouTube Data API v3.
According to the documentation, a single request returns at most 100 comments per page, so to extract every comment you need to follow the next page token, just as you did in the last post to extract all the videos.
Apart from the comment text, you will also save the comment ID, the number of replies to each comment, the number of likes on each comment, and other video-related details in their respective lists, as follows.
Code illustration: 8
Code illustration: 9
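The paging logic described above can be sketched as follows. The function name fetch_comment_threads and the keys of the returned rows are assumptions; the field names read from the response (topLevelComment, textDisplay, likeCount, totalReplyCount, nextPageToken) are those of the commentThreads resource.

```python
def fetch_comment_threads(youtube, video_id, page_size=100):
    """Collect every top-level comment of a video, following nextPageToken."""
    rows = []
    page_token = None
    while True:
        params = {
            "part": "snippet",
            "videoId": video_id,
            "maxResults": page_size,   # 100 is the largest page allowed
            "textFormat": "plainText",
        }
        if page_token:
            params["pageToken"] = page_token  # continue from the previous page

        response = youtube.commentThreads().list(**params).execute()

        for item in response.get("items", []):
            top = item["snippet"]["topLevelComment"]
            rows.append({
                "video_id": video_id,
                "comment_id": top["id"],
                "comment": top["snippet"]["textDisplay"],
                "like_count": top["snippet"]["likeCount"],
                "reply_count": item["snippet"]["totalReplyCount"],
            })

        # The last page has no nextPageToken, which ends the loop.
        page_token = response.get("nextPageToken")
        if not page_token:
            break
    return rows
```

Collecting one dictionary per comment, rather than several parallel lists, keeps the fields of a comment together and converts directly into a data frame.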
Store comments in CSV file
Now follow the code below to create a data frame.
Code illustration: 10
In the cell output below, observe the duplicate comments for the single video.
Code illustration: 11
Follow the code below to analyze the duplicate comments. You can see that the majority of the comments are repeated, so you can drop the repeats using the drop_duplicates function and create a new data frame named unique_df.
Code illustration: 12
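As a small, self-contained illustration of this step, the sketch below builds a data frame from made-up rows (the sample data and column names are assumptions), counts the repeated rows, and drops them.

```python
import pandas as pd

# Made-up rows standing in for the extracted comments.
rows = [
    {"video_id": "v1", "comment_id": "c1", "comment": "Great video"},
    {"video_id": "v1", "comment_id": "c1", "comment": "Great video"},
    {"video_id": "v1", "comment_id": "c2", "comment": "Thanks"},
]
df = pd.DataFrame(rows)

# duplicated() marks every repeat of an earlier row as True.
duplicate_count = int(df.duplicated().sum())

# Keep the first occurrence of each row and renumber the index.
unique_df = df.drop_duplicates().reset_index(drop=True)
```

Writing the result out is then a single call such as unique_df.to_csv(path, index=False), where path is whatever file name you choose.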
Finally, write the data to a CSV file.
Code illustration: 13
Now that you have extracted all the comments from the YouTube video, it's time to clean them, because they contain many redundant characters that add nothing to the final goal, i.e. sentiment analysis.
Use the demoji library to remove emojis from the comment text.
First read the CSV file with pandas, then use the demoji library to remove the emojis from all the comments and create a new feature, clean_comments, as follows:
Code illustration: 14
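A minimal sketch of the emoji removal, assuming the demoji package is installed. The helper names are hypothetical; demoji.replace is the library call that substitutes every emoji with the given string. Older demoji releases may additionally need a one-time download of the emoji codes before the first call.

```python
def remove_emojis(text):
    """Return `text` with all emojis stripped out."""
    import demoji  # third-party; imported here so the helper loads without it

    # Replace each emoji with an empty string, then trim leftover spaces.
    return demoji.replace(text, "").strip()


def clean_comment_list(comments):
    """Apply remove_emojis to every comment in a list."""
    return [remove_emojis(comment) for comment in comments]
```

With a data frame, the same helper can be applied to the comment column to fill the clean_comments feature.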
After removing the emojis, keep only the English comments so that further analysis stays simple. To detect the language of each comment, you can use the langdetect library. Run the cell below to detect the language and create a new feature, language.
Note: Some of the extracted comments contain only numbers, for which no language can be detected. Use a try/except statement to handle this error, as shown below.
Code illustration: 15
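The try/except pattern from the note can be sketched like this. The function names, the detector parameter, and the "unknown" fallback label are assumptions made for illustration; by default the sketch uses langdetect's detect function, which raises an exception when the text has no usable features, such as a comment made only of digits.

```python
def detect_language(text, detector=None, fallback="unknown"):
    """Return the language code of `text`, or `fallback` if detection fails."""
    if detector is None:
        from langdetect import detect as detector  # third-party default

    try:
        return detector(text)
    except Exception:
        # Broad catch kept deliberately simple for this sketch: langdetect
        # raises its own exception type on text without letters.
        return fallback


def keep_english(comments, detector=None):
    """Return only the comments detected as English ('en')."""
    return [c for c in comments if detect_language(c, detector) == "en"]
```

Labeling failures as "unknown" instead of skipping them keeps the language column the same length as the comment column.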
Take a look at the output below to see how the languages are detected.
Code illustration: 16
Now extract only the English comments, i.e. those labeled ‘en’. Run the cell below and write the result to a separate CSV file, as follows.
Code illustration: 17
Remove Special Characters - using RegEx
So far you have removed the emojis and kept only the English comments, so the data now resembles a standard text dataset that contains only text and nothing redundant.
Next, perform some common pre-processing steps, including removing brackets and special characters from the comments.
This step uses regex. You define an expression that matches all brackets and special characters, i.e. everything other than numbers and letters, and then use the re module to replace the matches with a space or some other character.
Follow the code below to create a new column that stores the clean comments without brackets and special characters.
Code illustration: 18
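One possible form of that expression is sketched below. The function name and the choice to collapse repeated spaces afterwards are assumptions, not necessarily what the original cell does.

```python
import re

# Anything that is not an ASCII letter, a digit, or whitespace.
_NON_ALNUM = re.compile(r"[^A-Za-z0-9\s]")
# Runs of one or more whitespace characters.
_WHITESPACE = re.compile(r"\s+")


def strip_special_characters(text):
    """Replace brackets and special characters with spaces, then tidy up."""
    no_special = _NON_ALNUM.sub(" ", text)
    # Collapse the gaps left behind into single spaces.
    return _WHITESPACE.sub(" ", no_special).strip()
```

For example, strip_special_characters("Nice (really!) video #1") returns "Nice really video 1".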
Prepare the Dataset - into CSV file
Now that all the preprocessing is done, make a new data frame containing only the Video ID, Comment ID, and Comments, i.e. the regular comment column from the previous step, and write it to a new CSV file.
Code illustration: 19
Note: The next part will cover the Sentiment Analysis of YouTube comments.
For the complete code, see this GitHub repository.
The YouTube Data API v3 was used to extract the comments from particular YouTube videos, with the API's access restricted to YouTube only by specifying the scope.
A service was built to obtain the access token needed to extract comments from YouTube videos/channels through the API's commentThreads resource.
The extracted comments were then pre-processed: emojis were removed with the Python demoji module, only English comments were kept using the langdetect module, and special characters other than numbers and letters were removed with regex.