Google provides many APIs, each with applications in various fields, that make work easier when developing mobile applications, websites, and much more. One such API is Google's YouTube Data API v3. Its features include:
Searching for videos,
Retrieving information about videos on YouTube, either for a whole channel or for a particular video, e.g. likes, dislikes, and comments,
Starting a YouTube video directly from the application.
The YouTube API has other applications beyond those listed above. Here, you are going to use one of them in the field of data science, using Python.
Getting Started with YouTube API
Activate the API from the Google Console
Create a New Project:
Log in to Google. If you don't have an account, create one and then log in.
Visit the Google Developers dashboard and create a new project: click “Select a project” at the top of the page, then create a new project in the window that opens.
Once done, you will be redirected automatically to the Google APIs dashboard.
The next step is to activate the YouTube API. To do so, navigate to APIs & Services in the side panel.
Then click Enable APIs & Services at the top of the page.
Search for YouTube and select YouTube Data API v3.
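On the YouTube Data API v3 page, click Enable to activate the API for your project.
Next, create an API key: open Credentials in the side panel, click Create Credentials, and select API key. Copy the key that is displayed; you will need it in the code.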
Extraction of Data from YouTube Channels
Link Google Colab with Google Drive
Google Colab provides the same features as a Jupyter notebook on a local machine. The data you work with, such as datasets, images, and videos, is stored in Google Drive, just as it would be stored on a local machine.
It also provides a free GPU with 12 GB of RAM. At the time of writing, it supported only Python 2.7 and Python 3.6; the available runtimes have changed since then.
Mount Google Drive on Google Colab
Running the code below in a cell mounts Google Drive in Google Colab once you enter the authorization code obtained from the URL shown in the output.
Code illustration: 1
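A minimal sketch of the mounting step, assuming Colab's own google.colab.drive module and the conventional mount point /content/drive. The wrapper mount_drive_if_colab is a hypothetical helper; because google.colab only exists inside a Colab runtime, the wrapper simply returns False anywhere else.

```python
def mount_drive_if_colab(mount_point="/content/drive"):
    """Mount Google Drive when running inside Google Colab.

    mount_drive_if_colab is a hypothetical helper; google.colab.drive.mount
    is provided by Colab and is not available outside a Colab runtime.
    """
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        # Local Jupyter or another editor: there is no Drive to mount.
        return False
    # Asks for authorization, then exposes Drive under mount_point.
    drive.mount(mount_point)
    return True
```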
After this step, you can access any folder or file in your Google Drive from Google Colab. Now create a new folder in Google Drive for this project and run the cell below.
Code illustration: 2
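Creating the project folder and making it the working directory could look like the sketch below. The folder name youtube_analysis, the PROJECT_DIR path, and the helper use_project_folder are hypothetical; the exact path under the mount point depends on how Drive appears in your runtime.

```python
import os

# Hypothetical project folder inside the mounted Drive; adjust to your layout.
PROJECT_DIR = "/content/drive/MyDrive/youtube_analysis"


def use_project_folder(path=PROJECT_DIR):
    """Create the project folder if needed and switch into it."""
    os.makedirs(path, exist_ok=True)  # no error if the folder already exists
    os.chdir(path)                    # files saved later (e.g. the CSV) land here
    return os.getcwd()
```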
Install the required libraries
In Google Colab, these libraries come preinstalled, but in a Jupyter notebook or any other editor you have to install them with pip.
You will use the Google API Python client library (google-api-python-client) to access YouTube data and the pandas library to perform exploratory data analysis on the extracted data.
To install them on Windows, run the commands below in the Command Prompt.
Code illustration: 3
Code illustration: 4
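After installing, a quick way to confirm that both packages are importable is a check like the one below. check_installed is a hypothetical helper, not part of either library; note that the pip package google-api-python-client is imported under the module name googleapiclient.

```python
import importlib.util

# Import names of the packages used in this tutorial.
REQUIRED_MODULES = ["googleapiclient", "pandas"]


def check_installed(modules=REQUIRED_MODULES):
    """Return {module_name: True/False} depending on whether it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


missing = [name for name, ok in check_installed().items() if not ok]
if missing:
    print("Missing packages, install them with pip:", ", ".join(missing))
```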
Import the packages
Now it's time to start coding to get insights from YouTube data. The first step is to import the required libraries mentioned above, i.e. pandas and the Google API client library.
Code illustration: 5
Set the YouTube Parameters
You generated the YouTube API key on the Google Console page earlier. Now it's time to use it: set the parameters needed in later steps, including the YouTube API key and the API version.
Code illustration: 6
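One way to set these parameters is sketched below. The service name "youtube" and version "v3" are what googleapiclient.discovery.build expects for this API, and developerKey passes the API key. Reading the key from an environment variable named YOUTUBE_API_KEY is an assumption made here to keep the key out of the notebook; get_api_key and build_youtube_client are hypothetical helpers.

```python
import os

API_SERVICE_NAME = "youtube"
API_VERSION = "v3"


def get_api_key(env_var="YOUTUBE_API_KEY"):
    """Read the API key from a (hypothetically named) environment variable."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key


def build_youtube_client(api_key):
    """Return a YouTube Data API v3 client that authenticates with an API key."""
    # Imported here so the helpers above work even without the package installed.
    from googleapiclient.discovery import build
    return build(API_SERVICE_NAME, API_VERSION, developerKey=api_key)


# Usage (needs network access and a valid key):
# youtube = build_youtube_client(get_api_key())
```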
The YouTube Data API provides many resources for retrieving all kinds of YouTube data about particular channels, videos, playlists, and more.
The API also supports inserting, updating, and deleting data on YouTube, but those operations require OAuth authorization. Here you are only going to retrieve data, which needs no authorization; the API key is enough.
The resources used in the following steps include:
Search: used to find a channel by passing the channel name as the search query. The channel it returns is then used to retrieve the channel's statistics and uploaded videos.
Channel: contains information about YouTube channels, including the total number of subscribers, views, and uploaded videos, along with other details such as the channel's playlists.
Run the cell below to perform a YouTube search through the API and save the results in a list.
Request the snippet part for the search, since it contains the basic information about each channel.
Code illustration: 7
The output of the cell above shows the details of all the channels matching the given name. The API returns each search response as a dictionary, and the search results are stored in a list.
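A sketch of the search call, assuming youtube is a client created with googleapiclient.discovery.build("youtube", "v3", developerKey=...). The part, q, type, and maxResults arguments are parameters of the API's search resource; restricting type to "channel" and asking for 5 results are assumptions, and search_channels is a hypothetical helper.

```python
def search_channels(youtube, channel_name, max_results=5):
    """Search YouTube for channels matching channel_name.

    Returns the list of search results; each is a dict with "id" and "snippet".
    """
    request = youtube.search().list(
        part="snippet",          # basic info: title, description, thumbnails, ...
        q=channel_name,          # the search query
        type="channel",          # only channels, not videos or playlists
        maxResults=max_results,
    )
    response = request.execute()  # the JSON response, parsed into a dict
    return response.get("items", [])
```

Because the client is passed in, the function can be exercised with a stand-in object instead of a live connection.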
Executing the following code retrieves the basic information about all of these channels:
Code illustration: 8
The output shows four other channels associated with the given one. For each channel, information such as the channel ID, channel title, description, thumbnails, and publication date and time has been retrieved; all of it is stored in the snippet.
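Pulling the snippet fields out of each search result can be sketched as below. The field names (id.channelId, snippet.title, snippet.description, snippet.publishedAt) follow the search response format; summarize_channels and the output key names are hypothetical.

```python
def summarize_channels(search_items):
    """Flatten search results into simple dicts of the most useful snippet fields."""
    rows = []
    for item in search_items:
        snippet = item["snippet"]
        rows.append({
            "channel_id": item["id"]["channelId"],
            "title": snippet.get("title"),
            "description": snippet.get("description"),
            "published_at": snippet.get("publishedAt"),  # ISO 8601 timestamp string
        })
    return rows


# Usage, given the list returned by the search:
# for row in summarize_channels(items):
#     print(row["title"], row["channel_id"], row["published_at"])
```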
Next, find the channel ID of the first channel in the list:
Code illustration: 9
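For a channel search result, the ID sits under item["id"]["channelId"], so picking the first one can be as simple as the sketch below. first_channel_id is a hypothetical helper that fails loudly when the search returned nothing.

```python
def first_channel_id(search_items):
    """Return the channel ID of the first search result."""
    if not search_items:
        raise ValueError("The search returned no channels")
    # Channel results carry their ID under item["id"]["channelId"].
    return search_items[0]["id"]["channelId"]
```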
Statistics of the Channel
Now use the channel ID to get the channel's statistics, such as its total subscribers, views, and uploaded videos.
For this step, use the channels resource of the YouTube Data API v3, passing the channel ID as a parameter.
Code illustration: 10
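The channels call with part="statistics" can be sketched as follows, assuming the same youtube client as before. The API returns counts as strings, so the sketch converts them to integers and uses None when a field is absent; get_channel_statistics and its output keys are hypothetical.

```python
def get_channel_statistics(youtube, channel_id):
    """Return subscriber, view and video counts for one channel."""
    response = youtube.channels().list(part="statistics", id=channel_id).execute()
    items = response.get("items", [])
    if not items:
        raise ValueError(f"No channel found with ID {channel_id!r}")
    stats = items[0]["statistics"]

    def to_int(key):
        # Counts arrive as strings; use None if a field is missing.
        return int(stats[key]) if key in stats else None

    return {
        "subscribers": to_int("subscriberCount"),
        "views": to_int("viewCount"),
        "videos": to_int("videoCount"),
    }
```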
ContentDetails of the Channel
In this step, you will request the contentDetails part of the channels resource to retrieve information about the channel's content, including its uploads playlist, i.e. the ID of the playlist with which you can find every video uploaded to the channel since its creation.
Code illustration: 11
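A sketch of fetching the uploads playlist ID, assuming the same youtube client. The path contentDetails.relatedPlaylists.uploads follows the channels response format; get_uploads_playlist_id is a hypothetical helper.

```python
def get_uploads_playlist_id(youtube, channel_id):
    """Return the ID of the playlist that holds all of a channel's uploads."""
    response = youtube.channels().list(part="contentDetails", id=channel_id).execute()
    items = response.get("items", [])
    if not items:
        raise ValueError(f"No channel found with ID {channel_id!r}")
    # relatedPlaylists.uploads is the channel's "uploaded videos" playlist.
    return items[0]["contentDetails"]["relatedPlaylists"]["uploads"]
```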
Next, use another resource, playlistItems, to retrieve the items of the uploads playlist, i.e. all the uploaded videos. It has one limitation: a single request returns at most 50 videos. To get all the videos, pass the nextPageToken from each response as the pageToken parameter of the next request, which retrieves the next page of results.
The code below extracts all the videos with their details and saves them in a list:
Code illustration: 12
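The paging loop described above can be sketched as follows, assuming the same youtube client. playlistItems().list with playlistId, maxResults, and pageToken are parameters of the API; passing pageToken=None on the first request relies on the client library dropping None-valued arguments. get_all_playlist_items is a hypothetical helper.

```python
def get_all_playlist_items(youtube, playlist_id):
    """Collect every item of a playlist by following nextPageToken."""
    items = []
    page_token = None
    while True:
        response = youtube.playlistItems().list(
            part="snippet",
            playlistId=playlist_id,
            maxResults=50,         # the per-request maximum
            pageToken=page_token,  # None on the first request
        ).execute()
        items.extend(response.get("items", []))
        page_token = response.get("nextPageToken")
        if page_token is None:     # no further pages
            break
    return items
```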
The cell below shows the data retrieved from the playlist, which contains each video's ID and title.
Code illustration: 13
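Extracting each video's ID and title from the playlist items might look like the sketch below. The fields snippet.resourceId.videoId and snippet.title follow the playlistItems response format; building the watch URL from the ID is an addition for convenience, and extract_videos is a hypothetical helper.

```python
def extract_videos(playlist_items):
    """Return a list of {"video_id", "title", "url"} dicts from playlist items."""
    videos = []
    for item in playlist_items:
        snippet = item["snippet"]
        video_id = snippet["resourceId"]["videoId"]
        videos.append({
            "video_id": video_id,
            "title": snippet["title"],
            "url": f"https://www.youtube.com/watch?v={video_id}",
        })
    return videos
```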
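Next, retrieve the video IDs; the following cell then retrieves the statistics of all the videos, including likes/dislikes, comments, and views. Note that since December 2021 the API no longer returns public dislike counts, so dislike figures are only available in data collected before that change.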
Code illustration: 14
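A sketch of fetching video statistics in batches, assuming the same youtube client. videos().list accepts a comma-separated list of IDs, and the sketch keeps each request to at most 50 IDs. Dislikes are left out because public dislike counts are no longer returned. get_video_statistics and its output keys are hypothetical.

```python
def get_video_statistics(youtube, video_ids, batch_size=50):
    """Return {video_id: {"views": ..., "likes": ..., "comments": ...}}."""
    fields = (("views", "viewCount"), ("likes", "likeCount"), ("comments", "commentCount"))
    results = {}
    for start in range(0, len(video_ids), batch_size):
        batch = video_ids[start:start + batch_size]  # at most 50 IDs per request
        response = youtube.videos().list(part="statistics", id=",".join(batch)).execute()
        for item in response.get("items", []):
            stats = item["statistics"]
            # Counts arrive as strings and may be absent (e.g. comments disabled).
            results[item["id"]] = {
                name: int(stats[key]) if key in stats else None for name, key in fields
            }
    return results
```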
Now retrieve the content details of all the videos, store them in lists, and then save everything to disk as a CSV file.
Code illustration: 15
Code illustration: 16
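Combining the per-video information into one table and writing it out with pandas could look like this. Content details such as each video's ISO 8601 duration can be requested in the same videos().list call by asking for part="statistics,contentDetails". The file name youtube_videos.csv and the helper save_videos_csv are hypothetical.

```python
import pandas as pd


def save_videos_csv(videos, stats, path="youtube_videos.csv"):
    """Merge video info with statistics and write one row per video to CSV."""
    rows = []
    for video in videos:
        row = dict(video)                             # video_id, title, url, ...
        row.update(stats.get(video["video_id"], {}))  # views, likes, comments
        rows.append(row)
    df = pd.DataFrame(rows)
    df.to_csv(path, index=False)  # index=False: no extra index column in the file
    return df
```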
Analyzing the Extracted YouTube Data
So far, you have extracted information about all the videos and saved it in a single CSV file using pandas. Now it's time to get some insights from the dataset.
For that, you can use the Python library pandas, which provides various functions for getting insights from a dataset, such as:
Total number of videos,
Counts of unique values, using the value_counts() method,
Most liked video, most disliked video, most commented video,
Most viewed video,
Video with the maximum number of comments, likes, and dislikes,
Maximum number of likes, dislikes, and comments.
First, read the CSV file from disk:
Code illustration: 17
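Reading the file back and counting the videos can be sketched with pd.read_csv, since each row holds one video. The default file name and the helper load_videos are hypothetical.

```python
import pandas as pd


def load_videos(path="youtube_videos.csv"):
    """Read the saved CSV into a DataFrame and report the number of videos."""
    df = pd.read_csv(path)
    print("Total number of videos:", len(df))  # one row per video
    return df
```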
Counts of unique values:
Code illustration: 18
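A small illustration of value_counts(), using made-up sample data in place of the real CSV (these numbers are not results from any channel). The column names title, likes, and comments are assumptions and may differ from those in the saved file.

```python
import pandas as pd

# Made-up sample data standing in for the DataFrame read from the CSV.
df = pd.DataFrame({
    "title": ["Video A", "Video B", "Video C", "Video D"],
    "likes": [10, 25, 10, 7],
    "comments": [3, 3, 0, 1],
})

# Number of videos sharing each like count, most frequent first.
like_counts = df["likes"].value_counts()
print(like_counts)

# Number of distinct like counts.
print("Distinct like counts:", df["likes"].nunique())
```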
In the same way, unique values of comments and dislikes can be found.
Code illustration: 19
Code illustration: 20
In the output above, the summary statistics of the dataset show the maximum number of likes, dislikes, comments, and views on the videos.
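Such a summary can be produced with DataFrame.describe(), whose "max" row holds the largest value of each numeric column. The sketch below uses made-up sample data, and the column names are assumptions.

```python
import pandas as pd

# Made-up sample data; column names are assumptions.
df = pd.DataFrame({
    "title": ["Video A", "Video B", "Video C", "Video D"],
    "views": [100, 400, 900, 50],
    "likes": [10, 25, 10, 7],
    "comments": [3, 8, 0, 1],
})

# count, mean, std, min, quartiles and max for each numeric column.
summary = df[["views", "likes", "comments"]].describe()
print(summary)
print(summary.loc["max"])  # maximum views, likes and comments
```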
Using value_counts(), you can also get the number of videos for each count of likes, dislikes, comments, and views.
Now let's find the videos with the maximum number of likes, dislikes, comments, and views. First, find the index of the most liked, most disliked, most commented, and most viewed video:
Code illustration: 21
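One way to find these indices is idxmax(), which returns the index label of the row holding a column's maximum (the first such row if there is a tie). A sketch with made-up sample data and assumed column names:

```python
import pandas as pd

# Made-up sample data; column names are assumptions.
df = pd.DataFrame({
    "title": ["Video A", "Video B", "Video C", "Video D"],
    "views": [100, 400, 900, 50],
    "likes": [10, 25, 10, 7],
    "comments": [3, 8, 0, 1],
})

# Index label of the first row holding each column's maximum.
top_index = df[["likes", "views", "comments"]].idxmax()
print(top_index)

most_liked_idx = top_index["likes"]
most_viewed_idx = top_index["views"]
most_commented_idx = top_index["comments"]
```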
Using these indices, you can then access each video's information, such as its title, video ID, URL, likes, and comments.
Most liked video:
Code illustration: 22
Most viewed video:
Code illustration: 23
Most commented video:
Code illustration: 24
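The three lookups above follow the same pattern: find the index with idxmax() and read that row with df.loc. A sketch with made-up sample data; the column names, the sample video IDs, and the helper top_video are hypothetical.

```python
import pandas as pd

# Made-up sample data; IDs and column names are hypothetical.
ids = ["a1", "b2", "c3", "d4"]
df = pd.DataFrame({
    "video_id": ids,
    "title": ["Video A", "Video B", "Video C", "Video D"],
    "url": [f"https://www.youtube.com/watch?v={i}" for i in ids],
    "views": [100, 400, 900, 50],
    "likes": [10, 25, 10, 7],
    "comments": [3, 8, 0, 1],
})


def top_video(df, column):
    """Return title, URL and value of the video with the largest `column`."""
    row = df.loc[df[column].idxmax()]
    return {"title": row["title"], "url": row["url"], column: row[column]}


for column in ("likes", "views", "comments"):
    print(f"Most {column}:", top_video(df, column))
```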
For the complete code, see this GitHub repository. Further insights can also be extracted from the data; for example, sentiment analysis could be performed on the comments on the videos.
In this tutorial, the YouTube Data API was used to extract information from a YouTube channel using Python. This information includes details of the channel and of each video uploaded to it, i.e. the channel ID, number of videos, uploads playlist ID, the maximum number of likes, comments, and views, total subscribers of the channel, and the publication date and time of the channel and its videos.
All these steps were preceded by generating a YouTube API key in the Google Console. The data extracted from YouTube, saved in CSV format, was then analyzed to find interesting insights such as “which video is the most liked”, “which video has the most comments”, and “which video is the most viewed”.