How to Extract & Analyze YouTube Data using YouTube API?

  • Ripul Agrawal
  • Jun 04, 2020
  • Machine Learning
  • Updated on: Jun 04, 2020

Introduction

 

Google offers many APIs, each with applications in different fields, that make it easier to develop mobile applications, websites, and more. One such API is the YouTube Data API v3. Its features include:

 

  • Search for videos.

  • Retrieve information from YouTube about a channel or a particular video, e.g. likes/dislikes, comments, etc.

  • Play YouTube videos directly from within an application.

 

Beyond the features described above, the YouTube API has other applications. In this article, you will use it for one such application in the field of data science, using Python.

 

 

Prerequisite

 

  • Google Account

  • Python 3

  • Anaconda or Google Colab: To work on your local machine, install Anaconda and start Jupyter Notebook there. Alternatively, use Google Colab to save installation time and local memory, since it runs in the cloud and provides a GPU.

 

 

Getting Started with YouTube API

 

Activate API from Google console

 

  1.  Create a New Project:

 

  • Log in to Google; if you don't have an account, create one and then log in.

  • Visit the Google developer dashboard and click “Select a project” at the top of the page, as in the image below:

 


This image shows the first step towards generating the API key, i.e. creating a new project on the Google console by navigating to "Select a project".

Google developer dashboard


 

  • Now click on “New Project”, as in the image below, and proceed.

 


This image shows the window where a new project is created and where all existing projects are listed, so the developer can switch to the desired one.

Select a project window


 

Once done you will be automatically redirected to the Google APIs dashboard.

 

  • The next step is to activate the YouTube API. To do so, navigate to API & Services from the side panel.

  • Then click on Enable API & Services at the top of the page.

  • Search for YouTube and then select YouTube Data API v3.

 


This page shows all the available YouTube APIs. You have to use YouTube Data API v3.

Select YouTube Data API v3 page


 

  • After that, enable the API by clicking Enable, as shown in the figure below.

 


This image shows the YouTube Data API v3 page, where you can enable the API so that you can create an API key and use it in the code later.

API library that Enables the API


 

  • Now click on API & Services again and select Credentials. Click Create Credentials at the top of the page and select API key.

 


 

This is the Credentials page for the YouTube Data API, from which you can generate the API key, as shown in the image.

Credentials Page


 

  • After a short while, a pop-up appears with the message “API key created”, showing your API key as an alphanumeric string. Copy it and keep it safe for later use.

 


This image shows that the API key has been generated; you will use it in the code by copying it from here.

API key created



 

Extraction of Data from YouTube Channels

 

  1. Link Google Colab with Google Drive

 

Google Colab provides all the features that Jupyter Notebook provides on a local machine. Data you work with, such as datasets, images, and videos, is stored in Google Drive in the same way it would be stored on a local machine.

Apart from that, it also provides a free GPU with 12 GB of RAM, but at the time of writing it only supports Python 2.7 and Python 3.6.

 

  1. Mount Google Drive on Google Colab

 

Running the code below in a cell mounts Google Drive in Google Colab once you enter the authorization code obtained from the URL shown in the output.

 


This image shows how to mount Google Drive in Google Colab so that files can be read from and written to Google Drive.

Code illustration: 1


 

After this step, you can access any folder or file in Google Drive from Google Colab. Now create a new folder in Google Drive for this project and run the cell below.

 


In this image, the current working directory is changed to the Drive folder where all the files will be stored.

Code illustration: 2
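
The two cells described above survive only as captions, so here is a minimal sketch of what they might look like. It assumes the standard Colab helper `google.colab.drive`; the function names and the folder name used in the note below are hypothetical.

```python
import os


def mount_drive(mount_point="/content/drive"):
    """Mount Google Drive when running inside Colab; return False elsewhere."""
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return False
    drive.mount(mount_point)  # prompts for authorization in the notebook
    return True


def enter_project_dir(path):
    """Create the project folder if needed and make it the working directory."""
    os.makedirs(path, exist_ok=True)
    os.chdir(path)
    return os.getcwd()
```

In Colab you would call `mount_drive()` first and then pass the project folder inside the mounted Drive to `enter_project_dir`; the exact path of that folder depends on how Drive appears under the mount point, so check it in the file browser before hard-coding it.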


 

  1. Install the required libraries

 

In Google Colab, the required libraries come pre-installed, but when using Jupyter Notebook or any other editor you have to install them with the pip command.

 

You will use the Google API Python client library to access YouTube data and the pandas library to perform exploratory data analysis on the extracted data.

 

For Windows:

 

To install these on Windows, run the commands below in the command prompt.

 


This code installs the libraries used to retrieve data from YouTube, i.e. "google-api-python-client" and "pandas".

Code illustration: 3


 

For Anaconda:

 


This code installs the libraries "google-api-python-client" and "pandas" when working in the Anaconda prompt.

Code illustration: 4


 

  1. Import the packages

 

Now it's time to start coding to get insights from YouTube data. The first step is to import the required libraries mentioned above, i.e. pandas and the Google client library.

 


This code imports the libraries you installed above; their functions can be used only after they are imported into the Python code.

Code illustration: 5


 

  1. Set the YouTube Parameters

 

As explained above, you have generated the YouTube API key from the Google console page. Now it's time to use that key, so set some parameters that will be used in later steps, including the YouTube API key and the API version.

 


This code sets important parameters such as the generated API key used to retrieve data and the version of the API to be used.

Code illustration: 6
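
As a rough sketch of the import and parameter steps, the snippet below sets the parameters and wraps client creation in a small helper. The environment variable name `YOUTUBE_API_KEY`, the placeholder key, and the helper name `make_client` are assumptions for illustration; `build` is the client factory from the google-api-python-client library.

```python
import os

# Assumption: the key is supplied through an environment variable named
# YOUTUBE_API_KEY; the placeholder is used only when it is not set.
API_KEY = os.environ.get("YOUTUBE_API_KEY", "YOUR_API_KEY")
API_SERVICE_NAME = "youtube"
API_VERSION = "v3"


def make_client(api_key=API_KEY):
    """Build a YouTube Data API v3 client authenticated with an API key."""
    # Imported here so the constants above stay usable even when
    # google-api-python-client is not installed.
    from googleapiclient.discovery import build

    return build(API_SERVICE_NAME, API_VERSION, developerKey=api_key)
```

Reading the key from the environment keeps it out of the notebook, which matters if the notebook is shared.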


 

The YouTube Data API provides many functions to retrieve all kinds of data from YouTube about particular channels, videos, playlists, and more, exposed as a number of resources.

 

Apart from this, the API also supports inserting, updating, or deleting content on YouTube, but those operations require authorization when generating the credentials. For now, you are only going to retrieve data, and for that no authorization beyond the API key is needed.

Some of the resources used in the following steps are:

 

  • Search: Used to find a channel by passing the channel name as the search parameter. The channel it returns is then used to retrieve statistics and uploaded videos.

 

  • Channel: Contains information about YouTube channels, including the channel's total subscribers, total views, total uploaded videos, and other information.

 

  1. Snippets

 

Run the cell below to perform a YouTube search through an API call and save the results in a list.

 

You need to request the snippet part for the YouTube search, as it contains the basic information about the channel.

 


This code specifies how the YouTube search is performed for a particular YouTube channel using its channel name.

Code illustration: 7
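
A minimal sketch of such a search call is shown below. It assumes `youtube` is a client built with google-api-python-client; the function name `search_channels` and its defaults are hypothetical, while `search().list(...)` with the `q`, `part`, `type`, and `maxResults` parameters is part of the YouTube Data API v3.

```python
def search_channels(youtube, channel_name, max_results=5):
    """Return the search results (a list of dicts) for a channel name."""
    response = youtube.search().list(
        q=channel_name,
        part="snippet",
        type="channel",  # restrict the results to channels
        maxResults=max_results,
    ).execute()
    # The response is a dictionary; the matches live under "items".
    return response.get("items", [])
```

Passing the client in as an argument keeps the function easy to try out with a stand-in object before spending API quota.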


 

From the output of the above cell, you can see the details of all the channels associated with the one provided. The search response is a dictionary, and the matching results are stored in a list inside it.

 

All the basic information related to these channels can be retrieved by executing the following code:

 


In this image, the data retrieved from the YouTube search is displayed using the "snippet" variable, i.e. channel title, channel ID, published date, description, etc.

Code illustration: 8


 

The output shows four other channels associated with the given one, along with information for each channel such as published date, channel ID, channel title, description, thumbnails, and published time, all of which is stored in the snippet.

 

Next, find the channel ID of the first channel in the list above:

 


Code illustration: 9
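
The two steps above, reading the snippet fields and picking the first channel ID, can be sketched on a plain list of search results. The helper names and the sample item are made up for illustration; the field names (`snippet`, `channelId`, `title`, `publishedAt`, `description`) follow the shape of a search result.

```python
def summarize_channels(items):
    """Pull the basic fields out of each search result's snippet."""
    return [
        {
            "channel_id": item["snippet"]["channelId"],
            "title": item["snippet"]["title"],
            "published_at": item["snippet"]["publishedAt"],
            "description": item["snippet"]["description"],
        }
        for item in items
    ]


def first_channel_id(items):
    """Return the channel ID of the first search result."""
    return items[0]["snippet"]["channelId"]


# Hand-written sample shaped like one search result (not real data).
sample_items = [
    {
        "id": {"kind": "youtube#channel", "channelId": "UC_example"},
        "snippet": {
            "channelId": "UC_example",
            "title": "Example Channel",
            "publishedAt": "2020-01-01T00:00:00Z",
            "description": "A made-up channel used for illustration.",
        },
    }
]

print(summarize_channels(sample_items))
print(first_channel_id(sample_items))
```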


 

  1. Statistics of the Channel

 

Now find the statistics of the channel by using its channel ID, which gives you details such as the channel's total subscribers, views, and uploaded videos.

 

For this step, you have to use the channel resource of the YouTube Data API v3, passing the channel ID as a parameter.

 


In this image, the statistics of the channel are retrieved using its channel ID to find the subscriber count, video count, and view count.

Code illustration: 10
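
One way this lookup might be written is sketched below, assuming `youtube` is a client built with google-api-python-client. The function name is hypothetical; `channels().list(id=..., part="statistics")` is the API call for the channel resource.

```python
def get_channel_statistics(youtube, channel_id):
    """Return the channel's statistics with counts converted to integers."""
    response = youtube.channels().list(id=channel_id, part="statistics").execute()
    stats = response["items"][0]["statistics"]
    # The API returns counts as strings; non-numeric fields are left as-is.
    return {
        key: int(value) if isinstance(value, str) and value.isdigit() else value
        for key, value in stats.items()
    }
```

Converting the counts up front avoids string comparisons later when the numbers are sorted or summed.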


 

  1. ContentDetails of the Channel

 

In this step, you will use the contentDetails property of the channel resource to retrieve information about the channel's content, including the uploads playlist, i.e. the playlist ID of uploads, with which you can find all the videos uploaded to the channel since its creation.

 


This code finds the content details of the channel to get the ID of its uploads playlist.

Code illustration: 11
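
A minimal sketch of this step, under the same assumption that `youtube` is a google-api-python-client object (the function name is hypothetical):

```python
def get_uploads_playlist_id(youtube, channel_id):
    """Return the ID of the playlist that holds all of a channel's uploads."""
    response = youtube.channels().list(
        id=channel_id, part="contentDetails"
    ).execute()
    content_details = response["items"][0]["contentDetails"]
    return content_details["relatedPlaylists"]["uploads"]
```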

 

  1. PlaylistItems

 

PlaylistItems is another resource, used to retrieve the details of the uploads playlist, including all the uploaded videos. It has a limitation: a single request returns at most 50 videos. To get all the videos, use the next page token parameter to retrieve each following page.

 

So here, all the videos are extracted with their details and saved in a list, as in the code below:

 


This code shows how the videos are retrieved from the channel using nextPageToken.

Code illustration: 12
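
The paging loop can be sketched as follows. It assumes `youtube` is a google-api-python-client object and that the function name is hypothetical; `playlistItems().list(...)` with `playlistId`, `part`, `maxResults`, and `pageToken`, and the `nextPageToken` field of the response, come from the YouTube Data API v3.

```python
def get_all_playlist_items(youtube, playlist_id):
    """Collect every item in a playlist by following nextPageToken."""
    items = []
    page_token = None  # no token is sent for the first page
    while True:
        response = youtube.playlistItems().list(
            playlistId=playlist_id,
            part="snippet",
            maxResults=50,  # the most the API returns per request
            pageToken=page_token,
        ).execute()
        items.extend(response.get("items", []))
        page_token = response.get("nextPageToken")
        if page_token is None:  # the last page carries no token
            break
    return items
```

The loop stops as soon as a response comes back without `nextPageToken`, so a channel with 120 uploads would take three requests.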


 

In the cell below, you can see the data retrieved from the playlist, which contains the video ID and title.

 


This image displays the data from the retrieved videos, which includes the title, description, and video ID.

Code illustration: 13


 

Next, retrieve the video IDs; in the following cell, the statistics of all the videos are retrieved, including total likes/dislikes, comments, and views.

 


This code retrieves the video IDs of all the videos and the statistics of the videos, including view counts, likes, comments, etc.

Code illustration: 14
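
A sketch of both steps is given below, assuming the playlist items were fetched with the snippet part and that `youtube` is a google-api-python-client object. The function names and the `batch_size` parameter are hypothetical; `videos().list(id=..., part="statistics")` accepts a comma-separated list of video IDs.

```python
def extract_video_ids(playlist_items):
    """Read the video ID out of each playlist item's snippet."""
    return [item["snippet"]["resourceId"]["videoId"] for item in playlist_items]


def get_video_statistics(youtube, video_ids, batch_size=50):
    """Fetch the statistics of each video, requesting up to 50 IDs at a time."""
    stats = []
    for start in range(0, len(video_ids), batch_size):
        batch = video_ids[start:start + batch_size]
        response = youtube.videos().list(
            id=",".join(batch),  # comma-separated IDs in a single request
            part="statistics",
        ).execute()
        stats.extend(response.get("items", []))
    return stats
```

Batching the IDs keeps the number of requests low compared with one call per video.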


 

Now retrieve the details of all the videos, store them in lists, and then save them to disk as a csv file.

 


This image shows all the videos' details being collected into separate lists. The details include video title, video description, likes/dislikes, comments, views, and video ID.

Code illustration: 15


 


This image shows the creation of a data frame to store the data extracted from YouTube. In the next step, the dataset is saved to disk.

Code illustration: 16
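
As an illustration of how the collected details might be combined, the sketch below builds one row per video and returns a pandas DataFrame. The function name, the column names, and the csv file name in the note are assumptions; the statistics fields are read with a default of 0 because a count can be absent from a response.

```python
import pandas as pd


def build_video_table(playlist_items, video_stats):
    """Join playlist snippets with per-video statistics into one DataFrame."""
    stats_by_id = {v["id"]: v.get("statistics", {}) for v in video_stats}
    rows = []
    for item in playlist_items:
        snippet = item["snippet"]
        video_id = snippet["resourceId"]["videoId"]
        stats = stats_by_id.get(video_id, {})
        rows.append({
            "video_id": video_id,
            "title": snippet["title"],
            "description": snippet["description"],
            # Counts arrive as strings and may be missing, hence the default.
            "views": int(stats.get("viewCount", 0)),
            "likes": int(stats.get("likeCount", 0)),
            "dislikes": int(stats.get("dislikeCount", 0)),
            "comments": int(stats.get("commentCount", 0)),
        })
    return pd.DataFrame(rows)
```

Saving the result is then a single call such as `df.to_csv("youtube_videos.csv", index=False)`, where the file name is only an example.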


 

 

Analyzing the YouTube extracted Data

 

So far, you have extracted the information of all the videos and saved it in one csv file using pandas. Now it's time to get some more insights into the dataset.

 

For that, you can use the Python library pandas, which provides various functions to get insights into the dataset, such as:

 

  • Total number of videos,

  • Counts of unique values, i.e. using the value_counts() method,

  • Most liked video, most disliked video, most commented video,

  • Most viewed video,

  • Video with the maximum number of comments, likes, dislikes,

  • Maximum number of likes, dislikes, comments.

 

First, read the csv file from disk:

 


This code reads the csv file that stores the YouTube data into Python. The same image displays how the data is arranged in the csv.

Code illustration: 17


 

Counts of unique values:

 


This image shows the count of unique values for each feature. It also gives an idea of how many videos share the maximum views/likes.

Code illustration: 18


 

In the same way, unique values of comments and dislikes can be found.

 


This image shows information about the dataset and the extracted features, i.e. how many non-null values there are and the datatype of the values (object or int64).

Code illustration: 19


 


This image displays the statistics of the stored data, i.e. the mean, maximum, minimum, and standard deviation of each feature.

Code illustration: 20


From the above cell, the description of the dataset shows the maximum number of likes, dislikes, comments, and views on the videos.

 

Using “value_counts”, you can also get the number of videos corresponding to each count of likes, dislikes, comments, and views.
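
The exploratory calls described in this section can be tried on a tiny stand-in dataset. The rows and column names below are made up for illustration and are read from an in-memory string rather than the saved file; `read_csv`, `value_counts`, `info`, and `describe` are the pandas functions involved.

```python
import io

import pandas as pd

# Made-up rows standing in for the saved csv file (not real channel data).
csv_text = """video_id,title,views,likes,dislikes,comments
a1,First video,100,10,1,3
b2,Second video,250,10,0,7
c3,Third video,90,4,2,7
"""

# For a real file, pass its path to pd.read_csv instead of the StringIO.
df = pd.read_csv(io.StringIO(csv_text))

print(len(df))                     # total number of videos
print(df["likes"].value_counts())  # how many videos share each like count
df.info()                          # non-null counts and dtype of each column
print(df.describe())               # mean, std, min, max, and quartiles
```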


Now it's time to find the videos with the maximum number of likes, dislikes, comments, and views. First, find the index of the most liked, most commented, most disliked, and most viewed video:


This code finds the index of the videos with maximum likes, comments, and views.

Code illustration: 21


 

After this, you can access the video information, i.e. video title, video ID, URL, likes, comments, etc., using the index.

 

Most liked video:


This image displays the most liked video on the channel, which you can get using the index.

Code illustration: 22


 

Most viewed video:


This image displays the most viewed video of the channel.

Code illustration: 23


 

Most commented video:


This image displays the video with the most comments on the channel.

Code illustration: 24
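
The index lookups for the most liked, most viewed, and most commented video can be sketched with `idxmax` and `loc`. The data frame below is made up, and its column names are assumptions that would need to match the columns of the saved csv.

```python
import pandas as pd

# Made-up rows for illustration (not real channel data).
df = pd.DataFrame({
    "video_id": ["a1", "b2", "c3"],
    "title": ["First video", "Second video", "Third video"],
    "views": [100, 250, 90],
    "likes": [10, 12, 4],
    "comments": [3, 5, 7],
})

# idxmax returns the index label of the row holding the column's maximum.
most_liked_idx = df["likes"].idxmax()
most_viewed_idx = df["views"].idxmax()
most_commented_idx = df["comments"].idxmax()

# loc then fetches the full row (or a single field) for that index.
most_liked = df.loc[most_liked_idx]
print(most_liked["title"], most_liked["likes"])
print("https://www.youtube.com/watch?v=" + most_liked["video_id"])
print(df.loc[most_viewed_idx, "title"])
print(df.loc[most_commented_idx, "title"])
```

If several videos tie for the maximum, `idxmax` reports only the first of them.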


 

For the complete code, go through this GitHub repository. Beyond this, more insights can be extracted; for example, sentiment analysis of the comments on the videos.

 

 

Conclusion

 

The YouTube Data API is used to extract information from a YouTube channel using Python. The information includes details for each video uploaded to the channel, i.e. channel ID, number of videos, upload ID, the maximum number of likes, comments, and views, the channel's total subscribers, and the published date and time of both the channel and its videos.

 

All these steps follow the generation of the YouTube API key from the Google console. The extracted YouTube data, stored in csv format, is then analyzed to get some interesting insights from the dataset, such as “which video is the most liked”, “video with maximum comments”, and “video that is most viewed”.
