
How to Extract & Analyze YouTube Data using YouTube API?

Ripul Agrawal
Jun 04, 2020
Updated on: Jun 04, 2020

Introduction

Google offers many APIs, each with applications in various fields, that make work easier in mobile application development, website development, and more. One such API is Google's YouTube Data API v3. Its features include:

Searching for videos,
Retrieving information about a channel or a particular video, e.g. likes/dislikes, comments, etc.,
Starting a YouTube video directly from the application.

As described above, the YouTube API has many applications. In this article, you will use one of them in the field of data science, using Python.

Prerequisite

Google Account
Python 3
Anaconda or Google Colab: if you want to use your local machine, install Anaconda and start a Jupyter Notebook there. Alternatively, use Google Colab to save installation time and local memory, since it runs in the cloud and provides a GPU.

Getting Started with YouTube API

Activate API from Google console

Create a New Project:

Log in to Google. If you don't have an account, create one first.
Visit the Google developer dashboard and click “Select a project” at the top of the page to create a new project, as in the image below:

Google developer dashboard

Now click “New Project”, as shown below, and proceed.

Select a project window

Once done, you will be automatically redirected to the Google APIs dashboard.

The next step is to activate the YouTube API. To do so, navigate to APIs & Services in the side panel.
Then click Enable APIs & Services at the top of the page.
Search for YouTube and select YouTube Data API v3.

Select YouTube Data API v3 page

Then enable the API by clicking “Enable”, as shown in the figure below.

API library that Enables the API

Go back to APIs & Services and select Credentials. Click Create Credentials at the top of the page and select API key.

Credentials Page

After a moment, a pop-up with the message “API key created” appears, showing your alphanumeric API key. Copy it and keep it safe for later use.

API key created

Extraction of Data from YouTube Channels

Link Google Colab with Google Drive

Google Colab provides all the features of a Jupyter Notebook running on a local machine. Data you work on, such as datasets, images, and videos, is stored in Google Drive, just as it would be stored on a local disk.

It also provides a free GPU with 12 GB of RAM; at the time of writing, it supported only Python 2.7 and Python 3.6.

Mount Google Drive on Google Colab

Run the cell below and enter the authorization code obtained from the URL shown in the output; Google Drive will then be mounted in Google Colab.

Code illustration: 1

After this step, you can access any file or folder in Google Drive from Google Colab. Now create a new folder in Google Drive for this project and run the cell below.

Code illustration: 2
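A minimal sketch of the mount-and-create-folder steps, assuming the google.colab.drive.mount helper that Colab provides. The folder name youtube_analysis is hypothetical, and the Drive root path /content/drive/MyDrive is an assumption (it has varied between Colab versions, so check it in Colab's file browser). Outside Colab, the function falls back to the current directory.

```python
import os


def prepare_project_dir(folder_name="youtube_analysis", base_dir=None):
    """Create (if needed) and switch to a project folder.

    If base_dir is None, try to mount Google Drive (works only inside Colab)
    and fall back to the current working directory on a local machine.
    """
    if base_dir is None:
        try:
            from google.colab import drive  # available only in Google Colab
        except ImportError:
            base_dir = os.getcwd()  # local Jupyter: use the working directory
        else:
            drive.mount("/content/drive")  # prompts for authorization
            base_dir = "/content/drive/MyDrive"  # assumed Drive root; verify in Colab
    path = os.path.join(base_dir, folder_name)
    os.makedirs(path, exist_ok=True)  # no error if the folder already exists
    os.chdir(path)  # later CSV files are written here
    return path
```

Passing base_dir explicitly skips the Drive mount entirely, which is convenient when working locally.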

Install the required libraries

In Google Colab, all the required libraries are preinstalled, but in a local Jupyter Notebook or any other editor you have to install them with pip.

You will use the Google API Python client library (google-api-python-client) to access YouTube data and the pandas library for exploratory data analysis of the extracted data.

For Windows:

To install these on Windows, run the commands below in the Command Prompt.

Code illustration: 3

For Anaconda:

Code illustration: 4

Import the packages

Now it's time to start coding. The first step is to import the libraries mentioned above, i.e. pandas and the Google API client library:

Code illustration: 5

Set the YouTube Parameters

You have already generated a YouTube API key in the Google console. Now set the parameters used in the following steps, including the YouTube API key and the API version.

Code illustration: 6
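A minimal sketch of this step, assuming googleapiclient.discovery.build from google-api-python-client. Reading the key from an environment variable named YOUTUBE_API_KEY is a choice made for this sketch (the name is hypothetical), so the key does not end up in a shared notebook.

```python
import os

# Parameters for building the YouTube Data API client
YOUTUBE_API_SERVICE_NAME = "youtube"
YOUTUBE_API_VERSION = "v3"


def get_youtube_client(api_key=None):
    """Build a YouTube Data API v3 client authenticated with an API key.

    The key is taken from api_key, or from the (hypothetical) YOUTUBE_API_KEY
    environment variable when api_key is not given.
    """
    key = api_key or os.environ.get("YOUTUBE_API_KEY")
    if not key:
        raise ValueError("No API key: pass api_key or set YOUTUBE_API_KEY")
    # Imported here so this module loads even before the library is installed
    from googleapiclient.discovery import build
    return build(YOUTUBE_API_SERVICE_NAME, YOUTUBE_API_VERSION, developerKey=key)
```

Usage: youtube = get_youtube_client("YOUR_API_KEY"). The returned object is the client passed to the functions in the following steps.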

The YouTube Data API provides many resources for retrieving all kinds of YouTube data about particular channels, videos, playlists, and more.

The API also supports inserting, updating, and deleting resources on YouTube, but those operations require OAuth 2.0 authorization rather than a plain API key. Here you will only retrieve public data, for which an API key without further authorization is sufficient.

The resources used in the following steps are:

Search: used to find a channel by passing the channel name as the search query; the returned channel ID is then used to retrieve the channel's statistics and uploaded videos.

Channel: contains information about a YouTube channel, such as its total subscribers, total views, number of uploaded videos, and the ID of its uploads playlist.

Snippets

Run the cell below to perform a YouTube search through the API and save the results in a list.

Request the snippet part in the search, as it contains the basic information about each channel.

Code illustration: 7
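A minimal sketch of the search call, assuming youtube is the client built with googleapiclient.discovery.build("youtube", "v3", ...). Restricting results with type="channel" and the default of five results are choices made for this sketch, not necessarily what the original code used.

```python
def search_channels(youtube, channel_name, max_results=5):
    """Search YouTube for channels matching channel_name.

    Returns the list of search results; each result is a dict with
    "id" and "snippet" keys.
    """
    request = youtube.search().list(
        q=channel_name,          # the search query, e.g. a channel name
        part="snippet",          # basic info: title, description, thumbnails, ...
        type="channel",          # return channels only, not videos or playlists
        maxResults=max_results,
    )
    response = request.execute()  # a dict; the results are under "items"
    return response.get("items", [])
```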

The output of the cell above shows the details of all channels matching the given name. The API response is a dictionary whose items key holds the search results as a list of dictionaries, one per channel.

The following code retrieves the basic information for each of these channels:

Code illustration: 8

The output shows four other channels associated with the given one. For each channel, the snippet stores information such as the channel ID, title, description, thumbnails, and publication date and time.
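A minimal sketch of flattening each search result's snippet into a small record, assuming the result structure returned by search.list. The output keys (channel_id, title, and so on) are names chosen for this sketch.

```python
def summarize_channels(items):
    """Flatten the snippet of each search result into a small dict."""
    rows = []
    for item in items:
        snippet = item.get("snippet", {})
        rows.append({
            "channel_id": snippet.get("channelId"),
            "title": snippet.get("title"),
            "description": snippet.get("description"),
            "published_at": snippet.get("publishedAt"),  # ISO 8601 date and time
            # URL of the default-size thumbnail, if present
            "thumbnail": snippet.get("thumbnails", {}).get("default", {}).get("url"),
        })
    return rows
```

Passing the result to pandas.DataFrame gives a quick tabular view of all matching channels.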

Next, take the channel ID of the first channel in the list:

Code illustration: 9
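A minimal sketch of picking the first channel ID, assuming the search-result id object carries a kind field such as "youtube#channel" together with the matching ID field. Checking kind is a defensive choice of this sketch, in case the search was not restricted to channels.

```python
def first_channel_id(items):
    """Return the channel ID of the first channel in the search results, or None."""
    for item in items:
        result_id = item.get("id", {})
        if result_id.get("kind") == "youtube#channel":  # skip videos and playlists
            return result_id["channelId"]
    return None  # no channel among the results
```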

Statistics of the Channel

Now use the channel ID to fetch the channel's statistics: its total subscribers, views, and uploaded videos.

For this step, use the channels resource of the YouTube Data API v3, passing the channel ID as the id parameter.

Code illustration: 10
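A minimal sketch of this call, assuming the same youtube client as before. The API returns counts such as subscriberCount as strings, so the sketch converts digit strings to int and leaves other values (for example the boolean hiddenSubscriberCount) untouched.

```python
def get_channel_statistics(youtube, channel_id):
    """Fetch a channel's statistics and convert the numeric strings to int."""
    response = youtube.channels().list(part="statistics", id=channel_id).execute()
    items = response.get("items", [])
    if not items:
        return {}  # unknown or invalid channel ID
    stats = items[0]["statistics"]
    # Counts arrive as strings; non-numeric values (e.g. booleans) are kept as-is
    return {
        key: int(value) if isinstance(value, str) and value.isdigit() else value
        for key, value in stats.items()
    }
```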

ContentDetails of the Channel

In this step, request the contentDetails part of the channel resource. It includes the ID of the channel's uploads playlist, which lists every video uploaded to the channel since its creation.

Code illustration: 11
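A minimal sketch of reading the uploads playlist ID, assuming the contentDetails.relatedPlaylists.uploads field of the channel resource and the same youtube client as before.

```python
def get_uploads_playlist_id(youtube, channel_id):
    """Return the ID of the playlist holding all of a channel's uploads, or None."""
    response = youtube.channels().list(part="contentDetails", id=channel_id).execute()
    items = response.get("items", [])
    if not items:
        return None  # unknown or invalid channel ID
    return items[0]["contentDetails"]["relatedPlaylists"]["uploads"]
```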

PlaylistItems

The playlistItems resource retrieves the items of the uploads playlist, i.e. all the uploaded videos. It has a limitation: a single request returns at most 50 videos. To collect all the videos in one run, repeat the request, passing the nextPageToken from each response as the pageToken of the next request until no token is returned.

The code below extracts all the videos with their details and saves them in a list:

Code illustration: 12
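A minimal sketch of the pagination loop, assuming the same youtube client as before. Note that .execute() must be called with parentheses: without them, the result is a bound method, and indexing it (res['items']) raises TypeError: 'method' object is not subscriptable.

```python
def get_all_playlist_items(youtube, playlist_id):
    """Collect every item of a playlist, following nextPageToken page by page."""
    all_items = []
    page_token = None
    while True:
        params = {"part": "snippet", "playlistId": playlist_id, "maxResults": 50}
        if page_token:
            params["pageToken"] = page_token  # ask for the next page
        response = youtube.playlistItems().list(**params).execute()  # note the ()
        all_items += response.get("items", [])
        page_token = response.get("nextPageToken")
        if not page_token:  # no token means this was the last page
            break
    return all_items
```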

The cell below shows the data retrieved from the playlist, which contains each video's ID and title.

Code illustration: 13
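A minimal sketch of extracting video IDs and titles, assuming playlist items fetched with part="snippet", where each video ID sits under snippet.resourceId.videoId.

```python
def video_ids_and_titles(playlist_items):
    """Return two parallel lists: the video IDs and the video titles."""
    ids, titles = [], []
    for item in playlist_items:
        snippet = item["snippet"]
        ids.append(snippet["resourceId"]["videoId"])
        titles.append(snippet["title"])
    return ids, titles
```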

Next, retrieve the video IDs; the following cell then fetches the statistics of every video, including its likes/dislikes, comments, and views.

Code illustration: 14
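A minimal sketch of fetching video statistics in batches, assuming the videos resource accepts up to 50 comma-separated IDs per request. Batching keeps the number of requests (and the quota used) low compared with one request per video.

```python
def get_video_statistics(youtube, video_ids, batch_size=50):
    """Fetch the statistics of many videos, at most batch_size IDs per request."""
    items = []
    for start in range(0, len(video_ids), batch_size):
        batch = video_ids[start:start + batch_size]
        response = youtube.videos().list(part="statistics", id=",".join(batch)).execute()
        items += response.get("items", [])
    return items
```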

Now retrieve the details of all the videos, store them in lists, and save them to disk as a CSV file.

Code illustration: 15

Code illustration: 16
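A minimal sketch of building the table and saving it with pandas, assuming the statistics items from the previous step and a dict mapping video IDs to titles. The column names and the file name youtube_videos.csv are hypothetical. Each count is read with dict.get(), because some keys can be missing: since December 13, 2021, dislikeCount is returned only to the video owner, and commentCount or likeCount may be absent when comments or ratings are disabled. Indexing such a key directly would raise a KeyError.

```python
import pandas as pd


def _to_int(value):
    """Convert an API count (a numeric string) to int; keep None for missing counts."""
    return int(value) if value is not None else None


def build_video_table(stats_items, titles_by_id):
    """Turn videos.list statistics items into a DataFrame, one row per video."""
    rows = []
    for item in stats_items:
        video_id = item["id"]
        stats = item.get("statistics", {})
        rows.append({
            "video_id": video_id,
            "title": titles_by_id.get(video_id),
            "url": "https://www.youtube.com/watch?v=" + video_id,
            "views": _to_int(stats.get("viewCount")),
            "likes": _to_int(stats.get("likeCount")),
            # Returned only to the video owner since December 2021
            "dislikes": _to_int(stats.get("dislikeCount")),
            # Missing when comments are disabled
            "comments": _to_int(stats.get("commentCount")),
        })
    return pd.DataFrame(rows)


def save_video_table(df, path="youtube_videos.csv"):
    """Write the table to CSV without the pandas index column."""
    df.to_csv(path, index=False)
    return path
```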

Analyzing the YouTube extracted Data

So far, you have extracted information about all the videos and saved it in a CSV file using pandas. Now it's time to get some insights from the dataset.

For that, you can use pandas, which provides various functions for exploring a dataset, such as finding:

Total number of videos,
Counts of unique values, using the value_counts() method,
The most liked, most disliked, and most commented videos,
The most viewed video,
The videos with the maximum number of comments, likes, and dislikes,
The maximum number of likes, dislikes, and comments.

First, read the CSV file from disk:

Code illustration: 17

Counts of unique values:

Code illustration: 18

In the same way, unique values of comments and dislikes can be found.

Code illustration: 19

Code illustration: 20

The descriptive statistics in the cell above show the maximum number of likes, dislikes, comments, and views among the videos.

Using value_counts(), you can also get the number of videos for each distinct count of likes, dislikes, comments, and views.
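A minimal sketch of these exploration steps, assuming a DataFrame read with pd.read_csv from the saved file and numeric columns named views, likes, dislikes, and comments (hypothetical names; adjust them to the header of your own CSV).

```python
import pandas as pd


def explore(df):
    """Return the video count, like-count frequencies, and summary statistics."""
    numeric = ["views", "likes", "dislikes", "comments"]
    present = [col for col in numeric if col in df.columns]  # skip absent columns
    return {
        "total_videos": len(df),
        # How many videos share each distinct number of likes
        "like_counts": df["likes"].value_counts(),
        # count, mean, std, min, quartiles and max for each numeric column
        "summary": df[present].describe(),
    }
```

Usage: explore(pd.read_csv("youtube_videos.csv"))["summary"].loc["max"] gives the maximum views, likes, and comments.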

Now identify the videos with the maximum number of likes, dislikes, comments, and views. First, find the index of the most liked, most commented, most disliked, and most viewed video:

Code illustration: 21
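A minimal sketch using Series.idxmax(), which returns the index label of the largest value and skips missing values by default. The column names are the same hypothetical ones as above; metrics whose column is absent or entirely empty (such as dislikes collected after December 2021) are skipped.

```python
import pandas as pd


def top_videos(df, metrics=("likes", "views", "comments")):
    """Return, for each metric, the full row of the video with the highest value."""
    top = {}
    for metric in metrics:
        if metric not in df.columns or df[metric].isna().all():
            continue  # nothing to rank for this metric
        idx = df[metric].idxmax()  # index label of the maximum (NaN skipped)
        top[metric] = df.loc[idx].to_dict()  # title, video ID, URL, counts, ...
    return top
```

Usage: top_videos(df)["likes"]["title"] gives the title of the most liked video.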

You can then use these indices to access each video's information, i.e. title, video ID, URL, likes, comments, etc.

Most liked video:

Code illustration: 22

Most viewed video:

Code illustration: 23

Most commented video:

Code illustration: 24

For the complete code, see this GitHub repository. Further insights are possible as well, e.g. sentiment analysis of the comments on the videos.

Conclusion

In this article, the YouTube Data API was used to extract information from a YouTube channel using Python. The information includes details of the channel and of each video uploaded to it: channel ID, number of videos, uploads playlist ID, the maximum number of likes, comments, and views, total subscribers, and the publication date and time of the channel and its videos.

All these steps follow the generation of a YouTube API key in the Google console. The extracted data, saved in CSV format, was then analyzed to answer questions such as “which video is the most liked?”, “which video has the most comments?”, and “which video is the most viewed?”.

Latest Comments

chiau8526g

Nov 13, 2020

Hi Ripul,

Thanks your sharing that I can connect youtube API by myself. But I have a problem on Code illustration: 15, colab showed

" File "<ipython-input-41-c5c8e3f0769c>", line 8 disliked.append(int((stats[i])['statistics']['dislikeCount']) ^ SyntaxError: invalid syntax"

And I don't know why? Could you help me? Thank you very much!


Ripul Agrawal

Dec 08, 2020

Hey Chiau, I am sorry for the delay. But I have dropped you a mail for the same around 10 days back. So let me know if you are still struck with that error. Thanks !

ParthRangarajan

Dec 16, 2020

Hey please can you help with this too. I am getting the same error.

Ripul Agrawal

Dec 18, 2020

reach me out at ripulagrawal98@gmail.com

Ripul Agrawal

Dec 09, 2020

For any queries, please reach out to me at ripulagrawal98@gmail.com

utkarsh.singh

Dec 30, 2020

Your code doesn't work for any channels containing more than 20k videos. 20k is the max it can return.

ahnafzahin06

Oct 19, 2021

I have a problem with "Code illustration: 12". When I am trying to do this on google colab it is showing "TypeError: 'method' object is not subscriptable" on "allVideos += res['items'] " line. How can I fix it?

yakisatama

Mar 02, 2022

Hi.. All image on this crashed. Can you fix that?
