How to Become TikTok Famous 🎶

This project is a part of the Data Blog at DataResolutions. The article published on this project, along with other articles from Data Blog, can be found on Medium.

Project Motivation/Description 💫

The purpose of this project is to explore various aspects of TikTok—specifically the demographics of TikTok's most followed users and the analytics of trending videos on TikTok. Questions we wanted to answer include: what are the most popular "genres" on TikTok, what can we say about the most common demographics of the top TikTokers (is there an even distribution of gender, ethnicity, etc.), are there specific genres that men perform better in than women (also the same for different ethnicities).

Technologies Used 💻

R
tidyverse, ggplot2
Python
NumPy, Pandas, Matplotlib, SeasBorn, Plotly, jupyter

Getting Started 📄

Clone this repository (for help see this tutorial).
Raw Data is being kept here within this repo.
Data processing/visualization scripts are being kept here

Data Collection 📂

For the data collection, we had a different method of collection for our tiktok videos dataset and our top tiktokers dataset.

TikTok Videos

TikTok data collection article

TikTok API

For the collection of our TikTok video data, we used David Teather's Unofficial TikTok Api. Using the api's getSuggestedUsersbyID method, we used the usernames of the top 250 TikTokers and collect around 1,700 suggested users. We then collected data of 25 videos from each of 1,700 suggested users with byUsername method.

In addition to the TikTok video data, we created a script that used the bySound method of the api to collect data of videos that use some of the most famous songs on TikTok to get an idea of how the choice of music can impact the potential of a video to become a trending video.

Top TikTokers

For the collection of the top TikTokers, we use the simple webscraping module on Python BeautifulSoup. The scripts and html pages used can be seen here in this repo. For the webscraping, we found information for the first 6 columns of our dataset from InflueceGrid. The remaining information was manually entered, citing sources such as Famous Birthdays and TikTok.

This dataset top-250-tiktokers.csv looks into the demographics of the top 250 TikTokers and is used to explore possible commanalities of the top influencers on the platform.

It consists of 14 columns:

Rank
Username
Country
Followers
Views (median of 15 most recent videos collected on 10-24-2020)
Likes (median of 15 most recent videos collected on 10-24-2020)
Engagement (Views + Likes + Shares / 15 most recent videos, collected on 10-24-2020)
Brand Account (1 for true, 0 for false)
Gender (Based on what information is public about their identity)
Age
Ethnicity
Famous (1 for if they were already famous before TikTok, 0 for not)
Genre (categories that would define their main content)
LBGTQ (1 if they are a member of the LGBTQ community, 0 if not. Based on what information is public about their identity)

get-csv Webscraping

Simply using the get-csv.py script will allow you to obtain the first 6 columns of top-250-tiktokers.csv

Get the necessary libraries to obtain and parse the html of the data pip install beautifulsoup4 pip install urllib3
Using urllib.request and BeautifulSoup you can retrieve the html of any page and make it "parsable". (Note: some sites are secure and can deter you from reading their url)

#open a url (my_url should be a string of the url)
uClient = urlopen(my_url)
page_html = uClient.read()
uClient.close()
#save the page as "soup" that can be parsed
page_soup = soup(page_html, "html.parser")

Following commands from the BeautifulSoup Documentation, you can get the necessary information from the html page and write it into a file (using commas and newline to make it a CSV file).

Adding remaining information

We were not able to find a cohesive list of information on the demographics of the top TikTokers, so we manually entered in the remaining columns.

Data Cleaning 🚿

TikTok Videos

In our tiktok_data_cleaner.py script, we created two functions to help clean our video data: hashtag_cleaner and data_cleaner. hashtag_cleaner extracts the hashtags from the video's description and stores it in a list. data_cleaner takes in a TikTok dictionary from the api and compiles/organizes the relavent data into a pandas data frame. With these two functions, we cleaned the video data we collected from 1,700 suggested users.

For the datasets involving the popular songs on TikTok, we first obtained the dictionaries from the api's bySound method and then collected the useful metrics into a pandas dataframe to make manipulating the data easier. Then we created some summary statistics to find out which songs had usable data to create visualizations that would help us answer the questions we had raised.

Top TikTokers

For the top-250-tiktokers.csv dataset, we first removed the unnecessary symbols from our numerical column entries (such as the % or 'm' to indicate a million) and converted those datatypes from object to float. For the remaining columns, the entries were entered in manually so we did not have to clean up any possible duplicate/nonsensical values/entries that appeared in the data.

Data Analysis 📊

Our analysis can be found here. Each member of the team contributed to delevoping visualizations for our article, half working on Trending TikTok videos and the other half on demographics of the Top 250 TikTokers. The technologies we used is mentioned at the beginning of the README.

TikTok Videos

Based on what TikTok says it's algorithm uses to recommend videos to users, some of the most important factors to make a video go viral are how it appeals to a variety of users and how users interact with these videos. Based on TikTok's article regarding the "For You" recommendation process and general ideas that are used on other social media sites, we identified some of the important metrics for a successful video to be the length of the video, the hashtags used and the songs/sounds used. Once we identified these metrics and had our datasets ready, we proceeded to create visualizations that could explore the connections these metrics have with the performance of videos. This was done by comparing how these metrics impacted the number of likes, plays, shares and comments a video got. Another trend we looked at was how a more popular TikToker can impact the creations of smaller TikTokers, and how this impacts the chance of video going viral.

Top TikTokers

First we wanted to get a general sense of what demographics the top 250 TikTokers fall under. We used various categorical plots such as barplots from Seaborn, pie charts from Matplotlib, and a treemap from Plotly to have different and interesting visuals of the age, country, gender, and ethnicity distrbution of the users. We also wanted to see the distribution of genders and ethnicites of users specifically in the U.S.A (the country with the most top followed TikTokers) who were not previously famous to see what the most probable demographics of a "breakout" TikTok star would be. For this, we used a sanky diagram from Plotly to see the proportions of newly famous TikTokers in the US based on gender, ethnicity, and the main genre of their content. Next we wanted to compare the differences in engagement between the top male and female users. We used barplots from Seaborn to compare the likes/views per each of the top genres for males and females on the platform. We also used a catplot from Seaborn with some added vertical line plots from Matplotlib to visualize the percentage difference in likes per view that men and women received in each of the top genres.

Contributing DataRes Members 💪

Name	Position	GitHub Handle
Ivan Tran	Team Lead	@ivantran96
Kaushik Naresh	Member	@kaushiknaresh47
Isha Shah	Member	@ishashah146
Madison Kohls	Member	@madisonkohls

Name		Name	Last commit message	Last commit date
Latest commit History 108 Commits
.ipynb_checkpoints		.ipynb_checkpoints
Analysis		Analysis
Datasets		Datasets
.DS_Store		.DS_Store
README.md		README.md
README1.md		README1.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

How to Become TikTok Famous 🎶

Project Motivation/Description 💫

Technologies Used 💻

Getting Started 📄

Data Collection 📂

TikTok Videos

Top TikTokers

get-csv Webscraping

Adding remaining information

Data Cleaning 🚿

TikTok Videos

Top TikTokers

Data Analysis 📊

TikTok Videos

Top TikTokers

Contributing DataRes Members 💪

About

Releases

Packages

Contributors 4

Languages

datares/TikTok_Famous

Folders and files

Latest commit

History

Repository files navigation

How to Become TikTok Famous 🎶

Project Motivation/Description 💫

Technologies Used 💻

Getting Started 📄

Data Collection 📂

TikTok Videos

Top TikTokers

get-csv Webscraping

Adding remaining information

Data Cleaning 🚿

TikTok Videos

Top TikTokers

Data Analysis 📊

TikTok Videos

Top TikTokers

Contributing DataRes Members 💪

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 4

Languages

Packages