501 Project zl483

About Me

Hello everyone! I am Zixuan Li, and I also go by Shirley if that is easier. I earned my bachelor's degree at Penn State, majoring in Applied Data Science in IST. The program did not give me a strong statistics background, but it did give me some experience with Python, R, and SQL. That is the main reason I wanted to continue studying in the same area: I feel I need to learn more about it. I am already preparing for internships in data science. Outside of this field, my hobbies are exploring food, spending time with my puppies, playing games, and visiting art museums.

Introduction

My topic is the situation of pet owners before and after the COVID vaccines became available. I will focus on how their pet budgets and lifestyles changed.

Pets are a big part of life for many people around the world. From my experience as a dog owner, having a pet has changed my lifestyle a lot, so I want to explore related questions.

Q1: Housing for pet owners: do their housing decisions depend on their pets?

Q2: How have the neighborhoods they live in changed since the COVID vaccines became available?

Q3: What are their thoughts on the vaccines?

Q4: Has visiting the vet been different before and after the vaccines became available?

Q5: What are pet owners' opinions of the vaccine brands?

Q6: What do pet owners think after pet-related COVID news or articles are published?

Q7: Was the population of animal shelters affected?

Q8: How likely were pet owners to go out before and after the vaccine?

Q9: How does housing affect their thoughts? For example, living in a house with a yard compared with one without, or living in an apartment.

Q10: Overall, is the vaccine making it easier for pet owners to take care of their pets?

My Dog

Data Gathering

R API Data

This data was pulled from the Twitter API using R with the query "my dog".

Python API Data

This data was pulled from the Twitter API using Python with the query "pet friendly".

Exploring Data

Naive Bayes

I generated a word cloud from tweets gathered with the search queries pet, dog, cat, vet, puppy, and kitty.

The plot below shows the labeled positive and negative data together with the tweets' word frequency. The x-axis shows the number of words and the y-axis shows the label. There is a clear gap between positive and negative tweets: roughly 12,000 versus 7,000.

Modeling Process

I first gathered tweets using the search queries pet, dog, cat, vet, puppy, and kitty. I then cleaned them with the pandas package by removing user IDs, punctuation, and URLs from the tweet text. Finally, I split the text on spaces so that it is easier to process in the modeling step.
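The cleaning steps described above can be sketched roughly as follows. This is only a minimal illustration, not the exact code used for the project: the sample tweets, the `clean_tweet` helper, and the column names are made up for the example, and lowercasing is an extra assumption that is not mentioned in the steps above.

```python
import re
import string

import pandas as pd

# Hypothetical sample tweets; the real data comes from the Twitter API.
tweets = pd.DataFrame({"text": [
    "@shirley My dog loves the park!!! https://example.com/pic",
    "Taking the puppy to the vet today... @vet_clinic",
]})


def clean_tweet(text: str) -> str:
    """Remove URLs, user mentions, and punctuation from one tweet."""
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"@\w+", " ", text)          # drop user IDs (mentions)
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Lowercasing is an assumption added here; it also collapses extra spaces.
    return " ".join(text.lower().split())


tweets["clean"] = tweets["text"].apply(clean_tweet)
# Split the cleaned text on spaces to get one list of words per tweet.
tweets["tokens"] = tweets["clean"].str.split()
print(tweets[["clean", "tokens"]])
```

URLs are removed before punctuation so that the pieces of a link are not left behind as separate words.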

Because tweets about pets could be biased, I wanted to know the distribution of positive and negative words in them. I found word lists on Kaggle (the source website is listed in the My Sources tab), then labeled positive words as 1 and negative words as 0 for later embedding.
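As a rough sketch of the labeling idea, the snippet below maps words to 1 (positive) or 0 (negative) and counts how many of each appear in a tokenized tweet. The tiny word sets and the `count_sentiment_words` helper are placeholders invented for this example; the actual word lists come from Kaggle and are much larger.

```python
# Hypothetical miniature word lists standing in for the Kaggle lists.
positive_words = {"love", "happy", "cute"}
negative_words = {"sick", "sad", "lost"}

# Map each sentiment word to its label: 1 = positive, 0 = negative.
word_labels = {word: 1 for word in positive_words}
word_labels.update({word: 0 for word in negative_words})


def count_sentiment_words(tokens):
    """Return (positive_count, negative_count) for a list of words."""
    labels = [word_labels[t] for t in tokens if t in word_labels]
    positives = sum(labels)  # labels are 1 or 0, so the sum counts positives
    return positives, len(labels) - positives


tokens = ["my", "cute", "dog", "is", "sick", "but", "happy"]
print(count_sentiment_words(tokens))
```

Words that appear in neither list are simply ignored, so the counts reflect only the labeled vocabulary.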