The Last Diary

The last entry of this Statistics Diary is about the statistics diary itself. This series of diaries was a class assignment, and keeping it helped me practice relating things I learn in class to my day-to-day life. Sometimes I struggled to find topics to write about, because nothing particularly "statistical" had happened that day. That pushed me to ask my friends for inspiration, which often led to interesting discussions. Coming from a math background, I used to think of statistics as something objective, like math. I realized that it's not always obvious how to interpret statistics, or even how to agree on which statistics matter. I'm quite surprised and proud that I kept this diary regularly--I'm going to try to include some elements of the statistics diary in my secret personal diaries from now on.

Machine Learning in Unity

One of my friends showed me a demo of machine learning in Unity, a game engine that is also used for Virtual Reality development. It was just a simple demo where a square object learned to move closer to a target object (a circle)--if the square reaches the target, it is rewarded, and if it falls out of the boundary, it is given a penalty. There seems to be a lot of interest in entertainment- and art-related applications of AI/Machine Learning, and I am excited to see how "fun" these technologies can be.

Fake News

One of the presenters today discussed a statistical method to detect "fake news". The presentation had a lot of nice visualizations of the sources of fake news and of a scoring system for fake news. Although the topic itself drew a lot of attention from the audience (us), I don't think it was a good example of statistical quantification, because for such a loaded term, the exact definition of "fake news" is different for everyone.

Shiny Shiny!

For the final project of my Communicating Data & Statistics class, I collaborated with a PhD student at the School of Social Work to design a tool for Child Protective Services workers. We have been working on this project for a good two months, and today I finally managed to finish the app, written in Shiny (an R package). After I finished the app, I took screenshots of it and sent them to my boyfriend, who said that it was "shiny"! At first I thought he meant it as a pun, but it turns out he didn't know about Shiny (the R package). Props to whoever named Shiny, Shiny!

Waiting for Godot and Memorylessness

In my Stochastic Processes class today, I learned about the memoryless property of the exponential distribution. Formally, it is: P(T>t+s | T>s) = P(T>t). It means that the remaining waiting time does not depend on how much time has already passed. After the derivation, our professor made an analogy to Waiting for Godot--no matter how long we have already waited, the probability that Godot arrives within the next stretch of time stays the same.
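
For reference, the property follows in a couple of lines from the survival function of the exponential distribution. This is only a sketch, and it assumes T is exponential with a rate parameter \lambda > 0, a symbol that does not appear in my class notes above:

```latex
\begin{align*}
P(T > t + s \mid T > s)
  &= \frac{P(T > t + s,\ T > s)}{P(T > s)}
   = \frac{P(T > t + s)}{P(T > s)} \\
  &= \frac{e^{-\lambda (t + s)}}{e^{-\lambda s}}
   = e^{-\lambda t}
   = P(T > t), \qquad s, t \ge 0.
\end{align*}
```

The second equality holds because the event T > t + s already implies T > s.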

Predicting Alzheimer's Disease

My boyfriend explained his data science project today. He is trying to predict Alzheimer's disease based on a huge set of data from USC--medical data (MRI results, etc.), demographic data, and questionnaire answers. Using all of the data, his test accuracy was around 85%. Surprisingly, when predicting with just the questionnaire answers, his accuracy was 75%, which is not too different from using all of the data. He also found that the accuracy was 60% when he used just one question (recalling words). It was interesting to me how much a single simple question could say about whether someone has Alzheimer's disease.

Recognizing Difficulty Level

I've been thinking about how difficult it is to explain the difficulty of the math or statistics I'm studying to people who don't have the relevant background knowledge. For example, my mom (who majored in English Literature and Music) can't tell the difference in difficulty level when I complain about my research tasks. Put another way, being able to tell apart the difficulty levels of tasks is a sign that you are becoming an expert in an area.

Predicting Startup's Future

I've been thinking a lot about startups lately, first because I am applying for a startup accelerator program with my friend, and second because the startup where my older sister has worked for five years finally started to generate revenue. For me, it is really difficult to see whether a startup idea will work or not--naturally, I wondered whether there is a statistical method to predict such a thing. There must be many predictors of a startup's success, such as the amount of initial investment, the size of the team, etc. But after watching my sister for the past five years, I think patience and dedication are the most important factors of success. How would those be measured, though?

Approximating Art Consumption

My friend asked today how large an audience we are expecting for our collaborative art installation. We translated the question into "how much of the American population would be interested in our project?". Our guess started from the total population, around 300 million. Set aside the oldest 100 million and the youngest 100 million, and consider the 100 million in between. Our project has to do with both music and sculpture. Let's say about one out of 10 of those adults go to art events (galleries, shows, etc.). That's 10 million. Of those 10 million show-goers, if one out of 10 likes our project, that's 1 million people! Unfortunately, I'm not sure how optimistic or pessimistic we're being here.
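
To get a feel for how sensitive this guess is, here is a minimal Python sketch of the same chain of fractions. The baseline numbers are the ones from our conversation; the function name and the "pessimistic" and "optimistic" rates (each of the two 1-in-10 guesses halved or doubled) are my own assumptions for illustration, not anything we measured.

```python
from fractions import Fraction


def audience_estimate(population, middle_share, event_rate, like_rate):
    """Multiply a population by a chain of fractions; return whole people."""
    return int(population * middle_share * event_rate * like_rate)


US_POPULATION = 300_000_000   # rough total used in the entry
MIDDLE_SHARE = Fraction(1, 3)  # the "middle" 100 million out of 300 million

# Baseline from the entry: 1 in 10 go to art events, 1 in 10 of those like it.
baseline = audience_estimate(
    US_POPULATION, MIDDLE_SHARE, Fraction(1, 10), Fraction(1, 10)
)

# Assumed bounds: both rates half as large, or both twice as large.
pessimistic = audience_estimate(
    US_POPULATION, MIDDLE_SHARE, Fraction(1, 20), Fraction(1, 20)
)
optimistic = audience_estimate(
    US_POPULATION, MIDDLE_SHARE, Fraction(1, 5), Fraction(1, 5)
)

print(pessimistic, baseline, optimistic)
```

Because the two rates are multiplied, being off by a factor of two on each one moves the answer by a factor of four in either direction.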

Decoding Mind with Statistics

I did a really cool experiment in my Brain Computer Interface (BCI) class today. We learned about mu waves, which are related to the motor cortex. We trained a classifier by having the subject imagine left/right movement (thinking about which fist to clench). A BCI paradigm like this is used to develop tools for people who have limited mobility (people who can't move their hands or talk). For example, they can control the direction of a wheelchair, or spell by selecting letters on a screen. I'm really glad that I took this class, because I finally see how the LDA (linear discriminant analysis) and clustering that I learned in math/statistics classes can be applied to practical tools.

My First Data Scraping!

For the first time, I scraped data from the web using Python. I used two widely known packages, pandas and BeautifulSoup, and it felt almost magical to be able to use online data in my code. I wanted to scrape something that is truly meaningful to me, so I scraped a table from the Wikipedia article listing the songs Lady Gaga has recorded. I made a simple little plot of her prolificness by counting the songs recorded per year (her most prolific year: 2011, the Born This Way era!)
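
The counting step looks roughly like the sketch below. To keep it runnable without installing anything or going online, it swaps in Python's standard-library html.parser for the packages I actually used, and it parses a tiny made-up table: the song titles and years in SAMPLE_HTML are placeholders, not data from Wikipedia, and the class and variable names are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical stand-in for a downloaded page; titles and years are made up.
SAMPLE_HTML = """
<table>
  <tr><th>Song</th><th>Year</th></tr>
  <tr><td>Placeholder A</td><td>2009</td></tr>
  <tr><td>Placeholder B</td><td>2011</td></tr>
  <tr><td>Placeholder C</td><td>2011</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished rows, each a list of cell strings
        self._row = None      # row currently being read, if any
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._in_cell:
            self._row[-1] += data.strip()


parser = TableParser()
parser.feed(SAMPLE_HTML)

header, *records = parser.rows
year_col = header.index("Year")

# Count songs per year and pick the year with the most songs.
songs_per_year = Counter(row[year_col] for row in records)
busiest_year, count = songs_per_year.most_common(1)[0]
print(busiest_year, count)
```

With the real packages, pandas.read_html can turn a page's tables into DataFrames directly, so most of the parser class above would not be needed.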

Tangential Examples

At the Thanksgiving dinner I was invited to yesterday, one person brought up how she thinks some of the things the professor mentions in one of her classes are "too" tangential and uninteresting--she explained that she doesn't like having to learn something that seems irrelevant to her, such as "using these concepts/ideas/formulas, we can learn that there may be a star far, far away that is yet to be discovered." Indeed, tangential topics in math/statistics classes work or don't work depending on the students' interests. It is up to the instructor's discretion to decide whether a certain example will help or hinder learning.


On Thanksgiving eve (last night), I met my friends from high school. During our three years of high school, we all lived in a dorm. We've all grown a bit since then, but after our little reunion I wondered what kinds of indicators there are to predict where we will be--a topic we discussed while we ate dinner. Would we be in Korea? America? Or some other place? Whether we can stay in America is pretty unpredictable, and so is the probability that my friends will be on the same continent as I am.

Generational Struggles

Often when I have a discussion with my mom, I realize that what we each think of as the “norm” can differ greatly. My friends, from all over, agree that they see the same generational discrepancy in what “normal” is. I wonder if we can ever mitigate this difference with statistics, using it as a tool to analyze people’s thoughts and cultures. If those things could be quantified, they could be represented as a stochastic process.

Inside Joke

Since I'm not a native English speaker, it is often difficult to understand inside jokes, whether in casual/social conversations or in an academic context. Sometimes you have to really know a statistical concept to "get" what others are saying. For example, I finally realized that the company Two Sigma's name wasn't just a random one--it had to do with an "I'm in the top few percent" mindset, since only about 5 percent of a normal distribution lies more than two standard deviations from the mean! It takes time to really let a concept soak in, and for some reason this made me feel like I was getting more out of mere formulas and textbooks.
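
Out of curiosity, I checked the numbers behind "two sigma" with a short Python sketch. It assumes a standard normal distribution and builds the CDF from math.erf; the helper name normal_cdf and the variable names are mine, for illustration only.

```python
import math


def normal_cdf(z):
    """CDF of the standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Share of the distribution within two standard deviations of the mean.
within_two_sigma = normal_cdf(2) - normal_cdf(-2)

# Share outside that band, counting both tails.
outside_two_sigma = 1 - within_two_sigma

# Share above +2 sigma only (the "top" tail).
upper_tail = 1 - normal_cdf(2)

print(f"within:  {within_two_sigma:.4f}")
print(f"outside: {outside_two_sigma:.4f}")
print(f"upper:   {upper_tail:.4f}")
```

So the "5 percent" figure counts both tails together; being more than two sigma above the mean alone is closer to the top 2 to 3 percent.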


I really enjoy reading horoscopes, even though I am aware that they are not really based on scientific evidence. Roughly 1 out of 12 people in the world is a Gemini like me, and we're all given the same outlook on our upcoming days. In Korea, we have something similar to horoscopes too--there are experts who consult on people's fate in many aspects of life (finance, love, health, etc.), and the first thing they ask for is the date and time of birth. In middle school, my history teacher explained why ancient people believed that the moment of birth was so important: in the scheme of infinite time, a lifetime is as small as one day (70 years/infinity ~ 1 day/infinity ~ 0). I guess the logic was that the day you start your life defines the rest of your life.

Sonification of Data

I'm taking a class on data next semester, titled 'Sonic and Visual Representation of Data'. Data visualization is something I hear about a lot these days and am more familiar with, but not as much emphasis has been put on representing data in an auditory way. Wikipedia defines sonification as "the use of non-speech audio to convey information or perceptualize data." The instructor for this class will be a geophysicist who has done some sonification of volcanic activity. I'm hoping to do some projects that connect physical data (like heart rate) to music, and hopefully apply them to my smartwatch-music playlist app idea.

General Questions are Hard

I talked to one of the professors in applied mathematics to discuss my research question, and the topic of 'information' (in the statistical context) came up. I had heard about Information Theory from my boyfriend, because he is currently taking the class. He has told me that the subject is very abstract and heavy on probability theory. Anyway, the professor explained how people started to ask how to measure 'information' in a more objective way, and how that led to theorizing about it. I thought that the more general or low-level a question is, the more difficult it is to theorize about. Answering a general question, in most aspects of life, is difficult.

Tips On Writing Non-Technical Articles

The guest lecturer today was Susan Matthews, who works as a science editor at Slate magazine. Of the many tips she gave the class on writing a non-technical article, the one I found most useful was to first write about who I am--what my background is, and what my special interests or research areas are. Susan says she sometimes asks scientists to write about themselves before she edits the research part. This process lets the scientist and the editor understand what kind of background knowledge is being assumed, and helps the editor clarify that for a less technical audience. When I did my assignment on writing a non-technical article, I was so focused on the audience that I didn't think much about who I am--this advice will help me revise my article.

Solemn Statistics

I read an article from The Guardian titled 'Richest 1% own half the world's wealth, study finds.' The article included a number of statistics that compared the growth of wealth in the top 1% after the financial crisis to the growth of wealth in the lower-income classes. It also touched on how millennials are worse off than their parents. When I read articles like this, which give a (kind of) shocking statistical fact and relate it to how it can affect my life, I feel like I have stepped on the scale to see that I have gained 20 pounds--solemness.