In the morning, I received an email with the subject 'When English Majors Earn More Than Engineers'. The main point of this article from Vault is that if you are one of the good English majors, you are better off than a mediocre engineer. The article used statistics (like "The top quarter of earners who majored in English make more over their lifetimes than the bottom quarter of chemical engineers") in an optimistic way for English majors, in terms of job prospects. However, the exact same statistics can be used to defend the opposite side: even if you are one of the top performers in English, you can only make about as much as an okay engineer. I guess this article's logic is convincing only as long as people assume that English majors make really, really little.
Writing about math is really hard. It's because A) I have to be ABSOLUTELY sure about what I'm writing, including the underlying theorems that lead up to it, and B) I have to deliver it in an order that will make sense to whoever I'm writing for. It's quite like teaching, but even more challenging because I can't adjust to my audience's response on the go or use interactive tools (writing on the board, drawing, etc.). Hopefully I will be able to finish an article about the top-down neural network research I'm working on by the end of this year.
Today I had lunch with my friend who is doing a Master's program in anthropology. Her boyfriend studies neuroscience/psychology. She said she and her boyfriend often argue because, while he studies humans based on controlled experiments and statistical inference, she thinks that "there is nothing more untrustworthy than numbers". It was an interesting perspective, because although numbers give some idea of what's going on, with regard to the human mind and behavior they are almost never the truth. Next time I meet her I would like to discuss this matter in more depth.
I enjoy keeping track of things about myself. For example, I've kept a diary since my first year of high school, and have been tracking my weight and diet-related things on an app since this summer. Another thing I've been tracking for years, since 2011, is my period. I got my period today, only 20 days after the last one (my average cycle is 35 days), which got me thinking about what better method I could use to predict what will happen to my body, based on the data I've been tracking.
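One very simple starting point, before anything fancier, would be to check how far a new cycle falls from the usual range. The sketch below is only an illustration: the `past_cycles` numbers are made up (chosen to average 35 days), and the "mean plus or minus two standard deviations" rule is an assumption about what counts as typical, not a medical criterion.

```python
from statistics import mean, stdev


def typical_range(cycle_lengths, k=2.0):
    """Return (low, high) as the mean minus/plus k sample standard deviations."""
    center = mean(cycle_lengths)
    spread = stdev(cycle_lengths)
    return center - k * spread, center + k * spread


def is_unusual(new_length, cycle_lengths, k=2.0):
    """True if new_length falls outside the typical range of past cycles."""
    low, high = typical_range(cycle_lengths, k)
    return not (low <= new_length <= high)


# Hypothetical history in days; real values would come from the tracking app.
past_cycles = [33, 36, 35, 37, 34, 35, 36, 34]

low, high = typical_range(past_cycles)
print(f"typical range: {low:.1f} to {high:.1f} days")
print("20-day cycle unusual?", is_unusual(20, past_cycles))
```

With real data, the same check could be rerun after every cycle, so the "typical" range updates as the history grows.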
Incredible coincidences freak me out... For example, today I was shocked that my best friend gave me a Babka cake from Bread's Bakery, because just yesterday I learned that their Babka is the best and bookmarked them on Yelp. And I have NOT discussed this with my friend, ever. What are the odds that A) my friend would generously give me a Babka cake, B) right after I thought of getting Babka, and C) get the cake from EXACTLY where I was thinking of going?
During the final project meeting, my teammate and I decided to include dynamic gauge charts in our Shiny-coded app for Child Protective Services. My immediate job is to find an R Shiny example that has a nice interface for moving gauge meters, similar to what we saw in the New York Times' Live Presidential Forecast.
I've been thinking about this idea for a while: I wear my Fitbit all the time when I'm awake, and I realized that the Fitbit app has been collecting all that data since I first bought it in July. I'm interested in studying the correlation between physical parameters (heart rate, walking/running speed) and emotion (calm, excitement, etc.). If I can somehow model the relationship, I can develop a music recommendation system that will boost my workout more effectively by choosing the "right" song at the right moment. I should look into Fitbit's API.
Today, I went to an event on AI and music. One of the presenters was Drew Silverstein from Amper Music, a startup that uses AI to create original music (e.g. music for commercials). I was impressed by their demo, where they selected inputs such as "genre", "mood", and "time", and then played a beautiful, structured piece of music. Drew mentioned that unlike in other AI tasks (classification, prediction), the metric of "right" or "wrong" is ambiguous for music creation, because how we perceive music is subjective. Their demo showed that while tasks like music creation are different from traditional AI tasks, statistics and math can contribute to and assist in the creative process.
I changed my Facebook profile picture just now after my boyfriend took a really nice photo of me today. We are now watching how many "likes" I'm getting, and he said it could be related to the Poisson distribution. I didn't recall the exact details of the Poisson distribution, so I looked it up: generally, it seems that the Poisson distribution is an appropriate model for counting events that are independent and occur at a constant average rate. In this sense, I guess Facebook "likes" can be modeled by a Poisson distribution. Although I learned the Poisson distribution and memorized the formula for its probability mass function, etc., I've never had a chance to think of it in terms of real-life applications.
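To make the formula concrete, here is a minimal sketch of the Poisson probability mass function, P(X = k) = λ^k e^(-λ) / k!. The rate `likes_per_hour = 6.0` is a made-up number for illustration, and treating "likes" as arriving at a constant rate is itself an assumption (in practice they probably come faster right after posting).

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with mean lam."""
    return lam ** k * exp(-lam) / factorial(k)


# Hypothetical average rate; not measured from any real profile picture.
likes_per_hour = 6.0

# Probability of seeing exactly k likes in one hour, for a few values of k.
for k in range(0, 10, 3):
    print(k, round(poisson_pmf(k, likes_per_hour), 4))

# Sanity check: the probabilities over all k should add up to (nearly) 1.
total = sum(poisson_pmf(k, likes_per_hour) for k in range(60))
print("sum of probabilities:", round(total, 6))
```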
I make a lot of review-based decisions in my life. Today, I was shopping for Kaya Jam (a jam made from coconut that is widely used in Malaysian and Singaporean cuisine) on Amazon. One product was overwhelmingly better than the others in terms of reviews and ratings. However, for some reason the 4.5 stars were not convincing enough for me to purchase that product, because I didn't like the packaging (it was too "westernized", and I wanted something truly authentic). This reminded me that no matter how much data analysis or how many statistical methods are provided, they are never the definitive answer. They may be merely an additional "tool".
Statistically, does haste really make waste? I can be really hasty, and I would say that out of 10 incidents where I’ve been hasty, about 4 resulted in waste. For example, when I barely made it onto the downtown train at the 125th Street station, I realized that I had carelessly gotten on the A train, not the D train I needed. One may argue that there’s a 40% chance of wasting your time when you’re hasty.
However, when I got off at the next station (which I had never been to, because I had never taken the A train before), I realized that the D train was running on the same platform. This opens up so many new travel routes for me, and I wouldn’t have figured it out if I hadn’t been hurrying. I guess my point is, while it might seem that ‘haste makes waste’ is statistically significant, there’s always more beyond the numbers.
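As a side note on the "4 out of 10" estimate: with only 10 incidents, the 40% figure is quite uncertain. The sketch below computes a Wilson score interval for a proportion, which is one common way to put a range around such an estimate. The choice of the Wilson interval and the 95% level (z = 1.96) are my assumptions here, and the 10 incidents are a rough recollection rather than recorded data.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Approximate confidence interval for a proportion (Wilson score method)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


# 4 wasteful outcomes out of 10 hasty incidents, as estimated above.
low, high = wilson_interval(4, 10)
print(f"plausible range for the waste rate: {low:.2f} to {high:.2f}")
```

The range comes out wide, roughly from under 20% to almost 70%, so ten incidents are not enough to pin the rate down.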
My friends and I were talking about an acquaintance who went to Juilliard but decided to go to business school after graduation. After being a professional pianist for the past four years, preparing for exams like the GRE and studying math and statistics all over again was a big change in his life. His current roommate is a Columbia graduate who studied statistics, and apparently he has been very "frustrated" teaching the former pianist. This made me think about how difficult it can be to communicate across different disciplines, because the ways people think get wired differently over the years.
Today in class, we discussed how statistics is sometimes treated as just a tool that confirms scientists' hypotheses. This made me think about how statistical or mathematical analyses are sometimes focused on the result, not the procedures or the meaning. When I worked as a mathematics teaching assistant, students from the economics department sometimes came to me to ask about solving an equation with derivatives. Often they wanted me to just give them the answer, so that they could move on to the less technical part of their assignment. In this sense, I guess it is important to realize that people may sometimes have a clear task for me, and that I have to provide my analysis accordingly.
One of the assigned readings today was a list of advice about statistical consulting from David Rindskopf. What resonated the most with me was the importance of understanding the task. So often I am so focused on the "solving" part of the problem that I neglect the problem itself. In research, this issue becomes more important because I need to be able to clearly define my task and goals in order to make progress.
I've been trying to learn how to use this Shiny thing in R, and today in class the creator of GGPlot appeared as a guest lecturer. It was interesting that he named the package GGPlot, with GG standing for Grammar of Graphics. In some way, all packages are a grammar of something: for example, Python's NumPy is a grammar for matrix operations and Matplotlib is a grammar for graphing. A lot of people write their own packages and make them publicly available on GitHub. I think when writing these packages, it is important to remind yourself that you're constructing a grammar for a specific task, and to put yourself in the users' shoes so that the package is easy to learn.
The topic of birth control came up with my friends, and we talked about the effectiveness of condoms and IUDs. While we all seemed to agree that the IUD is the most effective and economical option, one friend pointed out that we should really try to understand the meaning of the statistics cited as evidence of IUD effectiveness. They said that while condoms are effective 95% of the time and IUDs 99%, we shouldn't jump to the conclusion that the IUD is the superior method of birth control. Rather, the 95% figure for condoms includes cases where condoms are misused, so the number itself doesn't do justice to the effectiveness of condoms. This made me think about how I should present numbers and how I should interpret them, depending on my intentions and goals.
A Korean comic book I read a couple of years ago came to mind today. It was about a boy who fell in love with a girl who people thought was a "witch", because any man who was attracted to her or got close to her ended up hurt or dead. The girl suffers from guilt and from the town's discrimination and becomes a recluse, and the protagonist decides to major in Statistics to show that the statistical facts do not prove that she actually is a witch. I forgot exactly how the comic book ended, but the guy successfully proves to himself that the girl isn't a witch because, after analyzing all possible scenarios from the data, he manages to survive even when he gets closer to her. I think it's interesting because it reminded me that even if statistical facts sound really, really convincing, they are never the "truth".
Since yesterday, I've been going to supermarkets in the Morningside/Harlem area to find Halotop's Pumpkin Pie flavor. Unfortunately, none of the supermarkets had it. Today, I visited Halotop's website and found out that there was a locator page. It has a beautiful interface where I could select the desired flavor, and it would tell me the closest store that has it. I thought it was a wonderful application of an interactive user interface that let me narrow down the database.
UNFORTUNATELY, the data must not be accurate or up to date, because when I walked to the 131st Street Foodtown on Frederick Douglass Boulevard, they did NOT have the pumpkin pie Halotop. No matter how awesome the data application is, if the ingredient (data) is not fresh, it is of no use!
I came across an interesting article in Harvard Business Review, "How to Spot a Machine Learning Opportunity, Even If You Aren’t a Data Scientist." I enjoyed reading it because it was very inviting overall. Although I have taken classes on machine learning and done research on neural networks, I remember that just two years ago, when I didn't know how AI worked, all of this sounded super intimidating to me. This article had just enough math to communicate why this idea of "learning" from past examples (inputs and outputs) can work for tasks like prediction and classification. I think that, combined with some dynamic graphs (like the ones on the TensorFlow Playground website), this article can benefit a wide range of people.
During class today, we looked into interactive graphs using Shiny. Towards the end, one of the students brought up an example called Tweet Analyzer that used Twitter data and calculated metrics such as "Reputation" or "Sentiment". While it was really "cool" to see real-time data being used as an ingredient for an interactive plot, I thought about what the goal of these very visual and "flashy" applications should be. Since I'm learning dynamic graphs, I should be careful not to focus too much on the visual part, and should keep in mind what kind of message I want to make clear with the graphs.