A Data Science Assignment

It has been quite difficult to write my statistics diary recently, because I ran out of topics to write about. So today, I asked my boyfriend about any statistics-related thoughts he had been having. A student himself, he told me about a data science assignment he had just finished. The task was to run a logistic regression to predict cancer, but each of the 60 training samples had 7000 DNA parameters! With far more parameters than samples, he had to take multiple steps such as bootstrapping, cross-validation, and PCA before he could run the regression. In assignments like this, it is interesting to learn how we can simplify messier, real-world data. But unlike in assignments, in the real world nobody tells us which of the methods we learned in math and statistics classes to use. I think that is something we should also think about when we do this type of assignment.
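
Out of curiosity, I tried to sketch what such a pipeline might look like in Python with scikit-learn. I do not know how he actually combined the steps, so everything here is my own assumption: the data is random synthetic data standing in for the real DNA measurements, and the choices of 10 principal components, 5 folds, and 20 bootstrap rounds are arbitrary. The one thing I tried to be careful about is putting PCA inside the pipeline, so that it is refit on each training split instead of seeing the held-out samples.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the assignment data (NOT the real data set):
# 60 samples, 7000 features, and a balanced binary label.
n_samples, n_features = 60, 7000
X = rng.normal(size=(n_samples, n_features))
y = np.repeat([0, 1], n_samples // 2)
X[y == 1, :20] += 1.0  # make the first 20 features weakly informative

# Scale, reduce 7000 features to 10 components, then fit logistic regression.
# The number of components (10) is an arbitrary assumption.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=10, random_state=0),
    LogisticRegression(max_iter=1000),
)

# Cross-validation: accuracy on each of 5 stratified folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = cross_val_score(model, X, y, cv=cv)

# Bootstrapping: train on a resample drawn with replacement,
# then score on the samples that were left out ("out-of-bag").
n_rounds = 20
oob_scores = []
for _ in range(n_rounds):
    idx = rng.integers(0, n_samples, size=n_samples)
    oob = np.setdiff1d(np.arange(n_samples), idx)
    if len(np.unique(y[idx])) < 2 or len(oob) == 0:
        continue  # skip degenerate resamples
    model.fit(X[idx], y[idx])
    oob_scores.append(model.score(X[oob], y[oob]))

print("mean cross-validation accuracy:", cv_scores.mean())
print("mean out-of-bag accuracy:", np.mean(oob_scores))
```

Even in this toy version, I had to make several choices that no assignment sheet made for me, which is exactly the part I find worth thinking about.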