Data Analysis Report Structure

Report Structure

  1. Introduction
    • Background
    • Problem/Motivation
    • Brief summary of results
  2. Results
    • Data description
    • Exploratory Data Analysis/Analysis
    • Code
    • Results/What we have found
  3. Conclusion and Discussion
    • Results recap
    • Answers to the questions
    • Discussion (limitations and future studies)

References

Final Project

Possible Topics

  • Data and Story: Use data summaries, plots, or data analysis methods you’ve learned in this course or elsewhere to analyze a data set and answer some questions or tell a story. You may use the datasets available in the R package datasets or in Data: R packages. Of course, you can also use other data sets you find on the web.
  • Simulations of theorems or results in probability or statistics: We have done simulations for the Law of Large Numbers and the Central Limit Theorem. You may construct simulations for other theorems or results of interest to you or your classmates, for example the secretary problem, the gambler’s ruin problem, the St. Petersburg paradox, or the Monty Hall problem, just to name a few.
  • Shiny R or R presentation. We will see some examples in class.
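As an illustration of the simulation topic above, here is a minimal sketch of a Monty Hall simulation in R. The seed, the number of games, and all variable names are illustrative choices, not part of the course material. It assumes the standard rules: the host always opens a goat door other than the contestant's pick.

```r
# Monty Hall simulation: estimate the win probability for "stay" vs "switch".
set.seed(1)        # arbitrary seed, for reproducibility only
n_games <- 10000   # number of simulated games (illustrative choice)

car  <- sample(1:3, n_games, replace = TRUE)  # door hiding the car
pick <- sample(1:3, n_games, replace = TRUE)  # contestant's first pick

# Under the standard rules the host opens a goat door that is not the pick,
# so staying wins exactly when the first pick was the car,
# and switching wins exactly when it was not.
win_stay   <- car == pick
win_switch <- car != pick

p_stay   <- mean(win_stay)
p_switch <- mean(win_switch)
print(c(stay = p_stay, switch = p_switch))
```

The two estimated proportions should land close to the theoretical values 1/3 and 2/3; increasing n_games tightens the estimates.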

Discuss/Check Date: 6/5, 6/7 and 6/14 (if time is needed).

Upload/Email Submission: (Tentative) Project title/data submission: by 5/31. Project submission: by 11:59 AM on 6/4. Submission: You can email your project link to me or open a pull request on the Final Project GitHub page.

Game 8: More on exam data

R notebook for Game 8 (revised)

Mid-Project (Checkpoint: 5/1; a team = 3~5 persons, or ask Kno)

  • Given the 2016 Stat data set, estimate the final exam scores of 4 students who score, say, 10, 30, 50, 70 (or more realistic scores of your interest) on the Stat 2017 midterm.
  • According to an unidentified yet reliable source, the course grades given by the instructor are roughly in the ratio #A:#B:#C:#(D or E) = 1:3:3:3.
  • Estimate the course grades of these 4 students. Please provide the assumptions and rationale of your estimation.
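One possible way to approach the tasks above is sketched below. This is only an illustration under stated assumptions, not the expected solution: the 2016 data set is not reproduced here, so the data frame stat2016 is a simulated, hypothetical stand-in; the linear model is one modeling choice among many; and the sketch assumes, for simplicity, that course grades are assigned by ranking final exam scores, which the project statement does not say.

```r
set.seed(2017)  # arbitrary seed, for reproducibility only

# HYPOTHETICAL stand-in for the 2016 data (simulated, not the real scores).
n <- 120
midterm <- pmin(pmax(round(rnorm(n, 50, 18)), 0), 100)
final   <- pmin(pmax(round(20 + 0.7 * midterm + rnorm(n, 0, 10)), 0), 100)
stat2016 <- data.frame(midterm, final)

# Assumption: final exam score depends roughly linearly on midterm score.
fit <- lm(final ~ midterm, data = stat2016)

# Point estimates and 95% prediction intervals for the 4 students.
new_students <- data.frame(midterm = c(10, 30, 50, 70))
pred <- predict(fit, newdata = new_students, interval = "prediction")

# Ratio A:B:C:(D or E) = 1:3:3:3 means the bottom 30% get D/E, the next 30% C,
# the next 30% B, and the top 10% A, so the cutoffs are the 30%, 60%, and 90%
# quantiles. Assumption: grades are based on the final exam score alone.
cuts  <- quantile(stat2016$final, probs = c(0.3, 0.6, 0.9))
grade <- cut(pred[, "fit"], breaks = c(-Inf, cuts, Inf),
             labels = c("D/E", "C", "B", "A"))

print(data.frame(new_students, round(pred, 1), grade))
```

With the real data, it would be worth checking the scatter plot and residuals before trusting the linear fit, and stating clearly how the course score is assumed to combine the midterm and final.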

You are invited to use R notebook/Rmarkdown for this project. Have fun!

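Game 6: How did I do? What score will pass?

Motivating question: How should we interpret the score data released by the instructor/TA? How can we use partial information to roughly estimate the overall picture? Reference: 2016 Statistics course website

Key practical question: How did I do? What score will pass?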

If normality is a reasonable assumption, these questions can be easily answered by computing F(x)=P(X \leq x) and F^{-1}(p), where X \sim N(\mu, \sigma^2), that is, the cdf and the quantile function of a normal random variable. In R, they can be calculated using the pnorm and qnorm functions.
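A minimal sketch of these two computations follows. The class mean, standard deviation, individual score, and pass rate below are hypothetical numbers chosen for illustration; they do not come from the course data.

```r
# Hypothetical summary statistics (assumed, not from the actual class).
mu    <- 60   # class mean
sigma <- 15   # class standard deviation

# "How did I do?": F(x) = P(X <= x) for a score of x = 75.
x <- 75
p_below <- pnorm(x, mean = mu, sd = sigma)
print(p_below)   # about 0.84: roughly 84% of the class scored at or below 75

# "What score will pass?": assume (hypothetically) that 70% of the class passes,
# so the passing cutoff is the 30% quantile, F^{-1}(0.3).
pass_rate <- 0.7
cutoff <- qnorm(1 - pass_rate, mean = mu, sd = sigma)
print(cutoff)    # about 52.1
```

Here pnorm answers the first question (a percentile rank) and qnorm answers the second (a score threshold); the two functions are inverses of each other for fixed mean and sd.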

Diagnosis and Remedial Measures: In many scenarios, even when the data on the original scale are far from normal, the normal approach still works after a suitable transformation.
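To make the idea concrete, here is a small sketch using a log transformation on simulated right-skewed data. The data, the choice of the log transform, and the variable names are all assumptions made for illustration; real exam scores may call for a different transformation, chosen after looking at a histogram or a normal Q-Q plot (qqnorm).

```r
set.seed(6)  # arbitrary seed, for reproducibility only

# HYPOTHETICAL right-skewed data (simulated log-normal values).
x <- rlnorm(500, meanlog = 3, sdlog = 0.5)

# Transform, then estimate the normal parameters on the transformed scale.
y <- log(x)
m <- mean(y)
s <- sd(y)

x0 <- 30
p_raw <- pnorm(x0, mean = mean(x), sd = sd(x))  # normal approach on the raw scale
p_log <- pnorm(log(x0), mean = m, sd = s)       # normal approach after log transform
p_emp <- mean(x <= x0)                          # empirical proportion, for comparison
print(c(raw = p_raw, log = p_log, empirical = p_emp))

# Quantiles are computed on the transformed scale and then transformed back.
q90 <- exp(qnorm(0.9, mean = m, sd = s))
print(q90)
```

Comparing p_raw and p_log against the empirical proportion gives a quick check of whether the transformation actually helps for the data at hand.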

R code: g6class.r