Research Projects

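An international online research project is an effective way to build research skills quickly
and to strengthen your background for university applications.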

Introduction to Statistical Thinking

2019-7-1

Ph.D.: University of California, Los Angeles

Master: University of California, Los Angeles

Bachelor: Beijing Institute of Technology


A person's orthopedic condition can be detected from his or her biomechanical features. Advances in medical devices have made biomechanical measurements possible for patients with potential orthopedic abnormalities. Applied to biomechanical data, data mining techniques can support disease detection and phenotyping. The project is based on a dataset derived from the vertebral column dataset in the UCI Machine Learning Repository, built by Dr. Henrique da Mota during a medical residency in the Group of Applied Research in Orthopedics (GARO) in France. Six numerical biomechanical features are derived from the shape and orientation of the pelvis and lumbar spine: pelvic incidence, pelvic tilt, lumbar lordosis angle, sacral slope, pelvic radius, and grade of spondylolisthesis. Accurately classifying orthopedic conditions from easily acquired measurements will contribute to prompt diagnosis and effective targeted treatment.

The instructor has 3 years of teaching assistant experience at UCLA in statistics, R/Python programming, and mathematical modeling for lower-division undergraduate courses, and has led multiple data analytics final projects. This course will equip the student with the basic knowledge and coding techniques needed to perform quantitative analysis on a research problem, with the results presented as deliverables upon completion.

 

 

 

1. Introduction to basic concepts in statistics and data mining methods

2. Introduction to coding in R

3. The student is expected to perform exploratory analysis on a real-life dataset, apply appropriate data mining algorithms to a classification task, and interpret the results.

4. The student will practice reading and summarizing scientific papers in English and writing scientific reports.

 

The following sessions cover the fundamental knowledge needed for the project. The student must understand each session in detail before moving on to the next stage.

Time Schedule

Content

0h-2h

Introduction to statistics and data mining: basic concepts and motivating examples

Lab: R installation, main interface, package installation, documentation, variable assignment, naming convention, math calculation, basic data types in R, data I/O, variable types

Project: How to ask research questions for a given dataset. Read example scientific papers. Pick a domain and a dataset.

2h-4h

Probability theory:

● what is probability

● how do we determine probabilities?

● probability distributions

● conditional probability

● computing conditional probability

● independence

● odds and odds ratio

Practice: calculate probabilities and conditional probabilities

Project: Background, literature search
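The probability practice for this session can be sketched in a few lines of code. The course labs use R; the sketch below uses Python with only the standard library, purely for illustration. The counts and the group and outcome labels are hypothetical and are not taken from the project dataset.

```python
import math

# Hypothetical counts for illustration only (group, outcome) -> number of people.
counts = {
    ("group_a", "abnormal"): 30,
    ("group_a", "normal"): 20,
    ("group_b", "abnormal"): 15,
    ("group_b", "normal"): 35,
}
total = sum(counts.values())


def prob(group=None, outcome=None):
    """Probability of the event defined by the given group and/or outcome."""
    matching = sum(
        n
        for (g, o), n in counts.items()
        if (group is None or g == group) and (outcome is None or o == outcome)
    )
    return matching / total


def cond_prob(outcome, group):
    """P(outcome | group) = P(outcome and group) / P(group)."""
    return prob(group=group, outcome=outcome) / prob(group=group)


def odds(p):
    """Convert a probability into odds."""
    return p / (1 - p)


p_abnormal = prob(outcome="abnormal")              # 45 / 100
p_abnormal_a = cond_prob("abnormal", "group_a")    # 30 / 50
p_abnormal_b = cond_prob("abnormal", "group_b")    # 15 / 50
odds_ratio = odds(p_abnormal_a) / odds(p_abnormal_b)  # (30 * 35) / (20 * 15)

# The outcome is independent of the group only if conditioning on the group
# leaves the probability unchanged.
independent = math.isclose(p_abnormal_a, p_abnormal)

print(p_abnormal, p_abnormal_a, p_abnormal_b, odds_ratio, independent)
```

With these made-up counts the conditional probabilities differ from the marginal one, so the outcome and the group are not independent.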

4h-6h

Understanding the data: what is data, exploratory analysis, summary statistics, understand the equations and calculate by hand, distributions

Lab: table indexing/subsetting/combination, summary statistics, data visualization (plots), for loop

Project: Perform exploratory analysis on the chosen dataset. What conclusions can you draw from the data? Read example scientific papers.
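To connect the "calculate by hand" part of this session with code, the sketch below writes out the mean, sample variance, and median from their definitions and compares them with library results. It is written in Python with the standard library for illustration (the labs themselves use R), and the values are hypothetical.

```python
import statistics

# Hypothetical measurements for illustration only.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]


def mean(xs):
    """Arithmetic mean: sum of the values divided by their count."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """Sample variance: squared deviations from the mean, divided by n - 1."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def median(xs):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    ordered = sorted(xs)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# A short loop over the summaries, in the spirit of the lab's for-loop exercise.
for name, by_hand, library in [
    ("mean", mean(values), statistics.mean(values)),
    ("variance", sample_variance(values), statistics.variance(values)),
    ("median", median(values), statistics.median(values)),
]:
    print(name, by_hand, library)
```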

6h-8h

Data preprocessing: normalization, imputation, removing abnormal values.

Lab: normalization, imputation, removing abnormal values, writing a function

Project: Preprocess the given dataset.
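The three preprocessing steps in this session can be sketched as small functions. This is one possible approach under stated assumptions, not the course's prescribed method: outliers are defined by the 1.5 × IQR rule, missing values are imputed with the mean, and normalization is min-max scaling. The sketch uses Python with the standard library for illustration (the labs use R), and the data are hypothetical.

```python
import statistics


def remove_outliers_iqr(xs, k=1.5):
    """Drop observed values outside [Q1 - k*IQR, Q3 + k*IQR]; missing values (None) are kept."""
    observed = [x for x in xs if x is not None]
    q1, _, q3 = statistics.quantiles(observed, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in xs if x is None or low <= x <= high]


def impute_mean(xs):
    """Replace missing values (None) with the mean of the observed values."""
    fill = statistics.mean(x for x in xs if x is not None)
    return [fill if x is None else x for x in xs]


def min_max_normalize(xs):
    """Rescale values linearly to the range [0, 1]."""
    low, high = min(xs), max(xs)
    if high == low:
        return [0.0 for _ in xs]
    return [(x - low) / (high - low) for x in xs]


# Hypothetical raw measurements: one missing value and one implausibly large value.
raw = [10.0, 12.0, None, 11.0, 13.0, 95.0, 12.0]

cleaned = remove_outliers_iqr(raw)   # outliers are removed first so they do not distort the imputed mean
imputed = impute_mean(cleaned)
normalized = min_max_normalize(imputed)
print(normalized)
```

The order of the steps is a design choice: removing abnormal values before imputing keeps an extreme value from pulling the imputed mean upward.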

8h-10h

Modeling categorical relationships: Example study, Pearson's Chi-squared test, contingency tables and the two-way test, odds ratios, Simpson's Paradox

Project: How can this be applied to your dataset?
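As a worked illustration of Pearson's Chi-squared test on a 2 × 2 contingency table, the sketch below computes expected counts under independence, the test statistic, and a p-value. It uses Python with the standard library (the labs use R) and a hypothetical table. No continuity correction is applied, so the result can differ from software that applies Yates' correction by default for 2 × 2 tables.

```python
import math


def chi_squared_2x2(table):
    """Pearson's chi-squared statistic and p-value for a 2x2 table (1 degree of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (table[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom the statistic is a squared standard normal,
    # so P(X > statistic) = erfc(sqrt(statistic / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts for illustration only.
# Rows: two groups; columns: abnormal / normal.
table = [[30, 20],
         [15, 35]]

statistic, p_value = chi_squared_2x2(table)
print(statistic, p_value)
```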

10h-12h

Modeling continuous relationships: Example study, covariance and correlation, correlation and causation

Project: How can this be applied to your dataset?
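Covariance and correlation for this session can be computed directly from their definitions. The sketch below uses Python with the standard library for illustration (the labs use R); the paired values are hypothetical.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def covariance(xs, ys):
    """Sample covariance: average product of paired deviations, divided by n - 1."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance rescaled by both standard deviations."""
    sx = math.sqrt(covariance(xs, xs))  # covariance of a variable with itself is its variance
    sy = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sx * sy)


# Hypothetical paired measurements for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

cov_xy = covariance(x, y)
r_xy = correlation(x, y)
print(cov_xy, r_xy)
```

Covariance depends on the units of both variables, while correlation is unitless and always lies between -1 and 1. A high correlation alone does not show that one variable causes the other.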

12h-14h

Generalized linear models: linear regression, multivariate linear regression, interactions between variables, cross-validation

Lab: fitting a linear model, performing cross-validation
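The lab's two steps, fitting a linear model and cross-validating it, can be sketched for the simplest case of one predictor. This is a minimal illustration in Python with the standard library (the labs use R). The data are hypothetical, and folds are assigned by position rather than at random, which is an assumption made here to keep the example deterministic.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return intercept, slope


def k_fold_mse(xs, ys, k):
    """Average held-out mean squared error over k folds."""
    n = len(xs)
    fold_errors = []
    for fold in range(k):
        # Every k-th observation goes to the held-out fold.
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        intercept, slope = fit_line(
            [xs[i] for i in train_idx], [ys[i] for i in train_idx]
        )
        squared_errors = [(ys[i] - (intercept + slope * xs[i])) ** 2 for i in test_idx]
        fold_errors.append(sum(squared_errors) / len(squared_errors))
    return sum(fold_errors) / k


# Hypothetical data: roughly y = 2x + 1 with small deviations.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [3.1, 4.9, 7.2, 9.0, 10.8, 13.1, 15.0, 16.9, 19.2, 21.0]

intercept, slope = fit_line(x, y)
cv_error = k_fold_mse(x, y, k=5)
print(intercept, slope, cv_error)
```

The cross-validated error is measured on observations the model did not see during fitting, so it is a fairer estimate of predictive accuracy than the error on the training data.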

14h-16h

(Comparing means: hypothesis testing)

Reproducible research

Scientific report writing
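The "comparing means" topic in this session can be illustrated with a permutation test, chosen here because it needs no distribution tables. This is a swapped-in technique for illustration and may not be the test taught in the session. The sketch uses Python with the standard library (the labs use R), hypothetical measurements, and a fixed random seed so that the result is reproducible.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def permutation_test(a, b, n_permutations=10000, seed=0):
    """Two-sided permutation p-value for the difference in means of two groups."""
    rng = random.Random(seed)  # fixed seed: the same call always gives the same p-value
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are exchangeable,
        # so shuffle the pooled data and split it again.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[: len(a)]) - mean(pooled[len(a):]))
        if diff >= observed - 1e-9:
            at_least_as_extreme += 1
    # Adding 1 to numerator and denominator avoids reporting a p-value of exactly 0.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical measurements for two groups, for illustration only.
group_a = [41.2, 39.8, 44.1, 42.5, 40.3, 43.0]
group_b = [47.9, 50.2, 46.5, 49.1, 48.3, 51.0]

p_value = permutation_test(group_a, group_b)
print(p_value)
```

Fixing the seed ties in with the session's reproducible research theme: anyone rerunning the script obtains the same p-value.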

 

 

 

a. Introduces the student to basic concepts in statistics and probability theory.

b. Teaches a data analysis routine and coding techniques that transfer to future data analysis projects.

c. Cultivates statistical thinking and literature reading skills.

 

Data mining, Informatics, Statistics

This course is designed for pre-college students. The student is expected to summarize the literature, complete assigned tasks that reinforce statistical concepts, and draft the final report offline, with guidance when appropriate. Critical thinking will be promoted throughout the course, and the student is encouraged to investigate "why" and "what if" independently. Curiosity is the best teacher.

 

At the end of the course, the student will:

1. Be familiar with the data analytics routine

2. Know basic concepts of statistics and probability, and be able to perform calculations by hand and in R.

3. Build confidence in performing basic coding in R.

4. Explore application background and current techniques in the chosen domain

5. Sharpen academic literature search and academic writing skills.

