Ph.D.: University of California, Los Angeles
Master's: University of California, Los Angeles
Bachelor's: Beijing Institute of Technology
A person’s orthopedic condition can be detected from his or her biomechanical features. Advances in medical devices have made biomechanical measurements possible for patients with potential orthopedic abnormalities. Data mining techniques applied to biomechanical data can be used for disease detection and phenotyping. The project is based on a dataset derived from the Vertebral Column dataset in the UCI Machine Learning Repository, built by Dr. Henrique da Mota during a medical residency in the Group of Applied Research in Orthopedics (GARO) in France. Six numerical biomechanical features are derived from the shape and orientation of the pelvis and lumbar spine: pelvic incidence, pelvic tilt, lumbar lordosis angle, sacral slope, pelvic radius, and grade of spondylolisthesis. Accurately classifying orthopedic conditions from easily acquired measurements will contribute to prompt diagnosis and effective targeted treatment.
The instructor has three years of teaching assistant experience at UCLA in statistics, R/Python programming, and mathematical modeling for lower-division undergraduate courses, and has led multiple data analytics final projects. This course will equip the student with the basic knowledge and coding techniques needed to perform quantitative analysis on a research problem, with the results presented as deliverables upon completion.
1. Introduction to basic concepts in statistics and data mining methods.
2. Introduction to coding in R.
3. The student is expected to perform exploratory analysis on a real-life dataset, apply appropriate data mining algorithms to a classification task, and interpret the results.
4. The student will practice reading and summarizing scientific papers in English and writing a scientific report.
The following sessions cover the fundamental knowledge needed for the project. The student must understand every detail of a session before moving on to the next stage.
Introduction to statistics and data mining: basic concepts and motivating examples
Lab: R installation, main interface, package installation, documentation, variable assignment, naming convention, math calculation, basic data types in R, data I/O, variable types
Project: How to ask research questions for a given dataset? Read example scientific papers. Pick a domain and a dataset.
● What is probability?
● How do we determine probabilities?
● Probability distributions
● Conditional probability
● Computing conditional probability
● Odds and odds ratio
Practice: calculate probabilities and conditional probabilities
Project: Background, literature search
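Worked example for the practice above: a minimal sketch of computing a conditional probability and an odds ratio from a two-way table of counts. It is written in Python for illustration (the labs themselves use R), and the counts, the `prob` and `odds` helpers, and the category labels are all hypothetical, chosen only so the arithmetic can be checked by hand.

```python
from fractions import Fraction

# Hypothetical counts, invented for illustration (not taken from the course dataset).
# Keys are (condition, test result) pairs.
counts = {
    ("abnormal", "positive"): 30,
    ("abnormal", "negative"): 10,
    ("normal", "positive"): 20,
    ("normal", "negative"): 40,
}
total = sum(counts.values())


def prob(condition=None, result=None):
    """Relative frequency of the cells matching the given condition and/or result."""
    matching = sum(
        n
        for (c, r), n in counts.items()
        if condition in (None, c) and result in (None, r)
    )
    return Fraction(matching, total)


def odds(p):
    """Convert a probability into odds: p / (1 - p)."""
    return p / (1 - p)


# Marginal and joint probabilities.
p_positive = prob(result="positive")
p_negative = prob(result="negative")
p_abnormal_and_positive = prob("abnormal", "positive")

# Conditional probability: P(A | B) = P(A and B) / P(B).
p_abnormal_given_positive = p_abnormal_and_positive / p_positive
p_abnormal_given_negative = prob("abnormal", "negative") / p_negative

# Odds ratio: odds of "abnormal" among positives relative to negatives.
odds_ratio = odds(p_abnormal_given_positive) / odds(p_abnormal_given_negative)

print("P(abnormal | positive) =", p_abnormal_given_positive)
print("P(abnormal | negative) =", p_abnormal_given_negative)
print("odds ratio =", odds_ratio)
```

Exact fractions are used so that each result can be compared directly with a hand calculation.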
Understanding the data: what is data, exploratory analysis, summary statistics, understand the equations and calculate by hand, distributions
Lab: table indexing/subsetting/combination, summary statistics, data visualization (plots), for loop
Project: Perform exploratory analysis on the chosen dataset. What conclusions can you draw from the data? Read example scientific papers.
Data preprocessing: normalization, imputation, removing abnormal values.
Lab: normalization, imputation, removing abnormal values, writing a function
Project: Preprocess the given dataset
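Illustration of the three preprocessing steps named in this session: a minimal sketch, in Python rather than the R used in the lab, under some stated assumptions. Missing values are filled with the median, abnormal values are flagged with the 1.5 × IQR rule, and normalization is min-max scaling; each of these is one common choice among several, not a prescribed method. The measurements and function names are hypothetical.

```python
import statistics

# Hypothetical measurements, invented for illustration; None marks a missing value.
raw = [42.0, 55.0, None, 61.0, 48.0, 250.0, 52.0]


def impute_median(values):
    """Replace missing entries with the median of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]


def remove_outliers_iqr(values, k=1.5):
    """Keep only values within k * IQR of the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def min_max_normalize(values):
    """Rescale values to [0, 1]; assumes at least two distinct values."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


imputed = impute_median(raw)
cleaned = remove_outliers_iqr(imputed)
scaled = min_max_normalize(cleaned)

print("imputed:", imputed)
print("cleaned:", cleaned)
print("scaled: ", [round(v, 3) for v in scaled])
```

The median is used for imputation here because, unlike the mean, it is barely affected by the one extreme value in the list.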
Modeling categorical relationships: Example study, Pearson’s Chi-squared test, contingency tables and the two-way test, odds ratios, Simpson’s Paradox
Project: How can this be applied to your dataset?
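Worked example for Pearson’s Chi-squared test on a two-way contingency table: a minimal sketch in Python (the course labs use R) that computes the expected counts and the test statistic the way one would by hand. The table is hypothetical, the helper names are illustrative, and the p-value formula shown applies only to one degree of freedom, that is, a 2 × 2 table. No continuity correction is applied, so the result can differ from R’s `chisq.test`, which applies Yates’ correction to 2 × 2 tables by default.

```python
import math

# Hypothetical 2 x 2 table of counts, invented for illustration.
# Rows: condition (abnormal, normal); columns: test result (positive, negative).
table = [
    [30, 10],
    [20, 40],
]


def chi_squared_statistic(observed):
    """Pearson's statistic: sum over cells of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - expected) ** 2 / expected
    return statistic


def p_value_df1(statistic):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


chi2 = chi_squared_statistic(table)
p_value = p_value_df1(chi2)

print("chi-squared statistic:", round(chi2, 3))
print("p-value (df = 1):", p_value)
```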
Modeling continuous relationships: Example study, covariance and correlation, correlation and causation
Project: How can this be applied to your dataset?
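Worked example for covariance and correlation: a minimal sketch in Python (the labs use R) that follows the by-hand formulas for the sample covariance and Pearson’s correlation coefficient. The two small vectors and the function names are hypothetical and chosen so the sums are easy to verify on paper.

```python
import math

# Hypothetical paired measurements, invented for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def mean(values):
    return sum(values) / len(values)


def sample_covariance(a, b):
    """Sum of products of deviations from the means, divided by n - 1."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (len(a) - 1)


def pearson_correlation(a, b):
    """Covariance rescaled by both standard deviations; always in [-1, 1]."""
    sd_a = math.sqrt(sample_covariance(a, a))
    sd_b = math.sqrt(sample_covariance(b, b))
    return sample_covariance(a, b) / (sd_a * sd_b)


cov_xy = sample_covariance(x, y)
r_xy = pearson_correlation(x, y)

print("sample covariance:", cov_xy)
print("Pearson correlation:", round(r_xy, 4))
```

Covariance depends on the units of both variables, while the correlation coefficient is unit-free, which is why the latter is easier to compare across pairs of features.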
Generalized linear models: linear regression, multivariate linear regression, interactions between variables, cross-validation
Lab: fitting a linear model, performing cross-validation
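Illustration of the lab topics: a minimal sketch of fitting a one-predictor linear model by least squares and estimating its prediction error with k-fold cross-validation. It is written in Python for illustration, whereas the lab itself uses R. The data, the `fit_line` and `k_fold_mse` helpers, and the simple interleaved assignment of observations to folds are all assumptions made for this example; in practice the folds are usually assigned at random.

```python
# Hypothetical paired measurements, invented for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def k_fold_mse(xs, ys, k):
    """Mean squared prediction error over k held-out folds.

    Observation i is placed in fold i % k (an interleaved split, not a random one).
    """
    squared_errors = []
    for fold in range(k):
        train = [(a, b) for i, (a, b) in enumerate(zip(xs, ys)) if i % k != fold]
        test = [(a, b) for i, (a, b) in enumerate(zip(xs, ys)) if i % k == fold]
        slope, intercept = fit_line([a for a, _ in train], [b for _, b in train])
        for a, b in test:
            squared_errors.append((b - (intercept + slope * a)) ** 2)
    return sum(squared_errors) / len(squared_errors)


slope, intercept = fit_line(x, y)
cv_error = k_fold_mse(x, y, k=5)  # k equal to n gives leave-one-out cross-validation

print("slope:", round(slope, 3), "intercept:", round(intercept, 3))
print("cross-validated MSE:", round(cv_error, 3))
```

The cross-validated error is computed only on observations the model did not see during fitting, so it is a more honest estimate of predictive performance than the error on the training data.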
(Comparing means: hypothesis testing)
Scientific report writing
a. Introduces the student to basic concepts in statistics and probability theory.
b. The data analysis routine and coding techniques introduced are transferable to future data analysis projects.
c. Cultivates scientific and statistical thinking and literature reading skills.
Data mining, Informatics, Statistics
This course is designed for pre-college students. The student is expected to summarize the literature, complete assigned tasks that reinforce statistical concepts, and draft the final report offline, with guidance when appropriate. Critical thinking will be promoted throughout the course, and the student is encouraged to investigate ‘why’ and ‘what if’ independently. Curiosity is the best teacher.
At the end of the course, the student will:
1. Be familiar with the data analytics routine.
2. Know the basic concepts of statistics and probability and be able to perform calculations by hand and in R.
3. Build confidence in performing basic coding in R.
4. Explore the application background and current techniques in the chosen domain.
5. Sharpen academic literature search and academic writing skills.