Data Handling and Preprocessing

This session provides an introduction to data preprocessing in Python using the pandas library, tailored to medical research and clinical datasets. Participants will work with synthetic patient data that includes demographics, admission and discharge information, vital signs, and lab values.

Topics Covered

Introduction to pandas

  • DataFrames and Series

  • Basic exploration and manipulation of tabular data
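
As a minimal sketch of these basics, the example below builds a small DataFrame of synthetic patients, extracts one column as a Series, and runs a few common exploration and manipulation steps. The column names (`patient_id`, `age`, `sex`, `heart_rate`), the values, the 100 bpm tachycardia cut-off and the age bands are illustrative assumptions, not the course dataset.

```python
import pandas as pd

# Hypothetical synthetic patient table; columns and values are illustrative only
patients = pd.DataFrame(
    {
        "patient_id": [101, 102, 103, 104],
        "age": [67, 45, 82, 59],
        "sex": ["F", "M", "F", "M"],
        "heart_rate": [88, 72, 110, 95],
    }
)

# Selecting a single column returns a Series
ages = patients["age"]
print(type(ages).__name__)  # Series
print(ages.mean())          # 63.25

# Basic exploration: dimensions, column types, summary statistics
print(patients.shape)       # (4, 4)
print(patients.dtypes)
print(patients.describe())

# Boolean filtering: rows with heart rate above an assumed 100 bpm threshold
tachycardic = patients[patients["heart_rate"] > 100]

# Adding a derived column: age bands (0, 64] -> "<65" and (64, 120] -> "65+"
patients["age_group"] = pd.cut(patients["age"], bins=[0, 64, 120], labels=["<65", "65+"])

# Sorting rows by a column
oldest_first = patients.sort_values("age", ascending=False)
```

Each DataFrame column is itself a Series sharing the DataFrame's index, which is why column-wise operations such as `mean()` and boolean comparisons work directly on `patients["age"]` or `patients["heart_rate"]`.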

Data Cleaning

  • Merging patient and lab datasets

  • Correcting data types

  • Detecting and handling implausible values

  • Handling missing values

  • Removing duplicates
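
The cleaning steps above can be sketched on two small, hypothetical tables: a patient table and a lab table. All column names and values, the 20 to 250 bpm plausibility range for heart rate, and the choice of median imputation are assumptions for illustration; suitable thresholds and imputation strategies depend on the variable and the study. Duplicates are dropped before the merge here so that repeated patient rows are not multiplied by the join.

```python
import numpy as np
import pandas as pd

# Hypothetical patient table: ages stored as text, dates as strings,
# one implausible heart rate (400 bpm) and an exact duplicate row for patient 4
patients = pd.DataFrame(
    {
        "patient_id": [1, 2, 3, 4, 4],
        "age": ["54", "71", "38", "66", "66"],
        "admission_date": ["2023-01-05", "2023-01-07", "2023-01-09", "2023-01-10", "2023-01-10"],
        "heart_rate": [80, 400, 92, np.nan, np.nan],
    }
)
# Hypothetical lab table with one missing creatinine value
labs = pd.DataFrame(
    {
        "patient_id": [1, 2, 3, 4],
        "creatinine_mg_dl": [0.9, 1.4, np.nan, 1.1],
    }
)

# Remove exact duplicate rows (NaNs in the same column compare as equal here)
patients = patients.drop_duplicates().reset_index(drop=True)

# Correct data types: text -> numeric, string -> datetime
patients["age"] = pd.to_numeric(patients["age"], errors="coerce")
patients["admission_date"] = pd.to_datetime(patients["admission_date"])

# Merge on the shared key; a left join keeps every patient, and
# validate="one_to_one" raises an error if either side still has repeated keys
df = patients.merge(labs, on="patient_id", how="left", validate="one_to_one")

# Treat implausible heart rates as missing (assumed plausible range: 20-250 bpm)
df["heart_rate"] = df["heart_rate"].where(df["heart_rate"].between(20, 250))

# Inspect missingness per column, then impute heart rate with its median
print(df.isna().sum())
df["heart_rate"] = df["heart_rate"].fillna(df["heart_rate"].median())
```

Converting implausible values to missing before imputing means they are handled by the same missing-data strategy instead of silently distorting summary statistics such as the median. The missing creatinine value is deliberately left as NaN to show that not every column has to be imputed the same way.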

Data Transformation

  • Encoding categorical variables

  • Feature engineering

  • Normalization and scaling
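
A minimal sketch of the three transformation steps on a hypothetical admissions table follows. The columns, the 0/1 and one-hot encodings via `pd.get_dummies`, and the derived features (length of stay from admission and discharge dates, body-mass index from weight and height) are illustrative choices, not necessarily the approach used in the session.

```python
import pandas as pd

# Hypothetical synthetic admissions table; columns and values are illustrative only
df = pd.DataFrame(
    {
        "sex": ["F", "M", "F", "M"],
        "admission_type": ["emergency", "elective", "emergency", "urgent"],
        "admission_date": pd.to_datetime(["2023-01-05", "2023-01-07", "2023-01-09", "2023-01-10"]),
        "discharge_date": pd.to_datetime(["2023-01-10", "2023-01-08", "2023-01-19", "2023-01-12"]),
        "weight_kg": [70.0, 85.0, 60.0, 95.0],
        "height_m": [1.65, 1.80, 1.55, 1.75],
    }
)

# Encoding: binary variable mapped to 0/1; multi-class variable one-hot encoded
# (get_dummies replaces admission_type with adm_elective, adm_emergency, adm_urgent)
df["sex_male"] = (df["sex"] == "M").astype(int)
df = pd.get_dummies(df, columns=["admission_type"], prefix="adm", dtype=int)

# Feature engineering: length of stay in whole days and body-mass index (kg/m^2)
df["length_of_stay_days"] = (df["discharge_date"] - df["admission_date"]).dt.days
df["bmi"] = df["weight_kg"] / df["height_m"] ** 2

# Scaling: z-score standardization (population std, ddof=0) and min-max normalization
los = df["length_of_stay_days"]
df["los_z"] = (los - los.mean()) / los.std(ddof=0)
df["los_minmax"] = (los - los.min()) / (los.max() - los.min())
```

In a modelling workflow, the mean, standard deviation, minimum and maximum used for scaling should be computed on the training data only and then reused on validation and test data, so that no information from held-out patients leaks into the transformation.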

You can download the slides here.