Data Handling and Preprocessing
This session introduces data preprocessing in Python with the pandas library, tailored to medical research and clinical datasets. Participants work with synthetic patient data covering demographics, admission and discharge information, vital signs, and lab values.
Topics Covered
Introduction to pandas
DataFrames and Series
Basic exploration and manipulation of tabular data
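As a rough illustration of these two topics, the sketch below builds a small DataFrame, pulls out one column as a Series, and runs a few common exploration and manipulation calls. The column names and values are invented for this example and are not taken from the session's dataset.

```python
import pandas as pd

# Made-up patient records for illustration only (not the course dataset).
patients = pd.DataFrame(
    {
        "patient_id": [1, 2, 3, 4],
        "age": [34, 71, 58, 45],
        "sex": ["F", "M", "F", "M"],
        "heart_rate": [72, 88, 95, 64],
    }
)

# A single column of a DataFrame is a Series.
ages = patients["age"]

# Basic exploration: dimensions, column types, summary statistics.
n_rows, n_cols = patients.shape
dtypes = patients.dtypes
summary = patients.describe()

# Basic manipulation: filter rows, select columns, sort.
older = patients[patients["age"] > 50]
older_sorted = older[["patient_id", "age"]].sort_values("age", ascending=False)

print(patients.head())
print(summary)
print(older_sorted)
```

`head()`, `describe()`, and boolean filtering are usually enough for a first look at a new table before any cleaning starts.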
Data Cleaning
Merging patient and lab datasets
Correcting data types
Detecting and handling implausible values
Handling missing values
Removing duplicates
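The cleaning steps listed above might look roughly like the following in pandas. This is a minimal sketch on invented data: the column names, the plausible age range of 0 to 120, and the choice of median imputation are all assumptions made for the example, not the session's prescribed approach.

```python
import numpy as np
import pandas as pd

# Made-up tables for illustration only (not the course dataset).
patients = pd.DataFrame(
    {
        "patient_id": [1, 2, 2, 3],
        "age": ["34", "71", "71", "158"],  # stored as text; 158 is implausible
        "admission_date": ["2023-01-05", "2023-02-10", "2023-02-10", "2023-03-01"],
    }
)
labs = pd.DataFrame(
    {
        "patient_id": [1, 2, 3],
        "creatinine": [0.9, np.nan, 1.4],  # one missing lab value
    }
)

# Removing duplicates: drop rows that are identical in every column.
patients = patients.drop_duplicates()

# Correcting data types: text to numbers and to datetimes.
patients["age"] = pd.to_numeric(patients["age"], errors="coerce")
patients["admission_date"] = pd.to_datetime(patients["admission_date"])

# Detecting implausible values: ages outside an assumed 0-120 range
# are replaced with NaN so they are treated as missing.
patients["age"] = patients["age"].where(patients["age"].between(0, 120))

# Merging patient and lab data; validate checks one row per patient on each side.
merged = patients.merge(labs, on="patient_id", how="left", validate="one_to_one")

# Handling missing values: count them, then impute creatinine with the median
# (an assumed strategy; dropping rows or other imputations may suit better).
missing_counts = merged.isna().sum()
merged["creatinine"] = merged["creatinine"].fillna(merged["creatinine"].median())

print(missing_counts)
print(merged)
```

Whether an implausible or missing value should be dropped, imputed, or flagged depends on the variable and the analysis, so the choices here are placeholders.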
Data Transformation
Encoding categorical variables
Feature engineering
Normalization and scaling
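The three transformation topics can be sketched with pandas alone, as below. The data are invented, and the specific choices (length of stay as the engineered feature, one-hot encoding, min-max and z-score scaling) are assumptions picked as common examples rather than the session's exact content.

```python
import pandas as pd

# Made-up admissions for illustration only (not the course dataset).
df = pd.DataFrame(
    {
        "sex": ["F", "M", "F"],
        "admission_date": pd.to_datetime(["2023-01-05", "2023-02-10", "2023-03-01"]),
        "discharge_date": pd.to_datetime(["2023-01-08", "2023-02-20", "2023-03-02"]),
        "heart_rate": [60.0, 80.0, 100.0],
    }
)

# Feature engineering: derive length of stay (in days) from the two dates.
df["length_of_stay_days"] = (df["discharge_date"] - df["admission_date"]).dt.days

# Normalization and scaling of a vital sign.
hr = df["heart_rate"]
# Min-max scaling maps the values onto the range [0, 1].
df["heart_rate_minmax"] = (hr - hr.min()) / (hr.max() - hr.min())
# Z-score standardization gives mean 0 and unit (population) standard deviation.
df["heart_rate_z"] = (hr - hr.mean()) / hr.std(ddof=0)

# Encoding categorical variables: one-hot encode sex into 0/1 indicator columns.
encoded = pd.get_dummies(df, columns=["sex"], dtype=int)

print(encoded)
```

In a modeling workflow, scaling parameters such as the minimum, maximum, mean, and standard deviation would normally be computed on the training data only and then reused on new data.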
You can download the slides here