Data Handling and Preprocessing
This session provides an introduction to data preprocessing in Python using the pandas library, tailored to medical research and clinical datasets. Participants will work with synthetic patient data that includes demographics, admission and discharge information, vital signs, and lab values.
Learning Objectives
Understand the role of data preprocessing in clinical research and machine learning.
Get familiar with pandas as the core library for handling tabular data in Python.
Practice working with structured medical datasets containing typical data quality issues.
Topics Covered
Introduction to pandas
DataFrames and Series
Basic exploration and manipulation of tabular data
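As a rough illustration of these two topics, the sketch below builds a small DataFrame, pulls out one column as a Series, and runs a few exploration and manipulation steps. The table and its column names (`patient_id`, `age`, `sex`, `heart_rate`) are hypothetical and are not taken from the session's dataset.

```python
import pandas as pd

# Hypothetical synthetic patient records; column names are illustrative only.
patients = pd.DataFrame(
    {
        "patient_id": [1, 2, 3, 4],
        "age": [34, 71, 58, 45],
        "sex": ["F", "M", "F", "M"],
        "heart_rate": [72, 88, 95, 64],
    }
)

# A single column of a DataFrame is a Series.
ages = patients["age"]

# Basic exploration: dimensions, column types, and summary statistics.
n_rows, n_cols = patients.shape
column_types = patients.dtypes
summary = patients.describe()

# Basic manipulation: filter rows with a boolean mask, then sort the result.
over_50 = patients[patients["age"] > 50].sort_values("age", ascending=False)

print(over_50)
```

In a notebook, `patients.head()` and `patients.info()` are common first calls for a quick look at an unfamiliar table.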
Data Cleaning
Merging patient and lab datasets
Correcting data types
Detecting and handling implausible values
Handling missing values
Removing duplicates
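The cleaning steps listed above could look roughly like the following in pandas. This is a minimal sketch on made-up data: the tables, the column names, the plausible heart-rate range of 20 to 250 bpm, and the choice of median imputation are all assumptions made for illustration, not the session's actual rules.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic tables; names and values are illustrative only.
patients = pd.DataFrame(
    {
        "patient_id": [1, 2, 2, 3],
        "admission_date": ["2023-01-05", "2023-02-10", "2023-02-10", "2023-03-01"],
        "heart_rate": ["72", "300", "300", None],
    }
)
labs = pd.DataFrame({"patient_id": [1, 2, 3], "creatinine": [0.9, 1.4, np.nan]})

# Merging: keep every patient row and attach lab values where available.
merged = patients.merge(labs, on="patient_id", how="left", validate="many_to_one")

# Correcting data types: parse dates and convert numeric strings to numbers.
merged["admission_date"] = pd.to_datetime(merged["admission_date"])
merged["heart_rate"] = pd.to_numeric(merged["heart_rate"], errors="coerce")

# Implausible values: mark heart rates outside an assumed range as missing.
plausible = merged["heart_rate"].between(20, 250)
merged.loc[~plausible, "heart_rate"] = np.nan

# Removing duplicates: done before imputing so repeated rows do not skew the median.
merged = merged.drop_duplicates().reset_index(drop=True)

# Missing values: count them per column, then impute one column with its median.
missing_counts = merged.isna().sum()
creatinine_median = merged["creatinine"].median()
merged["creatinine"] = merged["creatinine"].fillna(creatinine_median)

print(missing_counts)
print(merged)
```

Whether to impute, drop, or flag missing and implausible values depends on the clinical question; the median fill here is only one simple option.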
Data Transformation
Encoding categorical variables
Feature engineering
Normalization and scaling
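The three transformation topics can be sketched with pandas alone, as below. The data and column names are hypothetical; length of stay is used as one plausible engineered feature, and one-hot encoding, min-max scaling, and z-score standardization are shown as common choices rather than the specific methods taught in the session.

```python
import pandas as pd

# Hypothetical synthetic admissions; column names are illustrative only.
df = pd.DataFrame(
    {
        "sex": ["F", "M", "F"],
        "admission_date": pd.to_datetime(["2023-01-01", "2023-01-10", "2023-02-01"]),
        "discharge_date": pd.to_datetime(["2023-01-04", "2023-01-12", "2023-02-08"]),
        "systolic_bp": [110.0, 130.0, 150.0],
    }
)

# Feature engineering: derive length of stay in whole days from the two dates.
df["length_of_stay_days"] = (df["discharge_date"] - df["admission_date"]).dt.days

# Encoding categorical variables: one-hot encode "sex" into 0/1 indicator columns.
encoded = pd.get_dummies(df, columns=["sex"], dtype=int)

# Normalization and scaling: min-max to the range [0, 1], and z-score standardization.
bp = encoded["systolic_bp"]
encoded["systolic_bp_minmax"] = (bp - bp.min()) / (bp.max() - bp.min())
encoded["systolic_bp_zscore"] = (bp - bp.mean()) / bp.std()

print(encoded)
```

For a machine learning workflow, scaling parameters such as the minimum, maximum, mean, and standard deviation are normally computed on the training split only and then reused on validation and test data.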
The slides for this session can be downloaded as a PDF [here](Data Handling and Preprocessing.pdf).