Senior Data Engineer

Peter H. Dao

15+ years building scalable data pipelines & HIPAA-compliant platforms · Healthcare & enterprise · Sacramento, CA

Databricks, Apache Spark, Azure Data Lake, Power BI, SSIS / Airflow, Python · SQL · C#, HIPAA · PHI, Microsoft Fabric
Data pipeline (Sutter Health production): Source → Ingest → Transform → Validate → Warehouse → BI / Dash
15+ Years IT / Data
7+ Years Data Engineering
10M Records / Day
50% Latency Reduced
About

Professional Summary

Senior Data Engineer and IT Strategist with over 15 years of full-stack software and analytics experience, including 7 years designing and operating scalable, HIPAA-compliant data platforms at Sutter Health. Proven track record of architecting ETL pipelines, big-data solutions (Spark, Databricks, Microsoft Fabric), and data warehouses for healthcare operations, driving up to 50% reductions in pipeline latency and ensuring 100% PII/PHI regulatory adherence. Recognized mentor and cross-functional leader skilled at translating business needs into high-value BI dashboards and predictive models. My journey in computing spans five decades: from Fortran on a Mitral 15 minicomputer in Paris (1975) to real-time streaming analytics serving clinicians across California today.

Career

Work Experience

Sutter Health
Senior Data Engineer — Cloud Data & Platform Engineering
2019 – Present
Roseville, CA
Current · 6+ yrs
  • Engineered batch and near-real-time ETL pipelines in Databricks & SSIS processing 10M+ records/day; reduced end-to-end latency by 50%.
  • Built a Python-based data-quality framework integrated into orchestration pipelines (Airflow, SQL Agent) for automated schema-drift detection, reducing production errors by 75%.
  • Architected a HIPAA-compliant Azure Blob Storage data lake with automated PII/PHI masking and role-based access controls ensuring 100% regulatory adherence.
  • Delivered Power BI dashboards within 4 hours of data ingestion, enabling data-driven decisions in clinical operations.
  • Authored comprehensive pipeline specifications and data-model diagrams, cutting new-hire onboarding time by 30%.
Sutter Health
Senior Technology Analyst — Web & DBA
2009 – 2019
Roseville, CA
10 yrs
  • Led migration of on-prem SQL Server data warehouse to cloud-based platform, improving query performance by 60%.
  • Developed .NET services and SSIS workflows to ingest and standardize multi-source healthcare data into the enterprise warehouse.
  • Streamlined Excel-based ETL macros and SSRS reports, saving 10+ hours of manual effort weekly.
  • Mentored 10+ junior engineers on SQL performance tuning and data-governance best practices.
Sutter Health
Data Analyst / Programmer
2007 – 2009
2 yrs
  • Created complex Excel macros and MS Access applications to pre-process HR data feeds, improving data-prep speed by 80%.
  • Collaborated with end users to translate business requirements into dynamic reporting tools and dashboards.
Vision Service Plan (VSP) — IT Division
Senior Application Developer
1998 – 2006
Rancho Cordova, CA
8 yrs
  • Managed full SDLC for web-based claims applications: requirements analysis, prototyping, and technical documentation.
  • Implemented an intranet portal reducing manual communication tasks by 70%; trained new developers on .NET best practices.
Ross Stores
Senior Programmer Analyst
1989 – 1997
San Francisco / San Jose, CA
8 yrs
  • Developed and maintained mission-critical business applications in C and Windows programming for a national retail chain.
  • Conducted systems analysis and design, coordinating requirements across Bay Area offices.
Projects

Selected Achievements

Healthcare · Real-Time

Streaming Analytics Pilot

Engineered a real-time telemetry pipeline using Spark Structured Streaming, enabling sub-second latency and empowering clinicians with live insights from 50+ devices.

Spark Streaming, Kafka, Databricks, Azure
HR Data · HIPAA

Enterprise Workday Data Lake

Designed HIPAA-compliant Azure Blob Storage data lake with role-based access and automated PII masking; supported 5M+ records/month with zero compliance breaches.

Azure Blob, Python, PII Masking, RBAC
Performance · ETL

ETL Performance Optimization

Optimized SSIS workflows and Python scripts for parallel execution, significantly reducing compute overhead and improving pipeline throughput across the enterprise DW.

SSIS, Python, SQL Server, Parallelism
Microsoft Fabric · Retail

Fabric Retail Analytics Platform

Built end-to-end retail analytics solution using Bronze/Silver/Gold medallion architecture. Designed star schema and automated pipeline from ingestion to Power BI dashboards.

Microsoft Fabric, PySpark, Power BI, DAX
Data Quality · Automation

Automated Data Quality Framework

Python-based framework integrated into Airflow orchestration pipelines for automated quality checks detecting schema drift; reduced production errors by 75%.

Python, Airflow, Great Expectations
Cloud Migration · DW

On-Prem to Cloud DW Migration

Led migration of on-premises SQL Server data warehouse to cloud-based platform, optimizing architecture to improve query performance by 60% across healthcare operations.

SQL Server, Azure, .NET Services, SSIS
Tech Stack

Skills & Competencies

Languages & Frameworks
Python, SQL, PySpark, C#, Spark, .NET / ADO.NET, DAX
Platforms & Databases
Databricks, Microsoft Fabric, Azure Data Factory, SQL Server, Oracle, MySQL, DB2, Kafka
ETL & Orchestration
SSIS, Airflow, Azure Blob / OneLake, Medallion Architecture, Star Schema, ETL / ELT
BI & Reporting
Power BI, SSRS, Crystal Reports, KPI Dashboards
DevOps & Tools
Git / GitHub, Azure DevOps, JIRA, Linux / Unix, VS Code, Jupyter
Governance & Compliance
HIPAA / PHI Safeguards, PII Masking, RBAC, Data Governance, Data Quality, Schema Drift Detection
Education

Academic Background

M.S.
Information Systems
Golden Gate University
San Francisco, CA
Thesis: PCs in Small Business
B.S.
Business Administration
San Francisco State University
San Francisco, CA
Information Systems concentration
A.A.
Computer Science
City College of San Francisco
San Francisco, CA
Fortran · C · COBOL · Systems Analysis
Certifications

Certifications & Continuous Learning

DP-700: Microsoft Fabric Data Engineer Associate (Preparing)
Microsoft · Microsoft Learn Fabric Engineering Paths
SSIS & T-SQL
LinkedIn Learning
Intro to Data Engineering & Big Data Analytics
Coursera
Big Data Class & Technical Certification
Dezyre · Simplilearn
Web Development Certificate
UC Davis Extension
My Story

Connecting Friends Around the Globe

1975

Began studying Computer Science in Paris, France. First program written in Fortran on a Mitral 15 minicomputer.

1978

Arrived in San Francisco, enrolled at CCSF. First class: Intro to Data Processing on mainframe computers with keypunch card batches. Studied Systems Analysis, Fortran, C, and COBOL.

1983

Earned B.S. in Business Administration (Information Systems concentration) from SF State. Worked as a computer operator on HP 3000 systems, then attended Golden Gate University in the evenings for a master's degree.

1986

Received M.S. in Information Systems from Golden Gate University. Thesis: Personal Computers in Small Business. Spent 14 years building a career across San Francisco and San Jose.

1997

Relocated to Sacramento, California. Consulted in C and Windows programming; earned Web Development certificate at UC Davis Extension.

1998

Joined Vision Service Plan as Senior Application Developer — building web-based claims applications for one of America's largest vision insurers.

2007

Joined Sutter Health as Data Analyst / Programmer. Progressed through Senior Technology Analyst to Senior Data Engineer, building HIPAA-compliant data platforms serving millions of patients.

Now

Senior Data Engineer at Sutter Health · Earned M.S. Data Science from Regis University (May 2022) · Engineering pipelines processing 10M+ records/day · Preparing for Microsoft Fabric DP-700 certification.