Martial D. Bah Bioh

AI Infrastructure Engineer

I specialize in designing and operating high-performance cloud infrastructure for ML/AI workloads at scale. Expert in architecting GPU-accelerated compute environments, Kubernetes-based ML platforms, and MLOps pipelines supporting model training, deployment, and inference.

About Me

Building high-performance AI/ML infrastructure at scale

Professional Summary

AI Infrastructure Engineer with 7+ years of experience designing and operating high-performance cloud infrastructure for ML/AI workloads at scale. Proven expertise in architecting GPU-accelerated compute environments, Kubernetes-based ML platforms, and MLOps CI/CD pipelines supporting model training, deployment, and inference.

Deep technical background in AWS cloud infrastructure, container orchestration (EKS), distributed systems, and Python/Go automation for AI/ML frameworks (PyTorch, TensorFlow). Successfully deployed production AI systems serving real-time inference, RAG pipelines, and LLM integrations.

NVIDIA-certified in Generative AI LLMs with comprehensive knowledge of GPU infrastructure, AI model optimization, and enterprise-scale ML operations. Demonstrated ability to collaborate with data science, engineering, and cross-functional teams.

7+
Years Experience
9
Professional Certifications
99.9%
Deployment Success Rate
50%
Faster ML Deployments

Technical Skills

A comprehensive toolkit of AI/ML infrastructure technologies, frameworks, and tools I use to build production-grade machine learning systems at scale.

AI/ML Infrastructure

🔗

RAG Systems

Retrieval-Augmented Generation pipelines with pgvector

🤖

LLM Integration

GPT-4, Claude, Llama model serving & integration

⚡

GPU Infrastructure

GPU-accelerated compute environments & scheduling

🔄

MLOps

Model training, deployment, and inference pipelines

🧪

AWS SageMaker

ML model training and deployment at scale

ML Frameworks & Tools

🔥

PyTorch

Deep learning research and production models

🧠

TensorFlow

Machine learning and neural networks

🔗

LangChain

LLM application development framework

🤗

Hugging Face

Transformers and model hub integration

🔍

pgvector

Vector similarity search for AI applications

Cloud & Infrastructure

โ˜๏ธ

AWS

EC2, EKS, S3, Lambda, SageMaker, ECS, RDS

⚙️

Kubernetes

EKS, GPU scheduling, Helm, RBAC, CRDs

๐Ÿ—๏ธ

Terraform

Infrastructure as Code for multi-cloud

๐Ÿณ

Docker

Container optimization & multi-stage builds

DevOps & MLOps

🚀

CI/CD

GitHub Actions, Jenkins, GitLab CI, ArgoCD

📊

Prometheus

AI/ML metrics, GPU monitoring, alerting

📈

Grafana

ML dashboards and observability

🔧

Ansible

Configuration management and automation

Programming & Frameworks

🐍

Python

FastAPI, Flask, Pandas, NumPy, async

💨

Go

High-performance CLI tools & microservices

💻

Bash

Infrastructure automation scripting

Data & Streaming

📡

Kinesis

Real-time data streaming at scale

📨

Kafka

Event-driven architectures

🗄️

PostgreSQL

RDS, Aurora, query optimization

⚡

Redis

Caching and session management

Interested in learning more about my technical expertise?

Professional Experience

A track record of building AI/ML infrastructure, deploying production AI systems, and leading technical teams across healthcare, biotech, and consumer technology companies.

🤖

Founding AI Infrastructure Engineer

Stealth Startup
Healthcare AI
2025 - Present
Remote

Key Achievements

  • Architected and deployed end-to-end AI/ML infrastructure on AWS supporting real-time LLM inference, RAG pipelines, and production AI workloads with EKS, RDS PostgreSQL, and S3
  • Designed production-grade Kubernetes (EKS) cluster optimized for AI/ML workloads with GPU-aware scheduling, resource quotas, and horizontal pod autoscaling for inference services
  • Built scalable Retrieval-Augmented Generation (RAG) pipeline using pgvector on PostgreSQL for vector similarity search with sub-100ms query latency at scale
  • Engineered Python-based AI microservices integrating LLMs (GPT-4, Claude) using FastAPI with auto-scaling based on latency and throughput metrics
  • Established MLOps CI/CD pipeline using GitHub Actions with 99.9% deployment success rate, automating Docker builds, security scanning, and canary deployments
  • Configured observability stack with Prometheus and Grafana monitoring AI/ML metrics including inference latency, token throughput, and GPU utilization
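
The pgvector-backed retrieval step above can be sketched in miniature. This is an illustrative sketch only: the table and column names (`documents`, `embedding`) and the parameter style are assumptions, not details of the actual system. pgvector's `<=>` operator computes cosine distance, which the helper below reproduces locally.

```python
# Illustrative sketch of a pgvector top-k retrieval step in a RAG pipeline.
# Table/column names ("documents", "embedding") are hypothetical examples.

def topk_query(table: str, k: int) -> str:
    """Build a pgvector nearest-neighbour SQL query using the cosine
    distance operator (<=>); the query embedding is a bound parameter."""
    return (
        f"SELECT id, content, embedding <=> %(query_vec)s AS distance "
        f"FROM {table} ORDER BY distance LIMIT {k}"
    )

def cosine_distance(a, b):
    """The metric pgvector's <=> operator computes: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return 1.0 - dot / (norm_a * norm_b)

if __name__ == "__main__":
    print(topk_query("documents", 5))
    # Identical vectors have distance 0; orthogonal vectors have distance 1.
    print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # 0.0
    print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0
```

In production the same query runs against an indexed `vector` column (e.g. HNSW or IVFFlat), which is what keeps top-k lookups in the sub-100ms range at scale.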

Technologies & Tools

AWS EKS · pgvector · FastAPI · GPT-4 · Claude · RAG · Prometheus · Grafana · Python · Terraform · Kubernetes · GitHub Actions
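
The canary step in the CI/CD pipeline above can be sketched as a simple staged traffic ramp. The stage percentages and the 1% error-rate budget below are illustrative assumptions, not the actual pipeline's values.

```python
# Hypothetical sketch of canary promotion logic: shift traffic to the new
# version in stages, rolling back if the observed error rate breaches a budget.

def run_canary(error_rate_at, stages=(5, 25, 50, 100), max_error_rate=0.01):
    """error_rate_at(pct) returns the canary's observed error rate at a
    given traffic percentage. Returns ("promoted", 100) if every stage
    stays under budget, or ("rolled_back", pct) at the first breach."""
    for pct in stages:
        if error_rate_at(pct) > max_error_rate:
            return ("rolled_back", pct)
    return ("promoted", 100)

if __name__ == "__main__":
    healthy = lambda pct: 0.001                      # well under budget
    flaky = lambda pct: 0.05 if pct >= 50 else 0.0   # degrades under load
    print(run_canary(healthy))  # ('promoted', 100)
    print(run_canary(flaky))    # ('rolled_back', 50)
```

In practice the error rate would come from the Prometheus metrics mentioned above rather than a callback, but the promote-or-rollback decision has the same shape.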

Career Timeline

Stealth Startup
Founding AI Infrastructure Engineer
2025 - Present
GRAIL
Senior DevOps Engineer - Infrastructure & ML Platform
2022 - 2024
Invitae
Software Engineer - Cloud Infrastructure
2020 - 2022
Ancestry.com
Senior Software Development Engineer in Test
2018 - 2020

Technical Vision

Proposed reference architectures for production ML/AI infrastructure, demonstrating how I approach platform design across different environments and constraints.

Proposed Architecture

On-Prem Autonomous Robotics MLOps Platform

A proposed production-grade MLOps platform for autonomous robotics workloads on bare-metal GPU clusters. Leverages SchedMD Slinky to unify Kubernetes and Slurm scheduling on a shared GPU pool, eliminating resource silos and enabling seamless orchestration of training, simulation, and inference workloads.
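
The core scheduling decision in this design can be sketched as a toy routing function: interactive and serving workloads stay with the Kubernetes default scheduler, while batch and HPC-style workloads are bridged to Slurm. The workload categories mirror the layer diagram, but the function itself is illustrative, not Slinky's actual API.

```python
# Illustrative routing of workloads on a Slinky-style unified GPU cluster.
# Category names below are examples chosen to match the architecture layers.

K8S_WORKLOADS = {"notebook", "inference", "airflow-worker", "web-app"}
SLURM_WORKLOADS = {"training", "mpi", "hpo-sweep", "simulation", "batch"}

def route(workload: str) -> str:
    """Decide which scheduler owns a workload on the shared GPU pool."""
    if workload in K8S_WORKLOADS:
        return "k8s-default-scheduler"
    if workload in SLURM_WORKLOADS:
        return "slurm-via-slinky"
    raise ValueError(f"unknown workload type: {workload}")

if __name__ == "__main__":
    print(route("inference"))  # k8s-default-scheduler
    print(route("training"))   # slurm-via-slinky
```

The point of the design is that both answers resolve to the same physical nodes; only the scheduling authority differs, so GPUs idle in one queue can absorb work from the other.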

Kubernetes · Slurm · Slinky · MLflow · KServe · Snowflake · S3 · Airflow · GPU Clusters · On-Prem
Layer 1 – Data Foundation
  • ❄️ Snowflake · Data warehouse
  • 🪣 S3 · Object storage
  • 🔄 Airflow · Orchestration
  • ⚡ Great Expectations · Data quality
  • 💾 Shared Storage · NFS / Lustre / Ceph
Layer 2 – Unified Compute (Slinky)
Kubernetes + Slurm (Unified)
  • Kubernetes path (K8s default scheduler): standard K8s apps, Jupyter notebooks, KServe inference, Airflow workers
  • Slinky bridge (Slurm scheduler plugin): training jobs, batch processing
  • Slurm path (Slurm queue): Slurm jobs, MPI workloads, HPO sweeps, simulations
  • 🖥️ GPU nodes · each runs as a K8s + Slurm hybrid
Shared GPU Pool – no separate clusters: the same nodes serve both Kubernetes and Slurm workloads, with dynamic resource sharing managed by Slinky.
Layer 3 – ML Lifecycle
  • MLflow · Experiments & registry
  • 🎯 KServe · Model serving
  • 🚜 Edge Fleet · Autonomous vehicles

30-60-90 Day Onboarding Plan

A structured approach to ramping up as an AI Infrastructure Engineer, from discovery through delivery and strategic roadmap.

Days 1–30

Learn & Assess

Focus on understanding the current infrastructure, ML workflows, team dynamics, and organizational priorities. Identify quick wins while building institutional knowledge.

Infrastructure Discovery

Map existing compute resources (GPU clusters, on-prem servers, cloud accounts)
Audit current CI/CD pipelines, deployment processes, and monitoring stack
Review infrastructure-as-code repos, runbooks, and incident history
Document network topology, storage architecture, and security posture

ML Workflow Analysis

Shadow data scientists and ML engineers through their daily workflows
Identify bottlenecks in model training, experiment tracking, and deployment
Assess data pipeline reliability, latency, and quality validation practices
Understand model serving requirements (latency SLAs, throughput, edge deployment)

Team & Process Alignment

1:1s with every stakeholder: engineering, data science, product, leadership
Understand team pain points, on-call burden, and technical debt priorities
Review existing OKRs and roadmap to align infrastructure priorities
Identify quick wins that build trust and deliver immediate value

Key Deliverables

Infrastructure audit document with current-state architecture diagram
Gap analysis identifying top 5 bottlenecks and risks
Quick-win action plan (2–3 items deliverable within weeks)
Stakeholder alignment summary with prioritized needs

This plan is adaptable based on organizational maturity, team size, and immediate priorities. The core philosophy: listen first, deliver quick wins, then architect the future.

Professional Certifications

Validated expertise across AI/ML, cloud platforms, infrastructure as code, and container orchestration. Click any badge to verify on Credly.

Get In Touch

Ready to discuss your next AI infrastructure project or explore opportunities to work together? I'd love to hear from you.

Let's Connect

Whether you're looking for an AI infrastructure engineer to join your team, need consultation on AWS infrastructure, or want to collaborate on AI/ML projects, I'm always open to discussing new opportunities and challenges.

Response Time
Usually within 24 hours

© 2026 Martial D. Bah Bioh. All rights reserved.