Martial D. Bah Bioh

AI Infrastructure Engineer

I specialize in designing and operating high-performance cloud infrastructure for ML/AI workloads at scale. Expert in architecting GPU-accelerated compute environments, Kubernetes-based ML platforms, and MLOps pipelines supporting model training, deployment, and inference.

About Me

Building high-performance AI/ML infrastructure at scale

Professional Summary

AI Infrastructure Engineer with 7+ years of experience designing and operating high-performance cloud infrastructure for ML/AI workloads at scale. Proven expertise in architecting GPU-accelerated compute environments, Kubernetes-based ML platforms, and MLOps CI/CD pipelines supporting model training, deployment, and inference.

Deep technical background in AWS cloud infrastructure, container orchestration (EKS), distributed systems, and Python/Go automation for AI/ML frameworks (PyTorch, TensorFlow). Successfully deployed production AI systems serving real-time inference, RAG pipelines, and LLM integrations.

NVIDIA-certified in Generative AI LLMs with comprehensive knowledge of GPU infrastructure, AI model optimization, and enterprise-scale ML operations. Demonstrated ability to collaborate with data science, engineering, and cross-functional teams.

7+
Years Experience
9
Professional Certifications
99.9%
Deployment Success Rate
50%
Faster ML Deployments

Technical Skills

A comprehensive toolkit of AI/ML infrastructure technologies, frameworks, and tools I use to build production-grade machine learning systems at scale.

AI/ML Infrastructure

🔗

RAG Systems

Retrieval-Augmented Generation pipelines with pgvector

🤖

LLM Integration

GPT-4, Claude, Llama model serving & integration

⚡

GPU Infrastructure

GPU-accelerated compute environments & scheduling

🔄

MLOps

Model training, deployment, and inference pipelines

🧪

AWS SageMaker

ML model training and deployment at scale

ML Frameworks & Tools

🔥

PyTorch

Deep learning research and production models

🧠

TensorFlow

Machine learning and neural networks

🔗

LangChain

LLM application development framework

🤗

Hugging Face

Transformers and model hub integration

🔍

pgvector

Vector similarity search for AI applications

Cloud & Infrastructure

โ˜๏ธ

AWS

EC2, EKS, S3, Lambda, SageMaker, ECS, RDS

⚙️

Kubernetes

EKS, GPU scheduling, Helm, RBAC, CRDs

๐Ÿ—๏ธ

Terraform

Infrastructure as Code for multi-cloud

๐Ÿณ

Docker

Container optimization & multi-stage builds

DevOps & MLOps

🚀

CI/CD

GitHub Actions, Jenkins, GitLab CI, ArgoCD

📊

Prometheus

AI/ML metrics, GPU monitoring, alerting

📈

Grafana

ML dashboards and observability

🔧

Ansible

Configuration management and automation

Programming & Frameworks

🐍

Python

FastAPI, Flask, Pandas, NumPy, async

💨

Go

High-performance CLI tools & microservices

💻

Bash

Infrastructure automation scripting

Data & Streaming

📡

Kinesis

Real-time data streaming at scale

📨

Kafka

Event-driven architectures

🗄️

PostgreSQL

RDS, Aurora, query optimization

⚡

Redis

Caching and session management

Interested in learning more about my technical expertise?

Professional Experience

A track record of building AI/ML infrastructure, deploying production AI systems, and leading technical teams across healthcare, biotech, and consumer technology companies.

🤖

Founding AI Infrastructure Engineer

Stealth Startup
Healthcare AI
2025 - Present
Remote

Key Achievements

  • Architected and deployed end-to-end AI/ML infrastructure on AWS supporting real-time LLM inference, RAG pipelines, and production AI workloads with EKS, RDS PostgreSQL, and S3
  • Designed production-grade Kubernetes (EKS) cluster optimized for AI/ML workloads with GPU-aware scheduling, resource quotas, and horizontal pod autoscaling for inference services
  • Built scalable Retrieval-Augmented Generation (RAG) pipeline using pgvector on PostgreSQL for vector similarity search with sub-100ms query latency at scale
  • Engineered Python-based AI microservices integrating LLMs (GPT-4, Claude) using FastAPI with auto-scaling based on latency and throughput metrics
  • Established MLOps CI/CD pipeline using GitHub Actions with 99.9% deployment success rate, automating Docker builds, security scanning, and canary deployments
  • Configured observability stack with Prometheus and Grafana monitoring AI/ML metrics including inference latency, token throughput, and GPU utilization
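
The pgvector-backed retrieval step above can be sketched in miniature. This is an illustrative sketch only: the table and column names (`documents`, `embedding`) and the parameter style are assumptions, not details of the actual system. pgvector's `<=>` operator computes cosine distance, which the helper below reproduces locally.

```python
# Illustrative sketch of a pgvector top-k retrieval step in a RAG pipeline.
# Table/column names ("documents", "embedding") are hypothetical examples.

def topk_query(table: str, k: int) -> str:
    """Build a pgvector nearest-neighbour SQL query using the cosine
    distance operator (<=>); the query embedding is a bound parameter."""
    return (
        f"SELECT id, content, embedding <=> %(query_vec)s AS distance "
        f"FROM {table} ORDER BY distance LIMIT {k}"
    )

def cosine_distance(a, b):
    """The metric pgvector's <=> operator computes: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return 1.0 - dot / (norm_a * norm_b)

if __name__ == "__main__":
    print(topk_query("documents", 5))
    # Identical vectors have distance 0; orthogonal vectors have distance 1.
    print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # 0.0
    print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0
```

In production the same query runs against an indexed `vector` column (e.g. HNSW or IVFFlat), which is what keeps top-k lookups in the sub-100ms range at scale.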

Technologies & Tools

AWS EKS · pgvector · FastAPI · GPT-4 · Claude · RAG · Prometheus · Grafana · Python · Terraform · Kubernetes · GitHub Actions
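
The canary step in the CI/CD pipeline above can be sketched as a simple staged traffic ramp. The stage percentages and the 1% error-rate budget below are illustrative assumptions, not the actual pipeline's values.

```python
# Hypothetical sketch of canary promotion logic: shift traffic to the new
# version in stages, rolling back if the observed error rate breaches a budget.

def run_canary(error_rate_at, stages=(5, 25, 50, 100), max_error_rate=0.01):
    """error_rate_at(pct) returns the canary's observed error rate at a
    given traffic percentage. Returns ("promoted", 100) if every stage
    stays under budget, or ("rolled_back", pct) at the first breach."""
    for pct in stages:
        if error_rate_at(pct) > max_error_rate:
            return ("rolled_back", pct)
    return ("promoted", 100)

if __name__ == "__main__":
    healthy = lambda pct: 0.001                      # well under budget
    flaky = lambda pct: 0.05 if pct >= 50 else 0.0   # degrades under load
    print(run_canary(healthy))  # ('promoted', 100)
    print(run_canary(flaky))    # ('rolled_back', 50)
```

In practice the error rate would come from the Prometheus metrics mentioned above rather than a callback, but the promote-or-rollback decision has the same shape.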

Career Timeline

Stealth Startup
Founding AI Infrastructure Engineer
2025 - Present
GRAIL
Senior DevOps Engineer - Infrastructure & ML Platform
2022 - 2024
Invitae
Software Engineer - Cloud Infrastructure
2020 - 2022
Ancestry.com
Senior Software Development Engineer in Test
2018 - 2020

Technical Vision

Proposed reference architectures for production ML/AI infrastructure, demonstrating how I approach platform design across different environments and constraints.

Proposed Architecture

On-Prem Autonomous Robotics MLOps Platform

A proposed production-grade MLOps platform for autonomous robotics workloads on bare-metal GPU clusters. Leverages SchedMD Slinky to unify Kubernetes and Slurm scheduling on a shared GPU pool, eliminating resource silos and enabling seamless orchestration of training, simulation, and inference workloads.
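
The core scheduling decision in this design can be sketched as a toy routing function: interactive and serving workloads stay with the Kubernetes default scheduler, while batch and HPC-style workloads are bridged to Slurm. The workload categories mirror the layer diagram, but the function itself is illustrative, not Slinky's actual API.

```python
# Illustrative routing of workloads on a Slinky-style unified GPU cluster.
# Category names below are examples chosen to match the architecture layers.

K8S_WORKLOADS = {"notebook", "inference", "airflow-worker", "web-app"}
SLURM_WORKLOADS = {"training", "mpi", "hpo-sweep", "simulation", "batch"}

def route(workload: str) -> str:
    """Decide which scheduler owns a workload on the shared GPU pool."""
    if workload in K8S_WORKLOADS:
        return "k8s-default-scheduler"
    if workload in SLURM_WORKLOADS:
        return "slurm-via-slinky"
    raise ValueError(f"unknown workload type: {workload}")

if __name__ == "__main__":
    print(route("inference"))  # k8s-default-scheduler
    print(route("training"))   # slurm-via-slinky
```

The point of the design is that both answers resolve to the same physical nodes; only the scheduling authority differs, so GPUs idle in one queue can absorb work from the other.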

Kubernetes · Slurm · Slinky · MLflow · KServe · Snowflake · S3 · Airflow · GPU Clusters · On-Prem
Layer 1 – Data Foundation
  • ❄️ Snowflake · Data warehouse
  • 🪣 S3 · Object storage
  • 🔄 Airflow · Orchestration
  • ⚡ Great Expectations · Data quality
  • 💾 Shared Storage · NFS / Lustre / Ceph
Layer 2 – Unified Compute (Slinky)
Kubernetes + Slurm (Unified)
  • Kubernetes path (K8s default scheduler): standard K8s apps, Jupyter notebooks, KServe inference, Airflow workers
  • Slinky bridge (Slurm scheduler plugin): training jobs, batch processing
  • Slurm path (Slurm queue): Slurm jobs, MPI workloads, HPO sweeps, simulations
  • 🖥️ GPU nodes · each runs as a K8s + Slurm hybrid
Shared GPU Pool – no separate clusters: the same nodes serve both Kubernetes and Slurm workloads, with dynamic resource sharing managed by Slinky.
Layer 3 – ML Lifecycle
  • MLflow · Experiments & registry
  • 🎯 KServe · Model serving
  • 🚜 Edge Fleet · Autonomous vehicles

30-60-90 Day Onboarding Plan

A structured approach to ramping up as an AI Infrastructure Engineer, from discovery through delivery and strategic roadmap.

Days 1–30

Learn & Assess

Focus on understanding the current infrastructure, ML workflows, team dynamics, and organizational priorities. Identify quick wins while building institutional knowledge.

Infrastructure Discovery

Map existing compute resources (GPU clusters, on-prem servers, cloud accounts)
Audit current CI/CD pipelines, deployment processes, and monitoring stack
Review infrastructure-as-code repos, runbooks, and incident history
Document network topology, storage architecture, and security posture

ML Workflow Analysis

Shadow data scientists and ML engineers through their daily workflows
Identify bottlenecks in model training, experiment tracking, and deployment
Assess data pipeline reliability, latency, and quality validation practices
Understand model serving requirements (latency SLAs, throughput, edge deployment)

Team & Process Alignment

1:1s with every stakeholder: engineering, data science, product, leadership
Understand team pain points, on-call burden, and technical debt priorities
Review existing OKRs and roadmap to align infrastructure priorities
Identify quick wins that build trust and deliver immediate value

Key Deliverables

Infrastructure audit document with current-state architecture diagram
Gap analysis identifying top 5 bottlenecks and risks
Quick-win action plan (2–3 items deliverable within weeks)
Stakeholder alignment summary with prioritized needs

This plan is adaptable based on organizational maturity, team size, and immediate priorities. The core philosophy: listen first, deliver quick wins, then architect the future.

Professional Certifications

Validated expertise across AI/ML, cloud platforms, infrastructure as code, and container orchestration. Click any badge to verify on Credly.

Get In Touch

Ready to discuss your next AI infrastructure project or explore opportunities to work together? I'd love to hear from you.

Let's Connect

Whether you're looking for an AI infrastructure engineer to join your team, need consultation on AWS infrastructure, or want to collaborate on AI/ML projects, I'm always open to discussing new opportunities and challenges.

Response Time
Usually within 24 hours

© 2026 Martial D. Bah Bioh. All rights reserved.