Hoang-Long Nguyen, Site Reliability Engineer.
SRE focused on Kubernetes, GCP, observability, incident response, and the platform work that keeps production systems boring.
Summary
SRE with a background in both backend and frontend development. Currently keeping things running at One Mount. Comfortable across Kubernetes clusters, GCP infrastructure, and the observability stacks that tell you when things break before customers do.
Experience
Site Reliability Engineer
- Operate Kubernetes platforms on GCP for high-volume fintech workloads, including dedicated PCI-scoped GKE clusters and multi-zone (3-zone HA) EMQX clusters powering IoT/MQTT services for payment-acceptance devices in the field. Handle the GKE lifecycle end-to-end: cluster provisioning, node-pool upgrades (1.30 → 1.32), CA and credential rotation, namespace separation, ingress configuration (Kong, Nginx, PoC on Envoy), and resource right-sizing through HPA tuning and standardized request/limit templates.
- Operate the database tier: Cloud SQL (MySQL 5.7 → 8.0 upgrade testing with replica clones and restore drills), TiDB clusters with full TiUP/PD/TiKV/TiFlash management, PostgreSQL 17 clusters, Memorystore/Redis, and Hazelcast, including disaster-recovery rehearsals to validate RPO/RTO. Contribute to platform-wide migrations: GCR → Artifact Registry across 20+ projects, ArgoCD GitOps rollout from non-prod to production, and unified CI templates across many service repos.
- Build internal platform tooling — a GitOps file-snippet generator, a GKE node scheduler for cost-optimized off-hours scaling, GCP billing anomaly alerts and forecast reports, IAM/service-account credential expiry notifications, and BigQuery-to-Slack notifiers. Manage Kong API Gateway mappings and plugins (verify-token, CORS), Terraform/Atlantis pipelines for New Relic configuration, and Cloud KMS for service-level encryption.
- Maintain observability across OpenTelemetry (Operator-based collection on dedicated node pools), New Relic (agent upgrades, dashboards, alerting), and Splunk (PCI log-index separation, retention). Lead production go-live support, incident response, vulnerability remediation, EDR/security-agent rollouts, and PCI DSS audit evidence workflows.
Back End Developer
- Built backend services in Go for a distributed multi-chain crypto wallet — implemented REST and GraphQL APIs, built out features across microservices, and tuned hot paths against MySQL/PostgreSQL and MongoDB stores. Developed the internal ledger layer covering balances, double-entry bookkeeping, and reconciliation, and contributed to multi-chain support across the wallet's supported networks.
- Built the CMS powering the wallet's content and merchandising surfaces — admin tooling, configuration, and editorial flows behind the user-facing product.
Front End Developer
- Developed Adobe Magento e-commerce storefronts on the PWA Studio toolkit (ReactJS + GraphQL) across multiple merchant projects.
- Built storefront UI components, customized PWA Studio modules through targets, interceptors, and upward.yml configuration, and tuned Lighthouse/runtime performance.
- Contributed to reusable extensions and themes shipped to merchants, collaborating with the backend Magento team on schema and integration boundaries.
Education
Bachelor's degree, Information Technology
JavaScript Full-stack Web Application Developer
Certifications
AWS Certified DevOps Engineer — Professional
Go Programming Bootcamp
Skills
Kubernetes
Google Cloud Platform
Amazon Web Services
Terraform
GitLab
ArgoCD
Kong Gateway
OpenTelemetry
Splunk
New Relic
Cloudflare
EMQX
Languages
English
French
Vietnamese
Contact