Senior HPC Network Engineer (Colombia) (Medellín)
Medellín, Colombia
Posted: less than a week ago
Description
We are seeking a Senior HPC Network Engineer to support advanced AI, research, and Kubernetes‑based GPU infrastructure for a major general-technology client. The role focuses on architecting, operating, and optimizing high‑performance network fabrics for large‑scale LLM and distributed AI workloads. It covers InfiniBand/RDMA, high‑speed Ethernet, Kubernetes networking, host‑side GPU networking, SmartNIC/DPU technologies, and deep network observability.
Responsibilities
- Architect, operate, and troubleshoot high‑performance InfiniBand/RDMA and Ethernet fabrics for large‑scale GPU clusters and distributed AI/LLM workloads.
- Design and evaluate cluster network topologies, including Fat‑tree, Clos, Rail‑optimized, and Dragonfly, based on workload scale and performance needs.
- Optimize host‑side networking, including NIC configuration, drivers, firmware, IRQ affinity, NUMA placement, PCIe topology, and GPU‑to‑NIC communication paths.
- Tune and troubleshoot RDMA/RoCE, NCCL/MSCCL, and collective communication performance for multi‑node GPU training workloads.
- Design and maintain Kubernetes networking for GPU clusters, including CNI plugin integration.
Qualifications
- Strong hands‑on experience with InfiniBand NDR/HDR and next‑generation fabrics, RDMA/RoCE, and NVIDIA/Mellanox networking.
- Proficiency with NCCL/MSCCL communication patterns, Linux host networking, PCIe/GPU/NIC topology, and Kubernetes networking for GPU clusters.
- Experience designing and evaluating large‑scale cluster topologies and optimizing network performance for AI workloads.
Apply on Kit Empleo: kitempleo.com.co/empleo/1b3uxb
Key information
Company name: Major company
Job title: Senior HPC Network Engineer (Colombia) (Medellín)
More about this listing
The listing "Senior HPC Network Engineer (Colombia) (Medellín)" was posted in Locanto's Medellín IT (Informática) category.