Distributed Systems / GPU Infrastructure Engineer
Posted by Capa Cloud
CapaCloud
https://hasjob.co/capa.cloud/ma5qk
Anywhere · capa.cloud · Full-time employment · Programming
We are looking for a Distributed Systems / GPU Infrastructure Engineer to help architect and scale the core infrastructure behind the CapaCloud decentralized GPU network.
You will work on GPU orchestration, node infrastructure, distributed computing systems, workload scheduling, performance optimization, and platform reliability.
This is a high-impact engineering role for someone passionate about building the next generation of decentralized AI infrastructure.
Key Responsibilities
- Design and build scalable distributed GPU infrastructure
- Develop systems for node orchestration and workload scheduling
- Optimize GPU utilization and compute performance
- Build fault-tolerant infrastructure for decentralized environments
- Improve network reliability, scalability, and uptime
- Develop deployment automation and infrastructure tooling
- Work with AI and blockchain teams to integrate compute systems
- Monitor infrastructure performance and troubleshoot bottlenecks
- Contribute to backend architecture and cloud-native systems
- Implement secure infrastructure best practices
Required Skills & Experience
- Strong experience with distributed systems and backend infrastructure
- Experience with Kubernetes, Docker, and container orchestration
- Strong Linux systems administration knowledge
- Experience with GPU infrastructure and CUDA environments
- Proficiency in Go, Rust, Python, or similar backend languages
- Experience with cloud infrastructure platforms
- Understanding of networking, virtualization, and load balancing
- Experience building scalable APIs and infrastructure services
- Familiarity with monitoring tools and observability stacks
- Strong debugging and performance optimization skills
Nice To Have
- Experience in decentralized infrastructure or Web3
- Experience with AI/ML infrastructure
- Bare-metal infrastructure experience
- Experience with distributed storage systems
- Knowledge of peer-to-peer networking systems
- Open-source contributions
What Success Looks Like
- A reliable decentralized GPU orchestration system
- High-performance compute scheduling infrastructure
- Reduced latency and improved GPU efficiency
- Stable infrastructure scaling across multiple regions
- Strong uptime and system reliability metrics
Employment Type
- Full-time
- Remote
It is OK for recruiters, HR consultants, and other intermediaries to contact this employer