Job Description
- Build and manage a virtual AI/HPC cluster for development and testing.
- Configure and optimize networking between virtual nodes to emulate real-world HPC environments.
- Collaborate with teams to ensure the virtual cluster meets testing and performance requirements.
- Troubleshoot and resolve networking and system-level issues within the virtual environment.
- Document system configurations, workflows, and best practices for maintaining the virtual cluster.