About the Role:
We are seeking a Linux Platform Engineer to join our infrastructure team, which manages and supports a fleet of more than 1,000 high-performance Linux workstations used by engineering, machine learning, and research teams.
This is a hands-on technical role focused on maintaining the stability, security, and performance of Linux endpoints. The successful candidate will work closely with the Lead Linux Platform Engineer to support daily platform operations, automate repetitive tasks, improve system reliability, and provide direct support to end users.
This position offers an excellent opportunity for a Linux professional looking to expand their platform engineering expertise while working within a structured environment that emphasizes automation, security, and operational excellence.
Key Responsibilities:
Linux Fleet Operations
- Manage and maintain a large-scale Linux workstation fleet.
- Carry out patching, upgrade, and update cycles.
- Monitor and remediate configuration drift.
- Maintain accurate hardware and software inventory records.
- Respond to operational incidents and system issues.
Workstation Provisioning & Deployment
- Provision and deploy Linux workstations using PXE/iPXE-based deployment pipelines.
- Troubleshoot failed installations and deployment issues.
- Prepare systems for end-user handoff and onboarding.
Configuration Management
- Develop, maintain, and enhance configuration management code using Puppet, OpenVox, or equivalent tools.
- Follow established coding standards and review processes.
- Ensure configuration changes are tested, documented, and maintainable.
Hardware & Platform Support
- Troubleshoot workstation hardware and peripheral issues.
- Coordinate vendor support, warranty claims, and RMA processes.
- Perform firmware and BIOS upgrades using approved tooling.
- Validate system functionality following hardware changes.
Security & Compliance
- Maintain endpoint security configurations and compliance baselines.
- Support 802.1X authentication and certificate enrollment processes.
- Ensure audit logging and security controls remain operational.
- Identify and escalate security concerns or anomalies.
User Support & Documentation
- Provide technical support to researchers, engineers, and staff.
- Create and maintain technical documentation, runbooks, and knowledge articles.
- Promote knowledge sharing and operational consistency across the team.
Growth & Development Opportunities
The successful candidate will have opportunities to expand into:
- Advanced 802.1X and Network Access Control (NAC) technologies.
- Linux hardware validation and compatibility testing.
- Provisioning platform design and automation improvements.
- GPU, CUDA, ROCm, and HPC performance optimization.
- Platform architecture and infrastructure engineering initiatives.
Required Qualifications:
- 2–4 years of experience in Linux administration, infrastructure operations, systems engineering, or a related technical role.
- Strong hands-on experience with Linux operating systems.
Technical Skills:
- Proficiency with:
  - Linux command line
  - systemd
  - Package management
  - Basic networking concepts
  - Storage administration
- Experience scripting in Bash and/or Python.
- Familiarity with Git and version control workflows.
- Exposure to configuration management platforms such as:
  - Puppet
  - Ansible
  - Salt
  - Chef
Soft Skills:
- Strong analytical and troubleshooting abilities.
- Excellent written and verbal communication skills.
- Ability to work effectively with both technical and non-technical stakeholders.
- Detail-oriented, organized, and dependable.
Preferred Qualifications:
- Production experience with Puppet or OpenVox.
- Experience with:
  - PXE/iPXE provisioning
  - DHCP and TFTP
  - Kickstart, Preseed, or Autoinstall deployment methods
- Knowledge of Ubuntu and Rocky Linux administration.
- Familiarity with:
  - 802.1X
  - RADIUS
  - Network Access Control solutions
- Experience supporting hardware platforms from Dell, Lenovo, HP, Supermicro, or similar vendors.
- Exposure to GPU environments including NVIDIA CUDA or AMD ROCm.
- Knowledge of:
  - HashiCorp Vault
  - SELinux/AppArmor
  - CIS Benchmarks
  - auditd
- Experience with observability and monitoring tools:
  - Prometheus
  - Grafana
  - Loki
  - Elastic Stack (ELK)
- Experience with inventory and asset management platforms such as NetBox.
- Previous experience supporting research, academic, or scientific computing environments.