AI Data Center Operations Contractor

SK hynix America

Contract

On-site

San Jose, California, United States

$110,000 - $160,000 USD yearly

Job Title: AI Data Center Operations Contractor
Office Location: San Jose, CA

Job Type: Contract (6-12 months)
Work Model: Onsite

About SK hynix America
At SK hynix America, we're at the forefront of semiconductor innovation, developing advanced memory solutions that power everything from smartphones to data centers. As a global leader in DRAM and NAND flash technologies, we advance mobile technology, empower cloud computing, and pioneer future technologies. Our cutting-edge memory technologies are essential in today's most advanced electronic devices and IT infrastructure, enabling enhanced performance and user experiences across the digital landscape.
We're looking for innovative minds to join our mission of shaping the future of technology. At SK hynix America, you'll be part of a team that's pioneering breakthrough memory solutions while maintaining a strong commitment to sustainability. We're not just adapting to technological change; we're driving it, with significant investments in artificial intelligence, machine learning, and eco-friendly solutions and operational practices. As we continue to expand our market presence and push the boundaries of what's possible in semiconductor technology, we invite you to join us in creating the next generation of memory solutions that will define the future of computing.

Position Overview:

SK hynix is seeking a seasoned AI Data Center (DC) Operations Contractor to join our DC facilities management team for a 6-12 month engagement. This specialized role demands extensive operational experience managing high-performance computing environments that support AI/ML workloads. The ideal candidate brings knowledge of running AI data centers at scale, with demonstrated expertise in power management, thermal control, preventive maintenance protocols, and incident response procedures specific to GPU-intensive infrastructure.

Responsibilities:

Optimize AI Infrastructure Performance: Apply your operational expertise to enhance the performance, reliability, and efficiency of our AI computing systems. Ensure continuous availability of critical infrastructure while managing the unique demands of high-density AI workloads — including extreme power consumption, elevated thermal output, and advanced cooling requirements.
Monitor & Manage Critical Systems: Lead daily operations by monitoring real-time power consumption across GPU clusters, proactively managing cooling systems to maintain optimal thermal conditions, and coordinating water resource use for cooling infrastructure. Develop and refine AI-specific Standard Operating Procedures (SOPs) to empower teams to respond swiftly and effectively to routine maintenance and emergency situations.
Ensure Regulatory Compliance: Maintain strict adherence to operational permits, environmental regulations, and industry certifications. Conduct regular facility audits, serve as the primary liaison with regulatory inspectors and utility providers, and ensure all operational activities remain fully aligned with permit conditions and certification standards.
Lead Technical Optimization Initiatives: Serve as the technical lead in implementing power and thermal management strategies that balance peak performance with energy efficiency. Optimize cooling distribution to eliminate thermal hotspots and strategically schedule maintenance windows to minimize disruptions to AI training and inference workloads.

Continuous Contribution Areas:

Identify efficiency improvements and develop enhanced monitoring capabilities to support AI workload management.
Establish best practices for operational workflows and create comprehensive documentation for critical procedures.
Implement automation where appropriate to reduce manual intervention and improve operational consistency.

Qualifications:

Demonstrated operational capability through hands-on experience managing AI or high-performance computing data centers.
Minimum 5 years of data center operations experience, including at least 2 years focused specifically on AI/ML infrastructure operations.
Direct experience managing facilities supporting GPU clusters with power densities exceeding 30 kW per rack.
Deep operational knowledge of cooling systems in AI/ML environments — including liquid cooling, chilled water systems, and advanced thermal management technologies.
Proven ability to maintain target operating temperatures in high-density compute environments and troubleshoot cooling system failures under pressure.
Comprehensive understanding of power management — including UPS systems, generator operations, PDU management, and electrical load balancing.
Skilled in reading and interpreting electrical diagrams, coordinating with utility providers during planned/unplanned outages, and implementing emergency power protocols.
Expertise in permit compliance and certification maintenance — including environmental permits, water discharge permits, air quality regulations, and ongoing certifications such as ISO or Uptime Institute operational sustainability standards.

Preferred Qualifications:

Bachelor’s degree or equivalent practical experience.
Technical depth paired with operational pragmatism — combining hands-on facility management with strategic thinking.
Experience working in 24/7 operations teams, responding to critical incidents, and maintaining uptime targets exceeding 99.9% in demanding environments.
Ability to balance performance requirements with cost efficiency — with a strong commitment to safety and compliance.

Equal Employment Opportunity:

SKHYA is an Equal Employment Opportunity Employer. We provide equal employment opportunities to all qualified applicants and employees and prohibit discrimination and harassment of any type without regard to race, sex, pregnancy, sexual orientation, religion, age, gender identity, national origin, color, protected veteran or disability status, genetic information, or any other status protected under applicable federal, state, or local laws.

Compensation:

Our compensation reflects the cost of labor across several U.S. geographic markets, and we pay differently based on those defined markets. Pay within the provided range varies by work location and may also depend on job-related skills and experience. Your Recruiter can share more about the specific salary range for the job location during the hiring process.

Pay Range

$110,000—$160,000 USD
