Infrastructure Systems Engineer
At Disney, we’re storytellers. We make the impossible possible. We do this by using and developing cutting-edge technology and pushing the envelope to bring stories to life through our movies, products, interactive games, parks and resorts, and media networks. Now is your chance to join our talented team that delivers unparalleled creative content to audiences around the world.
The Infrastructure Systems Engineer is responsible for leading teams of 24x7x365 technical analysts and influencing peers and managed service partners who deliver strong event and incident response, proactive monitoring, effective change management execution, and overall technical excellence of operational activities. The role covers the ongoing monitoring, support and maintenance of the infrastructure and applications that are foundational to the enterprise technology service portfolio, including the following technologies: cloud providers; network and telecom; naming, identity and directory services; server hosting and storage; collaboration and communication; and client services. Awareness of how these technologies support the applications, and of the applications' role in enabling business capabilities, is critically important for success. The Infrastructure Systems Engineer will succeed by influencing and leading the technical investigation of outages and by ensuring operational responsibilities are completed.
This enterprise-scale infrastructure and application environment includes an array of technologies. The Infrastructure Systems Engineer should have in-depth knowledge of at least one of the following and familiarity with the concepts behind most: virtualization, Unix/Wintel hosting, storage, complex LAN switching, load balancing and acceleration, and network firewalls. All of these interact with complex proprietary and commercial software, databases and applications.
A successful candidate will have prior experience leading technical diagnosis of major incidents in a high availability environment where services are broadly consumed.
This position requires onsite holiday, weekend, and shift work, including overnight shifts, and is subject to shift changes, rotations, or occasional additional time based on coverage and business needs. Additionally, the engineer may be required to be available for after-hours work during the assigned 24x7 work week. For regular shift changes, notice will be provided no later than 1 week prior to the shift change. For any unanticipated coverage needs, notice will be provided 24 hours in advance.
- Leads technical investigation of service-impacting outages for critical applications, diagnosing and debugging the source of failures
- Performs effective analysis in outage situations and requests further assistance when needed
- Ensures incident, change and request handling meet SLAs and provides assistance to peer groups to maintain team performance
- Demonstrates ability to diagnose complex problems through a process of deduction, Kepner-Tregoe analysis or similar.
- Provides accurate status updates appropriate for the audience (e.g. executives, end users, other technical teams)
- Willingness to work weekends, after hours or on-call
- Regularly collaborates with peer groups to maintain current department best practices and knowledge of products and service models.
- Improves and maintains documentation for the triage, technical response and handling of major incidents. Facilitates review by peers before publishing. Demonstrates patience and ability to teach less technical colleagues.
- Remains engaged with leaders of operational areas to ensure the department’s processes are up-to-date and relevant in support of the matrixed service delivery model.
- Maintains knowledge of industry best practices for operations management.
- Works effectively with suppliers to resolve issues, grow knowledge and drive improvements (e.g. Microsoft, Dell EMC, VMware, Citrix, Cisco)
- Exudes a customer-centric attitude and is an advocate for quality service delivery.
- Builds, establishes, and sustains relationships with TWDC segment executives, service owners, and stakeholders of supported services.
- Interprets the output of various monitoring tools for effective event correlation and management, and influences improvements to the system monitors the team uses to determine the health of services.
- Influences and may lead opportunities to automate repetitive operational tasks.
- Demonstrates the capability to make independent, situational assessments and informed decisions based on available data. Helps others improve their ability to assess situations.
- 2.5+ years’ experience supporting converged infrastructure stacks, including: application, hypervisor, compute, storage and networking
- 2+ years leading incident recovery with multi-disciplined, geographically dispersed teams in a Fortune 500 company
- 2+ years of experience in either a large IT shared services organization or outsourced environment
- Experience leading technical recovery of major incidents for a Fortune 500 company
- Experience with hands-on support of cloud operations with one or more: AWS, Google Cloud or Azure
- Experience supporting diverse portfolios, multiple business applications and IT services
- Knowledge of major file and block protocols (NFS, iSCSI, CIFS, etc.)
- Experience working in a 24x7 IT operations environment.
- Ability to work calmly and influence others under pressure in a high availability environment with critical 24x7x365 services
- Skilled in translating technical details in a way various audiences will understand (executives, architecture teams, end users)
- Strong ability to establish, build and cultivate relationships
- Strong analytical skills with the ability to synthesize the ecosystem and relationships of various IT technologies for the delivery of business services
- Ability to adapt to changing environments with ease and resiliency
- Demonstrated ability to communicate effectively at a variety of levels and to communicate intricate technical and procedural matters to both technical and non-technical personnel. Ability to communicate, direct teams, and multi-task under pressure.
- Ability to quickly learn and synthesize knowledge of Disney products, services and brands
- Experience with one or more: Splunk, Datadog, New Relic, AppDynamics, SevOne
- Familiar with containerization and virtualization: AWS, VMware, Docker, Google App Engine, etc.
- Experience with Oracle, SQL Server, Mongo or similar databases
- ITIL, Agile, DevOps, Kepner-Tregoe, or Six Sigma
- Certifications: any of the following is a plus: AWS, Cisco, VMware, etc.
- Bachelor’s Degree in Computer Science, Engineering or related field, or commensurate industry work experience and certifications.
Company Overview
At Corporate, you’ll team with the best in the business to build one of the most innovative global businesses in any industry. Uniquely positioned at the center of an exciting, multi-faceted Company, the forward-thinkers at Disney Corporate constantly pursue new ideas and technologies to help the Company’s many businesses drive value, all the while gaining something valuable from the experience themselves. Come see the most interesting Company from the most interesting point of view.
Additional Information
- This position is a legal entity of The Walt Disney Company, an equal opportunity employer.