Teradata Jobs

Job Information

Teradata Sr. Staff DevOps Engineer - Site Reliability Engineering in Bangalore - Virtual, India

Our Company

Teradata is the connected multi-cloud data platform company for enterprise analytics. Our enterprise analytics solve business challenges from start to scale. Only Teradata gives you the flexibility to handle the massive and mixed data workloads of the future, today.

The Teradata Vantage architecture is cloud native, delivered as-a-service, and built on an open ecosystem. These design features make Vantage the ideal platform to optimize price performance in a multi-cloud environment.

What You'll Do

Teradata is growing our Cloud Operations team and we’re looking for individuals who exemplify our principle of Customer Obsession through operational excellence, leadership, and a passion to continually be the voice of the customer. This is a unique opportunity to join our team in a period of fast growth and expansion. If you are interested in working in a dynamic and fast-paced environment where you can directly influence the future of cloud-based analytics solutions and services, then this is the place for you. You will actively develop and implement state-of-the-art technical solutions, including capabilities to support elastic scalability, on-demand self-service, disaster recovery, and usage-based consumption, to enable customers to solve their most complex data analytics challenges.

Teradata Cloud seeks a Sr. Staff DevOps Engineer to lead in building and operating highly scalable, fault-tolerant, and secure systems in a highly distributed and dynamic Hybrid Cloud environment.

What Makes You a Qualified Candidate

  • Collaborate with cross-functional teams to drive SRE, Observability, and DevSecOps initiatives.

  • Apply SRE principles to drive reliability engineering initiatives, including defining and monitoring service level objectives (SLOs) and error budgets.

  • Understand Chaos Engineering and have hands-on knowledge of implementing it.

  • Keep the lights on and meet business requirements by enabling:

      • On-call improvements and taking on-call shifts.

      • Automated release management.

      • Improvements in Mean Time to Resolve (MTTR).

      • Automated or AI-enabled handling of incidents and issues.

      • DevOps pipeline automation.

      • Improved automation so that issues can be solved at the L1 and L2 levels.

  • Develop and maintain documentation, including policies, standards, and procedures, and keep runbooks maintained and up to date.

  • Implement robust observability practices by designing and maintaining monitoring, logging, and tracing solutions using tools like Datadog or similar technologies.

  • Drive the improvement of proactive alerting using modern monitoring tools such as Datadog.

  • Deduplicate alerts and reduce false-positive alerts.

  • Improve system monitoring and observability through log analysis, dashboard creation, and automated alerts based on established service level objectives (SLOs) and service level agreements (SLAs).

  • Collaborate with monitoring and operations teams to ensure the availability, performance, and scalability of the infrastructure and applications.

  • Continuously identify opportunities to improve system performance, reliability, and security through automation, optimization, and architectural enhancements.

  • Mentor and provide guidance to junior team members, promoting a culture of observability, security, and SRE principles.

  • Provide architectural leadership for developing and building highly available systems and software in large distributed and Hybrid Cloud environments

  • Promote a culture of continuous improvement for technology and processes.

  • Lead in-depth analysis for improving the deployment of cloud-native applications, monitoring, securing, and supporting a large-scale public cloud environment

  • Analyse existing provisioning processes to identify automation opportunities and other improvements.

  • Participate in on-call for escalated support of production customers and systems.

  • Perform and improve SRE / operational functions, such as monitoring and maintenance of production systems.

  • Good to Have:

      • Perform security assessments, vulnerability scans, and observability analysis, and implement remediation actions to address findings.

      • Develop and maintain security-related documentation, including policies, standards, and procedures.

What You'll Bring

  • 12+ years of relevant job experience

  • Bachelor’s Degree in computer science or related field preferred

  • Proficiency in scripting languages like Python, Ruby, or Bash.

  • Experience working in a hybrid environment preferred

  • Expert-level, hands-on system administration experience on at least one of the big three public cloud platforms: Google Cloud, Azure, or AWS (Google Cloud and Azure highly preferred).

  • Proven experience with Configuration Management tools such as Ansible or equivalent technologies.

  • Strong experience with test and build systems such as Jenkins, GitLab, or GitHub.

  • Experience with monitoring and reporting tools such as Datadog, New Relic, Nagios, and Graphite

  • Strong experience with Linux operating systems

  • Experience working with database systems, network topologies, and hardware

  • Good to Have:

      • Familiarity with compliance frameworks such as GDPR, HIPAA, or PCI-DSS.

      • Understanding of security principles, secure architecture design, and common security frameworks (e.g., OWASP, NIST, CIS).

      • Experience with security tools and technologies, such as vulnerability scanners, intrusion detection systems (IDS), and SIEM solutions.