SRE (Site Reliability Engineering) job opportunity at Jobgether.



Date: 2026-04-29
Jobgether SRE (Site Reliability Engineering)
Experience: General
Pattern: Full-time
Degree: High School (S.S.C.E)
Location: Brazil

This position is posted by Jobgether on behalf of a partner company. We are currently looking for an SRE (Site Reliability Engineering) in Brazil.

This role sits within a high-impact MLOps environment, focused on ensuring the reliability, scalability, and performance of infrastructure that supports machine learning models and production data pipelines. You will be part of a collaborative and engineering-driven team, working on modern cloud-native systems in a fast-paced and continuously evolving context. The position involves direct contribution to the stability of critical platforms running on AWS and Kubernetes, with strong emphasis on automation, observability, and operational excellence. You will work closely with data, development, and infrastructure teams to improve system resilience and delivery efficiency. The environment promotes continuous learning, ownership, and proactive problem-solving in complex distributed systems. This is an opportunity to have a direct impact on large-scale production systems while growing your expertise in SRE, DevOps, and MLOps practices.

Accountabilities:

- Implement and maintain infrastructure as code using Terraform, following established engineering standards and best practices.
- Operate and support Kubernetes clusters using Helm and GitOps methodologies to ensure reliable application delivery.
- Manage day-to-day operations of AWS environments, contributing to platform availability, scalability, and stability.
- Assist in diagnosing and troubleshooting cloud networking issues (VPC, security groups, DNS, load balancers), escalating complex cases when needed.
- Maintain and optimize CI/CD pipelines using GitLab in collaboration with development and data teams.
- Monitor systems using observability tools such as Prometheus, Grafana, and Datadog, supporting incident detection and response.
- Participate in incident management and post-mortem processes, contributing to root cause analysis and preventive improvements.
- Support FinOps initiatives by identifying opportunities for cloud cost optimization.

Requirements:

- Solid hands-on experience with Terraform for infrastructure as code.
- Strong knowledge of AWS cloud services and architecture.
- Intermediate experience with Kubernetes, including Helm and GitOps workflows.
- Experience working with GitLab CI/CD pipelines and version control workflows.
- Ability to troubleshoot networking in cloud environments (VPC, DNS, security groups, load balancers).
- Good understanding of Linux systems administration.
- Familiarity with observability tools such as Prometheus, Grafana, and Datadog.
- Strong analytical thinking and problem-solving skills in distributed systems environments.
- Clear communication skills and ability to collaborate in cross-functional teams.
- Proactive mindset with ownership and willingness to learn and grow in SRE/MLOps contexts.
- Nice to have: exposure to FinOps practices and interest in MLOps or Data Engineering environments.

Benefits:

- Remote work model within Brazil
- Flexible working arrangements
- Competitive compensation package (based on experience)
- Health and dental insurance
- Continuous learning and development opportunities
- Exposure to large-scale cloud and machine learning infrastructure
- Collaborative engineering culture focused on innovation and knowledge sharing
- Career growth opportunities in SRE, DevOps, and MLOps domains
- Inclusion in a diverse and supportive tech community

How Jobgether works:

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.

We appreciate your interest and wish you the best!

Why Apply Through Jobgether? https://jobgether.com/how-jobgether-works

Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.
