Tracxn - Technology - Site Reliability Engineer (SRE) (0-3 Years)

Posted 15 September 2021 by tamal.chakraborty (@tamalchakraborty)

Tracxn Technologies Pvt Ltd. https://hasjob.co/tracxn.com/5cuay, Bangalore · tracxn.com · Full-time employment · Programming

Job description

Tracxn is looking for experienced and motivated professionals to play a vital role in developing, scaling, and automating the IT infrastructure. As an SRE, you will get hands-on experience in the latest technologies and skills like Ansible, AWS, Docker, Shell Script, Python, NodeJS, Kafka, Zookeeper, Mongo, MySQL, Elastic, Redis, Spring, ELK Stack, etc.

The incumbent in this role would demonstrate a strong focus on tactical operations, as well as large-scale production engineering and orchestration.

What we are looking for:

Knowledge of IaC tools (Puppet, Ansible, Chef, etc.)
Experience in configuring and managing enterprise monitoring and resource tracking systems
Ability to automate operations
Expertise in at least one of the scripting languages
Experience in versioning tools like Git
Ability to use a wide variety of open source technologies and cloud services (AWS, Azure, GCP)
Knowledge of system, network, and application security principles and practices

Bonus

Experience with containers and orchestration (Docker, Kubernetes)
Experience in infrastructure and configuration automation (Terraform, SaltStack)
Understanding of protocols/technologies like HTTP, SSL, LDAP, SSH, SAML, etc.
Systems fluency (Linux, storage, networking)
Experience with modern software components (Mongo, Redis, Elasticsearch, Kafka)
In-depth knowledge of operating systems (processes, threads, concurrency issues, locks, mutexes, semaphores, monitors and how they work)
Experience in software-automation production systems (like Jenkins)
Expertise in software development methodologies

Key Deliverables

Design and develop our AWS infrastructure
Develop and manage the infrastructure as code using Ansible
Implement automation tools and frameworks (CI/CD pipelines)
Optimize Tracxn's computing architecture
Conduct systems tests for security, performance, and availability; monitor unit performance
Keep customer-facing services available at top performance by using proactive monitoring tools and maintaining the health of the supporting systems
Develop and maintain design and troubleshooting documentation
Drive RCA (Root Cause Analysis) for high-priority incidents and work with the respective development teams on preventive measures
Automate detection and resolution of recurring issues in the production environment
Provide operational management information by collecting, analyzing, and summarizing operating and engineering data and trends

Please note: candidates should be ready to work remotely until the pandemic subsides.


No longer accepting applications