Tracxn - Technology - Site Reliability Engineer (SRE) (0-3 Years)
Posted by Sachin Kamkar (@sachinkamkar)
Tracxn is looking for experienced and motivated professionals to play a vital role in developing, scaling, and automating its IT infrastructure. As an SRE, you will get hands-on experience with technologies such as Ansible, AWS, Docker, shell scripting, Python, Node.js, Kafka, ZooKeeper, MongoDB, MySQL, Elasticsearch, Redis, Spring, and the ELK Stack.
The person in this role will be expected to focus strongly on tactical operations as well as on large-scale production engineering and orchestration.
What we are looking for:
- Knowledge of Infrastructure as Code (IaC) tools (Puppet, Ansible, Chef, etc.)
- Experience configuring and managing enterprise monitoring and resource-tracking systems
- Ability to automate operations
- Expertise in at least one scripting language
- Experience with version control tools such as Git
- Ability to work with a wide variety of open-source technologies and cloud services (AWS, Azure, GCP)
- Knowledge of system, network, and application security principles and practices
- Experience with containers and orchestration (Docker, Kubernetes)
- Experience with infrastructure and configuration automation (Terraform, SaltStack)
- Understanding of protocols/technologies like HTTP, SSL, LDAP, SSH, SAML, etc.
- Systems fluency (Linux, storage, networking)
- Experience with modern software components (MongoDB, Redis, Elasticsearch, Kafka)
- In-depth knowledge of operating systems (processes, threads, concurrency issues, locks, mutexes, semaphores, monitors, and how they work)
- Experience with build and deployment automation systems used in production (such as Jenkins)
- Expertise in software development methodologies
What you will do:
- Design and develop our AWS infrastructure
- Develop and manage infrastructure as code using Ansible
- Implement automation tools and frameworks (CI/CD pipelines)
- Optimize Tracxn’s computing architecture
- Conduct system tests for security, performance, and availability, and monitor unit performance
- Keep customer-facing services available and performing at their best through proactive monitoring and by keeping the supporting systems consistently healthy
- Develop and maintain design and troubleshooting documentation
- Drive root cause analysis (RCA) for high-priority incidents and work with the respective development teams on preventive measures
- Automate detection and resolution of recurring issues in the production environment
- Provide operational management information by collecting, analyzing, and summarizing operating and engineering data and trends
What we have to offer:
- Work with a performance-oriented team driven by ownership and open to experiments.
- Learn to design a system for high accuracy, efficiency, and scalability.
- No strict deadlines; focus on delivering quality work.
- A meritocratic, candid culture with no politics.
- High visibility into which startups and markets are exciting globally.
About the Tech Team
Tracxn's technology team has 40+ members and is growing. It is divided into several smaller teams, each of which owns one or more services or components of the technology platform. We are a young team of motivated engineers with minimal management structure, where almost everyone is actively involved in technical development and design. We have a team-centric culture: ownership of and responsibility for a feature or module lie with a team rather than with an individual.
We work with a range of technologies, including but not limited to Spring, Node.js, the Elastic Stack, MySQL, MongoDB, React, Webpack, Kafka, Redis, AWS Lambda, and Ansible. As a team, we value ownership, continuous learning, consistency, and discipline.