Tracxn - Tech - Lead Site Reliability Engineer/Lead SRE (3-10 yrs)
Posted by tamal.chakraborty (@tamalchakraborty)
Tracxn is a Bangalore-based product company providing a research and deal-sourcing platform for Venture Capital, Private Equity, CorpDevs, and professionals working around the startup ecosystem. We are a team of 750+ working professionals serving customers across the globe. Our clients include funds such as Andreessen Horowitz, Sequoia Capital, Accel Partners, and NEA, and large corporates such as ING, Societe Generale, LG, and Royal Bank of Canada. We are backed by prominent investors such as Ratan Tata, Nandan Nilekani, and SAIF Partners.
What we are looking for:
Experience in IaC tools (Puppet, Ansible, Chef, etc.)
Ability to automate operations
Expertise in at least one of the scripting languages
Experience in versioning tools like Git
Ability to use a wide variety of open source technologies and cloud services (AWS, Azure, GCP)
In-depth knowledge of system, network, and application security principles and practices
Experience in configuring and managing enterprise monitoring and resource tracking systems
Experience with containers and orchestration (Docker, Kubernetes)
Experience in infrastructure and configuration automation (Terraform, SaltStack)
Understanding of protocols/technologies like HTTP, SSL, LDAP, SSH, SAML, etc.
Systems fluency (Linux, storage, networking)
Experience with modern software components (Mongo, Redis, Elasticsearch, Kafka)
In-depth knowledge of operating systems (processes, threads, concurrency issues, locks, mutexes, semaphores, monitors and how they work)
Experience in software-automation production systems (like Jenkins)
Expertise in software development methodologies
Responsibilities:
Build and lead a great team by example
Learn and develop leadership skills
Design and develop our AWS infrastructure
Develop and manage the infrastructure as code using Ansible
Implement automation tools and frameworks (CI/CD pipelines)
Optimize Tracxn’s computing architecture
Conduct systems tests for security, performance, and availability; monitor unit performance
Keep customer-facing services available and performing well by using proactive monitoring tools and maintaining the health of the supporting systems
Develop and maintain design and troubleshooting documentation
Automate detection and resolution of recurring issues in the production environment
Provide operational management information by collecting, analyzing, and summarizing operating and engineering data and trends