NLP Researcher for Indic Languages

CivicDataLab, Bangalore, Anywhere · Full-time employment · Programming

About Us

We at CivicDataLab use data, tech, design and social science to strengthen the course of civic engagement in India. We work to harness the potential of the open-source movement to enable citizens to engage better with public reforms. We build data strategies, data platforms and data science applications to push data-driven decision making at scale. We also work closely with governments, nonprofits, think tanks, media houses, universities, etc. to grow data and tech capacity.

We are looking for a Natural Language Processing Researcher to work on Indic languages, build open language corpora, and research open-source translation tools and techniques. These efforts will help our partner organizations increase the discoverability and searchability of open content and open data in key social sectors like education, government finances, judiciary, etc.

Key responsibilities

  • Lead background research on Indic languages like Hindi, Marathi, Kannada, Telugu, Tamil, Malayalam, etc.
  • Develop corpora and dictionaries using open content available for select Indic Languages.
  • Research existing open-source transliteration and translation techniques.
  • Explore language-specific nuances like context-based translations, literacy-level-based translations, emotions and tonality, etc.
  • Build APIs for easy consumption of language corpora, dictionaries and translations.
  • Create feedback loops to improve the efficacy of the algorithms based on manual inputs.
  • Engage with the open data community to co-create language tools and corpora.

Desired Skills

  • Thorough knowledge of current NLP techniques and algorithms including LSTMs, RNNs, GRU networks, memory-augmented networks, etc.
  • Experience working with open-source projects. Open-source contributions will be a big plus.
  • Good analytical and communication skills. A good sense of humor is a big plus.
  • Knowledge of Python or any other scripting language.
  • Motivation to harness computational linguistics research in the social sector.

How we work

CivicDataLab operates as a small team with most people working remotely. Everyone gets exposure to how they can use their skills to bring change in various social sectors, along with an opportunity to shape the organization’s work culture. Individuals are responsible for defining their own goals in accordance with partner organizations and other team members, and for planning their own career trajectory.

We are committed to an inclusive work culture and strongly encourage applicants from diverse and cross-cultural backgrounds.  

It is NOT OK for recruiters, HR consultants, and other intermediaries to contact this employer