cv

My career highlights and accomplishments.

Basics

Name Ashish Singhal
Label Scientist
Email ashsh.ash216@gmail.com
Phone +91-8867600588
Url https://ashishsinghal.io
Summary A computer and machine learning research scientist, proficient in NLP & Deep Learning.

Work

  • 2023.08 - Present
    Machine Learning Scientist
    Legal AI Tech Company
    I am working on LLMs and Deep Learning to develop AI solutions for the legal domain.
  • 2022.02 - 2023.08
    Machine Learning Engineer
    Riversand - A Syndigo Company
    I worked on developing AI solutions for data deduplication.
    • Built an end-to-end machine learning project whose data pipelines produce a contrastive-learning-based neural network model that identifies duplicate records in a given database. I used word2vec embeddings with a deep neural network in PyTorch to build the model.
    • Solved the business problem of address matching for duplicate-record identification with Named-Entity Recognition (NER). I proposed the NER solution, brought the NER model training pipelines into production from scratch, and made a direct positive impact on the customer’s requirement. I used StanfordCoreNLP to build a Conditional Random Fields (CRF) based NER model.
    • Built an ML model for automatic column-type detection in the given data, which accelerated the onboarding of a new customer. I used NLP text preprocessing techniques to generate tabular features that were fed to a neural network to predict the column type.
  • 2016.01 - 2019.07
    Software Engineer
    GE Healthcare
    I started as a software engineer and later transitioned into a Machine Learning based role. While working on software, I built several features using Java.
    • Built a BiLSTM-based NER model to detect the disease, and hence the X-Ray test to be taken, from the doctor's prescription text. I used word2vec to generate the embeddings from the text. This model reduces the mouse clicks needed to take an X-Ray of a patient.
    • Built a time-series forecasting model with an LSTM and a deep neural network to predict the system load (RAM and CPU utilization), so that appropriate measures can be taken in time to lower it.

Education

  • 2019.09 - 2022.07

    Enschede, Netherlands

    Master of Science
    University of Twente
    Data Science
    • NLP
    • Probabilistic Programming
    • Computer Vision
    • Deep Learning
  • 2012.07 - 2016.06

    Karnataka, India

    Bachelor of Technology
    Manipal Institute of Technology
    Computer Science Engineering
    • OOPs
    • Data Structures & Algorithms
    • Operating Systems
    • Database Management
    • Computer Networks
    • Compiler Design
    • Distributed Systems
    • Software Engineering

Certificates

Finetuning Large Language Models
DeepLearning.AI 2024-07-07
Quantization Fundamentals with Hugging Face
DeepLearning.AI 2024-06-10
Machine Learning - Classification
University of Washington 2018-10-07
Machine Learning - Regression
University of Washington 2018-10-20
Mathematics for Machine Learning
Imperial College London 2018-09-01

Publications

  • 2022.07.01
    Improving Extreme Multi-Label Text Classification With Sentence Level Prediction
    Master's Dissertation
    The Extreme Multi-Label Text Classification (XMTC) problem aims to assign a small number of relevant labels to document text from a large label space. XMTC label spaces follow a power-law distribution, which results in data sparsity for tail labels and aggressive prediction of head labels. Existing methods for tackling XMTC problems use the whole document text to predict relevant labels. This project attempts to identify and use meaningful sentences of the document text to predict relevant labels. Relevant labels are predicted for the sentences and then empirically combined to form the relevant label set for the document. This method is based on the idea that not all text in a document is informative of the relevant labels. When the whole document text is used, informative text often gets polluted with noisy text, which hampers performance. Predicting relevant labels for individual sentences instead can sharpen the focus on the informative text, so that more relevant labels and more tail labels can be predicted. This project also explores the idea of using focal loss with label propensities in XMTC problems to overcome the influence of the power-law distribution and treat every label equally.
  • 2021.07.01
    Augmenting context-aware citation recommendations with citation and co-authorship history
    Proceedings, ISSI 2021
    The paper addresses the challenge of efficiently searching for relevant research papers amidst the growing number of publications. It discusses how local citation recommendation systems utilize text and metadata to identify suitable articles for referencing. While previous studies have highlighted the benefits of citation relationships in such recommendations, the impact of co-authorship history has been underexplored. The authors propose an extension to an existing model by integrating context, citation history, and co-authorship information into the recommendation system. They also suggest employing domain-specific embeddings to better capture semantic nuances. Experimental results demonstrate the positive influence of co-authorship information on citation recommendations, with the combined model significantly outperforming basic context-based approaches.

Skills

Artificial Intelligence
AI Research
Language Models
Deep Learning
Machine Learning
Natural Language Processing
Image Processing
Computer Vision
Software Engineering
Object Oriented Programming
Software Design
Data Structures
Algorithms
Java, Python, C++

Languages

Hindi
Native speaker
English
Proficient
Marathi
Intermediate

Interests

AI/ML
AI Research Papers
Tech Blogging
Finance
Productivity/Health