cv
My career highlights and accomplishments.
Basics
Name | Ashish Singhal |
Label | Scientist |
Email | ashsh.ash216@gmail.com |
Phone | +91-8867600588 |
Url | https://ashishsinghal.io |
Summary | A computer and machine learning research scientist, proficient in NLP & Deep Learning. |
Work
-
2023.08 - Present Machine Learning Scientist
Legal AI Tech Company
I work on LLMs and deep learning to develop AI solutions for the legal domain.
-
2022.02 - 2023.08 Machine Learning Engineer
Riversand - A Syndigo Company
I worked on developing AI solutions for data deduplication.
- Built an end-to-end machine learning project with data pipelines that produce a contrastive-learning-based neural network model to identify duplicate records in a given database. I used word2vec embeddings with a deep neural network in PyTorch to build the model.
- Solved the business problem of address matching for duplicate-record identification with Named-Entity Recognition (NER). I proposed the NER solution, brought the NER model training pipelines into production from scratch, and made a direct positive impact on the customer’s requirements. I used StanfordCoreNLP to build a Conditional Random Fields (CRF) based NER model.
- Built an ML model for automatic column-type detection in the given data, which accelerated the onboarding of a new customer. Used NLP text preprocessing techniques to generate tabular features that were fed to a neural network to predict the column type.
-
2016.01 - 2019.07 Software Engineer
GE Healthcare
I started as a software engineer and later transitioned into a machine-learning-based role. While working on software, I built several features using Java.
- Built a BiLSTM-based NER model to detect the disease, and hence the X-ray test to be taken, from the doctor's prescription text. I used word2vec to generate embeddings from the text. This model reduces the mouse clicks needed to take an X-ray of a patient.
- Built a time-series forecasting model with an LSTM and a deep neural network to predict the system load (RAM and CPU utilization), so that measures to lower the load can be taken at the appropriate time.
Education
-
2019.09 - 2022.07 Enschede, Netherlands
Master of Science
University of Twente
Data Science
- NLP
- Probabilistic Programming
- Computer Vision
- Deep Learning
-
2012.07 - 2016.06 Karnataka, India
Bachelor of Technology
Manipal Institute of Technology
Computer Science Engineering
- Object-Oriented Programming
- Data Structures & Algorithms
- Operating Systems
- Database Management
- Computer Networks
- Compiler Design
- Distributed Systems
- Software Engineering
Certificates
Finetuning Large Language Models | DeepLearning.AI | 2024-07-07 |
Quantization Fundamentals with Hugging Face | DeepLearning.AI | 2024-06-10 |
Introduction to Machine Learning in Production | DeepLearning.AI | 2021-10-20 |
Machine Learning - Classification | University of Washington | 2018-10-07 |
Machine Learning - Regression | University of Washington | 2018-10-20 |
Mathematics for Machine Learning | Imperial College London | 2018-09-01 |
Publications
-
2022.07.01 Improving Extreme Multi-Label Text Classification With Sentence Level Prediction
Master's Dissertation
The Extreme Multi-Label Text Classification (XMTC) problem aims to assign a small number of relevant labels to a document's text from a large label space. XMTC label spaces follow a power-law distribution, which results in data sparsity for tail labels and aggressive prediction of head labels. Existing methods for tackling XMTC problems use the whole document text to predict relevant labels. This project attempts to identify and use the meaningful sentences of the document text to predict relevant labels. Relevant labels are predicted for the sentences and are empirically concatenated to form the relevant label set for the document. This method is based on the idea that not all of a document's text is informative of the relevant labels. Whenever the whole document text is used, the informative text often gets polluted with noisy text, which hampers performance. Predicting relevant labels for individual sentences instead sharpens the focus on the informative text, so that more relevant labels and more tail labels can be predicted. This project also explores the idea of using focal loss with label propensities in XMTC problems to overcome the influence of the power-law distribution and treat every label equally.
-
2021.07.01 Augmenting context-aware citation recommendations with citation and co-authorship history
Proceedings, ISSI 2021
The paper addresses the challenge of efficiently searching for relevant research papers amidst the growing number of publications. It discusses how local citation recommendation systems utilize text and metadata to identify suitable articles for referencing. While previous studies have highlighted the benefits of citation relationships in such recommendations, the impact of co-authorship history has been underexplored. The authors propose an extension to an existing model by integrating context, citation history, and co-authorship information into the recommendation system. They also suggest employing domain-specific embeddings to better capture semantic nuances. Experimental results demonstrate the positive influence of co-authorship information on citation recommendations, with the combined model significantly outperforming basic context-based approaches.
Skills
Artificial Intelligence | |
AI Research | |
Language Models | |
Deep Learning | |
Machine Learning | |
Natural Language Processing | |
Image Processing | |
Computer Vision |
Software Engineering | |
Object Oriented Programming | |
Software Design | |
Data Structures | |
Algorithms | |
Java, Python, C++ |
Languages
Hindi | Native speaker |
English | Proficient |
Marathi | Intermediate |
Interests
AI/ML |
AI Research Papers |
Tech Blogging |
Finance |
Productivity/Health |