Published Works
2022
- Thesis: Improving Extreme Multi-Label Text Classification With Sentence Level Prediction. Ashish Singhal. Jul 2022
The Extreme Multi-Label Text Classification (XMTC) problem aims to assign a small number of relevant labels to a document from a large label space. XMTC label spaces follow a power-law distribution, which results in data sparsity for tail labels and aggressive prediction of head labels. Existing methods for tackling XMTC problems use the whole document text to predict relevant labels. This project instead attempts to identify the meaningful sentences of a document and use them to predict relevant labels. Relevant labels are predicted for the individual sentences and then empirically concatenated to form the relevant label set for the document. This method is based on the idea that not all of a document's text is informative about its relevant labels. When the whole document text is used, informative text is often polluted by noisy text, which hampers performance. Predicting relevant labels for sentences instead can sharpen the focus on the informative text, so that more relevant labels and more tail labels can be predicted. This project also explores the idea of using focal loss with label propensities in XMTC problems to overcome the influence of the power-law distribution and treat every label equally.
2021
- ISSI: Augmenting context-aware citation recommendations with citation and co-authorship history. Shenghui Wang, Ashish Singhal. Proceedings, ISSI 2021, Jul 2021
With the increasing number of research papers being published, it has become a challenge to search for the most suitable articles for accurate referencing. Many local citation recommendation systems have begun to locate suitable candidates by using the texts accompanying the citation suffix, along with the metadata of the target documents. Previous research has shown that citation relationships have a positive effect on such recommendations; however, the influence of co-authorship history has not been fully investigated. In this paper, we extend the model proposed by Jeong, Jang & Park (2020) by combining the context and citation history with co-authorship information in the recommendation system. We also propose to use more domain-specific embeddings to better capture the semantics of the context. Our experiments show the positive effect of co-authorship information on citation recommendations, and that our model, based on the combination of domain-specifically embedded context, citation history, and co-authorship history, significantly outperforms the basic context-based recommendation model.