Improving Automatic Semantic Similarity Classification of the PNT

EasyChair Preprint 7362

3 pages • Date: January 21, 2022

Alexandra Salem, Robert Gale, Gerasimos Fergadiotis and Steven Bedrick

Abstract

In the Philadelphia Naming Test (PNT; Roach et al., 1996), paraphasic errors are classified into six major categories according to three dimensions: lexicality, phonological similarity, and semantic similarity to the target. Our team has developed software called ParAlg (Paraphasia Algorithms) for automatically classifying paraphasias along these three dimensions given a transcription (Fergadiotis et al., 2016; McKinney-Bock & Bedrick, 2019). In ParAlg, the semantic similarity of a response to the target is determined by a binary classifier built on a language model that produces meaningful representations of words in a vector space. Previously, the language model used in ParAlg was word2vec (Mikolov et al., 2013).
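
As a rough illustration of how a binary semantic-similarity decision can be made from static word vectors, the sketch below thresholds the cosine similarity between a response and a target. The three-dimensional vectors, the threshold of 0.5, and the function names are all hypothetical; the abstract does not describe ParAlg's actual classifier, so this is not its implementation.

```python
import math

# Hypothetical 3-dimensional word vectors for illustration only. Real word2vec
# vectors are learned from a corpus and usually have hundreds of dimensions.
TOY_VECTORS = {
    "dog": [0.9, 0.1, 0.0],
    "cat": [0.8, 0.2, 0.1],
    "pen": [0.0, 0.1, 0.9],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def is_semantically_similar(response, target, threshold=0.5):
    """Binary decision: similar if cosine similarity reaches the threshold.

    The threshold is an assumed value, not one reported by the authors.
    """
    similarity = cosine_similarity(TOY_VECTORS[response], TOY_VECTORS[target])
    return similarity >= threshold


if __name__ == "__main__":
    for response in ("cat", "pen"):
        verdict = is_semantically_similar(response, "dog")
        print(f"response={response!r}, target='dog': similar={verdict}")
```

With these toy vectors, "cat" is judged similar to the target "dog" and "pen" is not. In practice the decision boundary would be learned from labeled data rather than fixed by hand.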

This work focuses on improving the semantic similarity classification in ParAlg. We fine-tune a modern language model called BERT (Bidirectional Encoder Representations from Transformers; Devlin et al., 2019) alongside a binary classifier to categorize each transcribed response to a PNT item as semantically similar to the target or not. BERT produces contextual vectors, meaning the representation of a word changes based upon the context given to the model, in contrast to the static representations in word2vec. We compare ParAlg classification results using word2vec and BERT.
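
The difference between static and contextual representations can be sketched with a toy example. The code below is not BERT: the "contextual" function simply averages a word's vector with the vectors of its context words, which is a deliberately simplified stand-in for what a transformer does. The vectors, the words, and the function names are hypothetical.

```python
# Hypothetical 2-dimensional static vectors: one fixed vector per word.
STATIC = {
    "bat": [0.5, 0.5],
    "baseball": [1.0, 0.0],
    "cave": [0.0, 1.0],
}


def static_vector(word, context):
    """Static lookup, as in word2vec: the context is ignored."""
    return STATIC[word]


def toy_contextual_vector(word, context):
    """Toy stand-in for a contextual model (NOT BERT).

    Averages the word's vector with the vectors of its context words, so the
    result depends on the surrounding words.
    """
    vectors = [STATIC[word]] + [STATIC[w] for w in context]
    dims = len(STATIC[word])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]


if __name__ == "__main__":
    for context in (["baseball"], ["cave"]):
        print(
            f"context={context}: "
            f"static={static_vector('bat', context)}, "
            f"contextual={toy_contextual_vector('bat', context)}"
        )
```

The static lookup returns the same vector for "bat" in both contexts, while the toy contextual function returns different vectors, which is the property that lets a contextual model separate the senses of a polysemous word.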

Our dataset is a subset of the Moss Aphasia Psycholinguistic Database (MAPPD; Mirman et al., 2010) consisting of 11,999 clinician-transcribed and categorized paraphasias from 296 participants. Errors are classified using ParAlg with word2vec or BERT to make semantic judgments.

Overall, BERT outperformed word2vec when determining the semantic similarity of each error to the target. Using BERT led to 556 semantic misclassifications compared to 1,084 with word2vec. Further, a post-hoc qualitative analysis suggests that BERT’s improved performance is associated with its ability to handle polysemy.
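
To put the misclassification counts in proportion, the sketch below converts them to error rates. It assumes that each of the 11,999 paraphasias receives exactly one semantic judgment, so that 11,999 is the right denominator; the abstract does not state this explicitly, and the variable names are illustrative.

```python
# Counts taken from the abstract.
TOTAL_RESPONSES = 11_999
MISCLASSIFIED = {"word2vec": 1_084, "BERT": 556}


def error_rate(errors, total=TOTAL_RESPONSES):
    """Fraction of responses misclassified.

    Assumes one semantic judgment per response (not stated in the abstract).
    """
    return errors / total


rates = {model: error_rate(count) for model, count in MISCLASSIFIED.items()}

# Share of word2vec's misclassifications that no longer occur with BERT.
relative_reduction = (
    MISCLASSIFIED["word2vec"] - MISCLASSIFIED["BERT"]
) / MISCLASSIFIED["word2vec"]

if __name__ == "__main__":
    for model, rate in rates.items():
        print(f"{model}: {rate:.2%} of responses misclassified")
    print(f"Relative reduction in misclassifications: {relative_reduction:.1%}")
```

Under that assumption, the semantic error rate falls from roughly 9% with word2vec to under 5% with BERT, which is close to half as many misclassifications.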

Keyphrases: PNT, aphasia, machine learning, picture naming, semantics

Links:

https://easychair.org/publications/preprint/Vlsk

BibTeX entry

BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:

@booklet{EasyChair:7362,
  author    = {Alexandra Salem and Robert Gale and Gerasimos Fergadiotis and Steven Bedrick},
  title     = {Improving Automatic Semantic Similarity Classification of the PNT},
  howpublished = {EasyChair Preprint 7362},
  year      = {EasyChair, 2022}}
