PLOD: Pinpoint Abbreviations in Scientific Texts

What is PLOD?

PLOD (Abbreviation Detection Dataset) is an English-language dataset built from PLOS journal articles for training and evaluating NLP models that detect acronyms and their long forms. Each sentence is hand-labelled, marking every acronym (AC) and its corresponding long form (LF).

What’s Inside

This coursework-ready subset contains between 1,000 and 10,000 sentences. Each entry provides:

  • Tokens – the sentence split into individual words and punctuation marks.
  • POS Tags – the part of speech for each token (via spaCy).
  • NER Tags – token labels: 1 = AC, 4 = LF, 0 = other.
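
As a rough illustration of how these fields fit together, the sketch below groups consecutive tokens that share a tag into acronym and long-form strings. It assumes the tag ids listed above (1 = AC, 4 = LF, 0 = other); the function name `extract_spans` and the sample sentence are hypothetical and not taken from the dataset.

```python
from typing import List

# Tag ids as listed above; treated here as an assumption about the subset's encoding.
AC_TAG = 1
LF_TAG = 4


def extract_spans(tokens: List[str], ner_tags: List[int], target: int) -> List[str]:
    """Join each maximal run of consecutive tokens tagged `target` into one string."""
    spans: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, ner_tags):
        if tag == target:
            current.append(token)
        elif current:
            # The run just ended, so store it and start over.
            spans.append(" ".join(current))
            current = []
    if current:
        # Flush a run that reaches the end of the sentence.
        spans.append(" ".join(current))
    return spans


# Hypothetical entry, made up for illustration (not a real dataset row).
tokens = ["Polymerase", "chain", "reaction", "(", "PCR", ")", "was", "used", "."]
ner_tags = [4, 4, 4, 0, 1, 0, 0, 0, 0]

long_forms = extract_spans(tokens, ner_tags, LF_TAG)
acronyms = extract_spans(tokens, ner_tags, AC_TAG)
```

Grouping by runs keeps multi-word long forms together, so the pairs can be inspected or fed to a later linking step.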

Why it Matters

  • Training & Benchmarking – Develop or test models that link acronyms to definitions.
  • Boost NLP Tasks – Improve search, summarisation, and machine translation by handling abbreviations more accurately.
  • Track Progress – Published baselines show strong performance (F1 ≈ 0.92 for ACs, 0.89 for LFs).
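
To compare your own model against figures like these, a per-class F1 score can be computed from gold and predicted tags. The sketch below is a minimal token-level version under the tag ids given earlier (1 = AC, 4 = LF); the published baselines may use a different scoring scheme, such as span-level matching, so treat it as an approximation. The function name `class_f1` and the sample tags are hypothetical.

```python
from typing import List


def class_f1(gold: List[int], pred: List[int], label: int) -> float:
    """Token-level F1 for one tag id: harmonic mean of precision and recall."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        # No correct predictions for this label, so F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold and predicted tags for one sentence (illustration only).
gold = [4, 4, 4, 0, 1, 0, 0]
pred = [4, 4, 0, 0, 1, 0, 1]

ac_f1 = class_f1(gold, pred, 1)  # acronym tokens
lf_f1 = class_f1(gold, pred, 4)  # long-form tokens
```

Scoring ACs and LFs separately mirrors the way the baseline figures are reported, one number per class.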

License

Released under the Creative Commons Attribution-ShareAlike 4.0 International License.

Start Exploring

Looking for a reliable benchmark for acronym detection?

Unlock PLOD on AIOZ AI and integrate it directly into your token-classification pipeline today.

About the AIOZ Network

AIOZ Network is a decentralized physical infrastructure network (DePIN) for Web3 AI, storage, and streaming.

Powered by a global community of AIOZ DePINs, AIOZ rewards you for sharing computational resources to store, transcode, and stream digital media content and to power decentralized AI computation.

© 2025 AIOZ Network. All rights reserved.