PERPUSTAKAAN BIG

Geoscience language models and their intrinsic evaluation

Authors (personal names): Christopher J.M. Lawley; Stefania Raimondo; Tianyi Chen; Lindsay Brin; Anton Zakharov; Daniel Kur; Jenny Hui; Glen Newton; Sari L. Burgoyne; Genevieve Marquis

Geoscientists use observations and descriptions of the rock record to study the origins and history of our planet, which has resulted in a vast volume of scientific literature. Recent progress in natural language processing (NLP) has the potential to parse through and extract knowledge from unstructured text, but there has, so far, been only limited work on the concepts and vocabularies that are specific to geoscience. Herein we harvest and process public geoscientific reports (i.e., Canadian federal and provincial geological survey publications databases) and a subset of open access and peer-reviewed publications to train new, geoscience-specific language models to address that knowledge gap. Language model performance is validated using a series of new geoscience-specific NLP tasks (i.e., analogies, clustering, relatedness, and nearest neighbour analysis) that were developed as part of the current study. The raw and processed national geological survey corpora, language models, and evaluation criteria are all made public for the first time. We demonstrate that non-contextual (i.e., Global Vectors for Word Representation, GloVe) and contextual (i.e., Bidirectional Encoder Representations from Transformers, BERT) language models updated using the geoscientific corpora outperform the generic versions of these models for each of the evaluation criteria. Principal component analysis further demonstrates that word embeddings trained on geoscientific text capture meaningful semantic relationships, including rock classifications, mineral properties and compositions, and the geochemical behaviour of elements. Semantic relationships that emerge from the vector space have the potential to unlock latent knowledge within unstructured text, and perhaps more importantly, also highlight the potential for other downstream geoscience-focused NLP tasks (e.g., keyword prediction, document similarity, recommender systems, rock and mineral classification).
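The analogy and nearest neighbour tasks named in the abstract can be illustrated with a minimal sketch. The vectors below are invented toy values, not embeddings from the paper's GloVe or BERT models, and the helper names (`cosine`, `nearest_neighbours`, `solve_analogy`) are hypothetical. The sketch assumes cosine similarity and simple vector arithmetic, a common way to run such intrinsic evaluations, which may differ from the authors' exact procedure.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# They are NOT taken from the paper's trained models.
TOY_EMBEDDINGS = {
    "granite": [0.9, 0.1, 0.8],
    "rhyolite": [0.9, 0.9, 0.8],
    "gabbro": [0.1, 0.1, 0.8],
    "basalt": [0.1, 0.9, 0.8],
    "quartz": [0.8, 0.2, 0.1],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbours(query, embeddings, exclude=(), k=3):
    """Return the k words most similar to the query vector."""
    scored = [
        (word, cosine(query, vec))
        for word, vec in embeddings.items()
        if word not in exclude
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def solve_analogy(a, b, c, embeddings):
    """Answer 'a is to b as c is to ?' using the offset b - a + c."""
    target = [
        vb - va + vc
        for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])
    ]
    best = nearest_neighbours(target, embeddings, exclude={a, b, c}, k=1)
    return best[0][0]


if __name__ == "__main__":
    # Intrusive-to-extrusive analogy on the toy vectors.
    print(solve_analogy("granite", "rhyolite", "gabbro", TOY_EMBEDDINGS))
```

With these toy vectors the analogy granite : rhyolite :: gabbro : ? resolves to basalt, because the second coordinate was chosen to encode the intrusive versus extrusive distinction. A real evaluation would load trained embeddings and score many such analogy sets.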


Availability
Item code: 128 | Call number: 551.136 | Location: Perpustakaan BIG (External Hard Disk) | Status: Available
Detailed Information
Series Title: Applied Computing and Geosciences - Open Access
Call Number: 551.136
Publisher: Amsterdam: Elsevier, 2022
Physical Description: 10 pages, PDF, 1,163 KB
Language: English
ISBN/ISSN: -
Classification: 551.136
Content Type: text
Media Type: -
Carrier Type: -
Edition: Vol. 14, June 2022
Subjects: Machine Learning; Artificial intelligence; Word embedding; Language models; BERT; GloVe
Specific Detail Info: -
Statement of Responsibility: -
Other/Related Versions: No other versions available

File Attachments
  • Geoscience language models and their intrinsic evaluation