Hi python experts,
I tried to get embeddings from a pre-trained language model with the function listed below. However, it takes forever to execute when I have one thousand articles, each of which has at least 500 words. Could anyone suggest how to modify the code to speed up the process? Thanks!
from transformers import BertTokenizer, BertModel
import torch

# Checkpoint name assumed here; use whichever pre-trained model you load.
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

def get_bert_embeddings(text_list, batch_size=32):
    embeddings = []
    # Process texts in batches
    for i in range(0, len(text_list), batch_size):
        batch_texts = text_list[i:i + batch_size]
        inputs = tokenizer(batch_texts, return_tensors='pt',
                           padding=True, truncation=True, max_length=512)
        with torch.no_grad():
            outputs = model(**inputs)
        # Use the embeddings from the [CLS] token
        batch_embeddings = outputs.last_hidden_state[:, 0, :].numpy()
        embeddings.extend(batch_embeddings)
    return embeddings
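For context, the usual first win for code like this is running inference on a GPU (when one is available) and keeping the forward pass inside `torch.no_grad()` with the model in `eval()` mode. Below is a minimal sketch of that device-handling pattern; it uses a stand-in `torch.nn.Linear` module instead of BERT so the structure is visible without downloading a checkpoint, and the dimensions (768) are just illustrative:

```python
import torch

# Pick the fastest available device; falls back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in for the BERT model: any torch.nn.Module is moved the same way.
model = torch.nn.Linear(768, 768).to(device)
model.eval()  # disable dropout etc. for deterministic inference

def embed_batches(tensor_batches):
    embeddings = []
    with torch.no_grad():  # skip autograd bookkeeping: faster, less memory
        for batch in tensor_batches:
            out = model(batch.to(device))   # move inputs to the same device
            embeddings.append(out.cpu())    # bring results back before stacking
    return torch.cat(embeddings)

# Example: four batches of 8 vectors each -> one (32, 768) tensor
batches = [torch.randn(8, 768) for _ in range(4)]
result = embed_batches(batches)
print(result.shape)  # torch.Size([32, 768])
```

With the real model, the same pattern means calling `model.to(device)` once after loading and `inputs = {k: v.to(device) for k, v in inputs.items()}` inside the loop; increasing `batch_size` as far as memory allows also cuts per-batch Python overhead.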
Larz60+ write May-24-2024, 07:05 PM:
Please post all code, output and errors (in its entirety) between their respective tags. Refer to BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.
Fixed for you this time. Please use BBCode tags on future posts.