Python Forum
speed up getting embedding from bert model for large set of text
Hi Python experts,

I am trying to get embeddings from a pre-trained language model with the function listed below. However, it takes forever to run when I have one thousand articles, each of which has at least 500 words. Could anyone suggest how to modify the code to speed up the process? Thanks!


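from transformers import BertTokenizer, BertModel
import torch

# Assumption: the post does not show how tokenizer and model were created;
# 'bert-base-uncased' is only a placeholder checkpoint name.
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
model.eval()  # inference mode: dropout disabled

def get_bert_embeddings(text_list, batch_size=32):
    embeddings = []

    # Process texts in batches
    for i in range(0, len(text_list), batch_size):
        batch_texts = text_list[i:i + batch_size]
        # Pad to the longest text in the batch and truncate at 512 tokens
        inputs = tokenizer(batch_texts, return_tensors='pt', padding=True, truncation=True, max_length=512)
        with torch.no_grad():
            outputs = model(**inputs)

        # Use the embeddings from the [CLS] token
        batch_embeddings = outputs.last_hidden_state[:, 0, :].numpy()
        embeddings.extend(batch_embeddings)

    # Assumed return value (not shown in the post): one [CLS] vector per text
    return embeddings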
Larz60+ wrote May-24-2024, 07:05 PM:
Please post all code, output and errors (in their entirety) between their respective tags. Refer to the BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.

Fixed for you this time. Please use BBCode tags on future posts.
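
One possible direction for the speed question, offered as a rough sketch and not a tested answer to this exact setup: the function above runs the model on the CPU and pads every batch to its longest text. Moving the model and inputs to a GPU (if one is available) and grouping texts of similar length into the same batch are two common ways to cut the work. The names `length_sorted_batches`, `embed_texts`, and the `device` parameter below are made up for illustration, and character length is used only as a cheap stand-in for token count.

```python
from typing import Iterator, List, Sequence, Tuple


def length_sorted_batches(
    texts: Sequence[str], batch_size: int
) -> Iterator[Tuple[List[int], List[str]]]:
    """Yield (original_indices, batch_texts), grouping texts of similar length.

    Character length is only an approximation of token count.
    """
    order = sorted(range(len(texts)), key=lambda i: len(texts[i]))
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield idx, [texts[i] for i in idx]


def embed_texts(texts, tokenizer, model, device="cpu", batch_size=16, max_length=512):
    """Return one [CLS] vector per text, in the original input order.

    Hypothetical helper: tokenizer and model are expected to be a Hugging Face
    tokenizer/model pair such as BertTokenizer and BertModel.
    """
    import torch  # imported here so the batching helper above has no dependencies

    model = model.to(device).eval()
    results = [None] * len(texts)
    with torch.inference_mode():
        for idx, batch in length_sorted_batches(texts, batch_size):
            inputs = tokenizer(
                batch,
                return_tensors="pt",
                padding=True,
                truncation=True,
                max_length=max_length,
            ).to(device)
            # [CLS] embeddings, moved back to the CPU before converting to NumPy
            cls = model(**inputs).last_hidden_state[:, 0, :].cpu().numpy()
            for i, vec in zip(idx, cls):
                results[i] = vec
    return results
```

To try it, pass `device="cuda"` when `torch.cuda.is_available()` returns True. Note that articles of 500+ words will mostly be truncated to 512 tokens anyway, so length grouping helps mainly when some texts are shorter; lowering `max_length` reduces the work per text further, at the cost of ignoring more of each article.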
Messages In This Thread
speed up getting embedding from bert model for large set of text - by veda - May-24-2024, 10:05 AM