Python Forum
ValueError: Found input variables with inconsistent numbers of samples
I'm trying to write a bounding box regression training script for object detection with Keras and TensorFlow. I have a dataset of 3153 images (JPEG files) and a text file of bounding box annotations with 6430 lines (some images have multiple bounding boxes). Here is part of the annotation file, to show its format (image ID, then startX, startY, endX, endY in pixels):

2007_000027 101 174 351 349
2007_000032 180 195 229 213
2007_000032 189 26 238 44
2007_000129 1 74 462 272
2007_000129 19 252 487 334
2007_000170 91 3 206 43
2007_000170 28 4 372 461
2007_000272 71 25 500 304
2007_000323 3 277 375 500
2007_000323 3 12 375 305
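To make the relationship between the 6430 lines and the 3153 images concrete, here is a minimal sketch that groups the annotation lines by image ID. It assumes the space-separated format shown above (an image ID followed by four integer pixel coordinates); `load_annotations` is a hypothetical helper, not part of the original script.

```python
from collections import defaultdict


def load_annotations(lines):
    """Group bounding boxes by image ID (hypothetical helper).

    Assumes each line is "<image_id> <startX> <startY> <endX> <endY>";
    an image with several objects appears on several lines.
    """
    boxes = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline
        image_id, *coords = line.split()
        if len(coords) != 4:
            raise ValueError(f"malformed annotation line: {line!r}")
        boxes[image_id].append(tuple(int(c) for c in coords))
    return dict(boxes)


sample = """2007_000027 101 174 351 349
2007_000032 180 195 229 213
2007_000032 189 26 238 44"""

annotations = load_annotations(sample.splitlines())
print(len(sample.splitlines()), "lines,", len(annotations), "images")
# prints: 3 lines, 2 images
```

Run on the full file, the same grouping would show how many images carry more than one box.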
I created a configuration file that stores the paths to the dataset and output files:
import os

BASE_PATH = "dataset"
IMAGES_PATH = os.path.sep.join([BASE_PATH, "images"])
ANNOTS_PATH = os.path.sep.join([BASE_PATH, "bboxes.txt"])

BASE_OUTPUT = "output"
MODEL_PATH = os.path.sep.join([BASE_OUTPUT, "detector.h5"])
PLOT_PATH = os.path.sep.join([BASE_OUTPUT, "plot.png"])
TEST_FILENAMES = os.path.sep.join([BASE_OUTPUT, "test_images.txt"])

INIT_LR = 1e-4
NUM_EPOCHS = 25
BATCH_SIZE = 32
The second file contains the training code:
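import os

import cv2
import matplotlib.pyplot as plt
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Flatten, Input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.image import img_to_array, load_img

import config  # the configuration file shown above, saved as config.py

print("INFO - loading dataset...")
with open(config.ANNOTS_PATH) as f:
    rows = f.read().strip().split("\n")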
data = []
targets = []
filenames = []

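for row in rows:
    # each row: <image_id> <startX> <startY> <endX> <endY>
    (filename, startX, startY, endX, endY) = row.split()
    suffix = ".jpg"
    imagePath = os.path.sep.join([config.IMAGES_PATH, filename + suffix])

    # read the original image only to get its size for normalizing the box
    image = cv2.imread(imagePath)
    if image is None:
        raise FileNotFoundError(f"could not read {imagePath}")
    (h, w) = image.shape[:2]

    # scale pixel coordinates to [0, 1] relative to the original image size
    startX = float(startX) / w
    startY = float(startY) / h
    endX = float(endX) / w
    endY = float(endY) / h

    # load the image again, resized to the VGG16 input size
    image = load_img(imagePath, target_size=(224, 224))
    image = img_to_array(image)

    data.append(image)
    targets.append((startX, startY, endX, endY))
    # the original line was `filenames.append` without parentheses, which never
    # calls the method, so `filenames` stayed empty
    filenames.append(filename)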

data = np.array(data, dtype="float32") / 255.0
targets = np.array(targets, dtype="float32")

split = train_test_split(data, targets, filenames, test_size=0.10, random_state=42)

(trainImages, testImages) = split[:2]
(trainTargets, testTargets) = split[2:4]
(trainFilenames, testFilenames) = split[4:]

print("INFO - saving testing filenames...")
with open(config.TEST_FILENAMES, "w") as f:
    f.write("\n".join(testFilenames))

vgg = VGG16(weights="imagenet", include_top=False, input_tensor=Input(shape=(224, 224, 3)))
vgg.trainable = False

flatten = vgg.output
flatten = Flatten()(flatten)

bboxHead = Dense(128, activation="relu")(flatten)
bboxHead = Dense(64, activation="relu")(bboxHead)
bboxHead = Dense(32, activation="relu")(bboxHead)
bboxHead = Dense(4, activation="sigmoid")(bboxHead)

model = Model(inputs=vgg.input, outputs=bboxHead)

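# `lr` is a deprecated alias in TF 2.x; newer Keras versions only accept `learning_rate`
opt = Adam(learning_rate=config.INIT_LR)
model.compile(loss="mse", optimizer=opt)
model.summary()  # summary() prints the table itself and returns None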

print("INFO - training bounding box regressor...")
H = model.fit(
    trainImages, trainTargets, 
    validation_data=(testImages, testTargets), 
    batch_size=config.BATCH_SIZE, epochs=config.NUM_EPOCHS, verbose=1)

print("INFO - saving objects detector model...")
model.save(config.MODEL_PATH, save_format="h5")

N = config.NUM_EPOCHS
plt.style.use("ggplot")
plt.figure()
plt.plot(np.arange(0, N), H.history["loss"], label="train_loss")
plt.plot(np.arange(0, N), H.history["val_loss"], label="val_loss")
plt.title("Bounding box regression loss (training and validation)")
plt.xlabel("Epoch #")
plt.ylabel("Loss")
plt.legend(loc="lower left")
plt.savefig(config.PLOT_PATH)
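A note on the normalization step in the loop: dividing by the original width and height puts every target in [0, 1], which matches the `sigmoid` activation on the last `Dense(4)` layer. Because the values are fractions of the image size, they stay valid after `load_img` resizes the image to 224x224, and a prediction can be mapped back to pixels by multiplying by the original size. The helpers below (`normalize_box`, `denormalize_box`) are hypothetical names for illustration, and the 500x375 image size in the example is an assumption.

```python
def normalize_box(box, width, height):
    """Scale pixel coordinates (startX, startY, endX, endY) to the [0, 1] range."""
    start_x, start_y, end_x, end_y = box
    return (start_x / width, start_y / height, end_x / width, end_y / height)


def denormalize_box(box, width, height):
    """Map a [0, 1] box (e.g. a sigmoid prediction) back to integer pixel coordinates."""
    start_x, start_y, end_x, end_y = box
    return (
        round(start_x * width),
        round(start_y * height),
        round(end_x * width),
        round(end_y * height),
    )


# assumed original image size of 500x375 for the first sample annotation
box = (101, 174, 351, 349)
normalized = normalize_box(box, 500, 375)
restored = denormalize_box(normalized, 500, 375)
print(restored)  # (101, 174, 351, 349)
```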
When I run my code, I get the following error:
Traceback (most recent call last):
  File "/Users/username/Downloads/od/train.py", line 47, in <module>
    split = train_test_split(data, targets, filenames, test_size=0.10, random_state=42)
  File "/usr/local/lib/python3.9/site-packages/sklearn/model_selection/_split.py", line 2430, in train_test_split
    arrays = indexable(*arrays)
  File "/usr/local/lib/python3.9/site-packages/sklearn/utils/validation.py", line 433, in indexable
    check_consistent_length(*result)
  File "/usr/local/lib/python3.9/site-packages/sklearn/utils/validation.py", line 387, in check_consistent_length
    raise ValueError(
ValueError: Found input variables with inconsistent numbers of samples: [6430, 6430, 0]
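The three numbers in `[6430, 6430, 0]` are the lengths of `data`, `targets` and `filenames`. The last one is 0 because the loop line `filenames.append` only looks up the method and never calls it, so nothing is ever added. The sketch below reproduces that in plain Python; `check_same_length` is a hypothetical, simplified stand-in for the length check scikit-learn performs in `check_consistent_length`.

```python
filenames = []
for name in ["2007_000027", "2007_000032"]:
    filenames.append  # attribute lookup only: the method is never called
print(len(filenames))  # 0

fixed = []
for name in ["2007_000027", "2007_000032"]:
    fixed.append(name)  # the call actually adds the element
print(len(fixed))  # 2


def check_same_length(*arrays):
    """Simplified stand-in for sklearn's check_consistent_length (hypothetical)."""
    lengths = [len(a) for a in arrays]
    if len(set(lengths)) > 1:
        raise ValueError(f"inconsistent numbers of samples: {lengths}")
```

Calling `check_same_length(data, targets, filenames)` right before `train_test_split` fails in the same way while the bug is present, and passes once every list gets one entry per annotation line.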

I understand that the number of annotation lines is not equal to the number of images, but I can't change the data in the text file. Can someone help me correct this code so that it trains on my data properly?
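Having more annotation lines than images is not a problem by itself: the loop creates one sample per bounding box. One thing to watch, though, is that `train_test_split` shuffles rows, so boxes from the same image can end up in both the training and the validation set. Below is a minimal sketch of an image-level split, assuming `filenames` holds one image ID per row as in the loop above; `split_by_image` is a hypothetical helper, not part of any library.

```python
import random


def split_by_image(filenames, test_size=0.10, seed=42):
    """Return (train_idx, test_idx) so that all rows of one image land in the same split."""
    unique = sorted(set(filenames))  # sort first so the shuffle is reproducible
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_test = max(1, round(len(unique) * test_size))  # at least one test image
    test_images = set(unique[:n_test])
    train_idx = [i for i, f in enumerate(filenames) if f not in test_images]
    test_idx = [i for i, f in enumerate(filenames) if f in test_images]
    return train_idx, test_idx


rows = ["2007_000027", "2007_000032", "2007_000032", "2007_000129", "2007_000129"]
train_idx, test_idx = split_by_image(rows, test_size=0.34)
print(train_idx, test_idx)
```

The returned index lists can be used directly on the NumPy arrays, e.g. `data[train_idx]` and `targets[test_idx]`. scikit-learn also provides `GroupShuffleSplit` for the same purpose: `GroupShuffleSplit(n_splits=1, test_size=0.10, random_state=42).split(data, targets, groups=filenames)` yields one pair of train/test index arrays.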

Thanks!