ValueError: Found input variables with inconsistent numbers of samples

saoko · Jun-16-2022, 06:59 PM

I'm trying to write a bounding box regression training script with Keras and TensorFlow for object detection. I have a dataset of 3153 images (.jpg) and a txt file of bounding box annotations with 6430 lines (some pictures have multiple bounding boxes). Here is part of the txt file, to show what it looks like:

2007_000027 101 174 351 349
2007_000032 180 195 229 213
2007_000032 189 26 238 44
2007_000129 1 74 462 272
2007_000129 19 252 487 334
2007_000170 91 3 206 43
2007_000170 28 4 372 461
2007_000272 71 25 500 304
2007_000323 3 277 375 500
2007_000323 3 12 375 305
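
To make the "multiple boxes per picture" point concrete, here is a minimal sketch that groups the sample lines above by image name. It assumes each line is `filename startX startY endX endY` separated by single spaces (the same order the training script unpacks); the name `boxes_by_image` is only illustrative.

```python
from collections import defaultdict

# Sample lines copied from the annotation file shown above.
sample = """\
2007_000027 101 174 351 349
2007_000032 180 195 229 213
2007_000032 189 26 238 44
2007_000129 1 74 462 272
2007_000129 19 252 487 334
2007_000170 91 3 206 43
2007_000170 28 4 372 461
2007_000272 71 25 500 304
2007_000323 3 277 375 500
2007_000323 3 12 375 305
"""

# Map each image name to the list of boxes annotated for it.
boxes_by_image = defaultdict(list)
for line in sample.strip().split("\n"):
    filename, startX, startY, endX, endY = line.split(" ")
    boxes_by_image[filename].append((int(startX), int(startY), int(endX), int(endY)))

# Show how many boxes each image has.
for filename, boxes in boxes_by_image.items():
    print(filename, len(boxes))
```

In this sample, 10 annotation lines cover 6 distinct images, which is the same kind of mismatch as 6430 lines for 3153 images.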

I created a configuration file, which stores the paths to some files and the training settings:

import os

BASE_PATH = "dataset"
IMAGES_PATH = os.path.sep.join([BASE_PATH, "images"])
ANNOTS_PATH = os.path.sep.join([BASE_PATH, "bboxes.txt"])

BASE_OUTPUT = "output"
MODEL_PATH = os.path.sep.join([BASE_OUTPUT, "detector.h5"])
PLOT_PATH = os.path.sep.join([BASE_OUTPUT, "plot.png"])
TEST_FILENAMES = os.path.sep.join([BASE_OUTPUT, "test_images.txt"])

INIT_LR = 1e-4
NUM_EPOCHS = 25
BATCH_SIZE = 32

The second file contains the code that trains the model on my data:

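# Imports the script needs. The module name "config" is an assumption:
# it stands for the configuration file shown above.
import os

import cv2
import matplotlib.pyplot as plt
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Flatten, Input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.image import img_to_array, load_img

import config

print("INFO - loading dataset...")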
rows = open(config.ANNOTS_PATH).read().strip().split("\n")
data = []
targets = []
filenames = []

for row in rows: 
    row = row.split(' ')
    (filename, startX, startY, endX, endY) = row
    suffix = ".jpg"
    imagePath = os.path.sep.join([config.IMAGES_PATH, filename+suffix])
    image = cv2.imread(imagePath)
    (h, w) = image.shape[:2]

    startX = float(startX) / w
    startY = float(startY) / h
    endX = float(endX) / w
    endY = float(endY) / h

    image = load_img(imagePath, target_size=(224, 224))
    image = img_to_array(image)

    data.append(image)
    targets.append((startX, startY, endX, endY))
    filenames.append

data = np.array(data, dtype="float32") / 255.0
targets = np.array(targets, dtype="float32")

split = train_test_split(data, targets, filenames, test_size=0.10, random_state=42)

(trainImages, testImages) = split[:2]
(trainTargets, testTargets) = split[2:4]
(trainFilenames, testFilenames) = split[4:]

print("INFO - saving testing filenames...")
with open(config.TEST_FILENAMES, "w") as f:
    f.write("\n".join(testFilenames))

vgg = VGG16(weights="imagenet", include_top=False, input_tensor=Input(shape=(224, 224, 3)))
vgg.trainable = False

flatten = vgg.output
flatten = Flatten()(flatten)

bboxHead = Dense(128, activation="relu")(flatten)
bboxHead = Dense(64, activation="relu")(bboxHead)
bboxHead = Dense(32, activation="relu")(bboxHead)
bboxHead = Dense(4, activation="sigmoid")(bboxHead)

model = Model(inputs=vgg.input, outputs=bboxHead)

opt = Adam(learning_rate=config.INIT_LR)
model.compile(loss="mse", optimizer=opt)
print(model.summary())

print("INFO - training bounding box regressor...")
H = model.fit(
    trainImages, trainTargets, 
    validation_data=(testImages, testTargets), 
    batch_size=config.BATCH_SIZE, epochs=config.NUM_EPOCHS, verbose=1)

print("INFO - saving objects detector model...")
model.save(config.MODEL_PATH, save_format="h5")

N = config.NUM_EPOCHS
plt.style.use("ggplot")
plt.figure()
plt.plot(np.arange(0, N), H.history["loss"], label="train_loss")
plt.plot(np.arange(0, N), H.history["val_loss"], label="val_loss")
plt.title("Bounding box regression loss on training set")
plt.xlabel("Epoch #")
plt.ylabel("Loss")
plt.legend(loc="lower left")
plt.savefig(config.PLOT_PATH)

When I run my code, I get the following error:

Traceback (most recent call last):
  File "/Users/username/Downloads/od/train.py", line 47, in <module>
    split = train_test_split(data, targets, filenames, test_size=0.10, random_state=42)
  File "/usr/local/lib/python3.9/site-packages/sklearn/model_selection/_split.py", line 2430, in train_test_split
    arrays = indexable(*arrays)
  File "/usr/local/lib/python3.9/site-packages/sklearn/utils/validation.py", line 433, in indexable
    check_consistent_length(*result)
  File "/usr/local/lib/python3.9/site-packages/sklearn/utils/validation.py", line 387, in check_consistent_length
    raise ValueError(
ValueError: Found input variables with inconsistent numbers of samples: [6430, 6430, 0]

I understand that the number of lines is not equal to the number of images, but I can't change the data in the txt file. Can someone help me correct this code so that I can train on my data properly?

Thanks!
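
A note on reading the error: the three numbers in `[6430, 6430, 0]` are the lengths of `data`, `targets`, and `filenames`, in the order they are passed to `train_test_split`. The first two already match, because the loop loads the image once per annotation line, so the lines-versus-images difference does not appear to be what triggers this error. The `0` points at `filenames`: the loop writes `filenames.append` without parentheses or an argument, which only references the method and never calls it. The sketch below reproduces that on three of the sample lines; the names `filenames_buggy` and `filenames_fixed` are illustrative only.

```python
# Three annotation lines taken from the sample in the post.
rows = [
    "2007_000027 101 174 351 349",
    "2007_000032 180 195 229 213",
    "2007_000032 189 26 238 44",
]

targets = []
filenames_buggy = []
filenames_fixed = []

for row in rows:
    filename, startX, startY, endX, endY = row.split(" ")
    targets.append((float(startX), float(startY), float(endX), float(endY)))

    # As in the posted script: the method is looked up but never called,
    # so nothing is added to the list.
    filenames_buggy.append

    # With the call and its argument, one filename is stored per line.
    filenames_fixed.append(filename)

print(len(targets), len(filenames_buggy), len(filenames_fixed))
```

If that is indeed the cause, changing the line in the training script to `filenames.append(filename)` should make all three lengths equal to 6430, which is what `train_test_split` checks for.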
