Python Forum

Generate Test Data (.csv) Using Pandas
I want to generate test data in .csv format using Python.
Below is my script using pandas, but I'm stuck on randomly generating test data for a column called ACTIVE.
1. The ACTIVE column should only have the values 0 and 1.
2. Another issue: how can I have data in arrays of different lengths?

Thank you in advance.

import pandas as pd
import numpy as np
import random

x = str(input('Enter the date: '))
y = ['1', '0']
data = {'ACCOUNT': ['', 'Enabled', 'Disabled', 'Hold'],
        'CUSTOMER NAME': ['Test Name1', 'Test Name2']}

df = pd.DataFrame(data, columns=['ACCOUNT NUMBER', 'ACCOUNT', 'CUSTOMER NAME', 'ACTIVE', 'DATE'])
df['ACCOUNT NUMBER'] = 123  # (This needs to auto-increment)
df['ACTIVE'] = random.choice(y)  # (How can the ACTIVE column randomly take the value 0 or 1 for each row?)
df['DATE'] = x
df.to_csv(r'C:\Users\Test_User\Desktop\TestFolder\TestFile.csv', index=False)
Error:
Enter the date: 9/9/2020
Traceback (most recent call last):
  File "C:/Users/TestUser/PycharmProjects/TestDataAutomation/Forum.py", line 10, in <module>
    df = pd.DataFrame(data, columns=['ACCOUNT NUMBER', 'ACCOUNT', 'CUSTOMER NAME', 'ACTIVE', 'DATE'])
  File "C:\Users\TestUser\AppData\Roaming\Python\Python37\site-packages\pandas\core\frame.py", line 435, in __init__
    mgr = init_dict(data, index, columns, dtype=dtype)
  File "C:\Users\TestUser\AppData\Roaming\Python\Python37\site-packages\pandas\core\internals\construction.py", line 228, in init_dict
    index = extract_index(arrays[~missing])
  File "C:\Users\TestUser\AppData\Roaming\Python\Python37\site-packages\pandas\core\internals\construction.py", line 365, in extract_index
    raise ValueError("arrays must all be same length")
ValueError: arrays must all be same length

Process finished with exit code 1
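On the ACTIVE question: `random.choice(y)` returns a single value, and assigning a scalar to a DataFrame column broadcasts it to every row, so all rows end up with the same 0 or 1. A minimal sketch, assuming the goal is an independent 0/1 draw per row, uses NumPy's random Generator to draw one value per row. The row count and the seed are illustrative assumptions; the ACCOUNT values are the ones from the script above.

```python
import numpy as np
import pandas as pd

# Hypothetical number of test rows
n_rows = 5

# Seeded generator so the test data is reproducible (the seed value is arbitrary)
rng = np.random.default_rng(seed=42)

df = pd.DataFrame({'CUSTOMER NAME': [f'Test Name{i}' for i in range(1, n_rows + 1)]})

# integers(0, 2, size=...) draws one value from {0, 1} per row (upper bound is exclusive)
df['ACTIVE'] = rng.integers(0, 2, size=len(df))

# choice() picks one of the allowed ACCOUNT values per row, so no fixed-length list is needed
df['ACCOUNT'] = rng.choice(['', 'Enabled', 'Disabled', 'Hold'], size=len(df))

print(df)
```

Generating one value per row like this also sidesteps the length mismatch, since every column is built with `size=len(df)`. With only the standard library, `[random.choice(y) for _ in range(len(df))]` gives the same per-row behaviour.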
Pandas DataFrames can't have "shaggy bottoms": every column must be the same length. You can either pad the shorter columns with null values or use multiple DataFrames.
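A minimal sketch of the padding option: wrapping each list in a `pd.Series` before building the DataFrame lets pandas align the columns on their index, so the shorter column is filled with NaN instead of raising `ValueError`. The data is the same dictionary as in the original script.

```python
import pandas as pd

# Same ragged data as in the question: 4 ACCOUNT values, 2 CUSTOMER NAME values
data = {'ACCOUNT': ['', 'Enabled', 'Disabled', 'Hold'],
        'CUSTOMER NAME': ['Test Name1', 'Test Name2']}

# Each Series has its own index; the DataFrame uses the union of the indexes,
# so positions missing from the shorter Series become NaN
df = pd.DataFrame({key: pd.Series(values) for key, values in data.items()})

print(df)
```

When the result is written with `to_csv`, NaN cells are written as empty fields by default.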
Thanks for replying. I was checking whether there's another way to do it that I'm not aware of.

Another question: how can I auto-increment a column in a pandas DataFrame?
Say I have a column "ID" and set its initial value to 10; I want it to auto-increment down the rows: 10, 11, 12, etc.

Thanks!
It's not quite clear what you want to do. Do you want to increment all the values in a column? Increment the value in each row of a single column? Move from column to column? I'm not sure what you mean by auto-increment with regard to a column.
I have a column called "Account Number" that I want to auto-increment. Say the first Account Number is 1000; I want each following row in the same column to be one higher. For example:

Account Number
1000
1001
1002
1003
There are a couple of ways to do it. In pure pandas, I would try this:
start = 1000
df['autoinc'] = pd.RangeIndex(stop=df.shape[0])+start
It works without the "start"; I didn't have a DataFrame handy to test it with the start added.
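For reference, a minimal sketch of the version with the start offset, using a hypothetical three-row DataFrame. Adding an integer to a `RangeIndex` shifts every value, and assigning the result to a column uses its values row by row. A plain `range` starting at `start` produces the same column and is shown alongside for comparison.

```python
import pandas as pd

# Hypothetical test DataFrame with three rows
df = pd.DataFrame({'CUSTOMER NAME': ['Test Name1', 'Test Name2', 'Test Name3']})

start = 1000

# RangeIndex 0..n-1 shifted by the start value: 1000, 1001, 1002
df['ACCOUNT NUMBER'] = pd.RangeIndex(stop=df.shape[0]) + start

# Equivalent: one consecutive number per row, beginning at `start`
df['ID'] = range(start, start + len(df))

print(df)
```

Both forms derive the numbers from the current row count, so rows appended later would need their own numbers computed the same way.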