Python Forum
Generate Test data (.csv) using Pandas
#1
I want to generate test data in .csv format using Python.
Below is my script using pandas, but I'm stuck on randomly generating test data for a column called ACTIVE.
1. The ACTIVE column should contain only the values 0 and 1.
2. Another issue: how can I build the data from arrays of varying length?

Thank you in advance.

import pandas as pd
import numpy as np
import random

x = str(input('Enter the date: '))
y = ['1', '0']
data = {'ACCOUNT': ['', 'Enabled', 'Disabled', 'Hold'],
        'CUSTOMER NAME': ['Test Name1', 'Test Name2']}

df = pd.DataFrame(data, columns=['ACCOUNT NUMBER', 'ACCOUNT', 'CUSTOMER NAME', 'ACTIVE', 'DATE'])
df['ACCOUNT NUMBER'] = 123  # this needs to auto-increment
df['ACTIVE'] = random.choice(y)  # how can ACTIVE randomly take the value 0 or 1 in each row?
df['DATE'] = x
df.to_csv(r'C:\Users\Test_User\Desktop\TestFolder\TestFile.csv', index=False)
Error:
Enter the date: 9/9/2020
Traceback (most recent call last):
  File "C:/Users/TestUser/PycharmProjects/TestDataAutomation/Forum.py", line 10, in <module>
    df = pd.DataFrame(data, columns=['ACCOUNT NUMBER', 'ACCOUNT', 'CUSTOMER NAME', 'ACTIVE', 'DATE'])
  File "C:\Users\ TestUser\AppData\Roaming\Python\Python37\site-packages\pandas\core\frame.py", line 435, in __init__
    mgr = init_dict(data, index, columns, dtype=dtype)
  File "C:\Users\ TestUser\AppData\Roaming\Python\Python37\site-packages\pandas\core\internals\construction.py", line 228, in init_dict
    index = extract_index(arrays[~missing])
  File "C:\Users\ TestUser\AppData\Roaming\Python\Python37\site-packages\pandas\core\internals\construction.py", line 365, in extract_index
    raise ValueError("arrays must all be same length")
ValueError: arrays must all be same length

Process finished with exit code 1
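
Editor's note on the ACTIVE column: in the script above, random.choice(y) is evaluated once, so every row would receive the same value. A minimal sketch of drawing one random 0 or 1 per row follows. The sample frame, the row count, and the seed are assumptions for illustration only; they are not part of the original script.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame; the column name follows the original script.
df = pd.DataFrame(
    {'CUSTOMER NAME': ['Test Name1', 'Test Name2', 'Test Name3', 'Test Name4']}
)

# Seeded generator so the test data is reproducible (the seed is arbitrary).
rng = np.random.default_rng(42)

# Draw one integer per row from the half-open interval [0, 2), i.e. 0 or 1.
df['ACTIVE'] = rng.integers(0, 2, size=len(df))

print(df)
```

Dropping the seed gives different values on each run, which may or may not be what you want for test data.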
#2
Pandas DataFrames can't have "shaggy bottoms": all columns must be the same length. You can either pad the short columns with null values (NaN or None) or use multiple DataFrames.
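
As a rough sketch of the padding option, one way is to wrap each list in a pd.Series before building the frame, so pandas aligns the columns on the index and fills the gaps with NaN. The dictionary below is copied from the original script; the variable names in the comprehension are just for illustration.

```python
import pandas as pd

data = {
    'ACCOUNT': ['', 'Enabled', 'Disabled', 'Hold'],
    'CUSTOMER NAME': ['Test Name1', 'Test Name2'],
}

# Wrapping each list in a Series lets pandas align on the index and
# fill the missing positions of the shorter column with NaN.
df = pd.DataFrame({name: pd.Series(values) for name, values in data.items()})

print(df.shape)                          # (4, 2)
print(df['CUSTOMER NAME'].isna().sum())  # 2
```

By default, to_csv writes NaN as an empty field, so the padded cells come out blank in the .csv file.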
#3
Thanks for replying. I was checking whether there's another way to do it that I'm not aware of.

Another question: how can I auto-increment a column in a pandas DataFrame?
If I have a column, say "ID", and I set its initial value to 10, I want it to auto-increment: 10, 11, 12, etc.

Thanks!
#4
Not quite clear on what you want to do. You want to increment all values in a column? Increment the values in each row of a single column? Move from column to column? Not sure what you mean by autoincrement in regards to a column.
#5
I have a column called "Account Number" and I want it to auto-increment. Say the Account Number starts at 1000; I want the value to increase by one for each subsequent row in the same column. Example below:

Account Number
1000
1001
1002
1003
#6
There are a couple of ways to do it. A purely pandas approach would be:
start = 1000
df['autoinc'] = pd.RangeIndex(stop=df.shape[0]) + start
This works without the "start" offset; I did not have a dataframe handy to test it with the start added.
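
For anyone who wants to try it, here is a small sketch that applies the RangeIndex line with the offset and, next to it, an alternative using np.arange. The sample frame is hypothetical; the column names come from earlier posts in this thread.

```python
import numpy as np
import pandas as pd

# Hypothetical four-row frame to number.
df = pd.DataFrame(
    {'CUSTOMER NAME': ['Test Name1', 'Test Name2', 'Test Name3', 'Test Name4']}
)

start = 1000

# The RangeIndex approach from the post above, with the offset applied.
df['autoinc'] = pd.RangeIndex(stop=df.shape[0]) + start

# An alternative: NumPy's arange builds the same sequence directly.
df['ACCOUNT NUMBER'] = np.arange(start, start + len(df))

print(df['ACCOUNT NUMBER'].tolist())  # [1000, 1001, 1002, 1003]
```

Both columns should hold the same values; which form to use is mostly a matter of taste.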

