Summary: I want to be able to create list logic that can read any dataframe (varying column headers, data types etc) and assign it into the following structure list
[{'Name': column header 1 name, 'Type': column header 1 type, 'Name': column header 2 name, 'Type': column header 2 name}]
I have a sample table df_stack_exchange
data_stack_exchange = {'store': ['A','B', 'B', 'C', 'C', 'C', 'D', 'D', 'D', 'D'],
'worker': [1,1,2,1,2,3,1,2,3,4],
'boxes': [105, 90, 100, 80, 10, 200, 70, 210, 50, 0]}
df_stack_exchange = pandas.DataFrame(data_stack_exchange)
And want to create the following. I know it is not a functional dictionary with the duplicated keys but its for another defined purpose that requires this rigid structure
[{'Name': 'store', 'Type': 'object', 'Name': 'worker', 'Type': 'int64', 'Name': 'boxes', 'Type': 'int64'}]
Edit 1: The end state is I want to automate filling out the InputColumns on the create_data_set boto3 command without having to type out each column of my s3 file: .html
'InputColumns': [
{
'Name': 'string',
'Type': 'STRING'|'INTEGER'|'DECIMAL'|'DATETIME'|'BIT'|'BOOLEAN'|'JSON',
'SubType': 'FLOAT'|'FIXED'
},
]
Summary: I want to be able to create list logic that can read any dataframe (varying column headers, data types etc) and assign it into the following structure list
[{'Name': column header 1 name, 'Type': column header 1 type, 'Name': column header 2 name, 'Type': column header 2 name}]
I have a sample table df_stack_exchange
data_stack_exchange = {'store': ['A','B', 'B', 'C', 'C', 'C', 'D', 'D', 'D', 'D'],
'worker': [1,1,2,1,2,3,1,2,3,4],
'boxes': [105, 90, 100, 80, 10, 200, 70, 210, 50, 0]}
df_stack_exchange = pandas.DataFrame(data_stack_exchange)
And want to create the following. I know it is not a functional dictionary with the duplicated keys but its for another defined purpose that requires this rigid structure
[{'Name': 'store', 'Type': 'object', 'Name': 'worker', 'Type': 'int64', 'Name': 'boxes', 'Type': 'int64'}]
Edit 1: The end state is I want to automate filling out the InputColumns on the create_data_set boto3 command without having to type out each column of my s3 file: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/quicksight/client/create_data_set.html
'InputColumns': [
{
'Name': 'string',
'Type': 'STRING'|'INTEGER'|'DECIMAL'|'DATETIME'|'BIT'|'BOOLEAN'|'JSON',
'SubType': 'FLOAT'|'FIXED'
},
]
Another dataframe might facilitate subsequent operations more easily.
df_new = df_stack_exchange.dtypes.to_frame('Type').reset_index()
df_new.rename(columns={'index':'Name'}, inplace=True)
df_new
Name Type
0 store object
1 worker int64
2 boxes int64
Bit late to the party, but what you are really missing is a mapping from the pandas columns' types to the types you provide in your edit.
I've written the below dictionary to do that, and to build a string like you specified. You'll want to modify it to add SubType where necessary, and may want to modify it to construct a json instead of a string.
mapping_dict = {
"object": "STRING",
"string": "STRING",
"boolean": "BOOLEAN",
"category": "STRING",
"int8": "INTEGER",
"int16": "INTEGER",
"int32": "INTEGER",
"int64": "INTEGER",
"uint8": "INTEGER",
"uint16": "INTEGER",
"uint32": "INTEGER",
"uint64": "INTEGER",
"float16": "DECIMAL",
"float32": "DECIMAL",
"float64": "DECIMAL",
"datetime64": "DATETIME",
"timedelta64": "STRING",
"bool": "BIT",
"str": "STRING",
"unicode": "STRING",
"bytes": "STRING",
"dict": "JSON",
"list": "JSON",
}
result_string=""
for name, dtype in df_stack_exchange.dtypes.to_dict().items():
result_string +=(f'{{Name: {name}, Type: {mapping_dict[str(dtype)]}}}, ')
print(result_string)