The DuckDB Python API docs have examples of how to read from a JSON file, but not doesn't explicitly state how to use the JSON type.
For example, given a schema:
CREATE TABLE test (obj JSON)
Given an arbitrary python object (e.g. nested dict
or list
) called obj
, which of the following is recommended:
conn.execute("INSERT INTO test VALUES (?)", (obj,))
conn.execute("INSERT INTO test VALUES (?)", (json.dumps(obj),))
Both appear to work, none seem prescribed or proscribed by the docs, so it might be tempting use the first approach -- it's more direct, potentially more efficient.
However, it seems that the 2nd approach is more reliable, because the first approach seems to be converting to STRUCTS behind the scenes.
This can be seen with this test:
@pytest.mark.parametrize("obj", [
{"a": 1, "b": 2},
[{"a": 1}, {"a": 2}],
[
{
"a": 1,
},
{
"b": 2,
},
],
{},
])
@pytest.mark.parametrize("pre_serialize", [True, False])
def test_duckdb(obj, pre_serialize):
import duckdb
conn = duckdb.connect(":memory:")
conn.execute("CREATE TABLE test (obj JSON)")
if pre_serialize:
obj_to_insert = json.dumps(obj)
else:
obj_to_insert = obj
conn.execute("INSERT INTO test VALUES (?)", (obj_to_insert,))
result = conn.execute("SELECT * FROM test").fetchone()
retrieved_obj = json.loads(result[0])
assert retrieved_obj == obj
The test fails with (False-obj2)
, with the following error:
> conn.execute("INSERT INTO test VALUES (?)", (obj_to_insert,))
E duckdb.duckdb.TypeMismatchException: Mismatch Type Error: Type STRUCT(b INTEGER) does not match with STRUCT(a INTEGER). Cannot cast STRUCTs - element "b" in source struct was not found in target struct
Note that obj1
(the 2nd example) passes. The difference between obj1
and obj2
is that the keys in the 2nd array element differ from the first. Judging by the error message, duckdb appears to be converting each element of the list to a struct, inducing a schema after the first element, and structs have fixed schemas.
Is this expected behavior, or a bug? Presumably the safest way to proceed is to pre-serialize, but is this guaranteed in the future?
1.1.3