What is the correct way to insert into the DuckDB JSON type via the Python API? - Stack Overflow

admin2025-04-16  6

The DuckDB Python API docs have examples of how to read from a JSON file, but not doesn't explicitly state how to use the JSON type.

For example, given a schema:

CREATE TABLE test (obj JSON)

Given an arbitrary python object (e.g. nested dict or list) called obj, which of the following is recommended:

  1. conn.execute("INSERT INTO test VALUES (?)", (obj,))
  2. conn.execute("INSERT INTO test VALUES (?)", (json.dumps(obj),))

Both appear to work, none seem prescribed or proscribed by the docs, so it might be tempting use the first approach -- it's more direct, potentially more efficient.

However, it seems that the 2nd approach is more reliable, because the first approach seems to be converting to STRUCTS behind the scenes.

This can be seen with this test:

@pytest.mark.parametrize("obj", [
    {"a": 1, "b": 2},
    [{"a": 1}, {"a": 2}],
    [
            {
                "a": 1,
            },
            {
                "b": 2,
            },
    ],
    {},
])
@pytest.mark.parametrize("pre_serialize", [True, False])
def test_duckdb(obj, pre_serialize):
    import duckdb
    conn = duckdb.connect(":memory:")
    conn.execute("CREATE TABLE test (obj JSON)")
    if pre_serialize:
        obj_to_insert = json.dumps(obj)
    else:
        obj_to_insert = obj
    conn.execute("INSERT INTO test VALUES (?)", (obj_to_insert,))
    result = conn.execute("SELECT * FROM test").fetchone()
    retrieved_obj = json.loads(result[0])
    assert retrieved_obj == obj

The test fails with (False-obj2), with the following error:

>       conn.execute("INSERT INTO test VALUES (?)", (obj_to_insert,))
E       duckdb.duckdb.TypeMismatchException: Mismatch Type Error: Type STRUCT(b INTEGER) does not match with STRUCT(a INTEGER). Cannot cast STRUCTs - element "b" in source struct was not found in target struct

Note that obj1 (the 2nd example) passes. The difference between obj1 and obj2 is that the keys in the 2nd array element differ from the first. Judging by the error message, duckdb appears to be converting each element of the list to a struct, inducing a schema after the first element, and structs have fixed schemas.

Is this expected behavior, or a bug? Presumably the safest way to proceed is to pre-serialize, but is this guaranteed in the future?

  • duckdb python module version: 1.1.3
  • python3.11
转载请注明原文地址:http://anycun.com/QandA/1744782002a87548.html