amazon web services - AWS push_down_predicate not working with DynamoDb - Stack Overflow

I'm using AWS Glue to read data from a DynamoDB table where the sort key sk (string) is a timestamp in the format 2024-04-10T00:00:00.000000+00:00. I'm trying to apply a push_down_predicate to filter records within a specific time range, but I'm getting unexpected results, including timestamps outside the specified range.

What I've Tried:

  1. DynamoDB Query: When I query directly from DynamoDB using the same timestamp format, the results are as expected.
  2. AWS Glue Job:
dynamic_frame = glueContext.create_dynamic_frame.from_catalog(
    database="my_database",
    table_name="my_dynamodb_table",
    push_down_predicate=f"sk >= '{start_timestamp}' AND sk < '{end_timestamp}'"
)
Here, `start_timestamp` and `end_timestamp` match the format in DynamoDB.

Observed Behavior: Instead of getting filtered results within the specified timestamp range, I'm seeing a mix of timestamps, including many outside the range.

Question:

Why isn't the push_down_predicate filtering the DynamoDB data as expected through AWS Glue, and how can I correctly apply this filter to get only the timestamps within the specified range?

Asked Jan 30 at 13:23 by Parag Jadhav; edited Jan 30 at 20:33 by fedonev.
1 Answer

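The AWS Glue DynamoDB connector does not support push-down predicate filtering, which would explain why the job returns rows outside the requested range: the `push_down_predicate` argument is not applied to the DynamoDB read. See the connector documentation: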

https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-connect-dynamodb-home.html
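
One possible workaround is to filter after the read instead of relying on the predicate. The sketch below is only an illustration, not a tested Glue job: `in_range` and the sample `records` are hypothetical names invented for this example. The comparison is a plain string comparison, which matches chronological order only on the assumption that every `sk` value uses the same fixed-width format and the same `+00:00` offset, as in the question.

```python
def in_range(record, start_timestamp, end_timestamp):
    """Return True when the record's sort key falls in [start, end).

    Hypothetical helper: compares the timestamps as strings, which is
    only valid while all values share one fixed-width ISO-8601 format
    and one UTC offset.
    """
    sk = record["sk"]
    return start_timestamp <= sk < end_timestamp


# Sample rows standing in for items read from the table (made-up data).
records = [
    {"sk": "2024-04-09T23:59:59.999999+00:00"},
    {"sk": "2024-04-10T00:00:00.000000+00:00"},
    {"sk": "2024-04-10T12:30:00.000000+00:00"},
    {"sk": "2024-04-11T00:00:00.000000+00:00"},
]

start_timestamp = "2024-04-10T00:00:00.000000+00:00"
end_timestamp = "2024-04-11T00:00:00.000000+00:00"

# Keep only the rows whose sort key is inside the half-open range.
kept = [r["sk"] for r in records if in_range(r, start_timestamp, end_timestamp)]
print(kept)

# Inside a Glue job, the same predicate could be applied to the
# DynamicFrame after it is loaded (assumes `dynamic_frame` from the question):
#
#   from awsglue.transforms import Filter
#   filtered = Filter.apply(
#       frame=dynamic_frame,
#       f=lambda r: in_range(r, start_timestamp, end_timestamp),
#   )
```

Note that this filters rows after Glue has already read them, so it fixes the result set but does not reduce the amount of data read from DynamoDB.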
