My code is:
Dataset<Row> mainData = df.select("data.*").filter("data.eventdesc='logout'");
Dataset<Row> groupByData = mainData.groupBy("ipaddress1").count().filter("count > 1");
mainData.filter(mainData.col("ipaddress1").contains(groupByData.col("ipaddress1")));
The mainData output is:
+-------+-----------+-------------+----------+---------------------+-----------+
|id     |resource id|resource name|event-desc|event-date           |ipaddress1 |
+-------+-----------+-------------+----------+---------------------+-----------+
|2010001|119        |Netopia      |logout    |+56975-05-07 23:01:37|25:34:21:44|
|2010001|119        |Netopia      |logout    |+56975-05-07 23:01:37|25:34:21:44|
|2010001|119        |Netopia      |logout    |+56975-05-07 23:01:37|25:34:21:45|
|2010001|119        |Netopia      |logout    |+56975-05-07 23:01:37|25:34:21:45|
|2010001|119        |Netopia      |logout    |+56975-05-07 23:01:37|25:34:21:44|
|2010001|119        |Netopia      |logout    |+56975-05-07 23:01:37|25:34:21:46|
+-------+-----------+-------------+----------+---------------------+-----------+
The groupByData output is:
+-----------+-----+
|ipaddress1 |count|
+-----------+-----+
|25:34:21:45|2    |
|25:34:21:44|3    |
+-----------+-----+
I need to keep only the rows of mainData whose ipaddress1 also appears in groupByData. The code above is my attempt, but it does not work as expected. Can anybody suggest a possible solution?
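
One possible direction, offered as a sketch rather than a definitive fix: a filter on mainData is evaluated row by row against mainData's own columns, so it cannot look up values held in a different Dataset such as groupByData. Matching rows across two Datasets is normally expressed as a join, and a left semi join keeps the left-side rows that have a match without adding any right-side columns. The class name RepeatedIpFilter, the helper keepRepeatedIps, the renamed column repeated_ip, and the two-column sample schema are all hypothetical names chosen for illustration; the example assumes Spark SQL 2.x or later on the classpath and a local master.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class RepeatedIpFilter {

    // Hypothetical helper: keeps only the rows of mainData whose
    // ipaddress1 value occurs more than once in mainData.
    static Dataset<Row> keepRepeatedIps(Dataset<Row> mainData) {
        // Same aggregation as in the question, reduced to the key column.
        // The key is renamed because both Datasets derive from mainData,
        // which could otherwise make the join condition ambiguous.
        Dataset<Row> repeatedIps = mainData.groupBy("ipaddress1").count()
                .filter("count > 1")
                .select("ipaddress1")
                .withColumnRenamed("ipaddress1", "repeated_ip");

        // Left semi join: return mainData rows that have a matching key,
        // with mainData's columns only.
        return mainData.join(
                repeatedIps,
                mainData.col("ipaddress1").equalTo(repeatedIps.col("repeated_ip")),
                "left_semi");
    }

    public static void main(String[] args) {
        // Assumption: a local session is enough to try the idea out.
        SparkSession spark = SparkSession.builder()
                .master("local[*]")
                .appName("RepeatedIpFilter")
                .getOrCreate();

        // Reduced stand-in for the question's data: id and ipaddress1 only.
        StructType schema = new StructType()
                .add("id", DataTypes.StringType)
                .add("ipaddress1", DataTypes.StringType);
        List<Row> rows = Arrays.asList(
                RowFactory.create("2010001", "25:34:21:44"),
                RowFactory.create("2010001", "25:34:21:44"),
                RowFactory.create("2010001", "25:34:21:45"),
                RowFactory.create("2010001", "25:34:21:45"),
                RowFactory.create("2010001", "25:34:21:44"),
                RowFactory.create("2010001", "25:34:21:46"));
        Dataset<Row> mainData = spark.createDataFrame(rows, schema);

        keepRepeatedIps(mainData).show(false);
        spark.stop();
    }
}
```

With the six sample rows, the five rows for 25:34:21:44 and 25:34:21:45 should remain and the single 25:34:21:46 row should be dropped. Note also that filter returns a new Dataset and does not modify mainData, so the result has to be assigned or used directly.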