postgresql - Update Postgres Pedigree process to prefix child decendents with @ character - Stack Overflow

admin2025-04-29  4

I had assistance in creating a Postgres Pedigree process for plant breeding in 2012 via PostgreSQL Recursive via 2 parent/child tables. The pedigree parent/child hierarchy is defined by the family/plant ids. Each plant "links" to the family table via the "id_family" foreign key. Plants with Root level parents have an ID of 1 which maps to 'NA' and have the Family is_root value set to 'Y'.  The problem is that you can't determine the actual child decedents in the path output.

The Parent/Child Mapping Use Case: Determine the F2+ parent (non-root level) to child relationships by associating the ptst_plant id_family value that maps back to the previous family, then prefix the child plant with the @ character to show that it is the descendant plant.

Parent/Child Mapping Workflow via ptst-pedigree.sql process:

  • Select the set of F2+ parent to child relationships that have ptst_family.is_root = 'N'
  • Select the set of ptst_family male/female_plant_ids that map to the ptst_plant.id_family, that also map back to the previously listed Fn family.
  • Example: ptst_family family5AEAG: id=9, family_key=family5AEAG, male/female_plant_id=10,11, is_root='N', maps to ptst_plant.id 10,11.
  • However, ptst_plant.id=5 maps to the previously listed family3AE family for the Pedigree ID=5 record, and ptst_plant.id=6 maps to the previously listed family4AG family for the Pedigree ID=6 record
  • Therefore, prefix the f1AE child plant_key with the @ character to show that it is the decedent plant of family3AE.  Also, prefix the m1AG child plant_key with the @ character to show that it is the decedent plant of family4AG. Repeat this process for other parent/child pairs.
  • Another option might be have a separate update script to change the Pedigree table Path string searching for the F2+ families that have plant keys that map back to the previous family then prefix the applicable plant key with the "@" character.

Examples below with the desired @ character plant prefix:

  • F1 family1AA=(f1A x m2A) >F2 family3AE=(@f7A x m1E) >F3 family5AEAG=(@f1AE x m1AG)

  • F1 family2AA=(f3A x m4A) >F2 family4AG=(@f8A x m1G) >F3 family5AEAG=(f1AE x @m1AG)

Below are the test tables/data using Postgresql 16.6:

DROP TABLE if exists ptst_family CASCADE;
DROP TABLE if exists ptst_plant CASCADE;
DROP TABLE if exists ptst_pedigree CASCADE;

CREATE TABLE ptst_family (  
  id serial,
  family_key VARCHAR(20) UNIQUE,
  female_plant_id INTEGER NOT NULL DEFAULT 1,  
  male_plant_id INTEGER NOT NULL DEFAULT 1,  
  is_root VARCHAR NOT NULL DEFAULT '0',  -- Root level familes are always the first level pedigree (F1)
  CONSTRAINT ptst_family_pk PRIMARY KEY (id)
);

CREATE TABLE ptst_plant (
  id serial,
  plant_key VARCHAR(20) UNIQUE,
  id_family INTEGER NOT NULL,  
  CONSTRAINT ptst_plant_pk PRIMARY KEY (id),
  CONSTRAINT ptst_plant_id_family_fk FOREIGN KEY(id_family) REFERENCES ptst_family(id) 
);

CREATE TABLE ptst_pedigree (
id serial,
pedigree_key VARCHAR NOT NULL,
path VARCHAR NOT NULL UNIQUE
);

-- FAMILY Table DATA:
insert into ptst_family (id, family_key, female_plant_id, male_plant_id, is_root) VALUES (1,'NA',1,1,'Y'); -- Default place holder record
-- F1 Root level Alba families
insert into ptst_family (id, family_key, female_plant_id, male_plant_id, is_root) VALUES (2,'family1AA',2,3,'Y');
insert into ptst_family (id, family_key, female_plant_id, male_plant_id, is_root) VALUES (3,'family2AA',4,5,'Y');
-- F2 Hybrid Families
insert into ptst_family (id, family_key, female_plant_id, male_plant_id, is_root) VALUES (5,'family3AE',6,8,'N'); 
insert into ptst_family (id, family_key, female_plant_id, male_plant_id, is_root) VALUES (6,'family4AG',7,9,'N');
-- F3 Double Hybrid family:
insert into ptst_family (id, family_key, female_plant_id, male_plant_id, is_root) VALUES (9,'family5AEAG',10,11,'N');
      
-- PLANT Table DATA:
insert into ptst_plant (id, plant_key,  id_family) VALUES (1,'NA',1);   -- Default place holder record
insert into ptst_plant (id, plant_key,  id_family) VALUES (2,'f1A',1); 
insert into ptst_plant (id, plant_key,  id_family) VALUES (3,'m2A',1);
insert into ptst_plant (id, plant_key,  id_family) VALUES (4,'f3A',1);
insert into ptst_plant (id, plant_key,  id_family) VALUES (5,'m4A',1);
-- Female Alba progeny:
insert into ptst_plant (id, plant_key,  id_family) VALUES (6,'f7A',2);
insert into ptst_plant (id, plant_key,  id_family) VALUES (7,'f8A',3);
-- Male/female Aspen Root level parents:
insert into ptst_plant (id, plant_key,  id_family) VALUES (8,'m1E',1); 
insert into ptst_plant (id, plant_key,  id_family) VALUES (9,'m1G',1);  
-- F1 Hybrid progeny:
insert into ptst_plant (id, plant_key,  id_family) VALUES (10,'f1AE',5); 
insert into ptst_plant (id, plant_key,  id_family) VALUES (11,'m1AG',6);  

Below is the pedigree.sql script developed in 2012:

WITH RECURSIVE expanded_family AS (
    SELECT
        f.id,
        f.family_key,
        pf.id_family    pf_family,
        pm.id_family    pm_family,
        f.is_root,
        f.family_key || '=(' || pf.plant_key || ' x ' || pm.plant_key || ')' pretty_print
    FROM ptst_family f
        JOIN ptst_plant pf ON f.female_plant_id = pf.id
        JOIN ptst_plant pm ON f.male_plant_id = pm.id
),
search_tree AS
(
    SELECT
        f.id,
        f.family_key,
        f.id family_root,
        1 depth,
        '>F1 ' || f.pretty_print  path
    FROM expanded_family f
    WHERE
        f.id != 1
        AND f.is_root = 'Y' 
    UNION ALL
    SELECT
        f.id,
        f.family_key,
        st.family_root,
        st.depth + 1,
        st.path || ' >F' || st.depth+1 || ' ' || f.pretty_print
    FROM search_tree st
        JOIN expanded_family f
            ON f.pf_family = st.id
            OR f.pm_family = st.id
    WHERE
        f.id <> 1
)
SELECT
    family_key,
    path
FROM
(
    SELECT
        family_key,
        rank() over (partition by family_root order by depth desc),
        path
    FROM search_tree
) AS ranked
--  WHERE rank = 1
WHERE path NOT LIKE '%(N/A x N/A)%' -- Remove rows with no filial output
ORDER BY family_key, path

Below is my Pedigree table desired output with the "@" prefixed child decedents:

ID  | Pedigree_key | Path
1 family2AA >F1 family2AA=(f3A x m4A)
2 family3AE >F1 family1AA=(f1A x m2A) >F2 family3AE=(@f7A x m1E)
3 family4AG >F1 family2AA=(f3A x m4A) >F2 family4AG=(@f8A x m1G)
4 family5AEAG >F1 family1AA=(f1A x m2A) >F2 family3AE=(@f7A x m1E) >F3 family5AEAG=(@f1AE x m1AG)
5 family5AEAG >F1 family2AA=(f3A x m4A) >F2 family4AG=(@f8A x m1G) >F3 family5AEAG=(f1AE x @m1AG)

I had assistance in creating a Postgres Pedigree process for plant breeding in 2012 via PostgreSQL Recursive via 2 parent/child tables. The pedigree parent/child hierarchy is defined by the family/plant ids. Each plant "links" to the family table via the "id_family" foreign key. Plants with Root level parents have an ID of 1 which maps to 'NA' and have the Family is_root value set to 'Y'.  The problem is that you can't determine the actual child decedents in the path output.

The Parent/Child Mapping Use Case: Determine the F2+ parent (non-root level) to child relationships by associating the ptst_plant id_family value that maps back to the previous family, then prefix the child plant with the @ character to show that it is the descendant plant.

Parent/Child Mapping Workflow via ptst-pedigree.sql process:

  • Select the set of F2+ parent to child relationships that have ptst_family.is_root = 'N'
  • Select the set of ptst_family male/female_plant_ids that map to the ptst_plant.id_family, that also map back to the previously listed Fn family.
  • Example: ptst_family family5AEAG: id=9, family_key=family5AEAG, male/female_plant_id=10,11, is_root='N', maps to ptst_plant.id 10,11.
  • However, ptst_plant.id=5 maps to the previously listed family3AE family for the Pedigree ID=5 record, and ptst_plant.id=6 maps to the previously listed family4AG family for the Pedigree ID=6 record
  • Therefore, prefix the f1AE child plant_key with the @ character to show that it is the decedent plant of family3AE.  Also, prefix the m1AG child plant_key with the @ character to show that it is the decedent plant of family4AG. Repeat this process for other parent/child pairs.
  • Another option might be have a separate update script to change the Pedigree table Path string searching for the F2+ families that have plant keys that map back to the previous family then prefix the applicable plant key with the "@" character.

Examples below with the desired @ character plant prefix:

  • F1 family1AA=(f1A x m2A) >F2 family3AE=(@f7A x m1E) >F3 family5AEAG=(@f1AE x m1AG)

  • F1 family2AA=(f3A x m4A) >F2 family4AG=(@f8A x m1G) >F3 family5AEAG=(f1AE x @m1AG)

Below are the test tables/data using Postgresql 16.6:

DROP TABLE if exists ptst_family CASCADE;
DROP TABLE if exists ptst_plant CASCADE;
DROP TABLE if exists ptst_pedigree CASCADE;

CREATE TABLE ptst_family (  
  id serial,
  family_key VARCHAR(20) UNIQUE,
  female_plant_id INTEGER NOT NULL DEFAULT 1,  
  male_plant_id INTEGER NOT NULL DEFAULT 1,  
  is_root VARCHAR NOT NULL DEFAULT '0',  -- Root level familes are always the first level pedigree (F1)
  CONSTRAINT ptst_family_pk PRIMARY KEY (id)
);

CREATE TABLE ptst_plant (
  id serial,
  plant_key VARCHAR(20) UNIQUE,
  id_family INTEGER NOT NULL,  
  CONSTRAINT ptst_plant_pk PRIMARY KEY (id),
  CONSTRAINT ptst_plant_id_family_fk FOREIGN KEY(id_family) REFERENCES ptst_family(id) 
);

CREATE TABLE ptst_pedigree (
id serial,
pedigree_key VARCHAR NOT NULL,
path VARCHAR NOT NULL UNIQUE
);

-- FAMILY Table DATA:
insert into ptst_family (id, family_key, female_plant_id, male_plant_id, is_root) VALUES (1,'NA',1,1,'Y'); -- Default place holder record
-- F1 Root level Alba families
insert into ptst_family (id, family_key, female_plant_id, male_plant_id, is_root) VALUES (2,'family1AA',2,3,'Y');
insert into ptst_family (id, family_key, female_plant_id, male_plant_id, is_root) VALUES (3,'family2AA',4,5,'Y');
-- F2 Hybrid Families
insert into ptst_family (id, family_key, female_plant_id, male_plant_id, is_root) VALUES (5,'family3AE',6,8,'N'); 
insert into ptst_family (id, family_key, female_plant_id, male_plant_id, is_root) VALUES (6,'family4AG',7,9,'N');
-- F3 Double Hybrid family:
insert into ptst_family (id, family_key, female_plant_id, male_plant_id, is_root) VALUES (9,'family5AEAG',10,11,'N');
      
-- PLANT Table DATA:
insert into ptst_plant (id, plant_key,  id_family) VALUES (1,'NA',1);   -- Default place holder record
insert into ptst_plant (id, plant_key,  id_family) VALUES (2,'f1A',1); 
insert into ptst_plant (id, plant_key,  id_family) VALUES (3,'m2A',1);
insert into ptst_plant (id, plant_key,  id_family) VALUES (4,'f3A',1);
insert into ptst_plant (id, plant_key,  id_family) VALUES (5,'m4A',1);
-- Female Alba progeny:
insert into ptst_plant (id, plant_key,  id_family) VALUES (6,'f7A',2);
insert into ptst_plant (id, plant_key,  id_family) VALUES (7,'f8A',3);
-- Male/female Aspen Root level parents:
insert into ptst_plant (id, plant_key,  id_family) VALUES (8,'m1E',1); 
insert into ptst_plant (id, plant_key,  id_family) VALUES (9,'m1G',1);  
-- F1 Hybrid progeny:
insert into ptst_plant (id, plant_key,  id_family) VALUES (10,'f1AE',5); 
insert into ptst_plant (id, plant_key,  id_family) VALUES (11,'m1AG',6);  

Below is the pedigree.sql script developed in 2012:

WITH RECURSIVE expanded_family AS (
    SELECT
        f.id,
        f.family_key,
        pf.id_family    pf_family,
        pm.id_family    pm_family,
        f.is_root,
        f.family_key || '=(' || pf.plant_key || ' x ' || pm.plant_key || ')' pretty_print
    FROM ptst_family f
        JOIN ptst_plant pf ON f.female_plant_id = pf.id
        JOIN ptst_plant pm ON f.male_plant_id = pm.id
),
search_tree AS
(
    SELECT
        f.id,
        f.family_key,
        f.id family_root,
        1 depth,
        '>F1 ' || f.pretty_print  path
    FROM expanded_family f
    WHERE
        f.id != 1
        AND f.is_root = 'Y' 
    UNION ALL
    SELECT
        f.id,
        f.family_key,
        st.family_root,
        st.depth + 1,
        st.path || ' >F' || st.depth+1 || ' ' || f.pretty_print
    FROM search_tree st
        JOIN expanded_family f
            ON f.pf_family = st.id
            OR f.pm_family = st.id
    WHERE
        f.id <> 1
)
SELECT
    family_key,
    path
FROM
(
    SELECT
        family_key,
        rank() over (partition by family_root order by depth desc),
        path
    FROM search_tree
) AS ranked
--  WHERE rank = 1
WHERE path NOT LIKE '%(N/A x N/A)%' -- Remove rows with no filial output
ORDER BY family_key, path

Below is my Pedigree table desired output with the "@" prefixed child decedents:

ID  | Pedigree_key | Path
1 family2AA >F1 family2AA=(f3A x m4A)
2 family3AE >F1 family1AA=(f1A x m2A) >F2 family3AE=(@f7A x m1E)
3 family4AG >F1 family2AA=(f3A x m4A) >F2 family4AG=(@f8A x m1G)
4 family5AEAG >F1 family1AA=(f1A x m2A) >F2 family3AE=(@f7A x m1E) >F3 family5AEAG=(@f1AE x m1AG)
5 family5AEAG >F1 family2AA=(f3A x m4A) >F2 family4AG=(@f8A x m1G) >F3 family5AEAG=(f1AE x @m1AG)

Share Improve this question edited Jan 7 at 18:23 user1888167 asked Jan 6 at 22:11 user1888167user1888167 1313 silver badges9 bronze badges
Add a comment  | 

1 Answer 1

Reset to default 0

I discovered that I can resolve this via a separate update script to change the Pedigree table Path string by searching for the F2+ families against my existing avw_plant VIEW to verify the parent plant then prefix the descendant child with the "@" character. This was a valuable exercise since it forced me to dig deep to understand the problem from different perspectives then prove it with a simple POC like the example below, thanks for listening!

Pedigree Path: >F1 41XAA91=(A10 x A73) >F2 99XAA10=(30AA5MF x AA4102)

psql r4p -c "select plant_key, family_key from avw_plant where family_key = '41XAA91' and plant_key = 'AA4102';"

Updated Path:  >F1 41XAA91=(A10 x A73) >F2 99XAA10=(30AA5MF x @AA4102)
转载请注明原文地址:http://anycun.com/QandA/1745941817a91431.html