I’m training a transformer model to predict a categorical outcome. My dataset has 10 features (p1 to p10), an ID variable, the outcome, and a relative timestamp variable. Each row is one measurement for one patient (identified by ID) at one time point.
I’m unsure how the data should be structured before I convert it into a tensor. Should it be a 3D array with:
Dimension 1: patient
Dimension 2: time point
Dimension 3: feature value?
Or is there another recommended structure?
This is an example of my data:
| ID | p1 | p2 | p3 | p4 | p5 | p6 | p7 | p8 | p9 | p10 | outcome | relative_timestamp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| c3wk | 71.24 | 63.65 | 87.66 | 79.71 | 5.16 | 70.73 | 23.49 | 25.45 | 98.26 | 70.69 | 1 | 0 |
| c3wk | 82.38 | 72.53 | 46.43 | 54.59 | 54.45 | 66.22 | 65.68 | 33.47 | 68.07 | 29.45 | 1 | 1 |
| c3wk | 60.48 | 38.11 | 54.00 | 71.62 | 24.58 | 45.61 | 62.02 | 45.01 | 45.34 | 21.55 | 1 | 2 |
| c3wk | 60.72 | 87.46 | 84.61 | 75.13 | 63.99 | 6.70 | 64.15 | 75.58 | 53.58 | 27.49 | 1 | 3 |
| io03 | 45.29 | 3.01 | 66.35 | 64.92 | 26.60 | 93.07 | 5.60 | 75.17 | 0.03 | 64.29 | 1 | 0 |
| io03 | 95.50 | 33.74 | 46.98 | 76.31 | 42.60 | 88.15 | 81.10 | 39.48 | 49.96 | 39.22 | 1 | 1 |
| io03 | 62.00 | 24.46 | 96.26 | 24.24 | 60.87 | 46.46 | 38.92 | 75.86 | 44.00 | 94.23 | 1 | 2 |
| io03 | 8.23 | 81.33 | 71.00 | 86.66 | 9.72 | 11.15 | 98.57 | 51.87 | 25.64 | 29.49 | 1 | 3 |
| 1nax | 79.77 | 28.09 | 19.96 | 14.79 | 57.68 | 95.73 | 53.35 | 58.13 | 87.70 | 38.90 | 0 | 0 |
| 1nax | 50.59 | 68.51 | 86.34 | 9.01 | 65.97 | 27.16 | 24.87 | 79.89 | 35.18 | 57.06 | 0 | 1 |
| 1nax | 30.15 | 36.26 | 70.60 | 95.91 | 16.17 | 38.27 | 11.68 | 63.77 | 7.95 | 90.40 | 0 | 2 |
| 1nax | 16.29 | 93.54 | 21.65 | 33.86 | 52.37 | 2.02 | 45.48 | 66.30 | 12.00 | 9.48 | 0 | 3 |
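
To make the question concrete, here is a rough sketch of the 3D layout I have in mind (patient, time point, feature), built from a few of the sample rows. It rests on assumptions that may not hold in general: every patient has the same number of time points, and the outcome is constant per patient. The function name `to_sequences` and the tuple layout of `rows` are made up for illustration.

```python
from collections import defaultdict

# Long-format rows: (ID, [p1..p10], outcome, relative_timestamp).
# Only the first two time points of two patients from the sample are included.
rows = [
    ("c3wk", [71.24, 63.65, 87.66, 79.71, 5.16, 70.73, 23.49, 25.45, 98.26, 70.69], 1, 0),
    ("c3wk", [82.38, 72.53, 46.43, 54.59, 54.45, 66.22, 65.68, 33.47, 68.07, 29.45], 1, 1),
    ("1nax", [79.77, 28.09, 19.96, 14.79, 57.68, 95.73, 53.35, 58.13, 87.70, 38.90], 0, 0),
    ("1nax", [50.59, 68.51, 86.34, 9.01, 65.97, 27.16, 24.87, 79.89, 35.18, 57.06], 0, 1),
]


def to_sequences(rows):
    """Group long-format rows into X[patient][timepoint][feature] and y[patient]."""
    by_patient = defaultdict(list)
    labels = {}
    for pid, features, outcome, t in rows:
        by_patient[pid].append((t, features))
        labels[pid] = outcome  # assumption: one outcome per patient

    ids = list(by_patient)  # patients in order of first appearance
    # Sort each patient's measurements by relative_timestamp, then drop the timestamp.
    X = [
        [features for _, features in sorted(by_patient[pid], key=lambda pair: pair[0])]
        for pid in ids
    ]
    y = [labels[pid] for pid in ids]
    return ids, X, y


ids, X, y = to_sequences(rows)
shape = (len(X), len(X[0]), len(X[0][0]))  # (patients, time points, features)
print(shape)
```

As far as I understand, a nested list like `X` can be passed to `torch.tensor(X, dtype=torch.float32)` to get a tensor of shape (patients, time points, features), which would match the `batch_first=True` convention in PyTorch's transformer layers. If patients had different numbers of time points, the sequences would presumably need padding plus a mask, which this sketch does not handle.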