<?xml version='1.0' encoding='UTF-8'?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title>Fryzek Concepts</title><atom:link href="https://fryzekconcepts.com/feed.xml" rel="self" type="application/rss+xml"/><link>https://fryzekconcepts.com</link><description>Lucas is a developer working on cool things</description><lastBuildDate>Sat, 28 Dec 2024 13:41:16 -0000</lastBuildDate><item><title>Generating Video</title><link>https://fryzekconcepts.com/notes/generating-video.html</link><description><p>One thing I’m very interested in is computer graphics. This could be
complex 3D graphics or simple 2D graphics. The idea of getting a
computer to display visual data fascinates me. One fundamental part of
showing visual data is interfacing with a computer monitor. This can be
accomplished by generating a video signal that the monitor understands.
Below I have written instructions on how an FPGA can be used to generate
a video signal. I have specifically worked with the iCEBreaker FPGA but
the theory contained within this should work with any FPGA or device
that you can generate the appropriate timings for.</p>
<h3 id="tools">Tools</h3>
<p>Hardware used (<a
href="https://www.crowdsupply.com/1bitsquared/icebreaker-fpga">link for
board</a>):</p>
<ul>
<li>iCEBreaker FPGA</li>
<li>iCEBreaker 12-Bit DVI Pmod</li>
</ul>
<p>Software Used:</p>
<ul>
<li>IceStorm FPGA toolchain (<a
href="https://github.com/esden/summon-fpga-tools">follow install
instructions here</a>)</li>
</ul>
<h3 id="theory">Theory</h3>
<p>A video signal is composed of several parts, primarily the colour
signals and the sync signals. For this DVI Pmod, there is also a data
enable signal for the visible screen area. For the example here we are
going to be generating a 640x480 60 Hz video signal. Below is a table
describing the important data for our video signal.</p>
<table>
<tbody>
<tr>
<td>
Pixel Clock
</td>
<td>
25.175 MHz
</td>
</tr>
<tr>
<td>
Pixels Per Line
</td>
<td>
800 Pixels
</td>
</tr>
<tr>
<td>
Pixels Visible Per Line
</td>
<td>
640 Pixels
</td>
</tr>
<tr>
<td>
Horizontal Sync Front Porch Length
</td>
<td>
16 Pixels
</td>
</tr>
<tr>
<td>
Horizontal Sync Length
</td>
<td>
96 Pixels
</td>
</tr>
<tr>
<td>
Horizontal Sync Back Porch Length
</td>
<td>
48 Pixels
</td>
</tr>
<tr>
<td>
Lines Per Frame
</td>
<td>
525 Lines
</td>
</tr>
<tr>
<td>
Lines Visible Per Frame
</td>
<td>
480 Lines
</td>
</tr>
<tr>
<td>
Vertical Front Porch Length
</td>
<td>
10 Lines
</td>
</tr>
<tr>
<td>
Vertical Sync Length
</td>
<td>
2 Lines
</td>
</tr>
<tr>
<td>
Vertical Back Porch Length
</td>
<td>
33 Lines
</td>
</tr>
</tbody>
</table>
<p>Sourced from http://www.tinyvga.com/vga-timing/640x480@60Hz</p>
<p>The data from this table raises a few questions:</p>
<ol type="1">
<li>What is the Pixel Clock?</li>
<li>What is the difference between “Pixels/Lines” and “Visible
Pixels/Lines”?</li>
<li>What is “Front Porch”, “Sync”, and “Back Porch”?</li>
</ol>
<h4 id="pixel-clock">Pixel Clock</h4>
<p>The pixel clock is a fairly straightforward idea: it is the rate at
which we generate pixels. For video signal generation, the pixel is the
fundamental unit, and we measure everything in the number of pixels it
takes up. Every time the pixel clock “ticks”, we have processed one more
pixel. So for a 640x480 video signal, a full line is 800 pixels, or 800
clock ticks. The full 800x525 frame takes 800 ticks x 525 lines, or
420000 clock ticks. If we are running the display at 60 Hz, 420000
pixels per frame are generated 60 times per second. Therefore, 25200000
pixels or clock ticks will pass in one second, which is a clock of 25.2
MHz. This is roughly equal to the standard pixel clock frequency of
25.175 MHz. There is a small deviation from the “true” value here, but
monitors are flexible enough to accept this video signal (my monitor
reports it as 640x480@60Hz), and all information I can find online says
that 25.175 MHz is the value you want to use. Later on we will see that
the pixel clock is not required to be exactly 25.175 MHz.</p>
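<p>As a rough check of the arithmetic above, the following
simulation-only sketch prints the ticks per frame, the clock needed for
exactly 60 Hz, and the refresh rate that a 25.175 MHz clock produces.
The module name <code>refresh_rate_check</code> and the constant names
are made up for illustration and are not part of the design. It assumes
a simulator with SystemVerilog support.</p>
<pre><code>module refresh_rate_check;
    // Totals taken from the timing table above
    localparam int PIXELS_PER_LINE = 800;
    localparam int LINES_PER_FRAME = 525;
    localparam int TICKS_PER_FRAME = PIXELS_PER_LINE * LINES_PER_FRAME;

    // Nominal pixel clock in Hz
    localparam real PIXEL_CLOCK_HZ = 25175000.0;

    initial begin
        // 800 * 525 = 420000 ticks for one full frame
        $display(&quot;ticks per frame: %0d&quot;, TICKS_PER_FRAME);
        // Clock that would give exactly 60 frames per second
        $display(&quot;clock for 60 Hz: %0d Hz&quot;, TICKS_PER_FRAME * 60);
        // Refresh rate produced by the nominal 25.175 MHz clock
        $display(&quot;refresh rate: %f Hz&quot;, PIXEL_CLOCK_HZ / TICKS_PER_FRAME);
        $finish;
    end
endmodule</code></pre>
<p>With the nominal clock the refresh rate works out to about 59.94 Hz,
slightly under 60 Hz, which is why the two clock figures do not match
exactly.</p>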
<h4 id="visible-area-vs-invisible-area">Visible Area vs Invisible
Area</h4>
<p><img
src="/assets/2020-04-07-generating-video/visible_invisible.png" /></p>
<p>From the above image we can see that a 640x480 video signal actually
generates a resolution larger than 640x480. The true resolution we
generate is 800x525, but only a 640x480 portion of that signal is
visible. The area that is not visible is where we generate the sync
signal. In other words, every part of the above image that is black is
where a sync signal is being generated.</p>
<h4 id="front-porch-back-porch-sync">Front Porch, Back Porch &amp;
Sync</h4>
<p>To better understand the front porch, back porch and sync signal,
let’s look at what the horizontal sync signal looks like over the
course of a line:</p>
<p><img src="/assets/2020-04-07-generating-video/sync.png" /></p>
<p>From this we can see that the “Front Porch” is the invisible pixels
between the visible pixels and the sync pixels, and is represented by a
logical one or high signal. The “Sync” is the invisible pixels between
the front porch and back porch, and is represented by a logical zero or
low signal. The “Back Porch” is the invisible pixels after the sync
signal, and is represented by a logical one. For the case of 640x480
video, the visible pixel section lasts for 640 pixels. The front porch
section lasts for 16 pixels, after which the sync signal will become a
logical zero. This logical zero sync will last for 96 pixels, after
which the sync signal will become a logical one again. The back porch
will then last for 48 pixels. Adding these up, 640 + 16 + 96 + 48 gives
800 pixels, which is the full horizontal resolution of the signal. The
vertical sync signal works almost exactly the same way, except that it
counts lines instead of pixels.</p>
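<p>One way to keep these numbers straight is to write them down as
named constants and let the tools do the sums. The sketch below does
this for both directions, using the values from the timing table. The
package name <code>vga_640x480_timing</code> and all constant names are
my own choice for illustration. Packages are a SystemVerilog feature, so
depending on your toolchain you may need to place the same
<code>localparam</code> lines inside a module instead.</p>
<pre><code>package vga_640x480_timing;
    // Horizontal timing, in pixels
    localparam int H_VISIBLE     = 640;
    localparam int H_FRONT_PORCH = 16;
    localparam int H_SYNC        = 96;
    localparam int H_BACK_PORCH  = 48;
    // 640 + 16 + 96 + 48 = 800 pixels per line
    localparam int H_TOTAL = H_VISIBLE + H_FRONT_PORCH + H_SYNC + H_BACK_PORCH;

    // Vertical timing, in lines
    localparam int V_VISIBLE     = 480;
    localparam int V_FRONT_PORCH = 10;
    localparam int V_SYNC        = 2;
    localparam int V_BACK_PORCH  = 33;
    // 480 + 10 + 2 + 33 = 525 lines per frame
    localparam int V_TOTAL = V_VISIBLE + V_FRONT_PORCH + V_SYNC + V_BACK_PORCH;

    // First position of each sync pulse, and the first position after it
    localparam int H_SYNC_START = H_VISIBLE + H_FRONT_PORCH; // 656
    localparam int H_SYNC_END   = H_SYNC_START + H_SYNC;     // 752
    localparam int V_SYNC_START = V_VISIBLE + V_FRONT_PORCH; // 490
    localparam int V_SYNC_END   = V_SYNC_START + V_SYNC;     // 492
endpackage</code></pre>
<p>The vertical values follow the same pattern as the horizontal ones:
the visible area comes first, then the front porch, the sync pulse, and
the back porch.</p>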
<h3 id="implementation">Implementation</h3>
<p>The first thing we can do that is going to simplify a lot of the
following logic is to keep track of which pixel, and which line we are
on. The below code block creates two registers to keep track of the
current pixel on the line (column) and the current line (line):</p>
<div class="sourceCode" id="cb1"><pre
class="sourceCode verilog"><code class="sourceCode verilog"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a>logic <span class="op">[</span><span class="dv">9</span><span class="op">:</span><span class="dv">0</span><span class="op">]</span> line<span class="op">;</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>logic <span class="op">[</span><span class="dv">9</span><span class="op">:</span><span class="dv">0</span><span class="op">]</span> column<span class="op">;</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a><span class="kw">always</span> <span class="op">@(</span><span class="kw">posedge</span> clk <span class="dt">or</span> <span class="kw">posedge</span> reset<span class="op">)</span> <span class="kw">begin</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a> <span class="kw">if</span><span class="op">(</span>reset <span class="op">==</span> <span class="dv">1</span><span class="op">)</span> <span class="kw">begin</span></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a> line <span class="op">&lt;=</span> <span class="dv">0</span><span class="op">;</span></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a> column <span class="op">&lt;=</span> <span class="dv">0</span><span class="op">;</span></span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a> <span class="kw">end</span></span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a> <span class="kw">else</span> <span class="kw">begin</span></span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a> <span class="kw">if</span><span class="op">(</span>column <span class="op">==</span> <span class="dv">799</span> <span class="op">&amp;&amp;</span> line <span class="op">==</span> <span class="dv">524</span><span class="op">)</span> <span class="kw">begin</span></span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a> line <span class="op">&lt;=</span> <span class="dv">0</span><span class="op">;</span></span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a> column <span class="op">&lt;=</span> <span class="dv">0</span><span class="op">;</span></span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a> <span class="kw">end</span></span>
<span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"></a> <span class="kw">else</span> <span class="kw">if</span><span class="op">(</span>column <span class="op">==</span> <span class="dv">799</span><span class="op">)</span> <span class="kw">begin</span></span>
<span id="cb1-15"><a href="#cb1-15" aria-hidden="true" tabindex="-1"></a> line <span class="op">&lt;=</span> line <span class="op">+</span> <span class="dv">1</span><span class="op">;</span></span>
<span id="cb1-16"><a href="#cb1-16" aria-hidden="true" tabindex="-1"></a> column <span class="op">&lt;=</span> <span class="dv">0</span><span class="op">;</span></span>
<span id="cb1-17"><a href="#cb1-17" aria-hidden="true" tabindex="-1"></a> <span class="kw">end</span></span>
<span id="cb1-18"><a href="#cb1-18" aria-hidden="true" tabindex="-1"></a> <span class="kw">else</span> <span class="kw">begin</span></span>
<span id="cb1-19"><a href="#cb1-19" aria-hidden="true" tabindex="-1"></a> column <span class="op">&lt;=</span> column <span class="op">+</span> <span class="dv">1</span><span class="op">;</span></span>
<span id="cb1-20"><a href="#cb1-20" aria-hidden="true" tabindex="-1"></a> <span class="kw">end</span></span>
<span id="cb1-21"><a href="#cb1-21" aria-hidden="true" tabindex="-1"></a> <span class="kw">end</span></span>
<span id="cb1-22"><a href="#cb1-22" aria-hidden="true" tabindex="-1"></a><span class="kw">end</span></span></code></pre></div>
<p>This block of Verilog works by first initializing the line and column
registers to zero on a reset. This is important to make sure that we
start from known values; otherwise the line and column registers could
contain any value and our logic would not work. Next, we check if we are
at the very end of the frame by comparing the current column to 799 (the
last pixel in the line) and the current line to 524 (the last line in
the frame). If both conditions are true, we reset the line and column
back to zero to signify that we are starting a new frame. The next block
checks if the current column equals 799. Because the previous if
statement failed, a match here means we are at the end of a line but not
the end of the frame, so we increment the current line count and set the
column back to zero to signify that we are starting a new line. The
final block simply increments the current pixel count. If we reach this
block, we are at neither the end of the line nor the end of the frame,
so we can simply move on to the next pixel.</p>
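<p>If you want to convince yourself that the counters really cover
800x525 positions, a small simulation can count the clock ticks in one
frame. The testbench below is only a sketch: the module name
<code>counter_tb</code>, the <code>ticks</code> variable, and the test
clock are assumptions for illustration, and the counter logic is copied
from the block above in a slightly more compact form. It needs a
simulator with SystemVerilog support.</p>
<pre><code>module counter_tb;
    logic clk = 0;
    logic reset = 1;
    logic [9:0] line;
    logic [9:0] column;
    int ticks = 0;

    // Same counter logic as the block above
    always @(posedge clk or posedge reset) begin
        if (reset) begin
            line &lt;= 0;
            column &lt;= 0;
        end
        else if (column == 799 &amp;&amp; line == 524) begin
            line &lt;= 0;
            column &lt;= 0;
        end
        else if (column == 799) begin
            line &lt;= line + 1;
            column &lt;= 0;
        end
        else begin
            column &lt;= column + 1;
        end
    end

    // Free-running test clock: rising edges at t = 1, 3, 5, ...
    always #1 clk = ~clk;

    initial begin
        // Release reset at t = 2, between two rising edges
        #2 reset = 0;
        // Count rising edges until the counters wrap back to (0, 0)
        do begin
            @(posedge clk);
            #1; // let the non-blocking updates settle before sampling
            ticks++;
        end while (!(column == 0 &amp;&amp; line == 0));
        $display(&quot;ticks per frame: %0d&quot;, ticks);
        if (ticks != 800 * 525) $error(&quot;unexpected frame length&quot;);
        $finish;
    end
endmodule</code></pre>
<p>The loop should stop after 420000 ticks, which is the frame length
worked out in the theory section.</p>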
<p>Now that we are keeping track of the current column and current line,
we can use this information to generate the horizontal and vertical sync
signals. From the theory above we know that the sync signal is only low
during the sync pulse, between the front and back porch. At all other
times the signal is high. From this we can generate each sync signal
with an OR and two compares.</p>
<div class="sourceCode" id="cb2"><pre
class="sourceCode verilog"><code class="sourceCode verilog"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>logic horizontal_sync<span class="op">;</span></span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a>logic vertical_sync<span class="op">;</span></span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> horizontal_sync <span class="op">=</span> column <span class="op">&lt;</span> <span class="dv">656</span> <span class="op">||</span> column <span class="op">&gt;=</span> <span class="dv">752</span><span class="op">;</span></span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> vertical_sync <span class="op">=</span> line <span class="op">&lt;</span> <span class="dv">490</span> <span class="op">||</span> line <span class="op">&gt;=</span> <span class="dv">492</span><span class="op">;</span></span></code></pre></div>
<p>Let’s examine the horizontal sync signal more closely. This statement
will evaluate to true if the current column is less than 656 or the
current column is greater than or equal to 752. This means that the
horizontal sync signal will be true except when the current column is
between 656 and 751 inclusive. That is, starting on column 656 the
horizontal sync signal will become false (low) and will remain that way
for 96 pixels, until we reach pixel 752 where it will return to being
true (high). The vertical sync signal works in the same way, except that
it is based on the current line. Therefore, the signal will remain high
when the line is less than 490 or greater than or equal to 492, and will
be low for lines 490 and 491.</p>
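<p>The pulse widths described above can be checked with a short
simulation that sweeps every column and every line and counts how often
each sync signal is low. This is a sketch only: the module name
<code>sync_check</code> and the counters <code>h_low</code> and
<code>v_low</code> are hypothetical, and the two <code>assign</code>
statements are copied from the block above.</p>
<pre><code>module sync_check;
    logic [9:0] column;
    logic [9:0] line;
    logic horizontal_sync;
    logic vertical_sync;
    int h_low = 0;
    int v_low = 0;

    // Same expressions as the block above
    assign horizontal_sync = column &lt; 656 || column &gt;= 752;
    assign vertical_sync = line &lt; 490 || line &gt;= 492;

    initial begin
        // Sweep one full line and count the pixels where hsync is low
        line = 0;
        for (int c = 0; c &lt; 800; c++) begin
            column = c[9:0];
            #1; // let the continuous assignments update
            if (!horizontal_sync) h_low++;
        end
        // Sweep one full frame and count the lines where vsync is low
        column = 0;
        for (int l = 0; l &lt; 525; l++) begin
            line = l[9:0];
            #1;
            if (!vertical_sync) v_low++;
        end
        $display(&quot;hsync low for %0d pixels, vsync low for %0d lines&quot;, h_low, v_low);
        if (h_low != 96 || v_low != 2) $error(&quot;sync pulse width mismatch&quot;);
        $finish;
    end
endmodule</code></pre>
<p>The counts should come out as 96 pixels and 2 lines, matching the
sync lengths in the timing table.</p>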
<h4 id="putting-it-all-together">Putting It All Together</h4>
<p>Now that we have generated the video signal, we need to route it
towards the video output connectors on the iCEBreaker 12-bit DVI Pmod.
We also need to configure the iCEBreaker FPGA to have the appropriate
pixel clock frequency. First, to get the correct pixel clock, we are
going to use the following block of code:</p>
<div class="sourceCode" id="cb3"><pre
class="sourceCode verilog"><code class="sourceCode verilog"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>SB_PLL40_PAD #<span class="op">(</span></span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a> .DIVR<span class="op">(</span><span class="bn">4&#39;b0000</span><span class="op">),</span></span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a> .DIVF<span class="op">(</span><span class="bn">7&#39;b1000010</span><span class="op">),</span></span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a> .DIVQ<span class="op">(</span><span class="bn">3&#39;b101</span><span class="op">),</span></span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a> .FILTER_RANGE<span class="op">(</span><span class="bn">3&#39;b001</span><span class="op">),</span></span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a> .FEEDBACK_PATH<span class="op">(</span><span class="st">&quot;SIMPLE&quot;</span><span class="op">),</span></span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a> .DELAY_ADJUSTMENT_MODE_FEEDBACK<span class="op">(</span><span class="st">&quot;FIXED&quot;</span><span class="op">),</span></span>
<span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a> .FDA_FEEDBACK<span class="op">(</span><span class="bn">4&#39;b0000</span><span class="op">),</span></span>
<span id="cb3-9"><a href="#cb3-9" aria-hidden="true" tabindex="-1"></a> .DELAY_ADJUSTMENT_MODE_RELATIVE<span class="op">(</span><span class="st">&quot;FIXED&quot;</span><span class="op">),</span></span>
<span id="cb3-10"><a href="#cb3-10" aria-hidden="true" tabindex="-1"></a> .FDA_RELATIVE<span class="op">(</span><span class="bn">4&#39;b0000</span><span class="op">),</span></span>
<span id="cb3-11"><a href="#cb3-11" aria-hidden="true" tabindex="-1"></a> .SHIFTREG_DIV_MODE<span class="op">(</span><span class="bn">2&#39;b00</span><span class="op">),</span></span>
<span id="cb3-12"><a href="#cb3-12" aria-hidden="true" tabindex="-1"></a> .PLLOUT_SELECT<span class="op">(</span><span class="st">&quot;GENCLK&quot;</span><span class="op">),</span></span>
<span id="cb3-13"><a href="#cb3-13" aria-hidden="true" tabindex="-1"></a> .ENABLE_ICEGATE<span class="op">(</span><span class="bn">1&#39;b0</span><span class="op">)</span></span>
<span id="cb3-14"><a href="#cb3-14" aria-hidden="true" tabindex="-1"></a><span class="op">)</span> usb_pll_inst <span class="op">(</span></span>
<span id="cb3-15"><a href="#cb3-15" aria-hidden="true" tabindex="-1"></a> .PACKAGEPIN<span class="op">(</span>CLK<span class="op">),</span></span>
<span id="cb3-16"><a href="#cb3-16" aria-hidden="true" tabindex="-1"></a> .PLLOUTCORE<span class="op">(</span>pixel_clock<span class="op">),</span></span>
<span id="cb3-17"><a href="#cb3-17" aria-hidden="true" tabindex="-1"></a> .EXTFEEDBACK<span class="op">(),</span></span>
<span id="cb3-18"><a href="#cb3-18" aria-hidden="true" tabindex="-1"></a> .DYNAMICDELAY<span class="op">(),</span></span>
<span id="cb3-19"><a href="#cb3-19" aria-hidden="true" tabindex="-1"></a> .RESETB<span class="op">(</span><span class="bn">1&#39;b1</span><span class="op">),</span></span>
<span id="cb3-20"><a href="#cb3-20" aria-hidden="true" tabindex="-1"></a> .BYPASS<span class="op">(</span><span class="bn">1&#39;b0</span><span class="op">),</span></span>
<span id="cb3-21"><a href="#cb3-21" aria-hidden="true" tabindex="-1"></a> .LATCHINPUTVALUE<span class="op">()</span></span>
<span id="cb3-22"><a href="#cb3-22" aria-hidden="true" tabindex="-1"></a><span class="op">);</span></span></code></pre></div>
<p>This block is mainly a copy and paste of the PLL setup code from the
iCEBreaker examples, but with a few important changes. The DIVR, DIVF,
and DIVQ values are changed to create a 25.125 MHz clock. This is not
exactly 25.175 MHz, but it is close enough that the monitor accepts it
and recognizes it as a 640x480@60 Hz signal. These values were found
with the “icepll” utility. Below is an example of calling this utility
from the command line:</p>
<pre><code>$ icepll -i 12 -o 25.175
F_PLLIN: 12.000 MHz (given)
F_PLLOUT: 25.175 MHz (requested)
F_PLLOUT: 25.125 MHz (achieved)
FEEDBACK: SIMPLE
F_PFD: 12.000 MHz
F_VCO: 804.000 MHz
DIVR: 0 (4&#39;b0000)
DIVF: 66 (7&#39;b1000010)
DIVQ: 5 (3&#39;b101)
FILTER_RANGE: 1 (3&#39;b001)</code></pre>
<p>From here we can see that we had an input clock of 12 MHz (this comes
from the FTDI chip on the iCEBreaker board) and wanted a 25.175 MHz
output clock. The closest the PLL could generate was a 25.125 MHz clock,
using the DIVR, DIVF, and DIVQ values shown.</p>
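<p>To see where the 25.125 MHz figure comes from, we can redo the
calculation by hand. The sketch below assumes the usual relation for the
SIMPLE feedback path, where the input is multiplied by DIVF + 1, divided
by DIVR + 1, and then divided by 2 to the power of DIVQ. That relation
is my reading of how the PLL behaves rather than something taken from
the icepll output, although it does reproduce the F_VCO and F_PLLOUT
values printed above. The module name <code>pll_frequency_check</code>
and the constant names are made up for illustration.</p>
<pre><code>module pll_frequency_check;
    // Values reported by icepll above
    localparam real F_PLLIN_MHZ = 12.0;
    localparam int DIVR = 0;
    localparam int DIVF = 66;
    localparam int DIVQ = 5;

    // Assumed relation for SIMPLE feedback:
    // F_VCO = F_PLLIN * (DIVF + 1) / (DIVR + 1)
    localparam real F_VCO_MHZ = F_PLLIN_MHZ * (DIVF + 1) / (DIVR + 1);
    // F_PLLOUT = F_VCO / 2^DIVQ
    localparam real F_PLLOUT_MHZ = F_VCO_MHZ / (1 &lt;&lt; DIVQ);

    initial begin
        // 12 * 67 / 1 = 804 MHz
        $display(&quot;F_VCO:    %0.3f MHz&quot;, F_VCO_MHZ);
        // 804 / 32 = 25.125 MHz
        $display(&quot;F_PLLOUT: %0.3f MHz&quot;, F_PLLOUT_MHZ);
        // Refresh rate for an 800x525 frame at this pixel clock
        $display(&quot;refresh:  %0.2f Hz&quot;, F_PLLOUT_MHZ * 1.0e6 / (800 * 525));
        $finish;
    end
endmodule</code></pre>
<p>At 25.125 MHz the refresh rate comes out at roughly 59.82 Hz, which
is still close enough to 60 Hz for the monitor to accept the signal.</p>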
<p>Now that we have a pixel clock we can wire up the necessary signals
for the DVI video out. The DVI Pmod has the following mapping for all of
its connectors:</p>
<table>
<tbody>
<tr>
<td>
PMOD 1
</td>
<td>
</td>
<td>
PMOD 2
</td>
<td>
</td>
</tr>
<tr>
<td>
<strong>P1A1</strong>
</td>
<td>
Red bit 4
</td>
<td>
<strong>P1B1</strong>
</td>
<td>
Blue bit 4
</td>
</tr>
<tr>
<td>
<strong>P1A2</strong>
</td>
<td>
Red bit 3
</td>
<td>
<strong>P1B2</strong>
</td>
<td>
Pixel clock
</td>
</tr>
<tr>
<td>
<strong>P1A3</strong>
</td>
<td>
Green bit 4
</td>
<td>
<strong>P1B3</strong>
</td>
<td>
Blue bit 3
</td>
</tr>
<tr>
<td>
<strong>P1A4</strong>
</td>
<td>
Green bit 3
</td>
<td>
<strong>P1B4</strong>
</td>
<td>
Horizontal Sync
</td>
</tr>
<tr>
<td>
<strong>P1A7</strong>
</td>
<td>
Red bit 2
</td>
<td>
<strong>P1B7</strong>
</td>
<td>
Blue bit 2
</td>
</tr>
<tr>
<td>
<strong>P1A8</strong>
</td>
<td>
Red bit 1
</td>
<td>
<strong>P1B8</strong>
</td>
<td>
Blue bit 1
</td>
</tr>
<tr>
<td>
<strong>P1A9</strong>
</td>
<td>
Green bit 2
</td>
<td>
<strong>P1B9</strong>
</td>
<td>
Data Enable
</td>
</tr>
<tr>
<td>
<strong>P1A10</strong>
</td>
<td>
Green bit 1
</td>
<td>
<strong>P1B10</strong>
</td>
<td>
Vertical Sync
</td>
</tr>
</tbody>
</table>
<p>From this we can see that we need 4 bits for each colour channel, a
horizontal sync signal, a vertical sync signal, and additionally a data
enable signal. The data enable signal is not part of a standard video
signal and is just used by the DVI transmitter chip on the Pmod to
signify whether we are in the visible or the invisible pixel area.
Therefore we will set the data enable line when the current column is
less than 640 and the current line is less than 480. Based on this we
can connect the outputs like so:</p>
<div class="sourceCode" id="cb5"><pre
class="sourceCode verilog"><code class="sourceCode verilog"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>logic <span class="op">[</span><span class="dv">3</span><span class="op">:</span><span class="dv">0</span><span class="op">]</span> r<span class="op">;</span></span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>logic <span class="op">[</span><span class="dv">3</span><span class="op">:</span><span class="dv">0</span><span class="op">]</span> g<span class="op">;</span></span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a>logic <span class="op">[</span><span class="dv">3</span><span class="op">:</span><span class="dv">0</span><span class="op">]</span> b<span class="op">;</span></span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a>logic data_enable<span class="op">;</span></span>
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> data_enable <span class="op">=</span> column <span class="op">&lt;</span> <span class="dv">640</span> <span class="op">&amp;&amp;</span> line <span class="op">&lt;</span> <span class="dv">480</span><span class="op">;</span></span>
<span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> <span class="op">{</span>P1A1<span class="op">,</span> P1A2<span class="op">,</span> P1A3<span class="op">,</span> P1A4<span class="op">,</span> P1A7<span class="op">,</span> P1A8<span class="op">,</span> P1A9<span class="op">,</span> P1A10<span class="op">}</span> <span class="op">=</span> </span>
<span id="cb5-7"><a href="#cb5-7" aria-hidden="true" tabindex="-1"></a> <span class="op">{</span>r<span class="op">[</span><span class="dv">3</span><span class="op">],</span> r<span class="op">[</span><span class="dv">2</span><span class="op">],</span> g<span class="op">[</span><span class="dv">3</span><span class="op">],</span> g<span class="op">[</span><span class="dv">2</span><span class="op">],</span> r<span class="op">[</span><span class="dv">1</span><span class="op">],</span> r<span class="op">[</span><span class="dv">0</span><span class="op">],</span> g<span class="op">[</span><span class="dv">1</span><span class="op">],</span> g<span class="op">[</span><span class="dv">0</span><span class="op">]};</span></span>
<span id="cb5-8"><a href="#cb5-8" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> <span class="op">{</span>P1B1<span class="op">,</span> P1B2<span class="op">,</span> P1B3<span class="op">,</span> P1B4<span class="op">,</span> P1B7<span class="op">,</span> P1B8<span class="op">,</span> P1B9<span class="op">,</span> P1B10<span class="op">}</span> <span class="op">=</span> </span>
<span id="cb5-9"><a href="#cb5-9" aria-hidden="true" tabindex="-1"></a> <span class="op">{</span>b<span class="op">[</span><span class="dv">3</span><span class="op">],</span> pixel_clock<span class="op">,</span> b<span class="op">[</span><span class="dv">2</span><span class="op">],</span> horizontal_sync<span class="op">,</span> b<span class="op">[</span><span class="dv">1</span><span class="op">],</span> b<span class="op">[</span><span class="dv">0</span><span class="op">],</span> data_enable<span class="op">,</span> vertical_sync<span class="op">};</span></span></code></pre></div>
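<p>If you want to convince yourself that this data enable condition
covers exactly the visible area, a small software model helps. The
sketch below is not part of the FPGA design; it is a C++ stand-in for
the <code>data_enable</code> expression, and the function and constant
names are made up for illustration. It walks one full 800x525 frame and
counts the pixel clocks on which data enable would be high.</p>

```cpp
#include <cassert>
#include <cstdint>

// Timing constants for 640x480@60 Hz, matching the values used in this post.
constexpr std::uint32_t kPixelsPerLine = 800;
constexpr std::uint32_t kLinesPerFrame = 525;
constexpr std::uint32_t kVisiblePixels = 640;
constexpr std::uint32_t kVisibleLines = 480;

// Software equivalent of the data_enable assign statement.
constexpr bool data_enable(std::uint32_t column, std::uint32_t line) {
    return column < kVisiblePixels && line < kVisibleLines;
}

// Walk every (column, line) position of one frame, in the same order the
// hardware counters would, and count the positions where data_enable is high.
std::uint32_t count_enabled_pixels(std::uint32_t pixels_per_line,
                                   std::uint32_t lines_per_frame) {
    // The visible area has to fit inside the total frame.
    assert(pixels_per_line >= kVisiblePixels && lines_per_frame >= kVisibleLines);

    std::uint32_t enabled = 0;
    for (std::uint32_t line = 0; line < lines_per_frame; ++line) {
        for (std::uint32_t column = 0; column < pixels_per_line; ++column) {
            if (data_enable(column, line)) {
                ++enabled;
            }
        }
    }
    return enabled;
}
```

<p>Calling <code>count_enabled_pixels(kPixelsPerLine, kLinesPerFrame)</code>
should give 640 * 480 = 307200, which is one data enable cycle per
visible pixel and none during blanking.</p>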
<p>Now for testing purposes we are going to set the output colour to be
fixed to pure red so additional logic to pick a pixel colour is not
required for this example. We can do this as shown below:</p>
<div class="sourceCode" id="cb6"><pre
class="sourceCode verilog"><code class="sourceCode verilog"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> r <span class="op">=</span> <span class="bn">4&#39;b1111</span><span class="op">;</span></span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> g <span class="op">=</span> <span class="bn">4&#39;b0000</span><span class="op">;</span></span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> b <span class="op">=</span> <span class="bn">4&#39;b0000</span><span class="op">;</span></span></code></pre></div>
<p>Putting all of the above code together with whatever additional
inputs are required for the iCEBreaker FPGA gives us the following block
of code:</p>
<div class="sourceCode" id="cb7"><pre
class="sourceCode verilog"><code class="sourceCode verilog"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="kw">module</span> top</span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a><span class="op">(</span></span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a><span class="dt">input</span> CLK<span class="op">,</span></span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a><span class="dt">output</span> LEDR_N<span class="op">,</span></span>
<span id="cb7-5"><a href="#cb7-5" aria-hidden="true" tabindex="-1"></a><span class="dt">output</span> LEDG_N<span class="op">,</span></span>
<span id="cb7-6"><a href="#cb7-6" aria-hidden="true" tabindex="-1"></a><span class="dt">input</span> BTN_N<span class="op">,</span></span>
<span id="cb7-7"><a href="#cb7-7" aria-hidden="true" tabindex="-1"></a><span class="dt">output</span> P1A1<span class="op">,</span> P1A2<span class="op">,</span> P1A3<span class="op">,</span> P1A4<span class="op">,</span> P1A7<span class="op">,</span> P1A8<span class="op">,</span> P1A9<span class="op">,</span> P1A10<span class="op">,</span></span>
<span id="cb7-8"><a href="#cb7-8" aria-hidden="true" tabindex="-1"></a><span class="dt">output</span> P1B1<span class="op">,</span> P1B2<span class="op">,</span> P1B3<span class="op">,</span> P1B4<span class="op">,</span> P1B7<span class="op">,</span> P1B8<span class="op">,</span> P1B9<span class="op">,</span> P1B10</span>
<span id="cb7-9"><a href="#cb7-9" aria-hidden="true" tabindex="-1"></a><span class="op">);</span></span>
<span id="cb7-10"><a href="#cb7-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-11"><a href="#cb7-11" aria-hidden="true" tabindex="-1"></a><span class="ot">`define PIXELS_PER_LINE 10&#39;d800</span></span>
<span id="cb7-12"><a href="#cb7-12" aria-hidden="true" tabindex="-1"></a><span class="ot">`define PIXELS_VISIBLE_PER_LINE 10&#39;d640</span></span>
<span id="cb7-13"><a href="#cb7-13" aria-hidden="true" tabindex="-1"></a><span class="ot">`define LINES_PER_FRAME 10&#39;d525</span></span>
<span id="cb7-14"><a href="#cb7-14" aria-hidden="true" tabindex="-1"></a><span class="ot">`define LINES_VISIBLE_PER_FRAME 10&#39;d480</span></span>
<span id="cb7-15"><a href="#cb7-15" aria-hidden="true" tabindex="-1"></a><span class="ot">`define HORIZONTAL_FRONTPORCH 10&#39;d656</span></span>
<span id="cb7-16"><a href="#cb7-16" aria-hidden="true" tabindex="-1"></a><span class="ot">`define HORIZONTAL_BACKPORCH 10&#39;d752</span></span>
<span id="cb7-17"><a href="#cb7-17" aria-hidden="true" tabindex="-1"></a><span class="ot">`define VERTICAL_FRONTPORCH 10&#39;d490</span></span>
<span id="cb7-18"><a href="#cb7-18" aria-hidden="true" tabindex="-1"></a><span class="ot">`define VERTICAL_BACKPORCH 10&#39;d492</span></span>
<span id="cb7-19"><a href="#cb7-19" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-20"><a href="#cb7-20" aria-hidden="true" tabindex="-1"></a>logic <span class="op">[</span><span class="dv">9</span><span class="op">:</span><span class="dv">0</span><span class="op">]</span> line<span class="op">;</span></span>
<span id="cb7-21"><a href="#cb7-21" aria-hidden="true" tabindex="-1"></a>logic <span class="op">[</span><span class="dv">9</span><span class="op">:</span><span class="dv">0</span><span class="op">]</span> column<span class="op">;</span></span>
<span id="cb7-22"><a href="#cb7-22" aria-hidden="true" tabindex="-1"></a>logic horizontal_sync<span class="op">;</span></span>
<span id="cb7-23"><a href="#cb7-23" aria-hidden="true" tabindex="-1"></a>logic vertical_sync<span class="op">;</span></span>
<span id="cb7-24"><a href="#cb7-24" aria-hidden="true" tabindex="-1"></a>logic data_enable<span class="op">;</span></span>
<span id="cb7-25"><a href="#cb7-25" aria-hidden="true" tabindex="-1"></a>logic pixel_clock<span class="op">;</span></span>
<span id="cb7-26"><a href="#cb7-26" aria-hidden="true" tabindex="-1"></a>logic reset<span class="op">;</span></span>
<span id="cb7-27"><a href="#cb7-27" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-28"><a href="#cb7-28" aria-hidden="true" tabindex="-1"></a>logic <span class="op">[</span><span class="dv">3</span><span class="op">:</span><span class="dv">0</span><span class="op">]</span> r<span class="op">;</span></span>
<span id="cb7-29"><a href="#cb7-29" aria-hidden="true" tabindex="-1"></a>logic <span class="op">[</span><span class="dv">3</span><span class="op">:</span><span class="dv">0</span><span class="op">]</span> g<span class="op">;</span></span>
<span id="cb7-30"><a href="#cb7-30" aria-hidden="true" tabindex="-1"></a>logic <span class="op">[</span><span class="dv">3</span><span class="op">:</span><span class="dv">0</span><span class="op">]</span> b<span class="op">;</span></span>
<span id="cb7-31"><a href="#cb7-31" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-32"><a href="#cb7-32" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> horizontal_sync <span class="op">=</span> column <span class="op">&lt;</span> <span class="op">(</span><span class="ot">`HORIZONTAL_FRONTPORCH</span><span class="op">)</span> <span class="op">||</span> column <span class="op">&gt;=</span> <span class="op">(</span><span class="ot">`HORIZONTAL_BACKPORCH</span><span class="op">);</span></span>
<span id="cb7-33"><a href="#cb7-33" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> vertical_sync <span class="op">=</span> line <span class="op">&lt;</span> <span class="op">(</span><span class="ot">`VERTICAL_FRONTPORCH</span><span class="op">)</span> <span class="op">||</span> line <span class="op">&gt;=</span> <span class="op">(</span><span class="ot">`VERTICAL_BACKPORCH</span><span class="op">);</span></span>
<span id="cb7-34"><a href="#cb7-34" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> data_enable <span class="op">=</span> <span class="op">(</span>column <span class="op">&lt;</span> <span class="ot">`PIXELS_VISIBLE_PER_LINE</span><span class="op">)</span> <span class="op">&amp;&amp;</span> <span class="op">(</span>line <span class="op">&lt;</span> <span class="ot">`LINES_VISIBLE_PER_FRAME</span><span class="op">);</span></span>
<span id="cb7-35"><a href="#cb7-35" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-36"><a href="#cb7-36" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> reset <span class="op">=</span> <span class="op">~</span>BTN_N<span class="op">;</span></span>
<span id="cb7-37"><a href="#cb7-37" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> LEDR_N <span class="op">=</span> <span class="dv">1</span><span class="op">;</span></span>
<span id="cb7-38"><a href="#cb7-38" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> LEDG_N <span class="op">=</span> <span class="dv">1</span><span class="op">;</span></span>
<span id="cb7-39"><a href="#cb7-39" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-40"><a href="#cb7-40" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> r <span class="op">=</span> <span class="bn">4&#39;b1111</span><span class="op">;</span></span>
<span id="cb7-41"><a href="#cb7-41" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> g <span class="op">=</span> <span class="bn">4&#39;b0000</span><span class="op">;</span></span>
<span id="cb7-42"><a href="#cb7-42" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> b <span class="op">=</span> <span class="bn">4&#39;b0000</span><span class="op">;</span></span>
<span id="cb7-43"><a href="#cb7-43" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-44"><a href="#cb7-44" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> <span class="op">{</span>P1A1<span class="op">,</span> P1A2<span class="op">,</span> P1A3<span class="op">,</span> P1A4<span class="op">,</span> P1A7<span class="op">,</span> P1A8<span class="op">,</span> P1A9<span class="op">,</span> P1A10<span class="op">}</span> <span class="op">=</span> </span>
<span id="cb7-45"><a href="#cb7-45" aria-hidden="true" tabindex="-1"></a> <span class="op">{</span>r<span class="op">[</span><span class="dv">3</span><span class="op">],</span> r<span class="op">[</span><span class="dv">2</span><span class="op">],</span> g<span class="op">[</span><span class="dv">3</span><span class="op">],</span> g<span class="op">[</span><span class="dv">2</span><span class="op">],</span> r<span class="op">[</span><span class="dv">1</span><span class="op">],</span> r<span class="op">[</span><span class="dv">0</span><span class="op">],</span> g<span class="op">[</span><span class="dv">1</span><span class="op">],</span> g<span class="op">[</span><span class="dv">0</span><span class="op">]};</span></span>
<span id="cb7-46"><a href="#cb7-46" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> <span class="op">{</span>P1B1<span class="op">,</span> P1B2<span class="op">,</span> P1B3<span class="op">,</span> P1B4<span class="op">,</span> P1B7<span class="op">,</span> P1B8<span class="op">,</span> P1B9<span class="op">,</span> P1B10<span class="op">}</span> <span class="op">=</span> </span>
<span id="cb7-47"><a href="#cb7-47" aria-hidden="true" tabindex="-1"></a> <span class="op">{</span>b<span class="op">[</span><span class="dv">3</span><span class="op">],</span> pixel_clock<span class="op">,</span> b<span class="op">[</span><span class="dv">2</span><span class="op">],</span> horizontal_sync<span class="op">,</span> b<span class="op">[</span><span class="dv">1</span><span class="op">],</span> b<span class="op">[</span><span class="dv">0</span><span class="op">],</span> data_enable<span class="op">,</span> vertical_sync<span class="op">};</span></span>
<span id="cb7-48"><a href="#cb7-48" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-49"><a href="#cb7-49" aria-hidden="true" tabindex="-1"></a><span class="co">// Pixel and line counter</span></span>
<span id="cb7-50"><a href="#cb7-50" aria-hidden="true" tabindex="-1"></a><span class="kw">always</span> <span class="op">@(</span><span class="kw">posedge</span> pixel_clock <span class="dt">or</span> <span class="kw">posedge</span> reset<span class="op">)</span> <span class="kw">begin</span></span>
<span id="cb7-51"><a href="#cb7-51" aria-hidden="true" tabindex="-1"></a> <span class="kw">if</span><span class="op">(</span>reset <span class="op">==</span> <span class="dv">1</span><span class="op">)</span> <span class="kw">begin</span></span>
<span id="cb7-52"><a href="#cb7-52" aria-hidden="true" tabindex="-1"></a> line <span class="op">&lt;=</span> <span class="ot">`LINES_PER_FRAME</span> <span class="op">-</span> <span class="dv">2</span><span class="op">;</span></span>
<span id="cb7-53"><a href="#cb7-53" aria-hidden="true" tabindex="-1"></a> column <span class="op">&lt;=</span> <span class="ot">`PIXELS_PER_LINE</span> <span class="op">-</span> <span class="dv">16</span><span class="op">;</span></span>
<span id="cb7-54"><a href="#cb7-54" aria-hidden="true" tabindex="-1"></a> <span class="kw">end</span></span>
<span id="cb7-55"><a href="#cb7-55" aria-hidden="true" tabindex="-1"></a> <span class="kw">else</span> <span class="kw">begin</span></span>
<span id="cb7-56"><a href="#cb7-56" aria-hidden="true" tabindex="-1"></a> <span class="kw">if</span><span class="op">(</span>column <span class="op">==</span> <span class="op">(</span><span class="ot">`PIXELS_PER_LINE</span> <span class="op">-</span> <span class="dv">1</span><span class="op">)</span> <span class="op">&amp;&amp;</span> line <span class="op">==</span> <span class="op">(</span><span class="ot">`LINES_PER_FRAME</span> <span class="op">-</span> <span class="dv">1</span><span class="op">))</span> <span class="kw">begin</span></span>
<span id="cb7-57"><a href="#cb7-57" aria-hidden="true" tabindex="-1"></a> line <span class="op">&lt;=</span> <span class="dv">0</span><span class="op">;</span></span>
<span id="cb7-58"><a href="#cb7-58" aria-hidden="true" tabindex="-1"></a> column <span class="op">&lt;=</span> <span class="dv">0</span><span class="op">;</span></span>
<span id="cb7-59"><a href="#cb7-59" aria-hidden="true" tabindex="-1"></a> <span class="kw">end</span></span>
<span id="cb7-60"><a href="#cb7-60" aria-hidden="true" tabindex="-1"></a> <span class="kw">else</span> <span class="kw">if</span><span class="op">(</span>column <span class="op">==</span> <span class="ot">`PIXELS_PER_LINE</span> <span class="op">-</span> <span class="dv">1</span><span class="op">)</span> <span class="kw">begin</span></span>
<span id="cb7-61"><a href="#cb7-61" aria-hidden="true" tabindex="-1"></a> line <span class="op">&lt;=</span> line <span class="op">+</span> <span class="dv">1</span><span class="op">;</span></span>
<span id="cb7-62"><a href="#cb7-62" aria-hidden="true" tabindex="-1"></a> column <span class="op">&lt;=</span> <span class="dv">0</span><span class="op">;</span></span>
<span id="cb7-63"><a href="#cb7-63" aria-hidden="true" tabindex="-1"></a> <span class="kw">end</span></span>
<span id="cb7-64"><a href="#cb7-64" aria-hidden="true" tabindex="-1"></a> <span class="kw">else</span> <span class="kw">begin</span></span>
<span id="cb7-65"><a href="#cb7-65" aria-hidden="true" tabindex="-1"></a> column <span class="op">&lt;=</span> column <span class="op">+</span> <span class="dv">1</span><span class="op">;</span></span>
<span id="cb7-66"><a href="#cb7-66" aria-hidden="true" tabindex="-1"></a> <span class="kw">end</span></span>
<span id="cb7-67"><a href="#cb7-67" aria-hidden="true" tabindex="-1"></a> <span class="kw">end</span></span>
<span id="cb7-68"><a href="#cb7-68" aria-hidden="true" tabindex="-1"></a><span class="kw">end</span></span>
<span id="cb7-69"><a href="#cb7-69" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-70"><a href="#cb7-70" aria-hidden="true" tabindex="-1"></a>SB_PLL40_PAD #<span class="op">(</span></span>
<span id="cb7-71"><a href="#cb7-71" aria-hidden="true" tabindex="-1"></a> .DIVR<span class="op">(</span><span class="bn">4&#39;b0000</span><span class="op">),</span></span>
<span id="cb7-72"><a href="#cb7-72" aria-hidden="true" tabindex="-1"></a> .DIVF<span class="op">(</span><span class="bn">7&#39;b1000010</span><span class="op">),</span></span>
<span id="cb7-73"><a href="#cb7-73" aria-hidden="true" tabindex="-1"></a> .DIVQ<span class="op">(</span><span class="bn">3&#39;b101</span><span class="op">),</span></span>
<span id="cb7-74"><a href="#cb7-74" aria-hidden="true" tabindex="-1"></a> .FILTER_RANGE<span class="op">(</span><span class="bn">3&#39;b001</span><span class="op">),</span></span>
<span id="cb7-75"><a href="#cb7-75" aria-hidden="true" tabindex="-1"></a> .FEEDBACK_PATH<span class="op">(</span><span class="st">&quot;SIMPLE&quot;</span><span class="op">),</span></span>
<span id="cb7-76"><a href="#cb7-76" aria-hidden="true" tabindex="-1"></a> .DELAY_ADJUSTMENT_MODE_FEEDBACK<span class="op">(</span><span class="st">&quot;FIXED&quot;</span><span class="op">),</span></span>
<span id="cb7-77"><a href="#cb7-77" aria-hidden="true" tabindex="-1"></a> .FDA_FEEDBACK<span class="op">(</span><span class="bn">4&#39;b0000</span><span class="op">),</span></span>
<span id="cb7-78"><a href="#cb7-78" aria-hidden="true" tabindex="-1"></a> .DELAY_ADJUSTMENT_MODE_RELATIVE<span class="op">(</span><span class="st">&quot;FIXED&quot;</span><span class="op">),</span></span>
<span id="cb7-79"><a href="#cb7-79" aria-hidden="true" tabindex="-1"></a> .FDA_RELATIVE<span class="op">(</span><span class="bn">4&#39;b0000</span><span class="op">),</span></span>
<span id="cb7-80"><a href="#cb7-80" aria-hidden="true" tabindex="-1"></a> .SHIFTREG_DIV_MODE<span class="op">(</span><span class="bn">2&#39;b00</span><span class="op">),</span></span>
<span id="cb7-81"><a href="#cb7-81" aria-hidden="true" tabindex="-1"></a> .PLLOUT_SELECT<span class="op">(</span><span class="st">&quot;GENCLK&quot;</span><span class="op">),</span></span>
<span id="cb7-82"><a href="#cb7-82" aria-hidden="true" tabindex="-1"></a> .ENABLE_ICEGATE<span class="op">(</span><span class="bn">1&#39;b0</span><span class="op">)</span></span>
<span id="cb7-83"><a href="#cb7-83" aria-hidden="true" tabindex="-1"></a><span class="op">)</span> usb_pll_inst <span class="op">(</span></span>
<span id="cb7-84"><a href="#cb7-84" aria-hidden="true" tabindex="-1"></a> .PACKAGEPIN<span class="op">(</span>CLK<span class="op">),</span></span>
<span id="cb7-85"><a href="#cb7-85" aria-hidden="true" tabindex="-1"></a> .PLLOUTCORE<span class="op">(</span>pixel_clock<span class="op">),</span></span>
<span id="cb7-86"><a href="#cb7-86" aria-hidden="true" tabindex="-1"></a> .EXTFEEDBACK<span class="op">(),</span></span>
<span id="cb7-87"><a href="#cb7-87" aria-hidden="true" tabindex="-1"></a> .DYNAMICDELAY<span class="op">(),</span></span>
<span id="cb7-88"><a href="#cb7-88" aria-hidden="true" tabindex="-1"></a> .RESETB<span class="op">(</span><span class="bn">1&#39;b1</span><span class="op">),</span></span>
<span id="cb7-89"><a href="#cb7-89" aria-hidden="true" tabindex="-1"></a> .BYPASS<span class="op">(</span><span class="bn">1&#39;b0</span><span class="op">),</span></span>
<span id="cb7-90"><a href="#cb7-90" aria-hidden="true" tabindex="-1"></a> .LATCHINPUTVALUE<span class="op">()</span></span>
<span id="cb7-91"><a href="#cb7-91" aria-hidden="true" tabindex="-1"></a><span class="op">);</span></span>
<span id="cb7-92"><a href="#cb7-92" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-93"><a href="#cb7-93" aria-hidden="true" tabindex="-1"></a><span class="kw">endmodule</span></span></code></pre></div>
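<p>As a sanity check on the PLL parameters, the pixel clock they produce
can be worked out on the host. This is a rough sketch under two
assumptions that are not stated in the code above: that the iCEBreaker
feeds the PLL from a 12 MHz oscillator, and that in the
<code>"SIMPLE"</code> feedback mode the output follows
<code>f_out = f_ref * (DIVF + 1) / (2^DIVQ * (DIVR + 1))</code>. The
helper names are made up for illustration.</p>

```cpp
#include <cassert>
#include <cstdint>

// PLL output frequency in Hz, assuming the iCE40 "SIMPLE" feedback path:
//   f_out = f_ref * (DIVF + 1) / (2^DIVQ * (DIVR + 1))
double pll_output_hz(double ref_hz, std::uint32_t divr, std::uint32_t divf,
                     std::uint32_t divq) {
    // DIVQ is a 3-bit field, so the shift below can never exceed 7.
    assert(divq <= 7);
    return ref_hz * (divf + 1) / (static_cast<double>(1u << divq) * (divr + 1));
}

// Frames per second for a given pixel clock and total frame size
// (visible area plus blanking).
double refresh_rate_hz(double pixel_clock_hz, std::uint32_t pixels_per_line,
                       std::uint32_t lines_per_frame) {
    assert(pixels_per_line > 0 && lines_per_frame > 0);
    return pixel_clock_hz / (static_cast<double>(pixels_per_line) * lines_per_frame);
}
```

<p>With DIVR = 0, DIVF = 66 (<code>7'b1000010</code>) and DIVQ = 5,
<code>pll_output_hz(12e6, 0, 66, 5)</code> works out to 25.125 MHz, and
<code>refresh_rate_hz</code> for an 800x525 frame lands a little under
60 Hz, which is close enough for a monitor to treat it as
640x480@60 Hz.</p>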
<p>To build this, you will require a .pcf file describing the pin
mapping of the iCEBreaker board. I grabbed mine from the iCEBreaker
examples <a
href="https://raw.githubusercontent.com/icebreaker-fpga/icebreaker-examples/master/icebreaker.pcf">here</a>.
Grab that file and put it in the same folder as the code provided above,
saved as top.sv. We can then run the following commands to generate a
binary to program onto the FPGA:</p>
<pre><code>yosys -ql out.log -p &#39;synth_ice40 -top top -json out.json&#39; top.sv
nextpnr-ice40 --up5k --json out.json --pcf icebreaker.pcf --asc out.asc
icetime -d up5k -mtr out.rpt out.asc
icepack out.asc out.bin</code></pre>
<p>This will generate an out.bin file that we will need to flash onto
the board. Make sure your iCEBreaker FPGA is connected via USB to your
computer, then program it with the following command:</p>
<pre><code>iceprog out.bin</code></pre>
<p>Now connect up a video cable (my DVI Pmod has an HDMI connector, but
it only carries the DVI video signal) to the board and monitor and you
should get results like this:</p>
<p><img
src="/assets/2020-04-07-generating-video/IMG_20200407_172119-1-1024x768.jpg" /></p>
<p>You can also see from the monitor settings menu that the video signal
was recognized as 640x480@60 Hz. Now the code presented in this post is
specific to the iCEBreaker board and the DVI Pmod, but the theory can be
applied to any FPGA and any connector that uses a video signal like
this. For example you could wire up a DAC with a resistor ladder to
generate a VGA signal. The logic for the timings here would be exactly
the same if you wanted a 640x480@60 Hz VGA signal.</p>
</description><pubDate>Tue, 07 Apr 2020 04:00:00 -0000</pubDate><guid>https://fryzekconcepts.com/notes/generating-video.html</guid></item><item><title>N64Brew GameJam 2021</title><link>https://fryzekconcepts.com/notes/n64brew-gamejam-2021.html</link><description><p>So this year, myself and two others decided to participate together
in the N64Brew homebrew GameJam, where we were supposed to build a
homebrew game that would run on a real Nintendo 64. The game jam took
place from October 8th until December 8th and was the second GameJam in
N64Brew history. Unfortunately, we never ended up finishing the game,
but we did build a really cool tech demo. Our project was called
“Bug Game”, and if you want to check it out you can find it <a
href="https://hazematman.itch.io/bug-game">here</a>. To play the game
you’ll need a flash cart to load it on a real Nintendo 64, or you can
use an accurate emulator such as <a
href="https://ares.dev/">ares</a> or <a
href="https://github.com/n64dev/cen64">cen64</a>. The reason an accurate
emulator is required is that we made use of this new open source 3D
microcode for N64 called “<a
href="https://github.com/snacchus/libdragon/tree/ugfx">ugfx</a>”,
created by the user Snacchus. This microcode is part of the Libdragon
project, which is trying to build a completely open source library and
toolchain to build N64 games, instead of relying on the official SDK
that has been leaked to the public through liquidation auctions of game
companies that have shut down over the years.</p>
<div class="gallery">
<p><img src="/assets/2021-12-10-n64brew-gamejam-2021/bug_1.png" /> <img
src="/assets/2021-12-10-n64brew-gamejam-2021/bug_2.png" /> <img
src="/assets/2021-12-10-n64brew-gamejam-2021/bug_4.png" /> <img
src="/assets/2021-12-10-n64brew-gamejam-2021/bug_5.png" /> <img
src="/assets/2021-12-10-n64brew-gamejam-2021/bug_3.png" /></p>
<p>Screenshots of Bug Game</p>
</div>
<h2 id="libdragon-and-ugfx">Libdragon and UGFX</h2>
<p>Ugfx was a brand new development in the N64 homebrew scene. By
complete coincidence, Snacchus happened to release it on September 21st,
just weeks before the GameJam was announced. There have been many
attempts to create an open source 3D microcode for the N64 (my <a
href="https://github.com/Hazematman/libhfx">libhfx</a> project
included), but ugfx was the first project to be completed and to ship
with easily usable documentation and examples. This was an exciting
development for the open source N64 homebrew community, as for the first
time we could build 3D games that ran on the N64 without using the
legally questionable official SDK. I jumped at the opportunity to use it
and build one of the first fully 3D games running on Libdragon.</p>
<p>One of the “drawbacks” of ugfx was that it tried to follow a lot of
the design decisions that Nintendo’s official 3D microcode used. This
made it easier for people familiar with the official SDK to jump ship
over to libdragon, but also went against the philosophy of the libdragon
project to provide simple, easy to use APIs. The Nintendo 64 was
notoriously difficult to develop for, and one of the reasons for that
was the extremely low level interface that the official 3D
microcodes provided. Honestly, writing 3D graphics code on the N64
reminds me more of writing a 3D OpenGL graphics driver (like I do in my
day job) than building a graphics application, which unnecessarily raises
the barrier to entry for developing 3D games on the Nintendo 64. Now that
ugfx has been released, there is an ongoing effort in the community to
revamp it and build a more user friendly API to access the 3D
functionality of the N64.</p>
<h2 id="ease-of-development">Ease of development</h2>
<p>One of the major selling points of libdragon is that it tries to
provide a standard toolchain with access to things like the C standard
library as well as the C++ standard library. To save time on the
development of Bug Game, I decided to put that claim to the test. When
building a 3D game from scratch, two things that can be extremely time
consuming are implementing linear algebra operations and implementing
physics that work in 3D. Luckily for modern developers, there are many
open source libraries you can use instead of building these from
scratch, like <a href="https://glm.g-truc.net/0.9.9/">GLM</a> for math
operations and <a
href="https://github.com/bulletphysics/bullet3">Bullet</a> for physics.
I don’t believe anyone has tried to do this before, but knowing that
libdragon provides a pretty standard C++ development environment I tried
to build GLM and Bullet to run on the Nintendo 64 and I was successful!
Both GLM and Bullet were able to run on real N64 hardware. This saved
time during development as we were no longer concerned with having to
build our own physics or math libraries. I did need a couple of tricks
to get Bullet running on the hardware.</p>
<p>First, Bullet will allocate more memory for its internal pools than is
available on the N64. This is an easy fix, as you can adjust the pool
sizes when you initialize Bullet using the code below:</p>
<div class="sourceCode" id="cb1"><pre
class="sourceCode cpp"><code class="sourceCode cpp"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a>btDefaultCollisionConstructionInfo constructionInfo <span class="op">=</span> btDefaultCollisionConstructionInfo<span class="op">();</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>constructionInfo<span class="op">.</span><span class="va">m_defaultMaxCollisionAlgorithmPoolSize</span> <span class="op">=</span> <span class="dv">512</span><span class="op">;</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a>constructionInfo<span class="op">.</span><span class="va">m_defaultMaxPersistentManifoldPoolSize</span> <span class="op">=</span> <span class="dv">512</span><span class="op">;</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a>btDefaultCollisionConfiguration<span class="op">*</span> collisionConfiguration <span class="op">=</span> <span class="kw">new</span> btDefaultCollisionConfiguration<span class="op">(</span>constructionInfo<span class="op">);</span></span></code></pre></div>
<p>This lets you shrink Bullet’s internal memory pools. Note that in
Bullet’s source these two fields are passed to the pool allocator as a
maximum number of elements rather than a size in KB, so the code above
caps each pool at 512 entries. That reduces the memory Bullet reserves
up front enough to easily run within the 4MB of RAM that is available on
the N64 without the expansion pak (an accessory to the N64 that
increases the available RAM to 8MB).</p>
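<p>To see why shrinking the pools matters on a machine with 4MB of RAM,
it helps to look at how a fixed-size pool behaves. The class below is a
hypothetical, stripped-down pool written for this post, not Bullet’s
actual allocator, and the 64 byte element size used in the note after it
is made up. It ignores alignment and freeing to stay short. The point is
only that all of the storage is reserved when the pool is constructed,
whether or not it ever gets used.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-size pool (not Bullet's allocator): all storage is
// reserved in the constructor, so the footprint is known before any object
// is ever allocated from it.
class FixedPool {
public:
    FixedPool(std::size_t element_size, std::size_t max_elements)
        : element_size_(element_size),
          max_elements_(max_elements),
          used_(0),
          storage_(element_size * max_elements) {
        assert(element_size > 0);
    }

    // Bytes reserved up front, regardless of how many slots are in use.
    std::size_t reserved_bytes() const { return storage_.size(); }

    // Hand out the next unused slot, or nullptr once the pool is exhausted.
    void* allocate() {
        if (used_ == max_elements_) {
            return nullptr;
        }
        return storage_.data() + element_size_ * used_++;
    }

private:
    std::size_t element_size_;
    std::size_t max_elements_;
    std::size_t used_;
    std::vector<unsigned char> storage_;
};
```

<p>For example, <code>FixedPool(64, 4096)</code> reserves eight times as
much memory as <code>FixedPool(64, 512)</code>, even if the game only
ever has a handful of objects colliding at once.</p>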
<p>The second issue I ran into with Bullet was that the N64 floating
point unit does not implement de-normalized floating point numbers. Now
I’m not an expert in floating point numbers, but from my understanding,
de-normalized numbers are a way to represent values between the smallest
normal floating point number and zero. This allows floating point
calculations to slowly fall towards zero in a more accurate way instead
of rounding directly to zero. Since the N64 CPU does not implement
de-normalized floats, any calculation that would have generated a
de-normalized float on the N64 instead causes a floating point
exception. Because of the way the physics engine works, two objects
getting very close together would cause de-normalized floats to be
generated and crash the game. This was a problem that had me stumped for
a bit. I was concerned I would have to go into Bullet’s source code and
modify any calculations to round to zero if the result would be small
enough. This would have been a monumental effort! Thankfully after
digging through the NEC VR4300 programmer’s manual I was able to
discover that there is a mode you can set the FPU to, which forces
rounding towards zero if a de-normalized float would be generated. I
enabled this mode and tested it out, and all my floating point troubles
were resolved! I submitted a <a
href="https://github.com/DragonMinded/libdragon/pull/195">pull
request</a> (that was accepted) to the libdragon project to have this
implemented by default, so no one else will run into the same annoying
problems I ran into.</p>
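<p>If de-normalized numbers are new to you, they are easy to poke at on
a desktop machine. The sketch below is host-side C++, not N64 code, and
the function names are made up for illustration. It assumes a normal
IEEE 754 build (no <code>-ffast-math</code> or similar flags). It
classifies a value as de-normalized and emulates in software what a
flush-to-zero mode does. On the VR4300 the flushing is done by the FPU
itself once the mode is enabled (I believe this is the FS bit in the
FPU control/status register, FCR31), so no per-calculation code like
this is needed there.</p>

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// True if x lies strictly between zero and the smallest normal float.
bool is_denormal(float x) {
    return std::fpclassify(x) == FP_SUBNORMAL;
}

// Software flush-to-zero: replace a de-normalized value with a zero of the
// same sign, and pass every other value through untouched.
float flush_to_zero(float x) {
    return is_denormal(x) ? std::copysign(0.0f, x) : x;
}

// Repeatedly halve the smallest normal float and count how many halvings it
// takes to reach zero, with or without flushing de-normalized results.
int halvings_until_zero(bool flush) {
    float x = std::numeric_limits<float>::min();  // smallest normal float
    assert(std::fpclassify(x) == FP_NORMAL);

    int steps = 0;
    while (x != 0.0f) {
        x *= 0.5f;
        if (flush) {
            x = flush_to_zero(x);
        }
        ++steps;
    }
    return steps;
}
```

<p>With flushing enabled the value hits zero after a single halving,
because the very first result is already de-normalized. Without it, the
value takes many more steps to fade out, which is the “slowly fall
towards zero” behaviour described above.</p>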
<h2 id="whats-next">What’s next?</h2>
<p>If you decided to play our game you probably would have noticed that
it’s not very much of a game. Even though this is the case I’m very
happy with how the project turned out, as it’s one of the first 3D
libdragon projects to be released. It also easily makes use of amazing
open technologies like bullet physics, showcasing just how easy
libdragon is to integrate with modern tools and libraries. As I
mentioned before in this post there is an effort to take Snacchus’s work
and build an easier to use graphics API that feels more like building
graphics applications and less like building a graphics driver. The
effort for that has already started and I plan to contribute to it. Some
of the cool features this effort is bringing are:</p>
<ul>
<li>A standard interface for display lists and microcode overlays,
allowing multiple different microcodes to seamlessly run on the RSP
and swap out with display list commands. This will be valuable for using
the RSP for audio and graphics at the same time.</li>
<li>A new 3D microcode that takes some lessons learned from ugfx to
build a more powerful and easier to use interface.</li>
</ul>
<p>Overall this is an exciting time for Nintendo 64 homebrew
development! It’s easier than ever to build homebrew on the N64 without
knowing about the arcane innards of the console. I hope that this
continued development of libdragon will bring more people to the scene
and allow us to see new and novel games running on the N64. One project
I would be excited to start working on is using the serial port on
modern N64 Flashcarts for networking, allowing the N64 to have online
multiplayer through a computer connected over USB. I feel that projects
like this could really elevate the kind of content that is available on
the N64 and bring it into the modern era.</p>
</description><pubDate>Fri, 10 Dec 2021 05:00:00 -0000</pubDate><guid>https://fryzekconcepts.com/notes/n64brew-gamejam-2021.html</guid></item><item><title>Rasterizing Triangles</title><link>https://fryzekconcepts.com/notes/rasterizing-triangles.html</link><description><p>Lately I’ve been trying to implement a software renderer <a
href="https://www.cs.drexel.edu/~david/Classes/Papers/comp175-06-pineda.pdf">following
the algorithm described by Juan Pineda in “A Parallel Algorithm for
Polygon Rasterization”</a>. For those unfamiliar with the paper, it
describes an algorithm to rasterize triangles that has an extremely nice
quality: you only need to perform a few additions per pixel to
see if the next pixel is inside the triangle. It achieves this quality
by defining an edge function that has the following property:</p>
<pre><code>E(x+1,y) = E(x,y) + dY
E(x,y+1) = E(x,y) - dX</code></pre>
<p>This property is extremely nice for a rasterizer as additions are
quite cheap to perform and with this method we limit the amount of work
we have to do per pixel. One frustrating quality of this paper is that
it suggests that you can calculate more properties than just whether a
pixel is inside the triangle with simple addition, but provides no
explanation for how to do that. In this post I would like to explore how
you implement a Pineda style rasterizer that can calculate per pixel
values using simple addition.</p>
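<p>As a quick sanity check of that property, here is a minimal C sketch.
It uses the same form of the edge function as the pseudo code in this
post; the helper <code>edge_steps_hold</code> is a name I made up for
this example.</p>
<pre class="c"><code>/* Edge function for the edge that passes through (xi, yi) with
 * direction (dX, dY). Integer math keeps the results exact. */
int edge(int x, int y, int xi, int yi, int dX, int dY)
{
    return (x - xi) * dY - (y - yi) * dX;
}

/* Hypothetical helper: returns 1 if stepping one pixel in x adds dY and
 * stepping one pixel in y subtracts dX, which is the property above. */
int edge_steps_hold(int x, int y, int xi, int yi, int dX, int dY)
{
    int e = edge(x, y, xi, yi, dX, dY);
    return edge(x + 1, y, xi, yi, dX, dY) == e + dY
        &amp;&amp; edge(x, y + 1, xi, yi, dX, dY) == e - dX;
}</code></pre>
<p>Because the edge function is linear in <code>x</code> and
<code>y</code>, the check holds for any starting pixel and any edge.</p>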
<figure>
<img
src="/assets/2022-04-03-rasterizing-triangles/Screenshot-from-2022-04-03-13-43-13.png"
alt="Triangle rasterized using code in this post" />
<figcaption aria-hidden="true">Triangle rasterized using code in this
post</figcaption>
</figure>
<p>In order to figure out how to build this rasterizer <a
href="https://www.reddit.com/r/GraphicsProgramming/comments/tqxxmu/interpolating_values_in_a_pineda_style_rasterizer/">I
reached out to the internet</a> to help build some more intuition on
the properties of this rasterizer. From this reddit post I gained more
intuition on how we can use the edge function values to linearly
interpolate values on the triangle. Here is the relevant comment that
gave me all the information I needed:</p>
<blockquote>
<p>Think about the edge function’s key property:</p>
<p><em>recognize that the formula given for E(x,y) is the same as the
formula for the magnitude of the cross product between the vector from
(X,Y) to (X+dX, Y+dY), and the vector from (X,Y) to (x,y). By the well
known property of cross products, the magnitude is zero if the vectors
are colinear, and changes sign as the vectors cross from one side to the
other.</em></p>
<p>The magnitude of the edge distance is the area of the parallelogram
formed by <code>(X,Y)-&gt;(X+dX,Y+dY)</code> and
<code>(X,Y)-&gt;(x,y)</code>. If you normalize by the parallelogram area
at the <em>other</em> point in the triangle you get a barycentric
coordinate that’s 0 along the <code>(X,Y)-&gt;(X+dX,Y+dY)</code> edge
and 1 at the other point. You can precompute each interpolated triangle
parameter normalized by this area at setup time, and in fact most
hardware computes per-pixel step values (pre 1/w correction) so that all
the parameters are computed as a simple addition as you walk along each
raster.</p>
<p>Note that when you’re implementing all of this it’s critical to keep
all the math in the integer domain (snapping coordinates to some integer
sub-pixel precision, I’d recommend at least 4 bits) and using a
tie-breaking function (typically top-left) for pixels exactly on the
edge to avoid pixel double-hits or gaps in adjacent triangles.</p>
<p>https://www.reddit.com/r/GraphicsProgramming/comments/tqxxmu/interpolating_values_in_a_pineda_style_rasterizer/i2krwxj/</p>
</blockquote>
<p>From this comment you can see that it is trivial to calculate the
barycentric coordinates of the triangle from the edge function. You
simply need to divide the calculated edge function value by the area of
the parallelogram. Now what is the area of the triangle? Well this is
where some <a
href="https://www.scratchapixel.com/lessons/3d-basic-rendering/ray-tracing-rendering-a-triangle/barycentric-coordinates">more
research</a> online helped. If the edge function defines the area of a
parallelogram (2 times the area of the triangle) of
<code>(X,Y)-&gt;(X+dX,Y+dY)</code> and <code>(X,Y)-&gt;(x,y)</code>, and
we calculate three edge function values (one for each edge), then we
have 2 times the area of each of the sub triangles that are defined by
our point.</p>
<figure>
<img
src="https://www.scratchapixel.com/images/ray-triangle/barycentric.png?"
alt="Triangle barycentric coordinates from scratchpixel tutorial" />
<figcaption aria-hidden="true">Triangle barycentric coordinates from
scratchpixel tutorial</figcaption>
</figure>
<p>From this it’s trivial to see that we can calculate 2 times the area
of the triangle just by adding up all the individual areas of the sub
triangles (I used triangles here, but really we are adding the area of
sub parallelograms to get the area of the whole parallelogram that has 2
times the area of the triangle we are drawing), that is, adding the
value of all the edge functions together. From this we can see that to
linearly interpolate any value on the triangle we can use the following
equation:</p>
<pre><code>Value(x,y) = (e0*v0 + e1*v1 + e2*v2) / (e0 + e1 + e2)
Value(x,y) = (e0*v0 + e1*v1 + e2*v2) / area</code></pre>
<p>Where <code>e0, e1, e2</code> are the edge function values and
<code>v0, v1, v2</code> are the per vertex values we want to
interpolate.</p>
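<p>Written out in C, the equation could look like the sketch below. The
function name <code>interpolate</code> and the choice of
<code>double</code> are assumptions made for this example. The one
detail to be careful with is the pairing: each edge function has to be
paired with the vertex <em>opposite</em> its edge, since that is the
vertex where it reaches its maximum.</p>
<pre class="c"><code>/* Same form of the edge function as in the pseudo code of this post. */
static double edge(double x, double y, double xi, double yi,
                   double dX, double dY)
{
    return (x - xi) * dY - (y - yi) * dX;
}

/* Interpolates v0, v1, v2 (the values at vertices 0, 1, 2) at (x, y).
 * Edge i is the edge opposite vertex i, so ei is zero on that edge and
 * equal to the full area at vertex i. */
double interpolate(double x, double y,
                   double x0, double y0, double x1, double y1,
                   double x2, double y2,
                   double v0, double v1, double v2)
{
    double e0 = edge(x, y, x1, y1, x2 - x1, y2 - y1); /* edge 1 -&gt; 2 */
    double e1 = edge(x, y, x2, y2, x0 - x2, y0 - y2); /* edge 2 -&gt; 0 */
    double e2 = edge(x, y, x0, y0, x1 - x0, y1 - y0); /* edge 0 -&gt; 1 */
    double area = e0 + e1 + e2; /* same value at every point (x, y) */
    return (e0 * v0 + e1 * v1 + e2 * v2) / area;
}</code></pre>
<p>With the vertices <code>(0,0)</code>, <code>(4,0)</code>,
<code>(0,4)</code> and the values 10, 20 and 30, evaluating at each
vertex gives back that vertex’s value, and the point <code>(1,1)</code>
gives 17.5.</p>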
<p>This is great for calculating the interpolated value at a single
point, but we still haven’t achieved the property of calculating the
interpolated value per pixel with simple addition. To do that we need to
use the property of the edge function I described above:</p>
<pre><code>Value(x+1, y) = (E0(x+1, y)*v0 + E1(x+1, y)*v1 + E2(x+1, y)*v2) / area
Value(x+1, y) = ((e0+dY0)*v0 + (e1+dY1)*v1 + (e2+dY2)*v2) / area
Value(x+1, y) = (e0*v0 + dY0*v0 + e1*v1+dY1*v1 + e2*v2 + dY2*v2) / area
Value(x+1, y) = (e0*v0 + e1*v1 + e2*v2)/area + (dY0*v0 + dY1*v1 + dY2*v2)/area
Value(x+1, y) = Value(x,y) + (dY0*v0 + dY1*v1 + dY2*v2)/area</code></pre>
<p>From here we can see that if we work through all the math, we can
find this same property where the interpolated value is equal to the
previous interpolated value plus some number. Therefore if we
pre-compute this addition value, when we iterate over the pixels we only
need to add this pre-computed number to the interpolated value of the
previous pixel. We can repeat this process again to figure out the
equation of the pre-computed value for <code>Value(x, y+1)</code> but
I’ll save you the time and provide both equations below</p>
<pre><code>dYV = (dY0*v0 + dY1*v1 + dY2*v2)/area
dXV = (dX0*v0 + dX1*v1 + dX2*v2)/area
Value(x+1, y) = Value(x,y) + dYV
Value(x, y+1) = Value(x,y) - dXV</code></pre>
<p>Where <code>dY0, dY1, dY2</code> are the differences between y
coordinates as described in Pineda’s paper, <code>dX0, dX1, dX2</code>
are the differences in x coordinates as described in Pineda’s paper, and
the area is the pre-calculated sum of the edge functions.</p>
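<p>A small C sketch of that setup step is below. The names
<code>attr_step</code> and <code>attr_steps</code> are made up for this
example, and as before each edge is paired with the vertex opposite
it.</p>
<pre class="c"><code>/* Per-pixel step values for one interpolated attribute. */
typedef struct {
    double dYV; /* added to the value when x increases by one */
    double dXV; /* subtracted from the value when y increases by one */
} attr_step;

/* Hypothetical helper: computes the step values once per triangle. */
attr_step attr_steps(double x0, double y0, double x1, double y1,
                     double x2, double y2,
                     double v0, double v1, double v2)
{
    /* Edge i is the edge opposite vertex i. */
    double dX0 = x2 - x1, dY0 = y2 - y1;
    double dX1 = x0 - x2, dY1 = y0 - y2;
    double dX2 = x1 - x0, dY2 = y1 - y0;
    /* The sum of the three edge functions is the same everywhere. At
     * vertex 0 the other two are zero, so only e0 is left. */
    double area = (x0 - x1) * dY0 - (y0 - y1) * dX0;
    attr_step s;
    s.dYV = (dY0 * v0 + dY1 * v1 + dY2 * v2) / area;
    s.dXV = (dX0 * v0 + dX1 * v1 + dX2 * v2) / area;
    return s;
}</code></pre>
<p>For the triangle <code>(0,0)</code>, <code>(4,0)</code>,
<code>(0,4)</code> with values 10, 20 and 30 the interpolated value is
<code>10 + 2.5*x + 5*y</code>, so <code>dYV</code> comes out as 2.5 and
<code>dXV</code> as -5 (it is subtracted when stepping in y).</p>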
<p>Now you should be able to build a Pineda style rasterizer that can
calculate per pixel interpolated values using simple addition, by
following pseudo code like this:</p>
<pre><code>func edge(x, y, xi, yi, dXi, dYi):
    return (x - xi)*dYi - (y - yi)*dXi

func draw_triangle(x0, y0, x1, y1, x2, y2, v0, v1, v2):
    # Edge i is the edge opposite vertex i, so that ei is zero along
    # that edge and largest at vertex i, where it is paired with vi
    dX0 = x2 - x1
    dX1 = x0 - x2
    dX2 = x1 - x0
    dY0 = y2 - y1
    dY1 = y0 - y2
    dY2 = y1 - y0
    start_x = 0
    start_y = 0
    e0 = edge(start_x, start_y, x1, y1, dX0, dY0)
    e1 = edge(start_x, start_y, x2, y2, dX1, dY1)
    e2 = edge(start_x, start_y, x0, y0, dX2, dY2)
    area = e0 + e1 + e2
    dYV = (dY0*v0 + dY1*v1 + dY2*v2) / area
    dXV = (dX0*v0 + dX1*v1 + dX2*v2) / area
    v = (e0*v0 + e1*v1 + e2*v2) / area
    starting_e0 = e0
    starting_e1 = e1
    starting_e2 = e2
    starting_v = v
    for y = 0 to screen_height:
        for x = 0 to screen_width:
            if(e0 &gt;= 0 &amp;&amp; e1 &gt;= 0 &amp;&amp; e2 &gt;= 0)
                draw_pixel(x, y, v)
            # Step one pixel to the right
            e0 = e0 + dY0
            e1 = e1 + dY1
            e2 = e2 + dY2
            v = v + dYV
        # Step one row down from the start of the current row
        e0 = starting_e0 - dX0
        e1 = starting_e1 - dX1
        e2 = starting_e2 - dX2
        v = starting_v - dXV
        starting_e0 = e0
        starting_e1 = e1
        starting_e2 = e2
        starting_v = v</code></pre>
<p>Now this pseudo code is not the most efficient as it will iterate
over the entire screen to draw one triangle, but it provides a starting
basis to show how to use these Pineda properties to calculate per pixel
values. One thing to note if you do implement this: if you use fixed
point arithmetic, be careful to ensure you have enough precision to
calculate all of these values without overflow or underflow. I ran into
this issue myself, running out of precision when I did the divide by the
area.</p>
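<p>One straightforward way to avoid walking the whole screen is to only
visit the triangle’s bounding box. The C sketch below shows that idea
under a few assumptions of mine: vertex positions are whole pixels, the
interpolated value is a <code>float</code> rather than fixed point, the
output is a plain array instead of a real <code>draw_pixel</code>, and
the vertices are ordered so the edge functions are non-negative inside
the triangle. The function name <code>draw_triangle_bbox</code> is made
up for this example.</p>
<pre class="c"><code>/* Same form of the edge function as the pseudo code, in integer math. */
static int edge(int x, int y, int xi, int yi, int dX, int dY)
{
    return (x - xi) * dY - (y - yi) * dX;
}

static int min3(int a, int b, int c) { int m = a &lt; b ? a : b; return m &lt; c ? m : c; }
static int max3(int a, int b, int c) { int m = a &gt; b ? a : b; return m &gt; c ? m : c; }

/* Writes the interpolated value of every covered pixel into
 * out[y * width + x] and returns how many pixels were written. */
int draw_triangle_bbox(float *out, int width, int height,
                       int x0, int y0, int x1, int y1, int x2, int y2,
                       float v0, float v1, float v2)
{
    /* Edge i is the edge opposite vertex i. */
    int dX0 = x2 - x1, dY0 = y2 - y1;
    int dX1 = x0 - x2, dY1 = y0 - y2;
    int dX2 = x1 - x0, dY2 = y1 - y0;

    /* Bounding box of the triangle, clamped to the screen. */
    int min_x = max3(min3(x0, x1, x2), 0, 0);
    int min_y = max3(min3(y0, y1, y2), 0, 0);
    int max_x = min3(max3(x0, x1, x2), width - 1, width - 1);
    int max_y = min3(max3(y0, y1, y2), height - 1, height - 1);

    /* Edge functions at the top-left corner of the bounding box. */
    int row_e0 = edge(min_x, min_y, x1, y1, dX0, dY0);
    int row_e1 = edge(min_x, min_y, x2, y2, dX1, dY1);
    int row_e2 = edge(min_x, min_y, x0, y0, dX2, dY2);
    int area = row_e0 + row_e1 + row_e2;
    if (area &lt;= 0)
        return 0; /* degenerate or wrongly ordered triangle */

    float dYV = (dY0 * v0 + dY1 * v1 + dY2 * v2) / area;
    float dXV = (dX0 * v0 + dX1 * v1 + dX2 * v2) / area;
    float row_v = (row_e0 * v0 + row_e1 * v1 + row_e2 * v2) / area;

    int count = 0;
    for (int y = min_y; y &lt;= max_y; y++) {
        int e0 = row_e0, e1 = row_e1, e2 = row_e2;
        float v = row_v;
        for (int x = min_x; x &lt;= max_x; x++) {
            if (e0 &gt;= 0 &amp;&amp; e1 &gt;= 0 &amp;&amp; e2 &gt;= 0) {
                out[y * width + x] = v;
                count++;
            }
            /* One pixel to the right: additions only. */
            e0 += dY0; e1 += dY1; e2 += dY2;
            v += dYV;
        }
        /* One row down: again additions only. */
        row_e0 -= dX0; row_e1 -= dX1; row_e2 -= dX2;
        row_v -= dXV;
    }
    return count;
}</code></pre>
<p>The per pixel work is unchanged from the pseudo code; only the loop
bounds and the starting point of the edge functions differ. A real
implementation would likely also want the sub-pixel precision and
top-left tie-breaking rule mentioned in the reddit comment, which this
sketch leaves out.</p>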
</description><pubDate>Sun, 03 Apr 2022 04:00:00 -0000</pubDate><guid>https://fryzekconcepts.com/notes/rasterizing-triangles.html</guid></item><item><title>Baremetal RISC-V</title><link>https://fryzekconcepts.com/notes/baremetal-risc-v.html</link><description><p>After re-watching suckerpinch’s <a
href="https://www.youtube.com/watch?v=ar9WRwCiSr0">“Reverse
Emulation”</a> video I got inspired to try and replicate what he did,
but instead do it on an N64. Now my idea here is not to perform reverse
emulation on the N64 itself but instead to use a single-board computer
(SBC) as a cheap way to make a dev focused flash cart. Seeing that
suckerpinch was able to meet the timings of the NES bus made me think it
might be possible to meet the N64 bus timings taking an approach similar
to his.</p>
<h2 id="why-risc-v-baremetal">Why RISC-V Baremetal?</h2>
<p>The answer here is more utilitarian than idealistic. I originally
wanted to use a Raspberry Pi since I thought that board may be more
accessible if other people want to try and replicate this project.
Instead what I found is that it is impossible to procure a Raspberry Pi.
Not to be deterred, I remembered that I purchased an <a
href="https://linux-sunxi.org/Allwinner_Nezha">“Allwinner Nezha”</a> a
while back and it’s just been collecting dust in my storage. I figured
this would be a good project to test the board out on since it has a
large amount of RAM (1GB on my board), a fast processor (1 GHz), and
accessible GPIO. As for why baremetal? Well one of the big problems
suckerpinch ran into was being interrupted by the Linux kernel while his
software was running. The board was fast enough to respond to the bus
timings but Linux would throw off those timings with preemption. This is
why I’m taking the approach to do everything baremetal, giving 100% of
the CPU time to my program emulating the CPU bus.</p>
<h2 id="risc-v-baremetal-development">RISC-V Baremetal Development</h2>
<p>Below I’ll document how I got a baremetal program running on the
Nezha board, to provide guidance to anyone who wants to try doing
something like this themselves.</p>
<h3 id="toolchain-setup">Toolchain Setup</h3>
<p>In order to do any RISC-V development we will need to set up a RISC-V
toolchain that isn’t tied to a specific OS like Linux. Thankfully the
RISC-V org set up a simple to use git repo that has a script to build an
entire RISC-V toolchain on your machine. Since you’re building the whole
toolchain from source this will take some time. On my machine (Ryzen
4500u, 16GB of RAM, 1TB PCIe NVMe storage) it took around 30 minutes
to build the whole toolchain. You can find the repo <a
href="https://github.com/riscv-collab/riscv-gnu-toolchain">here</a>, and
follow the instructions in the <code>Installation (Newlib)</code>
section of the README. That will set up a bare bones OS independent
toolchain that can use newlib for the C standard library (not that I am
currently using it in my software).</p>
<h3 id="setting-up-a-program">Setting up a Program</h3>
<p>This is probably one of the more complicated steps in baremetal
programming as this will involve setting up a linker script, which can
sometimes feel like an act of black magic to get right. I’ll try to walk
through some linker script basics to show how I setup mine. The linker
script <code>linker.ld</code> I’m using is below</p>
<pre class="ld"><code>SECTIONS
{
    . = 0x45000000;
    .text : {
        PROVIDE(__text_start = .);
        *(.text.start)
        *(.text*)
        . = ALIGN(4096);
        PROVIDE(__text_end = .);
    }
    .data : {
        PROVIDE(__data_start = .);
        . = ALIGN(16);
        *(.rodata*);
        *(.data .data.*)
        PROVIDE(__data_end = .);
    }
    . += 1024;
    PROVIDE(__stack_start = .);
    . = ALIGN(16);
    . += 4096;
    PROVIDE(__stack_end = .);
    /DISCARD/ :
    {
        *(.riscv.attributes);
        *(.comment);
    }
}</code></pre>
<p>The purpose of a linker script is to describe how our binary will be
organized. The script I wrote does the following:</p>
<ol type="1">
<li>Set the starting address to <code>0x45000000</code>. This
is the address where we are going to load the binary into memory, so any
pointers in the program will need to be offset from this address.</li>
<li>Start the binary off with the <code>.text</code> section, which will
contain the executable code. In the text section we want the code for
<code>.text.start</code> to come first. This is the code that implements
the “C runtime”, that is, the code with the <code>_start</code>
function that will set up the stack pointer and call into the C
<code>main</code> function. After that we will place the text for all
the other functions in our binary. We keep this section aligned to
<code>4096</code> bytes, and each <code>PROVIDE</code> command creates
a symbol with a pointer to that location in memory. We won’t use the
text start and end pointers in our program but they can be useful if you
want to know things about your binary at runtime.</li>
<li>Next is the <code>.data</code> section that has all the data for our
program. Here you can see I also added the <code>rodata</code> or read
only section to the data section. The reason I did this is because I’m
not going to bother with properly implementing read only data. We also
keep the data aligned to 16 bytes to ensure that every memory access
will be aligned for a 64-bit RISC-V memory access.</li>
<li>The last “section” is not a real section but some extra padding at
the end to reserve the stack. Here I am reserving 4096 bytes (4 KiB) for
the stack of my program.</li>
<li>Lastly I’m going to discard a few sections that GCC will compile
into the binary that I don’t need at all.</li>
</ol>
<p>Now this probably isn’t the best way to write a linker script. For
example the stack is just kind of a hack in it, and I don’t implement
the <code>.bss</code> section for zero initialized data.</p>
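<p>If you did want zero initialized data, the usual approach is to have
the startup code clear the <code>.bss</code> region before calling
<code>main</code>. Below is a minimal C sketch of the clearing loop. It
assumes the linker script were extended to <code>PROVIDE</code> two
symbols, which I call <code>__bss_start</code> and
<code>__bss_end</code> here; neither exists in the script above, and
<code>zero_range</code> is a made-up helper name.</p>
<pre class="c"><code>#include &lt;stdint.h&gt;

/* Zeroes every byte in [start, end). On the board this would be called
 * before main as zero_range(__bss_start, __bss_end), where both symbols
 * are hypothetical names the linker script would have to provide. */
void zero_range(uint8_t *start, uint8_t *end)
{
    /* Writing through a volatile pointer keeps the compiler from
     * replacing the loop with a call to memset, which is not available
     * when linking with -nostdlib. */
    for (volatile uint8_t *p = start; p &lt; end; p++) {
        *p = 0;
    }
}</code></pre>
<p>Taking the range as two pointers keeps the helper testable on a PC,
where it can simply be pointed at part of an ordinary array.</p>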
<p>With this linker script we can now set up a basic program. We can use
the code presented below as the <code>main.c</code> file:</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode c"><code class="sourceCode c"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="pp">#include </span><span class="im">&lt;stdint.h&gt;</span></span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a><span class="pp">#define UART0_BASE </span><span class="bn">0x02500000</span></span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a><span class="pp">#define UART0_DATA_REG </span><span class="op">(</span>UART0_BASE<span class="pp"> </span><span class="op">+</span><span class="pp"> </span><span class="bn">0x0000</span><span class="op">)</span></span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a><span class="pp">#define UART0_USR </span><span class="op">(</span>UART0_BASE<span class="pp"> </span><span class="op">+</span><span class="pp"> </span><span class="bn">0x007c</span><span class="op">)</span></span>
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a><span class="pp">#define write_reg</span><span class="op">(</span><span class="pp">r</span><span class="op">,</span><span class="pp"> v</span><span class="op">)</span><span class="pp"> write_reg_handler</span><span class="op">((</span><span class="dt">volatile</span><span class="pp"> </span><span class="dt">uint32_t</span><span class="op">*)(</span><span class="pp">r</span><span class="op">),</span><span class="pp"> </span><span class="op">(</span><span class="pp">v</span><span class="op">))</span></span>
<span id="cb2-8"><a href="#cb2-8" aria-hidden="true" tabindex="-1"></a><span class="dt">void</span> write_reg_handler<span class="op">(</span><span class="dt">volatile</span> <span class="dt">uint32_t</span> <span class="op">*</span>reg<span class="op">,</span> <span class="dt">const</span> <span class="dt">uint32_t</span> value<span class="op">)</span></span>
<span id="cb2-9"><a href="#cb2-9" aria-hidden="true" tabindex="-1"></a><span class="op">{</span></span>
<span id="cb2-10"><a href="#cb2-10" aria-hidden="true" tabindex="-1"></a> reg<span class="op">[</span><span class="dv">0</span><span class="op">]</span> <span class="op">=</span> value<span class="op">;</span></span>
<span id="cb2-11"><a href="#cb2-11" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span>
<span id="cb2-12"><a href="#cb2-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-13"><a href="#cb2-13" aria-hidden="true" tabindex="-1"></a><span class="pp">#define read_reg</span><span class="op">(</span><span class="pp">r</span><span class="op">)</span><span class="pp"> read_reg_handler</span><span class="op">((</span><span class="dt">volatile</span><span class="pp"> </span><span class="dt">uint32_t</span><span class="op">*)(</span><span class="pp">r</span><span class="op">))</span></span>
<span id="cb2-14"><a href="#cb2-14" aria-hidden="true" tabindex="-1"></a><span class="dt">uint32_t</span> read_reg_handler<span class="op">(</span><span class="dt">volatile</span> <span class="dt">uint32_t</span> <span class="op">*</span>reg<span class="op">)</span></span>
<span id="cb2-15"><a href="#cb2-15" aria-hidden="true" tabindex="-1"></a><span class="op">{</span></span>
<span id="cb2-16"><a href="#cb2-16" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> reg<span class="op">[</span><span class="dv">0</span><span class="op">];</span></span>
<span id="cb2-17"><a href="#cb2-17" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span>
<span id="cb2-18"><a href="#cb2-18" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-19"><a href="#cb2-19" aria-hidden="true" tabindex="-1"></a><span class="dt">void</span> _putchar<span class="op">(</span><span class="dt">char</span> c<span class="op">)</span></span>
<span id="cb2-20"><a href="#cb2-20" aria-hidden="true" tabindex="-1"></a><span class="op">{</span></span>
<span id="cb2-21"><a href="#cb2-21" aria-hidden="true" tabindex="-1"></a> <span class="cf">while</span><span class="op">((</span>read_reg<span class="op">(</span>UART0_USR<span class="op">)</span> <span class="op">&amp;</span> <span class="bn">0b10</span><span class="op">)</span> <span class="op">==</span> <span class="dv">0</span><span class="op">)</span></span>
<span id="cb2-22"><a href="#cb2-22" aria-hidden="true" tabindex="-1"></a> <span class="op">{</span></span>
<span id="cb2-23"><a href="#cb2-23" aria-hidden="true" tabindex="-1"></a> asm<span class="op">(</span><span class="st">&quot;nop&quot;</span><span class="op">);</span></span>
<span id="cb2-24"><a href="#cb2-24" aria-hidden="true" tabindex="-1"></a> <span class="op">}</span></span>
<span id="cb2-25"><a href="#cb2-25" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-26"><a href="#cb2-26" aria-hidden="true" tabindex="-1"></a> write_reg<span class="op">(</span>UART0_DATA_REG<span class="op">,</span> c<span class="op">);</span></span>
<span id="cb2-27"><a href="#cb2-27" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span>
<span id="cb2-28"><a href="#cb2-28" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-29"><a href="#cb2-29" aria-hidden="true" tabindex="-1"></a><span class="dt">const</span> <span class="dt">char</span> <span class="op">*</span>hello_world <span class="op">=</span> <span class="st">&quot;Hello World!</span><span class="sc">\r\n</span><span class="st">&quot;</span><span class="op">;</span></span>
<span id="cb2-30"><a href="#cb2-30" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-31"><a href="#cb2-31" aria-hidden="true" tabindex="-1"></a><span class="dt">int</span> main<span class="op">()</span></span>
<span id="cb2-32"><a href="#cb2-32" aria-hidden="true" tabindex="-1"></a><span class="op">{</span></span>
<span id="cb2-33"><a href="#cb2-33" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span><span class="op">(</span><span class="dt">const</span> <span class="dt">char</span> <span class="op">*</span>c <span class="op">=</span> hello_world<span class="op">;</span> c<span class="op">[</span><span class="dv">0</span><span class="op">]</span> <span class="op">!=</span> <span class="ch">&#39;</span><span class="sc">\0</span><span class="ch">&#39;</span><span class="op">;</span> c<span class="op">++)</span></span>
<span id="cb2-34"><a href="#cb2-34" aria-hidden="true" tabindex="-1"></a> <span class="op">{</span></span>
<span id="cb2-35"><a href="#cb2-35" aria-hidden="true" tabindex="-1"></a> _putchar<span class="op">(*</span>c<span class="op">);</span></span>
<span id="cb2-36"><a href="#cb2-36" aria-hidden="true" tabindex="-1"></a> <span class="op">}</span></span>
<span id="cb2-37"><a href="#cb2-37" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span></code></pre></div>
<p>This program will write the string “Hello World!” to the serial port.
Now a common question for code like this is how did I know to set all
the <code>UART0</code> registers? Well the way to find this information
is to look at the datasheet, programmer’s manual, or user manual for the
chip you are using. In this case we are using an Allwinner D1 and we can
find the user manual with all the registers on the linux-sunxi page <a
href="https://linux-sunxi.org/D1">here</a>. On pages 900 to 940 we can
see a description of how the serial port works for this SoC. I also
looked at the schematic <a
href="https://dl.linux-sunxi.org/D1/D1_Nezha_development_board_schematic_diagram_20210224.pdf">here</a>,
to see that the serial port we have is wired to <code>UART0</code> on
the SoC. From here we are relying on U-Boot to boot the board, which will
set up the serial port for us. That means we can just write to the UART
data register to start printing content to the console.</p>
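<p>The polling pattern used by <code>_putchar</code> can also be tried
out on a PC by passing the registers in as pointers instead of hard
coding their addresses. The sketch below does that. The function names
and the <code>UART_USR_TFNF</code> macro are my own labels for this
example: the bit is the same <code>0b10</code> mask that
<code>main.c</code> polls, and I am reading it as “transmit FIFO not
full”, so check the user manual before relying on that name.</p>
<pre class="c"><code>#include &lt;stdint.h&gt;

/* Bit 1 of the status register, the same mask main.c waits on. */
#define UART_USR_TFNF 0x2u

/* Writes one character once the status register reports room. */
void uart_putc(volatile uint32_t *data_reg,
               volatile uint32_t *status_reg, char c)
{
    while ((*status_reg &amp; UART_USR_TFNF) == 0) {
        /* busy wait until the UART can accept another byte */
    }
    *data_reg = (uint32_t)(unsigned char)c;
}

/* Writes a NUL terminated string one character at a time. */
void uart_puts(volatile uint32_t *data_reg,
               volatile uint32_t *status_reg, const char *s)
{
    for (; *s != '\0'; s++) {
        uart_putc(data_reg, status_reg, *s);
    }
}</code></pre>
<p>On the board the two pointers would be the <code>UART0_DATA_REG</code>
and <code>UART0_USR</code> addresses from <code>main.c</code> cast to
<code>volatile uint32_t *</code>; on a PC they can point at two ordinary
variables, with the status variable preset to <code>0x2</code>.</p>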
<p>We will also need to set up a basic assembly program to set up the
stack and call our main function. Below you can see my example called
<code>start.S</code>:</p>
<div class="sourceCode" id="cb3"><pre
class="sourceCode asm"><code class="sourceCode fasm"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>.<span class="bu">section</span> <span class="op">.</span>text<span class="op">.</span>start</span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a> .global _start</span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a><span class="fu">_start:</span></span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a> la <span class="kw">sp</span><span class="op">,</span> __stack_end</span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a> j main</span></code></pre></div>
<p>This assembly file just creates a section called
<code>.text.start</code> and a global symbol for a function called
<code>_start</code> which will be the first function our program
executes. All this assembly file does is set the stack pointer
register <code>sp</code> (using the load address
<code>la</code> pseudo instruction) to the top of the stack we set up in
the linker script, <code>__stack_end</code>, since the stack grows
downward on RISC-V, and then call the main function by jumping directly
to it.</p>
<h3 id="building-the-program">Building the Program</h3>
<p>Building the program is pretty straightforward. We need to tell gcc
to build the two source files without including the C standard library,
and then to link the binary using our linker script. We can do this with
the following commands:</p>
<pre><code>riscv64-unknown-elf-gcc -march=rv64g --std=gnu99 -msmall-data-limit=0 -c main.c
riscv64-unknown-elf-gcc -march=rv64g --std=gnu99 -msmall-data-limit=0 -c start.S
riscv64-unknown-elf-gcc -march=rv64g -ffreestanding -nostdlib -msmall-data-limit=0 -T linker.ld start.o main.o -o app.elf
riscv64-unknown-elf-objcopy -O binary app.elf app.bin</code></pre>
<p>This will build our source files into <code>.o</code> files first,
then combine those <code>.o</code> files into a <code>.elf</code> file,
finally converting the <code>.elf</code> into a raw binary file where we
use the <code>.bin</code> extension. We need a raw binary file as we
want to just load our program into memory and begin executing. If we
load the <code>.elf</code> file it will have the elf header and other
extra data that is not executable in it. In order to run a
<code>.elf</code> file we would need an elf loader, which goes beyond
the scope of this example.</p>
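<p>A quick way to see the difference between the two files is to look at
their first bytes. An ELF file always begins with the four byte magic
number <code>0x7f 'E' 'L' 'F'</code>, while the raw binary should begin
directly with the first instruction of <code>_start</code>. The small C
helper below checks a buffer for that magic number;
<code>is_elf</code> is just a name picked for this sketch.</p>
<pre class="c"><code>#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;

/* Returns 1 if the buffer starts with the 4-byte ELF magic number. */
int is_elf(const uint8_t *buf, size_t len)
{
    return len &gt;= 4
        &amp;&amp; buf[0] == 0x7f
        &amp;&amp; buf[1] == 'E'
        &amp;&amp; buf[2] == 'L'
        &amp;&amp; buf[3] == 'F';
}</code></pre>
<p>Reading the first four bytes of <code>app.elf</code> and
<code>app.bin</code> into a buffer and passing them to this helper
should report the first as an ELF file and the second as not.</p>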
<h3 id="running-the-program">Running the Program</h3>
<p>Now that we have the raw binary it’s time to try and load it. I found
that the U-Boot configuration that comes with the board has pretty
limited support for loading binaries. So we are going to take advantage
of the <code>loadx</code> command to load the binary over serial. In the
U-Boot terminal we are going to run the command:</p>
<pre><code>loadx 45000000</code></pre>
<p>Now the next steps will depend on which serial terminal you are
using. We want to use the <code>XMODEM</code> protocol to load the
binary. In the serial terminal I am using, <code>gnu screen</code>, you
can execute arbitrary programs and send their output to the serial
terminal. You can do this by hitting the key combination “CTRL-A + :”
and then typing in <code>exec !! sx app.bin</code>. This will send the
binary to the serial terminal using the XMODEM protocol. If you are not
using GNU screen, look up instructions for how to send an XMODEM binary
with your terminal. Now that the binary is loaded we can type in:</p>
<pre><code>go 45000000</code></pre>
<p>This should start to execute the program and you should see
<code>Hello World!</code> printed to the console!</p>
<p><img
src="/assets/2022-06-09-baremetal-risc-v/riscv-terminal.png" /></p>
<h2 id="whats-next">What’s Next?</h2>
<p>Well the sky is the limit! We have a method to load and run a program
that can do anything on the Nezha board now. Looking through the
datasheet we can see how to access the GPIO on the board to blink an
LED. If you’re really ambitious you could try getting ethernet or USB
working in a baremetal environment. I am going to continue on my goal of
emulating the N64 cartridge bus which will require me to get GPIO
working as well as interrupts on the GPIO lines. If you want to see the
current progress of my work you can check it out on github <a
href="https://github.com/Hazematman/N64-Cart-Emulator">here</a>.</p>
</description><pubDate>Thu, 09 Jun 2022 04:00:00 -0000</pubDate><guid>https://fryzekconcepts.com/notes/baremetal-risc-v.html</guid></item><item><title>Digital Garden</title><link>https://fryzekconcepts.com/notes/digital_garden.html</link><description><p>After reading Maggie Appleton’s page on <a
href="https://maggieappleton.com/garden-history">digital gardens</a> I
was inspired to convert my own website into a digital garden.</p>
<p>I have many half baked ideas that I never seem to be able to finish.
Some of them get to a published state like <a
href="/notes/rasterizing-triangles.html">Rasterizing Triangles</a> and
<a href="/notes/baremetal-risc-v.html">Baremetal RISC-V</a>, but many of
them never make it to the published state. The idea of a digital garden
seems very appealing to me, as it encourages you to post on a topic even
if you haven’t made it “publishable” yet.</p>
<h2 id="how-this-site-works">How this site works</h2>
<p>I wanted a bit of challenge when putting together this website as I
don’t do a lot of web development in my day to day life, so I thought it
would be a good way to learn more things. This site has been entirely
built from scratch using a custom static site generator I set up with
pandoc. It relies on pandoc’s filters to implement some of the classic
“Digital Garden” features like back linking. The back linking feature
has not been totally developed yet and right now it just provides a
convenient way to link to other notes or pages on this site.</p>
<p>I hope to develop this section more and explain how I got various
features in pandoc to work as a static site generator.</p>
</description><pubDate>Sun, 30 Oct 2022 04:00:00 -0000</pubDate><guid>https://fryzekconcepts.com/notes/digital_garden.html</guid></item><item><title>2022 Graphics Team Contributions at Igalia</title><link>https://fryzekconcepts.com/notes/2022_igalia_graphics_team.html</link><description><p>This year I started a new job working with <a
href="https://www.igalia.com/technology/graphics">Igalia’s Graphics
Team</a>. For those of you who don’t know <a
href="https://www.igalia.com/">Igalia</a> they are a <a
href="https://en.wikipedia.org/wiki/Igalia">“worker-owned, employee-run
cooperative model consultancy focused on open source software”</a>.</p>
<p>As a new member of the team, I thought it would be a great idea to
summarize the incredible amount of work the team completed in 2022. If
you’re interested keep reading!</p>
<h2 id="vulkan-1.2-conformance-on-rpi-4">Vulkan 1.2 Conformance on RPi
4</h2>
<p>One of the big milestones for the team in 2022 was <a
href="https://www.khronos.org/conformance/adopters/conformant-products#submission_694">achieving
Vulkan 1.2 conformance on the Raspberry Pi 4</a>. The folks over at the
Raspberry Pi company wrote a nice <a
href="https://www.raspberrypi.com/news/vulkan-update-version-1-2-conformance-for-raspberry-pi-4/">article</a>
about the achievement. Igalia has been partnering with the Raspberry Pi
company to build and improve the graphics driver on all versions
of the Raspberry Pi.</p>
<p>The Vulkan 1.2 spec ratification came with a few <a
href="https://registry.khronos.org/vulkan/specs/1.2-extensions/html/vkspec.html#versions-1.2">extensions</a>
that were promoted to Core. This means a conformant Vulkan 1.2 driver
needs to implement those extensions. Alejandro Piñeiro wrote this
interesting <a
href="https://blogs.igalia.com/apinheiro/2022/05/v3dv-status-update-2022-05-16/">blog
post</a> that talks about some of those extensions.</p>
<p>Vulkan 1.2 also came with a number of optional extensions such as
<code>VK_KHR_pipeline_executable_properties</code>. My colleague Iago
Toral wrote an excellent <a
href="https://blogs.igalia.com/itoral/2022/05/09/vk_khr_pipeline_executables/">blog
post</a> on how we implemented that extension on the Raspberry Pi 4 and
what benefits it provides for debugging.</p>
<h2 id="vulkan-1.3-support-on-turnip">Vulkan 1.3 support on Turnip</h2>
<p>Igalia has been heavily supporting the Open-Source Turnip Vulkan
driver for Qualcomm Adreno GPUs, and in 2022 we helped it achieve Vulkan
1.3 conformance. Danylo Piliaiev, on the graphics team here at Igalia,
wrote a great <a
href="https://blogs.igalia.com/dpiliaiev/turnip-vulkan-1-3/">blog
post</a> on this achievement! One of the biggest challenges for the
Turnip driver is that it is a completely reverse-engineered driver that
has been built without access to any hardware documentation or reference
driver code.</p>
<p>With Vulkan 1.3 conformance has also come the ability to run more
commercial games on Adreno GPUs through the use of the DirectX
translation layers. If you would like to see more of this check out this
<a
href="https://blogs.igalia.com/dpiliaiev/turnip-july-2022-update/">post</a>
from Danylo where he talks about getting “The Witcher 3”, “The Talos
Principle”, and “OMD2” running on the A660 GPU. Outside of Vulkan 1.3
support he also talks about some of the extensions that were implemented
to allow “Zink” (the OpenGL over Vulkan driver) to run on Turnip, and bring
OpenGL 4.6 support to Adreno GPUs.</p>
<p><div class="youtube-video"><iframe src="https://www.youtube.com/embed/oVFWy25uiXA"></iframe></div></p>
<h2 id="vulkan-extensions">Vulkan Extensions</h2>
<p>Several developers on the Graphics Team made key
contributions to Vulkan Extensions and the Vulkan conformance test suite
(CTS). My colleague Ricardo Garcia made an excellent <a
href="https://rg3.name/202212122137.html">blog post</a> about those
contributions. Below I’ve listed what Igalia did for each of the
extensions:</p>
<ul>
<li>VK_EXT_image_2d_view_of_3d
<ul>
<li>We reviewed the spec and are listed as contributors to this
extension</li>
</ul></li>
<li>VK_EXT_shader_module_identifier
<ul>
<li>We reviewed the spec, contributed to it, and created tests for this
extension</li>
</ul></li>
<li>VK_EXT_attachment_feedback_loop_layout
<ul>
<li>We reviewed, created tests and contributed to this extension</li>
</ul></li>
<li>VK_EXT_mesh_shader
<ul>
<li>We contributed to the spec and created tests for this extension</li>
</ul></li>
<li>VK_EXT_mutable_descriptor_type
<ul>
<li>We reviewed the spec and created tests for this extension</li>
</ul></li>
<li>VK_EXT_extended_dynamic_state3
<ul>
<li>We wrote tests and reviewed the spec for this extension</li>
</ul></li>
</ul>
<h2 id="amdgpu-kernel-driver-contributions">AMDGPU kernel driver
contributions</h2>
<p>Our resident “Not an AMD expert” Melissa Wen made several
contributions to the AMDGPU driver. Those contributions include
connecting parts of the <a
href="https://lore.kernel.org/amd-gfx/20220329201835.2393141-1-mwen@igalia.com/">pixel
blending and post blending code in AMD’s <code>DC</code> module to
<code>DRM</code></a> and <a
href="https://lore.kernel.org/amd-gfx/20220804161349.3561177-1-mwen@igalia.com/">fixing
a bug related to how panel orientation is set when a display is
connected</a>. She also had a <a
href="https://indico.freedesktop.org/event/2/contributions/50/">presentation
at XDC 2022</a>, where she talks about techniques you can use to
understand and debug AMDGPU, even when there aren’t hardware docs
available.</p>
<p>André Almeida also completed and submitted work on <a
href="https://lore.kernel.org/dri-devel/20220714191745.45512-1-andrealmeid@igalia.com/">enabling
logging features for the new GFXOFF hardware feature in AMD GPUs</a>. He
also created a userspace application (which you can find <a
href="https://gitlab.freedesktop.org/andrealmeid/gfxoff_tool">here</a>),
that lets you interact with this feature through the
<code>debugfs</code> interface. Additionally, he submitted a <a
href="https://lore.kernel.org/dri-devel/20220929184307.258331-1-contact@emersion.fr/">patch</a>
for async page flips (which he also talked about in his <a
href="https://indico.freedesktop.org/event/2/contributions/61/">XDC 2022
presentation</a>), which has yet to be merged.</p>
<h2 id="modesetting-without-glamor-on-rpi">Modesetting without Glamor on
RPi</h2>
<p>Christopher Michael joined the Graphics Team in 2022 and along with
Chema Casanova made some key contributions to enabling hardware
acceleration and mode setting on the Raspberry Pi without the use of <a
href="https://www.freedesktop.org/wiki/Software/Glamor/">Glamor</a>,
which makes more video memory available to graphics applications
running on a Raspberry Pi.</p>
<p>The older generation Raspberry Pis (1-3) only have a maximum of 256MB
of memory available for video memory, and using Glamor will consume part
of that video memory. Christopher wrote an excellent <a
href="https://blogs.igalia.com/cmichael/2022/05/30/modesetting-a-glamor-less-rpi-adventure/">blog
post</a> on this work. He and Chema also gave a joint presentation
at XDC 2022 going into more detail on this work.</p>
<h2 id="linux-format-magazine-column">Linux Format Magazine Column</h2>
<p>Our very own Samuel Iglesias had a column published in Linux Format
Magazine. It’s a short column about reaching Vulkan 1.1 conformance for
v3dv &amp; Turnip Vulkan drivers, and how Open-Source GPU drivers can go
from a “hobby project” to the de facto driver for the platform. Check it
out on page 7 of <a
href="https://linuxformat.com/linux-format-288.html">issue #288</a>!</p>
<h2 id="xdc-2022">XDC 2022</h2>
<p>X.Org Developers Conference is one of the big conferences for us here
at the Graphics Team. Last year at XDC 2022 our Team presented 5 talks
in Minneapolis, Minnesota. XDC 2022 took place towards the end of the
year in October, so it provides some good context on how the team closed
out the year. If you didn’t attend or missed their presentation, here’s
a breakdown:</p>
<h3
id="replacing-the-geometry-pipeline-with-mesh-shaders-ricardo-garcía"><a
href="https://indico.freedesktop.org/event/2/contributions/48/">“Replacing
the geometry pipeline with mesh shaders”</a> (Ricardo García)</h3>
<p>Ricardo presents what exactly mesh shaders are in Vulkan. He made
many contributions to this extension, including writing thousands of CTS
tests for it. He also wrote a <a
href="https://rg3.name/202210222107.html">blog post</a> on his
presentation that you should check out!</p>
<p><div class="youtube-video"><iframe src="https://www.youtube.com/embed/aRNJ4xj_nDs"></iframe></div></p>
<h3 id="status-of-vulkan-on-raspberry-pi-iago-toral"><a
href="https://indico.freedesktop.org/event/2/contributions/68/">“Status
of Vulkan on Raspberry Pi”</a> (Iago Toral)</h3>
<p>Iago goes into detail about the current status of the Raspberry Pi
Vulkan driver. He talks about achieving Vulkan 1.2 conformance, as well
as some of the challenges the team had to solve due to hardware
limitations of the Broadcom GPU.</p>
<p><div class="youtube-video"><iframe src="https://www.youtube.com/embed/GM9IojyzCVM"></iframe></div></p>
<h3
id="enable-hardware-acceleration-for-gl-applications-without-glamor-on-xorg-modesetting-driver-jose-maría-casanova-christopher-michael"><a
href="https://indico.freedesktop.org/event/2/contributions/60/">“Enable
hardware acceleration for GL applications without Glamor on Xorg
modesetting driver”</a> (Jose María Casanova, Christopher Michael)</h3>
<p>Chema and Christopher talk about the challenges they had to solve to
enable hardware acceleration on the Raspberry Pi without Glamor.</p>
<p><div class="youtube-video"><iframe src="https://www.youtube.com/embed/Bo_MOM7JTeQ"></iframe></div></p>
<h3 id="im-not-an-amd-expert-but-melissa-wen"><a
href="https://indico.freedesktop.org/event/2/contributions/50/">“I’m not
an AMD expert, but…”</a> (Melissa Wen)</h3>
<p>In this non-technical presentation, Melissa talks about techniques
developers can use to understand and debug drivers without access to
hardware documentation.</p>
<p><div class="youtube-video"><iframe src="https://www.youtube.com/embed/CMm-yhsMB7U"></iframe></div></p>
<h3 id="async-page-flip-in-atomic-api-andré-almeida"><a
href="https://indico.freedesktop.org/event/2/contributions/61/">“Async
page flip in atomic API”</a> (André Almeida)</h3>
<p>André talks about the work that has been done to enable asynchronous
page flipping in DRM’s atomic API. He introduces the topic by
explaining what exactly an asynchronous page flip is, and why you
would want it.</p>
<p><div class="youtube-video"><iframe src="https://www.youtube.com/embed/qayPPIfrqtE"></iframe></div></p>
<h2 id="fosdem-2022">FOSDEM 2022</h2>
<p>Another important conference for us is FOSDEM, and last year we
presented 3 of the 5 talks in the graphics dev room. FOSDEM took place
in early February 2022, so these talks provide some good context on where
the team started in 2022.</p>
<h3 id="the-status-of-turnip-driver-development-hyunjun-ko"><a
href="https://archive.fosdem.org/2022/schedule/event/turnip/">The status
of Turnip driver development</a> (Hyunjun Ko)</h3>
<p>Hyunjun presented the current state of the Turnip driver, also
talking about the difficulties of developing a driver for a platform
without hardware documentation. He talks about how Turnip developers
reverse engineer the behaviour of the hardware, and then implement that
in an open-source driver. He also made a companion <a
href="https://blogs.igalia.com/zzoon/graphics/mesa/2022/02/21/complement-story/">blog
post</a> to check out along with his presentation.</p>
<h3
id="v3dv-status-update-for-open-source-vulkan-driver-for-raspberry-pi-4-alejandro-piñeiro"><a
href="https://archive.fosdem.org/2022/schedule/event/v3dv/">v3dv: Status
Update for Open Source Vulkan Driver for Raspberry Pi 4</a> (Alejandro
Piñeiro)</h3>
<p>Igalia has been presenting the status of the v3dv driver since
December 2019 and in this presentation, Alejandro talks about the status
of the v3dv driver in early 2022. He talks about achieving conformance,
the extensions that had to be implemented, and the future plans of the
v3dv driver.</p>
<h3 id="fun-with-border-colors-in-vulkan-ricardo-garcia"><a
href="https://archive.fosdem.org/2022/schedule/event/vulkan_borders/">Fun
with border colors in Vulkan</a> (Ricardo Garcia)</h3>
<p>Ricardo presents the work he did on the
<code>VK_EXT_border_color_swizzle</code> extension in Vulkan. He talks
about the specific contributions he made and how the extension fits in
with sampling color operations in Vulkan.</p>
<h2 id="gsoc-igalia-ce">GSoC &amp; Igalia CE</h2>
<p>Last year Melissa &amp; André co-mentored contributors working on
introducing KUnit tests to the AMD display driver. This project was
hosted as a <a href="https://summerofcode.withgoogle.com/">“Google
Summer of Code” (GSoC)</a> project from the X.Org Foundation. If you’re
interested in seeing their work Tales da Aparecida, Maíra Canal, Magali
Lemes, and Isabella Basso presented their work at the <a
href="https://lpc.events/event/16/contributions/1310/">Linux Plumbers
Conference 2022</a> and across two talks at XDC 2022. Here you can see
their <a
href="https://indico.freedesktop.org/event/2/contributions/65/">first</a>
presentation and here you can see their <a
href="https://indico.freedesktop.org/event/2/contributions/164/">second</a>
presentation.</p>
<p>André &amp; Melissa also mentored two <a
href="https://www.igalia.com/coding-experience/">“Igalia Coding
Experience” (CE)</a> projects, one related to IGT GPU test tools on the
VKMS kernel driver, and the other for IGT GPU test tools on the V3D
kernel driver. If you’re interested in reading up on some of that work,
Maíra Canal <a
href="https://mairacanal.github.io/january-update-finishing-my-igalia-ce/">wrote
about her experience</a> being part of the Igalia CE.</p>
<p>Ella Stanforth was also part of the Igalia Coding Experience, being
mentored by Iago &amp; Alejandro. They worked on the
<code>VK_KHR_sampler_ycbcr_conversion</code> extension for the v3dv
driver. Alejandro talks about their work in his <a
href="https://blogs.igalia.com/apinheiro/2023/01/v3dv-status-update-2023-01/">blog
post here</a>.</p>
<h1 id="whats-next">What’s Next?</h1>
<p>The graphics team is looking forward to having a jam-packed 2023 with
just as many if not more contributions to the Open-Source graphics
stack! I’m super excited to be part of the team, and hope to see my name
in our 2023 recap post!</p>
<p>Also, you might have heard that <a
href="https://www.igalia.com/2022/xdc-2023">Igalia will be hosting XDC
2023</a> in the beautiful city of A Coruña! We hope to see you there
where there will be many presentations from all the great people working
on the Open-Source graphics stack, and most importantly where you can <a
href="https://www.youtube.com/watch?v=7hWcu8O9BjM">dream in the
Atlantic!</a></p>
<figure>
<img src="https://www.igalia.com/assets/i/news/XDC-event-banner.jpg"
alt="Photo of A Coruña" />
<figcaption aria-hidden="true">Photo of A Coruña</figcaption>
</figure>
</description><pubDate>Thu, 02 Feb 2023 05:00:00 -0000</pubDate><guid>https://fryzekconcepts.com/notes/2022_igalia_graphics_team.html</guid></item><item><title>Global Game Jam 2023 - GI Jam</title><link>https://fryzekconcepts.com/notes/global_game_jam_2023.html</link><description><p>At the beginning of this month I participated in the Games
Institute’s Global Game Jam event. <a
href="https://uwaterloo.ca/games-institute/">The Games Institute</a> is
an organization at my local university (The University of Waterloo) that
focuses on games-based research. They host a game jam every school term
and this term’s jam happened to coincide with the Global Game Jam. Since
this event was open to everyone (and it’s been a few years since I’ve
been a student at UW 👴️), I joined up to try and stretch some of my more
creative muscles. The event was a 48-hour game jam that began on Friday,
February 3rd and ended on Sunday, February 5th.</p>
<p>The game we created is called <a
href="https://globalgamejam.org/2023/games/turtle-roots-5">Turtle
Roots</a>, and it is a simple resource management game. You play as a
magical turtle floating through the sky and collecting water in order to
survive. The turtle can spend some of its “nutrients” to grow roots
which will allow it to gather water and collect more nutrients. The
challenge in the game is trying to survive for as long as possible
without running out of water.</p>
<div class="gallery">
<p><img src="/assets/global_game_jam_2023/screen_shot_1.png" /> <img
src="/assets/global_game_jam_2023/screen_shot_2.png" /> <img
src="/assets/global_game_jam_2023/screen_shot_3.png" /></p>
<p>Screenshots of Turtle Roots</p>
</div>
<h2 id="the-team">The Team</h2>
<p>I attended the event solo and quickly partnered up with two other
people, who also attended solo. One member had already participated in a
game jam before and specialized in art. The other member was attending a
game jam for the first time and was looking for the best way they could
contribute. Having particular skills for sound, they ended up creating
all the audio in our game. This left me as the sole programmer for our
team.</p>
<h2 id="my-game-jam-experiences">My Game Jam Experiences</h2>
<p>In recent years, I participated in a <a
href="/notes/n64brew-gamejam-2021.html">Nintendo 64 homebrew game
jam</a> and the Puerto Rico Game Developers Association event for the
Global Game Jam, submitting <a
href="https://globalgamejam.org/2022/games/magnetic-parkour-6">Magnetic
Parkour</a>. I also participated in <a href="https://ldjam.com/">Ludum
Dare</a> back around 2013 but unfortunately I’ve since lost the link to
my submission. While in high school, my friend and I participated in the
“Ottawa Tech Jam” (an event that worked much like a game
jam), submitting <a
href="http://www.fastquake.com/projects/zorvwarz/">Zorv Warz</a> and <a
href="http://www.fastquake.com/projects/worldseed/">E410</a>. As you can
probably tell, I really like gamedev. The desire to build my own video
games is actually what originally got me into programming. When I was
around 14 years old, I picked up a C++ programming book from the library
since I wanted to try to build my own game and I heard most game
developers use C++. I used some proprietary game development library
(that I can’t recall the name of) to build 2D and 3D games in Windows
using C++. I didn’t really get too far into it until high school when I
started to learn SFML, SDL, and OpenGL. I also dabbled with Unity during
that time as well. However, I’ve always had a strong desire to build most
of the foundation of the game myself without using an engine. You can
see this desire really come out in the work I did for Zorv Warz, E410,
and the N64 homebrew game jam. When working with a team, I feel it can
be a lot easier to use a game engine, even if it doesn’t scratch the
same itch for me.</p>
<h2 id="the-tech-behind-the-game">The Tech Behind the Game</h2>
<p>Lately I’ve had a growing interest in the game engine called <a
href="https://godotengine.org/">Godot</a>, and wanted to use this
opportunity to learn the engine more and build a game in it. Godot is
interesting to me as it’s a completely open source game engine, and as
you can probably guess from my <a
href="/notes/2022_igalia_graphics_team.html">job</a>, open source
software as well as free software is something I’m particularly
interested in.</p>
<p>Godot is a really powerful game engine that handles a lot of
complexity for you. For example, it has a built-in parallax background
component that we took advantage of to add more depth to our game. This
allows you to control the background scrolling speed for different layers
of the background, giving the illusion of depth in a 2D game.</p>
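<p>To illustrate the idea behind parallax scrolling, here is a minimal
sketch in Python. It is not Godot’s API or code from our game; the layer
names and scroll factors are made up for the example. Each layer is
offset by the camera position multiplied by its own factor, so layers
with a smaller factor move less and appear farther away.</p>
<pre class="python"><code># Hypothetical layers: a smaller factor means slower scrolling (farther away).
LAYERS = {"sky": 0.25, "clouds": 0.5, "foreground": 1.0}


def layer_offset(camera_x, scroll_factor):
    """Horizontal offset of one background layer for a camera position."""
    return camera_x * scroll_factor


def offsets(camera_x):
    """Offsets of every layer for the same camera position."""
    return {name: layer_offset(camera_x, factor) for name, factor in LAYERS.items()}


print(offsets(100))</code></pre>
<p>With the camera at 100, the sky layer has only moved a quarter as far
as the foreground, which is what gives the impression of depth.</p>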
<p>Another powerful feature of Godot is its physics engine. Godot makes
it really easy to create physics objects in your scene and have them do
interesting stuff. You might be wondering where physics comes into play
in our game, and we actually use it for the root animations. I set up a
sort of “rag doll” system for the roots to make them flop around in the
air as the player moves, really giving a lot more “life” to an otherwise
static game.</p>
<p>Godot has a built in scripting language called “GDScript” which is
very similar to Python. I’ve really grown to like this language. It has
an optional type system you can take advantage of that helps with
reducing the number of bugs that exist in your game. It also has great
connectivity with the editor. This proved useful as I could “export”
variables in the game and allow my team members to modify certain
parameters of the game without knowing any programming. This is super
helpful with balancing, and more easily allows non-technical members of
the team to contribute to the game logic in a more concrete way.</p>
<p>Overall I’m very happy with how our game turned out. Last year I
tried to participate in a few more game jams, but due to a combination
of lack of personal motivation, poor team dynamics, and other factors,
none of those game jams panned out. This was the first game jam in a
while where I feel like I really connected with my team and I also feel
like we made a super polished and fun game in the end.</p>
</description><pubDate>Sat, 11 Feb 2023 05:00:00 -0000</pubDate><guid>https://fryzekconcepts.com/notes/global_game_jam_2023.html</guid></item><item><title>Journey Through Freedreno</title><link>https://fryzekconcepts.com/notes/freedreno_journey.html</link><description><figure>
<img src="/assets/freedreno/glinfo_freedreno.png"
alt="Android running Freedreno" />
<figcaption aria-hidden="true">Android running Freedreno</figcaption>
</figure>
<p>As part of my training at Igalia I’ve been attempting to write a new
backend for Freedreno that targets the proprietary “KGSL” kernel mode
driver. For those unaware, there are two “main” kernel mode drivers for
the GPU on Qualcomm SoCs: “MSM” and “KGSL”. “MSM” is DRM
compliant, and Freedreno is already able to run on this driver. “KGSL” is
the proprietary KMD that Qualcomm’s proprietary userspace driver
targets. Now why would you want to run Freedreno against KGSL when MSM
exists? Well, there are a few reasons. First, MSM only really works on an
upstream kernel, so if you have to run a downstream kernel you can
continue using the version of KGSL that the manufacturer shipped with
your device. Second, this allows you to run both the proprietary Adreno
driver and the open source Freedreno driver on the same device just by
swapping libraries, which can be very nice for quickly testing something
against both drivers.</p>
<h2 id="when-drm-isnt-just-drm">When “DRM” isn’t just “DRM”</h2>
<p>When working on a new backend, one of the critical things to do is to
make use of as much “common code” as possible. This has a number of
benefits, not least reducing the amount of code you have to write. It
also reduces the number of bugs that will likely exist, as you are
relying on well-tested code, and it ensures that the backend is most
likely going to continue to work with new driver updates.</p>
<p>When I started the work for a new backend I looked inside mesa’s
<code>src/freedreno/drm</code> folder. This has the current backend code
for Freedreno, and it’s already modularized to support multiple backends.
It currently has support for the above-mentioned MSM kernel mode driver
as well as virtio (a backend that allows Freedreno to be used from
within a virtualized environment). From the name of this path, you
would think that the code in this module would only work with kernel
mode drivers that implement DRM, but actually there are only a handful of
places in this module where DRM support is assumed. This made it a good
starting point to introduce the KGSL backend and piggy back off the
common code.</p>
<p>For example, the <code>drm</code> module has a lot of code to deal
with the management of synchronization primitives, buffer objects, and
command submit lists. All of this is managed at an abstraction level
above “DRM”, and re-implementing this code would be a bad idea.</p>
<h2 id="how-to-get-android-to-behave">How to get Android to behave</h2>
<p>One of the big struggles with getting the KGSL backend working was
figuring out how I could get Android to load Mesa instead of the Qualcomm
blob driver that is shipped with the device image. Thankfully a good
chunk of this work had already been figured out when the Turnip
developers (Turnip is the open source Vulkan implementation for Adreno
GPUs) got Turnip running on Android with KGSL.
One of my coworkers, <a
href="https://blogs.igalia.com/dpiliaiev/">Danylo</a>, is one of those
Turnip developers, and he gave me a lot of guidance on getting Android
set up. One thing to watch out for is the outdated instructions <a
href="https://docs.mesa3d.org/android.html">here</a>. These instructions
<em>almost</em> work, but require some modifications. First, if you’re
using a more modern version of the Android NDK, the compiler has been
replaced with LLVM/Clang, so you need to change which compiler is being
used. Second, flags like <code>system</code> in the cross compiler script
are wrong: the script sets the system to <code>linux</code> instead of
<code>android</code>. I had success using the below cross compiler
script. Take note that the compiler paths need to be updated to match
where you extracted the Android NDK on your system.</p>
<pre class="meson"><code>[binaries]
ar = &#39;/home/lfryzek/Documents/projects/igalia/freedreno/android-ndk-r25b-linux/android-ndk-r25b/toolchains/llvm/prebuilt/linux-x86_64/bin/llvm-ar&#39;
c = [&#39;ccache&#39;, &#39;/home/lfryzek/Documents/projects/igalia/freedreno/android-ndk-r25b-linux/android-ndk-r25b/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android29-clang&#39;]
cpp = [&#39;ccache&#39;, &#39;/home/lfryzek/Documents/projects/igalia/freedreno/android-ndk-r25b-linux/android-ndk-r25b/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android29-clang++&#39;, &#39;-fno-exceptions&#39;, &#39;-fno-unwind-tables&#39;, &#39;-fno-asynchronous-unwind-tables&#39;, &#39;-static-libstdc++&#39;]
c_ld = &#39;lld&#39;
cpp_ld = &#39;lld&#39;
strip = &#39;/home/lfryzek/Documents/projects/igalia/freedreno/android-ndk-r25b-linux/android-ndk-r25b/toolchains/llvm/prebuilt/linux-x86_64/bin/llvm-strip&#39;
# Android doesn&#39;t come with a pkg-config, but we need one for Meson to be happy not
# finding all the optional deps it looks for. Use system pkg-config pointing at a
# directory we get to populate with any .pc files we want to add for Android
pkgconfig = [&#39;env&#39;, &#39;PKG_CONFIG_LIBDIR=/home/lfryzek/Documents/projects/igalia/freedreno/android-ndk-r25b-linux/android-ndk-r25b/pkgconfig:/home/lfryzek/Documents/projects/igalia/freedreno/install-android/lib/pkgconfig&#39;, &#39;/usr/bin/pkg-config&#39;]
[host_machine]
system = &#39;android&#39;
cpu_family = &#39;arm&#39;
cpu = &#39;armv8&#39;
endian = &#39;little&#39;</code></pre>
<p>Another thing I had to figure out with Android, which differed
from these instructions, was how I would get Android to load the Mesa
versions of the graphics libraries. That’s when my colleague <a
href="https://www.igalia.com/team/mark">Mark</a> pointed out to me that
Android is open source and I could just check the source code myself.
Sure enough, you can find the OpenGL driver loader in <a
href="https://android.googlesource.com/platform/frameworks/native/+/master/opengl/libs/EGL/Loader.cpp">Android’s
source code</a>. From this code we can see that Android will try to load a
few different files based on some settings, and in my case it would try
to load 3 different shared libraries in the
<code>/vendor/lib64/egl</code> folder: <code>libEGL_adreno.so</code>,
<code>libGLESv1_CM_adreno.so</code>, and <code>libGLESv2.so</code>. I
could just replace these libraries with the versions built from Mesa and
voilà, you’re now loading a custom driver! This realization that I could
just “read the code” was very powerful in debugging some more
Android-specific issues I ran into, like dealing with gralloc.</p>
<p>Something cool that the open source Freedreno &amp; Turnip driver
developers figured out was getting Android to run test OpenGL
applications from the adb shell without building Android APKs. If you
check out the <a
href="https://gitlab.freedesktop.org/freedreno/freedreno">freedreno
repo</a>, they have an <code>ndk-build.sh</code> script that can build
tests in the <code>tests-*</code> folder. The nice benefit of this is
that it provides an easy way to run simple test cases without worrying
about the Android window system integration. Another nifty feature of
this repo is the <code>libwrap</code> tool that lets you trace the commands
being submitted to the GPU.</p>
<h2 id="what-even-is-gralloc">What even is Gralloc?</h2>
<p>Gralloc is the graphics memory allocator in Android, and the OS will
use it to allocate the surface for “windows”. This means that the memory
we want to render the display to is managed by gralloc and not our KGSL
backend, so we have to get all the information about this
surface from gralloc. If you look in
<code>src/egl/drivers/dri2/platform_android.c</code> you will see
existing code for handling gralloc. You would think “Hey there is no work
for me here then”, but you would be wrong. The handle gralloc provides
is hardware specific, and the code in <code>platform_android.c</code>
assumes a DRM gralloc implementation. Thankfully the Turnip developers
had already gone through this struggle and if you look in
<code>src/freedreno/vulkan/tu_android.c</code> you can see they have
implemented a separate path when a Qualcomm MSM implementation of
gralloc is detected. I could copy this detection logic and add a
separate path to <code>platform_android.c</code>.</p>
<h2 id="working-with-the-freedreno-community">Working with the Freedreno
community</h2>
<p>When working on any project (open-source or otherwise), it’s nice to
know that you aren’t working alone. Thankfully the
<code>#freedreno</code> channel on <code>irc.oftc.net</code> is very
active and full of helpful people to answer any questions you may have.
While working on the backend, one area I wasn’t really sure how to
address was the synchronization code for buffer objects. The backend
exposed a function called <code>cpu_prep</code>. This function was just
there to call the DRM implementation of <code>cpu_prep</code> on the
buffer object. I wasn’t exactly sure how to implement this functionality
with KGSL since it doesn’t use DRM buffer objects.</p>
<p>I ended up reaching out to the IRC channel and Rob Clark on the
channel explained to me that he was actually working on moving a lot of
the code for <code>cpu_prep</code> into common code so that a non-DRM
driver (like the KGSL backend I was working on) would just need to
implement that operation as a NOP (no operation).</p>
<h2 id="dealing-with-bugs-reverse-engineering-the-blob">Dealing with
bugs &amp; reverse engineering the blob</h2>
<p>I encountered a few different bugs when implementing the KGSL
backend, but most of them consisted of me calling KGSL wrong, or handling
synchronization incorrectly. Thankfully since Turnip is already running
on KGSL, I could just more carefully compare my code to what Turnip is
doing and figure out my logical mistake.</p>
<p>Some of the bugs I encountered required the backend interface in
Freedreno to be modified to expose a new per-driver implementation
of a backend function, instead of just using a common implementation.
For example, the existing function to map a buffer object into userspace
assumed that the same <code>fd</code> for the device could be used for
the buffer object in the <code>mmap</code> call. This worked fine for
any buffer objects we created through KGSL but would not work for buffer
objects created from gralloc (remember the above section on surface
memory for windows coming from gralloc). To resolve this issue I
exposed a new per-backend implementation of “map” where I could take a
different path if the buffer object came from gralloc.</p>
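<p>The shape of that change can be sketched roughly as follows. This is
a Python illustration, not Mesa code: the class names, the
<code>imported_fd</code> field, and the returned tuples are all
hypothetical, and it assumes that an imported buffer is mapped through
its own file descriptor. The common code keeps a default “map” and a
backend overrides it only for the case it needs to treat
differently.</p>
<pre class="python"><code>class Backend:
    """Common code shared by all backends (hypothetical names throughout)."""

    def __init__(self, device_fd):
        self.device_fd = device_fd

    def map_bo(self, bo):
        # Default path: assume the device fd can be used to map the buffer.
        return ("mmap", self.device_fd, bo["offset"])


class KgslBackend(Backend):
    """Backend that overrides "map" for buffers imported from elsewhere."""

    def map_bo(self, bo):
        # Assumption for this sketch: imported buffers carry their own fd.
        if bo.get("imported_fd") is not None:
            return ("mmap", bo["imported_fd"], 0)
        return super().map_bo(bo)


backend = KgslBackend(device_fd=3)
print(backend.map_bo({"offset": 4096}))
print(backend.map_bo({"offset": 0, "imported_fd": 7}))</code></pre>
<p>Buffers the backend allocated itself still go through the common
path, so only the imported case needs backend-specific handling.</p>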
<p>While testing the KGSL backend I did encounter a new bug that seems
to affect both my new KGSL backend and the Turnip KGSL backend. The bug
is an <code>iommu fault</code> that occurs when the surface allocated by
gralloc does not have a height that is aligned to 4. The blitting engine
on a6xx GPUs copies in 16x4 chunks, so if the height is not aligned to 4
the GPU will try to write to pixels that exist outside the allocated
memory. This issue only happens with KGSL backends since we import
memory from gralloc, and gralloc allocates exactly enough memory for the
surface, with no alignment on the height. If running on any other
platform, the <code>fdl</code> (Freedreno Layout) code would be called
to compute the minimum required size for a surface, which would take into
account the alignment requirement for the height. The Qualcomm blob driver
didn’t seem to have this problem, even though it’s getting the
exact same buffer from gralloc, so it must be doing something different
to handle the non-aligned height.</p>
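<p>As a rough illustration of the arithmetic behind this bug, here is a
small Python sketch. It is not code from Mesa, and the surface heights
are made-up examples. It rounds a height up to the next multiple of 4
and reports how many rows past the end of the surface a copy working in
4-row chunks would touch.</p>
<pre class="python"><code>def align(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def overrun_rows(height, chunk_height=4):
    """Rows beyond the surface that a chunked copy would write to."""
    return align(height, chunk_height) - height


# Hypothetical heights: 1080 is a multiple of 4, 1082 is not.
for height in (1080, 1082):
    print(height, align(height, 4), overrun_rows(height))</code></pre>
<p>For a height of 1082 the copy covers 1084 rows, so the last two rows
fall outside a buffer that was sized for exactly 1082.</p>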
<p>Because this issue relied on gralloc, the application needed to
run as an Android APK to get a surface from gralloc. The best way to
fix this issue would be to figure out what the blob driver is doing and
try to replicate this behavior in Freedreno (assuming it isn’t doing
something silly like switch to sysmem rendering). Unfortunately it
didn’t look like the libwrap library worked to trace an APK.</p>
<p>The libwrap library relies on a Linux feature known as
<code>LD_PRELOAD</code> to load <code>libwrap.so</code> when the
application starts and replace system functions like
<code>open</code> and <code>ioctl</code> with its own implementations
that trace what is being submitted to the KGSL kernel mode driver.
Thankfully Android exposes this <code>LD_PRELOAD</code> mechanism
through its “wrap” interface, where you create a property called
<code>wrap.&lt;app-name&gt;</code> with a value
<code>LD_PRELOAD=&lt;path to libwrap.so&gt;</code>. Android will then
load your library as would be done in a normal Linux shell. If you
try to do this with libwrap, though, you will find very quickly that you
get corrupted traces. When Android launches your APK, it doesn’t
only launch your application; there are different threads for different
Android system related functions and some of them can also use OpenGL.
The libwrap library is not designed to handle multiple threads using
KGSL at the same time. After discovering this issue I created a <a
href="https://gitlab.freedesktop.org/freedreno/freedreno/-/merge_requests/22">MR</a>
that would store the tracing file handles as TLS (thread local storage),
preventing the clobbering of the trace file, and also allowing you to
view the traces generated by different threads separately from each
other.</p>
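<p>The idea of keeping one trace file per thread can be sketched roughly
as follows. This is not the code from the MR, just an illustration of
the thread local storage technique; the helper name
<code>thread_trace_file</code> and the <code>trace-N.bin</code> file
naming are assumptions made up for this example.</p>
<pre><code>#include &lt;atomic&gt;
#include &lt;cstdio&gt;
#include &lt;string&gt;

// Hands out a unique number per thread so file names never collide.
static std::atomic&lt;int&gt; next_trace_id{0};

// Closes the per-thread file when its owning thread exits.
struct TraceFile {
    std::FILE *handle = nullptr;
    ~TraceFile() {
        if (handle)
            std::fclose(handle);
    }
};

// Hypothetical helper: returns the calling thread's private trace file,
// opening it the first time that thread asks for it.
std::FILE *thread_trace_file() {
    thread_local TraceFile trace;
    if (!trace.handle) {
        const std::string path =
            "trace-" + std::to_string(next_trace_id.fetch_add(1)) + ".bin";
        trace.handle = std::fopen(path.c_str(), "wb");
    }
    return trace.handle;
}</code></pre>
<p>Because the handle is <code>thread_local</code>, two threads calling
the wrapped functions at the same time write to different files instead
of interleaving their output in a single one.</p>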
<p>With this in hand one could begin investigating what the blob driver
is doing to handle these unaligned surfaces.</p>
<h2 id="whats-next">What’s next?</h2>
<p>Well, the next obvious thing to fix is the unaligned height issue, which
is still open. I’ve also worked on upstreaming my changes with this <a
href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21570">WIP
MR</a>.</p>
<figure>
<img src="/assets/freedreno/3d-mark.png"
alt="Freedreno running 3d-mark" />
<figcaption aria-hidden="true">Freedreno running 3d-mark</figcaption>
</figure>
</description><pubDate>Tue, 28 Feb 2023 05:00:00 -0000</pubDate><guid>https://fryzekconcepts.com/notes/freedreno_journey.html</guid></item><item><title>Igalia’s Mesa 23.1 Contributions - Behind the Scenes</title><link>https://fryzekconcepts.com/notes/mesa_23_1_contributions_behind_the_scenes.html</link><description><p>It’s an exciting time for Mesa as its next major release is unveiled
this week. Igalia has played an important role in this milestone, with
Eric Engestrom managing the release and 11 other Igalians contributing
over 110 merge requests. A sample of these contributions is detailed
below.</p>
<h2 id="radv-implement-vk.check_status">radv: Implement
vk.check_status</h2>
<p>As part of an effort to enhance the reliability of GPU resets on
amdgpu, Tony implemented a GPU reset notification feature in the RADV
Vulkan driver. This new function improves the robustness of the RADV
driver. The driver can now check if the GPU has been reset by a
userspace application, allowing the driver to recover their contexts,
exit, or engage in some other appropriate action.</p>
<p>You can read more about Tony’s changes in the link below:</p>
<ul>
<li><a
href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22253">https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22253</a></li>
</ul>
<h2 id="turnip-kgsl-backend-rewrite">turnip: KGSL backend rewrite</h2>
<p>With a goal of improving feature parity of the KGSL kernel mode
driver with its DRM counterpart, Mark has been rewriting the backend for
KGSL. These changes leverage the new, common backend Vulkan
infrastructure inside Mesa and fix multiple bugs. In addition, they
introduce support for importing/exporting sync FDs, pre-signalled
fences, and timeline semaphores.</p>
<p>If you’re interested in taking a deeper dive into Mark’s changes, you
can read the following MR:</p>
<ul>
<li><a
href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21651">https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21651</a></li>
</ul>
<h2 id="turnip-a7xx-preparation-transition-to-c">turnip: a7xx
preparation, transition to C++</h2>
<p>Danylo has adopted a significant role for two major changes inside
turnip: 1) contributing to the effort to migrate turnip to C++ and
2) supporting the next generation a7xx Adreno GPUs from Qualcomm. A more
detailed overview of Danylo’s changes can be found in the linked MRs
below:</p>
<ul>
<li><a
href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21931">https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21931</a></li>
<li><a
href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22148">https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22148</a></li>
</ul>
<h2 id="v3dv3dv-various-fixes-cts-conformance">v3d/v3dv various fixes
&amp; CTS conformance</h2>
<p>Igalia maintains the v3d OpenGL driver and v3dv Vulkan driver for
Broadcom VideoCore GPUs which can be found on devices such as the
Raspberry Pi. Iago, Alex and Juan have combined their expertise to
implement multiple fixes for both the v3d gallium driver and the v3dv
Vulkan driver on the Raspberry Pi. These changes include CPU performance
optimizations, support for 16-bit floating point vertex attributes, and
raising support in the driver to OpenGL 3.1 level functionality. This
Igalian trio has also been addressing fixes for conformance issues
raised in the Vulkan 1.3.5 conformance test suite (CTS).</p>
<p>You can dive into some of their Raspberry Pi driver changes here:</p>
<ul>
<li><a
href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22131">https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22131</a></li>
<li><a
href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21361">https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21361</a></li>
<li><a
href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20787">https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20787</a></li>
</ul>
<h2 id="ci-build-system-and-cleanup">ci, build system, and cleanup</h2>
<p>In addition to managing the 23.1 release, Eric has also implemented
many fixes in Mesa’s infrastructure. He has assisted with addressing a
number of CI issues within Mesa on various drivers from v3d to panfrost.
Eric also dedicated part of his time to general clean-up of the Mesa
code (e.g. removing duplicate functions, fixing and improving the
meson-based build system, and removing dead code).</p>
<p>If you’re interested in seeing some of his work, check out some of
the MRs below:</p>
<ul>
<li><a
href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22410">https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22410</a></li>
<li><a
href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21504">https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21504</a></li>
<li><a
href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21558">https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21558</a></li>
<li><a
href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20180">https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20180</a></li>
</ul>
</description><pubDate>Thu, 11 May 2023 04:00:00 -0000</pubDate><guid>https://fryzekconcepts.com/notes/mesa_23_1_contributions_behind_the_scenes.html</guid></item><item><title>Converting from 3D to 2D</title><link>https://fryzekconcepts.com/notes/converting_from_3d_to_2d.html</link><description><p>Recently I’ve been working on a project where I needed to convert an
application written in OpenGL to a software renderer. The matrix
transformation code in OpenGL made use of the GLM library for matrix
math, and I needed to convert the 4x4 matrices to be 3x3 matrices to
work with the software renderer. There was some existing code to do this
that was broken, and looked something like this:</p>
<pre><code>glm::mat3 mat3x3 = glm::mat3(mat4x4);</code></pre>
<p>Don’t worry if you don’t see the problem already, I’m going to
illustrate in more detail with the example of a translation matrix. In
3D a standard translation matrix to translate by a vector
<code>(x, y, z)</code> looks something like this:</p>
<pre><code>[1 0 0 x]
[0 1 0 y]
[0 0 1 z]
[0 0 0 1]</code></pre>
<p>Then when we multiply this matrix by a vector like
<code>(a, b, c, 1)</code> the result is
<code>(a + x, b + y, c + z, 1)</code>. If you don’t understand why the
matrix is 4x4 or why we have that extra 1 at the end don’t worry, I’ll
explain that in more detail later.</p>
<p>Now using the existing conversion code to get a 3x3 matrix will
simply take the first 3 columns and first 3 rows of the matrix and
produce a 3x3 matrix from those. Converting the translation matrix above
using this code produces the following matrix:</p>
<pre><code>[1 0 0]
[0 1 0]
[0 0 1]</code></pre>
<p>See the problem now? The <code>(x, y, z)</code> values disappeared!
In the conversion process we lost these critical values from the
translation matrix, and now if we multiply by this matrix nothing will
happen since we are just left with the identity matrix. So if we can’t
use this simple “cast” function in GLM, what can we use?</p>
<p>Well one thing we can do is preserve the last column and last row of
the matrix. So assume we have a 4x4 matrix like this:</p>
<pre><code>[a b c d]
[e f g h]
[i j k l]
[m n o p]</code></pre>
<p>Then preserving the last row and column we should get a matrix like
this:</p>
<pre><code>[a b d]
[e f h]
[m n p]</code></pre>
<p>And if we use this conversion process for the same translation matrix
we will get:</p>
<pre><code>[1 0 x]
[0 1 y]
[0 0 1]</code></pre>
<p>Now we see that the <code>(x, y)</code> part of the translation is
preserved, and if we try to multiply this matrix by the vector
<code>(a, b, 1)</code> the result will be
<code>(a + x, b + y, 1)</code>. The translation is preserved in the
conversion!</p>
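<p>A conversion function along these lines could look roughly like the
sketch below. To keep it self-contained it uses plain row-major
<code>std::array</code> matrices instead of GLM, and the names
<code>Mat3</code>, <code>Mat4</code> and <code>to_2d_homogeneous</code>
are made up for this example. Keep in mind that GLM stores matrices
column-major and indexes them as <code>m[column][row]</code>, so a GLM
version would need its indices swapped accordingly.</p>
<pre><code>#include &lt;array&gt;
#include &lt;cstddef&gt;

// Row-major matrices for illustration: m[row][column].
using Mat3 = std::array&lt;std::array&lt;float, 3&gt;, 3&gt;;
using Mat4 = std::array&lt;std::array&lt;float, 4&gt;, 4&gt;;

// Keep rows and columns 0, 1 and 3 of the 4x4 matrix,
// dropping row and column 2 (the z axis).
Mat3 to_2d_homogeneous(const Mat4 &amp;m) {
    constexpr std::array&lt;std::size_t, 3&gt; keep = {0, 1, 3};
    Mat3 out{};
    for (std::size_t r = 0; r &lt; 3; ++r)
        for (std::size_t c = 0; c &lt; 3; ++c)
            out[r][c] = m[keep[r]][keep[c]];
    return out;
}</code></pre>
<p>Applied to the translation matrix from earlier, this keeps
<code>x</code> and <code>y</code> in the last column and discards
<code>z</code>.</p>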
<h2 id="why-do-we-have-to-use-this-conversion">Why do we have to use
this conversion?</h2>
<p>The reason the conversion is more complicated is hidden in how we
defined the translation matrix and the vector we wanted to translate. The
vector was actually a 4D vector with the final component set to 1. The
reason we do this is that we actually want to represent an affine space
instead of just a vector space. An affine space is a type of space
where you can have both points and vectors. A point is exactly what you
would expect it to be: a location in space relative to some origin. A vector
is a direction with magnitude but no origin. This is important because,
strictly speaking, translation isn’t actually defined for vectors in a
normal vector space. Additionally, if you try to construct a matrix to
represent translation for a vector space you’ll find that it’s impossible,
because translation is not a linear function: a linear function always maps
the zero vector to itself, while a translation moves it. On the other hand,
operations like translation are well defined in an affine space and do
what you would expect.</p>
<p>To get around the problem of vector spaces, mathematicians more
clever than I figured out you can implement an affine space in a normal
vector space by increasing the dimension of the vector space by one, and
by adding an extra row and column to the transformation matrices used.
They called this a <strong>homogeneous coordinate system</strong>. This
lets you say that a vector is actually just a point if the 4th component
is 1, but if it’s 0 it’s just a vector. Using this abstraction one can
implement all the well defined operations for an affine space (like
translation!).</p>
<p>So using the “homogeneous coordinate system” abstraction, translation
is an operation that is defined by taking a point and moving it by a
vector. Let’s look at how that works with the translation matrix I used
as an example above. If you multiply that matrix by a 4D vector where
the 4th component is 0, it will just return the same vector. Now if we
multiply by a 4D vector where the 4th component is 1, it will return the
point translated by the vector we used to construct that translation
matrix. This implements the translation operation as it’s defined in an
affine space!</p>
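<p>The difference between multiplying a point and multiplying a vector
can be checked with a few lines of code. This is a small sketch using
plain row-major arrays rather than GLM; the names <code>Vec4</code>,
<code>Mat4</code>, <code>translation</code> and <code>multiply</code>
are made up for this example.</p>
<pre><code>#include &lt;array&gt;
#include &lt;cstddef&gt;

using Vec4 = std::array&lt;float, 4&gt;;
// Row-major matrix for illustration: m[row][column].
using Mat4 = std::array&lt;std::array&lt;float, 4&gt;, 4&gt;;

// Build the 4x4 translation matrix for the vector (x, y, z).
Mat4 translation(float x, float y, float z) {
    return {{{1.0f, 0.0f, 0.0f, x},
             {0.0f, 1.0f, 0.0f, y},
             {0.0f, 0.0f, 1.0f, z},
             {0.0f, 0.0f, 0.0f, 1.0f}}};
}

// Standard matrix times column vector product.
Vec4 multiply(const Mat4 &amp;m, const Vec4 &amp;v) {
    Vec4 out{};
    for (std::size_t r = 0; r &lt; 4; ++r)
        for (std::size_t c = 0; c &lt; 4; ++c)
            out[r] += m[r][c] * v[c];
    return out;
}</code></pre>
<p>With <code>translation(2, 3, 4)</code>, the point
<code>(1, 1, 1, 1)</code> becomes <code>(3, 4, 5, 1)</code>, while the
vector <code>(1, 1, 1, 0)</code> comes back unchanged because the last
column of the matrix is multiplied by 0.</p>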
<p>If you’re interested in understanding more about homogeneous
coordinate spaces, (like how the translation matrix is derived in the
first place) I would encourage you to look at resources like <a
href="https://books.google.ca/books/about/Mathematics_for_Computer_Graphics_Applic.html?id=YmQy799flPkC&amp;redir_esc=y">“Mathematics
for Computer Graphics Applications”</a>. They provide a much more
detailed explanation than I am providing here. (The homogeneous
coordinate system also has some benefits for representing projections
which I won’t get into here, but are explained in that textbook.)</p>
<p>Now to finally answer the question about why we needed to preserve
that final row and column. Based on what we now know, we weren’t
actually just converting from a “3D space” to a “2D space”, we were
converting from a “3D homogeneous space” to a “2D homogeneous space”.
The process of converting from a higher dimensional matrix to a lower
dimensional matrix is lossy and some transformation details are going to
be lost in the process (like, for example, the translation along the z-axis).
There is no way to tell what kind of space a given matrix is supposed to
transform just by looking at the matrix itself. The matrix does not
carry any information about what space it’s operating in, and any
conversion function would need to know that information to properly
convert that matrix. Therefore we need to develop our own conversion
function that preserves the transformations that are important to our
application when moving from a “3D homogeneous space” to a “2D
homogeneous space”.</p>
<p>Hopefully this explanation helps if you are ever working on
converting 3D transformation code to 2D.</p>
</description><pubDate>Mon, 25 Sep 2023 04:00:00 -0000</pubDate><guid>https://fryzekconcepts.com/notes/converting_from_3d_to_2d.html</guid></item><item><title>A Dive into Vulkanised 2024</title><link>https://fryzekconcepts.com/notes/vulkanised_2024.html</link><description><figure>
<img src="/assets/vulkanised_2024/vulkanized_logo_web.jpg"
alt="Vulkanised sign at Google’s office" />
<figcaption aria-hidden="true">Vulkanised sign at Google’s
office</figcaption>
</figure>
<p>Last week I had an exciting opportunity to attend the Vulkanised 2024
conference. For those of you not familiar with the event, it is <a
href="https://vulkan.org/events/vulkanised-2024">“The Premier Vulkan
Developer Conference”</a> hosted by the Vulkan working group from
Khronos. With the excitement out of the way, I decided to write about
some of the interesting information that came out of the conference.</p>
<h2 id="a-few-presentations">A Few Presentations</h2>
<p>My colleagues Iago, Stéphane, and Hyunjun each had the opportunity to
present on some of their work in the wider Vulkan ecosystem.</p>
<figure>
<img src="/assets/vulkanised_2024/vulkan_video_web.jpg"
alt="Stéphane and Hyunjun presenting" />
<figcaption aria-hidden="true">Stéphane and Hyunjun
presenting</figcaption>
</figure>
<p>Stéphane &amp; Hyunjun presented “Implementing a Vulkan Video Encoder
From Mesa to GStreamer”. They jointly talked about the work they
performed to implement the Vulkan video extensions in Intel’s ANV Mesa
driver as well as in GStreamer. This was an interesting presentation
because you got to see how the new Vulkan video extensions affected both
driver developers implementing the extensions and application developers
making use of the extensions for real time video decoding and encoding.
<a
href="https://vulkan.org/user/pages/09.events/vulkanised-2024/vulkanised-2024-stephane-cerveau-ko-igalia.pdf">Their
presentation is available on vulkan.org</a>.</p>
<figure>
<img src="/assets/vulkanised_2024/opensource_vulkan_web.jpg"
alt="Iago presenting" />
<figcaption aria-hidden="true">Iago presenting</figcaption>
</figure>
<p>Later my colleague Iago presented jointly with Faith Ekstrand (a
well-known Linux graphics stack contributor from Collabora) on “8 Years
of Open Drivers, including the State of Vulkan in Mesa”. They both
talked about the current state of Vulkan in the open source driver
ecosystem, and some of the benefits open source drivers have been able
to take advantage of, like the common Vulkan runtime code and a shared
compiler stack. You can check out <a
href="https://vulkan.org/user/pages/09.events/vulkanised-2024/Vulkanised-2024-faith-ekstrand-collabora-Iago-toral-igalia.pdf">their
presentation for all the details</a>.</p>
<p>Besides Igalia’s presentations, there were several more which I found
interesting, with topics such as Vulkan developer tools, experiences of
using Vulkan in real-world applications, and even how to teach Vulkan to
new developers. Here are some highlights for some of them.</p>
<h3 id="using-vulkan-synchronization-validation-effectively"><a
href="https://vulkan.org/user/pages/09.events/vulkanised-2024/vulkanised-2024-john-zulauf-lunarg.pdf">Using
Vulkan Synchronization Validation Effectively</a></h3>
<p>John Zulauf gave a presentation on the Vulkan synchronization
validation layers that he has been working on. If you are not familiar
with these, then you should really check them out. They work by tracking
how resources are used inside Vulkan and providing error messages with
some hints if you use a resource in a way where it is not synchronized
properly. It can’t catch every error, but it’s a great tool in the
toolbelt of Vulkan developers to make their lives easier when it comes
to debugging synchronization issues. As John said in the presentation,
synchronization in Vulkan is hard, and nearly every application he
tested the layers on revealed a synchronization issue, no matter how
simple it was. He can proudly say he is a vkQuake contributor now
because of these layers.</p>
<h3 id="years-of-teaching-vulkan-with-example-for-video-extensions"><a
href="https://vulkan.org/user/pages/09.events/vulkanised-2024/vulkanised-2024-helmut-hlavacs.pdf">6
Years of Teaching Vulkan with Example for Video Extensions</a></h3>
<p>This was an interesting presentation from a professor at the
University of Vienna about his experience teaching graphics as well as
game development to students who may have little real programming
experience. He covered the techniques he uses to make learning easier as
well as resources that he uses. This would be a great presentation to
check out if you’re trying to teach Vulkan to others.</p>
<h3 id="vulkan-synchronization-made-easy"><a
href="https://vulkan.org/user/pages/09.events/vulkanised-2024/vulkanised-2024-grigory-dzhavadyan.pdf">Vulkan
Synchronization Made Easy</a></h3>
<p>Another presentation focused on Vulkan sync, but instead of debugging
it, Grigory showed how his graphics library abstracts sync away from the
user without implementing a render graph. He presented an interesting
technique that is similar to how the sync validation layers work when it
comes to ensuring that resources are always synchronized before use. If
you’re building your own engine in Vulkan, this is definitely something
worth checking out.</p>
<h3 id="vulkan-video-encode-api-a-deep-dive"><a
href="https://vulkan.org/user/pages/09.events/vulkanised-2024/vulkanised-2024-tony-zlatinski-nvidia.pdf">Vulkan
Video Encode API: A Deep Dive</a></h3>
<p>Tony at Nvidia did a deep dive into the new Vulkan Video extensions,
explaining a bit about how video codecs work, and also including a
roadmap for future codec support in the video extensions. Especially
interesting for us was that he made a nice call-out to Igalia and our
work on Vulkan Video CTS and open source driver support on slide (6)
:)</p>
<h2 id="thoughts-on-vulkanised">Thoughts on Vulkanised</h2>
<p>Vulkanised is an interesting conference that gives you the
intersection of people working on Vulkan drivers, game developers using
Vulkan for their graphics backend, visual FX tool developers using
Vulkan-based tools in their pipeline, industrial application developers
using Vulkan for some embedded commercial systems, and general hobbyists
who are just interested in Vulkan. As an example of some of these
interesting audience members, I got to talk with a member of the Blender
Foundation about his work on the Vulkan backend to Blender.</p>
<p>Lastly, the event was held at Google’s offices in Sunnyvale, which I’m
always happy to travel to, not just for the better weather (coming from
Canada), but also for the amazing restaurants and food in the Bay
Area!</p>
<figure>
<img src="/assets/vulkanised_2024/food_web.jpg"
alt="Great Bay Area food" />
<figcaption aria-hidden="true">Great Bay Area food</figcaption>
</figure>
</description><pubDate>Wed, 14 Feb 2024 05:00:00 -0000</pubDate><guid>https://fryzekconcepts.com/notes/vulkanised_2024.html</guid></item><item><title>Software Rendering and Android</title><link>https://fryzekconcepts.com/notes/android_swrast.html</link><description><p>My current project at Igalia has had me working on Mesa’s software
renderers, llvmpipe and lavapipe. I’ve been working to get them running
on Android, and I wanted to document the progress I’ve made, the
challenges I’ve faced, and talk a little bit about the development
process for a project like this. My work is not totally merged into
upstream Mesa yet, but you can see the MRs I made here:</p>
<ul>
<li><a
href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29344">llvmpipe:
Add android platform integration</a></li>
<li><a
href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29785">u_gralloc/fallback:
Set fd from handle directly</a></li>
<li><a
href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27805">llvmpipe
&amp; lavapipe: Implement sync fd import/export extensions</a></li>
<li><a
href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28735">lavapipe:
Implement <code>VK_EXT_external_memory_dma_buf</code></a></li>
</ul>
<h2 id="setting-up-an-android-development-environment">Setting up an
Android development environment</h2>
<p>Getting system level software to build and run on Android is
unfortunately not straightforward. Since we are doing software rendering
we don’t need a physical device and can instead make use of the
Android emulator. If you didn’t know, Android has two emulators: the
common one most people use is “goldfish” and the lesser known one is
“cuttlefish”. For this project I did my work on the cuttlefish emulator
as it’s meant for testing the Android OS itself instead of just Android
apps and is more reflective of real hardware. The cuttlefish emulator
takes a little bit more work to set up, and I’ve found that it only works
properly on Debian based Linux distros. I run Fedora, so I had to run
the emulator in a Debian VM.</p>
<p>Thankfully Google has good instructions for building and running
cuttlefish, which you can find <a
href="https://source.android.com/docs/devices/cuttlefish/get-started">here</a>.
The instructions show you how to set up the emulator using nightly build
images from Google. We’ll also need to set up our own Android OS images
so after we’ve confirmed we can run the emulator, we need to start
looking at building AOSP.</p>
<p>For building our own AOSP image, we can also follow the instructions
from Google <a
href="https://source.android.com/docs/setup/build/building">here</a>.
For the target we’ll want
<code>aosp_cf_x86_64_phone-trunk_staging-eng</code>. At this point it’s
a good idea to verify that you can build the image, which you can do by
following the rest of the instructions on the page. Building AOSP from
source does take a while though, so prepare to wait potentially an
entire day for the image to build. Also if you get errors complaining
that you’re out of memory, you can try to reduce the number of parallel
builds. Google officially recommends having 64GB of RAM, and I only had
32GB so some packages had to be built with the parallel builds set to 1
so I wouldn’t run out of RAM.</p>
<p>For running this custom-built image on Cuttlefish, you can just copy
all the <code>*.img</code> files from
<code>out/target/product/vsoc_x86_64/</code> to the root cuttlefish
directory, and then launch cuttlefish. If everything worked successfully
you should be able to see your custom built AOSP image running in the
cuttlefish webui.</p>
<h2 id="building-mesa-targeting-android">Building Mesa targeting
Android</h2>
<p>Working from the changes in MR <a
href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29344">!29344</a>
building llvmpipe or lavapipe targeting Android should just work™️. To
get to that stage required a few changes. First llvmpipe actually
already had some support on Android, as long as it was running on a
device that supports a DRM display driver. In that case it could use the
<code>dri</code> window system integration which already works on
Android. I wanted to get llvmpipe (and lavapipe) running without dri, so
I had to add support for Android in the <code>drisw</code> window system
integration.</p>
<p>Supporting Android in <code>drisw</code> mainly meant adding
support for importing dmabufs as framebuffers. The Android windowing
system will provide us with a “gralloc” buffer which inside has a dmabuf
fd that represents the framebuffer. Adding support for importing dmabufs
in drisw means we can import and begin drawing to these framebuffers.
Most of the changes to support that can be found in <a
href="https://gitlab.freedesktop.org/mesa/mesa/-/blob/9705df53408777d493eab19e5a58c432c1e75acb/src/gallium/frontends/dri/drisw.c#L405"><code>drisw_allocate_textures</code></a>
and the underlying changes to llvmpipe to support importing dmabufs in
MR <a
href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27805">!27805</a>.
The EGL Android platform code also needed some changes to use the
<code>drisw</code> window system code. Previously this code would only
work with true dri drivers, but with some small tweaks it was possible
to have it initialize the drisw window system and then use it
for rendering if no hardware devices are available.</p>
<p>For lavapipe the changes were a lot simpler. The Android Vulkan
loader requires your driver to have <code>HAL_MODULE_INFO_SYM</code>
symbol in the binary, so that got created and populated correctly,
following other Vulkan drivers in Mesa like turnip. Then the image
creation code had to be modified to support the
<code>VK_ANDROID_native_buffer</code> extension which allows the Android
Vulkan loader to create images using Android native buffer handles.
Under the hood this means getting the dmabuf fd from the native buffer
handle. Thankfully mesa already has some common code to handle this, so
I could just use that. Some other small changes were also necessary to
address crashes and other failures that came up during testing.</p>
<p>With the changes out of the way we can now start building Mesa on
Android. For this project I had to update the Android documentation for
Mesa to include steps for building LLVM for Android since the version
Google ships with the NDK is missing libraries that llvmpipe/lavapipe
need to function. You can see the updated documentation <a
href="https://gitlab.freedesktop.org/mesa/mesa/-/blob/9705df53408777d493eab19e5a58c432c1e75acb/docs/drivers/llvmpipe.rst">here</a>
and <a
href="https://gitlab.freedesktop.org/mesa/mesa/-/blob/9705df53408777d493eab19e5a58c432c1e75acb/docs/android.rst">here</a>.
After sorting out LLVM, building llvmpipe/lavapipe is the same as
building any other Mesa driver for Android: we set up a cross file to
tell meson how to cross compile and then we run meson. At this point you
could manually modify the Android image and copy these files to the VM,
but I also wanted to support building a new AOSP image directly
including the driver. In order to do that you also have to rename the
driver binaries to match Android’s naming convention, and make sure
SO_NAME matches as well. If you check out <a
href="https://gitlab.freedesktop.org/mesa/mesa/-/blob/9705df53408777d493eab19e5a58c432c1e75acb/docs/android.rst?plain=1#L183">this</a>
section of the documentation I wrote, it covers how to do that.</p>
<p>If you followed all of that you should have built a version of
llvmpipe and lavapipe that you can run on Android’s cuttlefish
emulator.</p>
<figure>
<img src="/assets/2024-06-27-android-swrast/lavapipe.png"
alt="Android running lavapipe" />
<figcaption aria-hidden="true">Android running lavapipe</figcaption>
</figure>
<h2 id="references">References</h2>
<ul>
<li><a
href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29344"
class="uri">https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29344</a>
<ul>
<li>Main MR with Android changes</li>
</ul></li>
<li><a
href="https://source.android.com/docs/devices/cuttlefish/get-started"
class="uri">https://source.android.com/docs/devices/cuttlefish/get-started</a>
<ul>
<li>Google’s official guide for getting started with the Cuttlefish
emulator</li>
</ul></li>
<li><a href="https://source.android.com/docs/setup/build/building"
class="uri">https://source.android.com/docs/setup/build/building</a>
<ul>
<li>Google’s official guide for building AOSP images</li>
</ul></li>
<li><a
href="https://gitlab.freedesktop.org/mesa/mesa/-/blob/9705df53408777d493eab19e5a58c432c1e75acb/docs/drivers/llvmpipe.rst"
class="uri">https://gitlab.freedesktop.org/mesa/mesa/-/blob/9705df53408777d493eab19e5a58c432c1e75acb/docs/drivers/llvmpipe.rst</a>
<ul>
<li>My updated documentation in MR for llvmpipe</li>
</ul></li>
<li><a
href="https://gitlab.freedesktop.org/mesa/mesa/-/blob/9705df53408777d493eab19e5a58c432c1e75acb/docs/android.rst"
class="uri">https://gitlab.freedesktop.org/mesa/mesa/-/blob/9705df53408777d493eab19e5a58c432c1e75acb/docs/android.rst</a>
<ul>
<li>My updated documentation in MR for Android integration in mesa</li>
</ul></li>
</ul>
</description><pubDate>Thu, 27 Jun 2024 04:00:00 -0000</pubDate><guid>https://fryzekconcepts.com/notes/android_swrast.html</guid></item><item><title>2024 Graphics Team Contributions at Igalia</title><link>https://fryzekconcepts.com/notes/2024_igalia_graphics_team.html</link><description><p>2024 has been an exciting year for the <a
href="https://www.igalia.com/technology/graphics">Igalia Graphics
Team</a>. We’ve been making a lot of progress on Turnip, the AMD display
driver, the Raspberry Pi graphics stack, Vulkan video, and more.</p>
<h2 id="vulkan-device-generated-commands">Vulkan Device Generated
Commands</h2>
<p>Igalia’s Ricardo Garcia has been working hard on adding support for
the new <code>VK_EXT_device_generated_commands</code> extension in the
Vulkan Conformance Test Suite. He wrote an excellent blog post on the
extension and on his work that you can read <a
href="https://rg3.name/202409270942.html">here</a>. Ricardo also
presented the extension at XDC 2024 in Montréal, which he also <a
href="https://rg3.name/202411181555.html">blogged about</a>. Take a look
and see what generating Vulkan commands directly on the GPU looks
like!</p>
<h2 id="raspberry-pi-enhancements-performance-improvements">Raspberry Pi
Enhancements &amp; Performance Improvements</h2>
<p>Our very own Maíra Canal made a big contribution to improve the
graphics performance of Raspberry Pi 4 &amp; 5 devices by introducing
support for “Super Pages”. She wrote an excellent and detailed blog post
on what Super Pages are, how they improve performance, and comparing
performance of different apps and games. You can read all the juicy
details <a
href="https://mairacanal.github.io/unleashing-power-enabling-super-pages-on-RPi/">here</a>.</p>
<p>She also worked on introducing CPU jobs to the Broadcom GPU kernel
driver in Linux. These changes allow user space to implement jobs that
get executed on the CPU in sync with the work on the GPU. She wrote a
great blog post detailing what CPU jobs allow you to do and how they
work that you can read <a
href="https://mairacanal.github.io/introducing-cpu-jobs-to-the-rpi/">here</a>.</p>
<p>Christian Gmeiner on the Graphics team has also been working on
adding Perfetto support to Broadcom GPUs. Perfetto is a performance
tracing tool, and support for it in Broadcom drivers will allow
developers to gain more insight into bottlenecks of their GPU
applications. You can check out his changes to add support in the
following MRs:</p>
<ul>
<li><a
href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31575">MR
31575</a></li>
<li><a
href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32277">MR
32277</a></li>
<li><a
href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31751">MR
31751</a></li>
</ul>
<p>The Raspberry Pi team here at Igalia presented all of their work at
XDC 2024 in Montréal. You can see a video below.</p>
<p><div class="youtube-video"><iframe src="https://www.youtube.com/embed/tlSFHkp6ODM"></iframe></div></p>
<h2 id="linux-kernel-6.8">Linux Kernel 6.8</h2>
<p>Several Igalians contributed to the Linux 6.8
kernel release back in March of this year. Our colleague Maíra wrote a
great blog post outlining these contributions that you can read <a
href="https://mairacanal.github.io/introducing-cpu-jobs-to-the-rpi/">here</a>.
To highlight some of these contributions:</p>
<ul>
<li>AMD HDR &amp; Color Management
<ul>
<li>Melissa Wen has been working on improving and implementing HDR
support in AMD’s display driver as well as working on color management
in the Linux display stack.</li>
</ul></li>
<li>Async Flip
<ul>
<li>André Almeida implemented support for asynchronous page flip in the
atomic DRM modesetting API.</li>
</ul></li>
<li>V3D 7.1.x Kernel Driver
<ul>
<li>Iago Toral contributed a number of patches upstream to get the
Broadcom DRM driver working with the latest Broadcom hardware used in
the Raspberry Pi 5.</li>
</ul></li>
<li>GPU stats for the Raspberry Pi 4/5
<ul>
<li>José María “Chema” Casanova worked on adding GPU stats support to
the latest Raspberry Pi hardware.</li>
</ul></li>
</ul>
<h2 id="turnip-improvements">Turnip Improvements</h2>
<p>Dhruv Mark Collins has been working very hard to bring the open
source Turnip driver closer to performance parity with Qualcomm’s
proprietary driver. Two of his big contributions toward this were
improving 2D buffer-to-image copies on A7XX devices and
implementing unidirectional Low Resolution Z (LRZ) on A7XX devices. You
can see the MRs for these changes <a
href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31401">here</a>
and <a
href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29453">here</a>.</p>
<p>Karmjit Mahil, a new member of the Igalia Graphics Team, has been
working on different parts of the Turnip stack. One notable
improvement he made was to <code>fmulz</code> handling for
Direct3D 9. You can check out his changes <a
href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31479">here</a>
and read more about them.</p>
<p>Danylo Piliaiev has been hard at work adding support for the latest
generation of Adreno GPUs. This included getting support for the <a
href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26934">A750
working</a>, and then implementing performance improvements to bring it
up to parity with other Adreno GPUs in Turnip. He also worked on
implementing a number of Vulkan extensions and performance improvements
such as:</p>
<ul>
<li>VK_KHR_shader_atomic_int64
<ul>
<li><a
href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27776">MR
27776</a></li>
</ul></li>
<li>VK_KHR_fragment_shading_rate
<ul>
<li><a
href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30905">MR
30905</a></li>
</ul></li>
<li>VK_KHR_8bit_storage
<ul>
<li><a
href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28254">MR
28254</a></li>
</ul></li>
<li>shaderInt8 feature
<ul>
<li><a
href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29875">MR
29875</a></li>
</ul></li>
<li>VK_KHR_shader_subgroup_rotate
<ul>
<li><a
href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31358">MR
31358</a></li>
</ul></li>
<li>VK_EXT_map_memory_placed
<ul>
<li><a
href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28928">MR
28928</a></li>
</ul></li>
<li>VK_EXT_legacy_dithering
<ul>
<li><a
href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30536">MR
30536</a></li>
</ul></li>
<li>VK_EXT_depth_clamp_zero_one
<ul>
<li><a
href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29387">MR
29387</a></li>
</ul></li>
</ul>
<h2 id="display-next-hackfest-displaykms-meet-up">Display Next Hackfest
&amp; Display/KMS Meet-up</h2>
<p>Igalia hosted the 2024 version of the Display Next Hackfest. This
community event is a way to get Linux display developers together to
work on improving the Linux display stack. Our Melissa Wen wrote a blog
post about the event and what it was like to organize it. You can read
all about it <a
href="https://melissawen.github.io/blog/2024/09/25/reflections-2024-display-next-hackfest">here</a>.</p>
<figure>
<img
src="https://raw.githubusercontent.com/melissawen/melissawen.github.io/refs/heads/master/img/2024-ldnh/hackfest-room-0.jpg"
alt="Display Next Hackfest" />
<figcaption aria-hidden="true">Display Next Hackfest</figcaption>
</figure>
<p>Just in case you thought you couldn’t get enough Linux display stack,
Melissa also helped organize a Display/KMS meet-up at XDC 2024. She
wrote all about that meet-up and the progress the community made on her
blog <a
href="https://melissawen.github.io/blog/2024/11/19/summary-display-kms-meeting-xdc2024">here</a>.</p>
<h2 id="amd-display-amdgpu">AMD Display &amp; AMDGPU</h2>
<p>Melissa Wen has also been hard at work improving AMDGPU’s display
driver. She made a number of changes including <a
href="https://lore.kernel.org/amd-gfx/ea31b795-5b75-40e6-846e-51dc6696f8bc@amd.com/#t">improving
the display debug log</a> to include hardware color capabilities, <a
href="https://lore.kernel.org/amd-gfx/20240927230600.2619844-1-superm1@kernel.org/">migrating
EDID handling to EDID common code</a>, and various bug fixes such as:</p>
<ul>
<li>Fixing null-pointer dereference on EDID reading
<ul>
<li><a
href="https://lore.kernel.org/amd-gfx/20240216122401.216860-1-mwen@igalia.com/">https://lore.kernel.org/amd-gfx/20240216122401.216860-1-mwen@igalia.com/</a></li>
</ul></li>
<li>Checking dc_link before dereferencing
<ul>
<li><a
href="https://lore.kernel.org/amd-gfx/20240227190828.444715-1-mwen@igalia.com/">https://lore.kernel.org/amd-gfx/20240227190828.444715-1-mwen@igalia.com/</a></li>
</ul></li>
<li>Using mpcc_count to log MPC state
<ul>
<li><a
href="https://lore.kernel.org/amd-gfx/20240412163928.118203-1-mwen@igalia.com/">https://lore.kernel.org/amd-gfx/20240412163928.118203-1-mwen@igalia.com/</a></li>
</ul></li>
<li>Fixing cursor offset on rotation 180
<ul>
<li><a
href="https://lore.kernel.org/amd-gfx/20240807075546.831208-22-chiahsuan.chung@amd.com/">https://lore.kernel.org/amd-gfx/20240807075546.831208-22-chiahsuan.chung@amd.com/</a></li>
</ul></li>
<li>Fixes for kernel crashes since cursor overlay mode
<ul>
<li><a
href="https://lore.kernel.org/amd-gfx/20241217205029.39850-1-mwen@igalia.com/">https://lore.kernel.org/amd-gfx/20241217205029.39850-1-mwen@igalia.com/</a></li>
</ul></li>
</ul>
<p>Tvrtko Ursulin, a recent addition to our team, has been working on
fixing issues in AMDGPU and some of the Linux kernel’s common code. For
example, he worked on fixing bugs in the DRM scheduler around missing
locks, optimizing the re-lock cycle on the submit path, and cleaning up
the code. On AMDGPU he worked on improving memory usage reporting,
fixing out-of-bounds writes, and micro-optimizing ring emissions. For DMA
fence he simplified fence merging and resolved a potential memory leak.
Lastly, on workqueue he fixed false positive sanity check warnings that
AMDGPU &amp; DRM scheduler interactions were triggering. You can see the
code for some of these changes below:</p>
<ul>
<li><a
href="https://lore.kernel.org/amd-gfx/20240906180639.12218-1-tursulin@igalia.com/">https://lore.kernel.org/amd-gfx/20240906180639.12218-1-tursulin@igalia.com/</a></li>
<li><a
href="https://lore.kernel.org/amd-gfx/20241008150532.23661-1-tursulin@igalia.com/">https://lore.kernel.org/amd-gfx/20241008150532.23661-1-tursulin@igalia.com/</a></li>
<li><a
href="https://lore.kernel.org/amd-gfx/20241227111938.22974-1-tursulin@igalia.com/">https://lore.kernel.org/amd-gfx/20241227111938.22974-1-tursulin@igalia.com/</a></li>
<li><a
href="https://lore.kernel.org/amd-gfx/20240813135712.82611-1-tursulin@igalia.com/">https://lore.kernel.org/amd-gfx/20240813135712.82611-1-tursulin@igalia.com/</a></li>
<li><a
href="https://lore.kernel.org/amd-gfx/20240712152855.45284-1-tursulin@igalia.com/">https://lore.kernel.org/amd-gfx/20240712152855.45284-1-tursulin@igalia.com/</a></li>
</ul>
<h2 id="vulkan-opengl-extensions">Vulkan &amp; OpenGL Extensions</h2>
<ul>
<li><code>GL_EXT_texture_offset_non_const</code>
<ul>
<li>Ricardo was busy working on extending OpenGL by adding <a
href="https://github.com/KhronosGroup/GLSL/blob/main/extensions/ext/GL_EXT_texture_offset_non_const.txt">this</a>
extension to GLSL as well as providing an implementation for it in <a
href="https://github.com/KhronosGroup/glslang/pull/3782">glslang</a></li>
</ul></li>
<li><code>VK_KHR_video_encode_av1</code> &amp;
<code>VK_KHR_video_decode_av1</code>
<ul>
<li>Igalia is listed as a contributor to these extensions and worked
very hard to implement CTS support for the extensions.</li>
</ul></li>
</ul>
<h2 id="etnaviv-improvements">Etnaviv Improvements</h2>
<p>Christian Gmeiner, one of the maintainers of the Etnaviv driver for
Vivante GPUs, has been hard at work this year to make a number of big
improvements to Etnaviv. This includes using hwdb to detect GPU
features, which he wrote about <a
href="https://christian-gmeiner.info/2024-04-12-hwdb/">here</a>. Another
big improvement was migrating Etnaviv to use isaspec for the GPU ISA
description, allowing an assembler and disassembler to be generated from
XML. This also allowed Etnaviv to reuse some common features in Mesa for
assemblers/disassemblers and take advantage of the Python code
generation features others in the community have been working on. He
wrote a detailed blog post about it, which you can find <a
href="https://christian-gmeiner.info/2024-07-11-it-all-started-with-a-nop-part1/">here</a>.
In the same vein of Etnaviv infrastructure improvements, Christian has
also been working on a new shader compiler, written in Rust, called
“EBC”. Christian presented this new shader compiler at XDC 2024 this
year. You can check out his presentation below.</p>
<p><div class="youtube-video"><iframe src="https://www.youtube.com/embed/n_fn4evXeZo"></iframe></div></p>
<p>On the side of new features, Christian landed a big one in Mesa 24.03
for Etnaviv: Multiple Render Target (MRT) support! This allows games and
applications to render to multiple render targets (think framebuffers)
in a single graphics operation. This feature is heavily used by
deferred rendering techniques, and is a requirement for later versions
of desktop OpenGL and OpenGL ES 3. Keep an eye on <a
href="https://christian-gmeiner.info/">Christian’s blog</a> to see any
of his future announcements.</p>
<h2 id="lavapipellvmpipe-android-chromeos">Lavapipe/LLVMpipe, Android
&amp; ChromeOS</h2>
<p>I had a busy year working on improving Lavapipe/LLVMpipe platform
integration. This started with adding support for DMABUF import/export,
so that the display handles from the Android window system could be properly
imported and mapped. Next came Android window system integration for the DRI
software rendering backend in EGL, and lastly but most importantly came
updating the documentation in Mesa for building Android support. I wrote
all about this effort <a href="/notes/android_swrast.html">here</a>.</p>
<p>The latter half of the year had me working on improving Lavapipe’s
integration with ChromeOS, and having Lavapipe work as a host Vulkan
driver for Venus. You can see some of the changes I made in
virglrenderer <a
href="https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/1458">here</a>
and crosvm <a
href="https://gitlab.freedesktop.org/Hazematman/crosvm/-/commit/9ee86e72edfb3a652148dd233ffca75847949558">here</a>.
This work is still ongoing.</p>
<h2 id="whats-next">What’s Next?</h2>
<p>We’re not planning to stop our 2024 momentum, and we’re hoping for
2025 to be a great year for Igalia and the Linux graphics stack! I’m
booked to present about <a
href="https://www.vulkan.org/events/vulkanised-2025#agenda">Lavapipe at
Vulkanised 2025</a>, where Ricardo will also present about
Device-Generated Commands. Maíra &amp; Chema will be presenting together
at FOSDEM 2025 about improving performance on Raspberry Pi GPUs, and
Melissa will also present about kworkflow there. We’ll also be at XDC
2025, networking and presenting about all the work we are doing on the
Linux graphics stack. Thanks for following our work this year, and
here’s to making 2025 an even better year for Linux graphics!</p>
</description><pubDate>Fri, 20 Dec 2024 05:00:00 -0000</pubDate><guid>https://fryzekconcepts.com/notes/2024_igalia_graphics_team.html</guid></item></channel></rss>