author | Lucas Fryzek <lucas.fryzek@gmail.com> | 2023-04-28 16:58:01 -0400
committer | Lucas Fryzek <lucas.fryzek@gmail.com> | 2023-04-28 16:58:01 -0400
commit | cb31a3788c74cea253a5a7657f5c9b65341309f6 (patch)
tree | 44b60ea66fb0790eabe63a5659dfe1a48e6dbd80
parent | 8cb244ed5b535139dcaa6a11a4edca73a84afaf5 (diff)
Add social link to site
-rw-r--r-- | html/about.html | 21
-rw-r--r-- | html/assets/style.css | 1
-rw-r--r-- | html/feed.xml | 1247
-rw-r--r-- | html/graphics_feed.xml | 469
-rw-r--r-- | html/index.html | 39
-rw-r--r-- | html/notes/2022_igalia_graphics_team.html | 267
-rw-r--r-- | html/notes/baremetal-risc-v.html | 178
-rw-r--r-- | html/notes/digital_garden.html | 25
-rw-r--r-- | html/notes/freedreno_journey.html | 207
-rw-r--r-- | html/notes/generating-video.html | 208
-rw-r--r-- | html/notes/global_game_jam_2023.html | 109
-rw-r--r-- | html/notes/n64brew-gamejam-2021.html | 143
-rw-r--r-- | html/notes/rasterizing-triangles.html | 121
-rw-r--r-- | html/now.html | 20
-rw-r--r-- | templates/main.html | 1
15 files changed, 2635 insertions, 421 deletions
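In a cgit diffstat, each per-file number counts the total lines touched in that file (insertions plus deletions together). Assuming that interpretation, a quick Python check confirms the per-file column above is consistent with the summary line:

```python
# Per-file "lines changed" counts copied from the diffstat above.
changed = {
    "html/about.html": 21,
    "html/assets/style.css": 1,
    "html/feed.xml": 1247,
    "html/graphics_feed.xml": 469,
    "html/index.html": 39,
    "html/notes/2022_igalia_graphics_team.html": 267,
    "html/notes/baremetal-risc-v.html": 178,
    "html/notes/digital_garden.html": 25,
    "html/notes/freedreno_journey.html": 207,
    "html/notes/generating-video.html": 208,
    "html/notes/global_game_jam_2023.html": 109,
    "html/notes/n64brew-gamejam-2021.html": 143,
    "html/notes/rasterizing-triangles.html": 121,
    "html/now.html": 20,
    "templates/main.html": 1,
}

# Totals from the summary line.
insertions, deletions = 2635, 421

# The per-file counts should sum to insertions + deletions.
assert len(changed) == 15
assert sum(changed.values()) == insertions + deletions  # 3056 lines touched
```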
diff --git a/html/about.html b/html/about.html index 6b3d7c5..8347270 100644 --- a/html/about.html +++ b/html/about.html @@ -20,6 +20,7 @@ <div class="header-links"> <a href="/now.html" class="header-link">Now</a> <a href="/about.html" class="header-link">About</a> + <a rel="me" href="https://mastodon.social/@hazematman">Social</a> </div> </div> <main> @@ -31,17 +32,29 @@ <div class="note-body"> <p><img class="page-self-image" src="/assets/me.jpg"></p> <p>Hello my name is Lucas Fryzek, and welcome to my website!</p> -<p>I’m a software developer with specific interests in computer graphics, GPUs, embedded systems, and operating systems. I’m currently working with Igalia’s graphics team working on the open source driver stack in mesa. In my free time I enjoy working on personal projects (including this website). I plan to use this site to share information about projects I’m working on, or just generally cool stuff I’m interested in.</p> +<p>I’m a software developer with specific interests in computer +graphics, GPUs, embedded systems, and operating systems. I’m currently +working with Igalia’s graphics team working on the open source driver +stack in mesa. In my free time I enjoy working on personal projects +(including this website). 
I plan to use this site to share information +about projects I’m working on, or just generally cool stuff I’m +interested in.</p> <div class="social-links-container"> <ul class="social-media-list"> <li> -<a href="https://github.com/Hazematman"> <svg class="svg-icon"><use xlink:href="/assets/minima-social-icons.svg#github"></use></svg> <span>Hazematman</span> </a> +<a href="https://github.com/Hazematman"> +<svg class="svg-icon"><use xlink:href="/assets/minima-social-icons.svg#github"></use></svg> +<span>Hazematman</span> </a> </li> <li> -<a href="https://www.linkedin.com/in/lucas-fryzek"> <svg class="svg-icon"><use xlink:href="/assets/minima-social-icons.svg#linkedin"></use></svg> <span>lucas-fryzek</span> </a> +<a href="https://www.linkedin.com/in/lucas-fryzek"> +<svg class="svg-icon"><use xlink:href="/assets/minima-social-icons.svg#linkedin"></use></svg> +<span>lucas-fryzek</span> </a> </li> <li> -<a href="https://www.twitter.com/Hazematman"> <svg class="svg-icon"><use xlink:href="/assets/minima-social-icons.svg#twitter"></use></svg> <span>Hazematman</span> </a> +<a href="https://www.twitter.com/Hazematman"> +<svg class="svg-icon"><use xlink:href="/assets/minima-social-icons.svg#twitter"></use></svg> +<span>Hazematman</span> </a> </li> </ul> </div> diff --git a/html/assets/style.css b/html/assets/style.css index 77004f5..4bbd14b 100644 --- a/html/assets/style.css +++ b/html/assets/style.css @@ -254,6 +254,7 @@ img.rss .header-links { margin-left: auto; + margin-right: 1em; } a.header-link diff --git a/html/feed.xml b/html/feed.xml index dac7e7b..8bc7f6b 100644 --- a/html/feed.xml +++ b/html/feed.xml @@ -1,17 +1,33 @@ <?xml version='1.0' encoding='UTF-8'?> -<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title>Fryzek Concepts</title><atom:link href="https://fryzekconcepts.com/feed.xml" rel="self" type="application/rss+xml"/><link>https://fryzekconcepts.com</link><description>Lucas is a developer working on cool 
things</description><lastBuildDate>Sun, 02 Apr 2023 13:27:21 -0000</lastBuildDate><item><title>Generating Video</title><link>https://fryzekconcepts.com/notes/generating-video.html</link><description><p>One thing I’m very interested in is computer graphics. This could be complex 3D graphics or simple 2D graphics. The idea of getting a computer to display visual data fascinates me. One fundamental part of showing visual data is interfacing with a computer monitor. This can be accomplished by generating a video signal that the monitor understands. Below I have written instructions on how an FPGA can be used to generate a video signal. I have specifically worked with the iCEBreaker FPGA but the theory contained within this should work with any FPGA or device that you can generate the appropriate timings for.</p> +<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title>Fryzek Concepts</title><atom:link href="https://fryzekconcepts.com/feed.xml" rel="self" type="application/rss+xml"/><link>https://fryzekconcepts.com</link><description>Lucas is a developer working on cool things</description><lastBuildDate>Fri, 28 Apr 2023 20:57:13 -0000</lastBuildDate><item><title>Generating Video</title><link>https://fryzekconcepts.com/notes/generating-video.html</link><description><p>One thing I’m very interested in is computer graphics. This could be +complex 3D graphics or simple 2D graphics. The idea of getting a +computer to display visual data fascinates me. One fundamental part of +showing visual data is interfacing with a computer monitor. This can be +accomplished by generating a video signal that the monitor understands. +Below I have written instructions on how an FPGA can be used to generate +a video signal. 
I have specifically worked with the iCEBreaker FPGA but +the theory contained within this should work with any FPGA or device +that you can generate the appropriate timings for.</p> <h3 id="tools">Tools</h3> -<p>Hardware used (<a href="https://www.crowdsupply.com/1bitsquared/icebreaker-fpga">link for board</a>):</p> +<p>Hardware used (<a +href="https://www.crowdsupply.com/1bitsquared/icebreaker-fpga">link for +board</a>):</p> <ul> <li>iCEBreaker FPGA</li> <li>iCEBreaker 12-Bit DVI Pmod</li> </ul> <p>Software Used:</p> <ul> -<li>IceStorm FPGA toolchain (<a href="https://github.com/esden/summon-fpga-tools">follow install instructions here</a>)</li> +<li>IceStorm FPGA toolchain (<a +href="https://github.com/esden/summon-fpga-tools">follow install +instructions here</a>)</li> </ul> <h3 id="theory">Theory</h3> -<p>A video signal is composed of several parts, primarily the colour signals and the sync signals. For this DVI Pmod, there is also a data enable signal for the visible screen area. For the example here we are going to be generating a 640x480 60 Hz video signal. Below is a table describing the important data for our video signal.</p> +<p>A video signal is composed of several parts, primarily the colour +signals and the sync signals. For this DVI Pmod, there is also a data +enable signal for the visible screen area. For the example here we are +going to be generating a 640x480 60 Hz video signal. 
Below is a table +describing the important data for our video signal.</p> <table> <tbody> <tr> @@ -108,21 +124,65 @@ Vertical Back Porch Length <p>The data from this table raises a few questions:</p> <ol type="1"> <li>What is the Pixel Clock?</li> -<li>What is the difference between “Pixels/Lines” and “Visible Pixels/Lines”?</li> +<li>What is the difference between “Pixels/Lines” and “Visible +Pixels/Lines”?</li> <li>What is “Front Porch”, “Sync”, and “Back Porch”?</li> </ol> <h4 id="pixel-clock">Pixel Clock</h4> -<p>The pixel clock is a fairly straightforward idea; this is the rate at which we generate pixels. For video signal generation, the “pixel” is a fundamental building block and we count things in the number of pixels it takes up. Every time the pixel clock “ticks” we have incremented the number of pixels we have processed. So for a 640x480 video signal, a full line is 800 pixels, or 800 clock ticks. For the full 800x525 frame there is 800 ticks x 525 lines, or 420000 clock ticks. If we are running the display at 60 Hz, 420000 pixels per frame are generated 60 times per second. Therefore, 25200000 pixels or clock ticks will pass in one second. From this we can see the pixel clock frequency of 25.175 MHz is roughly equal to 25200000 clock ticks. There is a small deviance from the “true” values here, but monitors are flexible enough to accept this video signal (my monitor reports it as 640x480@60Hz), and all information I can find online says that 25.175 MHz is the value you want to use. Later on we will see that the pixel clock is not required to be exactly 25.175 Mhz.</p> -<h4 id="visible-area-vs-invisible-area">Visible Area vs Invisible Area</h4> -<p><img src="/assets/2020-04-07-generating-video/visible_invisible.png" /></p> -<p>From the above image we can see that a 640x480 video signal actually generates a resolution larger than 640x480. The true resolution we generate is 800x525, but only a 640x480 portion of that signal is visible. 
The area that is not visible is where we generate the sync signal. In other words, every part of the above image that is black is where a sync signal is being generated.</p> -<h4 id="front-porch-back-porch-sync">Front Porch, Back Porch &amp; Sync</h4> -<p>To better understand the front porch, back porch and sync signal, let’s look at what the horizontal sync signal looks like during the duration of a line:</p> +<p>The pixel clock is a fairly straightforward idea; this is the rate at +which we generate pixels. For video signal generation, the “pixel” is a +fundamental building block and we count things in the number of pixels +it takes up. Every time the pixel clock “ticks” we have incremented the +number of pixels we have processed. So for a 640x480 video signal, a +full line is 800 pixels, or 800 clock ticks. For the full 800x525 frame +there is 800 ticks x 525 lines, or 420000 clock ticks. If we are running +the display at 60 Hz, 420000 pixels per frame are generated 60 times per +second. Therefore, 25200000 pixels or clock ticks will pass in one +second. From this we can see the pixel clock frequency of 25.175 MHz is +roughly equal to 25200000 clock ticks. There is a small deviance from +the “true” values here, but monitors are flexible enough to accept this +video signal (my monitor reports it as 640x480@60Hz), and all +information I can find online says that 25.175 MHz is the value you want +to use. Later on we will see that the pixel clock is not required to be +exactly 25.175 Mhz.</p> +<h4 id="visible-area-vs-invisible-area">Visible Area vs Invisible +Area</h4> +<p><img +src="/assets/2020-04-07-generating-video/visible_invisible.png" /></p> +<p>From the above image we can see that a 640x480 video signal actually +generates a resolution larger than 640x480. The true resolution we +generate is 800x525, but only a 640x480 portion of that signal is +visible. The area that is not visible is where we generate the sync +signal. 
In other words, every part of the above image that is black is +where a sync signal is being generated.</p> +<h4 id="front-porch-back-porch-sync">Front Porch, Back Porch &amp; +Sync</h4> +<p>To better understand the front porch, back porch and sync signal, +let’s look at what the horizontal sync signal looks like during the +duration of a line:</p> <p><img src="/assets/2020-04-07-generating-video/sync.png" /></p> -<p>From this we can see that the “Front Porch” is the invisible pixels between the visible pixels and the sync pixels, and is represented by a logical one or high signal. The “Sync” is the invisible pixels between the front porch and back porch, and is represented by a logical zero or low signal. The “Back Porch” is the invisible pixels after the sync signal, and is represented by a logical one. For the case of 640x480 video, the visible pixel section lasts for 640 pixels. The front porch section lasts for 16 pixels, after which the sync signal will become a logical zero. This logical zero sync will last for 96 pixels, after which the sync signal will become a logical one again. The back porch will then last for 48 pixels. If you do a quick calculation right now of 640 + 16 + 96 + 48, we get 800 pixels which represents the full horizontal resolution of the display. The vertical sync signal works almost exactly the same, except the vertical sync signal acts on lines.</p> +<p>From this we can see that the “Front Porch” is the invisible pixels +between the visible pixels and the sync pixels, and is represented by a +logical one or high signal. The “Sync” is the invisible pixels between +the front porch and back porch, and is represented by a logical zero or +low signal. The “Back Porch” is the invisible pixels after the sync +signal, and is represented by a logical one. For the case of 640x480 +video, the visible pixel section lasts for 640 pixels. The front porch +section lasts for 16 pixels, after which the sync signal will become a +logical zero. 
This logical zero sync will last for 96 pixels, after +which the sync signal will become a logical one again. The back porch +will then last for 48 pixels. If you do a quick calculation right now of +640 + 16 + 96 + 48, we get 800 pixels which represents the full +horizontal resolution of the display. The vertical sync signal works +almost exactly the same, except the vertical sync signal acts on +lines.</p> <h3 id="implementation">Implementation</h3> -<p>The first thing we can do that is going to simplify a lot of the following logic is to keep track of which pixel, and which line we are on. The below code block creates two registers to keep track of the current pixel on the line (column) and the current line (line):</p> -<div class="sourceCode" id="cb1"><pre class="sourceCode verilog"><code class="sourceCode verilog"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a>logic <span class="op">[</span><span class="dv">9</span><span class="op">:</span><span class="dv">0</span><span class="op">]</span> line<span class="op">;</span></span> +<p>The first thing we can do that is going to simplify a lot of the +following logic is to keep track of which pixel, and which line we are +on. 
The below code block creates two registers to keep track of the +current pixel on the line (column) and the current line (line):</p> +<div class="sourceCode" id="cb1"><pre +class="sourceCode verilog"><code class="sourceCode verilog"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a>logic <span class="op">[</span><span class="dv">9</span><span class="op">:</span><span class="dv">0</span><span class="op">]</span> line<span class="op">;</span></span> <span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>logic <span class="op">[</span><span class="dv">9</span><span class="op">:</span><span class="dv">0</span><span class="op">]</span> column<span class="op">;</span></span> <span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a></span> <span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a><span class="kw">always</span> <span class="op">@(</span><span class="kw">posedge</span> clk <span class="dt">or</span> <span class="kw">posedge</span> reset<span class="op">)</span> <span class="kw">begin</span></span> @@ -144,16 +204,52 @@ Vertical Back Porch Length <span id="cb1-20"><a href="#cb1-20" aria-hidden="true" tabindex="-1"></a> <span class="kw">end</span></span> <span id="cb1-21"><a href="#cb1-21" aria-hidden="true" tabindex="-1"></a> <span class="kw">end</span></span> <span id="cb1-22"><a href="#cb1-22" aria-hidden="true" tabindex="-1"></a><span class="kw">end</span></span></code></pre></div> -<p>This block of Verilog works by first initializing the line and column register to zero on a reset. This is important to make sure that we start from known values, otherwise the line and column register could contain any value and our logic would not work. Next, we check if we are at the bottom of the screen by comparing the current column to 799 (the last pixel in the line) and the current line is 524 (the last line in the frame). 
If these conditions are both true then we reset the line and +column back to zero to signify that we are starting a new frame. The +next block checks if the current column equals 799. Because the above if +statement failed, we know that we are at the end of the line but not the +end of the frame.
If this is true we increment the current line count +and set the column back to zero to signify that we are starting a new +line. The final block simply increments the current pixel count. If we +reach this block, we are neither at the end of the line nor the end of +the frame so we can simply increment to the next pixel.</p> +<p>Now that we are keeping track of the current column and current line, +we can use this information to generate the horizontal and vertical sync +signals. From the theory above we know that the sync signal is only low +when we are between the front and back porch; at all other times the +signal is high. From this we can generate the sync signal with an OR and +two compares.</p> +<div class="sourceCode" id="cb2"><pre +class="sourceCode verilog"><code class="sourceCode verilog"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>logic horizontal_sync<span class="op">;</span></span> <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a>logic vertical_sync<span class="op">;</span></span> <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> horizontal_sync <span class="op">=</span> column <span class="op">&lt;</span> <span class="dv">656</span> <span class="op">||</span> column <span class="op">&gt;=</span> <span class="dv">752</span><span class="op">;</span></span> <span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> vertical_sync <span class="op">=</span> line <span class="op">&lt;</span> <span class="dv">490</span> <span class="op">||</span> line <span class="op">&gt;=</span> <span class="dv">492</span><span class="op">;</span></span></code></pre></div> -<p>Let’s examine the horizontal sync signal more closely. This statement will evaluate to true if the current column is less than 656 or the current column is greater than or equal to 752.
This means that the horizontal sync signal will be true except for when the current column is between 656 and 751 inclusively. That is starting on column 656 the horizontal sync signal will become false (low) and will remain that way for the next 96 pixels until we reach pixel 752 where it will return to being true (high). The vertical sync signal will work in the same way except it is turned on based on the current line. Therefore, the signal will remain high when the line is less than 490 and greater than or equal to 492, and will remain low between lines 490 and 491 inclusive.</p> +<p>Let’s examine the horizontal sync signal more closely. This statement +will evaluate to true if the current column is less than 656 or the +current column is greater than or equal to 752. This means that the +horizontal sync signal will be true except for when the current column +is between 656 and 751 inclusively. That is starting on column 656 the +horizontal sync signal will become false (low) and will remain that way +for the next 96 pixels until we reach pixel 752 where it will return to +being true (high). The vertical sync signal will work in the same way +except it is turned on based on the current line. Therefore, the signal +will remain high when the line is less than 490 and greater than or +equal to 492, and will remain low between lines 490 and 491 +inclusive.</p> <h4 id="putting-it-all-together">Putting It All Together</h4> -<p>Now that we have generated the video signal, we need to route it towards the video output connectors on the iCEBreaker 12-bit DVI Pmod. We also need to configure the iCEBreaker FPGA to have the appropriate pixel clock frequency. 
First to get the correct pixel clock we are going to use the following block of code:</p> -<div class="sourceCode" id="cb3"><pre class="sourceCode verilog"><code class="sourceCode verilog"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>SB_PLL40_PAD #<span class="op">(</span></span> +<p>Now that we have generated the video signal, we need to route it +towards the video output connectors on the iCEBreaker 12-bit DVI Pmod. +We also need to configure the iCEBreaker FPGA to have the appropriate +pixel clock frequency. First to get the correct pixel clock we are going +to use the following block of code:</p> +<div class="sourceCode" id="cb3"><pre +class="sourceCode verilog"><code class="sourceCode verilog"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>SB_PLL40_PAD #<span class="op">(</span></span> <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a> .DIVR<span class="op">(</span><span class="bn">4&#39;b0000</span><span class="op">),</span></span> <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a> .DIVF<span class="op">(</span><span class="bn">7&#39;b1000010</span><span class="op">),</span></span> <span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a> .DIVQ<span class="op">(</span><span class="bn">3&#39;b101</span><span class="op">),</span></span> @@ -175,7 +271,13 @@ Vertical Back Porch Length <span id="cb3-20"><a href="#cb3-20" aria-hidden="true" tabindex="-1"></a> .BYPASS<span class="op">(</span><span class="bn">1&#39;b0</span><span class="op">),</span></span> <span id="cb3-21"><a href="#cb3-21" aria-hidden="true" tabindex="-1"></a> .LATCHINPUTVALUE<span class="op">(),</span></span> <span id="cb3-22"><a href="#cb3-22" aria-hidden="true" tabindex="-1"></a><span class="op">);</span></span></code></pre></div> -<p>This block is mainly a copy paste of the PLL setup code from the iCEBreaker examples, but with a few important changes. 
The DIVR, DIVF, and DIVQ values are changed to create a 25.125 MHz. This is not exactly 25.175 MHz, but it is close enough that the monitor is happy enough and recognizes it as a 640x480@60 Hz signal. These values were found through the “icepll” utility, below is an example of calling this utility from the command line:</p> +<p>This block is mainly a copy paste of the PLL setup code from the +iCEBreaker examples, but with a few important changes. The DIVR, DIVF, +and DIVQ values are changed to create a 25.125 MHz. This is not exactly +25.175 MHz, but it is close enough that the monitor is happy enough and +recognizes it as a 640x480@60 Hz signal. These values were found through +the “icepll” utility, below is an example of calling this utility from +the command line:</p> <pre><code>$ icepll -i 12 -o 25.175 F_PLLIN: 12.000 MHz (given) @@ -191,8 +293,14 @@ DIVF: 66 (7&#39;b1000010) DIVQ: 5 (3&#39;b101) FILTER_RANGE: 1 (3&#39;b001)</code></pre> -<p>From here we can see we had an input clock of 12 MHz (This comes from the FTDI chip on the iCEBreaker board), and we wanted to get a 25.175 MHz output clock. The closest the PLL could generate was a 25.125 MHz clock with the settings provided for the DIVR, DIVF, and DIVQ values.</p> -<p>Now that we have a pixel clock we can wire up the necessary signals for the DVI video out. The DVI Pmod has the following mapping for all of its connectors:</p> +<p>From here we can see we had an input clock of 12 MHz (This comes from +the FTDI chip on the iCEBreaker board), and we wanted to get a 25.175 +MHz output clock. The closest the PLL could generate was a 25.125 MHz +clock with the settings provided for the DIVR, DIVF, and DIVQ +values.</p> +<p>Now that we have a pixel clock we can wire up the necessary signals +for the DVI video out. 
The DVI Pmod has the following mapping for all of +its connectors:</p> <table> <tbody> <tr> @@ -321,8 +429,16 @@ Vertical Sync </tr> </tbody> </table> -<p>From this we can see that we need 4 bits for each colour channel, a horizontal sync signal, a vertical sync signal, and additionally a data enable signal. The data enable signal is not part of a standard video signal and is just used by the DVI transmitter chip on the Pmod to signify when we are in visible pixel area or invisible pixel area. Therefore we will set the Date enable line when the current column is less than 640 and the current line is less than 480. Based on this we can connect the outputs like so:</p> -<div class="sourceCode" id="cb5"><pre class="sourceCode verilog"><code class="sourceCode verilog"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>logic <span class="op">[</span><span class="dv">3</span><span class="op">:</span><span class="dv">0</span><span class="op">]</span> r<span class="op">;</span></span> +<p>From this we can see that we need 4 bits for each colour channel, a +horizontal sync signal, a vertical sync signal, and additionally a data +enable signal. The data enable signal is not part of a standard video +signal and is just used by the DVI transmitter chip on the Pmod to +signify when we are in visible pixel area or invisible pixel area. +Therefore we will set the Date enable line when the current column is +less than 640 and the current line is less than 480. 
Based on this we +can connect the outputs like so:</p> +<div class="sourceCode" id="cb5"><pre +class="sourceCode verilog"><code class="sourceCode verilog"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>logic <span class="op">[</span><span class="dv">3</span><span class="op">:</span><span class="dv">0</span><span class="op">]</span> r<span class="op">;</span></span> <span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>logic <span class="op">[</span><span class="dv">3</span><span class="op">:</span><span class="dv">0</span><span class="op">]</span> g<span class="op">;</span></span> <span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a>logic <span class="op">[</span><span class="dv">3</span><span class="op">:</span><span class="dv">0</span><span class="op">]</span> b<span class="op">;</span></span> <span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a>logic data_enable<span class="op">;</span></span> @@ -331,12 +447,18 @@ Vertical Sync <span id="cb5-7"><a href="#cb5-7" aria-hidden="true" tabindex="-1"></a> <span class="op">{</span>r<span class="op">[</span><span class="dv">3</span><span class="op">],</span> r<span class="op">[</span><span class="dv">2</span><span class="op">],</span> g<span class="op">[</span><span class="dv">3</span><span class="op">],</span> g<span class="op">[</span><span class="dv">2</span><span class="op">],</span> r<span class="op">[</span><span class="dv">1</span><span class="op">],</span> r<span class="op">[</span><span class="dv">0</span><span class="op">],</span> g<span class="op">[</span><span class="dv">1</span><span class="op">],</span> g<span class="op">[</span><span class="dv">0</span><span class="op">]};</span></span> <span id="cb5-8"><a href="#cb5-8" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> <span class="op">{</span>P1B1<span class="op">,</span> P1B2<span class="op">,</span> P1B3<span class="op">,</span> P1B4<span class="op">,</span> 
P1B7<span class="op">,</span> P1B8<span class="op">,</span> P1B9<span class="op">,</span> P1B10<span class="op">}</span> <span class="op">=</span> </span> <span id="cb5-9"><a href="#cb5-9" aria-hidden="true" tabindex="-1"></a> <span class="op">{</span>b<span class="op">[</span><span class="dv">3</span><span class="op">],</span> pixel_clock<span class="op">,</span> b<span class="op">[</span><span class="dv">2</span><span class="op">],</span> horizontal_sync<span class="op">,</span> b<span class="op">[</span><span class="dv">1</span><span class="op">],</span> b<span class="op">[</span><span class="dv">0</span><span class="op">],</span> data_enable<span class="op">,</span> vertical_sync<span class="op">};</span></span></code></pre></div> -<p>Now for testing purposes we are going to set the output colour to be fixed to pure red so additional logic to pick a pixel colour is not required for this example. We can do this as shown below:</p> -<div class="sourceCode" id="cb6"><pre class="sourceCode verilog"><code class="sourceCode verilog"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> r <span class="op">=</span> <span class="bn">4&#39;b1111</span><span class="op">;</span></span> +<p>Now for testing purposes we are going to set the output colour to be +fixed to pure red so additional logic to pick a pixel colour is not +required for this example. 
We can do this as shown below:</p> +<div class="sourceCode" id="cb6"><pre +class="sourceCode verilog"><code class="sourceCode verilog"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> r <span class="op">=</span> <span class="bn">4&#39;b1111</span><span class="op">;</span></span> <span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> g <span class="op">=</span> <span class="bn">4&#39;b0000</span><span class="op">;</span></span> <span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> b <span class="op">=</span> <span class="bn">4&#39;b0000</span><span class="op">;</span></span></code></pre></div> -<p>Putting all of the above code together with whatever additional inputs are required for the iCEBreaker FPGA gives us the following block of code:</p> -<div class="sourceCode" id="cb7"><pre class="sourceCode verilog"><code class="sourceCode verilog"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="kw">module</span> top</span> +<p>Putting all of the above code together with whatever additional +inputs are required for the iCEBreaker FPGA gives us the following block +of code:</p> +<div class="sourceCode" id="cb7"><pre +class="sourceCode verilog"><code class="sourceCode verilog"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="kw">module</span> top</span> <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a><span class="op">(</span></span> <span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a><span class="dt">input</span> CLK<span class="op">,</span></span> <span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a><span class="dt">output</span> LEDR_N<span class="op">,</span></span> @@ -429,76 +551,290 @@ Vertical Sync <span id="cb7-91"><a href="#cb7-91" aria-hidden="true" tabindex="-1"></a><span 
class="op">);</span></span>
<span id="cb7-92"><a href="#cb7-92" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-93"><a href="#cb7-93" aria-hidden="true" tabindex="-1"></a><span class="kw">endmodule</span></span></code></pre></div>
-<p>To build this, you will require a .pcf file describing the pin mapping of the iCEBreaker board. I grabbed mine from the iCEBreaker examples <a href="https://raw.githubusercontent.com/icebreaker-fpga/icebreaker-examples/master/icebreaker.pcf">here</a>. Grab that file and put it in the same folder as the file for the code provided above. We can then run the following commands to generate a binary to program onto the FPGA:</p>
+<p>To build this, you will require a .pcf file describing the pin
+mapping of the iCEBreaker board. I grabbed mine from the iCEBreaker
+examples <a
+href="https://raw.githubusercontent.com/icebreaker-fpga/icebreaker-examples/master/icebreaker.pcf">here</a>.
+Grab that file and put it in the same folder as the file for the code
+provided above. We can then run the following commands to generate a
+binary to program onto the FPGA:</p>
<pre><code>yosys -ql out.log -p &#39;synth_ice40 -top top -json out.json&#39; top.sv
nextpnr-ice40 --up5k --json out.json --pcf icebreaker.pcf --asc out.asc
icetime -d up5k -mtr out.rpt out.asc
icepack out.asc out.bin</code></pre>
-<p>This will generate an out.bin file that we will need to flash onto the board. Make sure your iCEBreaker FPGA is connected via USB to your computer and you can program it with the following commands.</p>
+<p>This will generate an out.bin file that we will need to flash onto
+the board. 
Make sure your iCEBreaker FPGA is connected via USB to your +computer and you can program it with the following commands.</p> <pre><code>iceprog out.bin</code></pre> -<p>Now connect up a video cable (my DVI Pmod has an HDMI connector, but it only carries the DVI video signal) to the board and monitor and you should get results like this:</p> -<p><img src="/assets/2020-04-07-generating-video/IMG_20200407_172119-1-1024x768.jpg" /></p> -<p>You can also see from the monitor settings menu that the video signal was recognized as 640x480@60 Hz. Now the code presented in this post is specific to the iCEBreaker board and the DVI Pmod, but the theory can be applied to any FPGA and any connector that uses a video signal like this. For example you could wire up a DAC with a resistor ladder to generate a VGA signal. The logic for the timings here would be exactly the same if you wanted a 640x480@60 Hz VGA signal.</p> -</description><pubDate>Tue, 07 Apr 2020 04:00:00 -0000</pubDate><guid>https://fryzekconcepts.com/notes/generating-video.html</guid></item><item><title>N64Brew GameJam 2021</title><link>https://fryzekconcepts.com/notes/n64brew-gamejam-2021.html</link><description><p>So this year, myself and two others decided to participate together in the N64Brew homebrew GameJam, where we were supposed to build a homebrew game that would run on a real Nintendo 64. The game jam took place from October 8th until December 8th and was the second GameJam in N64Brew history. Unfortunately, we never ended up finishing the game, but we did build a really cool tech demo. Our project was called “Bug Game”, and if you want to check it out you can find it <a href="https://hazematman.itch.io/bug-game">here</a>. To play the game you’ll need a flash cart to load it on a real Nintendo 64, or you can use an accurate emulator such as <a href="https://ares.dev/">ares</a> or <a href="https://github.com/n64dev/cen64">cen64</a>. 
The reason an accurate emulator is required is that we made use of this new open source 3D microcode for N64 called “<a href="https://github.com/snacchus/libdragon/tree/ugfx">ugfx</a>”, created by the user Snacchus. This microcode is part of the Libdragon project, which is trying to build a completely open source library and toolchain to build N64 games, instead of relying on the official SDK that has been leaked to the public through liquidation auctions of game companies that have shut down over the years.</p> +<p>Now connect up a video cable (my DVI Pmod has an HDMI connector, but +it only carries the DVI video signal) to the board and monitor and you +should get results like this:</p> +<p><img +src="/assets/2020-04-07-generating-video/IMG_20200407_172119-1-1024x768.jpg" /></p> +<p>You can also see from the monitor settings menu that the video signal +was recognized as 640x480@60 Hz. Now the code presented in this post is +specific to the iCEBreaker board and the DVI Pmod, but the theory can be +applied to any FPGA and any connector that uses a video signal like +this. For example you could wire up a DAC with a resistor ladder to +generate a VGA signal. The logic for the timings here would be exactly +the same if you wanted a 640x480@60 Hz VGA signal.</p> +</description><pubDate>Tue, 07 Apr 2020 04:00:00 -0000</pubDate><guid>https://fryzekconcepts.com/notes/generating-video.html</guid></item><item><title>N64Brew GameJam 2021</title><link>https://fryzekconcepts.com/notes/n64brew-gamejam-2021.html</link><description><p>So this year, myself and two others decided to participate together +in the N64Brew homebrew GameJam, where we were supposed to build a +homebrew game that would run on a real Nintendo 64. The game jam took +place from October 8th until December 8th and was the second GameJam in +N64Brew history. Unfortunately, we never ended up finishing the game, +but we did build a really cool tech demo. 
Our project was called +“Bug Game”, and if you want to check it out you can find it <a +href="https://hazematman.itch.io/bug-game">here</a>. To play the game +you’ll need a flash cart to load it on a real Nintendo 64, or you can +use an accurate emulator such as <a +href="https://ares.dev/">ares</a> or <a +href="https://github.com/n64dev/cen64">cen64</a>. The reason an accurate +emulator is required is that we made use of this new open source 3D +microcode for N64 called “<a +href="https://github.com/snacchus/libdragon/tree/ugfx">ugfx</a>”, +created by the user Snacchus. This microcode is part of the Libdragon +project, which is trying to build a completely open source library and +toolchain to build N64 games, instead of relying on the official SDK +that has been leaked to the public through liquidation auctions of game +companies that have shut down over the years.</p> <div class="gallery"> -<p><img src="/assets/2021-12-10-n64brew-gamejam-2021/bug_1.png" /> <img src="/assets/2021-12-10-n64brew-gamejam-2021/bug_2.png" /> <img src="/assets/2021-12-10-n64brew-gamejam-2021/bug_4.png" /> <img src="/assets/2021-12-10-n64brew-gamejam-2021/bug_5.png" /> <img src="/assets/2021-12-10-n64brew-gamejam-2021/bug_3.png" /></p> +<p><img src="/assets/2021-12-10-n64brew-gamejam-2021/bug_1.png" /> <img +src="/assets/2021-12-10-n64brew-gamejam-2021/bug_2.png" /> <img +src="/assets/2021-12-10-n64brew-gamejam-2021/bug_4.png" /> <img +src="/assets/2021-12-10-n64brew-gamejam-2021/bug_5.png" /> <img +src="/assets/2021-12-10-n64brew-gamejam-2021/bug_3.png" /></p> <p>Screenshots of Bug Game</p> </div> <h2 id="libdragon-and-ugfx">Libdragon and UGFX</h2> -<p>Ugfx was a brand new development in the N64 homebrew scene. By complete coincidence, Snacchus happened to release it on September 21st, just weeks before the GameJam was announced. 
There have been many attempts to create an open source 3D microcode for the N64 (my <a href="https://github.com/Hazematman/libhfx">libhfx</a> project included), but ugfx was the first to be completed, with easily usable documentation and examples. This was an exciting development for the open source N64 brew community, as for the first time we could build 3D games that ran on the N64 without using the legally questionable official SDK. I jumped at the opportunity to use it and make one of the first fully 3D games running on Libdragon.</p>
-<p>One of the “drawbacks” of ugfx was that it tried to follow a lot of the design decisions of Nintendo’s official 3D microcode. This made it easier for people familiar with the official SDK to jump ship over to libdragon, but also went against the philosophy of the libdragon project to provide simple, easy to use APIs. The Nintendo 64 was notoriously difficult to develop for, and one of the reasons for that was the extremely low level interface that the official 3D microcodes provided. Honestly, writing 3D graphics code on the N64 reminds me more of writing a 3D OpenGL graphics driver (like I do in my day job) than building a graphics application, which unnecessarily raises the barrier to entry for developing 3D games on the Nintendo 64. Now that ugfx has been released, there is an ongoing effort in the community to revamp it and build a more user friendly API to access the 3D functionality of the N64.</p>
+<p>Ugfx was a brand new development in the N64 homebrew scene. By
+complete coincidence, Snacchus happened to release it on September 21st,
+just weeks before the GameJam was announced. There have been many
+attempts to create an open source 3D microcode for the N64 (my <a
+href="https://github.com/Hazematman/libhfx">libhfx</a> project
+included), but ugfx was the first to be completed, with easily usable
+documentation and examples. 
This was an exciting development for the
+open source N64 brew community, as for the first time we could build 3D
+games that ran on the N64 without using the legally questionable
+official SDK. I jumped at the opportunity to use it and make one of the
+first fully 3D games running on Libdragon.</p>
+<p>One of the “drawbacks” of ugfx was that it tried to follow a lot of
+the design decisions of Nintendo’s official 3D microcode. This
+made it easier for people familiar with the official SDK to jump ship
+over to libdragon, but also went against the philosophy of the libdragon
+project to provide simple, easy to use APIs. The Nintendo 64 was
+notoriously difficult to develop for, and one of the reasons for that
+was the extremely low level interface that the official 3D
+microcodes provided. Honestly, writing 3D graphics code on the N64
+reminds me more of writing a 3D OpenGL graphics driver (like I do in my
+day job) than building a graphics application, which unnecessarily
+raises the barrier to entry for developing 3D games on the Nintendo 64.
+Now that ugfx has been released, there is an ongoing effort in the
+community to revamp it and build a more user friendly API to access the
+3D functionality of the N64.</p>
<h2 id="ease-of-development">Ease of development</h2>
-<p>One of the major selling points of libdragon is that it tries to provide a standard toolchain with access to things like the C standard library as well as the C++ standard library. To save time on the development of bug game, I decided to put that claim to the test. When building a 3D game from scratch, two things that can be extremely time consuming are implementing linear algebra operations, and implementing physics that work in 3D. 
Luckily for modern developers, there are many open source libraries you can use instead of building these from scratch, like <a href="https://glm.g-truc.net/0.9.9/">GLM</a> for math operations and <a href="https://github.com/bulletphysics/bullet3">Bullet</a> for physics. I don’t believe anyone has tried to do this before, but knowing that libdragon provides a pretty standard C++ development environment I tried to build GLM and Bullet to run on the Nintendo 64 and I was successful! Both GLM and Bullet were able to run on real N64 hardware. This saved time during development as we were no longer concerned with having to build our own physics or math libraries. There were some tricks I needed to do to get Bullet running on the hardware.</p>
-<p>First, Bullet will allocate more memory for its internal pools than is available on the N64. This is an easy fix as you can adjust the heap sizes when you go to initialize Bullet using the below code:</p>
-<div class="sourceCode" id="cb1"><pre class="sourceCode cpp"><code class="sourceCode cpp"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a>btDefaultCollisionConstructionInfo constructionInfo <span class="op">=</span> btDefaultCollisionConstructionInfo<span class="op">();</span></span>
+<p>One of the major selling points of libdragon is that it tries to
+provide a standard toolchain with access to things like the C standard
+library as well as the C++ standard library. To save time on the
+development of bug game, I decided to put that claim to the test. When
+building a 3D game from scratch, two things that can be extremely time
+consuming are implementing linear algebra operations, and implementing
+physics that work in 3D. Luckily for modern developers, there are many
+open source libraries you can use instead of building these from
+scratch, like <a href="https://glm.g-truc.net/0.9.9/">GLM</a> for math
+operations and <a
+href="https://github.com/bulletphysics/bullet3">Bullet</a> for physics. 
+I don’t believe anyone has tried to do this before, but knowing that
+libdragon provides a pretty standard C++ development environment I tried
+to build GLM and Bullet to run on the Nintendo 64 and I was successful!
+Both GLM and Bullet were able to run on real N64 hardware. This saved
+time during development as we were no longer concerned with having to
+build our own physics or math libraries. There were some tricks I needed
+to do to get Bullet running on the hardware.</p>
+<p>First, Bullet will allocate more memory for its internal pools than is
+available on the N64. This is an easy fix as you can adjust the heap
+sizes when you go to initialize Bullet using the below code:</p>
+<div class="sourceCode" id="cb1"><pre
+class="sourceCode cpp"><code class="sourceCode cpp"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a>btDefaultCollisionConstructionInfo constructionInfo <span class="op">=</span> btDefaultCollisionConstructionInfo<span class="op">();</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>constructionInfo<span class="op">.</span><span class="va">m_defaultMaxCollisionAlgorithmPoolSize</span> <span class="op">=</span> <span class="dv">512</span><span class="op">;</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a>constructionInfo<span class="op">.</span><span class="va">m_defaultMaxPersistentManifoldPoolSize</span> <span class="op">=</span> <span class="dv">512</span><span class="op">;</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a>btDefaultCollisionConfiguration<span class="op">*</span> collisionConfiguration <span class="op">=</span> <span class="kw">new</span> btDefaultCollisionConfiguration<span class="op">(</span>constructionInfo<span class="op">);</span></span></code></pre></div>
-<p>This lets you modify the memory pools and specify a size in KB for the pools to use. 
The above code will limit the internal pools to 1MB, allowing us to easily run within the 4MB of RAM that is available on the N64 without the expansion pak (an accessory to the N64 that increases the available RAM to 8MB).</p>
-<p>The second issue I ran into with Bullet was that the N64 floating point unit does not implement de-normalized floating point numbers. Now I’m not an expert in floating point numbers, but from my understanding, de-normalized numbers are a way to represent values between the smallest normal floating point number and zero. This allows floating point calculations to slowly fall towards zero in a more accurate way instead of rounding directly to zero. Since the N64 CPU does not implement de-normalized floats, if any calculations would have generated a de-normalized float on the N64, they would instead cause a floating point exception. Because of the way the physics engine works, when two objects got very close together this would cause de-normalized floats to be generated and crash the FPU. This was a problem that had me stumped for a bit; I was concerned I would have to go into Bullet’s source code and modify any calculations to round to zero if the result would be small enough. This would have been a monumental effort! Thankfully, after digging through the NEC VR4300 programmer’s manual I was able to discover that there is a mode you can set the FPU to, which forces rounding towards zero if a de-normalized float would be generated. I enabled this mode and tested it out, and all my floating point troubles were resolved! I submitted a <a href="https://github.com/DragonMinded/libdragon/pull/195">pull request</a> (that was accepted) to the libdragon project to have this implemented by default, so no one else will run into the same annoying problems I ran into.</p>
+<p>This lets you modify the memory pools and specify a size in KB for
+the pools to use. 
The above code will limit the internal pools to 1MB,
+allowing us to easily run within the 4MB of RAM that is available on the
+N64 without the expansion pak (an accessory to the N64 that increases
+the available RAM to 8MB).</p>
+<p>The second issue I ran into with Bullet was that the N64 floating
+point unit does not implement de-normalized floating point numbers. Now
+I’m not an expert in floating point numbers, but from my understanding,
+de-normalized numbers are a way to represent values between the smallest
+normal floating point number and zero. This allows floating point
+calculations to slowly fall towards zero in a more accurate way instead
+of rounding directly to zero. Since the N64 CPU does not implement
+de-normalized floats, if any calculations would have generated
+a de-normalized float on the N64, they would instead cause a floating point
+exception. Because of the way the physics engine works, when two objects
+got very close together this would cause de-normalized floats to be
+generated and crash the FPU. This was a problem that had me stumped for
+a bit; I was concerned I would have to go into Bullet’s source code and
+modify any calculations to round to zero if the result would be small
+enough. This would have been a monumental effort! Thankfully, after
+digging through the NEC VR4300 programmer’s manual I was able to
+discover that there is a mode you can set the FPU to, which forces
+rounding towards zero if a de-normalized float would be generated. I
+enabled this mode and tested it out, and all my floating point troubles
+were resolved! I submitted a <a
+href="https://github.com/DragonMinded/libdragon/pull/195">pull
+request</a> (that was accepted) to the libdragon project to have this
+implemented by default, so no one else will run into the same annoying
+problems I ran into.</p>
<h2 id="whats-next">What’s next?</h2>
-<p>If you decided to play our game you probably would have noticed that it’s not very much of a game. 
Even though this is the case, I’m very happy with how the project turned out, as it’s one of the first 3D libdragon projects to be released. It also easily makes use of amazing open technologies like Bullet physics, showcasing just how easy libdragon is to integrate with modern tools and libraries. As I mentioned before in this post, there is an effort to take Snacchus’s work and build an easier to use graphics API that feels more like building graphics applications and less like building a graphics driver. The effort for that has already started and I plan to contribute to it. Some of the cool features this effort is bringing are:</p>
+<p>If you decided to play our game you probably would have noticed that
+it’s not very much of a game. Even though this is the case, I’m very
+happy with how the project turned out, as it’s one of the first 3D
+libdragon projects to be released. It also easily makes use of amazing
+open technologies like Bullet physics, showcasing just how easy
+libdragon is to integrate with modern tools and libraries. As I
+mentioned before in this post, there is an effort to take Snacchus’s work
+and build an easier to use graphics API that feels more like building
+graphics applications and less like building a graphics driver. The
+effort for that has already started and I plan to contribute to it. Some
+of the cool features this effort is bringing are:</p>
<ul>
-<li>A standard interface for display lists and microcode overlays. Easily allowing multiple different microcodes to seamlessly run on the RSP and swap out with display list commands. This will be valuable for using the RSP for audio and graphics at the same time.</li>
-<li>A new 3D microcode that takes some lessons learned from ugfx to build a more powerful and easier to use interface.</li>
+<li>A standard interface for display lists and microcode overlays.
+Easily allowing multiple different microcodes to seamlessly run on the RSP
+and swap out with display list commands. 
This will be valuable for using +the RSP for audio and graphics at the same time.</li> +<li>A new 3D microcode that takes some lessons learned from ugfx to +build a more powerful and easier to use interface.</li> </ul> -<p>Overall this is an exciting time for Nintendo 64 homebrew development! It’s easier than ever to build homebrew on the N64 without knowing about the arcane innards of the console. I hope that this continued development of libdragon will bring more people to the scene and allow us to see new and novel games running on the N64. One project I would be excited to start working on is using the serial port on modern N64 Flashcarts for networking, allowing the N64 to have online multiplayer through a computer connected over USB. I feel that projects like this could really elevate the kind of content that is available on the N64 and bring it into the modern era.</p> -</description><pubDate>Fri, 10 Dec 2021 05:00:00 -0000</pubDate><guid>https://fryzekconcepts.com/notes/n64brew-gamejam-2021.html</guid></item><item><title>Rasterizing Triangles</title><link>https://fryzekconcepts.com/notes/rasterizing-triangles.html</link><description><p>Lately I’ve been trying to implement a software renderer <a href="https://www.cs.drexel.edu/~david/Classes/Papers/comp175-06-pineda.pdf">following the algorithm described by Juan Pineda in “A Parallel Algorithm for Polygon Rasterization”</a>. For those unfamiliar with the paper, it describes an algorithm to rasterize triangles that has an extremely nice quality, that you simply need to preform a few additions per pixel to see if the next pixel is inside the triangle. It achieves this quality by defining an edge function that has the following property:</p> +<p>Overall this is an exciting time for Nintendo 64 homebrew +development! It’s easier than ever to build homebrew on the N64 without +knowing about the arcane innards of the console. 
I hope that this
+continued development of libdragon will bring more people to the scene
+and allow us to see new and novel games running on the N64. One project
+I would be excited to start working on is using the serial port on
+modern N64 Flashcarts for networking, allowing the N64 to have online
+multiplayer through a computer connected over USB. I feel that projects
+like this could really elevate the kind of content that is available on
+the N64 and bring it into the modern era.</p>
</description><pubDate>Fri, 10 Dec 2021 05:00:00 -0000</pubDate><guid>https://fryzekconcepts.com/notes/n64brew-gamejam-2021.html</guid></item><item><title>Rasterizing Triangles</title><link>https://fryzekconcepts.com/notes/rasterizing-triangles.html</link><description><p>Lately I’ve been trying to implement a software renderer <a
+href="https://www.cs.drexel.edu/~david/Classes/Papers/comp175-06-pineda.pdf">following
+the algorithm described by Juan Pineda in “A Parallel Algorithm for
+Polygon Rasterization”</a>. For those unfamiliar with the paper, it
+describes an algorithm to rasterize triangles that has an extremely nice
+quality: you simply need to perform a few additions per pixel to
+see if the next pixel is inside the triangle. It achieves this quality
+by defining an edge function that has the following property:</p>
<pre><code>E(x+1,y) = E(x,y) + dY
E(x,y+1) = E(x,y) - dX</code></pre>
-<p>This property is extremely nice for a rasterizer as additions are
+quite cheap to perform and with this method we limit the amount of work
+we have to do per pixel. One frustrating quality of this paper is that
+it suggests that you can calculate more properties than just if a pixel
+is inside the triangle with simple addition, but provides no explanation
+for how to do that. 
In this blog I would like to explore how you
+implement a Pineda style rasterizer that can calculate per pixel values
+using simple addition.</p>
<figure>
-<img src="/assets/2022-04-03-rasterizing-triangles/Screenshot-from-2022-04-03-13-43-13.png" alt="Triangle rasterized using code in this post" /><figcaption aria-hidden="true">Triangle rasterized using code in this post</figcaption>
+<img
+src="/assets/2022-04-03-rasterizing-triangles/Screenshot-from-2022-04-03-13-43-13.png"
+alt="Triangle rasterized using code in this post" />
+<figcaption aria-hidden="true">Triangle rasterized using code in this
+post</figcaption>
</figure>
-<p>In order to figure out how to build this rasterizer <a href="https://www.reddit.com/r/GraphicsProgramming/comments/tqxxmu/interpolating_values_in_a_pineda_style_rasterizer/">I reached out to the internet</a> to help build some more intuition about the properties of this rasterizer. From this reddit post I gained more intuition on how we can use the edge function values to linearly interpolate values on the triangle. Here is the relevant comment that gave me all the information I needed</p>
+<p>In order to figure out how to build this rasterizer <a
+href="https://www.reddit.com/r/GraphicsProgramming/comments/tqxxmu/interpolating_values_in_a_pineda_style_rasterizer/">I
+reached out to the internet</a> to help build some more intuition about
+the properties of this rasterizer. 
From this reddit post I gained more
+intuition on how we can use the edge function values to linearly
+interpolate values on the triangle. Here is the relevant comment that
+gave me all the information I needed</p>
<blockquote>
<p>Think about the edge function’s key property:</p>
<p><em>recognize that the formula given for E(x,y) is the same as the
+formula for the magnitude of the cross product between the vector from
+(X,Y) to (X+dX, Y+dY), and the vector from (X,Y) to (x,y). 
By the well
+known property of cross products, the magnitude is zero if the vectors
+are colinear, and changes sign as the vectors cross from one side to the
+other.</em></p>
<p>The magnitude of the edge distance is the area of the parallelogram
+formed by <code>(X,Y)-&gt;(X+dX,Y+dY)</code> and
+<code>(X,Y)-&gt;(x,y)</code>. If you normalize by the parallelogram area
+at the <em>other</em> point in the triangle you get a barycentric
+coordinate that’s 0 along the <code>(X,Y)-&gt;(X+dX,Y+dY)</code> edge
+and 1 at the other point. You can precompute each interpolated triangle
+parameter normalized by this area at setup time, and in fact most
+hardware computes per-pixel step values (pre 1/w correction) so that all
+the parameters are computed as a simple addition as you walk along each
+raster.</p>
<p>Note that when you’re implementing all of this it’s critical to keep
+all the math in the integer domain (snapping coordinates to some integer
+sub-pixel precision, I’d recommend at least 4 bits) and using a
+tie-breaking function (typically top-left) for pixels exactly on the
+edge to avoid pixel double-hits or gaps in adjacent triangles.</p>
<p>https://www.reddit.com/r/GraphicsProgramming/comments/tqxxmu/interpolating_values_in_a_pineda_style_rasterizer/i2krwxj/</p>
</blockquote>
-<p>From this comment you can see that it is trivial to calculate the barycentric coordinates of the triangle from the edge function. You simply need to divide the calculated edge function value by the area of the parallelogram. Now what is the area of the triangle? Well this is where some <a href="https://www.scratchapixel.com/lessons/3d-basic-rendering/ray-tracing-rendering-a-triangle/barycentric-coordinates">more research</a> online helped. 
If the edge function defines the area of a parallelogram (2 times the area of the triangle) of <code>(X,Y)-&gt;(X+dX,Y+dY)</code> and <code>(X,Y)-&gt;(x,y)</code>, and we calculate three edge function values (one for each edge), then we have 2 times the area of each of the sub triangles that are defined by our point.</p>
<p>From this comment you can see that it is trivial to calculate
+the barycentric coordinates of the triangle from the edge
+function. You simply need to divide the calculated edge function
+value by the area of the parallelogram. Now what is the area of the
+triangle? Well this is where some <a
+href="https://www.scratchapixel.com/lessons/3d-basic-rendering/ray-tracing-rendering-a-triangle/barycentric-coordinates">more
+research</a> online helped. If the edge function defines the area of a
+parallelogram (2 times the area of the triangle) of
+<code>(X,Y)-&gt;(X+dX,Y+dY)</code> and <code>(X,Y)-&gt;(x,y)</code>, and
+we calculate three edge function values (one for each edge), then we
+have 2 times the area of each of the sub triangles that are defined by
+our point.</p>
<figure>
<img
+src="https://www.scratchapixel.com/images/ray-triangle/barycentric.png?" 
+alt="Triangle barycentric coordinates from scratchpixel tutorial" />
+<figcaption aria-hidden="true">Triangle barycentric coordinates from
+scratchpixel tutorial</figcaption>
</figure>
<p>From this it’s trivial to see that we can calculate 2 times the area
+of the triangle just by adding up all the individual areas of the sub
+triangles (I used triangles here, but really we are adding the area of
+sub parallelograms to get the area of the whole parallelogram that has 2
+times the area of the triangle we are drawing), that is adding the value
+of all the edge functions together. From this we can see that to linearly
+interpolate any value on the triangle we can use the following
+equation</p>
<pre><code>Value(x,y) = (e0*v0 + e1*v1 + e2*v2) / (e0 + e1 + e2)
Value(x,y) = (e0*v0 + e1*v1 + e2*v2) / area</code></pre>
<p>Where <code>e0, e1, e2</code> are the edge function values and
+<code>v0, v1, v2</code> are the per vertex values we want to
+interpolate.</p>
<p>This is great for calculating the per vertex values, but we still
+haven’t achieved the property of calculating the interpolated value per
+pixel with simple addition. 
To do that we need to use the property of the edge function I described above</p> +<p>Where <code>e0, e1, e2</code> are the edge function values and
+<code>v0, v1, v2</code> are the per vertex values we want to
+interpolate.</p>
+<p>This is great for calculating the per vertex values, but we still
+haven’t achieved the property of calculating the interpolated value per
+pixel with simple addition. To do that we need to use the property of
+the edge function I described above</p> <pre><code>Value(x+1, y) = (E0(x+1, y)*v0 + E1(x+1, y)*v1 + E2(x+1, y)*v2) / area
Value(x+1, y) = ((e0+dY0)*v0 + (e1+dY1)*v1 + (e2+dY2)*v2) / area
Value(x+1, y) = (e0*v0 + dY0*v0 + e1*v1 + dY1*v1 + e2*v2 + dY2*v2) / area
Value(x+1, y) = (e0*v0 + e1*v1 + e2*v2)/area + (dY0*v0 + dY1*v1 + dY2*v2)/area
Value(x+1, y) = Value(x,y) + (dY0*v0 + dY1*v1 + dY2*v2)/area</code></pre> -<p>From here we can see that if we work through all the math, we can find this same property where the interpolated value is equal to the previous interpolated value plus some number. Therefore if we pre-compute this addition value, when we iterate over the pixels we only need to add this pre-computed number to the interpolated value of the previous pixel. We can repeat this process again to figure out the equation of the pre-computed value for <code>Value(x, y+1)</code> but I’ll save you the time and provide both equations below</p> +<p>From here we can see that if we work through all the math, we can
+find this same property where the interpolated value is equal to the
+previous interpolated value plus some number. Therefore if we
+pre-compute this addition value, when we iterate over the pixels we only
+need to add this pre-computed number to the interpolated value of the
+previous pixel. 
We can repeat this process again to figure out the
+equation of the pre-computed value for <code>Value(x, y+1)</code> but
+I’ll save you the time and provide both equations below</p> <pre><code>dYV = (dY0*v0 + dY1*v1 + dY2*v2)/area
dXV = (dX0*v0 + dX1*v1 + dX2*v2)/area
Value(x+1, y) = Value(x,y) + dYV
Value(x, y+1) = Value(x,y) - dXV</code></pre> -<p>Where <code>dY0, dY1, dY2</code> are the differences between y coordinates as described in Pineda’s paper, <code>dX0, dX1, dX2</code> are the differences in x coordinates as described in Pineda’s paper, and the area is the pre-calculated sum of the edge functions</p> -<p>Now you should be able to build a Pineda style rasterizer that can calculate per pixel interpolated values using simple addition, by following pseudo code like this:</p> +<p>Where <code>dY0, dY1, dY2</code> are the differences between y
+coordinates as described in Pineda’s paper, <code>dX0, dX1, dX2</code>
+are the differences in x coordinates as described in Pineda’s paper, and
+the area is the pre-calculated sum of the edge functions.</p>
+<p>Now you should be able to build a Pineda style rasterizer that can
+calculate per pixel interpolated values using simple addition, by
+following pseudo code like this:</p> <pre><code>func edge(x, y, xi, yi, dXi, dYi)
    return (x - xi)*dYi - (y-yi)*dXi
@@ -543,16 +879,61 @@ func draw_triangle(x0, y0, x1, y1, x2, y2, v0, v1, v2):
            starting_e1 = e1
            starting_e2 = e2
            starting_v = v</code></pre> -<p>Now this pseudo code is not the most efficient as it will iterate over the entire screen to draw one triangle, but it provides a starting basis to show how to use these Pineda properties to calculate per vertex values. One thing to note if you do implement this is, if you use fixed point arithmetic, be careful to insure you have enough precision to calculate all of these values with overflow or underflow. 
This was an issue I ran into running out of precision when I did the divide by the area.</p> -</description><pubDate>Sun, 03 Apr 2022 04:00:00 -0000</pubDate><guid>https://fryzekconcepts.com/notes/rasterizing-triangles.html</guid></item><item><title>Baremetal RISC-V</title><link>https://fryzekconcepts.com/notes/baremetal-risc-v.html</link><description><p>After re-watching suckerpinch’s <a href="https://www.youtube.com/watch?v=ar9WRwCiSr0">“Reverse Emulation”</a> video I got inspired to try and replicate what he did, but instead do it on an N64. Now my idea here is not to preform reverse emulation on the N64 itself but instead to use the SBC as a cheap way to make a dev focused flash cart. Seeing that sukerpinch was able to meet the timings of the NES bus made me think it might be possible to meet the N64 bus timings taking an approach similar to his.</p> +<p>Now this pseudo code is not the most efficient as it will iterate
+over the entire screen to draw one triangle, but it provides a starting
+basis to show how to use these Pineda properties to calculate per vertex
+values. One thing to note if you do implement this is, if you use fixed
+point arithmetic, be careful to ensure you have enough precision to
+calculate all of these values without overflow or underflow. This was an
+issue I ran into, running out of precision when I did the divide by the
+area.</p>
+</description><pubDate>Sun, 03 Apr 2022 04:00:00 -0000</pubDate><guid>https://fryzekconcepts.com/notes/rasterizing-triangles.html</guid></item><item><title>Baremetal RISC-V</title><link>https://fryzekconcepts.com/notes/baremetal-risc-v.html</link><description><p>After re-watching suckerpinch’s <a
+href="https://www.youtube.com/watch?v=ar9WRwCiSr0">“Reverse
+Emulation”</a> video I got inspired to try and replicate what he did,
+but instead do it on an N64. Now my idea here is not to perform reverse
+emulation on the N64 itself but instead to use the SBC as a cheap way to
+make a dev focused flash cart. 
Seeing that suckerpinch was able to meet
+the timings of the NES bus made me think it might be possible to meet
+the N64 bus timings taking an approach similar to his.</p> <h2 id="why-risc-v-baremetal">Why RISC-V Baremetal?</h2> -<p>The answer here is more utilitarian then idealistic, I originally wanted to use a Raspberry Pi since I thought that board may be more accessible if other people want to try and replicate this project. Instead what I found is that it is impossible to procure a Raspberry Pi. Not to be deterred I purchased a <a href="https://linux-sunxi.org/Allwinner_Nezha">“Allwinner Nezha”</a> a while back and its just been collecting dust in my storage. I figured this would be a good project to test the board out on since it has a large amount of RAM (1GB on my board), a fast processor (1 GHz), and accessible GPIO. As for why baremetal? Well one of the big problems suckerpinch ran into was being interrupted by the Linux kernel while his software was running. The board was fast enough to respond to the bus timings but Linux would throw off those timings with preemption. This is why I’m taking the approach to do everything baremetal. Giving 100% of the CPU time to my program emulating the CPU bus.</p> +<p>The answer here is more utilitarian than idealistic: I originally
+wanted to use a Raspberry Pi since I thought that board may be more
+accessible if other people want to try and replicate this project.
+Instead what I found is that it is impossible to procure a Raspberry Pi.
+Not to be deterred I purchased a <a
+href="https://linux-sunxi.org/Allwinner_Nezha">“Allwinner Nezha”</a> a
+while back and it’s just been collecting dust in my storage. I figured
+this would be a good project to test the board out on since it has a
+large amount of RAM (1GB on my board), a fast processor (1 GHz), and
+accessible GPIO. As for why baremetal? Well one of the big problems
+suckerpinch ran into was being interrupted by the Linux kernel while his
+software was running. 
The board was fast enough to respond to the bus
+timings but Linux would throw off those timings with preemption. This is
+why I’m taking the approach to do everything baremetal, giving 100% of
+the CPU time to my program emulating the CPU bus.</p> <h2 id="risc-v-baremetal-development">RISC-V Baremetal Development</h2> -<p>Below I’ll document how I got a baremetal program running on the Nezha board, to provide guidance to anyone who wants to try doing something like this themselves.</p> +<p>Below I’ll document how I got a baremetal program running on the
+Nezha board, to provide guidance to anyone who wants to try doing
+something like this themselves.</p> <h3 id="toolchain-setup">Toolchain Setup</h3> -<p>In order to do any RISC-V development we will need to setup a RISC-V toolchain that isn’t tied to a specific OS like linux. Thankfully the RISC-V org set up a simple to use git repo that has a script to build an entire RISC-V toolchain on your machine. Since you’re building the whole toolchain from source this will take some time on my machine (Ryzen 4500u, 16GB of RAM, 1TB PCIe NVMe storage), it took around ~30 minutes to build the whole tool chain. You can find the repo <a href="https://github.com/riscv-collab/riscv-gnu-toolchain">here</a>, and follow the instructions in the <code>Installation (Newlib)</code> section of the README. That will setup a bare bones OS independent toolchain that can use newlib for the cstdlib (not that I am currently using it in my software).</p> +<p>In order to do any RISC-V development we will need to setup a RISC-V
+toolchain that isn’t tied to a specific OS like linux. Thankfully the
+RISC-V org set up a simple to use git repo that has a script to build an
+entire RISC-V toolchain on your machine. Since you’re building the whole
+toolchain from source this will take some time; on my machine (Ryzen
+4500u, 16GB of RAM, 1TB PCIe NVMe storage) it took around 30 minutes
+to build the whole toolchain. 
You can find the repo <a
+href="https://github.com/riscv-collab/riscv-gnu-toolchain">here</a>, and
+follow the instructions in the <code>Installation (Newlib)</code>
+section of the README. That will setup a bare bones OS independent
+toolchain that can use newlib for the cstdlib (not that I am currently
+using it in my software).</p> <h3 id="setting-up-a-program">Setting up a Program</h3> -<p>This is probably one of the more complicated steps in baremetal programming as this will involve setting up a linker script, which can sometimes feel like an act of black magic to get right. I’ll try to walk through some linker script basics to show how I setup mine. The linker script <code>linker.ld</code> I’m using is below</p> +<p>This is probably one of the more complicated steps in baremetal
+programming as this will involve setting up a linker script, which can
+sometimes feel like an act of black magic to get right. I’ll try to walk
+through some linker script basics to show how I setup mine. The linker
+script <code>linker.ld</code> I’m using is below</p> <pre class="ld"><code>SECTIONS
{
    . = 0x45000000;
@@ -582,29 +963,53 @@ func draw_triangle(x0, y0, x1, y1, x2, y2, v0, v1, v2):
    *(.comment);
  }
}</code></pre> -<p>The purpose of a linkscript is to describe how our binary will be organized, the script I wrote will do the follow</p> +<p>The purpose of a linker script is to describe how our binary will be
+organized; the script I wrote will do the following</p> <ol type="1"> -<li>Start the starting address offset to <code>0x45000000</code>, This is the address we are going to load the binary into memory, so any pointers in the program will need to be offset from this address</li> -<li>start the binary off with the <code>.text</code> section which will contain the executable code, in the text section we want the code for <code>.text.start</code> to come first. this is the code that implements the “C runtime”. 
That is this is the code with the <code>_start</code> function that will setup the stack pointer and call into the C <code>main</code> function. After that we will place the text for all the other functions in our binary. We keep this section aligned to <code>4096</code> bytes, and the <code>PROVIDE</code> functions creates a symbol with a pointer to that location in memory. We won’t use the text start and end pointers in our program but it can be useful if you want to know stuff about your binary at runtime of your program</li> -<li>Next is the <code>.data</code> section that has all the data for our program. Here you can see I also added the <code>rodata</code> or read only section to the data section. The reason I did this is because I’m not going to bother with properly implementing read only data. We also keep the data aligned to 16 bytes to ensure that every memory access will be aligned for a 64bit RISCV memory access.</li> -<li>The last “section” is not a real section but some extra padding at the end to reserve the stack. Here I am reserving 4096 (4Kb) for the stack of my program.</li> -<li>Lastly I’m going to discard a few sections that GCC will compile into the binary that I don’t need at all.</li> +<li>Set the starting address to <code>0x45000000</code>. This
+is the address we are going to load the binary into memory at, so any
+pointers in the program will need to be offset from this address</li>
+<li>Start the binary off with the <code>.text</code> section, which will
+contain the executable code. In the text section we want the code for
+<code>.text.start</code> to come first; this is the code that implements
+the “C runtime”, i.e. the code with the <code>_start</code>
+function that will set up the stack pointer and call into the C
+<code>main</code> function. After that we will place the text for all
+the other functions in our binary. 
We keep this section aligned to
+<code>4096</code> bytes, and the <code>PROVIDE</code> function creates
+a symbol with a pointer to that location in memory. We won’t use the
+text start and end pointers in our program but they can be useful if you
+want to know things about your binary at runtime</li>
+<li>Next is the <code>.data</code> section that has all the data for our
+program. Here you can see I also added the <code>rodata</code> or read
+only section to the data section. The reason I did this is because I’m
+not going to bother with properly implementing read only data. We also
+keep the data aligned to 16 bytes to ensure that every memory access
+will be aligned for a 64-bit RISC-V memory access.</li>
+<li>The last “section” is not a real section but some extra padding at
+the end to reserve the stack. Here I am reserving 4096 bytes (4KB) for
+the stack of my program.</li>
+<li>Lastly I’m going to discard a few sections that GCC will compile
+into the binary that I don’t need at all.</li>
</ol>
-<p>Now this probably isn’t the best way to write a linker script. 
For +example the stack is just kind of a hack in it, and I don’t implement +the <code>.bss</code> section for zero initialized data.</p> +<p>With this linker script we can now setup a basic program, we can use +the code presented below as the <code>main.c</code> file</p> <div class="sourceCode" id="cb2"><pre class="sourceCode c"><code class="sourceCode c"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="pp">#include </span><span class="im">&lt;stdint.h&gt;</span></span> <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a></span> -<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a><span class="pp">#define UART0_BASE 0x02500000</span></span> -<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a><span class="pp">#define UART0_DATA_REG (UART0_BASE + 0x0000)</span></span> -<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a><span class="pp">#define UART0_USR (UART0_BASE + 0x007c)</span></span> +<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a><span class="pp">#define UART0_BASE </span><span class="bn">0x02500000</span></span> +<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a><span class="pp">#define UART0_DATA_REG </span><span class="op">(</span>UART0_BASE<span class="pp"> </span><span class="op">+</span><span class="pp"> </span><span class="bn">0x0000</span><span class="op">)</span></span> +<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a><span class="pp">#define UART0_USR </span><span class="op">(</span>UART0_BASE<span class="pp"> </span><span class="op">+</span><span class="pp"> </span><span class="bn">0x007c</span><span class="op">)</span></span> <span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a></span> -<span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a><span class="pp">#define write_reg(r, v) write_reg_handler((volatile uint32_t*)(r), 
(v))</span></span> +<span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a><span class="pp">#define write_reg</span><span class="op">(</span><span class="pp">r</span><span class="op">,</span><span class="pp"> v</span><span class="op">)</span><span class="pp"> write_reg_handler</span><span class="op">((</span><span class="dt">volatile</span><span class="pp"> </span><span class="dt">uint32_t</span><span class="op">*)(</span><span class="pp">r</span><span class="op">),</span><span class="pp"> </span><span class="op">(</span><span class="pp">v</span><span class="op">))</span></span> <span id="cb2-8"><a href="#cb2-8" aria-hidden="true" tabindex="-1"></a><span class="dt">void</span> write_reg_handler<span class="op">(</span><span class="dt">volatile</span> <span class="dt">uint32_t</span> <span class="op">*</span>reg<span class="op">,</span> <span class="dt">const</span> <span class="dt">uint32_t</span> value<span class="op">)</span></span> <span id="cb2-9"><a href="#cb2-9" aria-hidden="true" tabindex="-1"></a><span class="op">{</span></span> <span id="cb2-10"><a href="#cb2-10" aria-hidden="true" tabindex="-1"></a> reg<span class="op">[</span><span class="dv">0</span><span class="op">]</span> <span class="op">=</span> value<span class="op">;</span></span> <span id="cb2-11"><a href="#cb2-11" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span> <span id="cb2-12"><a href="#cb2-12" aria-hidden="true" tabindex="-1"></a></span> -<span id="cb2-13"><a href="#cb2-13" aria-hidden="true" tabindex="-1"></a><span class="pp">#define read_reg(r) read_reg_handler((volatile uint32_t*)(r))</span></span> +<span id="cb2-13"><a href="#cb2-13" aria-hidden="true" tabindex="-1"></a><span class="pp">#define read_reg</span><span class="op">(</span><span class="pp">r</span><span class="op">)</span><span class="pp"> read_reg_handler</span><span class="op">((</span><span class="dt">volatile</span><span class="pp"> </span><span class="dt">uint32_t</span><span 
class="op">*)(</span><span class="pp">r</span><span class="op">))</span></span> <span id="cb2-14"><a href="#cb2-14" aria-hidden="true" tabindex="-1"></a><span class="dt">uint32_t</span> read_reg_handler<span class="op">(</span><span class="dt">volatile</span> <span class="dt">uint32_t</span> <span class="op">*</span>reg<span class="op">)</span></span> <span id="cb2-15"><a href="#cb2-15" aria-hidden="true" tabindex="-1"></a><span class="op">{</span></span> <span id="cb2-16"><a href="#cb2-16" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> reg<span class="op">[</span><span class="dv">0</span><span class="op">];</span></span> @@ -624,60 +1029,181 @@ func draw_triangle(x0, y0, x1, y1, x2, y2, v0, v1, v2): <span id="cb2-30"><a href="#cb2-30" aria-hidden="true" tabindex="-1"></a></span> <span id="cb2-31"><a href="#cb2-31" aria-hidden="true" tabindex="-1"></a><span class="dt">int</span> main<span class="op">()</span></span> <span id="cb2-32"><a href="#cb2-32" aria-hidden="true" tabindex="-1"></a><span class="op">{</span></span> -<span id="cb2-33"><a href="#cb2-33" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span><span class="op">(</span><span class="dt">const</span> <span class="dt">char</span> <span class="op">*</span>c <span class="op">=</span> hello_world<span class="op">;</span> c<span class="op">[</span><span class="dv">0</span><span class="op">]</span> <span class="op">!=</span> <span class="ch">&#39;\0&#39;</span><span class="op">;</span> c<span class="op">++)</span></span> +<span id="cb2-33"><a href="#cb2-33" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span><span class="op">(</span><span class="dt">const</span> <span class="dt">char</span> <span class="op">*</span>c <span class="op">=</span> hello_world<span class="op">;</span> c<span class="op">[</span><span class="dv">0</span><span class="op">]</span> <span class="op">!=</span> <span class="ch">&#39;</span><span class="sc">\0</span><span 
class="ch">&#39;</span><span class="op">;</span> c<span class="op">++)</span></span> <span id="cb2-34"><a href="#cb2-34" aria-hidden="true" tabindex="-1"></a> <span class="op">{</span></span> <span id="cb2-35"><a href="#cb2-35" aria-hidden="true" tabindex="-1"></a> _putchar<span class="op">(</span>c<span class="op">);</span></span> <span id="cb2-36"><a href="#cb2-36" aria-hidden="true" tabindex="-1"></a> <span class="op">}</span></span> <span id="cb2-37"><a href="#cb2-37" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span></code></pre></div> -<p>This program will write the string “Hello World!” to the serial port. Now a common question for code like this is how did I know to set all the <code>UART0</code> registers? Well the way to find this information is to look at the datasheet, programmer’s manual, or user manual for the chip you are using. In this case we are using an Allwinner D1 and we can find the user manual with all the registers on the linux-sunxi page <a href="https://linux-sunxi.org/D1">here</a>. On pages 900 to 940 we can see a description on how the serial works for this SoC. I also looked at the schematic <a href="https://dl.linux-sunxi.org/D1/D1_Nezha_development_board_schematic_diagram_20210224.pdf">here</a>, to see that the serial port we have is wired to <code>UART0</code> on the SoC. From here we are relying on uboot to boot the board which will setup the serial port for us, which means we can just write to the UART data register to start printing content to the console.</p> -<p>We will also need need to setup a basic assembly program to setup the stack and call our main function. 
Below you can see my example called <code>start.S</code></p>
-<div class="sourceCode" id="cb3"><pre class="sourceCode asm"><code class="sourceCode fasm"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>.<span class="bu">section</span> <span class="op">.</span>text<span class="op">.</span>start</span> <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>    .global _start</span> <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a><span class="fu">_start:</span></span> <span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a>    la <span class="kw">sp</span><span class="op">,</span> __stack_start</span> <span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a>    j main</span></code></pre></div>
-<p>This assembly file just creates a section called <code>.text.start</code> and a global symbol for a function called <code>_start</code> which will be the first function our program executes. All this assembly file does is setup the stack pointer register <code>sp</code> to with the address (using the load address <code>la</code> pseudo instruction) to the stack we setup in the linker script, and then call the main function by jumping directly to it.</p>
+<p>This program will write the string “Hello World!” to the serial port.
+Now a common question for code like this is how did I know to set all
+the <code>UART0</code> registers? Well the way to find this information
+is to look at the datasheet, programmer’s manual, or user manual for the
+chip you are using. In this case we are using an Allwinner D1 and we can
+find the user manual with all the registers on the linux-sunxi page <a
+href="https://linux-sunxi.org/D1">here</a>. On pages 900 to 940 we can
+see a description on how the serial works for this SoC. I also looked at
+the schematic <a
+href="https://dl.linux-sunxi.org/D1/D1_Nezha_development_board_schematic_diagram_20210224.pdf">here</a>,
+to see that the serial port we have is wired to <code>UART0</code> on
+the SoC. From here we are relying on uboot to boot the board which will
+setup the serial port for us, which means we can just write to the UART
+data register to start printing content to the console.</p>
+<p>We will also need to setup a basic assembly program to set up the
+stack and call our main function. 
Below you can see my example called
+<code>start.S</code></p>
+<div class="sourceCode" id="cb3"><pre
+class="sourceCode asm"><code class="sourceCode fasm"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>.<span class="bu">section</span> <span class="op">.</span>text<span class="op">.</span>start</span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>    .global _start</span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a><span class="fu">_start:</span></span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a>    la <span class="kw">sp</span><span class="op">,</span> __stack_start</span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a>    j main</span></code></pre></div>
+<p>This assembly file just creates a section called
+<code>.text.start</code> and a global symbol for a function called
+<code>_start</code> which will be the first function our program
+executes. All this assembly file does is set up the stack pointer
+register <code>sp</code> with the address (using the load address
+<code>la</code> pseudo instruction) of the stack we setup in the linker
+script, and then call the main function by jumping directly to it.</p>
<h3 id="building-the-program">Building the Program</h3>
-<p>Building the program is pretty straight forward, we need to tell gcc to build the two source files without including the c standard library, and then to link the binary using our linker script. 
we can do this with the following command</p>
+<p>Building the program is pretty straightforward: we need to tell gcc
+to build the two source files without including the C standard library,
+and then to link the binary using our linker script. We can do this with
+the following commands</p>
<pre><code>riscv64-unknown-elf-gcc -march=rv64g --std=gnu99 -msmall-data-limit=0 -c main.c
riscv64-unknown-elf-gcc -march=rv64g --std=gnu99 -msmall-data-limit=0 -c start.S
riscv64-unknown-elf-gcc -march=rv64g -ffreestanding -nostdlib -msmall-data-limit=0 -T linker.ld start.o main.o -o app.elf
riscv64-unknown-elf-objcopy -O binary app.elf app.bin</code></pre>
-<p>This will build our source files into <code>.o</code> files first, then combine those <code>.o</code> files into a <code>.elf</code> file, finally converting the <code>.elf</code> into a raw binary file where we use the <code>.bin</code> extension. We need a raw binary file as we want to just load our program into memory and begin executing. If we load the <code>.elf</code> file it will have the elf header and other extra data that is not executable in it. In order to run a <code>.elf</code> file we would need an elf loader, which goes beyond the scope of this example.</p>
+<p>This will build our source files into <code>.o</code> files first,
+then combine those <code>.o</code> files into a <code>.elf</code> file,
+finally converting the <code>.elf</code> into a raw binary file where we
+use the <code>.bin</code> extension. We need a raw binary file as we
+want to just load our program into memory and begin executing. If we
+load the <code>.elf</code> file it will have the elf header and other
+extra data that is not executable in it. In order to run a
+<code>.elf</code> file we would need an elf loader, which goes beyond
+the scope of this example.</p>
<h3 id="running-the-program">Running the Program</h3>
-<p>Now we have the raw binary it’s time to try and load it. 
I found that
+the uboot configuration that comes with the board has pretty limited
+support for loading binaries. So we are going to take advantage of the
+<code>loadx</code> command to load the binary over serial. In the uboot
+terminal we are going to run the command:</p>
<pre><code>loadx 45000000</code></pre>
-<p>Now the next steps will depend on which serial terminal you are using. We want to use the <code>XMODEM</code> protocol to load the binary. In the serial terminal I am using <code>gnu screen</code> you can execute arbitrary programs and send their output to the serial terminal. You can do this by hitting the key combination “CTRL-A + :” and then typing in <code>exec !! sx app.bin</code>. This will send the binary to the serial terminal using the XMODEM protocol. If you are not using GNU screen look up instructions for how to send an XMODEM binary. Now that the binary is loaded we can type in</p>
+<p>Now the next steps will depend on which serial terminal you are
+using. We want to use the <code>XMODEM</code> protocol to load the
+binary. In the serial terminal I am using (<code>gnu screen</code>) you
+can execute arbitrary programs and send their output to the serial
+terminal. You can do this by hitting the key combination “CTRL-A + :”
+and then typing in <code>exec !! sx app.bin</code>. This will send the
+binary to the serial terminal using the XMODEM protocol. If you are not
+using GNU screen, look up instructions for how to send an XMODEM binary. 
+Now that the binary is loaded we can type in</p>
<pre><code>go 45000000</code></pre>
-<p>The should start to execute the program and you should see <code>Hello World!</code> printed to the console!</p>
-<p><img src="/assets/2022-06-09-baremetal-risc-v/riscv-terminal.png" /></p>
<h2 id="whats-next">What’s Next?</h2>
-<p>Well the sky is the limit! We have a method to load and run a program that can do anything on the Nezha board now. Looking through the datasheet we can see how to access the GPIO on the board to blink an LED. If you’re really ambitious you could try getting ethernet or USB working in a baremetal environment. I am going to continue on my goal of emulating the N64 cartridge bus which will require me to get GPIO working as well as interrupts on the GPIO lines. If you want to see the current progress of my work you can check it out on github <a href="https://github.com/Hazematman/N64-Cart-Emulator">here</a>.</p>
+<p>This should start to execute the program and you should see
+<code>Hello World!</code> printed to the console!</p>
+<p><img
+src="/assets/2022-06-09-baremetal-risc-v/riscv-terminal.png" /></p>
<h2 id="whats-next">What’s Next?</h2>
+<p>Well the sky is the limit! We have a method to load and run a program
+that can do anything on the Nezha board now. Looking through the
+datasheet we can see how to access the GPIO on the board to blink an
+LED. If you’re really ambitious you could try getting ethernet or USB
+working in a baremetal environment. I am going to continue on my goal of
+emulating the N64 cartridge bus which will require me to get GPIO
+working as well as interrupts on the GPIO lines. If you want to see the
+current progress of my work you can check it out on github <a
+href="https://github.com/Hazematman/N64-Cart-Emulator">here</a>.</p>
</description><pubDate>Thu, 09 Jun 2022 04:00:00 -0000</pubDate><guid>https://fryzekconcepts.com/notes/baremetal-risc-v.html</guid></item><item><title>Digital Garden</title><link>https://fryzekconcepts.com/notes/digital_garden.html</link><description><p>After reading Maggie Appleton’s page on <a
+href="https://maggieappleton.com/garden-history">digital gardens</a> I
+was inspired to convert my own website into a digital garden.</p>
+<p>I have many half-baked ideas that I never seem to be able to finish. Some
+of them get to a published state like <a
+href="/notes/rasterizing-triangles.html">Rasterizing Triangles</a> and
+<a href="/notes/baremetal-risc-v.html">Baremetal RISC-V</a>, but many of
+them never make it to the published state. 
The idea of a digital garden
+seems very appealing to me, as it encourages you to post on a topic even
+if you haven’t made it “publishable” yet.</p>
<h2 id="how-this-site-works">How this site works</h2>
-<p>I wanted a bit of challenge when putting together this website as I don’t do a lot of web development in my day to day life, so I thought it would be a good way to learn more things. 
This site has been entirely built from scratch using a custom static site generator I setup with pandoc. It relies on pandoc’s filters to implement some of the classic “Digital Garden” features like back linking. The back linking feature has not been totally developed yet and right now it just provides with a convenient way to link to other notes or pages on this site.</p> -<p>I hope to develop this section more and explain how I got various features in pandoc to work as a static site generator.</p> -</description><pubDate>Sun, 30 Oct 2022 04:00:00 -0000</pubDate><guid>https://fryzekconcepts.com/notes/digital_garden.html</guid></item><item><title>2022 Graphics Team Contributions at Igalia</title><link>https://fryzekconcepts.com/notes/2022_igalia_graphics_team.html</link><description><p>This year I started a new job working with <a href="https://www.igalia.com/technology/graphics">Igalia’s Graphics Team</a>. For those of you who don’t know <a href="https://www.igalia.com/">Igalia</a> they are a <a href="https://en.wikipedia.org/wiki/Igalia">“worker-owned, employee-run cooperative model consultancy focused on open source software”</a>.</p> -<p>As a new member of the team, I thought it would be a great idea to summarize the incredible amount of work the team completed in 2022. If you’re interested keep reading!</p> -<h2 id="vulkan-1.2-conformance-on-rpi-4">Vulkan 1.2 Conformance on RPi 4</h2> -<p>One of the big milestones for the team in 2022 was <a href="https://www.khronos.org/conformance/adopters/conformant-products#submission_694">achieving Vulkan 1.2 conformance on the Raspberry Pi 4</a>. The folks over at the Raspberry Pi company wrote a nice <a href="https://www.raspberrypi.com/news/vulkan-update-version-1-2-conformance-for-raspberry-pi-4/">article</a> about the achievement. 
Igalia has been partnering with the Raspberry Pi company to bring build and improve the graphics driver on all versions of the Raspberry Pi.</p>
-<p>The Vulkan 1.2 spec ratification came with a few <a href="https://registry.khronos.org/vulkan/specs/1.2-extensions/html/vkspec.html#versions-1.2">extensions</a> that were promoted to Core. This means a conformant Vulkan 1.2 driver needs to implement those extensions. Alejandro Piñeiro wrote this interesting <a href="https://blogs.igalia.com/apinheiro/2022/05/v3dv-status-update-2022-05-16/">blog post</a> that talks about some of those extensions.</p>
-<p>Vulkan 1.2 also came with a number of optional extensions such as <code>VK_KHR_pipeline_executable_properties</code>. My colleague Iago Toral wrote an excellent <a href="https://blogs.igalia.com/itoral/2022/05/09/vk_khr_pipeline_executables/">blog post</a> on how we implemented that extension on the Raspberry Pi 4 and what benefits it provides for debugging.</p>
+<p>I wanted a bit of a challenge when putting together this website as I
+don’t do a lot of web development in my day to day life, so I thought it
+would be a good way to learn more things. This site has been entirely
+built from scratch using a custom static site generator I set up with
+pandoc. It relies on pandoc’s filters to implement some of the classic
+“Digital Garden” features like back linking. 
The back linking feature
+has not been totally developed yet and right now it just provides a
+convenient way to link to other notes or pages on this site.</p>
+<p>I hope to develop this section more and explain how I got various
+features in pandoc to work as a static site generator.</p>
+</description><pubDate>Sun, 30 Oct 2022 04:00:00 -0000</pubDate><guid>https://fryzekconcepts.com/notes/digital_garden.html</guid></item><item><title>2022 Graphics Team Contributions at Igalia</title><link>https://fryzekconcepts.com/notes/2022_igalia_graphics_team.html</link><description><p>This year I started a new job working with <a
+href="https://www.igalia.com/technology/graphics">Igalia’s Graphics
+Team</a>. For those of you who don’t know <a
+href="https://www.igalia.com/">Igalia</a>, they are a <a
+href="https://en.wikipedia.org/wiki/Igalia">“worker-owned, employee-run
+cooperative model consultancy focused on open source software”</a>.</p>
+<p>As a new member of the team, I thought it would be a great idea to
+summarize the incredible amount of work the team completed in 2022. If
+you’re interested keep reading!</p>
+<h2 id="vulkan-1.2-conformance-on-rpi-4">Vulkan 1.2 Conformance on RPi
+4</h2>
+<p>One of the big milestones for the team in 2022 was <a
+href="https://www.khronos.org/conformance/adopters/conformant-products#submission_694">achieving
+Vulkan 1.2 conformance on the Raspberry Pi 4</a>. The folks over at the
+Raspberry Pi company wrote a nice <a
+href="https://www.raspberrypi.com/news/vulkan-update-version-1-2-conformance-for-raspberry-pi-4/">article</a>
+about the achievement. Igalia has been partnering with the Raspberry Pi
+company to build and improve the graphics driver on all versions
+of the Raspberry Pi.</p>
+<p>The Vulkan 1.2 spec ratification came with a few <a
+href="https://registry.khronos.org/vulkan/specs/1.2-extensions/html/vkspec.html#versions-1.2">extensions</a>
+that were promoted to Core. 
This means a conformant Vulkan 1.2 driver +needs to implement those extensions. Alejandro Piñeiro wrote this +interesting <a +href="https://blogs.igalia.com/apinheiro/2022/05/v3dv-status-update-2022-05-16/">blog +post</a> that talks about some of those extensions.</p> +<p>Vulkan 1.2 also came with a number of optional extensions such as +<code>VK_KHR_pipeline_executable_properties</code>. My colleague Iago +Toral wrote an excellent <a +href="https://blogs.igalia.com/itoral/2022/05/09/vk_khr_pipeline_executables/">blog +post</a> on how we implemented that extension on the Raspberry Pi 4 and +what benefits it provides for debugging.</p> <h2 id="vulkan-1.3-support-on-turnip">Vulkan 1.3 support on Turnip</h2> -<p>Igalia has been heavily supporting the Open-Source Turnip Vulkan driver for Qualcomm Adreno GPUs, and in 2022 we helped it achieve Vulkan 1.3 conformance. Danylo Piliaiev on the graphics team here at Igalia, wrote a great <a href="https://blogs.igalia.com/dpiliaiev/turnip-vulkan-1-3/">blog post</a> on this achievement! One of the biggest challenges for the Turnip driver is that it is a completely reverse-engineered driver that has been built without access to any hardware documentation or reference driver code.</p> -<p>With Vulkan 1.3 conformance has also come the ability to run more commercial games on Adreno GPUs through the use of the DirectX translation layers. If you would like to see more of this check out this <a href="https://blogs.igalia.com/dpiliaiev/turnip-july-2022-update/">post</a> from Danylo where he talks about getting “The Witcher 3”, “The Talos Principle”, and “OMD2” running on the A660 GPU. 
Outside of Vulkan 1.3 support he also talks about some of the extensions that were implemented to allow “Zink” (the OpenGL over Vulkan driver) to run Turnip, and bring OpenGL 4.6 support to Adreno GPUs.</p>
+<p>Igalia has been heavily supporting the Open-Source Turnip Vulkan
+driver for Qualcomm Adreno GPUs, and in 2022 we helped it achieve Vulkan
+1.3 conformance. Danylo Piliaiev on the graphics team here at Igalia
+wrote a great <a
+href="https://blogs.igalia.com/dpiliaiev/turnip-vulkan-1-3/">blog
+post</a> on this achievement! One of the biggest challenges for the
+Turnip driver is that it is a completely reverse-engineered driver that
+has been built without access to any hardware documentation or reference
+driver code.</p>
+<p>With Vulkan 1.3 conformance has also come the ability to run more
+commercial games on Adreno GPUs through the use of the DirectX
+translation layers. If you would like to see more of this check out this
+<a
+href="https://blogs.igalia.com/dpiliaiev/turnip-july-2022-update/">post</a>
+from Danylo where he talks about getting “The Witcher 3”, “The Talos
+Principle”, and “OMD2” running on the A660 GPU. Outside of Vulkan 1.3
+support he also talks about some of the extensions that were implemented
+to allow “Zink” (the OpenGL over Vulkan driver) to run on Turnip, and
+bring OpenGL 4.6 support to Adreno GPUs.</p>
<p><div class="youtube-video"><iframe src="https://www.youtube.com/embed/oVFWy25uiXA"></iframe></div></p>
<h2 id="vulkan-extensions">Vulkan Extensions</h2>
-<p>Several developers on the Graphics Team made several key contributions to Vulkan Extensions and the Vulkan conformance test suite (CTS). 
My colleague Ricardo Garcia made an excellent <a +href="https://rg3.name/202212122137.html">blog post</a> about those +contributions. Below I’ve listed what Igalia did for each of the +extensions:</p> <ul> <li>VK_EXT_image_2d_view_of_3d <ul> -<li>We reviewed the spec and are listed as contributors to this extension</li> +<li>We reviewed the spec and are listed as contributors to this +extension</li> </ul></li> <li>VK_EXT_shader_module_identifier <ul> -<li>We reviewed the spec, contributed to it, and created tests for this extension</li> +<li>We reviewed the spec, contributed to it, and created tests for this +extension</li> </ul></li> <li>VK_EXT_attachment_feedback_loop_layout <ul> @@ -696,76 +1222,342 @@ riscv64-unknown-elf-objcopy -O binary app.elf app.bin</code></pre> <li>We wrote tests and reviewed the spec for this extension</li> </ul></li> </ul> -<h2 id="amdgpu-kernel-driver-contributions">AMDGPU kernel driver contributions</h2> -<p>Our resident “Not an AMD expert” Melissa Wen made several contributions to the AMDGPU driver. Those contributions include connecting parts of the <a href="https://lore.kernel.org/amd-gfx/20220329201835.2393141-1-mwen@igalia.com/">pixel blending and post blending code in AMD’s <code>DC</code> module to <code>DRM</code></a> and <a href="https://lore.kernel.org/amd-gfx/20220804161349.3561177-1-mwen@igalia.com/">fixing a bug related to how panel orientation is set when a display is connected</a>. She also had a <a href="https://indico.freedesktop.org/event/2/contributions/50/">presentation at XDC 2022</a>, where she talks about techniques you can use to understand and debug AMDGPU, even when there aren’t hardware docs available.</p> -<p>André Almeida also completed and submitted work on <a href="https://lore.kernel.org/dri-devel/20220714191745.45512-1-andrealmeid@igalia.com/">enabled logging features for the new GFXOFF hardware feature in AMD GPUs</a>. 
He also created a userspace application (which you can find <a href="https://gitlab.freedesktop.org/andrealmeid/gfxoff_tool">here</a>), that lets you interact with this feature through the <code>debugfs</code> interface. Additionally, he submitted a <a href="https://lore.kernel.org/dri-devel/20220929184307.258331-1-contact@emersion.fr/">patch</a> for async page flips (which he also talked about in his <a href="https://indico.freedesktop.org/event/2/contributions/61/">XDC 2022 presentation</a>) which is still yet to be merged.</p> -<h2 id="modesetting-without-glamor-on-rpi">Modesetting without Glamor on RPi</h2> -<p>Christopher Michael joined the Graphics Team in 2022 and along with Chema Casanova made some key contributions to enabling hardware acceleration and mode setting on the Raspberry Pi without the use of <a href="https://www.freedesktop.org/wiki/Software/Glamor/">Glamor</a> which allows making more video memory available to graphics applications running on a Raspberry Pi.</p> -<p>The older generation Raspberry Pis (1-3) only have a maximum of 256MB of memory available for video memory, and using Glamor will consume part of that video memory. Christopher wrote an excellent <a href="https://blogs.igalia.com/cmichael/2022/05/30/modesetting-a-glamor-less-rpi-adventure/">blog post</a> on this work. Both him and Chema also had a joint presentation at XDC 2022 going into more detail on this work.</p> +<h2 id="amdgpu-kernel-driver-contributions">AMDGPU kernel driver +contributions</h2> +<p>Our resident “Not an AMD expert” Melissa Wen made several +contributions to the AMDGPU driver. 
Those contributions include
+connecting parts of the <a
+href="https://lore.kernel.org/amd-gfx/20220329201835.2393141-1-mwen@igalia.com/">pixel
+blending and post blending code in AMD’s <code>DC</code> module to
+<code>DRM</code></a> and <a
+href="https://lore.kernel.org/amd-gfx/20220804161349.3561177-1-mwen@igalia.com/">fixing
+a bug related to how panel orientation is set when a display is
+connected</a>. She also had a <a
+href="https://indico.freedesktop.org/event/2/contributions/50/">presentation
+at XDC 2022</a>, where she talks about techniques you can use to
+understand and debug AMDGPU, even when there aren’t hardware docs
+available.</p>
+<p>André Almeida also completed and submitted work on <a
+href="https://lore.kernel.org/dri-devel/20220714191745.45512-1-andrealmeid@igalia.com/">enabling
+logging features for the new GFXOFF hardware feature in AMD GPUs</a>. He
+also created a userspace application (which you can find <a
+href="https://gitlab.freedesktop.org/andrealmeid/gfxoff_tool">here</a>)
+that lets you interact with this feature through the
+<code>debugfs</code> interface. 
Additionally, he submitted a <a
+href="https://lore.kernel.org/dri-devel/20220929184307.258331-1-contact@emersion.fr/">patch</a>
+for async page flips (which he also talked about in his <a
+href="https://indico.freedesktop.org/event/2/contributions/61/">XDC 2022
+presentation</a>), which has yet to be merged.</p>
+<h2 id="modesetting-without-glamor-on-rpi">Modesetting without Glamor on
+RPi</h2>
+<p>Christopher Michael joined the Graphics Team in 2022 and along with
+Chema Casanova made some key contributions to enabling hardware
+acceleration and mode setting on the Raspberry Pi without the use of <a
+href="https://www.freedesktop.org/wiki/Software/Glamor/">Glamor</a>
+which allows making more video memory available to graphics applications
+running on a Raspberry Pi.</p>
+<p>The older generation Raspberry Pis (1-3) only have a maximum of 256MB
+of memory available for video memory, and using Glamor will consume part
+of that video memory. Christopher wrote an excellent <a
+href="https://blogs.igalia.com/cmichael/2022/05/30/modesetting-a-glamor-less-rpi-adventure/">blog
+post</a> on this work. He and Chema also had a joint presentation
+at XDC 2022 going into more detail on this work.</p>
<h2 id="linux-format-magazine-column">Linux Format Magazine Column</h2>
-<p>Our very own Samuel Iglesias had a column published in Linux Format Magazine. It’s a short column about reaching Vulkan 1.1 conformance for v3dv &amp; Turnip Vulkan drivers, and how Open-Source GPU drivers can go from a “hobby project” to the defacto driver for the platform. Check it out on page 7 of <a href="https://linuxformat.com/linux-format-288.html">issue #288</a>!</p>
+<p>Our very own Samuel Iglesias had a column published in Linux Format
+Magazine. It’s a short column about reaching Vulkan 1.1 conformance for
+v3dv &amp; Turnip Vulkan drivers, and how Open-Source GPU drivers can go
+from a “hobby project” to the de facto driver for the platform. 
Check it +out on page 7 of <a +href="https://linuxformat.com/linux-format-288.html">issue #288</a>!</p> <h2 id="xdc-2022">XDC 2022</h2> -<p>X.Org Developers Conference is one of the big conferences for us here at the Graphics Team. Last year at XDC 2022 our Team presented 5 talks in Minneapolis, Minnesota. XDC 2022 took place towards the end of the year in October, so it provides some good context on how the team closed out the year. If you didn’t attend or missed their presentation, here’s a breakdown:</p> -<h3 id="replacing-the-geometry-pipeline-with-mesh-shaders-ricardo-garcía"><a href="https://indico.freedesktop.org/event/2/contributions/48/">“Replacing the geometry pipeline with mesh shaders”</a> (Ricardo García)</h3> -<p>Ricardo presents what exactly mesh shaders are in Vulkan. He made many contributions to this extension including writing 1000s of CTS tests for this extension with a <a href="https://rg3.name/202210222107.html">blog post</a> on his presentation that should check out!</p> +<p>X.Org Developers Conference is one of the big conferences for us here +at the Graphics Team. Last year at XDC 2022 our Team presented 5 talks +in Minneapolis, Minnesota. XDC 2022 took place towards the end of the +year in October, so it provides some good context on how the team closed +out the year. If you didn’t attend or missed their presentation, here’s +a breakdown:</p> +<h3 +id="replacing-the-geometry-pipeline-with-mesh-shaders-ricardo-garcía"><a +href="https://indico.freedesktop.org/event/2/contributions/48/">“Replacing +the geometry pipeline with mesh shaders”</a> (Ricardo García)</h3> +<p>Ricardo presents what exactly mesh shaders are in Vulkan. 
He made
+many contributions to this extension, including writing 1000s of CTS
+tests for it, and wrote a <a
+href="https://rg3.name/202210222107.html">blog post</a> on his
+presentation that you should check out!</p>
<p><div class="youtube-video"><iframe src="https://www.youtube.com/embed/aRNJ4xj_nDs"></iframe></div></p>
-<h3 id="status-of-vulkan-on-raspberry-pi-iago-toral"><a href="https://indico.freedesktop.org/event/2/contributions/68/">“Status of Vulkan on Raspberry Pi”</a> (Iago Toral)</h3>
-<p>Iago goes into detail about the current status of the Raspberry Pi Vulkan driver. He talks about achieving Vulkan 1.2 conformance, as well as some of the challenges the team had to solve due to hardware limitations of the Broadcom GPU.</p>
+<h3 id="status-of-vulkan-on-raspberry-pi-iago-toral"><a
+href="https://indico.freedesktop.org/event/2/contributions/68/">“Status
+of Vulkan on Raspberry Pi”</a> (Iago Toral)</h3>
+<p>Iago goes into detail about the current status of the Raspberry Pi
+Vulkan driver. 
He talks about achieving Vulkan 1.2 conformance, as well +as some of the challenges the team had to solve due to hardware +limitations of the Broadcom GPU.</p> <p><div class="youtube-video"><iframe src="https://www.youtube.com/embed/GM9IojyzCVM"></iframe></div></p> -<h3 id="enable-hardware-acceleration-for-gl-applications-without-glamor-on-xorg-modesetting-driver-jose-maría-casanova-christopher-michael"><a href="https://indico.freedesktop.org/event/2/contributions/60/">“Enable hardware acceleration for GL applications without Glamor on Xorg modesetting driver”</a> (Jose María Casanova, Christopher Michael)</h3> -<p>Chema and Christopher talk about the challenges they had to solve to enable hardware acceleration on the Raspberry Pi without Glamor.</p> +<h3 +id="enable-hardware-acceleration-for-gl-applications-without-glamor-on-xorg-modesetting-driver-jose-maría-casanova-christopher-michael"><a +href="https://indico.freedesktop.org/event/2/contributions/60/">“Enable +hardware acceleration for GL applications without Glamor on Xorg +modesetting driver”</a> (Jose María Casanova, Christopher Michael)</h3> +<p>Chema and Christopher talk about the challenges they had to solve to +enable hardware acceleration on the Raspberry Pi without Glamor.</p> <p><div class="youtube-video"><iframe src="https://www.youtube.com/embed/Bo_MOM7JTeQ"></iframe></div></p> -<h3 id="im-not-an-amd-expert-but-melissa-wen"><a href="https://indico.freedesktop.org/event/2/contributions/50/">“I’m not an AMD expert, but…”</a> (Melissa Wen)</h3> -<p>In this non-technical presentation, Melissa talks about techniques developers can use to understand and debug drivers without access to hardware documentation.</p> +<h3 id="im-not-an-amd-expert-but-melissa-wen"><a +href="https://indico.freedesktop.org/event/2/contributions/50/">“I’m not +an AMD expert, but…”</a> (Melissa Wen)</h3> +<p>In this non-technical presentation, Melissa talks about techniques +developers can use to understand and debug drivers 
without access to
+hardware documentation.</p>
<p><div class="youtube-video"><iframe src="https://www.youtube.com/embed/CMm-yhsMB7U"></iframe></div></p>
-<h3 id="async-page-flip-in-atomic-api-andré-almeida"><a href="https://indico.freedesktop.org/event/2/contributions/61/">“Async page flip in atomic API”</a> (André Almeida)</h3>
-<p>André talks about the work that has been done to enable asynchronous page flipping in DRM’s atomic API with an introduction to the topic by explaining about what exactly is asynchronous page flip, and why you would want it.</p>
+<h3 id="async-page-flip-in-atomic-api-andré-almeida"><a
+href="https://indico.freedesktop.org/event/2/contributions/61/">“Async
+page flip in atomic API”</a> (André Almeida)</h3>
+<p>André talks about the work that has been done to enable asynchronous
+page flipping in DRM’s atomic API with an introduction to the topic by
+explaining what exactly an asynchronous page flip is, and why you
+would want it.</p>
<p><div class="youtube-video"><iframe src="https://www.youtube.com/embed/qayPPIfrqtE"></iframe></div></p>
<h2 id="fosdem-2022">FOSDEM 2022</h2>
-<p>Another important conference for us is FOSDEM, and last year we presented 3 of the 5 talks in the graphics dev room. FOSDEM took place in early February 2022, these talks provide some good context of where the team started in 2022.</p>
-<h3 id="the-status-of-turnip-driver-development-hyunjun-ko"><a href="https://archive.fosdem.org/2022/schedule/event/turnip/">The status of Turnip driver development</a> (Hyunjun Ko)</h3>
-<p>Hyunjun presented the current state of the Turnip driver, also talking about the difficulties of developing a driver for a platform without hardware documentation. He talks about how Turnip developers reverse engineer the behaviour of the hardware, and then implement that in an open-source driver. 
He also made a companion <a href="https://blogs.igalia.com/zzoon/graphics/mesa/2022/02/21/complement-story/">blog post</a> to checkout along with his presentation.</p>
-<h3 id="v3dv-status-update-for-open-source-vulkan-driver-for-raspberry-pi-4-alejandro-piñeiro"><a href="https://archive.fosdem.org/2022/schedule/event/v3dv/">v3dv: Status Update for Open Source Vulkan Driver for Raspberry Pi 4</a> (Alejandro Piñeiro)</h3>
-<p>Igalia has been presenting the status of the v3dv driver since December 2019 and in this presentation, Alejandro talks about the status of the v3dv driver in early 2022. He talks about achieving conformance, the extensions that had to be implemented, and the future plans of the v3dv driver.</p>
-<h3 id="fun-with-border-colors-in-vulkan-ricardo-garcia"><a href="https://archive.fosdem.org/2022/schedule/event/vulkan_borders/">Fun with border colors in Vulkan</a> (Ricardo Garcia)</h3>
-<p>Ricardo presents about the work he did on the <code>VK_EXT_border_color_swizzle</code> extension in Vulkan. He talks about the specific contributions he did and how the extension fits in with sampling color operations in Vulkan.</p>
+<p>Another important conference for us is FOSDEM, and last year we
+presented 3 of the 5 talks in the graphics dev room. FOSDEM took place
+in early February 2022, and these talks provide some good context of where
+the team started in 2022.</p>
+<h3 id="the-status-of-turnip-driver-development-hyunjun-ko"><a
+href="https://archive.fosdem.org/2022/schedule/event/turnip/">The status
+of Turnip driver development</a> (Hyunjun Ko)</h3>
+<p>Hyunjun presented the current state of the Turnip driver, also
+talking about the difficulties of developing a driver for a platform
+without hardware documentation. He talks about how Turnip developers
+reverse engineer the behaviour of the hardware, and then implement that
+in an open-source driver. 
He also made a companion <a
+href="https://blogs.igalia.com/zzoon/graphics/mesa/2022/02/21/complement-story/">blog
+post</a> to check out along with his presentation.</p>
+<h3
+id="v3dv-status-update-for-open-source-vulkan-driver-for-raspberry-pi-4-alejandro-piñeiro"><a
+href="https://archive.fosdem.org/2022/schedule/event/v3dv/">v3dv: Status
+Update for Open Source Vulkan Driver for Raspberry Pi 4</a> (Alejandro
+Piñeiro)</h3>
+<p>Igalia has been presenting the status of the v3dv driver since
+December 2019 and in this presentation, Alejandro talks about the status
+of the v3dv driver in early 2022. He talks about achieving conformance,
+the extensions that had to be implemented, and the future plans of the
+v3dv driver.</p>
+<h3 id="fun-with-border-colors-in-vulkan-ricardo-garcia"><a
+href="https://archive.fosdem.org/2022/schedule/event/vulkan_borders/">Fun
+with border colors in Vulkan</a> (Ricardo Garcia)</h3>
+<p>Ricardo presents the work he did on the
+<code>VK_EXT_border_color_swizzle</code> extension in Vulkan. He talks
+about the specific contributions he made and how the extension fits in
+with sampling color operations in Vulkan.</p>
<h2 id="gsoc-igalia-ce">GSoC &amp; Igalia CE</h2>
-<p>Last year Melissa &amp; André co-mentored contributors working on introducing KUnit tests to the AMD display driver. This project was hosted as a <a href="https://summerofcode.withgoogle.com/">“Google Summer of Code” (GSoC)</a> project from the X.Org Foundation. If you’re interested in seeing their work Tales da Aparecida, Maíra Canal, Magali Lemes, and Isabella Basso presented their work at the <a href="https://lpc.events/event/16/contributions/1310/">Linux Plumbers Conference 2022</a> and across two talks at XDC 2022. 
Here you can see their <a href="https://indico.freedesktop.org/event/2/contributions/65/">first</a> presentation and here you can see their <a href="https://indico.freedesktop.org/event/2/contributions/164/">second</a> second presentation.</p> -<p>André &amp; Melissa also mentored two <a href="https://www.igalia.com/coding-experience/">“Igalia Coding Experience” (CE)</a> projects, one related to IGT GPU test tools on the VKMS kernel driver, and the other for IGT GPU test tools on the V3D kernel driver. If you’re interested in reading up on some of that work, Maíra Canal <a href="https://mairacanal.github.io/january-update-finishing-my-igalia-ce/">wrote about her experience</a> being part of the Igalia CE.</p> -<p>Ella Stanforth was also part of the Igalia Coding Experience, being mentored by Iago &amp; Alejandro. They worked on the <code>VK_KHR_sampler_ycbcr_conversion</code> extension for the v3dv driver. Alejandro talks about their work in his <a href="https://blogs.igalia.com/apinheiro/2023/01/v3dv-status-update-2023-01/">blog post here</a>.</p> +<p>Last year Melissa &amp; André co-mentored contributors working on +introducing KUnit tests to the AMD display driver. This project was +hosted as a <a href="https://summerofcode.withgoogle.com/">“Google +Summer of Code” (GSoC)</a> project from the X.Org Foundation. If you’re +interested in seeing their work Tales da Aparecida, Maíra Canal, Magali +Lemes, and Isabella Basso presented their work at the <a +href="https://lpc.events/event/16/contributions/1310/">Linux Plumbers +Conference 2022</a> and across two talks at XDC 2022. 
Here you can see
+their <a
+href="https://indico.freedesktop.org/event/2/contributions/65/">first</a>
+presentation and here you can see their <a
+href="https://indico.freedesktop.org/event/2/contributions/164/">second</a>
+presentation.</p>
+<p>André &amp; Melissa also mentored two <a
+href="https://www.igalia.com/coding-experience/">“Igalia Coding
+Experience” (CE)</a> projects, one related to IGT GPU test tools on the
+VKMS kernel driver, and the other for IGT GPU test tools on the V3D
+kernel driver. If you’re interested in reading up on some of that work,
+Maíra Canal <a
+href="https://mairacanal.github.io/january-update-finishing-my-igalia-ce/">wrote
+about her experience</a> being part of the Igalia CE.</p>
+<p>Ella Stanforth was also part of the Igalia Coding Experience, being
+mentored by Iago &amp; Alejandro. They worked on the
+<code>VK_KHR_sampler_ycbcr_conversion</code> extension for the v3dv
+driver. Alejandro talks about their work in his <a
+href="https://blogs.igalia.com/apinheiro/2023/01/v3dv-status-update-2023-01/">blog
+post here</a>.</p>
<h1 id="whats-next">What’s Next?</h1>
-<p>The graphics team is looking forward to having a jam-packed 2023 with just as many if not more contributions to the Open-Source graphics stack! I’m super excited to be part of the team, and hope to see my name in our 2023 recap post!</p>
-<p>Also, you might have heard that <a href="https://www.igalia.com/2022/xdc-2023">Igalia will be hosting XDC 2023</a> in the beautiful city of A Coruña! We hope to see you there where there will be many presentations from all the great people working on the Open-Source graphics stack, and most importantly where you can <a href="https://www.youtube.com/watch?v=7hWcu8O9BjM">dream in the Atlantic!</a></p>
+<p>The graphics team is looking forward to having a jam-packed 2023 with
+just as many if not more contributions to the Open-Source graphics
+stack! 
I’m super excited to be part of the team, and hope to see my name
+in our 2023 recap post!</p>
+<p>Also, you might have heard that <a
+href="https://www.igalia.com/2022/xdc-2023">Igalia will be hosting XDC
+2023</a> in the beautiful city of A Coruña! We hope to see you there
+where there will be many presentations from all the great people working
+on the Open-Source graphics stack, and most importantly where you can <a
+href="https://www.youtube.com/watch?v=7hWcu8O9BjM">dream in the
+Atlantic!</a></p>
<figure>
-<img src="https://www.igalia.com/assets/i/news/XDC-event-banner.jpg" alt="Photo of A Coruña" /><figcaption aria-hidden="true">Photo of A Coruña</figcaption>
+<img src="https://www.igalia.com/assets/i/news/XDC-event-banner.jpg"
+alt="Photo of A Coruña" />
+<figcaption aria-hidden="true">Photo of A Coruña</figcaption>
</figure>
-</description><pubDate>Thu, 02 Feb 2023 05:00:00 -0000</pubDate><guid>https://fryzekconcepts.com/notes/2022_igalia_graphics_team.html</guid></item><item><title>Global Game Jam 2023 - GI Jam</title><link>https://fryzekconcepts.com/notes/global_game_jam_2023.html</link><description><p>At the beginning of this month I participated in the Games
+Institute’s Global Game Jam event. <a
+href="https://uwaterloo.ca/games-institute/">The Games Institute</a> is
+an organization at my local university (The University of Waterloo) that
+focuses on games-based research. They host a game jam every school term
+and this term’s jam happened to coincide with the Global Game Jam. Since
+this event was open to everyone (and it’s been a few years since I’ve
+been a student at UW 👴️), I joined up to try and stretch some of my more
+creative muscles. The event was a 48-hour game jam that began on Friday,
+February 3rd and ended on Sunday, February 5th.</p>
+<p>The game we created is called <a
+href="https://globalgamejam.org/2023/games/turtle-roots-5">Turtle
+Roots</a>, and it is a simple resource management game. 
You play as a magical turtle floating through the sky and collecting water in order to survive. The turtle can spend some of its “nutrients” to grow roots which will allow it to gather water and collect more nutrients. The challenge in the game is trying to survive for as long as possible without running out of water.</p> +</description><pubDate>Thu, 02 Feb 2023 05:00:00 -0000</pubDate><guid>https://fryzekconcepts.com/notes/2022_igalia_graphics_team.html</guid></item><item><title>Global Game Jam 2023 - GI Jam</title><link>https://fryzekconcepts.com/notes/global_game_jam_2023.html</link><description><p>At the beginning of this month I participated in the Games +Institutes’s Global Game Jam event. <a +href="https://uwaterloo.ca/games-institute/">The Games Institute</a> is +an organization at my local university (The University of Waterloo) that +focuses on games-based research. They host a game jam every school term +and this term’s jam happened to coincide with the Global Game Jam. Since +this event was open to everyone (and it’s been a few years since I’ve +been a student at UW 👴️), I joined up to try and stretch some of my more +creative muscles. The event was a 48-hour game jam that began on Friday, +February 3rd and ended on Sunday,February 5th.</p> +<p>The game we created is called <a +href="https://globalgamejam.org/2023/games/turtle-roots-5">Turtle +Roots</a>, and it is a simple resource management game. You play as a +magical turtle floating through the sky and collecting water in order to +survive. The turtle can spend some of its “nutrients” to grow roots +which will allow it to gather water and collect more nutrients. 
The +challenge in the game is trying to survive for as long as possible +without running out of water.</p>
<div class="gallery">
-<p><img src="/assets/global_game_jam_2023/screen_shot_1.png" /> <img src="/assets/global_game_jam_2023/screen_shot_2.png" /> <img src="/assets/global_game_jam_2023/screen_shot_3.png" /></p>
+<p><img src="/assets/global_game_jam_2023/screen_shot_1.png" /> <img
+src="/assets/global_game_jam_2023/screen_shot_2.png" /> <img
+src="/assets/global_game_jam_2023/screen_shot_3.png" /></p>
<p>Screenshots of Turtle Roots</p>
</div>
-<p>The game we created is called <a href="https://globalgamejam.org/2023/games/turtle-roots-5">Turtle Roots</a>, and it is a simple resource management game. You play as a magical turtle floating through the sky and collecting water in order to survive. The turtle can spend some of its “nutrients” to grow roots which will allow it to gather water and collect more nutrients. The challenge in the game is trying to survive for as long as possible without running out of water.</p>
<h2 id="the-team">The Team</h2>
-<p>I attended the event solo and quickly partnered up with two other people, who also attended solo. One member had already participated in a game jam before and specialized in art. The other member was attending a game jam for the first time and was looking for the best way they could contribute. Having particular skills for sound, they ended up creating all the audio in our game. 
This left me as the sole programmer for our team.</p> +<p>I attended the event solo and quickly partnered up with two other +people, who also attended solo. One member had already participated in a +game jam before and specialized in art. The other member was attending a +game jam for the first time and was looking for the best way they could +contribute. Having particular skills for sound, they ended up creating +all the audio in our game. This left me as the sole programmer for our +team.</p> <h2 id="my-game-jam-experiences">My Game Jam Experiences</h2> -<p>In recent years,I participated in a <a href="/notes/n64brew-gamejam-2021.html">Nintendo 64 homebrew game jam</a> and the Puerto Rico Game Developers Association event for the global game jam, submitting <a href="https://globalgamejam.org/2022/games/magnetic-parkour-6">Magnetic Parkour</a>, I also participated in <a href="https://ldjam.com/">Ludum Dare</a> back around 2013 but unfortunately I’ve since lost the link to my submission. While in high school, my friend and I participated in the “Ottawa Tech Jame” (similar to a game jam), sort of worked like a game jam called “Ottawa Tech Jam” submitting <a href="http://www.fastquake.com/projects/zorvwarz/">Zorv Warz</a> and <a href="http://www.fastquake.com/projects/worldseed/">E410</a>. As you can probably tell, I really like gamedev. The desire to build my own video games is actually what originally got me into programming. When I was around 14 years old, I picked up a C++ programming book from the library since I wanted to try to build my own game and I heard most game developers use C++. I used some proprietary game development library (that I can’t recall the name of)to build 2D and 3D games in Windows using C++. I didn’t really get too far into it until high school when I started to learn SFML, SDL, and OpenGL. I also dabbled with Unity during that time as well. 
However,I’ve always had a strong desire to build most of the foundation of the game myself without using an engine. You can see this desire really come out in the work I did for Zorv Warz, E410, and the N64 homebrew game jam. When working with a team, I feel it can be a lot easier to use a game engine, even if it doesn’t scratch the same itch for me.</p>
+<p>In recent years, I participated in a <a
+href="/notes/n64brew-gamejam-2021.html">Nintendo 64 homebrew game
+jam</a> and the Puerto Rico Game Developers Association event for the
+global game jam, submitting <a
+href="https://globalgamejam.org/2022/games/magnetic-parkour-6">Magnetic
+Parkour</a>. I also participated in <a href="https://ldjam.com/">Ludum
+Dare</a> back around 2013, but unfortunately I’ve since lost the link to
+my submission. While in high school, my friend and I participated in the
+“Ottawa Tech Jam”, which worked much like a game jam, submitting <a
+href="http://www.fastquake.com/projects/zorvwarz/">Zorv Warz</a> and <a
+href="http://www.fastquake.com/projects/worldseed/">E410</a>. As you can
+probably tell, I really like gamedev. The desire to build my own video
+games is actually what originally got me into programming. When I was
+around 14 years old, I picked up a C++ programming book from the library
+since I wanted to try to build my own game and I heard most game
+developers use C++. I used some proprietary game development library
+(that I can’t recall the name of) to build 2D and 3D games in Windows
+using C++. I didn’t really get too far into it until high school, when I
+started to learn SFML, SDL, and OpenGL. I also dabbled with Unity during
+that time as well. However, I’ve always had a strong desire to build most
+of the foundation of the game myself without using an engine. You can
+see this desire really come out in the work I did for Zorv Warz, E410,
+and the N64 homebrew game jam.
When working with a team, I feel it can +be a lot easier to use a game engine, even if it doesn’t scratch the +same itch for me.</p> <h2 id="the-tech-behind-the-game">The Tech Behind the Game</h2> -<p>Lately I’ve had a growing interest in the game engine called <a href="https://godotengine.org/">Godot</a>, and wanted to use this opportunity to learn the engine more and build a game in it. Godot is interesting to me as its a completely open source game engine, and as you can probably guess from my <a href="/notes/2022_igalia_graphics_team.html">job</a>, open source software as well as free software is something I’m particularly interested in.</p> -<p>Godot is a really powerful game engine that handles a lot of complexity for you. For example,it has a built in parallax background component, that we took advantage of to add more depth to our game. This allows you to control the background scrolling speed for different layer of the background, giving the illusion of depth in a 2D game.</p> -<p>Another powerful feature of Godot is its physics engine. Godot makes it really easy to create physics objects in your scene and have them do interesting stuff. You might be wondering where physics comes into play in our game, and we actually use it for the root animations. I set up a sort of “rag doll” system for the roots to make them flop around in the air as the player moves, really giving a lot more “life” to an otherwise static game.</p> -<p>Godot has a built in scripting language called “GDScript” which is very similar to Python. I’ve really grown to like this language. It has an optional type system you can take advantage of that helps with reducing the number of bugs that exist in your game. It also has great connectivity with the editor. This proved useful as I could “export” variables in the game and allow my team members to modify certain parameters of the game without knowing any programming. 
This is super helpful with balancing, and more easily allows non-technical members of team to contribute to the game logic in a more concrete way.</p>
-<p>Overall I’m very happy with how our game turned out. Last year I tried to participate in a few more game jams, but due to a combination of lack of personal motivation, poor team dynamics, and other factors, none of those game jams panned out. This was the first game jam in a while where I feel like I really connected with my team and I also feel like we made a super polished and fun game in the end.</p>
+<p>Lately I’ve had a growing interest in the game engine called <a
+href="https://godotengine.org/">Godot</a>, and wanted to use this
+opportunity to learn the engine more and build a game in it. Godot is
+interesting to me as it’s a completely open source game engine, and as
+you can probably guess from my <a
+href="/notes/2022_igalia_graphics_team.html">job</a>, open source
+software as well as free software is something I’m particularly
+interested in.</p>
+<p>Godot is a really powerful game engine that handles a lot of
+complexity for you. For example, it has a built-in parallax background
+component that we took advantage of to add more depth to our game. This
+allows you to control the background scrolling speed for different layers
+of the background, giving the illusion of depth in a 2D game.</p>
+<p>Another powerful feature of Godot is its physics engine. Godot makes
+it really easy to create physics objects in your scene and have them do
+interesting stuff. You might be wondering where physics comes into play
+in our game, and we actually use it for the root animations. I set up a
+sort of “rag doll” system for the roots to make them flop around in the
+air as the player moves, really giving a lot more “life” to an otherwise
+static game.</p>
+<p>Godot has a built-in scripting language called “GDScript” which is
+very similar to Python. I’ve really grown to like this language. It has
+an optional type system you can take advantage of that helps with
+reducing the number of bugs that exist in your game. It also has great
+connectivity with the editor. This proved useful as I could “export”
+variables in the game and allow my team members to modify certain
+parameters of the game without knowing any programming. This is super
+helpful with balancing, and more easily allows non-technical members of
+the team to contribute to the game logic in a more concrete way.</p>
+<p>Overall I’m very happy with how our game turned out. Last year I
+tried to participate in a few more game jams, but due to a combination
+of lack of personal motivation, poor team dynamics, and other factors,
+none of those game jams panned out. This was the first game jam in a
+while where I feel like I really connected with my team and I also feel
+like we made a super polished and fun game in the end.</p>
</description><pubDate>Sat, 11 Feb 2023 05:00:00 -0000</pubDate><guid>https://fryzekconcepts.com/notes/global_game_jam_2023.html</guid></item><item><title>Journey Through Freedreno</title><link>https://fryzekconcepts.com/notes/freedreno_journey.html</link><description><figure>
-<img src="/assets/freedreno/glinfo_freedreno.png" alt="Android running Freedreno" /><figcaption aria-hidden="true">Android running Freedreno</figcaption>
+<img src="/assets/freedreno/glinfo_freedreno.png"
+alt="Android running Freedreno" />
+<figcaption aria-hidden="true">Android running Freedreno</figcaption>
</figure>
-<p>As part of my training at Igalia I’ve been attempting to write a new backend for Freedreno that targets the proprietary “KGSL” kernel mode driver. For those unaware there are two “main” kernel mode drivers on Qualcomm SOCs for the GPU, there is the “MSM”, and “KGSL”. “MSM” is DRM compliant, and Freedreno already able to run on this driver. “KGSL” is the proprietary KMD that Qualcomm’s proprietary userspace driver targets. 
Now why would you want to run freedreno against KGSL, when MSM exists? Well there are a few ones, first MSM only really works on an up-streamed kernel, so if you have to run a down-streamed kernel you can continue using the version of KGSL that the manufacturer shipped with your device. Second this allows you to run both the proprietary adreno driver and the open source freedreno driver on the same device just by swapping libraries, which can be very nice for quickly testing something against both drivers.</p>
+<p>As part of my training at Igalia I’ve been attempting to write a new
+backend for Freedreno that targets the proprietary “KGSL” kernel mode
+driver. For those unaware, there are two “main” kernel mode drivers for
+the GPU on Qualcomm SOCs: “MSM” and “KGSL”. “MSM” is DRM
+compliant, and Freedreno is already able to run on this driver. “KGSL” is
+the proprietary KMD that Qualcomm’s proprietary userspace driver
+targets. Now why would you want to run Freedreno against KGSL when MSM
+exists? Well, there are a few reasons. First, MSM only really works on an
+upstream kernel, so if you have to run a downstream kernel you can
+continue using the version of KGSL that the manufacturer shipped with
+your device. Second, this allows you to run both the proprietary Adreno
+driver and the open source Freedreno driver on the same device just by
+swapping libraries, which can be very nice for quickly testing something
+against both drivers.</p>
<h2 id="when-drm-isnt-just-drm">When “DRM” isn’t just “DRM”</h2>
-<p>When working on a new backend, one of the critical things to do is to make use of as much “common code” as possible. This has a number of benefits, least of all reducing the amount of code you have to write. 
It also allows reduces the number of bugs that will likely exist as you are relying on well tested code, and it ensures that the backend is mostly likely going to continue to work with new driver updates.</p>
-<p>When I started the work for a new backend I looked inside mesa’s <code>src/freedreno/drm</code> folder. This has the current backend code for Freedreno, and its already modularized to support multiple backends. It currently has support for the above mentioned MSM kernel mode driver as well as virtio (a backend that allows Freedreno to be used from within in a virtualized environment). From the name of this path, you would think that the code in this module would only work with kernel mode drivers that implement DRM, but actually there is only a handful of places in this module where DRM support is assumed. This made it a good starting point to introduce the KGSL backend and piggy back off the common code.</p>
-<p>For example the <code>drm</code> module has a lot of code to deal with the management of synchronization primitives, buffer objects, and command submit lists. All managed at a abstraction above “DRM” and to re-implement this code would be a bad idea.</p>
+<p>When working on a new backend, one of the critical things to do is to
+make use of as much “common code” as possible. This has a number of
+benefits, not least of which is reducing the amount of code you have to write. It
+also reduces the number of bugs that will likely exist, as you are
+relying on well-tested code, and it ensures that the backend is most
+likely going to continue to work with new driver updates.</p>
+<p>When I started the work for a new backend I looked inside mesa’s
+<code>src/freedreno/drm</code> folder. This has the current backend code
+for Freedreno, and it’s already modularized to support multiple backends.
+It currently has support for the above-mentioned MSM kernel mode driver
+as well as virtio (a backend that allows Freedreno to be used from
+within a virtualized environment). From the name of this path, you
+would think that the code in this module would only work with kernel
+mode drivers that implement DRM, but actually there are only a handful of
+places in this module where DRM support is assumed. This made it a good
+starting point to introduce the KGSL backend and piggyback off the
+common code.</p>
+<p>For example, the <code>drm</code> module has a lot of code to deal
+with the management of synchronization primitives, buffer objects, and
+command submit lists. It is all managed at an abstraction above “DRM”, and to
+re-implement this code would be a bad idea.</p>
<h2 id="how-to-get-android-to-behave">How to get Android to behave</h2>
-<p>One of this big struggles with getting the KGSL backend working was figuring out how I could get Android to load mesa instead of Qualcomm blob driver that is shipped with the device image. Thankfully a good chunk of this work has already been figured out when the Turnip developers (Turnip is the open source Vulkan implementation for Adreno GPUs) figured out how to get Turnip running on android with KGSL. Thankfully one of my coworkers <a href="https://blogs.igalia.com/dpiliaiev/">Danylo</a> is one of those Turnip developers, and he gave me a lot of guidance on getting Android setup. One thing to watch out for is the outdated instructions <a href="https://docs.mesa3d.org/android.html">here</a>. These instructions <em>almost</em> work, but require some modifications. First if you’re using a more modern version of the Android NDK, the compiler has been replaced with LLVM/Clang, so you need to change which compiler is being used. Second flags like <code>system</code> in the cross compiler script incorrectly set the system as <code>linux</code> instead of <code>android</code>. I had success using the below cross compiler script. 
Take note that the compiler paths need to be updated to match where you extracted the android NDK on your system.</p>
+<p>One of the big struggles with getting the KGSL backend working was
+figuring out how I could get Android to load mesa instead of the Qualcomm
+blob driver that is shipped with the device image. Thankfully a good
+chunk of this work had already been figured out when the Turnip
+developers (Turnip is the open source Vulkan implementation for Adreno
+GPUs) worked out how to get Turnip running on Android with KGSL.
+One of my coworkers <a
+href="https://blogs.igalia.com/dpiliaiev/">Danylo</a> is one of those
+Turnip developers, and he gave me a lot of guidance on getting Android
+set up. One thing to watch out for is the outdated instructions <a
+href="https://docs.mesa3d.org/android.html">here</a>. These instructions
+<em>almost</em> work, but require some modifications. First, if you’re
+using a more modern version of the Android NDK, the compiler has been
+replaced with LLVM/Clang, so you need to change which compiler is being
+used. Second, flags like <code>system</code> in the cross compiler script
+incorrectly set the system as <code>linux</code> instead of
+<code>android</code>. I had success using the below cross compiler
+script. 
Take note that the compiler paths need to be updated to match +where you extracted the android NDK on your system.</p> <pre class="meson"><code>[binaries] ar = &#39;/home/lfryzek/Documents/projects/igalia/freedreno/android-ndk-r25b-linux/android-ndk-r25b/toolchains/llvm/prebuilt/linux-x86_64/bin/llvm-ar&#39; c = [&#39;ccache&#39;, &#39;/home/lfryzek/Documents/projects/igalia/freedreno/android-ndk-r25b-linux/android-ndk-r25b/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android29-clang&#39;] @@ -783,23 +1575,136 @@ system = &#39;android&#39; cpu_family = &#39;arm&#39; cpu = &#39;armv8&#39; endian = &#39;little&#39;</code></pre> -<p>Another thing I had to figure out with Android, that was different with these instructions, was how I would get Android to load mesa versions of mesa libraries. That’s when my colleague <a href="https://www.igalia.com/team/mark">Mark</a> pointed out to me that Android is open source and I could just check the source code myself. Sure enough you have find the OpenGL driver loader in <a href="https://android.googlesource.com/platform/frameworks/native/+/master/opengl/libs/EGL/Loader.cpp">Android’s source code</a>. From this code we can that Android will try to load a few different files based on some settings, and in my case it would try to load 3 different shaded libraries in the <code>/vendor/lib64/egl</code> folder, <code>libEGL_adreno.so</code> ,<code>libGLESv1_CM_adreno.so</code>, and <code>libGLESv2.so</code>. I could just replace these libraries with the version built from mesa and voilà, you’re now loading a custom driver! This realization that I could just “read the code” was very powerful in debugging some more android specific issues I ran into, like dealing with gralloc.</p> -<p>Something cool that the opensource Freedreno &amp; Turnip driver developers figured out was getting android to run test OpenGL applications from the adb shell without building android APKs. 
If you check out the <a href="https://gitlab.freedesktop.org/freedreno/freedreno">freedreno repo</a>, they have an <code>ndk-build.sh</code> script that can build tests in the <code>tests-*</code> folder. The nice benefit of this is that it provides an easy way to run simple test cases without worrying about the android window system integration. Another nifty feature about this repo is the <code>libwrap</code> tool that lets trace the commands being submitted to the GPU.</p>
+<p>Another thing I had to figure out with Android, which was different
+from these instructions, was how I would get Android to load the mesa
+versions of the graphics libraries. That’s when my colleague <a
+href="https://www.igalia.com/team/mark">Mark</a> pointed out to me that
+Android is open source and I could just check the source code myself.
+Sure enough, you can find the OpenGL driver loader in <a
+href="https://android.googlesource.com/platform/frameworks/native/+/master/opengl/libs/EGL/Loader.cpp">Android’s
+source code</a>. From this code we can see that Android will try to load a
+few different files based on some settings, and in my case it would try
+to load 3 different shared libraries in the
+<code>/vendor/lib64/egl</code> folder: <code>libEGL_adreno.so</code>,
+<code>libGLESv1_CM_adreno.so</code>, and <code>libGLESv2.so</code>. I
+could just replace these libraries with the versions built from mesa and
+voilà, you’re now loading a custom driver! This realization that I could
+just “read the code” was very powerful in debugging some more
+Android-specific issues I ran into, like dealing with gralloc.</p>
+<p>Something cool that the open-source Freedreno &amp; Turnip driver
+developers figured out was getting Android to run test OpenGL
+applications from the adb shell without building Android APKs.
If you
+check out the <a
+href="https://gitlab.freedesktop.org/freedreno/freedreno">freedreno
+repo</a>, they have an <code>ndk-build.sh</code> script that can build
+tests in the <code>tests-*</code> folder. The nice benefit of this is
+that it provides an easy way to run simple test cases without worrying
+about the Android window system integration. Another nifty feature of
+this repo is the <code>libwrap</code> tool that lets you trace the commands
+being submitted to the GPU.</p>
<h2 id="what-even-is-gralloc">What even is Gralloc?</h2>
-<p>Gralloc is the graphics memory allocated in Android, and the OS will use it to allocate the surface for “windows”. This means that the memory we want to render the display to is managed by gralloc and not our KGSL backend. This means we have to get all the information about this surface from gralloc, and if you look in <code>src/egl/driver/dri2/platform_android.c</code> you will see existing code for handing gralloc. You would think “Hey there is no work for me here then”, but you would be wrong. The handle gralloc provides is hardware specific, and the code in <code>platform_android.c</code> assumes a DRM gralloc implementation. Thankfully the turnip developers had already gone through this struggle and if you look in <code>src/freedreno/vulkan/tu_android.c</code> you can see they have implemented a separate path when a Qualcomm msm implementation of gralloc is detected. I could copy this detection logic and add a separate path to <code>platform_android.c</code>.</p>
-<h2 id="working-with-the-freedreno-community">Working with the Freedreno community</h2>
-<p>When working on any project (open-source or otherwise), it’s nice to know that you aren’t working alone. Thankfully the <code>#freedreno</code> channel on <code>irc.oftc.net</code> is very active and full of helpful people to answer any questions you may have. 
While working on the backend, one area I wasn’t really sure how to address was the synchronization code for buffer objects. The backend exposed a function called <code>cpu_prep</code>, This function was just there to call the DRM implementation of <code>cpu_prep</code> on the buffer object. I wasn’t exactly sure how to implement this functionality with KGSL since it doesn’t use DRM buffer objects.</p> -<p>I ended up reaching out to the IRC channel and Rob Clark on the channel explained to me that he was actually working on moving a lot of the code for <code>cpu_prep</code> into common code so that a non-drm driver (like the KGSL backend I was working on) would just need to implement that operation as NOP (no operation).</p> -<h2 id="dealing-with-bugs-reverse-engineering-the-blob">Dealing with bugs &amp; reverse engineering the blob</h2> -<p>I encountered a few different bugs when implementing the KGSL backend, but most of them consisted of me calling KGSL wrong, or handing synchronization incorrectly. Thankfully since Turnip is already running on KGSL, I could just more carefully compare my code to what Turnip is doing and figure out my logical mistake.</p> -<p>Some of the bugs I encountered required the backend interface in Freedreno to be modified to expose per a new per driver implementation of that backend function, instead of just using a common implementation. For example the existing function to map a buffer object into userspace assumed that the same <code>fd</code> for the device could be used for the buffer object in the <code>mmap</code> call. This worked fine for any buffer objects we created through KGSL but would not work for buffer objects created from gralloc (remember the above section on surface memory for windows comming from gralloc). 
To resolve this issue I exposed a new per backend implementation of “map” where I could take a different path if the buffer object came from gralloc.</p> -<p>While testing the KGSL backend I did encounter a new bug that seems to effect both my new KGSL backend and the Turnip KGSL backend. The bug is an <code>iommu fault</code> that occurs when the surface allocated by gralloc does not have a height that is aligned to 4. The blitting engine on a6xx GPUs copies in 16x4 chunks, so if the height is not aligned by 4 the GPU will try to write to pixels that exists outside the allocated memory. This issue only happens with KGSL backends since we import memory from gralloc, and gralloc allocates exactly enough memory for the surface, with no alignment on the height. If running on any other platform, the <code>fdl</code> (Freedreno Layout) code would be called to compute the minimum required size for a surface which would take into account the alignment requirement for the height. The blob driver Qualcomm didn’t seem to have this problem, even though its getting the exact same buffer from gralloc. So it must be doing something different to handle the none aligned height.</p> -<p>Because this issue relied on gralloc, the application needed to running as an Android APK to get a surface from gralloc. The best way to fix this issue would be to figure out what the blob driver is doing and try to replicate this behavior in Freedreno (assuming it isn’t doing something silly like switch to sysmem rendering). Unfortunately it didn’t look like the libwrap library worked to trace an APK.</p> -<p>The libwrap library relied on a linux feature known as <code>LD_PRELOAD</code> to load <code>libwrap.so</code> when the application starts and replace the system functions like <code>open</code> and <code>ioctl</code> with their own implementation that traces what is being submitted to the KGSL kernel mode driver. 
Thankfully android exposes this <code>LD_PRELOAD</code> mechanism through its “wrap” interface where you create a propety called <code>wrap.&lt;app-name&gt;</code> with a value <code>LD_PRELOAD=&lt;path to libwrap.so&gt;</code>. Android will then load your library like would be done in a normal linux shell. If you tried to do this with libwrap though you find very quickly that you would get corrupted traces. When android launches your APK, it doesn’t only launch your application, there are different threads for different android system related functions and some of them can also use OpenGL. The libwrap library is not designed to handle multiple threads using KGSL at the same time. After discovering this issue I created a <a href="https://gitlab.freedesktop.org/freedreno/freedreno/-/merge_requests/22">MR</a> that would store the tracing file handles as TLS (thread local storage) preventing the clobbering of the trace file, and also allowing you to view the traces generated by different threads separately from each other.</p>
-<p>With this is in hand one could begin investing what the blob driver is doing to handle this unaligned surfaces.</p>
+<p>Gralloc is the graphics memory allocator in Android, and the OS will
+use it to allocate the surface for “windows”. This means that the memory
+we want to render the display to is managed by gralloc and not our KGSL
+backend, so we have to get all the information about this
+surface from gralloc, and if you look in
+<code>src/egl/driver/dri2/platform_android.c</code> you will see
+existing code for handling gralloc. You would think “Hey, there is no work
+for me here then”, but you would be wrong. The handle gralloc provides
+is hardware specific, and the code in <code>platform_android.c</code>
+assumes a DRM gralloc implementation. 
Thankfully the Turnip developers
+had already gone through this struggle, and if you look in
+<code>src/freedreno/vulkan/tu_android.c</code> you can see they have
+implemented a separate path when a Qualcomm msm implementation of
+gralloc is detected. I could copy this detection logic and add a
+separate path to <code>platform_android.c</code>.</p>
+<h2 id="working-with-the-freedreno-community">Working with the Freedreno
+community</h2>
+<p>When working on any project (open-source or otherwise), it’s nice to
+know that you aren’t working alone. Thankfully the
+<code>#freedreno</code> channel on <code>irc.oftc.net</code> is very
+active and full of helpful people to answer any questions you may have.
+While working on the backend, one area I wasn’t really sure how to
+address was the synchronization code for buffer objects. The backend
+exposed a function called <code>cpu_prep</code>. This function was just
+there to call the DRM implementation of <code>cpu_prep</code> on the
+buffer object. I wasn’t exactly sure how to implement this functionality
+with KGSL since it doesn’t use DRM buffer objects.</p>
+<p>I ended up reaching out on the IRC channel, and Rob Clark
+explained to me that he was actually working on moving a lot of
+the code for <code>cpu_prep</code> into common code so that a non-DRM
+driver (like the KGSL backend I was working on) would just need to
+implement that operation as a NOP (no operation).</p>
+<h2 id="dealing-with-bugs-reverse-engineering-the-blob">Dealing with
+bugs &amp; reverse engineering the blob</h2>
+<p>I encountered a few different bugs when implementing the KGSL
+backend, but most of them consisted of me calling KGSL wrong, or handling
+synchronization incorrectly. 
Thankfully since Turnip is already running +on KGSL, I could just more carefully compare my code to what Turnip is +doing and figure out my logical mistake.</p> +<p>Some of the bugs I encountered required the backend interface in +Freedreno to be modified to expose per a new per driver implementation +of that backend function, instead of just using a common implementation. +For example the existing function to map a buffer object into userspace +assumed that the same <code>fd</code> for the device could be used for +the buffer object in the <code>mmap</code> call. This worked fine for +any buffer objects we created through KGSL but would not work for buffer +objects created from gralloc (remember the above section on surface +memory for windows comming from gralloc). To resolve this issue I +exposed a new per backend implementation of “map” where I could take a +different path if the buffer object came from gralloc.</p> +<p>While testing the KGSL backend I did encounter a new bug that seems +to effect both my new KGSL backend and the Turnip KGSL backend. The bug +is an <code>iommu fault</code> that occurs when the surface allocated by +gralloc does not have a height that is aligned to 4. The blitting engine +on a6xx GPUs copies in 16x4 chunks, so if the height is not aligned by 4 +the GPU will try to write to pixels that exists outside the allocated +memory. This issue only happens with KGSL backends since we import +memory from gralloc, and gralloc allocates exactly enough memory for the +surface, with no alignment on the height. If running on any other +platform, the <code>fdl</code> (Freedreno Layout) code would be called +to compute the minimum required size for a surface which would take into +account the alignment requirement for the height. The blob driver +Qualcomm didn’t seem to have this problem, even though its getting the +exact same buffer from gralloc. 
So it must be doing something different
+to handle the non-aligned height.</p>
+<p>Because this issue relied on gralloc, the application needed to be
+running as an Android APK to get a surface from gralloc. The best way to
+fix this issue would be to figure out what the blob driver is doing and
+try to replicate this behavior in Freedreno (assuming it isn’t doing
+something silly like switching to sysmem rendering). Unfortunately it
+didn’t look like the libwrap library worked to trace an APK.</p>
+<p>The libwrap library relied on a Linux feature known as
+<code>LD_PRELOAD</code> to load <code>libwrap.so</code> when the
+application starts and replace system functions like
+<code>open</code> and <code>ioctl</code> with its own implementations
+that trace what is being submitted to the KGSL kernel mode driver.
+Thankfully Android exposes this <code>LD_PRELOAD</code> mechanism
+through its “wrap” interface, where you create a property called
+<code>wrap.&lt;app-name&gt;</code> with a value
+<code>LD_PRELOAD=&lt;path to libwrap.so&gt;</code>. Android will then
+load your library just as would be done in a normal Linux shell. If you
+tried to do this with libwrap, though, you would quickly find that you
+get corrupted traces. When Android launches your APK, it doesn’t
+only launch your application; there are different threads for different
+Android system-related functions, and some of them can also use OpenGL.
+The libwrap library is not designed to handle multiple threads using
+KGSL at the same time.
After discovering this issue I created a <a
+href="https://gitlab.freedesktop.org/freedreno/freedreno/-/merge_requests/22">MR</a>
+that would store the tracing file handles as TLS (thread-local storage),
+preventing the clobbering of the trace file, and also allowing you to
+view the traces generated by different threads separately from each
+other.</p>
+<p>With this in hand, one could begin investigating what the blob driver
+is doing to handle these unaligned surfaces.</p>
<h2 id="whats-next">What’s next?</h2>
-<p>Well the next obvious thing to fix is the aligned height issue which is still open. I’ve also worked on upstreaming my changes with this <a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21570">WIP MR</a>.</p>
+<p>Well, the next obvious thing to fix is the aligned-height issue, which
+is still open. I’ve also worked on upstreaming my changes with this <a
+href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21570">WIP
+MR</a>.</p>
<figure>
-<img src="/assets/freedreno/3d-mark.png" alt="Freedreno running 3d-mark" /><figcaption aria-hidden="true">Freedreno running 3d-mark</figcaption>
+<img src="/assets/freedreno/3d-mark.png"
+alt="Freedreno running 3d-mark" />
+<figcaption aria-hidden="true">Freedreno running 3d-mark</figcaption>
</figure>
</description><pubDate>Tue, 28 Feb 2023 05:00:00 -0000</pubDate><guid>https://fryzekconcepts.com/notes/freedreno_journey.html</guid></item></channel></rss>
\ No newline at end of file diff --git a/html/graphics_feed.xml b/html/graphics_feed.xml index d46431f..8dca2db 100644 --- a/html/graphics_feed.xml +++ b/html/graphics_feed.xml @@ -1,24 +1,74 @@ <?xml version='1.0' encoding='UTF-8'?> -<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title>Fryzek Concepts</title><atom:link href="https://fryzekconcepts.com/feed.xml" rel="self" type="application/rss+xml"/><link>https://fryzekconcepts.com</link><description>Lucas is a developer working on cool things</description><lastBuildDate>Sun, 02 Apr 2023 13:27:21 -0000</lastBuildDate><item><title>2022 Graphics Team Contributions at Igalia</title><link>https://fryzekconcepts.com/notes/2022_igalia_graphics_team.html</link><description><p>This year I started a new job working with <a href="https://www.igalia.com/technology/graphics">Igalia’s Graphics Team</a>. For those of you who don’t know <a href="https://www.igalia.com/">Igalia</a> they are a <a href="https://en.wikipedia.org/wiki/Igalia">“worker-owned, employee-run cooperative model consultancy focused on open source software”</a>.</p> -<p>As a new member of the team, I thought it would be a great idea to summarize the incredible amount of work the team completed in 2022. If you’re interested keep reading!</p> -<h2 id="vulkan-1.2-conformance-on-rpi-4">Vulkan 1.2 Conformance on RPi 4</h2> -<p>One of the big milestones for the team in 2022 was <a href="https://www.khronos.org/conformance/adopters/conformant-products#submission_694">achieving Vulkan 1.2 conformance on the Raspberry Pi 4</a>. The folks over at the Raspberry Pi company wrote a nice <a href="https://www.raspberrypi.com/news/vulkan-update-version-1-2-conformance-for-raspberry-pi-4/">article</a> about the achievement. 
Igalia has been partnering with the Raspberry Pi company to bring build and improve the graphics driver on all versions of the Raspberry Pi.</p> -<p>The Vulkan 1.2 spec ratification came with a few <a href="https://registry.khronos.org/vulkan/specs/1.2-extensions/html/vkspec.html#versions-1.2">extensions</a> that were promoted to Core. This means a conformant Vulkan 1.2 driver needs to implement those extensions. Alejandro Piñeiro wrote this interesting <a href="https://blogs.igalia.com/apinheiro/2022/05/v3dv-status-update-2022-05-16/">blog post</a> that talks about some of those extensions.</p> -<p>Vulkan 1.2 also came with a number of optional extensions such as <code>VK_KHR_pipeline_executable_properties</code>. My colleague Iago Toral wrote an excellent <a href="https://blogs.igalia.com/itoral/2022/05/09/vk_khr_pipeline_executables/">blog post</a> on how we implemented that extension on the Raspberry Pi 4 and what benefits it provides for debugging.</p> +<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title>Fryzek Concepts</title><atom:link href="https://fryzekconcepts.com/feed.xml" rel="self" type="application/rss+xml"/><link>https://fryzekconcepts.com</link><description>Lucas is a developer working on cool things</description><lastBuildDate>Fri, 28 Apr 2023 20:57:14 -0000</lastBuildDate><item><title>2022 Graphics Team Contributions at Igalia</title><link>https://fryzekconcepts.com/notes/2022_igalia_graphics_team.html</link><description><p>This year I started a new job working with <a +href="https://www.igalia.com/technology/graphics">Igalia’s Graphics +Team</a>. For those of you who don’t know <a +href="https://www.igalia.com/">Igalia</a> they are a <a +href="https://en.wikipedia.org/wiki/Igalia">“worker-owned, employee-run +cooperative model consultancy focused on open source software”</a>.</p> +<p>As a new member of the team, I thought it would be a great idea to +summarize the incredible amount of work the team completed in 2022. 
If
+you’re interested, keep reading!</p>
+<h2 id="vulkan-1.2-conformance-on-rpi-4">Vulkan 1.2 Conformance on RPi
+4</h2>
+<p>One of the big milestones for the team in 2022 was <a
+href="https://www.khronos.org/conformance/adopters/conformant-products#submission_694">achieving
+Vulkan 1.2 conformance on the Raspberry Pi 4</a>. The folks over at the
+Raspberry Pi company wrote a nice <a
+href="https://www.raspberrypi.com/news/vulkan-update-version-1-2-conformance-for-raspberry-pi-4/">article</a>
+about the achievement. Igalia has been partnering with the Raspberry Pi
+company to build and improve the graphics driver on all versions
+of the Raspberry Pi.</p>
+<p>The Vulkan 1.2 spec ratification came with a few <a
+href="https://registry.khronos.org/vulkan/specs/1.2-extensions/html/vkspec.html#versions-1.2">extensions</a>
+that were promoted to Core. This means a conformant Vulkan 1.2 driver
+needs to implement those extensions. Alejandro Piñeiro wrote this
+interesting <a
+href="https://blogs.igalia.com/apinheiro/2022/05/v3dv-status-update-2022-05-16/">blog
+post</a> that talks about some of those extensions.</p>
+<p>Vulkan 1.2 also came with a number of optional extensions such as
+<code>VK_KHR_pipeline_executable_properties</code>. My colleague Iago
+Toral wrote an excellent <a
+href="https://blogs.igalia.com/itoral/2022/05/09/vk_khr_pipeline_executables/">blog
+post</a> on how we implemented that extension on the Raspberry Pi 4 and
+what benefits it provides for debugging.</p>
<h2 id="vulkan-1.3-support-on-turnip">Vulkan 1.3 support on Turnip</h2>
-<p>Igalia has been heavily supporting the Open-Source Turnip Vulkan driver for Qualcomm Adreno GPUs, and in 2022 we helped it achieve Vulkan 1.3 conformance. Danylo Piliaiev on the graphics team here at Igalia, wrote a great <a href="https://blogs.igalia.com/dpiliaiev/turnip-vulkan-1-3/">blog post</a> on this achievement!
One of the biggest challenges for the Turnip driver is that it is a completely reverse-engineered driver that has been built without access to any hardware documentation or reference driver code.</p>
-<p>With Vulkan 1.3 conformance has also come the ability to run more commercial games on Adreno GPUs through the use of the DirectX translation layers. If you would like to see more of this check out this <a href="https://blogs.igalia.com/dpiliaiev/turnip-july-2022-update/">post</a> from Danylo where he talks about getting “The Witcher 3”, “The Talos Principle”, and “OMD2” running on the A660 GPU.
+<p>Igalia has been heavily supporting the Open-Source Turnip Vulkan
+driver for Qualcomm Adreno GPUs, and in 2022 we helped it achieve Vulkan
+1.3 conformance. Danylo Piliaiev, on the graphics team here at Igalia,
+wrote a great <a
+href="https://blogs.igalia.com/dpiliaiev/turnip-vulkan-1-3/">blog
+post</a> on this achievement! One of the biggest challenges for the
+Turnip driver is that it is a completely reverse-engineered driver that
+has been built without access to any hardware documentation or reference
+driver code.</p>
+<p>With Vulkan 1.3 conformance has also come the ability to run more
+commercial games on Adreno GPUs through the use of the DirectX
+translation layers. If you would like to see more of this, check out this
+<a
+href="https://blogs.igalia.com/dpiliaiev/turnip-july-2022-update/">post</a>
+from Danylo, where he talks about getting “The Witcher 3”, “The Talos
+Principle”, and “OMD2” running on the A660 GPU.
Outside of Vulkan 1.3 +support he also talks about some of the extensions that were implemented +to allow “Zink” (the OpenGL over Vulkan driver) to run Turnip, and bring +OpenGL 4.6 support to Adreno GPUs.</p> <p><div class="youtube-video"><iframe src="https://www.youtube.com/embed/oVFWy25uiXA"></iframe></div></p> <h2 id="vulkan-extensions">Vulkan Extensions</h2> -<p>Several developers on the Graphics Team made several key contributions to Vulkan Extensions and the Vulkan conformance test suite (CTS). My colleague Ricardo Garcia made an excellent <a href="https://rg3.name/202212122137.html">blog post</a> about those contributions. Below I’ve listed what Igalia did for each of the extensions:</p> +<p>Several developers on the Graphics Team made several key +contributions to Vulkan Extensions and the Vulkan conformance test suite +(CTS). My colleague Ricardo Garcia made an excellent <a +href="https://rg3.name/202212122137.html">blog post</a> about those +contributions. Below I’ve listed what Igalia did for each of the +extensions:</p> <ul> <li>VK_EXT_image_2d_view_of_3d <ul> -<li>We reviewed the spec and are listed as contributors to this extension</li> +<li>We reviewed the spec and are listed as contributors to this +extension</li> </ul></li> <li>VK_EXT_shader_module_identifier <ul> -<li>We reviewed the spec, contributed to it, and created tests for this extension</li> +<li>We reviewed the spec, contributed to it, and created tests for this +extension</li> </ul></li> <li>VK_EXT_attachment_feedback_loop_layout <ul> @@ -37,59 +87,239 @@ <li>We wrote tests and reviewed the spec for this extension</li> </ul></li> </ul> -<h2 id="amdgpu-kernel-driver-contributions">AMDGPU kernel driver contributions</h2> -<p>Our resident “Not an AMD expert” Melissa Wen made several contributions to the AMDGPU driver. 
Those contributions include connecting parts of the <a href="https://lore.kernel.org/amd-gfx/20220329201835.2393141-1-mwen@igalia.com/">pixel blending and post blending code in AMD’s <code>DC</code> module to <code>DRM</code></a> and <a href="https://lore.kernel.org/amd-gfx/20220804161349.3561177-1-mwen@igalia.com/">fixing a bug related to how panel orientation is set when a display is connected</a>. She also had a <a href="https://indico.freedesktop.org/event/2/contributions/50/">presentation at XDC 2022</a>, where she talks about techniques you can use to understand and debug AMDGPU, even when there aren’t hardware docs available.</p> -<p>André Almeida also completed and submitted work on <a href="https://lore.kernel.org/dri-devel/20220714191745.45512-1-andrealmeid@igalia.com/">enabled logging features for the new GFXOFF hardware feature in AMD GPUs</a>. He also created a userspace application (which you can find <a href="https://gitlab.freedesktop.org/andrealmeid/gfxoff_tool">here</a>), that lets you interact with this feature through the <code>debugfs</code> interface. 
Additionally, he submitted a <a href="https://lore.kernel.org/dri-devel/20220929184307.258331-1-contact@emersion.fr/">patch</a> for async page flips (which he also talked about in his <a href="https://indico.freedesktop.org/event/2/contributions/61/">XDC 2022 presentation</a>) which is still yet to be merged.</p> -<h2 id="modesetting-without-glamor-on-rpi">Modesetting without Glamor on RPi</h2> -<p>Christopher Michael joined the Graphics Team in 2022 and along with Chema Casanova made some key contributions to enabling hardware acceleration and mode setting on the Raspberry Pi without the use of <a href="https://www.freedesktop.org/wiki/Software/Glamor/">Glamor</a> which allows making more video memory available to graphics applications running on a Raspberry Pi.</p> -<p>The older generation Raspberry Pis (1-3) only have a maximum of 256MB of memory available for video memory, and using Glamor will consume part of that video memory. Christopher wrote an excellent <a href="https://blogs.igalia.com/cmichael/2022/05/30/modesetting-a-glamor-less-rpi-adventure/">blog post</a> on this work. Both him and Chema also had a joint presentation at XDC 2022 going into more detail on this work.</p> +<h2 id="amdgpu-kernel-driver-contributions">AMDGPU kernel driver +contributions</h2> +<p>Our resident “Not an AMD expert” Melissa Wen made several +contributions to the AMDGPU driver. Those contributions include +connecting parts of the <a +href="https://lore.kernel.org/amd-gfx/20220329201835.2393141-1-mwen@igalia.com/">pixel +blending and post blending code in AMD’s <code>DC</code> module to +<code>DRM</code></a> and <a +href="https://lore.kernel.org/amd-gfx/20220804161349.3561177-1-mwen@igalia.com/">fixing +a bug related to how panel orientation is set when a display is +connected</a>. 
She also had a <a
+href="https://indico.freedesktop.org/event/2/contributions/50/">presentation
+at XDC 2022</a>, where she talks about techniques you can use to
+understand and debug AMDGPU, even when there aren’t hardware docs
+available.</p>
+<p>André Almeida also completed and submitted work on <a
+href="https://lore.kernel.org/dri-devel/20220714191745.45512-1-andrealmeid@igalia.com/">enabling
+logging features for the new GFXOFF hardware feature in AMD GPUs</a>. He
+also created a userspace application (which you can find <a
+href="https://gitlab.freedesktop.org/andrealmeid/gfxoff_tool">here</a>)
+that lets you interact with this feature through the
+<code>debugfs</code> interface. Additionally, he submitted a <a
+href="https://lore.kernel.org/dri-devel/20220929184307.258331-1-contact@emersion.fr/">patch</a>
+for async page flips (which he also talked about in his <a
+href="https://indico.freedesktop.org/event/2/contributions/61/">XDC 2022
+presentation</a>) which is yet to be merged.</p>
+<h2 id="modesetting-without-glamor-on-rpi">Modesetting without Glamor on
+RPi</h2>
+<p>Christopher Michael joined the Graphics Team in 2022 and, along with
+Chema Casanova, made some key contributions to enabling hardware
+acceleration and mode setting on the Raspberry Pi without the use of <a
+href="https://www.freedesktop.org/wiki/Software/Glamor/">Glamor</a>,
+which makes more video memory available to graphics applications
+running on a Raspberry Pi.</p>
+<p>The older generation Raspberry Pis (1-3) only have a maximum of 256MB
+of memory available for video memory, and using Glamor will consume part
+of that video memory. Christopher wrote an excellent <a
+href="https://blogs.igalia.com/cmichael/2022/05/30/modesetting-a-glamor-less-rpi-adventure/">blog
+post</a> on this work.
Both he and Chema also had a joint presentation
+at XDC 2022 going into more detail on this work.</p>
<h2 id="linux-format-magazine-column">Linux Format Magazine Column</h2>
+<p>Our very own Samuel Iglesias had a column published in Linux Format
+Magazine. It’s a short column about reaching Vulkan 1.1 conformance for
+v3dv &amp; Turnip Vulkan drivers, and how Open-Source GPU drivers can go
+from a “hobby project” to the de facto driver for the platform. Check it
+out on page 7 of <a
+href="https://linuxformat.com/linux-format-288.html">issue #288</a>!</p>
<h2 id="xdc-2022">XDC 2022</h2>
+<p>X.Org Developers Conference is one of the big conferences for us here
+at the Graphics Team.
Last year at XDC 2022 our team presented 5 talks
+in Minneapolis, Minnesota. XDC 2022 took place towards the end of the
+year in October, so it provides some good context on how the team closed
+out the year. If you didn’t attend or missed their presentation, here’s
+a breakdown:</p>
+<h3
+id="replacing-the-geometry-pipeline-with-mesh-shaders-ricardo-garcía"><a
+href="https://indico.freedesktop.org/event/2/contributions/48/">“Replacing
+the geometry pipeline with mesh shaders”</a> (Ricardo García)</h3>
+<p>Ricardo presents what exactly mesh shaders are in Vulkan. He made
+many contributions to this extension, including writing 1000s of CTS
+tests for it, with a <a
+href="https://rg3.name/202210222107.html">blog post</a> on his
+presentation that you should check out!</p>
<p><div class="youtube-video"><iframe src="https://www.youtube.com/embed/aRNJ4xj_nDs"></iframe></div></p>
+<h3 id="status-of-vulkan-on-raspberry-pi-iago-toral"><a
+href="https://indico.freedesktop.org/event/2/contributions/68/">“Status
+of Vulkan on Raspberry Pi”</a> (Iago Toral)</h3>
+<p>Iago goes into detail about the current status of the Raspberry Pi
+Vulkan driver.
He talks about achieving Vulkan 1.2 conformance, as well +as some of the challenges the team had to solve due to hardware +limitations of the Broadcom GPU.</p> <p><div class="youtube-video"><iframe src="https://www.youtube.com/embed/GM9IojyzCVM"></iframe></div></p> -<h3 id="enable-hardware-acceleration-for-gl-applications-without-glamor-on-xorg-modesetting-driver-jose-maría-casanova-christopher-michael"><a href="https://indico.freedesktop.org/event/2/contributions/60/">“Enable hardware acceleration for GL applications without Glamor on Xorg modesetting driver”</a> (Jose María Casanova, Christopher Michael)</h3> -<p>Chema and Christopher talk about the challenges they had to solve to enable hardware acceleration on the Raspberry Pi without Glamor.</p> +<h3 +id="enable-hardware-acceleration-for-gl-applications-without-glamor-on-xorg-modesetting-driver-jose-maría-casanova-christopher-michael"><a +href="https://indico.freedesktop.org/event/2/contributions/60/">“Enable +hardware acceleration for GL applications without Glamor on Xorg +modesetting driver”</a> (Jose María Casanova, Christopher Michael)</h3> +<p>Chema and Christopher talk about the challenges they had to solve to +enable hardware acceleration on the Raspberry Pi without Glamor.</p> <p><div class="youtube-video"><iframe src="https://www.youtube.com/embed/Bo_MOM7JTeQ"></iframe></div></p> -<h3 id="im-not-an-amd-expert-but-melissa-wen"><a href="https://indico.freedesktop.org/event/2/contributions/50/">“I’m not an AMD expert, but…”</a> (Melissa Wen)</h3> -<p>In this non-technical presentation, Melissa talks about techniques developers can use to understand and debug drivers without access to hardware documentation.</p> +<h3 id="im-not-an-amd-expert-but-melissa-wen"><a +href="https://indico.freedesktop.org/event/2/contributions/50/">“I’m not +an AMD expert, but…”</a> (Melissa Wen)</h3> +<p>In this non-technical presentation, Melissa talks about techniques +developers can use to understand and debug drivers 
without access to
+hardware documentation.</p>
<p><div class="youtube-video"><iframe src="https://www.youtube.com/embed/CMm-yhsMB7U"></iframe></div></p>
+<h3 id="async-page-flip-in-atomic-api-andré-almeida"><a
+href="https://indico.freedesktop.org/event/2/contributions/61/">“Async
+page flip in atomic API”</a> (André Almeida)</h3>
+<p>André talks about the work that has been done to enable asynchronous
+page flipping in DRM’s atomic API, with an introduction to the topic
+explaining what exactly an asynchronous page flip is, and why you
+would want it.</p>
<p><div class="youtube-video"><iframe src="https://www.youtube.com/embed/qayPPIfrqtE"></iframe></div></p>
<h2 id="fosdem-2022">FOSDEM 2022</h2>
+<p>Another important conference for us is FOSDEM, and last year we
+presented 3 of the 5 talks in the graphics dev room. FOSDEM took place
+in early February 2022, so these talks provide some good context on where
+the team started in 2022.</p>
+<h3 id="the-status-of-turnip-driver-development-hyunjun-ko"><a
+href="https://archive.fosdem.org/2022/schedule/event/turnip/">The status
+of Turnip driver development</a> (Hyunjun Ko)</h3>
+<p>Hyunjun presented the current state of the Turnip driver, also
+talking about the difficulties of developing a driver for a platform
+without hardware documentation. He talks about how Turnip developers
+reverse engineer the behaviour of the hardware, and then implement that
+in an open-source driver.
He also made a companion <a href="https://blogs.igalia.com/zzoon/graphics/mesa/2022/02/21/complement-story/">blog post</a> to checkout along with his presentation.</p> -<h3 id="v3dv-status-update-for-open-source-vulkan-driver-for-raspberry-pi-4-alejandro-piñeiro"><a href="https://archive.fosdem.org/2022/schedule/event/v3dv/">v3dv: Status Update for Open Source Vulkan Driver for Raspberry Pi 4</a> (Alejandro Piñeiro)</h3> -<p>Igalia has been presenting the status of the v3dv driver since December 2019 and in this presentation, Alejandro talks about the status of the v3dv driver in early 2022. He talks about achieving conformance, the extensions that had to be implemented, and the future plans of the v3dv driver.</p> -<h3 id="fun-with-border-colors-in-vulkan-ricardo-garcia"><a href="https://archive.fosdem.org/2022/schedule/event/vulkan_borders/">Fun with border colors in Vulkan</a> (Ricardo Garcia)</h3> -<p>Ricardo presents about the work he did on the <code>VK_EXT_border_color_swizzle</code> extension in Vulkan. He talks about the specific contributions he did and how the extension fits in with sampling color operations in Vulkan.</p> +<p>Another important conference for us is FOSDEM, and last year we +presented 3 of the 5 talks in the graphics dev room. FOSDEM took place +in early February 2022, these talks provide some good context of where +the team started in 2022.</p> +<h3 id="the-status-of-turnip-driver-development-hyunjun-ko"><a +href="https://archive.fosdem.org/2022/schedule/event/turnip/">The status +of Turnip driver development</a> (Hyunjun Ko)</h3> +<p>Hyunjun presented the current state of the Turnip driver, also +talking about the difficulties of developing a driver for a platform +without hardware documentation. He talks about how Turnip developers +reverse engineer the behaviour of the hardware, and then implement that +in an open-source driver. 
He also made a companion <a
+href="https://blogs.igalia.com/zzoon/graphics/mesa/2022/02/21/complement-story/">blog
+post</a> to check out along with his presentation.</p>
+<h3
+id="v3dv-status-update-for-open-source-vulkan-driver-for-raspberry-pi-4-alejandro-piñeiro"><a
+href="https://archive.fosdem.org/2022/schedule/event/v3dv/">v3dv: Status
+Update for Open Source Vulkan Driver for Raspberry Pi 4</a> (Alejandro
+Piñeiro)</h3>
+<p>Igalia has been presenting the status of the v3dv driver since
+December 2019, and in this presentation Alejandro talks about the status
+of the v3dv driver in early 2022. He talks about achieving conformance,
+the extensions that had to be implemented, and the future plans of the
+v3dv driver.</p>
+<h3 id="fun-with-border-colors-in-vulkan-ricardo-garcia"><a
+href="https://archive.fosdem.org/2022/schedule/event/vulkan_borders/">Fun
+with border colors in Vulkan</a> (Ricardo Garcia)</h3>
+<p>Ricardo presents the work he did on the
+<code>VK_EXT_border_color_swizzle</code> extension in Vulkan. He talks
+about the specific contributions he made and how the extension fits in
+with sampling color operations in Vulkan.</p>
<h2 id="gsoc-igalia-ce">GSoC &amp; Igalia CE</h2>
+<p>Last year Melissa &amp; André co-mentored contributors working on
+introducing KUnit tests to the AMD display driver. This project was
+hosted as a <a href="https://summerofcode.withgoogle.com/">“Google
+Summer of Code” (GSoC)</a> project from the X.Org Foundation. If you’re
+interested in seeing their work, Tales da Aparecida, Maíra Canal, Magali
+Lemes, and Isabella Basso presented their work at the <a
+href="https://lpc.events/event/16/contributions/1310/">Linux Plumbers
+Conference 2022</a> and across two talks at XDC 2022.
Here you can see their <a href="https://indico.freedesktop.org/event/2/contributions/65/">first</a> presentation and here you can see their <a href="https://indico.freedesktop.org/event/2/contributions/164/">second</a> second presentation.</p> -<p>André &amp; Melissa also mentored two <a href="https://www.igalia.com/coding-experience/">“Igalia Coding Experience” (CE)</a> projects, one related to IGT GPU test tools on the VKMS kernel driver, and the other for IGT GPU test tools on the V3D kernel driver. If you’re interested in reading up on some of that work, Maíra Canal <a href="https://mairacanal.github.io/january-update-finishing-my-igalia-ce/">wrote about her experience</a> being part of the Igalia CE.</p> -<p>Ella Stanforth was also part of the Igalia Coding Experience, being mentored by Iago &amp; Alejandro. They worked on the <code>VK_KHR_sampler_ycbcr_conversion</code> extension for the v3dv driver. Alejandro talks about their work in his <a href="https://blogs.igalia.com/apinheiro/2023/01/v3dv-status-update-2023-01/">blog post here</a>.</p> +<p>Last year Melissa &amp; André co-mentored contributors working on +introducing KUnit tests to the AMD display driver. This project was +hosted as a <a href="https://summerofcode.withgoogle.com/">“Google +Summer of Code” (GSoC)</a> project from the X.Org Foundation. If you’re +interested in seeing their work Tales da Aparecida, Maíra Canal, Magali +Lemes, and Isabella Basso presented their work at the <a +href="https://lpc.events/event/16/contributions/1310/">Linux Plumbers +Conference 2022</a> and across two talks at XDC 2022. 
Here you can see
+their <a
+href="https://indico.freedesktop.org/event/2/contributions/65/">first</a>
+presentation and here you can see their <a
+href="https://indico.freedesktop.org/event/2/contributions/164/">second</a>
+presentation.</p>
+<p>André &amp; Melissa also mentored two <a
+href="https://www.igalia.com/coding-experience/">“Igalia Coding
+Experience” (CE)</a> projects, one related to IGT GPU test tools on the
+VKMS kernel driver, and the other for IGT GPU test tools on the V3D
+kernel driver. If you’re interested in reading up on some of that work,
+Maíra Canal <a
+href="https://mairacanal.github.io/january-update-finishing-my-igalia-ce/">wrote
+about her experience</a> being part of the Igalia CE.</p>
+<p>Ella Stanforth was also part of the Igalia Coding Experience, being
+mentored by Iago &amp; Alejandro. They worked on the
+<code>VK_KHR_sampler_ycbcr_conversion</code> extension for the v3dv
+driver. Alejandro talks about their work in his <a
+href="https://blogs.igalia.com/apinheiro/2023/01/v3dv-status-update-2023-01/">blog
+post here</a>.</p>
<h1 id="whats-next">What’s Next?</h1>
+<p>The graphics team is looking forward to having a jam-packed 2023 with
+just as many if not more contributions to the Open-Source graphics
+stack!
I’m super excited to be part of the team, and hope to see my name
+in our 2023 recap post!</p>
+<p>Also, you might have heard that <a
+href="https://www.igalia.com/2022/xdc-2023">Igalia will be hosting XDC
+2023</a> in the beautiful city of A Coruña! We hope to see you there,
+where there will be many presentations from all the great people working
+on the Open-Source graphics stack, and most importantly where you can <a
+href="https://www.youtube.com/watch?v=7hWcu8O9BjM">dream in the
+Atlantic!</a></p>
<figure>
-<img src="https://www.igalia.com/assets/i/news/XDC-event-banner.jpg" alt="Photo of A Coruña" /><figcaption aria-hidden="true">Photo of A Coruña</figcaption>
+<img src="https://www.igalia.com/assets/i/news/XDC-event-banner.jpg"
+alt="Photo of A Coruña" />
+<figcaption aria-hidden="true">Photo of A Coruña</figcaption>
</figure>
</description><pubDate>Thu, 02 Feb 2023 05:00:00 -0000</pubDate><guid>https://fryzekconcepts.com/notes/2022_igalia_graphics_team.html</guid></item><item><title>Journey Through Freedreno</title><link>https://fryzekconcepts.com/notes/freedreno_journey.html</link><description><figure>
-<img src="/assets/freedreno/glinfo_freedreno.png" alt="Android running Freedreno" /><figcaption aria-hidden="true">Android running Freedreno</figcaption>
+<img src="/assets/freedreno/glinfo_freedreno.png"
+alt="Android running Freedreno" />
+<figcaption aria-hidden="true">Android running Freedreno</figcaption>
</figure>
<p>As part of my training at Igalia I’ve been attempting to write a new
+backend for Freedreno that targets the proprietary “KGSL” kernel mode
+driver. For those unaware, there are two “main” kernel mode drivers on
+Qualcomm SOCs for the GPU: “MSM” and “KGSL”. “MSM” is DRM
+compliant, and Freedreno is already able to run on this driver. “KGSL” is
+the proprietary KMD that Qualcomm’s proprietary userspace driver
+targets. Now why would you want to run freedreno against KGSL, when MSM
+exists?
Well there are a few reasons. First, MSM only really works on an up-streamed kernel, so if you have to run a down-streamed kernel you can continue using the version of KGSL that the manufacturer shipped with your device. Second, this allows you to run both the proprietary adreno driver and the open source freedreno driver on the same device just by swapping libraries, which can be very nice for quickly testing something against both drivers.</p> +<p>As part of my training at Igalia I’ve been attempting to write a new +backend for Freedreno that targets the proprietary “KGSL” kernel mode +driver. For those unaware, there are two “main” kernel mode drivers on +Qualcomm SOCs for the GPU: “MSM” and “KGSL”. “MSM” is DRM +compliant, and Freedreno is already able to run on this driver. “KGSL” is +the proprietary KMD that Qualcomm’s proprietary userspace driver +targets. Now why would you want to run freedreno against KGSL, when MSM +exists? Well there are a few reasons. First, MSM only really works on an +up-streamed kernel, so if you have to run a down-streamed kernel you can +continue using the version of KGSL that the manufacturer shipped with +your device. Second, this allows you to run both the proprietary adreno +driver and the open source freedreno driver on the same device just by +swapping libraries, which can be very nice for quickly testing something +against both drivers.</p> <h2 id="when-drm-isnt-just-drm">When “DRM” isn’t just “DRM”</h2> -<p>When working on a new backend, one of the critical things to do is to make use of as much “common code” as possible. This has a number of benefits, not least of which is reducing the amount of code you have to write. It also reduces the number of bugs that will likely exist, as you are relying on well-tested code, and it ensures that the backend is most likely going to continue to work with new driver updates.</p> -<p>When I started the work for a new backend I looked inside mesa’s <code>src/freedreno/drm</code> folder. 
This has the current backend code for Freedreno, and it’s already modularized to support multiple backends. It currently has support for the above-mentioned MSM kernel mode driver as well as virtio (a backend that allows Freedreno to be used from within a virtualized environment). From the name of this path, you would think that the code in this module would only work with kernel mode drivers that implement DRM, but actually there are only a handful of places in this module where DRM support is assumed. This made it a good starting point to introduce the KGSL backend and piggyback off the common code.</p> -<p>For example, the <code>drm</code> module has a lot of code to deal with the management of synchronization primitives, buffer objects, and command submit lists. All of this is managed at an abstraction above “DRM”, and to re-implement this code would be a bad idea.</p> +<p>When working on a new backend, one of the critical things to do is to +make use of as much “common code” as possible. This has a number of +benefits, not least of which is reducing the amount of code you have to write. It +also reduces the number of bugs that will likely exist, as you are +relying on well-tested code, and it ensures that the backend is most +likely going to continue to work with new driver updates.</p> +<p>When I started the work for a new backend I looked inside mesa’s +<code>src/freedreno/drm</code> folder. This has the current backend code +for Freedreno, and it’s already modularized to support multiple backends. +It currently has support for the above-mentioned MSM kernel mode driver +as well as virtio (a backend that allows Freedreno to be used from +within a virtualized environment). From the name of this path, you +would think that the code in this module would only work with kernel +mode drivers that implement DRM, but actually there are only a handful of +places in this module where DRM support is assumed. 
This made it a good +starting point to introduce the KGSL backend and piggyback off the +common code.</p> +<p>For example, the <code>drm</code> module has a lot of code to deal +with the management of synchronization primitives, buffer objects, and +command submit lists. All of this is managed at an abstraction above “DRM”, and to +re-implement this code would be a bad idea.</p> <h2 id="how-to-get-android-to-behave">How to get Android to behave</h2> -<p>One of the big struggles with getting the KGSL backend working was figuring out how I could get Android to load mesa instead of the Qualcomm blob driver that is shipped with the device image. Thankfully, a good chunk of this work had already been done when the Turnip developers (Turnip is the open source Vulkan implementation for Adreno GPUs) figured out how to get Turnip running on android with KGSL. Luckily, one of my coworkers <a href="https://blogs.igalia.com/dpiliaiev/">Danylo</a> is one of those Turnip developers, and he gave me a lot of guidance on getting Android set up. One thing to watch out for is the outdated instructions <a href="https://docs.mesa3d.org/android.html">here</a>. These instructions <em>almost</em> work, but require some modifications. First, if you’re using a more modern version of the Android NDK, the compiler has been replaced with LLVM/Clang, so you need to change which compiler is being used. Second, flags like <code>system</code> in the cross compiler script incorrectly set the system as <code>linux</code> instead of <code>android</code>. I had success using the below cross compiler script. Take note that the compiler paths need to be updated to match where you extracted the Android NDK on your system.</p> +<p>One of the big struggles with getting the KGSL backend working was +figuring out how I could get Android to load mesa instead of the Qualcomm +blob driver that is shipped with the device image. 
Thankfully, a good +chunk of this work had already been done when the Turnip +developers (Turnip is the open source Vulkan implementation for Adreno +GPUs) figured out how to get Turnip running on android with KGSL. +Luckily, one of my coworkers <a +href="https://blogs.igalia.com/dpiliaiev/">Danylo</a> is one of those +Turnip developers, and he gave me a lot of guidance on getting Android +set up. One thing to watch out for is the outdated instructions <a +href="https://docs.mesa3d.org/android.html">here</a>. These instructions +<em>almost</em> work, but require some modifications. First, if you’re +using a more modern version of the Android NDK, the compiler has been +replaced with LLVM/Clang, so you need to change which compiler is being +used. Second, flags like <code>system</code> in the cross compiler script +incorrectly set the system as <code>linux</code> instead of +<code>android</code>. I had success using the below cross compiler +script. Take note that the compiler paths need to be updated to match +where you extracted the Android NDK on your system.</p> <pre class="meson"><code>[binaries] ar = &#39;/home/lfryzek/Documents/projects/igalia/freedreno/android-ndk-r25b-linux/android-ndk-r25b/toolchains/llvm/prebuilt/linux-x86_64/bin/llvm-ar&#39; c = [&#39;ccache&#39;, &#39;/home/lfryzek/Documents/projects/igalia/freedreno/android-ndk-r25b-linux/android-ndk-r25b/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android29-clang&#39;] @@ -107,23 +337,136 @@ system = &#39;android&#39; cpu_family = &#39;arm&#39; cpu = &#39;armv8&#39; endian = &#39;little&#39;</code></pre> -<p>Another thing I had to figure out with Android, that was different from these instructions, was how I would get Android to load the mesa versions of these libraries. That’s when my colleague <a href="https://www.igalia.com/team/mark">Mark</a> pointed out to me that Android is open source and I could just check the source code myself. 
Sure enough, you can find the OpenGL driver loader in <a href="https://android.googlesource.com/platform/frameworks/native/+/master/opengl/libs/EGL/Loader.cpp">Android’s source code</a>. From this code we can see that Android will try to load a few different files based on some settings, and in my case it would try to load 3 different shared libraries in the <code>/vendor/lib64/egl</code> folder, <code>libEGL_adreno.so</code>, <code>libGLESv1_CM_adreno.so</code>, and <code>libGLESv2.so</code>. I could just replace these libraries with the versions built from mesa and voilà, you’re now loading a custom driver! This realization that I could just “read the code” was very powerful in debugging some more android-specific issues I ran into, like dealing with gralloc.</p> -<p>Something cool that the open source Freedreno &amp; Turnip driver developers figured out was getting android to run test OpenGL applications from the adb shell without building android APKs. If you check out the <a href="https://gitlab.freedesktop.org/freedreno/freedreno">freedreno repo</a>, they have an <code>ndk-build.sh</code> script that can build tests in the <code>tests-*</code> folder. The nice benefit of this is that it provides an easy way to run simple test cases without worrying about the android window system integration. Another nifty feature about this repo is the <code>libwrap</code> tool that lets you trace the commands being submitted to the GPU.</p> +<p>Another thing I had to figure out with Android, that was different +from these instructions, was how I would get Android to load the mesa +versions of these libraries. That’s when my colleague <a +href="https://www.igalia.com/team/mark">Mark</a> pointed out to me that +Android is open source and I could just check the source code myself. +Sure enough, you can find the OpenGL driver loader in <a +href="https://android.googlesource.com/platform/frameworks/native/+/master/opengl/libs/EGL/Loader.cpp">Android’s +source code</a>. 
From this code we can see that Android will try to load a +few different files based on some settings, and in my case it would try +to load 3 different shared libraries in the +<code>/vendor/lib64/egl</code> folder, <code>libEGL_adreno.so</code>, +<code>libGLESv1_CM_adreno.so</code>, and <code>libGLESv2.so</code>. I +could just replace these libraries with the versions built from mesa and +voilà, you’re now loading a custom driver! This realization that I could +just “read the code” was very powerful in debugging some more +android-specific issues I ran into, like dealing with gralloc.</p> +<p>Something cool that the open source Freedreno &amp; Turnip driver +developers figured out was getting android to run test OpenGL +applications from the adb shell without building android APKs. If you +check out the <a +href="https://gitlab.freedesktop.org/freedreno/freedreno">freedreno +repo</a>, they have an <code>ndk-build.sh</code> script that can build +tests in the <code>tests-*</code> folder. The nice benefit of this is +that it provides an easy way to run simple test cases without worrying +about the android window system integration. Another nifty feature about +this repo is the <code>libwrap</code> tool that lets you trace the commands +being submitted to the GPU.</p> <h2 id="what-even-is-gralloc">What even is Gralloc?</h2> -<p>Gralloc is the graphics memory allocator in Android, and the OS will use it to allocate the surface for “windows”. This means that the memory we want to render the display to is managed by gralloc and not our KGSL backend. So we have to get all the information about this surface from gralloc, and if you look in <code>src/egl/driver/dri2/platform_android.c</code> you will see existing code for handling gralloc. You would think “Hey there is no work for me here then”, but you would be wrong. The handle gralloc provides is hardware specific, and the code in <code>platform_android.c</code> assumes a DRM gralloc implementation. 
Thankfully the Turnip developers had already gone through this struggle, and if you look in <code>src/freedreno/vulkan/tu_android.c</code> you can see they have implemented a separate path when a Qualcomm msm implementation of gralloc is detected. I could copy this detection logic and add a separate path to <code>platform_android.c</code>.</p> -<h2 id="working-with-the-freedreno-community">Working with the Freedreno community</h2> -<p>When working on any project (open-source or otherwise), it’s nice to know that you aren’t working alone. Thankfully the <code>#freedreno</code> channel on <code>irc.oftc.net</code> is very active and full of helpful people to answer any questions you may have. While working on the backend, one area I wasn’t really sure how to address was the synchronization code for buffer objects. The backend exposed a function called <code>cpu_prep</code>. This function was just there to call the DRM implementation of <code>cpu_prep</code> on the buffer object. I wasn’t exactly sure how to implement this functionality with KGSL since it doesn’t use DRM buffer objects.</p> -<p>I ended up reaching out to the IRC channel and Rob Clark on the channel explained to me that he was actually working on moving a lot of the code for <code>cpu_prep</code> into common code so that a non-drm driver (like the KGSL backend I was working on) would just need to implement that operation as NOP (no operation).</p> -<h2 id="dealing-with-bugs-reverse-engineering-the-blob">Dealing with bugs &amp; reverse engineering the blob</h2> -<p>I encountered a few different bugs when implementing the KGSL backend, but most of them consisted of me calling KGSL wrong, or handling synchronization incorrectly. 
Thankfully since Turnip is already running on KGSL, I could just more carefully compare my code to what Turnip is doing and figure out my logical mistake.</p> -<p>Some of the bugs I encountered required the backend interface in Freedreno to be modified to expose a new per-driver implementation of that backend function, instead of just using a common implementation. For example, the existing function to map a buffer object into userspace assumed that the same <code>fd</code> for the device could be used for the buffer object in the <code>mmap</code> call. This worked fine for any buffer objects we created through KGSL but would not work for buffer objects created from gralloc (remember the above section on surface memory for windows coming from gralloc). To resolve this issue I exposed a new per-backend implementation of “map” where I could take a different path if the buffer object came from gralloc.</p> -<p>While testing the KGSL backend I did encounter a new bug that seems to affect both my new KGSL backend and the Turnip KGSL backend. The bug is an <code>iommu fault</code> that occurs when the surface allocated by gralloc does not have a height that is aligned to 4. The blitting engine on a6xx GPUs copies in 16x4 chunks, so if the height is not aligned by 4 the GPU will try to write to pixels that exist outside the allocated memory. This issue only happens with KGSL backends since we import memory from gralloc, and gralloc allocates exactly enough memory for the surface, with no alignment on the height. If running on any other platform, the <code>fdl</code> (Freedreno Layout) code would be called to compute the minimum required size for a surface, which would take into account the alignment requirement for the height. Qualcomm’s blob driver didn’t seem to have this problem, even though it’s getting the exact same buffer from gralloc. 
So it must be doing something different to handle the non-aligned height.</p> -<p>Because this issue relied on gralloc, the application needed to be running as an Android APK to get a surface from gralloc. The best way to fix this issue would be to figure out what the blob driver is doing and try to replicate this behavior in Freedreno (assuming it isn’t doing something silly like switching to sysmem rendering). Unfortunately it didn’t look like the libwrap library worked to trace an APK.</p> -<p>The libwrap library relied on a linux feature known as <code>LD_PRELOAD</code> to load <code>libwrap.so</code> when the application starts and replace the system functions like <code>open</code> and <code>ioctl</code> with their own implementation that traces what is being submitted to the KGSL kernel mode driver. Thankfully android exposes this <code>LD_PRELOAD</code> mechanism through its “wrap” interface where you create a property called <code>wrap.&lt;app-name&gt;</code> with a value <code>LD_PRELOAD=&lt;path to libwrap.so&gt;</code>. Android will then load your library like it would be done in a normal linux shell. If you tried to do this with libwrap though, you would find very quickly that you would get corrupted traces. When android launches your APK, it doesn’t only launch your application; there are different threads for different android system-related functions and some of them can also use OpenGL. The libwrap library is not designed to handle multiple threads using KGSL at the same time. 
After discovering this issue I created an <a href="https://gitlab.freedesktop.org/freedreno/freedreno/-/merge_requests/22">MR</a> that would store the tracing file handles as TLS (thread local storage), preventing the clobbering of the trace file, and also allowing you to view the traces generated by different threads separately from each other.</p> -<p>With this in hand, one could begin investigating what the blob driver is doing to handle these unaligned surfaces.</p> +<p>Gralloc is the graphics memory allocator in Android, and the OS will +use it to allocate the surface for “windows”. This means that the memory +we want to render the display to is managed by gralloc and not our KGSL +backend. So we have to get all the information about this +surface from gralloc, and if you look in +<code>src/egl/driver/dri2/platform_android.c</code> you will see +existing code for handling gralloc. You would think “Hey there is no work +for me here then”, but you would be wrong. The handle gralloc provides +is hardware specific, and the code in <code>platform_android.c</code> +assumes a DRM gralloc implementation. Thankfully the Turnip developers +had already gone through this struggle, and if you look in +<code>src/freedreno/vulkan/tu_android.c</code> you can see they have +implemented a separate path when a Qualcomm msm implementation of +gralloc is detected. I could copy this detection logic and add a +separate path to <code>platform_android.c</code>.</p> +<h2 id="working-with-the-freedreno-community">Working with the Freedreno +community</h2> +<p>When working on any project (open-source or otherwise), it’s nice to +know that you aren’t working alone. Thankfully the +<code>#freedreno</code> channel on <code>irc.oftc.net</code> is very +active and full of helpful people to answer any questions you may have. +While working on the backend, one area I wasn’t really sure how to +address was the synchronization code for buffer objects. 
The backend +exposed a function called <code>cpu_prep</code>. This function was just +there to call the DRM implementation of <code>cpu_prep</code> on the +buffer object. I wasn’t exactly sure how to implement this functionality +with KGSL since it doesn’t use DRM buffer objects.</p> +<p>I ended up reaching out to the IRC channel and Rob Clark on the +channel explained to me that he was actually working on moving a lot of +the code for <code>cpu_prep</code> into common code so that a non-drm +driver (like the KGSL backend I was working on) would just need to +implement that operation as NOP (no operation).</p> +<h2 id="dealing-with-bugs-reverse-engineering-the-blob">Dealing with +bugs &amp; reverse engineering the blob</h2> +<p>I encountered a few different bugs when implementing the KGSL +backend, but most of them consisted of me calling KGSL wrong, or handling +synchronization incorrectly. Thankfully since Turnip is already running +on KGSL, I could just more carefully compare my code to what Turnip is +doing and figure out my logical mistake.</p> +<p>Some of the bugs I encountered required the backend interface in +Freedreno to be modified to expose a new per-driver implementation +of that backend function, instead of just using a common implementation. +For example, the existing function to map a buffer object into userspace +assumed that the same <code>fd</code> for the device could be used for +the buffer object in the <code>mmap</code> call. This worked fine for +any buffer objects we created through KGSL but would not work for buffer +objects created from gralloc (remember the above section on surface +memory for windows coming from gralloc). To resolve this issue I +exposed a new per-backend implementation of “map” where I could take a +different path if the buffer object came from gralloc.</p> +<p>While testing the KGSL backend I did encounter a new bug that seems +to affect both my new KGSL backend and the Turnip KGSL backend. 
The bug +is an <code>iommu fault</code> that occurs when the surface allocated by +gralloc does not have a height that is aligned to 4. The blitting engine +on a6xx GPUs copies in 16x4 chunks, so if the height is not aligned by 4 +the GPU will try to write to pixels that exist outside the allocated +memory. This issue only happens with KGSL backends since we import +memory from gralloc, and gralloc allocates exactly enough memory for the +surface, with no alignment on the height. If running on any other +platform, the <code>fdl</code> (Freedreno Layout) code would be called +to compute the minimum required size for a surface, which would take into +account the alignment requirement for the height. Qualcomm’s blob driver +didn’t seem to have this problem, even though it’s getting the +exact same buffer from gralloc. So it must be doing something different +to handle the non-aligned height.</p> +<p>Because this issue relied on gralloc, the application needed to be +running as an Android APK to get a surface from gralloc. The best way to +fix this issue would be to figure out what the blob driver is doing and +try to replicate this behavior in Freedreno (assuming it isn’t doing +something silly like switching to sysmem rendering). Unfortunately it +didn’t look like the libwrap library worked to trace an APK.</p> +<p>The libwrap library relied on a linux feature known as +<code>LD_PRELOAD</code> to load <code>libwrap.so</code> when the +application starts and replace the system functions like +<code>open</code> and <code>ioctl</code> with their own implementation +that traces what is being submitted to the KGSL kernel mode driver. +Thankfully android exposes this <code>LD_PRELOAD</code> mechanism +through its “wrap” interface where you create a property called +<code>wrap.&lt;app-name&gt;</code> with a value +<code>LD_PRELOAD=&lt;path to libwrap.so&gt;</code>. Android will then +load your library like it would be done in a normal linux shell. 
If you +tried to do this with libwrap though, you would find very quickly that you +would get corrupted traces. When android launches your APK, it doesn’t +only launch your application; there are different threads for different +android system-related functions and some of them can also use OpenGL. +The libwrap library is not designed to handle multiple threads using +KGSL at the same time. After discovering this issue I created an <a +href="https://gitlab.freedesktop.org/freedreno/freedreno/-/merge_requests/22">MR</a> +that would store the tracing file handles as TLS (thread local storage), +preventing the clobbering of the trace file, and also allowing you to +view the traces generated by different threads separately from each +other.</p> +<p>With this in hand, one could begin investigating what the blob driver +is doing to handle these unaligned surfaces.</p> <h2 id="whats-next">What’s next?</h2> -<p>Well the next obvious thing to fix is the unaligned height issue, which is still open. I’ve also worked on upstreaming my changes with this <a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21570">WIP MR</a>.</p> +<p>Well the next obvious thing to fix is the unaligned height issue, which +is still open. I’ve also worked on upstreaming my changes with this <a +href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21570">WIP +MR</a>.</p> <figure> -<img src="/assets/freedreno/3d-mark.png" alt="Freedreno running 3d-mark" /><figcaption aria-hidden="true">Freedreno running 3d-mark</figcaption> +<img src="/assets/freedreno/3d-mark.png" +alt="Freedreno running 3d-mark" /> +<figcaption aria-hidden="true">Freedreno running 3d-mark</figcaption> </figure> </description><pubDate>Tue, 28 Feb 2023 05:00:00 -0000</pubDate><guid>https://fryzekconcepts.com/notes/freedreno_journey.html</guid></item></channel></rss>
\ No newline at end of file diff --git a/html/index.html b/html/index.html index 957740d..aedebcd 100644 --- a/html/index.html +++ b/html/index.html @@ -20,6 +20,7 @@ <div class="header-links"> <a href="/now.html" class="header-link">Now</a> <a href="/about.html" class="header-link">About</a> + <a rel="me" href="https://mastodon.social/@hazematman">Social</a> </div> </div> <main> @@ -48,54 +49,76 @@ <div class="note-box"> <img src="/assets/freedreno/glinfo_freedreno_preview.png"> <h2>Journey Through Freedreno</h2> - <p>Android running Freedreno. As part of my training at Igalia I’ve been attempting to write a new ...</p> + <p>Android running Freedreno. As part of my training at Igalia + I’ve been attempting to write a new ...</p> </div> </a> <a href="/notes/rasterizing-triangles.html" class="note-link"> <div class="note-box"> <img src="/assets/2022-04-03-rasterizing-triangles/Screenshot-from-2022-04-03-13-43-13.png"> <h2>Rasterizing Triangles</h2> - <p>Lately I’ve been trying to implement a software renderer following the algorithm described by ...</p> + <p>Lately I’ve been trying to implement a software renderer + following the algorithm described by ...</p> </div> </a> <a href="/notes/global_game_jam_2023.html" class="note-link"> <div class="note-box"> <img src="/assets/global_game_jam_2023/screen_shot_2.png"> <h2>Global Game Jam 2023 - GI Jam</h2> - <p>At the beginning of this month I participated in the Games Institute’s Global Game Jam event. ...</p> + <p>At the beginning of this month I participated in the + Games Institute’s Global Game Jam event. ...</p> </div> </a> <a href="/notes/2022_igalia_graphics_team.html" class="note-link"> <div class="note-box"> <img src="/assets/igalia_logo.png"> <h2>2022 Graphics Team Contributions at Igalia</h2> - <p>This year I started a new job working with Igalia’s Graphics Team. For those of you who don’t ...</p> + <p>This year I started a new job working with Igalia’s + Graphics Team. 
For those of you who don’t ...</p> </div> </a> <a href="/notes/n64brew-gamejam-2021.html" class="note-link"> <div class="note-box"> <img src="/assets/2021-12-10-n64brew-gamejam-2021/bug_3.png"> <h2>N64Brew GameJam 2021</h2> - <p>So this year, myself and two others decided to participate together in the N64Brew homebrew where ...</p> + <p>So this year, myself and two others decided to + participate together in the N64Brew homebrew where ...</p> </div> </a> <a href="/notes/digital_garden.html" class="note-link"> <div class="note-box"> <h2>Digital Garden</h2> - <p>After reading Maggie Appleton’s page on digital gardens I was inspired to convert my own website into a digital garden. I have many half-baked ideas that I never seem to be able to finish. Some of them get to a published state like and , but many of them never make it to the published state. The idea of a digital garden seems very appealing to me, as it encourages you to post on a topic even if you haven’t made it publishable yet. How this site works - I wanted a bit of a challenge when putting together ...</p> + <p>After reading Maggie Appleton’s page on digital gardens I + was inspired to convert my own website into a digital + garden. I have many half-baked ideas that I never seem to be able + to finish. Some of them get to a published state like and , + but many of them never make it to the published state. 
The + idea of a digital garden seems very appealing to me, as it + encourages you to post on a topic even if you haven’t made + it publishable yet. How this site works - I wanted a bit of a + challenge when putting together ...</p> </div> </a> <a href="/notes/baremetal-risc-v.html" class="note-link"> <div class="note-box"> <img src="/assets/2022-06-09-baremetal-risc-v/PXL_20220609_121350403.jpg"> <h2>Baremetal RISC-V</h2> - <p>After re-watching suckerpinch’s Reverse Emulation video I got inspired to try and replicate what ...</p> + <p>After re-watching suckerpinch’s Reverse Emulation video I + got inspired to try and replicate what ...</p> </div> </a> <a href="/notes/generating-video.html" class="note-link"> <div class="note-box"> <h2>Generating Video</h2> - <p>One thing I’m very interested in is computer graphics. This could be complex 3D graphics or simple 2D graphics. The idea of getting a computer to display visual data fascinates me. One fundamental part of showing visual data is interfacing with a computer monitor. This can be accomplished by generating a video signal that the monitor understands. Below I have written instructions on how an FPGA can be used to generate a video signal. I have specifically worked with the iCEBreaker FPGA but the...</p> + <p>One thing I’m very interested in is computer graphics. + This could be complex 3D graphics or simple 2D graphics. The + idea of getting a computer to display visual data fascinates + me. One fundamental part of showing visual data is + interfacing with a computer monitor. This can be + accomplished by generating a video signal that the monitor + understands. Below I have written instructions on how an + FPGA can be used to generate a video signal. 
I have + specifically worked with the iCEBreaker FPGA but the...</p> + </div> + </a> </div> diff --git a/html/notes/2022_igalia_graphics_team.html b/html/notes/2022_igalia_graphics_team.html index 648853d..8ddac71 100644 --- a/html/notes/2022_igalia_graphics_team.html +++ b/html/notes/2022_igalia_graphics_team.html @@ -21,11 +21,13 @@ <div class="header-links"> <a href="/now.html" class="header-link">Now</a> <a href="/about.html" class="header-link">About</a> + <a rel="me" href="https://mastodon.social/@hazematman">Social</a> </div> </div> <main> <div class="page-title-header-container"> - <h1 class="page-title-header">2022 Graphics Team Contributions at Igalia</h1> + <h1 class="page-title-header">2022 Graphics Team Contributions at +Igalia</h1> <div class="page-info-container"> <div class="plant-status"> <img src="/assets/evergreen.svg"> @@ -42,26 +44,76 @@ <div class="note-divider"></div> <div class="main-container"> <div class="note-body"> -<p>This year I started a new job working with <a href="https://www.igalia.com/technology/graphics">Igalia’s Graphics Team</a>. For those of you who don’t know <a href="https://www.igalia.com/">Igalia</a>, they are a <a href="https://en.wikipedia.org/wiki/Igalia">“worker-owned, employee-run cooperative model consultancy focused on open source software”</a>.</p> -<p>As a new member of the team, I thought it would be a great idea to summarize the incredible amount of work the team completed in 2022. If you’re interested, keep reading!</p> -<h2 id="vulkan-1.2-conformance-on-rpi-4">Vulkan 1.2 Conformance on RPi 4</h2> -<p>One of the big milestones for the team in 2022 was <a href="https://www.khronos.org/conformance/adopters/conformant-products#submission_694">achieving Vulkan 1.2 conformance on the Raspberry Pi 4</a>. The folks over at the Raspberry Pi company wrote a nice <a href="https://www.raspberrypi.com/news/vulkan-update-version-1-2-conformance-for-raspberry-pi-4/">article</a> about the achievement. 
Igalia has been partnering with the Raspberry Pi company to build and improve the graphics driver on all versions of the Raspberry Pi.</p> -<p>The Vulkan 1.2 spec ratification came with a few <a href="https://registry.khronos.org/vulkan/specs/1.2-extensions/html/vkspec.html#versions-1.2">extensions</a> that were promoted to Core. This means a conformant Vulkan 1.2 driver needs to implement those extensions. Alejandro Piñeiro wrote this interesting <a href="https://blogs.igalia.com/apinheiro/2022/05/v3dv-status-update-2022-05-16/">blog post</a> that talks about some of those extensions.</p> -<p>Vulkan 1.2 also came with a number of optional extensions such as <code>VK_KHR_pipeline_executable_properties</code>. My colleague Iago Toral wrote an excellent <a href="https://blogs.igalia.com/itoral/2022/05/09/vk_khr_pipeline_executables/">blog post</a> on how we implemented that extension on the Raspberry Pi 4 and what benefits it provides for debugging.</p> +<p>This year I started a new job working with <a +href="https://www.igalia.com/technology/graphics">Igalia’s Graphics +Team</a>. For those of you who don’t know <a +href="https://www.igalia.com/">Igalia</a>, they are a <a +href="https://en.wikipedia.org/wiki/Igalia">“worker-owned, employee-run +cooperative model consultancy focused on open source software”</a>.</p> +<p>As a new member of the team, I thought it would be a great idea to +summarize the incredible amount of work the team completed in 2022. If +you’re interested, keep reading!</p> +<h2 id="vulkan-1.2-conformance-on-rpi-4">Vulkan 1.2 Conformance on RPi +4</h2> +<p>One of the big milestones for the team in 2022 was <a +href="https://www.khronos.org/conformance/adopters/conformant-products#submission_694">achieving +Vulkan 1.2 conformance on the Raspberry Pi 4</a>. 
The folks over at the +Raspberry Pi company wrote a nice <a +href="https://www.raspberrypi.com/news/vulkan-update-version-1-2-conformance-for-raspberry-pi-4/">article</a> +about the achievement. Igalia has been partnering with the Raspberry Pi +company to bring build and improve the graphics driver on all versions +of the Raspberry Pi.</p> +<p>The Vulkan 1.2 spec ratification came with a few <a +href="https://registry.khronos.org/vulkan/specs/1.2-extensions/html/vkspec.html#versions-1.2">extensions</a> +that were promoted to Core. This means a conformant Vulkan 1.2 driver +needs to implement those extensions. Alejandro Piñeiro wrote this +interesting <a +href="https://blogs.igalia.com/apinheiro/2022/05/v3dv-status-update-2022-05-16/">blog +post</a> that talks about some of those extensions.</p> +<p>Vulkan 1.2 also came with a number of optional extensions such as +<code>VK_KHR_pipeline_executable_properties</code>. My colleague Iago +Toral wrote an excellent <a +href="https://blogs.igalia.com/itoral/2022/05/09/vk_khr_pipeline_executables/">blog +post</a> on how we implemented that extension on the Raspberry Pi 4 and +what benefits it provides for debugging.</p> <h2 id="vulkan-1.3-support-on-turnip">Vulkan 1.3 support on Turnip</h2> -<p>Igalia has been heavily supporting the Open-Source Turnip Vulkan driver for Qualcomm Adreno GPUs, and in 2022 we helped it achieve Vulkan 1.3 conformance. Danylo Piliaiev on the graphics team here at Igalia, wrote a great <a href="https://blogs.igalia.com/dpiliaiev/turnip-vulkan-1-3/">blog post</a> on this achievement! One of the biggest challenges for the Turnip driver is that it is a completely reverse-engineered driver that has been built without access to any hardware documentation or reference driver code.</p> -<p>With Vulkan 1.3 conformance has also come the ability to run more commercial games on Adreno GPUs through the use of the DirectX translation layers. 
If you would like to see more of this check out this <a href="https://blogs.igalia.com/dpiliaiev/turnip-july-2022-update/">post</a> from Danylo where he talks about getting “The Witcher 3”, “The Talos Principle”, and “OMD2” running on the A660 GPU. Outside of Vulkan 1.3 support he also talks about some of the extensions that were implemented to allow “Zink” (the OpenGL over Vulkan driver) to run Turnip, and bring OpenGL 4.6 support to Adreno GPUs.</p> +<p>Igalia has been heavily supporting the Open-Source Turnip Vulkan +driver for Qualcomm Adreno GPUs, and in 2022 we helped it achieve Vulkan +1.3 conformance. Danylo Piliaiev on the graphics team here at Igalia, +wrote a great <a +href="https://blogs.igalia.com/dpiliaiev/turnip-vulkan-1-3/">blog +post</a> on this achievement! One of the biggest challenges for the +Turnip driver is that it is a completely reverse-engineered driver that +has been built without access to any hardware documentation or reference +driver code.</p> +<p>With Vulkan 1.3 conformance has also come the ability to run more +commercial games on Adreno GPUs through the use of the DirectX +translation layers. If you would like to see more of this check out this +<a +href="https://blogs.igalia.com/dpiliaiev/turnip-july-2022-update/">post</a> +from Danylo where he talks about getting “The Witcher 3”, “The Talos +Principle”, and “OMD2” running on the A660 GPU. Outside of Vulkan 1.3 +support he also talks about some of the extensions that were implemented +to allow “Zink” (the OpenGL over Vulkan driver) to run Turnip, and bring +OpenGL 4.6 support to Adreno GPUs.</p> <p><div class="youtube-video"><iframe src="https://www.youtube.com/embed/oVFWy25uiXA"></iframe></div></p> <h2 id="vulkan-extensions">Vulkan Extensions</h2> -<p>Several developers on the Graphics Team made several key contributions to Vulkan Extensions and the Vulkan conformance test suite (CTS). 
My colleague Ricardo Garcia made an excellent <a href="https://rg3.name/202212122137.html">blog post</a> about those contributions. Below I’ve listed what Igalia did for each of the extensions:</p> +<p>Several developers on the Graphics Team made several key +contributions to Vulkan Extensions and the Vulkan conformance test suite +(CTS). My colleague Ricardo Garcia made an excellent <a +href="https://rg3.name/202212122137.html">blog post</a> about those +contributions. Below I’ve listed what Igalia did for each of the +extensions:</p> <ul> <li>VK_EXT_image_2d_view_of_3d <ul> -<li>We reviewed the spec and are listed as contributors to this extension</li> +<li>We reviewed the spec and are listed as contributors to this +extension</li> </ul></li> <li>VK_EXT_shader_module_identifier <ul> -<li>We reviewed the spec, contributed to it, and created tests for this extension</li> +<li>We reviewed the spec, contributed to it, and created tests for this +extension</li> </ul></li> <li>VK_EXT_attachment_feedback_loop_layout <ul> @@ -80,48 +132,177 @@ <li>We wrote tests and reviewed the spec for this extension</li> </ul></li> </ul> -<h2 id="amdgpu-kernel-driver-contributions">AMDGPU kernel driver contributions</h2> -<p>Our resident “Not an AMD expert” Melissa Wen made several contributions to the AMDGPU driver. Those contributions include connecting parts of the <a href="https://lore.kernel.org/amd-gfx/20220329201835.2393141-1-mwen@igalia.com/">pixel blending and post blending code in AMD’s <code>DC</code> module to <code>DRM</code></a> and <a href="https://lore.kernel.org/amd-gfx/20220804161349.3561177-1-mwen@igalia.com/">fixing a bug related to how panel orientation is set when a display is connected</a>. 
She also had a <a href="https://indico.freedesktop.org/event/2/contributions/50/">presentation at XDC 2022</a>, where she talks about techniques you can use to understand and debug AMDGPU, even when there aren’t hardware docs available.</p> -<p>André Almeida also completed and submitted work on <a href="https://lore.kernel.org/dri-devel/20220714191745.45512-1-andrealmeid@igalia.com/">enabling logging features for the new GFXOFF hardware feature in AMD GPUs</a>. He also created a userspace application (which you can find <a href="https://gitlab.freedesktop.org/andrealmeid/gfxoff_tool">here</a>) that lets you interact with this feature through the <code>debugfs</code> interface. Additionally, he submitted a <a href="https://lore.kernel.org/dri-devel/20220929184307.258331-1-contact@emersion.fr/">patch</a> for async page flips (which he also talked about in his <a href="https://indico.freedesktop.org/event/2/contributions/61/">XDC 2022 presentation</a>) which is yet to be merged.</p> -<h2 id="modesetting-without-glamor-on-rpi">Modesetting without Glamor on RPi</h2> -<p>Christopher Michael joined the Graphics Team in 2022 and along with Chema Casanova made some key contributions to enabling hardware acceleration and mode setting on the Raspberry Pi without the use of <a href="https://www.freedesktop.org/wiki/Software/Glamor/">Glamor</a> which allows making more video memory available to graphics applications running on a Raspberry Pi.</p> -<p>The older generation Raspberry Pis (1-3) only have a maximum of 256MB of memory available for video memory, and using Glamor will consume part of that video memory. Christopher wrote an excellent <a href="https://blogs.igalia.com/cmichael/2022/05/30/modesetting-a-glamor-less-rpi-adventure/">blog post</a> on this work. 
Both he and Chema also had a joint presentation at XDC 2022 going into more detail on this work.</p> +<h2 id="amdgpu-kernel-driver-contributions">AMDGPU kernel driver +contributions</h2> +<p>Our resident “Not an AMD expert” Melissa Wen made several +contributions to the AMDGPU driver. Those contributions include +connecting parts of the <a +href="https://lore.kernel.org/amd-gfx/20220329201835.2393141-1-mwen@igalia.com/">pixel +blending and post blending code in AMD’s <code>DC</code> module to +<code>DRM</code></a> and <a +href="https://lore.kernel.org/amd-gfx/20220804161349.3561177-1-mwen@igalia.com/">fixing +a bug related to how panel orientation is set when a display is +connected</a>. She also had a <a +href="https://indico.freedesktop.org/event/2/contributions/50/">presentation +at XDC 2022</a>, where she talks about techniques you can use to +understand and debug AMDGPU, even when there aren’t hardware docs +available.</p> +<p>André Almeida also completed and submitted work on <a +href="https://lore.kernel.org/dri-devel/20220714191745.45512-1-andrealmeid@igalia.com/">enabling +logging features for the new GFXOFF hardware feature in AMD GPUs</a>. He +also created a userspace application (which you can find <a +href="https://gitlab.freedesktop.org/andrealmeid/gfxoff_tool">here</a>) +that lets you interact with this feature through the +<code>debugfs</code> interface. 
Additionally, he submitted a <a +href="https://lore.kernel.org/dri-devel/20220929184307.258331-1-contact@emersion.fr/">patch</a> +for async page flips (which he also talked about in his <a +href="https://indico.freedesktop.org/event/2/contributions/61/">XDC 2022 +presentation</a>) which is yet to be merged.</p> +<h2 id="modesetting-without-glamor-on-rpi">Modesetting without Glamor on +RPi</h2> +<p>Christopher Michael joined the Graphics Team in 2022 and along with +Chema Casanova made some key contributions to enabling hardware +acceleration and mode setting on the Raspberry Pi without the use of <a +href="https://www.freedesktop.org/wiki/Software/Glamor/">Glamor</a> +which allows making more video memory available to graphics applications +running on a Raspberry Pi.</p> +<p>The older generation Raspberry Pis (1-3) only have a maximum of 256MB +of memory available for video memory, and using Glamor will consume part +of that video memory. Christopher wrote an excellent <a +href="https://blogs.igalia.com/cmichael/2022/05/30/modesetting-a-glamor-less-rpi-adventure/">blog +post</a> on this work. Both he and Chema also had a joint presentation +at XDC 2022 going into more detail on this work.</p> <h2 id="linux-format-magazine-column">Linux Format Magazine Column</h2> -<p>Our very own Samuel Iglesias had a column published in Linux Format Magazine. It’s a short column about reaching Vulkan 1.1 conformance for v3dv & Turnip Vulkan drivers, and how Open-Source GPU drivers can go from a “hobby project” to the de facto driver for the platform. Check it out on page 7 of <a href="https://linuxformat.com/linux-format-288.html">issue #288</a>!</p> +<p>Our very own Samuel Iglesias had a column published in Linux Format +Magazine. It’s a short column about reaching Vulkan 1.1 conformance for +v3dv & Turnip Vulkan drivers, and how Open-Source GPU drivers can go +from a “hobby project” to the de facto driver for the platform. 
Check it +out on page 7 of <a +href="https://linuxformat.com/linux-format-288.html">issue #288</a>!</p> <h2 id="xdc-2022">XDC 2022</h2> -<p>X.Org Developers Conference is one of the big conferences for us here at the Graphics Team. Last year at XDC 2022 our Team presented 5 talks in Minneapolis, Minnesota. XDC 2022 took place towards the end of the year in October, so it provides some good context on how the team closed out the year. If you didn’t attend or missed their presentation, here’s a breakdown:</p> -<h3 id="replacing-the-geometry-pipeline-with-mesh-shaders-ricardo-garcía"><a href="https://indico.freedesktop.org/event/2/contributions/48/">“Replacing the geometry pipeline with mesh shaders”</a> (Ricardo García)</h3> -<p>Ricardo presents what exactly mesh shaders are in Vulkan. He made many contributions to this extension including writing 1000s of CTS tests for this extension with a <a href="https://rg3.name/202210222107.html">blog post</a> on his presentation that you should check out!</p> +<p>X.Org Developers Conference is one of the big conferences for us here +at the Graphics Team. Last year at XDC 2022 our Team presented 5 talks +in Minneapolis, Minnesota. XDC 2022 took place towards the end of the +year in October, so it provides some good context on how the team closed +out the year. If you didn’t attend or missed their presentation, here’s +a breakdown:</p> +<h3 +id="replacing-the-geometry-pipeline-with-mesh-shaders-ricardo-garcía"><a +href="https://indico.freedesktop.org/event/2/contributions/48/">“Replacing +the geometry pipeline with mesh shaders”</a> (Ricardo García)</h3> +<p>Ricardo presents what exactly mesh shaders are in Vulkan. 
He made +many contributions to this extension including writing 1000s of CTS +tests for this extension with a <a +href="https://rg3.name/202210222107.html">blog post</a> on his +presentation that you should check out!</p> <p><div class="youtube-video"><iframe src="https://www.youtube.com/embed/aRNJ4xj_nDs"></iframe></div></p> -<h3 id="status-of-vulkan-on-raspberry-pi-iago-toral"><a href="https://indico.freedesktop.org/event/2/contributions/68/">“Status of Vulkan on Raspberry Pi”</a> (Iago Toral)</h3> -<p>Iago goes into detail about the current status of the Raspberry Pi Vulkan driver. He talks about achieving Vulkan 1.2 conformance, as well as some of the challenges the team had to solve due to hardware limitations of the Broadcom GPU.</p> +<h3 id="status-of-vulkan-on-raspberry-pi-iago-toral"><a +href="https://indico.freedesktop.org/event/2/contributions/68/">“Status +of Vulkan on Raspberry Pi”</a> (Iago Toral)</h3> +<p>Iago goes into detail about the current status of the Raspberry Pi +Vulkan driver. 
He talks about achieving Vulkan 1.2 conformance, as well +as some of the challenges the team had to solve due to hardware +limitations of the Broadcom GPU.</p> <p><div class="youtube-video"><iframe src="https://www.youtube.com/embed/GM9IojyzCVM"></iframe></div></p> -<h3 id="enable-hardware-acceleration-for-gl-applications-without-glamor-on-xorg-modesetting-driver-jose-maría-casanova-christopher-michael"><a href="https://indico.freedesktop.org/event/2/contributions/60/">“Enable hardware acceleration for GL applications without Glamor on Xorg modesetting driver”</a> (Jose María Casanova, Christopher Michael)</h3> -<p>Chema and Christopher talk about the challenges they had to solve to enable hardware acceleration on the Raspberry Pi without Glamor.</p> +<h3 +id="enable-hardware-acceleration-for-gl-applications-without-glamor-on-xorg-modesetting-driver-jose-maría-casanova-christopher-michael"><a +href="https://indico.freedesktop.org/event/2/contributions/60/">“Enable +hardware acceleration for GL applications without Glamor on Xorg +modesetting driver”</a> (Jose María Casanova, Christopher Michael)</h3> +<p>Chema and Christopher talk about the challenges they had to solve to +enable hardware acceleration on the Raspberry Pi without Glamor.</p> <p><div class="youtube-video"><iframe src="https://www.youtube.com/embed/Bo_MOM7JTeQ"></iframe></div></p> -<h3 id="im-not-an-amd-expert-but-melissa-wen"><a href="https://indico.freedesktop.org/event/2/contributions/50/">“I’m not an AMD expert, but…”</a> (Melissa Wen)</h3> -<p>In this non-technical presentation, Melissa talks about techniques developers can use to understand and debug drivers without access to hardware documentation.</p> +<h3 id="im-not-an-amd-expert-but-melissa-wen"><a +href="https://indico.freedesktop.org/event/2/contributions/50/">“I’m not +an AMD expert, but…”</a> (Melissa Wen)</h3> +<p>In this non-technical presentation, Melissa talks about techniques +developers can use to understand and debug drivers 
without access to +hardware documentation.</p> <p><div class="youtube-video"><iframe src="https://www.youtube.com/embed/CMm-yhsMB7U"></iframe></div></p> -<h3 id="async-page-flip-in-atomic-api-andré-almeida"><a href="https://indico.freedesktop.org/event/2/contributions/61/">“Async page flip in atomic API”</a> (André Almeida)</h3> -<p>André talks about the work that has been done to enable asynchronous page flipping in DRM’s atomic API with an introduction to the topic by explaining what exactly an asynchronous page flip is, and why you would want it.</p> +<h3 id="async-page-flip-in-atomic-api-andré-almeida"><a +href="https://indico.freedesktop.org/event/2/contributions/61/">“Async +page flip in atomic API”</a> (André Almeida)</h3> +<p>André talks about the work that has been done to enable asynchronous +page flipping in DRM’s atomic API with an introduction to the topic by +explaining what exactly an asynchronous page flip is, and why you +would want it.</p> <p><div class="youtube-video"><iframe src="https://www.youtube.com/embed/qayPPIfrqtE"></iframe></div></p> <h2 id="fosdem-2022">FOSDEM 2022</h2> -<p>Another important conference for us is FOSDEM, and last year we presented 3 of the 5 talks in the graphics dev room. FOSDEM took place in early February 2022, so these talks provide some good context of where the team started in 2022.</p> -<h3 id="the-status-of-turnip-driver-development-hyunjun-ko"><a href="https://archive.fosdem.org/2022/schedule/event/turnip/">The status of Turnip driver development</a> (Hyunjun Ko)</h3> -<p>Hyunjun presented the current state of the Turnip driver, also talking about the difficulties of developing a driver for a platform without hardware documentation. He talks about how Turnip developers reverse engineer the behaviour of the hardware, and then implement that in an open-source driver. 
He also made a companion <a href="https://blogs.igalia.com/zzoon/graphics/mesa/2022/02/21/complement-story/">blog post</a> to check out along with his presentation.</p> -<h3 id="v3dv-status-update-for-open-source-vulkan-driver-for-raspberry-pi-4-alejandro-piñeiro"><a href="https://archive.fosdem.org/2022/schedule/event/v3dv/">v3dv: Status Update for Open Source Vulkan Driver for Raspberry Pi 4</a> (Alejandro Piñeiro)</h3> -<p>Igalia has been presenting the status of the v3dv driver since December 2019 and in this presentation, Alejandro talks about the status of the v3dv driver in early 2022. He talks about achieving conformance, the extensions that had to be implemented, and the future plans of the v3dv driver.</p> -<h3 id="fun-with-border-colors-in-vulkan-ricardo-garcia"><a href="https://archive.fosdem.org/2022/schedule/event/vulkan_borders/">Fun with border colors in Vulkan</a> (Ricardo Garcia)</h3> -<p>Ricardo presents the work he did on the <code>VK_EXT_border_color_swizzle</code> extension in Vulkan. He talks about the specific contributions he made and how the extension fits in with sampling color operations in Vulkan.</p> +<p>Another important conference for us is FOSDEM, and last year we +presented 3 of the 5 talks in the graphics dev room. FOSDEM took place +in early February 2022, so these talks provide some good context of where +the team started in 2022.</p> +<h3 id="the-status-of-turnip-driver-development-hyunjun-ko"><a +href="https://archive.fosdem.org/2022/schedule/event/turnip/">The status +of Turnip driver development</a> (Hyunjun Ko)</h3> +<p>Hyunjun presented the current state of the Turnip driver, also +talking about the difficulties of developing a driver for a platform +without hardware documentation. He talks about how Turnip developers +reverse engineer the behaviour of the hardware, and then implement that +in an open-source driver. 
He also made a companion <a +href="https://blogs.igalia.com/zzoon/graphics/mesa/2022/02/21/complement-story/">blog +post</a> to check out along with his presentation.</p> +<h3 +id="v3dv-status-update-for-open-source-vulkan-driver-for-raspberry-pi-4-alejandro-piñeiro"><a +href="https://archive.fosdem.org/2022/schedule/event/v3dv/">v3dv: Status +Update for Open Source Vulkan Driver for Raspberry Pi +4</a> (Alejandro +Piñeiro)</h3> +<p>Igalia has been presenting the status of the v3dv driver since +December 2019 and in this presentation, Alejandro talks about the status +of the v3dv driver in early 2022. He talks about achieving conformance, +the extensions that had to be implemented, and the future plans of the +v3dv driver.</p> +<h3 id="fun-with-border-colors-in-vulkan-ricardo-garcia"><a +href="https://archive.fosdem.org/2022/schedule/event/vulkan_borders/">Fun +with border colors in Vulkan</a> (Ricardo Garcia)</h3> +<p>Ricardo presents the work he did on the +<code>VK_EXT_border_color_swizzle</code> extension in Vulkan. He talks +about the specific contributions he made and how the extension fits in +with sampling color operations in Vulkan.</p> <h2 id="gsoc-igalia-ce">GSoC & Igalia CE</h2> -<p>Last year Melissa & André co-mentored contributors working on introducing KUnit tests to the AMD display driver. This project was hosted as a <a href="https://summerofcode.withgoogle.com/">“Google Summer of Code” (GSoC)</a> project from the X.Org Foundation. If you’re interested in seeing their work, Tales da Aparecida, Maíra Canal, Magali Lemes, and Isabella Basso presented their work at the <a href="https://lpc.events/event/16/contributions/1310/">Linux Plumbers Conference 2022</a> and across two talks at XDC 2022. 
Here you can see their <a href="https://indico.freedesktop.org/event/2/contributions/65/">first</a> presentation and here you can see their <a href="https://indico.freedesktop.org/event/2/contributions/164/">second</a> presentation.</p> -<p>André & Melissa also mentored two <a href="https://www.igalia.com/coding-experience/">“Igalia Coding Experience” (CE)</a> projects, one related to IGT GPU test tools on the VKMS kernel driver, and the other for IGT GPU test tools on the V3D kernel driver. If you’re interested in reading up on some of that work, Maíra Canal <a href="https://mairacanal.github.io/january-update-finishing-my-igalia-ce/">wrote about her experience</a> being part of the Igalia CE.</p> -<p>Ella Stanforth was also part of the Igalia Coding Experience, being mentored by Iago & Alejandro. They worked on the <code>VK_KHR_sampler_ycbcr_conversion</code> extension for the v3dv driver. Alejandro talks about their work in his <a href="https://blogs.igalia.com/apinheiro/2023/01/v3dv-status-update-2023-01/">blog post here</a>.</p> +<p>Last year Melissa & André co-mentored contributors working on +introducing KUnit tests to the AMD display driver. This project was +hosted as a <a href="https://summerofcode.withgoogle.com/">“Google +Summer of Code” (GSoC)</a> project from the X.Org Foundation. If you’re +interested in seeing their work, Tales da Aparecida, Maíra Canal, Magali +Lemes, and Isabella Basso presented their work at the <a +href="https://lpc.events/event/16/contributions/1310/">Linux Plumbers +Conference 2022</a> and across two talks at XDC 2022. 
Here you can see +their <a +href="https://indico.freedesktop.org/event/2/contributions/65/">first</a> +presentation and here you can see their <a +href="https://indico.freedesktop.org/event/2/contributions/164/">second</a> +presentation.</p> +<p>André & Melissa also mentored two <a +href="https://www.igalia.com/coding-experience/">“Igalia Coding +Experience” (CE)</a> projects, one related to IGT GPU test tools on the +VKMS kernel driver, and the other for IGT GPU test tools on the V3D +kernel driver. If you’re interested in reading up on some of that work, +Maíra Canal <a +href="https://mairacanal.github.io/january-update-finishing-my-igalia-ce/">wrote +about her experience</a> being part of the Igalia CE.</p> +<p>Ella Stanforth was also part of the Igalia Coding Experience, being +mentored by Iago & Alejandro. They worked on the +<code>VK_KHR_sampler_ycbcr_conversion</code> extension for the v3dv +driver. Alejandro talks about their work in his <a +href="https://blogs.igalia.com/apinheiro/2023/01/v3dv-status-update-2023-01/">blog +post here</a>.</p> <h1 id="whats-next">What’s Next?</h1> -<p>The graphics team is looking forward to having a jam-packed 2023 with just as many if not more contributions to the Open-Source graphics stack! 
I’m super excited to be part of the team, and hope to see my name +in our 2023 recap post!</p> +<p>Also, you might have heard that <a +href="https://www.igalia.com/2022/xdc-2023">Igalia will be hosting XDC +2023</a> in the beautiful city of A Coruña! We hope to see you there +where there will be many presentations from all the great people working +on the Open-Source graphics stack, and most importantly where you can <a +href="https://www.youtube.com/watch?v=7hWcu8O9BjM">dream in the +Atlantic!</a></p> <figure> -<img src="https://www.igalia.com/assets/i/news/XDC-event-banner.jpg" alt="Photo of A Coruña" /><figcaption aria-hidden="true">Photo of A Coruña</figcaption> +<img src="https://www.igalia.com/assets/i/news/XDC-event-banner.jpg" +alt="Photo of A Coruña" /> +<figcaption aria-hidden="true">Photo of A Coruña</figcaption> </figure> </div> </div> </main> diff --git a/html/notes/baremetal-risc-v.html b/html/notes/baremetal-risc-v.html index 01c6094..000b21a 100644 --- a/html/notes/baremetal-risc-v.html +++ b/html/notes/baremetal-risc-v.html @@ -21,6 +21,7 @@ <div class="header-links"> <a href="/now.html" class="header-link">Now</a> <a href="/about.html" class="header-link">About</a> + <a rel="me" href="https://mastodon.social/@hazematman">Social</a> </div> </div> <main> @@ -42,15 +43,53 @@ <div class="note-divider"></div> <div class="main-container"> <div class="note-body"> -<p>After re-watching suckerpinch’s <a href="https://www.youtube.com/watch?v=ar9WRwCiSr0">“Reverse Emulation”</a> video I got inspired to try and replicate what he did, but instead do it on an N64. Now my idea here is not to perform reverse emulation on the N64 itself but instead to use the SBC as a cheap way to make a dev focused flash cart. 
Seeing that suckerpinch was able to meet the timings of the NES bus made me think it might be possible to meet the N64 bus timings taking an approach similar to his.</p> +<p>After re-watching suckerpinch’s <a +href="https://www.youtube.com/watch?v=ar9WRwCiSr0">“Reverse +Emulation”</a> video I got inspired to try and replicate what he did, +but instead do it on an N64. Now my idea here is not to perform reverse +emulation on the N64 itself but instead to use the SBC as a cheap way to +make a dev focused flash cart. Seeing that suckerpinch was able to meet +the timings of the NES bus made me think it might be possible to meet +the N64 bus timings taking an approach similar to his.</p> <h2 id="why-risc-v-baremetal">Why RISC-V Baremetal?</h2> -<p>The answer here is more utilitarian than idealistic, I originally wanted to use a Raspberry Pi since I thought that board may be more accessible if other people want to try and replicate this project. Instead what I found is that it is impossible to procure a Raspberry Pi. Not to be deterred I purchased a <a href="https://linux-sunxi.org/Allwinner_Nezha">“Allwinner Nezha”</a> a while back and it’s just been collecting dust in my storage. I figured this would be a good project to test the board out on since it has a large amount of RAM (1GB on my board), a fast processor (1 GHz), and accessible GPIO. As for why baremetal? Well one of the big problems suckerpinch ran into was being interrupted by the Linux kernel while his software was running. The board was fast enough to respond to the bus timings but Linux would throw off those timings with preemption. This is why I’m taking the approach to do everything baremetal, giving 100% of the CPU time to my program emulating the CPU bus.</p> +<p>The answer here is more utilitarian than idealistic, I originally +wanted to use a Raspberry Pi since I thought that board may be more +accessible if other people want to try and replicate this project. 
Instead what I found is that it is impossible to procure a Raspberry Pi. +Not to be deterred I purchased a <a +href="https://linux-sunxi.org/Allwinner_Nezha">“Allwinner Nezha”</a> a +while back and it’s just been collecting dust in my storage. I figured +this would be a good project to test the board out on since it has a +large amount of RAM (1GB on my board), a fast processor (1 GHz), and +accessible GPIO. As for why baremetal? Well one of the big problems +suckerpinch ran into was being interrupted by the Linux kernel while his +software was running. The board was fast enough to respond to the bus +timings but Linux would throw off those timings with preemption. This is +why I’m taking the approach to do everything baremetal, giving 100% of +the CPU time to my program emulating the CPU bus.</p> <h2 id="risc-v-baremetal-development">RISC-V Baremetal Development</h2> -<p>Below I’ll document how I got a baremetal program running on the Nezha board, to provide guidance to anyone who wants to try doing something like this themselves.</p> +<p>Below I’ll document how I got a baremetal program running on the +Nezha board, to provide guidance to anyone who wants to try doing +something like this themselves.</p> <h3 id="toolchain-setup">Toolchain Setup</h3> -<p>In order to do any RISC-V development we will need to set up a RISC-V toolchain that isn’t tied to a specific OS like Linux. Thankfully the RISC-V org set up a simple-to-use git repo that has a script to build an entire RISC-V toolchain on your machine. Since you’re building the whole toolchain from source this will take some time; on my machine (Ryzen 4500u, 16GB of RAM, 1TB PCIe NVMe storage) it took around ~30 minutes to build the whole toolchain. You can find the repo <a href="https://github.com/riscv-collab/riscv-gnu-toolchain">here</a>, and follow the instructions in the <code>Installation (Newlib)</code> section of the README. 
That will setup a bare bones OS independent toolchain that can use newlib for the cstdlib (not that I am currently using it in my software).</p> +<p>In order to do any RISC-V development we will need to setup a RISC-V +toolchain that isn’t tied to a specific OS like linux. Thankfully the +RISC-V org set up a simple to use git repo that has a script to build an +entire RISC-V toolchain on your machine. Since you’re building the whole +toolchain from source this will take some time on my machine (Ryzen +4500u, 16GB of RAM, 1TB PCIe NVMe storage), it took around ~30 minutes +to build the whole tool chain. You can find the repo <a +href="https://github.com/riscv-collab/riscv-gnu-toolchain">here</a>, and +follow the instructions in the <code>Installation (Newlib)</code> +section of the README. That will setup a bare bones OS independent +toolchain that can use newlib for the cstdlib (not that I am currently +using it in my software).</p> <h3 id="setting-up-a-program">Setting up a Program</h3> -<p>This is probably one of the more complicated steps in baremetal programming as this will involve setting up a linker script, which can sometimes feel like an act of black magic to get right. I’ll try to walk through some linker script basics to show how I setup mine. The linker script <code>linker.ld</code> I’m using is below</p> +<p>This is probably one of the more complicated steps in baremetal +programming as this will involve setting up a linker script, which can +sometimes feel like an act of black magic to get right. I’ll try to walk +through some linker script basics to show how I setup mine. The linker +script <code>linker.ld</code> I’m using is below</p> <pre class="ld"><code>SECTIONS { . 
= 0x45000000; @@ -80,29 +119,53 @@ *(.comment); } }</code></pre> -<p>The purpose of a linkscript is to describe how our binary will be organized, the script I wrote will do the follow</p> +<p>The purpose of a linkscript is to describe how our binary will be +organized, the script I wrote will do the follow</p> <ol type="1"> -<li>Start the starting address offset to <code>0x45000000</code>, This is the address we are going to load the binary into memory, so any pointers in the program will need to be offset from this address</li> -<li>start the binary off with the <code>.text</code> section which will contain the executable code, in the text section we want the code for <code>.text.start</code> to come first. this is the code that implements the “C runtime”. That is this is the code with the <code>_start</code> function that will setup the stack pointer and call into the C <code>main</code> function. After that we will place the text for all the other functions in our binary. We keep this section aligned to <code>4096</code> bytes, and the <code>PROVIDE</code> functions creates a symbol with a pointer to that location in memory. We won’t use the text start and end pointers in our program but it can be useful if you want to know stuff about your binary at runtime of your program</li> -<li>Next is the <code>.data</code> section that has all the data for our program. Here you can see I also added the <code>rodata</code> or read only section to the data section. The reason I did this is because I’m not going to bother with properly implementing read only data. We also keep the data aligned to 16 bytes to ensure that every memory access will be aligned for a 64bit RISCV memory access.</li> -<li>The last “section” is not a real section but some extra padding at the end to reserve the stack. 
Here I am reserving 4096 (4Kb) for the stack of my program.</li>
-<li>Lastly I’m going to discard a few sections that GCC will compile into the binary that I don’t need at all.</li>
+<li>Set the starting address to <code>0x45000000</code>. This is the
+address we are going to load the binary to in memory, so any pointers in
+the program will need to be offset from this address.</li>
+<li>Start the binary off with the <code>.text</code> section, which will
+contain the executable code. In the text section we want the code for
+<code>.text.start</code> to come first: this is the code that implements
+the “C runtime”, i.e. the code with the <code>_start</code> function
+that will set up the stack pointer and call into the C
+<code>main</code> function. After that we place the text for all the
+other functions in our binary. We keep this section aligned to
+<code>4096</code> bytes, and the <code>PROVIDE</code> function creates a
+symbol with a pointer to that location in memory. We won’t use the text
+start and end pointers in our program, but they can be useful if you
+want to know things about your binary at runtime.</li>
+<li>Next is the <code>.data</code> section, which has all the data for
+our program. Here you can see I also added <code>rodata</code>, the
+read-only section, to the data section. I did this because I’m not going
+to bother with properly implementing read-only data. We also keep the
+data aligned to 16 bytes to ensure that every memory access will be
+aligned for a 64-bit RISC-V memory access.</li>
+<li>The last “section” is not a real section but some extra padding at
+the end to reserve the stack. Here I am reserving 4096 bytes (4 KiB) for
+the stack of my program.</li>
+<li>Lastly I’m going to discard a few sections that GCC will compile
+into the binary that I don’t need at all.</li> </ol>
-<p>Now this probably isn’t the best way to write a linker script. 
For example the stack is just kind of a hack in it, and I don’t implement the <code>.bss</code> section for zero initialized data.</p> -<p>With this linker script we can now setup a basic program, we can use the code presented below as the <code>main.c</code> file</p> +<p>Now this probably isn’t the best way to write a linker script. For +example the stack is just kind of a hack in it, and I don’t implement +the <code>.bss</code> section for zero initialized data.</p> +<p>With this linker script we can now setup a basic program, we can use +the code presented below as the <code>main.c</code> file</p> <div class="sourceCode" id="cb2"><pre class="sourceCode c"><code class="sourceCode c"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="pp">#include </span><span class="im"><stdint.h></span></span> <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a></span> -<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a><span class="pp">#define UART0_BASE 0x02500000</span></span> -<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a><span class="pp">#define UART0_DATA_REG (UART0_BASE + 0x0000)</span></span> -<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a><span class="pp">#define UART0_USR (UART0_BASE + 0x007c)</span></span> +<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a><span class="pp">#define UART0_BASE </span><span class="bn">0x02500000</span></span> +<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a><span class="pp">#define UART0_DATA_REG </span><span class="op">(</span>UART0_BASE<span class="pp"> </span><span class="op">+</span><span class="pp"> </span><span class="bn">0x0000</span><span class="op">)</span></span> +<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a><span class="pp">#define UART0_USR </span><span class="op">(</span>UART0_BASE<span class="pp"> </span><span class="op">+</span><span 
class="pp"> </span><span class="bn">0x007c</span><span class="op">)</span></span> <span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a></span> -<span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a><span class="pp">#define write_reg(r, v) write_reg_handler((volatile uint32_t*)(r), (v))</span></span> +<span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a><span class="pp">#define write_reg</span><span class="op">(</span><span class="pp">r</span><span class="op">,</span><span class="pp"> v</span><span class="op">)</span><span class="pp"> write_reg_handler</span><span class="op">((</span><span class="dt">volatile</span><span class="pp"> </span><span class="dt">uint32_t</span><span class="op">*)(</span><span class="pp">r</span><span class="op">),</span><span class="pp"> </span><span class="op">(</span><span class="pp">v</span><span class="op">))</span></span> <span id="cb2-8"><a href="#cb2-8" aria-hidden="true" tabindex="-1"></a><span class="dt">void</span> write_reg_handler<span class="op">(</span><span class="dt">volatile</span> <span class="dt">uint32_t</span> <span class="op">*</span>reg<span class="op">,</span> <span class="dt">const</span> <span class="dt">uint32_t</span> value<span class="op">)</span></span> <span id="cb2-9"><a href="#cb2-9" aria-hidden="true" tabindex="-1"></a><span class="op">{</span></span> <span id="cb2-10"><a href="#cb2-10" aria-hidden="true" tabindex="-1"></a> reg<span class="op">[</span><span class="dv">0</span><span class="op">]</span> <span class="op">=</span> value<span class="op">;</span></span> <span id="cb2-11"><a href="#cb2-11" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span> <span id="cb2-12"><a href="#cb2-12" aria-hidden="true" tabindex="-1"></a></span> -<span id="cb2-13"><a href="#cb2-13" aria-hidden="true" tabindex="-1"></a><span class="pp">#define read_reg(r) read_reg_handler((volatile uint32_t*)(r))</span></span> +<span id="cb2-13"><a href="#cb2-13" 
aria-hidden="true" tabindex="-1"></a><span class="pp">#define read_reg</span><span class="op">(</span><span class="pp">r</span><span class="op">)</span><span class="pp"> read_reg_handler</span><span class="op">((</span><span class="dt">volatile</span><span class="pp"> </span><span class="dt">uint32_t</span><span class="op">*)(</span><span class="pp">r</span><span class="op">))</span></span> <span id="cb2-14"><a href="#cb2-14" aria-hidden="true" tabindex="-1"></a><span class="dt">uint32_t</span> read_reg_handler<span class="op">(</span><span class="dt">volatile</span> <span class="dt">uint32_t</span> <span class="op">*</span>reg<span class="op">)</span></span> <span id="cb2-15"><a href="#cb2-15" aria-hidden="true" tabindex="-1"></a><span class="op">{</span></span> <span id="cb2-16"><a href="#cb2-16" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> reg<span class="op">[</span><span class="dv">0</span><span class="op">];</span></span> @@ -122,35 +185,90 @@ <span id="cb2-30"><a href="#cb2-30" aria-hidden="true" tabindex="-1"></a></span> <span id="cb2-31"><a href="#cb2-31" aria-hidden="true" tabindex="-1"></a><span class="dt">int</span> main<span class="op">()</span></span> <span id="cb2-32"><a href="#cb2-32" aria-hidden="true" tabindex="-1"></a><span class="op">{</span></span> -<span id="cb2-33"><a href="#cb2-33" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span><span class="op">(</span><span class="dt">const</span> <span class="dt">char</span> <span class="op">*</span>c <span class="op">=</span> hello_world<span class="op">;</span> c<span class="op">[</span><span class="dv">0</span><span class="op">]</span> <span class="op">!=</span> <span class="ch">'\0'</span><span class="op">;</span> c<span class="op">++)</span></span> +<span id="cb2-33"><a href="#cb2-33" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span><span class="op">(</span><span class="dt">const</span> <span class="dt">char</span> <span class="op">*</span>c 
<span class="op">=</span> hello_world<span class="op">;</span> c<span class="op">[</span><span class="dv">0</span><span class="op">]</span> <span class="op">!=</span> <span class="ch">'</span><span class="sc">\0</span><span class="ch">'</span><span class="op">;</span> c<span class="op">++)</span></span> <span id="cb2-34"><a href="#cb2-34" aria-hidden="true" tabindex="-1"></a> <span class="op">{</span></span> <span id="cb2-35"><a href="#cb2-35" aria-hidden="true" tabindex="-1"></a> _putchar<span class="op">(</span>c<span class="op">);</span></span> <span id="cb2-36"><a href="#cb2-36" aria-hidden="true" tabindex="-1"></a> <span class="op">}</span></span> <span id="cb2-37"><a href="#cb2-37" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span></code></pre></div> -<p>This program will write the string “Hello World!” to the serial port. Now a common question for code like this is how did I know to set all the <code>UART0</code> registers? Well the way to find this information is to look at the datasheet, programmer’s manual, or user manual for the chip you are using. In this case we are using an Allwinner D1 and we can find the user manual with all the registers on the linux-sunxi page <a href="https://linux-sunxi.org/D1">here</a>. On pages 900 to 940 we can see a description on how the serial works for this SoC. I also looked at the schematic <a href="https://dl.linux-sunxi.org/D1/D1_Nezha_development_board_schematic_diagram_20210224.pdf">here</a>, to see that the serial port we have is wired to <code>UART0</code> on the SoC. From here we are relying on uboot to boot the board which will setup the serial port for us, which means we can just write to the UART data register to start printing content to the console.</p> -<p>We will also need need to setup a basic assembly program to setup the stack and call our main function. 
Below you can see my example called <code>start.S</code></p>
-<div class="sourceCode" id="cb3"><pre class="sourceCode asm"><code class="sourceCode fasm"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>.<span class="bu">section</span> <span class="op">.</span>text<span class="op">.</span>start</span>
+<p>This program will write the string “Hello World!” to the serial port.
+Now a common question for code like this is how did I know how to set
+all the <code>UART0</code> registers? Well the way to find this
+information is to look at the datasheet, programmer’s manual, or user
+manual for the chip you are using. In this case we are using an
+Allwinner D1 and we can find the user manual with all the registers on
+the linux-sunxi page <a href="https://linux-sunxi.org/D1">here</a>. On
+pages 900 to 940 we can see a description of how the serial port works
+for this SoC. I also looked at the schematic <a
+href="https://dl.linux-sunxi.org/D1/D1_Nezha_development_board_schematic_diagram_20210224.pdf">here</a>,
+to see that the serial port we have is wired to <code>UART0</code> on
+the SoC. From here we are relying on uboot to boot the board, which will
+set up the serial port for us; this means we can just write to the UART
+data register to start printing content to the console.</p>
+<p>We will also need to set up a basic assembly program to set up the
+stack and call our main function. 
Below you can see my example called
+<code>start.S</code></p>
+<div class="sourceCode" id="cb3"><pre
+class="sourceCode asm"><code class="sourceCode fasm"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>.<span class="bu">section</span> <span class="op">.</span>text<span class="op">.</span>start</span> <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a> .global _start</span> <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a><span class="fu">_start:</span></span> <span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a> la <span class="kw">sp</span><span class="op">,</span> __stack_start</span> <span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a> j main</span></code></pre></div>
+<p>This assembly file just creates a section called
+<code>.text.start</code> and a global symbol for a function called
+<code>_start</code>, which will be the first function our program
+executes. All this assembly file does is set the stack pointer
+register <code>sp</code> to the address (using the load address
+<code>la</code> pseudo-instruction) of the stack we set up in the linker
+script, and then call the main function by jumping directly to it.</p> <h3 id="building-the-program">Building the Program</h3>
-<p>Building the program is pretty straight forward, we need to tell gcc to build the two source files without including the c standard library, and then to link the binary using our linker script. 
we can do this with the following command</p>
+<p>Building the program is pretty straightforward: we need to tell gcc
+to build the two source files without including the C standard library,
+and then to link the binary using our linker script. We can do this with
+the following commands</p> <pre><code>riscv64-unknown-elf-gcc -march=rv64g --std=gnu99 -msmall-data-limit=0 -c main.c
riscv64-unknown-elf-gcc -march=rv64g --std=gnu99 -msmall-data-limit=0 -c start.S
riscv64-unknown-elf-gcc -march=rv64g -ffreestanding -nostdlib -msmall-data-limit=0 -T linker.ld start.o main.o -o app.elf
riscv64-unknown-elf-objcopy -O binary app.elf app.bin</code></pre>
-<p>This will build our source files into <code>.o</code> files first, then combine those <code>.o</code> files into a <code>.elf</code> file, finally converting the <code>.elf</code> into a raw binary file where we use the <code>.bin</code> extension. We need a raw binary file as we want to just load our program into memory and begin executing. If we load the <code>.elf</code> file it will have the elf header and other extra data that is not executable in it. In order to run a <code>.elf</code> file we would need an elf loader, which goes beyond the scope of this example.</p>
+<p>This will build our source files into <code>.o</code> files first,
+then combine those <code>.o</code> files into a <code>.elf</code> file,
+and finally convert the <code>.elf</code> into a raw binary file with
+the <code>.bin</code> extension. We need a raw binary file because we
+want to just load our program into memory and begin executing. If we
+load the <code>.elf</code> file it will have the ELF header and other
+extra data that is not executable in it. In order to run a
+<code>.elf</code> file we would need an ELF loader, which goes beyond
+the scope of this example.</p> <h3 id="running-the-program">Running the Program</h3>
-<p>Now we have the raw binary its time to try and load it. 
I found that the uboot configuration that comes with the board has pretty limited support for loading binaries. So we are going to take advantage of the <code>loadx</code> command to load the binary over serial. In the uboot terminal we are going to run the command:</p>
+<p>Now that we have the raw binary it’s time to try and load it. I found
+that the uboot configuration that comes with the board has pretty
+limited support for loading binaries. So we are going to take advantage
+of the <code>loadx</code> command to load the binary over serial. In the
+uboot terminal we are going to run the command:</p> <pre><code>loadx 45000000</code></pre>
-<p>Now the next steps will depend on which serial terminal you are using. We want to use the <code>XMODEM</code> protocol to load the binary. In the serial terminal I am using <code>gnu screen</code> you can execute arbitrary programs and send their output to the serial terminal. You can do this by hitting the key combination “CTRL-A + :” and then typing in <code>exec !! sx app.bin</code>. This will send the binary to the serial terminal using the XMODEM protocol. If you are not using GNU screen look up instructions for how to send an XMODEM binary. Now that the binary is loaded we can type in</p>
+<p>Now the next steps will depend on which serial terminal you are
+using. We want to use the <code>XMODEM</code> protocol to load the
+binary. In <code>GNU screen</code>, the serial terminal I am using, you
+can execute arbitrary programs and send their output to the serial
+terminal. You can do this by hitting the key combination “CTRL-A + :”
+and then typing in <code>exec !! sx app.bin</code>. This will send the
+binary over the serial connection using the XMODEM protocol. If you are
+not using GNU screen, look up instructions for how to send an XMODEM
+binary. 
+Now that the binary is loaded we can type in</p> <pre><code>go 45000000</code></pre>
-<p>The should start to execute the program and you should see <code>Hello World!</code> printed to the console!</p>
-<p><img src="/assets/2022-06-09-baremetal-risc-v/riscv-terminal.png" /></p>
+<p>This should start executing the program and you should see
+<code>Hello World!</code> printed to the console!</p>
+<p><img
+src="/assets/2022-06-09-baremetal-risc-v/riscv-terminal.png" /></p> <h2 id="whats-next">What’s Next?</h2>
-<p>Well the sky is the limit! We have a method to load and run a program that can do anything on the Nezha board now. Looking through the datasheet we can see how to access the GPIO on the board to blink an LED. If you’re really ambitious you could try getting ethernet or USB working in a baremetal environment. I am going to continue on my goal of emulating the N64 cartridge bus which will require me to get GPIO working as well as interrupts on the GPIO lines. If you want to see the current progress of my work you can check it out on github <a href="https://github.com/Hazematman/N64-Cart-Emulator">here</a>.</p>
+<p>Well the sky is the limit! We have a method to load and run a program
+that can do anything on the Nezha board now. Looking through the
+datasheet we can see how to access the GPIO on the board to blink an
+LED. If you’re really ambitious you could try getting ethernet or USB
+working in a baremetal environment. I am going to continue on my goal of
+emulating the N64 cartridge bus, which will require me to get GPIO
+working as well as interrupts on the GPIO lines. 
If you want to see the
+current progress of my work you can check it out on github <a
+href="https://github.com/Hazematman/N64-Cart-Emulator">here</a>.</p> </div> </div> </main> </body> diff --git a/html/notes/digital_garden.html b/html/notes/digital_garden.html index a8f14dc..572956b 100644 --- a/html/notes/digital_garden.html +++ b/html/notes/digital_garden.html @@ -21,6 +21,7 @@ <div class="header-links"> <a href="/now.html" class="header-link">Now</a> <a href="/about.html" class="header-link">About</a>
+ <a rel="me" href="https://mastodon.social/@hazematman">Social</a> </div> </div> <main> @@ -42,11 +43,27 @@ <div class="note-divider"></div> <div class="main-container"> <div class="note-body">
-<p>After reading Maggie Appleton page on <a href="https://maggieappleton.com/garden-history">digital gardens</a> I was inspired to convert my own website into a digital garden.</p>
-<p>I have many half baked ideas that I seem to be able to finish. Some of them get to a published state like <a href="/notes/rasterizing-triangles.html">Rasterizing Triangles</a> and <a href="/notes/baremetal-risc-v.html">Baremetal RISC-V</a>, but many of them never make it to the published state. The idea of digital garden seems very appealing to me, as it encourages you to post on a topic even if you haven’t made it “publishable” yet.</p>
+<p>After reading Maggie Appleton’s page on <a
+href="https://maggieappleton.com/garden-history">digital gardens</a> I
+was inspired to convert my own website into a digital garden.</p>
+<p>I have many half-baked ideas that I never seem to be able to finish.
+Some of them get to a published state like <a
+href="/notes/rasterizing-triangles.html">Rasterizing Triangles</a> and
+<a href="/notes/baremetal-risc-v.html">Baremetal RISC-V</a>, but many of
+them never make it to the published state. 
The idea of a digital garden
+seems very appealing to me, as it encourages you to post on a topic even
+if you haven’t made it “publishable” yet.</p> <h2 id="how-this-site-works">How this site works</h2>
-<p>I wanted a bit of challenge when putting together this website as I don’t do a lot of web development in my day to day life, so I thought it would be a good way to learn more things. This site has been entirely built from scratch using a custom static site generator I setup with pandoc. It relies on pandoc’s filters to implement some of the classic “Digital Garden” features like back linking. The back linking feature has not been totally developed yet and right now it just provides with a convenient way to link to other notes or pages on this site.</p>
-<p>I hope to develop this section more and explain how I got various features in pandoc to work as a static site generator.</p>
+<p>I wanted a bit of a challenge when putting together this website as I
+don’t do a lot of web development in my day-to-day life, so I thought it
+would be a good way to learn more things. This site has been entirely
+built from scratch using a custom static site generator I set up with
+pandoc. It relies on pandoc’s filters to implement some of the classic
+“Digital Garden” features like back linking. 
The back linking feature
+has not been totally developed yet; right now it just provides a
+convenient way to link to other notes or pages on this site.</p>
+<p>I hope to develop this section more and explain how I got various
+features in pandoc to work as a static site generator.</p> </div> </div> </main> </body> diff --git a/html/notes/freedreno_journey.html b/html/notes/freedreno_journey.html index 03a7dc4..671a12d 100644 --- a/html/notes/freedreno_journey.html +++ b/html/notes/freedreno_journey.html @@ -21,6 +21,7 @@ <div class="header-links"> <a href="/now.html" class="header-link">Now</a> <a href="/about.html" class="header-link">About</a>
+ <a rel="me" href="https://mastodon.social/@hazematman">Social</a> </div> </div> <main> @@ -43,15 +44,66 @@ <div class="main-container"> <div class="note-body"> <figure>
-<img src="/assets/freedreno/glinfo_freedreno.png" alt="Android running Freedreno" /><figcaption aria-hidden="true">Android running Freedreno</figcaption>
+<img src="/assets/freedreno/glinfo_freedreno.png"
+alt="Android running Freedreno" />
+<figcaption aria-hidden="true">Android running Freedreno</figcaption> </figure>
-<p>As part of my training at Igalia I’ve been attempting to write a new backend for Freedreno that targets the proprietary “KGSL” kernel mode driver. For those unaware there are two “main” kernel mode drivers on Qualcomm SOCs for the GPU, there is the “MSM”, and “KGSL”. “MSM” is DRM compliant, and Freedreno already able to run on this driver. “KGSL” is the proprietary KMD that Qualcomm’s proprietary userspace driver targets. Now why would you want to run freedreno against KGSL, when MSM exists? Well there are a few ones, first MSM only really works on an up-streamed kernel, so if you have to run a down-streamed kernel you can continue using the version of KGSL that the manufacturer shipped with your device. 
Second this allows you to run both the proprietary adreno driver and the open source freedreno driver on the same device just by swapping libraries, which can be very nice for quickly testing something against both drivers.</p>
+<p>As part of my training at Igalia I’ve been attempting to write a new
+backend for Freedreno that targets the proprietary “KGSL” kernel mode
+driver. For those unaware, there are two “main” kernel mode drivers for
+the GPU on Qualcomm SoCs: “MSM” and “KGSL”. “MSM” is DRM
+compliant, and Freedreno is already able to run on this driver. “KGSL”
+is the proprietary KMD that Qualcomm’s proprietary userspace driver
+targets. Now why would you want to run freedreno against KGSL, when MSM
+exists? Well there are a few reasons: first, MSM only really works on an
+upstream kernel, so if you have to run a downstream kernel you can
+continue using the version of KGSL that the manufacturer shipped with
+your device. Second, this allows you to run both the proprietary adreno
+driver and the open source freedreno driver on the same device just by
+swapping libraries, which can be very nice for quickly testing something
+against both drivers.</p> <h2 id="when-drm-isnt-just-drm">When “DRM” isn’t just “DRM”</h2>
-<p>When working on a new backend, one of the critical things to do is to make use of as much “common code” as possible. This has a number of benefits, least of all reducing the amount of code you have to write. It also allows reduces the number of bugs that will likely exist as you are relying on well tested code, and it ensures that the backend is mostly likely going to continue to work with new driver updates.</p>
-<p>When I started the work for a new backend I looked inside mesa’s <code>src/freedreno/drm</code> folder. This has the current backend code for Freedreno, and its already modularized to support multiple backends. 
It currently has support for the above mentioned MSM kernel mode driver as well as virtio (a backend that allows Freedreno to be used from within in a virtualized environment). From the name of this path, you would think that the code in this module would only work with kernel mode drivers that implement DRM, but actually there is only a handful of places in this module where DRM support is assumed. This made it a good starting point to introduce the KGSL backend and piggy back off the common code.</p>
-<p>For example the <code>drm</code> module has a lot of code to deal with the management of synchronization primitives, buffer objects, and command submit lists. All managed at a abstraction above “DRM” and to re-implement this code would be a bad idea.</p>
+<p>When working on a new backend, one of the critical things to do is to
+make use of as much “common code” as possible. This has a number of
+benefits, not least of which is reducing the amount of code you have to
+write. It also reduces the number of bugs that will likely exist, as you
+are relying on well-tested code, and it ensures that the backend is most
+likely going to continue to work with new driver updates.</p>
+<p>When I started the work for a new backend I looked inside mesa’s
+<code>src/freedreno/drm</code> folder. This has the current backend code
+for Freedreno, and it’s already modularized to support multiple backends.
+It currently has support for the above-mentioned MSM kernel mode driver
+as well as virtio (a backend that allows Freedreno to be used from
+within a virtualized environment). From the name of this path, you
+would think that the code in this module would only work with kernel
+mode drivers that implement DRM, but actually there is only a handful of
+places in this module where DRM support is assumed. 
This made it a good
+starting point to introduce the KGSL backend and piggyback off the
+common code.</p>
+<p>For example, the <code>drm</code> module has a lot of code to deal
+with the management of synchronization primitives, buffer objects, and
+command submit lists, all managed at an abstraction above “DRM”;
+re-implementing this code would be a bad idea.</p> <h2 id="how-to-get-android-to-behave">How to get Android to behave</h2>
-<p>One of this big struggles with getting the KGSL backend working was figuring out how I could get Android to load mesa instead of Qualcomm blob driver that is shipped with the device image. Thankfully a good chunk of this work has already been figured out when the Turnip developers (Turnip is the open source Vulkan implementation for Adreno GPUs) figured out how to get Turnip running on android with KGSL. Thankfully one of my coworkers <a href="https://blogs.igalia.com/dpiliaiev/">Danylo</a> is one of those Turnip developers, and he gave me a lot of guidance on getting Android setup. One thing to watch out for is the outdated instructions <a href="https://docs.mesa3d.org/android.html">here</a>. These instructions <em>almost</em> work, but require some modifications. First if you’re using a more modern version of the Android NDK, the compiler has been replaced with LLVM/Clang, so you need to change which compiler is being used. Second flags like <code>system</code> in the cross compiler script incorrectly set the system as <code>linux</code> instead of <code>android</code>. I had success using the below cross compiler script. Take note that the compiler paths need to be updated to match where you extracted the android NDK on your system.</p>
+<p>One of the big struggles with getting the KGSL backend working was
+figuring out how I could get Android to load mesa instead of the
+Qualcomm blob driver that is shipped with the device image. 
Thankfully a good +chunk of this work has already been figured out when the Turnip +developers (Turnip is the open source Vulkan implementation for Adreno +GPUs) figured out how to get Turnip running on android with KGSL. +Thankfully one of my coworkers <a +href="https://blogs.igalia.com/dpiliaiev/">Danylo</a> is one of those +Turnip developers, and he gave me a lot of guidance on getting Android +setup. One thing to watch out for is the outdated instructions <a +href="https://docs.mesa3d.org/android.html">here</a>. These instructions +<em>almost</em> work, but require some modifications. First if you’re +using a more modern version of the Android NDK, the compiler has been +replaced with LLVM/Clang, so you need to change which compiler is being +used. Second flags like <code>system</code> in the cross compiler script +incorrectly set the system as <code>linux</code> instead of +<code>android</code>. I had success using the below cross compiler +script. Take note that the compiler paths need to be updated to match +where you extracted the android NDK on your system.</p> <pre class="meson"><code>[binaries] ar = '/home/lfryzek/Documents/projects/igalia/freedreno/android-ndk-r25b-linux/android-ndk-r25b/toolchains/llvm/prebuilt/linux-x86_64/bin/llvm-ar' c = ['ccache', '/home/lfryzek/Documents/projects/igalia/freedreno/android-ndk-r25b-linux/android-ndk-r25b/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android29-clang'] @@ -69,24 +121,137 @@ system = 'android' cpu_family = 'arm' cpu = 'armv8' endian = 'little'</code></pre> -<p>Another thing I had to figure out with Android, that was different with these instructions, was how I would get Android to load mesa versions of mesa libraries. That’s when my colleague <a href="https://www.igalia.com/team/mark">Mark</a> pointed out to me that Android is open source and I could just check the source code myself. 
Sure enough you have find the OpenGL driver loader in <a href="https://android.googlesource.com/platform/frameworks/native/+/master/opengl/libs/EGL/Loader.cpp">Android’s source code</a>. From this code we can that Android will try to load a few different files based on some settings, and in my case it would try to load 3 different shaded libraries in the <code>/vendor/lib64/egl</code> folder, <code>libEGL_adreno.so</code> ,<code>libGLESv1_CM_adreno.so</code>, and <code>libGLESv2.so</code>. I could just replace these libraries with the version built from mesa and voilà, you’re now loading a custom driver! This realization that I could just “read the code” was very powerful in debugging some more android specific issues I ran into, like dealing with gralloc.</p>
-<p>Something cool that the opensource Freedreno & Turnip driver developers figured out was getting android to run test OpenGL applications from the adb shell without building android APKs. If you check out the <a href="https://gitlab.freedesktop.org/freedreno/freedreno">freedreno repo</a>, they have an <code>ndk-build.sh</code> script that can build tests in the <code>tests-*</code> folder. The nice benefit of this is that it provides an easy way to run simple test cases without worrying about the android window system integration. Another nifty feature about this repo is the <code>libwrap</code> tool that lets trace the commands being submitted to the GPU.</p>
+<p>Another thing I had to figure out with Android, that was different
+with these instructions, was how I would get Android to load the mesa
+versions of these libraries. That’s when my colleague <a
+href="https://www.igalia.com/team/mark">Mark</a> pointed out to me that
+Android is open source and I could just check the source code myself.
+Sure enough you can find the OpenGL driver loader in <a
+href="https://android.googlesource.com/platform/frameworks/native/+/master/opengl/libs/EGL/Loader.cpp">Android’s
+source code</a>. 
From this code we can that Android will try to load a +few different files based on some settings, and in my case it would try +to load 3 different shaded libraries in the +<code>/vendor/lib64/egl</code> folder, <code>libEGL_adreno.so</code> +,<code>libGLESv1_CM_adreno.so</code>, and <code>libGLESv2.so</code>. I +could just replace these libraries with the version built from mesa and +voilà, you’re now loading a custom driver! This realization that I could +just “read the code” was very powerful in debugging some more android +specific issues I ran into, like dealing with gralloc.</p> +<p>Something cool that the opensource Freedreno & Turnip driver +developers figured out was getting android to run test OpenGL +applications from the adb shell without building android APKs. If you +check out the <a +href="https://gitlab.freedesktop.org/freedreno/freedreno">freedreno +repo</a>, they have an <code>ndk-build.sh</code> script that can build +tests in the <code>tests-*</code> folder. The nice benefit of this is +that it provides an easy way to run simple test cases without worrying +about the android window system integration. Another nifty feature about +this repo is the <code>libwrap</code> tool that lets trace the commands +being submitted to the GPU.</p> <h2 id="what-even-is-gralloc">What even is Gralloc?</h2> -<p>Gralloc is the graphics memory allocated in Android, and the OS will use it to allocate the surface for “windows”. This means that the memory we want to render the display to is managed by gralloc and not our KGSL backend. This means we have to get all the information about this surface from gralloc, and if you look in <code>src/egl/driver/dri2/platform_android.c</code> you will see existing code for handing gralloc. You would think “Hey there is no work for me here then”, but you would be wrong. The handle gralloc provides is hardware specific, and the code in <code>platform_android.c</code> assumes a DRM gralloc implementation. 
Thankfully the Turnip developers had already gone through this struggle, and if you look in <code>src/freedreno/vulkan/tu_android.c</code> you can see they have implemented a separate path when a Qualcomm msm implementation of gralloc is detected. I could copy this detection logic and add a separate path to <code>platform_android.c</code>.</p>
-<h2 id="working-with-the-freedreno-community">Working with the Freedreno community</h2>
-<p>When working on any project (open-source or otherwise), it’s nice to know that you aren’t working alone. Thankfully the <code>#freedreno</code> channel on <code>irc.oftc.net</code> is very active and full of helpful people to answer any questions you may have. While working on the backend, one area I wasn’t really sure how to address was the synchronization code for buffer objects. The backend exposed a function called <code>cpu_prep</code>; this function was just there to call the DRM implementation of <code>cpu_prep</code> on the buffer object. I wasn’t exactly sure how to implement this functionality with KGSL since it doesn’t use DRM buffer objects.</p>
-<p>I ended up reaching out to the IRC channel and Rob Clark on the channel explained to me that he was actually working on moving a lot of the code for <code>cpu_prep</code> into common code so that a non-DRM driver (like the KGSL backend I was working on) would just need to implement that operation as a NOP (no operation).</p>
-<h2 id="dealing-with-bugs-reverse-engineering-the-blob">Dealing with bugs & reverse engineering the blob</h2>
-<p>I encountered a few different bugs when implementing the KGSL backend, but most of them consisted of me calling KGSL incorrectly or handling synchronization incorrectly. Thankfully since Turnip is already running on KGSL, I could just more carefully compare my code to what Turnip is doing and figure out my logical mistake.</p>
-<p>Some of the bugs I encountered required the backend interface in Freedreno to be modified to expose a new per-driver implementation of that backend function, instead of just using a common implementation. For example, the existing function to map a buffer object into userspace assumed that the same <code>fd</code> for the device could be used for the buffer object in the <code>mmap</code> call. This worked fine for any buffer objects we created through KGSL but would not work for buffer objects created from gralloc (remember the above section on surface memory for windows coming from gralloc). To resolve this issue I exposed a new per-backend implementation of “map” where I could take a different path if the buffer object came from gralloc.</p>
-<p>While testing the KGSL backend I did encounter a new bug that seems to affect both my new KGSL backend and the Turnip KGSL backend. The bug is an <code>iommu fault</code> that occurs when the surface allocated by gralloc does not have a height that is aligned to 4. The blitting engine on a6xx GPUs copies in 16x4 chunks, so if the height is not aligned by 4 the GPU will try to write to pixels that exist outside the allocated memory. This issue only happens with KGSL backends since we import memory from gralloc, and gralloc allocates exactly enough memory for the surface, with no alignment on the height. If running on any other platform, the <code>fdl</code> (Freedreno Layout) code would be called to compute the minimum required size for a surface, which would take into account the alignment requirement for the height. The Qualcomm blob driver didn’t seem to have this problem, even though it’s getting the exact same buffer from gralloc. 
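To make the overrun concrete, here is a small Python sketch of the arithmetic. The 16x4 chunk size and the align-to-4 height requirement come from the description above; the 640x481 surface size is my own illustrative example, not a value from the driver:

```python
def align(value, alignment):
    # Round `value` up to the next multiple of `alignment`.
    return (value + alignment - 1) // alignment * alignment

# Hypothetical gralloc surface with a height that is not 4-aligned.
width, height = 640, 481
blit_chunk_height = 4  # a6xx blitter copies 16x4 pixel chunks

# gralloc sizes the buffer for exactly `height` rows, but the blitter
# touches rows up to the aligned height.
rows_written = align(height, blit_chunk_height)
extra_rows = rows_written - height
print(rows_written, extra_rows)  # 484 3
```

Those 3 extra rows are exactly the out-of-bounds writes that trigger the iommu fault; on non-KGSL platforms the `fdl` layout code would have sized the allocation to 484 rows in the first place.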
So it must be doing something different to handle the non-aligned height.</p>
-<p>Because this issue relied on gralloc, the application needed to be running as an Android APK to get a surface from gralloc. The best way to fix this issue would be to figure out what the blob driver is doing and try to replicate this behavior in Freedreno (assuming it isn’t doing something silly like switching to sysmem rendering). Unfortunately it didn’t look like the libwrap library worked to trace an APK.</p>
-<p>The libwrap library relied on a Linux feature known as <code>LD_PRELOAD</code> to load <code>libwrap.so</code> when the application starts and replace system functions like <code>open</code> and <code>ioctl</code> with its own implementations that trace what is being submitted to the KGSL kernel mode driver. Thankfully Android exposes this <code>LD_PRELOAD</code> mechanism through its “wrap” interface, where you create a property called <code>wrap.<app-name></code> with a value of <code>LD_PRELOAD=<path to libwrap.so></code>. Android will then load your library as would be done in a normal Linux shell. If you tried to do this with libwrap though, you would find very quickly that you would get corrupted traces. When Android launches your APK, it doesn’t only launch your application; there are different threads for different Android system-related functions, and some of them can also use OpenGL. The libwrap library is not designed to handle multiple threads using KGSL at the same time. 
After discovering this issue I created a <a href="https://gitlab.freedesktop.org/freedreno/freedreno/-/merge_requests/22">MR</a> that stores the tracing file handles in TLS (thread local storage), preventing the clobbering of the trace file and also allowing you to view the traces generated by different threads separately from each other.</p>
-<p>With this in hand, one could begin investigating what the blob driver is doing to handle these unaligned surfaces.</p>
+<p>Gralloc is the graphics memory allocator in Android, and the OS will
+use it to allocate the surface for “windows”. This means that the memory
+we want to render the display to is managed by gralloc and not our KGSL
+backend, so we have to get all the information about this
+surface from gralloc, and if you look in
+<code>src/egl/driver/dri2/platform_android.c</code> you will see
+existing code for handling gralloc. You would think “Hey there is no work
+for me here then”, but you would be wrong. The handle gralloc provides
+is hardware-specific, and the code in <code>platform_android.c</code>
+assumes a DRM gralloc implementation. Thankfully the Turnip developers
+had already gone through this struggle, and if you look in
+<code>src/freedreno/vulkan/tu_android.c</code> you can see they have
+implemented a separate path when a Qualcomm msm implementation of
+gralloc is detected. I could copy this detection logic and add a
+separate path to <code>platform_android.c</code>.</p>
+<h2 id="working-with-the-freedreno-community">Working with the Freedreno
+community</h2>
+<p>When working on any project (open-source or otherwise), it’s nice to
+know that you aren’t working alone. Thankfully the
+<code>#freedreno</code> channel on <code>irc.oftc.net</code> is very
+active and full of helpful people to answer any questions you may have.
+While working on the backend, one area I wasn’t really sure how to
+address was the synchronization code for buffer objects. The backend
+exposed a function called <code>cpu_prep</code>; this function was just
+there to call the DRM implementation of <code>cpu_prep</code> on the
+buffer object. I wasn’t exactly sure how to implement this functionality
+with KGSL since it doesn’t use DRM buffer objects.</p>
+<p>I ended up reaching out to the IRC channel and Rob Clark on the
+channel explained to me that he was actually working on moving a lot of
+the code for <code>cpu_prep</code> into common code so that a non-DRM
+driver (like the KGSL backend I was working on) would just need to
+implement that operation as a NOP (no operation).</p>
+<h2 id="dealing-with-bugs-reverse-engineering-the-blob">Dealing with
+bugs & reverse engineering the blob</h2>
+<p>I encountered a few different bugs when implementing the KGSL
+backend, but most of them consisted of me calling KGSL incorrectly or
+handling synchronization incorrectly. Thankfully since Turnip is already running
+on KGSL, I could just more carefully compare my code to what Turnip is
+doing and figure out my logical mistake.</p>
+<p>Some of the bugs I encountered required the backend interface in
+Freedreno to be modified to expose a new per-driver implementation
+of that backend function, instead of just using a common implementation.
+For example, the existing function to map a buffer object into userspace
+assumed that the same <code>fd</code> for the device could be used for
+the buffer object in the <code>mmap</code> call. This worked fine for
+any buffer objects we created through KGSL but would not work for buffer
+objects created from gralloc (remember the above section on surface
+memory for windows coming from gralloc). To resolve this issue I
+exposed a new per-backend implementation of “map” where I could take a
+different path if the buffer object came from gralloc.</p>
+<p>While testing the KGSL backend I did encounter a new bug that seems
+to affect both my new KGSL backend and the Turnip KGSL backend. The bug
+is an <code>iommu fault</code> that occurs when the surface allocated by
+gralloc does not have a height that is aligned to 4. The blitting engine
+on a6xx GPUs copies in 16x4 chunks, so if the height is not aligned by 4
+the GPU will try to write to pixels that exist outside the allocated
+memory. This issue only happens with KGSL backends since we import
+memory from gralloc, and gralloc allocates exactly enough memory for the
+surface, with no alignment on the height. If running on any other
+platform, the <code>fdl</code> (Freedreno Layout) code would be called
+to compute the minimum required size for a surface, which would take into
+account the alignment requirement for the height. The Qualcomm blob
+driver didn’t seem to have this problem, even though it’s getting the
+exact same buffer from gralloc. So it must be doing something different
+to handle the non-aligned height.</p>
+<p>Because this issue relied on gralloc, the application needed to be
+running as an Android APK to get a surface from gralloc. The best way to
+fix this issue would be to figure out what the blob driver is doing and
+try to replicate this behavior in Freedreno (assuming it isn’t doing
+something silly like switching to sysmem rendering). Unfortunately it
+didn’t look like the libwrap library worked to trace an APK.</p>
+<p>The libwrap library relied on a Linux feature known as
+<code>LD_PRELOAD</code> to load <code>libwrap.so</code> when the
+application starts and replace system functions like
+<code>open</code> and <code>ioctl</code> with its own implementations
+that trace what is being submitted to the KGSL kernel mode driver.
+Thankfully Android exposes this <code>LD_PRELOAD</code> mechanism
+through its “wrap” interface, where you create a property called
+<code>wrap.<app-name></code> with a value of
+<code>LD_PRELOAD=<path to libwrap.so></code>. Android will then
+load your library as would be done in a normal Linux shell. If you
+tried to do this with libwrap though, you would find very quickly that you
+would get corrupted traces. When Android launches your APK, it doesn’t
+only launch your application; there are different threads for different
+Android system-related functions, and some of them can also use OpenGL.
+The libwrap library is not designed to handle multiple threads using
+KGSL at the same time. After discovering this issue I created a <a
+href="https://gitlab.freedesktop.org/freedreno/freedreno/-/merge_requests/22">MR</a>
+that stores the tracing file handles in TLS (thread local storage),
+preventing the clobbering of the trace file and also allowing you to
+view the traces generated by different threads separately from each
+other.</p>
+<p>With this in hand, one could begin investigating what the blob driver
+is doing to handle these unaligned surfaces.</p>
<h2 id="whats-next">What’s next?</h2>
-<p>Well, the next obvious thing to fix is the height alignment issue, which is still open. I’ve also worked on upstreaming my changes with this <a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21570">WIP MR</a>.</p>
+<p>Well, the next obvious thing to fix is the height alignment issue, which
+is still open. 
I’ve also worked on upstreaming my changes with this <a +href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21570">WIP +MR</a>.</p> <figure> -<img src="/assets/freedreno/3d-mark.png" alt="Freedreno running 3d-mark" /><figcaption aria-hidden="true">Freedreno running 3d-mark</figcaption> +<img src="/assets/freedreno/3d-mark.png" +alt="Freedreno running 3d-mark" /> +<figcaption aria-hidden="true">Freedreno running 3d-mark</figcaption> </figure> </div> </div> </main> diff --git a/html/notes/generating-video.html b/html/notes/generating-video.html index 05c4ee4..66d23bc 100644 --- a/html/notes/generating-video.html +++ b/html/notes/generating-video.html @@ -21,6 +21,7 @@ <div class="header-links"> <a href="/now.html" class="header-link">Now</a> <a href="/about.html" class="header-link">About</a> + <a rel="me" href="https://mastodon.social/@hazematman">Social</a> </div> </div> <main> @@ -42,19 +43,35 @@ <div class="note-divider"></div> <div class="main-container"> <div class="note-body"> -<p>One thing I’m very interested in is computer graphics. This could be complex 3D graphics or simple 2D graphics. The idea of getting a computer to display visual data fascinates me. One fundamental part of showing visual data is interfacing with a computer monitor. This can be accomplished by generating a video signal that the monitor understands. Below I have written instructions on how an FPGA can be used to generate a video signal. I have specifically worked with the iCEBreaker FPGA but the theory contained within this should work with any FPGA or device that you can generate the appropriate timings for.</p> +<p>One thing I’m very interested in is computer graphics. This could be +complex 3D graphics or simple 2D graphics. The idea of getting a +computer to display visual data fascinates me. One fundamental part of +showing visual data is interfacing with a computer monitor. This can be +accomplished by generating a video signal that the monitor understands. 
+Below I have written instructions on how an FPGA can be used to generate +a video signal. I have specifically worked with the iCEBreaker FPGA but +the theory contained within this should work with any FPGA or device +that you can generate the appropriate timings for.</p> <h3 id="tools">Tools</h3> -<p>Hardware used (<a href="https://www.crowdsupply.com/1bitsquared/icebreaker-fpga">link for board</a>):</p> +<p>Hardware used (<a +href="https://www.crowdsupply.com/1bitsquared/icebreaker-fpga">link for +board</a>):</p> <ul> <li>iCEBreaker FPGA</li> <li>iCEBreaker 12-Bit DVI Pmod</li> </ul> <p>Software Used:</p> <ul> -<li>IceStorm FPGA toolchain (<a href="https://github.com/esden/summon-fpga-tools">follow install instructions here</a>)</li> +<li>IceStorm FPGA toolchain (<a +href="https://github.com/esden/summon-fpga-tools">follow install +instructions here</a>)</li> </ul> <h3 id="theory">Theory</h3> -<p>A video signal is composed of several parts, primarily the colour signals and the sync signals. For this DVI Pmod, there is also a data enable signal for the visible screen area. For the example here we are going to be generating a 640x480 60 Hz video signal. Below is a table describing the important data for our video signal.</p> +<p>A video signal is composed of several parts, primarily the colour +signals and the sync signals. For this DVI Pmod, there is also a data +enable signal for the visible screen area. For the example here we are +going to be generating a 640x480 60 Hz video signal. 
Below is a table
+describing the important data for our video signal.</p>
<table>
<tbody>
<tr> @@ -151,21 +168,65 @@ Vertical Back Porch Length <p>The data from this table raises a few questions:</p>
<ol type="1">
<li>What is the Pixel Clock?</li>
-<li>What is the difference between “Pixels/Lines” and “Visible Pixels/Lines”?</li>
+<li>What is the difference between “Pixels/Lines” and “Visible
+Pixels/Lines”?</li>
<li>What is “Front Porch”, “Sync”, and “Back Porch”?</li>
</ol>
<h4 id="pixel-clock">Pixel Clock</h4>
-<p>The pixel clock is a fairly straightforward idea; this is the rate at which we generate pixels. For video signal generation, the “pixel” is a fundamental building block and we count things in the number of pixels it takes up. Every time the pixel clock “ticks” we have incremented the number of pixels we have processed. So for a 640x480 video signal, a full line is 800 pixels, or 800 clock ticks. For the full 800x525 frame there are 800 ticks x 525 lines, or 420000 clock ticks. If we are running the display at 60 Hz, 420000 pixels per frame are generated 60 times per second. Therefore, 25200000 pixels or clock ticks will pass in one second. From this we can see that the pixel clock frequency of 25.175 MHz is roughly equal to 25200000 clock ticks per second. There is a small deviation from the “true” values here, but monitors are flexible enough to accept this video signal (my monitor reports it as 640x480@60Hz), and all information I can find online says that 25.175 MHz is the value you want to use. Later on we will see that the pixel clock is not required to be exactly 25.175 MHz.</p>
-<h4 id="visible-area-vs-invisible-area">Visible Area vs Invisible Area</h4>
-<p><img src="/assets/2020-04-07-generating-video/visible_invisible.png" /></p>
-<p>From the above image we can see that a 640x480 video signal actually generates a resolution larger than 640x480. The true resolution we generate is 800x525, but only a 640x480 portion of that signal is visible. The area that is not visible is where we generate the sync signal. In other words, every part of the above image that is black is where a sync signal is being generated.</p>
-<h4 id="front-porch-back-porch-sync">Front Porch, Back Porch & Sync</h4>
-<p>To better understand the front porch, back porch and sync signal, let’s look at what the horizontal sync signal looks like during the duration of a line:</p>
+<p>The pixel clock is a fairly straightforward idea; this is the rate at
+which we generate pixels. For video signal generation, the “pixel” is a
+fundamental building block and we count things in the number of pixels
+it takes up. Every time the pixel clock “ticks” we have incremented the
+number of pixels we have processed. So for a 640x480 video signal, a
+full line is 800 pixels, or 800 clock ticks. For the full 800x525 frame
+there are 800 ticks x 525 lines, or 420000 clock ticks. If we are running
+the display at 60 Hz, 420000 pixels per frame are generated 60 times per
+second. Therefore, 25200000 pixels or clock ticks will pass in one
+second. From this we can see that the pixel clock frequency of 25.175 MHz is
+roughly equal to 25200000 clock ticks per second. There is a small deviation from
+the “true” values here, but monitors are flexible enough to accept this
+video signal (my monitor reports it as 640x480@60Hz), and all
+information I can find online says that 25.175 MHz is the value you want
+to use. Later on we will see that the pixel clock is not required to be
+exactly 25.175 MHz.</p>
+<h4 id="visible-area-vs-invisible-area">Visible Area vs Invisible
+Area</h4>
+<p><img
+src="/assets/2020-04-07-generating-video/visible_invisible.png" /></p>
+<p>From the above image we can see that a 640x480 video signal actually
+generates a resolution larger than 640x480. The true resolution we
+generate is 800x525, but only a 640x480 portion of that signal is
+visible. The area that is not visible is where we generate the sync
+signal. 
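The pixel-clock arithmetic above is easy to check in a few lines of Python. The horizontal split (640 + 16 + 96 + 48) comes from the timing data; the vertical split (480 + 10 + 2 + 33) is the standard 640x480@60 breakdown implied by the total of 525 lines and the sync on lines 490-491 used later:

```python
# Totals for the 640x480@60 signal described in the text.
pixels_per_line = 640 + 16 + 96 + 48   # visible + front porch + sync + back porch
lines_per_frame = 480 + 10 + 2 + 33    # visible + front porch + sync + back porch
refresh_hz = 60

pixels_per_frame = pixels_per_line * lines_per_frame
pixel_clock_hz = pixels_per_frame * refresh_hz
print(pixels_per_line, lines_per_frame, pixels_per_frame, pixel_clock_hz)
# 800 525 420000 25200000
```

The result, 25.2 MHz, is the "ideal" rate; as the text notes, the nominal 25.175 MHz (and later the PLL's 25.125 MHz) are close enough for real monitors.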
In other words, every part of the above image that is black is +where a sync signal is being generated.</p> +<h4 id="front-porch-back-porch-sync">Front Porch, Back Porch & +Sync</h4> +<p>To better understand the front porch, back porch and sync signal, +let’s look at what the horizontal sync signal looks like during the +duration of a line:</p> <p><img src="/assets/2020-04-07-generating-video/sync.png" /></p> -<p>From this we can see that the “Front Porch” is the invisible pixels between the visible pixels and the sync pixels, and is represented by a logical one or high signal. The “Sync” is the invisible pixels between the front porch and back porch, and is represented by a logical zero or low signal. The “Back Porch” is the invisible pixels after the sync signal, and is represented by a logical one. For the case of 640x480 video, the visible pixel section lasts for 640 pixels. The front porch section lasts for 16 pixels, after which the sync signal will become a logical zero. This logical zero sync will last for 96 pixels, after which the sync signal will become a logical one again. The back porch will then last for 48 pixels. If you do a quick calculation right now of 640 + 16 + 96 + 48, we get 800 pixels which represents the full horizontal resolution of the display. The vertical sync signal works almost exactly the same, except the vertical sync signal acts on lines.</p> +<p>From this we can see that the “Front Porch” is the invisible pixels +between the visible pixels and the sync pixels, and is represented by a +logical one or high signal. The “Sync” is the invisible pixels between +the front porch and back porch, and is represented by a logical zero or +low signal. The “Back Porch” is the invisible pixels after the sync +signal, and is represented by a logical one. For the case of 640x480 +video, the visible pixel section lasts for 640 pixels. The front porch +section lasts for 16 pixels, after which the sync signal will become a +logical zero. 
This logical zero sync will last for 96 pixels, after +which the sync signal will become a logical one again. The back porch +will then last for 48 pixels. If you do a quick calculation right now of +640 + 16 + 96 + 48, we get 800 pixels which represents the full +horizontal resolution of the display. The vertical sync signal works +almost exactly the same, except the vertical sync signal acts on +lines.</p> <h3 id="implementation">Implementation</h3> -<p>The first thing we can do that is going to simplify a lot of the following logic is to keep track of which pixel, and which line we are on. The below code block creates two registers to keep track of the current pixel on the line (column) and the current line (line):</p> -<div class="sourceCode" id="cb1"><pre class="sourceCode verilog"><code class="sourceCode verilog"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a>logic <span class="op">[</span><span class="dv">9</span><span class="op">:</span><span class="dv">0</span><span class="op">]</span> line<span class="op">;</span></span> +<p>The first thing we can do that is going to simplify a lot of the +following logic is to keep track of which pixel, and which line we are +on. 
The below code block creates two registers to keep track of the +current pixel on the line (column) and the current line (line):</p> +<div class="sourceCode" id="cb1"><pre +class="sourceCode verilog"><code class="sourceCode verilog"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a>logic <span class="op">[</span><span class="dv">9</span><span class="op">:</span><span class="dv">0</span><span class="op">]</span> line<span class="op">;</span></span> <span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>logic <span class="op">[</span><span class="dv">9</span><span class="op">:</span><span class="dv">0</span><span class="op">]</span> column<span class="op">;</span></span> <span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a></span> <span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a><span class="kw">always</span> <span class="op">@(</span><span class="kw">posedge</span> clk <span class="dt">or</span> <span class="kw">posedge</span> reset<span class="op">)</span> <span class="kw">begin</span></span> @@ -187,16 +248,52 @@ Vertical Back Porch Length <span id="cb1-20"><a href="#cb1-20" aria-hidden="true" tabindex="-1"></a> <span class="kw">end</span></span> <span id="cb1-21"><a href="#cb1-21" aria-hidden="true" tabindex="-1"></a> <span class="kw">end</span></span> <span id="cb1-22"><a href="#cb1-22" aria-hidden="true" tabindex="-1"></a><span class="kw">end</span></span></code></pre></div> -<p>This block of Verilog works by first initializing the line and column register to zero on a reset. This is important to make sure that we start from known values, otherwise the line and column register could contain any value and our logic would not work. Next, we check if we are at the bottom of the screen by comparing the current column to 799 (the last pixel in the line) and the current line is 524 (the last line in the frame). 
If these conditions are both true then we reset the line and column back to zero to signify that we are starting a new frame. The next block checks if the current column equals 799. Because the above if statement failed, we know that we are at the end of the line but not the end of the frame. If this is true we increment the current line count and set the column back to zero to signify that we are starting a new line. The final block simply increments the current pixel count. If we reach this block, we are neither at the end of the line nor the end of the frame, so we can simply increment to the next pixel.</p>
-<p>Now that we are keeping track of the current column and current line, we can use this information to generate the horizontal and vertical sync signals. From the theory above we know that the sync signal is only low when we are between the front and back porch; at all other times the signal is high. From this we can generate the sync signal with an OR and two compares.</p>
-<div class="sourceCode" id="cb2"><pre class="sourceCode verilog"><code class="sourceCode verilog"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>logic horizontal_sync<span class="op">;</span></span>
+<p>This block of Verilog works by first initializing the line and column
+register to zero on a reset. This is important to make sure that we
+start from known values, otherwise the line and column register could
+contain any value and our logic would not work. Next, we check if we are
+at the bottom of the screen by comparing the current column to 799 (the
+last pixel in the line) and the current line is 524 (the last line in
+the frame). If these conditions are both true then we reset the line and
+column back to zero to signify that we are starting a new frame. The
+next block checks if the current column equals 799. Because the above if
+statement failed, we know that we are at the end of the line but not the
+end of the frame. If this is true we increment the current line count
+and set the column back to zero to signify that we are starting a new
+line. The final block simply increments the current pixel count. If we
+reach this block, we are neither at the end of the line nor the end of
+the frame, so we can simply increment to the next pixel.</p>
+<p>Now that we are keeping track of the current column and current line,
+we can use this information to generate the horizontal and vertical sync
+signals. From the theory above we know that the sync signal is only low
+when we are between the front and back porch; at all other times the
+signal is high. From this we can generate the sync signal with an OR and
+two compares.</p>
+<div class="sourceCode" id="cb2"><pre
+class="sourceCode verilog"><code class="sourceCode verilog"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>logic horizontal_sync<span class="op">;</span></span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a>logic vertical_sync<span class="op">;</span></span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> horizontal_sync <span class="op">=</span> column <span class="op"><</span> <span class="dv">656</span> <span class="op">||</span> column <span class="op">>=</span> <span class="dv">752</span><span class="op">;</span></span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> vertical_sync <span class="op">=</span> line <span class="op"><</span> <span class="dv">490</span> <span class="op">||</span> line <span class="op">>=</span> <span class="dv">492</span><span class="op">;</span></span></code></pre></div>
-<p>Let’s examine the horizontal sync signal more closely. This statement will evaluate to true if the current column is less than 656 or the current column is greater than or equal to 752. 
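As a sanity check on the two compares, here is a small Python model of the same predicates (a sketch, not the Verilog itself) confirming that the sync pulses are exactly 96 columns and 2 lines wide:

```python
def horizontal_sync(column):
    # High everywhere except the sync pulse on columns 656..751.
    return column < 656 or column >= 752

def vertical_sync(line):
    # High everywhere except the sync pulse on lines 490..491.
    return line < 490 or line >= 492

# Sweep one full line and one full frame and collect where sync is low.
hsync_low = [c for c in range(800) if not horizontal_sync(c)]
vsync_low = [l for l in range(525) if not vertical_sync(l)]
print(len(hsync_low), hsync_low[0], hsync_low[-1])  # 96 656 751
print(len(vsync_low), vsync_low[0], vsync_low[-1])  # 2 490 491
```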
This means that the horizontal sync signal will be true except for when the current column is between 656 and 751 inclusively. That is starting on column 656 the horizontal sync signal will become false (low) and will remain that way for the next 96 pixels until we reach pixel 752 where it will return to being true (high). The vertical sync signal will work in the same way except it is turned on based on the current line. Therefore, the signal will remain high when the line is less than 490 and greater than or equal to 492, and will remain low between lines 490 and 491 inclusive.</p> +<p>Let’s examine the horizontal sync signal more closely. This statement +will evaluate to true if the current column is less than 656 or the +current column is greater than or equal to 752. This means that the +horizontal sync signal will be true except for when the current column +is between 656 and 751 inclusively. That is starting on column 656 the +horizontal sync signal will become false (low) and will remain that way +for the next 96 pixels until we reach pixel 752 where it will return to +being true (high). The vertical sync signal will work in the same way +except it is turned on based on the current line. Therefore, the signal +will remain high when the line is less than 490 and greater than or +equal to 492, and will remain low between lines 490 and 491 +inclusive.</p> <h4 id="putting-it-all-together">Putting It All Together</h4> -<p>Now that we have generated the video signal, we need to route it towards the video output connectors on the iCEBreaker 12-bit DVI Pmod. We also need to configure the iCEBreaker FPGA to have the appropriate pixel clock frequency. 
First to get the correct pixel clock we are going to use the following block of code:</p> -<div class="sourceCode" id="cb3"><pre class="sourceCode verilog"><code class="sourceCode verilog"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>SB_PLL40_PAD #<span class="op">(</span></span> +<p>Now that we have generated the video signal, we need to route it +towards the video output connectors on the iCEBreaker 12-bit DVI Pmod. +We also need to configure the iCEBreaker FPGA to have the appropriate +pixel clock frequency. First to get the correct pixel clock we are going +to use the following block of code:</p> +<div class="sourceCode" id="cb3"><pre +class="sourceCode verilog"><code class="sourceCode verilog"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>SB_PLL40_PAD #<span class="op">(</span></span> <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a> .DIVR<span class="op">(</span><span class="bn">4'b0000</span><span class="op">),</span></span> <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a> .DIVF<span class="op">(</span><span class="bn">7'b1000010</span><span class="op">),</span></span> <span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a> .DIVQ<span class="op">(</span><span class="bn">3'b101</span><span class="op">),</span></span> @@ -218,7 +315,13 @@ Vertical Back Porch Length <span id="cb3-20"><a href="#cb3-20" aria-hidden="true" tabindex="-1"></a> .BYPASS<span class="op">(</span><span class="bn">1'b0</span><span class="op">),</span></span> <span id="cb3-21"><a href="#cb3-21" aria-hidden="true" tabindex="-1"></a> .LATCHINPUTVALUE<span class="op">(),</span></span> <span id="cb3-22"><a href="#cb3-22" aria-hidden="true" tabindex="-1"></a><span class="op">);</span></span></code></pre></div> -<p>This block is mainly a copy paste of the PLL setup code from the iCEBreaker examples, but with a few important changes. 
The DIVR, DIVF, and DIVQ values are changed to create a 25.125 MHz clock. This is not exactly 25.175 MHz, but it is close enough that the monitor accepts it and recognizes it as a 640x480@60 Hz signal. These values were found through the “icepll” utility; below is an example of calling this utility from the command line:</p>
+<p>This block is mainly a copy paste of the PLL setup code from the
+iCEBreaker examples, but with a few important changes. The DIVR, DIVF,
+and DIVQ values are changed to create a 25.125 MHz clock. This is not
+exactly 25.175 MHz, but it is close enough that the monitor accepts it
+and recognizes it as a 640x480@60 Hz signal. These values were found
+through the “icepll” utility; below is an example of calling this
+utility from the command line:</p>
<pre><code>$ icepll -i 12 -o 25.175

F_PLLIN:    12.000 MHz (given)

@@ -234,8 +337,14 @@
DIVF:  66 (7'b1000010)
DIVQ:  5 (3'b101)
FILTER_RANGE: 1 (3'b001)</code></pre>
-<p>From here we can see we had an input clock of 12 MHz (this comes from the FTDI chip on the iCEBreaker board), and we wanted to get a 25.175 MHz output clock. The closest the PLL could generate was a 25.125 MHz clock with the settings provided for the DIVR, DIVF, and DIVQ values.</p>
-<p>Now that we have a pixel clock we can wire up the necessary signals for the DVI video out. The DVI Pmod has the following mapping for all of its connectors:</p>
+<p>From here we can see we had an input clock of 12 MHz (this comes from
+the FTDI chip on the iCEBreaker board), and we wanted to get a 25.175
+MHz output clock. The closest the PLL could generate was a 25.125 MHz
+clock with the settings provided for the DIVR, DIVF, and DIVQ
+values.</p>
+<p>Now that we have a pixel clock we can wire up the necessary signals
+for the DVI video out. 
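Those divider values can be sanity-checked numerically. Assuming the iCE40 PLL output formula from the Lattice sysCLOCK PLL documentation, f_out = f_in × (DIVF + 1) / (2^DIVQ × (DIVR + 1)), this short Python sketch (an illustration, not part of the design) reproduces the 25.125 MHz figure and shows how close the nominal 25.175 MHz clock sits to a 60 Hz refresh:

```python
# iCE40 PLL output for the DIVR/DIVF/DIVQ values chosen by icepll.
# Formula assumed from the Lattice sysCLOCK PLL documentation:
#   f_out = f_in * (DIVF + 1) / (2**DIVQ * (DIVR + 1))
f_in = 12e6
divr, divf, divq = 0, 66, 5
f_out = f_in * (divf + 1) / (2**divq * (divr + 1))
print(f_out / 1e6)  # 25.125

# One 640x480@60 Hz frame is 800 total columns x 525 total lines, so
# the nominal 25.175 MHz pixel clock gives a refresh of roughly 59.94 Hz.
refresh = 25.175e6 / (800 * 525)
print(round(refresh, 2))  # 59.94
```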
The DVI Pmod has the following mapping for all of
+its connectors:</p>
<table>
<tbody>
<tr>
@@ -364,8 +473,16 @@
Vertical Sync
</tr>
</tbody>
</table>
-<p>From this we can see that we need 4 bits for each colour channel, a horizontal sync signal, a vertical sync signal, and additionally a data enable signal. The data enable signal is not part of a standard video signal and is just used by the DVI transmitter chip on the Pmod to signify when we are in the visible pixel area or the invisible pixel area. Therefore we will set the data enable line when the current column is less than 640 and the current line is less than 480. Based on this we can connect the outputs like so:</p>
-<div class="sourceCode" id="cb5"><pre class="sourceCode verilog"><code class="sourceCode verilog"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>logic <span class="op">[</span><span class="dv">3</span><span class="op">:</span><span class="dv">0</span><span class="op">]</span> r<span class="op">;</span></span>
+<p>From this we can see that we need 4 bits for each colour channel, a
+horizontal sync signal, a vertical sync signal, and additionally a data
+enable signal. The data enable signal is not part of a standard video
+signal and is just used by the DVI transmitter chip on the Pmod to
+signify when we are in the visible pixel area or the invisible pixel
+area. Therefore we will set the data enable line when the current column
+is less than 640 and the current line is less than 480. 
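The data-enable condition can be modelled the same way as the sync signals. A quick Python check (an illustration, not part of the Verilog) counts how many of the 800x525 counter positions per frame are visible pixels:

```python
# Count counter positions per frame where data enable is asserted
# (column < 640 and line < 480), out of 800 x 525 total positions.
visible = sum(1 for line in range(525) for column in range(800)
              if column < 640 and line < 480)
print(visible)  # 307200, i.e. exactly 640 * 480
print(round(visible / (800 * 525), 2))  # 0.73 of each frame is visible
```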
Based on this we +can connect the outputs like so:</p> +<div class="sourceCode" id="cb5"><pre +class="sourceCode verilog"><code class="sourceCode verilog"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>logic <span class="op">[</span><span class="dv">3</span><span class="op">:</span><span class="dv">0</span><span class="op">]</span> r<span class="op">;</span></span> <span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>logic <span class="op">[</span><span class="dv">3</span><span class="op">:</span><span class="dv">0</span><span class="op">]</span> g<span class="op">;</span></span> <span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a>logic <span class="op">[</span><span class="dv">3</span><span class="op">:</span><span class="dv">0</span><span class="op">]</span> b<span class="op">;</span></span> <span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a>logic data_enable<span class="op">;</span></span> @@ -374,12 +491,18 @@ Vertical Sync <span id="cb5-7"><a href="#cb5-7" aria-hidden="true" tabindex="-1"></a> <span class="op">{</span>r<span class="op">[</span><span class="dv">3</span><span class="op">],</span> r<span class="op">[</span><span class="dv">2</span><span class="op">],</span> g<span class="op">[</span><span class="dv">3</span><span class="op">],</span> g<span class="op">[</span><span class="dv">2</span><span class="op">],</span> r<span class="op">[</span><span class="dv">1</span><span class="op">],</span> r<span class="op">[</span><span class="dv">0</span><span class="op">],</span> g<span class="op">[</span><span class="dv">1</span><span class="op">],</span> g<span class="op">[</span><span class="dv">0</span><span class="op">]};</span></span> <span id="cb5-8"><a href="#cb5-8" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> <span class="op">{</span>P1B1<span class="op">,</span> P1B2<span class="op">,</span> P1B3<span class="op">,</span> P1B4<span class="op">,</span> 
P1B7<span class="op">,</span> P1B8<span class="op">,</span> P1B9<span class="op">,</span> P1B10<span class="op">}</span> <span class="op">=</span> </span> <span id="cb5-9"><a href="#cb5-9" aria-hidden="true" tabindex="-1"></a> <span class="op">{</span>b<span class="op">[</span><span class="dv">3</span><span class="op">],</span> pixel_clock<span class="op">,</span> b<span class="op">[</span><span class="dv">2</span><span class="op">],</span> horizontal_sync<span class="op">,</span> b<span class="op">[</span><span class="dv">1</span><span class="op">],</span> b<span class="op">[</span><span class="dv">0</span><span class="op">],</span> data_enable<span class="op">,</span> vertical_sync<span class="op">};</span></span></code></pre></div> -<p>Now for testing purposes we are going to set the output colour to be fixed to pure red so additional logic to pick a pixel colour is not required for this example. We can do this as shown below:</p> -<div class="sourceCode" id="cb6"><pre class="sourceCode verilog"><code class="sourceCode verilog"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> r <span class="op">=</span> <span class="bn">4'b1111</span><span class="op">;</span></span> +<p>Now for testing purposes we are going to set the output colour to be +fixed to pure red so additional logic to pick a pixel colour is not +required for this example. 
We can do this as shown below:</p> +<div class="sourceCode" id="cb6"><pre +class="sourceCode verilog"><code class="sourceCode verilog"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> r <span class="op">=</span> <span class="bn">4'b1111</span><span class="op">;</span></span> <span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> g <span class="op">=</span> <span class="bn">4'b0000</span><span class="op">;</span></span> <span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> b <span class="op">=</span> <span class="bn">4'b0000</span><span class="op">;</span></span></code></pre></div> -<p>Putting all of the above code together with whatever additional inputs are required for the iCEBreaker FPGA gives us the following block of code:</p> -<div class="sourceCode" id="cb7"><pre class="sourceCode verilog"><code class="sourceCode verilog"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="kw">module</span> top</span> +<p>Putting all of the above code together with whatever additional +inputs are required for the iCEBreaker FPGA gives us the following block +of code:</p> +<div class="sourceCode" id="cb7"><pre +class="sourceCode verilog"><code class="sourceCode verilog"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="kw">module</span> top</span> <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a><span class="op">(</span></span> <span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a><span class="dt">input</span> CLK<span class="op">,</span></span> <span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a><span class="dt">output</span> LEDR_N<span class="op">,</span></span> @@ -472,16 +595,33 @@ Vertical Sync <span id="cb7-91"><a href="#cb7-91" aria-hidden="true" tabindex="-1"></a><span class="op">);</span></span> 
<span id="cb7-92"><a href="#cb7-92" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-93"><a href="#cb7-93" aria-hidden="true" tabindex="-1"></a><span class="kw">endmodule</span></span></code></pre></div>
-<p>To build this, you will require a .pcf file describing the pin mapping of the iCEBreaker board. I grabbed mine from the iCEBreaker examples <a href="https://raw.githubusercontent.com/icebreaker-fpga/icebreaker-examples/master/icebreaker.pcf">here</a>. Grab that file and put it in the same folder as the file for the code provided above. We can then run the following commands to generate a binary to program onto the FPGA:</p>
+<p>To build this, you will require a .pcf file describing the pin
+mapping of the iCEBreaker board. I grabbed mine from the iCEBreaker
+examples <a
+href="https://raw.githubusercontent.com/icebreaker-fpga/icebreaker-examples/master/icebreaker.pcf">here</a>.
+Grab that file and put it in the same folder as the file for the code
+provided above. We can then run the following commands to generate a
+binary to program onto the FPGA:</p>
<pre><code>yosys -ql out.log -p 'synth_ice40 -top top -json out.json' top.sv
nextpnr-ice40 --up5k --json out.json --pcf icebreaker.pcf --asc out.asc
icetime -d up5k -mtr out.rpt out.asc
icepack out.asc out.bin</code></pre>
-<p>This will generate an out.bin file that we will need to flash onto
+the board. 
Make sure your iCEBreaker FPGA is connected via USB to your +computer and you can program it with the following commands.</p> <pre><code>iceprog out.bin</code></pre> -<p>Now connect up a video cable (my DVI Pmod has an HDMI connector, but it only carries the DVI video signal) to the board and monitor and you should get results like this:</p> -<p><img src="/assets/2020-04-07-generating-video/IMG_20200407_172119-1-1024x768.jpg" /></p> -<p>You can also see from the monitor settings menu that the video signal was recognized as 640x480@60 Hz. Now the code presented in this post is specific to the iCEBreaker board and the DVI Pmod, but the theory can be applied to any FPGA and any connector that uses a video signal like this. For example you could wire up a DAC with a resistor ladder to generate a VGA signal. The logic for the timings here would be exactly the same if you wanted a 640x480@60 Hz VGA signal.</p> +<p>Now connect up a video cable (my DVI Pmod has an HDMI connector, but +it only carries the DVI video signal) to the board and monitor and you +should get results like this:</p> +<p><img +src="/assets/2020-04-07-generating-video/IMG_20200407_172119-1-1024x768.jpg" /></p> +<p>You can also see from the monitor settings menu that the video signal +was recognized as 640x480@60 Hz. Now the code presented in this post is +specific to the iCEBreaker board and the DVI Pmod, but the theory can be +applied to any FPGA and any connector that uses a video signal like +this. For example you could wire up a DAC with a resistor ladder to +generate a VGA signal. 
The logic for the timings here would be exactly
+the same if you wanted a 640x480@60 Hz VGA signal.</p>
</div>
</div>
</main>
</body>
diff --git a/html/notes/global_game_jam_2023.html b/html/notes/global_game_jam_2023.html
index 8cd3f4b..3a2a3a2 100644
--- a/html/notes/global_game_jam_2023.html
+++ b/html/notes/global_game_jam_2023.html
@@ -21,6 +21,7 @@
 <div class="header-links">
 <a href="/now.html" class="header-link">Now</a>
 <a href="/about.html" class="header-link">About</a>
+ <a rel="me" href="https://mastodon.social/@hazematman">Social</a>
 </div>
 </div>
 <main>
@@ -42,23 +43,109 @@
 <div class="note-divider"></div>
 <div class="main-container">
 <div class="note-body">
-<p>At the beginning of this month I participated in the Games Institute’s Global Game Jam event. <a href="https://uwaterloo.ca/games-institute/">The Games Institute</a> is an organization at my local university (The University of Waterloo) that focuses on games-based research. They host a game jam every school term and this term’s jam happened to coincide with the Global Game Jam. Since this event was open to everyone (and it’s been a few years since I’ve been a student at UW 👴️), I joined up to try and stretch some of my more creative muscles. The event was a 48-hour game jam that began on Friday, February 3rd and ended on Sunday, February 5th.</p>
-<p>The game we created is called <a href="https://globalgamejam.org/2023/games/turtle-roots-5">Turtle Roots</a>, and it is a simple resource management game. You play as a magical turtle floating through the sky and collecting water in order to survive. The turtle can spend some of its “nutrients” to grow roots which will allow it to gather water and collect more nutrients. The challenge in the game is trying to survive for as long as possible without running out of water.</p>
+<p>At the beginning of this month I participated in the Games
+Institute’s Global Game Jam event. 
<a
+href="https://uwaterloo.ca/games-institute/">The Games Institute</a> is
+an organization at my local university (The University of Waterloo) that
+focuses on games-based research. They host a game jam every school term
+and this term’s jam happened to coincide with the Global Game Jam. Since
+this event was open to everyone (and it’s been a few years since I’ve
+been a student at UW 👴️), I joined up to try and stretch some of my more
+creative muscles. The event was a 48-hour game jam that began on Friday,
+February 3rd and ended on Sunday, February 5th.</p>
+<p>The game we created is called <a
+href="https://globalgamejam.org/2023/games/turtle-roots-5">Turtle
+Roots</a>, and it is a simple resource management game. You play as a
+magical turtle floating through the sky and collecting water in order to
+survive. The turtle can spend some of its “nutrients” to grow roots
+which will allow it to gather water and collect more nutrients. The
+challenge in the game is trying to survive for as long as possible
+without running out of water.</p>
<div class="gallery">
-<p><img src="/assets/global_game_jam_2023/screen_shot_1.png" /> <img src="/assets/global_game_jam_2023/screen_shot_2.png" /> <img src="/assets/global_game_jam_2023/screen_shot_3.png" /></p>
+<p><img src="/assets/global_game_jam_2023/screen_shot_1.png" /> <img
+src="/assets/global_game_jam_2023/screen_shot_2.png" /> <img
+src="/assets/global_game_jam_2023/screen_shot_3.png" /></p>
<p>Screenshots of Turtle Roots</p>
</div>
-<p>The game we created is called <a href="https://globalgamejam.org/2023/games/turtle-roots-5">Turtle Roots</a>, and it is a simple resource management game. You play as a magical turtle floating through the sky and collecting water in order to survive. The turtle can spend some of its “nutrients” to grow roots which will allow it to gather water and collect more nutrients. 
The challenge in the game is trying to survive for as long as possible without running out of water.</p>
<h2 id="the-team">The Team</h2>
-<p>I attended the event solo and quickly partnered up with two other people, who also attended solo. One member had already participated in a game jam before and specialized in art. The other member was attending a game jam for the first time and was looking for the best way they could contribute. Having particular skills for sound, they ended up creating all the audio in our game. This left me as the sole programmer for our team.</p>
+<p>I attended the event solo and quickly partnered up with two other
+people, who also attended solo. One member had already participated in a
+game jam before and specialized in art. The other member was attending a
+game jam for the first time and was looking for the best way they could
+contribute. Having particular skills for sound, they ended up creating
+all the audio in our game. 
This left me as the sole programmer for our
+team.</p>
<h2 id="my-game-jam-experiences">My Game Jam Experiences</h2>
-<p>In recent years, I participated in a <a href="/notes/n64brew-gamejam-2021.html">Nintendo 64 homebrew game jam</a> and the Puerto Rico Game Developers Association event for the global game jam, submitting <a href="https://globalgamejam.org/2022/games/magnetic-parkour-6">Magnetic Parkour</a>. I also participated in <a href="https://ldjam.com/">Ludum Dare</a> back around 2013, but unfortunately I’ve since lost the link to my submission. While in high school, my friend and I participated in an event that worked like a game jam called the “Ottawa Tech Jam”, submitting <a href="http://www.fastquake.com/projects/zorvwarz/">Zorv Warz</a> and <a href="http://www.fastquake.com/projects/worldseed/">E410</a>. As you can probably tell, I really like gamedev. The desire to build my own video games is actually what originally got me into programming. When I was around 14 years old, I picked up a C++ programming book from the library since I wanted to try to build my own game and I heard most game developers use C++. I used some proprietary game development library (that I can’t recall the name of) to build 2D and 3D games in Windows using C++. I didn’t really get too far into it until high school when I started to learn SFML, SDL, and OpenGL. I also dabbled with Unity during that time as well. However, I’ve always had a strong desire to build most of the foundation of the game myself without using an engine. You can see this desire really come out in the work I did for Zorv Warz, E410, and the N64 homebrew game jam. 
When working with a team, I feel it can be a lot easier to use a game engine, even if it doesn’t scratch the same itch for me.</p>
+<p>In recent years, I participated in a <a
+href="/notes/n64brew-gamejam-2021.html">Nintendo 64 homebrew game
+jam</a> and the Puerto Rico Game Developers Association event for the
+global game jam, submitting <a
+href="https://globalgamejam.org/2022/games/magnetic-parkour-6">Magnetic
+Parkour</a>. I also participated in <a href="https://ldjam.com/">Ludum
+Dare</a> back around 2013, but unfortunately I’ve since lost the link to
+my submission. While in high school, my friend and I participated in an
+event that worked like a game jam called the “Ottawa Tech Jam”,
+submitting <a
+href="http://www.fastquake.com/projects/zorvwarz/">Zorv Warz</a> and <a
+href="http://www.fastquake.com/projects/worldseed/">E410</a>. As you can
+probably tell, I really like gamedev. The desire to build my own video
+games is actually what originally got me into programming. When I was
+around 14 years old, I picked up a C++ programming book from the library
+since I wanted to try to build my own game and I heard most game
+developers use C++. I used some proprietary game development library
+(that I can’t recall the name of) to build 2D and 3D games in Windows
+using C++. I didn’t really get too far into it until high school when I
+started to learn SFML, SDL, and OpenGL. I also dabbled with Unity during
+that time as well. However, I’ve always had a strong desire to build
+most of the foundation of the game myself without using an engine. You
+can see this desire really come out in the work I did for Zorv Warz,
+E410, and the N64 homebrew game jam. 
When working with a team, I feel it can
+be a lot easier to use a game engine, even if it doesn’t scratch the
+same itch for me.</p>
<h2 id="the-tech-behind-the-game">The Tech Behind the Game</h2>
-<p>Lately I’ve had a growing interest in the game engine called <a href="https://godotengine.org/">Godot</a>, and wanted to use this opportunity to learn the engine more and build a game in it. Godot is interesting to me as it’s a completely open source game engine, and as you can probably guess from my <a href="/notes/2022_igalia_graphics_team.html">job</a>, open source software as well as free software is something I’m particularly interested in.</p>
-<p>Godot is a really powerful game engine that handles a lot of complexity for you. For example, it has a built-in parallax background component that we took advantage of to add more depth to our game. This allows you to control the background scrolling speed for different layers of the background, giving the illusion of depth in a 2D game.</p>
-<p>Another powerful feature of Godot is its physics engine. Godot makes it really easy to create physics objects in your scene and have them do interesting stuff. You might be wondering where physics comes into play in our game, and we actually use it for the root animations. I set up a sort of “rag doll” system for the roots to make them flop around in the air as the player moves, really giving a lot more “life” to an otherwise static game.</p>
-<p>Godot has a built-in scripting language called “GDScript”, which is very similar to Python. I’ve really grown to like this language. It has an optional type system you can take advantage of that helps with reducing the number of bugs that exist in your game. It also has great connectivity with the editor. This proved useful as I could “export” variables in the game and allow my team members to modify certain parameters of the game without knowing any programming. 
This is super helpful with balancing, and more easily allows non-technical members of the team to contribute to the game logic in a more concrete way.</p>
-<p>Overall I’m very happy with how our game turned out. Last year I tried to participate in a few more game jams, but due to a combination of lack of personal motivation, poor team dynamics, and other factors, none of those game jams panned out. This was the first game jam in a while where I feel like I really connected with my team and I also feel like we made a super polished and fun game in the end.</p>
+<p>Lately I’ve had a growing interest in the game engine called <a
+href="https://godotengine.org/">Godot</a>, and wanted to use this
+opportunity to learn the engine more and build a game in it. Godot is
+interesting to me as it’s a completely open source game engine, and as
+you can probably guess from my <a
+href="/notes/2022_igalia_graphics_team.html">job</a>, open source
+software as well as free software is something I’m particularly
+interested in.</p>
+<p>Godot is a really powerful game engine that handles a lot of
+complexity for you. For example, it has a built-in parallax background
+component that we took advantage of to add more depth to our game. This
+allows you to control the background scrolling speed for different
+layers of the background, giving the illusion of depth in a 2D game.</p>
+<p>Another powerful feature of Godot is its physics engine. Godot makes
+it really easy to create physics objects in your scene and have them do
+interesting stuff. You might be wondering where physics comes into play
+in our game, and we actually use it for the root animations. I set up a
+sort of “rag doll” system for the roots to make them flop around in the
+air as the player moves, really giving a lot more “life” to an otherwise
+static game.</p>
+<p>Godot has a built-in scripting language called “GDScript”, which is
+very similar to Python. I’ve really grown to like this language. 
It has
+an optional type system you can take advantage of that helps with
+reducing the number of bugs that exist in your game. It also has great
+connectivity with the editor. This proved useful as I could “export”
+variables in the game and allow my team members to modify certain
+parameters of the game without knowing any programming. This is super
+helpful with balancing, and more easily allows non-technical members of
+the team to contribute to the game logic in a more concrete way.</p>
+<p>Overall I’m very happy with how our game turned out. Last year I
+tried to participate in a few more game jams, but due to a combination
+of lack of personal motivation, poor team dynamics, and other factors,
+none of those game jams panned out. This was the first game jam in a
+while where I feel like I really connected with my team and I also feel
+like we made a super polished and fun game in the end.</p>
</div>
</div>
</main>
</body>
diff --git a/html/notes/n64brew-gamejam-2021.html b/html/notes/n64brew-gamejam-2021.html
index 56091b4..4b7a09d 100644
--- a/html/notes/n64brew-gamejam-2021.html
+++ b/html/notes/n64brew-gamejam-2021.html
@@ -21,6 +21,7 @@
 <div class="header-links">
 <a href="/now.html" class="header-link">Now</a>
 <a href="/about.html" class="header-link">About</a>
+ <a rel="me" href="https://mastodon.social/@hazematman">Social</a>
 </div>
 </div>
 <main>
@@ -42,30 +43,146 @@
 <div class="note-divider"></div>
 <div class="main-container">
 <div class="note-body">
-<p>So this year, myself and two others decided to participate together in the N64Brew homebrew GameJam, where we were supposed to build a homebrew game that would run on a real Nintendo 64. The game jam took place from October 8th until December 8th and was the second GameJam in N64Brew history. Unfortunately, we never ended up finishing the game, but we did build a really cool tech demo. 
Our project was called “Bug Game”, and if you want to check it out you can find it <a href="https://hazematman.itch.io/bug-game">here</a>. To play the game you’ll need a flash cart to load it on a real Nintendo 64, or you can use an accurate emulator such as <a href="https://ares.dev/">ares</a> or <a href="https://github.com/n64dev/cen64">cen64</a>. The reason an accurate emulator is required is that we made use of this new open source 3D microcode for N64 called “<a href="https://github.com/snacchus/libdragon/tree/ugfx">ugfx</a>”, created by the user Snacchus. This microcode is part of the Libdragon project, which is trying to build a completely open source library and toolchain to build N64 games, instead of relying on the official SDK that has been leaked to the public through liquidation auctions of game companies that have shut down over the years.</p> +<p>So this year, myself and two others decided to participate together +in the N64Brew homebrew GameJam, where we were supposed to build a +homebrew game that would run on a real Nintendo 64. The game jam took +place from October 8th until December 8th and was the second GameJam in +N64Brew history. Unfortunately, we never ended up finishing the game, +but we did build a really cool tech demo. Our project was called +“Bug Game”, and if you want to check it out you can find it <a +href="https://hazematman.itch.io/bug-game">here</a>. To play the game +you’ll need a flash cart to load it on a real Nintendo 64, or you can +use an accurate emulator such as <a +href="https://ares.dev/">ares</a> or <a +href="https://github.com/n64dev/cen64">cen64</a>. The reason an accurate +emulator is required is that we made use of this new open source 3D +microcode for N64 called “<a +href="https://github.com/snacchus/libdragon/tree/ugfx">ugfx</a>”, +created by the user Snacchus. 
This microcode is part of the Libdragon
+project, which is trying to build a completely open source library and
+toolchain to build N64 games, instead of relying on the official SDK
+that has been leaked to the public through liquidation auctions of game
+companies that have shut down over the years.</p>
<div class="gallery">
-<p><img src="/assets/2021-12-10-n64brew-gamejam-2021/bug_1.png" /> <img src="/assets/2021-12-10-n64brew-gamejam-2021/bug_2.png" /> <img src="/assets/2021-12-10-n64brew-gamejam-2021/bug_4.png" /> <img src="/assets/2021-12-10-n64brew-gamejam-2021/bug_5.png" /> <img src="/assets/2021-12-10-n64brew-gamejam-2021/bug_3.png" /></p>
+<p><img src="/assets/2021-12-10-n64brew-gamejam-2021/bug_1.png" /> <img
+src="/assets/2021-12-10-n64brew-gamejam-2021/bug_2.png" /> <img
+src="/assets/2021-12-10-n64brew-gamejam-2021/bug_4.png" /> <img
+src="/assets/2021-12-10-n64brew-gamejam-2021/bug_5.png" /> <img
+src="/assets/2021-12-10-n64brew-gamejam-2021/bug_3.png" /></p>
<p>Screenshots of Bug Game</p>
</div>
<h2 id="libdragon-and-ugfx">Libdragon and UGFX</h2>
-<p>Ugfx was a brand new development in the N64 homebrew scene. By complete coincidence, Snacchus happened to release it on September 21st, just weeks before the GameJam was announced. There have been many attempts to create an open source 3D microcode for the N64 (my <a href="https://github.com/Hazematman/libhfx">libhfx</a> project included), but ugfx was the first complete project, with easily usable documentation and examples. This was an exciting development for the open source N64Brew community, as for the first time we could build 3D games that ran on the N64 without using the legally questionable official SDK. I jumped at the opportunity to use this and build one of the first fully 3D games running on Libdragon.</p>
-<p>One of the “drawbacks” of ugfx was that it tried to follow a lot of the design decisions the official 3D microcode for Nintendo used. 
This made it easier for people familiar with the official SDK to jump ship over to libdragon, but also went against the philosophy of the libdragon project to provide simple easy to use APIs. The Nintendo 64 was notoriously difficult to develop for, and one of the reasons for that was because of the extremely low level interface that the official 3D microcodes provided. Honestly writing 3D graphics code on the N64 reminds me more of writing a 3D OpenGL graphics driver (like I do in my day job), than building a graphics application. Unnecessarily increasing the burden of entry to developing 3D games on the Nintendo 64. Now that ugfx has been released, there is an ongoing effort in the community to revamp it and build a more user friendly API to access the 3D functionality of the N64.</p>
+<p>Ugfx was a brand new development in the N64 homebrew scene. By
+complete coincidence, Snacchus happened to release it on September 21st,
+just weeks before the GameJam was announced. There have been many
+attempts to create an open source 3D microcode for the N64 (my <a
+href="https://github.com/Hazematman/libhfx">libhfx</a> project
+included), but ugfx was the first complete one, with easily usable
+documentation and examples. This was an exciting development for the
+open source N64Brew community, as for the first time we could build 3D
+games that ran on the N64 without using the legally questionable
+official SDK. I jumped at the opportunity to use it and build one of the
+first fully 3D games running on Libdragon.</p>
+<p>One of the “drawbacks” of ugfx was that it tried to follow a lot of
+the design decisions the official 3D microcode for Nintendo used. This
+made it easier for people familiar with the official SDK to jump ship
+over to libdragon, but also went against the philosophy of the libdragon
+project to provide simple, easy to use APIs. 
The Nintendo 64 was
+notoriously difficult to develop for, and one of the reasons was the
+extremely low level interface that the official 3D microcodes provided.
+Honestly, writing 3D graphics code on the N64 reminds me more of writing
+a 3D OpenGL graphics driver (like I do in my day job) than building a
+graphics application, which unnecessarily raises the barrier to entry
+for developing 3D games on the Nintendo 64. Now that ugfx has been
+released, there is an ongoing effort in the community to revamp it and
+build a more user friendly API to access the 3D functionality of the
+N64.</p>
<h2 id="ease-of-development">Ease of development</h2>
-<p>One of the major selling points of libdragon is that it tries to provide a standard toolchain with access to things like the c standard library as well as the c++ standard library. To save time on the development of bug game, I decided to put that claim to test. When building a 3D game from scratch two things that can be extremely time consuming are implementing linear algebra operations, and implementing physics that work in 3D. Luckily for modern developers, there are many open source libraries you can use instead of building these from scratch, like <a href="https://glm.g-truc.net/0.9.9/">GLM</a> for math operations and <a href="https://github.com/bulletphysics/bullet3">Bullet</a> for physics. I don’t believe anyone has tried to do this before, but knowing that libdragon provides a pretty standard c++ development environment I tried to build GLM and Bullet to run on the Nintendo 64 and I was successful! Both GLM and Bullet were able to run on real N64 hardware. This saved time during development as we were no longer concerned with having to build our own physics or math libraries. There were some tricks I needed to do to get bullet running on the hardware.</p>
-<p>First bullet will allocate more memory for its internal pools than is available on the N64. 
This is an easy fix as you can adjust the heap sizes when you go to initialize Bullet using the below code:</p>
-<div class="sourceCode" id="cb1"><pre class="sourceCode cpp"><code class="sourceCode cpp"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a>btDefaultCollisionConstructionInfo constructionInfo <span class="op">=</span> btDefaultCollisionConstructionInfo<span class="op">();</span></span>
+<p>One of the major selling points of libdragon is that it tries to
+provide a standard toolchain with access to things like the C standard
+library as well as the C++ standard library. To save time on the
+development of Bug Game, I decided to put that claim to the test. When
+building a 3D game from scratch, two things that can be extremely time
+consuming are implementing linear algebra operations and implementing
+physics that work in 3D. Luckily for modern developers, there are many
+open source libraries you can use instead of building these from
+scratch, like <a href="https://glm.g-truc.net/0.9.9/">GLM</a> for math
+operations and <a
+href="https://github.com/bulletphysics/bullet3">Bullet</a> for physics.
+I don’t believe anyone has tried to do this before, but knowing that
+libdragon provides a pretty standard C++ development environment I tried
+to build GLM and Bullet to run on the Nintendo 64, and I was successful!
+Both GLM and Bullet were able to run on real N64 hardware. This saved
+time during development as we were no longer concerned with having to
+build our own physics or math libraries. There were a few tricks I
+needed to get Bullet running on the hardware.</p>
+<p>First, Bullet will allocate more memory for its internal pools than is
+available on the N64. 
This is an easy fix as you can adjust the heap +sizes when you go to initialize Bullet using the below code:</p> +<div class="sourceCode" id="cb1"><pre +class="sourceCode cpp"><code class="sourceCode cpp"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a>btDefaultCollisionConstructionInfo constructionInfo <span class="op">=</span> btDefaultCollisionConstructionInfo<span class="op">();</span></span> <span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>constructionInfo<span class="op">.</span><span class="va">m_defaultMaxCollisionAlgorithmPoolSize</span> <span class="op">=</span> <span class="dv">512</span><span class="op">;</span></span> <span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a>constructionInfo<span class="op">.</span><span class="va">m_defaultMaxPersistentManifoldPoolSize</span> <span class="op">=</span> <span class="dv">512</span><span class="op">;</span></span> <span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a>btDefaultCollisionConfiguration<span class="op">*</span> collisionConfiguration <span class="op">=</span> <span class="kw">new</span> btDefaultCollisionConfiguration<span class="op">(</span>constructionInfo<span class="op">);</span></span></code></pre></div> -<p>This lets you modify the memory pools and specify a size in KB for the pools to use. The above code will limit the internal pools to 1MB, allowing us to easily run within the 4MB of RAM that is available on the N64 without the expansion pak (an accessory to the N64 that increases the available RAM to 8MB).</p> -<p>The second issue I ran into with bullet was that the N64 floating point unit does not implement de-normalized floating point numbers. Now I’m not an expert in floating point numbers, but from my understanding, de-normalized numbers are a way to represent values between the smallest normal floating point number and zero. 
This allows floating point calculations to slowly fall towards zero in a more accurate way instead of rounding directly to zero. Since the N64 CPU does not implement de-normalized floats, if any calculations would have generated de-normalized float on the N64 they would instead cause a floating point exception. Because of the way the physics engine works, when two objects got very close together this would cause de-normalized floats to be generated and crash the FPU. This was a problem that had me stumped for a bit, I was concerned I would have to go into bullet’s source code and modify and calculations to round to zero if the result would be small enough. This would have been a monumental effort! Thankfully after digging through the NEC VR4300 programmer’s manual I was able to discover that there is a mode you can set the FPU to, which forces rounding towards zero if a de-normalized float would be generated. I enabled this mode and tested it out, and all my floating point troubles were resolved! I submitted a <a href="https://github.com/DragonMinded/libdragon/pull/195">pull request</a> (that was accepted) to the libdragon project to have this implemented by default, so no one else will run into the same annoying problems I ran into.</p> +<p>This lets you modify the memory pools and specify a size in KB for +the pools to use. The above code will limit the internal pools to 1MB, +allowing us to easily run within the 4MB of RAM that is available on the +N64 without the expansion pak (an accessory to the N64 that increases +the available RAM to 8MB).</p> +<p>The second issue I ran into with bullet was that the N64 floating +point unit does not implement de-normalized floating point numbers. Now +I’m not an expert in floating point numbers, but from my understanding, +de-normalized numbers are a way to represent values between the smallest +normal floating point number and zero. 
This allows floating
+point calculations to fall towards zero gradually, in a more accurate
+way, instead of rounding directly to zero. Since the N64 CPU does not
+implement de-normalized floats, any calculation that would have
+generated a de-normalized float on the N64 would instead cause a
+floating point exception. Because of the way the physics engine works,
+when two objects got very close together this would cause de-normalized
+floats to be generated and crash the FPU. This was a problem that had me
+stumped for a bit; I was concerned I would have to go into Bullet’s
+source code and modify any calculations to round to zero if the result
+would be small enough. This would have been a monumental effort!
+Thankfully, after digging through the NEC VR4300 programmer’s manual I
+was able to discover that there is a mode you can set the FPU to, which
+forces rounding towards zero if a de-normalized float would be
+generated. I enabled this mode and tested it out, and all my floating
+point troubles were resolved! I submitted a <a
+href="https://github.com/DragonMinded/libdragon/pull/195">pull
+request</a> (that was accepted) to the libdragon project to have this
+implemented by default, so no one else will run into the same annoying
+problems I ran into.</p>
<h2 id="whats-next">What’s next?</h2>
-<p>If you decided to play our game you probably would have noticed that it’s not very much of a game. Even though this is the case I’m very happy with how the project turned out, as it’s one of the first 3D libdragon projects to be released. It also easily makes use of amazing open technologies like bullet physics, showcasing just how easy libdragon is to integrate with modern tools and libraries. As I mentioned before in this post there is an effort to take Snacchus’s work and build an easier to use graphics API that feels more like building graphics applications and less like building a graphics driver. 
The effort for that has already started and I plan to contribute to it. Some of the cool features this effort is bringing are:</p>
+<p>If you decided to play our game you probably would have noticed that
+it’s not very much of a game. Even so, I’m very happy with how the
+project turned out, as it’s one of the first 3D libdragon projects to be
+released. It also easily makes use of amazing open technologies like
+Bullet physics, showcasing just how easy libdragon is to integrate with
+modern tools and libraries. As I mentioned earlier in this post, there
+is an effort to take Snacchus’s work and build an easier to use graphics
+API that feels more like building graphics applications and less like
+building a graphics driver. The effort for that has already started and
+I plan to contribute to it. Some of the cool features this effort is
+bringing are:</p>
<ul>
-<li>A standard interface for display lists and microcode overlays. Easily allowing multiple different microcodes to seamless run on the RSP and swap out with display list commands. This will be valuable for using the RSP for audio and graphics at the same time.</li>
-<li>A new 3D microcode that takes some lessons learned from ugfx to build a more powerful and easier to use interface.</li>
+<li>A standard interface for display lists and microcode overlays,
+easily allowing multiple different microcodes to seamlessly run on the
+RSP and swap out with display list commands. This will be valuable for
+using the RSP for audio and graphics at the same time.</li>
+<li>A new 3D microcode that takes some lessons learned from ugfx to
+build a more powerful and easier to use interface.</li>
</ul>
-<p>Overall this is an exciting time for Nintendo 64 homebrew development! It’s easier than ever to build homebrew on the N64 without knowing about the arcane innards of the console. 
I hope that this continued development of libdragon will bring more people to the scene and allow us to see new and novel games running on the N64. One project I would be excited to start working on is using the serial port on modern N64 Flashcarts for networking, allowing the N64 to have online multiplayer through a computer connected over USB. I feel that projects like this could really elevate the kind of content that is available on the N64 and bring it into the modern era.</p> +<p>Overall this is an exciting time for Nintendo 64 homebrew +development! It’s easier than ever to build homebrew on the N64 without +knowing about the arcane innards of the console. I hope that this +continued development of libdragon will bring more people to the scene +and allow us to see new and novel games running on the N64. One project +I would be excited to start working on is using the serial port on +modern N64 Flashcarts for networking, allowing the N64 to have online +multiplayer through a computer connected over USB. I feel that projects +like this could really elevate the kind of content that is available on +the N64 and bring it into the modern era.</p> </div> </div> </main> </body> diff --git a/html/notes/rasterizing-triangles.html b/html/notes/rasterizing-triangles.html index a7756ec..57ad95e 100644 --- a/html/notes/rasterizing-triangles.html +++ b/html/notes/rasterizing-triangles.html @@ -21,6 +21,7 @@ <div class="header-links"> <a href="/now.html" class="header-link">Now</a> <a href="/about.html" class="header-link">About</a> + <a rel="me" href="https://mastodon.social/@hazematman">Social</a> </div> </div> <main> @@ -42,42 +43,123 @@ <div class="note-divider"></div> <div class="main-container"> <div class="note-body"> -<p>Lately I’ve been trying to implement a software renderer <a href="https://www.cs.drexel.edu/~david/Classes/Papers/comp175-06-pineda.pdf">following the algorithm described by Juan Pineda in “A Parallel Algorithm for Polygon Rasterization”</a>. 
For those unfamiliar with the paper, it describes an algorithm to rasterize triangles that has an extremely nice quality, that you simply need to preform a few additions per pixel to see if the next pixel is inside the triangle. It achieves this quality by defining an edge function that has the following property:</p>
+<p>Lately I’ve been trying to implement a software renderer <a
+href="https://www.cs.drexel.edu/~david/Classes/Papers/comp175-06-pineda.pdf">following
+the algorithm described by Juan Pineda in “A Parallel Algorithm for
+Polygon Rasterization”</a>. For those unfamiliar with the paper, it
+describes an algorithm to rasterize triangles that has an extremely nice
+property: you simply need to perform a few additions per pixel to
+see if the next pixel is inside the triangle. It achieves this by
+defining an edge function with the following property:</p>
<pre><code>E(x+1,y) = E(x,y) + dY
E(x,y+1) = E(x,y) - dX</code></pre>
-<p>This property is extremely nice for a rasterizer as additions are quite cheap to preform and with this method we limit the amount of work we have to do per pixel. One frustrating quality of this paper is that it suggest that you can calculate more properties than just if a pixel is inside the triangle with simple addition, but provides no explanation for how to do that. In this blog I would like to explore how you implement a Pineda style rasterizer that can calculate per pixel values using simple addition.</p>
+<p>This property is extremely nice for a rasterizer, as additions are
+quite cheap to perform, and with this method we limit the amount of work
+we have to do per pixel. One frustrating quality of this paper is that
+it suggests that you can calculate more properties than just whether a
+pixel is inside the triangle with simple addition, but provides no
+explanation for how to do that. 
In this blog I would like to explore how to
+implement a Pineda style rasterizer that can calculate per pixel values
+using simple addition.</p>
<figure>
-<img src="/assets/2022-04-03-rasterizing-triangles/Screenshot-from-2022-04-03-13-43-13.png" alt="Triangle rasterized using code in this post" /><figcaption aria-hidden="true">Triangle rasterized using code in this post</figcaption>
+<img
+src="/assets/2022-04-03-rasterizing-triangles/Screenshot-from-2022-04-03-13-43-13.png"
+alt="Triangle rasterized using code in this post" />
+<figcaption aria-hidden="true">Triangle rasterized using code in this
+post</figcaption>
</figure>
-<p>In order to figure out how build this rasterizer <a href="https://www.reddit.com/r/GraphicsProgramming/comments/tqxxmu/interpolating_values_in_a_pineda_style_rasterizer/">I reached out to the internet</a> to help build some more intuition on how the properties of this rasterizer. From this reddit post I gained more intuition on how we can use the edge function values to linear interpolate values on the triangle. Here is there relevant comment that gave me all the information I needed</p>
+<p>In order to figure out how to build this rasterizer <a
+href="https://www.reddit.com/r/GraphicsProgramming/comments/tqxxmu/interpolating_values_in_a_pineda_style_rasterizer/">I
+reached out to the internet</a> to help build some more intuition about
+the properties of this rasterizer. From this reddit post I gained more
+intuition on how we can use the edge function values to linearly
+interpolate values on the triangle. Here is the relevant comment that
+gave me all the information I needed:</p>
<blockquote>
<p>Think about the edge function’s key property:</p>
-<p><em>recognize that the formula given for E(x,y) is the same as the formula for the magnitude of the cross product between the vector from (X,Y) to (X+dX, Y+dY), and the vector from (X,Y) to (x,y). 
By the well known property of cross products, the magnitude is zero if the vectors are colinear, and changes sign as the vectors cross from one side to the other.</em></p> -<p>The magnitude of the edge distance is the area of the parallelogram formed by <code>(X,Y)->(X+dX,Y+dY)</code> and <code>(X,Y)->(x,y)</code>. If you normalize by the parallelogram area at the <em>other</em> point in the triangle you get a barycentric coordinate that’s 0 along the <code>(X,Y)->(X+dX,Y+dY)</code> edge and 1 at the other point. You can precompute each interpolated triangle parameter normalized by this area at setup time, and in fact most hardware computes per-pixel step values (pre 1/w correction) so that all the parameters are computed as a simple addition as you walk along each raster.</p> -<p>Note that when you’re implementing all of this it’s critical to keep all the math in the integer domain (snapping coordinates to some integer sub-pixel precision, I’d recommend at least 4 bits) and using a tie-breaking function (typically top-left) for pixels exactly on the edge to avoid pixel double-hits or gaps in adjacent triangles.</p> +<p><em>recognize that the formula given for E(x,y) is the same as the +formula for the magnitude of the cross product between the vector from +(X,Y) to (X+dX, Y+dY), and the vector from (X,Y) to (x,y). By the well +known property of cross products, the magnitude is zero if the vectors +are colinear, and changes sign as the vectors cross from one side to the +other.</em></p> +<p>The magnitude of the edge distance is the area of the parallelogram +formed by <code>(X,Y)->(X+dX,Y+dY)</code> and +<code>(X,Y)->(x,y)</code>. If you normalize by the parallelogram area +at the <em>other</em> point in the triangle you get a barycentric +coordinate that’s 0 along the <code>(X,Y)->(X+dX,Y+dY)</code> edge +and 1 at the other point. 
You can precompute each interpolated triangle +parameter normalized by this area at setup time, and in fact most +hardware computes per-pixel step values (pre 1/w correction) so that all +the parameters are computed as a simple addition as you walk along each +raster.</p> +<p>Note that when you’re implementing all of this it’s critical to keep +all the math in the integer domain (snapping coordinates to some integer +sub-pixel precision, I’d recommend at least 4 bits) and using a +tie-breaking function (typically top-left) for pixels exactly on the +edge to avoid pixel double-hits or gaps in adjacent triangles.</p> <p>https://www.reddit.com/r/GraphicsProgramming/comments/tqxxmu/interpolating_values_in_a_pineda_style_rasterizer/i2krwxj/</p> </blockquote> -<p>From this comment you can see that it is trivial to calculate to calculate the barycentric coordinates of the triangle from the edge function. You simply need to divide the the calculated edge function value by the area of parallelogram. Now what is the area of triangle? Well this is where some <a href="https://www.scratchapixel.com/lessons/3d-basic-rendering/ray-tracing-rendering-a-triangle/barycentric-coordinates">more research</a> online helped. If the edge function defines the area of a parallelogram (2 times the area of the triangle) of <code>(X,Y)->(X+dX,Y+dY)</code> and <code>(X,Y)->(x,y)</code>, and we calculate three edge function values (one for each edge), then we have 2 times the area of each of the sub triangles that are defined by our point.</p> +<p>From this comment you can see that it is trivial to calculate to +calculate the barycentric coordinates of the triangle from the edge +function. You simply need to divide the the calculated edge function +value by the area of parallelogram. Now what is the area of triangle? 
Well this is where some <a
+href="https://www.scratchapixel.com/lessons/3d-basic-rendering/ray-tracing-rendering-a-triangle/barycentric-coordinates">more
+research</a> online helped. If the edge function defines the area of a
+parallelogram (2 times the area of the triangle) of
+<code>(X,Y)-&gt;(X+dX,Y+dY)</code> and <code>(X,Y)-&gt;(x,y)</code>, and
+we calculate three edge function values (one for each edge), then we
+have 2 times the area of each of the sub triangles that are defined by
+our point.</p>
<figure>
-<img src="https://www.scratchapixel.com/images/ray-triangle/barycentric.png?" alt="Triangle barycentric coordinates from scratchpixel tutorial" /><figcaption aria-hidden="true">Triangle barycentric coordinates from scratchpixel tutorial</figcaption>
+<img
+src="https://www.scratchapixel.com/images/ray-triangle/barycentric.png?"
+alt="Triangle barycentric coordinates from scratchpixel tutorial" />
+<figcaption aria-hidden="true">Triangle barycentric coordinates from
+scratchpixel tutorial</figcaption>
</figure>
-<p>From this its trivial to see that we can calculate 2 times the area of the triangle just by adding up all the individual areas of the sub triangles (I used triangles here, but really we are adding the area of sub parallelograms to get the area of the whole parallelogram that has 2 times the area of the triangle we are drawing), that is adding the value of all the edge functions together. From this we can see to linear interpolate any value on the triangle we can use the following equation</p>
+<p>From this it’s trivial to see that we can calculate 2 times the area
+of the triangle just by adding up all the individual areas of the sub
+triangles (I used triangles here, but really we are adding the area of
+sub parallelograms to get the area of the whole parallelogram that has 2
+times the area of the triangle we are drawing), that is, adding the
+value of all the edge functions together. 
From this we can see that to linearly
+interpolate any value on the triangle we can use the following
+equation:</p>
<pre><code>Value(x,y) = (e0*v0 + e1*v1 + e2*v2) / (e0 + e1 + e2)
Value(x,y) = (e0*v0 + e1*v1 + e2*v2) / area</code></pre>
-<p>Where <code>e0, e1, e2</code> are the edge function values and <code>v0, v1, v2</code> are the per vertex values we want to interpolate.</p>
-<p>This is great for the calculating the per vertex values, but we still haven’t achieved the property of calculating the interpolate value per pixel with simple addition. To do that we need to use the property of the edge function I described above</p>
+<p>Where <code>e0, e1, e2</code> are the edge function values and
+<code>v0, v1, v2</code> are the per vertex values we want to
+interpolate.</p>
+<p>This is great for calculating the per vertex values, but we still
+haven’t achieved the property of calculating the interpolated value per
+pixel with simple addition. To do that we need to use the property of
+the edge function I described above:</p>
<pre><code>Value(x+1, y) = (E0(x+1, y)*v0 + E1(x+1, y)*v1 + E2(x+1, y)*v2) / area
Value(x+1, y) = ((e0+dY0)*v0 + (e1+dY1)*v1 + (e2+dY2)*v2) / area
Value(x+1, y) = (e0*v0 + dY0*v0 + e1*v1+dY1*v1 + e2*v2 + dY2*v2) / area
Value(x+1, y) = (e0*v0 + e1*v1 + e2*v2)/area + (dY0*v0 + dY1*v1 + dY2*v2)/area
Value(x+1, y) = Value(x,y) + (dY0*v0 + dY1*v1 + dY2*v2)/area</code></pre>
-<p>From here we can see that if we work through all the math, we can find this same property where the interpolated value is equal to the previous interpolated value plus some number. Therefore if we pre-compute this addition value, when we iterate over the pixels we only need to add this pre-computed number to the interpolated value of the previous pixel. 
We can repeat this process again to figure out the equation of the pre-computed value for <code>Value(x, y+1)</code> but I’ll save you the time and provide both equations below</p> +<p>From here we can see that if we work through all the math, we can +find this same property where the interpolated value is equal to the +previous interpolated value plus some number. Therefore if we +pre-compute this addition value, when we iterate over the pixels we only +need to add this pre-computed number to the interpolated value of the +previous pixel. We can repeat this process again to figure out the +equation of the pre-computed value for <code>Value(x, y+1)</code> but +I’ll save you the time and provide both equations below</p> <pre><code>dYV = (dY0*v0 + dY1*v1 + dY2*v2)/area dXV = (dX0*v0 + dX1*v1 + dX2*v2)/area Value(x+1, y) = Value(x,y) + dYV Value(x, y+1) = Value(x,y) - dXV</code></pre> -<p>Where <code>dY0, dY1, dY2</code> are the differences between y coordinates as described in Pineda’s paper, <code>dX0, dX1, dX2</code> are the differences in x coordinates as described in Pineda’s paper, and the area is the pre-calculated sum of the edge functions</p> -<p>Now you should be able to build a Pineda style rasterizer that can calculate per pixel interpolated values using simple addition, by following pseudo code like this:</p> +<p>Where <code>dY0, dY1, dY2</code> are the differences between y +coordinates as described in Pineda’s paper, <code>dX0, dX1, dX2</code> +are the differences in x coordinates as described in Pineda’s paper, and +the area is the pre-calculated sum of the edge functions</p> +<p>Now you should be able to build a Pineda style rasterizer that can +calculate per pixel interpolated values using simple addition, by +following pseudo code like this:</p> <pre><code>func edge(x, y, xi, yi, dXi, dYi) return (x - xi)*dYi - (y-yi)*dXi @@ -122,7 +204,14 @@ func draw_triangle(x0, y0, x1, y1, x2, y2, v0, v1, v2): starting_e1 = e1 starting_e2 = e2 starting_v = 
v</code></pre>
-<p>Now this pseudo code is not the most efficient as it will iterate over the entire screen to draw one triangle, but it provides a starting basis to show how to use these Pineda properties to calculate per vertex values. One thing to note if you do implement this is, if you use fixed point arithmetic, be careful to insure you have enough precision to calculate all of these values with overflow or underflow. This was an issue I ran into running out of precision when I did the divide by the area.</p>
+<p>Now this pseudo code is not the most efficient, as it will iterate
+over the entire screen to draw one triangle, but it provides a starting
+basis to show how to use these Pineda properties to calculate per vertex
+values. One thing to note if you do implement this: if you use fixed
+point arithmetic, be careful to ensure you have enough precision to
+calculate all of these values without overflow or underflow. Running out
+of precision was an issue I ran into when I did the divide by the
+area.</p>
</div>
</div>
</main> </body>
diff --git a/html/now.html b/html/now.html
index 29efca1..b64a8b1 100644
--- a/html/now.html
+++ b/html/now.html
@@ -20,6 +20,7 @@
 <div class="header-links">
 <a href="/now.html" class="header-link">Now</a>
 <a href="/about.html" class="header-link">About</a>
+ <a rel="me" href="https://mastodon.social/@hazematman">Social</a>
 </div>
 </div>
 <main>
@@ -32,15 +33,28 @@
 <p>A roughly monthly update about my life</p>
 <blockquote>
 <h2 id="january-2023">January 2023</h2>
-<p>Happy new year! This new year is especially exciting for me because I’m starting a new job! I’m happy to announce that I’m moving away from the world of proprietary software. I will now be working with the open source graphics stack at Igalia. 
I’ve joined their graphics team, and I’m super excited to start contributing back to the open source community that has really shaped my life since I started using Linux exclusively back around 2011</p>
+<p>Happy new year! This new year is especially exciting for me because
+I’m starting a new job! I’m happy to announce that I’m moving away from
+the world of proprietary software. I will now be working with the open
+source graphics stack at Igalia. I’ve joined their graphics team, and
+I’m super excited to start contributing back to the open source
+community that has really shaped my life since I started using Linux
+exclusively back around 2011.</p>
</blockquote>
<blockquote>
<h2 id="november-2022">November 2022</h2>
-<p>November was a really exciting month for me. My wife and I had our wedding and then we went away on our honeymoon for 10 days in France. A new nice long break from work was nice and let me focus on non technical things, like improving my health.</p>
+<p>November was a really exciting month for me. My wife and I had our
+wedding and then we went away on our honeymoon for 10 days in France. A
+nice long break from work let me focus on non-technical
+things, like improving my health.</p>
</blockquote>
<blockquote>
<h2 id="october-2022">October 2022</h2>
-<p>After reading about digital gardens, I’ve decided to convert my whole website into its very own digital garden. 
I wasn’t too happy with +existing themes and tools I could find online so I decided to try and +build a digital garden engine myself using pandoc and custom +filters.</p> </blockquote> </div> </div> </main> diff --git a/templates/main.html b/templates/main.html index 8ca1656..7256daa 100644 --- a/templates/main.html +++ b/templates/main.html @@ -35,6 +35,7 @@ $endif$ $endif$ <a href="/now.html" class="header-link">Now</a> <a href="/about.html" class="header-link">About</a> + <a rel="me" href="https://mastodon.social/@hazematman">Social</a> </div> </div> <main> |