<p>Fryzek Concepts (https://fryzekconcepts.com): Lucas is a developer working on cool things. Wed, 03 Jul 2024 15:07:36 -0000</p> <h2>Generating Video</h2> <p>https://fryzekconcepts.com/notes/generating-video.html</p> <p>One thing I’m very interested in is computer graphics. This could be complex 3D graphics or simple 2D graphics. The idea of getting a computer to display visual data fascinates me. One fundamental part of showing visual data is interfacing with a computer monitor. This can be accomplished by generating a video signal that the monitor understands. Below I have written instructions on how an FPGA can be used to generate a video signal. I have specifically worked with the iCEBreaker FPGA, but the theory here should apply to any FPGA or device that can generate the appropriate timings.</p> <h3 id="tools">Tools</h3> <p>Hardware used (<a href="https://www.crowdsupply.com/1bitsquared/icebreaker-fpga">link for board</a>):</p> <ul> <li>iCEBreaker FPGA</li> <li>iCEBreaker 12-Bit DVI Pmod</li> </ul> <p>Software used:</p> <ul> <li>IceStorm FPGA toolchain (<a href="https://github.com/esden/summon-fpga-tools">follow install instructions here</a>)</li> </ul> <h3 id="theory">Theory</h3> <p>A video signal is composed of several parts, primarily the colour signals and the sync signals. For this DVI Pmod, there is also a data enable signal for the visible screen area. In this example we are going to generate a 640x480 60 Hz video signal. 
Below is a table describing the important data for our video signal.</p> <table> <tbody> <tr> <td> Pixel Clock </td> <td> 25.175 MHz </td> </tr> <tr> <td> Pixels Per Line </td> <td> 800 Pixels </td> </tr> <tr> <td> Pixels Visible Per Line </td> <td> 640 Pixels </td> </tr> <tr> <td> Horizontal Sync Front Porch Length </td> <td> 16 Pixels </td> </tr> <tr> <td> Horizontal Sync Length </td> <td> 96 Pixels </td> </tr> <tr> <td> Horizontal Sync Back Porch Length </td> <td> 48 Pixels </td> </tr> <tr> <td> Lines Per Frame </td> <td> 525 Lines </td> </tr> <tr> <td> Lines Visible Per Frame </td> <td> 480 Lines </td> </tr> <tr> <td> Vertical Front Porch Length </td> <td> 10 Lines </td> </tr> <tr> <td> Vertical Sync Length </td> <td> 2 Lines </td> </tr> <tr> <td> Vertical Back Porch Length </td> <td> 33 Lines </td> </tr> </tbody> </table> <p>Sourced from http://www.tinyvga.com/vga-timing/640x480@60Hz</p> <p>The data from this table raises a few questions:</p> <ol type="1"> <li>What is the Pixel Clock?</li> <li>What is the difference between “Pixels/Lines” and “Visible Pixels/Lines”?</li> <li>What is “Front Porch”, “Sync”, and “Back Porch”?</li> </ol> <h4 id="pixel-clock">Pixel Clock</h4> <p>The pixel clock is a fairly straightforward idea; this is the rate at which we generate pixels. For video signal generation, the “pixel” is a fundamental building block and we count things in the number of pixels they take up. Every time the pixel clock “ticks”, we have processed one more pixel. So for a 640x480 video signal, a full line is 800 pixels, or 800 clock ticks. For the full 800x525 frame there are 800 ticks x 525 lines, or 420000 clock ticks. If we are running the display at 60 Hz, 420000 pixels per frame are generated 60 times per second. Therefore, 25200000 pixels or clock ticks will pass in one second. From this we can see that the pixel clock frequency of 25.175 MHz is roughly equal to 25200000 clock ticks per second (25.2 MHz). 
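</p> <p>As a rough check of the arithmetic above, the same numbers can be worked out with a few constants in a small simulation-only SystemVerilog module. This is only a sketch: the module name <code>pixel_clock_math</code> and the parameter names are made up for illustration and are not part of the design described in this post.</p>
<pre><code>// Simulation-only sketch; all names here are hypothetical.
module pixel_clock_math;
    localparam int PIXELS_PER_LINE = 800; // visible + invisible pixels
    localparam int LINES_PER_FRAME = 525; // visible + invisible lines
    localparam int REFRESH_RATE_HZ = 60;

    // 800 * 525 = 420000 pixel clock ticks per frame
    localparam int TICKS_PER_FRAME = PIXELS_PER_LINE * LINES_PER_FRAME;

    // 420000 * 60 = 25200000 ticks per second, i.e. 25.2 MHz
    localparam int TICKS_PER_SECOND = TICKS_PER_FRAME * REFRESH_RATE_HZ;

    initial begin
        $display(&quot;ticks per frame:  %0d&quot;, TICKS_PER_FRAME);
        $display(&quot;ticks per second: %0d&quot;, TICKS_PER_SECOND);
        $finish;
    end
endmodule</code></pre>
<p>Keeping the totals as named constants rather than literals makes it easier to see where the 25.2 MHz figure comes from if the timings are ever swapped for another video mode.</p> <p>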
There is a small deviation from the “true” value here, but monitors are flexible enough to accept this video signal (my monitor reports it as 640x480@60Hz), and all information I can find online says that 25.175 MHz is the value you want to use. Later on we will see that the pixel clock is not required to be exactly 25.175 MHz.</p> <h4 id="visible-area-vs-invisible-area">Visible Area vs Invisible Area</h4> <p><img src="/assets/2020-04-07-generating-video/visible_invisible.png" /></p> <p>From the above image we can see that a 640x480 video signal actually generates a resolution larger than 640x480. The true resolution we generate is 800x525, but only a 640x480 portion of that signal is visible. The area that is not visible is where we generate the sync signal and its surrounding porches. In other words, every part of the above image that is black belongs to this invisible region.</p> <h4 id="front-porch-back-porch-sync">Front Porch, Back Porch &amp; Sync</h4> <p>To better understand the front porch, back porch and sync signal, let’s look at what the horizontal sync signal looks like over the duration of a line:</p> <p><img src="/assets/2020-04-07-generating-video/sync.png" /></p> <p>From this we can see that the “Front Porch” consists of the invisible pixels between the visible pixels and the sync pixels, during which the sync signal is a logical one (high). The “Sync” consists of the invisible pixels between the front porch and back porch, during which the sync signal is a logical zero (low). The “Back Porch” consists of the invisible pixels after the sync pulse, during which the sync signal is a logical one again. For the case of 640x480 video, the visible pixel section lasts for 640 pixels. The front porch section lasts for 16 pixels, after which the sync signal becomes a logical zero. This logical zero sync lasts for 96 pixels, after which the sync signal becomes a logical one again. The back porch then lasts for 48 pixels. 
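</p> <p>The section lengths above can be turned into column boundaries with a little addition. The following is a minimal sketch, assuming the columns of a line are numbered from 0 and the visible pixels come first; the module name <code>hsync_boundaries</code> and the parameter names are hypothetical and only used for illustration.</p>
<pre><code>// Simulation-only sketch; all names here are hypothetical.
module hsync_boundaries;
    // Section lengths in pixels, taken from the timing table
    localparam int H_VISIBLE     = 640;
    localparam int H_FRONT_PORCH = 16;
    localparam int H_SYNC        = 96;
    localparam int H_BACK_PORCH  = 48;

    // First column where the sync signal goes low: 640 + 16 = 656
    localparam int H_SYNC_START = H_VISIBLE + H_FRONT_PORCH;
    // First column where the sync signal goes high again: 656 + 96 = 752
    localparam int H_SYNC_END   = H_SYNC_START + H_SYNC;
    // Total columns per line: 752 + 48 = 800
    localparam int H_TOTAL      = H_SYNC_END + H_BACK_PORCH;

    initial begin
        $display(&quot;sync low from column %0d up to (not including) %0d&quot;,
                 H_SYNC_START, H_SYNC_END);
        $display(&quot;columns per line: %0d&quot;, H_TOTAL);
        $finish;
    end
endmodule</code></pre>
<p>The same additions work for the vertical direction by substituting the line counts (480, 10, 2 and 33) from the table.</p> <p>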
Adding the four sections together, 640 + 16 + 96 + 48 gives 800 pixels, which is the full horizontal resolution of the display. The vertical sync signal works almost exactly the same way, except that it acts on lines rather than pixels.</p> <h3 id="implementation">Implementation</h3> <p>The first thing we can do to simplify a lot of the following logic is to keep track of which pixel and which line we are on. The code block below creates two registers to keep track of the current pixel on the line (column) and the current line (line):</p> <div class="sourceCode" id="cb1"><pre class="sourceCode verilog"><code class="sourceCode verilog"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a>logic <span class="op">[</span><span class="dv">9</span><span class="op">:</span><span class="dv">0</span><span class="op">]</span> line<span class="op">;</span></span> <span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>logic <span class="op">[</span><span class="dv">9</span><span class="op">:</span><span class="dv">0</span><span class="op">]</span> column<span class="op">;</span></span> <span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a></span> <span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a><span class="kw">always</span> <span class="op">@(</span><span class="kw">posedge</span> clk <span class="dt">or</span> <span class="kw">posedge</span> reset<span class="op">)</span> <span class="kw">begin</span></span> <span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a> <span class="kw">if</span><span class="op">(</span>reset <span class="op">==</span> <span class="dv">1</span><span class="op">)</span> <span class="kw">begin</span></span> <span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a> line <span class="op">&lt;=</span> <span class="dv">0</span><span class="op">;</span></span> <span id="cb1-7"><a href="#cb1-7" aria-hidden="true" 
tabindex="-1"></a> column <span class="op">&lt;=</span> <span class="dv">0</span><span class="op">;</span></span> <span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a> <span class="kw">end</span></span> <span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a> <span class="kw">else</span> <span class="kw">begin</span></span> <span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a> <span class="kw">if</span><span class="op">(</span>column <span class="op">==</span> <span class="dv">799</span> <span class="op">&amp;&amp;</span> line <span class="op">==</span> <span class="dv">524</span><span class="op">)</span> <span class="kw">begin</span></span> <span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a> line <span class="op">&lt;=</span> <span class="dv">0</span><span class="op">;</span></span> <span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a> column <span class="op">&lt;=</span> <span class="dv">0</span><span class="op">;</span></span> <span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a> <span class="kw">end</span></span> <span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"></a> <span class="kw">else</span> <span class="kw">if</span><span class="op">(</span>column <span class="op">==</span> <span class="dv">799</span><span class="op">)</span> <span class="kw">begin</span></span> <span id="cb1-15"><a href="#cb1-15" aria-hidden="true" tabindex="-1"></a> line <span class="op">&lt;=</span> line <span class="op">+</span> <span class="dv">1</span><span class="op">;</span></span> <span id="cb1-16"><a href="#cb1-16" aria-hidden="true" tabindex="-1"></a> column <span class="op">&lt;=</span> <span class="dv">0</span><span class="op">;</span></span> <span id="cb1-17"><a href="#cb1-17" aria-hidden="true" tabindex="-1"></a> <span class="kw">end</span></span> <span id="cb1-18"><a href="#cb1-18" aria-hidden="true" tabindex="-1"></a> <span 
class="kw">else</span> <span class="kw">begin</span></span> <span id="cb1-19"><a href="#cb1-19" aria-hidden="true" tabindex="-1"></a> column <span class="op">&lt;=</span> column <span class="op">+</span> <span class="dv">1</span><span class="op">;</span></span> <span id="cb1-20"><a href="#cb1-20" aria-hidden="true" tabindex="-1"></a> <span class="kw">end</span></span> <span id="cb1-21"><a href="#cb1-21" aria-hidden="true" tabindex="-1"></a> <span class="kw">end</span></span> <span id="cb1-22"><a href="#cb1-22" aria-hidden="true" tabindex="-1"></a><span class="kw">end</span></span></code></pre></div> <p>This block of Verilog works by first initializing the line and column registers to zero on a reset. This is important to make sure that we start from known values; otherwise the line and column registers could contain any value and our logic would not work. Next, we check if we are at the very end of the frame by comparing the current column to 799 (the last pixel in the line) and the current line to 524 (the last line in the frame). If both conditions are true, we reset the line and column back to zero to signify that we are starting a new frame. The next block checks if the current column equals 799. Because the previous if statement failed, a match here means we are at the end of the line but not the end of the frame. In that case we increment the current line count and set the column back to zero to signify that we are starting a new line. The final block simply increments the current pixel count. If we reach this block, we are neither at the end of the line nor at the end of the frame, so we can simply move on to the next pixel.</p> <p>Now that we are keeping track of the current column and current line, we can use this information to generate the horizontal and vertical sync signals. From the theory above we know that the sync signal is only low when we are between the front and back porch; at all other times the signal is high. 
From this we can generate the sync signal with an OR and two compares.</p> <div class="sourceCode" id="cb2"><pre class="sourceCode verilog"><code class="sourceCode verilog"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>logic horizontal_sync<span class="op">;</span></span> <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a>logic vertical_sync<span class="op">;</span></span> <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> horizontal_sync <span class="op">=</span> column <span class="op">&lt;</span> <span class="dv">656</span> <span class="op">||</span> column <span class="op">&gt;=</span> <span class="dv">752</span><span class="op">;</span></span> <span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> vertical_sync <span class="op">=</span> line <span class="op">&lt;</span> <span class="dv">490</span> <span class="op">||</span> line <span class="op">&gt;=</span> <span class="dv">492</span><span class="op">;</span></span></code></pre></div> <p>Let’s examine the horizontal sync signal more closely. This statement evaluates to true if the current column is less than 656 or greater than or equal to 752. This means that the horizontal sync signal will be true except when the current column is between 656 and 751 inclusive. That is, starting on column 656 the horizontal sync signal becomes false (low) and remains that way for 96 pixels, until we reach pixel 752, where it returns to being true (high). The vertical sync signal works in the same way, except that it is driven by the current line. 
Therefore, the signal will remain high when the line is less than 490 or greater than or equal to 492, and will remain low on lines 490 and 491.</p> <h4 id="putting-it-all-together">Putting It All Together</h4> <p>Now that we have generated the video signal, we need to route it towards the video output connectors on the iCEBreaker 12-bit DVI Pmod. We also need to configure the iCEBreaker FPGA to have the appropriate pixel clock frequency. First, to get the correct pixel clock, we are going to use the following block of code:</p> <div class="sourceCode" id="cb3"><pre class="sourceCode verilog"><code class="sourceCode verilog"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>SB_PLL40_PAD #<span class="op">(</span></span> <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a> .DIVR<span class="op">(</span><span class="bn">4&#39;b0000</span><span class="op">),</span></span> <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a> .DIVF<span class="op">(</span><span class="bn">7&#39;b1000010</span><span class="op">),</span></span> <span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a> .DIVQ<span class="op">(</span><span class="bn">3&#39;b101</span><span class="op">),</span></span> <span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a> .FILTER_RANGE<span class="op">(</span><span class="bn">3&#39;b001</span><span class="op">),</span></span> <span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a> .FEEDBACK_PATH<span class="op">(</span><span class="st">&quot;SIMPLE&quot;</span><span class="op">),</span></span> <span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a> .DELAY_ADJUSTMENT_MODE_FEEDBACK<span class="op">(</span><span class="st">&quot;FIXED&quot;</span><span class="op">),</span></span> <span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a> .FDA_FEEDBACK<span class="op">(</span><span class="bn">4&#39;b0000</span><span 
class="op">),</span></span> <span id="cb3-9"><a href="#cb3-9" aria-hidden="true" tabindex="-1"></a> .DELAY_ADJUSTMENT_MODE_RELATIVE<span class="op">(</span><span class="st">&quot;FIXED&quot;</span><span class="op">),</span></span> <span id="cb3-10"><a href="#cb3-10" aria-hidden="true" tabindex="-1"></a> .FDA_RELATIVE<span class="op">(</span><span class="bn">4&#39;b0000</span><span class="op">),</span></span> <span id="cb3-11"><a href="#cb3-11" aria-hidden="true" tabindex="-1"></a> .SHIFTREG_DIV_MODE<span class="op">(</span><span class="bn">2&#39;b00</span><span class="op">),</span></span> <span id="cb3-12"><a href="#cb3-12" aria-hidden="true" tabindex="-1"></a> .PLLOUT_SELECT<span class="op">(</span><span class="st">&quot;GENCLK&quot;</span><span class="op">),</span></span> <span id="cb3-13"><a href="#cb3-13" aria-hidden="true" tabindex="-1"></a> .ENABLE_ICEGATE<span class="op">(</span><span class="bn">1&#39;b0</span><span class="op">)</span></span> <span id="cb3-14"><a href="#cb3-14" aria-hidden="true" tabindex="-1"></a><span class="op">)</span> usb_pll_inst <span class="op">(</span></span> <span id="cb3-15"><a href="#cb3-15" aria-hidden="true" tabindex="-1"></a> .PACKAGEPIN<span class="op">(</span>CLK<span class="op">),</span></span> <span id="cb3-16"><a href="#cb3-16" aria-hidden="true" tabindex="-1"></a> .PLLOUTCORE<span class="op">(</span>pixel_clock<span class="op">),</span></span> <span id="cb3-17"><a href="#cb3-17" aria-hidden="true" tabindex="-1"></a> .EXTFEEDBACK<span class="op">(),</span></span> <span id="cb3-18"><a href="#cb3-18" aria-hidden="true" tabindex="-1"></a> .DYNAMICDELAY<span class="op">(),</span></span> <span id="cb3-19"><a href="#cb3-19" aria-hidden="true" tabindex="-1"></a> .RESETB<span class="op">(</span><span class="bn">1&#39;b1</span><span class="op">),</span></span> <span id="cb3-20"><a href="#cb3-20" aria-hidden="true" tabindex="-1"></a> .BYPASS<span class="op">(</span><span class="bn">1&#39;b0</span><span class="op">),</span></span> 
<span id="cb3-21"><a href="#cb3-21" aria-hidden="true" tabindex="-1"></a> .LATCHINPUTVALUE<span class="op">()</span></span> <span id="cb3-22"><a href="#cb3-22" aria-hidden="true" tabindex="-1"></a><span class="op">);</span></span></code></pre></div> <p>This block is mainly a copy-paste of the PLL setup code from the iCEBreaker examples, but with a few important changes. The DIVR, DIVF, and DIVQ values are changed to create a 25.125 MHz clock. This is not exactly 25.175 MHz, but it is close enough that the monitor accepts it and recognizes it as a 640x480@60 Hz signal. These values were found with the “icepll” utility. Below is an example of calling this utility from the command line:</p> <pre><code>$ icepll -i 12 -o 25.175
F_PLLIN: 12.000 MHz (given)
F_PLLOUT: 25.175 MHz (requested)
F_PLLOUT: 25.125 MHz (achieved)
FEEDBACK: SIMPLE
F_PFD: 12.000 MHz
F_VCO: 804.000 MHz
DIVR: 0 (4&#39;b0000)
DIVF: 66 (7&#39;b1000010)
DIVQ: 5 (3&#39;b101)
FILTER_RANGE: 1 (3&#39;b001)</code></pre> <p>From here we can see we had an input clock of 12 MHz (this comes from the FTDI chip on the iCEBreaker board), and we wanted to get a 25.175 MHz output clock. The closest the PLL could generate was a 25.125 MHz clock with the settings provided for the DIVR, DIVF, and DIVQ values.</p> <p>Now that we have a pixel clock we can wire up the necessary signals for the DVI video out. 
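</p> <p>Before moving on to the wiring, the icepll figures can be sanity-checked by hand. The sketch below assumes the usual divider relationship for the SIMPLE feedback path (the input is divided by DIVR + 1, multiplied by DIVF + 1, then divided by 2 to the power of DIVQ), which reproduces the F_PFD, F_VCO, and achieved F_PLLOUT values printed by icepll. The module name <code>pll_math</code> and the parameter names are hypothetical and not part of the design.</p>
<pre><code>// Simulation-only sketch; all names here are hypothetical.
// Assumes the SIMPLE feedback path divider relationship.
module pll_math;
    localparam real F_PLLIN_MHZ = 12.0; // input clock on the board
    localparam int  DIVR = 0;           // 4&#39;b0000
    localparam int  DIVF = 66;          // 7&#39;b1000010
    localparam int  DIVQ = 5;           // 3&#39;b101

    // Phase detector input: 12 / (0 + 1) = 12 MHz
    localparam real F_PFD_MHZ    = F_PLLIN_MHZ / (DIVR + 1);
    // VCO frequency: 12 * (66 + 1) = 804 MHz
    localparam real F_VCO_MHZ    = F_PFD_MHZ * (DIVF + 1);
    // Output: 804 / 2^5 = 804 / 32 = 25.125 MHz
    localparam real F_PLLOUT_MHZ = F_VCO_MHZ / (1 &lt;&lt; DIVQ);

    initial begin
        $display(&quot;F_PFD:    %0.3f MHz&quot;, F_PFD_MHZ);
        $display(&quot;F_VCO:    %0.3f MHz&quot;, F_VCO_MHZ);
        $display(&quot;F_PLLOUT: %0.3f MHz&quot;, F_PLLOUT_MHZ);
        $finish;
    end
endmodule</code></pre>
<p>With an 800x525 frame, a 25.125 MHz pixel clock works out to 25125000 / 420000, or about 59.8 frames per second, which is why the monitor still treats the signal as 60 Hz.</p> <p>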
The DVI Pmod has the following mapping for all of its connectors:</p> <table> <tbody> <tr> <td> PMOD 1 </td> <td> </td> <td> PMOD 2 </td> <td> </td> </tr> <tr> <td> <strong>P1A1</strong> </td> <td> Red bit 4 </td> <td> <strong>P1B1</strong> </td> <td> Blue bit 4 </td> </tr> <tr> <td> <strong>P1A2</strong> </td> <td> Red bit 3 </td> <td> <strong>P1B2</strong> </td> <td> Pixel clock </td> </tr> <tr> <td> <strong>P1A3</strong> </td> <td> Green bit 4 </td> <td> <strong>P1B3</strong> </td> <td> Blue bit 3 </td> </tr> <tr> <td> <strong>P1A4</strong> </td> <td> Green bit 3 </td> <td> <strong>P1B4</strong> </td> <td> Horizontal Sync </td> </tr> <tr> <td> <strong>P1A7</strong> </td> <td> Red bit 2 </td> <td> <strong>P1B7</strong> </td> <td> Blue bit 2 </td> </tr> <tr> <td> <strong>P1A8</strong> </td> <td> Red bit 1 </td> <td> <strong>P1B8</strong> </td> <td> Blue bit 1 </td> </tr> <tr> <td> <strong>P1A9</strong> </td> <td> Green bit 2 </td> <td> <strong>P1B9</strong> </td> <td> Data Enable </td> </tr> <tr> <td> <strong>P1A10</strong> </td> <td> Green bit 1 </td> <td> <strong>P1B10</strong> </td> <td> Vertical Sync </td> </tr> </tbody> </table> <p>From this we can see that we need 4 bits for each colour channel, a horizontal sync signal, a vertical sync signal, and additionally a data enable signal. The data enable signal is not part of a standard video signal and is just used by the DVI transmitter chip on the Pmod to signal whether we are in the visible or the invisible pixel area. Therefore we will set the data enable line high when the current column is less than 640 and the current line is less than 480. 
Based on this we can connect the outputs like so:</p> <div class="sourceCode" id="cb5"><pre class="sourceCode verilog"><code class="sourceCode verilog"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>logic <span class="op">[</span><span class="dv">3</span><span class="op">:</span><span class="dv">0</span><span class="op">]</span> r<span class="op">;</span></span> <span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>logic <span class="op">[</span><span class="dv">3</span><span class="op">:</span><span class="dv">0</span><span class="op">]</span> g<span class="op">;</span></span> <span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a>logic <span class="op">[</span><span class="dv">3</span><span class="op">:</span><span class="dv">0</span><span class="op">]</span> b<span class="op">;</span></span> <span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a>logic data_enable<span class="op">;</span></span> <span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> data_enable <span class="op">=</span> column <span class="op">&lt;</span> <span class="dv">640</span> <span class="op">&amp;&amp;</span> line <span class="op">&lt;</span> <span class="dv">480</span><span class="op">;</span></span> <span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> <span class="op">{</span>P1A1<span class="op">,</span> P1A2<span class="op">,</span> P1A3<span class="op">,</span> P1A4<span class="op">,</span> P1A7<span class="op">,</span> P1A8<span class="op">,</span> P1A9<span class="op">,</span> P1A10<span class="op">}</span> <span class="op">=</span> </span> <span id="cb5-7"><a href="#cb5-7" aria-hidden="true" tabindex="-1"></a> <span class="op">{</span>r<span class="op">[</span><span class="dv">3</span><span class="op">],</span> r<span class="op">[</span><span class="dv">2</span><span class="op">],</span> g<span 
class="op">[</span><span class="dv">3</span><span class="op">],</span> g<span class="op">[</span><span class="dv">2</span><span class="op">],</span> r<span class="op">[</span><span class="dv">1</span><span class="op">],</span> r<span class="op">[</span><span class="dv">0</span><span class="op">],</span> g<span class="op">[</span><span class="dv">1</span><span class="op">],</span> g<span class="op">[</span><span class="dv">0</span><span class="op">]};</span></span> <span id="cb5-8"><a href="#cb5-8" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> <span class="op">{</span>P1B1<span class="op">,</span> P1B2<span class="op">,</span> P1B3<span class="op">,</span> P1B4<span class="op">,</span> P1B7<span class="op">,</span> P1B8<span class="op">,</span> P1B9<span class="op">,</span> P1B10<span class="op">}</span> <span class="op">=</span> </span> <span id="cb5-9"><a href="#cb5-9" aria-hidden="true" tabindex="-1"></a> <span class="op">{</span>b<span class="op">[</span><span class="dv">3</span><span class="op">],</span> pixel_clock<span class="op">,</span> b<span class="op">[</span><span class="dv">2</span><span class="op">],</span> horizontal_sync<span class="op">,</span> b<span class="op">[</span><span class="dv">1</span><span class="op">],</span> b<span class="op">[</span><span class="dv">0</span><span class="op">],</span> data_enable<span class="op">,</span> vertical_sync<span class="op">};</span></span></code></pre></div> <p>Now for testing purposes we are going to set the output colour to be fixed to pure red so additional logic to pick a pixel colour is not required for this example. 
We can do this as shown below:</p> <div class="sourceCode" id="cb6"><pre class="sourceCode verilog"><code class="sourceCode verilog"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> r <span class="op">=</span> <span class="bn">4&#39;b1111</span><span class="op">;</span></span> <span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> g <span class="op">=</span> <span class="bn">4&#39;b0000</span><span class="op">;</span></span> <span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> b <span class="op">=</span> <span class="bn">4&#39;b0000</span><span class="op">;</span></span></code></pre></div> <p>Putting all of the above code together with whatever additional inputs are required for the iCEBreaker FPGA gives us the following block of code:</p> <div class="sourceCode" id="cb7"><pre class="sourceCode verilog"><code class="sourceCode verilog"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="kw">module</span> top</span> <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a><span class="op">(</span></span> <span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a><span class="dt">input</span> CLK<span class="op">,</span></span> <span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a><span class="dt">output</span> LEDR_N<span class="op">,</span></span> <span id="cb7-5"><a href="#cb7-5" aria-hidden="true" tabindex="-1"></a><span class="dt">output</span> LEDG_N<span class="op">,</span></span> <span id="cb7-6"><a href="#cb7-6" aria-hidden="true" tabindex="-1"></a><span class="dt">input</span> BTN_N<span class="op">,</span></span> <span id="cb7-7"><a href="#cb7-7" aria-hidden="true" tabindex="-1"></a><span class="dt">output</span> P1A1<span class="op">,</span> P1A2<span class="op">,</span> P1A3<span class="op">,</span> P1A4<span class="op">,</span> 
P1A7<span class="op">,</span> P1A8<span class="op">,</span> P1A9<span class="op">,</span> P1A10<span class="op">,</span></span> <span id="cb7-8"><a href="#cb7-8" aria-hidden="true" tabindex="-1"></a><span class="dt">output</span> P1B1<span class="op">,</span> P1B2<span class="op">,</span> P1B3<span class="op">,</span> P1B4<span class="op">,</span> P1B7<span class="op">,</span> P1B8<span class="op">,</span> P1B9<span class="op">,</span> P1B10</span> <span id="cb7-9"><a href="#cb7-9" aria-hidden="true" tabindex="-1"></a><span class="op">);</span></span> <span id="cb7-10"><a href="#cb7-10" aria-hidden="true" tabindex="-1"></a></span> <span id="cb7-11"><a href="#cb7-11" aria-hidden="true" tabindex="-1"></a><span class="ot">`define PIXELS_PER_LINE 10&#39;d800</span></span> <span id="cb7-12"><a href="#cb7-12" aria-hidden="true" tabindex="-1"></a><span class="ot">`define PIXELS_VISIBLE_PER_LINE 10&#39;d640</span></span> <span id="cb7-13"><a href="#cb7-13" aria-hidden="true" tabindex="-1"></a><span class="ot">`define LINES_PER_FRAME 10&#39;d525</span></span> <span id="cb7-14"><a href="#cb7-14" aria-hidden="true" tabindex="-1"></a><span class="ot">`define LINES_VISIBLE_PER_FRAME 10&#39;d480</span></span> <span id="cb7-15"><a href="#cb7-15" aria-hidden="true" tabindex="-1"></a><span class="ot">`define HORIZONTAL_FRONTPORCH 10&#39;d656</span></span> <span id="cb7-16"><a href="#cb7-16" aria-hidden="true" tabindex="-1"></a><span class="ot">`define HORIZONTAL_BACKPORCH 10&#39;d752</span></span> <span id="cb7-17"><a href="#cb7-17" aria-hidden="true" tabindex="-1"></a><span class="ot">`define VERTICAL_FRONTPORCH 10&#39;d490</span></span> <span id="cb7-18"><a href="#cb7-18" aria-hidden="true" tabindex="-1"></a><span class="ot">`define VERTICAL_BACKPORCH 10&#39;d492</span></span> <span id="cb7-19"><a href="#cb7-19" aria-hidden="true" tabindex="-1"></a></span> <span id="cb7-20"><a href="#cb7-20" aria-hidden="true" tabindex="-1"></a>logic <span class="op">[</span><span 
class="dv">9</span><span class="op">:</span><span class="dv">0</span><span class="op">]</span> line<span class="op">;</span></span> <span id="cb7-21"><a href="#cb7-21" aria-hidden="true" tabindex="-1"></a>logic <span class="op">[</span><span class="dv">9</span><span class="op">:</span><span class="dv">0</span><span class="op">]</span> column<span class="op">;</span></span> <span id="cb7-22"><a href="#cb7-22" aria-hidden="true" tabindex="-1"></a>logic horizontal_sync<span class="op">;</span></span> <span id="cb7-23"><a href="#cb7-23" aria-hidden="true" tabindex="-1"></a>logic vertical_sync<span class="op">;</span></span> <span id="cb7-24"><a href="#cb7-24" aria-hidden="true" tabindex="-1"></a>logic data_enable<span class="op">;</span></span> <span id="cb7-25"><a href="#cb7-25" aria-hidden="true" tabindex="-1"></a>logic pixel_clock<span class="op">;</span></span> <span id="cb7-26"><a href="#cb7-26" aria-hidden="true" tabindex="-1"></a>logic reset<span class="op">;</span></span> <span id="cb7-27"><a href="#cb7-27" aria-hidden="true" tabindex="-1"></a></span> <span id="cb7-28"><a href="#cb7-28" aria-hidden="true" tabindex="-1"></a>logic <span class="op">[</span><span class="dv">3</span><span class="op">:</span><span class="dv">0</span><span class="op">]</span> r<span class="op">;</span></span> <span id="cb7-29"><a href="#cb7-29" aria-hidden="true" tabindex="-1"></a>logic <span class="op">[</span><span class="dv">3</span><span class="op">:</span><span class="dv">0</span><span class="op">]</span> g<span class="op">;</span></span> <span id="cb7-30"><a href="#cb7-30" aria-hidden="true" tabindex="-1"></a>logic <span class="op">[</span><span class="dv">3</span><span class="op">:</span><span class="dv">0</span><span class="op">]</span> b<span class="op">;</span></span> <span id="cb7-31"><a href="#cb7-31" aria-hidden="true" tabindex="-1"></a></span> <span id="cb7-32"><a href="#cb7-32" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> horizontal_sync <span 
class="op">=</span> column <span class="op">&lt;</span> <span class="op">(</span><span class="ot">`HORIZONTAL_FRONTPORCH</span><span class="op">)</span> <span class="op">||</span> column <span class="op">&gt;=</span> <span class="op">(</span><span class="ot">`HORIZONTAL_BACKPORCH</span><span class="op">);</span></span> <span id="cb7-33"><a href="#cb7-33" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> vertical_sync <span class="op">=</span> line <span class="op">&lt;</span> <span class="op">(</span><span class="ot">`VERTICAL_FRONTPORCH</span><span class="op">)</span> <span class="op">||</span> line <span class="op">&gt;=</span> <span class="op">(</span><span class="ot">`VERTICAL_BACKPORCH</span><span class="op">);</span></span> <span id="cb7-34"><a href="#cb7-34" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> data_enable <span class="op">=</span> <span class="op">(</span>column <span class="op">&lt;</span> <span class="ot">`PIXELS_VISIBLE_PER_LINE</span><span class="op">)</span> <span class="op">&amp;&amp;</span> <span class="op">(</span>line <span class="op">&lt;</span> <span class="ot">`LINES_VISIBLE_PER_FRAME</span><span class="op">);</span></span> <span id="cb7-35"><a href="#cb7-35" aria-hidden="true" tabindex="-1"></a></span> <span id="cb7-36"><a href="#cb7-36" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> reset <span class="op">=</span> <span class="op">~</span>BTN_N<span class="op">;</span></span> <span id="cb7-37"><a href="#cb7-37" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> LEDR_N <span class="op">=</span> <span class="dv">1</span><span class="op">;</span></span> <span id="cb7-38"><a href="#cb7-38" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> LEDG_N <span class="op">=</span> <span class="dv">1</span><span class="op">;</span></span> <span id="cb7-39"><a href="#cb7-39" aria-hidden="true" tabindex="-1"></a></span> <span id="cb7-40"><a href="#cb7-40" 
aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> r <span class="op">=</span> <span class="bn">4&#39;b1111</span><span class="op">;</span></span> <span id="cb7-41"><a href="#cb7-41" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> g <span class="op">=</span> <span class="bn">4&#39;b0000</span><span class="op">;</span></span> <span id="cb7-42"><a href="#cb7-42" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> b <span class="op">=</span> <span class="bn">4&#39;b0000</span><span class="op">;</span></span> <span id="cb7-43"><a href="#cb7-43" aria-hidden="true" tabindex="-1"></a></span> <span id="cb7-44"><a href="#cb7-44" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> <span class="op">{</span>P1A1<span class="op">,</span> P1A2<span class="op">,</span> P1A3<span class="op">,</span> P1A4<span class="op">,</span> P1A7<span class="op">,</span> P1A8<span class="op">,</span> P1A9<span class="op">,</span> P1A10<span class="op">}</span> <span class="op">=</span> </span> <span id="cb7-45"><a href="#cb7-45" aria-hidden="true" tabindex="-1"></a> <span class="op">{</span>r<span class="op">[</span><span class="dv">3</span><span class="op">],</span> r<span class="op">[</span><span class="dv">2</span><span class="op">],</span> g<span class="op">[</span><span class="dv">3</span><span class="op">],</span> g<span class="op">[</span><span class="dv">2</span><span class="op">],</span> r<span class="op">[</span><span class="dv">1</span><span class="op">],</span> r<span class="op">[</span><span class="dv">0</span><span class="op">],</span> g<span class="op">[</span><span class="dv">1</span><span class="op">],</span> g<span class="op">[</span><span class="dv">0</span><span class="op">]};</span></span> <span id="cb7-46"><a href="#cb7-46" aria-hidden="true" tabindex="-1"></a><span class="kw">assign</span> <span class="op">{</span>P1B1<span class="op">,</span> P1B2<span class="op">,</span> P1B3<span class="op">,</span> 
P1B4<span class="op">,</span> P1B7<span class="op">,</span> P1B8<span class="op">,</span> P1B9<span class="op">,</span> P1B10<span class="op">}</span> <span class="op">=</span> </span> <span id="cb7-47"><a href="#cb7-47" aria-hidden="true" tabindex="-1"></a> <span class="op">{</span>b<span class="op">[</span><span class="dv">3</span><span class="op">],</span> pixel_clock<span class="op">,</span> b<span class="op">[</span><span class="dv">2</span><span class="op">],</span> horizontal_sync<span class="op">,</span> b<span class="op">[</span><span class="dv">1</span><span class="op">],</span> b<span class="op">[</span><span class="dv">0</span><span class="op">],</span> data_enable<span class="op">,</span> vertical_sync<span class="op">};</span></span> <span id="cb7-48"><a href="#cb7-48" aria-hidden="true" tabindex="-1"></a></span> <span id="cb7-49"><a href="#cb7-49" aria-hidden="true" tabindex="-1"></a><span class="co">// Pixel and line counter</span></span> <span id="cb7-50"><a href="#cb7-50" aria-hidden="true" tabindex="-1"></a><span class="kw">always</span> <span class="op">@(</span><span class="kw">posedge</span> pixel_clock <span class="dt">or</span> <span class="kw">posedge</span> reset<span class="op">)</span> <span class="kw">begin</span></span> <span id="cb7-51"><a href="#cb7-51" aria-hidden="true" tabindex="-1"></a> <span class="kw">if</span><span class="op">(</span>reset <span class="op">==</span> <span class="dv">1</span><span class="op">)</span> <span class="kw">begin</span></span> <span id="cb7-52"><a href="#cb7-52" aria-hidden="true" tabindex="-1"></a> line <span class="op">&lt;=</span> <span class="ot">`LINES_PER_FRAME</span> <span class="op">-</span> <span class="dv">2</span><span class="op">;</span></span> <span id="cb7-53"><a href="#cb7-53" aria-hidden="true" tabindex="-1"></a> column <span class="op">&lt;=</span> <span class="ot">`PIXELS_PER_LINE</span> <span class="op">-</span> <span class="dv">16</span><span class="op">;</span></span> <span 
id="cb7-54"><a href="#cb7-54" aria-hidden="true" tabindex="-1"></a> <span class="kw">end</span></span> <span id="cb7-55"><a href="#cb7-55" aria-hidden="true" tabindex="-1"></a> <span class="kw">else</span> <span class="kw">begin</span></span> <span id="cb7-56"><a href="#cb7-56" aria-hidden="true" tabindex="-1"></a> <span class="kw">if</span><span class="op">(</span>column <span class="op">==</span> <span class="op">(</span><span class="ot">`PIXELS_PER_LINE</span> <span class="op">-</span> <span class="dv">1</span><span class="op">)</span> <span class="op">&amp;&amp;</span> line <span class="op">==</span> <span class="op">(</span><span class="ot">`LINES_PER_FRAME</span> <span class="op">-</span> <span class="dv">1</span><span class="op">))</span> <span class="kw">begin</span></span> <span id="cb7-57"><a href="#cb7-57" aria-hidden="true" tabindex="-1"></a> line <span class="op">&lt;=</span> <span class="dv">0</span><span class="op">;</span></span> <span id="cb7-58"><a href="#cb7-58" aria-hidden="true" tabindex="-1"></a> column <span class="op">&lt;=</span> <span class="dv">0</span><span class="op">;</span></span> <span id="cb7-59"><a href="#cb7-59" aria-hidden="true" tabindex="-1"></a> <span class="kw">end</span></span> <span id="cb7-60"><a href="#cb7-60" aria-hidden="true" tabindex="-1"></a> <span class="kw">else</span> <span class="kw">if</span><span class="op">(</span>column <span class="op">==</span> <span class="ot">`PIXELS_PER_LINE</span> <span class="op">-</span> <span class="dv">1</span><span class="op">)</span> <span class="kw">begin</span></span> <span id="cb7-61"><a href="#cb7-61" aria-hidden="true" tabindex="-1"></a> line <span class="op">&lt;=</span> line <span class="op">+</span> <span class="dv">1</span><span class="op">;</span></span> <span id="cb7-62"><a href="#cb7-62" aria-hidden="true" tabindex="-1"></a> column <span class="op">&lt;=</span> <span class="dv">0</span><span class="op">;</span></span> <span id="cb7-63"><a href="#cb7-63" 
aria-hidden="true" tabindex="-1"></a> <span class="kw">end</span></span> <span id="cb7-64"><a href="#cb7-64" aria-hidden="true" tabindex="-1"></a> <span class="kw">else</span> <span class="kw">begin</span></span> <span id="cb7-65"><a href="#cb7-65" aria-hidden="true" tabindex="-1"></a> column <span class="op">&lt;=</span> column <span class="op">+</span> <span class="dv">1</span><span class="op">;</span></span> <span id="cb7-66"><a href="#cb7-66" aria-hidden="true" tabindex="-1"></a> <span class="kw">end</span></span> <span id="cb7-67"><a href="#cb7-67" aria-hidden="true" tabindex="-1"></a> <span class="kw">end</span></span> <span id="cb7-68"><a href="#cb7-68" aria-hidden="true" tabindex="-1"></a><span class="kw">end</span></span> <span id="cb7-69"><a href="#cb7-69" aria-hidden="true" tabindex="-1"></a></span> <span id="cb7-70"><a href="#cb7-70" aria-hidden="true" tabindex="-1"></a>SB_PLL40_PAD #<span class="op">(</span></span> <span id="cb7-71"><a href="#cb7-71" aria-hidden="true" tabindex="-1"></a> .DIVR<span class="op">(</span><span class="bn">4&#39;b0000</span><span class="op">),</span></span> <span id="cb7-72"><a href="#cb7-72" aria-hidden="true" tabindex="-1"></a> .DIVF<span class="op">(</span><span class="bn">7&#39;b1000010</span><span class="op">),</span></span> <span id="cb7-73"><a href="#cb7-73" aria-hidden="true" tabindex="-1"></a> .DIVQ<span class="op">(</span><span class="bn">3&#39;b101</span><span class="op">),</span></span> <span id="cb7-74"><a href="#cb7-74" aria-hidden="true" tabindex="-1"></a> .FILTER_RANGE<span class="op">(</span><span class="bn">3&#39;b001</span><span class="op">),</span></span> <span id="cb7-75"><a href="#cb7-75" aria-hidden="true" tabindex="-1"></a> .FEEDBACK_PATH<span class="op">(</span><span class="st">&quot;SIMPLE&quot;</span><span class="op">),</span></span> <span id="cb7-76"><a href="#cb7-76" aria-hidden="true" tabindex="-1"></a> .DELAY_ADJUSTMENT_MODE_FEEDBACK<span class="op">(</span><span 
class="st">&quot;FIXED&quot;</span><span class="op">),</span></span> <span id="cb7-77"><a href="#cb7-77" aria-hidden="true" tabindex="-1"></a> .FDA_FEEDBACK<span class="op">(</span><span class="bn">4&#39;b0000</span><span class="op">),</span></span> <span id="cb7-78"><a href="#cb7-78" aria-hidden="true" tabindex="-1"></a> .DELAY_ADJUSTMENT_MODE_RELATIVE<span class="op">(</span><span class="st">&quot;FIXED&quot;</span><span class="op">),</span></span> <span id="cb7-79"><a href="#cb7-79" aria-hidden="true" tabindex="-1"></a> .FDA_RELATIVE<span class="op">(</span><span class="bn">4&#39;b0000</span><span class="op">),</span></span> <span id="cb7-80"><a href="#cb7-80" aria-hidden="true" tabindex="-1"></a> .SHIFTREG_DIV_MODE<span class="op">(</span><span class="bn">2&#39;b00</span><span class="op">),</span></span> <span id="cb7-81"><a href="#cb7-81" aria-hidden="true" tabindex="-1"></a> .PLLOUT_SELECT<span class="op">(</span><span class="st">&quot;GENCLK&quot;</span><span class="op">),</span></span> <span id="cb7-82"><a href="#cb7-82" aria-hidden="true" tabindex="-1"></a> .ENABLE_ICEGATE<span class="op">(</span><span class="bn">1&#39;b0</span><span class="op">)</span></span> <span id="cb7-83"><a href="#cb7-83" aria-hidden="true" tabindex="-1"></a><span class="op">)</span> usb_pll_inst <span class="op">(</span></span> <span id="cb7-84"><a href="#cb7-84" aria-hidden="true" tabindex="-1"></a> .PACKAGEPIN<span class="op">(</span>CLK<span class="op">),</span></span> <span id="cb7-85"><a href="#cb7-85" aria-hidden="true" tabindex="-1"></a> .PLLOUTCORE<span class="op">(</span>pixel_clock<span class="op">),</span></span> <span id="cb7-86"><a href="#cb7-86" aria-hidden="true" tabindex="-1"></a> .EXTFEEDBACK<span class="op">(),</span></span> <span id="cb7-87"><a href="#cb7-87" aria-hidden="true" tabindex="-1"></a> .DYNAMICDELAY<span class="op">(),</span></span> <span id="cb7-88"><a href="#cb7-88" aria-hidden="true" tabindex="-1"></a> .RESETB<span class="op">(</span><span 
class="bn">1&#39;b1</span><span class="op">),</span></span> <span id="cb7-89"><a href="#cb7-89" aria-hidden="true" tabindex="-1"></a> .BYPASS<span class="op">(</span><span class="bn">1&#39;b0</span><span class="op">),</span></span> <span id="cb7-90"><a href="#cb7-90" aria-hidden="true" tabindex="-1"></a> .LATCHINPUTVALUE<span class="op">(),</span></span> <span id="cb7-91"><a href="#cb7-91" aria-hidden="true" tabindex="-1"></a><span class="op">);</span></span> <span id="cb7-92"><a href="#cb7-92" aria-hidden="true" tabindex="-1"></a></span> <span id="cb7-93"><a href="#cb7-93" aria-hidden="true" tabindex="-1"></a><span class="kw">endmodule</span></span></code></pre></div> <p>To build this, you will require a .pcf file describing the pin mapping of the iCEBreaker board. I grabbed mine from the iCEBreaker examples <a href="https://raw.githubusercontent.com/icebreaker-fpga/icebreaker-examples/master/icebreaker.pcf">here</a>. Grab that file and put it in the same folder as the file for the code provided above. We can then run the following commands to generate a binary to program onto the FPGA:</p> <pre><code>yosys -ql out.log -p &#39;synth_ice40 -top top -json out.json&#39; top.sv
nextpnr-ice40 --up5k --json out.json --pcf icebreaker.pcf --asc out.asc
icetime -d up5k -mtr out.rpt out.asc
icepack out.asc out.bin</code></pre> <p>This will generate an out.bin file that we will need to flash onto the board. Make sure your iCEBreaker FPGA is connected via USB to your computer, and you can program it with the following command:</p> <pre><code>iceprog out.bin</code></pre> <p>Now connect up a video cable (my DVI Pmod has an HDMI connector, but it only carries the DVI video signal) to the board and monitor and you should get results like this:</p> <p><img src="/assets/2020-04-07-generating-video/IMG_20200407_172119-1-1024x768.jpg" /></p> <p>You can also see from the monitor settings menu that the video signal was recognized as 640x480@60 Hz. 
Now the code presented in this post is specific to the iCEBreaker board and the DVI Pmod, but the theory can be applied to any FPGA and any connector that uses a video signal like this. For example, you could wire up a DAC with a resistor ladder to generate a VGA signal. The logic for the timings here would be exactly the same if you wanted a 640x480@60 Hz VGA signal.</p> Mon, 06 Apr 2020 23:00:00 -0000https://fryzekconcepts.com/notes/generating-video.htmlN64Brew GameJam 2021https://fryzekconcepts.com/notes/n64brew-gamejam-2021.html<p>So this year, two others and I decided to participate together in the N64Brew homebrew GameJam, where we were supposed to build a homebrew game that would run on a real Nintendo 64. The game jam took place from October 8th until December 8th and was the second GameJam in N64Brew history. Unfortunately, we never ended up finishing the game, but we did build a really cool tech demo. Our project was called “Bug Game”, and if you want to check it out you can find it <a href="https://hazematman.itch.io/bug-game">here</a>. To play the game you’ll need a flash cart to load it on a real Nintendo 64, or you can use an accurate emulator such as <a href="https://ares.dev/">ares</a> or <a href="https://github.com/n64dev/cen64">cen64</a>. The reason an accurate emulator is required is that we made use of a new open source 3D microcode for the N64 called “<a href="https://github.com/snacchus/libdragon/tree/ugfx">ugfx</a>”, created by the user Snacchus. 
This microcode is part of the Libdragon project, which is trying to build a completely open source library and toolchain to build N64 games, instead of relying on the official SDK that has been leaked to the public through liquidation auctions of game companies that have shut down over the years.</p> <div class="gallery"> <p><img src="/assets/2021-12-10-n64brew-gamejam-2021/bug_1.png" /> <img src="/assets/2021-12-10-n64brew-gamejam-2021/bug_2.png" /> <img src="/assets/2021-12-10-n64brew-gamejam-2021/bug_4.png" /> <img src="/assets/2021-12-10-n64brew-gamejam-2021/bug_5.png" /> <img src="/assets/2021-12-10-n64brew-gamejam-2021/bug_3.png" /></p> <p>Screenshots of Bug Game</p> </div> <h2 id="libdragon-and-ugfx">Libdragon and UGFX</h2> <p>Ugfx was a brand new development in the N64 homebrew scene. By complete coincidence, Snacchus happened to release it on September 21st, just weeks before the GameJam was announced. There have been many attempts to create an open source 3D microcode for the N64 (my <a href="https://github.com/Hazematman/libhfx">libhfx</a> project included), but ugfx was the first project to be completed with easily usable documentation and examples. This was an exciting development for the open source N64 brew community, as for the first time we could build 3D games that ran on the N64 without using the legally questionable official SDK. I jumped at the opportunity to use this and build one of the first fully 3D games running on Libdragon.</p> <p>One of the “drawbacks” of ugfx was that it tried to follow a lot of the design decisions of Nintendo’s official 3D microcode. This made it easier for people familiar with the official SDK to jump ship over to libdragon, but also went against the philosophy of the libdragon project to provide simple, easy to use APIs. The Nintendo 64 was notoriously difficult to develop for, and one of the reasons for that was the extremely low level interface that the official 3D microcodes provided. 
Honestly, writing 3D graphics code on the N64 reminds me more of writing a 3D OpenGL graphics driver (like I do in my day job) than building a graphics application, which unnecessarily raises the barrier to entry for developing 3D games on the Nintendo 64. Now that ugfx has been released, there is an ongoing effort in the community to revamp it and build a more user friendly API to access the 3D functionality of the N64.</p> <h2 id="ease-of-development">Ease of development</h2> <p>One of the major selling points of libdragon is that it tries to provide a standard toolchain with access to things like the C standard library as well as the C++ standard library. To save time on the development of Bug Game, I decided to put that claim to the test. When building a 3D game from scratch, two things that can be extremely time consuming are implementing linear algebra operations and implementing physics that work in 3D. Luckily for modern developers, there are many open source libraries you can use instead of building these from scratch, like <a href="https://glm.g-truc.net/0.9.9/">GLM</a> for math operations and <a href="https://github.com/bulletphysics/bullet3">Bullet</a> for physics. I don’t believe anyone has tried to do this before, but knowing that libdragon provides a pretty standard C++ development environment, I tried to build GLM and Bullet to run on the Nintendo 64, and I was successful! Both GLM and Bullet were able to run on real N64 hardware. This saved time during development as we were no longer concerned with having to build our own physics or math libraries. There were some tricks I needed to do to get Bullet running on the hardware.</p> <p>First, Bullet will allocate more memory for its internal pools than is available on the N64. 
This is an easy fix, as you can adjust the pool sizes when you initialize Bullet using the code below:</p> <div class="sourceCode" id="cb1"><pre class="sourceCode cpp"><code class="sourceCode cpp"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a>btDefaultCollisionConstructionInfo constructionInfo <span class="op">=</span> btDefaultCollisionConstructionInfo<span class="op">();</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>constructionInfo<span class="op">.</span><span class="va">m_defaultMaxCollisionAlgorithmPoolSize</span> <span class="op">=</span> <span class="dv">512</span><span class="op">;</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a>constructionInfo<span class="op">.</span><span class="va">m_defaultMaxPersistentManifoldPoolSize</span> <span class="op">=</span> <span class="dv">512</span><span class="op">;</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a>btDefaultCollisionConfiguration<span class="op">*</span> collisionConfiguration <span class="op">=</span> <span class="kw">new</span> btDefaultCollisionConfiguration<span class="op">(</span>constructionInfo<span class="op">);</span></span></code></pre></div> <p>This lets you shrink the memory pools Bullet pre-allocates by capping the number of entries in each pool. The above code keeps the internal pools to roughly 1MB, allowing us to easily run within the 4MB of RAM that is available on the N64 without the Expansion Pak (an accessory to the N64 that increases the available RAM to 8MB).</p> <p>The second issue I ran into with Bullet was that the N64 floating point unit does not implement de-normalized floating point numbers. Now I’m not an expert in floating point numbers, but from my understanding, de-normalized numbers are a way to represent values between the smallest normal floating point number and zero. 
This allows floating point calculations to slowly fall towards zero in a more accurate way instead of rounding directly to zero. Since the N64 CPU does not implement de-normalized floats, any calculation that would have generated a de-normalized float instead causes a floating point exception. Because of the way the physics engine works, when two objects got very close together this would cause de-normalized floats to be generated and crash the FPU. This was a problem that had me stumped for a bit. I was concerned I would have to go into Bullet’s source code and modify any calculations to round to zero if the result would be small enough. This would have been a monumental effort! Thankfully, after digging through the NEC VR4300 programmer’s manual, I was able to discover that there is a mode you can set the FPU to, which forces rounding towards zero if a de-normalized float would be generated. I enabled this mode and tested it out, and all my floating point troubles were resolved! I submitted a <a href="https://github.com/DragonMinded/libdragon/pull/195">pull request</a> (that was accepted) to the libdragon project to have this implemented by default, so no one else will run into the same annoying problems I ran into.</p> <h2 id="whats-next">What’s next?</h2> <p>If you decided to play our game you probably would have noticed that it’s not very much of a game. Even though this is the case, I’m very happy with how the project turned out, as it’s one of the first 3D libdragon projects to be released. It also easily makes use of amazing open technologies like Bullet physics, showcasing just how easy libdragon is to integrate with modern tools and libraries. As I mentioned before in this post, there is an effort to take Snacchus’s work and build an easier to use graphics API that feels more like building graphics applications and less like building a graphics driver. The effort for that has already started and I plan to contribute to it. 
Some of the cool features this effort is bringing are:</p> <ul> <li>A standard interface for display lists and microcode overlays, allowing multiple different microcodes to run seamlessly on the RSP and be swapped out with display list commands. This will be valuable for using the RSP for audio and graphics at the same time.</li> <li>A new 3D microcode that takes some lessons learned from ugfx to build a more powerful and easier to use interface.</li> </ul> <p>Overall this is an exciting time for Nintendo 64 homebrew development! It’s easier than ever to build homebrew on the N64 without knowing about the arcane innards of the console. I hope that this continued development of libdragon will bring more people to the scene and allow us to see new and novel games running on the N64. One project I would be excited to start working on is using the serial port on modern N64 Flashcarts for networking, allowing the N64 to have online multiplayer through a computer connected over USB. I feel that projects like this could really elevate the kind of content that is available on the N64 and bring it into the modern era.</p> Fri, 10 Dec 2021 00:00:00 -0000https://fryzekconcepts.com/notes/n64brew-gamejam-2021.htmlRasterizing Triangleshttps://fryzekconcepts.com/notes/rasterizing-triangles.html<p>Lately I’ve been trying to implement a software renderer <a href="https://www.cs.drexel.edu/~david/Classes/Papers/comp175-06-pineda.pdf">following the algorithm described by Juan Pineda in “A Parallel Algorithm for Polygon Rasterization”</a>. For those unfamiliar with the paper, it describes an algorithm to rasterize triangles that has an extremely nice quality: you simply need to perform a few additions per pixel to see if the next pixel is inside the triangle. 
It achieves this quality by defining an edge function that has the following property:</p> <pre><code>E(x+1,y) = E(x,y) + dY
E(x,y+1) = E(x,y) - dX</code></pre> <p>This property is extremely nice for a rasterizer, as additions are quite cheap to perform and with this method we limit the amount of work we have to do per pixel. One frustrating quality of this paper is that it suggests that you can calculate more properties than just whether a pixel is inside the triangle with simple addition, but provides no explanation of how to do that. In this blog post I would like to explore how you implement a Pineda style rasterizer that can calculate per pixel values using simple addition.</p> <figure> <img src="/assets/2022-04-03-rasterizing-triangles/Screenshot-from-2022-04-03-13-43-13.png" alt="Triangle rasterized using code in this post" /> <figcaption aria-hidden="true">Triangle rasterized using code in this post</figcaption> </figure> <p>In order to figure out how to build this rasterizer <a href="https://www.reddit.com/r/GraphicsProgramming/comments/tqxxmu/interpolating_values_in_a_pineda_style_rasterizer/">I reached out to the internet</a> to help build some more intuition on the properties of this rasterizer. From this reddit post I gained more intuition on how we can use the edge function values to linearly interpolate values on the triangle. Here is the relevant comment that gave me all the information I needed:</p> <blockquote> <p>Think about the edge function’s key property:</p> <p><em>recognize that the formula given for E(x,y) is the same as the formula for the magnitude of the cross product between the vector from (X,Y) to (X+dX, Y+dY), and the vector from (X,Y) to (x,y). 
By the well known property of cross products, the magnitude is zero if the vectors are colinear, and changes sign as the vectors cross from one side to the other.</em></p> <p>The magnitude of the edge distance is the area of the parallelogram formed by <code>(X,Y)-&gt;(X+dX,Y+dY)</code> and <code>(X,Y)-&gt;(x,y)</code>. If you normalize by the parallelogram area at the <em>other</em> point in the triangle you get a barycentric coordinate that’s 0 along the <code>(X,Y)-&gt;(X+dX,Y+dY)</code> edge and 1 at the other point. You can precompute each interpolated triangle parameter normalized by this area at setup time, and in fact most hardware computes per-pixel step values (pre 1/w correction) so that all the parameters are computed as a simple addition as you walk along each raster.</p> <p>Note that when you’re implementing all of this it’s critical to keep all the math in the integer domain (snapping coordinates to some integer sub-pixel precision, I’d recommend at least 4 bits) and using a tie-breaking function (typically top-left) for pixels exactly on the edge to avoid pixel double-hits or gaps in adjacent triangles.</p> <p>https://www.reddit.com/r/GraphicsProgramming/comments/tqxxmu/interpolating_values_in_a_pineda_style_rasterizer/i2krwxj/</p> </blockquote> <p>From this comment you can see that it is trivial to calculate the barycentric coordinates of the triangle from the edge function. You simply need to divide the calculated edge function value by the area of the parallelogram. Now what is the area of the triangle? Well, this is where some <a href="https://www.scratchapixel.com/lessons/3d-basic-rendering/ray-tracing-rendering-a-triangle/barycentric-coordinates">more research</a> online helped. 
If the edge function defines the area of a parallelogram (2 times the area of the triangle) of <code>(X,Y)-&gt;(X+dX,Y+dY)</code> and <code>(X,Y)-&gt;(x,y)</code>, and we calculate three edge function values (one for each edge), then we have 2 times the area of each of the sub triangles that are defined by our point.</p> <figure> <img src="https://www.scratchapixel.com/images/ray-triangle/barycentric.png?" alt="Triangle barycentric coordinates from Scratchapixel tutorial" /> <figcaption aria-hidden="true">Triangle barycentric coordinates from Scratchapixel tutorial</figcaption> </figure> <p>From this it’s trivial to see that we can calculate 2 times the area of the triangle just by adding up all the individual areas of the sub triangles (I used triangles here, but really we are adding the area of sub parallelograms to get the area of the whole parallelogram that has 2 times the area of the triangle we are drawing), that is, by adding the values of all the edge functions together. From this we can see that to linearly interpolate any value on the triangle we can use the following equation:</p> <pre><code>Value(x,y) = (e0*v0 + e1*v1 + e2*v2) / (e0 + e1 + e2)
Value(x,y) = (e0*v0 + e1*v1 + e2*v2) / area</code></pre> <p>Where <code>e0, e1, e2</code> are the edge function values, each taken for the edge opposite the vertex it weights (so <code>e0</code> is zero along the edge joining vertices 1 and 2), and <code>v0, v1, v2</code> are the per vertex values we want to interpolate.</p> <p>This is great for calculating the interpolated values, but we still haven’t achieved the property of calculating the interpolated value per pixel with simple addition. 
To do that we need to use the property of the edge function I described above:</p> <pre><code>Value(x+1, y) = (E0(x+1, y)*v0 + E1(x+1, y)*v1 + E2(x+1, y)*v2) / area
Value(x+1, y) = ((e0+dY0)*v0 + (e1+dY1)*v1 + (e2+dY2)*v2) / area
Value(x+1, y) = (e0*v0 + dY0*v0 + e1*v1+dY1*v1 + e2*v2 + dY2*v2) / area
Value(x+1, y) = (e0*v0 + e1*v1 + e2*v2)/area + (dY0*v0 + dY1*v1 + dY2*v2)/area
Value(x+1, y) = Value(x,y) + (dY0*v0 + dY1*v1 + dY2*v2)/area</code></pre> <p>From here we can see that if we work through all the math, we can find this same property where the interpolated value is equal to the previous interpolated value plus some number. Therefore if we pre-compute this addition value, when we iterate over the pixels we only need to add this pre-computed number to the interpolated value of the previous pixel. We can repeat this process again to figure out the equation of the pre-computed value for <code>Value(x, y+1)</code> but I’ll save you the time and provide both equations below:</p> <pre><code>dYV = (dY0*v0 + dY1*v1 + dY2*v2)/area
dXV = (dX0*v0 + dX1*v1 + dX2*v2)/area
Value(x+1, y) = Value(x,y) + dYV
Value(x, y+1) = Value(x,y) - dXV</code></pre> <p>Where <code>dY0, dY1, dY2</code> are the differences between y coordinates as described in Pineda’s paper, <code>dX0, dX1, dX2</code> are the differences in x coordinates as described in Pineda’s paper, and the area is the pre-calculated sum of the edge functions.</p> <p>Now you should be able to build a Pineda style rasterizer that can calculate per pixel interpolated values using simple addition, by following pseudo code like this:</p> <pre><code>func edge(x, y, xi, yi, dXi, dYi):
    return (x - xi)*dYi - (y - yi)*dXi

func draw_triangle(x0, y0, x1, y1, x2, y2, v0, v1, v2):
    # Edge i is the edge opposite vertex i, so that ei is zero along
    # that edge and equals area at vertex i (ei/area weights vi).
    dX0 = x2 - x1
    dX1 = x0 - x2
    dX2 = x1 - x0
    dY0 = y2 - y1
    dY1 = y0 - y2
    dY2 = y1 - y0
    start_x = 0
    start_y = 0
    e0 = edge(start_x, start_y, x1, y1, dX0, dY0)
    e1 = edge(start_x, start_y, x2, y2, dX1, dY1)
    e2 = edge(start_x, start_y, x0, y0, dX2, dY2)
    area = e0 + e1 + e2
    dYV = (dY0*v0 + dY1*v1 + dY2*v2) / area
    dXV = (dX0*v0 + dX1*v1 + dX2*v2) / area
    v = (e0*v0 + e1*v1 + e2*v2) / area
    starting_e0 = e0
    starting_e1 = e1
    starting_e2 = e2
    starting_v = v
    for y = 0 to screen_height:
        for x = 0 to screen_width:
            if(e0 &gt;= 0 &amp;&amp; e1 &gt;= 0 &amp;&amp; e2 &gt;= 0)
                draw_pixel(x, y, v)
            e0 = e0 + dY0
            e1 = e1 + dY1
            e2 = e2 + dY2
            v = v + dYV
        e0 = starting_e0 - dX0
        e1 = starting_e1 - dX1
        e2 = starting_e2 - dX2
        v = starting_v - dXV
        starting_e0 = e0
        starting_e1 = e1
        starting_e2 = e2
        starting_v = v</code></pre> <p>Now this pseudo code is not the most efficient, as it will iterate over the entire screen to draw one triangle, but it provides a starting basis to show how to use these Pineda properties to interpolate per vertex values. One thing to note if you do implement this: if you use fixed point arithmetic, be careful to ensure you have enough precision to calculate all of these values without overflow or underflow. I ran into this issue, running out of precision when I did the divide by the area.</p> Sat, 02 Apr 2022 23:00:00 -0000https://fryzekconcepts.com/notes/rasterizing-triangles.htmlBaremetal RISC-Vhttps://fryzekconcepts.com/notes/baremetal-risc-v.html<p>After re-watching suckerpinch’s <a href="https://www.youtube.com/watch?v=ar9WRwCiSr0">“Reverse Emulation”</a> video I got inspired to try and replicate what he did, but instead do it on an N64. Now my idea here is not to perform reverse emulation on the N64 itself but instead to use a single-board computer (SBC) as a cheap way to make a dev focused flash cart. Seeing that suckerpinch was able to meet the timings of the NES bus made me think it might be possible to meet the N64 bus timings taking an approach similar to his.</p> <h2 id="why-risc-v-baremetal">Why RISC-V Baremetal?</h2> <p>The answer here is more utilitarian than idealistic. I originally wanted to use a Raspberry Pi since I thought that board may be more accessible if other people want to try and replicate this project. 
Instead what I found is that it is impossible to procure a Raspberry Pi. Not to be deterred, I remembered that I had purchased an <a href="https://linux-sunxi.org/Allwinner_Nezha">“Allwinner Nezha”</a> a while back and that it’s just been collecting dust in my storage. I figured this would be a good project to test the board out on since it has a large amount of RAM (1GB on my board), a fast processor (1 GHz), and accessible GPIO. As for why baremetal? Well, one of the big problems suckerpinch ran into was being interrupted by the Linux kernel while his software was running. The board was fast enough to respond to the bus timings but Linux would throw off those timings with preemption. This is why I’m taking the approach of doing everything baremetal, giving 100% of the CPU time to my program emulating the CPU bus.</p> <h2 id="risc-v-baremetal-development">RISC-V Baremetal Development</h2> <p>Below I’ll document how I got a baremetal program running on the Nezha board, to provide guidance to anyone who wants to try doing something like this themselves.</p> <h3 id="toolchain-setup">Toolchain Setup</h3> <p>In order to do any RISC-V development we will need to set up a RISC-V toolchain that isn’t tied to a specific OS like Linux. Thankfully the RISC-V org set up a simple to use git repo that has a script to build an entire RISC-V toolchain on your machine. Since you’re building the whole toolchain from source this will take some time; on my machine (Ryzen 4500u, 16GB of RAM, 1TB PCIe NVMe storage) it took around 30 minutes to build the whole toolchain. You can find the repo <a href="https://github.com/riscv-collab/riscv-gnu-toolchain">here</a>, and follow the instructions in the <code>Installation (Newlib)</code> section of the README. 
That will set up a bare bones, OS independent toolchain that can use newlib for the C standard library (not that I am currently using it in my software).</p> <h3 id="setting-up-a-program">Setting up a Program</h3> <p>This is probably one of the more complicated steps in baremetal programming, as this will involve setting up a linker script, which can sometimes feel like an act of black magic to get right. I’ll try to walk through some linker script basics to show how I set up mine. The linker script <code>linker.ld</code> I’m using is below:</p> <pre class="ld"><code>SECTIONS
{
  . = 0x45000000;
  .text : {
    PROVIDE(__text_start = .);
    *(.text.start)
    *(.text*)
    . = ALIGN(4096);
    PROVIDE(__text_end = .);
  }
  .data : {
    PROVIDE(__data_start = .);
    . = ALIGN(16);
    *(.rodata*);
    *(.data .data.*)
    PROVIDE(__data_end = .);
  }
  . += 1024;
  PROVIDE(__stack_start = .);
  . = ALIGN(16);
  . += 4096;
  PROVIDE(__stack_end = .);
  /DISCARD/ : {
    *(.riscv.attributes);
    *(.comment);
  }
}</code></pre> <p>The purpose of a linker script is to describe how our binary will be organized. The script I wrote will do the following:</p> <ol type="1"> <li>Set the starting address to <code>0x45000000</code>. This is the address where we are going to load the binary into memory, so any pointers in the program will need to be offset from this address.</li> <li>Start the binary off with the <code>.text</code> section, which will contain the executable code. In the text section we want the code for <code>.text.start</code> to come first. This is the code that implements the “C runtime”; that is, the code with the <code>_start</code> function that will set up the stack pointer and call into the C <code>main</code> function. After that we will place the text for all the other functions in our binary. We keep this section aligned to <code>4096</code> bytes, and the <code>PROVIDE</code> function creates a symbol with a pointer to that location in memory. 
We won’t use the text start and end pointers in our program, but they can be useful if you want to know things about your binary at runtime.</li> <li>Next is the <code>.data</code> section that has all the data for our program. Here you can see I also added the <code>rodata</code> (read-only data) section to the data section. The reason I did this is because I’m not going to bother with properly implementing read-only data. We also keep the data aligned to 16 bytes to ensure that every memory access will be aligned for a 64-bit RISC-V memory access.</li> <li>The last “section” is not a real section but some extra padding at the end to reserve the stack. Here I am reserving 4096 bytes (4 KiB) for the stack of my program.</li> <li>Lastly I’m going to discard a few sections that GCC will compile into the binary that I don’t need at all.</li> </ol> <p>Now this probably isn’t the best way to write a linker script. For example, the stack is just kind of a hack in it, and I don’t implement the <code>.bss</code> section for zero-initialized data.</p> <p>With this linker script we can now set up a basic program. We can use the code presented below as the <code>main.c</code> file:</p> <div class="sourceCode" id="cb2"><pre class="sourceCode c"><code class="sourceCode c"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="pp">#include </span><span class="im">&lt;stdint.h&gt;</span></span> <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a></span> <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a><span class="pp">#define UART0_BASE </span><span class="bn">0x02500000</span></span> <span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a><span class="pp">#define UART0_DATA_REG </span><span class="op">(</span>UART0_BASE<span class="pp"> </span><span class="op">+</span><span class="pp"> </span><span class="bn">0x0000</span><span class="op">)</span></span> <span id="cb2-5"><a href="#cb2-5" 
aria-hidden="true" tabindex="-1"></a><span class="pp">#define UART0_USR </span><span class="op">(</span>UART0_BASE<span class="pp"> </span><span class="op">+</span><span class="pp"> </span><span class="bn">0x007c</span><span class="op">)</span></span> <span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a></span> <span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a><span class="pp">#define write_reg</span><span class="op">(</span><span class="pp">r</span><span class="op">,</span><span class="pp"> v</span><span class="op">)</span><span class="pp"> write_reg_handler</span><span class="op">((</span><span class="dt">volatile</span><span class="pp"> </span><span class="dt">uint32_t</span><span class="op">*)(</span><span class="pp">r</span><span class="op">),</span><span class="pp"> </span><span class="op">(</span><span class="pp">v</span><span class="op">))</span></span> <span id="cb2-8"><a href="#cb2-8" aria-hidden="true" tabindex="-1"></a><span class="dt">void</span> write_reg_handler<span class="op">(</span><span class="dt">volatile</span> <span class="dt">uint32_t</span> <span class="op">*</span>reg<span class="op">,</span> <span class="dt">const</span> <span class="dt">uint32_t</span> value<span class="op">)</span></span> <span id="cb2-9"><a href="#cb2-9" aria-hidden="true" tabindex="-1"></a><span class="op">{</span></span> <span id="cb2-10"><a href="#cb2-10" aria-hidden="true" tabindex="-1"></a> reg<span class="op">[</span><span class="dv">0</span><span class="op">]</span> <span class="op">=</span> value<span class="op">;</span></span> <span id="cb2-11"><a href="#cb2-11" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span> <span id="cb2-12"><a href="#cb2-12" aria-hidden="true" tabindex="-1"></a></span> <span id="cb2-13"><a href="#cb2-13" aria-hidden="true" tabindex="-1"></a><span class="pp">#define read_reg</span><span class="op">(</span><span class="pp">r</span><span class="op">)</span><span class="pp"> 
read_reg_handler</span><span class="op">((</span><span class="dt">volatile</span><span class="pp"> </span><span class="dt">uint32_t</span><span class="op">*)(</span><span class="pp">r</span><span class="op">))</span></span> <span id="cb2-14"><a href="#cb2-14" aria-hidden="true" tabindex="-1"></a><span class="dt">uint32_t</span> read_reg_handler<span class="op">(</span><span class="dt">volatile</span> <span class="dt">uint32_t</span> <span class="op">*</span>reg<span class="op">)</span></span> <span id="cb2-15"><a href="#cb2-15" aria-hidden="true" tabindex="-1"></a><span class="op">{</span></span> <span id="cb2-16"><a href="#cb2-16" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> reg<span class="op">[</span><span class="dv">0</span><span class="op">];</span></span> <span id="cb2-17"><a href="#cb2-17" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span> <span id="cb2-18"><a href="#cb2-18" aria-hidden="true" tabindex="-1"></a></span> <span id="cb2-19"><a href="#cb2-19" aria-hidden="true" tabindex="-1"></a><span class="dt">void</span> _putchar<span class="op">(</span><span class="dt">char</span> c<span class="op">)</span></span> <span id="cb2-20"><a href="#cb2-20" aria-hidden="true" tabindex="-1"></a><span class="op">{</span></span> <span id="cb2-21"><a href="#cb2-21" aria-hidden="true" tabindex="-1"></a> <span class="cf">while</span><span class="op">((</span>read_reg<span class="op">(</span>UART0_USR<span class="op">)</span> <span class="op">&amp;</span> <span class="bn">0b10</span><span class="op">)</span> <span class="op">==</span> <span class="dv">0</span><span class="op">)</span></span> <span id="cb2-22"><a href="#cb2-22" aria-hidden="true" tabindex="-1"></a> <span class="op">{</span></span> <span id="cb2-23"><a href="#cb2-23" aria-hidden="true" tabindex="-1"></a> asm<span class="op">(</span><span class="st">&quot;nop&quot;</span><span class="op">);</span></span> <span id="cb2-24"><a href="#cb2-24" aria-hidden="true" 
tabindex="-1"></a> <span class="op">}</span></span> <span id="cb2-25"><a href="#cb2-25" aria-hidden="true" tabindex="-1"></a></span> <span id="cb2-26"><a href="#cb2-26" aria-hidden="true" tabindex="-1"></a> write_reg<span class="op">(</span>UART0_DATA_REG<span class="op">,</span> c<span class="op">);</span></span> <span id="cb2-27"><a href="#cb2-27" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span> <span id="cb2-28"><a href="#cb2-28" aria-hidden="true" tabindex="-1"></a></span> <span id="cb2-29"><a href="#cb2-29" aria-hidden="true" tabindex="-1"></a><span class="dt">const</span> <span class="dt">char</span> <span class="op">*</span>hello_world <span class="op">=</span> <span class="st">&quot;Hello World!</span><span class="sc">\r\n</span><span class="st">&quot;</span><span class="op">;</span></span> <span id="cb2-30"><a href="#cb2-30" aria-hidden="true" tabindex="-1"></a></span> <span id="cb2-31"><a href="#cb2-31" aria-hidden="true" tabindex="-1"></a><span class="dt">int</span> main<span class="op">()</span></span> <span id="cb2-32"><a href="#cb2-32" aria-hidden="true" tabindex="-1"></a><span class="op">{</span></span> <span id="cb2-33"><a href="#cb2-33" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span><span class="op">(</span><span class="dt">const</span> <span class="dt">char</span> <span class="op">*</span>c <span class="op">=</span> hello_world<span class="op">;</span> c<span class="op">[</span><span class="dv">0</span><span class="op">]</span> <span class="op">!=</span> <span class="ch">&#39;</span><span class="sc">\0</span><span class="ch">&#39;</span><span class="op">;</span> c<span class="op">++)</span></span> <span id="cb2-34"><a href="#cb2-34" aria-hidden="true" tabindex="-1"></a> <span class="op">{</span></span> <span id="cb2-35"><a href="#cb2-35" aria-hidden="true" tabindex="-1"></a> _putchar<span class="op">(</span>c<span class="op">[</span><span class="dv">0</span><span class="op">]);</span></span> <span id="cb2-36"><a href="#cb2-36" aria-hidden="true" 
tabindex="-1"></a> <span class="op">}</span></span> <span id="cb2-37"><a href="#cb2-37" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span></code></pre></div> <p>This program will write the string “Hello World!” to the serial port. Now a common question for code like this is: how did I know how to set all the <code>UART0</code> registers? Well, the way to find this information is to look at the datasheet, programmer’s manual, or user manual for the chip you are using. In this case we are using an Allwinner D1, and we can find the user manual with all the registers on the linux-sunxi page <a href="https://linux-sunxi.org/D1">here</a>. On pages 900 to 940 we can see a description of how the serial port works for this SoC. I also looked at the schematic <a href="https://dl.linux-sunxi.org/D1/D1_Nezha_development_board_schematic_diagram_20210224.pdf">here</a> to see that the serial port we have is wired to <code>UART0</code> on the SoC. From here we are relying on U-Boot to boot the board, which will set up the serial port for us. This means we can just write to the UART data register to start printing content to the console.</p> <p>We will also need to set up a basic assembly program to set up the stack and call our main function. 
Below you can see my example called <code>start.S</code>:</p> <div class="sourceCode" id="cb3"><pre class="sourceCode asm"><code class="sourceCode fasm"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>.<span class="bu">section</span> <span class="op">.</span>text<span class="op">.</span>start</span> <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a> .global _start</span> <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a><span class="fu">_start:</span></span> <span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a> la <span class="kw">sp</span><span class="op">,</span> __stack_end</span> <span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a> j main</span></code></pre></div> <p>This assembly file just creates a section called <code>.text.start</code> and a global symbol for a function called <code>_start</code>, which will be the first function our program executes. All it does is set the stack pointer register <code>sp</code> (using the load address <code>la</code> pseudo instruction) to the end of the stack region we reserved in the linker script, since the stack grows downward on RISC-V, and then call the main function by jumping directly to it.</p> <h3 id="building-the-program">Building the Program</h3> <p>Building the program is pretty straightforward. We need to tell gcc to build the two source files without including the C standard library, and then to link the binary using our linker script. 
We can do this with the following commands:</p> <pre><code>riscv64-unknown-elf-gcc -march=rv64g --std=gnu99 -msmall-data-limit=0 -c main.c
riscv64-unknown-elf-gcc -march=rv64g --std=gnu99 -msmall-data-limit=0 -c start.S
riscv64-unknown-elf-gcc -march=rv64g -ffreestanding -nostdlib -msmall-data-limit=0 -T linker.ld start.o main.o -o app.elf
riscv64-unknown-elf-objcopy -O binary app.elf app.bin</code></pre> <p>This will build our source files into <code>.o</code> files first, then combine those <code>.o</code> files into a <code>.elf</code> file, and finally convert the <code>.elf</code> into a raw binary file, for which we use the <code>.bin</code> extension. We need a raw binary file as we want to just load our program into memory and begin executing. If we load the <code>.elf</code> file it will have the ELF header and other extra data in it that is not executable. In order to run a <code>.elf</code> file we would need an ELF loader, which goes beyond the scope of this example.</p> <h3 id="running-the-program">Running the Program</h3> <p>Now that we have the raw binary it’s time to try and load it. I found that the U-Boot configuration that comes with the board has pretty limited support for loading binaries, so we are going to take advantage of the <code>loadx</code> command to load the binary over serial. In the U-Boot terminal we are going to run the command:</p> <pre><code>loadx 45000000</code></pre> <p>Now the next steps will depend on which serial terminal you are using. We want to use the <code>XMODEM</code> protocol to load the binary. In the serial terminal I am using, <code>GNU screen</code>, you can execute arbitrary programs and send their output to the serial terminal. You can do this by hitting the key combination “CTRL-A + :” and then typing in <code>exec !! sx app.bin</code>. This will send the binary to the serial terminal using the XMODEM protocol. If you are not using GNU screen, look up instructions for how to send an XMODEM binary with your terminal. 
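</p> <p>For a rough idea of what <code>sx</code> is doing on the wire, here is a minimal sketch in C of classic XMODEM framing. It assumes the original 128-byte block variant; the function names are hypothetical and are not part of U-Boot or <code>sx</code>, and the real tools may negotiate details (such as the error check used) that are not shown here. One practical consequence is that the data landing at <code>0x45000000</code> is likely rounded up to a whole number of blocks, so it may be slightly larger than <code>app.bin</code> itself.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: classic XMODEM with 128 payload bytes per packet. */
#define XMODEM_BLOCK_SIZE 128u
#define XMODEM_SOH        0x01u /* start-of-header byte for a 128-byte packet */

/* Number of packets needed to carry `len` bytes of payload. */
size_t xmodem_block_count(size_t len)
{
    return (len + XMODEM_BLOCK_SIZE - 1u) / XMODEM_BLOCK_SIZE;
}

/* Bytes transferred once the final packet is padded to a full block. */
size_t xmodem_padded_size(size_t len)
{
    return xmodem_block_count(len) * XMODEM_BLOCK_SIZE;
}

/*
 * Fill in the 3-byte packet header: SOH, the block number, and its
 * one's complement. Block numbers start at 1 and wrap modulo 256.
 */
void xmodem_header(uint8_t block_number, uint8_t header[3])
{
    header[0] = XMODEM_SOH;
    header[1] = block_number;
    header[2] = (uint8_t)(0xFFu - block_number);
}
```

<p>For example, a hypothetical 4196-byte <code>app.bin</code> would need 33 blocks and occupy 4224 bytes once padded.</p> <p>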
Now that the binary is loaded we can type in</p> <pre><code>go 45000000</code></pre> <p>This should start to execute the program and you should see <code>Hello World!</code> printed to the console!</p> <p><img src="/assets/2022-06-09-baremetal-risc-v/riscv-terminal.png" /></p> <h2 id="whats-next">What’s Next?</h2> <p>Well, the sky is the limit! We now have a method to load and run a program that can do anything on the Nezha board. Looking through the datasheet we can see how to access the GPIO on the board to blink an LED. If you’re really ambitious you could try getting ethernet or USB working in a baremetal environment. I am going to continue on my goal of emulating the N64 cartridge bus, which will require me to get GPIO working as well as interrupts on the GPIO lines. If you want to see the current progress of my work you can check it out on GitHub <a href="https://github.com/Hazematman/N64-Cart-Emulator">here</a>.</p> Wed, 08 Jun 2022 23:00:00 -0000https://fryzekconcepts.com/notes/baremetal-risc-v.htmlDigital Gardenhttps://fryzekconcepts.com/notes/digital_garden.html<p>After reading Maggie Appleton’s page on <a href="https://maggieappleton.com/garden-history">digital gardens</a> I was inspired to convert my own website into a digital garden.</p> <p>I have many half-baked ideas that I never seem to be able to finish. Some of them get to a published state like <a href="/notes/rasterizing-triangles.html">Rasterizing Triangles</a> and <a href="/notes/baremetal-risc-v.html">Baremetal RISC-V</a>, but many of them never make it to the published state. The idea of a digital garden seems very appealing to me, as it encourages you to post on a topic even if you haven’t made it “publishable” yet.</p> <h2 id="how-this-site-works">How this site works</h2> <p>I wanted a bit of a challenge when putting together this website as I don’t do a lot of web development in my day-to-day life, so I thought it would be a good way to learn more things. 
This site has been entirely built from scratch using a custom static site generator I set up with pandoc. It relies on pandoc’s filters to implement some of the classic “Digital Garden” features like back linking. The back linking feature has not been totally developed yet and right now it just provides a convenient way to link to other notes or pages on this site.</p> <p>I hope to develop this section more and explain how I got various features in pandoc to work as a static site generator.</p> Sat, 29 Oct 2022 23:00:00 -0000https://fryzekconcepts.com/notes/digital_garden.html2022 Graphics Team Contributions at Igaliahttps://fryzekconcepts.com/notes/2022_igalia_graphics_team.html<p>This year I started a new job working with <a href="https://www.igalia.com/technology/graphics">Igalia’s Graphics Team</a>. For those of you who don’t know <a href="https://www.igalia.com/">Igalia</a>, they are a <a href="https://en.wikipedia.org/wiki/Igalia">“worker-owned, employee-run cooperative model consultancy focused on open source software”</a>.</p> <p>As a new member of the team, I thought it would be a great idea to summarize the incredible amount of work the team completed in 2022. If you’re interested keep reading!</p> <h2 id="vulkan-1.2-conformance-on-rpi-4">Vulkan 1.2 Conformance on RPi 4</h2> <p>One of the big milestones for the team in 2022 was <a href="https://www.khronos.org/conformance/adopters/conformant-products#submission_694">achieving Vulkan 1.2 conformance on the Raspberry Pi 4</a>. The folks over at the Raspberry Pi company wrote a nice <a href="https://www.raspberrypi.com/news/vulkan-update-version-1-2-conformance-for-raspberry-pi-4/">article</a> about the achievement. 
Igalia has been partnering with the Raspberry Pi company to build and improve the graphics driver on all versions of the Raspberry Pi.</p> <p>The Vulkan 1.2 spec ratification came with a few <a href="https://registry.khronos.org/vulkan/specs/1.2-extensions/html/vkspec.html#versions-1.2">extensions</a> that were promoted to Core. This means a conformant Vulkan 1.2 driver needs to implement those extensions. Alejandro Piñeiro wrote this interesting <a href="https://blogs.igalia.com/apinheiro/2022/05/v3dv-status-update-2022-05-16/">blog post</a> that talks about some of those extensions.</p> <p>Vulkan 1.2 also came with a number of optional extensions such as <code>VK_KHR_pipeline_executable_properties</code>. My colleague Iago Toral wrote an excellent <a href="https://blogs.igalia.com/itoral/2022/05/09/vk_khr_pipeline_executables/">blog post</a> on how we implemented that extension on the Raspberry Pi 4 and what benefits it provides for debugging.</p> <h2 id="vulkan-1.3-support-on-turnip">Vulkan 1.3 support on Turnip</h2> <p>Igalia has been heavily supporting the Open-Source Turnip Vulkan driver for Qualcomm Adreno GPUs, and in 2022 we helped it achieve Vulkan 1.3 conformance. Danylo Piliaiev, on the graphics team here at Igalia, wrote a great <a href="https://blogs.igalia.com/dpiliaiev/turnip-vulkan-1-3/">blog post</a> on this achievement! One of the biggest challenges for the Turnip driver is that it is a completely reverse-engineered driver that has been built without access to any hardware documentation or reference driver code.</p> <p>With Vulkan 1.3 conformance has also come the ability to run more commercial games on Adreno GPUs through the use of the DirectX translation layers. If you would like to see more of this check out this <a href="https://blogs.igalia.com/dpiliaiev/turnip-july-2022-update/">post</a> from Danylo where he talks about getting “The Witcher 3”, “The Talos Principle”, and “OMD2” running on the A660 GPU. 
Outside of Vulkan 1.3 support he also talks about some of the extensions that were implemented to allow “Zink” (the OpenGL over Vulkan driver) to run on Turnip, and bring OpenGL 4.6 support to Adreno GPUs.</p> <p><div class="youtube-video"><iframe src="https://www.youtube.com/embed/oVFWy25uiXA"></iframe></div></p> <h2 id="vulkan-extensions">Vulkan Extensions</h2> <p>Several developers on the Graphics Team made key contributions to Vulkan Extensions and the Vulkan conformance test suite (CTS). My colleague Ricardo Garcia made an excellent <a href="https://rg3.name/202212122137.html">blog post</a> about those contributions. Below I’ve listed what Igalia did for each of the extensions:</p> <ul> <li>VK_EXT_image_2d_view_of_3d <ul> <li>We reviewed the spec and are listed as contributors to this extension</li> </ul></li> <li>VK_EXT_shader_module_identifier <ul> <li>We reviewed the spec, contributed to it, and created tests for this extension</li> </ul></li> <li>VK_EXT_attachment_feedback_loop_layout <ul> <li>We reviewed, created tests and contributed to this extension</li> </ul></li> <li>VK_EXT_mesh_shader <ul> <li>We contributed to the spec and created tests for this extension</li> </ul></li> <li>VK_EXT_mutable_descriptor_type <ul> <li>We reviewed the spec and created tests for this extension</li> </ul></li> <li>VK_EXT_extended_dynamic_state3 <ul> <li>We wrote tests and reviewed the spec for this extension</li> </ul></li> </ul> <h2 id="amdgpu-kernel-driver-contributions">AMDGPU kernel driver contributions</h2> <p>Our resident “Not an AMD expert” Melissa Wen made several contributions to the AMDGPU driver. 
Those contributions include connecting parts of the <a href="https://lore.kernel.org/amd-gfx/20220329201835.2393141-1-mwen@igalia.com/">pixel blending and post blending code in AMD’s <code>DC</code> module to <code>DRM</code></a> and <a href="https://lore.kernel.org/amd-gfx/20220804161349.3561177-1-mwen@igalia.com/">fixing a bug related to how panel orientation is set when a display is connected</a>. She also gave a <a href="https://indico.freedesktop.org/event/2/contributions/50/">presentation at XDC 2022</a>, where she talks about techniques you can use to understand and debug AMDGPU, even when there aren’t hardware docs available.</p> <p>André Almeida also completed and submitted work on <a href="https://lore.kernel.org/dri-devel/20220714191745.45512-1-andrealmeid@igalia.com/">enabling logging features for the new GFXOFF hardware feature in AMD GPUs</a>. He also created a userspace application (which you can find <a href="https://gitlab.freedesktop.org/andrealmeid/gfxoff_tool">here</a>) that lets you interact with this feature through the <code>debugfs</code> interface. 
Additionally, he submitted a <a href="https://lore.kernel.org/dri-devel/20220929184307.258331-1-contact@emersion.fr/">patch</a> for async page flips (which he also talked about in his <a href="https://indico.freedesktop.org/event/2/contributions/61/">XDC 2022 presentation</a>) which has yet to be merged.</p> <h2 id="modesetting-without-glamor-on-rpi">Modesetting without Glamor on RPi</h2> <p>Christopher Michael joined the Graphics Team in 2022 and, along with Chema Casanova, made some key contributions to enabling hardware acceleration and mode setting on the Raspberry Pi without the use of <a href="https://www.freedesktop.org/wiki/Software/Glamor/">Glamor</a>, which makes more video memory available to graphics applications running on a Raspberry Pi.</p> <p>The older generation Raspberry Pis (1-3) only have a maximum of 256MB of memory available for video memory, and using Glamor will consume part of that video memory. Christopher wrote an excellent <a href="https://blogs.igalia.com/cmichael/2022/05/30/modesetting-a-glamor-less-rpi-adventure/">blog post</a> on this work. He and Chema also gave a joint presentation at XDC 2022 going into more detail on this work.</p> <h2 id="linux-format-magazine-column">Linux Format Magazine Column</h2> <p>Our very own Samuel Iglesias had a column published in Linux Format Magazine. It’s a short column about reaching Vulkan 1.1 conformance for the v3dv &amp; Turnip Vulkan drivers, and how Open-Source GPU drivers can go from a “hobby project” to the de facto driver for the platform. Check it out on page 7 of <a href="https://linuxformat.com/linux-format-288.html">issue #288</a>!</p> <h2 id="xdc-2022">XDC 2022</h2> <p>X.Org Developers Conference is one of the big conferences for us here at the Graphics Team. Last year at XDC 2022 our Team presented 5 talks in Minneapolis, Minnesota. XDC 2022 took place towards the end of the year in October, so it provides some good context on how the team closed out the year. 
If you didn’t attend or missed their presentations, here’s a breakdown:</p> <h3 id="replacing-the-geometry-pipeline-with-mesh-shaders-ricardo-garcía"><a href="https://indico.freedesktop.org/event/2/contributions/48/">“Replacing the geometry pipeline with mesh shaders”</a> (Ricardo García)</h3> <p>Ricardo presents what exactly mesh shaders are in Vulkan. He made many contributions to this extension, including writing thousands of CTS tests for it. He also wrote a <a href="https://rg3.name/202210222107.html">blog post</a> on his presentation that you should check out!</p> <p><div class="youtube-video"><iframe src="https://www.youtube.com/embed/aRNJ4xj_nDs"></iframe></div></p> <h3 id="status-of-vulkan-on-raspberry-pi-iago-toral"><a href="https://indico.freedesktop.org/event/2/contributions/68/">“Status of Vulkan on Raspberry Pi”</a> (Iago Toral)</h3> <p>Iago goes into detail about the current status of the Raspberry Pi Vulkan driver. He talks about achieving Vulkan 1.2 conformance, as well as some of the challenges the team had to solve due to hardware limitations of the Broadcom GPU.</p> <p><div class="youtube-video"><iframe src="https://www.youtube.com/embed/GM9IojyzCVM"></iframe></div></p> <h3 id="enable-hardware-acceleration-for-gl-applications-without-glamor-on-xorg-modesetting-driver-jose-maría-casanova-christopher-michael"><a href="https://indico.freedesktop.org/event/2/contributions/60/">“Enable hardware acceleration for GL applications without Glamor on Xorg modesetting driver”</a> (Jose María Casanova, Christopher Michael)</h3> <p>Chema and Christopher talk about the challenges they had to solve to enable hardware acceleration on the Raspberry Pi without Glamor.</p> <p><div class="youtube-video"><iframe src="https://www.youtube.com/embed/Bo_MOM7JTeQ"></iframe></div></p> <h3 id="im-not-an-amd-expert-but-melissa-wen"><a href="https://indico.freedesktop.org/event/2/contributions/50/">“I’m not an AMD expert, but…”</a> (Melissa Wen)</h3> <p>In this non-technical 
presentation, Melissa talks about techniques developers can use to understand and debug drivers without access to hardware documentation.</p> <p><div class="youtube-video"><iframe src="https://www.youtube.com/embed/CMm-yhsMB7U"></iframe></div></p> <h3 id="async-page-flip-in-atomic-api-andré-almeida"><a href="https://indico.freedesktop.org/event/2/contributions/61/">“Async page flip in atomic API”</a> (André Almeida)</h3> <p>André talks about the work that has been done to enable asynchronous page flipping in DRM’s atomic API. He introduces the topic by explaining what exactly an asynchronous page flip is, and why you would want it.</p> <p><div class="youtube-video"><iframe src="https://www.youtube.com/embed/qayPPIfrqtE"></iframe></div></p> <h2 id="fosdem-2022">FOSDEM 2022</h2> <p>Another important conference for us is FOSDEM, and last year we presented 3 of the 5 talks in the graphics dev room. FOSDEM took place in early February 2022, so these talks provide some good context on where the team started in 2022.</p> <h3 id="the-status-of-turnip-driver-development-hyunjun-ko"><a href="https://archive.fosdem.org/2022/schedule/event/turnip/">The status of Turnip driver development</a> (Hyunjun Ko)</h3> <p>Hyunjun presented the current state of the Turnip driver, also talking about the difficulties of developing a driver for a platform without hardware documentation. He talks about how Turnip developers reverse engineer the behaviour of the hardware, and then implement that in an open-source driver. 
He also made a companion <a href="https://blogs.igalia.com/zzoon/graphics/mesa/2022/02/21/complement-story/">blog post</a> to check out along with his presentation.</p> <h3 id="v3dv-status-update-for-open-source-vulkan-driver-for-raspberry-pi-4-alejandro-piñeiro"><a href="https://archive.fosdem.org/2022/schedule/event/v3dv/">v3dv: Status Update for Open Source Vulkan Driver for Raspberry Pi 4</a> (Alejandro Piñeiro)</h3> <p>Igalia has been presenting the status of the v3dv driver since December 2019 and in this presentation, Alejandro talks about the status of the v3dv driver in early 2022. He talks about achieving conformance, the extensions that had to be implemented, and the future plans of the v3dv driver.</p> <h3 id="fun-with-border-colors-in-vulkan-ricardo-garcia"><a href="https://archive.fosdem.org/2022/schedule/event/vulkan_borders/">Fun with border colors in Vulkan</a> (Ricardo Garcia)</h3> <p>Ricardo presents the work he did on the <code>VK_EXT_border_color_swizzle</code> extension in Vulkan. He talks about the specific contributions he made and how the extension fits in with sampling color operations in Vulkan.</p> <h2 id="gsoc-igalia-ce">GSoC &amp; Igalia CE</h2> <p>Last year Melissa &amp; André co-mentored contributors working on introducing KUnit tests to the AMD display driver. This project was hosted as a <a href="https://summerofcode.withgoogle.com/">“Google Summer of Code” (GSoC)</a> project from the X.Org Foundation. If you’re interested in seeing their work, Tales da Aparecida, Maíra Canal, Magali Lemes, and Isabella Basso presented it at the <a href="https://lpc.events/event/16/contributions/1310/">Linux Plumbers Conference 2022</a> and across two talks at XDC 2022. 
Here you can see their <a href="https://indico.freedesktop.org/event/2/contributions/65/">first</a> presentation and here you can see their <a href="https://indico.freedesktop.org/event/2/contributions/164/">second</a> presentation.</p> <p>André &amp; Melissa also mentored two <a href="https://www.igalia.com/coding-experience/">“Igalia Coding Experience” (CE)</a> projects, one related to IGT GPU test tools on the VKMS kernel driver, and the other for IGT GPU test tools on the V3D kernel driver. If you’re interested in reading up on some of that work, Maíra Canal <a href="https://mairacanal.github.io/january-update-finishing-my-igalia-ce/">wrote about her experience</a> being part of the Igalia CE.</p> <p>Ella Stanforth was also part of the Igalia Coding Experience, being mentored by Iago &amp; Alejandro. They worked on the <code>VK_KHR_sampler_ycbcr_conversion</code> extension for the v3dv driver. Alejandro talks about their work in his <a href="https://blogs.igalia.com/apinheiro/2023/01/v3dv-status-update-2023-01/">blog post here</a>.</p> <h1 id="whats-next">What’s Next?</h1> <p>The graphics team is looking forward to having a jam-packed 2023 with just as many if not more contributions to the Open-Source graphics stack! I’m super excited to be part of the team, and hope to see my name in our 2023 recap post!</p> <p>Also, you might have heard that <a href="https://www.igalia.com/2022/xdc-2023">Igalia will be hosting XDC 2023</a> in the beautiful city of A Coruña! 
We hope to see you there, where there will be many presentations from all the great people working on the Open-Source graphics stack, and most importantly where you can <a href="https://www.youtube.com/watch?v=7hWcu8O9BjM">dream in the Atlantic!</a></p> <figure> <img src="https://www.igalia.com/assets/i/news/XDC-event-banner.jpg" alt="Photo of A Coruña" /> <figcaption aria-hidden="true">Photo of A Coruña</figcaption> </figure> Thu, 02 Feb 2023 00:00:00 -0000https://fryzekconcepts.com/notes/2022_igalia_graphics_team.htmlGlobal Game Jam 2023 - GI Jamhttps://fryzekconcepts.com/notes/global_game_jam_2023.html<p>At the beginning of this month I participated in the Games Institute’s Global Game Jam event. <a href="https://uwaterloo.ca/games-institute/">The Games Institute</a> is an organization at my local university (The University of Waterloo) that focuses on games-based research. They host a game jam every school term and this term’s jam happened to coincide with the Global Game Jam. Since this event was open to everyone (and it’s been a few years since I’ve been a student at UW 👴️), I joined up to try and stretch some of my more creative muscles. The event was a 48-hour game jam that began on Friday, February 3rd and ended on Sunday, February 5th.</p> <p>The game we created is called <a href="https://globalgamejam.org/2023/games/turtle-roots-5">Turtle Roots</a>, and it is a simple resource management game. You play as a magical turtle floating through the sky and collecting water in order to survive. The turtle can spend some of its “nutrients” to grow roots which will allow it to gather water and collect more nutrients. 
The challenge in the game is trying to survive for as long as possible without running out of water.</p> <div class="gallery"> <p><img src="/assets/global_game_jam_2023/screen_shot_1.png" /> <img src="/assets/global_game_jam_2023/screen_shot_2.png" /> <img src="/assets/global_game_jam_2023/screen_shot_3.png" /></p> <p>Screenshots of Turtle Roots</p> </div> <h2 id="the-team">The Team</h2> <p>I attended the event solo and quickly partnered up with two other people, who also attended solo. One member had already participated in a game jam before and specialized in art. The other member was attending a game jam for the first time and was looking for the best way they could contribute. Having particular skills for sound, they ended up creating all the audio in our game. This left me as the sole programmer for our team.</p> <h2 id="my-game-jam-experiences">My Game Jam Experiences</h2> <p>In recent years, I participated in a <a href="/notes/n64brew-gamejam-2021.html">Nintendo 64 homebrew game jam</a> and the Puerto Rico Game Developers Association event for the global game jam, submitting <a href="https://globalgamejam.org/2022/games/magnetic-parkour-6">Magnetic Parkour</a>. I also participated in <a href="https://ldjam.com/">Ludum Dare</a> back around 2013 but unfortunately I’ve since lost the link to my submission. 
While in high school, my friend and I participated in the “Ottawa Tech Jam” (which worked much like a game jam), submitting <a href="http://www.fastquake.com/projects/zorvwarz/">Zorv Warz</a> and <a href="http://www.fastquake.com/projects/worldseed/">E410</a>. As you can probably tell, I really like gamedev. The desire to build my own video games is actually what originally got me into programming. When I was around 14 years old, I picked up a C++ programming book from the library since I wanted to try to build my own game and I heard most game developers use C++. I used some proprietary game development library (that I can’t recall the name of) to build 2D and 3D games in Windows using C++. I didn’t really get too far into it until high school when I started to learn SFML, SDL, and OpenGL. I also dabbled with Unity during that time. However, I’ve always had a strong desire to build most of the foundation of the game myself without using an engine. You can see this desire really come out in the work I did for Zorv Warz, E410, and the N64 homebrew game jam. When working with a team, I feel it can be a lot easier to use a game engine, even if it doesn’t scratch the same itch for me.</p> <h2 id="the-tech-behind-the-game">The Tech Behind the Game</h2> <p>Lately I’ve had a growing interest in the game engine called <a href="https://godotengine.org/">Godot</a>, and wanted to use this opportunity to learn the engine more and build a game in it. Godot is interesting to me as it’s a completely open source game engine, and as you can probably guess from my <a href="/notes/2022_igalia_graphics_team.html">job</a>, open source software as well as free software is something I’m particularly interested in.</p> <p>Godot is a really powerful game engine that handles a lot of complexity for you. For example, it has a built-in parallax background component that we took advantage of to add more depth to our game. 
This allows you to control the background scrolling speed for different layers of the background, giving the illusion of depth in a 2D game.</p> <p>Another powerful feature of Godot is its physics engine. Godot makes it really easy to create physics objects in your scene and have them do interesting stuff. You might be wondering where physics comes into play in our game, and we actually use it for the root animations. I set up a sort of “rag doll” system for the roots to make them flop around in the air as the player moves, really giving a lot more “life” to an otherwise static game.</p> <p>Godot has a built-in scripting language called “GDScript” which is very similar to Python. I’ve really grown to like this language. It has an optional type system you can take advantage of that helps with reducing the number of bugs that exist in your game. It also has great connectivity with the editor. This proved useful as I could “export” variables in the game and allow my team members to modify certain parameters of the game without knowing any programming. This is super helpful with balancing, and more easily allows non-technical members of the team to contribute to the game logic in a more concrete way.</p> <p>Overall I’m very happy with how our game turned out. Last year I tried to participate in a few more game jams, but due to a combination of lack of personal motivation, poor team dynamics, and other factors, none of those game jams panned out. 
This was the first game jam in a while where I feel like I really connected with my team and I also feel like we made a super polished and fun game in the end.</p> Sat, 11 Feb 2023 00:00:00 -0000https://fryzekconcepts.com/notes/global_game_jam_2023.htmlJourney Through Freedrenohttps://fryzekconcepts.com/notes/freedreno_journey.html<figure> <img src="/assets/freedreno/glinfo_freedreno.png" alt="Android running Freedreno" /> <figcaption aria-hidden="true">Android running Freedreno</figcaption> </figure> <p>As part of my training at Igalia I’ve been attempting to write a new backend for Freedreno that targets the proprietary “KGSL” kernel mode driver. For those unaware, there are two “main” kernel mode drivers on Qualcomm SoCs for the GPU: “MSM” and “KGSL”. “MSM” is DRM compliant, and Freedreno is already able to run on this driver. “KGSL” is the proprietary KMD that Qualcomm’s proprietary userspace driver targets. Now why would you want to run Freedreno against KGSL when MSM exists? Well, there are a few reasons. First, MSM only really works on an upstream kernel, so if you have to run a downstream kernel you can continue using the version of KGSL that the manufacturer shipped with your device. Second, this allows you to run both the proprietary Adreno driver and the open source Freedreno driver on the same device just by swapping libraries, which can be very nice for quickly testing something against both drivers.</p> <h2 id="when-drm-isnt-just-drm">When “DRM” isn’t just “DRM”</h2> <p>When working on a new backend, one of the critical things to do is to make use of as much “common code” as possible. This has a number of benefits, not least reducing the amount of code you have to write. 
It also reduces the number of bugs that will likely exist as you are relying on well tested code, and it ensures that the backend is most likely going to continue to work with new driver updates.</p> <p>When I started the work for a new backend I looked inside mesa’s <code>src/freedreno/drm</code> folder. This has the current backend code for Freedreno, and it’s already modularized to support multiple backends. It currently has support for the above mentioned MSM kernel mode driver as well as virtio (a backend that allows Freedreno to be used from within a virtualized environment). From the name of this path, you would think that the code in this module would only work with kernel mode drivers that implement DRM, but actually there are only a handful of places in this module where DRM support is assumed. This made it a good starting point to introduce the KGSL backend and piggyback off the common code.</p> <p>For example, the <code>drm</code> module has a lot of code to deal with the management of synchronization primitives, buffer objects, and command submit lists. All of this is managed at an abstraction level above “DRM”, and re-implementing this code would be a bad idea.</p> <h2 id="how-to-get-android-to-behave">How to get Android to behave</h2> <p>One of the big struggles with getting the KGSL backend working was figuring out how I could get Android to load mesa instead of the Qualcomm blob driver that is shipped with the device image. Thankfully a good chunk of this work had already been figured out when the Turnip developers (Turnip is the open source Vulkan implementation for Adreno GPUs) figured out how to get Turnip running on Android with KGSL. One of my coworkers, <a href="https://blogs.igalia.com/dpiliaiev/">Danylo</a>, is one of those Turnip developers, and he gave me a lot of guidance on getting Android set up. One thing to watch out for is the outdated instructions <a href="https://docs.mesa3d.org/android.html">here</a>. 
These instructions <em>almost</em> work, but require some modifications. First if you’re using a more modern version of the Android NDK, the compiler has been replaced with LLVM/Clang, so you need to change which compiler is being used. Second flags like <code>system</code> in the cross compiler script incorrectly set the system as <code>linux</code> instead of <code>android</code>. I had success using the below cross compiler script. Take note that the compiler paths need to be updated to match where you extracted the android NDK on your system.</p> <pre class="meson"><code>[binaries] ar = &#39;/home/lfryzek/Documents/projects/igalia/freedreno/android-ndk-r25b-linux/android-ndk-r25b/toolchains/llvm/prebuilt/linux-x86_64/bin/llvm-ar&#39; c = [&#39;ccache&#39;, &#39;/home/lfryzek/Documents/projects/igalia/freedreno/android-ndk-r25b-linux/android-ndk-r25b/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android29-clang&#39;] cpp = [&#39;ccache&#39;, &#39;/home/lfryzek/Documents/projects/igalia/freedreno/android-ndk-r25b-linux/android-ndk-r25b/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android29-clang++&#39;, &#39;-fno-exceptions&#39;, &#39;-fno-unwind-tables&#39;, &#39;-fno-asynchronous-unwind-tables&#39;, &#39;-static-libstdc++&#39;] c_ld = &#39;lld&#39; cpp_ld = &#39;lld&#39; strip = &#39;/home/lfryzek/Documents/projects/igalia/freedreno/android-ndk-r25b-linux/android-ndk-r25b/toolchains/llvm/prebuilt/linux-x86_64/bin/llvm-strip&#39; # Android doesn&#39;t come with a pkg-config, but we need one for Meson to be happy not # finding all the optional deps it looks for. 
Use system pkg-config pointing at a # directory we get to populate with any .pc files we want to add for Android pkgconfig = [&#39;env&#39;, &#39;PKG_CONFIG_LIBDIR=/home/lfryzek/Documents/projects/igalia/freedreno/android-ndk-r25b-linux/android-ndk-r25b/pkgconfig:/home/lfryzek/Documents/projects/igalia/freedreno/install-android/lib/pkgconfig&#39;, &#39;/usr/bin/pkg-config&#39;] [host_machine] system = &#39;android&#39; cpu_family = &#39;arm&#39; cpu = &#39;armv8&#39; endian = &#39;little&#39;</code></pre> <p>Another thing I had to figure out with Android, that was different from these instructions, was how I would get Android to load the mesa versions of the libraries. That’s when my colleague <a href="https://www.igalia.com/team/mark">Mark</a> pointed out to me that Android is open source and I could just check the source code myself. Sure enough, you can find the OpenGL driver loader in <a href="https://android.googlesource.com/platform/frameworks/native/+/master/opengl/libs/EGL/Loader.cpp">Android’s source code</a>. From this code we can see that Android will try to load a few different files based on some settings, and in my case it would try to load 3 different shared libraries in the <code>/vendor/lib64/egl</code> folder: <code>libEGL_adreno.so</code>, <code>libGLESv1_CM_adreno.so</code>, and <code>libGLESv2.so</code>. I could just replace these libraries with the versions built from mesa and voilà, you’re now loading a custom driver! This realization that I could just “read the code” was very powerful in debugging some more Android specific issues I ran into, like dealing with gralloc.</p> <p>Something cool that the open source Freedreno &amp; Turnip driver developers figured out was getting Android to run test OpenGL applications from the adb shell without building Android APKs. 
If you check out the <a href="https://gitlab.freedesktop.org/freedreno/freedreno">freedreno repo</a>, they have an <code>ndk-build.sh</code> script that can build tests in the <code>tests-*</code> folder. The nice benefit of this is that it provides an easy way to run simple test cases without worrying about the Android window system integration. Another nifty feature of this repo is the <code>libwrap</code> tool that lets you trace the commands being submitted to the GPU.</p> <h2 id="what-even-is-gralloc">What even is Gralloc?</h2> <p>Gralloc is the graphics memory allocator in Android, and the OS will use it to allocate the surface for “windows”. This means that the memory we want to render the display to is managed by gralloc and not our KGSL backend. As a result we have to get all the information about this surface from gralloc, and if you look in <code>src/egl/drivers/dri2/platform_android.c</code> you will see existing code for handling gralloc. You would think “Hey there is no work for me here then”, but you would be wrong. The handle gralloc provides is hardware specific, and the code in <code>platform_android.c</code> assumes a DRM gralloc implementation. Thankfully the Turnip developers had already gone through this struggle and if you look in <code>src/freedreno/vulkan/tu_android.c</code> you can see they have implemented a separate path when a Qualcomm msm implementation of gralloc is detected. I could copy this detection logic and add a separate path to <code>platform_android.c</code>.</p> <h2 id="working-with-the-freedreno-community">Working with the Freedreno community</h2> <p>When working on any project (open-source or otherwise), it’s nice to know that you aren’t working alone. Thankfully the <code>#freedreno</code> channel on <code>irc.oftc.net</code> is very active and full of helpful people to answer any questions you may have. While working on the backend, one area I wasn’t really sure how to address was the synchronization code for buffer objects. 
The backend exposed a function called <code>cpu_prep</code>. This function was just there to call the DRM implementation of <code>cpu_prep</code> on the buffer object. I wasn’t exactly sure how to implement this functionality with KGSL since it doesn’t use DRM buffer objects.</p> <p>I ended up reaching out on the IRC channel, and Rob Clark explained to me that he was actually working on moving a lot of the code for <code>cpu_prep</code> into common code so that a non-DRM driver (like the KGSL backend I was working on) would just need to implement that operation as a NOP (no operation).</p> <h2 id="dealing-with-bugs-reverse-engineering-the-blob">Dealing with bugs &amp; reverse engineering the blob</h2> <p>I encountered a few different bugs when implementing the KGSL backend, but most of them consisted of me calling KGSL wrong, or handling synchronization incorrectly. Thankfully since Turnip is already running on KGSL, I could just more carefully compare my code to what Turnip is doing and figure out my logical mistake.</p> <p>Some of the bugs I encountered required the backend interface in Freedreno to be modified to expose a new per-driver implementation of a backend function, instead of just using a common implementation. For example, the existing function to map a buffer object into userspace assumed that the same <code>fd</code> for the device could be used for the buffer object in the <code>mmap</code> call. This worked fine for any buffer objects we created through KGSL but would not work for buffer objects created from gralloc (remember the above section on surface memory for windows coming from gralloc). To resolve this issue I exposed a new per-backend implementation of “map” where I could take a different path if the buffer object came from gralloc.</p> <p>While testing the KGSL backend I did encounter a new bug that seems to affect both my new KGSL backend and the Turnip KGSL backend. 
The bug is an <code>iommu fault</code> that occurs when the surface allocated by gralloc does not have a height that is aligned to 4. The blitting engine on a6xx GPUs copies in 16x4 chunks, so if the height is not aligned to 4 the GPU will try to write to pixels that exist outside the allocated memory. This issue only happens with KGSL backends since we import memory from gralloc, and gralloc allocates exactly enough memory for the surface, with no alignment on the height. If running on any other platform, the <code>fdl</code> (Freedreno Layout) code would be called to compute the minimum required size for a surface, which would take into account the alignment requirement for the height. The Qualcomm blob driver didn’t seem to have this problem, even though it’s getting the exact same buffer from gralloc. So it must be doing something different to handle the non-aligned height.</p> <p>Because this issue relied on gralloc, the application needed to be running as an Android APK to get a surface from gralloc. The best way to fix this issue would be to figure out what the blob driver is doing and try to replicate this behavior in Freedreno (assuming it isn’t doing something silly like switching to sysmem rendering). Unfortunately it didn’t look like the libwrap library worked to trace an APK.</p> <p>The libwrap library relies on a Linux feature known as <code>LD_PRELOAD</code> to load <code>libwrap.so</code> when the application starts and replace system functions like <code>open</code> and <code>ioctl</code> with its own implementations that trace what is being submitted to the KGSL kernel mode driver. Thankfully Android exposes this <code>LD_PRELOAD</code> mechanism through its “wrap” interface where you create a property called <code>wrap.&lt;app-name&gt;</code> with a value <code>LD_PRELOAD=&lt;path to libwrap.so&gt;</code>. Android will then load your library as would be done in a normal Linux shell. 
If you tried to do this with libwrap though, you would find very quickly that you get corrupted traces. When Android launches your APK, it doesn’t only launch your application; there are different threads for different Android system related functions and some of them can also use OpenGL. The libwrap library is not designed to handle multiple threads using KGSL at the same time. After discovering this issue I created an <a href="https://gitlab.freedesktop.org/freedreno/freedreno/-/merge_requests/22">MR</a> that stores the tracing file handles in TLS (thread local storage), preventing the clobbering of the trace file and also allowing you to view the traces generated by different threads separately from each other.</p> <p>With this in hand, one could begin investigating what the blob driver is doing to handle these unaligned surfaces.</p> <h2 id="whats-next">What’s next?</h2> <p>Well, the next obvious thing to fix is the height alignment issue, which is still open. I’ve also worked on upstreaming my changes with this <a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21570">WIP MR</a>.</p> <figure> <img src="/assets/freedreno/3d-mark.png" alt="Freedreno running 3d-mark" /> <figcaption aria-hidden="true">Freedreno running 3d-mark</figcaption> </figure> Tue, 28 Feb 2023 00:00:00 -0000https://fryzekconcepts.com/notes/freedreno_journey.htmlIgalia’s Mesa 23.1 Contributions - Behind the Sceneshttps://fryzekconcepts.com/notes/mesa_23_1_contributions_behind_the_scenes.html<p>It’s an exciting time for Mesa as its next major release is unveiled this week. Igalia has played an important role in this milestone, with Eric Engestrom managing the release and 11 other Igalians contributing over 110 merge requests. 
A sample of these contributions is detailed below.</p> <h2 id="radv-implement-vk.check_status">radv: Implement vk.check_status</h2> <p>As part of an effort to enhance the reliability of GPU resets on amdgpu, Tony implemented a GPU reset notification feature in the RADV Vulkan driver. This new function improves the robustness of the RADV driver. The driver can now check if the GPU has been reset by a userspace application, allowing the driver to recover their contexts, exit, or engage in some other appropriate action.</p> <p>You can read more about Tony’s changes in the link below:</p> <ul> <li><a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22253">https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22253</a></li> </ul> <h2 id="turnip-kgsl-backend-rewrite">turnip: KGSL backend rewrite</h2> <p>With a goal of improving feature parity of the KGSL kernel mode driver with its DRM counterpart, Mark has been rewriting the backend for KGSL. These changes leverage the new, common backend Vulkan infrastructure inside Mesa and fix multiple bugs. In addition, they introduce support for importing/exporting sync FDs, pre-signalled fences, and timeline semaphores.</p> <p>If you’re interested in taking a deeper dive into Mark’s changes, you can read the following MR:</p> <ul> <li><a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21651">https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21651</a></li> </ul> <h2 id="turnip-a7xx-preparation-transition-to-c">turnip: a7xx preparation, transition to C++</h2> <p>Danylo has adopted a significant role for two major changes inside turnip: 1) contributing to the effort to migrate turnip to C++ and 2) supporting the next generation a7xx Adreno GPUs from Qualcomm. 
A more detailed overview of Danylo’s changes can be found in the linked MRs below:</p> <ul> <li><a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21931">https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21931</a></li> <li><a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22148">https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22148</a></li> </ul> <h2 id="v3dv3dv-various-fixes-cts-conformance">v3d/v3dv various fixes &amp; CTS conformance</h2> <p>Igalia maintains the v3d OpenGL driver and v3dv Vulkan driver for Broadcom VideoCore GPUs which can be found on devices such as the Raspberry Pi. Iago, Alex and Juan have combined their expertise to implement multiple fixes for both the v3d Gallium driver and the v3dv Vulkan driver on the Raspberry Pi. These changes include CPU performance optimizations, support for 16-bit floating point vertex attributes, and raising support in the driver to OpenGL 3.1 level functionality. This Igalian trio has also been addressing fixes for conformance issues raised in the Vulkan 1.3.5 conformance test suite (CTS).</p> <p>You can dive into some of their Raspberry Pi driver changes here:</p> <ul> <li><a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22131">https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22131</a></li> <li><a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21361">https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21361</a></li> <li><a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20787">https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20787</a></li> </ul> <h2 id="ci-build-system-and-cleanup">ci, build system, and cleanup</h2> <p>In addition to managing the 23.1 release, Eric has also implemented many fixes in Mesa’s infrastructure. He has assisted with addressing a number of CI issues within Mesa on various drivers from v3d to panfrost. 
Eric also dedicated part of his time to general clean-up of the Mesa code (e.g. removing duplicate functions, fixing and improving the meson-based build system, and removing dead code).</p> <p>If you’re interested in seeing some of his work, check out some of the MRs below:</p> <ul> <li><a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22410">https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22410</a></li> <li><a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21504">https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21504</a></li> <li><a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21558">https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21558</a></li> <li><a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20180">https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20180</a></li> </ul> Wed, 10 May 2023 23:00:00 -0000https://fryzekconcepts.com/notes/mesa_23_1_contributions_behind_the_scenes.htmlConverting from 3D to 2Dhttps://fryzekconcepts.com/notes/converting_from_3d_to_2d.html<p>Recently I’ve been working on a project where I needed to convert an application written in OpenGL to a software renderer. The matrix transformation code in OpenGL made use of the GLM library for matrix math, and I needed to convert the 4x4 matrices to be 3x3 matrices to work with the software renderer. There was some existing code to do this that was broken, and looked something like this:</p> <pre><code>glm::mat3 mat3x3 = glm::mat3(mat4x4);</code></pre> <p>Don’t worry if you don’t see the problem already, I’m going to illustrate in more detail with the example of a translation matrix. In 3D a standard translation matrix to translate by a vector <code>(x, y, z)</code> looks something like this:</p> <pre><code>[1 0 0 x] [0 1 0 y] [0 0 1 z] [0 0 0 1]</code></pre> <p>Then when we multiply this matrix by a vector like <code>(a, b, c, 1)</code> the result is <code>(a + x, b + y, c + z, 1)</code>. 
If you don’t understand why the matrix is 4x4 or why we have that extra 1 at the end don’t worry, I’ll explain that in more detail later.</p> <p>Now using the existing conversion code to get a 3x3 matrix will simply take the first 3 columns and first 3 rows of the matrix and produce a 3x3 matrix from those. Converting the translation matrix above using this code produces the following matrix:</p> <pre><code>[1 0 0] [0 1 0] [0 0 1]</code></pre> <p>See the problem now? The <code>(x, y, z)</code> values disappeared! In the conversion process we lost these critical values from the translation matrix, and now if we multiply by this matrix nothing will happen since we are just left with the identity matrix. So if we can’t use this simple “cast” function in GLM, what can we use?</p> <p>Well one thing we can do is preserve the last column and last row of the matrix. So assume we have a 4x4 matrix like this:</p> <pre><code>[a b c d] [e f g h] [i j k l] [m n o p]</code></pre> <p>Then preserving the last row and column we should get a matrix like this:</p> <pre><code>[a b d] [e f h] [m n p]</code></pre> <p>And if we use this conversion process for the same translation matrix we will get:</p> <pre><code>[1 0 x] [0 1 y] [0 0 1]</code></pre> <p>Now we see that the <code>(x, y)</code> part of the translation is preserved, and if we try to multiply this matrix by the vector <code>(a, b, 1)</code> the result will be <code>(a + x, b + y, 1)</code>. The translation is preserved in the conversion!</p> <h2 id="why-do-we-have-to-use-this-conversion">Why do we have to use this conversion?</h2> <p>The reason the conversion is more complicated is hidden in how we defined the translation matrix and vector we wanted to translate. The vector was actually a 4D vector with the final component set to 1. The reason we do this is that we actually want to represent an affine space instead of just a vector space. An affine space being a type of space where you can have both points and vectors. 
A point is exactly what you would expect it to be: just a location in space relative to some origin. A vector is a direction with magnitude but no origin. This is important because strictly speaking translation isn’t actually defined for vectors in a normal vector space. Additionally, if you try to construct a matrix to represent translation for a vector space you’ll find that it’s impossible to derive one, because translation is not a linear function. On the other hand, operations like translation are well defined in an affine space and do what you would expect.</p> <p>To get around the problem of vector spaces, mathematicians more clever than I figured out you can implement an affine space in a normal vector space by increasing the dimension of the vector space by one, and by adding an extra row and column to the transformation matrices used. They called this a <strong>homogeneous coordinate system</strong>. This lets you say that a 4D vector is actually a point if the 4th component is 1, but if it’s 0 it’s just a vector. Using this abstraction one can implement all the well defined operations for an affine space (like translation!).</p> <p>So using the “homogeneous coordinate system” abstraction, translation is an operation that is defined by taking a point and moving it by a vector. Let’s look at how that works with the translation matrix I used as an example above. If you multiply that matrix by a 4D vector where the 4th component is 0, it will just return the same vector. Now if we multiply by a 4D vector where the 4th component is 1, it will return the point translated by the vector we used to construct that translation matrix. 
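</p> <p>As a rough illustration of the point/vector distinction described above, here is a minimal C++ sketch. It deliberately avoids GLM: the <code>Vec4</code> and <code>Mat4</code> structs and the <code>make_translation</code> and <code>multiply</code> helpers are hypothetical stand-ins written for this example, and the matrix is stored row-major to match the way the matrices are printed in this post.</p>

```cpp
// Hypothetical stand-ins for glm::vec4 and glm::mat4 (row-major storage).
struct Vec4 { float v[4]; };
struct Mat4 { float m[4][4]; };

// Build the 4x4 translation matrix for the offset (x, y, z).
Mat4 make_translation(float x, float y, float z) {
    Mat4 t = {};  // start from all zeros
    for (int i = 0; i != 4; ++i) {
        t.m[i][i] = 1.0f;  // identity diagonal
    }
    t.m[0][3] = x;  // the last column holds the translation
    t.m[1][3] = y;
    t.m[2][3] = z;
    return t;
}

// Multiply a 4x4 matrix by a 4D column vector.
Vec4 multiply(const Mat4& mat, const Vec4& vec) {
    Vec4 out = {};
    for (int row = 0; row != 4; ++row) {
        for (int col = 0; col != 4; ++col) {
            out.v[row] += mat.m[row][col] * vec.v[col];
        }
    }
    return out;
}
```

<p>With a translation of <code>(5, 6, 7)</code>, multiplying the point <code>(1, 2, 3, 1)</code> should give <code>(6, 8, 10, 1)</code>, while the direction <code>(1, 2, 3, 0)</code> should come back unchanged, since the zero in the 4th component cancels the last column of the matrix.</p> <p>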
This implements the translation operation as it’s defined in an affine space!</p> <p>If you’re interested in understanding more about homogeneous coordinate spaces (like how the translation matrix is derived in the first place), I would encourage you to look at resources like <a href="https://books.google.ca/books/about/Mathematics_for_Computer_Graphics_Applic.html?id=YmQy799flPkC&amp;redir_esc=y">“Mathematics for Computer Graphics Applications”</a>. They provide a much more detailed explanation than I am providing here. (The homogeneous coordinate system also has some benefits for representing projections which I won’t get into here, but are explained in that textbook.)</p> <p>Now to finally answer the question about why we needed to preserve that final row and column. Based on what we now know, we weren’t actually just converting from a “3D space” to a “2D space”; we were converting from a “3D homogeneous space” to a “2D homogeneous space”. The process of converting from a higher dimensional matrix to a lower dimensional matrix is lossy and some transformation details are going to be lost in the process (like for example the translation along the z-axis). There is no way to tell what kind of space a given matrix is supposed to transform just by looking at the matrix itself. The matrix does not carry any information about what space it’s operating in, and any conversion function would need to know that information to properly convert that matrix. 
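</p> <p>The conversion described earlier (drop the z row and z column, keep the last row and last column) could be sketched roughly as follows. This is an illustrative, GLM-free version: <code>Mat3</code>, <code>Mat4</code> and <code>to_2d_homogeneous</code> are hypothetical names, and the storage is row-major to match how the matrices are written in this post. GLM indexes its matrices as <code>m[column][row]</code>, but the same choice of indices (0, 1 and 3) would apply there.</p>

```cpp
// Hypothetical stand-ins for glm::mat3 and glm::mat4 (row-major storage).
struct Mat3 { float m[3][3]; };
struct Mat4 { float m[4][4]; };

// Convert a 3D homogeneous transform into a 2D homogeneous transform by
// dropping the z row and z column (index 2) and keeping everything else,
// including the last row and column that carry the translation.
Mat3 to_2d_homogeneous(const Mat4& in) {
    const int keep[3] = {0, 1, 3};  // source indices to preserve
    Mat3 out = {};
    for (int row = 0; row != 3; ++row) {
        for (int col = 0; col != 3; ++col) {
            out.m[row][col] = in.m[keep[row]][keep[col]];
        }
    }
    return out;
}
```

<p>Applied to the translation matrix from earlier, this keeps <code>x</code> and <code>y</code> in the last column and discards <code>z</code>, which is exactly the kind of loss discussed above.</p> <p>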
Therefore we need to develop our own conversion function that preserves the transformations that are important to our application when moving from a “3D homogeneous space” to a “2D homogeneous space”.</p> <p>Hopefully this explanation helps if you are ever working on converting 3D transformation code to 2D.</p> Sun, 24 Sep 2023 23:00:00 -0000https://fryzekconcepts.com/notes/converting_from_3d_to_2d.htmlA Dive into Vulkanised 2024https://fryzekconcepts.com/notes/vulkanised_2024.html<figure> <img src="/assets/vulkanised_2024/vulkanized_logo_web.jpg" alt="Vulkanised sign at google’s office" /> <figcaption aria-hidden="true">Vulkanised sign at google’s office</figcaption> </figure> <p>Last week I had an exciting opportunity to attend the Vulkanised 2024 conference. For those of you not familiar with the event, it is <a href="https://vulkan.org/events/vulkanised-2024">“The Premier Vulkan Developer Conference”</a> hosted by the Vulkan working group from Khronos. With the excitement out of the way, I decided to write about some of the interesting information that came out of the conference.</p> <h2 id="a-few-presentations">A Few Presentations</h2> <p>My colleagues Iago, Stéphane, and Hyunjun each had the opportunity to present some of their work on the wider Vulkan ecosystem.</p> <figure> <img src="/assets/vulkanised_2024/vulkan_video_web.jpg" alt="Stéphane and Hyunjun presenting" /> <figcaption aria-hidden="true">Stéphane and Hyunjun presenting</figcaption> </figure> <p>Stéphane &amp; Hyunjun presented “Implementing a Vulkan Video Encoder From Mesa to GStreamer”. They jointly talked about the work they performed to implement the Vulkan video extensions in Intel’s ANV Mesa driver as well as in GStreamer. This was an interesting presentation because you got to see how the new Vulkan video extensions affected both driver developers implementing the extensions and application developers making use of the extensions for real time video decoding and encoding. 
<a href="https://vulkan.org/user/pages/09.events/vulkanised-2024/vulkanised-2024-stephane-cerveau-ko-igalia.pdf">Their presentation is available on vulkan.org</a>.</p> <figure> <img src="/assets/vulkanised_2024/opensource_vulkan_web.jpg" alt="Iago presenting" /> <figcaption aria-hidden="true">Iago presenting</figcaption> </figure> <p>Later my colleague Iago presented jointly with Faith Ekstrand (a well-known Linux graphics stack contributor from Collabora) on “8 Years of Open Drivers, including the State of Vulkan in Mesa”. They both talked about the current state of Vulkan in the open source driver ecosystem, and some of the benefits open source drivers have been able to take advantage of, like the common Vulkan runtime code and a shared compiler stack. You can check out <a href="https://vulkan.org/user/pages/09.events/vulkanised-2024/Vulkanised-2024-faith-ekstrand-collabora-Iago-toral-igalia.pdf">their presentation for all the details</a>.</p> <p>Besides Igalia’s presentations, there were several more which I found interesting, with topics such as Vulkan developer tools, experiences of using Vulkan in real-world applications, and even how to teach Vulkan to new developers. Here are highlights from some of them.</p> <h3 id="using-vulkan-synchronization-validation-effectively"><a href="https://vulkan.org/user/pages/09.events/vulkanised-2024/vulkanised-2024-john-zulauf-lunarg.pdf">Using Vulkan Synchronization Validation Effectively</a></h3> <p>John Zulauf gave a presentation on the Vulkan synchronization validation layers that he has been working on. If you are not familiar with these, then you should really check them out. They work by tracking how resources are used inside Vulkan and providing error messages with some hints if you use a resource in a way where it is not synchronized properly. It can’t catch every error, but it’s a great tool in the toolbelt of Vulkan developers to make their lives easier when it comes to debugging synchronization issues. 
As John said in the presentation, synchronization in Vulkan is hard, and nearly every application he tested the layers on revealed a synchronization issue, no matter how simple it was. He can proudly say he is a vkQuake contributor now because of these layers.</p> <h3 id="years-of-teaching-vulkan-with-example-for-video-extensions"><a href="https://vulkan.org/user/pages/09.events/vulkanised-2024/vulkanised-2024-helmut-hlavacs.pdf">6 Years of Teaching Vulkan with Example for Video Extensions</a></h3> <p>This was an interesting presentation from a professor at the University of Vienna about his experience teaching graphics as well as game development to students who may have little real programming experience. He covered the techniques he uses to make learning easier as well as resources that he uses. This would be a great presentation to check out if you’re trying to teach Vulkan to others.</p> <h3 id="vulkan-synchronization-made-easy"><a href="https://vulkan.org/user/pages/09.events/vulkanised-2024/vulkanised-2024-grigory-dzhavadyan.pdf">Vulkan Synchronization Made Easy</a></h3> <p>Another presentation focused on Vulkan sync, but instead of debugging it, Grigory showed how his graphics library abstracts sync away from the user without implementing a render graph. He presented an interesting technique that is similar to how the sync validation layers work when it comes to ensuring that resources are always synchronized before use. If you’re building your own engine in Vulkan, this is definitely something worth checking out.</p> <h3 id="vulkan-video-encode-api-a-deep-dive"><a href="https://vulkan.org/user/pages/09.events/vulkanised-2024/vulkanised-2024-tony-zlatinski-nvidia.pdf">Vulkan Video Encode API: A Deep Dive</a></h3> <p>Tony at Nvidia did a deep dive into the new Vulkan Video extensions, explaining a bit about how video codecs work, and also including a roadmap for future codec support in the video extensions. 
Especially interesting for us was that he made a nice call-out to Igalia and our work on Vulkan Video CTS and open source driver support on slide (6) :)</p> <h2 id="thoughts-on-vulkanised">Thoughts on Vulkanised</h2> <p>Vulkanised is an interesting conference that gives you the intersection of people working on Vulkan drivers, game developers using Vulkan for their graphics backend, visual FX tool developers using Vulkan-based tools in their pipeline, industrial application developers using Vulkan for some embedded commercial systems, and general hobbyists who are just interested in Vulkan. As an example of some of these interesting audience members, I got to talk with a member of the Blender Foundation about his work on the Vulkan backend to Blender.</p> <p>Lastly, the event was held at Google’s offices in Sunnyvale, which I’m always happy to travel to, not just for the better weather (coming from Canada), but also for the amazing restaurants and food found in the Bay Area!</p> <figure> <img src="/assets/vulkanised_2024/food_web.jpg" alt="Great Bay Area food" /> <figcaption aria-hidden="true">Great Bay Area food</figcaption> </figure> Wed, 14 Feb 2024 00:00:00 -0000https://fryzekconcepts.com/notes/vulkanised_2024.htmlSoftware Rendering and Androidhttps://fryzekconcepts.com/notes/android_swrast.html<p>My current project at Igalia has had me working on Mesa’s software renderers, llvmpipe and lavapipe. I’ve been working to get them running on Android, and I wanted to document the progress I’ve made, the challenges I’ve faced, and talk a little bit about the development process for a project like this. 
My work is not totally merged into upstream Mesa yet, but you can see the MRs I made here:</p> <ul> <li><a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29344">llvmpipe: Add android platform integration</a></li> <li><a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29785">u_gralloc/fallback: Set fd from handle directly</a></li> <li><a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27805">llvmpipe &amp; lavapipe: Implement sync fd import/export extensions</a></li> <li><a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28735">lavapipe: Implement <code>VK_EXT_external_memory_dma_buf</code></a></li> </ul> <h2 id="setting-up-an-android-development-environment">Setting up an Android development environment</h2> <p>Getting system level software to build and run on Android is unfortunately not straightforward. Since we are doing software rendering we don’t need a physical device and can instead make use of the Android emulator. If you didn’t know, Android has two emulators: the common one most people use is “goldfish”, and the lesser-known one is “cuttlefish”. For this project I did my work on the cuttlefish emulator, as it’s meant for testing the Android OS itself instead of just Android apps and is more reflective of real hardware. The cuttlefish emulator takes a little bit more work to set up, and I’ve found that it only works properly in Debian-based Linux distros. I run Fedora, so I had to run the emulator in a Debian VM.</p> <p>Thankfully Google has good instructions for building and running cuttlefish, which you can find <a href="https://source.android.com/docs/devices/cuttlefish/get-started">here</a>. The instructions show you how to set up the emulator using nightly build images from Google. 
We’ll also need to set up our own Android OS images, so after we’ve confirmed we can run the emulator, we need to start looking at building AOSP.</p> <p>For building our own AOSP image, we can also follow the instructions from Google <a href="https://source.android.com/docs/setup/build/building">here</a>. For the target we’ll want <code>aosp_cf_x86_64_phone-trunk_staging-eng</code>. At this point it’s a good idea to verify that you can build the image, which you can do by following the rest of the instructions on the page. Building AOSP from source does take a while though, so prepare to wait potentially an entire day for the image to build. Also, if you get errors complaining that you’re out of memory, you can try to reduce the number of parallel builds. Google officially recommends having 64GB of RAM, and I only had 32GB, so some packages had to be built with the parallel builds set to 1 so I wouldn’t run out of RAM.</p> <p>For running this custom-built image on Cuttlefish, you can just copy all the <code>*.img</code> files from <code>out/target/product/vsoc_x86_64/</code> to the root cuttlefish directory, and then launch cuttlefish. If everything worked successfully you should be able to see your custom-built AOSP image running in the cuttlefish webui.</p> <h2 id="building-mesa-targeting-android">Building Mesa targeting Android</h2> <p>Working from the changes in MR <a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29344">!29344</a>, building llvmpipe or lavapipe targeting Android should just work™️. To get to that stage required a few changes. First, llvmpipe actually already had some support on Android, as long as it was running on a device that supports a DRM display driver. In that case it could use the <code>dri</code> window system integration which already works on Android. 
I wanted to get llvmpipe (and lavapipe) running without dri, so I had to add support for Android in the <code>drisw</code> window system integration.</p> <p>Supporting Android in <code>drisw</code> mainly meant adding support for importing dmabufs as framebuffers. The Android windowing system will provide us with a “gralloc” buffer which inside has a dmabuf fd that represents the framebuffer. Adding support for importing dmabufs in drisw means we can import and begin drawing to these framebuffers. Most of the changes to support that can be found in <a href="https://gitlab.freedesktop.org/mesa/mesa/-/blob/9705df53408777d493eab19e5a58c432c1e75acb/src/gallium/frontends/dri/drisw.c#L405"><code>drisw_allocate_textures</code></a> and the underlying changes to llvmpipe to support importing dmabufs in MR <a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27805">!27805</a>. The EGL Android platform code also needed some changes to use the <code>drisw</code> window system code. Previously this code would only work with true dri drivers, but with some small tweaks it was possible to have it initialize the drisw window system and then use it for rendering if no hardware devices are available.</p> <p>For lavapipe the changes were a lot simpler. The Android Vulkan loader requires your driver to have the <code>HAL_MODULE_INFO_SYM</code> symbol in the binary, so that got created and populated correctly, following other Vulkan drivers in Mesa like turnip. Then the image creation code had to be modified to support the <code>VK_ANDROID_native_buffer</code> extension which allows the Android Vulkan loader to create images using Android native buffer handles. Under the hood this means getting the dmabuf fd from the native buffer handle. Thankfully Mesa already has some common code to handle this, so I could just use that. 
Some other small changes were also necessary to address crashes and other failures that came up during testing.</p> <p>With the changes out of the way we can now start building Mesa on Android. For this project I had to update the Android documentation for Mesa to include steps for building LLVM for Android since the version Google ships with the NDK is missing libraries that llvmpipe/lavapipe need to function. You can see the updated documentation <a href="https://gitlab.freedesktop.org/mesa/mesa/-/blob/9705df53408777d493eab19e5a58c432c1e75acb/docs/drivers/llvmpipe.rst">here</a> and <a href="https://gitlab.freedesktop.org/mesa/mesa/-/blob/9705df53408777d493eab19e5a58c432c1e75acb/docs/android.rst">here</a>. After sorting out LLVM, building llvmpipe/lavapipe is the same as building any other Mesa driver for Android: we set up a cross file to tell meson how to cross compile and then we run meson. At this point you could manually modify the Android image and copy these files to the VM, but I also wanted to support building a new AOSP image directly including the driver. In order to do that you also have to rename the driver binaries to match Android’s naming convention, and make sure SO_NAME matches as well. 
If you check out <a href="https://gitlab.freedesktop.org/mesa/mesa/-/blob/9705df53408777d493eab19e5a58c432c1e75acb/docs/android.rst?plain=1#L183">this</a> section of the documentation I wrote, it covers how to do that.</p> <p>If you followed all of that you should have built a version of llvmpipe and lavapipe that you can run on Android’s cuttlefish emulator.</p> <figure> <img src="/assets/2024-06-27-android-swrast/lavapipe.png" alt="Android running lavapipe" /> <figcaption aria-hidden="true">Android running lavapipe</figcaption> </figure> <h2 id="references">References</h2> <ul> <li><a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29344" class="uri">https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29344</a> <ul> <li>Main MR with Android changes</li> </ul></li> <li><a href="https://source.android.com/docs/devices/cuttlefish/get-started" class="uri">https://source.android.com/docs/devices/cuttlefish/get-started</a> <ul> <li>Google’s official guide for getting started with the Cuttlefish emulator</li> </ul></li> <li><a href="https://source.android.com/docs/setup/build/building" class="uri">https://source.android.com/docs/setup/build/building</a> <ul> <li>Google’s official guide for building AOSP images</li> </ul></li> <li><a href="https://gitlab.freedesktop.org/mesa/mesa/-/blob/9705df53408777d493eab19e5a58c432c1e75acb/docs/drivers/llvmpipe.rst" class="uri">https://gitlab.freedesktop.org/mesa/mesa/-/blob/9705df53408777d493eab19e5a58c432c1e75acb/docs/drivers/llvmpipe.rst</a> <ul> <li>My updated documentation in MR for llvmpipe</li> </ul></li> <li><a href="https://gitlab.freedesktop.org/mesa/mesa/-/blob/9705df53408777d493eab19e5a58c432c1e75acb/docs/android.rst" class="uri">https://gitlab.freedesktop.org/mesa/mesa/-/blob/9705df53408777d493eab19e5a58c432c1e75acb/docs/android.rst</a> <ul> <li>My updated documentation in MR for Android integration in Mesa</li> </ul></li> </ul> Wed, 26 Jun 2024 23:00:00 
-0000https://fryzekconcepts.com/notes/android_swrast.html