<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Data Trenches</title><link>https://data-trenches.leandrof.space/</link><description>Recent content on Data Trenches</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><managingEditor>leandrojlfernandes@gmail.com (Leandro Fernandes)</managingEditor><webMaster>leandrojlfernandes@gmail.com (Leandro Fernandes)</webMaster><lastBuildDate>Mon, 09 Mar 2026 12:00:00 -0600</lastBuildDate><atom:link href="https://data-trenches.leandrof.space/index.xml" rel="self" type="application/rss+xml"/><item><title>Automating PERM Case Status Monitoring: My EB-3 Green Card Journey</title><link>https://data-trenches.leandrof.space/posts/perm-automation/</link><pubDate>Mon, 09 Mar 2026 12:00:00 -0600</pubDate><author>leandrojlfernandes@gmail.com (Leandro Fernandes)</author><guid>https://data-trenches.leandrof.space/posts/perm-automation/</guid><description>&lt;h1 id="my-eb-3-green-card-journey-automating-perm-status-checks">My EB-3 Green Card Journey: Automating PERM Status Checks&lt;/h1>
&lt;h2 id="the-immigration-process">The Immigration Process&lt;/h2>
&lt;p>Going through the U.S. employment-based immigration process is a journey that tests your patience in ways you never imagined. As someone pursuing an &lt;strong>EB-3 visa&lt;/strong> (skilled worker category), I found myself navigating the complex maze of PERM labor certification - the first major step toward permanent residency.&lt;/p>
&lt;h3 id="understanding-perm">Understanding PERM&lt;/h3>
&lt;p>PERM (Program Electronic Review Management) is the process where the Department of Labor certifies that there are no qualified U.S. workers available for the position offered to the foreign worker. For most employment-based green cards, including EB-3, this is a mandatory step. The process involves:&lt;/p></description></item><item><title>Real-time ML Model Serving</title><link>https://data-trenches.leandrof.space/projects/realtime-ml-serving/</link><pubDate>Sun, 04 Jan 2026 00:00:00 +0000</pubDate><author>leandrojlfernandes@gmail.com (Leandro Fernandes)</author><guid>https://data-trenches.leandrof.space/projects/realtime-ml-serving/</guid><description>&lt;h2 id="the-challenge">The Challenge&lt;/h2>
&lt;p>Deploying machine learning models to serve real-time inference requests for client-facing applications with strict latency requirements.&lt;/p>
&lt;h2 id="the-solution">The Solution&lt;/h2>
&lt;p>Built and deployed low-latency inference services using modern microservices architecture:&lt;/p>
&lt;ul>
&lt;li>FastAPI-based REST endpoints&lt;/li>
&lt;li>Docker containerization for consistency&lt;/li>
&lt;li>Load balancing and auto-scaling&lt;/li>
&lt;li>Health monitoring and logging&lt;/li>
&lt;/ul>
&lt;h3 id="technologies-used">Technologies Used&lt;/h3>
&lt;ul>
&lt;li>FastAPI&lt;/li>
&lt;li>Docker&lt;/li>
&lt;li>Machine Learning Deployment&lt;/li>
&lt;li>API Development&lt;/li>
&lt;/ul>
&lt;h2 id="impact">Impact&lt;/h2>
&lt;ul>
&lt;li>Real-time model inference capabilities&lt;/li>
&lt;li>Low-latency responses for client applications&lt;/li>
&lt;li>Scalable architecture handling varying load&lt;/li>
&lt;li>Easy model updates and rollbacks&lt;/li>
&lt;li>Production-grade reliability&lt;/li>
&lt;/ul>
&lt;p>This project showcased the ability to bridge the gap between ML models and production applications, ensuring models could be consumed by real users with minimal latency.&lt;/p></description></item><item><title>NLP Analytics Engine</title><link>https://data-trenches.leandrof.space/projects/nlp-analytics-engine/</link><pubDate>Thu, 18 Sep 2025 00:00:00 +0000</pubDate><author>leandrojlfernandes@gmail.com (Leandro Fernandes)</author><guid>https://data-trenches.leandrof.space/projects/nlp-analytics-engine/</guid><description>&lt;h2 id="the-challenge">The Challenge&lt;/h2>
&lt;p>Building a production-grade NLP analytics engine capable of processing semantic data from 25,000 daily targets while maintaining high availability and delivering actionable insights to enterprise clients.&lt;/p>
&lt;h2 id="the-solution">The Solution&lt;/h2>
&lt;p>Designed and implemented an end-to-end pipeline from model training to deployment, including:&lt;/p>
&lt;ul>
&lt;li>Data ingestion and preprocessing pipeline&lt;/li>
&lt;li>Model training infrastructure&lt;/li>
&lt;li>Inference serving layer&lt;/li>
&lt;li>Monitoring and alerting system&lt;/li>
&lt;/ul>
&lt;h3 id="technologies-used">Technologies Used&lt;/h3>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>Machine Learning/NLP libraries&lt;/li>
&lt;li>Distributed processing&lt;/li>
&lt;li>Containerization (Docker)&lt;/li>
&lt;li>API development (FastAPI)&lt;/li>
&lt;/ul>
&lt;h2 id="impact">Impact&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>$700k recurring revenue&lt;/strong> generated from the analytics solution&lt;/li>
&lt;li>Processes semantic data from &lt;strong>25,000+ daily targets&lt;/strong>&lt;/li>
&lt;li>Production-grade reliability and performance&lt;/li>
&lt;li>Real-time analytics delivery to clients&lt;/li>
&lt;/ul>
&lt;p>This project demonstrated the full lifecycle of deploying ML models in production, from data pipeline to client-facing application. The actual output of this project can&amp;rsquo;t be shared publicly because the model was trained on confidential data.&lt;/p></description></item><item><title>PostgreSQL 17 Beta: B-Tree just got promoted to Index CEO</title><link>https://data-trenches.leandrof.space/posts/postgresql-17-btree/</link><pubDate>Mon, 10 Jun 2024 00:00:00 +0000</pubDate><author>leandrojlfernandes@gmail.com (Leandro Fernandes)</author><guid>https://data-trenches.leandrof.space/posts/postgresql-17-btree/</guid><description>&lt;p>The release of &lt;a href="https://www.postgresql.org/about/news/postgresql-17-beta-1-released-2865/">PostgreSQL 17 beta&lt;/a> brought a bunch of interesting new features: improvements to vacuum execution time and memory consumption, a faster ANALYZE, and more. The one that caught my eye right off the bat, and that most databases and developers will benefit from, is the improvement to the B-Tree index when using the IN or ANY clauses. I&amp;rsquo;ve read about improvements ranging from 10% to 30% without any change to your database or table structure, so I wanted to test it on one of my production use cases.&lt;/p></description></item><item><title>PostgreSQL: A deep dive of Prepared Statements</title><link>https://data-trenches.leandrof.space/posts/postgresql-prepared-statements/</link><pubDate>Mon, 27 May 2024 00:00:00 +0000</pubDate><author>leandrojlfernandes@gmail.com (Leandro Fernandes)</author><guid>https://data-trenches.leandrof.space/posts/postgresql-prepared-statements/</guid><description>&lt;p>In a PostgreSQL database, preparing a statement involves parsing, analyzing and rewriting that specific statement. The result is compiled and stored in memory (usually called statement caching).
When a previously prepared statement is executed, PostgreSQL can skip the parsing, analyzing and rewriting steps and use the precompiled version instead. This can significantly improve database performance, especially for queries that are executed frequently or have complex plans. The more often you execute a query, the more likely you are to see significant improvements.&lt;/p></description></item><item><title>PySpark Infrastructure Optimization</title><link>https://data-trenches.leandrof.space/projects/pyspark-optimization/</link><pubDate>Sat, 17 Feb 2024 00:00:00 +0000</pubDate><author>leandrojlfernandes@gmail.com (Leandro Fernandes)</author><guid>https://data-trenches.leandrof.space/projects/pyspark-optimization/</guid><description>&lt;h2 id="the-challenge">The Challenge&lt;/h2>
&lt;p>Handling massive-scale data processing while maintaining reasonable query latency and managing compute resource costs in a distributed environment.&lt;/p>
&lt;h2 id="the-solution">The Solution&lt;/h2>
&lt;p>Architected distributed processing jobs using PySpark with multiple optimization strategies:&lt;/p>
&lt;ul>
&lt;li>Algorithmic improvements to reduce computational complexity&lt;/li>
&lt;li>Storage optimization using Trino and Hive&lt;/li>
&lt;li>Query execution plan optimization&lt;/li>
&lt;li>Resource allocation tuning&lt;/li>
&lt;li>Data partitioning strategies&lt;/li>
&lt;/ul>
&lt;h3 id="technologies-used">Technologies Used&lt;/h3>
&lt;ul>
&lt;li>PySpark&lt;/li>
&lt;li>Apache Hadoop&lt;/li>
&lt;li>Trino&lt;/li>
&lt;li>Hive&lt;/li>
&lt;li>Distributed Systems&lt;/li>
&lt;/ul>
&lt;h2 id="impact">Impact&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>25% reduction&lt;/strong> in query latency&lt;/li>
&lt;li>&lt;strong>25% decrease&lt;/strong> in resource consumption&lt;/li>
&lt;li>Improved processing efficiency for massive datasets&lt;/li>
&lt;li>Significant cost savings on compute resources&lt;/li>
&lt;/ul>
&lt;p>This optimization effort required deep understanding of distributed systems, Spark internals, and data storage patterns to achieve measurable performance gains.&lt;/p></description></item><item><title>Installing GlobalProtect VPN Cli on a Unix env</title><link>https://data-trenches.leandrof.space/posts/vpn-global-protect/</link><pubDate>Sun, 02 Apr 2023 12:33:01 -0600</pubDate><author>leandrojlfernandes@gmail.com (Leandro Fernandes)</author><guid>https://data-trenches.leandrof.space/posts/vpn-global-protect/</guid><description>&lt;h1 id="installing-globalprotect-vpn-cli-on-a-unix-environment">Installing GlobalProtect VPN CLI on a Unix environment&lt;/h1>
&lt;p>GlobalProtect is a virtual private network (VPN) solution, developed and maintained by Palo Alto Networks, that provides secure remote access to enterprise resources. It allows users to connect to corporate networks from anywhere in the world, making it a popular choice for businesses with a distributed workforce. In this article, we will guide you through the steps to install the GlobalProtect VPN CLI on a Unix-based machine.&lt;/p></description></item><item><title>SQL Count - What the count() function actually counts?</title><link>https://data-trenches.leandrof.space/posts/sqlcount/</link><pubDate>Sun, 10 Jul 2022 11:44:01 -0600</pubDate><author>leandrojlfernandes@gmail.com (Leandro Fernandes)</author><guid>https://data-trenches.leandrof.space/posts/sqlcount/</guid><description>&lt;h1 id="sql-count--what-the-count-function-actually-counts">SQL Count — What the count() function actually counts?&lt;/h1>
&lt;p>The count() function is widely used in SQL, sometimes without a full understanding of how the argument passed to it can alter your results. Most commonly, the function takes one of the following as its input parameter:&lt;/p>
&lt;ol>
&lt;li>Count(*)&lt;/li>
&lt;li>Count(1)&lt;/li>
&lt;li>Count(column name)&lt;/li>
&lt;/ol>
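&lt;p>As a rough illustration of how the three variations can differ, here is a minimal Python sketch using the standard-library sqlite3 module. The table &lt;code>demo&lt;/code>, its &lt;code>nickname&lt;/code> column and the sample rows are hypothetical, and SQLite stands in for whichever database you use; the key assumption is that some rows hold NULL in the counted column.&lt;/p>

```python
import sqlite3


def count_variations():
    """Return (COUNT(*), COUNT(1), COUNT(nickname)) for a small sample table."""
    # Hypothetical in-memory table; two of the four rows have a NULL nickname.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY, nickname TEXT)")
    conn.executemany(
        "INSERT INTO demo (id, nickname) VALUES (?, ?)",
        [(1, "ana"), (2, None), (3, "rui"), (4, None)],
    )
    # COUNT(*) and COUNT(1) count every row; COUNT(nickname) skips NULLs.
    row = conn.execute(
        "SELECT COUNT(*), COUNT(1), COUNT(nickname) FROM demo"
    ).fetchone()
    conn.close()
    return row


if __name__ == "__main__":
    print(count_variations())
```

&lt;p>With two NULL nicknames out of four rows, this prints (4, 4, 2): Count(*) and Count(1) return the number of rows, while Count(column name) only counts rows where that column is not NULL.&lt;/p>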
&lt;h3 id="test-scenario">Test Scenario&lt;/h3>
&lt;p>Let’s create a table and insert a few records to try each count variation and compare the results.&lt;/p></description></item><item><title>GitLab CI - Auto Deploy to your VPS</title><link>https://data-trenches.leandrof.space/posts/gitlabci/</link><pubDate>Mon, 09 Aug 2021 11:48:42 -0600</pubDate><author>leandrojlfernandes@gmail.com (Leandro Fernandes)</author><guid>https://data-trenches.leandrof.space/posts/gitlabci/</guid><description>&lt;p>&lt;img src="image.png" alt="alt text">&lt;/p>
&lt;h1 id="gitlab-ci---auto-deploy-to-your-vps">GitLab CI - Auto Deploy to your VPS&lt;/h1>
&lt;p>I was developing a project where I wanted an online Staging environment to host our application, with the latest changes deployed automatically on each merge request into the Staging branch.&lt;/p>
&lt;p>I started by researching how to set up automatic deployment via my &lt;strong>.gitlab-ci.yml&lt;/strong> file. The main goal was to run our pipeline and, if my test cases were successful, SSH into my VPS and pull the latest version of the branch.&lt;/p></description></item><item><title>nAttrMon Open Source Contribution</title><link>https://data-trenches.leandrof.space/projects/nattrmon-contribution/</link><pubDate>Tue, 07 May 2019 00:00:00 +0000</pubDate><author>leandrojlfernandes@gmail.com (Leandro Fernandes)</author><guid>https://data-trenches.leandrof.space/projects/nattrmon-contribution/</guid><description>&lt;h2 id="the-challenge">The Challenge&lt;/h2>
&lt;p>Improve system observability across distributed clusters by developing custom plugins for the &lt;a href="https://github.com/OpenAF/nAttrMon">nAttrMon&lt;/a> monitoring tool.&lt;/p>
&lt;h2 id="the-solution">The Solution&lt;/h2>
&lt;p>Contributed code and developed custom monitors to detect and visualize real-time system bottlenecks:&lt;/p>
&lt;ul>
&lt;li>Real-time system monitoring&lt;/li>
&lt;li>Custom monitoring architecture&lt;/li>
&lt;li>Alerting and notification systems&lt;/li>
&lt;li>Performance metrics collection&lt;/li>
&lt;/ul>
&lt;h3 id="technologies-used">Technologies Used&lt;/h3>
&lt;ul>
&lt;li>Java&lt;/li>
&lt;li>Open Source Development&lt;/li>
&lt;li>System Monitoring&lt;/li>
&lt;li>Distributed Systems&lt;/li>
&lt;/ul>
&lt;h2 id="impact">Impact&lt;/h2>
&lt;ul>
&lt;li>Enhanced observability across distributed clusters&lt;/li>
&lt;li>&lt;strong>Reduced mean-time-to-resolution (MTTR)&lt;/strong> for outages&lt;/li>
&lt;li>Better system performance insights&lt;/li>
&lt;li>Community contribution to open source project&lt;/li>
&lt;li>Improved system stability for carrier networks&lt;/li>
&lt;/ul>
&lt;p>This open source work demonstrated the ability to understand and contribute to complex systems while providing practical value to the community.&lt;/p></description></item><item><title>About</title><link>https://data-trenches.leandrof.space/about/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>leandrojlfernandes@gmail.com (Leandro Fernandes)</author><guid>https://data-trenches.leandrof.space/about/</guid><description>&lt;p>Hey there! I&amp;rsquo;m Leandro Fernandes, and I absolutely love solving tough data problems. I&amp;rsquo;ve spent years building systems that process millions of records daily, designing NLP data pipelines that understand what people are saying, and figuring out how to make those pipelines faster and smarter. Whether it&amp;rsquo;s optimizing PySpark jobs or deploying ML models that work in real-time, I get excited about turning raw data into something useful.&lt;/p>
&lt;p>In 2022, I made the big leap from Portugal to the USA to take on a Senior Data Engineer role at Mobileum in Dallas, Texas. It&amp;rsquo;s been quite a journey - from working as an Application Data Engineer in Braga to working on NLP analytics projects and big data systems.&lt;/p></description></item></channel></rss>