Data Trenches

Automating PERM Case Status Monitoring: My EB-3 Green Card Journey

leandrojlfernandes@gmail.com (Leandro Fernandes) — Mon, 09 Mar 2026 12:00:00 -0600

My EB-3 Green Card Journey: Automating PERM Status Checks

The Immigration Process

Going through the U.S. employment-based immigration process is a journey that tests your patience in ways you never imagined. As someone pursuing an EB-3 visa (skilled worker category), I found myself navigating the complex maze of PERM labor certification - the first major step toward permanent residency.

Understanding PERM

PERM (Program Electronic Review Management) is the process where the Department of Labor certifies that there are no qualified U.S. workers available for the position offered to the foreign worker. For most employment-based green cards, including EB-3, this is a mandatory step. The process involves:

Real-time ML Model Serving

leandrojlfernandes@gmail.com (Leandro Fernandes) — Sun, 04 Jan 2026 00:00:00 +0000

The Challenge

Deploying machine learning models to serve real-time inference requests for client-facing applications with strict latency requirements.

The Solution

Built and deployed low-latency inference services using modern microservices architecture:

FastAPI-based REST endpoints
Docker containerization for consistency
Load balancing and auto-scaling
Health monitoring and logging

Technologies Used

FastAPI
Docker
Machine Learning Deployment
API Development

Impact

Real-time model inference capabilities
Low-latency responses for client applications
Scalable architecture handling varying load
Easy model updates and rollbacks
Production-grade reliability

This project showcased the ability to bridge the gap between ML models and production applications, ensuring models could be consumed by real users with minimal latency.

NLP Analytics Engine

leandrojlfernandes@gmail.com (Leandro Fernandes) — Thu, 18 Sep 2025 00:00:00 +0000

The Challenge

Building a production-grade NLP analytics engine capable of processing semantic data from 25,000 daily targets while maintaining high availability and delivering actionable insights to enterprise clients.

The Solution

Designed and implemented an end-to-end pipeline from model training to deployment, including:

Data ingestion and preprocessing pipeline
Model training infrastructure
Inference serving layer
Monitoring and alerting system

Technologies Used

Python
Machine Learning/NLP libraries
Distributed processing
Containerization (Docker)
API development (FastAPI)

Impact

$700k recurring revenue generated from the analytics solution
Processes semantic data from 25,000+ daily targets
Production-grade reliability and performance
Real-time analytics delivery to clients

This project demonstrated the full lifecycle of deploying ML models in production, from data pipeline to client-facing application. The actual output of this project can’t be shared publicly because it was trained on confidential data.

PostgreSQL 17 Beta: B-Tree just got promoted to Index CEO

leandrojlfernandes@gmail.com (Leandro Fernandes) — Mon, 10 Jun 2024 00:00:00 +0000

The PostgreSQL 17 beta release brought a bunch of interesting new features: improvements to vacuum execution time and memory consumption, a faster ANALYZE, and more. The one that caught my eye right off the bat, and that most databases and developers will appreciate, is the improvement to B-Tree indexes when using IN or ANY clauses. I’ve read about improvements ranging from 10% to 30% without any change to your database or table structure, so I wanted to test it out on one of my production use cases.

PostgreSQL: A deep dive of Prepared Statements

leandrojlfernandes@gmail.com (Leandro Fernandes) — Mon, 27 May 2024 00:00:00 +0000

In a PostgreSQL database, preparing a statement involves parsing, analyzing and rewriting that specific statement. The result is compiled and stored in memory (usually called statement caching). When a previously prepared statement is executed, PostgreSQL can skip the parsing, analyzing and rewriting steps and use the precompiled version instead. This can significantly improve database performance, especially for queries that are executed frequently or have complex plans. The more often you execute a query, the more likely you are to see significant improvements.
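As a minimal sketch of the prepare-once, execute-many pattern described above, the helpers below build PostgreSQL's PREPARE, EXECUTE and DEALLOCATE commands as plain strings. The statement name `user_by_id`, the `users` table and the `run_prepared` helper are hypothetical, and the `cursor` argument is assumed to be a DB-API cursor from a driver that substitutes `%s` on the client side (as psycopg2 does).

```python
import re

# Only simple lowercase identifiers are accepted, so the name is safe to inline.
_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")


def _check_name(name: str) -> str:
    if not _IDENT.match(name):
        raise ValueError(f"unsafe statement name: {name!r}")
    return name


def prepare_sql(name: str, param_types: list[str], query: str) -> str:
    """Build a PREPARE command; the query uses $1, $2, ... as placeholders."""
    types = f" ({', '.join(param_types)})" if param_types else ""
    return f"PREPARE {_check_name(name)}{types} AS {query}"


def execute_sql(name: str, n_params: int) -> str:
    """Build an EXECUTE command with %s placeholders for the driver to fill in."""
    args = f" ({', '.join(['%s'] * n_params)})" if n_params else ""
    return f"EXECUTE {_check_name(name)}{args}"


def deallocate_sql(name: str) -> str:
    """Build the command that drops the prepared statement from the session."""
    return f"DEALLOCATE {_check_name(name)}"


def run_prepared(cursor, user_ids):
    """Prepare once, then execute once per id on the same session.

    Hypothetical helper: assumes a `users` table with an integer `id` column.
    """
    cursor.execute(
        prepare_sql("user_by_id", ["int"], "SELECT * FROM users WHERE id = $1")
    )
    rows = []
    for user_id in user_ids:
        cursor.execute(execute_sql("user_by_id", 1), (user_id,))
        rows.extend(cursor.fetchall())
    cursor.execute(deallocate_sql("user_by_id"))
    return rows
```

Prepared statements live only for the duration of the database session that created them, so the PREPARE and every EXECUTE have to go through the same connection.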

PySpark Infrastructure Optimization

leandrojlfernandes@gmail.com (Leandro Fernandes) — Sat, 17 Feb 2024 00:00:00 +0000

The Challenge

Handling massive-scale data processing while maintaining reasonable query latency and managing compute resource costs in a distributed environment.

The Solution

Architected distributed processing jobs using PySpark with multiple optimization strategies:

Algorithmic improvements to reduce computational complexity
Storage optimization using Trino and Hive
Query execution plan optimization
Resource allocation tuning
Data partitioning strategies

Technologies Used

PySpark
Apache Hadoop
Trino
Hive
Distributed Systems

Impact

25% reduction in query latency
25% decrease in resource consumption
Improved processing efficiency for massive datasets
Significant cost savings on compute resources

This optimization effort required deep understanding of distributed systems, Spark internals, and data storage patterns to achieve measurable performance gains.

Installing GlobalProtect VPN CLI on a Unix env

leandrojlfernandes@gmail.com (Leandro Fernandes) — Sun, 02 Apr 2023 12:33:01 -0600

Installing GlobalProtect VPN CLI on a Unix environment

GlobalProtect is a virtual private network (VPN) solution, developed and maintained by Palo Alto Networks, that provides secure remote access to enterprise resources. It allows users to connect to corporate networks from anywhere in the world, making it a popular choice for businesses with a distributed workforce. In this article, we will guide you through the steps to install the GlobalProtect VPN CLI on a Unix-based machine.

SQL Count - What the count() function actually counts?

leandrojlfernandes@gmail.com (Leandro Fernandes) — Sun, 10 Jul 2022 11:44:01 -0600

SQL Count — What the count() function actually counts?

The count() function is widely used in SQL, sometimes without a full understanding of how the argument passed to it can change the result. Most commonly, the function takes one of the following as its input parameter:

Count(*)
Count(1)
Count(column name)

Test Scenario

Let’s create a table and insert a few records to try each count variation and compare the results.
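A minimal sketch of such a comparison, assuming an in-memory SQLite database as a stand-in engine and a hypothetical `employees` table (neither the table nor the rows come from the original test scenario). The handling of NULLs shown here follows standard SQL: count(*) and count(1) count rows, while count(column) counts only the rows where that column is not NULL.

```python
import sqlite3

# In-memory database used as a stand-in; the table and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER)"
)
conn.executemany(
    "INSERT INTO employees (id, name, manager_id) VALUES (?, ?, ?)",
    [
        (1, "Ana", None),     # no manager: manager_id is NULL
        (2, "Bruno", 1),
        (3, "Carla", 1),
        (4, "Duarte", None),  # no manager: manager_id is NULL
    ],
)


def count_variants(connection):
    """Run the three count variations in one query and label the results."""
    row = connection.execute(
        "SELECT COUNT(*), COUNT(1), COUNT(manager_id) FROM employees"
    ).fetchone()
    return {"count_star": row[0], "count_one": row[1], "count_column": row[2]}


counts = count_variants(conn)
print(counts)  # {'count_star': 4, 'count_one': 4, 'count_column': 2}
```

The two rows with a NULL `manager_id` are included by count(*) and count(1) but skipped by count(manager_id), which is why the last number is lower.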

GitLab CI - Auto Deploy to your VPS

leandrojlfernandes@gmail.com (Leandro Fernandes) — Mon, 09 Aug 2021 11:48:42 -0600

GitLab CI - Auto Deploy to your VPS

I was developing a project where I wanted an online Staging environment to host our application, with the latest changes deployed automatically on each merge request to the Staging branch.

I started by researching how to set up automatic deployment via my .gitlab-ci.yml file. The main goal was to run our pipeline and, if my test cases were successful, SSH into my VPS and pull the latest version of the branch.
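The deploy step described above (SSH into the VPS and pull the branch) can be sketched as a small script that a pipeline job could call once the tests have passed. This is only an illustration, not the setup from the original post: the user, host, application directory and branch name are hypothetical, and it assumes the CI runner already has an SSH key authorized on the VPS.

```python
import shlex
import subprocess


def build_deploy_command(user, host, app_dir, branch):
    """Return the ssh invocation that pulls `branch` inside `app_dir` on the VPS."""
    # Quote the remote values because the remote shell parses this string.
    remote = f"cd {shlex.quote(app_dir)} && git pull origin {shlex.quote(branch)}"
    return ["ssh", f"{user}@{host}", remote]


def deploy(user, host, app_dir, branch):
    """Run the deploy; raises CalledProcessError if ssh or git exits non-zero."""
    subprocess.run(build_deploy_command(user, host, app_dir, branch), check=True)


if __name__ == "__main__":
    # Hypothetical values; in a pipeline these would come from CI variables.
    print(build_deploy_command("deploy", "vps.example.com", "/srv/app", "staging"))
```

Because `check=True` makes a failed pull raise an exception, the script exits with a non-zero status and the pipeline job calling it is marked as failed.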

nAttrMon Open Source Contribution

leandrojlfernandes@gmail.com (Leandro Fernandes) — Tue, 07 May 2019 00:00:00 +0000

The Challenge

Improve system observability across distributed clusters by developing custom plugins for the nAttrMon monitoring tool.

The Solution

Contributed code and developed custom monitors to detect and visualize real-time system bottlenecks:

Real-time system monitoring
Custom monitoring architecture
Alerting and notification systems
Performance metrics collection

Technologies Used

Java
Open Source Development
System Monitoring
Distributed Systems

Impact

Enhanced observability across distributed clusters
Reduced mean-time-to-resolution (MTTR) for outages
Better system performance insights
Community contribution to open source project
Improved system stability for carrier networks

This open source work demonstrated the ability to understand and contribute to complex systems while providing practical value to the community.

About

leandrojlfernandes@gmail.com (Leandro Fernandes)

Hey there! I’m Leandro Fernandes, and I absolutely love solving tough data problems. I’ve spent years building systems that process millions of records daily, designing NLP data pipelines that understand what people are saying, and figuring out how to make those pipelines faster and smarter. Whether it’s optimizing PySpark jobs or deploying ML models that work in real-time, I get excited about turning raw data into something useful.

In 2022, I made the big leap from Portugal to the USA to take on a Senior Data Engineer role at Mobileum in Dallas, Texas. It’s been quite a journey - from working as an Application Data Engineer in Braga to working on NLP analytics projects and big data systems.