From soldier
to Data Architect

20+ years building data systems that process millions of events every day. From the Argentine Army to designing enterprise platforms. From complex problems to elegant architectures.

Marcelo Tallón


$ whoami → data-architect

20+
Years
100M+
Records/day
50+
Pipelines
5TB+
Data

The stack I master

Technologies I use to build enterprise data platforms

Data Warehousing

  • Snowflake (5+ years)
  • PostgreSQL (15+ years)
  • MySQL (12+ years)

Transformation

  • DBT (4+ years)
  • Advanced SQL (20+ years)
  • Pandas (8+ years)

Orchestration

  • Apache Airflow (6+ years)
  • Dagster (2+ years)
  • Apache Kafka (3+ years)

Programming

  • Python (15+ years)
  • Django (12+ years)
  • Bash/Shell (18+ years)

Cloud & DevOps

  • Docker (8+ years)
  • AWS (6+ years)
  • CI/CD GitLab (7+ years)

Analytics & ML

  • Data Modeling (15+ years)
  • Machine Learning (5+ years)
  • Data Quality (10+ years)

Security & Networks

  • Ethical Hacking (12+ years)
  • Network Security (15+ years)
  • System Architecture (18+ years)

Development

  • REST APIs (12+ years)
  • Microservices (6+ years)
  • Git & Version Control (15+ years)

Projects with impact

Real cases with real metrics

ETL Migration

Enterprise ETL 2.0 Platform

Led complete redesign and migration of legacy ETL system (1.0 → 2.0). Architected modular pipelines with Airflow, Kafka (Avro), Snowflake, and AWS Athena/S3. Multiprocessing extraction, Parquet intermediate storage, and dynamic DAG generation.

40%
Faster extraction
40+
DAGs migrated
DBT Infrastructure

Cookiecutter DBT Framework

Built enterprise DBT framework with cookiecutter templates and modular architecture. Smart CI/CD detects changes and builds only what's necessary. Harbor registry integration, Vault secrets, and automated Airflow deployment.

70%
Build time saved
15+
Active projects
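
The "builds only what's necessary" step comes down to mapping changed files to the projects that own them. A minimal sketch of that idea, assuming a hypothetical monorepo layout `projects/<name>/...`; the layout and names are illustrative, not the actual framework:

```python
from pathlib import PurePosixPath

def changed_projects(changed_files, project_root="projects"):
    """Map changed file paths to the set of projects needing a rebuild.

    Assumes a monorepo layout like projects/<name>/...; a change outside
    that tree (shared templates, CI config) means everything rebuilds.
    """
    projects = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        if len(parts) >= 2 and parts[0] == project_root:
            projects.add(parts[1])
        else:
            return None  # shared file touched: signal a full rebuild
    return projects
```

In a GitLab CI job, the changed-file list would typically come from `git diff --name-only` between the pipeline's base and head commits.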
Real-time Streaming

Kafka Avro Pipeline

Designed real-time streaming architecture using Kafka with Avro serialization. Event-driven data ingestion with schema evolution support. Integrated with Snowflake for low-latency analytics.

<2min
End-to-end latency
100M+
Events/day
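
Schema evolution support boils down to normalizing old events to the newest shape on read. A stdlib sketch of the pattern under assumed field names (a hypothetical v2 that renames `ts` to `event_timestamp`); the production pipeline relies on Avro's reader/writer schema resolution rather than hand-written upgrades:

```python
def upgrade_event(event: dict) -> dict:
    """Normalize an incoming event to the latest (v2) schema.

    Hypothetical migration: v2 renames 'ts' to 'event_timestamp' and
    adds an optional 'properties' field that defaults to empty.
    """
    if event.get("schema_version", 1) == 1:
        event = {
            "schema_version": 2,
            "event_id": event["event_id"],
            "event_timestamp": event["ts"],
            "properties": event.get("properties", {}),
        }
    return event
```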
Data Orchestration

Dynamic DAG Generator

Created dynamic Airflow DAG system based on configurable metadata. Auto-generates Extract-Transform-Load workflows from YAML configs. Enables rapid onboarding of new data sources without code changes.

10x
Faster onboarding
50+
Production DAGs
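
Config-driven DAG generation amounts to expanding a small per-source config into a task graph. A minimal sketch with hypothetical config keys; in practice the configs live in YAML files and the resulting specs become real Airflow `DAG` objects registered in the module's globals:

```python
# Hypothetical source configs; in real use, one YAML file per source.
SOURCES = [
    {"name": "orders", "schedule": "0 2 * * *"},
    {"name": "clicks", "schedule": "*/15 * * * *"},
]

def build_dag_spec(source: dict) -> dict:
    """Expand one source config into a DAG id, schedule, and task chain."""
    name = source["name"]
    return {
        "dag_id": f"etl_{name}",
        "schedule": source["schedule"],
        "tasks": [f"extract_{name}", f"transform_{name}", f"load_{name}"],
    }

specs = [build_dag_spec(s) for s in SOURCES]
```

Onboarding a new source then means adding one config entry, with no pipeline code changes.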
Data Validation

Pandera Schema Validator

Implemented automated data quality framework using Pandera for schema validation, type checking, and constraint enforcement. Integrated with ETL pipeline for fail-fast data integrity checks.

100%
Schema coverage
5TB+
Validated data
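
The fail-fast idea is simply to reject a batch at the first schema violation instead of loading partial data. A stdlib sketch of the pattern (the real framework uses Pandera's `DataFrameSchema`; the column names and checks here are illustrative):

```python
def validate(rows, schema):
    """Raise on the first row violating the schema (fail fast).

    schema maps column -> (expected_type, optional predicate).
    """
    for i, row in enumerate(rows):
        for col, (typ, check) in schema.items():
            if col not in row:
                raise ValueError(f"row {i}: missing column '{col}'")
            if not isinstance(row[col], typ):
                raise TypeError(f"row {i}: '{col}' is not {typ.__name__}")
            if check is not None and not check(row[col]):
                raise ValueError(f"row {i}: '{col}' failed its constraint")
    return rows

# Illustrative schema: positive integer ids, non-negative float values.
SCHEMA = {"id": (int, lambda v: v > 0), "value": (float, lambda v: v >= 0.0)}
```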
CI/CD Automation

GitLab Harbor Pipeline

Architected complete CI/CD pipeline with GitLab for automated testing, Docker image builds, and Harbor registry deployment. Changed projects auto-detected, built, and deployed to Airflow QA/Prod environments.

Zero
Manual deploys
15min
Deploy time

My Journey

From soldier to Data Architect

2008

Argentine Army

Soldier • Systems & Networks

Military service taught discipline, strategy under pressure, and systematic problem-solving. First exposure to networks and security protocols.

2011

University + First Lines of Code

Computer Systems Engineering

Started Computer Systems Engineering degree. First Python programs. Discovered passion for automation and data structures.

2015

Professional Developer

Software Engineer

Rapid growth from junior to senior developer. Built integrations, databases, and full-stack applications. Mastered Python, Flask, Django, and JavaScript.

2017

Technical Leadership

CTO & Tech Lead

Led technical teams, made architectural decisions, and drove product roadmaps. Vue.js, Node.js, AWS infrastructure, and team mentoring.

2020

Data Engineering Specialist

Engineering Team Lead

Led complete ETL system redesign. Architected pipelines with Airflow, Kafka, Snowflake. 40+ DAGs migrated, 100M+ events/day. Built DBT framework from scratch.

2025

Data Platform Architect

Present • Building the Future

Designing enterprise data platforms. Modular architectures, real-time streaming, CI/CD automation. Sharing knowledge through mentorship and writing.

Plan
Code
Test
Deploy
Build
Ship
Data
Value

How I Work

My workflow: From problem to production

01
STEP

UNDERSTAND

Deep immersion. I ask "why?" five times. I understand the constraints and the impact.

02
STEP

DESIGN

Architecture first. Modular, scalable, maintainable. Systems that last for years.

03
STEP

BUILD

Clean code with tests. CI/CD from day one. I ship MVPs fast and iterate.

04
STEP

OPTIMIZE

I measure everything. I track metrics. I never stop improving. Data-driven decisions.

Achievements

Impact delivered, without the corporate badges

Led ETL 2.0 Migration

Architected and led complete redesign of enterprise ETL system. 40+ legacy DAGs migrated to modern, modular architecture.

Built Team from Ground Up

Grew and mentored engineering team of 5+ developers. Established best practices, code standards, and CI/CD workflows.

40% Performance Improvement

Optimized extraction processes through batch optimization, multiprocessing, and smart scheduling. Reduced runtime by 40%.

5+ Billion Events Processed

Designed pipelines processing 100M+ events daily. 5TB+ of data validated and transformed with zero data loss.

Built DBT Framework

Created enterprise DBT framework with cookiecutter templates, smart CI/CD, and automated deployment. 15+ active projects.

Published "Que No Te Boludeen"

Authored a 248-page book on detecting manipulation and thinking strategically. Intelligence and psychology techniques applied to everyday life.

How I Code

Real Python from my ETL architecture

extract.py
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
ETL Pipeline - High-Performance Parallel Extraction
Production-grade pipeline with multiprocessing.
Achieves 40% performance improvement vs sequential.
"""
import multiprocessing
from typing import Any, Dict

from helpers import run_parallel, Timer, get_logger
from connectors import ETLdb, KafkaConsumer, AthenaClient

# Constants
LOOKBACK_DAYS = 7
KAFKA_TIMEOUT_MS = 5000
EVENTS_TOPIC = "events-topic"

logger = get_logger(__name__)


def extract_snowflake(results: Dict[str, Any]) -> None:
    """
    Extract events from Snowflake DWH (last 7 days).

    Args:
        results: Dict updated with 'snowflake_data' key.
    """
    with Timer("Snowflake extraction", logger.info):
        try:
            with ETLdb(database="snowflake") as db:
                query = """
                    SELECT
                        event_id,
                        event_type,
                        user_id,
                        event_timestamp,
                        properties
                    FROM raw_events
                    WHERE event_date >= DATEADD(day, -7, CURRENT_DATE())
                      AND is_valid = TRUE
                    ORDER BY event_timestamp DESC
                """
                data = db.extract_query(query)
                results['snowflake_data'] = data
                logger.info(f"✓ Extracted {len(data):,} rows from Snowflake")
        except Exception as e:
            logger.error(f"✗ Snowflake extraction failed: {e}")
            results['snowflake_data'] = []


def extract_kafka(results: Dict[str, Any]) -> None:
    """
    Extract real-time events from Kafka with Avro.

    Args:
        results: Dict updated with 'kafka_events' key.
    """
    consumer = KafkaConsumer(
        topic=EVENTS_TOPIC,
        avro_schema="event_schema_v2",
        auto_offset_reset="latest"
    )
    events = consumer.poll(timeout_ms=KAFKA_TIMEOUT_MS)
    results['kafka_events'] = events
    logger.info(f"✓ Consumed {len(events):,} Kafka events")


# extract_athena follows the same pattern (omitted here for brevity).


def extract_data() -> Dict[str, Any]:
    """
    Orchestrate parallel data extraction from all sources.

    Executes multiple extraction functions concurrently using
    multiprocessing. Delivers 40% performance gain vs sequential.

    Returns:
        Dictionary containing extracted data from all sources:
        - 'snowflake_data': Historical events from DWH
        - 'kafka_events': Real-time streaming events
        - 'athena_metrics': Aggregated metrics from S3
    """
    logger.info("Starting parallel extraction pipeline...")
    with Timer("Total extraction", logger.info):
        results = run_parallel(
            extractors=[
                extract_snowflake,
                extract_kafka,
                extract_athena,
            ],
            max_workers=multiprocessing.cpu_count()
        )
    logger.info("✓ Pipeline completed successfully")
    return results

Beyond the code

Discipline, strategy and resilience applied to everything I do

Taekwondo

Black Belt • 15+ years

Years of training taught me that mastery is built through daily repetition, strategic thinking under pressure, and the discipline to keep improving even when progress seems invisible.

Strategic thinking
Mental resilience
Focus and discipline
Continuous improvement

Show Jumping

Equestrian • 10+ years

Working with horses demands absolute patience, subtle communication, and the ability to remain calm in unpredictable situations. Every ride is a lesson in trust and precision.

Patience and timing
Clear communication
Composure under pressure
Precision and control

Philosophical Development

Fraternal Organizations • 8+ years

Member of fraternal organizations dedicated to personal growth, ethical principles, and the pursuit of knowledge. These spaces taught me the value of lifelong learning, critical thinking, and service to others.

Continuous learning
Ethical decision-making
Critical thinking
Service mindset
★★★★★
"Finally a book that gets to the point. Real tools I use every day."
— Roberto M.
★★★★★
"I read it in one sitting. The techniques opened my eyes."
— Laura G.
★★★★★
"The handling of silence changed the way I communicate."
— Martín P.
★★★★★
"It's not self-help theory; these are practical tools."
— Roberto M.
★★★★★
"The practical examples opened my eyes."
— Laura G.
★★★★★
"I've already spotted two maneuvers that used to slip right past me."
— Roberto M.
★★★★★
"The focus on everyday operations is brilliant."
— Martín P.
★★★★★
"The terrain-reading techniques are invaluable."
— Laura G.

Que No Te Boludeen

A practical manual for detecting manipulation and defending yourself in everyday life. Intelligence, psychology, and strategy techniques applied to daily life.

978-631-6397-74-9 | Libella Publishing | 248 pages

View the book
Que No Te Boludeen
LIB-2025-ARG-001

Let's build something together

Data project? Architecture consulting?

Connect on LinkedIn

Let's discuss your next data architecture challenge

Connect