From soldier
to Data Architect

20+ years building data systems that process millions of events every day. From the Argentine Army to designing enterprise platforms. From complex problems to elegant architectures.

Marcelo Tallón


$ whoami → data-architect

20+
Years
100M+
Records/day
50+
Pipelines
5TB+
Data

The stack I master

Technologies I use to build enterprise data platforms

Data Warehousing

  • Snowflake (5+ years)
  • PostgreSQL (15+ years)
  • MySQL (12+ years)

Transformation

  • DBT (4+ years)
  • Advanced SQL (20+ years)
  • Pandas (8+ years)

Orchestration

  • Apache Airflow (6+ years)
  • Dagster (2+ years)
  • Apache Kafka (3+ years)

Programming

  • Python (15+ years)
  • Django (12+ years)
  • Bash/Shell (18+ years)

Cloud & DevOps

  • Docker (8+ years)
  • AWS (6+ years)
  • CI/CD GitLab (7+ years)

Analytics & ML

  • Data Modeling (15+ years)
  • Machine Learning (5+ years)
  • Data Quality (10+ years)

Security & Networks

  • Ethical Hacking (12+ years)
  • Network Security (15+ years)
  • System Architecture (18+ years)

Development

  • REST APIs (12+ years)
  • Microservices (6+ years)
  • Git & Version Control (15+ years)

Projects with impact

Real cases with real metrics

ETL Migration

Enterprise ETL 2.0 Platform

Led complete redesign and migration of legacy ETL system (1.0 → 2.0). Architected modular pipelines with Airflow, Kafka (Avro), Snowflake, and AWS Athena/S3. Multiprocessing extraction, Parquet intermediate storage, and dynamic DAG generation.

40%
Faster extraction
40+
DAGs migrated
DBT Infrastructure

Cookiecutter DBT Framework

Built enterprise DBT framework with cookiecutter templates and modular architecture. Smart CI/CD detects changes and builds only what's necessary. Harbor registry integration, Vault secrets, and automated Airflow deployment.

70%
Build time saved
15+
Active projects
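
The "builds only what's necessary" step comes down to mapping changed files to the projects that own them. A minimal sketch of that idea, assuming a hypothetical monorepo layout `projects/<name>/...`; the layout and names are illustrative, not the actual framework:

```python
from pathlib import PurePosixPath

def changed_projects(changed_files, project_root="projects"):
    """Map changed file paths to the set of projects needing a rebuild.

    Assumes a monorepo layout like projects/<name>/...; a change outside
    that tree (shared templates, CI config) means everything rebuilds.
    """
    projects = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        if len(parts) >= 2 and parts[0] == project_root:
            projects.add(parts[1])
        else:
            return None  # shared file touched: signal a full rebuild
    return projects
```

In a GitLab CI job, the changed-file list would typically come from `git diff --name-only` between the pipeline's base and head commits.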
Real-time Streaming

Kafka Avro Pipeline

Designed real-time streaming architecture using Kafka with Avro serialization. Event-driven data ingestion with schema evolution support. Integrated with Snowflake for low-latency analytics.

<2min
End-to-end latency
100M+
Events/day
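
Schema evolution support boils down to normalizing old events to the newest shape on read. A stdlib sketch of the pattern under assumed field names (a hypothetical v2 that renames `ts` to `event_timestamp`); the production pipeline relies on Avro's reader/writer schema resolution rather than hand-written upgrades:

```python
def upgrade_event(event: dict) -> dict:
    """Normalize an incoming event to the latest (v2) schema.

    Hypothetical migration: v2 renames 'ts' to 'event_timestamp' and
    adds an optional 'properties' field that defaults to empty.
    """
    if event.get("schema_version", 1) == 1:
        event = {
            "schema_version": 2,
            "event_id": event["event_id"],
            "event_timestamp": event["ts"],
            "properties": event.get("properties", {}),
        }
    return event
```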
Data Orchestration

Dynamic DAG Generator

Created dynamic Airflow DAG system based on configurable metadata. Auto-generates Extract-Transform-Load workflows from YAML configs. Enables rapid onboarding of new data sources without code changes.

10x
Faster onboarding
50+
Production DAGs
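
Config-driven DAG generation amounts to expanding a small per-source config into a task graph. A minimal sketch with hypothetical config keys; in practice the configs live in YAML files and the resulting specs become real Airflow `DAG` objects registered in the module's globals:

```python
# Hypothetical source configs; in real use, one YAML file per source.
SOURCES = [
    {"name": "orders", "schedule": "0 2 * * *"},
    {"name": "clicks", "schedule": "*/15 * * * *"},
]

def build_dag_spec(source: dict) -> dict:
    """Expand one source config into a DAG id, schedule, and task chain."""
    name = source["name"]
    return {
        "dag_id": f"etl_{name}",
        "schedule": source["schedule"],
        "tasks": [f"extract_{name}", f"transform_{name}", f"load_{name}"],
    }

specs = [build_dag_spec(s) for s in SOURCES]
```

Onboarding a new source then means adding one config entry, with no pipeline code changes.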
Data Validation

Pandera Schema Validator

Implemented automated data quality framework using Pandera for schema validation, type checking, and constraint enforcement. Integrated with ETL pipeline for fail-fast data integrity checks.

100%
Schema coverage
5TB+
Validated data
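
The fail-fast idea is simply to reject a batch at the first schema violation instead of loading partial data. A stdlib sketch of the pattern (the real framework uses Pandera's `DataFrameSchema`; the column names and checks here are illustrative):

```python
def validate(rows, schema):
    """Raise on the first row violating the schema (fail fast).

    schema maps column -> (expected_type, optional predicate).
    """
    for i, row in enumerate(rows):
        for col, (typ, check) in schema.items():
            if col not in row:
                raise ValueError(f"row {i}: missing column '{col}'")
            if not isinstance(row[col], typ):
                raise TypeError(f"row {i}: '{col}' is not {typ.__name__}")
            if check is not None and not check(row[col]):
                raise ValueError(f"row {i}: '{col}' failed its constraint")
    return rows

# Illustrative schema: positive integer ids, non-negative float values.
SCHEMA = {"id": (int, lambda v: v > 0), "value": (float, lambda v: v >= 0.0)}
```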
CI/CD Automation

GitLab Harbor Pipeline

Architected complete CI/CD pipeline with GitLab for automated testing, Docker image builds, and Harbor registry deployment. Changed projects auto-detected, built, and deployed to Airflow QA/Prod environments.

Zero
Manual deploys
15min
Deploy time

My Journey

From soldier to Data Architect

2008

Argentine Army

Soldier • Systems & Networks

Military service taught discipline, strategy under pressure, and systematic problem-solving. First exposure to networks and security protocols.

2011

University + First Lines of Code

Computer Systems Engineering

Started Computer Systems Engineering degree. First Python programs. Discovered passion for automation and data structures.

2015

Professional Developer

Software Engineer

Rapid growth from junior to senior developer. Built integrations, databases, and full-stack applications. Mastered Python, Flask, Django, and JavaScript.

2017

Technical Leadership

CTO & Tech Lead

Led technical teams, made architectural decisions, and drove product roadmaps. Vue.js, Node.js, AWS infrastructure, and team mentoring.

2020

Data Engineering Specialist

Engineering Team Lead

Led complete ETL system redesign. Architected pipelines with Airflow, Kafka, Snowflake. 40+ DAGs migrated, 100M+ events/day. Built DBT framework from scratch.

2025

Data Platform Architect

Present • Building the Future

Designing enterprise data platforms. Modular architectures, real-time streaming, CI/CD automation. Sharing knowledge through mentorship and writing.

Plan
Code
Test
Deploy
Build
Ship
Data
Value

How I Work

My workflow: From problem to production

01
STEP

UNDERSTAND

Deep immersion. I ask "why?" five times. I understand the constraints and the impact.

02
STEP

DESIGN

Architecture first. Modular, scalable, maintainable. Systems that last for years.

03
STEP

BUILD

Clean code with tests. CI/CD from day one. I ship MVPs fast and iterate.

04
STEP

OPTIMIZE

I measure everything. I track metrics. I never stop improving. Data-driven decisions.

Achievements

Impact delivered, without the corporate badges

Led ETL 2.0 Migration

Architected and led complete redesign of enterprise ETL system. 40+ legacy DAGs migrated to modern, modular architecture.

Built Team from Ground Up

Grew and mentored engineering team of 5+ developers. Established best practices, code standards, and CI/CD workflows.

40% Performance Improvement

Optimized extraction processes through batch optimization, multiprocessing, and smart scheduling. Reduced runtime by 40%.

5+ Billion Events Processed

Designed pipelines processing 100M+ events daily. 5TB+ of data validated and transformed with zero data loss.

Built DBT Framework

Created enterprise DBT framework with cookiecutter templates, smart CI/CD, and automated deployment. 15+ active projects.

Published "Que No Te Boludeen"

Authored a 248-page book on detecting manipulation and thinking strategically. Intelligence and psychology techniques applied to everyday life.

How I Code

Real Python from my ETL architecture

extract.py
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
ETL Pipeline - High-Performance Parallel Extraction
Production-grade pipeline with multiprocessing.
Achieves 40% performance improvement vs sequential.
"""
import multiprocessing
from typing import Any, Dict

from helpers import run_parallel, Timer, get_logger
from connectors import ETLdb, KafkaConsumer, AthenaClient

# Constants
LOOKBACK_DAYS = 7
KAFKA_TIMEOUT_MS = 5000
EVENTS_TOPIC = "events-topic"

logger = get_logger(__name__)


def extract_snowflake(results: Dict[str, Any]) -> None:
    """
    Extract events from Snowflake DWH (last 7 days).

    Args:
        results: Dict updated with 'snowflake_data' key.
    """
    with Timer("Snowflake extraction", logger.info):
        try:
            with ETLdb(database="snowflake") as db:
                query = """
                    SELECT
                        event_id,
                        event_type,
                        user_id,
                        event_timestamp,
                        properties
                    FROM raw_events
                    WHERE event_date >= DATEADD(day, -7, CURRENT_DATE())
                      AND is_valid = TRUE
                    ORDER BY event_timestamp DESC
                """
                data = db.extract_query(query)
                results['snowflake_data'] = data
                logger.info(f"✓ Extracted {len(data):,} rows from Snowflake")
        except Exception as e:
            logger.error(f"✗ Snowflake extraction failed: {e}")
            results['snowflake_data'] = []


def extract_kafka(results: Dict[str, Any]) -> None:
    """
    Extract real-time events from Kafka with Avro.

    Args:
        results: Dict updated with 'kafka_events' key.
    """
    consumer = KafkaConsumer(
        topic=EVENTS_TOPIC,
        avro_schema="event_schema_v2",
        auto_offset_reset="latest"
    )
    events = consumer.poll(timeout_ms=KAFKA_TIMEOUT_MS)
    results['kafka_events'] = events
    logger.info(f"✓ Consumed {len(events):,} Kafka events")


# extract_athena follows the same pattern (omitted here for brevity).


def extract_data() -> Dict[str, Any]:
    """
    Orchestrate parallel data extraction from all sources.

    Executes multiple extraction functions concurrently using
    multiprocessing. Delivers 40% performance gain vs sequential.

    Returns:
        Dictionary containing extracted data from all sources:
        - 'snowflake_data': Historical events from DWH
        - 'kafka_events': Real-time streaming events
        - 'athena_metrics': Aggregated metrics from S3
    """
    logger.info("Starting parallel extraction pipeline...")
    with Timer("Total extraction", logger.info):
        results = run_parallel(
            extractors=[
                extract_snowflake,
                extract_kafka,
                extract_athena,
            ],
            max_workers=multiprocessing.cpu_count()
        )
    logger.info("✓ Pipeline completed successfully")
    return results

Beyond the code

Discipline, strategy and resilience applied to everything I do

Taekwondo

Black Belt • 15+ years

Years of training taught me that mastery is built through daily repetition, strategic thinking under pressure, and the discipline to keep improving even when progress seems invisible.

Strategic thinking
Mental resilience
Focus and discipline
Continuous improvement

Show Jumping

Equestrian • 10+ years

Working with horses demands absolute patience, subtle communication, and the ability to remain calm in unpredictable situations. Every ride is a lesson in trust and precision.

Patience and timing
Clear communication
Composure under pressure
Precision and control

Philosophical Development

Fraternal Organizations • 8+ years

Member of fraternal organizations dedicated to personal growth, ethical principles, and the pursuit of knowledge. These spaces taught me the value of lifelong learning, critical thinking, and service to others.

Continuous learning
Ethical decision-making
Critical thinking
Service mindset
★★★★★
"Finally a book that gets to the point. Real tools I use every day."
— Roberto M.
★★★★★
"I read it in one sitting. The techniques opened my eyes."
— Laura G.
★★★★★
"The handling of silence changed the way I communicate."
— Martín P.
★★★★★
"It's not self-help theory; these are practical tools."
— Roberto M.
★★★★★
"The practical examples opened my eyes."
— Laura G.
★★★★★
"I've already spotted two maneuvers that used to slip right past me."
— Roberto M.
★★★★★
"The focus on everyday operations is brilliant."
— Martín P.
★★★★★
"The terrain-reading techniques are invaluable."
— Laura G.

Que No Te Boludeen

A practical manual for detecting manipulation and defending yourself in everyday life. Intelligence, psychology, and strategy techniques applied to daily life.

978-631-6397-74-9 | Libella Publishing | 248 pages

View the book
Que No Te Boludeen
LIB-2025-ARG-001

Let's build something together

Data project? Architecture consulting?

Connect on LinkedIn

Let's discuss your next data architecture challenge

Connect