Jan Jacek Wejchert

MSc Business Analytics & Data Science Student

Working at the intersection of data, analytics, and software

Madrid, Spain
Open to opportunities in Europe

About Me

I am an ambitious student raised in Warsaw, Poland, with a long-standing passion for mathematics and analytical thinking. Currently pursuing a Master's degree in Business Analytics and Data Science, I combine quantitative reasoning with programming to solve real-world problems.

Technical Skills

Technologies and tools I work with

Programming & Analytical Languages

Python SQL R (RStudio) Stata Mathematica

Tools & Environments

Jupyter Notebook PyCharm GitHub VS Code RStudio SQL development environments

Databases & Storage

Relational databases (DB2, MySQL) MongoDB HDFS & object storage (S3) Data lakes

Data Analysis & Modeling

Data cleaning & preparation Exploratory data analysis Time series analysis Forecasting

Experience

Professional experience and career highlights

Brevan Howard Intern

Global Macro Hedge Fund

Intern at Brevan Howard, gaining hands-on exposure to global macro trading, portfolio construction, and risk frameworks while supporting portfolio managers with analytical tools and proprietary financial modeling.

2024 Summer
London & Abu Dhabi

Passion Capital Intern

Early-stage venture capital firm

Intern at Passion Capital, contributing to early-stage venture capital sourcing and due diligence through startup analysis, founder meetings, and investment research across AI and fintech.

2023 Summer
London

Education

My academic background

Master of Science in Business Analytics and Data Science

IE School of Science and Technology, Madrid, Spain

Running GPA: 3.92 out of 4

2025 - 2026

Bachelor of Science in Economics

University of St Andrews, St Andrews, Scotland

Graduated with Honours of the Second Class (Division I)

2021 - 2025

International A Levels

Akademeia High School, Warsaw, Poland

Economics, Mathematics, Further Mathematics, Polish (A*, A*, A*, A)

2019 - 2021

Get In Touch

Let's connect and create something amazing together

Projects

A showcase of my recent work and side projects

Data Analysis

F1 Data Analysis Project

Comprehensive analysis of Formula 1 historical race data to identify drivers with exceptional position-gaining performance. Utilized Python and Pandas to process race results, implement scoring algorithms, and conduct decade-based comparative analysis.

Data Analysis Python Pandas Statistical Analysis
Time Series

Time Series & Forecasting Project

Applied time series analysis using real-world data to explore trends, seasonality, and forecasting performance. Implemented and evaluated classical forecasting methods including moving averages and exponential smoothing.

Python Time Series Analysis Forecasting Data Visualization Model Evaluation
Data Pipeline

Earthquake Data Pipeline & Analysis

End-to-end data pipeline that ingests live earthquake data, stores it in object storage, and analyzes it using Apache Spark. Demonstrates realistic data workflows and scalable analytics.

Apache NiFi MinIO (S3) Apache Spark
Economic Modeling

Solving a Growth Model Using Shooting and Genetic Algorithms

Solves a deterministic neoclassical growth model using two numerical approaches: shooting algorithm and genetic algorithm. Compares convergence, stability, and behavior of classical optimization methods versus evolutionary algorithms.

Mathematica Shooting Algorithm Genetic Algorithm Economic Modeling Numerical Optimization
Optimization

Graph Optimization with Dynamic Programming

Complete shortest-path solver in Mathematica using dynamic programming and Bellman iteration. Constructs distance matrices, computes optimal cost-to-go functions, and recovers optimal paths with minimum total cost.

Mathematica Dynamic Programming Bellman Iteration Graph Optimization Shortest Path

Resume

Download or view my full resume

Download PDF

Jan Jacek Wejchert

MSc Business Analytics & Data Science Student

Professional Summary

Ambitious MSc student in Business Analytics and Data Science with a strong foundation in mathematics, economics, and programming. Passionate about working at the intersection of data, analytics, and software to solve complex business problems. Currently developing expertise in data analysis, modern data architectures, and coding through rigorous academic coursework. Highly motivated to apply analytical thinking and technical skills in a professional setting, with a commitment to continuous learning and collaborative problem-solving.

Education

Master of Science in Business Analytics and Data Science

IE School of Science and Technology, Madrid, Spain

2025 - 2026

Running GPA: 3.92 out of 4

Bachelor of Science in Economics

University of St Andrews, St Andrews, Scotland

2021 - 2025

Graduated with Honours of the Second Class (Division I)

International A Levels

Akademeia High School, Warsaw, Poland

2019 - 2021

Economics, Mathematics, Further Mathematics, Polish (A*, A*, A*, A)

Technical Skills

Programming & Analytical Languages

Python, SQL, R (RStudio), Stata, Mathematica

Tools & Environments

Jupyter Notebook, PyCharm, GitHub, VS Code, RStudio, SQL development environments

Databases & Storage

Relational databases (DB2, MySQL), MongoDB, HDFS & object storage (S3), Data lakes

Data Analysis & Modeling

Data cleaning & preparation, Exploratory data analysis, Time series analysis, Forecasting

Academic Works

Research papers, academic projects, and scholarly contributions


The Comeback King: F1's Greatest Position-Gainer

A Python data analysis project exploring F1 driver comeback performance

Download Presentation

Project Overview

This project analyzes Formula 1 historical data to identify the greatest "comeback driver" in F1 history - the driver who gained the most positions during races across their career. Using a multi-category scoring system and a decade-based knockout competition, we crowned Sebastian Vettel as the ultimate Comeback King.

Language: Python
Tools: Pandas, Jupyter Notebook
Winner: Sebastian Vettel

Methodology

Categories for Evaluation:

1. Average positions gained per race (includes dropped positions)
2. Total positions gained in all races (includes dropped positions)
3. Record positions gained within one race
4. Circuits with highest average positions gained
5. Circuit records for most positions gained
6. Comeback Rate: percentage of races with positive position gain

Scoring System:

In each category, points were awarded to top 3 drivers:

  • 1st place: 3 points
  • 2nd place: 2 points
  • 3rd place: 1 point
  • Ties: All tied drivers receive points for that position

Competition Structure:

Drivers were grouped by decade (1950s-2020s), with decade winners advancing through knockout rounds until a final champion was crowned.

Python Code

1. Filter drivers with at least 24 races (1 season)

f1new = f1.groupby("driver")[["grid_starting_position"]].count().reset_index()
f1_group = f1new[f1new["grid_starting_position"] >= 24]["driver"]
f1_filtered = f1[f1["driver"].isin(f1_group)]

2. Calculate positions gained and filter by decade

race_finishers = f1_filtered[~f1_filtered["final_position"].isna()].copy()
race_finishers["positions_gained"] = race_finishers["grid_starting_position"] - race_finishers["final_position"]
race_finishers = race_finishers[(race_finishers["year"] >= 1960) & (race_finishers["year"] < 1970)]

3. Set up scoring system

from collections import defaultdict

driver_points = defaultdict(int)  # driver -> total points
rank_to_points = {1: 3, 2: 2, 3: 1}

4. Function to add points from race results

def add_points_from_series(ser, points_dict):
    current_rank = 0
    last_value = object()  # something that can't equal a real value
    for driver, value in ser.items():
        # new distinct value -> new place (1st, 2nd, 3rd, ...)
        if value != last_value:
            current_rank += 1
            last_value = value
        # only 1st/2nd/3rd place get points
        if current_rank > 3:
            break
        points_dict[driver] += rank_to_points[current_rank]

5. Evaluate all categories

# Category 1: Average positions gained per race
s1 = race_finishers.groupby("driver")["positions_gained"].mean().sort_values(ascending=False).head()
add_points_from_series(s1, driver_points)

# Category 2: Total positions gained in all races
s2 = race_finishers.groupby("driver")["positions_gained"].sum().sort_values(ascending=False).head()
add_points_from_series(s2, driver_points)

# Category 3: Record positions gained within one race
s3 = race_finishers.groupby("driver")["positions_gained"].max().sort_values(ascending=False)
add_points_from_series(s3, driver_points)

# Category 4: Circuits with highest average positions gained
avg_gains = race_finishers.groupby(["circuit_name", "driver"])["positions_gained"].mean().reset_index()
best_avg = avg_gains.groupby("circuit_name")["positions_gained"].max().reset_index().rename(
    columns={"positions_gained": "max_avg_positions_gained"})
result = avg_gains.merge(best_avg, on="circuit_name")
result = result[result["positions_gained"] == result["max_avg_positions_gained"]]
s4 = result["driver"].value_counts().head(20)
add_points_from_series(s4, driver_points)

# Category 5: Circuit records for most positions gained
max_gains = race_finishers.groupby("circuit_name")["positions_gained"].max().reset_index().rename(
    columns={"positions_gained": "max_positions_gained"})
result = race_finishers.merge(max_gains, on="circuit_name")
result = result[result["positions_gained"] == result["max_positions_gained"]]
s5 = result["driver"].value_counts().head(10)
add_points_from_series(s5, driver_points)

# Category 6: Comeback Rate (percentage of races with positive position gain)
comeback_rate = race_finishers.assign(
    comeback = race_finishers["positions_gained"] > 0
).groupby("driver")["comeback"].mean() * 100
s6 = comeback_rate.sort_values(ascending=False).head(10)
add_points_from_series(s6, driver_points)

6. Determine decade winner

champion = dict(driver_points)
print(champion)

max_value = max(champion.values())
keys_with_max_value = [k for k, v in champion.items() if v == max_value]
print(keys_with_max_value)
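
7. Scoring every decade (illustrative sketch)

The steps above score a single decade (the 1960s). As a rough sketch of how the decade-by-decade stage of the competition could be automated, the helper below reuses f1_filtered, add_points_from_series, and the imports defined earlier; it is illustrative rather than the notebook's actual code, and the knockout rounds between decade winners are not shown.

def decade_champion(f1_filtered, start_year):
    """Run the position-gain scoring for one decade and return the top driver(s)."""
    finishers = f1_filtered[~f1_filtered["final_position"].isna()].copy()
    finishers["positions_gained"] = finishers["grid_starting_position"] - finishers["final_position"]
    finishers = finishers[(finishers["year"] >= start_year) & (finishers["year"] < start_year + 10)]
    if finishers.empty:
        return []

    points = defaultdict(int)
    add_points_from_series(finishers.groupby("driver")["positions_gained"].mean()
                           .sort_values(ascending=False).head(), points)
    add_points_from_series(finishers.groupby("driver")["positions_gained"].sum()
                           .sort_values(ascending=False).head(), points)
    add_points_from_series(finishers.groupby("driver")["positions_gained"].max()
                           .sort_values(ascending=False).head(), points)
    # (circuit-based categories and the comeback rate would be added the same way)

    best = max(points.values())
    return [driver for driver, pts in points.items() if pts == best]

# Decade champions from the 1950s through the 2020s
print({year: decade_champion(f1_filtered, year) for year in range(1950, 2030, 10)})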

Competition Results

Decade Champions

1950s: Johnny Claes
1960s: Carel Godin de Beaufort
1970s: Hector Rebaque
1980s: Marc Surer
1990s: Alex Caffi
2000s: Tarso Marques
2010s: Sebastian Vettel
2020s: Max Verstappen

Final Winner

🏆 Sebastian Vettel 🏆
The Comeback King

Time Series Analysis & Forecasting (CO₂ Concentration Data)

An applied time series analysis project exploring trends, seasonality, and forecasting performance

Overview

This project focuses on the analysis and forecasting of atmospheric CO₂ concentration levels using historical time series data. The objective was to identify long-term trends and seasonal patterns in the data, and to evaluate the performance of classical forecasting methods on a real-world environmental dataset.

Data & Context

The analysis uses monthly CO₂ concentration data, covering several decades, allowing for clear observation of both long-term upward trends and recurring seasonal fluctuations. The dataset was cleaned, structured, and indexed as a time series to enable proper temporal analysis.

Analysis & Methodology

The project followed a structured time series workflow:

1. Exploratory Analysis: visualization of long-term trends and seasonal behavior
2. Time Series Decomposition: breaking down the series into trend, seasonal, and residual components
3. Baseline Smoothing Methods: implementation of smoothing techniques to reduce noise and capture underlying dynamics

Forecasting Techniques Implemented

Simple Moving Averages

To smooth short-term volatility and identify underlying patterns

Exponential Smoothing

To assign greater weight to recent observations for more responsive forecasts

Parameter Comparison

Comparison of forecasts across different smoothing parameters

Out-of-Sample Evaluation

Using train/test splits to assess predictive accuracy
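
As a minimal sketch of this workflow in Python (pandas and statsmodels), assuming a monthly CO₂ series; the file name and column names are placeholders rather than the project's actual data files:

import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.holtwinters import SimpleExpSmoothing

# Assumed input: a monthly CO₂ series (file name and column name are placeholders)
co2 = pd.read_csv("co2_monthly.csv", parse_dates=["date"], index_col="date")["co2"]
co2 = co2.asfreq("MS").interpolate()              # regular monthly frequency, fill small gaps

# Decomposition into trend, seasonal, and residual components
decomposition = seasonal_decompose(co2, model="additive", period=12)

# Train/test split for out-of-sample evaluation (hold out the last two years)
train, test = co2.iloc[:-24], co2.iloc[-24:]

# Simple moving average: smooths noise but lags during rapid change
sma = train.rolling(window=12).mean()
sma_forecast = pd.Series(sma.iloc[-1], index=test.index)   # naive flat projection of the last smoothed value

# Simple exponential smoothing: weights recent observations more heavily
ses = SimpleExpSmoothing(train).fit(smoothing_level=0.3, optimized=False)
ses_forecast = ses.forecast(len(test))

# Compare predictive accuracy on the held-out period
mae = lambda pred: (pred - test).abs().mean()
print(f"Moving average MAE:        {mae(sma_forecast):.3f}")
print(f"Exponential smoothing MAE: {mae(ses_forecast):.3f}")

Varying the window length and the smoothing level is how the parameter comparison above can be carried out in practice.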

Key Findings

Strong Upward Trend & Seasonality

The CO₂ series exhibits a strong, persistent upward trend alongside clear seasonal cycles

Moving Averages Performance

Moving averages effectively smooth noise but lag during periods of rapid change

Exponential Smoothing Advantages

Exponential smoothing provides more responsive forecasts and better short-term performance

Trade-off Analysis

Model choice involves a clear trade-off between stability and adaptability

Tools & Technologies

Python Pandas Time Series Analysis Libraries Data Visualization

Skills Demonstrated

Time series structuring and indexing
Trend and seasonality analysis
Forecasting and model evaluation
Analytical interpretation of temporal data

Graph Optimization with Dynamic Programming

A complete shortest-path solver in Mathematica using dynamic programming and Bellman iteration

Overview

In this project, I built a complete shortest-path solver in Mathematica using dynamic programming and Bellman iteration. Starting from raw graph edge data, the workflow constructs a distance matrix, iteratively computes a cost-to-go function, and then recovers the optimal path and its total cost from a chosen start node to the destination.

Example Output

Optimal Path: {17, 23, 33, 41, 53, 56, 57, 60, 67, 70, 73, 76, 85, 89, 99}

Minimum Cost: 194.22

Problem Statement

Given a directed weighted graph (nodes + edges + weights), the goal is to:

  1. Convert the graph representation into a distance matrix Q, assigning Infinity to non-connected node pairs.
  2. Use Bellman's operator to compute the optimal cost-to-go J for each node.
  3. Use Q and J to reconstruct the cheapest path and its total cost.

Implementation Details

1) Data Import & Distance Matrix Construction (dataToMatrix)

I implemented a module that reads graph data from a text-based input file, parses each line into (source, destination, weight) connections, and constructs a full distance matrix Q where:

  • Q[i, j] = weight if an edge exists
  • Q[i, j] = Infinity if nodes are not connected
  • The destination node has a diagonal value of 0 to act as the terminal condition

Key Features:

  • Robust parsing of node identifiers (e.g., stripping the "node" prefix)
  • Defensive checks for malformed rows and failed imports
  • Correct handling of 1-based indexing in Mathematica while working with 0-based node labels

2) Bellman Operator Update (bellmanIteration)

I implemented the Bellman update step as a vector operation over nodes. For each node v, compute:

J_{n+1}(v) = min_w [ Q(v, w) + J_n(w) ]

The implementation explicitly handles missing edges by treating Infinity weights as invalid transitions. The output of this step is a new cost-to-go vector.

3) Convergence to Final Cost-to-Go (findCostToGo)

This module repeatedly applies the Bellman operator until convergence:

Max Iterations: 500
Tolerance: 10⁻⁶

To avoid instability from unreachable nodes, the convergence check ignores Infinity values when computing the norm difference. The output is the final cost-to-go vector where J[node] represents the minimum cost required to reach the destination node from that node.

4) Optimal Path Recovery (findPathAndTotalCost)

Once Q and J are computed, I reconstruct the optimal path from a chosen start node by repeatedly selecting the next node that minimizes:

Q(current, w) + J(w)

This yields the sequence of nodes visited and the total accumulated cost along the path. The module prints:

  • Optimal Path
  • Minimum Cost
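
The project's modules are written in Mathematica; the following Python sketch is only an illustrative analogue of the same three steps (Bellman update, iteration to convergence, greedy path recovery) on a small hard-coded distance matrix with made-up node labels and weights.

import math

INF = math.inf

# Small illustrative distance matrix Q (edge weights; INF = no edge).
# Node 3 is the destination, so Q[3][3] = 0 acts as the terminal condition.
Q = [
    [INF, 4.0, 2.0, INF],
    [INF, INF, 5.0, 10.0],
    [INF, INF, INF, 3.0],
    [INF, INF, INF, 0.0],
]
n = len(Q)

def bellman_iteration(Q, J):
    """One Bellman update: J_new(v) = min over w of Q(v, w) + J(w)."""
    return [min(Q[v][w] + J[w] for w in range(n)) for v in range(n)]

def norm_diff(J_new, J_old):
    """Max difference, ignoring node pairs that are still both unreachable (Infinity)."""
    diff = 0.0
    for a, b in zip(J_new, J_old):
        if math.isinf(a) and math.isinf(b):
            continue
        if math.isinf(a) or math.isinf(b):
            return INF          # a node just became reachable: keep iterating
        diff = max(diff, abs(a - b))
    return diff

def find_cost_to_go(Q, max_iter=500, tol=1e-6):
    """Apply the Bellman operator until the cost-to-go vector converges."""
    J = [0.0 if Q[v][v] == 0.0 else INF for v in range(n)]   # destination starts at 0
    for _ in range(max_iter):
        J_new = bellman_iteration(Q, J)
        converged = norm_diff(J_new, J) < tol
        J = J_new
        if converged:
            break
    return J

def find_path_and_total_cost(Q, J, start):
    """Greedy recovery: repeatedly move to the node minimising Q(current, w) + J(w)."""
    path, cost, current = [start], 0.0, start
    while J[current] > 0.0:
        nxt = min(range(n), key=lambda w: Q[current][w] + J[w])
        cost += Q[current][nxt]
        path.append(nxt)
        current = nxt
    return path, cost

J = find_cost_to_go(Q)
path, cost = find_path_and_total_cost(Q, J, start=0)
print("Optimal Path:", path)    # [0, 2, 3]
print("Minimum Cost:", cost)    # 5.0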

Results

Using the provided test graph data, the implementation produced:

Optimal Path: {17, 23, 33, 41, 53, 56, 57, 60, 67, 70, 73, 76, 85, 89, 99}
Minimum Cost: 194.22

This confirms the algorithm correctly computes both the optimal policy (via J) and the associated optimal route (via greedy recovery using J).

Why This Project Matters

This project demonstrates the ability to:

Translate algorithmic theory into working code
Implement dynamic programming and iterative optimization methods
Handle real input parsing and edge cases (missing connections, indexing, Infinity handling)
Build an end-to-end solution that outputs interpretable results

Real-World Applications

Routing problems (transport, logistics) Network optimization Planning and decision-making under costs

Tools & Skills

Tools

Mathematica

Skills

Dynamic Programming Bellman Iteration Graph Optimization Shortest Path Data Parsing Algorithmic Implementation

Code Structure (Modules)

dataToMatrix[filePath]

Import graph & build distance matrix Q

bellmanIteration[Q, Jn]

Compute Bellman update J_{n+1}

findCostToGo[Q, maxIter, tol]

Iterate until convergence to final J

findPathAndTotalCost[Q, J, startNode]

Recover optimal path + total cost

Solving a Growth Model Using Shooting and Genetic Algorithms

A comparison of classical optimization methods versus evolutionary algorithms in solving dynamic economic models

Overview

This project solves a deterministic neoclassical growth model using two fundamentally different numerical approaches: a shooting algorithm and a genetic algorithm. The objective is to compute the transition path of capital from an initial condition to its steady state and to compare the convergence, stability, and behavior of classical optimization methods versus evolutionary algorithms.

The project combines economic theory, numerical optimization, and computational experimentation, highlighting the trade-offs between structure-exploiting and search-based solution methods.

Model Framework

The underlying model is a standard infinite-horizon growth model with capital accumulation and Cobb–Douglas production. A representative household maximizes discounted utility subject to a budget constraint, while firms maximize profits in competitive markets.

Production Function

Y_t = K_t^α (A·L_t)^(1-α)

Capital Evolution

K_{t+1} = (1 - δ)K_t + I_t

Key Parameters

Capital Share (α): 0.33
Discount Factor (β): 0.98
Depreciation (δ): 0.1
Technology (A): 1
Initial Capital (K₀): 0.1
Time Horizon (T): 70

Analytical Foundations

Before implementing numerical solutions, the project:

Fully defines the competitive equilibrium

Derives the first-order conditions for households and firms

Obtains the Euler equation governing optimal consumption and capital accumulation

Solves analytically for the steady-state capital stock K*

This analytical groundwork ensures that numerical solutions can be evaluated against correct theoretical benchmarks.
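
For reference, the steady-state condition implied by these parameters (a textbook sketch assuming labour is normalised to one and standard time-separable preferences, not reproduced from the project files):

1 = β (1 - δ + α K*^(α-1))

K* = ( α / (1/β - 1 + δ) )^(1/(1-α)) = ( 0.33 / (1/0.98 - 1 + 0.1) )^(1/0.67) ≈ 4.5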

Shooting Algorithm

Methodology

The shooting algorithm solves the model by exploiting the Euler equation directly. Starting from an initial guess for the capital path, the algorithm iterates forward and adjusts guesses until the terminal condition—convergence to the steady state—is satisfied.

Implementation Details:

  • Solving a system of nonlinear equations using FindRoot
  • Iterating capital forward over 70 periods
  • Ensuring convergence to the analytically derived steady state
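
The original solution uses Mathematica's FindRoot on the full equation system; purely as an illustrative analogue, the Python sketch below implements the same forward-shooting idea, assuming log utility and bisecting on the initial consumption level. Parameter values follow the table above.

# Illustrative Python analogue of the shooting approach (the project itself uses
# Mathematica and FindRoot). Log utility assumed; labour normalised to one.
alpha, beta, delta, k0, T = 0.33, 0.98, 0.1, 0.1, 70
k_star = (alpha / (1 / beta - 1 + delta)) ** (1 / (1 - alpha))   # analytical steady state

def terminal_gap(c0):
    """Simulate forward from (k0, c0) and return terminal capital minus the steady state."""
    k, c = k0, c0
    for _ in range(T):
        k_next = k ** alpha + (1 - delta) * k - c                 # resource constraint
        if k_next <= 0:                                           # consumed too much: infeasible
            return -k_star
        c *= beta * (1 - delta + alpha * k_next ** (alpha - 1))   # Euler equation (log utility)
        k = k_next
    return k - k_star

# Bisection on the initial consumption "shot"
lo, hi = 1e-6, k0 ** alpha + (1 - delta) * k0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if terminal_gap(mid) > 0:      # capital overshoots the steady state -> consume more
        lo = mid
    else:                          # capital falls short or path infeasible -> consume less
        hi = mid

print(f"Initial consumption: {mid:.4f}, terminal capital gap: {terminal_gap(mid):.2e}")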

Results

The shooting algorithm produces a smooth and monotonic transition path for capital. Capital converges steadily toward the steady-state level, closely matching theoretical predictions. This method serves as a benchmark solution due to its precision and stability.

Genetic Algorithm

Motivation

To explore a model-agnostic alternative, the same growth model is solved using a genetic algorithm. Unlike the shooting method, the genetic algorithm does not directly impose the Euler equation. Instead, it searches over possible savings paths and evaluates them based on lifetime utility.

Fitness Function

A custom fitness function is defined to:

  • Simulate capital, output, consumption, and investment paths
  • Compute discounted lifetime utility
  • Penalize paths that fail to converge to the steady state after T periods

This ensures that only economically meaningful solutions achieve high fitness scores.
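
As a sketch of what such a fitness function can look like (in Python rather than the project's Mathematica code; the penalty weight, the consumption floor, and the real-valued encoding of savings rates are assumptions made for illustration):

import numpy as np

alpha, beta, delta, A, k0, T = 0.33, 0.98, 0.1, 1.0, 0.1, 70
k_star = (alpha / (1 / beta - 1 + delta)) ** (1 / (1 - alpha))

def fitness(savings_rates, penalty_weight=100.0):
    """Discounted lifetime (log) utility of a candidate savings path,
    penalised if capital has not reached the steady state by period T."""
    k, utility = k0, 0.0
    for t, s in enumerate(savings_rates):          # one savings rate in [0, 1] per period
        y = A * k ** alpha                         # output
        c = max((1 - s) * y, 1e-9)                 # consumption (floored to keep log finite)
        i = s * y                                  # investment
        utility += beta ** t * np.log(c)
        k = (1 - delta) * k + i                    # capital accumulation
    # penalise paths whose terminal capital is far from the steady state
    return utility - penalty_weight * abs(k - k_star)

# Example: evaluate a constant 25% savings rate over the horizon
print(fitness(np.full(T, 0.25)))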

Genetic Algorithm Structure

The implementation includes all core evolutionary components:

Selection: retains the top 50% of solutions by fitness
Parent Selection: probabilistic selection weighted by fitness
Crossover: binary encoding with random crossover points
Mutation: bit-flipping with a 2.5% mutation rate
Population Size: 80
Generations: 1000+

The algorithm tracks both mean fitness and maximum fitness across generations, allowing analysis of convergence behavior.

Results and Comparison

Capital Stock Dynamics

The shooting algorithm generates a smooth and stable capital path that converges monotonically to the steady state. In contrast, the genetic algorithm produces a much noisier capital trajectory. While capital fluctuates significantly due to stochastic mutation and crossover, it still converges toward a level close to the steady state.

These fluctuations reflect the exploratory nature of genetic algorithms, which trade precision for flexibility and global search capability.

Savings Rate Behavior

The difference between the two methods is even more pronounced when examining savings rates.

Shooting Algorithm

Produces a smooth, declining savings rate consistent with optimal intertemporal behavior

Genetic Algorithm

Produces a highly volatile savings rate, though its average level aligns broadly with the shooting solution

This highlights a key trade-off: genetic algorithms can approximate optimal policies without explicit analytical conditions, but at the cost of short-run stability.

Key Takeaways

The shooting algorithm is highly efficient and precise when analytical structure is available.

The genetic algorithm provides a flexible, model-agnostic optimization framework.

Despite its stochastic nature, the genetic algorithm converges toward economically meaningful solutions.

Increasing population size, running more generations, or reducing mutation rates would likely improve stability and convergence.

Why This Project Matters

This project demonstrates the ability to:

Translate economic theory into computational solutions
Implement and compare fundamentally different optimization techniques
Design fitness functions and convergence criteria
Interpret numerical results through an economic lens

The comparison highlights the strengths and limitations of both classical numerical methods and evolutionary algorithms in dynamic optimization problems.

Tools & Skills

Tools

Mathematica

Skills

Shooting Algorithm Genetic Algorithm Economic Modeling Numerical Optimization Dynamic Programming Computational Economics

Earthquake Data Pipeline & Analysis

End-to-end data pipeline: NiFi → MinIO → Spark

Overview

This project builds an end-to-end data pipeline that ingests live earthquake data, stores it in object storage, and analyzes it using Apache Spark. The goal is to simulate a realistic data workflow and extract meaningful insights from continuously updated, real-world data.

Pipeline Structure

NiFi → MinIO → Spark Notebook

Data Ingestion (Apache NiFi)

Apache NiFi is used to automate the ingestion of earthquake data from the USGS Earthquake API.

Two NiFi Flows

JSON Flow

Ingests earthquake data in JSON format

CSV Flow

Ingests earthquake data in CSV format

Flow Capabilities:

  • Pulls earthquake data on a schedule
  • Processes and splits incoming records
  • Adds timestamps and metadata
  • Writes results to object storage

To make the project reproducible, I provide two NiFi flow files (JSON) that can be imported directly into NiFi to recreate the pipelines.

Data Storage (MinIO – S3 Compatible)

All ingested data is stored in MinIO, an S3-compatible object storage system used as a lightweight data lake.

Decoupled Architecture

MinIO allows the ingestion and analytics layers to be fully decoupled:

  • NiFi focuses only on ingestion
  • Spark reads data directly from storage for analysis
  • Both JSON and CSV datasets are stored in a structured and consistent way

Data Processing & Analysis (Apache Spark)

All analysis is performed in a Spark notebook using the DataFrames API.

Notebook Workflow:

  • Reads earthquake data directly from MinIO
  • Converts raw files into Spark DataFrames
  • Cleans and structures the data
  • Performs scalable exploratory analysis using Spark transformations and SQL
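
A minimal PySpark sketch of this read-and-aggregate step is shown below; the MinIO endpoint, credentials, bucket path, and column names are placeholders rather than the project's actual configuration, and the S3A connector (hadoop-aws) is assumed to be available.

from pyspark.sql import SparkSession, functions as F

# Placeholder MinIO/S3A settings; the real endpoint, credentials and bucket differ.
spark = (
    SparkSession.builder.appName("earthquake-analysis")
    .config("spark.hadoop.fs.s3a.endpoint", "http://minio:9000")
    .config("spark.hadoop.fs.s3a.access.key", "minio_access_key")
    .config("spark.hadoop.fs.s3a.secret.key", "minio_secret_key")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .getOrCreate()
)

# Read the CSV flow's output straight from object storage into a DataFrame
quakes = (spark.read.option("header", "true").option("inferSchema", "true")
          .csv("s3a://earthquakes/csv/"))

# Basic cleaning and a magnitude distribution (mag is an assumed column name)
quakes = quakes.dropna(subset=["mag"]).filter(F.col("mag") >= 0)
(quakes
    .groupBy(F.floor("mag").alias("magnitude_bucket"))
    .agg(F.count("*").alias("events"), F.avg("mag").alias("avg_magnitude"))
    .orderBy("magnitude_bucket")
    .show())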

What the Analysis Reveals

Using Spark, the notebook extracts several key insights from the earthquake data:

Earthquake Activity is Highly Skewed

Most recorded earthquakes have low magnitudes, while high-magnitude events are relatively rare. This becomes clear when aggregating and visualizing magnitude distributions.

Clear Temporal Patterns Emerge

Aggregations over time show that earthquake occurrences are not evenly distributed. Certain periods exhibit clusters of increased activity, highlighting the importance of time-based analysis rather than static summaries.

Geographical Concentration of Events

Grouping events by location reveals that earthquakes are concentrated in specific regions, consistent with known tectonic boundaries. Spark makes it easy to aggregate and compare activity across regions at scale.

Magnitude vs Frequency Trade-off

While smaller earthquakes occur frequently, larger earthquakes contribute disproportionately to overall seismic risk. This contrast is visible when comparing frequency counts with magnitude-weighted summaries.

Scalability of Analysis

Using Spark DataFrames allows these insights to be computed efficiently even as the dataset grows, reinforcing why distributed processing is well-suited for this type of continuously updating data.

Why This Project Is Useful

This project demonstrates:

How to ingest live external data
How to design clean and reproducible ingestion pipelines
How to use object storage as a data lake
How to extract insights from large datasets using Spark

It shows practical skills across data engineering and data analytics, rather than isolated scripts or toy examples.

Tools & Technologies

Apache NiFi MinIO (S3) Apache Spark Spark DataFrames Spark SQL

Files Provided

To fully reproduce the project:

NiFi Flow for JSON Data Ingestion

Import this flow file into NiFi to recreate the JSON ingestion pipeline

NiFi Flow for CSV Data Ingestion

Import this flow file into NiFi to recreate the CSV ingestion pipeline

Spark Notebook

Contains the full analysis with data processing and insights extraction