Jan Jacek Wejchert

MSc Business Analytics & Data Science Student

Working at the intersection of data, analytics, and software

Madrid, Spain
Open to opportunities in Europe

About Me

I am an ambitious student raised in Warsaw, Poland, with a long-standing passion for mathematics and analytical thinking. Currently pursuing a Master's degree in Business Analytics and Data Science, I combine quantitative reasoning with programming to solve real-world problems.

Technical Skills

Technologies and tools I work with

Programming & Analytical Languages

Python SQL R (RStudio) Stata Mathematica

Tools & Environments

Jupyter Notebook PyCharm GitHub VS Code RStudio SQL development environments

Databases & Storage

Relational databases (DB2, MySQL) MongoDB HDFS & object storage (S3) Data lakes

Data Analysis & Modeling

Data cleaning & preparation Exploratory data analysis Time series analysis Forecasting

Experience

Professional experience and career highlights

Brevan Howard Intern

Global Macro Hedge Fund

Intern at Brevan Howard, gaining hands-on exposure to global macro trading, portfolio construction, and risk frameworks while supporting portfolio managers with analytical tools and proprietary financial modeling.

2024 Summer
London & Abu Dhabi

Passion Capital Intern

Early-stage venture capital firm

Intern at Passion Capital, contributing to early-stage venture capital sourcing and due diligence through startup analysis, founder meetings, and investment research across AI and fintech.

2023 Summer
London

Education

My academic background

Master of Science in Business Analytics and Data Science

IE School of Science and Technology, Madrid, Spain

Running GPA: 3.92 out of 4

2025 - 2026

Bachelor of Science in Economics

University of St Andrews, St Andrews, Scotland

Graduated with Honours of the Second Class (Division I)

2021 - 2025

International A Levels

Akademeia High School, Warsaw, Poland

Economics, Mathematics, Further Mathematics, Polish (A*, A*, A*, A)

2019 - 2021

Get In Touch

Let's connect and create something amazing together

Projects

A showcase of my recent work and side projects

Data Analysis

F1 Data Analysis Project

Comprehensive analysis of Formula 1 historical race data to identify drivers with exceptional position-gaining performance. Utilized Python and Pandas to process race results, implement scoring algorithms, and conduct decade-based comparative analysis.

Data Analysis Python Pandas Statistical Analysis
Time Series

Time Series & Forecasting Project

Applied time series analysis using real-world data to explore trends, seasonality, and forecasting performance. Implemented and evaluated classical forecasting methods including moving averages and exponential smoothing.

Python Time Series Analysis Forecasting Data Visualization Model Evaluation
Data Pipeline

Earthquake Data Pipeline & Analysis

End-to-end data pipeline that ingests live earthquake data, stores it in object storage, and analyzes it using Apache Spark. Demonstrates realistic data workflows and scalable analytics.

Apache NiFi MinIO (S3) Apache Spark
Economic Modeling

Solving a Growth Model Using Shooting and Genetic Algorithms

Solves a deterministic neoclassical growth model using two numerical approaches: shooting algorithm and genetic algorithm. Compares convergence, stability, and behavior of classical optimization methods versus evolutionary algorithms.

Mathematica Shooting Algorithm Genetic Algorithm Economic Modeling Numerical Optimization
Optimization

Graph Optimization with Dynamic Programming

Complete shortest-path solver in Mathematica using dynamic programming and Bellman iteration. Constructs distance matrices, computes optimal cost-to-go functions, and recovers optimal paths with minimum total cost.

Mathematica Dynamic Programming Bellman Iteration Graph Optimization Shortest Path

Resume

Download or view my full resume

Download PDF

Jan Jacek Wejchert

MSc Business Analytics & Data Science Student

Professional Summary

Ambitious MSc student in Business Analytics and Data Science with a strong foundation in mathematics, economics, and programming. Passionate about working at the intersection of data, analytics, and software to solve complex business problems. Currently developing expertise in data analysis, modern data architectures, and coding through rigorous academic coursework. Highly motivated to apply analytical thinking and technical skills in a professional setting, with a commitment to continuous learning and collaborative problem-solving.

Education

Master of Science in Business Analytics and Data Science

IE School of Science and Technology, Madrid, Spain

2025 - 2026

Running GPA: 3.92 out of 4

Bachelor of Science in Economics

University of St Andrews, St Andrews, Scotland

2021 - 2025

Graduated with Honours of the Second Class (Division I)

International A Levels

Akademeia High School, Warsaw, Poland

2019 - 2021

Economics, Mathematics, Further Mathematics, Polish (A*, A*, A*, A)

Technical Skills

Programming & Analytical Languages

Python, SQL, R (RStudio), Stata, Mathematica

Tools & Environments

Jupyter Notebook, PyCharm, GitHub, VS Code, RStudio, SQL development environments

Databases & Storage

Relational databases (DB2, MySQL), MongoDB, HDFS & object storage (S3), Data lakes

Data Analysis & Modeling

Data cleaning & preparation, Exploratory data analysis, Time series analysis, Forecasting

Academic Works

Research papers, academic projects, and scholarly contributions


The Comeback King: F1's Greatest Position-Gainer

A Python data analysis project exploring F1 driver comeback performance

Download Presentation

Project Overview

This project analyzes Formula 1 historical data to identify the greatest "comeback driver" in F1 history - the driver who gained the most positions during races across their career. Using a multi-category scoring system and a decade-based knockout competition, we crowned Sebastian Vettel as the ultimate Comeback King.

Language: Python
Tools: Pandas, Jupyter Notebook
Winner: Sebastian Vettel

Methodology

Categories for Evaluation:

1. Average positions gained per race (includes dropped positions)
2. Total positions gained in all races (includes dropped positions)
3. Record positions gained within one race
4. Circuits with highest average positions gained
5. Circuit records for most positions gained
6. Comeback Rate: percentage of races with positive position gain

Scoring System:

In each category, points were awarded to top 3 drivers:

  • 1st place: 3 points
  • 2nd place: 2 points
  • 3rd place: 1 point
  • Ties: All tied drivers receive points for that position

Competition Structure:

Drivers were grouped by decade (1950s-2020s), with decade winners advancing through knockout rounds until a final champion was crowned.

Python Code

1. Filter drivers with at least 24 races (1 season)

f1new = f1.groupby("driver")[["grid_starting_position"]].count().reset_index()
f1_group = f1new[f1new["grid_starting_position"] >= 24]["driver"]
f1_filtered = f1[f1["driver"].isin(f1_group)]

2. Calculate positions gained and filter by decade

race_finishers = f1_filtered[~f1_filtered["final_position"].isna()].copy()
race_finishers["positions_gained"] = race_finishers["grid_starting_position"] - race_finishers["final_position"]
race_finishers = race_finishers[(race_finishers["year"] >= 1960) & (race_finishers["year"] < 1970)]

3. Set up scoring system

from collections import defaultdict

driver_points = defaultdict(int)  # driver -> total points
rank_to_points = {1: 3, 2: 2, 3: 1}

4. Function to add points from race results

def add_points_from_series(ser, points_dict):
    current_rank = 0
    last_value = object()  # something that can't equal a real value
    for driver, value in ser.items():
        # new distinct value -> new place (1st, 2nd, 3rd, ...)
        if value != last_value:
            current_rank += 1
            last_value = value
        # only 1st/2nd/3rd place get points
        if current_rank > 3:
            break
        points_dict[driver] += rank_to_points[current_rank]

5. Evaluate all categories

# Category 1: Average positions gained per race
s1 = race_finishers.groupby("driver")["positions_gained"].mean().sort_values(ascending=False).head()
add_points_from_series(s1, driver_points)

# Category 2: Total positions gained in all races
s2 = race_finishers.groupby("driver")["positions_gained"].sum().sort_values(ascending=False).head()
add_points_from_series(s2, driver_points)

# Category 3: Record positions gained within one race
s3 = race_finishers.groupby("driver")["positions_gained"].max().sort_values(ascending=False)
add_points_from_series(s3, driver_points)

# Category 4: Circuits with highest average positions gained
avg_gains = race_finishers.groupby(["circuit_name", "driver"])["positions_gained"].mean().reset_index()
best_avg = avg_gains.groupby("circuit_name")["positions_gained"].max().reset_index().rename(
    columns={"positions_gained": "max_avg_positions_gained"})
result = avg_gains.merge(best_avg, on="circuit_name")
result = result[result["positions_gained"] == result["max_avg_positions_gained"]]
s4 = result["driver"].value_counts().head(20)
add_points_from_series(s4, driver_points)

# Category 5: Circuit records for most positions gained
max_gains = race_finishers.groupby("circuit_name")["positions_gained"].max().reset_index().rename(
    columns={"positions_gained": "max_positions_gained"})
result = race_finishers.merge(max_gains, on="circuit_name")
result = result[result["positions_gained"] == result["max_positions_gained"]]
s5 = result["driver"].value_counts().head(10)
add_points_from_series(s5, driver_points)

# Category 6: Comeback Rate (percentage of races with positive position gain)
comeback_rate = race_finishers.assign(
    comeback = race_finishers["positions_gained"] > 0
).groupby("driver")["comeback"].mean() * 100
s6 = comeback_rate.sort_values(ascending=False).head(10)
add_points_from_series(s6, driver_points)

6. Determine decade winner

champion = dict(driver_points)
print(champion)

max_value = max(champion.values())
keys_with_max_value = [k for k, v in champion.items() if v == max_value]
print(keys_with_max_value)
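
7. Scoring every decade (illustrative sketch)

The steps above score a single decade (the 1960s). As a rough sketch of how the decade-by-decade stage of the competition could be automated, the helper below reuses f1_filtered, add_points_from_series, and the imports defined earlier; it is illustrative rather than the notebook's actual code, and the knockout rounds between decade winners are not shown.

def decade_champion(f1_filtered, start_year):
    """Run the position-gain scoring for one decade and return the top driver(s)."""
    finishers = f1_filtered[~f1_filtered["final_position"].isna()].copy()
    finishers["positions_gained"] = finishers["grid_starting_position"] - finishers["final_position"]
    finishers = finishers[(finishers["year"] >= start_year) & (finishers["year"] < start_year + 10)]
    if finishers.empty:
        return []

    points = defaultdict(int)
    add_points_from_series(finishers.groupby("driver")["positions_gained"].mean()
                           .sort_values(ascending=False).head(), points)
    add_points_from_series(finishers.groupby("driver")["positions_gained"].sum()
                           .sort_values(ascending=False).head(), points)
    add_points_from_series(finishers.groupby("driver")["positions_gained"].max()
                           .sort_values(ascending=False).head(), points)
    # (circuit-based categories and the comeback rate would be added the same way)

    best = max(points.values())
    return [driver for driver, pts in points.items() if pts == best]

# Decade champions from the 1950s through the 2020s
print({year: decade_champion(f1_filtered, year) for year in range(1950, 2030, 10)})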

Competition Results

Decade Champions

1950s: Johnny Claes
1960s: Carel Godin de Beaufort
1970s: Hector Rebaque
1980s: Marc Surer
1990s: Alex Caffi
2000s: Tarso Marques
2010s: Sebastian Vettel
2020s: Max Verstappen

Final Winner

🏆 Sebastian Vettel 🏆
The Comeback King

Time Series Analysis & Forecasting (CO₂ Concentration Data)

An applied time series analysis project exploring trends, seasonality, and forecasting performance

Overview

This project focuses on the analysis and forecasting of atmospheric CO₂ concentration levels using historical time series data. The objective was to identify long-term trends and seasonal patterns in the data, and to evaluate the performance of classical forecasting methods on a real-world environmental dataset.

Data & Context

The analysis uses monthly CO₂ concentration data, covering several decades, allowing for clear observation of both long-term upward trends and recurring seasonal fluctuations. The dataset was cleaned, structured, and indexed as a time series to enable proper temporal analysis.

Analysis & Methodology

The project followed a structured time series workflow:

1. Exploratory Analysis: visualization of long-term trends and seasonal behavior
2. Time Series Decomposition: breaking down the series into trend, seasonal, and residual components
3. Baseline Smoothing Methods: implementation of smoothing techniques to reduce noise and capture underlying dynamics

Forecasting Techniques Implemented

Simple Moving Averages

To smooth short-term volatility and identify underlying patterns

Exponential Smoothing

To assign greater weight to recent observations for more responsive forecasts

Parameter Comparison

Comparison of forecasts across different smoothing parameters

Out-of-Sample Evaluation

Using train/test splits to assess predictive accuracy
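
As a minimal sketch of this workflow in Python (pandas and statsmodels), assuming a monthly CO₂ series; the file name and column names are placeholders rather than the project's actual data files:

import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.holtwinters import SimpleExpSmoothing

# Assumed input: a monthly CO₂ series (file name and column name are placeholders)
co2 = pd.read_csv("co2_monthly.csv", parse_dates=["date"], index_col="date")["co2"]
co2 = co2.asfreq("MS").interpolate()              # regular monthly frequency, fill small gaps

# Decomposition into trend, seasonal, and residual components
decomposition = seasonal_decompose(co2, model="additive", period=12)

# Train/test split for out-of-sample evaluation (hold out the last two years)
train, test = co2.iloc[:-24], co2.iloc[-24:]

# Simple moving average: smooths noise but lags during rapid change
sma = train.rolling(window=12).mean()
sma_forecast = pd.Series(sma.iloc[-1], index=test.index)   # naive flat projection of the last smoothed value

# Simple exponential smoothing: weights recent observations more heavily
ses = SimpleExpSmoothing(train).fit(smoothing_level=0.3, optimized=False)
ses_forecast = ses.forecast(len(test))

# Compare predictive accuracy on the held-out period
mae = lambda pred: (pred - test).abs().mean()
print(f"Moving average MAE:        {mae(sma_forecast):.3f}")
print(f"Exponential smoothing MAE: {mae(ses_forecast):.3f}")

Varying the window length and the smoothing level is how the parameter comparison above can be carried out in practice.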

Key Findings

Strong Upward Trend & Seasonality

The CO₂ series exhibits a strong, persistent upward trend alongside clear seasonal cycles

Moving Averages Performance

Moving averages effectively smooth noise but lag during periods of rapid change

Exponential Smoothing Advantages

Exponential smoothing provides more responsive forecasts and better short-term performance

Trade-off Analysis

Model choice involves a clear trade-off between stability and adaptability

Tools & Technologies

Python Pandas Time Series Analysis Libraries Data Visualization

Skills Demonstrated

Time series structuring and indexing
Trend and seasonality analysis
Forecasting and model evaluation
Analytical interpretation of temporal data

Graph Optimization with Dynamic Programming

A complete shortest-path solver in Mathematica using dynamic programming and Bellman iteration

Overview

In this project, I built a complete shortest-path solver in Mathematica using dynamic programming and Bellman iteration. Starting from raw graph edge data, the workflow constructs a distance matrix, iteratively computes a cost-to-go function, and then recovers the optimal path and its total cost from a chosen start node to the destination.

Example Output

Optimal Path: {17, 23, 33, 41, 53, 56, 57, 60, 67, 70, 73, 76, 85, 89, 99}

Minimum Cost: 194.22

Problem Statement

Given a directed weighted graph (nodes + edges + weights), the goal is to:

  1. Convert the graph representation into a distance matrix Q, assigning Infinity to non-connected node pairs.
  2. Use Bellman's operator to compute the optimal cost-to-go J for each node.
  3. Use Q and J to reconstruct the cheapest path and its total cost.

Implementation Details

1) Data Import & Distance Matrix Construction (dataToMatrix)

I implemented a module that reads graph data from a text-based input file, parses each line into (source, destination, weight) connections, and constructs a full distance matrix Q where:

  • Q[i, j] = weight if an edge exists
  • Q[i, j] = Infinity if nodes are not connected
  • The destination node has a diagonal value of 0 to act as the terminal condition

Key Features:

  • Robust parsing of node identifiers (e.g., stripping the "node" prefix)
  • Defensive checks for malformed rows and failed imports
  • Correct handling of 1-based indexing in Mathematica while working with 0-based node labels

2) Bellman Operator Update (bellmanIteration)

I implemented the Bellman update step as a vector operation over nodes. For each node v, compute:

J_{n+1}(v) = min_w [ Q(v, w) + J_n(w) ]

The implementation explicitly handles missing edges by treating Infinity weights as invalid transitions. The output of this step is a new cost-to-go vector.

3) Convergence to Final Cost-to-Go (findCostToGo)

This module repeatedly applies the Bellman operator until convergence:

Max Iterations: 500
Tolerance: 10⁻⁶

To avoid instability from unreachable nodes, the convergence check ignores Infinity values when computing the norm difference. The output is the final cost-to-go vector where J[node] represents the minimum cost required to reach the destination node from that node.

4) Optimal Path Recovery (findPathAndTotalCost)

Once Q and J are computed, I reconstruct the optimal path from a chosen start node by repeatedly selecting the next node that minimizes:

Q(current, w) + J(w)

This yields the sequence of nodes visited and the total accumulated cost along the path. The module prints:

  • Optimal Path
  • Minimum Cost
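
The project's modules are written in Mathematica; the following Python sketch is only an illustrative analogue of the same three steps (Bellman update, iteration to convergence, greedy path recovery) on a small hard-coded distance matrix with made-up node labels and weights.

import math

INF = math.inf

# Small illustrative distance matrix Q (edge weights; INF = no edge).
# Node 3 is the destination, so Q[3][3] = 0 acts as the terminal condition.
Q = [
    [INF, 4.0, 2.0, INF],
    [INF, INF, 5.0, 10.0],
    [INF, INF, INF, 3.0],
    [INF, INF, INF, 0.0],
]
n = len(Q)

def bellman_iteration(Q, J):
    """One Bellman update: J_new(v) = min over w of Q(v, w) + J(w)."""
    return [min(Q[v][w] + J[w] for w in range(n)) for v in range(n)]

def norm_diff(J_new, J_old):
    """Max difference, ignoring node pairs that are still both unreachable (Infinity)."""
    diff = 0.0
    for a, b in zip(J_new, J_old):
        if math.isinf(a) and math.isinf(b):
            continue
        if math.isinf(a) or math.isinf(b):
            return INF          # a node just became reachable: keep iterating
        diff = max(diff, abs(a - b))
    return diff

def find_cost_to_go(Q, max_iter=500, tol=1e-6):
    """Apply the Bellman operator until the cost-to-go vector converges."""
    J = [0.0 if Q[v][v] == 0.0 else INF for v in range(n)]   # destination starts at 0
    for _ in range(max_iter):
        J_new = bellman_iteration(Q, J)
        converged = norm_diff(J_new, J) < tol
        J = J_new
        if converged:
            break
    return J

def find_path_and_total_cost(Q, J, start):
    """Greedy recovery: repeatedly move to the node minimising Q(current, w) + J(w)."""
    path, cost, current = [start], 0.0, start
    while J[current] > 0.0:
        nxt = min(range(n), key=lambda w: Q[current][w] + J[w])
        cost += Q[current][nxt]
        path.append(nxt)
        current = nxt
    return path, cost

J = find_cost_to_go(Q)
path, cost = find_path_and_total_cost(Q, J, start=0)
print("Optimal Path:", path)    # [0, 2, 3]
print("Minimum Cost:", cost)    # 5.0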

Results

Using the provided test graph data, the implementation produced:

Optimal Path: {17, 23, 33, 41, 53, 56, 57, 60, 67, 70, 73, 76, 85, 89, 99}
Minimum Cost: 194.22

This confirms the algorithm correctly computes both the optimal policy (via J) and the associated optimal route (via greedy recovery using J).

Why This Project Matters

This project demonstrates the ability to:

Translate algorithmic theory into working code
Implement dynamic programming and iterative optimization methods
Handle real input parsing and edge cases (missing connections, indexing, Infinity handling)
Build an end-to-end solution that outputs interpretable results

Real-World Applications

Routing problems (transport, logistics) Network optimization Planning and decision-making under costs

Tools & Skills

Tools

Mathematica

Skills

Dynamic Programming Bellman Iteration Graph Optimization Shortest Path Data Parsing Algorithmic Implementation

Code Structure (Modules)

dataToMatrix[filePath]

Import graph & build distance matrix Q

bellmanIteration[Q, Jn]

Compute Bellman update J_{n+1}

findCostToGo[Q, maxIter, tol]

Iterate until convergence to final J

findPathAndTotalCost[Q, J, startNode]

Recover optimal path + total cost

Solving a Growth Model Using Shooting and Genetic Algorithms

A comparison of classical optimization methods versus evolutionary algorithms in solving dynamic economic models

Overview

This project solves a deterministic neoclassical growth model using two fundamentally different numerical approaches: a shooting algorithm and a genetic algorithm. The objective is to compute the transition path of capital from an initial condition to its steady state and to compare the convergence, stability, and behavior of classical optimization methods versus evolutionary algorithms.

The project combines economic theory, numerical optimization, and computational experimentation, highlighting the trade-offs between structure-exploiting and search-based solution methods.

Model Framework

The underlying model is a standard infinite-horizon growth model with capital accumulation and Cobb–Douglas production. A representative household maximizes discounted utility subject to a budget constraint, while firms maximize profits in competitive markets.

Production Function

Y_t = K_t^α (A·L_t)^(1-α)

Capital Evolution

K_{t+1} = (1 - δ)K_t + I_t

Key Parameters

Capital Share (α): 0.33
Discount Factor (β): 0.98
Depreciation (δ): 0.1
Technology (A): 1
Initial Capital (K₀): 0.1
Time Horizon (T): 70

Analytical Foundations

Before implementing numerical solutions, the project:

Fully defines the competitive equilibrium

Derives the first-order conditions for households and firms

Obtains the Euler equation governing optimal consumption and capital accumulation

Solves analytically for the steady-state capital stock K*

This analytical groundwork ensures that numerical solutions can be evaluated against correct theoretical benchmarks.
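
For reference, the steady-state condition implied by these parameters (a textbook sketch assuming labour is normalised to one and standard time-separable preferences, not reproduced from the project files):

1 = β (1 - δ + α K*^(α-1))

K* = ( α / (1/β - 1 + δ) )^(1/(1-α)) = ( 0.33 / (1/0.98 - 1 + 0.1) )^(1/0.67) ≈ 4.5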

Shooting Algorithm

Methodology

The shooting algorithm solves the model by exploiting the Euler equation directly. Starting from an initial guess for the capital path, the algorithm iterates forward and adjusts guesses until the terminal condition—convergence to the steady state—is satisfied.

Implementation Details:

  • Solving a system of nonlinear equations using FindRoot
  • Iterating capital forward over 70 periods
  • Ensuring convergence to the analytically derived steady state
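
The original solution uses Mathematica's FindRoot on the full equation system; purely as an illustrative analogue, the Python sketch below implements the same forward-shooting idea, assuming log utility and bisecting on the initial consumption level. Parameter values follow the table above.

# Illustrative Python analogue of the shooting approach (the project itself uses
# Mathematica and FindRoot). Log utility assumed; labour normalised to one.
alpha, beta, delta, k0, T = 0.33, 0.98, 0.1, 0.1, 70
k_star = (alpha / (1 / beta - 1 + delta)) ** (1 / (1 - alpha))   # analytical steady state

def terminal_gap(c0):
    """Simulate forward from (k0, c0) and return terminal capital minus the steady state."""
    k, c = k0, c0
    for _ in range(T):
        k_next = k ** alpha + (1 - delta) * k - c                 # resource constraint
        if k_next <= 0:                                           # consumed too much: infeasible
            return -k_star
        c *= beta * (1 - delta + alpha * k_next ** (alpha - 1))   # Euler equation (log utility)
        k = k_next
    return k - k_star

# Bisection on the initial consumption "shot"
lo, hi = 1e-6, k0 ** alpha + (1 - delta) * k0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if terminal_gap(mid) > 0:      # capital overshoots the steady state -> consume more
        lo = mid
    else:                          # capital falls short or path infeasible -> consume less
        hi = mid

print(f"Initial consumption: {mid:.4f}, terminal capital gap: {terminal_gap(mid):.2e}")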

Results

The shooting algorithm produces a smooth and monotonic transition path for capital. Capital converges steadily toward the steady-state level, closely matching theoretical predictions. This method serves as a benchmark solution due to its precision and stability.

Genetic Algorithm

Motivation

To explore a model-agnostic alternative, the same growth model is solved using a genetic algorithm. Unlike the shooting method, the genetic algorithm does not directly impose the Euler equation. Instead, it searches over possible savings paths and evaluates them based on lifetime utility.

Fitness Function

A custom fitness function is defined to:

  • Simulate capital, output, consumption, and investment paths
  • Compute discounted lifetime utility
  • Penalize paths that fail to converge to the steady state after T periods

This ensures that only economically meaningful solutions achieve high fitness scores.
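
As a sketch of what such a fitness function can look like (in Python rather than the project's Mathematica code; the penalty weight, the consumption floor, and the real-valued encoding of savings rates are assumptions made for illustration):

import numpy as np

alpha, beta, delta, A, k0, T = 0.33, 0.98, 0.1, 1.0, 0.1, 70
k_star = (alpha / (1 / beta - 1 + delta)) ** (1 / (1 - alpha))

def fitness(savings_rates, penalty_weight=100.0):
    """Discounted lifetime (log) utility of a candidate savings path,
    penalised if capital has not reached the steady state by period T."""
    k, utility = k0, 0.0
    for t, s in enumerate(savings_rates):          # one savings rate in [0, 1] per period
        y = A * k ** alpha                         # output
        c = max((1 - s) * y, 1e-9)                 # consumption (floored to keep log finite)
        i = s * y                                  # investment
        utility += beta ** t * np.log(c)
        k = (1 - delta) * k + i                    # capital accumulation
    # penalise paths whose terminal capital is far from the steady state
    return utility - penalty_weight * abs(k - k_star)

# Example: evaluate a constant 25% savings rate over the horizon
print(fitness(np.full(T, 0.25)))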

Genetic Algorithm Structure

The implementation includes all core evolutionary components:

Selection: retains the top 50% of solutions by fitness
Parent Selection: probabilistic selection weighted by fitness
Crossover: binary encoding with random crossover points
Mutation: bit-flipping with a 2.5% mutation rate
Population Size: 80
Generations: 1000+

The algorithm tracks both mean fitness and maximum fitness across generations, allowing analysis of convergence behavior.

Results and Comparison

Capital Stock Dynamics

The shooting algorithm generates a smooth and stable capital path that converges monotonically to the steady state. In contrast, the genetic algorithm produces a much noisier capital trajectory. While capital fluctuates significantly due to stochastic mutation and crossover, it still converges toward a level close to the steady state.

These fluctuations reflect the exploratory nature of genetic algorithms, which trade precision for flexibility and global search capability.

Savings Rate Behavior

The difference between the two methods is even more pronounced when examining savings rates.

Shooting Algorithm

Produces a smooth, declining savings rate consistent with optimal intertemporal behavior

Genetic Algorithm

Produces a highly volatile savings rate, though its average level aligns broadly with the shooting solution

This highlights a key trade-off: genetic algorithms can approximate optimal policies without explicit analytical conditions, but at the cost of short-run stability.

Key Takeaways

The shooting algorithm is highly efficient and precise when analytical structure is available.

The genetic algorithm provides a flexible, model-agnostic optimization framework.

Despite its stochastic nature, the genetic algorithm converges toward economically meaningful solutions.

Increasing population size, running more generations, or reducing mutation rates would likely improve stability and convergence.

Why This Project Matters

This project demonstrates the ability to:

Translate economic theory into computational solutions
Implement and compare fundamentally different optimization techniques
Design fitness functions and convergence criteria
Interpret numerical results through an economic lens

The comparison highlights the strengths and limitations of both classical numerical methods and evolutionary algorithms in dynamic optimization problems.

Tools & Skills

Tools

Mathematica

Skills

Shooting Algorithm Genetic Algorithm Economic Modeling Numerical Optimization Dynamic Programming Computational Economics

Earthquake Data Pipeline & Analysis

End-to-end data pipeline: NiFi → MinIO → Spark

Overview

This project builds an end-to-end data pipeline that ingests live earthquake data, stores it in object storage, and analyzes it using Apache Spark. The goal is to simulate a realistic data workflow and extract meaningful insights from continuously updated, real-world data.

Pipeline Structure

NiFi → MinIO → Spark Notebook

Data Ingestion (Apache NiFi)

Apache NiFi is used to automate the ingestion of earthquake data from the USGS Earthquake API.

Two NiFi Flows

JSON Flow

Ingests earthquake data in JSON format

CSV Flow

Ingests earthquake data in CSV format

Flow Capabilities:

  • Pulls earthquake data on a schedule
  • Processes and splits incoming records
  • Adds timestamps and metadata
  • Writes results to object storage

To make the project reproducible, I provide two NiFi flow files (JSON) that can be imported directly into NiFi to recreate the pipelines.

Data Storage (MinIO – S3 Compatible)

All ingested data is stored in MinIO, an S3-compatible object storage system used as a lightweight data lake.

Decoupled Architecture

MinIO allows the ingestion and analytics layers to be fully decoupled:

  • NiFi focuses only on ingestion
  • Spark reads data directly from storage for analysis
  • Both JSON and CSV datasets are stored in a structured and consistent way

Data Processing & Analysis (Apache Spark)

All analysis is performed in a Spark notebook using the DataFrames API.

Notebook Workflow:

  • Reads earthquake data directly from MinIO
  • Converts raw files into Spark DataFrames
  • Cleans and structures the data
  • Performs scalable exploratory analysis using Spark transformations and SQL
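
A minimal PySpark sketch of this read-and-aggregate step is shown below; the MinIO endpoint, credentials, bucket path, and column names are placeholders rather than the project's actual configuration, and the S3A connector (hadoop-aws) is assumed to be available.

from pyspark.sql import SparkSession, functions as F

# Placeholder MinIO/S3A settings; the real endpoint, credentials and bucket differ.
spark = (
    SparkSession.builder.appName("earthquake-analysis")
    .config("spark.hadoop.fs.s3a.endpoint", "http://minio:9000")
    .config("spark.hadoop.fs.s3a.access.key", "minio_access_key")
    .config("spark.hadoop.fs.s3a.secret.key", "minio_secret_key")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .getOrCreate()
)

# Read the CSV flow's output straight from object storage into a DataFrame
quakes = (spark.read.option("header", "true").option("inferSchema", "true")
          .csv("s3a://earthquakes/csv/"))

# Basic cleaning and a magnitude distribution (mag is an assumed column name)
quakes = quakes.dropna(subset=["mag"]).filter(F.col("mag") >= 0)
(quakes
    .groupBy(F.floor("mag").alias("magnitude_bucket"))
    .agg(F.count("*").alias("events"), F.avg("mag").alias("avg_magnitude"))
    .orderBy("magnitude_bucket")
    .show())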

What the Analysis Reveals

Using Spark, the notebook extracts several key insights from the earthquake data:

Earthquake Activity is Highly Skewed

Most recorded earthquakes have low magnitudes, while high-magnitude events are relatively rare. This becomes clear when aggregating and visualizing magnitude distributions.

Clear Temporal Patterns Emerge

Aggregations over time show that earthquake occurrences are not evenly distributed. Certain periods exhibit clusters of increased activity, highlighting the importance of time-based analysis rather than static summaries.

Geographical Concentration of Events

Grouping events by location reveals that earthquakes are concentrated in specific regions, consistent with known tectonic boundaries. Spark makes it easy to aggregate and compare activity across regions at scale.

Magnitude vs Frequency Trade-off

While smaller earthquakes occur frequently, larger earthquakes contribute disproportionately to overall seismic risk. This contrast is visible when comparing frequency counts with magnitude-weighted summaries.

Scalability of Analysis

Using Spark DataFrames allows these insights to be computed efficiently even as the dataset grows, reinforcing why distributed processing is well-suited for this type of continuously updating data.

Why This Project Is Useful

This project demonstrates:

How to ingest live external data
How to design clean and reproducible ingestion pipelines
How to use object storage as a data lake
How to extract insights from large datasets using Spark

It shows practical skills across data engineering and data analytics, rather than isolated scripts or toy examples.

Tools & Technologies

Apache NiFi MinIO (S3) Apache Spark Spark DataFrames Spark SQL

Files Provided

To fully reproduce the project:

NiFi Flow for JSON Data Ingestion

Import this flow file into NiFi to recreate the JSON ingestion pipeline

NiFi Flow for CSV Data Ingestion

Import this flow file into NiFi to recreate the CSV ingestion pipeline

Spark Notebook

Contains the full analysis with data processing and insights extraction