I led the Platform and Infrastructure teams at Aukro (the Czech eBay) as Chief Technology Officer and member of the board of directors. I focused primarily on Developer Experience, CI/CD, SRE, Operational Excellence, and Data Infrastructure, and on providing engineering leverage to the business and product engineering teams.
Assignment: Improve availability
Unstable platform that broke during peak times. Almost no processes in place in development or product.
Impact: Improved the speed and resiliency of the platform, added thorough monitoring that made it possible to remove system bottlenecks, and kept a laser focus on targets despite limited development capacity.
Assignment: Monitoring, alerting, APM
Impact: Grafana set up with proper alerting, ELK stack expanded with APM for performance monitoring, and quick optimizations.
Assignment: Missing processes, traceability
Impact: Scrum introduced with all ceremonies. Jira story/task estimation put in place with proper work logging. Automated reporting made it possible to focus on the right points from both a business and a tech angle.
Migration to Kubernetes
Assignment: Reduce infrastructure & management costs
The entire platform consisted of semi-automated Ansible scripts, manually operated virtual servers, firewalls, etc.
Impact: Kubernetes was a clear choice: it allowed us to run the entire platform in production along with many development environments. We dockerized the entire platform and prepared Kubernetes clusters on our own hardware; the cluster design and setup were outsourced to an external supplier. We decommissioned old resources and migrated to less expensive architectures to save money.
Assignment: Migrate from Jenkins to GitLab CI
Impact: Implemented delivery pipelines “as code”, committed to each project’s source control repository. The biggest benefit: treating the CD pipeline as part of the application, versioned and reviewed like any other code. This was not possible before, when one pipeline served all branches. It also allowed us to scale Kubernetes to dynamic environments per feature branch.
Assignment: Page loading and search were too slow. Core Web Vitals (CWV) statistics.
Impact: Reduced 90th percentile “user perceived load time” by over 70% for all purchase funnel pages on desktop web and mobile. Customers are able to browse and search items more fluidly than ever before.
The vast majority of dependencies and servers had been without updates for several years, e.g. Elasticsearch at version 2.x and Spring MVC on the platform project.
Impact: Every dependency was updated to its latest version, which brought not only better security but also higher performance and availability. Java 8 was updated to Java 17, again delivering better performance & security.
After the successful stabilization we hired additional DEV & IT staff, which allowed us to build multiple development teams using both internal sourcing and external contractor companies. Built a QA department, leading the way to fully automated CD.
Java, Python, C#, Angular, PostgreSQL, Elasticsearch, Redis, RabbitMQ, ASP.NET Core, Microsoft SQL, and more. Kubernetes, SonarQube, Vert.x, and Quarkus were introduced for better security & performance.
- over 100M items + 2M added every month
- 3.8 million bids/month, 128k/day, 100 bids/s at 💥 peak time
- PostgreSQL DB ~ 1.4 TB
- Elasticsearch - 200 GB, 80 shards
- 15 TB of images
- 200 GB of invoices
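
To put the bid figures above in perspective, here is a minimal sketch comparing the quoted daily volume with the quoted peak rate. It assumes bids are spread evenly over 24 hours when computing the average; the function names are illustrative only and do not come from the platform.

```python
# Figures quoted in the list above.
BIDS_PER_DAY = 128_000
PEAK_BIDS_PER_SECOND = 100

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_bids_per_second(bids_per_day: float) -> float:
    """Average rate, assuming bids are spread evenly over the whole day."""
    return bids_per_day / SECONDS_PER_DAY


def peak_to_average_ratio(peak_per_second: float, bids_per_day: float) -> float:
    """How many times higher the peak rate is than the daily average rate."""
    return peak_per_second / average_bids_per_second(bids_per_day)


if __name__ == "__main__":
    avg = average_bids_per_second(BIDS_PER_DAY)
    ratio = peak_to_average_ratio(PEAK_BIDS_PER_SECOND, BIDS_PER_DAY)
    print(f"average: {avg:.2f} bids/s")
    print(f"peak is about {ratio:.1f}x the average")
```

Under that even-spread assumption the average works out to roughly 1.5 bids/s, so the peak is dozens of times higher than the average, which is why capacity had to be planned around peak times rather than daily totals.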