Biggest Czech on-line marketplace with auctions - Aukro Rebuilt

Estimated reading time: 3 min

I was leading Platform and Infrastructure teams at Aukro (Czech eBay) as Chief Technology Officer & member of the board of directors I focused primarily on Developer Experience, CI/CD, SRE, Operational Excellence, Data Infrastructure and providing engineering leverage to the business and product engineering teams.

Aukro

Stabilization, Processes

Assignment: Improve availability

Unstable platform, breaking during peak times. Almost no processes in place, in development & product.

Impact: Speed and resiliency improvements of the platform, added thorough monitoring systems which made possible to remove bottlenecks of the system, laser-focus on targets under development capacity constraints.

Assignment: Monitoring, alerting, APM

Impact: Grafana setup with proper alerting in place, ELK stack expanded with APM for performance monitoring, quick & fast optimizations.

Assignment: Missing processed, traceability

Impact: SCRUM introduced with all ceremonies. Jira stories/task estimation brought in place with proper worklogging. Automation of reporting - made possible to focus on the right points from a business & tech angle.

Migration to Kubernetes

Assignment: Reduce infrastructure & Management costs

Entire platform consisted of semi-automated Ansible scripts, manual-operated virtual servers, firewalls, etc.

Impact: Kubernetes was a clear choice - it allowed us to run the entire platform in production and many development environments. We dockerize the entire platform and prepared Kubernetes clusters on our own HW, the cluster design & setup was outsourced to an external supplier. We decommissioned old resources and migrated to less expensive architectures to save money.

Assignment: Migrate from Jenkins to Gitlab CI

Impact: Implemented delivery pipelines “as code” committed to a project’s source control repository. Biggest benefit - treating the CD pipeline as a part of the application to be versioned and reviewed like any other code. It was not possible before - 1 pipeline for all branches. This allowed us to scale Kubernetes to dynamic environments per feature branch.

Speed improvements

Assignment: Loading pages, searching was too slow. CWV statistics.

Impacts: reduced 90th percentile “user perceived load time” by over 70% for all purchase funnel pages on desktop web and mobile. Customers are able browse & search items more fluidly than ever before.

Technical debt

Assignment: Innovation.

The vast majority of dependencies and servers have been without updates for several years - Elastic Search in version 2.x, Spring MVC on the platform project.

Impacts: Every dependency got updated to their respective latest version. Which brought not only better security but also higher performance and availability. Java 8 was updated to Java 17, again delivering better performance & security.

Scale-up

After successful stabilization we sucesfully hired additional DEV & IT staff which allowed us to build multiple development teams, while using internal-sourcing and external contractor companies. Built a QA department leading the way to fully automated CD.

Technologies, stats

Java, Python, C#, Angular, PostgreSQL, Elastic Search, Redis, RabbitMQ, ASP.NET Core & Microsoft SQL and more. Kubernetes, SonarQube, Vert.X, Quarkus introduced for better security & performance.

  • over 100M items + 2M added every month
  • 3.8 millions bids/month, 128k/day, 100 bids/s in 💥 peak time
  • PostgreSQL DB ~ 1.4 TB
  • Elastic Search - 200 GB, 80 shards
  • 15 TB of images
  • 200 GB of invoices