This year I took part in ISUCON14 as team MONOS, once again with Saza and Moririn. We scored 6,659 points and placed 277th overall. Compared with last time (13th overall, 3rd among students), it was quite a disappointing result, but we had a fun day.

The Japanese version of this article is available here.

GitHub repository: https://github.com/saza-ku/isucon14

ISUCON13 (previous edition): https://onoe.dev/blog/isucon13

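This article is part of the Money Forward Kansai Advent Calendar 2024, published on December 9th. The previous article was by umisora: “Scalebaseを使ってHubSpotのカオス化を防ぎながら、BizOps業務をスマート化したお話” (roughly, “How we used Scalebase to keep HubSpot from descending into chaos while making BizOps work smarter”).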

Before the Contest

As a team, we practiced with the problems from ISUCON13, the ISUCON12 finals, and the ISUCON9 qualifier. I didn’t get much individual practice this year.

As with last time, we used the convenient template Saza had built, which makes setup, deployment, and benchmarking easy to run. In addition to alp, pt-query-digest, pprof, and netdata, this year we also added tracing with OpenTelemetry & Jaeger. Benchmark results are automatically written to GitHub Issues upon completion. Reference: https://github.com/saza-ku/isucon14/issues/66

Contest Day

Early Phase

This year, the initial setup went very smoothly, and we had the first benchmark completed around 10:30.

I first noticed the high volume of GET /api/chair/notification requests and raised RetryAfterMs from 30ms to 3000ms. Meanwhile, the other two added indexes, applied standard optimizations, and set up the multi-server configuration.
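To illustrate why this single change helps, here is a minimal sketch of the arithmetic. The struct, the `retry_after_ms` JSON field name, and the `pollsPerMinute` helper are assumptions made for illustration, not code from our repository, and the estimate ignores request latency.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// notificationResponse is a hypothetical, trimmed-down response body.
// The JSON field name is an assumption for this sketch.
type notificationResponse struct {
	RetryAfterMs int `json:"retry_after_ms"`
}

// pollsPerMinute gives an upper bound on how often one client polls
// if it waits retryMs between requests (request latency is ignored).
func pollsPerMinute(retryMs int) int {
	return 60000 / retryMs
}

func main() {
	body, err := json.Marshal(notificationResponse{RetryAfterMs: 3000})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))

	fmt.Println(pollsPerMinute(30))   // 2000
	fmt.Println(pollsPerMinute(3000)) // 20
}
```

Under these assumptions, each polling client goes from up to 2,000 requests per minute to about 20, at the cost of notifications arriving up to three seconds late.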

[10:29] 643(#2) : Initial quick benchmark(3718092)
[10:33] 654(#3) : Initial full benchmark(6fdc857)
[10:48] 1717(#5) : Add index on ride_statuses(#6)
[10:53] 2961(#7) : Set RetryAfterMs to 3000ms(#8)
[10:56] 2019(#9) : Add index on chair_locations(#10) (score likely dropped because #8 wasn’t included)
[10:59] 3204(#11) : Add index on rides(#13)
[11:02] 3950(#12) : Add index on chairs(f68c81a)
[11:17] 4965(#16) : Move MySQL to 2nd server(#13)
[11:24] 4837(#18) : Add index on chairs(#19)
[11:39] 4796(#20) : Standard optimizations(#21)

Middle Phase

The distance_table extraction that we had been working on since the morning was finished just after noon. It targeted the query at the top of our slow query log. First, we extracted the subquery that served as distance_table into a standalone table. Then we pre-calculated total_distance and total_distance_updated_at at the point of writing to the chair_locations table. Furthermore, because POST /api/chair/coordinate, the endpoint that writes to chair_locations, tolerates delayed data propagation, we updated distance_table via batch processing. Moririn noticed that chair_locations was written to in only one place, which is what made this implementation feasible.

Count: 164  Time=1.14s (186s)  Lock=0.00s (0s)  Rows=4.9 (797), isucon[isucon]@isucon1
  SELECT id,
         owner_id,
         name,
         access_token,
         model,
         is_active,
         created_at,
         updated_at,
         IFNULL(total_distance, N) AS total_distance,
         total_distance_updated_at
  FROM chairs
         LEFT JOIN (SELECT chair_id,
                           SUM(IFNULL(distance, N)) AS total_distance,
                           MAX(created_at)          AS total_distance_updated_at
                    FROM (SELECT chair_id,
                                 created_at,
                                 ABS(latitude - LAG(latitude) OVER (PARTITION BY chair_id ORDER BY created_at)) +
                                 ABS(longitude - LAG(longitude) OVER (PARTITION BY chair_id ORDER BY created_at)) AS distance
                          FROM chair_locations) tmp
                    GROUP BY chair_id) distance_table ON distance_table.chair_id = chairs.id
  WHERE owner_id = 'S'
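The pre-calculation idea can be sketched as follows. This is an in-memory illustration only: we actually kept the totals in a standalone MySQL table, and the type and function names here (`chairDistance`, `distanceTable`, `record`) are hypothetical. It assumes locations for a chair arrive in created_at order, which mirrors the LAG(...) OVER (ORDER BY created_at) window in the query above.

```go
package main

import "fmt"

// chairDistance holds the running totals for one chair (hypothetical type).
type chairDistance struct {
	lastLat, lastLon int
	hasLast          bool
	totalDistance    int   // sum of Manhattan distances between consecutive points
	updatedAt        int64 // created_at of the most recent location
}

// distanceTable maps chair ID to its running totals.
type distanceTable map[string]*chairDistance

func abs(x int) int {
	if x < 0 {
		return -x
	}
	return x
}

// record folds one new location into the totals, so reads no longer
// need to scan every row of chair_locations.
func (t distanceTable) record(chairID string, lat, lon int, createdAt int64) {
	d, ok := t[chairID]
	if !ok {
		d = &chairDistance{}
		t[chairID] = d
	}
	// The first location has no predecessor and contributes no distance,
	// matching IFNULL(distance, 0) on the first LAG row.
	if d.hasLast {
		d.totalDistance += abs(lat-d.lastLat) + abs(lon-d.lastLon)
	}
	d.lastLat, d.lastLon, d.hasLast = lat, lon, true
	d.updatedAt = createdAt
}

func main() {
	tbl := distanceTable{}
	tbl.record("chair-1", 0, 0, 1)
	tbl.record("chair-1", 3, 4, 2)
	tbl.record("chair-1", 1, 1, 3)
	fmt.Println(tbl["chair-1"].totalDistance, tbl["chair-1"].updatedAt) // 12 3
}
```

Because each write only needs the previous point and the running total, the cost per write is constant, whereas the original query recomputed the window function over all of chair_locations on every read.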

[12:22] 4895(#26) : Remove N+1 in getLatestRideStatus(#27)
[12:55] 5750(#35) : Extract distance_table(#22)

Late Phase

This is where everyone got stuck. Despite having sufficient resources, the number of users wasn’t increasing and load wasn’t building up. While monitoring the benchmarker’s behavior, Saza and Moririn worked on improving the matching algorithm and tuning various parameters to improve user satisfaction, but nothing seemed to work.

Implementation branch 1 (WIP): https://github.com/saza-ku/isucon14/tree/fix-matcing
Implementation branch 2 (WIP): https://github.com/saza-ku/isucon14/tree/fix-matcing-2

Moririn and I worked on eliminating N+1 queries, but we kept hitting bugs in our implementations. My part was the N+1 in chairPostCoordinate. I managed to implement the batch processing part, but ultimately couldn’t complete the N+1 elimination that was supposed to follow.

Implementation branch (WIP): https://github.com/saza-ku/isucon14/tree/coordinate-named-exec
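The batch processing part follows a common pattern: buffer incoming coordinates in memory and write them out in one go at a fixed interval. Below is a minimal sketch of that pattern, not our actual code. The `batcher` type, the `location` fields, and the `flush` callback (which would wrap a bulk INSERT in a real handler) are all hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// location is a hypothetical, simplified chair_locations row.
type location struct {
	ChairID  string
	Lat, Lon int
}

// batcher buffers locations and hands them to flush in batches.
type batcher struct {
	mu    sync.Mutex
	buf   []location
	flush func([]location) error // e.g. one bulk INSERT per batch
}

// add is what the request handler would call instead of inserting directly.
func (b *batcher) add(l location) {
	b.mu.Lock()
	b.buf = append(b.buf, l)
	b.mu.Unlock()
}

// flushOnce swaps out the buffer under the lock, then writes outside it
// so that handlers are not blocked while the batch is written.
// Note: in this sketch a failed batch is dropped rather than retried.
func (b *batcher) flushOnce() error {
	b.mu.Lock()
	pending := b.buf
	b.buf = nil
	b.mu.Unlock()
	if len(pending) == 0 {
		return nil
	}
	return b.flush(pending)
}

// run flushes at every tick until ctx is cancelled, then flushes once more.
func (b *batcher) run(ctx context.Context, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			_ = b.flushOnce()
			return
		case <-ticker.C:
			_ = b.flushOnce()
		}
	}
}

func main() {
	b := &batcher{flush: func(ls []location) error {
		fmt.Printf("flushing %d locations\n", len(ls))
		return nil
	}}
	b.add(location{ChairID: "chair-1", Lat: 1, Lon: 2})
	b.add(location{ChairID: "chair-2", Lat: 3, Lon: 4})
	b.add(location{ChairID: "chair-1", Lat: 2, Lon: 2})
	// A server would start `go b.run(ctx, interval)` instead of flushing by hand.
	if err := b.flushOnce(); err != nil {
		panic(err)
	}
}
```

This only works because the endpoint tolerates delayed propagation. Any logic that must see the latest coordinate immediately would have to read from the buffer as well as the database.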

Judging that eliminating N+1 queries wouldn’t matter unless we could increase the number of users, I dropped the N+1 work and started implementing SSE (Server-Sent Events) for GET /api/app/notification. I managed rides by RideID using channels, pushing to a queue whenever the ride_statuses table was updated and popping from it to send responses. However, this didn’t work out in the end either.

Implementation branch (WIP): https://github.com/saza-ku/isucon14/tree/notification
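For reference, the channel-per-ride idea looks roughly like the sketch below. It is a simplified reconstruction under stated assumptions rather than the code in the branch: the `hub` type, the buffer size of 16, and looking up the ride from a `ride_id` query parameter are all hypothetical (the real endpoint identifies the user from the session).

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

// hub keeps one buffered queue of status updates per ride ID.
// Queues are never removed in this sketch; a real server would clean them up.
type hub struct {
	mu   sync.Mutex
	subs map[string]chan string
}

func newHub() *hub {
	return &hub{subs: make(map[string]chan string)}
}

// queue returns the channel for a ride, creating it on first use.
func (h *hub) queue(rideID string) chan string {
	h.mu.Lock()
	defer h.mu.Unlock()
	q, ok := h.subs[rideID]
	if !ok {
		q = make(chan string, 16) // buffer size is an arbitrary assumption
		h.subs[rideID] = q
	}
	return q
}

// publish would be called wherever a row is written to ride_statuses.
// It never blocks; it reports false if the queue is full.
func (h *hub) publish(rideID, status string) bool {
	select {
	case h.queue(rideID) <- status:
		return true
	default:
		return false
	}
}

// serveSSE streams queued statuses to the client as Server-Sent Events.
func (h *hub) serveSSE(w http.ResponseWriter, r *http.Request) {
	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")
	q := h.queue(r.URL.Query().Get("ride_id"))
	for {
		select {
		case <-r.Context().Done(): // client disconnected
			return
		case status := <-q:
			fmt.Fprintf(w, "data: %s\n\n", status)
			flusher.Flush()
		}
	}
}

func main() {
	h := newHub()
	// A server would register the handler, for example:
	//   http.HandleFunc("/api/app/notification", h.serveSSE)
	h.publish("ride-1", "MATCHING")
	fmt.Println(<-h.queue("ride-1")) // MATCHING
}
```

The appeal of this design is that the handler blocks on a channel instead of polling the database, so a status change reaches the client as soon as it is written.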

Our score ultimately didn’t improve from the early afternoon onward, which was quite frustrating. As a last step, we removed the measurement tools and got lucky with a benchmark run, finishing at 6,659 points.

[13:37] 6304(#44) : Batch processing for chairPostCoordinate(#45)
[17:50] 6659 : Remove measurement tools(30d5b06)

Reflection

I think the matching algorithm was the bottleneck, and unless we could improve it, no other bottlenecks would surface.

My strength is infrastructure-level optimization, and I realized that I’m quite weak when the bottleneck shows up not as a resource shortage but in user satisfaction and benchmarker behavior.

Understanding user and benchmarker behavior is very important, so I want to improve in this area going forward.

Closing Thoughts

I had a great time again this year. Starting next year, we’ll no longer be a student team, but we want to keep pushing forward.