3rd Place Student / 13th Overall at ISUCON13 (111,625 points)
Together with Saza and Moririn, we placed 3rd among student teams and 13th overall at ISUCON13 with a score of 111,625 points. Our team name was MONOS. It was my first time participating, while the other two had competed in the previous edition. I’ll jot down what we did in chronological order as a retrospective.
GitHub repository: https://github.com/Saza-ku/isucon13
Our team member Saza’s write-up is here.
Before the Contest
For practice, I individually worked through private-isu and the ISUCON 11 qualifier, and as a team we tackled the ISUCON 12 qualifier, 11 finals, and 12 finals.
For both practice and the actual contest, we used a great template Saza had built. It bundles scripts and documentation that make setup, deployment, and benchmarking easy to run. It was incredibly convenient and made practice much more efficient. We used alp, pt-query-digest, pprof, and netdata as our measurement tools, and benchmark results were automatically written to GitHub Issues upon completion.
Contest Day
Early Phase
At the start, Saza handled instance and repository setup through the initial benchmark, while Moririn and I read the manual and codebase.
After setup was complete, we ran benchmarks a few times but the initialization kept failing due to DNS resolution issues. The root cause was that the IP address configuration for DNS resolution (the environment variable ISUCON13_POWERDNS_SUBDOMAIN_ADDRESS) was set to the public IP of the second instance by default — switching it to the first instance fixed the problem.
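For reference, the fix amounted to a one-line change in the env file. The path and IP below are placeholders, not the values from our instances:

```shell
# /home/isucon/env.sh (path assumed; adjust to your setup)
# Point DNS resolution at the first instance's public IP instead of
# the second instance that the default configuration pointed to.
ISUCON13_POWERDNS_SUBDOMAIN_ADDRESS=203.0.113.10  # placeholder: first instance's public IP
```

After editing, the app service needs a restart so the new value is picked up.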
Due to this issue and some benchmarker bugs, it took a while to get measurement results. Meanwhile, Moririn and I discussed that livecomments seemed important and went ahead with removing unnecessary SQL queries based on intuition. Saza quickly applied the standard optimizations and set up a multi-server configuration (NGINX+App+PowerDNS, PowerDNS MySQL, App MySQL).
[11:43] 3300(#14) : Initial setup(#16)
[12:10] 3864(#21) : Ranking improvement(#12)
[12:13] 4500(#23) : Multi-server setup(#23)
[12:22] 5682(#24) : NG word search improvement(#2)
[12:37] ?(#29) : Ranking improvement 2(#31)
[12:43] 4500(#33) : Multi-server setup 2(#32)
[12:54] 9000(#35) : Index on livestream_tags(#36)
Middle Phase
After we started getting proper measurement results, we divided the work as follows:
- Me: Icon-related optimizations
- Saza: DNS water torture attack countermeasures
- Moririn: Slow query optimization
I worked on two improvements for icons: “removing image data from the DB” and “returning 304 responses.” “Removing from the DB” is a common optimization seen in past ISUCONs. (#41)
“Returning 304” was an improvement based on the application manual. The app includes a hash value (SHA256) of the image data in the User response. When the benchmarker fetches image data, it sends this hash value in the If-None-Match HTTP header, so if the hash matches, we can return 304 Not Modified.
I first attempted to have NGINX serve the images directly. Specifically, I turned on NGINX’s etag functionality and changed the hash value the app returned from plain SHA256 to the format NGINX uses (hex of the last-modified UNIX timestamp, a hyphen, and hex of the content length)[1]. I expected NGINX would then handle returning 304 or the file data as appropriate. However, the benchmarker’s consistency check failed, revealing that the hash value couldn’t be changed from SHA256. There is also an NGINX module for changing the etag algorithm[2], but it didn’t work, so I gave up on this approach. (implementation branch)
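For reference, the NGINX-style format described above can be reproduced in a few lines of Go (the function name is mine, not from our code):

```go
package main

import "fmt"

// nginxStyleETag reproduces the format NGINX uses for its ETag header:
// hex of the file's modification time (UNIX seconds) and hex of its
// length, joined by '-' and wrapped in double quotes.
func nginxStyleETag(mtimeUnix, length int64) string {
	return fmt.Sprintf(`"%x-%x"`, mtimeUnix, length)
}

func main() {
	// e.g. a file modified at UNIX time 1700000000 with 4096 bytes
	fmt.Println(nginxStyleETag(1700000000, 4096)) // "6553f100-1000"
}
```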
In the end, I stored the hash values in Memcached keyed by UserName, and when fetching images, if the If-None-Match header value matched, I returned 304. This drastically reduced SELECT queries on the icons table, resulting in a major improvement. Incidentally, the If-None-Match header value has " (double quotes) on both ends of the hash, which caused cache misses and some debugging pain. (#75, #90)
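The quote-stripping detail that caused the cache misses can be sketched like this in Go. A plain map stands in for Memcached here, and all names are illustrative, not from our actual code:

```go
package main

import (
	"fmt"
	"strings"
)

// iconHashCache stands in for Memcached: hash values keyed by user name.
var iconHashCache = map[string]string{}

// matchesETag reports whether the If-None-Match header matches the
// cached hash. The header value arrives wrapped in double quotes
// (e.g. `"abc123"`), so the quotes must be stripped before comparing.
func matchesETag(ifNoneMatch, cachedHash string) bool {
	return strings.Trim(ifNoneMatch, `"`) == cachedHash
}

func main() {
	iconHashCache["alice"] = "abc123"
	// Simulate the benchmarker sending the hash back with quotes.
	if matchesETag(`"abc123"`, iconHashCache["alice"]) {
		fmt.Println("304 Not Modified") // skip the SELECT on icons entirely
	}
}
```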
[13:15] 16941(#40) : Remove icon from DB(#41)
[13:26] 19508(#43) : Slot improvement(#44)
[15:00] 38091(#49) : Livecomment and reaction improvement(#50, #53)[3]
[15:34] 45424(#60) : Index on ng_words(#59)
[16:13] 63000(#73) : Return 304 for icon(#75)
Late Phase
After finishing the icon improvements, I added indexes while reviewing slow queries. (#6, #84)
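The kind of DDL involved looks like the following. These are illustrative examples only; the real targets and column names come from the slow query log (#6, #84), not from this sketch:

```sql
-- Hypothetical index additions; actual columns were chosen from
-- the WHERE clauses surfaced by pt-query-digest.
ALTER TABLE livecomments ADD INDEX idx_livestream_id (livestream_id);
ALTER TABLE reactions    ADD INDEX idx_livestream_id (livestream_id);
```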
The DNS water torture attack countermeasures Saza was working on turned out to be quite challenging[4]. He tried storing records in memory and temporarily returning A records even for invalid subdomains while having the app return errors, but it wasn’t going well and looked like a tough struggle.
Moririn was steadily improving SQL queries. The fillXXXResponse methods spanned multiple endpoints, which made it somewhat tricky as they were likely to conflict with my changes.
About 40 minutes before the end, I noticed that the theme SELECT query at the top of the slow query list could be cached. After preparing log and measurement tool cleanup, I implemented it at full speed and it worked on the first try. (#100)
It got a bit tight at the end, but after removing logs and measurement tools, we restarted, ran the benchmark, and did a browser check to finish. The final score was 111,625 points.
[16:32] 68000(#79) : Index on livecomments(#6)
[16:32] 73000(#83) : Index on reactions(#84)
[17:04] 74000(#89) : Forgotten IconHash cache(#90)
[17:14] 84529(#93) : Search improvement(#93)
[17:40] ?(#99) : Theme cache(#100)
[17:50] 111625(final) : Remove logs/measurement tools, DB Wait(#95, #98)
Improvements We Couldn’t Make
Eliminating the N+1 queries around livecomments was difficult, but considering the benchmarker’s behavior where more comments and reactions lead to more tips and users, I think the improvement would have been significant.
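The usual way to collapse such an N+1 loop is to batch the per-row SELECTs into one IN query. A sketch in Go, with table and column names assumed rather than taken from the actual schema:

```go
package main

import (
	"fmt"
	"strings"
)

// buildInQuery replaces "one SELECT per livestream" with a single
// batched SELECT using placeholders, which the caller would then
// execute and regroup by livestream_id.
func buildInQuery(ids []int64) (string, []interface{}) {
	placeholders := make([]string, len(ids))
	args := make([]interface{}, len(ids))
	for i, id := range ids {
		placeholders[i] = "?"
		args[i] = id
	}
	query := "SELECT * FROM livecomments WHERE livestream_id IN (" +
		strings.Join(placeholders, ", ") + ")"
	return query, args
}

func main() {
	q, args := buildInQuery([]int64{1, 2, 3})
	fmt.Println(q)         // SELECT * FROM livecomments WHERE livestream_id IN (?, ?, ?)
	fmt.Println(len(args)) // 3
}
```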
We also discussed that if we could counter the DNS water torture attack, it would reduce the resource usage of the PowerDNS MySQL server, allowing us to split the app onto the second instance[5].
Closing Thoughts
There were plenty of standard optimization opportunities alongside the new challenge of the DNS server, making it a very enjoyable day where we could go all out. It’s a shame we didn’t place in the top ranks, but we want to achieve better results next year.
[3] We briefly hit 2nd place at this point (https://x.com/saza_ku/status/1728447246826635414) ↩︎
[4] Looking at the DNS water torture attack article written by the problem setter, we discussed that this was probably the background behind the challenge ↩︎
[5] During the post-mortem, we were surprised to see multiple teams had built their own DNS servers ↩︎
