Golang optimizations for high‑volume services
Lessons from a Postgres → Elasticsearch pipeline
Intro
Building services that sit on top of a Postgres replication slot and continuously stream data into Elasticsearch is a great way to get low‑latency search without hammering your primary database with ad‑hoc queries. But as soon as traffic ramps up, these services become a stress test for Go’s memory allocator, garbage collector, and JSON stack.
This post walks through optimizations applied to a real-world service that:
Connects to a Postgres replication slot
Transforms and enriches the change events
Uses Elasticsearch’s bulk indexer to index and delete documents
The constraints: the service cannot stop reading from the replication slot for long (or Postgres disk will grow), and it cannot buffer unbounded data in memory (or Go’s heap will). The goal is to keep latency and memory stable under sustained high volume.
The core problem: unbounded streams under tight constraints
Replication slots are relentless: as long as your primary receives writes, the slot will keep producing changes. If your consumer slows down, Postgres has to retain more WAL segments, increasing disk usage on the database server. If your consumer tries to “just buffer more” in memory, the heap will balloon and garbage collection will kick in more frequently, stealing CPU from useful work.
In this setup, you typically have three competing forces:
Backpressure from Elasticsearch bulk indexing
The continuous stream of changes from the replication slot
The Go runtime’s allocator and garbage collector trying to keep up with allocations in your hot path
The design work is about turning this into a stable flow: limit in-flight work, keep memory usage predictable, and reduce per-message overhead.
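To make that concrete, here is a minimal sketch of the bounded-flow idea, assuming a hypothetical ChangeEvent type and caller-supplied read/index functions (not the actual service code); the fixed-capacity channel is the backpressure point between the replication reader and the indexer:

```go
package pipeline

import "context"

// ChangeEvent is a hypothetical decoded replication message.
type ChangeEvent struct {
	Table string
	Op    string
	Doc   []byte
}

// pipe connects the replication reader to the indexer through a bounded
// channel. When indexing slows down, sends block, which slows the reader
// instead of letting the heap grow without limit.
func pipe(
	ctx context.Context,
	read func(context.Context) (ChangeEvent, error),
	index func(context.Context, ChangeEvent) error,
) error {
	events := make(chan ChangeEvent, 1024) // bounded: this is the backpressure point

	go func() {
		defer close(events)
		for {
			ev, err := read(ctx)
			if err != nil {
				return
			}
			select {
			case events <- ev: // blocks when the indexer falls behind
			case <-ctx.Done():
				return
			}
		}
	}()

	for ev := range events {
		if err := index(ctx, ev); err != nil {
			return err
		}
	}
	return ctx.Err()
}
```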
JSON performance: why switch from encoding/json to jsoniter
One of the earliest hot spots in services like this is JSON encoding/decoding for documents going into Elasticsearch. The standard library’s encoding/json is correct and convenient, but it trades some performance for safety and reflection-based flexibility.
High-volume services often switch to jsoniter (github.com/json-iterator/go) for a few reasons:
Faster encoding/decoding for common patterns
Less reflection overhead when configured with codegen or field caching
Better throughput when serializing large batches of similar structs
The wins are most visible when:
You are serializing many small documents at high frequency (typical for bulk indexing)
You can keep your types stable and avoid excessive use of interface{} and maps
You care about reducing allocations in the JSON path and shaving microseconds off each object
However, replacing encoding/json is never just a drop-in optimization; it changes behavior in subtle ways, especially around nulls and omitted fields.
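In practice the swap itself is usually a single package-level variable, as in this sketch (the Document type and encode helper are illustrative, not part of the original service):

```go
package indexer

import jsoniter "github.com/json-iterator/go"

// json mimics the standard library's API so the rest of the code base
// can keep calling json.Marshal / json.Unmarshal unchanged.
var json = jsoniter.ConfigCompatibleWithStandardLibrary

// Document is an illustrative Elasticsearch document.
type Document struct {
	ID    string `json:"id"`
	Title string `json:"title,omitempty"`
}

func encode(d Document) ([]byte, error) {
	return json.Marshal(d)
}
```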
Jsoniter compatibility
Jsoniter ships several configurations; one of them, ConfigCompatibleWithStandardLibrary, aims to produce the same payload as the standard library.
There are, however, some edge cases. Here is one I ran into:
Jsoniter does not seem to handle the "omitzero" JSON tag well with libraries like guregu/null.v4. Prefer "omitempty" to get the same behavior, since jsoniter relies on the type's Valid flag.
All in all, it is close to a drop-in replacement, but adding tests to make sure you have no side effects in a complex system makes a big difference.
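One cheap guardrail is a test that marshals representative documents with both encoders and compares the payloads. This is a sketch, assuming a struct that uses guregu/null.v4 with omitempty tags; adapt it to your real document types:

```go
package indexer

import (
	stdjson "encoding/json"
	"testing"

	jsoniter "github.com/json-iterator/go"
	"gopkg.in/guregu/null.v4"
)

type doc struct {
	ID    string      `json:"id"`
	Title null.String `json:"title,omitempty"`
}

// TestJSONCompatibility fails if jsoniter and encoding/json disagree on
// the payload for a representative document, including null handling.
func TestJSONCompatibility(t *testing.T) {
	cases := []doc{
		{ID: "1", Title: null.StringFrom("hello")},
		{ID: "2"}, // invalid (null) Title
	}
	fast := jsoniter.ConfigCompatibleWithStandardLibrary
	for _, c := range cases {
		want, err := stdjson.Marshal(c)
		if err != nil {
			t.Fatal(err)
		}
		got, err := fast.Marshal(c)
		if err != nil {
			t.Fatal(err)
		}
		if string(got) != string(want) {
			t.Errorf("payload mismatch: std=%s jsoniter=%s", want, got)
		}
	}
}
```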
Controlling allocations with sync.Pool
Once the JSON serialization hot path is reasonably efficient, memory allocations often become the next bottleneck. Every replication event that flows through your service may involve:
Allocating a struct to represent the change
Allocating buffers for JSON encoding
Allocating intermediate slices and maps during transformations
Under sustained load, this can produce a flood of short-lived objects. The garbage collector has to scan and reclaim them, and that work shows up as CPU usage and latency spikes.
sync.Pool is a practical tool for these patterns:
It lets you reuse objects (structs, buffers, small slices) across requests without manual “object lifecycle” tracking
Objects in the pool are eligible for garbage collection if they are not in use, so the pool does not create a permanent memory leak
For hot types (e.g., your “replication event” struct or reusable []byte buffers for JSON encoding), pooling can significantly reduce the number of allocations per event
Good use cases in this pipeline include:
Reusing buffers for building bulk requests (e.g., bytes.Buffer or []byte)
Reusing small structs that hold metadata for each change event
Reusing temporary scratch space used during transformations
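As an illustration, pooling the buffer used to assemble bulk request bodies might look like this sketch (the buildBulkBody helper is hypothetical):

```go
package indexer

import (
	"bytes"
	"sync"
)

// bufPool hands out reusable buffers for building bulk request bodies.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// buildBulkBody concatenates pre-encoded operations into a single bulk
// payload, reusing a pooled buffer instead of allocating one per batch.
func buildBulkBody(ops [][]byte) []byte {
	buf := bufPool.Get().(*bytes.Buffer)
	defer func() {
		buf.Reset() // leave the buffer clean before returning it
		bufPool.Put(buf)
	}()

	for _, op := range ops {
		buf.Write(op)
		buf.WriteByte('\n') // the bulk API expects newline-delimited JSON
	}

	// Copy the result out: the buffer's backing array goes back to the pool.
	out := make([]byte, buf.Len())
	copy(out, buf.Bytes())
	return out
}
```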
Some practical guidelines:
Only pool objects that are frequently allocated and easy to reset to a zero state
Add helper methods like Reset() so the code path that returns an object to the pool always leaves it in a clean state
Avoid pooling objects that embed contexts, locks, or anything with complex lifecycle or ownership semantics
Used carefully, sync.Pool can cut heap allocations dramatically in a high-throughput Go service, which brings GC frequency and pause times down.
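A pooled event struct with a Reset() helper might look like the following sketch; the fields are placeholders for whatever your change events actually carry:

```go
package indexer

import "sync"

// event is a hypothetical per-change scratch struct, pooled because it
// would otherwise be allocated for every replication message.
type event struct {
	Table string
	Op    string
	Doc   []byte
}

// Reset returns the struct to a clean state so a pooled instance never
// leaks data from a previous change into the next one.
func (e *event) Reset() {
	e.Table = ""
	e.Op = ""
	e.Doc = e.Doc[:0] // keep the backing array, drop the contents
}

var eventPool = sync.Pool{
	New: func() any { return new(event) },
}

func getEvent() *event  { return eventPool.Get().(*event) }
func putEvent(e *event) { e.Reset(); eventPool.Put(e) }
```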
Garbage collection tuning and experimental GCs
Even after optimizing allocations, GC behavior remains critical in long-lived services under high load.
Starting with Go 1.25, you can enable an experimental garbage collector at build time that promises better performance (https://go.dev/blog/greenteagc). Among other things, the idea is to:
Reduce GC-induced latency spikes in services that care more about throughput and tail latency than about absolute minimal memory usage
Provide more even performance under bursts by scheduling GC work more smoothly over time
In a pipeline that must keep up with a replication slot and a bulk indexer, that trade-off is often desirable:
Slightly higher steady-state memory usage is acceptable if it avoids GC pauses that temporarily slow down ingestion
Less erratic latency helps keep Elasticsearch batches flowing and prevents backpressure from building up
However, tuning or switching GC behavior should be the last step, not the first. It works best when:
Allocations in hot paths have already been reduced with pooling, pre-allocation, and careful data structures
JSON and other serialization work has been profiled and streamlined
The service has clear SLOs around memory and latency, and you are comfortable trading one for the other within defined bounds
GC tweaks can then be used to shift the balance slightly rather than to compensate for fundamentally inefficient code.
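For reference, a minimal sketch of those knobs; the GOEXPERIMENT flag comes from the Go 1.25 release, while the GOGC and memory-limit values below are placeholders meant to be derived from real profiles:

```go
package indexer

import "runtime/debug"

// Build with the experimental collector enabled, e.g.:
//   GOEXPERIMENT=greenteagc go build ./...
//
// The runtime knobs below can also be set through the GOGC and GOMEMLIMIT
// environment variables instead of code.
func tuneGC() {
	// Trade a bit more steady-state heap for fewer GC cycles.
	// 100 is the default; the right value comes from profiling.
	debug.SetGCPercent(200)

	// Hard ceiling so "more heap" never becomes "unbounded heap";
	// 2 GiB here is a placeholder, not a recommendation.
	debug.SetMemoryLimit(2 << 30)
}
```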
Putting it together: a stable, high‑volume change pipeline
When all of these optimizations come together, the architecture looks like this:
A controlled number of goroutines read from the replication slot and push events through a bounded internal queue
Each event passes through transformation and enrichment code that avoids unnecessary allocations and uses sync.Pool for reusable buffers and structs
JSON serialization uses a faster encoder like jsoniter
A bulk indexer sends batched operations to Elasticsearch with size and concurrency tuned to avoid large in-memory batches but still keep the cluster saturated
GC behavior is tuned only after profiling, to smooth out remaining latency spikes without blowing out memory usage
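Tying those pieces together, configuring the bulk indexer along these lines might look like this sketch; it assumes the official go-elasticsearch client, and the index name and sizing numbers are illustrative:

```go
package indexer

import (
	"bytes"
	"context"

	"github.com/elastic/go-elasticsearch/v8"
	"github.com/elastic/go-elasticsearch/v8/esutil"
)

// newBulkIndexer wires up esutil.BulkIndexer with bounded batch size and
// concurrency, so memory stays predictable while the cluster stays busy.
// The numbers are placeholders to be tuned against real traffic.
func newBulkIndexer(es *elasticsearch.Client) (esutil.BulkIndexer, error) {
	return esutil.NewBulkIndexer(esutil.BulkIndexerConfig{
		Client:     es,
		Index:      "documents",
		NumWorkers: 4,               // concurrent bulk requests
		FlushBytes: 5 * 1024 * 1024, // cap per-batch memory at ~5 MiB
	})
}

// indexDoc queues a single document; the indexer batches and flushes it.
func indexDoc(ctx context.Context, bi esutil.BulkIndexer, id string, doc []byte) error {
	return bi.Add(ctx, esutil.BulkIndexerItem{
		Action:     "index",
		DocumentID: id,
		Body:       bytes.NewReader(doc),
	})
}
```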
The result is a Go service that can keep up with a constant stream of database changes, avoid unbounded buffering, and make efficient use of CPU and memory—all while staying within the operational constraints of Postgres and Elasticsearch.