S3 Pricing for Streaming Media: What Storage and Egress Actually Cost

Open an S3 bill for a streaming catalogue and you see one number. That single figure hides three independent cost mechanisms — storage, egress, and requests — and each is driven by a different part of your pipeline. Treating the bill as fixed transport overhead is the mistake that keeps teams optimising the wrong thing for months.

The naive model is simple: store every rendition in S3 Standard, watch the monthly total, and assume it scales with catalogue size. It rarely does. For a segmented HLS or DASH catalogue serving millions of streams, the storage line is often the smallest of the three. The dominant cost is usually egress — bytes leaving S3 toward a CDN origin or directly to viewers — followed by request charges generated by the way segmented playback fragments every title into thousands of small objects. Storage of cold, rarely-streamed renditions tends to come last.

That ordering matters because each mechanism has a different lever. If egress dominates, the fix is CDN offload and origin-shield configuration, not lifecycle policy. If storage dominates, the fix is a storage-class strategy or pruning low-value renditions upstream in transcoding. If requests dominate, the fix lives in segment duration and manifest design. A bill read as a single number gives you none of this. A bill decomposed into its three drivers tells you exactly which knob to turn.

How Does S3 Pricing Work for a Video Catalogue?

S3 charges along three axes that are billed independently, and conflating them is the root of most cost confusion.

The first is storage — a per-GB-month rate that depends on the storage class. S3 Standard is the default; Standard-Infrequent Access (Standard-IA) costs less per GB but adds a retrieval fee and a minimum storage duration; Glacier and Glacier Deep Archive cost dramatically less per GB but introduce retrieval latency measured in minutes to hours. Per AWS’s published pricing, the per-GB-month delta between Standard and Glacier Deep Archive is roughly an order of magnitude or more, which is why archive tiers matter for a long-tail catalogue.

The second is data transfer out (egress) — a per-GB rate charged when bytes leave S3 to the public internet or to another region. This is the axis teams underestimate. Every byte a viewer streams that is served from S3 origin — rather than absorbed by a CDN edge cache — is billed egress.

The third is requests — a small per-1,000 charge on GET, PUT, and other operations. It looks negligible until you remember that a single HLS playback session with six-second segments fetches hundreds of objects, and a catalogue of millions of sessions multiplies that into a meaningful line item.

The reason the bill confuses people is that these three move on different signals. Storage tracks catalogue size and rendition count. Egress tracks viewer demand and CDN cache-hit ratio. Requests track segment granularity and session volume. They do not rise and fall together, so a single “cost per GB stored” intuition is structurally wrong.
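As a minimal sketch of that decomposition, the snippet below splits one month of usage into the three axes and reports which one is largest. The rates are placeholder assumptions rather than quoted prices, and `MonthlyUsage`, `decompose_bill`, and `dominant_axis` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# Placeholder USD rates: assumptions for illustration, not quoted prices.
STORAGE_PER_GB_MONTH = 0.023
EGRESS_PER_GB = 0.09
GET_PER_1000 = 0.0004


@dataclass
class MonthlyUsage:
    stored_gb: float    # average GB held during the month
    egress_gb: float    # GB transferred out of S3
    get_requests: int   # GET requests served by S3


def decompose_bill(usage: MonthlyUsage) -> dict:
    """Split one month of usage into the three independently billed axes."""
    return {
        "storage": usage.stored_gb * STORAGE_PER_GB_MONTH,
        "egress": usage.egress_gb * EGRESS_PER_GB,
        "requests": usage.get_requests / 1000 * GET_PER_1000,
    }


def dominant_axis(bill: dict) -> str:
    """Return the name of the axis with the largest cost."""
    return max(bill, key=bill.get)


usage = MonthlyUsage(stored_gb=50_000, egress_gb=200_000, get_requests=2_000_000_000)
bill = decompose_bill(usage)
for axis, cost in bill.items():
    print(f"{axis:9s} ${cost:,.2f}")
print("dominant:", dominant_axis(bill))
```

With these made-up inputs egress is the largest line by a wide margin, even though the catalogue is 50 TB. Change the inputs and the ordering changes, which is the point of decomposing before acting.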

Which S3 Cost Component Usually Dominates?

For most streaming catalogues we have looked at, egress and requests dominate over raw storage — but the ratio depends heavily on whether a CDN sits in front of the origin (observed pattern across media engagements; not a published benchmark). The table below is a decision rubric for reading your own bill, not a fixed result.

S3 Cost Driver → Dominant Lever

If the dominant cost is…	The driver is usually…	The lever is…	Where the lever lives
Egress (data transfer out)	Low CDN cache-hit ratio; origin serving viewer bytes	CDN offload, origin shield, longer cache TTLs	Delivery / CDN config
Requests (GET volume)	Short segment duration on a large session count	Longer segment duration, manifest tuning	Packaging / segmentation
Storage — hot tier	Many active renditions in S3 Standard	Rendition pruning at transcode time	Transcoding pipeline
Storage — cold tier	Long-tail catalogue held in Standard	Lifecycle transition to IA / Glacier	Lifecycle policy

The diagnostic discipline is to attribute the bill before naming the lever. If you cut renditions to save storage when egress was the real driver, you degrade quality and barely move the bill. This is the same attribution-before-action principle behind how an AI inference cost audit finds the real bottleneck: you measure where the money goes before you change anything. It also helps to treat cost, efficiency, and value as separate quantities rather than reading a single spend figure. A high egress bill that serves a high-engagement catalogue is value, not waste, and the model has to separate the two.
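The rubric can be encoded as a small lookup, so that the lever is only named after the attribution exists. This is a sketch: the dictionary keys and the `name_lever` function are hypothetical, and the cost figures in the usage lines are invented.

```python
# Rubric rows copied from the table above: driver -> (lever, where it lives).
LEVERS = {
    "egress": ("CDN offload, origin shield, longer cache TTLs", "Delivery / CDN config"),
    "requests": ("Longer segment duration, manifest tuning", "Packaging / segmentation"),
    "storage_hot": ("Rendition pruning at transcode time", "Transcoding pipeline"),
    "storage_cold": ("Lifecycle transition to IA / Glacier", "Lifecycle policy"),
}


def name_lever(costs: dict) -> tuple:
    """Pick the largest attributed cost and return (driver, lever, owner)."""
    unknown = set(costs) - set(LEVERS)
    if unknown:
        raise ValueError(f"unattributed cost categories: {sorted(unknown)}")
    driver = max(costs, key=costs.get)
    lever, owner = LEVERS[driver]
    return driver, lever, owner


# Invented monthly figures in USD, already attributed to the four rows.
attributed = {"egress": 18_000, "requests": 800, "storage_hot": 700, "storage_cold": 450}
driver, lever, owner = name_lever(attributed)
print(f"{driver}: {lever} ({owner})")
```

The function refuses unattributed categories on purpose: a cost that has not been mapped to one of the rows cannot yet point at a lever.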

How Do Storage Classes Map to Hot vs Cold Renditions?

Storage-class selection is an access-frequency decision, not a file-type decision. The question is not “is this a video?” but “how often is this specific object streamed?”

Hot renditions — the bitrate ladders for currently-promoted titles, recent episodes, anything in a recommendation carousel — belong in S3 Standard. They are read constantly, and IA’s per-retrieval fee plus minimum-duration charge would cost more than Standard for frequently-accessed objects.

Cold renditions are different. A back-catalogue title streamed a handful of times a month, or a rarely-selected high-bitrate rendition that most clients never request, is a candidate for Standard-IA. The long tail that is almost never streamed — archival masters, deprecated renditions kept for compliance — is where Glacier Deep Archive earns its place, accepting retrieval latency in exchange for a per-GB cost a fraction of Standard.

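The mistake is treating the whole catalogue as one tier. S3 lifecycle rules transition objects by age since creation, not by last access, so an age-based rule (to IA after 30 days, to Glacier after a longer window) fits catalogues where demand decays predictably after release. Where access is less predictable, S3 Intelligent-Tiering moves objects to a cheaper tier after 30 consecutive days without access, in exchange for a per-object monitoring fee. Either way, a large share of catalogue bytes can move to cheaper tiers without touching the hot path. The share eligible for transition is itself a metric worth instrumenting: it tells you the upper bound on what a storage-class strategy can save before you build it.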
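The hot-versus-cold boundary can be approximated with a break-even calculation. The sketch below compares the monthly cost of one stored GB in Standard and in Standard-IA as a function of how many times that GB is read from the origin per month. The rates are placeholder assumptions, `reads_per_month` is a hypothetical input, and the model deliberately ignores IA's minimum storage duration, minimum billable object size, and its higher request price.

```python
# Placeholder USD rates: assumptions for illustration only.
STANDARD_PER_GB_MONTH = 0.023
IA_PER_GB_MONTH = 0.0125
IA_RETRIEVAL_PER_GB = 0.01


def standard_cost_per_gb(reads_per_month: float) -> float:
    """Standard has no per-GB retrieval fee, so reads do not change the rate."""
    return STANDARD_PER_GB_MONTH


def ia_cost_per_gb(reads_per_month: float) -> float:
    """Standard-IA adds a per-GB retrieval fee for every origin read."""
    return IA_PER_GB_MONTH + IA_RETRIEVAL_PER_GB * reads_per_month


def break_even_reads() -> float:
    """Origin reads per month at which the two classes cost the same."""
    return (STANDARD_PER_GB_MONTH - IA_PER_GB_MONTH) / IA_RETRIEVAL_PER_GB


for reads in (0.1, 1.0, 5.0):
    cheaper = "Standard-IA" if ia_cost_per_gb(reads) < standard_cost_per_gb(reads) else "Standard"
    print(f"{reads:>4} origin reads/month -> {cheaper}")
```

Note that the relevant count is origin reads, not viewer streams. Behind a CDN, only cache misses reach S3, so a rendition can be streamed more often than the break-even figure and still be cheaper in IA.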

How Does Segmented HLS/DASH Affect Storage and Request Cost?

Segmentation is the part of the pipeline that quietly inflates both object count and request volume. A single two-hour film encoded as a six-rendition bitrate ladder and segmented into six-second chunks produces about 7,200 segment objects (1,200 segments × 6 renditions) plus manifests, not one file. Shorter segments, or separate audio and subtitle tracks, push that count into the tens of thousands.
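The object count follows directly from duration, segment length, and rendition count. The sketch below assumes a video-only ladder with one media playlist per rendition and one master manifest; `object_count` and its parameters are hypothetical names, and real packagers may also emit init segments, audio tracks, and subtitle tracks that raise the total.

```python
import math


def object_count(duration_s: int, segment_s: int, renditions: int,
                 playlists_per_rendition: int = 1, master_manifests: int = 1) -> int:
    """Count S3 objects for one title: segments plus playlists plus master."""
    segments = math.ceil(duration_s / segment_s)
    return renditions * (segments + playlists_per_rendition) + master_manifests


two_hours = 2 * 60 * 60
print(object_count(two_hours, segment_s=6, renditions=6))  # 7207
print(object_count(two_hours, segment_s=2, renditions=6))  # 21607
```

The count is very sensitive to segment duration: the same film at two-second segments produces three times as many objects as at six-second segments.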

That has two cost consequences. On storage, many small objects complicate lifecycle transitions: Standard-IA bills every object as at least 128 KB and for at least 30 days, and the Glacier classes add per-object metadata overhead and longer minimum durations, so small segments can cost more after a transition than before it. On requests, every playback session issues a GET for each segment it plays, so request charges scale with sessions × segments-per-session rather than with catalogue size.

This is where transcoding decisions and storage cost intersect. The number of renditions you generate, the segment duration you choose, and the bitrate ladder you commit to are all set upstream — and they determine your S3 object count before a single viewer arrives. Shorter segments improve adaptive-switching responsiveness but raise request cost; longer segments cut requests but coarsen quality adaptation. That trade-off is the same family of decisions covered in how video transcoding cost and quality trade-offs actually work, and it is why storage-and-delivery economics cannot be optimised in isolation from the encode ladder. If you want to understand how the ladder itself is constructed, the practical guide to bitrate, quality, and streaming cost covers the rendition side directly.
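To make the request side of that trade-off concrete, the sketch below estimates monthly GET charges for the same audience at three segment durations. The GET rate is a placeholder assumption, `origin_share` is a hypothetical parameter for the fraction of segment requests that actually reach S3 (1.0 means no CDN absorbs them), and manifest refreshes are ignored.

```python
import math

GET_PER_1000 = 0.0004  # placeholder USD rate, an assumption for illustration


def gets_per_session(watch_s: int, segment_s: int) -> int:
    """One GET per segment played during the session."""
    return math.ceil(watch_s / segment_s)


def monthly_get_cost(sessions: int, watch_s: int, segment_s: int,
                     origin_share: float = 1.0) -> float:
    """Request charge scales with sessions x segments-per-session."""
    gets = sessions * gets_per_session(watch_s, segment_s) * origin_share
    return gets / 1000 * GET_PER_1000


sessions = 10_000_000   # invented monthly session count
watch_s = 30 * 60       # invented average watch time: 30 minutes
for segment_s in (2, 6, 10):
    cost = monthly_get_cost(sessions, watch_s, segment_s)
    print(f"{segment_s:>2}s segments: ${cost:,.2f}")
```

Catalogue size does not appear anywhere in the calculation. Only session volume, watch time, and segment duration do, which is why this line item belongs to packaging rather than to storage.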

When Does CDN Egress Offload Actually Reduce Cost?

CDN offload reduces S3 egress when the CDN serves bytes from its edge cache instead of pulling them from origin. The mechanism is the cache-hit ratio: a byte served from edge cache is not billed as S3 data transfer out at all.

The trade-off is not free. A CDN charges its own egress and request fees, and adds origin-shield and cache-configuration complexity. The model that matters is the comparison between S3-direct egress cost and CDN-delivered cost at your actual cache-hit ratio. For a catalogue with high concentration, where a few titles drive most demand, cache-hit ratios are high, origin pulls are rare, and S3 egress drops to a thin slice serving cold misses. For a flat, long-tail demand curve where every title is streamed occasionally, cache-hit ratios fall, origin pulls rise, and the offload benefit shrinks. One pairing changes the arithmetic: AWS does not bill S3 data transfer out for origin fetches by Amazon CloudFront, so in that setup the per-GB delivery charge sits on the CloudFront side.
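A simple version of that comparison is sketched below. It assumes a CDN whose origin pulls are billed as ordinary S3 data transfer out, uses placeholder per-GB rates, and leaves out CDN request fees and origin-shield charges. The function names are hypothetical.

```python
def s3_direct_cost(viewer_gb: float, s3_egress_rate: float) -> float:
    """Every streamed byte is billed as S3 data transfer out."""
    return viewer_gb * s3_egress_rate


def cdn_delivered_cost(viewer_gb: float, hit_ratio: float,
                       s3_egress_rate: float, cdn_egress_rate: float) -> float:
    """Cache misses pay S3 egress; all viewer bytes pay CDN egress."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    origin_pull_gb = viewer_gb * (1.0 - hit_ratio)
    return origin_pull_gb * s3_egress_rate + viewer_gb * cdn_egress_rate


def break_even_hit_ratio(s3_egress_rate: float, cdn_egress_rate: float) -> float:
    """Hit ratio above which CDN delivery is cheaper than S3-direct."""
    return cdn_egress_rate / s3_egress_rate


S3_RATE, CDN_RATE = 0.09, 0.04   # placeholder USD per GB, assumptions only
viewer_gb = 100_000              # invented monthly viewer traffic

print(f"direct:      ${s3_direct_cost(viewer_gb, S3_RATE):,.2f}")
for hit_ratio in (0.95, 0.40):
    cost = cdn_delivered_cost(viewer_gb, hit_ratio, S3_RATE, CDN_RATE)
    print(f"CDN at {hit_ratio:.0%}: ${cost:,.2f}")
print(f"break-even hit ratio: {break_even_hit_ratio(S3_RATE, CDN_RATE):.2f}")
```

Under these assumptions the concentrated catalogue (95% hit ratio) pays roughly half the direct cost, while the flat catalogue (40% hit ratio) pays slightly more than serving from S3 directly, because it pays both egress charges on most bytes.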

The practical point: egress cost depends on viewer demand shape, not just volume. Two catalogues of identical size and identical total streams can have wildly different egress bills depending on how concentrated their demand is. Profiling that demand shape — not just total bytes — is what reveals whether CDN tuning or storage strategy is your real lever. Sizing the origin and edge capacity for the steady-state demand you actually see, rather than peak, follows the same logic as production capacity planning for steady-state load: you model the sustained curve, not the burst.

What Should You Instrument Before Optimising?

You cannot optimise a bill you have not decomposed. Before changing anything, instrument the four quantities that map S3 charges to viewer behaviour:

Storage cost-per-GB-month by storage class: how much of the catalogue already sits in Standard versus IA versus Glacier, and what the blend costs.
Egress cost-per-stream to CDN origin: origin-pull bytes times the egress rate, divided by streams served, which exposes cache-hit ratio indirectly.
Request charges per playback session: GET volume per session times the per-request rate, which surfaces segment-granularity cost.
Share of catalogue eligible for IA or Glacier transition: the access-frequency distribution that bounds what lifecycle policy can save.

These four turn one bill into three independent optimisation problems, each with its own owner. Storage class is a lifecycle-policy decision. Egress is a CDN-and-delivery decision. Requests are a packaging decision. And rendition count — the input that drives storage and request volume both — is a transcoding decision. Attributing spend across these is exactly the scope of a focused cost sprint; the storage-and-delivery surface is one slice of the broader cost-per-stream picture our inference cost-cut engagement profiles before naming a dominant lever. For broadcast and streaming teams, this attribution work is where our media and broadcast practice starts a cost engagement.
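The four quantities can be computed from a handful of monthly totals. The sketch below is one possible shape for that calculation: the function name, parameter names, rates, and input figures are all assumptions for illustration, and in practice the inputs would come from billing exports, CDN logs, and S3 access metrics.

```python
def instrument(*, storage_gb_by_class: dict, rate_by_class: dict,
               origin_pull_gb: float, egress_rate: float, streams: int,
               get_requests: int, get_rate_per_1000: float, sessions: int,
               eligible_gb: float) -> dict:
    """Compute the four pre-optimisation metrics from monthly totals."""
    total_gb = sum(storage_gb_by_class.values())
    storage_cost = sum(gb * rate_by_class[cls] for cls, gb in storage_gb_by_class.items())
    return {
        "blended_storage_cost_per_gb_month": storage_cost / total_gb,
        "egress_cost_per_stream": origin_pull_gb * egress_rate / streams,
        "request_cost_per_session": get_requests / sessions / 1000 * get_rate_per_1000,
        "share_eligible_for_transition": eligible_gb / total_gb,
    }


metrics = instrument(
    storage_gb_by_class={"STANDARD": 40_000, "STANDARD_IA": 10_000},
    rate_by_class={"STANDARD": 0.023, "STANDARD_IA": 0.0125},  # placeholder rates
    origin_pull_gb=20_000, egress_rate=0.09, streams=10_000_000,
    get_requests=3_000_000_000, get_rate_per_1000=0.0004, sessions=10_000_000,
    eligible_gb=15_000,
)
for name, value in metrics.items():
    print(f"{name}: {value:.6f}")
```

Tracking these as per-stream and per-session figures, rather than as monthly totals, keeps them comparable as the audience grows.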

FAQ

How does S3 pricing work, and what does it mean in practice?

S3 bills along three independent axes: storage (per GB-month, varying by storage class), data transfer out / egress (per GB leaving S3), and requests (a small per-1,000 charge on operations like GET). In practice this means a streaming bill is not one number but three mechanisms that move on different signals — storage tracks catalogue size, egress tracks viewer demand and CDN cache-hit ratio, and requests track segment granularity and session volume.

What are the main S3 cost components for a streaming catalogue — storage, egress, and requests — and which usually dominates?

Storage, egress, and requests are the three components. For most streaming catalogues we have looked at, egress and requests dominate over raw storage, though the ratio depends heavily on whether a CDN sits in front of the origin (observed pattern, not a published benchmark). The discipline is to attribute the bill across all three before naming a lever, because cutting the wrong one moves the total very little.

How do S3 storage classes (Standard, Infrequent Access, Glacier) map to hot vs cold video renditions?

Storage class is an access-frequency decision. Hot renditions — current titles, recommendation-carousel content — belong in S3 Standard because they are read constantly. Cold renditions stream rarely and suit Standard-IA, and the long tail that is almost never streamed suits Glacier Deep Archive, which trades retrieval latency for a per-GB cost a fraction of Standard.

How does storing multiple bitrate renditions and segmented HLS/DASH manifests affect S3 storage and request costs?

A single title encoded as a multi-rendition ladder and segmented into short chunks produces thousands to tens of thousands of small objects, not one file. This inflates storage object count (and complicates lifecycle transitions, since IA and Glacier assume larger objects) and inflates request charges, which scale with sessions times segments-per-session rather than with catalogue size.

When does CDN egress offload reduce S3 transfer cost, and how do you model the trade-off?

CDN offload reduces S3 egress when the CDN serves bytes from its edge cache instead of pulling them from origin — a cache hit is not billed as S3 data transfer out. The trade-off depends on cache-hit ratio, which depends on demand shape: concentrated demand (a few titles dominate) yields high hit ratios and large savings, while flat long-tail demand yields more origin pulls and a smaller benefit. Model the comparison at your actual cache-hit ratio, not on volume alone.

How does S3 cost relate to transcoding decisions — does pruning low-value renditions reduce storage spend?

Yes, but only if storage is the dominant driver. Rendition count is set upstream in transcoding and determines S3 object count and storage footprint before any viewer arrives, so pruning low-value renditions reduces both storage and request volume. If egress dominates instead, pruning degrades quality while barely moving the bill — which is why you attribute the cost before choosing the lever.

What S3 metrics should you instrument before optimising streaming storage cost?

Instrument four quantities: storage cost-per-GB-month by storage class, egress cost-per-stream to CDN origin, request charges per playback session, and the share of catalogue eligible for IA or Glacier transition. Together these decompose one bill into three independent optimisation problems — lifecycle policy, CDN delivery, and packaging — each with a clear owner.

How do S3 storage costs compare to S3 Glacier Deep Archive for long-tail catalogue renditions that are rarely streamed?

Per AWS’s published pricing, the per-GB-month cost of Glacier Deep Archive is roughly an order of magnitude or more below S3 Standard. For long-tail renditions that are almost never streamed (archival masters, deprecated renditions kept for compliance), that delta makes Deep Archive the right tier, provided you can tolerate retrieval latency measured in hours and a 180-day minimum storage duration.

How does S3 data-transfer-out pricing differ when egress goes directly to viewers versus through a CDN origin pull?

Direct-to-viewer egress bills every streamed byte as S3 data transfer out. Through a CDN, only origin-pull bytes (cache misses) are billed as S3 egress, while edge-served bytes are not — so the S3 egress component shrinks in proportion to the cache-hit ratio. The CDN then charges its own egress and request fees, so the real comparison is S3-direct cost versus CDN-delivered cost at your measured hit ratio.

Where the Real Lever Sits

The lever you reach for should be the one the data names, not the one the bill’s headline suggests. A streaming S3 bill that profiles cleanly into storage, egress, and requests turns “our storage costs too much” into a specific, answerable question: is the spend driven by cold renditions held in the wrong class, by bytes leaving the origin because the CDN is not absorbing them, or by a segmentation scheme that multiplies request volume? Those three answers point at three different teams and three different fixes. The failure mode is acting before the attribution exists — pruning renditions to cut a bill that egress was driving, and degrading quality for nothing.