Modern fetch and 3 ways to get Buffer output from aws-sdk v3 s3 GetObjectCommand

Many people have trouble with the AWS SDK for JavaScript v3's GetObjectCommand when trying to get Buffer output from s3. In this post, I will cover why this happens, the solutions, and some related background. This post is closely related to issue #1877.

TLDR: For those who came here for a version that just works, here it is. This is the working version for nodejs. If you are running in the browser, see the details in the rest of the post.

import {GetObjectCommand, S3Client} from '@aws-sdk/client-s3'
import type {Readable} from 'stream'

const s3Client = new S3Client({
    apiVersion: '2006-03-01',
    region: 'us-west-2',
    credentials: {
        accessKeyId: '<access key>',
        secretAccessKey: '<access secret>',
    }
})
// wrapped in an async function so that `await` and `return` are valid
// (getObjectBuffer is just an illustrative name)
const getObjectBuffer = async (): Promise<Buffer> => {
    const response = await s3Client
        .send(new GetObjectCommand({
            Key: '<key>',
            Bucket: '<bucket>',
        }))
    const stream = response.Body as Readable

    // if you are using node version >= 17.5.0, this one-liner replaces the Promise below
    // return Buffer.concat(await stream.toArray())

    // if you are using node version < 17.5.0
    return new Promise<Buffer>((resolve, reject) => {
        const chunks: Buffer[] = []
        stream.on('data', chunk => chunks.push(chunk))
        stream.once('end', () => resolve(Buffer.concat(chunks)))
        stream.once('error', reject)
    })
}

JavaScript version (CommonJS)

const {GetObjectCommand, S3Client} = require('@aws-sdk/client-s3')

const s3Client = new S3Client({
    apiVersion: '2006-03-01',
    region: 'us-west-2',
    credentials: {
        accessKeyId: '<access key>',
        secretAccessKey: '<access secret>',
    }
})
// wrapped in an async function so that `await` and `return` are valid
// (getObjectBuffer is just an illustrative name)
const getObjectBuffer = async () => {
    const response = await s3Client
        .send(new GetObjectCommand({
            Key: '<key>',
            Bucket: '<bucket>',
        }))
    const stream = response.Body

    // if you are using node version >= 17.5.0, this one-liner replaces the Promise below
    // return Buffer.concat(await stream.toArray())

    // if you are using node version < 17.5.0
    return new Promise((resolve, reject) => {
        const chunks = []
        stream.on('data', chunk => chunks.push(chunk))
        stream.once('end', () => resolve(Buffer.concat(chunks)))
        stream.once('error', reject)
    })
}

Why this post?

Recently, I migrated my storage from AWS S3 to the DigitalOcean Spaces service to save on data transfer costs, which included upgrading the storage adapter for this blog (s3-ghost). At the time of the upgrade, the AWS SDK for JavaScript v3 looked mature enough, so I decided to upgrade to it from v2 as well.

Initially, everything went fine and I released the update. However, 2 days after the release, I realized that my blog was dead (actually, it had been dead for the whole 2 days, until this post). I checked the server log and saw the following error.

The "data" argument must be of type string or an instance of Buffer, TypedArray, or DataView. Received an instance of IncomingMessage

This error happened in the call to the AWS SDK's GetObjectCommand. It turned out that getting Buffer output from the SDK command is not trivial, and there is a lot of interesting information that I want to share in this post and keep for my future reference.


How?

Here is sample code that sends a GetObjectCommand request.

import {GetObjectCommand, S3Client} from "@aws-sdk/client-s3";

const s3Client = new S3Client({
    apiVersion: '2006-03-01',
    region: 'us-west-2',
    credentials: {
        accessKeyId: '<access key>',
        secretAccessKey: '<access secret>',
    }
})
const response = await s3Client
    .send(new GetObjectCommand({
        Key: '<key>',
        Bucket: '<bucket>',
    }))
const body = response.Body

From the official docs of GetObjectCommandOutput.Body, the body's type is Readable | ReadableStream | Blob. Why these 3 types?

GetObjectCommandOutput.Body's type is Readable | ReadableStream | Blob

Let's start digging into the source code of the AWS Javascript v3 sdk.

The @aws-sdk/client-s3 package uses @aws-sdk/node-http-handler (in node) and @aws-sdk/fetch-http-handler (in the browser) as its requestHandler.

In browser environment

Looking at the source code of the SDK's @aws-sdk/fetch-http-handler package, we can see that the SDK uses the global fetch to send network requests.

AWS SDK's fetch-http-handler uses fetch internally

Because the global fetch is used, you need a polyfill if your browser does not support fetch. whatwg-fetch is a common choice of polyfill.

caniuse's fetch api

In the browser, response.body is a ReadableStream.

Standard fetch's response.body's type is ReadableStream

Where does the Blob type come from, and why?

If we look at the source code of the SDK again, we see that when response.body is not available (old browsers or a polyfill), the SDK returns a blob as a workaround.

const hasReadableStream = response.body !== undefined;

// Return the response with buffered body
if (!hasReadableStream) {
    return response.blob().then(/*...*/);
}

If you only target recent browsers, you can ignore the Blob type and cast the output type to Readable | ReadableStream in TypeScript.

Previously, response.body was not supported in many browsers

In node environment

In the node environment, @aws-sdk/node-http-handler is used to send network requests. From the source code of the package:

@aws-sdk/node-http-handler source code

Suppose the request does not use SSL, so nodejs's http module is used. Following the source code, GetObjectCommandOutput.Body is assigned the http.IncomingMessage instance that is passed as the first parameter of the callback for the 'response' event on the ClientRequest class.

IncomingMessage extends stream.Readable, which is why we get the Readable type for GetObjectCommandOutput.Body.

This also explains the Received an instance of IncomingMessage error that I hit initially, as described earlier in this post.
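Both points can be checked in a few lines. The sketch below is only an illustration: it uses crypto's Hash.update as a stand-in for "an API that expects bytes", which is an assumption (the call that actually failed in the storage adapter is not shown in this post), and triggerDataArgumentError is a hypothetical helper name.

```typescript
import { createHash } from 'crypto'
import { IncomingMessage } from 'http'
import { Readable } from 'stream'

// IncomingMessage inherits from stream.Readable, so this evaluates to true.
const incomingMessageIsReadable = IncomingMessage.prototype instanceof Readable

// Hypothetical helper: hands a stream to an API that expects bytes
// and returns the error code node reports.
// Hash.update is only a stand-in for whichever API received the body.
const triggerDataArgumentError = (): string | undefined => {
    const stream = Readable.from([Buffer.from('hello')])
    try {
        // The cast silences TypeScript; at runtime node rejects the stream.
        createHash('sha256').update(stream as unknown as string)
        return undefined
    } catch (error) {
        return (error as NodeJS.ErrnoException).code
    }
}

console.log(incomingMessageIsReadable)
console.log(triggerDataArgumentError())
```

The second call reports an invalid "data" argument, the same kind of error as the one in my server log, because a stream is neither a string nor a Buffer.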

When the request uses SSL, the https module is used instead. However, the nodejs docs do not describe the type of the response/request in detail. Most likely, the type is the same as in the http module.


Conclusion

GetObjectCommandOutput.Body type is

  • In node: Readable (or more precisely, a subclass of Readable, namely IncomingMessage).
  • In browser:
    - If the fetch API in your browser does not support response.body, a Blob is returned.
    - Otherwise (the most common case), a ReadableStream is returned.
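Because the type depends on the environment, code that has to run everywhere can inspect the body at runtime before choosing a conversion. Here is a minimal sketch, assuming duck typing is acceptable for your use case. BodyKind and detectBodyKind are hypothetical names, not part of the SDK.

```typescript
// Hypothetical labels for the three documented body types.
type BodyKind = 'node-readable' | 'web-readable-stream' | 'blob' | 'unknown'

// Duck-typed detection, so no node-only or browser-only import is needed.
const detectBodyKind = (body: unknown): BodyKind => {
    if (typeof body !== 'object' || body === null) return 'unknown'
    const candidate = body as { pipe?: unknown; getReader?: unknown; arrayBuffer?: unknown }
    // node's stream.Readable (and so IncomingMessage) exposes pipe()
    if (typeof candidate.pipe === 'function') return 'node-readable'
    // the web ReadableStream exposes getReader()
    if (typeof candidate.getReader === 'function') return 'web-readable-stream'
    // Blob exposes arrayBuffer() but neither pipe() nor getReader()
    if (typeof candidate.arrayBuffer === 'function') return 'blob'
    return 'unknown'
}
```

Calling detectBodyKind(response.Body) then tells you which of the conversions described in the rest of this post applies.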

How to handle the output stream

I will introduce 3 ways: the isomorphic way, the node-only way, and the browser-only way.

The isomorphic method

The trick is to use the Response class.

In the node environment, import it with import {Response} from 'node-fetch'.

In the browser environment, the Response object is available in the global scope. Note: you always need to polyfill fetch if your browser does not support the fetch API natively.

const res = new Response(body)

Response is a very handy class that lets you convert the stream into many types. For example:

// blob type
const blob = await res.blob()

// json
const json = await res.json()

// string
const text = await res.text()

// buffer
const buffer = await res.arrayBuffer() // note: node-fetch's res.buffer() is deprecated

res.arrayBuffer() resolves to an ArrayBuffer both in node (node-fetch) and in the browser (native fetch). In node, wrap the result with Buffer.from(buffer) to get a nodejs Buffer.
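On runtimes where Response and ReadableStream are globals, the same trick works without node-fetch. That covers modern browsers and, as an assumption about your environment, node 18 or later. Here is a minimal sketch with a hand-made stream standing in for response.Body. responseToBytes is a hypothetical helper name.

```typescript
// Hypothetical helper: lets Response do the buffering, then exposes the bytes.
const responseToBytes = async (body: ReadableStream<Uint8Array> | Blob): Promise<Uint8Array> => {
    const arrayBuffer = await new Response(body).arrayBuffer()
    return new Uint8Array(arrayBuffer)
}

// Usage: a small in-memory stream that emits the bytes of "hi" and then closes.
const demoStream = new ReadableStream<Uint8Array>({
    start(controller) {
        controller.enqueue(new Uint8Array([104, 105]))
        controller.close()
    },
})

responseToBytes(demoStream).then(bytes => console.log(new TextDecoder().decode(bytes)))
```

In node, Buffer.from(bytes) turns the result into a Buffer if the rest of your code expects one.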

The node-only way

Use this implementation to convert a Readable to a buffer in the node environment.

import type { Readable } from "stream"

const streamToBuffer = (stream: Readable) => new Promise<Buffer>((resolve, reject) => {
	const chunks: Buffer[] = []
	stream.on('data', chunk => chunks.push(chunk))
	stream.once('end', () => resolve(Buffer.concat(chunks)))
	stream.once('error', reject)
})

If you are using nodejs version >= 17.5.0, Readable.toArray provides a shorter version.

import type { Readable } from "stream"

// the function must be async because it uses await
const streamToBuffer = async (stream: Readable) => Buffer.concat(await stream.toArray())

Note that, at the time of this writing (Feb 13, 202), Readable.toArray is an experimental feature.
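A third variant relies on the fact that Readable streams are async iterable, which should hold for all currently supported node versions. This is a minimal sketch rather than something taken from the SDK docs, and streamToBufferIter is a hypothetical name.

```typescript
import { Readable } from 'stream'

// Hypothetical alternative to streamToBuffer, using async iteration.
// A stream error makes the for-await loop throw, which rejects the returned promise.
const streamToBufferIter = async (stream: Readable): Promise<Buffer> => {
    const chunks: Buffer[] = []
    for await (const chunk of stream) {
        // Streams with an encoding set, or in object mode, may yield strings.
        chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk))
    }
    return Buffer.concat(chunks)
}

// Usage with an in-memory stream standing in for response.Body
streamToBufferIter(Readable.from(['he', 'llo'])).then(buffer => console.log(buffer.toString()))
```

Compared with the event-based version, there are no listeners to register and no separate error handler to forget.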

The browser-only way

Use this implementation to convert a ReadableStream to a buffer in the browser environment.

// Buffer is a subclass of Uint8Array, so it can be used as a ReadableStream's source
// https://nodejs.org/api/buffer.html
export const concatBuffers = (buffers: Uint8Array[]) => {
	const totalLength = buffers.reduce((sum, buffer) => sum + buffer.byteLength, 0)
	const result = new Uint8Array(totalLength)
	let offset = 0
	for (const buffer of buffers) {
		result.set(new Uint8Array(buffer), offset)
		offset += buffer.byteLength
	}
	return result
}

// an async function instead of `new Promise(async ...)`: the original executor
// returned a value without ever calling resolve, so the promise never settled
const streamToBuffer = async (stream: ReadableStream<Uint8Array>): Promise<Uint8Array> => {
	const reader = stream.getReader()
	const chunks: Uint8Array[] = []

	try {
		while (true) {
			const {
				done,
				value
			} = await reader.read()
			if (done) break
			chunks.push(value!)
		}
	} finally {
		// safari (iOS and macOS) doesn't support .releaseLock(), so only call it when it exists
		// https://developer.mozilla.org/en-US/docs/Web/API/ReadableStreamDefaultReader/releaseLock#browser_compatibility
		reader.releaseLock?.()
	}
	return concatBuffers(chunks)
}
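Once the bytes are available as a Uint8Array, the standard TextDecoder turns them into text or JSON. Here is a minimal sketch, assuming the object is UTF-8 encoded. bytesToText and bytesToJson are hypothetical helper names.

```typescript
// Hypothetical helper: decodes UTF-8 bytes into a string.
const bytesToText = (bytes: Uint8Array): string => new TextDecoder('utf-8').decode(bytes)

// Hypothetical helper: parses UTF-8 encoded JSON bytes.
const bytesToJson = (bytes: Uint8Array): unknown => JSON.parse(bytesToText(bytes))

// Usage: 104 and 105 are the character codes of "h" and "i"
console.log(bytesToText(new Uint8Array([104, 105])))
```

TextDecoder is available in browsers and in node, so these helpers also work on a Buffer, which is a Uint8Array subclass.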