Modern fetch and how to get Buffer output from aws-sdk v3 GetObjectCommand

Many people have trouble with GetObjectCommand in the AWS SDK for JavaScript v3 when trying to get Buffer output from S3. In this post, I will cover why this happens, the solutions, and some related information. This post is closely related to issue #1877.

For those who came here for a version that just works, here it is. This version works in Node.js. If you are running in a browser, see the details in the rest of the post.

import {GetObjectCommand, S3Client} from '@aws-sdk/client-s3'
import type {Readable} from 'stream'

const s3Client = new S3Client({
    apiVersion: '2006-03-01',
    region: 'us-west-2',
    credentials: {
        accessKeyId: '<access key>',
        secretAccessKey: '<access secret>',
    }
})
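// Wrapper function (the name is illustrative) so that await and return are valid
const getObjectBuffer = async (): Promise<Buffer> => {
    const response = await s3Client
        .send(new GetObjectCommand({
            Key: '<key>',
            Bucket: '<bucket>',
        }))
    // In Node.js, the body is a Readable stream
    const stream = response.Body as Readable

    // Collect all chunks and join them once the stream ends
    return new Promise<Buffer>((resolve, reject) => {
        const chunks: Buffer[] = []
        stream.on('data', chunk => chunks.push(chunk))
        stream.once('end', () => resolve(Buffer.concat(chunks)))
        stream.once('error', reject)
    })
}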

Why this post?

Recently, I migrated my storage from AWS S3 to DigitalOcean Spaces to save on data transfer costs, which included upgrading the storage adapter for this blog (s3-ghost). At the time of the upgrade, the AWS SDK for JavaScript v3 looked mature enough, so I decided to upgrade to it from v2 as well.

Initially, everything went fine and I released the update. However, two days after the release, I realized that my blog was down (in fact, it had been down for the whole two days leading up to this post). I checked the server log and saw the following error.

The "data" argument must be of type string or an instance of Buffer, TypedArray, or DataView. Received an instance of IncomingMessage

This error occurred in the call to the AWS SDK's GetObjectCommand. It turns out that getting Buffer output from the SDK command is not trivial, and there is a lot of interesting information that I want to share in this post, also for my own future reference.


How?

This is the sample code to send a GetObjectCommand request.

import {GetObjectCommand, S3Client} from "@aws-sdk/client-s3";

const s3Client = new S3Client({
    apiVersion: '2006-03-01',
    region: 'us-west-2',
    credentials: {
        accessKeyId: '<access key>',
        secretAccessKey: '<access secret>',
    }
})
const response = await s3Client
    .send(new GetObjectCommand({
        Key: '<key>',
        Bucket: '<bucket>',
    }))
const body = response.Body

According to the official docs for GetObjectCommandOutput.Body, the body's type is Readable | ReadableStream | Blob. Why these 3 types?

Looking at the source code of the SDK's fetch-http-handler package, we can see that the SDK uses the modern, standard fetch API in the browser. My first guess was that in Node.js it relies on the popular node-fetch package, although, as noted later in this post, I could not confirm this.

The node-fetch package

If you are using webpack and node-fetch version 2, then in a browser environment node-fetch exports the global.fetch object without pulling in any Node.js native modules, so it is safe to import node-fetch even in browser-only code.

If your browser does not support the fetch API natively, you need to polyfill it manually (whatwg-fetch).

From version 3, unfortunately, the maintainers decided to remove browser support, and node-fetch became a Node-only package.

Prior to v3.x, we included a browser field in the package.json file. Since node-fetch is intended to be used on the server, we have removed this field. If you are using node-fetch client-side

node-fetch also switched to being an ES-module-only package, which is a controversial move. Currently, my project has several dependencies (one of them is the @babel/node package) that are stuck with CommonJS modules, which makes it difficult to migrate to ES modules.

From the @babel/node documentation:

ES6-style module-loading may not function as expected
Due to technical limitations ES6-style module-loading is not fully supported in a babel-node REPL.

The Body type

Back to the main topic: I checked the spec of fetch's response.body, which is also the type of GetObjectCommandOutput.Body.

In a Node.js environment, response.body is a Node.js Readable stream.

In the browser, response.body is a ReadableStream.

Note that these are completely different classes with different interfaces and different ways of obtaining the output buffer. In the next section, I will show how to handle them. For now, we can guess why the SDK team chose to switch to the new interface instead of just returning a buffer:

  • Using fetch is the modern, standard way of fetching data. It is less likely to be deprecated in the future.
  • Stream output can be piped into another stream and processed on the fly instead of storing everything in memory.
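
To illustrate the second point, here is a minimal Node.js sketch that processes a stream chunk by chunk without ever holding the whole object in memory. The helper name countBytes and the in-memory source are assumptions made for illustration; in real code the source would be response.Body and the sink would more likely be a file or an HTTP response.

```typescript
import { Readable, Writable } from 'stream'
import { pipeline } from 'stream/promises'

// Hypothetical helper: counts the bytes of a stream without buffering it.
const countBytes = async (source: Readable): Promise<number> => {
    let total = 0
    const sink = new Writable({
        write(chunk: Buffer, _encoding, callback) {
            total += chunk.length // handle each chunk as it arrives
            callback()
        },
    })
    // pipeline wires the streams together and rejects if either one fails
    await pipeline(source, sink)
    return total
}

// Usage with an in-memory stream standing in for response.Body
countBytes(Readable.from([Buffer.from('ab'), Buffer.from('cde')]))
    .then(total => console.log(total)) // 5
```

The same pipeline call could write to fs.createWriteStream(path) instead of the counting sink.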

Where and why does the Blob type come into the output?

If we look again at the SDK's source code, we see that when response.body is not available, the SDK returns a Blob as a workaround for old browsers and polyfills.

const hasReadableStream = response.body !== undefined;

// Return the response with buffered body
if (!hasReadableStream) {
    return response.blob().then(/*...*/);
}

If your browser is recent enough, you can just ignore the Blob type and cast the output type to Readable | ReadableStream if you are using TypeScript.
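
If you would rather not cast, the three types can be told apart at runtime. The following is only a sketch: the names BodyLike, joinChunks, and bodyToBytes are made up for this example, and the checks use duck typing (instead of instanceof) so that nothing Node-specific has to be imported in the browser.

```typescript
// Assumption: a Node.js Readable is treated as an AsyncIterable of byte chunks.
type BodyLike = AsyncIterable<Uint8Array> | ReadableStream<Uint8Array> | Blob

const joinChunks = (chunks: Uint8Array[]): Uint8Array => {
    const result = new Uint8Array(chunks.reduce((sum, c) => sum + c.byteLength, 0))
    let offset = 0
    for (const chunk of chunks) {
        result.set(chunk, offset)
        offset += chunk.byteLength
    }
    return result
}

const bodyToBytes = async (body: BodyLike): Promise<Uint8Array> => {
    // Blob: already buffered, just read it out
    if (typeof (body as Blob).arrayBuffer === 'function') {
        return new Uint8Array(await (body as Blob).arrayBuffer())
    }
    const chunks: Uint8Array[] = []
    if (typeof (body as ReadableStream).getReader === 'function') {
        // Web ReadableStream: pull chunks through a reader
        const reader = (body as ReadableStream<Uint8Array>).getReader()
        while (true) {
            const { done, value } = await reader.read()
            if (done) break
            if (value) chunks.push(value)
        }
    } else {
        // Node.js Readable (or any other async iterable of chunks)
        for await (const chunk of body as AsyncIterable<Uint8Array>) chunks.push(chunk)
    }
    return joinChunks(chunks)
}
```

The result is a Uint8Array in every environment; in Node.js it can be turned into a Buffer with Buffer.from(bytes).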


How to handle the output stream

I will introduce three ways: the isomorphic way, the Node-only way, and the browser-only way.

The isomorphic method

The trick is to use the Response class.

In a Node.js environment, import it with import {Response} from 'node-fetch'.

In the browser environment, the Response object is available in the global scope. Note: you always need to polyfill fetch if your browser does not support it natively.

const res = new Response(body)

Response is a very handy class that lets you convert the stream to many types. For example:

// Note: a body can only be consumed once, so call only ONE of these per Response

// Blob
const blob = await res.blob()

// parsed JSON
const json = await res.json()

// string
const text = await res.text()

// ArrayBuffer (note: node-fetch's res.buffer() is deprecated)
const buffer = await res.arrayBuffer()

res.arrayBuffer() resolves to an ArrayBuffer both in Node.js (node-fetch) and in the browser (native fetch). In Node.js, wrap the result with Buffer.from() if you need a Buffer.
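
If you need a Node.js Buffer no matter which fetch implementation supplies the Response class, a minimal sketch is to wrap the arrayBuffer() result with Buffer.from(). The helper name responseToBuffer is made up for this example, and the usage line assumes a global Response, which is available in Node.js 18 and later.

```typescript
// Hypothetical helper: converts any fetch-style Response into a Node.js Buffer.
const responseToBuffer = async (res: Response): Promise<Buffer> =>
    Buffer.from(await res.arrayBuffer())

// Usage with an in-memory Response standing in for new Response(body)
responseToBuffer(new Response('hello'))
    .then(buffer => console.log(buffer.toString('utf8'))) // hello
```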

If you are worried that installing node-fetch will increase your project size: I initially assumed this would not be a concern, because node-fetch appeared to already be a dependency of the AWS SDK.

In fact, I am still struggling to find a place where the AWS SDK actually uses node-fetch. I could only find a node-fetch entry in yarn.lock, which might be required in development only.

The node only way

Use this implementation to convert a Readable to a Buffer in a Node.js environment.

import type { Readable } from "stream"

const streamToBuffer = (stream: Readable) => new Promise<Buffer>((resolve, reject) => {
	const chunks: Buffer[] = []
	stream.on('data', chunk => chunks.push(chunk))
	stream.once('end', () => resolve(Buffer.concat(chunks)))
	stream.once('error', reject)
})
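
As an alternative sketch, a Node.js Readable is also async iterable, so the same result can be obtained with for await instead of event listeners. The name streamToBufferAsync is made up for this example, and the string check is only there in case an encoding has been set on the stream.

```typescript
import { Readable } from 'stream'

// Hypothetical variant of the helper above, based on async iteration.
const streamToBufferAsync = async (stream: Readable): Promise<Buffer> => {
    const chunks: Buffer[] = []
    for await (const chunk of stream) {
        // Chunks are strings if an encoding was set on the stream; normalize to Buffer
        chunks.push(typeof chunk === 'string' ? Buffer.from(chunk) : chunk)
    }
    return Buffer.concat(chunks)
}

// Usage with an in-memory stream standing in for response.Body
streamToBufferAsync(Readable.from([Buffer.from('foo'), Buffer.from('bar')]))
    .then(buffer => console.log(buffer.toString())) // foobar
```

Errors emitted by the stream surface as a rejected promise, so no explicit 'error' listener is needed in this form.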

The browser only way

Use this implementation to convert a ReadableStream to a byte buffer (Uint8Array) in a browser environment.

// Concatenates Uint8Array chunks into a single Uint8Array.
// Node's Buffer is a subclass of Uint8Array, so Buffer chunks are accepted as well.
// https://nodejs.org/api/buffer.html
export const concatBuffers = (buffers: Uint8Array[]) => {
	const totalLength = buffers.reduce((sum, buffer) => sum + buffer.byteLength, 0)
	const result = new Uint8Array(totalLength)
	let offset = 0
	for (const buffer of buffers) {
		// Copy each chunk into its position in the result
		result.set(buffer, offset)
		offset += buffer.byteLength
	}
	return result
}

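// An async function is used instead of new Promise(async ...): the original
// executor never called resolve, so the returned promise never settled.
const streamToBuffer = async (stream: ReadableStream<Uint8Array>): Promise<Uint8Array> => {
	const reader = stream.getReader()
	const chunks: Uint8Array[] = []

	// Read until the stream reports that it is done
	while (true) {
		const { done, value } = await reader.read()
		if (done) break
		if (value) chunks.push(value)
	}
	return concatBuffers(chunks)
}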

Additional topic: what is the type of the streamed chunks?

Because the source code of node-fetch is available, I will only discuss the node-fetch case.

According to the Node.js docs, the 'data' event emits chunks of type Buffer, string, or any:

chunk <Buffer> | <string> | <any> The chunk of data. For streams that are not operating in object mode, the chunk will be either a string or Buffer. For streams that are in object mode, the chunk can be any JavaScript value other than null.

node-fetch uses Node's http and https modules to send a request and listens for the 'response' event, which delivers an IncomingMessage (this is why I initially got the Received an instance of IncomingMessage error described in this post).

The Node.js documentation does not say anything more specific about the chunk type for IncomingMessage, but since it is a Readable stream that is not in object mode, by the rule quoted above the chunks are Buffers unless an encoding has been set on the stream.
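
The rule quoted above from the Node.js docs can be checked with a small sketch. It uses a PassThrough stream as a stand-in for the response body (an assumption made for illustration, since a real IncomingMessage needs a network request), and the helper name firstChunkType is made up for this example.

```typescript
import { PassThrough } from 'stream'

// Hypothetical helper: reports the type of the first chunk a stream emits.
const firstChunkType = (encoding?: BufferEncoding): Promise<string> =>
    new Promise((resolve, reject) => {
        const stream = new PassThrough()
        // With an encoding set, a non-object-mode stream emits strings
        if (encoding) stream.setEncoding(encoding)
        stream.once('data', chunk => resolve(Buffer.isBuffer(chunk) ? 'Buffer' : typeof chunk))
        stream.once('error', reject)
        stream.end('hello')
    })

firstChunkType().then(type => console.log(type))       // Buffer
firstChunkType('utf8').then(type => console.log(type)) // string
```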