Go and JSON encoding/decoding

After a few years of writing JSON APIs for various purposes, with more success than this article may lead you to believe, I have come to the conclusion that JSON support in the wider Go ecosystem is broken. And it isn’t really Go’s fault, but making it play nice with other very common programming languages is something that hasn’t been given much consideration.

All the problems of JSON encoding and decoding stem from the strictness of the encoding/json package, which imagines an ideal world where every programming language has the same support for common numeric types like int64, or even uint64 (that one is nearly non-portable).

JavaScript has signed 53-bit integers: technically every JS number is an IEEE 754 float64, whose 52-bit mantissa (plus an implicit leading bit) can only represent integers exactly up to 2^53. This means that any int64/uint64 value produced by Go’s standard encoding/json package is inherently unsafe for JavaScript consumers, and anything beyond 2^53 will silently lose precision when parsed into JS numeric types.
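You can reproduce the loss from Go by pushing an int64 just past 2^53 through a float64, which is effectively what JSON.parse does with every number it reads:

package main

import "fmt"

func main() {
	// 2^53 is the last point where a float64 can still represent every
	// integer exactly; JavaScript stores all numbers as float64.
	maxSafe := int64(1) << 53 // 9007199254740992

	a := float64(maxSafe + 1) // rounds back down to 2^53
	b := float64(maxSafe + 2) // representable again

	fmt.Println(int64(a))              // 9007199254740992
	fmt.Println(int64(b))              // 9007199254740994
	fmt.Println(a == float64(maxSafe)) // true: the +1 was silently lost
}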

At this point, you should be doing a double take, since nearly everything written today is JavaScript-based. I’m not just talking about browsers: even Visual Studio Code is written in JavaScript, and so are React Native apps for Android and iOS. Things have gone so far that people regularly cite Atwood’s Law:

Any application that can be written in JavaScript, will eventually be written in JavaScript.

But why should you, as a Go developer, really care about int64 in JSON?

Because numeric values of that size, which are common for database IDs generated by Twitter’s Snowflake, Sony’s Sonyflake, or something as trivial as auto-incrementing columns, need to be JSON-encoded as strings before the standard JSON.parse in browsers and Node.js runtimes can consume them safely.

Now, encoding/json has an allowance built in for encoding numerics as strings: adding a string option to the field’s json tag.

type User struct {
	ID int64 `json:"id,string"`
}

The above example forces the standard library JSON encoder to emit the ID value as a quoted string. Unfortunately, the decoder then also accepts ONLY a quoted string; unquoted values are rejected.
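A quick round-trip shows both halves of that behavior (a self-contained variant of the struct above):

package main

import (
	"encoding/json"
	"fmt"
)

type User struct {
	ID int64 `json:"id,string"`
}

func main() {
	out, _ := json.Marshal(User{ID: 123})
	fmt.Println(string(out)) // {"id":"123"} - quoted on encode

	var u User
	err := json.Unmarshal([]byte(`{"id":123}`), &u)
	fmt.Println(err) // non-nil: the ,string tag rejects the unquoted 123
}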

In the case of JavaScript you’re going to have strings, and strings will also be sent back to the APIs, which is fine. But other languages, loosely or strongly typed, can handle int64 values without any problem. So why not accurately represent the data structures in them with whatever native int64 type they have?

At this point, we’d have to look at protocol buffers. A proto definition may include types like int64 and uint64, and allowances are already made in the protojson package to encode int64 and uint64 values as strings:

	case pref.Int64Kind, pref.Sint64Kind, pref.Uint64Kind,
		pref.Sfixed64Kind, pref.Fixed64Kind:
		// 64-bit integers are written out as JSON string.
		e.WriteString(val.String())

So protojson actually comes with JS compatibility in mind. Similarly, it will also decode int64 and uint64 values, regardless of whether they have been quoted in the encoded JSON. Both of the following are valid inputs for protojson, but accepting both is impossible with the standard encoding/json package:

{"id": 123}
{"id": "123"}

The actual generated code from proto definitions looks somewhat like this:

type User struct {
	// omitted internal fields

	ID int64 `protobuf:"varint,1,opt,name=ID,proto3" json:"ID,omitempty"`
	//...
}

Sadly, the generator doesn’t add a ,string hint to the json: tag, so if you encode this struct with encoding/json, you will produce JSON that’s neither safe for JavaScript nor compatible with protojson output. But you are dealing with protobufs at that point, so you can use the protojson encoder anyway.

If you don’t use protobufs, you’re left with a few options:

  • add a ,string hint to all int64/uint64 fields,
  • don’t use encoding/json for decoding JSON, or
  • use a custom type with a flexible encoder/decoder.

Before we deal with custom types, there are a number of JSON libraries that might let us decode JSON without being strict about quotes:

This package is a drop-in replacement for encoding/json, but it follows the same strictness when encoding and decoding values. The main project goal is to provide a stdlib-compatible package optimized for speed.

json: cannot unmarshal "\"456\"}" into Go struct field main.User.id. of type int64

Verdict: miss.

Another drop-in replacement, and this one provides some configuration options, unfortunately none that would allow reading numbers from both quoted and unquoted JSON. There are, however, functions named RegisterTypeEncoder and RegisterTypeDecoder, which makes me believe that a custom decoder for int64 is possible (see the sketch below). The most promising find so far.
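The RegisterTypeEncoder/RegisterTypeDecoder names match json-iterator/go; assuming that’s the package in question, a flexible int64 decoder could be registered roughly like this (an unverified sketch):

package main

import (
	"fmt"
	"strconv"
	"unsafe"

	jsoniter "github.com/json-iterator/go"
)

type User struct {
	ID int64 `json:"id"`
}

func main() {
	// Accept both 123 and "123" for every int64 field.
	jsoniter.RegisterTypeDecoderFunc("int64", func(ptr unsafe.Pointer, iter *jsoniter.Iterator) {
		if iter.WhatIsNext() == jsoniter.StringValue {
			v, err := strconv.ParseInt(iter.ReadString(), 10, 64)
			if err != nil {
				iter.ReportError("decode int64", err.Error())
				return
			}
			*(*int64)(ptr) = v
			return
		}
		*(*int64)(ptr) = iter.ReadInt64()
	})

	var u User
	for _, in := range []string{`{"id":123}`, `{"id":"456"}`} {
		if err := jsoniter.Unmarshal([]byte(in), &u); err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Println(u.ID)
	}
}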

Verdict: maybe.

This package has a unique API that allows partial parsing of JSON data, so you don’t need to parse the complete JSON document to get partial results. Of course, it’s also strict about the expected values:

json: cannot unmarshal string into Go struct field User.id of type int64

It doesn’t implement its own JSON decoder; it relies on the standard library to do the heavy lifting, and as such it produces the same errors.

Verdict: miss.

Doesn’t support scanning JSON into structs. Optimized for speed and for reading individual values where you know the key path beforehand.

Verdict: miss.

None of these support scanning JSON into structs; they’re optimized for programmatic JSON traversal without much reliance on schema or types. DJSON in particular produces interface{} outputs, which you must type-switch over to read out concrete typed values.

Verdict: miss.

Ultimately, the effort to reproduce what protojson does for any given struct, and not just proto messages, seems to be non-trivial. So far I haven’t found a package that mimics protojson behavior for plain Go structs with concrete int64/uint64 fields/values.

As far as workarounds go:

In JavaScript, there’s the json-bigint package, which has an absurd number of downloads per week (about 2.5M+ at the time of writing). As it’s opt-in, it’s very easy to forget to use it. Ideally, I’d like first-party support for this in JS environments, but alas, JSON.parse is specified to map every number to a plain, lossy Number.

On the server side, there’s the json.Number type, which is effectively a container type that accepts both quoted and unquoted values. There is some debate on issue 34472 about whether this behaviour strictly corresponds to the documentation and the JSON specification, or whether the documentation should be updated to match the implementation.

package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

type User struct {
	ID json.Number `json:"id"`
}

func main() {
	user := User{}
	// An unquoted number decodes fine...
	if err := json.NewDecoder(strings.NewReader(`{"id":123}`)).Decode(&user); err != nil {
		fmt.Println(err)
	}
	fmt.Println(user.ID)
	// ...and so does a quoted one.
	if err := json.NewDecoder(strings.NewReader(`{"id":"456"}`)).Decode(&user); err != nil {
		fmt.Println(err)
	}
	fmt.Println(user.ID)
}

If you run this, both “123” and “456” are printed, but actually getting an int64 out means you need to call the Int64() method on the value:

v, err := user.ID.Int64() // the string-to-int64 conversion happens here
if err != nil {
	fmt.Println(err) // e.g. the value was a valid JSON number, but not an integer
}

If the field were a concrete numeric type, particularly int64, an invalid value would already make the JSON decoding step fail. Since json.Number is defined as a string type, effectively no parsing/conversion goes on during decoding; it only happens when Int64() is invoked.
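To make the deferred conversion visible: "1.5" passes the decoder’s number-literal check, so decoding it into json.Number succeeds, and the failure only surfaces once Int64() runs. A small self-contained check:

package main

import (
	"encoding/json"
	"fmt"
)

type User struct {
	ID json.Number `json:"id"`
}

func main() {
	var u User
	// "1.5" is a valid JSON number literal, so decoding succeeds...
	if err := json.Unmarshal([]byte(`{"id":"1.5"}`), &u); err != nil {
		fmt.Println("decode:", err)
	}
	// ...but the integer conversion is deferred until Int64() is called.
	if _, err := u.ID.Int64(); err != nil {
		fmt.Println("Int64():", err) // strconv.ParseInt: parsing "1.5": invalid syntax
	}
}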

The argument being thrown around on the linked issue above, that numeric values must be encoded unquoted for the JSON to be valid, feels erroneous. While you can’t unquote "abc" into abc and expect the JSON to validate, any JSON with quoted numbers is completely valid. In fact, even the documentation for the ,string tag suggests this:

The “string” option signals that a field is stored as JSON inside a JSON-encoded string. It applies only to fields of string, floating point, integer, or boolean types. This extra level of encoding is sometimes used when communicating with JavaScript programs.

While I understand that this should be opt-in, I don’t get the strictness of supporting only homogeneous types when decoding a JSON document. I feel it hurts interoperability with the various languages that don’t support int64 types, especially if they co-exist with other languages that do.

And it also hurts to have this behavior be opt-in via field tags, instead of it being an encoder option. At least two examples from the standard library come to mind:

  • encoding/base64 has four differently behaving encoders,
  • log has a standard logger, configurable via SetFlags() and New().

Both these examples demonstrate that the json package could likewise be expanded with configuration options to enable currently unavailable behavior, like quoting int64/uint64 values by default when encoding JSON.
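For comparison, here is what encoding/base64’s four pre-configured encoders look like side by side; a similar family of pre-configured JSON encoders could carry a "quote 64-bit integers" option:

package main

import (
	"encoding/base64"
	"fmt"
)

func main() {
	data := []byte{0xfb, 0xff} // bytes whose output differs across variants

	// Four encoders, one algorithm, different configuration.
	fmt.Println(base64.StdEncoding.EncodeToString(data))    // +/8=
	fmt.Println(base64.URLEncoding.EncodeToString(data))    // -_8=
	fmt.Println(base64.RawStdEncoding.EncodeToString(data)) // +/8
	fmt.Println(base64.RawURLEncoding.EncodeToString(data)) // -_8
}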

Obviously, none of this is likely to happen in encoding/json, and possibly not in Go 2 either without a well-thought-out proposal; judging from what I’ve seen, pacifying the objections on that subject might prove futile. Somebody might yet fork the package to resolve the strictness behind countless Stack Overflow answers; I’m just strangely confused that it hasn’t happened yet.

So, how can we use a decoder that is less strict about reading numeric values, while the encoder still produces valid output that considers languages like JavaScript and encodes int64/uint64 as strings?

You could define your own type based on int64/uint64 and implement the json.Marshaler and json.Unmarshaler interfaces on it. Any errors would bubble up to the JSON encoder/decoder, and all you’d need to do is convert to a plain int64/uint64 when required.

Here’s a playground link showing how a custom int64 marshaller type could look. Essentially, quotes are stripped when decoding, and the number itself is converted to a quoted string when encoding. Actual value validation is retained, so that e.g. "deadbeef" or "" are still considered invalid int values.
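In case the playground link ever rots, here’s a self-contained sketch of the same idea; the Int64 type name and the exact details are mine:

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"strconv"
)

// Int64 is a defined type over int64 that always encodes itself as a
// quoted string, and decodes from both quoted and unquoted JSON numbers.
type Int64 int64

// MarshalJSON emits the value as a quoted string, so JavaScript
// consumers never see an unsafe 64-bit integer.
func (i Int64) MarshalJSON() ([]byte, error) {
	return []byte(strconv.Quote(strconv.FormatInt(int64(i), 10))), nil
}

// UnmarshalJSON strips surrounding quotes if present and parses the rest
// as base-10; invalid values like "deadbeef" or "" still fail.
func (i *Int64) UnmarshalJSON(data []byte) error {
	v, err := strconv.ParseInt(string(bytes.Trim(data, `"`)), 10, 64)
	if err != nil {
		return err
	}
	*i = Int64(v)
	return nil
}

type User struct {
	ID Int64 `json:"id"`
}

func main() {
	var u User
	for _, in := range []string{`{"id":123}`, `{"id":"456"}`} {
		if err := json.Unmarshal([]byte(in), &u); err != nil {
			fmt.Println(err)
			continue
		}
		out, _ := json.Marshal(u)
		fmt.Printf("decoded %d, re-encoded as %s\n", u.ID, out)
	}
}

The decode path deliberately accepts both forms, which mirrors what protojson does for 64-bit integers.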

Is there a particular JSON decoder we can use as a drop-in for the standard encoding/json package that applies these rules when encoding and decoding int64/uint64 values? Not one that I’ve found. It seems your best option is to generate the structs you want from proto definitions, and then use protojson with them. It’s not a great trade-off, so I hope somebody points me to a more reasonable, less strict fork of encoding/json.
