Go and JSON encoding/decoding
After a few years of writing JSON APIs for various purposes, with more success than this article may lead you to believe, I have come to the conclusion that JSON support in the wider Go ecosystem is broken. It isn't really Go's fault, but making it play nice with other very common programming languages is something that hasn't been given much consideration.
All the problems of JSON encoding and decoding stem from strict adherence to the encoding/json package's behavior, which imagines an ideal world where every programming language has the same support for common numeric types like int64, or even uint64 (that one is nearly non-portable).
JavaScript has signed 53-bit integers: technically, every number is an IEEE 754 float64, and its 53-bit significand is the only part that can hold an integer exactly. This means that any int64/uint64 value produced by Go's standard encoding/json package is inherently unsafe for JavaScript consumers, and will silently lose precision once it exceeds what JS numeric types can represent.
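To see the hazard concretely, here's a minimal sketch: marshal an int64 just past 2^53 with encoding/json, and the emitted number is one that JSON.parse cannot round-trip.

package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// 2^53 + 1 is the first integer a JS number cannot represent exactly.
	payload := struct {
		ID int64 `json:"id"`
	}{ID: 9007199254740993}

	b, _ := json.Marshal(payload)
	fmt.Println(string(b)) // {"id":9007199254740993}

	// JSON.parse of that output in a JS runtime yields 9007199254740992,
	// silently off by one.
}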
At this point, you should be doing a double take, since nearly everything written today is JavaScript-based. I'm not just talking about browsers: even Visual Studio Code is written in JavaScript, as are React Native apps for Android and iOS. Things have gone so far that people regularly cite Atwood's Law:
Any application that can be written in JavaScript, will eventually be written in JavaScript.
But why should you, as a Go developer, really care about int64 in JSON?
This means that numeric values of that size, which are common for database IDs generated by Twitter's Snowflake, Sony's Sonyflake, or something as trivial as auto-incrementing columns, need to be JSON-encoded as strings to be consumed safely by the standard JSON.parse in browsers and Node.js runtimes.
Now, encoding/json has allowances built in for encoding numerics as strings, by adding a string type hint to the json tag on the field.
type User struct {
	ID int64 `json:"id,string"`
}
The above example will force the standard library JSON encoder and decoder to encode the ID value as a quoted string. Unfortunately, it also means the ID value MUST be decoded from a quoted string; unquoted values are not accepted.
In the case of JavaScript you're going to have strings, and strings will also be sent back to the APIs, which is fine. But various other languages, loosely or strongly typed, can handle int64 values without any problem. So why not accurately represent the data structures in them with whatever native int64 type they have?
At this point, we’d have to look at protocol buffers. A proto definition may include types like int64 and uint64, and allowances are already made in their protojson package to encode int64 and uint64 values as strings.
case pref.Int64Kind, pref.Sint64Kind, pref.Uint64Kind,
	pref.Sfixed64Kind, pref.Fixed64Kind:
	// 64-bit integers are written out as JSON string.
	e.WriteString(val.String())
So, protojson actually comes with JS compatibility in mind. Similarly, it will also decode int64 and uint64 values regardless of whether they are quoted in the encoded JSON. The following is valid JSON for decoding with protojson, but impossible with the standard encoding/json package:
{"id": 123}
{"id": "123"}
The actual generated code from proto definitions looks somewhat like this:
type User struct {
	// omitted internal fields
	ID int64 `protobuf:"varint,1,opt,name=ID,proto3" json:"ID,omitempty"`
	//...
}
Sadly, the generator doesn't add a ,string hint to the json: tag, so if you try to encode this struct with encoding/json, you will produce JSON that's neither suitable for JavaScript nor compatible with protojson output. But you are dealing with protobufs at that point, so you can use the protojson encoder anyway.
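Assuming a generated package (here a hypothetical userpb, generated from a proto file declaring message User { int64 id = 1; }), round-tripping through protojson looks like this:

package main

import (
	"fmt"

	"google.golang.org/protobuf/encoding/protojson"

	// Hypothetical generated package for: message User { int64 id = 1; }
	userpb "example.com/gen/userpb"
)

func main() {
	u := &userpb.User{Id: 9007199254740993}

	out, _ := protojson.Marshal(u)
	fmt.Println(string(out)) // {"id":"9007199254740993"} – quoted, JS-safe

	// Both quoted and unquoted inputs decode without complaint:
	var a, b userpb.User
	fmt.Println(protojson.Unmarshal([]byte(`{"id": 123}`), &a))
	fmt.Println(protojson.Unmarshal([]byte(`{"id": "123"}`), &b))
}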
If you don't use protobufs, you're left with a few options:
- add a ,string hint to all int64/uint64 fields,
- don't use encoding/json for decoding JSON, or
- use a custom type with a flexible encoder/decoder.
Before we deal with custom types, there are a number of JSON libraries that may allow us to decode JSON without being strict about quotes:
This package is a drop-in replacement for encoding/json, but it follows the same strictness when encoding or decoding values. The main project goal is to provide a stdlib-compatible package optimized for speed.
json: cannot unmarshal "\"456\"}" into Go struct field main.User.id. of type int64
Verdict: miss.
Another drop-in replacement, but this one provides some configuration options; unfortunately, none that would allow reading numbers from both quoted and unquoted JSON. There are, however, functions named RegisterTypeEncoder and RegisterTypeDecoder, which make me believe that a custom decoder for int64 is possible. The most promising find so far.
Verdict: maybe.
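Those function names match json-iterator/go, so assuming that's the package under review, a quote-tolerant int64 decoder might look like the sketch below (assuming the global registry applies to your config; untested against every version):

package main

import (
	"fmt"
	"strconv"
	"unsafe"

	jsoniter "github.com/json-iterator/go"
)

func init() {
	// Sketch: accept int64 from either a JSON number or a quoted string.
	jsoniter.RegisterTypeDecoderFunc("int64", func(ptr unsafe.Pointer, iter *jsoniter.Iterator) {
		if iter.WhatIsNext() == jsoniter.StringValue {
			v, err := strconv.ParseInt(iter.ReadString(), 10, 64)
			if err != nil {
				iter.ReportError("int64", err.Error())
				return
			}
			*(*int64)(ptr) = v
			return
		}
		*(*int64)(ptr) = iter.ReadInt64()
	})
}

func main() {
	type User struct {
		ID int64 `json:"id"`
	}
	var u User
	fmt.Println(jsoniter.Unmarshal([]byte(`{"id":"456"}`), &u), u.ID) // <nil> 456
}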
This package has a unique API that allows partial parsing of JSON data, where you don’t need to parse the complete JSON document to get partial results. Of course, it’s also strict about the expected values:
json: cannot unmarshal string into Go struct field User.id of type int64
It doesn't implement its own JSON decoder; it relies on the standard library to do the heavy lifting. As such, it produces the same errors.
Verdict: miss.
Doesn’t support scanning JSON into structs. Optimized for speed and for reading individual values where you know the key path beforehand.
Verdict: miss.
- https://github.com/bitly/go-simplejson
- https://github.com/antonholmquist/jason
- https://github.com/Jeffail/gabs
- https://github.com/a8m/djson
None of these support scanning JSON into structs. They're optimized for programmatic JSON traversal without much reliance on schema or types. DJSON in particular produces interface{} outputs, which you must type-switch over to read out concretely typed values; see the sketch below.
Verdict: miss.
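For illustration, the type-switch dance looks roughly like this with djson (Decode signature as documented in the a8m/djson README):

package main

import (
	"fmt"

	"github.com/a8m/djson"
)

func main() {
	v, err := djson.Decode([]byte(`{"id": 123}`))
	if err != nil {
		panic(err)
	}
	// Values come back as interface{}; numbers arrive as float64,
	// so int64 precision is already compromised at this point.
	if obj, ok := v.(map[string]interface{}); ok {
		if id, ok := obj["id"].(float64); ok {
			fmt.Println(int64(id))
		}
	}
}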
And here are a few others:
- https://github.com/mreiferson/go-ujson
- https://github.com/ugorji/go/codec
- https://github.com/pquerna/ffjson
- https://github.com/mailru/easyjson
Ultimately, producing what protojson does for any given struct, and not just proto messages, seems to be a non-trivial effort. So far I haven't found a package which mimics protojson behavior on interface{} inputs with concrete int64/uint64 fields/values.
As far as workarounds go:
In JavaScript, there's the json-bigint package, which has an absurd number of downloads per week (about 2.5M+ at the time of writing). As it's opt-in, it's very easy to forget to use it. Ideally, I'd like first-party support for this in JS environments, but alas, it goes against the JSON spec.
On the server side, there's the json.Number type, which is a container type that accepts both quoted and unquoted values. This is the source of some debate on issue 34472: whether this behaviour strictly corresponds to the documentation or the JSON specification, or whether the documentation should be updated to match the implementation.
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

type User struct {
	ID json.Number `json:"id"`
}

func main() {
	user := User{}
	if err := json.NewDecoder(strings.NewReader(`{"id":123}`)).Decode(&user); err != nil {
		fmt.Println(err)
	}
	fmt.Println(user.ID)
	if err := json.NewDecoder(strings.NewReader(`{"id":"456"}`)).Decode(&user); err != nil {
		fmt.Println(err)
	}
	fmt.Println(user.ID)
}
If you run this, both "123" and "456" values are printed. But actually getting an int64 value means you need to call the Int64() method on the value:
var v int64
v, _ = user.ID.Int64()
If a concrete numeric type, particularly int64, is used instead, conversion to that type can fail on an invalid value while the JSON is being decoded. With json.Number, which is effectively a named string type, no parsing or conversion happens during decoding at all; it only happens when Int64() is invoked.
The argument being thrown around on the linked issue above, that numeric values must be encoded unquoted for the JSON to be valid, feels erroneous. While you can't unquote "abc" into abc and expect the JSON to validate, any JSON with quoted numbers is completely valid. In fact, even the documentation for the ,string tag suggests this:
The “string” option signals that a field is stored as JSON inside a JSON-encoded string. It applies only to fields of string, floating point, integer, or boolean types. This extra level of encoding is sometimes used when communicating with JavaScript programs.
While I understand that this should be opt-in, I don't get the strictness of accepting only a single representation when decoding a JSON document. I feel it hurts interoperability with the various languages which don't support int64 types, especially when they co-exist with languages that do.
And it also hurts that this behavior is opt-in via field tags, instead of being an encoding flag. At least two examples from the standard library come to mind:
- encoding/base64 has 4 differently behaving encoders,
- log has a standard logger, SetFlags() and New().
Both these examples demonstrate that the json package could likewise be expanded with configuration options, enabling currently unavailable behavior like quoting int64/uint64 values by default when encoding JSON.
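To make that concrete, here are the real configuration points from those two packages, plus a purely hypothetical json analogue (no such flag exists today):

package main

import (
	"encoding/base64"
	"log"
	"os"
)

func main() {
	// encoding/base64 ships four encoder variants and lets you derive more:
	enc := base64.StdEncoding.WithPadding(base64.NoPadding)
	_ = enc

	// log has a package-level standard logger plus per-instance configuration:
	logger := log.New(os.Stderr, "api: ", log.LstdFlags|log.LUTC)
	logger.Println("configured logger")

	// A json analogue could hypothetically look like:
	//   enc := json.NewEncoder(w)
	//   enc.SetFlags(json.QuoteInt64) // hypothetical, does not exist
}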
Obviously, none of this is likely to happen in encoding/json, and possibly not even in Go 2 without a well-thought-out proposal; judging from what I've seen, pacifying the objections on that subject might prove futile. But somebody might fork it to resolve the strictness complaints behind a number of Stack Overflow answers; I'm just strangely confused that it hasn't happened yet.
So, how can we get a decoder that is less strict about reading numeric values while, at the same time, the encoder produces valid output that considers languages like JavaScript and encodes int64/uint64 as strings?
You could define your own type for int64/uint64 values and satisfy the json.Marshaler and json.Unmarshaler interfaces. Any errors would bubble up to the JSON encoder/decoder, and all you'd need to do is convert to a literal int64/uint64 type when required.
Here's a playground link showing how a custom int64 marshaller type could look. Basically, all that's done is that quotes are stripped when decoding, and the number is converted to a quoted string when encoding. The actual value validation, so that e.g. "deadbeef" or "" are considered invalid int values, is retained.
Do we have a particular JSON decoder which we can use as a drop-in replacement for the standard encoding/json package, with these rules for encoding and decoding int64/uint64 values? Not one that I found. It seems your best option is to generate the structs you want from proto definitions and then use protojson on those. It's not a great trade-off, so I hope somebody points me to a more reasonable fork of encoding/json that is less strict.
While I have you here...
It would be great if you buy one of my books:
- Go with Databases
- Advent of Go Microservices
- API Foundations in Go
- 12 Factor Apps with Docker and Go
Feel free to send me an email if you want to book my time for consultancy/freelance services. I'm great at APIs, Go, Docker, VueJS and scaling services, among many other things.