jsonlines

jsonlines is a Python library to simplify working with jsonlines and ndjson data.

This data format is straightforward: one valid JSON value per line, encoded using UTF-8. While code to consume and create such data is not that complex, it quickly becomes non-trivial enough to warrant a dedicated library once data validation, error handling, support for both binary and text streams, and so on are added. This small library implements all that (and more!) so that applications using this format do not have to reinvent the wheel.
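To make the format concrete, here is a minimal sketch that uses only the standard library json module, not this library. The sample records are made up for illustration, and the sketch deliberately leaves out the validation and error handling mentioned above.

```python
import io
import json

# Made-up sample values; any valid JSON value may appear on a line.
records = [{"id": 1, "name": "Ada"}, [1, 2, 3], "plain string", 42]

# Write: one JSON value per line, each terminated by "\n".
buffer = io.StringIO()
for record in records:
    buffer.write(json.dumps(record, ensure_ascii=False) + "\n")

# Read: split on "\n" only and decode each non-empty line separately.
text = buffer.getvalue()
decoded = [json.loads(line) for line in text.split("\n") if line]
```

Real code would write to a file opened with encoding='utf-8' rather than to an in-memory buffer.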

Features

  • Sensible behaviour for most use cases
    • transparently handles str and bytes, both for input and output
    • supports multiple JSON libraries, e.g. json (standard library), orjson, ujson
    • transparently handles UTF-8 BOM (if present)
    • useful error messages
    • prevents gotchas, e.g. uses standard-compliant line breaking, unlike str.splitlines
  • Convenient open() function
    • makes simple cases trivial to write
    • takes a file name and a mode
    • returns either a Reader or Writer instance
    • can be used as a context manager
  • Flexible Reader
    • wraps a file-like object or any other iterable yielding lines
    • can read lines directly via the read() method
    • can be used as an iterator, either directly or via the iter() method
    • can validate data types, including None checks
    • can skip invalid lines during iteration
    • provides decent error messages
    • can be used as a context manager
    • allows complete control over decoding using a custom loads callable
  • Flexible Writer
    • wraps a file-like object
    • can produce compact output
    • can sort keys (deterministic output)
    • can flush the underlying stream after each write
    • can be used as a context manager
    • allows complete control over encoding using a custom dumps callable
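The str.splitlines gotcha mentioned in the list above can be reproduced with the standard library alone. The sketch below only illustrates the problem; it is not how this library is implemented, and the sample value is made up.

```python
import json

# A JSON string may legally contain U+2028 (LINE SEPARATOR) unescaped.
line = json.dumps({"text": "first\u2028second"}, ensure_ascii=False)
data = line + "\n"

# str.splitlines() also breaks on U+2028, cutting the JSON value in two.
naive_pieces = data.splitlines()

# Splitting on "\n" only keeps the value intact.
safe_pieces = [piece for piece in data.split("\n") if piece]
```

Here naive_pieces holds two fragments that are no longer valid JSON, while safe_pieces holds the single original line.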

Installation

pip install jsonlines

The supported Python versions are 3.8+.

User guide

Import the jsonlines module to get started:

import jsonlines

The convenience function jsonlines.open() takes a file name and returns either a reader or writer, making simple cases extremely simple:

with jsonlines.open('input.jsonl') as reader:
    for obj in reader:
        ...

with jsonlines.open('output.jsonl', mode='w') as writer:
    writer.write(...)

A Reader typically wraps a file-like object:

import io

fp = io.BytesIO(...)  # readable file-like object
reader = jsonlines.Reader(fp)
first = reader.read()
second = reader.read()
reader.close()
fp.close()

Instead of a file-like object, any iterable yielding JSON encoded strings can be provided:

lines = ['1', '2', '3']
reader = jsonlines.Reader(lines)

While the Reader.read() method can be used directly, it is often more convenient to use iteration:

for obj in reader:
    ...

Custom iteration flags, such as type checks, can be specified by calling Reader.iter() instead:

for obj in reader.iter(type=dict, skip_invalid=True):
    ...

A Writer wraps a file-like object, and can write a single object, or multiple objects at once:

fp = io.BytesIO()  # writable file-like object
writer = jsonlines.Writer(fp)
writer.write(...)
writer.write_all([
    ...,
    ...,
    ...,
])
writer.close()
fp.close()

Both readers and writers can be used as context managers, in which case they will be closed automatically. Note that this will not close a passed-in file-like object, since that object’s lifespan is controlled by the calling code. Example:

fp = io.BytesIO()  # file-like object
with jsonlines.Writer(fp) as writer:
    writer.write(...)
fp.close()

Note that the jsonlines.open() function does close the opened file, since the file is not explicitly opened by the calling code. That means no .close() is needed there:

with jsonlines.open('input.jsonl') as reader:
    ...

This should be enough to get started. See the API docs below for more details.

API

jsonlines.open(file: Union[str, bytes, int, os.PathLike], mode: str = 'r', *, loads: Optional[Callable[[Union[str, bytes]], Any]] = None, dumps: Optional[Callable[[Any], Union[str, bytes]]] = None, compact: Optional[bool] = None, sort_keys: Optional[bool] = None, flush: Optional[bool] = None) → Union[jsonlines.jsonlines.Reader, jsonlines.jsonlines.Writer]

Open a jsonlines file for reading or writing.

This is a convenience function to open a file and wrap it in either a Reader or Writer instance, depending on the specified mode.

Additional keyword arguments will be passed on to the reader and writer; see their documentation for available options.

The resulting reader or writer must be closed after use by the caller, which will also close the opened file. This can be done by calling .close(), but the easiest way to ensure proper resource finalisation is to use a with block (context manager), e.g.

with jsonlines.open('out.jsonl', mode='w') as writer:
    writer.write(...)

Parameters:
  • file – name or ‘path-like object’ of the file to open
  • mode – whether to open the file for reading (r), writing (w), appending (a), or exclusive creation (x).

class jsonlines.Reader(file_or_iterable: Union[IO[str], IO[bytes], Iterable[Union[str, bytes]]], *, loads: Callable[[Union[str, bytes]], Any] = <function loads>)

Reader for the jsonlines format.

The first argument must be an iterable that yields JSON encoded lines, as either str or bytes. Usually this will be a readable file-like object, such as an open file or an io.StringIO instance, but it can also be something else as long as it yields such lines when iterated over.

Instances are iterable and can be used as a context manager.

The loads argument can be used to replace the standard json decoder. If specified, it must be a callable that accepts a (unicode) string and returns the decoded object.

Parameters:
  • file_or_iterable – file-like object or iterable yielding lines as strings
  • loads – custom json decoder callable

close() → None

Close this reader/writer.

This closes the underlying file if that file has been opened by this reader/writer. When an already opened file-like object was provided, the caller is responsible for closing it.

iter(type: Optional[Type[Any]] = None, allow_none: bool = False, skip_empty: bool = False, skip_invalid: bool = False) → Iterator[Union[Dict[str, Any], List[Any], bool, float, int, str, None]]

Iterate over all lines.

This is the iterator equivalent to repeatedly calling read(). If no arguments are specified, this is the same as directly iterating over this Reader instance.

When skip_invalid is set to True, invalid lines will be silently ignored.

See read() for a description of the other arguments.

read(*, type: Optional[Type[Any]] = None, allow_none: bool = False, skip_empty: bool = False) → Union[Dict[str, Any], List[Any], bool, float, int, str, None]

Read and decode a line.

The optional type argument specifies the expected data type. Supported types are dict, list, str, int, float, and bool. When specified, non-conforming lines result in InvalidLineError.

By default, input lines containing null (in JSON) are considered invalid, and will cause InvalidLineError. The allow_none argument can be used to change this behaviour, in which case None will be returned instead.

If skip_empty is set to True, empty lines and lines containing only whitespace are silently skipped.

class jsonlines.Writer(fp: Union[IO[str], IO[bytes]] = None, *, compact: bool = False, sort_keys: bool = False, flush: bool = False, dumps: Callable[[Any], Union[str, bytes]] = <function default_dumps>)

Writer for the jsonlines format.

Instances can be used as a context manager.

The fp argument must be a file-like object with a .write() method accepting either text (unicode) or bytes.

The compact argument can be used to produce smaller output.

The sort_keys argument can be used to sort keys in json objects, and will produce deterministic output.

For more control, provide a custom encoder callable using the dumps argument. The callable must return the encoded JSON as a string (str) or bytes. If specified, the compact and sort_keys arguments will be ignored.

When the flush argument is set to True, the writer will call fp.flush() after each written line.

Parameters:
  • fp – writable file-like object
  • compact – whether to use a compact output format
  • sort_keys – whether to sort object keys
  • dumps – custom encoder callable
  • flush – whether to flush the file-like object after writing each line

close() → None

Close this reader/writer.

This closes the underlying file if that file has been opened by this reader/writer. When an already opened file-like object was provided, the caller is responsible for closing it.

write(obj: Any) → int

Encode and write a single object.

Parameters: obj – the object to encode and write
Returns: number of characters or bytes written

write_all(iterable: Iterable[Any]) → int

Encode and write multiple objects.

Parameters: iterable – an iterable of objects
Returns: number of characters or bytes written

class jsonlines.Error(message: str)

Base error class.

class jsonlines.InvalidLineError(message: str, line: Union[str, bytes], lineno: int)

Error raised when an invalid line is encountered.

This happens when the line does not contain valid JSON, or if a specific data type has been requested, and the line contained a different data type.

The original line itself is stored on the exception instance as the .line attribute, and the line number as .lineno.

This class subclasses both jsonlines.Error and the built-in ValueError.

line = None

The invalid line

lineno = None

The line number

Contributing

The source code and issue tracker for this package can be found on GitHub.

Version history

  • 4.0.0, released at 2023-09-01
    • use ‘orjson’ or ‘ujson’ for reading if available (#81)
    • drop support for end-of-life Python versions; this package is now Python 3.8+ only. (#80)
  • 3.1.0, released at 2022-07-01
  • 3.0.0, released at 2021-12-04
    • add type annotations; adopt mypy in strict mode (#58, #62)
    • ignore UTF-8 BOM sequences in various scenarios (#69)
    • support dumps() callables returning bytes again (#64)
    • add basic support for rfc7464 text sequences (#61)
    • drop support for numbers.Number in type= arguments (#63)
  • 2.0.0, released at 2021-01-04
    • drop support for end-of-life Python versions; this package is now Python 3.6+ only. (#54, #51)
  • 1.2.0, released at 2017-08-17
    • allow mode='a' in open() to allow appending to an existing file (#31)
  • 1.1.3, released at 2017-07-19
    • fix incomplete iteration when given list containing empty strings (#30)
  • 1.1.2, released at 2017-06-26
    • documentation tweaks
    • enable building universal wheels
  • 1.1.1, released at 2017-06-04
    • include licensing information in sdist (#27)
    • doc tweaks
  • 1.1.0, released at 2016-10-07
    • rename first argument to Reader since it is not required to be a file-like object
    • actually check that the reader/writer is not closed when performing operations
    • improved repr() output
    • doc tweaks
  • 1.0.0, released at 2016-10-05
    • minimum Python versions are Python 3.4+ and Python 2.7+
    • implemented lots of configuration options
    • add proper exception handling
    • add proper documentation
    • switch to semver
  • 0.0.1, released at 2015-03-02
    • initial release with basic functionality

License

(This is the OSI approved 3-clause “New BSD License”.)

Copyright © 2016, wouter bolsterlee

All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
  • Neither the name of the author nor the names of the contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.