jsonlines¶
jsonlines is a Python library to simplify working with jsonlines and ndjson data.
This data format is straightforward: it is simply one valid JSON value per line, encoded using UTF-8. While code to consume and create such data is not that complex, it quickly becomes non-trivial enough to warrant a dedicated library when adding data validation, error handling, support for both binary and text streams, and so on. This small library implements all that (and more!) so that applications using this format do not have to reinvent the wheel.
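As a minimal illustration of the format itself, the following sketch uses only the standard library json module rather than this library. The sample values are made up for the example.

```python
import json

# Hypothetical sample values; any valid JSON value may appear on a line.
values = [{"name": "Zoë", "score": 3}, [1, 2, 3], "plain string", 42]

# One JSON value per line, each terminated by a newline, encoded as UTF-8.
text = "".join(json.dumps(v, ensure_ascii=False) + "\n" for v in values)
data = text.encode("utf-8")

# Decoding reverses the steps: decode UTF-8, split on newlines, parse each line.
decoded = [json.loads(line) for line in data.decode("utf-8").split("\n") if line]
```

This covers only the happy path; it has no validation, error reporting, or handling of binary versus text streams.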
Features¶
- Sensible behaviour for most use cases
  - transparently handles str and bytes, both for input and output
  - supports multiple JSON libraries, e.g. json (standard library), orjson, ujson
  - transparently handles UTF-8 BOM (if present)
  - useful error messages
  - prevents gotchas, e.g. uses standard-compliant line breaking, unlike str.splitlines
- Convenient open() function
- Flexible Reader
  - wraps a file-like object or any other iterable yielding lines
  - can read lines directly via the read() method
  - can be used as an iterator, either directly or via the iter() method
  - can validate data types, including None checks
  - can skip invalid lines during iteration
  - provides decent error messages
  - can be used as a context manager
  - allows complete control over decoding using a custom loads callable
- Flexible Writer
  - wraps a file-like object
  - can produce compact output
  - can sort keys (deterministic output)
  - can flush the underlying stream after each write
  - can be used as a context manager
  - allows complete control over encoding using a custom dumps callable
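The str.splitlines gotcha mentioned in the feature list can be reproduced with the standard library alone. This sketch does not show how the library splits lines internally; it only illustrates why str.splitlines is a poor fit for this format. The sample record is hypothetical.

```python
import json

# U+2028 (LINE SEPARATOR) may appear unescaped inside a JSON string.
record = {"text": "first\u2028second"}
data = json.dumps(record, ensure_ascii=False) + "\n"

# str.splitlines() treats U+2028 as a line boundary and cuts the value in two.
naive_lines = data.splitlines()

# Splitting on "\n" only keeps the JSON value intact.
strict_lines = [line for line in data.split("\n") if line]
```

Here naive_lines holds two fragments that are not valid JSON, while strict_lines holds the single original line.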
Installation¶
pip install jsonlines
The supported Python versions are 3.8+.
User guide¶
Import the jsonlines module to get started:
import jsonlines
The convenience function jsonlines.open() takes a file name and returns either a reader or writer, making simple cases extremely simple:
with jsonlines.open('input.jsonl') as reader:
    for obj in reader:
        ...

with jsonlines.open('output.jsonl', mode='w') as writer:
    writer.write(...)
A Reader typically wraps a file-like object:
import io

fp = io.BytesIO(...)  # readable file-like object
reader = jsonlines.Reader(fp)
first = reader.read()
second = reader.read()
reader.close()
fp.close()
Instead of a file-like object, any iterable yielding JSON encoded strings can be provided:
lines = ['1', '2', '3']
reader = jsonlines.Reader(lines)
While the Reader.read() method can be used directly, it is often more convenient to use iteration:
for obj in reader:
    ...
Custom iteration flags, such as type checks, can be specified by calling Reader.iter() instead:
for obj in reader.iter(type=dict, skip_invalid=True):
    ...
A Writer wraps a file-like object, and can write a single object, or multiple objects at once:
import io

fp = io.BytesIO()  # writable file-like object
writer = jsonlines.Writer(fp)
writer.write(...)
writer.write_all([
    ...,
    ...,
    ...,
])
writer.close()
fp.close()
Both readers and writers can be used as a context manager, in which case they will be closed automatically. Note that this will not close a passed-in file-like object since that object’s life span is controlled by the calling code. Example:
fp = io.BytesIO()  # file-like object
with jsonlines.Writer(fp) as writer:
    writer.write(...)
fp.close()
Note that the jsonlines.open() function does close the opened file, since the file is not explicitly opened by the calling code. That means no .close() is needed there:
with jsonlines.open('input.jsonl') as reader:
    ...
This should be enough to get started. See the API docs below for more details.
API¶
- jsonlines.open(file: str | bytes | int | PathLike, mode: Literal['r'] = 'r', *, loads: Callable[[str | bytes], Any] | None = None) -> Reader ¶
- jsonlines.open(file: str | bytes | int | PathLike, mode: Literal['w', 'a', 'x'], *, dumps: Callable[[Any], str | bytes] | None = None, compact: bool | None = None, sort_keys: bool | None = None, flush: bool | None = None) -> Writer
- jsonlines.open(file: str | bytes | int | PathLike, mode: str = 'r', *, loads: Callable[[str | bytes], Any] | None = None, dumps: Callable[[Any], str | bytes] | None = None, compact: bool | None = None, sort_keys: bool | None = None, flush: bool | None = None) -> Reader | Writer
Open a jsonlines file for reading or writing.
This is a convenience function to open a file and wrap it in either a Reader or Writer instance, depending on the specified mode.
Additional keyword arguments will be passed on to the reader and writer; see their documentation for available options.
The resulting reader or writer must be closed after use by the caller, which will also close the opened file. This can be done by calling .close(), but the easiest way to ensure proper resource finalisation is to use a with block (context manager), e.g.
with jsonlines.open('out.jsonl', mode='w') as writer:
    writer.write(...)
- Parameters:
file – name or ‘path-like object’ of the file to open
mode – whether to open the file for reading (r), writing (w), appending (a), or exclusive creation (x).
- class jsonlines.Reader(file_or_iterable: IO[str] | IO[bytes] | Iterable[str | bytes], *, loads: Callable[[str | bytes], Any] = <function loads>)¶
Reader for the jsonlines format.
The first argument must be an iterable that yields JSON encoded strings. Usually this will be a readable file-like object, such as an open file or an io.StringIO instance, but it can also be something else as long as it yields strings when iterated over.
Instances are iterable and can be used as a context manager.
The loads argument can be used to replace the standard json decoder. If specified, it must be a callable that accepts a (unicode) string and returns the decoded object.
- Parameters:
file_or_iterable – file-like object or iterable yielding lines as strings
loads – custom json decoder callable
- close() -> None ¶
Close this reader/writer.
This closes the underlying file if that file has been opened by this reader/writer. When an already opened file-like object was provided, the caller is responsible for closing it.
- iter(*, type: Literal[None] = None, allow_none: Literal[False] = False, skip_empty: bool = False, skip_invalid: bool = False) -> Iterator[Dict[str, Any] | List[Any] | bool | float | int | str] ¶
- iter(*, type: Literal[None] = None, allow_none: Literal[True], skip_empty: bool = False, skip_invalid: bool = False) -> Iterator[Dict[str, Any] | List[Any] | bool | float | int | str]
- iter(*, type: Type[TJSONValue], allow_none: Literal[False] = False, skip_empty: bool = False, skip_invalid: bool = False) -> Iterator[TJSONValue]
- iter(*, type: Type[TJSONValue], allow_none: Literal[True], skip_empty: bool = False, skip_invalid: bool = False) -> Iterator[TJSONValue | None]
- iter(*, type: Type[TJSONValue] | None = None, allow_none: bool = False, skip_empty: bool = False, skip_invalid: bool = False) -> Iterator[TJSONValue | None]
Iterate over all lines.
This is the iterator equivalent to repeatedly calling read(). If no arguments are specified, this is the same as directly iterating over this Reader instance.
When skip_invalid is set to True, invalid lines will be silently ignored.
See read() for a description of the other arguments.
- read(*, type: Literal[None] = None, allow_none: Literal[False] = False, skip_empty: bool = False) -> Dict[str, Any] | List[Any] | bool | float | int | str ¶
- read(*, type: Literal[None] = None, allow_none: Literal[True], skip_empty: bool = False) -> Dict[str, Any] | List[Any] | bool | float | int | str | None
- read(*, type: Type[TJSONValue], allow_none: Literal[False] = False, skip_empty: bool = False) -> TJSONValue
- read(*, type: Type[TJSONValue], allow_none: Literal[True], skip_empty: bool = False) -> TJSONValue | None
- read(*, type: Type[Any] | None = None, allow_none: bool = False, skip_empty: bool = False) -> Dict[str, Any] | List[Any] | bool | float | int | str | None
Read and decode a line.
The optional type argument specifies the expected data type. Supported types are dict, list, str, int, float, and bool. When specified, non-conforming lines result in InvalidLineError.
By default, input lines containing null (in JSON) are considered invalid, and will cause InvalidLineError. The allow_none argument can be used to change this behaviour, in which case None will be returned instead.
If skip_empty is set to True, empty lines and lines containing only whitespace are silently skipped.
- class jsonlines.Writer(fp: IO[str] | IO[bytes] = None, *, compact: bool = False, sort_keys: bool = False, flush: bool = False, dumps: Callable[[Any], str | bytes] = <function default_dumps>)¶
Writer for the jsonlines format.
Instances can be used as a context manager.
The fp argument must be a file-like object with a .write() method accepting either text (unicode) or bytes.
The compact argument can be used to produce smaller output.
The sort_keys argument can be used to sort keys in json objects, and will produce deterministic output.
For more control, provide a custom encoder callable using the dumps argument. The callable must produce (unicode) string output. If specified, the compact and sort_keys arguments will be ignored.
When the flush argument is set to True, the writer will call fp.flush() after each written line.
- Parameters:
fp – writable file-like object
compact – whether to use a compact output format
sort_keys – whether to sort object keys
dumps – custom encoder callable
flush – whether to flush the file-like object after writing each line
- close() -> None ¶
Close this reader/writer.
This closes the underlying file if that file has been opened by this reader/writer. When an already opened file-like object was provided, the caller is responsible for closing it.
- write(obj: Any) -> int ¶
Encode and write a single object.
- Parameters:
obj – the object to encode and write
- Returns:
number of characters or bytes written
- write_all(iterable: Iterable[Any]) -> int ¶
Encode and write multiple objects.
- Parameters:
iterable – an iterable of objects
- Returns:
number of characters or bytes written
- class jsonlines.Error(message: str)¶
Base error class.
- class jsonlines.InvalidLineError(message: str, line: str | bytes, lineno: int)¶
Error raised when an invalid line is encountered.
This happens when the line does not contain valid JSON, or if a specific data type has been requested, and the line contained a different data type.
The original line itself is stored on the exception instance as the .line attribute, and the line number as .lineno.
This class subclasses both jsonlines.Error and the built-in ValueError.
- line: str | bytes¶
The invalid line
- lineno: int¶
The line number
Contributing¶
The source code and issue tracker for this package can be found on GitHub:
Version history¶
4.0.0, released at 2023-09-01
3.1.0, released at 2022-07-01
Return number of chars/bytes written by Writer.write() and write_all() (#73)
allow mode='x' in open() to open a file for exclusive creation (#74)
3.0.0, released at 2021-12-04
2.0.0, released at 2021-01-04
1.2.0, released at 2017-08-17
1.1.3, released at 2017-07-19
fix incomplete iteration when given list containing empty strings (#30)
1.1.2, released at 2017-06-26
documentation tweaks
enable building universal wheels
1.1.1, released at 2017-06-04
include licensing information in sdist (#27)
doc tweaks
1.1.0, released at 2016-10-07
rename first argument to Reader since it is not required to be a file-like object
actually check that the reader/writer is not closed when performing operations
improved repr() output
doc tweaks
1.0.0, released at 2016-10-05
minimum Python versions are Python 3.4+ and Python 2.7+
implemented lots of configuration options
add proper exceptions handling
add proper documentation
switch to semver
0.0.1, released at 2015-03-02
initial release with basic functionality
License¶
(This is the OSI approved 3-clause “New BSD License”.)
Copyright © 2016, wouter bolsterlee
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
Neither the name of the author nor the names of the contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.