Reference

Main functions

pdf2image is a light wrapper for the poppler-utils tools that can convert your PDFs into Pillow images.

pdf2image.pdf2image.convert_from_bytes(pdf_file: bytes, dpi: int = 200, output_folder: Optional[Union[str, PurePath]] = None, first_page: Optional[int] = None, last_page: Optional[int] = None, fmt: str = 'ppm', jpegopt: Optional[Dict] = None, thread_count: int = 1, userpw: Optional[str] = None, ownerpw: Optional[str] = None, use_cropbox: bool = False, strict: bool = False, transparent: bool = False, single_file: bool = False, output_file: Union[str, PurePath] = <pdf2image.generators.ThreadSafeGenerator object>, poppler_path: Optional[Union[str, PurePath]] = None, grayscale: bool = False, size: Optional[Union[Tuple, int]] = None, paths_only: bool = False, use_pdftocairo: bool = False, timeout: Optional[int] = None, hide_annotations: bool = False) → List[Image]

Function wrapping pdftoppm and pdftocairo.

Parameters

pdf_file (bytes) – Bytes of the PDF that you want to convert
dpi (int, optional) – Image quality in DPI, defaults to 200
output_folder (Union[str, PurePath], optional) – Write the resulting images to a folder (instead of directly in memory), defaults to None
first_page (int, optional) – First page to process, defaults to None
last_page (int, optional) – Last page to process before stopping, defaults to None
fmt (str, optional) – Output image format, defaults to “ppm”
jpegopt (Dict, optional) – JPEG options: quality, progressive, and optimize (only used with the JPEG format), defaults to None
thread_count (int, optional) – How many threads we are allowed to spawn for processing, defaults to 1
userpw (str, optional) – PDF’s user password, defaults to None
ownerpw (str, optional) – PDF’s owner password, defaults to None
use_cropbox (bool, optional) – Use cropbox instead of mediabox, defaults to False
strict (bool, optional) – Raise PDFSyntaxError when a syntax error is reported while processing the PDF, defaults to False
transparent (bool, optional) – Output with a transparent background instead of a white one, defaults to False
single_file (bool, optional) – Uses the -singlefile option from pdftoppm/pdftocairo, defaults to False
output_file (Any, optional) – Output filename or filename generator, defaults to uuid_generator()
poppler_path (Union[str, PurePath], optional) – Path to look for poppler binaries, defaults to None
grayscale (bool, optional) – Output grayscale image(s), defaults to False
size (Union[Tuple, int], optional) – Size of the resulting image(s), uses the Pillow (width, height) standard, defaults to None
paths_only (bool, optional) – Don’t load image(s), return paths instead (requires output_folder), defaults to False
use_pdftocairo (bool, optional) – Use pdftocairo instead of pdftoppm, may help performance, defaults to False
timeout (int, optional) – Raise PDFPopplerTimeoutError after the given time (in seconds), defaults to None
hide_annotations (bool, optional) – Hide PDF annotations in the output, defaults to False

Raises

NotImplementedError – Raised when conflicting parameters are given (hide_annotations is not supported together with use_pdftocairo)
PDFPopplerTimeoutError – Raised after the timeout for the image processing is exceeded
PDFSyntaxError – Raised if there is a syntax error in the PDF and strict=True

Returns

A list of Pillow images, one for each page between first_page and last_page

Return type

List[Image.Image]
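
As a minimal sketch of the parameters above, the hypothetical helper below renders an in-memory PDF to Pillow images. The helper name render_pdf_bytes and the choice of fmt="png" are illustrative and not part of this reference. pdf2image is imported inside the function so that the snippet can be loaded on a machine where poppler is not installed.

```python
from typing import List, Optional


def render_pdf_bytes(
    pdf_data: bytes,
    dpi: int = 200,
    first_page: Optional[int] = None,
    last_page: Optional[int] = None,
) -> List:
    """Render an in-memory PDF to a list of Pillow images (hypothetical helper)."""
    # Imported lazily so that loading this snippet does not require poppler.
    from pdf2image import convert_from_bytes

    return convert_from_bytes(
        pdf_data,
        dpi=dpi,
        first_page=first_page,
        last_page=last_page,
        fmt="png",  # illustrative choice; the documented default is "ppm"
    )
```

A call might look like render_pdf_bytes(open("example.pdf", "rb").read(), last_page=2), where example.pdf is a placeholder file name.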

pdf2image.pdf2image.convert_from_path(pdf_path: Union[str, PurePath], dpi: int = 200, output_folder: Optional[Union[str, PurePath]] = None, first_page: Optional[int] = None, last_page: Optional[int] = None, fmt: str = 'ppm', jpegopt: Optional[Dict] = None, thread_count: int = 1, userpw: Optional[str] = None, ownerpw: Optional[str] = None, use_cropbox: bool = False, strict: bool = False, transparent: bool = False, single_file: bool = False, output_file: Any = <pdf2image.generators.ThreadSafeGenerator object>, poppler_path: Optional[Union[str, PurePath]] = None, grayscale: bool = False, size: Optional[Union[Tuple, int]] = None, paths_only: bool = False, use_pdftocairo: bool = False, timeout: Optional[int] = None, hide_annotations: bool = False) → List[Image]

Function wrapping pdftoppm and pdftocairo.

Parameters

pdf_path (Union[str, PurePath]) – Path to the PDF that you want to convert
dpi (int, optional) – Image quality in DPI, defaults to 200
output_folder (Union[str, PurePath], optional) – Write the resulting images to a folder (instead of directly in memory), defaults to None
first_page (int, optional) – First page to process, defaults to None
last_page (int, optional) – Last page to process before stopping, defaults to None
fmt (str, optional) – Output image format, defaults to “ppm”
jpegopt (Dict, optional) – JPEG options: quality, progressive, and optimize (only used with the JPEG format), defaults to None
thread_count (int, optional) – How many threads we are allowed to spawn for processing, defaults to 1
userpw (str, optional) – PDF’s user password, defaults to None
ownerpw (str, optional) – PDF’s owner password, defaults to None
use_cropbox (bool, optional) – Use cropbox instead of mediabox, defaults to False
strict (bool, optional) – Raise PDFSyntaxError when a syntax error is reported while processing the PDF, defaults to False
transparent (bool, optional) – Output with a transparent background instead of a white one, defaults to False
single_file (bool, optional) – Uses the -singlefile option from pdftoppm/pdftocairo, defaults to False
output_file (Any, optional) – Output filename or filename generator, defaults to uuid_generator()
poppler_path (Union[str, PurePath], optional) – Path to look for poppler binaries, defaults to None
grayscale (bool, optional) – Output grayscale image(s), defaults to False
size (Union[Tuple, int], optional) – Size of the resulting image(s), uses the Pillow (width, height) standard, defaults to None
paths_only (bool, optional) – Don’t load image(s), return paths instead (requires output_folder), defaults to False
use_pdftocairo (bool, optional) – Use pdftocairo instead of pdftoppm, may help performance, defaults to False
timeout (int, optional) – Raise PDFPopplerTimeoutError after the given time (in seconds), defaults to None
hide_annotations (bool, optional) – Hide PDF annotations in the output, defaults to False

Raises

NotImplementedError – Raised when conflicting parameters are given (hide_annotations is not supported together with use_pdftocairo)
PDFPopplerTimeoutError – Raised after the timeout for the image processing is exceeded
PDFSyntaxError – Raised if there is a syntax error in the PDF and strict=True

Returns

A list of Pillow images, one for each page between first_page and last_page

Return type

List[Image.Image]
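
Because every requested page is returned at once, large documents are sometimes converted in page ranges. The sketch below shows one way to combine first_page, last_page, output_folder, and paths_only for that purpose. The helpers page_batches and convert_in_batches, the batch size, and the 1-based inclusive page numbering (mirroring the -f and -l options of pdftoppm) are assumptions made for illustration.

```python
from typing import Iterator, List, Tuple


def page_batches(total_pages: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (first_page, last_page) ranges covering 1..total_pages."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for first in range(1, total_pages + 1, batch_size):
        yield first, min(first + batch_size - 1, total_pages)


def convert_in_batches(
    pdf_path: str, out_dir: str, total_pages: int, batch_size: int = 10
) -> List[str]:
    """Write pages to out_dir batch by batch and return the image paths."""
    # Imported lazily so the pure helper above works without poppler installed.
    from pdf2image import convert_from_path

    paths: List[str] = []
    for first, last in page_batches(total_pages, batch_size):
        paths.extend(
            convert_from_path(
                pdf_path,
                first_page=first,
                last_page=last,
                output_folder=out_dir,  # required when paths_only=True
                paths_only=True,        # return file paths, not loaded images
                fmt="png",              # illustrative choice of format
            )
        )
    return paths
```

The page count passed as total_pages could come from pdfinfo_from_path, documented further down.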

pdf2image.pdf2image.pdfinfo_from_bytes(pdf_bytes: bytes, userpw: Optional[str] = None, ownerpw: Optional[str] = None, poppler_path: Optional[str] = None, rawdates: bool = False, timeout: Optional[int] = None) → Dict

Function that wraps poppler’s pdfinfo utility and returns the result as a dictionary.

Parameters

pdf_bytes (bytes) – Bytes of the PDF that you want to inspect
userpw (str, optional) – PDF’s user password, defaults to None
ownerpw (str, optional) – PDF’s owner password, defaults to None
poppler_path (Union[str, PurePath], optional) – Path to look for poppler binaries, defaults to None
rawdates (bool, optional) – Return the undecoded date strings, defaults to False
timeout (int, optional) – Raise PDFPopplerTimeoutError after the given time (in seconds), defaults to None

Returns

Dictionary containing various information on the PDF

Return type

Dict

pdf2image.pdf2image.pdfinfo_from_path(pdf_path: str, userpw: Optional[str] = None, ownerpw: Optional[str] = None, poppler_path: Optional[str] = None, rawdates: bool = False, timeout: Optional[int] = None) → Dict

Function that wraps poppler’s pdfinfo utility and returns the result as a dictionary.

Parameters

pdf_path (str) – Path to the PDF that you want to inspect
userpw (str, optional) – PDF’s user password, defaults to None
ownerpw (str, optional) – PDF’s owner password, defaults to None
poppler_path (Union[str, PurePath], optional) – Path to look for poppler binaries, defaults to None
rawdates (bool, optional) – Return the undecoded date strings, defaults to False
timeout (int, optional) – Raise PDFPopplerTimeoutError after the given time (in seconds), defaults to None

Raises

PDFPopplerTimeoutError – Raised after the timeout for the pdfinfo call is exceeded
PDFInfoNotInstalledError – Raised if pdfinfo is not installed
PDFPageCountError – Raised if the output could not be parsed

Returns

Dictionary containing various information on the PDF

Return type

Dict
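
A short sketch of reading the page count from the returned dictionary follows. The "Pages" key is an assumption based on the field names that the pdfinfo utility prints. count_pages is a hypothetical helper, and returning None on failure is only one possible design.

```python
from typing import Optional


def count_pages(pdf_path: str, timeout: Optional[int] = None) -> Optional[int]:
    """Return the page count reported by pdfinfo, or None if it is unavailable."""
    # Imported lazily so that loading this snippet does not require pdf2image.
    from pdf2image import pdfinfo_from_path
    from pdf2image.exceptions import (
        PDFInfoNotInstalledError,
        PDFPageCountError,
        PDFPopplerTimeoutError,
    )

    try:
        info = pdfinfo_from_path(pdf_path, timeout=timeout)
    except (PDFInfoNotInstalledError, PDFPageCountError, PDFPopplerTimeoutError):
        return None
    # "Pages" is assumed to be present; .get() avoids a KeyError if it is not.
    pages = info.get("Pages")
    return int(pages) if pages is not None else None
```

Callers that need to tell a missing pdfinfo binary apart from an unparsable file would catch the three exceptions separately instead of collapsing them into None.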

Exceptions

Define exceptions specific to pdf2image

exception pdf2image.exceptions.PDFInfoNotInstalledError: Raised when pdfinfo is not installed

exception pdf2image.exceptions.PDFPageCountError: Raised when pdfinfo was unable to retrieve the page count

exception pdf2image.exceptions.PDFPopplerTimeoutError: Raised when the timeout is exceeded while converting a PDF

exception pdf2image.exceptions.PDFSyntaxError: Raised when a syntax error was thrown during rendering

exception pdf2image.exceptions.PopplerNotInstalledError: Raised when poppler is not installed
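
The following sketch shows how these exceptions might be handled around a conversion call. try_convert is a hypothetical helper, and the status strings and the default timeout value are assumptions chosen for illustration. Which exception a given failure actually produces depends on the installed pdf2image and poppler versions.

```python
from typing import List, Tuple


def try_convert(pdf_path: str, timeout: int = 30) -> Tuple[str, List]:
    """Convert a PDF and report the outcome as a (status, images) pair."""
    # Imported lazily so that loading this snippet does not require pdf2image.
    from pdf2image import convert_from_path
    from pdf2image.exceptions import (
        PDFInfoNotInstalledError,
        PDFPageCountError,
        PDFPopplerTimeoutError,
        PDFSyntaxError,
        PopplerNotInstalledError,
    )

    try:
        # strict=True asks for syntax errors to be raised as PDFSyntaxError.
        images = convert_from_path(pdf_path, strict=True, timeout=timeout)
    except (PDFInfoNotInstalledError, PopplerNotInstalledError):
        return "poppler-missing", []
    except PDFPopplerTimeoutError:
        return "timeout", []
    except (PDFSyntaxError, PDFPageCountError):
        return "invalid-pdf", []
    return "ok", images
```

Returning a status string keeps the caller free of pdf2image imports; re-raising a single application-level exception would be an equally reasonable choice.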

Parsers

pdf2image custom buffer parsers

pdf2image.parsers.parse_buffer_to_jpeg(data: bytes) → List[Image]

Parse JPEG file bytes to Pillow Image

Parameters: data (bytes) – pdftoppm/pdftocairo output bytes
Returns: List of JPEG images parsed from the output
Return type: List[Image.Image]

pdf2image.parsers.parse_buffer_to_pgm(data: bytes) → List[Image]

Parse PGM file bytes to Pillow Image

Parameters: data (bytes) – pdftoppm/pdftocairo output bytes
Returns: List of PGM images parsed from the output
Return type: List[Image.Image]

pdf2image.parsers.parse_buffer_to_png(data: bytes) → List[Image]

Parse PNG file bytes to Pillow Image

Parameters: data (bytes) – pdftoppm/pdftocairo output bytes
Returns: List of PNG images parsed from the output
Return type: List[Image.Image]

pdf2image.parsers.parse_buffer_to_ppm(data: bytes) → List[Image]

Parse PPM file bytes to Pillow Image

Parameters: data (bytes) – pdftoppm/pdftocairo output bytes
Returns: List of PPM images parsed from the output
Return type: List[Image.Image]
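
To illustrate what a parser for such a buffer has to do, here is a minimal sketch that splits a byte string of concatenated binary PPM (P6) images into one chunk per image. It is not pdf2image's implementation. The function name split_ppm_buffer is hypothetical, and the code assumes headers made of three newline-terminated lines (the magic number P6, then width and height, then the maximum sample value), with no comments and a maximum value of at most 255 (one byte per sample).

```python
from typing import List


def split_ppm_buffer(data: bytes) -> List[bytes]:
    """Split concatenated binary PPM (P6) images into separate byte strings."""
    images: List[bytes] = []
    pos = 0
    while pos < len(data):
        # Header: magic number, then "width height", then maxval, one per line.
        magic_end = data.index(b"\n", pos)
        if data[pos:magic_end] != b"P6":
            raise ValueError("not a binary PPM image")
        size_end = data.index(b"\n", magic_end + 1)
        width, height = (int(n) for n in data[magic_end + 1:size_end].split())
        maxval_end = data.index(b"\n", size_end + 1)
        if int(data[size_end + 1:maxval_end]) > 255:
            raise ValueError("only 8-bit samples are handled in this sketch")
        # Pixel data: 3 bytes (R, G, B) per pixel, right after the header.
        end = maxval_end + 1 + width * height * 3
        if end > len(data):
            raise ValueError("truncated pixel data")
        images.append(data[pos:end])
        pos = end
    return images
```

Each chunk could then be opened with Pillow, for example with Image.open(io.BytesIO(chunk)). The pixel count from the header, rather than a search for the next "P6", decides where an image ends, because pixel bytes may themselves contain that sequence.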