Reference

Main functions

pdf2image is a light wrapper for the poppler-utils tools that can convert your PDFs into Pillow images.

pdf2image.pdf2image.convert_from_bytes(pdf_file: bytes, dpi: int = 200, output_folder: Optional[Union[str, PurePath]] = None, first_page: Optional[int] = None, last_page: Optional[int] = None, fmt: str = 'ppm', jpegopt: Optional[Dict] = None, thread_count: int = 1, userpw: Optional[str] = None, ownerpw: Optional[str] = None, use_cropbox: bool = False, strict: bool = False, transparent: bool = False, single_file: bool = False, output_file: Union[str, PurePath] = <pdf2image.generators.ThreadSafeGenerator object>, poppler_path: Optional[Union[str, PurePath]] = None, grayscale: bool = False, size: Optional[Union[Tuple, int]] = None, paths_only: bool = False, use_pdftocairo: bool = False, timeout: Optional[int] = None, hide_annotations: bool = False) -> List[Image]

Function wrapping pdftoppm and pdftocairo.

Parameters
  • pdf_file (bytes) – Bytes of the PDF that you want to convert

  • dpi (int, optional) – Image quality in DPI, defaults to 200

  • output_folder (Union[str, PurePath], optional) – Write the resulting images to a folder (instead of directly in memory), defaults to None

  • first_page (int, optional) – First page to process, defaults to None

  • last_page (int, optional) – Last page to process before stopping, defaults to None

  • fmt (str, optional) – Output image format, defaults to “ppm”

  • jpegopt (Dict, optional) – JPEG options (quality, progressive, and optimize); only applies to the jpeg format, defaults to None

  • thread_count (int, optional) – How many threads we are allowed to spawn for processing, defaults to 1

  • userpw (str, optional) – PDF’s user password, defaults to None

  • ownerpw (str, optional) – PDF’s owner password, defaults to None

  • use_cropbox (bool, optional) – Use cropbox instead of mediabox, defaults to False

  • strict (bool, optional) – Raise PDFSyntaxError when poppler reports a syntax error, defaults to False

  • transparent (bool, optional) – Output with a transparent background instead of a white one, defaults to False

  • single_file (bool, optional) – Uses the -singlefile option from pdftoppm/pdftocairo, defaults to False

  • output_file (Any, optional) – Output filename or filename generator, defaults to uuid_generator()

  • poppler_path (Union[str, PurePath], optional) – Path to look for poppler binaries, defaults to None

  • grayscale (bool, optional) – Output grayscale image(s), defaults to False

  • size (Union[Tuple, int], optional) – Size of the resulting image(s), uses the Pillow (width, height) standard, defaults to None

  • paths_only (bool, optional) – Don’t load image(s), return paths instead (requires output_folder), defaults to False

  • use_pdftocairo (bool, optional) – Use pdftocairo instead of pdftoppm, may help performance, defaults to False

  • timeout (int, optional) – Raise PDFPopplerTimeoutError after the given time, defaults to None

  • hide_annotations (bool, optional) – Hide PDF annotations in the output, defaults to False

Raises
  • NotImplementedError – Raised when conflicting parameters are given (hide_annotations together with use_pdftocairo)

  • PDFPopplerTimeoutError – Raised after the timeout for the image processing is exceeded

  • PDFSyntaxError – Raised if there is a syntax error in the PDF and strict=True

Returns

A list of Pillow images, one for each page between first_page and last_page

Return type

List[Image.Image]
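
As a rough illustration of the parameters above, the following sketch renders a page range from in-memory PDF bytes as JPEG. The helper names jpeg_options and render_range are hypothetical, and the 0 to 100 quality range is an assumption based on the underlying pdftoppm tool. Running render_range requires pdf2image and poppler to be installed.

```python
from typing import Dict


def jpeg_options(quality: int = 85, progressive: bool = True,
                 optimize: bool = True) -> Dict[str, object]:
    """Build a jpegopt dict using the three keys named in the parameter list."""
    # Assumption: quality follows pdftoppm's 0-100 range.
    if not 0 <= quality <= 100:
        raise ValueError("quality must be between 0 and 100")
    return {"quality": quality, "progressive": progressive, "optimize": optimize}


def render_range(pdf_bytes: bytes, first_page: int, last_page: int,
                 dpi: int = 200) -> list:
    """Render pages first_page..last_page of a PDF held in memory as JPEG images."""
    # Imported lazily so jpeg_options stays usable without poppler installed.
    from pdf2image import convert_from_bytes

    return convert_from_bytes(
        pdf_bytes,
        dpi=dpi,
        first_page=first_page,
        last_page=last_page,
        fmt="jpeg",               # jpegopt only applies to the jpeg format
        jpegopt=jpeg_options(quality=90),
    )
```

A call such as render_range(open("report.pdf", "rb").read(), 1, 3) would then return one Pillow image per page in the range, where report.pdf is a placeholder filename.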

pdf2image.pdf2image.convert_from_path(pdf_path: Union[str, PurePath], dpi: int = 200, output_folder: Optional[Union[str, PurePath]] = None, first_page: Optional[int] = None, last_page: Optional[int] = None, fmt: str = 'ppm', jpegopt: Optional[Dict] = None, thread_count: int = 1, userpw: Optional[str] = None, ownerpw: Optional[str] = None, use_cropbox: bool = False, strict: bool = False, transparent: bool = False, single_file: bool = False, output_file: Any = <pdf2image.generators.ThreadSafeGenerator object>, poppler_path: Optional[Union[str, PurePath]] = None, grayscale: bool = False, size: Optional[Union[Tuple, int]] = None, paths_only: bool = False, use_pdftocairo: bool = False, timeout: Optional[int] = None, hide_annotations: bool = False) -> List[Image]

Function wrapping pdftoppm and pdftocairo.

Parameters
  • pdf_path (Union[str, PurePath]) – Path to the PDF that you want to convert

  • dpi (int, optional) – Image quality in DPI, defaults to 200

  • output_folder (Union[str, PurePath], optional) – Write the resulting images to a folder (instead of directly in memory), defaults to None

  • first_page (int, optional) – First page to process, defaults to None

  • last_page (int, optional) – Last page to process before stopping, defaults to None

  • fmt (str, optional) – Output image format, defaults to “ppm”

  • jpegopt (Dict, optional) – JPEG options (quality, progressive, and optimize); only applies to the jpeg format, defaults to None

  • thread_count (int, optional) – How many threads we are allowed to spawn for processing, defaults to 1

  • userpw (str, optional) – PDF’s user password, defaults to None

  • ownerpw (str, optional) – PDF’s owner password, defaults to None

  • use_cropbox (bool, optional) – Use cropbox instead of mediabox, defaults to False

  • strict (bool, optional) – Raise PDFSyntaxError when poppler reports a syntax error, defaults to False

  • transparent (bool, optional) – Output with a transparent background instead of a white one, defaults to False

  • single_file (bool, optional) – Uses the -singlefile option from pdftoppm/pdftocairo, defaults to False

  • output_file (Any, optional) – Output filename or filename generator, defaults to uuid_generator()

  • poppler_path (Union[str, PurePath], optional) – Path to look for poppler binaries, defaults to None

  • grayscale (bool, optional) – Output grayscale image(s), defaults to False

  • size (Union[Tuple, int], optional) – Size of the resulting image(s), uses the Pillow (width, height) standard, defaults to None

  • paths_only (bool, optional) – Don’t load image(s), return paths instead (requires output_folder), defaults to False

  • use_pdftocairo (bool, optional) – Use pdftocairo instead of pdftoppm, may help performance, defaults to False

  • timeout (int, optional) – Raise PDFPopplerTimeoutError after the given time, defaults to None

  • hide_annotations (bool, optional) – Hide PDF annotations in the output, defaults to False

Raises
  • NotImplementedError – Raised when conflicting parameters are given (hide_annotations together with use_pdftocairo)

  • PDFPopplerTimeoutError – Raised after the timeout for the image processing is exceeded

  • PDFSyntaxError – Raised if there is a syntax error in the PDF and strict=True

Returns

A list of Pillow images, one for each page between first_page and last_page

Return type

List[Image.Image]
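
The output_folder and paths_only parameters can be combined so that images are written to disk and only their paths are returned. The sketch below shows one way to do that; the function name pdf_to_image_paths, the choice of PNG, and thread_count=2 are illustrative assumptions. It needs pdf2image and poppler at call time.

```python
from pathlib import Path
from typing import List


def pdf_to_image_paths(pdf_path: str, output_folder: str, dpi: int = 200) -> List[str]:
    """Write one PNG per page into output_folder and return the file paths."""
    pdf = Path(pdf_path)
    # Fail early with a clear error instead of letting poppler report it.
    if not pdf.is_file():
        raise FileNotFoundError(pdf_path)
    Path(output_folder).mkdir(parents=True, exist_ok=True)

    # Imported lazily so the path check above works without poppler installed.
    from pdf2image import convert_from_path

    return convert_from_path(
        pdf,
        dpi=dpi,
        output_folder=output_folder,  # required when paths_only=True
        fmt="png",
        paths_only=True,              # return paths instead of loaded images
        thread_count=2,
    )
```

Returning paths rather than loaded images keeps memory use low for long documents, since no page is held in memory after it has been written.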

pdf2image.pdf2image.pdfinfo_from_bytes(pdf_bytes: bytes, userpw: Optional[str] = None, ownerpw: Optional[str] = None, poppler_path: Optional[str] = None, rawdates: bool = False, timeout: Optional[int] = None) -> Dict

Wraps poppler’s pdfinfo utility and returns the result as a dictionary.

Parameters
  • pdf_bytes (bytes) – Bytes of the PDF that you want to inspect

  • userpw (str, optional) – PDF’s user password, defaults to None

  • ownerpw (str, optional) – PDF’s owner password, defaults to None

  • poppler_path (Union[str, PurePath], optional) – Path to look for poppler binaries, defaults to None

  • rawdates (bool, optional) – Return the undecoded date strings, defaults to False

  • timeout (int, optional) – Raise PDFPopplerTimeoutError after the given time, defaults to None

Returns

Dictionary containing various information on the PDF

Return type

Dict

pdf2image.pdf2image.pdfinfo_from_path(pdf_path: str, userpw: Optional[str] = None, ownerpw: Optional[str] = None, poppler_path: Optional[str] = None, rawdates: bool = False, timeout: Optional[int] = None) -> Dict

Wraps poppler’s pdfinfo utility and returns the result as a dictionary.

Parameters
  • pdf_path (str) – Path to the PDF that you want to inspect

  • userpw (str, optional) – PDF’s user password, defaults to None

  • ownerpw (str, optional) – PDF’s owner password, defaults to None

  • poppler_path (Union[str, PurePath], optional) – Path to look for poppler binaries, defaults to None

  • rawdates (bool, optional) – Return the undecoded date strings, defaults to False

  • timeout (int, optional) – Raise PDFPopplerTimeoutError after the given time, defaults to None

Returns

Dictionary containing various information on the PDF

Return type

Dict
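
One common use of the pdfinfo functions is to read the page count first and then convert a large PDF in batches. The following is a minimal sketch of that idea, assuming the returned dictionary exposes the page count under a "Pages" key (as pdfinfo's own output does). The names page_ranges and convert_in_chunks are hypothetical helpers, not part of pdf2image.

```python
from typing import Iterator, List, Tuple


def page_ranges(total_pages: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split pages 1..total_pages into inclusive (first_page, last_page) pairs."""
    if total_pages < 0 or chunk_size < 1:
        raise ValueError("total_pages must be >= 0 and chunk_size >= 1")
    return [
        (first, min(first + chunk_size - 1, total_pages))
        for first in range(1, total_pages + 1, chunk_size)
    ]


def convert_in_chunks(pdf_path: str, chunk_size: int = 10) -> Iterator[list]:
    """Yield lists of Pillow images, chunk_size pages at a time."""
    # Imported lazily so page_ranges can be used without poppler installed.
    from pdf2image.pdf2image import convert_from_path, pdfinfo_from_path

    # Assumption: the page count is stored under the "Pages" key.
    total = int(pdfinfo_from_path(pdf_path)["Pages"])
    for first, last in page_ranges(total, chunk_size):
        yield convert_from_path(pdf_path, first_page=first, last_page=last)
```

For a 10-page document and a chunk size of 4, page_ranges produces (1, 4), (5, 8), and (9, 10), so only a few pages are held in memory at once.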

Exceptions

Define exceptions specific to pdf2image

exception pdf2image.exceptions.PDFInfoNotInstalledError

Raised when pdfinfo is not installed

exception pdf2image.exceptions.PDFPageCountError

Raised when pdfinfo was unable to retrieve the page count

exception pdf2image.exceptions.PDFPopplerTimeoutError

Raised when the timeout is exceeded while converting a PDF

exception pdf2image.exceptions.PDFSyntaxError

Raised when a syntax error was thrown during rendering

exception pdf2image.exceptions.PopplerNotInstalledError

Raised when poppler is not installed
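
A caller may want to handle these exceptions separately. The sketch below is one possible pattern, not a recommended or official one; convert_or_none is a hypothetical name, and the 30-unit default timeout and the printed messages are arbitrary choices for illustration.

```python
from typing import List, Optional


def convert_or_none(pdf_path: str, timeout: int = 30) -> Optional[List]:
    """Convert a PDF, returning None (after printing a reason) on known failures."""
    # Imported lazily so this function can be defined without pdf2image installed.
    from pdf2image import convert_from_path
    from pdf2image.exceptions import (
        PDFInfoNotInstalledError,
        PDFPageCountError,
        PDFPopplerTimeoutError,
        PDFSyntaxError,
    )

    try:
        # strict=True is what makes syntax errors surface as PDFSyntaxError.
        return convert_from_path(pdf_path, strict=True, timeout=timeout)
    except PDFInfoNotInstalledError:
        print("pdfinfo was not found; check that poppler is installed")
    except PDFPageCountError:
        print("pdfinfo could not determine the page count")
    except PDFPopplerTimeoutError:
        print("conversion exceeded the configured timeout")
    except PDFSyntaxError:
        print("the PDF contains a syntax error")
    return None
```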

Parsers

pdf2image custom buffer parsers

pdf2image.parsers.parse_buffer_to_jpeg(data: bytes) -> List[Image]

Parse JPEG file bytes into Pillow images

Parameters

data (bytes) – pdftoppm/pdftocairo output bytes

Returns

List of JPEG images parsed from the output

Return type

List[Image.Image]

pdf2image.parsers.parse_buffer_to_pgm(data: bytes) -> List[Image]

Parse PGM file bytes into Pillow images

Parameters

data (bytes) – pdftoppm/pdftocairo output bytes

Returns

List of PGM images parsed from the output

Return type

List[Image.Image]

pdf2image.parsers.parse_buffer_to_png(data: bytes) -> List[Image]

Parse PNG file bytes into Pillow images

Parameters

data (bytes) – pdftoppm/pdftocairo output bytes

Returns

List of PNG images parsed from the output

Return type

List[Image.Image]

pdf2image.parsers.parse_buffer_to_ppm(data: bytes) -> List[Image]

Parse PPM file bytes into Pillow images

Parameters

data (bytes) – pdftoppm/pdftocairo output bytes

Returns

List of PPM images parsed from the output

Return type

List[Image.Image]
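
To give a feel for what parsing a multi-image buffer involves, here is a standalone sketch that splits back-to-back binary PPM images into separate byte strings. It is not necessarily how the library's parser is written. The function name split_ppm_buffer is hypothetical, and the code assumes P6 images with three newline-terminated header fields and 8-bit samples.

```python
from typing import List


def split_ppm_buffer(data: bytes) -> List[bytes]:
    """Split concatenated binary PPM (P6) images into one bytes object each.

    Assumes each header is exactly: magic, "width height", maxval, each
    followed by a newline, and that every pixel uses 3 bytes (8-bit RGB).
    """
    images = []
    index = 0
    while index < len(data):
        # The header is short, so the first 40 bytes are enough to read it.
        magic, size, maxval = data[index:index + 40].split(b"\n")[:3]
        if magic != b"P6":
            raise ValueError("expected a binary PPM (P6) header")
        width, height = (int(n) for n in size.split(b" "))
        header_len = len(magic) + len(size) + len(maxval) + 3  # three newlines
        end = index + header_len + width * height * 3
        images.append(data[index:end])
        index = end
    return images
```

Each returned chunk is a complete PPM file, so it could be opened with Pillow via Image.open(io.BytesIO(chunk)).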