Reference
Main functions
pdf2image is a light wrapper for the poppler-utils tools that can convert your PDFs into Pillow images.
- pdf2image.pdf2image.convert_from_bytes(pdf_file: bytes, dpi: int = 200, output_folder: Optional[Union[str, PurePath]] = None, first_page: Optional[int] = None, last_page: Optional[int] = None, fmt: str = 'ppm', jpegopt: Optional[Dict] = None, thread_count: int = 1, userpw: Optional[str] = None, ownerpw: Optional[str] = None, use_cropbox: bool = False, strict: bool = False, transparent: bool = False, single_file: bool = False, output_file: Union[str, PurePath] = <pdf2image.generators.ThreadSafeGenerator object>, poppler_path: Optional[Union[str, PurePath]] = None, grayscale: bool = False, size: Optional[Union[Tuple, int]] = None, paths_only: bool = False, use_pdftocairo: bool = False, timeout: Optional[int] = None, hide_annotations: bool = False) → List[Image]
Function wrapping pdftoppm and pdftocairo.
- Parameters
pdf_file (bytes) – Bytes of the PDF that you want to convert
dpi (int, optional) – Image resolution in DPI, defaults to 200
output_folder (Union[str, PurePath], optional) – Write the resulting images to a folder (instead of directly in memory), defaults to None
first_page (int, optional) – First page to process, defaults to None
last_page (int, optional) – Last page to process before stopping, defaults to None
fmt (str, optional) – Output image format, defaults to “ppm”
jpegopt (Dict, optional) – jpeg options quality, progressive, and optimize (only for jpeg format), defaults to None
thread_count (int, optional) – How many threads we are allowed to spawn for processing, defaults to 1
userpw (str, optional) – PDF’s password, defaults to None
ownerpw (str, optional) – PDF’s owner password, defaults to None
use_cropbox (bool, optional) – Use cropbox instead of mediabox, defaults to False
strict (bool, optional) – Raise PDFSyntaxError when a syntax error is found in the PDF, defaults to False
transparent (bool, optional) – Output with a transparent background instead of a white one, defaults to False
single_file (bool, optional) – Uses the -singlefile option from pdftoppm/pdftocairo, defaults to False
output_file (Any, optional) – What is the output filename or generator, defaults to uuid_generator()
poppler_path (Union[str, PurePath], optional) – Path to look for poppler binaries, defaults to None
grayscale (bool, optional) – Output grayscale image(s), defaults to False
size (Union[Tuple, int], optional) – Size of the resulting image(s), uses the Pillow (width, height) standard, defaults to None
paths_only (bool, optional) – Don’t load image(s), return paths instead (requires output_folder), defaults to False
use_pdftocairo (bool, optional) – Use pdftocairo instead of pdftoppm, may help performance, defaults to False
timeout (int, optional) – Raise PDFPopplerTimeoutError after the given time, defaults to None
hide_annotations (bool, optional) – Hide PDF annotations in the output, defaults to False
- Raises
NotImplementedError – Raised when conflicting parameters are given (hide_annotations together with use_pdftocairo)
PDFPopplerTimeoutError – Raised after the timeout for the image processing is exceeded
PDFSyntaxError – Raised if there is a syntax error in the PDF and strict=True
- Returns
A list of Pillow images, one for each page between first_page and last_page
- Return type
List[Image.Image]
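The effect of dpi on output size can be estimated before converting. The sketch below assumes page dimensions are given in PDF points (1/72 inch) and that the rendered size is approximately points / 72 * dpi; the helper names expected_pixels and render_first_page are hypothetical and not part of pdf2image.

```python
def expected_pixels(width_pt, height_pt, dpi=200):
    """Approximate raster size of a page measured in PDF points (1/72 inch)."""
    return round(width_pt / 72 * dpi), round(height_pt / 72 * dpi)


def render_first_page(pdf_data, dpi=200):
    """Render only page 1 of an in-memory PDF and return it as a Pillow image."""
    # Imported lazily so the helper above works without pdf2image installed.
    from pdf2image import convert_from_bytes

    pages = convert_from_bytes(pdf_data, dpi=dpi, first_page=1, last_page=1)
    return pages[0]


# A US Letter page is 612 x 792 points, so at the default 200 DPI:
letter_size = expected_pixels(612, 792)  # (1700, 2200)
```

Calling render_first_page(open("example.pdf", "rb").read()) would need poppler installed; "example.pdf" is a placeholder path. The actual image size may differ from the estimate by a pixel because of rounding inside poppler.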
- pdf2image.pdf2image.convert_from_path(pdf_path: Union[str, PurePath], dpi: int = 200, output_folder: Optional[Union[str, PurePath]] = None, first_page: Optional[int] = None, last_page: Optional[int] = None, fmt: str = 'ppm', jpegopt: Optional[Dict] = None, thread_count: int = 1, userpw: Optional[str] = None, ownerpw: Optional[str] = None, use_cropbox: bool = False, strict: bool = False, transparent: bool = False, single_file: bool = False, output_file: Any = <pdf2image.generators.ThreadSafeGenerator object>, poppler_path: Optional[Union[str, PurePath]] = None, grayscale: bool = False, size: Optional[Union[Tuple, int]] = None, paths_only: bool = False, use_pdftocairo: bool = False, timeout: Optional[int] = None, hide_annotations: bool = False) → List[Image]
Function wrapping pdftoppm and pdftocairo.
- Parameters
pdf_path (Union[str, PurePath]) – Path to the PDF that you want to convert
dpi (int, optional) – Image resolution in DPI, defaults to 200
output_folder (Union[str, PurePath], optional) – Write the resulting images to a folder (instead of directly in memory), defaults to None
first_page (int, optional) – First page to process, defaults to None
last_page (int, optional) – Last page to process before stopping, defaults to None
fmt (str, optional) – Output image format, defaults to “ppm”
jpegopt (Dict, optional) – jpeg options quality, progressive, and optimize (only for jpeg format), defaults to None
thread_count (int, optional) – How many threads we are allowed to spawn for processing, defaults to 1
userpw (str, optional) – PDF’s password, defaults to None
ownerpw (str, optional) – PDF’s owner password, defaults to None
use_cropbox (bool, optional) – Use cropbox instead of mediabox, defaults to False
strict (bool, optional) – Raise PDFSyntaxError when a syntax error is found in the PDF, defaults to False
transparent (bool, optional) – Output with a transparent background instead of a white one, defaults to False
single_file (bool, optional) – Uses the -singlefile option from pdftoppm/pdftocairo, defaults to False
output_file (Any, optional) – What is the output filename or generator, defaults to uuid_generator()
poppler_path (Union[str, PurePath], optional) – Path to look for poppler binaries, defaults to None
grayscale (bool, optional) – Output grayscale image(s), defaults to False
size (Union[Tuple, int], optional) – Size of the resulting image(s), uses the Pillow (width, height) standard, defaults to None
paths_only (bool, optional) – Don’t load image(s), return paths instead (requires output_folder), defaults to False
use_pdftocairo (bool, optional) – Use pdftocairo instead of pdftoppm, may help performance, defaults to False
timeout (int, optional) – Raise PDFPopplerTimeoutError after the given time, defaults to None
hide_annotations (bool, optional) – Hide PDF annotations in the output, defaults to False
- Raises
NotImplementedError – Raised when conflicting parameters are given (hide_annotations together with use_pdftocairo)
PDFPopplerTimeoutError – Raised after the timeout for the image processing is exceeded
PDFSyntaxError – Raised if there is a syntax error in the PDF and strict=True
- Returns
A list of Pillow images, one for each page between first_page and last_page
- Return type
List[Image.Image]
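For large documents it can help to write pages to disk rather than hold every image in memory. The following is a minimal sketch combining output_folder, fmt, thread_count, and paths_only as documented above; file_output_options and convert_to_files are hypothetical helper names, and the png format and thread count of 2 are arbitrary choices.

```python
from pathlib import Path


def file_output_options(out_dir, fmt="png", threads=2):
    """Build keyword arguments that make the converter write files and return paths."""
    return {
        "output_folder": Path(out_dir),  # paths_only requires an output folder
        "fmt": fmt,
        "thread_count": threads,
        "paths_only": True,  # return file paths instead of loaded images
    }


def convert_to_files(pdf_path, out_dir):
    """Convert every page of pdf_path into image files under out_dir."""
    # Imported lazily so the option helper works without pdf2image installed.
    from pdf2image import convert_from_path

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    return convert_from_path(pdf_path, **file_output_options(out_dir))
```

With paths_only=True the return value is a list of file paths rather than Pillow images, so each file can be opened later only when it is needed.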
- pdf2image.pdf2image.pdfinfo_from_bytes(pdf_bytes: bytes, userpw: Optional[str] = None, ownerpw: Optional[str] = None, poppler_path: Optional[str] = None, rawdates: bool = False, timeout: Optional[int] = None) → Dict
Function wrapping poppler’s pdfinfo utility; returns the result as a dictionary.
- Parameters
pdf_bytes (bytes) – Bytes of the PDF that you want to inspect
userpw (str, optional) – PDF’s password, defaults to None
ownerpw (str, optional) – PDF’s owner password, defaults to None
poppler_path (Union[str, PurePath], optional) – Path to look for poppler binaries, defaults to None
rawdates (bool, optional) – Return the undecoded date strings, defaults to False
timeout (int, optional) – Raise PDFPopplerTimeoutError after the given time, defaults to None
- Returns
Dictionary containing various information on the PDF
- Return type
Dict
- pdf2image.pdf2image.pdfinfo_from_path(pdf_path: str, userpw: Optional[str] = None, ownerpw: Optional[str] = None, poppler_path: Optional[str] = None, rawdates: bool = False, timeout: Optional[int] = None) → Dict
Function wrapping poppler’s pdfinfo utility; returns the result as a dictionary.
- Parameters
pdf_path (str) – Path to the PDF that you want to inspect
userpw (str, optional) – PDF’s password, defaults to None
ownerpw (str, optional) – PDF’s owner password, defaults to None
poppler_path (Union[str, PurePath], optional) – Path to look for poppler binaries, defaults to None
rawdates (bool, optional) – Return the undecoded date strings, defaults to False
timeout (int, optional) – Raise PDFPopplerTimeoutError after the given time, defaults to None
- Raises
PDFPopplerTimeoutError – Raised when the pdfinfo call exceeds the timeout
PDFInfoNotInstalledError – Raised if pdfinfo is not installed
PDFPageCountError – Raised if the output could not be parsed
- Returns
Dictionary containing various information on the PDF
- Return type
Dict
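The page count reported by pdfinfo can drive first_page and last_page so that a long PDF is converted in bounded chunks. This sketch assumes the returned dictionary has a "Pages" entry holding an integer and that page numbers are 1-based and inclusive, as in pdftoppm's -f and -l options; page_batches and convert_in_batches are hypothetical helpers.

```python
def page_batches(n_pages, batch_size):
    """Yield (first_page, last_page) pairs covering pages 1..n_pages inclusive."""
    for first in range(1, n_pages + 1, batch_size):
        yield first, min(first + batch_size - 1, n_pages)


def convert_in_batches(pdf_path, batch_size=10, dpi=200):
    """Yield one list of Pillow images per batch of pages."""
    # Imported lazily so page_batches works without pdf2image installed.
    from pdf2image import convert_from_path, pdfinfo_from_path

    n_pages = pdfinfo_from_path(pdf_path)["Pages"]  # assumed to be an int
    for first, last in page_batches(n_pages, batch_size):
        yield convert_from_path(
            pdf_path, dpi=dpi, first_page=first, last_page=last
        )
```

For a 10-page document and a batch size of 4, page_batches produces (1, 4), (5, 8), and (9, 10), so at most four page images are held in memory at a time.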
Exceptions
Define exceptions specific to pdf2image
- exception pdf2image.exceptions.PDFInfoNotInstalledError
Raised when pdfinfo is not installed
- exception pdf2image.exceptions.PDFPageCountError
Raised when pdfinfo was unable to retrieve the page count
- exception pdf2image.exceptions.PDFPopplerTimeoutError
Raised when the timeout is exceeded while converting a PDF
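A caller will usually want to tell these failures apart. The sketch below is one possible way to do that around pdfinfo_from_path; page_count_or_none is a hypothetical helper, the 10 passed as timeout is an arbitrary value assumed to be in seconds, and the printed messages are illustrative only.

```python
def page_count_or_none(pdf_path, timeout=10):
    """Return the page count of pdf_path, or None if pdfinfo fails in a known way."""
    # Imported lazily so this module loads even without pdf2image installed.
    from pdf2image import pdfinfo_from_path
    from pdf2image.exceptions import (
        PDFInfoNotInstalledError,
        PDFPageCountError,
        PDFPopplerTimeoutError,
    )

    try:
        info = pdfinfo_from_path(pdf_path, timeout=timeout)
    except PDFInfoNotInstalledError:
        # poppler is missing or not on PATH; poppler_path may help here.
        print("pdfinfo not found: install poppler or pass poppler_path")
        return None
    except PDFPageCountError:
        # pdfinfo ran but its output could not be parsed.
        print(f"could not read page count from {pdf_path}")
        return None
    except PDFPopplerTimeoutError:
        print(f"pdfinfo did not finish within the timeout of {timeout}")
        return None
    return info.get("Pages")
```

Returning None keeps the three failure modes from propagating; code that needs to react differently to each one can re-raise or return distinct values instead.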
Parsers
pdf2image custom buffer parsers
- pdf2image.parsers.parse_buffer_to_jpeg(data: bytes) → List[Image]
Parse JPEG file bytes into Pillow images
- Parameters
data (bytes) – pdftoppm/pdftocairo output bytes
- Returns
List of JPEG images parsed from the output
- Return type
List[Image.Image]
- pdf2image.parsers.parse_buffer_to_pgm(data: bytes) → List[Image]
Parse PGM file bytes into Pillow images
- Parameters
data (bytes) – pdftoppm/pdftocairo output bytes
- Returns
List of PGM images parsed from the output
- Return type
List[Image.Image]
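Both parsers take a single buffer and return several images, which implies the buffer holds the pages back to back. As an illustration of the idea only, and not the library's actual implementation, the sketch below splits a concatenated JPEG stream on the standard start-of-image (FF D8) and end-of-image (FF D9) markers; split_jpeg_stream is a hypothetical name.

```python
SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def split_jpeg_stream(data):
    """Split back-to-back JPEG files into one bytes object per image."""
    chunks = []
    for piece in data.split(EOI):
        start = piece.find(SOI)
        if start != -1:
            # Drop anything before the start marker and restore the end
            # marker that split() consumed.
            chunks.append(piece[start:] + EOI)
    return chunks


# Two fake "images" (marker pairs around placeholder payloads).
stream = SOI + b"page-1" + EOI + SOI + b"page-2" + EOI
parts = split_jpeg_stream(stream)
```

Each chunk could then be loaded with Pillow via Image.open(io.BytesIO(chunk)). This naive split would break on files that embed a second JPEG, such as a thumbnail, since the inner end-of-image marker would be mistaken for a page boundary.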