Do you have PDF files and want to extract images from them? One of the simplest ways to do this is to use the pdfimages command-line tool in Linux. Other tools can do the same job, but this article focuses on pdfimages and how to use it.
What is PDF?
Portable Document Format (PDF) is a format that makes sharing information and ideas very easy. It is used for files that are not meant to be edited but still need to be printed and shared easily. It’s an open standard for document exchange. Each PDF file encapsulates a complete description of a fixed-layout document, including text, graphics, and images. A PDF file displays the same layout and content regardless of the device, software application, or operating system.
pdfimages is a tool used to extract images from a PDF file. It has many options, such as writing the images as JPEG or PNG, specifying the user and owner passwords for encrypted files, and specifying the first and last page for image extraction. The pdfimages tool is part of poppler-utils. Open the Linux terminal and type the following command to install it:
$ sudo apt-get install poppler-utils
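To confirm that the installation worked, a quick check along the following lines can help. This is only a minimal sketch: the function name have_pdfimages is made up for illustration, and the check simply asks the shell whether a pdfimages command can be found.

```shell
# Hypothetical helper (not part of poppler-utils):
# succeeds if the shell can find a pdfimages command.
have_pdfimages() {
    command -v pdfimages >/dev/null 2>&1
}

if have_pdfimages; then
    echo "pdfimages is installed"
else
    echo "pdfimages not found; install poppler-utils first"
fi
```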
The pdfimages tool prints its usage information through any of four options: -h, -help, --help, and -?. The output lists every available option, so you don’t have to remember them. Now that the tool is installed and ready to use, let’s get started.
Here are some essential things you need to know to use the tool:
- The name of the PDF file
- The starting page (specify the number, optional)
- The end page (specify the number, optional)
The -j option is also important. If you do not specify it, the tool will extract JPEG images and save them in .ppm (Portable Pixmap) format. This can cost your system a lot of memory and time, as each uncompressed image can easily be over a megabyte in size.
Also, if you want to extract all images from a document, there is no need to specify the start and end page:
$ pdfimages 1710.05006.pdf images
This will generate the images in the PPM format. Let’s use the -j option now:
$ pdfimages -j 1710.05006.pdf image
You can also convert the output image format into PNG:
$ pdfimages 1710.05006.pdf image -png
Generating the PNG files takes a few seconds.
The -f option specifies the first page to scan. To start scanning at page 5 and continue to the end of the document:
$ pdfimages 1710.05006.pdf -f 5 image -png
The -l option specifies the last page to scan. To scan from the beginning of the document through page 10:
$ pdfimages -l 10 1710.05006.pdf image -png
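The -f and -l options can also be combined to restrict extraction to a page range. The sketch below wraps that in a small shell function; the name extract_range and its argument order are assumptions made for this example, not something pdfimages provides.

```shell
# Hypothetical helper: extract images from pages FIRST..LAST of a PDF as PNG.
# Usage: extract_range FILE FIRST LAST PREFIX
extract_range() {
    pdfimages -png -f "$2" -l "$3" "$1" "$4"
}

# Example (assumes 1710.05006.pdf is in the current directory):
# extract_range 1710.05006.pdf 5 10 image
```

Called as in the commented example, this would scan only pages 5 through 10 and write the PNG files with the image prefix.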
You can pass -p as well to include the page number in the output image file name.
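As a rough illustration, -p can be combined with the PNG output option. The wrapper name extract_with_page_numbers below is hypothetical and exists only to make the example reusable.

```shell
# Hypothetical helper: extract PNG images with page numbers in the file names.
# Usage: extract_with_page_numbers FILE PREFIX
extract_with_page_numbers() {
    pdfimages -p -png "$1" "$2"
}

# Example (assumes 1710.05006.pdf is in the current directory):
# extract_with_page_numbers 1710.05006.pdf image
```

Having the page number in each file name makes it easier to trace an extracted image back to its place in the document.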
You can also pass -list to list the images in the PDF instead of saving them to the disk:
$ pdfimages 1710.05006.pdf -list
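The listing can also be used to count the images before extracting anything. The sketch below assumes that the -list output starts with two header lines (a row of column names and a row of dashes) followed by one line per image; that layout, and the function name count_images, are assumptions to verify against your installed version.

```shell
# Hypothetical helper: count the images reported by pdfimages -list.
# Assumes the first two output lines are headers, one line per image after that.
count_images() {
    pdfimages -list "$1" | awk 'NR > 2 { n++ } END { print n + 0 }'
}

# Example (assumes 1710.05006.pdf is in the current directory):
# count_images 1710.05006.pdf
```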
There are several other methods that can be used for the same task, but they require a lot more steps. The method discussed here is the simplest and most effective way to extract images from PDF files. I hope this guide is helpful for your task.
Note that if you want to convert your PDF into an image or several images, check this tutorial.
You can find the PDF file I’ve used in this tutorial here.