PDF Files

Our Definition

Portable Document Format (PDF) is a file format created by Adobe Systems in 1993 for document exchange.

PDF represents two-dimensional documents in a manner independent of the application software, hardware, and operating system. Each PDF file encapsulates a complete description of a fixed-layout 2-D document (and, with Acrobat 3D, embedded 3-D documents) that includes the text, fonts, images, and 2-D vector graphics which compose the document.

Anyone may create applications that read and write PDF files without paying royalties to Adobe Systems; Adobe holds patents to PDF, but licenses them for royalty-free use in developing software that complies with its PDF specification.

PDF combines three technologies:

A subset of the PostScript page description programming language, for generating the layout and graphics
A font-embedding/replacement system to allow fonts to travel with the documents
A structured storage system to bundle these elements and any associated content into a single file, with data compression where appropriate

PostScript is a page description language run in an interpreter to generate an image, a process requiring considerable resources. PDF is a file format, not a programming language: flow control commands such as if and loop are removed, while graphics commands such as lineto remain.

Often, the PostScript-like PDF code is generated from a source PostScript file. The graphics commands that are output by the PostScript code are collected and tokenized; any files, graphics, or fonts to which the document refers are also collected; then, everything is compressed into a single file. As a document format, PDF has several advantages over PostScript. PDF contains tokenized and interpreted results of the PostScript source code, giving a direct correspondence between changes to items in the PDF page description and changes to the resulting page appearance.

PostScript is an imperative programming language with an implicit global state, so instructions accompanying the description of one page can affect the appearance of any following page. Therefore, all preceding pages in a PostScript document must be processed to determine the correct appearance of a given page, whereas each page in a PDF document is unaffected by the others.

As a result, PDF viewers allow the user to jump quickly to the final pages of a long document, whereas a PostScript viewer needs to process all pages sequentially before it can display the destination page.

PDF graphics use a device-independent Cartesian coordinate system to describe the surface of a page. A key concept in PDF is that of the graphics state, which is a collection of graphical parameters that may be changed, saved, and restored by a page description. PDF has (as of version 1.6) 24 graphics state properties, of which some of the most important are:

The current transformation matrix (CTM), which determines the coordinate system
The alpha constant, which is a key component of transparency

Vector graphics in PDF, as in PostScript, are constructed with paths. Paths are usually composed of lines and cubic Bézier curves, but can also be constructed from the outlines of text. Unlike PostScript, PDF does not allow a single path to mix text outlines with lines and curves. Paths can be stroked, filled, or used for clipping.
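The way a page description saves, modifies, and restores the graphics state can be illustrated with a small model. The following is a minimal sketch, not a conforming interpreter: the class and method names (GraphicsState, Interpreter, to_device) are hypothetical, only two of the state properties are modelled, and the methods save, restore, and cm are named after the content stream operators q, Q, and cm that play these roles.

```python
from dataclasses import dataclass, replace

# A 2-D affine transform written as six numbers (a, b, c, d, e, f), meaning
# x' = a*x + c*y + e and y' = b*x + d*y + f.
Matrix = tuple[float, float, float, float, float, float]
IDENTITY: Matrix = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)


def concat(m: Matrix, n: Matrix) -> Matrix:
    """Return the transform that applies m first, then n."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = n
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    )


@dataclass(frozen=True)
class GraphicsState:
    """Hypothetical model holding two of the graphics state properties."""
    ctm: Matrix = IDENTITY
    alpha: float = 1.0  # alpha constant


class Interpreter:
    """Hypothetical helper that tracks the current state and a save stack."""

    def __init__(self) -> None:
        self.state = GraphicsState()
        self._stack: list[GraphicsState] = []

    def save(self) -> None:
        """Push a copy of the current state (the role of the q operator)."""
        self._stack.append(self.state)

    def restore(self) -> None:
        """Pop the most recently saved state (the role of the Q operator)."""
        self.state = self._stack.pop()

    def cm(self, m: Matrix) -> None:
        """Concatenate m onto the CTM; m is applied before the old CTM."""
        self.state = replace(self.state, ctm=concat(m, self.state.ctm))

    def to_device(self, x: float, y: float) -> tuple[float, float]:
        """Map a point from user space through the CTM."""
        a, b, c, d, e, f = self.state.ctm
        return (a * x + c * y + e, b * x + d * y + f)


gs = Interpreter()
gs.cm((1, 0, 0, 1, 100, 50))   # translate by (100, 50)
gs.save()
gs.cm((2, 0, 0, 2, 0, 0))      # scale by 2 inside the saved scope
scaled = gs.to_device(10, 10)  # scaled first, then translated
gs.restore()
plain = gs.to_device(10, 10)   # the scale is gone, the translation remains
```

Because the state object is immutable, saving it is just a matter of pushing a reference, and anything changed between save and restore cannot leak into later drawing.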

Strokes and fills can use any color set in the graphics state, including patterns. The simplest is the tiling pattern, in which a piece of artwork is specified to be drawn repeatedly.

This may be a colored tiling pattern, with the colors specified in the pattern object, or an uncolored tiling pattern, which defers color specification to the time the pattern is drawn. There are seven types of shading pattern, of which the simplest are the axial shade (Type 2) and the radial shade (Type 3).

The PDF file format has changed several times and continues to evolve as new versions of Adobe Acrobat are released. There have been nine versions of PDF with corresponding Acrobat releases[4]:

(1993) - PDF 1.0 / Acrobat 1.0
(1994) - PDF 1.1 / Acrobat 2.0
(1996) - PDF 1.2 / Acrobat 3.0
(1999) - PDF 1.3 / Acrobat 4.0
(2001) - PDF 1.4 / Acrobat 5.0
(2003) - PDF 1.5 / Acrobat 6.0
(2005) - PDF 1.6 / Acrobat 7.0
(2006) - PDF 1.7 / Acrobat 8.0
(2008) - PDF 1.7, Adobe Extension Level 3 / Acrobat 9.0

The ISO 32000-1:2008 PDF open standard was published by the ISO on July 1, 2008. PDF is now a published ISO standard, titled Document management -- Portable document format -- Part 1: PDF 1.7.

According to the ISO PDF standard abstract:

ISO 32000-1:2008 specifies a digital form for representing electronic documents to enable users to exchange and view electronic documents independent of the environment in which they were created or the environment in which they are viewed or printed. It is intended for the developer of software that creates PDF files (conforming writers), software that reads existing PDF files and interprets their contents for display and interaction (conforming readers) and PDF products that read and/or write PDF files for a variety of other purposes (conforming products).


File structure

A PDF file consists primarily of objects, of which there are eight types:[6]

Boolean values, representing true or false
Numbers
Strings
Names
Arrays, ordered collections of objects
Dictionaries, collections of objects indexed by Names
Streams, usually containing large amounts of data
The Null object
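As an illustration of how these object types look when written out, here is a minimal sketch that maps Python values onto PDF syntax. The Name class and the serialize function are hypothetical helpers invented for this example. The sketch leaves out streams, hexadecimal strings, and the escaping of special characters inside names, so it is not a complete writer.

```python
class Name(str):
    """Marks a string as a PDF name, which is written with a leading slash."""


def serialize(obj) -> str:
    """Render a Python value as the text of a direct PDF object (sketch)."""
    if obj is None:
        return "null"
    if isinstance(obj, bool):  # checked before int: bool is a subclass of int
        return "true" if obj else "false"
    if isinstance(obj, int):
        return str(obj)
    if isinstance(obj, float):
        # Fixed-point output only, with trailing zeros trimmed.
        return format(obj, ".6f").rstrip("0").rstrip(".")
    if isinstance(obj, Name):  # checked before str: Name is a subclass of str
        return "/" + obj
    if isinstance(obj, str):
        # Literal string: backslash-escape backslashes and parentheses.
        escaped = obj.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        return "(" + escaped + ")"
    if isinstance(obj, (list, tuple)):
        return "[" + " ".join(serialize(item) for item in obj) + "]"
    if isinstance(obj, dict):
        # Dictionary keys are always names.
        pairs = " ".join(
            serialize(Name(key)) + " " + serialize(value)
            for key, value in obj.items()
        )
        return "<< " + pairs + " >>"
    raise TypeError(f"no PDF representation for {type(obj).__name__}")


page = {"Type": Name("Page"), "MediaBox": [0, 0, 612, 792], "Rotate": 0}
page_text = serialize(page)
```

A stream would be written as a dictionary like the one above followed by its data between the stream and endstream keywords, which is why it is omitted from this text-only sketch.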


Objects may be either direct or indirect.

Indirect objects are numbered with an object number and a generation number. An index table called the xref table gives the byte offset of each indirect object from the start of the file. This design allows efficient random access to the objects in the file, and also allows small changes to be made without rewriting the entire file (incremental update). Starting with PDF version 1.5, indirect objects may also be located in special streams known as object streams. This technique reduces the size of files that have large numbers of small indirect objects and is especially useful for Tagged PDF.

There are two layouts for PDF files: non-linear (not "optimized") and linear ("optimized"). Non-linear PDF files consume less disk space than their linear counterparts, though they are slower to access because portions of the data required to assemble pages of the document are scattered throughout the PDF file. Linear PDF files (also called "optimized" or "web optimized" PDF files) are constructed in a manner that enables them to be read in a Web browser plugin, since they are written to disk in a linear (as in page order) fashion. PDF files may be optimized using Adobe Acrobat software or pdfopt, which is part of GPL Ghostscript.

Raster images in PDF (called Image XObjects) are represented by dictionaries with an associated stream. The dictionary describes properties of the image, and the stream contains the image data. (Less commonly, a raster image may be embedded directly in a page description as an inline image.) Images are typically filtered for compression purposes. Image filters supported in PDF include the general-purpose filters. Normally all image content in a PDF is embedded in the file, but PDF allows image data to be stored in external files by the use of external streams or Alternate Images.
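The random access that the xref table provides, described earlier in this section, can be sketched in a few lines. The example below assumes the classic table layout (one subsection, fixed-width 20-byte entries) and a single xref section with no incremental updates or object streams. The function names build_pdf and read_object are hypothetical, and the two objects written are deliberately minimal.

```python
import re


def build_pdf(objects: list[bytes]) -> bytes:
    """Assemble indirect objects 1..n, then an xref table and trailer."""
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset from the start of the file
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0 is the head of the free list
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


def read_object(pdf: bytes, number: int) -> bytes:
    """Fetch one indirect object by jumping straight to its xref offset."""
    # The startxref value says where the xref table itself begins.
    start = int(re.search(rb"startxref\s+(\d+)", pdf).group(1))
    # Read the "xref" keyword and the "<first> <count>" subsection header.
    header = re.compile(rb"xref\s+(\d+)\s+(\d+)\s*\n").match(pdf, start)
    first = int(header.group(1))
    # Fixed-width entries make the lookup simple arithmetic.
    entry_at = header.end() + (number - first) * 20
    offset = int(pdf[entry_at:entry_at + 10])
    end = pdf.index(b"endobj", offset) + len(b"endobj")
    return pdf[offset:end]


pdf = build_pdf([
    b"<< /Type /Catalog /Pages 2 0 R >>",
    b"<< /Type /Pages /Kids [] /Count 0 >>",
])
pages = read_object(pdf, 2)
```

Note that read_object never scans the objects that precede the one it wants. This is the property that lets a viewer open a large file and fetch only the objects a given page needs.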

Text in PDF is represented by text elements in page content streams. A text element specifies that characters should be drawn at certain positions. The characters are specified using the encoding of a selected font resource. A font object in PDF is a description of a digital typeface. It may either describe the characteristics of a typeface, or it may include an embedded font file. The latter case is called an embedded font, while the former is called an unembedded font. The font files that may be embedded are based on widely used standard digital font formats: Type 1 (and its compressed variant CFF), TrueType, and (starting with PDF 1.6) OpenType.

In addition, PDF supports the Type 3 variant, in which the components of the font are described by PDF graphic operators.

Within text strings, characters are shown using character codes (integers) that map to glyphs in the current font using an encoding. There are a number of predefined encodings, including WinAnsi, MacRoman, and a large number of encodings for East Asian languages.
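The effect of the encoding on how character codes are interpreted can be shown with a small sketch. It uses Python's cp1252 and mac_roman codecs as stand-ins for the WinAnsi and MacRoman encodings; this is an assumption of the example, since the codecs are close to the PDF tables but are not guaranteed to agree with them for every code. The function name decode_text is hypothetical.

```python
# Stand-in codecs (an assumption of this sketch, not the exact PDF tables).
STAND_IN_CODECS = {
    "WinAnsiEncoding": "cp1252",
    "MacRomanEncoding": "mac_roman",
}


def decode_text(codes: bytes, encoding_name: str) -> str:
    """Interpret one-byte character codes under the named encoding."""
    codec = STAND_IN_CODECS[encoding_name]
    # Codes with no assigned character become U+FFFD instead of raising.
    return codes.decode(codec, errors="replace")


# The same bytes name different characters under different encodings.
as_winansi = decode_text(b"caf\xe9", "WinAnsiEncoding")
as_macroman = decode_text(b"caf\xe9", "MacRomanEncoding")

# To get the same text under MacRoman, a different code is needed.
macroman_cafe = decode_text(b"caf\x8e", "MacRomanEncoding")
```

The string itself carries no indication of which table applies, so text extraction depends on reading the encoding from the font resource that the string is drawn with.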

(Although the WinAnsi and MacRoman encodings are derived from the historical properties of the Windows and Macintosh operating systems, fonts using these encodings work equally well on any platform.) The encoding mechanisms in PDF were designed for Type 1 fonts, and the rules for applying them to TrueType fonts are complex. For large fonts or fonts with non-standard glyphs, the special encodings Identity-H (for horizontal writing) and Identity-V (for vertical) are used. With such fonts it is necessary to provide a ToUnicode table if semantic information about the characters is to be preserved.

The original imaging model of PDF was, like PostScript's, opaque: each object drawn on the page completely replaced anything previously marked in the same location. In PDF 1.4 the imaging model was extended to allow transparency. When transparency is used, new objects interact with previously marked objects to produce blending effects.

Transparency was added by means of extensions designed to be ignored by software written to earlier versions of the specification. As a result, files that use a small amount of transparency may view acceptably in older viewers, but files making extensive use of transparency can be displayed completely incorrectly in an older viewer without warning. The transparency extensions are based on the key concepts of transparency groups, blending modes, shape, and alpha. The model is closely aligned with the features of Adobe Illustrator version 9.

The blend modes were based on those used by Adobe Photoshop at the time.
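How a new object interacts with what is already on the page can be sketched for a single color component. The example assumes the basic compositing formula as it is commonly stated for the PDF transparency model, Cr = (1 - as/ar) * Cb + (as/ar) * ((1 - ab) * Cs + ab * B(Cb, Cs)) with ar = ab + as - ab * as, and it ignores shape, transparency groups, and color space conversion. All function names are hypothetical, and only two blend modes are shown.

```python
def union(alpha_b: float, alpha_s: float) -> float:
    """Combined coverage of backdrop and source alpha."""
    return alpha_b + alpha_s - alpha_b * alpha_s


def blend_normal(cb: float, cs: float) -> float:
    """Normal mode: the source color simply wins."""
    return cs


def blend_multiply(cb: float, cs: float) -> float:
    """Multiply mode: the result is never lighter than either input."""
    return cb * cs


def composite(cb, alpha_b, cs, alpha_s, blend=blend_normal):
    """Composite one source component over a backdrop component.

    cb, cs are backdrop and source color values in [0, 1]; alpha_b and
    alpha_s are their alphas. Returns (result color, result alpha).
    """
    alpha_r = union(alpha_b, alpha_s)
    if alpha_r == 0:
        return 0.0, 0.0  # nothing is painted, so the color is irrelevant
    ratio = alpha_s / alpha_r
    blended = (1 - alpha_b) * cs + alpha_b * blend(cb, cs)
    return (1 - ratio) * cb + ratio * blended, alpha_r


# An opaque source in Normal mode replaces the backdrop, as in the opaque model.
opaque = composite(0.2, 1.0, 0.8, 1.0)

# A half-transparent source lands midway between backdrop and source.
half = composite(0.2, 1.0, 0.8, 0.5)

# Multiply darkens: 0.5 over 0.5 gives 0.25.
darker = composite(0.5, 1.0, 0.5, 1.0, blend=blend_multiply)
```

With both alphas equal to 1 and the Normal blend mode, the formula reduces to the source color, which is why the opaque imaging model can be seen as a special case of the transparent one.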