The bits and bytes of PKI

In two previous articles, "An introduction to cryptography and public key infrastructure" and "How do private keys work in PKI and cryptography?", I discussed cryptography and public key infrastructure (PKI) in a general way. I talked about how digital bundles called certificates store public keys and identifying information. These bundles contain a lot of complexity, and it's useful to have a basic understanding of the format for when you need to look under the hood.

Abstract art

Keys, certificate signing requests, certificates, and other PKI artifacts define themselves in a data description language called Abstract Syntax Notation One (ASN.1). ASN.1 defines a series of simple data types (integers, strings, dates, etc.) along with some structured types (sequences, sets). By using these types as building blocks, we can create surprisingly complex data formats.

ASN.1 contains plenty of pitfalls for the unwary, however. For example, it has two different ways of representing dates: GeneralizedTime (ISO 8601 format) and UTCTime (which uses a two-digit year). Strings introduce even more confusion. We have IA5String for ASCII strings and UTF8String for Unicode strings. ASN.1 also defines several other string types, from the exotic T61String and TeletexString to the more innocuous sounding (but probably not what you wanted) PrintableString (only a small subset of ASCII) and UniversalString (encoded in UTF-32). If you're writing or reading ASN.1 data, I recommend referencing the specification.

ASN.1 has another data type worth special mention: the object identifier (OID). OIDs are a series of integers. Commonly they are shown with periods delimiting them. Each integer represents a node in what is basically a "tree of things." For example, 1.3.6.1.4.1.2312 is the OID for my employer, Red Hat, where "1" is the node for the International Organization for Standardization (ISO), "3" is for ISO-identified organizations, "6" is for the US Department of Defense (which, for historical reasons, is the parent to the next node), "1" is for the internet, "4" is for private organizations, "1" is for enterprises, and finally "2312," which is Red Hat's own.

More commonly, OIDs are used to identify specific algorithms in PKI objects. If you have a digital signature, it's not much use if you don't know what type of signature it is. The signature algorithm "sha256WithRSAEncryption" has the OID "1.2.840.113549.1.1.11," for example.
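To make the "series of integers" idea concrete, here is a minimal Python sketch of how an OID can be packed into bytes under the DER encoding discussed later in this article. The function names encode_base128 and encode_oid are made up for this illustration, and the sketch assumes the encoded content is shorter than 128 octets so that a single length octet is enough.

```python
def encode_base128(value: int) -> bytes:
    """Encode a non-negative integer in base 128, high bit set on all but the last byte."""
    chunks = [value & 0x7F]
    value >>= 7
    while value:
        chunks.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(chunks))


def encode_oid(dotted: str) -> bytes:
    """Return the DER encoding (tag, length, content) of a dotted OID string."""
    arcs = [int(part) for part in dotted.split(".")]
    if len(arcs) < 2:
        raise ValueError("an OID needs at least two arcs")
    # The first two arcs are folded into a single value: 40 * first + second.
    content = encode_base128(40 * arcs[0] + arcs[1])
    for arc in arcs[2:]:
        content += encode_base128(arc)
    if len(content) >= 128:
        raise ValueError("this sketch only handles short-form lengths")
    return bytes([0x06, len(content)]) + content  # 0x06 is the OBJECT IDENTIFIER tag


print(encode_oid("1.2.840.113549.1.1.11").hex())  # 06092a864886f70d01010b
print(encode_oid("1.3.6.1.4.1.2312").hex())       # 06072b060104019208
```

Note how small arcs take one byte each, while larger ones such as 113549 spill into several.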

ASN.1 at work

Suppose we own a factory that produces flying brooms, and we need to store some data about every broom. Our brooms have a model name, a serial number, and a series of inspections that have been made to ensure flight-worthiness. We could store this information using ASN.1 like so:

BroomInfo ::= SEQUENCE {
    model UTF8String,
    serialNumber INTEGER,
    inspections SEQUENCE OF InspectionInfo
}

InspectionInfo ::= SEQUENCE {
    inspectorName UTF8String,
    inspectionDate GeneralizedTime
}

The example above defines the model name as a UTF8-encoded string, the serial number as an integer, and our inspections as a series of InspectionInfo items. Then we see that each InspectionInfo item comprises two pieces of data: the inspector's name and the time of the inspection.

An actual instance of BroomInfo data would look something like this in ASN.1's value assignment syntax:

broom BroomInfo ::= {
    model "Nimbus 2000",
    serialNumber 1066,
    inspections {
        {
            inspectorName "Harry",
            inspectionDate "201901011200Z"
        },
        {
            inspectorName "Hagrid",
            inspectionDate "201902011200Z"
        }
    }
}

Don't worry too much about the details of the syntax; for the average developer, having a basic grasp of how the pieces fit together is sufficient.
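If it helps to see the same shape in a general-purpose language, the broom definitions can be mirrored roughly as Python dataclasses. This is only an illustrative sketch: the snake_case field names and the parse_generalized_time helper are my own, and the helper handles only the minute-precision UTC form used in the example above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class InspectionInfo:
    inspector_name: str        # UTF8String
    inspection_date: datetime  # GeneralizedTime


@dataclass
class BroomInfo:
    model: str                 # UTF8String
    serial_number: int         # INTEGER
    # SEQUENCE OF InspectionInfo maps naturally onto a list.
    inspections: List[InspectionInfo] = field(default_factory=list)


def parse_generalized_time(text: str) -> datetime:
    """Parse a value such as 201901011200Z (YYYYMMDDHHMM, UTC)."""
    return datetime.strptime(text, "%Y%m%d%H%MZ").replace(tzinfo=timezone.utc)


broom = BroomInfo(
    model="Nimbus 2000",
    serial_number=1066,
    inspections=[
        InspectionInfo("Harry", parse_generalized_time("201901011200Z")),
        InspectionInfo("Hagrid", parse_generalized_time("201902011200Z")),
    ],
)
print(broom.inspections[0].inspection_date.isoformat())
```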

Now let's look at a real example from RFC 8017 that I have abbreviated somewhat for clarity:

RSAPrivateKey ::= SEQUENCE {
    version           Version,
    modulus           INTEGER,  -- n
    publicExponent    INTEGER,  -- e
    privateExponent   INTEGER,  -- d
    prime1            INTEGER,  -- p
    prime2            INTEGER,  -- q
    exponent1         INTEGER,  -- d mod (p-1)
    exponent2         INTEGER,  -- d mod (q-1)
    coefficient       INTEGER,  -- (inverse of q) mod p
    otherPrimeInfos   OtherPrimeInfos OPTIONAL
}

Version ::= INTEGER { two-prime(0), multi(1) }
    (CONSTRAINED BY
    {-- version must be multi if otherPrimeInfos present --})

OtherPrimeInfos ::= SEQUENCE SIZE(1..MAX) OF OtherPrimeInfo

OtherPrimeInfo ::= SEQUENCE {
    prime             INTEGER,  -- ri
    exponent          INTEGER,  -- di
    coefficient       INTEGER   -- ti
}

The ASN.1 above defines the PKCS #1 format used to store RSA keys. Looking at this, we can see the RSAPrivateKey sequence starts with a version type (either 0 or 1) followed by a bunch of integers and then an optional type called OtherPrimeInfos. The OtherPrimeInfos sequence contains one or more pieces of OtherPrimeInfo. And each OtherPrimeInfo is just a sequence of integers.

Let's look at an actual instance by asking OpenSSL to generate an RSA key and then pipe it into asn1parse, which will print it out in a more human-friendly format. (By the way, the genrsa command I'm using here has been superseded by genpkey; we'll see why a little later.)

% openssl genrsa 4096 2> /dev/null | openssl asn1parse
    0:d=0  hl=4 l=2344 cons: SEQUENCE
    4:d=1  hl=2 l=   1 prim: INTEGER           :00
    7:d=1  hl=4 l= 513 prim: INTEGER           :B80B0C2443…
  524:d=1  hl=2 l=   3 prim: INTEGER           :010001
  529:d=1  hl=4 l= 512 prim: INTEGER           :59C609C626…
 1045:d=1  hl=4 l= 257 prim: INTEGER           :E8FC43002D…
 1306:d=1  hl=4 l= 257 prim: INTEGER           :CA39222DD2…
 1567:d=1  hl=4 l= 256 prim: INTEGER           :25F6CD181F…
 1827:d=1  hl=4 l= 256 prim: INTEGER           :38CCE374CB…
 2087:d=1  hl=4 l= 257 prim: INTEGER           :C80430E810…

Recall that RSA uses a modulus, n; a public exponent, e; and a private exponent, d. Now let's look at the sequence. First, we see the version set to 0 for a two-prime RSA key (what genrsa generates), an integer for the modulus, n, and then 0x010001 for the public exponent, e. If we convert to decimal, we'll see our public exponent is 65537, a number commonly used as an RSA public exponent. Following the public exponent, we see the integer for the private exponent, d, and then some other integers that are used to speed up decryption and signing. Explaining how this optimization works is beyond the scope of this article, but if you like math, there's a good video on the subject.
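For readers who want to see how the integers in RSAPrivateKey relate to one another, here is a toy Python sketch using deliberately tiny primes (61 and 53), which are hopelessly insecure and chosen only so the numbers stay readable. The function names are hypothetical, and the Chinese Remainder Theorem shortcut in decrypt_with_crt is the textbook version, not a description of what OpenSSL does internally.

```python
def toy_rsa_private_key(p: int, q: int, e: int) -> dict:
    """Compute the RSAPrivateKey fields for a two-prime key (version 0)."""
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse; requires Python 3.8 or newer
    return {
        "version": 0,
        "modulus": p * q,              # n
        "publicExponent": e,           # e
        "privateExponent": d,          # d
        "prime1": p,                   # p
        "prime2": q,                   # q
        "exponent1": d % (p - 1),      # d mod (p-1)
        "exponent2": d % (q - 1),      # d mod (q-1)
        "coefficient": pow(q, -1, p),  # (inverse of q) mod p
    }


def decrypt_with_crt(key: dict, ciphertext: int) -> int:
    """Decrypt using the extra integers instead of one big exponentiation mod n."""
    m1 = pow(ciphertext, key["exponent1"], key["prime1"])
    m2 = pow(ciphertext, key["exponent2"], key["prime2"])
    h = (key["coefficient"] * (m1 - m2)) % key["prime1"]
    return m2 + h * key["prime2"]


key = toy_rsa_private_key(61, 53, 17)
ciphertext = pow(42, key["publicExponent"], key["modulus"])
print(decrypt_with_crt(key, ciphertext))  # 42
print(int("010001", 16))                  # 65537
```

The two small exponentiations modulo p and q do the same job as a single exponentiation with d modulo n, which is why the extra fields are worth storing.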

What about that other stuff on the left side of the output? What do "hl=4" and "l=513" mean? We'll cover that shortly.


We've seen the "abstract" part of Abstract Syntax Notation One, but how does this data get encoded and stored? For that, we turn to a binary format called Distinguished Encoding Rules (DER) defined in the X.690 specification. DER is a stricter version of its parent, Basic Encoding Rules (BER), in that for any given data, there is only one way to encode it. If we're going to be digitally signing data, it makes things a lot easier if there is only one possible encoding that needs to be signed instead of dozens of functionally equivalent representations.

DER uses a tag-length-value (TLV) structure. The encoding of a piece of data begins with an identifier octet defining the data's type. ("Octet" is used rather than "byte" since the standard is very old and some early architectures didn't use 8 bits for a byte.) Next are the octets that encode the length of the data, and finally, there is the data. The data can be another TLV sequence. The left side of the asn1parse output makes a little more sense now. The first number indicates the absolute offset from the beginning. The "d=" tells us the depth of that item in the structure. The first line is a sequence, which we descend into on the next line (the depth d goes from 0 to 1) whereupon asn1parse begins enumerating all the elements in that sequence. The "hl=" is the header length (the sum of the identifier and length octets), and the "l=" tells us the length of that particular piece of data.

How is header length determined? It's the sum of the identifier octet and the octets encoding the length. In our example, the top sequence is 2344 octets long. If it were less than 128 octets, the length would be encoded in a single octet in the "short form": bit 8 would be a zero and bits 7 to 1 would hold the length value (2^7 - 1 = 127). A value of 2344 needs more space, so the "long" form is used. The first octet has bit 8 set to one, and bits 7 to 1 contain the length of the length. In our case, a value of 2344 can be encoded in two octets (0x0928). Combined with the first "length of the length" octet, we have three octets total. Add the one identifier octet, and that gives us our total header length of four.
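The short-form and long-form rules are easy to check with a few lines of Python. The sketch below uses hypothetical helper names (encode_der_length and header_length) and assumes a single identifier octet, which holds for every item in the asn1parse outputs in this article.

```python
def encode_der_length(length: int) -> bytes:
    """Encode a content length using DER's definite short or long form."""
    if length < 0:
        raise ValueError("length must be non-negative")
    if length < 128:
        # Short form: bit 8 is zero, bits 7 to 1 hold the length itself.
        return bytes([length])
    # Long form: bit 8 is one, bits 7 to 1 hold the number of length octets.
    size = (length.bit_length() + 7) // 8
    return bytes([0x80 | size]) + length.to_bytes(size, "big")


def header_length(length: int) -> int:
    """One identifier octet plus however many octets the length needs."""
    return 1 + len(encode_der_length(length))


print(encode_der_length(2344).hex())  # 820928
print(header_length(2344))            # 4
print(header_length(129))             # 3
print(header_length(1))               # 2
```

These results line up with the "hl=" column printed by asn1parse for items of the same length.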

As a side exercise, let's consider the largest value we could possibly encode. The first length octet can announce up to 126 further octets to encode a length (the value 127 is reserved). At 8 bits per octet, we have a total of 1008 bits to use, so we can hold a number equal to 2^1008 - 1. That would equate to a content length of 2.743062 × 10^279 yottabytes, staggeringly more than the estimated 10^80 atoms in the observable universe. If you're interested in all the details, I recommend reading "A Layman's Guide to a Subset of ASN.1, BER, and DER."
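Python's arbitrary-precision integers make this arithmetic easy to reproduce. The short sketch below assumes 126 usable length octets (which is what yields the 1008 bits mentioned above) and treats a yottabyte as 10^24 bytes; the variable names are mine.

```python
max_length_octets = 126              # assumption: gives the 1008 bits in the text
bits = max_length_octets * 8
largest_length = 2 ** bits - 1       # largest content length, in octets
yottabytes = largest_length // 10 ** 24

digits = str(yottabytes)
# Show the leading digits in scientific notation without using floats.
print(bits)
print(f"{digits[0]}.{digits[1:7]} x 10^{len(digits) - 1} yottabytes")
```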

What about "cons" and "prim"? These indicate whether the value is encoded with "constructed" or "primitive" encoding. Primitive encoding is used for simple types like "INTEGER" or "BOOLEAN," while constructed encoding is used for structured types like "SEQUENCE" or "SET." The actual difference between the two encoding methods is whether bit 6 in the identifier octet is a zero or one. If it's a one, the parser knows that the content octets are also DER-encoded and it can descend.
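Putting the pieces together, a stripped-down imitation of asn1parse fits in a few dozen lines of Python. This is a sketch under stated assumptions, not a real parser: read_header and walk are names I made up, and the code only handles tag numbers below 31 and definite lengths, with no error checking. The sample bytes are a hand-built SEQUENCE holding the integers 0 and 65537.

```python
def read_header(data: bytes, offset: int):
    """Return (constructed, tag_number, header_length, content_length)."""
    identifier = data[offset]
    constructed = bool(identifier & 0x20)  # bit 6 set means constructed
    tag_number = identifier & 0x1F         # bits 5 to 1 (low tag numbers only)
    first = data[offset + 1]
    if first < 0x80:                       # short form
        return constructed, tag_number, 2, first
    count = first & 0x7F                   # long form: number of length octets
    length = int.from_bytes(data[offset + 2:offset + 2 + count], "big")
    return constructed, tag_number, 2 + count, length


def walk(data: bytes, offset: int = 0, end: int = None, depth: int = 0) -> list:
    """Collect (offset, depth, hl, l, kind, tag) rows, descending into constructed items."""
    end = len(data) if end is None else end
    rows = []
    while offset < end:
        constructed, tag, hl, length = read_header(data, offset)
        rows.append((offset, depth, hl, length, "cons" if constructed else "prim", tag))
        if constructed:
            rows += walk(data, offset + hl, offset + hl + length, depth + 1)
        offset += hl + length
    return rows


sample = bytes.fromhex("30080201000203010001")
for off, depth, hl, length, kind, tag in walk(sample):
    print(f"{off:5d}:d={depth} hl={hl} l={length:4d} {kind}: tag {tag}")
```

Tag 16 is SEQUENCE and tag 2 is INTEGER, so the three rows correspond to one constructed item containing two primitive ones.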

PEM friends

While useful in a lot of cases, a binary format won't pass muster if we need to display the data as text. Before the MIME standard existed, attachment support was spotty. Commonly, if you wanted to attach data, you put it in the body of the email, and since SMTP only supported ASCII, that meant converting your binary data (like the DER of your public key, for example) into ASCII characters.

Thus, the PEM format emerged. PEM stands for "Privacy-Enhanced Mail" and was an early standard for transmitting and storing PKI data. The standard never caught on, but the format it defined for storage did. PEM-encoded objects are just DER objects that are base64-encoded and wrapped at 64 characters per line. To describe the type of object, a header and footer surround the base64 string. You'll see -----BEGIN CERTIFICATE----- or -----BEGIN PRIVATE KEY-----, for example.
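Because PEM is just base64 plus a header and footer, a round trip can be sketched with Python's standard library. The helper names pem_encode and pem_decode are mine, and the decoder is deliberately naive: it assumes a single object and no extra header lines inside the block.

```python
import base64


def pem_encode(der: bytes, label: str) -> str:
    """Wrap DER bytes in a PEM block with the given label, 64 characters per line."""
    b64 = base64.b64encode(der).decode("ascii")
    lines = [b64[i:i + 64] for i in range(0, len(b64), 64)]
    return "\n".join([f"-----BEGIN {label}-----", *lines, f"-----END {label}-----"]) + "\n"


def pem_decode(pem: str) -> bytes:
    """Strip the BEGIN/END lines and decode the base64 body back to DER bytes."""
    body = [line for line in pem.strip().splitlines() if not line.startswith("-----")]
    return base64.b64decode("".join(body))


der = bytes.fromhex("30080201000203010001")  # a tiny hand-built DER SEQUENCE
pem = pem_encode(der, "CERTIFICATE")         # label chosen only for illustration
print(pem)
```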

Often you'll see files with the ".pem" extension. I don't find this suffix useful. The file could contain a certificate, a key, a certificate signing request, or several other possibilities. Imagine going to a sushi restaurant and seeing a menu that described every item as "fish and rice"! Instead, I prefer more informative extensions like ".crt", ".key", and ".csr".

The PKCS zoo

Earlier, I showed an example of a PKCS #1-formatted RSA key. As you might expect, formats for storing certificates and signing requests also exist in various IETF RFCs. For example, PKCS #8 can be used to store private keys for many different algorithms (including RSA!). Here's some of the ASN.1 from RFC 5208 for PKCS #8. (RFC 5208 has been obsoleted by RFC 5958, but I feel that the ASN.1 in RFC 5208 is easier to understand.)

PrivateKeyInfo ::= SEQUENCE {
    version                   Version,
    privateKeyAlgorithm       PrivateKeyAlgorithmIdentifier,
    privateKey                PrivateKey,
    attributes           [0]  IMPLICIT Attributes OPTIONAL }

Version ::= INTEGER

PrivateKeyAlgorithmIdentifier ::= AlgorithmIdentifier

PrivateKey ::= OCTET STRING

Attributes ::= SET OF Attribute

If you store your RSA private key in a PKCS #8, the PrivateKey element will actually be a DER-encoded PKCS #1! Let's prove it. Remember earlier when I used genrsa to generate a PKCS #1? OpenSSL can generate a PKCS #8 with the genpkey command, and you can specify RSA as the algorithm to use.

% openssl genpkey -algorithm RSA | openssl asn1parse
    0:d=0  hl=4 l= 629 cons: SEQUENCE
    4:d=1  hl=2 l=   1 prim:  INTEGER           :00
    7:d=1  hl=2 l=  13 cons:  SEQUENCE
    9:d=2  hl=2 l=   9 prim:   OBJECT            :rsaEncryption
   20:d=2  hl=2 l=   0 prim:   NULL
   22:d=1  hl=4 l= 607 prim:  OCTET STRING      [HEX DUMP]:3082025B…

You may have spotted the "OBJECT" in the output and guessed that it was related to OIDs. You'd be correct. The OID "1.2.840.113549.1.1.1" is assigned to RSA encryption. OpenSSL has a built-in list of common OIDs and translates them into a human-readable form for you.

% openssl genpkey -algorithm RSA | openssl asn1parse -strparse 22
    0:d=0  hl=4 l= 604 cons: SEQUENCE
    4:d=1  hl=2 l=   1 prim:  INTEGER           :00
    7:d=1  hl=3 l= 129 prim:  INTEGER           :CA6720E706…
  139:d=1  hl=2 l=   3 prim:  INTEGER           :010001
  144:d=1  hl=3 l= 128 prim:  INTEGER           :05D0BEBE44…
  275:d=1  hl=2 l=  65 prim:  INTEGER           :F215DC6B77…
  342:d=1  hl=2 l=  65 prim:  INTEGER           :D6095CED7E…
  409:d=1  hl=2 l=  64 prim:  INTEGER           :402C7562F3…
  475:d=1  hl=2 l=  64 prim:  INTEGER           :06D0097B2D…
  541:d=1  hl=2 l=  65 prim:  INTEGER           :AB266E8E51…

In the second command, I've told asn1parse via the -strparse argument to move to octet 22 and begin parsing the content's octets there as an ASN.1 object. We can clearly see that the PKCS #8's PrivateKey looks just like the PKCS #1 that we examined earlier.

You should favor using the genpkey command. PKCS #8 has some features that PKCS #1 does not: PKCS #8 can store private keys for multiple different algorithms (PKCS #1 is RSA-specific), and it provides a mechanism to encrypt the private key using a passphrase and a symmetric cipher.

Encrypted PKCS #8 objects use a different ASN.1 syntax that I'm not going to dive into, but let's take a look at an actual example and see if anything stands out. Encrypting a private key with genpkey requires that you specify the symmetric encryption algorithm to use. I'll use AES-256-CBC for this example and a password of "hello" (the "pass:" prefix is the way of telling OpenSSL that the password is coming in from the command line).

% openssl genpkey -algorithm RSA -aes-256-cbc -pass pass:hello | openssl asn1parse
    0:d=0  hl=4 l= 733 cons: SEQUENCE
    4:d=1  hl=2 l=  87 cons:  SEQUENCE
    6:d=2  hl=2 l=   9 prim:   OBJECT            :PBES2
   17:d=2  hl=2 l=  74 cons:   SEQUENCE
   19:d=3  hl=2 l=  41 cons:    SEQUENCE
   21:d=4  hl=2 l=   9 prim:     OBJECT            :PBKDF2
   32:d=4  hl=2 l=  28 cons:     SEQUENCE
   34:d=5  hl=2 l=   8 prim:      OCTET STRING      [HEX DUMP]:17E6FE554E85810A
   44:d=5  hl=2 l=   2 prim:      INTEGER           :0800
   48:d=5  hl=2 l=  12 cons:      SEQUENCE
   50:d=6  hl=2 l=   8 prim:       OBJECT            :hmacWithSHA256
   60:d=6  hl=2 l=   0 prim:       NULL
   62:d=3  hl=2 l=  29 cons:    SEQUENCE
   64:d=4  hl=2 l=   9 prim:     OBJECT            :aes-256-cbc
   75:d=4  hl=2 l=  16 prim:     OCTET STRING      [HEX DUMP]:91E9536C39…
   93:d=1  hl=4 l= 640 prim:  OCTET STRING      [HEX DUMP]:98007B264F…

% openssl genpkey -algorithm RSA -aes-256-cbc -pass pass:hello | head -n 1
-----BEGIN ENCRYPTED PRIVATE KEY-----

There are a couple of interesting items here. We see our encryption algorithm is recorded with an OID starting at octet 64. There's an OID for "PBES2" (Password-Based Encryption Scheme 2), which defines a standard process for encryption and decryption, and an OID for "PBKDF2" (Password-Based Key Derivation Function 2), which defines a standard process for creating encryption keys from passwords. Helpfully, OpenSSL uses the header "ENCRYPTED PRIVATE KEY" in the PEM output.
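The PBKDF2 parameters visible in the asn1parse output are enough to rerun the key derivation with Python's hashlib. In this sketch I am reading the 8-octet OCTET STRING at offset 34 as the salt and the INTEGER 0x0800 as the iteration count, and assuming a 32-byte derived key because the cipher is AES-256; it stops short of actually decrypting anything.

```python
import hashlib

password = b"hello"                        # the passphrase given via -pass pass:hello
salt = bytes.fromhex("17E6FE554E85810A")   # OCTET STRING at offset 34 (read as the salt)
iterations = 0x0800                        # INTEGER at offset 44, i.e. 2048
key_length = 32                            # assumption: AES-256 needs a 256-bit key

# hmacWithSHA256 in the output corresponds to PBKDF2 with HMAC-SHA-256.
aes_key = hashlib.pbkdf2_hmac("sha256", password, salt, iterations, dklen=key_length)
print(iterations, len(aes_key))
```

The 16-octet OCTET STRING stored next to the aes-256-cbc OID has the right size to be the CBC initialization vector, which a real decryption routine would need alongside the derived key.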

OpenSSL will let you encrypt a PKCS #1, but it's done in a non-standard way via a series of headers inserted into the PEM:

% openssl genrsa -aes256 -passout pass:hello 4096
Proc-Type: 4,ENCRYPTED
DEK-Info: AES-256-CBC,5B2C64DC05B7C0471A278C76562FD776

In conclusion

There's a final PKCS format you need to know about: PKCS #12. The PKCS #12 format allows for storing multiple objects all in one file. If you have a certificate and its corresponding key or a chain of certificates, you can store them together in one PKCS #12 file. Individual entries in the file can be protected with password-based encryption.

Beyond the PKCS formats, there are other storage methods such as the Java-specific JKS format and the NSS library from Mozilla, which uses file-based databases (SQLite or Berkeley DB, depending on the version). Luckily, the PKCS formats are a lingua franca that can serve as a start or reference if you need to deal with other formats.

If this all seems confusing, that's because it is. Unfortunately, the PKI ecosystem has a lot of sharp edges between tools that generate enigmatic error messages (looking at you, OpenSSL) and standards that have grown and evolved over the past 35 years. Having a basic understanding of how PKI objects are stored is critical if you're doing any application development that will be accessed over SSL/TLS.

I hope this article has shed a little light on the subject and might save you from spending fruitless hours in the PKI wilderness.

The author would like to thank Hubert Kario for providing a technical review.

