
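Analyzing a PEM File by Hand

Let's trace PEM back to its source.

First, what is a PEM file? It is a text file that starts with a -----BEGIN ...----- line, ends with an -----END ...----- line, and carries a long run of Base64 text in between.

Decoding that Base64 yields binary data, and that binary data is the DER encoding of an ASN.1 structure.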
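As a rough illustration of that claim, the armor lines can be stripped and the Base64 body decoded with the standard library alone. This is only a sketch: `pem_to_der` is a made-up helper name, the "TOY" marker and the five payload bytes are invented for the demo, and the code assumes an unencrypted PEM with no header lines between BEGIN and the Base64 body.

```python
import base64


def pem_to_der(pem_text):
    """Strip the BEGIN/END armor lines and Base64-decode what is left."""
    lines = [line.strip() for line in pem_text.strip().splitlines()]
    body = [line for line in lines if line and not line.startswith("-----")]
    return base64.b64decode("".join(body))


# Build a tiny PEM-like text from known bytes, then recover those bytes.
toy_der = b"\x30\x03\x02\x01\x00"
toy_pem = (
    "-----BEGIN TOY-----\n"
    + base64.b64encode(toy_der).decode("ascii")
    + "\n-----END TOY-----"
)
print(pem_to_der(toy_pem) == toy_der)  # True
```

The same helper should work on the text of an unencrypted key file such as the priv.pem generated in this post.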

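The analysis follows.

First, generate a PEM file with Pycrypto (here the pycryptodome package, imported as Crypto), using an RSA key pair as the example.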

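from Crypto.PublicKey import RSA

# Generate a 1024-bit RSA key pair (small, for demonstration only)
rsa = RSA.generate(1024)
sk = rsa.exportKey()              # private key, PEM by default
pk = rsa.publickey().exportKey()  # public key, PEM by default

with open('./pub.pem', 'wb') as f:
    f.write(pk)

with open('./priv.pem', 'wb') as f:
    f.write(sk)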

This produces the private-key PEM file:

-----BEGIN RSA PRIVATE KEY-----
MIICXQIBAAKBgQCg0VTVv5fED3eXtEgZ0Jxgj6S1w45w2DvBMmcTjG7/TBqs7+Pd
tXHhtB2RHHq2E2z5BJMYlWNFDh9CcMq7xCB8VMTae4SiAxHPu6voK5/mC99IoI1X
g50M35Rk2EJivMBrwwgJWmmH9grQfWaaMStafkEzITeI7s8lhjJIuRNJ7wIDAQAB
AoGAD4JwxJaQO/Pj7EkSRQ8V7cgcsfz0sVRhWu4R+9Qo5k1AK1qNZtX3cDWPPm35
NbMk6NU0nIPXyZKlmCJJoxc0rLHbGcTI2CkmdRS8Hve7++JC1DUPZ6ACpW0z5W0a
lK3HHGjwINw5q30AZMERsWTia6BpjclKA839UW/9lm6HeUkCQQDKl+ScBYI3+W6Z
EYzjg/kZEsuhFj3pI2GB/3VO8+8aJg+sjS2a7oZtUai2g2mDsFz4UOeGKJtoWZJb
yGlfxnxHAkEAyzYwqv/8spYH8IM9x/BcFD7pL63+l12kz2cZ5xImvuclYuhjEyii
XXNRUHqNQ8EpWrbqJCtgoosQkjOpg/QhGQJAG0oypUGotNmIqF3Q2KTiXRpHC7/v
PwRhEh3TM3twbdlKqzepOQGAYiFp1IwHHpIXM+vSBCRcKsZGDM8GQrx96QJBAI2f
RKfII+qqWPor3SC8yM9rUMRj9Ky1HKlW51x87/fXy9x0rKeriAys05zM7CquMg4A
sIlomb5uQKxDyP4nY/ECQQDGfKbZiPU6vqghWUMaFGUSqNlCl41Kj4Py1CbxCV47
8bW5uLHMu60qMcZAGIBEekX14HkCaQYawTtfaPF3fX8H
-----END RSA PRIVATE KEY-----

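Now let's explore how it is formed.

Because this PEM was generated with Pycrypto, we look for the relevant source file: run pip show pycryptodome to find the install path, then open …/site-packages/Crypto/PublicKey/RSA.py.

Code tracing:

  1. The private-key export logic lives mainly in the export_key method of the RsaKey class

    The relevant part looks roughly like this: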

    class RsaKey(object):
        ... ...
        def export_key(self, format='PEM', passphrase=None, pkcs=1,
                       protection=None, randfunc=None):
            ... ...
            # DER format is always used, even in case of PEM, which simply
            # encodes it into BASE64.
            if self.has_private():
                binary_key = DerSequence([0,
                                          self.n,
                                          self.e,
                                          self.d,
                                          self.p,
                                          self.q,
                                          self.d % (self.p-1),
                                          self.d % (self.q-1),
                                          Integer(self.q).inverse(self.p)
                                          ]).encode()
                if pkcs == 1:
                    key_type = 'RSA PRIVATE KEY'
                    ... ...
                ... ...
            if format == 'PEM':
                from Crypto.IO import PEM

                pem_str = PEM.encode(binary_key, key_type, passphrase, randfunc)
                return tobytes(pem_str)
            ... ...

    Jumping straight to the end: pem_str is produced by PEM.encode().

  2. Find PEM.encode(): it is the encode function in .../site-packages/Crypto/IO/PEM.py

    def encode(data, marker, passphrase=None, randfunc=None):
        ... ...
        out = "-----BEGIN %s-----\n" % marker
        ... ...

        # Each BASE64 line can take up to 64 characters (=48 bytes of data)
        # b2a_base64 adds a new line character!
        chunks = [tostr(b2a_base64(data[i:i + 48]))
                  for i in range(0, len(data), 48)]
        out += "".join(chunks)
        out += "-----END %s-----" % marker
        return out

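    This function encodes every 48 bytes of data into one line of Base64 (48 bytes become 64 characters) and adds the BEGIN and END lines. What we do not know yet is where data comes from.

  3. Find where data comes from

    In export_key above, PEM.encode() is passed binary_key as its data argument, and binary_key is built by DerSequence() from [0, n, e, d, ...]. That order happens to match the order in which openssl prints the fields when it reads an RSA private key.

    We obviously do not know yet what DerSequence() does.

  4. Find DerSequence, in .../site-packages/Crypto/Util/asn1.py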

    class DerSequence(DerObject):
        def encode(self):
            """
            Return this DER SEQUENCE, fully encoded as a binary string.

            Raises:
              ValueError: if some elements in the sequence are neither integers nor byte strings.
            """
            self.payload = b''
            for item in self._seq:
                if byte_string(item):
                    self.payload += item
                elif _is_number(item):
                    self.payload += DerInteger(item).encode()
                else:
                    self.payload += item.encode()
            return DerObject.encode(self)

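    Byte strings are appended unchanged, numbers are converted through DerInteger(), and the result is returned via a function we have not seen yet, DerObject.encode().

    That leaves two unknowns: DerInteger and DerObject.

  5. Find DerInteger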

    class DerInteger(DerObject):
        ... ...
        def encode(self):
            """Return the DER INTEGER, fully encoded as a
            binary string."""

            number = self.value
            self.payload = b''
            while True:
                self.payload = bchr(int(number & 255)) + self.payload
                if 128 <= number <= 255:
                    self.payload = bchr(0x00) + self.payload
                if -128 <= number <= 255:
                    break
                number >>= 8
            return DerObject.encode(self)

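    This converts the number into its big-endian bytes, prepending 0x00 when the leading byte has its top bit set so that a positive value is not read as negative, and then calls DerObject.encode().

    So now we have to look at what DerObject.encode() is.

  6. Find DerObject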

    class DerObject(object):
        ... ...
        def encode(self):
            """Return this DER element, fully encoded as a binary byte string."""

            # Concatenate identifier octets, length octets,
            # and contents octets

            output_payload = self.payload

            ... ...

            return (bchr(self._tag_octet) +
                    self._definite_form(len(output_payload)) +
                    output_payload)

    It returns tag + length + payload, where tag is the ASN.1 type tag and length is the length of payload. We also see that _definite_form formats the length.

  7. Find _definite_form

    def _definite_form(length):
        """Build length octets according to BER/DER
        definite form.
        """
        if length > 127:
            encoding = long_to_bytes(length)
            return bchr(len(encoding) + 128) + encoding
        return bchr(length)

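    This is how DER encodes lengths:

    • Short form: when the length is < 128, it is emitted directly as a single byte
    • Long form: when the length is ≥ 128, the first byte has its top bit set to 1 and its remaining 7 bits give the number of bytes used to store the length (that is, how many bytes the length value itself occupies); those length bytes follow

    For example, 0x0100 (a length of 256) needs 2 bytes, because a single byte holds at most 255.

    In long form the top bit is 1 and the remaining 7 bits are 0b0000010 (2, meaning the next 2 bytes hold the length), which gives 0x82. The final encoding is 0x820100.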
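The chain traced in steps 1 to 7 can be condensed into a short standard-library sketch. This is not Pycrypto's code: the names `der_length`, `der_integer`, `der_sequence` and `pem_encode` are made up for illustration, the toy values and the "TOY KEY" marker are invented, and the sketch assumes non-negative integers and no passphrase.

```python
import base64


def der_length(length):
    """DER definite-form length: short form below 128, long form otherwise."""
    if length < 0x80:
        return bytes([length])
    body = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def der_integer(value):
    """Tag 0x02 + length + big-endian content (non-negative values only)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    # bit_length() // 8 + 1 bytes keeps the top bit clear, which adds the
    # leading 0x00 exactly when the first content byte would be >= 0x80.
    body = value.to_bytes(value.bit_length() // 8 + 1, "big")
    return b"\x02" + der_length(len(body)) + body


def der_sequence(integers):
    """Tag 0x30 + length + the concatenated encodings of the integers."""
    payload = b"".join(der_integer(i) for i in integers)
    return b"\x30" + der_length(len(payload)) + payload


def pem_encode(der, marker):
    """Base64 in 48-byte chunks (64 characters per line) between armor lines."""
    lines = [base64.b64encode(der[i:i + 48]).decode("ascii")
             for i in range(0, len(der), 48)]
    return "\n".join(["-----BEGIN %s-----" % marker, *lines,
                      "-----END %s-----" % marker])


toy = der_sequence([0, 3233, 17])
print(toy.hex())  # 300a02010002020ca1020111
print(pem_encode(toy, "TOY KEY"))
```

Reading the hex output by hand gives 30 0a for the SEQUENCE header, then 02 01 00, 02 02 0c a1 and 02 01 11 for the three integers.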

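Tracing done; now for the hands-on dissection.

First, among the ASN.1 type tags, 0x30 means SEQUENCE, 0x02 means INTEGER, and so on.

So the tag bytes we will see are values such as 02 and 30.

Lengths follow the encoding from step 7; for example, a length of 0x80 is encoded as 0x8180 and appears as 8180 in the hex dump.

Let's look once more at how this sequence is laid out: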

RSAPrivateKey ::= SEQUENCE {
    version           Version,
    modulus           INTEGER,  -- n
    publicExponent    INTEGER,  -- e
    privateExponent   INTEGER,  -- d
    prime1            INTEGER,  -- p
    prime2            INTEGER,  -- q
    exponent1         INTEGER,  -- d mod (p-1)
    exponent2         INTEGER,  -- d mod (q-1)
    coefficient       INTEGER,  -- (inverse of q) mod p
    otherPrimeInfos   OtherPrimeInfos OPTIONAL
}
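To make the nine integer fields concrete, here is a toy sketch that computes them for a deliberately tiny key. The primes 61 and 53 and the exponent 17 are illustrative values chosen for readability, not taken from the key above, and `pow(x, -1, m)` for modular inverses needs Python 3.8 or later.

```python
# Toy RSA parameters: far too small for real use, chosen for readability.
p, q, e = 61, 53, 17

n = p * q
phi = (p - 1) * (q - 1)
d = pow(e, -1, phi)  # private exponent: inverse of e modulo phi

names = ["version", "modulus", "publicExponent", "privateExponent",
         "prime1", "prime2", "exponent1", "exponent2", "coefficient"]
fields = [0, n, e, d, p, q, d % (p - 1), d % (q - 1), pow(q, -1, p)]

for name, value in zip(names, fields):
    print("%-16s %d" % (name, value))
```

The list is in the same [0, n, e, d, p, q, ...] order that export_key passes to DerSequence.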

Step 1: convert the Base64 into a binary string

import base64

import libnum

with open('./priv.pem', 'r') as f:
    data = f.read()

# Drop the BEGIN/END lines and join the Base64 body
key_64 = ''.join(data.split('\n')[1:-1])
key_num = libnum.s2n(base64.b64decode(key_64))
key_hex = hex(key_num)[2:]
print(key_hex)

'''
3082025d02010002818100a0d154d5bf97c40f7797b44819d09c608fa4b5c38e70d83bc13267138c6eff4c1aacefe3ddb571e1b41d911c7ab6136cf90493189563450e1f4270cabbc4207c54c4da7b84a20311cfbbabe82b9fe60bdf48a08d57839d0cdf9464d84262bcc06bc308095a6987f60ad07d669a312b5a7e4133213788eecf25863248b91349ef02030100010281800f8270c496903bf3e3ec4912450f15edc81cb1fcf4b154615aee11fbd428e64d402b5a8d66d5f770358f3e6df935b324e8d5349c83d7c992a5982249a31734acb1db19c4c8d829267514bc1ef7bbfbe242d4350f67a002a56d33e56d1a94adc71c68f020dc39ab7d0064c111b164e26ba0698dc94a03cdfd516ffd966e877949024100ca97e49c058237f96e99118ce383f91912cba1163de9236181ff754ef3ef1a260fac8d2d9aee866d51a8b6836983b05cf850e786289b6859925bc8695fc67c47024100cb3630aafffcb29607f0833dc7f05c143ee92fadfe975da4cf6719e71226bee72562e8631328a25d7351507a8d43c1295ab6ea242b60a28b109233a983f4211902401b4a32a541a8b4d988a85dd0d8a4e25d1a470bbfef3f0461121dd3337b706dd94aab37a9390180622169d48c071e921733ebd204245c2ac6460ccf0642bc7de90241008d9f44a7c823eaaa58fa2bdd20bcc8cf6b50c463f4acb51ca956e75c7ceff7d7cbdc74aca7ab880cacd39cccec2aae320e00b0896899be6e40ac43c8fe2763f1024100c67ca6d988f53abea82159431a146512a8d942978d4a8f83f2d426f1095e3bf1b5b9b8b1ccbbad2a31c6401880447a45f5e0790269061ac13b5f68f1777d7f07
'''

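Step 2: analysis

First, 30 is the SEQUENCE tag. 82 is 0b10000010: the top bit is 1 (long form) and the low 7 bits are 2, so the next two bytes hold the length of this SEQUENCE, namely 0x025d bytes.

Next, 02 is the INTEGER tag, 01 says this integer occupies 1 byte, and 00 is the value. The remaining fields follow the same pattern.

This gives: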

3082025d # Begin Sequence: (len=0x025d)

0201 # Version: (len=0x01)
00

028181 # n: (len=0x81)
00a0d154d5bf97c40f7797b44819d09c608fa4b5c38e70d83bc13267138c6eff4c1aacefe3ddb571e1b41d911c7ab6136cf90493189563450e1f4270cabbc4207c54c4da7b84a20311cfbbabe82b9fe60bdf48a08d57839d0cdf9464d84262bcc06bc308095a6987f60ad07d669a312b5a7e4133213788eecf25863248b91349ef

0203 # e: (len=0x03)
010001

028180 # d: (len=0x80)
0f8270c496903bf3e3ec4912450f15edc81cb1fcf4b154615aee11fbd428e64d402b5a8d66d5f770358f3e6df935b324e8d5349c83d7c992a5982249a31734acb1db19c4c8d829267514bc1ef7bbfbe242d4350f67a002a56d33e56d1a94adc71c68f020dc39ab7d0064c111b164e26ba0698dc94a03cdfd516ffd966e877949

0241 # p: (len=0x41)
00ca97e49c058237f96e99118ce383f91912cba1163de9236181ff754ef3ef1a260fac8d2d9aee866d51a8b6836983b05cf850e786289b6859925bc8695fc67c47

0241 # q: (len=0x41)
00cb3630aafffcb29607f0833dc7f05c143ee92fadfe975da4cf6719e71226bee72562e8631328a25d7351507a8d43c1295ab6ea242b60a28b109233a983f42119

0240 # d mod (p-1): (len=0x40)
1b4a32a541a8b4d988a85dd0d8a4e25d1a470bbfef3f0461121dd3337b706dd94aab37a9390180622169d48c071e921733ebd204245c2ac6460ccf0642bc7de9

0241 # d mod (q-1): (len=0x41)
008d9f44a7c823eaaa58fa2bdd20bcc8cf6b50c463f4acb51ca956e75c7ceff7d7cbdc74aca7ab880cacd39cccec2aae320e00b0896899be6e40ac43c8fe2763f1

0241 # (inverse of q) mod p: (len=0x41)
00c67ca6d988f53abea82159431a146512a8d942978d4a8f83f2d426f1095e3bf1b5b9b8b1ccbbad2a31c6401880447a45f5e0790269061ac13b5f68f1777d7f07

# End Sequence
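The manual walk above can also be automated with a small tag-length-value reader. This is a minimal sketch, not a full DER parser: `read_tlv` and `parse_integer_sequence` are made-up names, only a SEQUENCE of non-negative INTEGERs is handled, and the demo input is a hand-built toy rather than the key above.

```python
def read_tlv(buf, pos):
    """Return (tag, value_start, value_end) for the element starting at pos."""
    tag = buf[pos]
    first = buf[pos + 1]
    pos += 2
    if first < 0x80:
        length = first  # short form: the byte is the length
    else:
        count = first & 0x7F  # long form: low 7 bits count the length bytes
        length = int.from_bytes(buf[pos:pos + count], "big")
        pos += count
    return tag, pos, pos + length


def parse_integer_sequence(der):
    """Decode a DER SEQUENCE whose members are all non-negative INTEGERs."""
    tag, start, end = read_tlv(der, 0)
    if tag != 0x30:
        raise ValueError("expected a SEQUENCE (tag 0x30)")
    values = []
    pos = start
    while pos < end:
        tag, value_start, value_end = read_tlv(der, pos)
        if tag != 0x02:
            raise ValueError("expected an INTEGER (tag 0x02)")
        values.append(int.from_bytes(der[value_start:value_end], "big"))
        pos = value_end
    return values


toy = bytes.fromhex("300a02010002020ca1020111")
print(parse_integer_sequence(toy))  # [0, 3233, 17]
```

Given the bytes decoded from priv.pem, the same function should return the nine integers in the order version, n, e, d, p, q, d mod (p-1), d mod (q-1), (inverse of q) mod p.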

This post draws on https://tover.xyz/p/pem-by-hand/, from which I learned a great deal.
