
AWS - boto3 - S3


S3

Action


Create random file

# If another user has already claimed the bucket name, create_bucket raises botocore.errorfactory.BucketAlreadyExists.
# Picking a random name increases your chance of success when creating the bucket.
import uuid

def create_bucket_name(bucket_prefix):
    # The generated bucket name must be between 3 and 63 chars long
    return ''.join([bucket_prefix, str(uuid.uuid4())])


# Create random file
def create_temp_file(size, file_name, file_content):
    random_file_name = ''.join([str(uuid.uuid4().hex[:6]), file_name])
    with open(random_file_name, 'w') as f:
        f.write(str(file_content) * size)
    return random_file_name


# Create file
first_file = create_temp_file(300, 'firstfile.txt', 'helloworld')

Understanding Sub-resources

Bucket and Object are sub-resources of one another.

  • Sub-resources are methods that create a new instance of a child resource.
  • The parent’s identifiers get passed to the child resource.
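
For example, you can hop from a Bucket to one of its Objects and back again. Below is a minimal sketch (assuming an `s3_resource` handle created with `boto3.resource('s3')` and placeholder bucket/key names); none of these calls contact AWS, they only build local references.

import boto3

s3_resource = boto3.resource('s3')

bucket = s3_resource.Bucket('some-bucket-name')   # hypothetical bucket name
obj = bucket.Object('some-key.txt')               # Bucket -> Object sub-resource
print(obj.bucket_name, obj.key)                   # the parent's identifier was passed down

parent = obj.Bucket()                             # Object -> Bucket sub-resource
print(parent.name)                                # 'some-bucket-name'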

Creating a Bucket s3_resource.create_bucket()

import boto3

# -------------- hardcoded region --------------
# Unless your region is us-east-1 (the default), you need to define the region explicitly
# when creating the bucket; otherwise boto3 raises IllegalLocationConstraintException.
# Here is what that looks like when you create an S3 bucket in a non-US region:
s3_resource = boto3.resource('s3')
s3_resource.create_bucket(
    Bucket = YOUR_BUCKET_NAME,
    CreateBucketConfiguration = {
        'LocationConstraint': 'eu-west-1'
    }
)


# -------------- region from the session --------------
# Instead of hardcoding the region, take advantage of a `session` object:
# - Boto3 creates the `session` from your credentials.
# - You just take the region from the session and pass it to `create_bucket()` as the
#   `LocationConstraint` configuration. Here's how to do that:


# `create_bucket()` below accepts either a client or a resource as `s3_connection`:
# s3_connection = boto3.client('s3')
# s3_connection = boto3.resource('s3')


def create_bucket(bucket_prefix, s3_connection):
    session = boto3.session.Session()
    current_region = session.region_name
    bucket_name = create_bucket_name(bucket_prefix)

    bucket_response = s3_connection.create_bucket(
        Bucket=bucket_name,
        CreateBucketConfiguration={
            'LocationConstraint': current_region
        }
    )

    print(bucket_name, current_region)
    return bucket_name, bucket_response

This works no matter where you deploy the code (locally, on EC2, or on Lambda): there is no need to hardcode your region.

Both the client and the resource create buckets in the same way:

# Create a bucket using the client.
# It gives back bucket_response as a dictionary:


first_bucket_name, first_response = create_bucket(
    bucket_prefix='firstpythonbucket',
    s3_connection=s3_resource.meta.client
)

first_response
# {'ResponseMetadata': {
#     'RequestId': 'E1DCFE71EDE7C1EC',
#     'HostId': 'r3AP32NQk9dvbHSEPIbyYADT769VQEN/+xT2BPM6HCnuCb3Z/GhR2SBP+GM7IjcxbBN7SQ+k+9B=',
#     'HTTPStatusCode': 200,
#     'HTTPHeaders': {
#         'x-amz-id-2': 'r3AP32NQk9dvbHSEPIbyYADT769VQEN/+xT2BPM6HCnuCb3Z/GhR2SBP+GM7IjcxbBN7SQ+k+9B=',
#         'x-amz-request-id': 'E1DCFE71EDE7C1EC',
#         'date': 'Fri, 05 Oct 2018 15:00:00 GMT',
#         'location': 'https://firstpythonbucket7250e773-c4b1-422a-b51f-c45a52af9304.s3.amazonaws.com/',
#         'content-length': '0',
#         'server': 'AmazonS3'
#     },
#     'RetryAttempts': 0},
#     'Location': 'https://firstpythonbucket7250e773-c4b1-422a-b51f-c45a52af9304.s3.amazonaws.com/'
# }


# Create a bucket using the resource.
# It gives you back a Bucket instance as the `bucket_response`:

second_bucket_name, second_response = create_bucket(
    bucket_prefix='secondpythonbucket',
    s3_connection=s3_resource
)

second_response
# s3.Bucket(name='secondpythonbucket2d5d99c5-ab96-4c30-b7f7-443a95f72644')

Deleting Buckets and Objects

Deleting a Non-empty Bucket

  • To delete a bucket, you must first delete every single object within it, or else the BucketNotEmpty exception is raised.
  • When the bucket is versioned, you need to delete every object and all of its versions.

def delete_all_objects(bucket_name):
    # Collect every object version in the bucket, then delete them all in a single call.
    res = []
    s3bucket = s3_resource.Bucket(bucket_name)
    for obj_version in s3bucket.object_versions.all():
        res.append({'Key': obj_version.object_key, 'VersionId': obj_version.id})
    print(res)
    s3bucket.delete_objects(Delete={'Objects': res})


delete_all_objects(first_bucket_name)
# [
# {'Key': '127367firstfile.txt', 'VersionId': 'eQgH6IC1VGcn7eXZ_.ayqm6NdjjhOADv'},
# {'Key': '127367firstfile.txt', 'VersionId': 'UnQTaps14o3c1xdzh09Cyqg_hq4SjB53'},
# {'Key': '127367firstfile.txt', 'VersionId': 'null'},
# {'Key': '616abesecondfile.txt', 'VersionId': 'WIaExRLmoksJzLhN7jU5YzoJxYSu6Ey6'},
# {'Key': '616abesecondfile.txt', 'VersionId': 'null'},
# {'Key': 'fb937cthirdfile.txt', 'VersionId': 'null'}
# ]


s3_resource.Object(second_bucket_name, first_file).upload_file(first_file)
delete_all_objects(second_bucket_name)
# [{'Key': '9c8b44firstfile.txt', 'VersionId': 'null'}]

Deleting Buckets

s3_resource.Bucket(first_bucket_name).delete()

s3_resource.meta.client.delete_bucket(Bucket=second_bucket_name)

Creating Bucket and Object Instances

  • Using the resource, you have access to the high-level classes (Bucket and Object).
# create Bucket
first_bucket = s3_resource.Bucket(name = first_bucket_name)


# create Object
first_object = s3_resource.Object(
    bucket_name = first_bucket_name,
    key = first_file
)

first_object_again = first_bucket.Object(first_file)

The reason you have not seen any errors when creating the first_object variable is that Boto3 doesn’t make calls to AWS to create the reference.

  • The bucket_name and the key are called identifiers, necessary parameters to create an Object.
  • Any other attribute of an Object, such as its size, is lazily loaded.
  • This means that for Boto3 to get the requested attributes, it has to make calls to AWS.
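
A quick way to see this in action (a sketch, assuming `first_object` points at a key that has already been uploaded): accessing a non-identifier attribute triggers a request to AWS the first time, after which the loaded data is reused.

print(first_object.bucket_name, first_object.key)  # identifiers, no AWS call
print(first_object.content_length)                 # lazily loaded -> boto3 calls AWS here
print(first_object.e_tag)                          # reuses the already-loaded data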


S3 file object

# create Bucket & Object
first_bucket = s3_resource.Bucket(name = first_bucket_name)
first_object = s3_resource.Object(
    bucket_name = first_bucket_name,
    key = first_file
)


# ------------------- Upload a File Object -------------------
# **Object Instance Version**
first_object.upload_file(Filename=first_file)

# **Bucket Instance Version**
first_bucket.upload_file(Filename=first_file, Key=first_file)

# **Client Version**
s3_resource.meta.client.upload_file(
    Filename=first_file,
    Bucket=first_bucket_name,
    Key=first_file
)

# when you upload an object to S3, that object is private



# get the S3 file object
s3_object = s3_resource.Object(first_bucket_name, first_file)

# ------------------- Download a File Object -------------------
s3_object.download_file(f'/tmp/{first_file}')

# ------------------- Delete a File Object -------------------
s3_object.delete()

Copying an Object Between Buckets

  • copy files from one bucket to another
def copy_to_bucket(bucket_from_name, bucket_to_name, file_name):
    copy_source = {
        'Bucket': bucket_from_name,
        'Key': file_name
    }
    s3_resource.Object(bucket_to_name, file_name).copy(copy_source)

copy_to_bucket(first_bucket_name, second_bucket_name, first_file)

ACL (Access Control Lists)

# when you upload an object to S3, that object is private.
#
# to make this object available to someone else, set the object’s ACL to be public at creation time.

second_file = create_temp_file(400, 'secondfile.txt', 's')
second_object = s3_resource.Object(first_bucket.name, second_file)

second_object.upload_file(
    second_file,
    ExtraArgs={'ACL': 'public-read'}
)

second_object_acl = second_object.Acl()


# make your object private again, without needing to re-upload it:

response = second_object_acl.put(ACL='private')
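
To verify the effect, you can inspect the grants on the ACL sub-resource. A quick check using the `second_object_acl` from above; after `put(ACL='private')`, only the owner grant should remain.

# Inspect who currently has access to the object.
print(second_object_acl.grants)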

Encryption

# Create a new file and upload it using `ServerSideEncryption`:

third_file = create_temp_file(300, 'thirdfile.txt', 't')
third_object = s3_resource.Object(first_bucket_name, third_file)


third_object.upload_file(
    third_file,
    ExtraArgs={'ServerSideEncryption': 'AES256'}
)

# check the algorithm that was used to encrypt the file, in this case `AES256`:

third_object.server_side_encryption
# 'AES256'

Storage

storage classes with S3:

  • STANDARD: default for frequently accessed data
  • STANDARD_IA: for infrequently used data that needs to be retrieved rapidly when requested
  • ONEZONE_IA: for the same use case as STANDARD_IA, but stores the data in one Availability Zone instead of three
  • REDUCED_REDUNDANCY: for frequently used noncritical data that is easily reproducible

If you want to change the storage class of an existing object, you need to recreate the object.

# Reupload the file and set its storage class to STANDARD_IA:
third_object.upload_file(
    third_file,
    ExtraArgs={
        'ServerSideEncryption': 'AES256',
        'StorageClass': 'STANDARD_IA',
    }
)

# If you make changes to your object, you might find that your local instance doesn’t show them.
# What you need to do at that point is call `.reload()` to fetch the newest version of your object.
# Reload the object, and you can see its new storage class:

third_object.reload()
third_object.storage_class
# 'STANDARD_IA'

Enable versioning for a bucket

def enable_bucket_versioning(bucket_name):
    s3bucket_versioning = s3_resource.BucketVersioning(bucket_name)
    s3bucket_versioning.enable()
    print(s3bucket_versioning.status)


enable_bucket_versioning(first_bucket_name)
# Enabled




# create two new versions for the first file `Object`, one with the contents of the original file and one with the contents of the third file:


s3_resource.Object(first_bucket_name, first_file).upload_file(first_file)
s3_resource.Object(first_bucket_name, first_file).upload_file(third_file)

# Now reupload the second file, which will create a new version:
s3_resource.Object(first_bucket_name, second_file).upload_file(second_file)


# retrieve the latest available version of your objects like so:
s3_resource.Object(first_bucket_name, first_file).version_id
# 'eQgH6IC1VGcn7eXZ_.ayqm6NdjjhOADv'
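
If you want to walk through every stored version of a key rather than just the latest one, you can filter the bucket's `object_versions` collection. A minimal sketch, reusing `first_bucket_name` and `first_file` from above:

# List every version of `first_file` kept in the versioned bucket.
versioned_bucket = s3_resource.Bucket(first_bucket_name)
for version in versioned_bucket.object_versions.filter(Prefix=first_file):
    print(version.object_key, version.id)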

Traversals

Bucket Traversal

# The resource's `buckets` attribute, combined with .all(), gives a complete list of Bucket instances:
for bucket in s3_resource.buckets.all():
    print(bucket.name)
# firstpythonbucket7250e773-c4b1-422a-b51f-c45a52af9304
# secondpythonbucket2d5d99c5-ab96-4c30-b7f7-443a95f72644

You can use the `client` to retrieve the bucket information as well, but the code is more complex, as you need to extract it from the dictionary that the `client` returns:



for bucket_dict in s3_resource.meta.client.list_buckets().get('Buckets'):
    print(bucket_dict['Name'])
# firstpythonbucket7250e773-c4b1-422a-b51f-c45a52af9304
# secondpythonbucket2d5d99c5-ab96-4c30-b7f7-443a95f72644

Object Traversal

for obj in first_bucket.objects.all():
    print(obj.key)
# 127367firstfile.txt
# 616abesecondfile.txt
# fb937cthirdfile.txt

# The `obj` variable is an `ObjectSummary`, a lightweight representation of an `Object`.
# The summary version doesn't support all of the attributes that the `Object` has.
# To access them, use the `Object()` sub-resource to create a new reference to the underlying stored key.
# Then you'll be able to extract the missing attributes:


for obj in first_bucket.objects.all():
    subsrc = obj.Object()
    print(obj.key, obj.storage_class, obj.last_modified, subsrc.version_id, subsrc.metadata)
# 127367firstfile.txt STANDARD 2018-10-05 15:09:46+00:00 eQgH6IC1VGcn7eXZ_.ayqm6NdjjhOADv {}
# 616abesecondfile.txt STANDARD 2018-10-05 15:09:47+00:00 WIaExRLmoksJzLhN7jU5YzoJxYSu6Ey6 {}
# fb937cthirdfile.txt STANDARD_IA 2018-10-05 15:09:05+00:00 null {}


This post is licensed under CC BY 4.0 by the author.
