S3
Create a random bucket name and file
| # If another user has already claimed the bucket name -> botocore.errorfactory.BucketAlreadyExists
# Increase your chance of success when creating your bucket by picking a random name.
import uuid

def create_bucket_name(bucket_prefix):
    # The generated bucket name must be between 3 and 63 chars long
    return ''.join([bucket_prefix, str(uuid.uuid4())])

# Create a random file whose name is prefixed with a short random hex string
def create_temp_file(size, file_name, file_content):
    random_file_name = ''.join([str(uuid.uuid4().hex[:6]), file_name])
    with open(random_file_name, 'w') as f:
        f.write(str(file_content) * size)
    return random_file_name

# Create file
first_file = create_temp_file(300, 'firstfile.txt', 'helloworld')
|
Understanding Sub-resources
Bucket and Object are sub-resources of one another.
- Sub-resources are methods that create a new instance of a child resource.
- The parent’s identifiers get passed to the child resource (see the sketch below).
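A minimal sketch of how the two sub-resources pass identifiers to each other; it reuses first_bucket_name and first_file from the surrounding snippets and makes no calls to AWS, so nothing needs to exist in S3 yet:
|
# Bucket -> Object: the bucket's name is handed down as the object's bucket_name.
bucket = s3_resource.Bucket(first_bucket_name)
obj = bucket.Object(first_file)
print(obj.bucket_name, obj.key)

# Object -> Bucket: the object's bucket_name is handed back up.
same_bucket = obj.Bucket()
print(same_bucket.name)
|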
Creating a Bucket: s3_resource.create_bucket()
|
import boto3

# -------------- Hardcoded region --------------
# Unless your region is in the United States, you’ll need to define the region
# explicitly when creating the bucket; otherwise -> IllegalLocationConstraintException.
# For example, when creating an S3 bucket in a non-US region:
s3_resource = boto3.resource('s3')
s3_resource.create_bucket(
    Bucket=YOUR_BUCKET_NAME,
    CreateBucketConfiguration={
        'LocationConstraint': 'eu-west-1'
    }
)

# -------------- Region from the session --------------
# Take advantage of a `session` object:
# - Boto3 creates the `session` from your credentials.
# - You just take the region from it and pass it to `create_bucket()` as its
#   `LocationConstraint` configuration:
s3_connection = boto3.resource('s3')   # works with the resource...
s3_connection = boto3.client('s3')     # ...or with the client

def create_bucket(bucket_prefix, s3_connection):
    session = boto3.session.Session()
    current_region = session.region_name
    bucket_name = create_bucket_name(bucket_prefix)
    bucket_response = s3_connection.create_bucket(
        Bucket=bucket_name,
        CreateBucketConfiguration={
            'LocationConstraint': current_region
        }
    )
    print(bucket_name, current_region)
    return bucket_name, bucket_response
|
This version works no matter where you deploy it (locally, EC2, or Lambda), because you never hardcode your region.
Both the client and the resource create buckets in the same way:
| # Create a bucket using the client.
# The client gives back `bucket_response` as a dictionary:
first_bucket_name, first_response = create_bucket(
    bucket_prefix='firstpythonbucket',
    s3_connection=s3_resource.meta.client
)
first_response
# {'ResponseMetadata': {
#    'RequestId': 'E1DCFE71EDE7C1EC',
#    'HostId': 'r3AP32NQk9dvbHSEPIbyYADT769VQEN/+xT2BPM6HCnuCb3Z/GhR2SBP+GM7IjcxbBN7SQ+k+9B=',
#    'HTTPStatusCode': 200,
#    'HTTPHeaders': {
#      'x-amz-id-2': 'r3AP32NQk9dvbHSEPIbyYADT769VQEN/+xT2BPM6HCnuCb3Z/GhR2SBP+GM7IjcxbBN7SQ+k+9B=',
#      'x-amz-request-id': 'E1DCFE71EDE7C1EC',
#      'date': 'Fri, 05 Oct 2018 15:00:00 GMT',
#      'location': 'https://firstpythonbucket7250e773-c4b1-422a-b51f-c45a52af9304.s3.amazonaws.com/',
#      'content-length': '0',
#      'server': 'AmazonS3'
#    },
#    'RetryAttempts': 0},
#  'Location': 'https://firstpythonbucket7250e773-c4b1-422a-b51f-c45a52af9304.s3.amazonaws.com/'
# }

# Create a bucket using the resource.
# The resource gives you back a Bucket instance as the `bucket_response`:
second_bucket_name, second_response = create_bucket(
    bucket_prefix='secondpythonbucket',
    s3_connection=s3_resource
)
second_response
# s3.Bucket(name='secondpythonbucket2d5d99c5-ab96-4c30-b7f7-443a95f72644')
|
Deleting Buckets and Objects
Deleting a Non-empty Bucket
- To be able to delete a bucket, you must first delete every single object within the bucket, or else the BucketNotEmpty exception will be raised.
- When you have a versioned bucket, you need to delete every object and all of its versions.
| def delete_all_objects(bucket_name):
    res = []
    s3bucket = s3_resource.Bucket(bucket_name)
    for obj_version in s3bucket.object_versions.all():
        res.append({'Key': obj_version.object_key, 'VersionId': obj_version.id})
    print(res)
    s3bucket.delete_objects(Delete={'Objects': res})

delete_all_objects(first_bucket_name)
# [
#   {'Key': '127367firstfile.txt', 'VersionId': 'eQgH6IC1VGcn7eXZ_.ayqm6NdjjhOADv'},
#   {'Key': '127367firstfile.txt', 'VersionId': 'UnQTaps14o3c1xdzh09Cyqg_hq4SjB53'},
#   {'Key': '127367firstfile.txt', 'VersionId': 'null'},
#   {'Key': '616abesecondfile.txt', 'VersionId': 'WIaExRLmoksJzLhN7jU5YzoJxYSu6Ey6'},
#   {'Key': '616abesecondfile.txt', 'VersionId': 'null'},
#   {'Key': 'fb937cthirdfile.txt', 'VersionId': 'null'}
# ]

# The same function also works on a bucket without versioning enabled
# (each object then reports a VersionId of 'null'):
s3_resource.Object(second_bucket_name, first_file).upload_file(first_file)
delete_all_objects(second_bucket_name)
# [{'Key': '9c8b44firstfile.txt', 'VersionId': 'null'}]
|
Deleting Buckets
| s3_resource.Bucket(first_bucket_name).delete()
s3_resource.meta.client.delete_bucket(Bucket=second_bucket_name)
|
Creating Bucket and Object Instances
- Using the resource, you have access to the high-level classes (Bucket and Object).
| # Create a Bucket instance
first_bucket = s3_resource.Bucket(name=first_bucket_name)
# Create an Object instance
first_object = s3_resource.Object(
    bucket_name=first_bucket_name,
    key=first_file
)
# Or create the Object from the Bucket sub-resource
first_object_again = first_bucket.Object(first_file)
|
The reason you have not seen any errors when creating the first_object variable is that Boto3 doesn’t make calls to AWS to create the reference.
- The bucket_name and the key are called identifiers; they are the necessary parameters to create an Object.
- Any other attribute of an Object, such as its size, is lazily loaded.
- This means that for Boto3 to get the requested attributes, it has to make calls to AWS, as the sketch below illustrates.
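A minimal sketch of that lazy loading, assuming first_file has already been uploaded to the bucket (the upload calls appear in the next block); the identifiers are available immediately, while other attributes trigger a request the first time they are read:
|
# No AWS call yet: only the identifiers are set on the reference.
lazy_object = s3_resource.Object(first_bucket_name, first_file)
print(lazy_object.bucket_name, lazy_object.key)

# Reading a non-identifier attribute makes Boto3 fetch the object's metadata
# from AWS the first time, then cache it on the instance.
print(lazy_object.content_length, lazy_object.last_modified)

# .load() fetches the same metadata explicitly.
lazy_object.load()
|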
S3 file object
| # Create Bucket & Object references
first_bucket = s3_resource.Bucket(name=first_bucket_name)
first_object = s3_resource.Object(
    bucket_name=first_bucket_name,
    key=first_file
)
# ------------------- Upload a File Object -------------------
# **Object Instance Version**
first_object.upload_file(Filename=first_file)
# **Bucket Instance Version**
first_bucket.upload_file(Filename=first_file, Key=first_file)
# **Client Version**
s3_resource.meta.client.upload_file(
    Filename=first_file,
    Bucket=first_bucket_name,
    Key=first_file
)
# When you upload an object to S3, that object is private.
# Get the S3 file object:
s3_object = s3_resource.Object(first_bucket_name, first_file)
# ------------------- Download a File Object -------------------
s3_object.download_file(f'/tmp/{first_file}')
# ------------------- Delete a File Object -------------------
s3_object.delete()
|
Copying an Object Between Buckets
- copy files from one bucket to another
| def copy_to_bucket(bucket_from_name, bucket_to_name, file_name):
    copy_source = {
        'Bucket': bucket_from_name,
        'Key': file_name
    }
    s3_resource.Object(bucket_to_name, file_name).copy(copy_source)

copy_to_bucket(first_bucket_name, second_bucket_name, first_file)
|
ACL (Access Control Lists)
| # When you upload an object to S3, that object is private.
# To make the object available to someone else, set the object’s ACL to be public at creation time.
second_file = create_temp_file(400, 'secondfile.txt', 's')
second_object = s3_resource.Object(first_bucket.name, second_file)
second_object.upload_file(
    second_file,
    ExtraArgs={'ACL': 'public-read'}
)
second_object_acl = second_object.Acl()
# Make your object private again, without needing to re-upload it:
response = second_object_acl.put(ACL='private')
|
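To verify the effect of each ACL change, you can read the grants back from the same ObjectAcl instance; a minimal sketch (the exact grantee entries you see depend on your account):
|
# Inspect the current grants on the object. After 'public-read' the list should
# include an AllUsers grantee with READ permission; after put(ACL='private'),
# only the bucket owner remains.
print(second_object_acl.grants)
|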
Encryption
| # Create a new file and upload it using `ServerSideEncryption`:
third_file = create_temp_file(300, 'thirdfile.txt', 't')
third_object = s3_resource.Object(first_bucket_name, third_file)
third_object.upload_file(
    third_file,
    ExtraArgs={'ServerSideEncryption': 'AES256'}
)
# Check the algorithm that was used to encrypt the file, in this case `AES256`:
third_object.server_side_encryption
# 'AES256'
|
Storage
S3 offers several storage classes:
- STANDARD: default for frequently accessed data
- STANDARD_IA: for infrequently used data that needs to be retrieved rapidly when requested
- ONEZONE_IA: for the same use case as STANDARD_IA, but stores the data in one Availability Zone instead of three
- REDUCED_REDUNDANCY: for frequently used noncritical data that is easily reproducible
If you want to change the storage class of an existing object, you need to recreate the object.
| # Reupload the object and set its storage class to STANDARD_IA:
third_object.upload_file(
    third_file,
    ExtraArgs={
        'ServerSideEncryption': 'AES256',
        'StorageClass': 'STANDARD_IA',
    }
)
# If you make changes to your object, you might find that your local instance doesn’t show them.
# What you need to do at that point is call `.reload()` to fetch the newest version of your object.
# Reload the object, and you can see its new storage class:
third_object.reload()
third_object.storage_class
# 'STANDARD_IA'
|
Enable versioning for bucket
| def enable_bucket_versioning(bucket_name):
    s3bucket_versioning = s3_resource.BucketVersioning(bucket_name)
    s3bucket_versioning.enable()
    print(s3bucket_versioning.status)

enable_bucket_versioning(first_bucket_name)
# Enabled

# Create two new versions for the first file `Object`, one with the contents
# of the original file and one with the contents of the third file:
s3_resource.Object(first_bucket_name, first_file).upload_file(first_file)
s3_resource.Object(first_bucket_name, first_file).upload_file(third_file)
# Now reupload the second file, which will create a new version:
s3_resource.Object(first_bucket_name, second_file).upload_file(second_file)
# Retrieve the latest available version of your objects like so:
s3_resource.Object(first_bucket_name, first_file).version_id
# 'eQgH6IC1VGcn7eXZ_.ayqm6NdjjhOADv'
|
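To see every version that those reuploads created, not just the latest one, you can walk the bucket’s object_versions collection; a minimal sketch, filtering on the first file’s key:
|
# List every stored version of first_file; each reupload above added one.
versioned_bucket = s3_resource.Bucket(first_bucket_name)
for version in versioned_bucket.object_versions.filter(Prefix=first_file):
    print(version.object_key, version.id, version.last_modified)
|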
Traversals
Bucket Traversal
| # The resource’s buckets attribute, together with .all(), gives a complete list of Bucket instances:
for bucket in s3_resource.buckets.all():
    print(bucket.name)
# firstpythonbucket7250e773-c4b1-422a-b51f-c45a52af9304
# secondpythonbucket2d5d99c5-ab96-4c30-b7f7-443a95f72644

# You can use the `client` to retrieve the bucket information as well, but the code is
# more complex, as you need to extract it from the dictionary that the `client` returns:
for bucket_dict in s3_resource.meta.client.list_buckets().get('Buckets'):
    print(bucket_dict['Name'])
# firstpythonbucket7250e773-c4b1-422a-b51f-c45a52af9304
# secondpythonbucket2d5d99c5-ab96-4c30-b7f7-443a95f72644
|
Object Traversal
| for obj in first_bucket.objects.all():
    print(obj.key)
# 127367firstfile.txt
# 616abesecondfile.txt
# fb937cthirdfile.txt

# The `obj` variable is an `ObjectSummary`: a lightweight representation of an `Object`.
# The summary version doesn’t support all of the attributes that the `Object` has.
# To access them, use the `Object()` sub-resource to create a new reference to the
# underlying stored key. Then you’ll be able to extract the missing attributes:
for obj in first_bucket.objects.all():
    subsrc = obj.Object()
    print(obj.key, obj.storage_class, obj.last_modified, subsrc.version_id, subsrc.metadata)
# 127367firstfile.txt STANDARD 2018-10-05 15:09:46+00:00 eQgH6IC1VGcn7eXZ_.ayqm6NdjjhOADv {}
# 616abesecondfile.txt STANDARD 2018-10-05 15:09:47+00:00 WIaExRLmoksJzLhN7jU5YzoJxYSu6Ey6 {}
# fb937cthirdfile.txt STANDARD_IA 2018-10-05 15:09:05+00:00 null {}
|
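If you only need a subset of the keys, the same collection can be narrowed on the server side by key prefix; a small sketch, reusing the '127367' prefix from the listing above as an example:
|
# .all() pages through every object in the bucket; filter(Prefix=...) asks S3
# to return only keys that start with the given prefix.
for obj in first_bucket.objects.filter(Prefix='127367'):
    print(obj.key, obj.size)
|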