S3
Create a random bucket name and file
| # If another user has already claimed the bucket name -> botocore.errorfactory.BucketAlreadyExists
# Increase your chance of success when creating your bucket by picking a random name.
import uuid

def create_bucket_name(bucket_prefix):
    # The generated bucket name must be between 3 and 63 chars long
    return ''.join([bucket_prefix, str(uuid.uuid4())])

# Create a random file whose name is prefixed with a short random hex string
def create_temp_file(size, file_name, file_content):
    random_file_name = ''.join([str(uuid.uuid4().hex[:6]), file_name])
    with open(random_file_name, 'w') as f:
        f.write(str(file_content) * size)
    return random_file_name

# Create file
first_file = create_temp_file(300, 'firstfile.txt', 'helloworld')
|
Understanding Sub-resources
Bucket and Object are sub-resources of one another.
- Sub-resources are methods that create a new instance of a child resource.
- The parent’s identifiers get passed to the child resource (see the sketch below).
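A minimal sketch of how the two sub-resources pass identifiers to each other; it reuses first_bucket_name and first_file from the surrounding snippets and makes no calls to AWS, so nothing needs to exist in S3 yet:
|
# Bucket -> Object: the bucket's name is handed down as the object's bucket_name.
bucket = s3_resource.Bucket(first_bucket_name)
obj = bucket.Object(first_file)
print(obj.bucket_name, obj.key)

# Object -> Bucket: the object's bucket_name is handed back up.
same_bucket = obj.Bucket()
print(same_bucket.name)
|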
Creating a Bucket: s3_resource.create_bucket()
|
import boto3

# -------------- Hardcoded region --------------
# Unless your region is in the United States, you’ll need to define the region
# explicitly when creating the bucket; otherwise -> IllegalLocationConstraintException.
# For example, when creating an S3 bucket in a non-US region:
s3_resource = boto3.resource('s3')
s3_resource.create_bucket(
    Bucket=YOUR_BUCKET_NAME,
    CreateBucketConfiguration={
        'LocationConstraint': 'eu-west-1'
    }
)

# -------------- Region from the session --------------
# Take advantage of a `session` object:
# - Boto3 creates the `session` from your credentials.
# - You just take the region from it and pass it to `create_bucket()` as its
#   `LocationConstraint` configuration:
s3_connection = boto3.resource('s3')   # works with the resource...
s3_connection = boto3.client('s3')     # ...or with the client

def create_bucket(bucket_prefix, s3_connection):
    session = boto3.session.Session()
    current_region = session.region_name
    bucket_name = create_bucket_name(bucket_prefix)
    bucket_response = s3_connection.create_bucket(
        Bucket=bucket_name,
        CreateBucketConfiguration={
            'LocationConstraint': current_region
        }
    )
    print(bucket_name, current_region)
    return bucket_name, bucket_response
|
This version works no matter where you deploy it (locally, EC2, or Lambda), because you never hardcode your region.
Both the client and the resource create buckets in the same way:
| # Create a bucket using the client.
# The client gives back `bucket_response` as a dictionary:
first_bucket_name, first_response = create_bucket(
    bucket_prefix='firstpythonbucket',
    s3_connection=s3_resource.meta.client
)
first_response
# {'ResponseMetadata': {
#    'RequestId': 'E1DCFE71EDE7C1EC',
#    'HostId': 'r3AP32NQk9dvbHSEPIbyYADT769VQEN/+xT2BPM6HCnuCb3Z/GhR2SBP+GM7IjcxbBN7SQ+k+9B=',
#    'HTTPStatusCode': 200,
#    'HTTPHeaders': {
#      'x-amz-id-2': 'r3AP32NQk9dvbHSEPIbyYADT769VQEN/+xT2BPM6HCnuCb3Z/GhR2SBP+GM7IjcxbBN7SQ+k+9B=',
#      'x-amz-request-id': 'E1DCFE71EDE7C1EC',
#      'date': 'Fri, 05 Oct 2018 15:00:00 GMT',
#      'location': 'https://firstpythonbucket7250e773-c4b1-422a-b51f-c45a52af9304.s3.amazonaws.com/',
#      'content-length': '0',
#      'server': 'AmazonS3'
#    },
#    'RetryAttempts': 0},
#  'Location': 'https://firstpythonbucket7250e773-c4b1-422a-b51f-c45a52af9304.s3.amazonaws.com/'
# }

# Create a bucket using the resource.
# The resource gives you back a Bucket instance as the `bucket_response`:
second_bucket_name, second_response = create_bucket(
    bucket_prefix='secondpythonbucket',
    s3_connection=s3_resource
)
second_response
# s3.Bucket(name='secondpythonbucket2d5d99c5-ab96-4c30-b7f7-443a95f72644')
|
Deleting Buckets and Objects
Deleting a Non-empty Bucket
- To be able to delete a bucket, you must first delete every single object within the bucket, or else the BucketNotEmpty exception will be raised.
- When you have a versioned bucket, you need to delete every object and all of its versions.
| def delete_all_objects(bucket_name):
    res = []
    s3bucket = s3_resource.Bucket(bucket_name)
    for obj_version in s3bucket.object_versions.all():
        res.append({'Key': obj_version.object_key, 'VersionId': obj_version.id})
    print(res)
    s3bucket.delete_objects(Delete={'Objects': res})

delete_all_objects(first_bucket_name)
# [
#   {'Key': '127367firstfile.txt', 'VersionId': 'eQgH6IC1VGcn7eXZ_.ayqm6NdjjhOADv'},
#   {'Key': '127367firstfile.txt', 'VersionId': 'UnQTaps14o3c1xdzh09Cyqg_hq4SjB53'},
#   {'Key': '127367firstfile.txt', 'VersionId': 'null'},
#   {'Key': '616abesecondfile.txt', 'VersionId': 'WIaExRLmoksJzLhN7jU5YzoJxYSu6Ey6'},
#   {'Key': '616abesecondfile.txt', 'VersionId': 'null'},
#   {'Key': 'fb937cthirdfile.txt', 'VersionId': 'null'}
# ]

# The same function also works on a bucket without versioning enabled
# (each object then reports a VersionId of 'null'):
s3_resource.Object(second_bucket_name, first_file).upload_file(first_file)
delete_all_objects(second_bucket_name)
# [{'Key': '9c8b44firstfile.txt', 'VersionId': 'null'}]
|
Deleting Buckets
| s3_resource.Bucket(first_bucket_name).delete()
s3_resource.meta.client.delete_bucket(Bucket=second_bucket_name)
|
Creating Bucket and Object Instances
- Using the resource, you have access to the high-level classes (Bucket and Object).
| # Create a Bucket instance
first_bucket = s3_resource.Bucket(name=first_bucket_name)
# Create an Object instance
first_object = s3_resource.Object(
    bucket_name=first_bucket_name,
    key=first_file
)
# Or create the Object from the Bucket sub-resource
first_object_again = first_bucket.Object(first_file)
|
The reason you have not seen any errors when creating the first_object variable is that Boto3 doesn’t make calls to AWS to create the reference.
- The bucket_name and the key are called identifiers; they are the necessary parameters to create an Object.
- Any other attribute of an Object, such as its size, is lazily loaded.
- This means that for Boto3 to get the requested attributes, it has to make calls to AWS, as the sketch below illustrates.
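A minimal sketch of that lazy loading, assuming first_file has already been uploaded to the bucket (the upload calls appear in the next block); the identifiers are available immediately, while other attributes trigger a request the first time they are read:
|
# No AWS call yet: only the identifiers are set on the reference.
lazy_object = s3_resource.Object(first_bucket_name, first_file)
print(lazy_object.bucket_name, lazy_object.key)

# Reading a non-identifier attribute makes Boto3 fetch the object's metadata
# from AWS the first time, then cache it on the instance.
print(lazy_object.content_length, lazy_object.last_modified)

# .load() fetches the same metadata explicitly.
lazy_object.load()
|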
S3 file object
| # Create Bucket & Object references
first_bucket = s3_resource.Bucket(name=first_bucket_name)
first_object = s3_resource.Object(
    bucket_name=first_bucket_name,
    key=first_file
)
# ------------------- Upload a File Object -------------------
# **Object Instance Version**
first_object.upload_file(Filename=first_file)
# **Bucket Instance Version**
first_bucket.upload_file(Filename=first_file, Key=first_file)
# **Client Version**
s3_resource.meta.client.upload_file(
    Filename=first_file,
    Bucket=first_bucket_name,
    Key=first_file
)
# When you upload an object to S3, that object is private.
# Get the S3 file object:
s3_object = s3_resource.Object(first_bucket_name, first_file)
# ------------------- Download a File Object -------------------
s3_object.download_file(f'/tmp/{first_file}')
# ------------------- Delete a File Object -------------------
s3_object.delete()
|
Copying an Object Between Buckets
- copy files from one bucket to another
| def copy_to_bucket(bucket_from_name, bucket_to_name, file_name):
    copy_source = {
        'Bucket': bucket_from_name,
        'Key': file_name
    }
    s3_resource.Object(bucket_to_name, file_name).copy(copy_source)

copy_to_bucket(first_bucket_name, second_bucket_name, first_file)
|
ACL (Access Control Lists)
| # When you upload an object to S3, that object is private.
# To make the object available to someone else, set the object’s ACL to be public at creation time.
second_file = create_temp_file(400, 'secondfile.txt', 's')
second_object = s3_resource.Object(first_bucket.name, second_file)
second_object.upload_file(
    second_file,
    ExtraArgs={'ACL': 'public-read'}
)
second_object_acl = second_object.Acl()
# Make your object private again, without needing to re-upload it:
response = second_object_acl.put(ACL='private')
|
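To verify the effect of each ACL change, you can read the grants back from the same ObjectAcl instance; a minimal sketch (the exact grantee entries you see depend on your account):
|
# Inspect the current grants on the object. After 'public-read' the list should
# include an AllUsers grantee with READ permission; after put(ACL='private'),
# only the bucket owner remains.
print(second_object_acl.grants)
|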
Encryption
| # Create a new file and upload it using `ServerSideEncryption`:
third_file = create_temp_file(300, 'thirdfile.txt', 't')
third_object = s3_resource.Object(first_bucket_name, third_file)
third_object.upload_file(
    third_file,
    ExtraArgs={'ServerSideEncryption': 'AES256'}
)
# Check the algorithm that was used to encrypt the file, in this case `AES256`:
third_object.server_side_encryption
# 'AES256'
|
Storage
S3 offers several storage classes:
- STANDARD: default for frequently accessed data
- STANDARD_IA: for infrequently used data that needs to be retrieved rapidly when requested
- ONEZONE_IA: for the same use case as STANDARD_IA, but stores the data in one Availability Zone instead of three
- REDUCED_REDUNDANCY: for frequently used noncritical data that is easily reproducible
If you want to change the storage class of an existing object, you need to recreate the object.
| # Reupload the object and set its storage class to STANDARD_IA:
third_object.upload_file(
    third_file,
    ExtraArgs={
        'ServerSideEncryption': 'AES256',
        'StorageClass': 'STANDARD_IA',
    }
)
# If you make changes to your object, you might find that your local instance doesn’t show them.
# What you need to do at that point is call `.reload()` to fetch the newest version of your object.
# Reload the object, and you can see its new storage class:
third_object.reload()
third_object.storage_class
# 'STANDARD_IA'
|
Enable versioning for bucket
| def enable_bucket_versioning(bucket_name):
    s3bucket_versioning = s3_resource.BucketVersioning(bucket_name)
    s3bucket_versioning.enable()
    print(s3bucket_versioning.status)

enable_bucket_versioning(first_bucket_name)
# Enabled

# Create two new versions for the first file `Object`, one with the contents
# of the original file and one with the contents of the third file:
s3_resource.Object(first_bucket_name, first_file).upload_file(first_file)
s3_resource.Object(first_bucket_name, first_file).upload_file(third_file)
# Now reupload the second file, which will create a new version:
s3_resource.Object(first_bucket_name, second_file).upload_file(second_file)
# Retrieve the latest available version of your objects like so:
s3_resource.Object(first_bucket_name, first_file).version_id
# 'eQgH6IC1VGcn7eXZ_.ayqm6NdjjhOADv'
|
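To see every version that those reuploads created, not just the latest one, you can walk the bucket’s object_versions collection; a minimal sketch, filtering on the first file’s key:
|
# List every stored version of first_file; each reupload above added one.
versioned_bucket = s3_resource.Bucket(first_bucket_name)
for version in versioned_bucket.object_versions.filter(Prefix=first_file):
    print(version.object_key, version.id, version.last_modified)
|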
Traversals
Bucket Traversal
| # The resource’s buckets attribute, together with .all(), gives a complete list of Bucket instances:
for bucket in s3_resource.buckets.all():
    print(bucket.name)
# firstpythonbucket7250e773-c4b1-422a-b51f-c45a52af9304
# secondpythonbucket2d5d99c5-ab96-4c30-b7f7-443a95f72644

# You can use the `client` to retrieve the bucket information as well, but the code is
# more complex, as you need to extract it from the dictionary that the `client` returns:
for bucket_dict in s3_resource.meta.client.list_buckets().get('Buckets'):
    print(bucket_dict['Name'])
# firstpythonbucket7250e773-c4b1-422a-b51f-c45a52af9304
# secondpythonbucket2d5d99c5-ab96-4c30-b7f7-443a95f72644
|
Object Traversal
| for obj in first_bucket.objects.all():
    print(obj.key)
# 127367firstfile.txt
# 616abesecondfile.txt
# fb937cthirdfile.txt

# The `obj` variable is an `ObjectSummary`: a lightweight representation of an `Object`.
# The summary version doesn’t support all of the attributes that the `Object` has.
# To access them, use the `Object()` sub-resource to create a new reference to the
# underlying stored key. Then you’ll be able to extract the missing attributes:
for obj in first_bucket.objects.all():
    subsrc = obj.Object()
    print(obj.key, obj.storage_class, obj.last_modified, subsrc.version_id, subsrc.metadata)
# 127367firstfile.txt STANDARD 2018-10-05 15:09:46+00:00 eQgH6IC1VGcn7eXZ_.ayqm6NdjjhOADv {}
# 616abesecondfile.txt STANDARD 2018-10-05 15:09:47+00:00 WIaExRLmoksJzLhN7jU5YzoJxYSu6Ey6 {}
# fb937cthirdfile.txt STANDARD_IA 2018-10-05 15:09:05+00:00 null {}
|
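If you only need a subset of the keys, the same collection can be narrowed on the server side by key prefix; a small sketch, reusing the '127367' prefix from the listing above as an example:
|
# .all() pages through every object in the bucket; filter(Prefix=...) asks S3
# to return only keys that start with the given prefix.
for obj in first_bucket.objects.filter(Prefix='127367'):
    print(obj.key, obj.size)
|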