S3 Uses: user-generated content (photos, attachments), logs (CloudTrail, CDN, S3 access logs), search indexes, data lake/warehouse storage (e.g. for Spark), and backups. Uploads use HTTP PUT (PutObject) rather than POST; POST is only used for browser-based HTML form uploads.
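A minimal sketch of an upload as an HTTP PUT, using only the standard library and assuming you already have a presigned PUT URL. The bucket, key and signature in the URL below are hypothetical placeholders, and the request is built but not sent.

```python
import urllib.request


def build_put_request(presigned_url: str, body: bytes) -> urllib.request.Request:
    """Build (but do not send) an S3 upload request; uploads use PUT, not POST."""
    return urllib.request.Request(presigned_url, data=body, method="PUT")


# Hypothetical bucket, key and signature; a real presigned URL comes from an SDK.
url = "https://example-bucket.s3.amazonaws.com/photos/cat.jpg?X-Amz-Signature=placeholder"
req = build_put_request(url, b"image bytes")
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would perform the upload against a real presigned URL.
```

With boto3, such a URL can be generated with `s3.generate_presigned_url("put_object", Params={"Bucket": "example-bucket", "Key": "photos/cat.jpg"}, ExpiresIn=3600)`; most server-side code simply calls `put_object`, which issues the same PUT.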
S3 Storage Classes: Standard, Standard-IA (Infrequent Access), Intelligent-Tiering, Glacier (archive) and Glacier Deep Archive. The IA and archive classes cost less to store than Standard but charge a retrieval fee per GB. Intelligent-Tiering has no retrieval fee and removes operational overhead by automatically moving data between tiers as access patterns change (Frequent -> Infrequent -> Archive -> Deep Archive). It does charge a monthly monitoring and automation fee per object, so it pays off mainly for larger objects (1 MB or more).
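To see why a per-object monitoring fee favours larger objects, here is a small cost calculation. The fee value is a placeholder for illustration, not current AWS pricing; the 128 KB cutoff reflects the documented rule that smaller objects are neither monitored nor auto-tiered.

```python
GB = 1024 ** 3
# Objects below 128 KB are not monitored (no fee) and are never auto-tiered.
MIN_MONITORED_BYTES = 128 * 1024


def monitoring_fee_per_gb_month(object_size_bytes: int, fee_per_1000_objects: float) -> float:
    """Monthly Intelligent-Tiering monitoring fee per GB stored, for objects of one size."""
    if object_size_bytes < MIN_MONITORED_BYTES:
        return 0.0
    objects_per_gb = GB / object_size_bytes
    return objects_per_gb * fee_per_1000_objects / 1000


# Placeholder fee (per 1,000 objects per month); check current pricing for your Region.
FEE = 0.0025
for size_kb in (128, 1024, 100 * 1024):
    fee = monitoring_fee_per_gb_month(size_kb * 1024, FEE)
    print(f"{size_kb:>6} KB objects: ${fee:.6f} per GB-month")
```

Because the fee is charged per object, the same gigabyte stored as 1 MB objects costs one eighth of what it costs as 128 KB objects, which is why larger objects amortize it better.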
To control movement between classes yourself, use Lifecycle policies, guided by analytics tools (Storage Class Analysis, or the newer S3 Storage Lens), or a combination of both.
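A minimal sketch of a lifecycle rule that moves objects through storage classes, built as the dictionary that boto3's `put_bucket_lifecycle_configuration` accepts. The `logs/` prefix, the helper name `tiering_rule` and the day counts are hypothetical choices; the only S3 constraint checked here is that IA transitions need objects at least 30 days old.

```python
def tiering_rule(prefix: str, transitions: list[tuple[int, str]]) -> dict:
    """Build one lifecycle rule that moves objects under `prefix` through storage classes."""
    days = [d for d, _ in transitions]
    if any(a >= b for a, b in zip(days, days[1:])):
        raise ValueError("transition days must be strictly increasing")
    for d, storage_class in transitions:
        # S3 does not transition objects younger than 30 days to an IA class
        if storage_class in ("STANDARD_IA", "ONEZONE_IA") and d < 30:
            raise ValueError(f"{storage_class} transitions need Days >= 30")
    return {
        "ID": "tier-" + (prefix.strip("/") or "all"),
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": d, "StorageClass": c} for d, c in transitions],
    }


# Hypothetical schedule: IA at 30 days, Glacier at 90, Deep Archive at 180.
lifecycle = {
    "Rules": [
        tiering_rule("logs/", [(30, "STANDARD_IA"), (90, "GLACIER"), (180, "DEEP_ARCHIVE")])
    ]
}
print(lifecycle)
```

Applying it would look like `boto3.client("s3").put_bucket_lifecycle_configuration(Bucket="example-bucket", LifecycleConfiguration=lifecycle)`, with a hypothetical bucket name.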
S3 Data Protection:
a) Versioning: keeps every version of an object. To clean up older versions, use lifecycle rules that expire noncurrent versions.
b) Delete markers: A delete marker is a placeholder that S3 creates when a simple DELETE request (one that does not specify a version ID) targets an object in a versioning-enabled bucket. The object is not actually deleted; instead, the delete marker becomes the current version and S3 behaves as if the object were deleted. To undelete the object, delete the delete marker so that the previous version becomes current again.
c) S3 Object Lock: protects an object version either with a retention period or with a legal hold, which has no expiration date and stays in place until explicitly removed. Retention applies per version. For example, suppose an object is 15 days into a 30-day retention period and you PUT an object with the same key and a 60-day retention period. The PUT succeeds, and S3 creates a new version of the object with a 60-day retention period. The older version keeps its original retention period and becomes deletable in 15 days.
S3 Object Lock provides two retention modes that apply different levels of protection to your objects:
- Compliance mode
- Governance mode
In compliance mode, a protected object version can't be overwritten or deleted by any user, including the root user of the AWS account, for the duration of the retention period. Its retention mode can't be changed, and its retention period can't be shortened. In governance mode, most users can't overwrite or delete a protected version or change its lock settings, but users granted the s3:BypassGovernanceRetention permission can override the protection by explicitly requesting it.
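d) Replication: across Regions (CRR), within a Region (SRR), across accounts, and to multiple destination buckets (new). Objects protected by Object Lock can be replicated as well. Replication runs automatically after upload; S3 Replication Time Control (RTC) adds an SLA-backed guarantee that 99.99% of objects replicate within 15 minutes.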
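For versioning cleanup (item a) and leftover delete markers (item b), one lifecycle rule can cover both. A minimal sketch that builds the rule as the dictionary boto3's `put_bucket_lifecycle_configuration` accepts; the helper name `version_cleanup_rule`, the rule ID and the day and version counts are hypothetical.

```python
def version_cleanup_rule(noncurrent_days: int, keep_newest: int = 0) -> dict:
    """Lifecycle rule for a versioned bucket: expire old versions and leftover delete markers."""
    noncurrent = {"NoncurrentDays": noncurrent_days}
    if keep_newest:
        # Always keep this many of the newest noncurrent versions, whatever their age
        noncurrent["NewerNoncurrentVersions"] = keep_newest
    return {
        "ID": "cleanup-old-versions",
        "Filter": {"Prefix": ""},  # empty prefix: apply to every object
        "Status": "Enabled",
        "NoncurrentVersionExpiration": noncurrent,
        # Remove delete markers that no longer have any older versions behind them
        "Expiration": {"ExpiredObjectDeleteMarker": True},
    }


lifecycle = {"Rules": [version_cleanup_rule(noncurrent_days=30, keep_newest=3)]}
print(lifecycle)
```

Expiring a noncurrent version removes it permanently; once all older versions are gone, the `ExpiredObjectDeleteMarker` setting cleans up the delete marker left on top.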
Cost Savings: use a shorter retention period for standard (primary) buckets and a longer one for backup buckets, and combine storage classes with lifecycle policies.
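S3 Inventory reports list the objects in a bucket or prefix together with their metadata (storage class, encryption status, replication status, etc.); for access patterns, use Storage Class Analysis. If replication fails, read the inventory report to find the objects whose replication status is FAILED, then use S3 Batch Operations to replicate them.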
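A minimal sketch of the "read the inventory, then run Batch Operations" step: it filters inventory CSV rows with a FAILED replication status and writes a CSV manifest of `bucket,key,versionId` rows for a Batch Operations job. It assumes a versioned inventory whose `fileSchema` (in the inventory's manifest.json) is the column order below; check your own manifest for the real order. The sample rows are made up.

```python
import csv
import io

# Assumed column order; read the real one from "fileSchema" in the inventory manifest.json.
FILE_SCHEMA = ["Bucket", "Key", "VersionId", "IsLatest", "IsDeleteMarker", "ReplicationStatus"]


def failed_replication_manifest(inventory_csv: str, schema: list[str] = FILE_SCHEMA) -> str:
    """Return a Batch Operations CSV manifest (bucket,key,versionId) for FAILED replications."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(inventory_csv)):
        record = dict(zip(schema, row))
        if record.get("ReplicationStatus") == "FAILED":
            # Keys in CSV inventory are already URL-encoded, as the manifest expects
            writer.writerow([record["Bucket"], record["Key"], record["VersionId"]])
    return out.getvalue()


# Made-up inventory rows; real inventory CSV files quote every value like this.
sample = (
    '"my-bucket","logs/a.gz","v1","true","false","COMPLETED"\n'
    '"my-bucket","logs/b.gz","v2","true","false","FAILED"\n'
)
print(failed_replication_manifest(sample))
# my-bucket,logs/b.gz,v2
```

The resulting file would be uploaded to S3 and referenced as the CSV manifest of a replication job (S3 Batch Replication).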
S3 Security: Block Public Access, encryption by default (SSE-S3 for new objects), and Access Points. Access points give different teams their own hostname for the same bucket, each with its own access point policy; this splits one complex bucket policy into smaller ones that each team can manage. Use them for shared datasets or to restrict access to a VPC only.
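A minimal sketch of a per-team access point policy, assuming a team role that should read and write only under its own prefix. The Region, account ID, access point name, role ARN and prefix are all hypothetical.

```python
import json


def access_point_policy(region: str, account_id: str, ap_name: str,
                        team_role_arn: str, prefix: str) -> dict:
    """Access point policy letting one team's role read and write objects under a prefix."""
    ap_arn = f"arn:aws:s3:{region}:{account_id}:accesspoint/{ap_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": team_role_arn},
            "Action": ["s3:GetObject", "s3:PutObject"],
            # Objects are addressed through the access point ARN, not the bucket ARN
            "Resource": f"{ap_arn}/object/{prefix}*",
        }],
    }


# Hypothetical account, Region, access point and role names.
policy = access_point_policy("us-east-1", "111122223333", "team-a-ap",
                             "arn:aws:iam::111122223333:role/team-a", "team-a/")
print(json.dumps(policy, indent=2))
```

The policy is attached with the S3 Control API, e.g. `boto3.client("s3control").put_access_point_policy(AccountId="111122223333", Name="team-a-ap", Policy=json.dumps(policy))`. A VPC-only access point is created by passing `VpcConfiguration={"VpcId": ...}` to `create_access_point`, and the bucket policy must still allow (or delegate) access through access points.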
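S3 Bucket policy vs ACL vs IAM Policy: a bucket policy is a resource-based policy attached to the bucket and covers the bucket and its objects. ACLs are a legacy mechanism that can grant access to individual objects; AWS recommends keeping them disabled. Use an IAM (identity-based) policy to grant a user or role access across multiple buckets.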
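A minimal sketch of the "IAM policy for multiple buckets" case: an identity-based policy granting read-only access to several buckets. Unlike a bucket policy it has no `Principal`, because it is attached to the user or role. The helper name and bucket names are hypothetical.

```python
import json


def read_only_policy(bucket_names: list[str]) -> dict:
    """Identity-based IAM policy granting read-only access to several buckets."""
    bucket_arns = [f"arn:aws:s3:::{name}" for name in bucket_names]
    return {
        "Version": "2012-10-17",
        "Statement": [
            # ListBucket applies to the bucket ARN itself
            {"Effect": "Allow", "Action": "s3:ListBucket", "Resource": bucket_arns},
            # GetObject applies to the objects inside each bucket
            {"Effect": "Allow", "Action": "s3:GetObject",
             "Resource": [f"{arn}/*" for arn in bucket_arns]},
        ],
    }


# Hypothetical bucket names.
print(json.dumps(read_only_policy(["team-logs", "team-reports"]), indent=2))
```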
IAM Access Analyzer for S3 continuously monitors buckets for public access and for access shared with other accounts, and can block all public access to a bucket with a single click.
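The single-click block corresponds to turning on all four S3 Block Public Access settings. A minimal sketch of that configuration plus a check of an existing one; the helper name `blocks_all_public_access` is hypothetical.

```python
BLOCK_SETTINGS = ("BlockPublicAcls", "IgnorePublicAcls", "BlockPublicPolicy", "RestrictPublicBuckets")

# Configuration with all four Block Public Access settings turned on
BLOCK_ALL_PUBLIC_ACCESS = {name: True for name in BLOCK_SETTINGS}


def blocks_all_public_access(config: dict) -> bool:
    """True only if every Block Public Access setting is enabled in `config`."""
    return all(config.get(name, False) for name in BLOCK_SETTINGS)


print(blocks_all_public_access(BLOCK_ALL_PUBLIC_ACCESS))
print(blocks_all_public_access({"BlockPublicAcls": True}))
```

With boto3 the configuration is applied via `s3.put_public_access_block(Bucket="example-bucket", PublicAccessBlockConfiguration=BLOCK_ALL_PUBLIC_ACCESS)`, and the current settings can be read back from `s3.get_public_access_block(Bucket="example-bucket")["PublicAccessBlockConfiguration"]` (hypothetical bucket name).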
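S3 Object Ownership: lets the bucket owner take ownership of objects that other accounts upload to the bucket. With the ObjectWriter setting (historically the default), the uploading account owns the object; new buckets now default to BucketOwnerEnforced, which disables ACLs and makes the bucket owner own every object.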
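A minimal sketch of the Object Ownership settings and the `OwnershipControls` structure that boto3's `put_bucket_ownership_controls` accepts. The helper name `ownership_controls` is hypothetical; the three setting names are the ones S3 uses.

```python
# The three Object Ownership settings S3 accepts
OWNERSHIP_SETTINGS = {
    "BucketOwnerEnforced": "ACLs disabled; the bucket owner owns every object",
    "BucketOwnerPreferred": "bucket owner owns objects uploaded with the bucket-owner-full-control ACL",
    "ObjectWriter": "the uploading account owns the object",
}


def ownership_controls(setting: str = "BucketOwnerEnforced") -> dict:
    """Build the OwnershipControls structure for one Object Ownership setting."""
    if setting not in OWNERSHIP_SETTINGS:
        raise ValueError(f"unknown Object Ownership setting: {setting}")
    return {"Rules": [{"ObjectOwnership": setting}]}


print(ownership_controls())
```

Applying it would look like `boto3.client("s3").put_bucket_ownership_controls(Bucket="example-bucket", OwnershipControls=ownership_controls())`, with a hypothetical bucket name.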