Azure Storage Blob - How to List and Download Blobs from an Azure Storage Container in Python (PyPI libs)

February 03, 2021

Introduction

In this tutorial we will see:

  • How to instantiate different classes required for talking to Azure storage container
  • How to authenticate
    • if we have account key
    • if we have sas_token
    • No Auth (Just container name)
  • How to use with Proxy
  • List Blobs for a storage container
    • Attributes of each blob object
  • Download blob

Everything will be in Python.

Pre-requisite

This tutorial is based on Python 3.7.

PyPI Dependency

We need the azure-storage-blob package. The code is tested with version 12.7.1.

How to Authenticate and Instantiate

from azure.storage.blob import BlobServiceClient

# consider a dictionary container
container = {
  'account_name': 'your_account_name',
  'container_name': 'your_container_name',
  'sas_token': 'xxxxxxx'
}

# Build the account URL from the account name
account_url = f'https://{container["account_name"]}.blob.core.windows.net'

# Pick the credential based on what is available in the dictionary
if "account_key" in container:
    blob_service = BlobServiceClient(
        account_url=account_url, credential=container["account_key"])
elif "sas_token" in container:
    blob_service = BlobServiceClient(
        account_url=account_url, credential=container["sas_token"])
else:
    # No credential: anonymous access
    blob_service = BlobServiceClient(account_url=account_url)

# Now get the client instance that has the list_blobs method
container_client = blob_service.get_container_client(container['container_name'])

The code above only instantiates the client classes required for the operation and authenticates. In my example, I have a sas_token.
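The credential selection can also be pulled out into small helper functions, which makes the priority order (account key first, then SAS token, then no credential) easy to check without touching Azure. This is only a minimal sketch: `build_account_url` and `pick_credential` are hypothetical helper names, not part of the Azure SDK, and the URL pattern assumes the public Azure cloud endpoint used elsewhere in this tutorial.

```python
def build_account_url(account_name):
    """Return the blob endpoint URL for a storage account (public Azure cloud assumed)."""
    return f"https://{account_name}.blob.core.windows.net"


def pick_credential(container):
    """Return the credential to hand to BlobServiceClient, or None for anonymous access.

    The account key takes priority over the SAS token, mirroring the if/elif above.
    """
    if "account_key" in container:
        return container["account_key"]
    if "sas_token" in container:
        return container["sas_token"]
    return None


# Example with the same dictionary shape used in this tutorial
container = {
    "account_name": "your_account_name",
    "container_name": "your_container_name",
    "sas_token": "xxxxxxx",
}
print(build_account_url(container["account_name"]))
print(pick_credential(container))
```

With these helpers, the three branches should collapse into a single call such as `BlobServiceClient(account_url=build_account_url(name), credential=pick_credential(container))`, since `credential` defaults to `None`.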

Complete Example for list and download blobs (with proxy configuration as well)

import os
from azure.storage.blob import BlobServiceClient

def _create_dirs(dest_path):
    # Create dest_path if it is missing
    if not os.path.exists(dest_path):
        os.makedirs(dest_path)
    # If a regular file is in the way, remove it and create the directory
    elif not os.path.isdir(dest_path):
        os.remove(dest_path)
        os.makedirs(dest_path)

def _get_container_service(container):
    account_url = f'https://{container["account_name"]}.blob.core.windows.net'
    
    proxies = None
    if 'proxy' in container:
        # The account URL uses https, so register the proxy for both schemes
        proxies = {'http': container['proxy'], 'https': container['proxy']}
    # If 'proxy' isn't specified in container block, check if 'https_proxy' is set.
    elif 'https_proxy' in container:
        proxies = {'https': container['https_proxy']}

    # instantiate based upon credential
    if "account_key" in container:
        blob_service = BlobServiceClient(
            account_url=account_url, credential=container["account_key"], proxies=proxies)
    elif "sas_token" in container:
        blob_service = BlobServiceClient(
            account_url=account_url, credential=container["sas_token"], proxies=proxies)
    else:
        blob_service = BlobServiceClient(account_url=account_url, proxies=proxies)

    return blob_service.get_container_client(container['container_name'])

def download_blobs(container, dest_path):
    # You might want to handle some exceptions here
    _create_dirs(dest_path)

    # Get the container client (note: this is not the service client)
    container_client = _get_container_service(container)

    # Note: list_blobs returns an iterator
    blob_list = container_client.list_blobs()

    for blob in blob_list:
        fname = os.path.join(dest_path, blob.name)
        print(f'Downloading {blob.name} to {fname}')

        # get blob client which has download_blob method
        blob_client = container_client.get_blob_client(blob)

        # create base dirs if they do not exist
        _create_dirs(os.path.dirname(fname))

        with open(fname, "wb") as download_file:
            download_file.write(blob_client.download_blob().readall())


## main starts here
local_dest_path = './container_blob'

container = {
    'account_name': 'your_account_name',
    'container_name': 'your_container_name',
    'sas_token': 'xxxxxxx'
}

download_blobs(container, local_dest_path)

The script above is simple to follow. My container has nested directories and files. The code iterates over all the blobs and downloads them one by one.
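If you only need one "directory" of a nested container, `list_blobs` accepts a `name_starts_with` argument that filters by blob name prefix. The sketch below wraps that call in a hypothetical helper, `list_blob_names`, and exercises it with a made-up `FakeContainerClient` so it runs without an Azure account. The fake only imitates the one method used here; with a real container you would pass the container client returned by `get_container_client` instead.

```python
from types import SimpleNamespace


def list_blob_names(container_client, prefix=None):
    """Return blob names from the container, optionally limited to a name prefix.

    container_client is expected to behave like azure.storage.blob.ContainerClient:
    list_blobs(name_starts_with=...) must yield objects with a .name attribute.
    """
    return [blob.name for blob in container_client.list_blobs(name_starts_with=prefix)]


class FakeContainerClient:
    """Hypothetical stand-in for ContainerClient, for illustration only."""

    def __init__(self, names):
        self._names = names

    def list_blobs(self, name_starts_with=None):
        # Yield one object per blob whose name matches the prefix (or all if no prefix)
        for name in self._names:
            if name_starts_with is None or name.startswith(name_starts_with):
                yield SimpleNamespace(name=name)


fake = FakeContainerClient(["fdg/cert_discovery.fdg", "fdg/a.txt", "logs/x.log"])
print(list_blob_names(fake, prefix="fdg/"))
# prints ['fdg/cert_discovery.fdg', 'fdg/a.txt']
```

Filtering on the server side this way avoids iterating over blobs you do not intend to download.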

Attributes of a Blob object

{
  'name': 'fdg/cert_discovery.fdg', 
  'snapshot': None, 
  'content': None, 
  'properties': {
    'blob_type': 'BlockBlob', 
    'last_modified': datetime.datetime(2019, 12, 2, 9, 42, 50, tzinfo=tzutc()), 
    'etag': '0x8D7770BFF1CC8A1', 
    'content_length': 423, 
    'content_range': None, 
    'append_blob_committed_block_count': None, 
    'page_blob_sequence_number': None, 
    'server_encrypted': True, 
    'copy': {
      'id': None, 
      'source': None, 
      'status': None, 
      'progress': None, 
      'completion_time': None, 
      'status_description': None
    }, 
    'content_settings': {
      'content_type': 'application/octet-stream', 
      'content_encoding': None, 
      'content_language': None, 
      'content_disposition': None, 
      'cache_control': None, 
      'content_md5': '3ycLC3CutKkybJtlgvEdsQ=='
    }, 
    'lease': {
      'status': 'unlocked', 
      'state': 'available', 
      'duration': None
    }, 
    'blob_tier': None, 
    'blob_tier_change_time': None, 
    'blob_tier_inferred': False, 
    'deleted_time': None, 
    'remaining_retention_days': None, 
    'creation_time': datetime.datetime(2019, 11, 28, 11, 52, 5, tzinfo=tzutc())
  },
  'metadata': None, 
  'deleted': False
}
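To show how such a dump can be used, the sketch below picks a few commonly needed fields (name, size, content type, last modified time) out of a dictionary shaped like the one above. `summarize_blob` is a hypothetical helper, and the sample is a trimmed copy of the dump. Note the assumption: the objects returned by the SDK may expose these values as attributes rather than dictionary keys, and the field names can differ between SDK versions, so check the dump from your own installation before relying on them.

```python
import datetime


def summarize_blob(blob):
    """Extract a few fields from a blob dump shaped like the dictionary above."""
    props = blob["properties"]
    return {
        "name": blob["name"],
        "size_bytes": props["content_length"],
        "content_type": props["content_settings"]["content_type"],
        "last_modified": props["last_modified"].isoformat(),
    }


# Trimmed copy of the dump above; only the keys used by summarize_blob are kept
sample = {
    "name": "fdg/cert_discovery.fdg",
    "properties": {
        "content_length": 423,
        "last_modified": datetime.datetime(
            2019, 12, 2, 9, 42, 50, tzinfo=datetime.timezone.utc
        ),
        "content_settings": {"content_type": "application/octet-stream"},
    },
}

print(summarize_blob(sample))
```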

Usage with only Python libraries, not Azure libraries

For usage without Azure libraries, see: List and Download Azure blobs by Python Libraries

Let me know if you face any difficulties, and I will try to resolve them.

