Add this skill:

```bash
npx mdskills install sickn33/azure-storage-file-datalake-py
```

Comprehensive reference with clear examples for all major operations and async patterns.
---
name: azure-storage-file-datalake-py
description: |
  Azure Data Lake Storage Gen2 SDK for Python. Use for hierarchical file systems, big data analytics, and file/directory operations.
  Triggers: "data lake", "DataLakeServiceClient", "FileSystemClient", "ADLS Gen2", "hierarchical namespace".
package: azure-storage-file-datalake
---

# Azure Data Lake Storage Gen2 SDK for Python

Hierarchical file system for big data analytics workloads.

## Installation

```bash
pip install azure-storage-file-datalake azure-identity
```

## Environment Variables

```bash
AZURE_STORAGE_ACCOUNT_URL=https://<account>.dfs.core.windows.net
```

## Authentication

```python
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

credential = DefaultAzureCredential()
account_url = "https://<account>.dfs.core.windows.net"

service_client = DataLakeServiceClient(account_url=account_url, credential=credential)
```

## Client Hierarchy

| Client | Purpose |
|--------|---------|
| `DataLakeServiceClient` | Account-level operations |
| `FileSystemClient` | Container (file system) operations |
| `DataLakeDirectoryClient` | Directory operations |
| `DataLakeFileClient` | File operations |

## File System Operations

```python
# Create file system (container)
file_system_client = service_client.create_file_system("myfilesystem")

# Get existing
file_system_client = service_client.get_file_system_client("myfilesystem")

# Delete
service_client.delete_file_system("myfilesystem")

# List file systems
for fs in service_client.list_file_systems():
    print(fs.name)
```

## Directory Operations

```python
file_system_client = service_client.get_file_system_client("myfilesystem")

# Create directory
directory_client = file_system_client.create_directory("mydir")

# Create nested directories
directory_client = file_system_client.create_directory("path/to/nested/dir")

# Get directory client
directory_client = file_system_client.get_directory_client("mydir")

# Delete directory
directory_client.delete_directory()

# Rename/move directory (new_name includes the file system name)
directory_client.rename_directory(new_name="myfilesystem/newname")
```

## File Operations

### Upload File

```python
# Get file client
file_client = file_system_client.get_file_client("path/to/file.txt")

# Upload from local file
with open("local-file.txt", "rb") as data:
    file_client.upload_data(data, overwrite=True)

# Upload bytes
file_client.upload_data(b"Hello, Data Lake!", overwrite=True)

# Append data (for large files)
file_client.append_data(data=b"chunk1", offset=0, length=6)
file_client.append_data(data=b"chunk2", offset=6, length=6)
file_client.flush_data(12)  # Commit; the argument is the total length written
```

### Download File

```python
file_client = file_system_client.get_file_client("path/to/file.txt")

# Download all content
download = file_client.download_file()
content = download.readall()

# Download to file
with open("downloaded.txt", "wb") as f:
    download = file_client.download_file()
    download.readinto(f)

# Download range
download = file_client.download_file(offset=0, length=100)
```

### Delete File

```python
file_client.delete_file()
```

## List Contents

```python
# List paths (files and directories); recursive by default
for path in file_system_client.get_paths():
    print(f"{'DIR' if path.is_directory else 'FILE'}: {path.name}")

# List paths under a directory
for path in file_system_client.get_paths(path="mydir"):
    print(path.name)

# Direct children only
for path in file_system_client.get_paths(path="mydir", recursive=False):
    print(path.name)
```

## File/Directory Properties

```python
# Get properties
properties = file_client.get_file_properties()
print(f"Size: {properties.size}")
print(f"Last modified: {properties.last_modified}")

# Set metadata
file_client.set_metadata(metadata={"processed": "true"})
```

## Access Control (ACL)

```python
# Get ACL
acl = directory_client.get_access_control()
print(f"Owner: {acl['owner']}")
print(f"Permissions: {acl['permissions']}")

# Set ACL
directory_client.set_access_control(
    owner="user-id",
    permissions="rwxr-x---"
)

# Update ACL entries recursively (returns an AccessControlChangeResult)
result = directory_client.update_access_control_recursive(
    acl="user:user-id:rwx"
)
print(f"Directories changed: {result.counters.directories_successful}")
```

## Async Client

```python
import asyncio

from azure.identity.aio import DefaultAzureCredential
from azure.storage.filedatalake.aio import DataLakeServiceClient

async def datalake_operations():
    async with DefaultAzureCredential() as credential:
        async with DataLakeServiceClient(
            account_url="https://<account>.dfs.core.windows.net",
            credential=credential
        ) as service_client:
            file_system_client = service_client.get_file_system_client("myfilesystem")
            file_client = file_system_client.get_file_client("test.txt")

            await file_client.upload_data(b"async content", overwrite=True)

            download = await file_client.download_file()
            content = await download.readall()

asyncio.run(datalake_operations())
```

## Best Practices

1. **Use hierarchical namespace** for file system semantics
2. **Use `append_data` + `flush_data`** for large file uploads
3. **Set ACLs at directory level** and let children inherit them
4. **Use the async client** for high-throughput scenarios
5. **Remember `get_paths` is recursive by default**; pass `recursive=False` for direct children only
6. **Set metadata** for custom file attributes
7. **Consider the Blob API** for simple object storage use cases