Run data processing scripts on Lyceum Cloud with access to your stored files and automatic dependency management.
CSV Analysis
Upload your data to storage (via the dashboard, VS Code extension, or API), then run the analysis script:
# Run analyze.py on a CPU instance
lyceum python run analyze.py -m cpu
# analyze.py
import pandas as pd
import json
# Read from Lyceum storage
df = pd.read_csv('/lyceum/storage/sales_data.csv')
print(f"Loaded {len(df)} rows")
# Analysis
summary = df.groupby('region').agg({
    'revenue': ['sum', 'mean', 'count']
}).round(2)
print(summary)
# Save results back to storage
report = {
    'total_revenue': float(df['revenue'].sum()),
    'total_rows': len(df),
    'top_region': df.groupby('region')['revenue'].sum().idxmax()
}
with open('/lyceum/storage/report.json', 'w') as f:
    json.dump(report, f, indent=2)
print(f"Report saved. Total revenue: ${report['total_revenue']:,.2f}")
Batch File Processing
Process multiple files from storage:
import os
import pandas as pd
storage = '/lyceum/storage/'
csv_files = [f for f in os.listdir(storage) if f.endswith('.csv')]
print(f"Found {len(csv_files)} CSV files")
all_data = []
for filename in csv_files:
    df = pd.read_csv(os.path.join(storage, filename))
    df['source'] = filename
    all_data.append(df)
    print(f" {filename}: {len(df)} rows")
combined = pd.concat(all_data, ignore_index=True)
combined.to_csv(f'{storage}combined.csv', index=False)
print(f"Combined: {len(combined)} total rows")
Data Visualization
Generate plots and save to storage for download:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('/lyceum/storage/data.csv')
fig, ax = plt.subplots(figsize=(10, 6))
df.groupby('month')['revenue'].sum().plot(kind='bar', ax=ax)
ax.set_title('Monthly Revenue')
ax.set_ylabel('Revenue ($)')
plt.tight_layout()
plt.savefig('/lyceum/storage/chart.png', dpi=300)
print("Chart saved to storage")
Docker-based Processing
For complex dependencies, use a Docker container:
lyceum docker run jupyter/scipy-notebook:latest \
    -c "python /lyceum/storage/pipeline.py" \
    -m cpu \
    -t 600
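The pipeline.py referenced above is just a script you have uploaded to storage beforehand. As an illustration only, a minimal pipeline using the SciPy stack bundled with the jupyter/scipy-notebook image might look like this (the data.csv filename, the revenue column, and the z-score cutoff are assumptions for the example):
# pipeline.py - hypothetical example; adjust filenames and columns to your data
import pandas as pd
from scipy import stats

# Read the uploaded input from Lyceum storage (assumed filename)
df = pd.read_csv('/lyceum/storage/data.csv')

# Flag rows whose revenue is more than 3 standard deviations from the mean
df['revenue_z'] = stats.zscore(df['revenue'])
outliers = df[df['revenue_z'].abs() > 3]

# Write results back to storage so they persist after the container exits
outliers.to_csv('/lyceum/storage/outliers.csv', index=False)
print(f"Flagged {len(outliers)} of {len(df)} rows as outliers")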
Files saved to /lyceum/storage/ persist across executions. Upload input data first via the dashboard, VS Code extension, or API, then reference it in your scripts.
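If you are unsure whether an upload has landed, a quick check at the top of your script gives a clearer error than a failed read. A minimal sketch, assuming the input file is named sales_data.csv:
from pathlib import Path

input_path = Path('/lyceum/storage/sales_data.csv')  # assumed filename
if not input_path.exists():
    raise FileNotFoundError(
        f"{input_path} not found - upload it via the dashboard, "
        "VS Code extension, or API before running this script"
    )
print(f"Found input: {input_path} ({input_path.stat().st_size:,} bytes)")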
For large datasets, consider using a higher-spec machine type. CPU instances provide 4 vCPU and 16 GB RAM by default.
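If a single CSV will not fit comfortably in that memory, you can also stream it in chunks instead of loading it all at once. A sketch of chunked aggregation with pandas, assuming the same region and revenue columns as the earlier example:
import pandas as pd

# Accumulate per-region revenue totals without holding the whole file in memory
totals = {}
for chunk in pd.read_csv('/lyceum/storage/sales_data.csv', chunksize=100_000):
    for region, revenue in chunk.groupby('region')['revenue'].sum().items():
        totals[region] = totals.get(region, 0.0) + revenue
print(totals)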