GPT Prompts
Prompt 1: Incident Response Playbook Development
As an SRE at [Company], I am tasked with enhancing our incident response protocols. Could you assist in developing a comprehensive incident response playbook that includes:
- A detailed incident classification system (e.g., P1, P2, P3) with corresponding response procedures.
- Escalation paths and communication templates for internal and external stakeholders.
- Post-incident review processes to identify root causes and implement corrective actions.
Please provide the playbook in a structured format, incorporating industry best practices and real-world examples.
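To make the classification request concrete, the prompt can be accompanied by a rough sketch of how a P1/P2/P3 decision might be encoded. The example below is a minimal illustration only: the function name, the inputs, and the 50% and 5% thresholds are hypothetical and should be replaced with whatever criteria [Company] actually uses.

```python
def classify_incident(users_affected_pct: float, core_function_down: bool) -> str:
    """Return a priority label (P1, P2, or P3) for an incident.

    The thresholds are hypothetical placeholders, not an industry standard.
    """
    if not 0.0 <= users_affected_pct <= 100.0:
        raise ValueError("users_affected_pct must be between 0 and 100")
    # Assumption: any outage of a core function is treated as top priority.
    if core_function_down or users_affected_pct >= 50.0:
        return "P1"
    # Assumption: a noticeable but partial impact is treated as P2.
    if users_affected_pct >= 5.0:
        return "P2"
    return "P3"


if __name__ == "__main__":
    print(classify_incident(users_affected_pct=12.0, core_function_down=False))
```

Sharing a sketch like this with the prompt gives the model a starting rule set to critique and refine rather than inventing one from scratch.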
Prompt 2: Service Level Objectives (SLOs) and Error Budget Policy
To balance feature development with system reliability, I need to establish clear Service Level Objectives (SLOs) and an error budget policy. Could you help by:
- Defining measurable SLOs for key services, including availability and latency targets.
- Creating an error budget framework that outlines acceptable risk levels and triggers for corrective actions.
- Suggesting monitoring tools and dashboards to track SLO compliance in real time.
Provide the output as a policy document with practical examples and implementation guidelines.
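The error budget arithmetic behind this prompt can be sketched as follows. This is a minimal illustration, assuming an availability SLO measured over a rolling window; the 30-day default and the function names are assumptions, not part of the prompt.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability SLO over a window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    total_minutes = window_days * 24 * 60
    return total_minutes * (1.0 - slo_target)


def budget_remaining(slo_target: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget_minutes(slo_target, window_days)
    return (budget - downtime_minutes) / budget


if __name__ == "__main__":
    # A 99.9% target over 30 days allows roughly 43.2 minutes of downtime.
    print(round(error_budget_minutes(0.999), 1))
    print(round(budget_remaining(0.999, downtime_minutes=21.6), 2))
```

A policy document could then attach triggers to the remaining fraction, for example pausing feature releases once it drops below an agreed level.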
Prompt 3: Automation of Repetitive Operational Tasks
To improve efficiency and reduce manual intervention, I aim to automate routine operational tasks. Could you assist in:
- Identifying high-impact tasks suitable for automation (e.g., log rotation, system health checks).
- Recommending automation tools (e.g., Ansible, Terraform) and scripting languages (e.g., Python, Bash) for implementation.
- Providing sample scripts or playbooks for automating these tasks.
Please present the information in a tabular format, detailing tasks, tools, and example scripts.
Prompt 4: Capacity Planning and Scalability Assessment
As part of our growth strategy, I need to conduct a capacity planning and scalability assessment. Could you help by:
- Developing a methodology to forecast resource requirements based on historical data and projected growth.
- Identifying potential bottlenecks in the current infrastructure and suggesting mitigation strategies.
- Recommending tools for capacity monitoring and predictive analysis (e.g., Prometheus, Grafana).
Provide the assessment in a report format, including charts and data-driven insights.
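As a reference point for the forecasting methodology requested above, one simple approach is a straight-line trend fitted to historical usage. The sketch below uses ordinary least squares and assumes evenly spaced samples (for example, one per month); the function name and the sample figures are hypothetical, and real workloads may need seasonal or non-linear models instead.

```python
def linear_forecast(history: list[float], periods_ahead: int) -> float:
    """Extrapolate a least-squares line fitted to evenly spaced samples."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    # Slope = covariance(x, y) / variance(x), with x = 0, 1, ..., n-1.
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(history))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The last observed sample sits at x = n - 1.
    return intercept + slope * (n - 1 + periods_ahead)


if __name__ == "__main__":
    # Hypothetical monthly storage usage in GB.
    usage = [100.0, 110.0, 120.0, 130.0]
    print(linear_forecast(usage, periods_ahead=2))
```

Comparing the forecast against provisioned capacity gives a first estimate of when a resource would run out under the current trend.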
Prompt 5: Implementation of Chaos Engineering Practices
To enhance system resilience, I plan to introduce chaos engineering experiments. Could you assist in:
- Designing experiments to simulate failures (e.g., server crashes, network latency) in a controlled environment.
- Establishing safety measures to prevent unintended consequences during testing.
- Creating a framework for analyzing experiment outcomes and implementing improvements.
Please provide a step-by-step guide with examples of successful chaos engineering practices.
Prompt 6: Development of a Comprehensive Monitoring Strategy
I need to develop a monitoring strategy that provides end-to-end visibility into our systems. Could you help by:
- Identifying key performance indicators (KPIs) for infrastructure, applications, and user experience.
- Recommending monitoring tools (e.g., Nagios, Datadog) and log management solutions (e.g., ELK Stack).
- Designing alerting mechanisms with appropriate thresholds to minimize false positives.
Provide the strategy in a document format, including diagrams of the monitoring architecture.
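One common way to reduce false positives, which the alerting bullet above asks about, is to fire only after a threshold has been breached for several consecutive samples. The following is a minimal sketch of that idea; the function name, the threshold of 90, and the default of three consecutive samples are assumptions for illustration.

```python
from typing import Iterable


def should_alert(samples: Iterable[float], threshold: float,
                 consecutive: int = 3) -> bool:
    """Return True once `consecutive` samples in a row exceed `threshold`."""
    if consecutive < 1:
        raise ValueError("consecutive must be at least 1")
    run = 0
    for value in samples:
        # A single sample at or below the threshold resets the streak.
        run = run + 1 if value > threshold else 0
        if run >= consecutive:
            return True
    return False


if __name__ == "__main__":
    cpu_percent = [91, 95, 80, 92, 93, 94]  # hypothetical readings
    print(should_alert(cpu_percent, threshold=90))
```

A brief spike (the first two readings) does not trigger an alert here, while the sustained breach at the end does.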
Prompt 7: Disaster Recovery and Business Continuity Planning
To ensure preparedness for unforeseen events, I need to develop a disaster recovery and business continuity plan. Could you assist in:
- Conducting a risk assessment to identify critical systems and potential vulnerabilities.
- Defining recovery time objectives (RTOs) and recovery point objectives (RPOs) for essential services.
- Creating a step-by-step recovery procedure, including data backup strategies and failover mechanisms.
Please present the plan in a structured format, with checklists and placeholder fields for key personnel contact information.
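The RTO and RPO targets in this prompt can be checked against measured values with very little code. The sketch below assumes that worst-case data loss is approximately the interval between backups (it ignores backup duration and replication lag) and that recovery time comes from a timed drill; all names and figures are hypothetical.

```python
from datetime import timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Worst-case data loss is assumed to equal the backup interval."""
    return backup_interval <= rpo


def meets_rto(measured_recovery: timedelta, rto: timedelta) -> bool:
    """Compare a recovery time measured in a drill against the objective."""
    return measured_recovery <= rto


if __name__ == "__main__":
    # Hypothetical service: hourly backups, 4-hour RPO, 2-hour RTO.
    print(meets_rpo(timedelta(hours=1), rpo=timedelta(hours=4)))
    print(meets_rto(timedelta(hours=3), rto=timedelta(hours=2)))
```

Running such checks after every recovery drill shows whether the stated objectives are still realistic.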
Prompt 8: Security Compliance and Vulnerability Management
To maintain compliance and secure our systems, I need to establish a vulnerability management program. Could you help by:
- Developing a process for regular vulnerability scanning and assessment.
- Prioritizing vulnerabilities based on impact and likelihood of exploitation.
- Recommending remediation strategies, including scanning tools (e.g., Nessus, OpenVAS) and a patch management workflow.
Provide the program outline in a policy document, including schedules and responsible parties.
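The prioritization step in this prompt is often expressed as a simple risk matrix, where a score is the product of impact and likelihood. The sketch below illustrates that idea with hypothetical 1-to-5 scales and made-up findings; it is not a substitute for an established scoring system such as CVSS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    name: str
    impact: int      # 1 (minor) to 5 (critical), hypothetical scale
    likelihood: int  # 1 (unlikely) to 5 (almost certain), hypothetical scale

    @property
    def risk_score(self) -> int:
        # Classic risk-matrix product; higher means more urgent.
        return self.impact * self.likelihood


def prioritize(findings: list[Finding]) -> list[Finding]:
    """Return findings ordered from highest to lowest risk score."""
    return sorted(findings, key=lambda f: f.risk_score, reverse=True)


if __name__ == "__main__":
    # Made-up findings for illustration only.
    backlog = [
        Finding("outdated TLS configuration", impact=3, likelihood=2),
        Finding("unpatched remote code execution", impact=5, likelihood=4),
        Finding("verbose error pages", impact=1, likelihood=3),
    ]
    for finding in prioritize(backlog):
        print(finding.risk_score, finding.name)
```

The ordered list can feed directly into the remediation schedule the policy document defines.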
Prompt 9: Continuous Integration and Continuous Deployment (CI/CD) Pipeline Optimization
To enhance our deployment efficiency, I aim to optimize our CI/CD pipeline. Could you assist in:
- Identifying bottlenecks in the current pipeline and suggesting improvements.
- Recommending tools for automated testing, code quality analysis, and deployment (e.g., Jenkins, GitLab CI/CD).
- Establishing best practices for version control, rollback procedures, and environment consistency.
Please provide the optimization plan in a detailed document with flowcharts illustrating the proposed pipeline.
Prompt 10: Implementation of Infrastructure as Code (IaC) Practices
To improve infrastructure management, I plan to implement Infrastructure as Code (IaC) practices. Could you help by:
- Selecting appropriate IaC tools (e.g., Terraform, CloudFormation) based on our technology stack.
- Developing templates for provisioning resources consistently across environments.
- Establishing version control and collaboration workflows for infrastructure code.
Provide the implementation plan in a guide format, including sample code snippets and best practices.