GPT Prompts.

Prompt 1: Incident Response Playbook Development

As an SRE at [Company], I am tasked with enhancing our incident response protocols. Could you assist in developing a comprehensive incident response playbook that includes:

  1. A detailed incident classification system (e.g., P1, P2, P3) with corresponding response procedures.
  2. Escalation paths and communication templates for internal and external stakeholders.
  3. Post-incident review processes to identify root causes and implement corrective actions.

Please provide the playbook in a structured format, incorporating industry best practices and real-world examples.


Prompt 2: Service Level Objectives (SLOs) and Error Budget Policy

To balance feature development with system reliability, I need to establish clear Service Level Objectives (SLOs) and an error budget policy. Could you help by:

  1. Defining measurable SLOs for key services, including availability and latency targets.
  2. Creating an error budget framework that outlines acceptable risk levels and triggers for corrective actions.
  3. Suggesting monitoring tools and dashboards to track SLO compliance in real time.

Provide the output as a policy document with practical examples and implementation guidelines.
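
For reference when reviewing the output, the arithmetic behind an availability error budget can be sketched as below. This is a minimal illustration, not a prescribed policy: the 30-day window, the 99.9% target, and the function names are assumptions chosen for the example.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability SLO over a window.

    slo_target is a fraction, e.g. 0.999 for "three nines" (assumed example).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction between 0 and 1")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * total_minutes


def budget_remaining(slo_target: float, window_days: int,
                     downtime_minutes: float) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget_minutes(slo_target, window_days)
    return (budget - downtime_minutes) / budget


if __name__ == "__main__":
    budget = error_budget_minutes(0.999, window_days=30)
    left = budget_remaining(0.999, 30, downtime_minutes=10.8)
    print(f"budget: {budget:.1f} min, remaining: {left:.0%}")
```

A policy trigger could then be expressed as a threshold on `budget_remaining`, for example pausing feature releases once the remaining fraction drops below a value the team agrees on.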


Prompt 3: Automation of Repetitive Operational Tasks

To improve efficiency and reduce manual intervention, I aim to automate routine operational tasks. Could you assist in:

  1. Identifying high-impact tasks suitable for automation (e.g., log rotation, system health checks).
  2. Recommending automation tools (e.g., Ansible, Terraform) and scripting languages (e.g., Python, Bash) for implementation.
  3. Providing sample scripts or playbooks for automating these tasks.

Please present the information in a tabular format, detailing tasks, tools, and example scripts.


Prompt 4: Capacity Planning and Scalability Assessment

As part of our growth strategy, I need to conduct a capacity planning and scalability assessment. Could you help by:

  1. Developing a methodology to forecast resource requirements based on historical data and projected growth.
  2. Identifying potential bottlenecks in the current infrastructure and suggesting mitigation strategies.
  3. Recommending tools for capacity monitoring and predictive analysis (e.g., Prometheus, Grafana).

Provide the assessment in a report format, including charts and data-driven insights.
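
As a point of comparison for the forecasting methodology, a simple baseline is a straight-line trend fitted to historical usage. The sketch below assumes evenly spaced samples (for example, monthly peak usage) and linear growth; real workloads may be seasonal or non-linear, and the function name and sample data are hypothetical.

```python
def linear_forecast(history, periods_ahead):
    """Extrapolate a least-squares trend line fitted to evenly spaced samples.

    history: usage values, oldest first (assumed one sample per period).
    periods_ahead: how many periods past the last sample to forecast.
    """
    n = len(history)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    # Ordinary least-squares slope and intercept.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + periods_ahead)


if __name__ == "__main__":
    # Hypothetical monthly peak storage usage in GB.
    monthly_peak_gb = [100, 110, 120, 130]
    print(linear_forecast(monthly_peak_gb, periods_ahead=2))
```

Comparing the forecast against provisioned capacity gives a rough estimate of when headroom runs out; a more careful assessment would add confidence intervals and account for seasonality.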


Prompt 5: Implementation of Chaos Engineering Practices

To enhance system resilience, I plan to introduce chaos engineering experiments. Could you assist in:

  1. Designing experiments to simulate failures (e.g., server crashes, network latency) in a controlled environment.
  2. Establishing safety measures to prevent unintended consequences during testing.
  3. Creating a framework for analyzing experiment outcomes and implementing improvements.

Please provide a step-by-step guide with examples of successful chaos engineering practices.


Prompt 6: Development of a Comprehensive Monitoring Strategy

I need to develop a monitoring strategy that provides end-to-end visibility into our systems. Could you help by:

  1. Identifying key performance indicators (KPIs) for infrastructure, applications, and user experience.
  2. Recommending monitoring tools (e.g., Nagios, Datadog) and log management solutions (e.g., ELK Stack).
  3. Designing alerting mechanisms with appropriate thresholds to minimize false positives.

Provide the strategy in a document format, including diagrams of the monitoring architecture.


Prompt 7: Disaster Recovery and Business Continuity Planning

To ensure preparedness for unforeseen events, I need to develop a disaster recovery and business continuity plan. Could you assist in:

  1. Conducting a risk assessment to identify critical systems and potential vulnerabilities.
  2. Defining recovery time objectives (RTOs) and recovery point objectives (RPOs) for essential services.
  3. Creating a step-by-step recovery procedure, including data backup strategies and failover mechanisms.

Please present the plan in a structured format, with checklists and placeholders for the contact information of key personnel.
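
To sanity-check proposed objectives, the relationship between backup frequency, recovery steps, and RTO/RPO can be sketched as follows. This assumes the worst-case data loss is roughly one full backup interval and that recovery steps run sequentially; the function names and figures are illustrative only.

```python
from datetime import timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """True if the worst-case data loss fits within the RPO.

    Assumes a failure can occur just before the next backup, so up to
    one full backup interval of data may be lost.
    """
    return backup_interval <= rpo


def meets_rto(recovery_steps, rto: timedelta) -> bool:
    """True if the sequential recovery steps fit within the RTO."""
    return sum(recovery_steps, timedelta()) <= rto


if __name__ == "__main__":
    # Hypothetical figures for one service.
    print(meets_rpo(timedelta(hours=1), rpo=timedelta(minutes=15)))
    steps = [timedelta(minutes=10), timedelta(minutes=30), timedelta(minutes=15)]
    print(meets_rto(steps, rto=timedelta(hours=1)))
```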


Prompt 8: Security Compliance and Vulnerability Management

To maintain compliance and secure our systems, I need to establish a vulnerability management program. Could you help by:

  1. Developing a process for regular vulnerability scanning and assessment.
  2. Prioritizing vulnerabilities by risk, based on impact and likelihood.
  3. Recommending scanning tools (e.g., Nessus, OpenVAS) and remediation strategies, including patch management.

Provide the program outline in a policy document, including schedules and responsible parties.
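
One common way to express risk-based prioritization is a score equal to impact multiplied by likelihood. The sketch below assumes 1-to-5 scales for both factors and uses made-up findings; a real program might use CVSS scores or another scheme instead.

```python
from dataclasses import dataclass


@dataclass
class Vulnerability:
    name: str
    impact: int      # assumed scale: 1 (low) to 5 (critical)
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)

    @property
    def risk_score(self) -> int:
        """Simple risk score: impact multiplied by likelihood."""
        return self.impact * self.likelihood


def prioritize(vulns):
    """Return vulnerabilities ordered from highest to lowest risk score."""
    return sorted(vulns, key=lambda v: v.risk_score, reverse=True)


if __name__ == "__main__":
    # Hypothetical findings for illustration.
    findings = [
        Vulnerability("outdated TLS configuration", impact=3, likelihood=2),
        Vulnerability("unauthenticated admin endpoint", impact=5, likelihood=4),
        Vulnerability("verbose error pages", impact=1, likelihood=5),
    ]
    for v in prioritize(findings):
        print(v.risk_score, v.name)
```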


Prompt 9: Continuous Integration and Continuous Deployment (CI/CD) Pipeline Optimization

To enhance our deployment efficiency, I aim to optimize our CI/CD pipeline. Could you assist in:

  1. Identifying bottlenecks in the current pipeline and suggesting improvements.
  2. Recommending tools for automated testing, code quality analysis, and deployment (e.g., Jenkins, GitLab CI/CD).
  3. Establishing best practices for version control, rollback procedures, and environment consistency.

Please provide the optimization plan in a detailed document with flowcharts illustrating the proposed pipeline.


Prompt 10: Implementation of Infrastructure as Code (IaC) Practices

To improve infrastructure management, I plan to implement Infrastructure as Code (IaC) practices. Could you help by:

  1. Selecting appropriate IaC tools (e.g., Terraform, CloudFormation) based on our technology stack.
  2. Developing templates for provisioning resources consistently across environments.
  3. Establishing version control and collaboration workflows for infrastructure code.

Provide the implementation plan in a guide format, including sample code snippets and best practices.