The Internet is a critical infrastructure supporting our information-based society. Our dependence on this infrastructure is similar to our dependence on the world's power grids and global transportation systems. Failures of the network infrastructure, or of major applications running on it, can carry enormous financial and social costs, with serious consequences for the organizations and consumers that depend on these services. Supporting reliable networks and networked application services involves some of the most complex engineering and operational challenges faced in any industry. This unique and valuable guide addresses the challenges faced by service providers and the approaches they use to deliver reliable services to their users. The book offers a systematic, interdisciplinary approach and covers practical problems arising in real, operational deployments. Leading practitioners and researchers present their unique perspectives on the subject matter, and provide best practices to help translate ideas into practice.
Topics and features:
- Introduces the challenges of building reliable networks and services, presenting an overview of the structure of a large ISP backbone network
- Examines network reliability modeling and network planning, covering router elements and their failure modes, and providing a theoretical grounding in performance and reliability modeling
- Investigates inter-domain reliability and overlay networks, presenting existing solutions and possible future research directions, and reviewing how overlay applications can increase network resilience
- Explores the critical function of network configuration management, and considers auditing from the perspective of bottom-up, network-wide configuration validation
- Discusses network measurement, with a focus on reliability and performance monitoring, covering both data-plane and control-plane measurements
- Covers network management systems and the tasks involved in supporting day-to-day operations of an IP network, including fault diagnosis, trouble-shooting, security management, and disaster recovery
- Presents an approach to the design and development of reliable network application software, and provides a comprehensive overview of server capacity and performance engineering

This comprehensive text/reference on reliable Internet services and applications is suitable for an advanced undergraduate or graduate course, and will be of value to researchers seeking to understand the challenges faced by service providers and to identify areas that are ripe for research. The guidebook will be particularly useful to practitioners who want to broaden their understanding of the field, and/or to deepen their knowledge of the fundamentals.

Charles R. Kalmanek is Vice President of Networking and Services Research in AT&T Labs. Dr. Sudip Misra is an Assistant Professor at the School of IT, Indian Institute of Technology Kharagpur. Dr. Y. Richard Yang is an Associate Professor of Computer Science and Electrical Engineering at Yale University.