Stanford, California, United States
10K followers 500+ connections

About

Christos Kozyrakis is a Professor of Electrical Engineering and Computer Science at…

Experience & Education

  • Stanford University

Publications

  • Convolution engine: balancing efficiency & flexibility in specialized computing

    Proceedings of the 40th Annual International Symposium on Computer Architecture / ACM

  • Towards Energy-Proportional Datacenter Memory with Mobile DRAM

    Proceedings of the 39th Intl. Symposium on Computer Architecture

  • Decoupling Datacenter Studies from Access to Large-Scale Applications: A Modeling Approach for Storage Workloads

    IEEE International Symposium on Workload Characterization (IISWC)

    We propose a modeling and characterization framework for large-scale storage applications. As part of this framework we use a state diagram-based storage model, extend it to a hierarchical representation and implement a tool that consistently recreates I/O loads of DC applications. We present the principal features of the framework that allow accurate modeling and generation of storage workloads and the validation process performed against ten original DC application traces. Furthermore, using our framework, we perform an in-depth, per-thread characterization of these applications and provide insights on their behavior. Finally, we explore two practical applications of this methodology: SSD caching and defragmentation benefits on enterprise storage. In both cases we observe significant speedup for most of the examined applications. Since knowledge of the workload’s spatial and temporal locality is necessary to model these use cases, our framework was instrumental in quantifying their performance benefits. The proposed methodology provides a detailed understanding on the storage activity of large-scale applications and enables a wide spectrum of storage studies without the requirement for access to real applications and full application deployment.

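    The state diagram-based storage model described in the abstract above can be illustrated with a minimal sketch: a Markov chain over I/O states whose random walk emits a synthetic request trace. The state names and transition probabilities below are illustrative assumptions, not details taken from the paper.

    ```python
    import random

    # Hypothetical state diagram for an I/O workload generator.
    # Each state maps to the probabilities of moving to the next state.
    STATES = ["seq_read", "rand_read", "seq_write", "idle"]
    TRANSITIONS = {
        "seq_read":  {"seq_read": 0.7, "rand_read": 0.1, "seq_write": 0.1, "idle": 0.1},
        "rand_read": {"seq_read": 0.2, "rand_read": 0.6, "seq_write": 0.1, "idle": 0.1},
        "seq_write": {"seq_read": 0.1, "rand_read": 0.1, "seq_write": 0.7, "idle": 0.1},
        "idle":      {"seq_read": 0.3, "rand_read": 0.3, "seq_write": 0.3, "idle": 0.1},
    }

    def generate_trace(n_requests, seed=42):
        """Walk the state diagram and emit a synthetic sequence of I/O states."""
        rng = random.Random(seed)
        state = "idle"
        trace = []
        for _ in range(n_requests):
            probs = TRANSITIONS[state]
            # Pick the next state according to the outgoing edge weights.
            state = rng.choices(list(probs), weights=list(probs.values()))[0]
            trace.append(state)
        return trace

    trace = generate_trace(1000)
    ```

    A hierarchical version, as in the paper, would nest such diagrams (e.g. per-thread sub-models inside an application-level model); this sketch shows only a single flat level.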
  • Accurate Modeling and Generation of Storage I/O for Datacenter workloads

    Exascale Evaluation and Research Techniques Workshop (EXERT 2011)

    Tools that confidently recreate I/O workloads have become a critical requirement in designing efficient storage systems for datacenters (DCs), since potential inefficiencies get aggregated over several thousand servers. Designing performance, power and cost optimized systems requires a deep understanding of target workloads, and mechanisms to effectively model different design choices. Traditional benchmarking is invalid in cloud datastores, representative storage profiles are hard to obtain, while replaying the entire application in all storage configurations is impractical. Despite these issues, current workload generators are not comprehensive enough to accurately reproduce key aspects of real application patterns. Some of these features include spatial and temporal locality, as well as tuning the intensity of the workload to emulate different storage system behaviors. To address these limitations, we use a state diagram-based storage model, extend it to a hierarchical representation and implement a tool that consistently recreates I/O loads of DC applications. We present the design of the tool and the validation process performed against six original DC application traces. We explore the practical applications of this methodology in two important storage challenges: 1) SSD caching and 2) defragmentation benefits on enterprise storage. In both cases we observe significant storage speedup for most of the DC applications. Since knowledge of the workload's spatial locality is necessary to model these use cases, our tool was instrumental in quantifying their performance benefits.

  • Server Engineering Insights for Large Scale Online Services

    IEEE Micro Issue on Datacenter Computing, Vol. 30, No. 4

    The rapid growth of online services in the last decade has led to the development of large datacenters to host these workloads. These large-scale, online, user-facing services have unique engineering and capacity provisioning design requirements. The authors explore these requirements, focusing on systems balancing, the impact of technology trends, and the challenges of online services workloads.

  • Eigenbench: A Simple Exploration Tool for Orthogonal TM Characteristics

    IISWC'10 (best paper award)

  • Comparing memory systems for chip multiprocessors

    Proceedings of the 34th annual international symposium on Computer architecture (ISCA '07)

    There are two basic models for the on-chip memory in CMP systems: hardware-managed coherent caches and software-managed streaming memory. This paper performs a direct comparison of the two models under the same set of assumptions about technology, area, and computational capabilities. The goal is to quantify how and when they differ in terms of performance, energy consumption, bandwidth requirements, and latency tolerance for general-purpose CMPs. We demonstrate that for data-parallel applications, the cache-based and streaming models perform and scale equally well. For certain applications with little data reuse, streaming scales better due to better bandwidth use and macroscopic software prefetching. However, the introduction of techniques such as hardware prefetching and non-allocating stores to the cache-based model eliminates the streaming advantage. Overall, our results indicate that there is not sufficient advantage in building streaming memory systems where all on-chip memory structures are explicitly managed. On the other hand, we show that streaming at the programming model level is particularly beneficial, even with the cache-based model, as it enhances locality and creates opportunities for bandwidth optimizations. Moreover, we observe that stream programming is actually easier with the cache-based model because the hardware guarantees correct, best-effort execution even when the programmer cannot fully regularize an application's code.

  • Evaluating MapReduce for Multi-core and Multiprocessor Systems

    Won best paper award at the IEEE High Performance Computer Architecture conference


Patents

  • Low power programmable image processor

    Issued US 14/492,535

    A convolution image processor includes a load and store unit, a shift register unit, and a mapping unit. The load and store unit is configured to load and store image pixel data and allow for unaligned access of the image pixel data. The shift register is configured to load and store at least a portion of the image pixel data from the load and store unit and concurrently provide access to each image pixel value in the portion of the image pixel data. The mapping unit is configured to generate a number of shifted versions of image pixel data and corresponding stencil data from the portion of the image pixel data, and concurrently perform one or more operations on each image pixel value in the shifted versions of the portion of the image pixel data and a corresponding stencil value in the corresponding stencil data.

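    The shifted-window multiply-accumulate that the patent's mapping unit performs in hardware corresponds, in software terms, to a 2D stencil convolution. The sketch below is a minimal, illustrative Python reference implementation, not the patented hardware design; leaving border pixels at zero is an assumption for simplicity.

    ```python
    def convolve2d(image, stencil):
        """Naive 2D stencil convolution over a row-major list-of-lists image.

        For each interior pixel, multiply the surrounding window by the
        stencil values and accumulate -- the operation the mapping unit
        performs concurrently across shifted versions of the pixel data.
        Border pixels (where the window would fall off the image) stay 0.
        """
        h, w = len(image), len(image[0])
        k = len(stencil)          # stencil is assumed square, odd-sized
        r = k // 2                # window radius
        out = [[0] * w for _ in range(h)]
        for y in range(r, h - r):
            for x in range(r, w - r):
                acc = 0
                for dy in range(k):
                    for dx in range(k):
                        acc += image[y + dy - r][x + dx - r] * stencil[dy][dx]
                out[y][x] = acc
        return out
    ```

    The hardware version avoids the inner loops entirely: the shift register exposes all window positions at once, and the mapping unit evaluates the multiply-accumulates in parallel.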
