San Francisco Bay Area
3K followers 500+ connections

About

Management and technical leadership experience in AI-powered enterprise software, cloud…

Experience & Education

  • Salesforce

Volunteer Experience

  • Salesforce Corporate Volunteer in Tanzania, Africa

    Make A Difference Now ( http://www.gomadnow.org/ )

    - 3 years 8 months

    Education

    Helped improve education in Tanzania by helping build a school library, equipping the tech center with laptops, tablets, and Wi-Fi, and holding technology workshops for students and teachers. Spent time in Tanzania with children at a MAD-supported orphanage on reading, Q&A, and sports activities. Later sponsored a MAD delegation to the Salesforce Dreamforce conference.

  • Cisco Corporate Sponsor to Texas A&M University

    Cisco University Research Program

    - 1 year 4 months

    Science and Technology

    Sponsored two Cisco research grants to the Department of Computer Science at Texas A&M University. Hosted intern students from Texas A&M.

  • Board member, VP, co-President, President

    Fudan University Alumni Association in Northern California (FDAANC)

    - 6 years

    1000+ members

Publications

  • Performance Characterization of Multi-thread and Multi-core Processors based XML Application Oriented Networking Systems.

    Journal of Parallel and Distributed Computing, Volume 70, Issue 1

    There is a growing trend to insert application intelligence into network devices. Processors in this type of Application Oriented Networking (AON) devices are required to handle both packet-level network I/O intensive operations as well as XML message-level CPU intensive operations. In this paper, we investigate the performance effect of symmetric multi-processing (SMP) via (1) hardware multi-threading, (2) uni-processor to dual-processor architectures, and (3) single to dual and quad core processing, on both packet-level and XML message-level traffic. We use AON systems based on Intel Xeon processors with hyperthreading, Pentium M based dual-core processors, and Intel's dual quad-core Xeon E5335 processors. We analyze and cross-examine the SMP effect from both high level performance as well as processor microarchitectural perspectives. The evaluation results will not only provide insight to microprocessor designers, but also help system architects of AON types of device to select the right processors.

  • Virtual Machine Scalability on Multi-Core Processor Based Servers for Cloud Computing Workloads

    IEEE International Conference on Networking, Architecture, and Storage (IEEE NAS-2009)

    In this paper, we analyze virtual machine (VM) scalability on multi-core systems for compute-, memory-, and network I/O-intensive workloads. The VM scalability evaluation under these three workloads will help cloud users to understand the performance impact of underlying system and network architectures. We demonstrate that VMs on the state-of-the-art multi-core processor based systems scale as well as multiple threads on native SMP kernel for CPU and memory intensive workloads. Intra-VM communication of network I/O intensive TCP message workload has a lower overhead compared to multiple threads when VMs are pinned to specific cores. However, VM scalability is severely limited for such workloads for across-VM communication on a single host due to virtual bridges. For across local and wide area network communication, the network bandwidth is the limiting factor. Unlike previous studies that use workload mixes, we apply a single workload type at a time to clearly attribute VM scalability bottlenecks to system and network architectures or virtualization itself.

  • A Scalable Multithreaded L7-filter Design for Multi-Core Servers

    ACM/IEEE Symposium on Architectures for Networking & Communications Systems (ANCS)

    L7-filter is a significant component in Linux's QoS framework that classifies network traffic based on application layer data. It enables subsequent distribution of network resources in respect to the priority of applications. Considerable research has been reported to deploy multi-core architectures for computationally intensive applications. Unfortunately, the proliferation of multi-core architectures has not helped fast packet processing due to: 1) the lack of efficient parallelism in legacy network programs, and 2) the non-trivial configuration for scalable utilization on multi-core servers. In this paper, we propose a highly scalable parallelized L7-filter system architecture with affinity-based scheduling on a multi-core server. We start with an analytical study of the system architecture based on an offline design. Similar to Receive Side Scaling (RSS) in the NIC, we develop a model to explore the connection level parallelism in L7-filter and propose an affinity-based scheduler to optimize system scalability. Performance results show that our optimized L7-filter has superior scalability over the naive multithreaded version. It improves system performance by about 50% when all the cores are deployed.

  • Benchmarking Stream-Based XPath Engines Supporting Simultaneous Queries for Service Oriented Networking

    IEEE International Conference on Global Telecommunications (IEEE GLOBECOM'2008)

    Stream-based simultaneous XPath processing plays a critical role in service oriented networking, where the processing must scale well in terms of concurrent input streams and number of XPath queries. However, the existing literature offers no benchmarks or evaluation methodology for stream-based XPath engines supporting simultaneous queries. In this paper, we describe a novel benchmarking methodology for evaluating XPath engines which handle simultaneous queries on streaming traffic. With a structured data model, query model, and control model in our benchmark, we conduct well-controlled experiments to assess and isolate various performance factors. We also demonstrate that our structured, quantified approach, with wide data set coverage, enables accurate performance measurements and easy bottleneck isolation of a real-world XPath engine implementation.

  • XML Document Parsing: Operational and Performance Characteristics

    IEEE COMPUTER Magazine

    Parsing is an expensive operation that can degrade XML processing performance. A survey of four representative XML parsing models, DOM, SAX, StAX, and VTD, reveals their suitability for different types of applications.

  • A Novel Service-Aware Message Scheduler for Cisco Application Oriented Networking Systems

    International Conference on Computer Communications and Networks (IEEE ICCCN-08)

    Cisco Systems application oriented networking (AON) product is an important network element towards building next generation service oriented intelligent information network (IIN). AON processes application-level content and moves far beyond a conventional content-aware Web switch. It creates a novel content delivery platform and allows more sophisticated load balancing schemes to be deployed in a switch among back-end servers to reduce user-perceived response time. In this paper, we investigate different scheduling techniques for an AON system to maximize overall throughput and minimize latency per message for a heterogeneous server cluster consisting of different application servers. Based on a thorough evaluation of the three existing load balancing algorithms of AON, we propose a novel message type based service adaptive scheduling algorithm that makes AON more efficient and more intelligent. Systematic performance measurements, analyses, and comparisons are conducted to demonstrate the superiority of our intelligent message scheduling technique.

  • Dual Processor Performance Characterization for XML Application-Oriented Networking

    International Conference on Parallel Processing (ICPP-2007)

    There is a growing trend to insert application intelligence into network devices. Processors in this type of application-oriented networking (AON) device are required to handle both packet-level network I/O intensive operations and XML message-level CPU intensive operations. In this paper, we investigate the performance effect of dual processing via (1) hyperthreading, (2) uni-processor to dual-processor, and (3) single-core to dual-core, on both packet-level and XML message-level traffic. We analyze and cross-examine the dual processing effect from both high-level performance and processor microarchitectural perspectives. We employ on-chip performance counters to measure cycles per instruction, cache misses, bus utilization, and branch mispredictions for this work. Our results show a significant improvement in dual-core Pentium M processor over Hyperthreaded Xeon processor for AON workload. These results will not only provide insight to processor designers, but also help architects of AON devices to select from alternative processors with restrictions to use one or two physical CPUs due to space and power consumption limitations.

  • Performance Characterization of the Pentium Pro Processor

    IEEE International Symposium on High Performance Computer Architecture (IEEE HPCA-3)

    In this paper, we characterize the performance of several business and technical benchmarks on a Pentium Pro (1st-gen Xeon) processor based system. Key architectural data are collected using a performance monitoring counter tool and then are analyzed with the understanding of Pentium Pro microarchitecture and cache/memory organization. Results show that the Pentium Pro processor achieves significantly lower cycles per instruction than the Pentium processor due to its out-of-order and speculative execution, and non-blocking cache and memory system. Its higher clock frequency also contributes to even higher performance.

  • Finite Buffer Analysis of Multistage Interconnection Networks

    IEEE Transactions on Computers, Vol 43, No 2

    Proposes an analysis technique for a class of Multistage Interconnection Networks (MIN's) that have finite buffers at their switch inputs and operate in a synchronous packet-switched mode. The authors examine the issue of clock period in design and analysis of synchronous MIN's and propose a model based on small clock periods. Then they analyze their "small cycle" design and compare the results with those obtained from the standard "big cycle" model that is currently used. The significant performance improvement of their model is shown based on various clock width, data width, and buffer length.

    Other authors
    • Laxmi Bhuyan
  • An Adaptive Submesh Allocation Strategy for Two-Dimensional Mesh Connected Systems

    International Conference on Parallel Processing (ICPP-1993)

    In this paper, we propose an adaptive scan (AS) strategy for submesh allocation. The earlier frame sliding (FS) strategy allocates submeshes based on fixed orientations of incoming tasks. It also slides frames on mesh planes by fixed strides. Our AS allocation strategy differs from the FS strategy in the following two ways: (1) it does not fix the orientations of incoming tasks; (2) it scans on mesh planes adaptively. Experimental studies show that our AS strategy outperforms the FS strategy in terms of external fragmentation, completion time, and processor utilization.

    Other authors
    • Laxmi Bhuyan

Patents

  • Systems and Methods for Determining Optimal Cost-to-Serve for Cloud Applications in the Public Cloud

    Issued US20210141708

    Systems and methods for an elastic cost-to-serve system including a first module to orchestrate an elastic server set; a second module to orchestrate a load test and to apply one or more use-case scenarios for each orchestrated server set; a third module to generate a cost metrics model of the orchestrated server set for predictive cost modeling; a fourth module coupled to the third module to collect a plurality of performance metrics across the server resources and associated client devices; a fifth module to post-process the collected performance metrics across a load testing duration and to provide analytics of the server set performance; and a sixth module coupled to analyze the performance metrics adapting available resources and to apply a heuristic of the cost metrics model to predict a model of cost optimization of the server set.

  • Priority-driven boxcarring of action requests from component-driven cloud applications

    Issued US20180007166A1

    Improved perceived load time for browser and mobile application pages is achieved by adjusting boxcarring of action requests from coupled data consuming applications on the user device, using the priority level of regions and components in component-driven cloud applications. Priority labels differentiate among display regions rendered by the data consuming application and the priority labels further differentiate among components within respective display regions. The middleware application batches the action requests into batches based at least in part on the priority labels, into boxcars segregated by priority label according to a predetermined segregation schedule, and dispatches the boxcars of batched action requests to the server. Performance is also dynamically speeded up, by adjusting inter-boxcar intervals used to dispatch batches of action requests from the user device to a production server, based on the dynamically measured network communication latency between the user device and the server.

  • Dynamic adjustment of boxcarring of action requests from component-driven cloud applications

    Issued US20180007165A1

    Performance of web pages and mobile device applications with multiple components rendered on a user device is dynamically speeded up, including dynamically measuring network communication latency, adjusting inter-boxcar intervals used to dispatch batches of action requests from the user device to a production server, and dispatching boxcarred requests to the server. Adjustments to the boxcar intervals are based on the dynamically measured network communication latency and a number of connections supported between the user device and the server. The measured network communication latency is calculated as dispatch-to-completed response time minus server processing time and the server processing time is received from the server for a boxcar of completed responses. The system adjusts according to feedback received, as a browser or mobile device changes network connections or the network conditions change, and adapting over time for a particular user. Inter-boxcar intervals are tunable and programmatically changeable, with values learned from experience.

  • CHARACTERIZATION OF NETWORK LATENCY USING BOXCARRING OF ACTION REQUESTS FROM COMPONENT-DRIVEN CLOUD APPLICATIONS

    Filed US20190230192

    Method embodiments are disclosed for characterizing network latency for a component of a webpage provided by an application server device, using boxcarring of action requests. The method comprises measuring the network latency for a component provided by an application server device. A latency category is established based on the network latency. An action request of a user occurring within a queue wait time is associated with the latency category. The action request of the user associated with the latency category is enqueued into an enqueued action request, which is batched in a boxcar to create a batched action request. The batched action request is dispatched in the boxcar to the application server device. The queue wait time is adapted based on an updated network latency and a transmission status of the action request of the user and the batched action request. System and computer program product embodiments are also disclosed.

  • Facilitating dynamic creation of multi-column index tables and management of customer queries in an on-demand services environment

    Filed US20140317093A1

    In accordance with embodiments, there are provided mechanisms and methods for facilitating dynamic creation of multi-column index tables and management of customer queries in an on-demand services environment in a multi-tenant environment according to one embodiment. In one embodiment and by way of example, a method includes receiving, at a computing device, a query having one or more filters relating to one or more data type columns of database at a primary table. The primary table may include an object table. The method may further include calculating a hash number based on an index identifier corresponding to the one or more filters, and determining a first key at a secondary table based on the calculated hash number. The secondary table may include an index table, and the first key may be mapped with a second key corresponding to one or more rows at the primary table. The method may further include obtaining data from the one or more rows of the primary table, where the data includes filtered data corresponding to the one or more data type columns.

Recommendations received

14 people have recommended Jason
