OpenCL Actors – Adding Data Parallelism to Actor-Based Programming with CAF

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 10789)

Abstract

The actor model of computation was designed for seamless support of concurrency and distribution. However, it remains unspecific about data-parallel program flows, while the processing power available on modern many-core hardware such as graphics processing units (GPUs) or coprocessors increases the relevance of data parallelism for general-purpose computation.

In this work, we introduce OpenCL-enabled actors to the C++ Actor Framework (CAF). They offer a high-level interface for accessing any OpenCL device without leaving the actor paradigm. The new type of actor is integrated into the runtime environment of CAF and gives rise to transparent message passing in distributed systems on heterogeneous hardware. Following the actor logic in CAF, OpenCL kernels can be composed while encapsulated in C++ actors, and hence operate in a multi-stage fashion on data resident at the GPU. Developers are thus enabled to build complex data-parallel programs from primitives without leaving the actor paradigm or sacrificing performance. Our evaluations on commodity GPUs, an Nvidia Tesla, and an Intel Xeon Phi reveal the expected linear scaling behavior when offloading larger workloads. For sub-second tasks, the efficiency of offloading was found to differ considerably between devices. Moreover, our findings indicate a negligible overhead compared to programming with the native OpenCL API.
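
The composition idea described above can be illustrated with a minimal sketch in standard C++. It deliberately does not use the CAF or OpenCL APIs: the names `device_buffer`, `stage`, `compose`, `scale_by_two`, and `prefix_sum` are hypothetical stand-ins chosen for illustration, and ordinary functions play the role of kernel-wrapping actors. The point is only that stages exchange a handle to the data rather than the data itself, so intermediate results can stay where they are between stages.

```cpp
#include <functional>
#include <memory>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical stand-in for a buffer that stays resident on a device.
// Stages pass this handle along instead of copying the underlying data.
using device_buffer = std::shared_ptr<std::vector<int>>;

// A stage models an actor wrapping one kernel (an assumption of this sketch):
// it receives a buffer handle, works on the data, and forwards the handle.
using stage = std::function<device_buffer(device_buffer)>;

// Compose two stages so that the handle returned by `first` feeds `second`.
stage compose(stage first, stage second) {
  return [first = std::move(first),
          second = std::move(second)](device_buffer buf) {
    return second(first(std::move(buf)));
  };
}

// Element-wise "kernel": every element is processed independently.
device_buffer scale_by_two(device_buffer buf) {
  for (auto& x : *buf)
    x *= 2;
  return buf;
}

// Second "kernel": inclusive prefix sum, computed in place on the same data.
device_buffer prefix_sum(device_buffer buf) {
  std::partial_sum(buf->begin(), buf->end(), buf->begin());
  return buf;
}
```

Calling `compose(scale_by_two, prefix_sum)` on a buffer holding 1, 2, 3, 4 yields 2, 6, 12, 20 in the same buffer, and the returned handle is the one that was passed in. In the setting of the paper, the stages would be actors and the handle would travel in messages; this sketch only mirrors the data flow.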

Acknowledgments

The authors would like to thank Marian Triebe and Sebastian Bartels for implementing benchmarks, testing, and bugfixing. We further want to thank Matthias Vallentin for raising the indexing use case, and the iNET working group for vivid discussions and inspiring suggestions. Funding by the German Federal Ministry of Education and Research within the projects ScaleCast and X–CHECK is gratefully acknowledged.

Author information

Authors and Affiliations

Department Computer Science, Hamburg University of Applied Sciences, Hamburg, Germany
Raphael Hiesgen, Dominik Charousset & Thomas C. Schmidt

Authors

Raphael Hiesgen
Dominik Charousset
Thomas C. Schmidt

Corresponding author

Correspondence to Thomas C. Schmidt.

Editor information

Editors and Affiliations

University of Bologna, Cesena, Italy
Alessandro Ricci
KTH Royal Institute of Technology, Stockholm, Sweden
Philipp Haller

About this chapter

Cite this chapter

Hiesgen, R., Charousset, D., Schmidt, T.C. (2018). OpenCL Actors – Adding Data Parallelism to Actor-Based Programming with CAF. In: Ricci, A., Haller, P. (eds) Programming with Actors. Lecture Notes in Computer Science, vol 10789. Springer, Cham. https://doi.org/10.1007/978-3-030-00302-9_3

DOI: https://doi.org/10.1007/978-3-030-00302-9_3
Published: 07 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00301-2
Online ISBN: 978-3-030-00302-9
eBook Packages: Computer Science, Computer Science (R0)