BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Europe/Stockholm
X-LIC-LOCATION:Europe/Stockholm
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:19700329T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:19701025T030000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20220812T074334Z
LOCATION:Osaka Room
DTSTART;TZID=Europe/Stockholm:20220627T111500
DTEND;TZID=Europe/Stockholm:20220627T114500
UID:submissions.pasc-conference.org_PASC22_sess174_pap113@linklings.com
SUMMARY:Parallel Memory-Efficient Computation of Symmetric Higher-Order Jo
 int Moment Tensors
DESCRIPTION:Paper\n\nParallel Memory-Efficient Computation of Symmetric
  Higher-Order Joint Moment Tensors\n\nLi\, Kolla\, Phipps\n\nThe
  decomposition of higher-order joint cumulant tensors of spatio-temporal
  data sets is useful in analyzing multivariate non-Gaussian statistics
  with a wide variety of applications (e.g.\, anomaly detection\,
  independent component analysis\, dimensionality reduction). Computing
  the cumulant tensor often requires first computing the joint moment
  tensor of the input data\, which is very expensive using a naïve
  algorithm. The current state-of-the-art algorithm exploits the
  symmetry of a moment tensor by dividing it into smaller cubic tensor
  blocks and computing only the blocks with unique values\, thereby
  reducing computation. We propose a refactoring of this algorithm by
  posing its computation as matrix operations\, specifically Khatri-Rao
  products and standard matrix multiplications. An analysis of the
  computational and cache complexity indicates significant performance
  savings due to the refactoring. Implementations of our refactored
  algorithm in Julia show speedups of up to 10x over the reference
  algorithm in single-processor experiments. We describe multiple levels
  of hierarchical parallelism inherent in the refactored algorithm and
  present an implementation using an advanced programming model that
  shows similar speedups in experiments run on an NVIDIA GPU.\n\nDomain:
  Computer Science and Applied Mathematics\, Physics
END:VEVENT
END:VCALENDAR
