Preprint

#### Abstract

In federated learning, communication cost is often a critical bottleneck in scaling up distributed optimization algorithms that collaboratively learn a model from millions of devices with potentially unreliable or limited communication and heterogeneous data distributions. Two notable trends for dealing with the communication overhead of federated algorithms are gradient compression and local computation with periodic communication. Despite many attempts, characterizing the relationship between these two approaches has proven elusive. We address this by proposing a set of algorithms with periodic compressed (quantized or sparsified) communication and analyzing their convergence properties in both homogeneous and heterogeneous local data distribution settings. For the homogeneous setting, our analysis improves existing bounds by providing tighter convergence rates for both strongly convex and non-convex objective functions. To mitigate data heterogeneity, we introduce a local gradient tracking scheme and obtain sharp convergence rates that match the best-known communication complexities without compression for convex, strongly convex, and non-convex settings. We complement our theoretical results and demonstrate the effectiveness of our proposed methods through several experiments on real-world datasets.
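
A minimal NumPy sketch of the two ingredients the abstract combines: a compression operator (top-k sparsification, plus a QSGD-style unbiased stochastic quantizer as the quantized alternative) and a communication round in which each worker runs several local steps before sending a compressed model update. The function names, the choice of compressing the model difference, and plain averaging at the server are illustrative assumptions, not the paper's algorithms; the local gradient tracking scheme for heterogeneous data is not shown.

```python
import numpy as np

def top_k_sparsify(v: np.ndarray, k: int) -> np.ndarray:
    """Keep the k largest-magnitude entries of v and zero out the rest."""
    out = np.zeros_like(v)
    k = min(k, v.size)
    if k <= 0:
        return out
    idx = np.argpartition(np.abs(v), -k)[-k:]   # indices of the k largest |v_j|
    out[idx] = v[idx]
    return out

def stochastic_quantize(v: np.ndarray, levels: int, rng: np.random.Generator) -> np.ndarray:
    """Unbiased stochastic quantization of |v| / ||v||_2 onto `levels` uniform levels."""
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return np.zeros_like(v)
    scaled = np.abs(v) / norm * levels           # values in [0, levels]
    lower = np.floor(scaled)
    prob_up = scaled - lower                     # round up with this probability
    rounded = lower + (rng.random(v.shape) < prob_up)
    return np.sign(v) * rounded * norm / levels  # E[output] == v

def compressed_local_sgd_round(x, grads, lr, tau, k):
    """One round (illustrative assumption): each worker takes `tau` local gradient
    steps from the shared model x, sends a top-k sparsified model update, and the
    server averages the compressed updates."""
    deltas = []
    for grad in grads:                           # one gradient oracle per worker
        local = x.copy()
        for _ in range(tau):
            local -= lr * grad(local)
        deltas.append(top_k_sparsify(local - x, k))
    return x + np.mean(deltas, axis=0)
```

Compressing the difference `local - x` rather than the raw model keeps the transmitted vector small when local progress is small; swapping `top_k_sparsify` for `stochastic_quantize` gives the quantized variant.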

#### Targeted Data-driven Regularization for Out-of-Distribution Generalization

Conference Paper: International Conference on Knowledge Discovery and Data Mining (ACM SIGKDD 2020)

#### Abstract

Due to biases introduced by large real-world datasets, deviations of deep learning models from their expected behavior on out-of-distribution test data are worrisome, especially when data come from imbalanced or heavy-tailed label distributions or from minority groups of a sensitive feature. Classical approaches to address these biases are mostly data- or application-dependent and, hence, burdensome to tune. Some meta-learning approaches, on the other hand, aim to learn hyperparameters during training using different objective functions on training and validation data. However, these methods suffer from high computational complexity and do not scale to large datasets. In this paper, we propose a unified data-driven regularization approach to learn a generalizable model from biased data. The proposed framework, dubbed targeted data-driven regularization (TDR), is model- and dataset-agnostic, and employs a target dataset that resembles the desired nature of test data in order to guide the learning process in a coupled manner. We cast the problem as a bilevel optimization and propose an efficient method based on stochastic gradient descent to solve it. The framework can be used to alleviate various types of biases in real-world applications. We empirically show, on both synthetic and real-world datasets, the superior performance of TDR in resolving issues stemming from these biases.

#### Learning to Quantize Deep Neural Networks: A Competitive-Collaborative Approach

with Md Fahim Faysal Khan, Vijaykrishnan Narayanan and Mehrdad Mahdavi
Conference Paper: 57th ACM/IEEE Design Automation Conference (DAC 2020)

#### Abstract

By reducing model size and computation cost for dedicated AI accelerator designs, neural network quantization methods have recently attracted considerable attention. Unfortunately, merely minimizing quantization loss using a constant discretization causes accuracy deterioration. In this paper, we propose competitive-collaborative quantization (CCQ), an iterative, accuracy-driven learning framework that gradually adapts the bit precision of each individual layer. Unlike prior quantization policies that keep the first and last layers of the network at full precision, CCQ offers layer-wise competition for any target quantization policy, together with holistic layer fine-tuning to recover accuracy, so that state-of-the-art networks can be entirely quantized without significant accuracy degradation.
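
To make "bit precision of each individual layer" concrete, here is a hedged sketch of uniform symmetric fake-quantization applied under a per-layer bit-width policy. It only shows what a layer-wise policy does to the weights; the competitive-collaborative search CCQ uses to choose the bit widths, and the fine-tuning that recovers accuracy, are not reproduced. `quantize_symmetric`, `quantize_model`, and the layer names are hypothetical.

```python
import numpy as np

def quantize_symmetric(w: np.ndarray, bits: int) -> np.ndarray:
    """Uniform symmetric fake-quantization of a weight tensor to `bits` bits."""
    if bits < 2:
        raise ValueError("bits must be >= 2 for signed symmetric quantization")
    qmax = 2 ** (bits - 1) - 1                   # e.g. 127 for 8 bits, 7 for 4 bits
    scale = np.max(np.abs(w)) / qmax
    if scale == 0.0:
        return np.zeros_like(w)
    q = np.clip(np.round(w / scale), -qmax, qmax)  # integer codes
    return q * scale                             # dequantized values for the forward pass

def quantize_model(layers: dict[str, np.ndarray], bit_policy: dict[str, int]) -> dict[str, np.ndarray]:
    """Apply a layer-wise bit-precision policy to every layer, first and last included."""
    return {name: quantize_symmetric(w, bit_policy[name]) for name, w in layers.items()}
```

For example, `quantize_model({"first": w1, "last": w2}, {"first": 8, "last": 4})` quantizes the first and last layers as well, which is the case that prior full-precision policies leave out.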

#### Pareto-Efficient Fairness in Supervised Learning

with Rana Forsati and Mehrdad Mahdavi
Preprint

#### Abstract

TBA.

with Yuyang Deng and Mehrdad Mahdavi
Preprint

#### Abstract

Investigations of the degree of personalization in federated learning algorithms have shown that only maximizing the performance of the global model confines the capacity of the local models to personalize. In this paper, we advocate an adaptive personalized federated learning (APFL) algorithm, where each client trains its local model while contributing to the global model. Theoretically, we show, using multi-domain learning theory, that the mixture of local and global models can reduce the generalization error. We also propose a communication-reduced bilevel optimization method, which reduces the number of communication rounds to $$O(\sqrt{T})$$, and show that under strong convexity and smoothness assumptions the proposed algorithm achieves a convergence rate of $$O(1/T)$$ with some residual error. The residual error is related to the gradient diversity among local models and the gap between the optimal local and global models.
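
The personalized model in APFL is a mixture of a client's local model and the global model. The sketch below assumes the mixture $$\bar{v} = \alpha v + (1-\alpha) w$$ with a scalar weight $$\alpha \in [0, 1]$$ learned by gradient descent, and shows one local step on a single client; periodic averaging of $$w$$ across clients and the communication-reduced schedule are omitted. Function and parameter names are hypothetical.

```python
import numpy as np

def apfl_client_step(w, v, alpha, grad_f, lr, lr_alpha):
    """One local step of an APFL-style update on a single client (sketch).

    w: client's copy of the global model, v: personalized local model,
    alpha: mixing weight in [0, 1], grad_f: gradient of the client's loss.
    """
    v_bar = alpha * v + (1.0 - alpha) * w        # mixed (personalized) model
    g_mix = grad_f(v_bar)
    w_new = w - lr * grad_f(w)                   # step on the global model copy
    v_new = v - lr * alpha * g_mix               # chain rule: d v_bar / d v = alpha
    # chain rule: d f(v_bar) / d alpha = <grad f(v_bar), v - w>
    alpha_new = float(np.clip(alpha - lr_alpha * np.dot(g_mix, v - w), 0.0, 1.0))
    return w_new, v_new, alpha_new
```

Clipping keeps $$\alpha$$ a valid convex weight; $$\alpha = 0$$ recovers the global model and $$\alpha = 1$$ a purely local one.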

Preprint

TBA.

Preprint

#### Abstract

Growing assessments of fairness in machine learning algorithms have shown that dimensionality reduction methods such as PCA treat data from different sensitive groups unfairly. In particular, when data from different groups are aggregated, the reconstruction error of the learned subspace becomes biased toward some populations, which can inherently hurt or benefit those groups and leads to an unfair representation. On the other hand, alleviating this bias to protect sensitive groups when learning the optimal projection leads to a higher overall reconstruction error. This introduces a trade-off between the sacrifices and benefits of sensitive groups on one side and the overall reconstruction error on the other. In this paper, in pursuit of fairness criteria in PCA, we introduce a more efficient notion of Pareto fairness, cast Pareto-fair dimensionality reduction as a multi-objective optimization problem, and propose an adaptive gradient-based algorithm to solve it. Using the notion of Pareto optimality, we can guarantee that the solution of our proposed algorithm lies on the Pareto frontier for all groups, which achieves the optimal trade-off between the aforementioned conflicting objectives. The framework also generalizes efficiently to sensitive features with multiple groups. We provide a convergence analysis of our algorithm for both convex and non-convex objectives and show its efficacy through empirical studies on different datasets, in comparison with the state-of-the-art algorithm.
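
The paper's adaptive algorithm is not reproduced here. As a swapped-in illustration of gradient-based multi-objective optimization with two groups, the sketch computes each group's PCA reconstruction error and the standard two-objective min-norm (MGDA-style) combination of their gradients: stepping along $$-d$$ decreases both objectives to first order whenever $$d \neq 0$$, and $$d = 0$$ marks a Pareto-stationary point. Function names are hypothetical.

```python
import numpy as np

def group_reconstruction_error(X: np.ndarray, U: np.ndarray) -> float:
    """Mean squared reconstruction error of the rows of X projected onto span(U).

    U is a d x k matrix with orthonormal columns.
    """
    residual = X - X @ U @ U.T
    return float(np.mean(np.sum(residual ** 2, axis=1)))

def min_norm_direction(g1: np.ndarray, g2: np.ndarray) -> tuple[np.ndarray, float]:
    """Min-norm convex combination d = a * g1 + (1 - a) * g2 of two gradients."""
    diff = g1 - g2
    denom = float(np.dot(diff, diff))
    if denom == 0.0:                              # identical gradients
        return g1.copy(), 0.5
    # minimizer of ||a * g1 + (1 - a) * g2||^2 over a, clipped to [0, 1]
    a = float(np.clip(np.dot(g2, g2 - g1) / denom, 0.0, 1.0))
    return a * g1 + (1.0 - a) * g2, a
```

With one gradient per group's reconstruction error, a zero `d` means no direction improves both groups at once, which is the trade-off the abstract describes.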

#### Local SGD with Periodic Averaging: Tighter Analysis and Adaptive Synchronization

Conference Paper: Neural Information Processing Systems (NeurIPS 2019)

#### Abstract

Communication overhead is one of the key challenges that hinder the scalability of distributed optimization algorithms. In this paper, we study local distributed SGD, where data are partitioned among computation nodes, and the nodes perform local updates while periodically exchanging their models to average them. While local SGD has been empirically shown to provide promising results, a theoretical understanding of its performance remains open. We strengthen the convergence analysis of local SGD and show that it can be far less expensive and applied far more generally than current theory suggests. Specifically, we show that for loss functions that satisfy the Polyak-Łojasiewicz condition, $$O((pT)^{1/3})$$ rounds of communication suffice to achieve a linear speedup, that is, an error of $$O(1/(pT))$$, where $$p$$ is the number of workers and $$T$$ is the total number of model updates at each worker. This contrasts with previous work, which required more communication rounds and was limited to strongly convex loss functions to achieve similar asymptotic performance. We also develop an adaptive synchronization scheme that provides a general condition for linear speedup. Finally, we validate the theory with experiments on AWS EC2 and an internal GPU cluster.
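
A minimal NumPy simulation of local SGD with periodic averaging, assuming toy quadratic losses $$f_i(x) = \tfrac{1}{2}\|x - c_i\|^2$$ (one per worker, so the data are heterogeneous) and additive Gaussian gradient noise. Each of the $$p$$ workers performs `tau` local updates between averaging steps, so there are `rounds` communications while each worker makes `tau * rounds` updates (the $$T$$ above). It illustrates the protocol only; it does not implement the adaptive synchronization scheme or reproduce the stated rates, and the names are hypothetical.

```python
import numpy as np

def local_sgd(centers, lr=0.1, tau=5, rounds=50, noise_std=0.0, seed=0):
    """Local SGD with periodic averaging on f_i(x) = 0.5 * ||x - c_i||^2 (sketch)."""
    rng = np.random.default_rng(seed)
    centers = np.asarray(centers, dtype=float)   # shape (p, d): one optimum per worker
    p, d = centers.shape
    x = np.zeros(d)
    for _ in range(rounds):
        local = np.tile(x, (p, 1))               # every worker starts from the average
        for _ in range(tau):                     # tau local updates, no communication
            grads = local - centers + noise_std * rng.standard_normal((p, d))
            local -= lr * grads
        x = local.mean(axis=0)                   # one communication round: average models
    return x
```

Because these toy losses share the same curvature, noiseless averaging converges to the global minimizer (the mean of the $$c_i$$) for any `tau`; with non-identical curvatures, local steps would introduce a drift that more frequent averaging reduces.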

#### Targeted Meta-Learning for Critical Incident Detection in Weather Data

Conference Paper: Accepted for presentation at the Workshop on "Climate Change: How Can AI Help?", 36th International Conference on Machine Learning (ICML 2019)

#### Abstract

Due to the imbalanced or heavy-tailed nature of weather- and climate-related datasets, the performance of standard deep learning models deviates significantly from their expected behavior on test data. Classical methods to address these issues are mostly data- or application-dependent and, hence, burdensome to tune. Meta-learning approaches, on the other hand, aim to learn hyperparameters during training using different objective functions on training and validation data. However, these methods suffer from high computational complexity and do not scale to large datasets. In this paper, we apply a novel framework, named targeted meta-learning, to rectify this issue, and show its efficacy in dealing with the aforementioned biases in datasets. This framework employs a small, well-crafted target dataset that resembles the desired nature of test data in order to guide the learning process in a coupled manner. We empirically show, in a bow echo detection case study, that this framework can overcome the bias issue common to weather-related datasets.

#### Trading Redundancy for Communication: Speeding up Distributed SGD for Non-convex Optimization

Conference Paper: Accepted for presentation at the 36th International Conference on Machine Learning (ICML 2019)

#### Abstract

Communication overhead is one of the key challenges that hinder the scalability of distributed optimization algorithms for training large neural networks. In recent years, there has been a great deal of research on alleviating communication cost by compressing the gradient vector or by using local updates and periodic model averaging. In this paper, we advocate the use of redundancy toward communication-efficient distributed stochastic algorithms for non-convex optimization. In particular, we show, both theoretically and empirically, that by properly infusing redundancy into the training data together with model averaging, it is possible to significantly reduce the number of communication rounds. More precisely, we show that redundancy reduces the residual error in local averaging, thereby reaching the same level of accuracy with fewer rounds of communication than previous algorithms. Our empirical studies on the CIFAR10, CIFAR100, and ImageNet datasets in a distributed environment complement our theoretical results; they show that our algorithms have additional beneficial aspects, including tolerance to failures and greater gradient diversity compared with other algorithms.
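
The abstract does not spell out how redundancy is infused into the training data, so the following is only a hypothetical illustration of one simple layout: the data are cut into one shard per worker, and each worker also stores the next `redundancy - 1` shards cyclically, so every sample lives on exactly `redundancy` workers. The function name and the cyclic layout are assumptions.

```python
def redundant_assignment(num_samples: int, num_workers: int, redundancy: int) -> list[list[int]]:
    """Assign sample indices to workers so that every sample is stored on
    exactly `redundancy` workers (hypothetical cyclic scheme).

    redundancy=1 is plain partitioning with no overlap.
    """
    if not 1 <= redundancy <= num_workers:
        raise ValueError("redundancy must be between 1 and num_workers")
    # Split indices 0..num_samples-1 into num_workers contiguous shards.
    bounds = [num_samples * s // num_workers for s in range(num_workers + 1)]
    shards = [list(range(bounds[s], bounds[s + 1])) for s in range(num_workers)]
    assignment = []
    for i in range(num_workers):
        owned = []
        for j in range(redundancy):              # worker i holds shards i, ..., i+r-1 (mod p)
            owned.extend(shards[(i + j) % num_workers])
        assignment.append(owned)
    return assignment
```

Larger `redundancy` makes workers' local data overlap more, which is one intuitive reason local models drift apart less between averaging rounds; it also means losing a worker does not lose any sample while `redundancy >= 2`.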

#### CAPTAIN: Comprehensive Composition Assistance for Photo Taking

with Farshid Farhat and James Z. Wang
Journal Paper: Work in progress.

#### Abstract

Many people are interested in taking astonishing photos and sharing them with others. Emerging high-tech hardware and software make digital photography ubiquitous and increasingly functional. Because composition matters in photography, researchers have leveraged some common composition techniques to assess the aesthetic quality of photos computationally. However, the composition techniques developed by professionals are far more diverse than well-documented techniques can cover. We leverage the vast, underexplored innovations in photography for computational composition assistance. We propose a comprehensive framework, named CAPTAIN (Composition Assistance for Photo Taking), that integrates deep-learned semantic detectors, sub-genre categorization, artistic pose clustering, personalized aesthetics-based image retrieval, and style set matching. The framework is backed by a large dataset crawled from a photo-sharing website whose users are mostly photography enthusiasts and professionals. The work proposes a sequence of steps that have not previously been explored by researchers. It addresses personal preferences for composition by presenting the user with a ranked list of photographs based on user-specified weights in the similarity measure. The matching algorithm recognizes the best shot among a sequence of shots with respect to the user's preferred style set. We have conducted a number of experiments on the newly proposed components and report our findings. A user study demonstrates that the work is useful to those taking photos.
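
One simple way to produce a ranked list of photographs from user-specified weights in a similarity measure is a weighted sum of per-aspect similarities. The sketch assumes cosine similarity over precomputed feature vectors for each aspect (for example, pose or scene features); the aspect names, the feature representation, and the function are hypothetical and not CAPTAIN's actual matching algorithm.

```python
import numpy as np

def rank_by_weighted_similarity(query_feats, db_feats, weights):
    """Rank database photos by a user-weighted sum of per-aspect cosine similarities.

    query_feats: dict aspect -> 1-D feature vector of the query scene.
    db_feats: dict aspect -> 2-D array (num_photos x dim) of database features.
    weights: dict aspect -> non-negative user-specified weight.
    Returns photo indices, most similar first.
    """
    num_photos = next(iter(db_feats.values())).shape[0]
    score = np.zeros(num_photos)
    for aspect, w in weights.items():
        q = query_feats[aspect] / np.linalg.norm(query_feats[aspect])
        D = db_feats[aspect] / np.linalg.norm(db_feats[aspect], axis=1, keepdims=True)
        score += w * (D @ q)                     # cosine similarity of each photo to the query
    return np.argsort(-score, kind="stable")     # highest score first, ties keep input order
```

Changing only `weights` re-ranks the same database, which is how a user could emphasize, say, pose over scene without recomputing any features.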

#### Take It or Leave It: A Survey Study on Operating System Upgrade Practices

with Sadegh Farhang, Jake Weidman, Jens Grossklags, and Peng Liu
Conference Paper: Proceedings of the Annual Computer Security Applications Conference (ACSAC 2018).

#### Intelligent Portrait Composition Assistance Integrating Deep-learned Models and Photography Idea Retrieval

with Farshid Farhat, Sahil Mishra, and James Z. Wang
Conference Paper: Proceedings of ACM Multimedia 2017.

#### Abstract

Retrieving photography ideas corresponding to a given location facilitates the use of smart cameras, as amateurs and enthusiasts are highly interested in taking astonishing photos at any time and in any location. Existing research captures some aesthetic techniques and retrieves useful feedback based on one technique. However, these methods are restricted to a particular technique, and their retrieved results have room for improvement, as they are limited by the quality of the query. There is a lack of a holistic framework that captures the important aspects of a given scene and gives a novice photographer informative feedback for taking a better shot in their photography adventures. This work proposes an intelligent framework for portrait composition using our deep-learned models and image retrieval methods. A highly rated, web-crawled portrait dataset is exploited for retrieval purposes. Our framework detects and extracts the ingredients of a given scene, represented as a correlated hierarchical model. It then matches the extracted semantics against the dataset of aesthetically composed photos to retrieve a ranked list of photography ideas, and gradually optimizes the human pose and other artistic aspects of the scene to be captured. The user study we conducted demonstrates that our approach is more helpful than the other constructed feedback retrieval systems.

#### Skeleton Matching with Applications in Severe Weather Detection

with Farshid Farhat, Stephen Wistar, and James Z. Wang
Journal Paper: Journal of Applied Soft Computing, Elsevier, 2017.

#### Abstract

Severe weather conditions cause enormous damage around the globe. Bow echo patterns in radar images are associated with a number of these destructive conditions, such as damaging winds, hail, thunderstorms, and tornadoes. They are detected manually by meteorologists. In this paper, we propose an automatic framework to detect these patterns with high accuracy by introducing novel skeletonization and shape matching approaches. In this framework, we first extract regions with a high probability of containing bow echoes from radar images and apply our skeletonization method to extract the skeletons of those regions. Next, we prune these skeletons using our novel fuzzy-logic-based pruning scheme. Then, using our proposed shape descriptor, Skeleton Context, we extract bow echo features from these skeletons for use in the shape matching and classification steps. The classification output indicates whether these regions contain bow echoes, with over 97% accuracy.

#### Shape Matching using Skeleton Context for Automated Bow Echo Detection

with Farshid Farhat, Stephen Wistar, and James Z. Wang
Conference Paper: IEEE International Conference on Big Data, December 2016.

#### Abstract

Severe weather conditions cause enormous damage around the globe. Bow echo patterns in radar images are associated with a number of these destructive conditions, such as damaging winds, hail, thunderstorms, and tornadoes. They are detected manually by meteorologists. In this paper, we propose an automatic framework to detect these patterns with high accuracy by introducing novel skeletonization and shape matching approaches. In this framework, we first extract regions with a high probability of containing bow echoes from radar images and apply our skeletonization method to extract the skeletons of those regions. Next, we prune these skeletons using our novel fuzzy-logic-based pruning scheme. Then, using our proposed shape descriptor, Skeleton Context, we extract bow echo features from these skeletons for use in the shape matching and classification steps. The classification output indicates whether these regions contain bow echoes, with over 97% accuracy.