Barna Saha

barna-imageI am an Assistant Professor in the College of Information & Computer Science at the University of Massachusetts Amherst. I am also a Permanent Member of Center for Discrete Mathematics and Theoretical Computer Science (DIMACS) at Rutgers. Before joining UMass in 2014, I was a Research Scientist at AT&T Shannon Laboratories, New Jersey. I spent four wonderful years (2007-2011) at the University of Maryland College Park from where I received my Ph.D. in Computer Science. In Fall 2015, I was at the University of California Berkeley as a Visiting Scholar and as a fellow of the Simons Institute.

I offered a 4-day summer course in India on Algorithmic aspects of handling large data. 

Research Interests:

  • Algorithm Design and Analysis,
  • Large Scale Data Analytics,
  • Randomization in Computation

I particularly enjoy working on problems that are tied to core applications but have potentials to lead to beautiful theory. I primarily focus on designing fast algorithms for a variety of applications, or understanding fundamental limitations of why that is not possible. This includes problems in pattern matching, network and graph problems, data mining, crowd sourcing, resource allocation etc. You will find the recurring theme of randomization and approximation in all my works–though the reasons they are used could be different.


[By TCS tradition author names are in alphabetical order of surnames, with few exceptions.]


A1. Fast & Space-Efficient Approximations of Language Edit Distance and RNA-Folding: An Amnesic Dynamic Programming Approach, Barna Saha, FOCS 2017.

A2. Approximating Language Edit Distance Beyond Fast Matrix Multiplication: Ultra-linear grammars are where Parsing Becomes Hard!, Rajesh Jayaram and Barna Saha, ICALP 2017. Extended Abstract

A3Clustering via Crowdsourcing, Arya Mazumdar and Barna Saha, arxiv preprint, April 2016. (The paper introduces a rigorous theoretical study of using pair-wise queries interactively for clustering. This concept has been picked up by flurry of papers recently by a different name “same-cluster” queries.)

A4. Truly Sub-cubic Algorithms for Language Edit Distance and RNA Folding via Fast Bounded-Difference Min-Plus Product, Karl Bringmann, Fabrizio Grandoni, Barna Saha and Virginia Williams, FOCS 2016. Invited to special issue for selected best papers.

A5. Language Edit Distance & Maximum Likelihood Parsing of Stochastic Grammars: Faster Algorithms & Connection to Fundamental Graph Problems, Barna Saha, FOCS 2015.

J1. New Approximation Results for Resource Replication Problems, Samir Khuller, Barna Saha and Kanthi Sarpatwar. Algorithmica, 2015. [Preliminary Version in APPROX 2012]

A6. The Dyck Language Edit Distance Problem in Near-linear Time, Barna Saha, FOCS 2014.

A7.  Hierarchical Graph Partitioning,  Mohammadtaghi Hajiaghayi, Theodore Johnson, Reza Khani and Barna SahaSPAA 2014.

J2. Facility location with matroid or knapsack constraints , Ravishankar Krishnaswamy, Amit Kumar, Vishwanath Nagarajan, Yogish Sabharwal and Barna Saha. Mathematics of Operations Research, 2014.

A8. A Constant Factor Approximation Algorithm for Fault Tolerant k-Median Problem, Mohammadtaghi Hajiaghayi, Wei Hu, Jian Li, Shi Li and Barna Saha.  SODA, 2014.

A9. Renting a Cloud, Barna Saha, FSTTCS, 2013.

A10. On Chebyshev Radius of a Set in Hamming Space and the Closest String Problem Yury Polyanskiy, Arya Mazumdar and Barna SahaISIT 2013.

A11. New Approximation Results for Resource Replication Problems, Samir Khuller, Barna Saha and Kanthi Sarpatwar. APPROX 2012.

A12. Set Cover revisited: Hypergraph Cover with Hard Capacities,Barna Saha and Samir Khuller.ICALP-Track A 2012.

This paper settles an open question raised by Chuzhoy and Naor in [FOCS02].

A13. The Matroid Median Problem, Ravishankar Krishnaswamy,  Vishwanath Nagarajan and Barna Saha. SODA 2011.
Merged with a paper by Amit Kumar and Yogish Sabharwal who obtained similar results for the case of partition matroid.

J3. New Constructive Aspects of the Lovasz Local Lemma Bernhard Haeupler,  Barna Saha and Aravind Srinivasan. Journal of the ACM (JACM), 2011. [Preliminary version in FOCS 2010]

A14. AdCell-Ad Allocation in Cellular Networks, Saeed Alaei, Mohammad Taghi Hajiaghayi, Vahid Liaghat, Dan Pei and Barna Saha. ESA 2011.
Here is a much polished version for the online algorithm. Link.

A15. On Capacitated Set Cover Problems, Nikhil Bansal, Ravishankar Krishnaswamy and Barna Saha. APPROX 2011.

A16. New Constructive Aspects of the Lovasz Local Lemma, Bernhard Haeupler, Barna Saha and Aravind Srinivasan. FOCS 2010.
This paper settles an outstanding open question related to resource allocation, known as the Santa Claus problem.

A17. Energy Efficient Scheduling via Partial Shutdown, Samir Khuller, Barna Saha and Jian Li. SODA 2010.

A18. A New Approximation Technique for Resource-Allocation ProblemsBarna Saha and Aravind Srinivasan. ITCS 2010, known as ICS at that time.

A19. On Finding Dense Subgraphs, Samir Khuller and Barna Saha. ICALP 2009. Longer version.

A20On Estimating Path-Aggregates over Streaming Graphs, Sumit Ganguly and Barna SahaISAAC 2006.

Data Analytics

D1. A Theoretical Analysis of First Heuristics of Crowdsourced Entity Resolution, Arya Mazumdar, Barna Saha. AAAI 2017.

D2.  Online Entity Resolution with an Oracle, Donatella Firmani, Barna Saha, Divesh Srivastava, VLDB 2016.

D3. TreeScope: Finding Structural Anomalies In Semi-Structured Data, Shanshan Ying, Flip Korn, Barna Saha and Divesh Srivastava, VLDB 2015.

D4. Size-constrained Weighted Set Cover, Lukasz Golab, Flip Korn, Feng Li, Barna Saha and Divesh Srivastava. ICDE 2015.

J4. Discovering Conservation Rules, Lukasz Golab, Howard Karloff, Flip Korn, Barna Saha and Divesh Srivastava.  IEEE Transactions on Knowledge and Data Engineering (TKDE) 2014. Special Issue on the Best Papers of ICDE 2012.

D5. Data Quality: The other Face of Big Data, Barna Saha and Divesh Srivastava. Tutorial, ICDE 2014.

D6.  Distributed Data Placement to Minimize Communication Costs via Graph Partitioning,
Lukasz Golab, Marios Hadjieleftheriou, Howard Karloff and Barna Saha. SSDBM 2014.

D7. Firewall Placement on Cloud Data Centers, Manish Purohit Seungjoon Lee and Barna Saha. Poster, SOCC 2013.

D8.  On Repairing Structural Problems in Semi-structured Data, Flip Korn, Barna Saha, Divesh Srivastava and Shanshan Ying. VLDB 2013.

D9Less is More, Selecting Sources Wisely for IntegrationXin Luna Dong, Barna Saha and Divesh Srivastava. VLDB 2013.

D10. Discovering Conservation Rules, Lukasz Golab, Howard Karloff, Flip Korn, Barna Saha and Divesh Srivastava. ICDE 2012.

Among the selected best papers of ICDE 2012.

D11. Link Prediction for Annotation Graphs, Philip Anderson, Samir Khuller, Saket Navlakha, Louiqa Raschid, Barna Saha, Andreas Thor and Xiao-Ning Zhang. ISWC 2011.

J5. A Unified Approach to Ranking in Probabilistic Databaseswith Jian Li, Barna Saha and Amol Deshpande. The VLDB Journal, 2011.
Best Paper Issue of VLDB 2009.

D12.  Dense Subgraphs with Restrictions and Applications to Gene Annotation Graphs, with Allison Hoch, Samir Khuller, Louiqa Raschid, Barna Saha and Xiao-Ning Zhang. RECOMB 2010.

D13. Schema Covering: A Step Towards Enabling Reusability in Information Integration, Barna Saha, Ioana Stanoi and Ken Clarkson. ICDE 2010.

D14. A Unified Approach to Ranking in Probabilistic Databases, Jian Li, Barna Saha and Amol Deshpande. VLDB 2009. [BEST PAPER AWARD]                     Invited to the BEST PAPER ISSUE OF VLDBJ.

D15. On Maximum Coverage in the Streaming Model & Application to Multi-topic Blog-Watch, Barna Saha and Lise GetoorSDM 2009.

D16.  Simplifying Information Integration: Object-Based Flow-of-Mappings Framework for Integration, Bogdan Alexe, Michael Gubanov, Mauricio A. Hernandez, Howard Ho, Jen-Wei Huang, Yannis Katsis, Lucian Popa, Barna Saha and Ioana Stanoi. Lecture Notes in Business Information Processing, Revised Selected Papers from
Second International VLDB Workshop, BIRTE 2008.

D17.  Group Proximity Measure for Recommending Groups in Online Social Networks.Barna Saha and Lise Getoor. ACM SIGKDD Workshop on Social Network Mining and Analysis (SNA-KDD), 2008.

D18. Bidirectional Fuzzy-Regression Model for Road-lines Detection, Barna Saha, Arya Mazumdar and N R Pal. ICEIS 2006.