<efrbr:recordSet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:efrbr="http://vfrbr.info/efrbr/1.1" xmlns:efrbr-work="http://vfrbr.info/efrbr/1.1/work" xmlns:efrbr-expression="http://vfrbr.info/efrbr/1.1/expression" xmlns:efrbr-manifestation="http://vfrbr.info/efrbr/1.1/manifestation" xmlns:efrbr-person="http://vfrbr.info/efrbr/1.1/person" xmlns:efrbr-corporateBody="http://vfrbr.info/efrbr/1.1/corporateBody" xmlns:efrbr-concept="http://vfrbr.info/efrbr/1.1/concept" xmlns:efrbr-structure="http://vfrbr.info/efrbr/1.1/structure" xmlns:efrbr-responsible="http://vfrbr.info/efrbr/1.1/responsible" xmlns:efrbr-subject="http://vfrbr.info/efrbr/1.1/subject" xmlns:efrbr-other="http://vfrbr.info/efrbr/1.1/other" xsi:schemaLocation="http://vfrbr.info/efrbr/1.1 http://vfrbr.info/schemas/1.1/efrbr.xsd"><efrbr:entities><efrbr-work:work identifier="http://purl.tuc.gr/dl/dias/65923D16-7EF8-4081-8F5F-D386BC668CB5"><efrbr-work:titleOfTheWork>SPARTAN: using constrained models for guaranteed-error semantic compression</efrbr-work:titleOfTheWork></efrbr-work:work><efrbr-expression:expression identifier="http://purl.tuc.gr/dl/dias/65923D16-7EF8-4081-8F5F-D386BC668CB5"><efrbr-expression:titleOfTheExpression>SPARTAN: using constrained models for guaranteed-error semantic compression</efrbr-expression:titleOfTheExpression><efrbr-expression:formOfExpression vocabulary="DIAS:TYPES">
            Peer-Reviewed Journal Publication
            Δημοσίευση σε Περιοδικό με Κριτές
         </efrbr-expression:formOfExpression><efrbr-expression:dateOfExpression type="issued">2015-10-29</efrbr-expression:dateOfExpression><efrbr-expression:dateOfExpression type="published">2002</efrbr-expression:dateOfExpression><efrbr-expression:languageOfExpression vocabulary="iso639-1">en</efrbr-expression:languageOfExpression><efrbr-expression:summarizationOfContent>While a variety of lossy compression schemes have been developed for certain forms of digital data (e.g., images, audio, video), the area of lossy compression techniques for arbitrary data tables has been left relatively unexplored. Nevertheless, such techniques are clearly motivated by the ever-increasing data collection rates of modern enterprises and the need for effective, guaranteed-quality approximate answers to queries over massive relational data sets. In this paper, we propose SPARTAN, a system that takes advantage of attribute semantics and data-mining models to perform lossy compression of massive data tables. SPARTAN is based on the novel idea of exploiting predictive data correlations and prescribed error-tolerance constraints for individual attributes to construct concise and accurate Classification and Regression Tree (CaRT) models for entire columns of a table. More precisely, SPARTAN selects a certain subset of attributes (referred to as predicted attributes) for which no values are explicitly stored in the compressed table; instead, concise error-constrained CaRTs that predict these values (within the prescribed error tolerances) are maintained. To restrict the huge search space of possible CaRT predictors, SPARTAN uses a Bayesian network structure to guide the selection of CaRT models that minimize the overall storage requirement, based on the prediction and materialization costs for each attribute. 
SPARTAN's CaRT-building algorithms employ novel integrated pruning strategies that take advantage of the given error constraints on individual attributes to minimize the computational effort involved. Our experimentation with several real-life data sets offers convincing evidence of the effectiveness of SPARTAN's model-based approach: SPARTAN is able to consistently yield substantially better compression ratios than existing semantic or syntactic compression tools (e.g., gzip) while utilizing only small samples of the data for model inference.</efrbr-expression:summarizationOfContent><efrbr-expression:useRestrictionsOnTheExpression type="creative-commons">http://creativecommons.org/licenses/by/4.0/</efrbr-expression:useRestrictionsOnTheExpression><efrbr-expression:note type="journal name">SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery &amp; Data Mining</efrbr-expression:note><efrbr-expression:note type="journal volume">4</efrbr-expression:note><efrbr-expression:note type="journal number">1</efrbr-expression:note><efrbr-expression:note type="page range">11-20</efrbr-expression:note></efrbr-expression:expression><efrbr-person:person identifier="067E0285-B610-4113-B727-93264AA0AD19"><efrbr-person:nameOfPerson vocabulary="">
            Babu Shivnath
         </efrbr-person:nameOfPerson></efrbr-person:person><efrbr-person:person identifier="http://users.isc.tuc.gr/~mgarofalakis"><efrbr-person:nameOfPerson vocabulary="TUC:LDAP">
            Garofalakis Minos
            Γαροφαλακης Μινως
         </efrbr-person:nameOfPerson></efrbr-person:person><efrbr-person:person identifier="818ADC29-99D8-4326-8B99-C2563CF81047"><efrbr-person:nameOfPerson vocabulary="">
            Rastogi Rajeev
         </efrbr-person:nameOfPerson></efrbr-person:person><efrbr-corporateBody:corporateBody identifier="http://www.acm.org/"><efrbr-corporateBody:nameOfTheCorporateBody vocabulary="S/R:PUBLISHERS">
            Association for Computing Machinery
         </efrbr-corporateBody:nameOfTheCorporateBody></efrbr-corporateBody:corporateBody><efrbr-concept:concept identifier="A434BFD7-4330-449A-BAD7-0AECB0FC727D"><efrbr-concept:termForTheConcept>
            Semantics
         </efrbr-concept:termForTheConcept></efrbr-concept:concept><efrbr-concept:concept identifier="9EB555EF-0027-46BA-96D6-8A32AE78507E"><efrbr-concept:termForTheConcept>
            Data mining
         </efrbr-concept:termForTheConcept></efrbr-concept:concept></efrbr:entities><efrbr:relationships><efrbr-structure:structureRelations><efrbr-structure:realizedThrough sourceEntity="work" targetEntity="expression" sourceURI="http://purl.tuc.gr/dl/dias/65923D16-7EF8-4081-8F5F-D386BC668CB5" targetURI="http://purl.tuc.gr/dl/dias/65923D16-7EF8-4081-8F5F-D386BC668CB5"/></efrbr-structure:structureRelations><efrbr-responsible:responsibleRelations><efrbr-responsible:createdBy sourceEntity="work" targetEntity="person" sourceURI="http://purl.tuc.gr/dl/dias/65923D16-7EF8-4081-8F5F-D386BC668CB5" targetURI="067E0285-B610-4113-B727-93264AA0AD19"/><efrbr-responsible:realizedBy sourceEntity="expression" role="author" targetEntity="person" sourceURI="http://purl.tuc.gr/dl/dias/65923D16-7EF8-4081-8F5F-D386BC668CB5" targetURI="067E0285-B610-4113-B727-93264AA0AD19"/><efrbr-responsible:realizedBy sourceEntity="expression" role="author" targetEntity="person" sourceURI="http://purl.tuc.gr/dl/dias/65923D16-7EF8-4081-8F5F-D386BC668CB5" targetURI="http://users.isc.tuc.gr/~mgarofalakis"/><efrbr-responsible:realizedBy sourceEntity="expression" role="author" targetEntity="person" sourceURI="http://purl.tuc.gr/dl/dias/65923D16-7EF8-4081-8F5F-D386BC668CB5" targetURI="818ADC29-99D8-4326-8B99-C2563CF81047"/><efrbr-responsible:realizedBy sourceEntity="expression" role="publisher" targetEntity="corporateBody" sourceURI="http://purl.tuc.gr/dl/dias/65923D16-7EF8-4081-8F5F-D386BC668CB5" targetURI="http://www.acm.org/"/></efrbr-responsible:responsibleRelations><efrbr-subject:subjectRelations><efrbr-subject:hasSubject sourceEntity="work" targetEntity="concept" sourceURI="http://purl.tuc.gr/dl/dias/65923D16-7EF8-4081-8F5F-D386BC668CB5" targetURI="A434BFD7-4330-449A-BAD7-0AECB0FC727D"/><efrbr-subject:hasSubject sourceEntity="work" targetEntity="concept" sourceURI="http://purl.tuc.gr/dl/dias/65923D16-7EF8-4081-8F5F-D386BC668CB5" 
targetURI="9EB555EF-0027-46BA-96D6-8A32AE78507E"/></efrbr-subject:subjectRelations><efrbr-other:otherRelations/></efrbr:relationships></efrbr:recordSet>