Saturday, November 30, 2013

Extended EMC for CSV Files

An extended Epsilon Model Connectivity (EMC) layer for working with Comma-separated Values (CSV) files/models.

Note to the reader: Some of my instructions assume you have worked with the Epsilon languages and tools before.

The Epsilon family of languages and tools provides a lot of flexibility when doing all sorts of things with models. One of the things I like the most is the ability to work with models from domains other than EMF. Epsilon provides support for working with BibTeX, CSV, and other types of files/models. Find all the info here. However, some of the drivers only provide basic support. This is the case for the CSV (comma-separated values) driver, which currently only supports loading the CSV model (file) and accessing each of its rows as an element in the model. So basically you were limited to something along the lines of the following EOL script:

for (r in csv!Row.all()) {
    if (r.at(2)) {
        // Print "name lastName, is invited to the party!" on one line
        r.at(0).print();
        " ".print();
        r.at(1).print();
        ", is invited to the party!".println();
    }
}

This assumes you have a CSV file of the form "name,lastName,isInvited" and want to print the list of invited guests. Although your EOL script could be more complex, you are still limited to read-only operations and index-based access to fields.

I am currently working on an extended version of Epsilon's CSV EMC to provide additional capabilities. Essentially, I want to support accessing the fields of each row by using the header information for navigation, and to support modifying the model. You can find the source code in the Epsilon Labs (at Google Code). You will need to check out two projects, org.eclipse.epsilon.emc.csv and org.eclipse.epsilon.emc.csv.dt, under svnRepo. (If you have never worked with SVN repositories in Eclipse, this link provides all the info you need; look under the installation instructions and how to check out a repository.) I recommend importing these projects into your Eclipse workspace and then running a nested Eclipse to try them out. Replacing your default plugins with these might work, but I haven't tested that option. You can also check out the org.eclipse.epsilon.emc.csv.test project, which has the examples I will be using from here on.

Loading a CSV model

Once the plugins are in place, you will be able to use the extended CSV model loader with its new options. If you don't see the CSV model when adding a model to the launch configuration, be sure to click the "Show all model types" check box.


If you have worked with CSV models before, you will notice that you now have a Load/Save section and a new CSV section to set some of the attributes of your CSV model.

  • The Field Separator allows you to select a separator other than a comma. Yes, they are called comma-separated files, but sometimes a colon, a semicolon, or some other character is used as the field separator. Now you can tell the model loader which one to use. By default it is a comma.
  • The Known Headers option tells the loader that the first row of your file contains headers. Headers can later be used to access the fields of a row.
  • The Varargs Header option tells the loader that, although the first row has headers, some of the rows in the file may not have values for all of them. I know this is not "standard" (did you know that RFC 4180 describes the CSV file format?), but my particular CSV files looked like that, so I took the liberty of adding it.
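
To make the Field Separator option concrete, here is a small Python sketch of what changing the separator means when a line is split into fields. It is not the driver's Java code: the helper name split_fields and the sample guest data are made up for illustration.

```python
import csv
import io

def split_fields(text, field_separator=","):
    """Split CSV text into rows of fields using the given separator.

    Hypothetical helper: it only mimics the effect of the Field Separator
    option and is not the Epsilon driver's implementation.
    """
    reader = csv.reader(io.StringIO(text), delimiter=field_separator)
    return [row for row in reader]

# The same (made-up) data, once with commas and once with semicolons.
comma_rows = split_fields("name,lastName,isInvited\nAda,Lovelace,true\n")
semicolon_rows = split_fields("name;lastName;isInvited\nAda;Lovelace;true\n", ";")

# Reading a semicolon file with the default separator gives one big field.
wrong_separator = split_fields("Ada;Lovelace;true\n")
```

With the right separator both files produce the same rows; with the wrong one, each line collapses into a single field, which is usually the first symptom of a mismatched setting.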

Let's see how the header information works inside EOL scripts.

Let's assume I have a CSV model with the following information (some baseball statistics, but I am not a baseball fan):
Rk,Year,Age,Tm,Lg,,W,L,W-L%,G,Finish
1,1978,37,Atlanta Braves,NL,,69,93,.426,162,6
2,1979,38,Atlanta Braves,NL,,66,94,.413,160,6
3,1980,39,Atlanta Braves,NL,,81,80,.503,161,4
4,1981,40,Atlanta Braves,NL,,25,29,.463,55,4
5,1981,40,Atlanta Braves,NL,,25,27,.481,52,5
6,1982,41,Toronto Maple Leafs,AL,,78,84,.481,162,6
7,1983,42,Toronto Maple Leafs,AL,,89,73,.549,162,4
8,1984,43,Toronto Maple Leafs,AL,,89,73,.549,163,2
9,1985,44,Toronto Maple Leafs,AL,,99,62,.615,161,1

If you specify the Known Headers option, then each record (row) of your model will have the attributes Rk, Year, Age, Tm, Lg, etc. Hence, you can do things like this (assuming you named your model baseball):

for (row in baseball!Row.all()) {
    row.Tm.println();
}

I know, it is simple, but it is enough to show that you can now access the fields of a row by header name. Pretty neat, eh? (You can find this in TestFieldRead.eol.)
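
If it helps to picture what Known Headers does, the following Python sketch pairs each field of a row with the header in the same column. It is only a model of the idea, not the driver's code: load_with_headers is a hypothetical name, the sample is two rows taken from the table above, and values are kept as plain strings because the post does not say whether the driver converts them.

```python
import csv
import io

# Header row plus two data rows from the baseball table above.
SAMPLE = """Rk,Year,Age,Tm,Lg,,W,L,W-L%,G,Finish
1,1978,37,Atlanta Braves,NL,,69,93,.426,162,6
9,1985,44,Toronto Maple Leafs,AL,,99,62,.615,161,1
"""

def load_with_headers(text):
    """Return one dict per data row, keyed by the names in the first row."""
    rows = list(csv.reader(io.StringIO(text)))
    headers, records = rows[0], rows[1:]
    # Pair every field with the header found in the same column.
    return [dict(zip(headers, fields)) for fields in records]

baseball = load_with_headers(SAMPLE)
teams = [row["Tm"] for row in baseball]  # same idea as row.Tm in EOL
```

Note that the sample has an unnamed sixth column; in this sketch it simply ends up under an empty key.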

Saving model modifications:

If you set the "Store on Disposal" value to true, a script like this will swap the team name every time you run it (TestFieldWrite.eol):

for (row in baseball!Row.all()) {
    row.Tm.println();
    if (row.Tm == "Toronto Blue Jays") {
        row.Tm = "Toronto Maple Leafs";
    } else if (row.Tm == "Toronto Maple Leafs") {
        row.Tm = "Toronto Blue Jays";
    }
}
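
The load, modify, save cycle can be modelled outside Epsilon as well. The Python sketch below assumes that storing on disposal amounts to writing the modified rows back in the same column order. The names swap_team and run_once are hypothetical, and in-memory strings stand in for the file on disk.

```python
import csv
import io

def swap_team(team):
    """Toggle between the two Toronto names; leave other teams unchanged."""
    swaps = {
        "Toronto Blue Jays": "Toronto Maple Leafs",
        "Toronto Maple Leafs": "Toronto Blue Jays",
    }
    return swaps.get(team, team)

def run_once(text):
    """Load the CSV text, rewrite the Tm column, and return the saved text."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["Tm"] = swap_team(row["Tm"])
        writer.writerow(row)
    return out.getvalue()

# A trimmed-down version of the baseball file (three columns only).
original = "Rk,Year,Tm\n1,1978,Atlanta Braves\n6,1982,Toronto Maple Leafs\n"
first_run = run_once(original)
second_run = run_once(first_run)
```

Because every run flips the name, a second run restores the original content, which matches the "changes every time you run it" behaviour described above.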

Varargs Headers

The concept of varargs is used in Java to define methods that accept a variable number of parameters (variable arguments). Something similar can happen in a CSV file: there may be a known set of headers that all records have, but each record may also have additional fields beyond these headers. Let me show this with an example. I do my reference management with Qiqqa, which, among other things, lets me export a CSV file of my reference matrix. It looks like this (after some editing to remove the comments and leave only two headers):

source,target
Corradini.etal1996,
Agrawal.etal2006,
DiRuscio.etal2012,Stevens2007,Czarnecki.etal2009,Czarnecki.Helsen2006,Stevens2008,Stevens2010,Selic2003,
Syriani.Vangheluwe2010,Agrawal.etal2006,Jouault.Kurtev2006a,
Aranda.etal2012,France.Rumpe2007,Schmidt2006,Kent2002,
France.Rumpe2007,Czarnecki.Helsen2006,MDA1.0.1,
Schmidt2006,
Kent2002,
Atkinson.Kuhne2003,
Baudry.etal2006,Fleurey.etal2004,
Vallecillo.etal2012,DiRuscio.etal2012,France.Rumpe2007,Czarnecki.Helsen2006,Stevens2008,Baudry.etal2006,

Patzina.Patzina2012,Kolovos.etal2008,Wimmer.etal2011,Jouault.Kurtev2006a,Taentzer.etal2005,Czarnecki.Helsen2006,
Baudry.etal2010,Schmidt2006,France.Rumpe2007,
Guerra2012,Kolovos.etal2008,Baudry.etal2010,
Bezivin2004,OCL2.3.1,Bezivin.Gerbe2001,
Favre2004,MDA1.0.1,Bezivin.Gerbe2001,Bezivin2004,Kent2002,


In such a file there is always a source field, but there may be zero or more targets. If you set the Varargs Headers option to true (check it), the CSV model loader will create a record that has a source and a target attribute (or as many attributes as there are headers), and the attribute for the last header will be a collection. This collection holds all the fields in each row from the last header's position onwards. In the previous example, target is the last header, so the target attribute is a collection.

So now, if you want to print a list of all targets of each source, your script may look like this (TestFieldRead.eol):


for (row in qiqqa!Row.all()) {
    row.source.print();
    " is connected to ".println();
    for (t in row.target) {
        t.println();
    }
}
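
For readers who want to see the varargs rule spelled out, here is a Python sketch of one way to read it: every header except the last maps to a single field, and the last header collects whatever is left in the row. The function name load_varargs is hypothetical. Dropping the empty field produced by the trailing comma is my assumption, and the real driver may keep it.

```python
import csv
import io

# Three rows taken from the Qiqqa export above.
SAMPLE = """source,target
Corradini.etal1996,
Baudry.etal2006,Fleurey.etal2004,
Guerra2012,Kolovos.etal2008,Baudry.etal2010,
"""

def load_varargs(text):
    """Map all headers but the last to one field; the last header gets a list."""
    rows = [r for r in csv.reader(io.StringIO(text)) if r]  # skip blank lines
    headers, records = rows[0], rows[1:]
    fixed = len(headers) - 1  # number of single-valued columns
    result = []
    for fields in records:
        record = dict(zip(headers[:fixed], fields[:fixed]))
        # Assumption: empty fields (from the trailing comma) are not targets.
        record[headers[-1]] = [f for f in fields[fixed:] if f != ""]
        result.append(record)
    return result

qiqqa = load_varargs(SAMPLE)
```

A source with no targets then gets an empty collection, so a loop over row.target simply does nothing for that row.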

Header-less CSV files

Finally, to support header-less CSV files and to make it easy to reuse your existing scripts, if you don't check the Known Headers option, the CSV model loader will create a record for each row with a single default attribute, field. This attribute is a collection that holds all the fields in the row.
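
As a rough picture of the header-less case, the Python sketch below wraps every row in a record whose only attribute is a list of fields, so access stays index-based. The name load_headerless and the guest data are made up, and the attribute name field follows my reading of the description above.

```python
import csv
import io

def load_headerless(text):
    """Wrap each non-empty row in a record with a single 'field' list."""
    return [{"field": fields} for fields in csv.reader(io.StringIO(text)) if fields]

# Made-up data in the "name,lastName,isInvited" layout, without a header row.
guests = load_headerless("Ada,Lovelace,true\nAlan,Turing,false\n")

# Index-based access, as in the invitation script at the top of the post.
invited = [g["field"][0] for g in guests if g["field"][2] == "true"]
```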

Final Notes

Please use the Epsilon forum for any technical questions or discussions :D.
