BadRevit Archives » What Revit Wants + Black Grid AI

Parsing a CSV with Line Breaks in the Data Fields

December 26, 2020 Luke Johnson

I was recently working on a multi-vector dataset comparison in Deep Space. We had received Revit, Navisworks and tabular data, and I was comparing 3 different data drops of that information, particularly for changes in the quantities of specific types of elements. A ‘data drop’ is a set of data or files that you receive at a given point in time.

In the course of exporting the Navisworks data through to CSV, I came across a specific problem – the number of rows in the imported CSV did not match the number of elements processed. Why?

As you may know, you can store line breaks inside parameters in Revit. There are very few good reasons to do this, but it still does happen. Once this happens, those line breaks need to be processed by tools downstream. I spent a lot of time in the early days of Deep Space figuring out how to ‘clean’ bad Revit data so we could still bring it into the platform for analysis. Usually, if the CSV writer or reader is smart enough, it should be able to deal with this problem. However, I discovered a bit of a gap in the available tools this time. What was the problem?

It turns out that the CSV was malformed; it was actually dirty or bad data. While it did have the line breaks, it did not consistently use double quotes to contain fields. So we had situations where there were line breaks inside the CSV fields or columns, but not inside double quotes. I tried a lot of different CSV readers, including Excel, LibreOffice, Google Sheets and PowerBI / PowerQuery, but they all tripped up on this data. Because they were using the rule ‘new line = new row of data’, the imported CSV information was coming in mangled.
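To see why the readers trip up, here is a small illustration using Python's standard csv module. It is not one of the tools I tested, and the sample values are made up. A line break inside double quotes stays within one field, while the same line break without quotes starts a new row.

```python
import csv
import io

def read_rows(text):
    """Parse CSV text with the standard library reader and return the rows."""
    return list(csv.reader(io.StringIO(text, newline="")))

# Hypothetical sample data: one record whose second field contains a line break.
quoted = 'Wall,"Level 1\nEast",200\n'    # line break protected by double quotes
unquoted = 'Wall,Level 1\nEast,200\n'    # same data, but the quotes are missing

well_formed = read_rows(quoted)
malformed = read_rows(unquoted)

print(len(well_formed), well_formed)  # one row with three fields
print(len(malformed), malformed)      # two rows with two fields each
```

The second result is the ‘mangled’ import: one element has turned into two short rows.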

How can we clean this data? We generally know there should be 17 fields, and therefore 16 commas, per row of data. But we also know there can be line breaks inside fields… so it is a challenge to map a data row to CSV lines. In some cases 2 or 3 lines of CSV data might still just be one row of actual data.

After trying to use various out-of-the-box solutions, I decided to build some Python code to try and solve this. I used Dynamo Sandbox 1.3 to do this, primarily out of habit, not because it is the best Python IDE out there 🙂 I ended up with a kind of line-merging iterator. Here is some of the Python code:

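# Dynamo Python node: IN[0] is the list of raw CSV lines
biglist = IN[0]

# Count the commas (delimiters) on every line
commact = []
for astr in biglist:
    commact.append(astr.count(','))

counted = range(len(biglist))
fixedstr = []   # rows that are complete or have been merged
bad = []        # indices of lines that could not be repaired
skip = -1       # index of a line already merged into the previous row

for ctrx in range(len(biglist)):
    if ctrx == skip:
        # this line was already merged into the row before it
        pass
    elif commact[ctrx] >= 16:
        # complete row (or one with extra commas): pass through unchanged
        fixedstr.append(biglist[ctrx])
    elif ctrx + 1 < len(biglist) and commact[ctrx] + commact[ctrx + 1] == 16:
        # two consecutive lines add up to one full row: merge them
        fixedstr.append(biglist[ctrx] + biglist[ctrx + 1])
        skip = ctrx + 1
    else:
        # anything else needs a manual look
        bad.append(ctrx)

OUT = bad, fixedstr, commact, counted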

What does it do? Essentially, it counts the number of delimiters (commas) on each line, then it passes through or merges lines based on that information. As written, it only merges pairs of consecutive lines.

This actually got me around 90% of the way there. Then I still had to do some manual fixes of things like ‘double double quotes’ that were also tripping up the CSV readers.

Above is a snapshot of the Dynamo script. And here is the script for download:

CSV Line Merge for Malformed CSV

What are the lessons here?

Try and fix the source or native data if you can. Dealing with messy data downstream can be a real pain.
If you need to solve this problem, you can pick up my code or work above and advance it a bit further to build a more robust ‘malformed CSV reader’.
Don’t let problems like this distract you during the holidays 🙂
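If you do want to advance the idea from the second point, here is a minimal sketch of a merger that keeps joining lines until the comma count reaches the expected total, so a row split across three or more lines can also be recovered. It is not the Dynamo script above: the function name merge_broken_rows, the joiner parameter and the sample lines are assumptions for illustration, and the example uses 3 commas per row instead of 16 to keep it short.

```python
def merge_broken_rows(lines, expected_commas=16, joiner=" "):
    """Merge physical lines until each row has the expected number of commas.

    Returns (rows, bad). 'bad' holds chunks that overshot the expected
    count or were left incomplete, and need a manual look.
    """
    rows = []
    bad = []
    buffer = ""
    for line in lines:
        # Rejoin with 'joiner' where the stray line break used to be.
        candidate = line if not buffer else buffer + joiner + line
        count = candidate.count(",")
        if count == expected_commas:
            rows.append(candidate)      # complete row
            buffer = ""
        elif count < expected_commas:
            buffer = candidate          # still incomplete, keep accumulating
        else:
            bad.append(candidate)       # too many commas, do not guess
            buffer = ""
    if buffer:
        bad.append(buffer)              # trailing fragment that never completed
    return rows, bad

# Hypothetical sample: 4 fields (3 commas) per row, one row split over 3 lines.
sample = ["a,b,c,d", "e,f1", "f2", "f3,h,i", "j,k,l,m,n"]
rows, bad = merge_broken_rows(sample, expected_commas=3)
print(rows)
print(bad)
```

Unlike the script above, lines with too many commas go to the bad list rather than being passed through. Counting commas alone still cannot detect a line break in the last field of a row, or commas inside quoted text, so review the bad list and spot check the merged rows.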

Let’s All Agree that Using Detail Lines for Grids is Really #BadRevit

June 20, 2017 Luke Johnson

Hey, I know we sometimes can’t agree on things like OpenBIM and Revit Vs ArchiCAD and Shared Parameter Standardisation…

But let’s all agree never to use Detail Lines as Grids, ever again, ok? (You know who you are…)

How To Allow Duplicate Detail Numbers on a Revit Sheet (and yes it is #BadRevit)

December 9, 2016 (updated December 13, 2017) Luke Johnson

Revit Wants you to have a unique Detail Number for each Viewport placed on a Sheet. This makes total sense, and allows you to trace from a Referencing View, through a View Tag (telling you which Sheet and Detail Number that View is on), and back again… The Viewport knows which Detail Number it is, and which Sheet it is referenced from.

Ok, that is What Revit Wants. And in fact, it doesn’t allow you to have duplicate Detail Numbers,

… because that would be madness, right? Well, in some Architectural firms, they may have drafting standards which indicate something like this:

We will reference Interior Elevations from our General Arrangement Floor Plan
The View Tags will have letters a, b, c, d and so on for each Room
These Interior Elevation views will be placed together on Sheets
We will append the Room number to the Detail Number so that we can figure out which Elevation a is which…
Meaning that there will be multiple Detail Numbers that could be the same on each Interior Elevation Sheet

I’m not going to speak to the validity (?) of this logic, but let’s just say sometimes drafting standards from past ways of doing things don’t mix so well with Revit. How do we work around this problem?

I’m not so proud of this, but it is one of those hacks that just seems to work…

1. Create the Viewport Title Tag with a Shared Parameter for the View Room Number and the Detail Number together in one label
2. Use a special invisible character to make Revit think the Detail Number is different, when it appears exactly the same to human eyes… as I said, this is bad Revit

How do you do step 2? Just copy one of these characters from your Character Map and paste it after the Detail Number that Revit doesn’t want you to have. So you type ‘a’, then press Ctrl+V to paste the special character into the Detail Number parameter:

This is the Unicode character U+200B: Zero Width Space.
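If you later have to process these values outside Revit (in Dynamo or an export, for example), here is a small Python illustration of what is going on. This is plain Python string handling, not the Revit API, and strip_zero_width is a hypothetical helper name.

```python
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE

plain = "a"
padded = "a" + ZWSP  # looks identical on screen, but has an extra character

print(plain == padded)          # False: the two values are different strings
print(len(plain), len(padded))  # 1 2

def strip_zero_width(value):
    """Remove zero width spaces, e.g. before comparing or exporting values."""
    return value.replace(ZWSP, "")

print(strip_zero_width(padded) == plain)  # True
```

That extra character is why the program treats the two Detail Numbers as different, and also why downstream tools may treat them as different unless you strip it first.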

Oh, and have you ever wanted to get rid of that annoying question mark in a Revit Tag because there is no value? Just put in one of these Zero Width Space characters and it will go away 😉

Ok, bring on the Comments 🙂

Category: BadRevit
