Nicolas Neu

dies & ditt.

PyCon 2015: Serialization Formats Are Not Toys

I am on my way back from this year’s PyCon in Montreal. You should definitely have a look at the PyCon YouTube channel, where you can find all the talks that were given over the weekend. Too many to choose from? Check out this session by one of my co-workers on how to build a recommendation engine with Python, NumPy and Pandas.

One presentation I want to quickly recap is Serialization formats are not toys by @tveastman. You can check out the slides here.

Eastman looks at possible security risks when using data serialization formats like XML, YAML or JSON. Pretty much all these problems occur when the parser tries to be too smart for its own good, or when the serialization format itself includes questionable features that might be useful for a handful of problems but will also gladly assist you in shooting yourself in the foot.

YAML

YAML allows you to embed scripting code in your markup. While this might be useful to insert timestamps or something similar into your config files, it can also be used to call system commands on the server, which means an attacker can execute arbitrary code. When dealing with YAML coming from untrusted sources, make sure to use the safer yaml.safe_load function, which disables these features, instead of the default yaml.load.
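A minimal sketch of the difference, assuming PyYAML is installed. The payload and the load_untrusted helper are made up for illustration and are not taken from the talk:

```python
import yaml  # PyYAML, a third-party package (pip install pyyaml)

# Hypothetical malicious document: the tag asks the loader to call os.system.
PAYLOAD = "!!python/object/apply:os.system ['echo pwned']"


def load_untrusted(text):
    """Parse YAML from an untrusted source with the restricted loader."""
    try:
        return yaml.safe_load(text)
    except yaml.YAMLError as exc:
        # safe_load refuses tags that would construct arbitrary Python objects.
        raise ValueError(f"rejected YAML document: {exc}") from exc


if __name__ == "__main__":
    print(load_untrusted("retries: 3\nhosts: [a, b]"))
    try:
        load_untrusted(PAYLOAD)
    except ValueError as err:
        print(err)
```

Plain mappings, lists and scalars still load as usual; only the object-constructing tags are refused.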

XML

XML, being an inherently complex markup format, also has some built-in features that can be used for attacks. The Billion Laughs attack, for example, is a very simple and straightforward way to DoS a server that is accepting and parsing XML. Another attack vector is XML entities that use file or URL references.

<!ENTITY password SYSTEM "file:///etc/passwd">

This entity includes the system’s password file in the XML document. Accessing the file is possible because the parser runs with the permissions of the server process, which oftentimes are far too broad (root, in the worst case). If the document is malformed and rejected, a common way to handle the situation is to send the original document back to the client. The document sent back will now include the content of the system’s password file. A similar attack is possible with URLs instead of files, to access resources on other servers inside the internal network.

When handling XML coming from untrusted sources you should dumb down your parser as much as possible. Instead of using lxml, consider using defusedxml which allows you to disable all unnecessary features that increase your attack surface.
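To illustrate what "dumbing down" the parser can mean, here is a standard-library-only sketch that refuses any document carrying a DOCTYPE. Entities can only be declared inside one, so this blocks both attacks described above. The names DTDForbidden and parse_untrusted_xml are made up for this example; defusedxml is the more complete option and covers more cases than this sketch.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DTDForbidden(ValueError):
    """Raised when an untrusted document carries a DOCTYPE declaration."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    # expat calls this handler as soon as it sees <!DOCTYPE ...>.
    raise DTDForbidden(f"DOCTYPE {name!r} is not allowed")


def parse_untrusted_xml(data: bytes) -> ET.Element:
    """Refuse any document with a DTD, then parse what is left as usual."""
    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = _reject_doctype
    checker.Parse(data, True)  # raises DTDForbidden or expat.ExpatError
    return ET.fromstring(data)
```

The document is parsed twice, which is wasteful but keeps the sketch short; well-formed documents without a DTD come back as a normal ElementTree element.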

JSON

JSON is pretty safe when used with a decent parser. You will be fine as long as you do not use eval(). At this point it is worth mentioning that the same principle applies to marshal and pickle, which should never be used with untrusted input.
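A small sketch of the point about eval(), using only the standard library. The payload is a made-up example of text that eval() would happily execute but a JSON parser rejects; parse_untrusted_json is a hypothetical helper name:

```python
import json

# Hypothetical attacker-controlled input: valid Python, but not valid JSON.
PAYLOAD = "__import__('os').getcwd()"


def parse_untrusted_json(text):
    """Parse with a real JSON parser; never hand the text to eval()."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc


if __name__ == "__main__":
    print(parse_untrusted_json('{"user": "alice", "ids": [1, 2]}'))
    try:
        parse_untrusted_json(PAYLOAD)
    except ValueError as err:
        print(err)
```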

LaTeX Syntax Highlighting Using Minted, Pygments and Latexmk

This post mostly serves as a reminder for myself. There are already dozens of tutorials on the web describing how to display code inside a LaTeX document. Yet every time I needed to typeset code I found myself on Google and Stack Exchange looking for the best way to do it. All solutions come with some kind of caveat, so maybe the best way to look at this post is as some kind of best practices document.

First of all: Why not listings? That’s what I have been using so far and it works. I was never quite satisfied with it though. Support for the languages I use is built in, at least in rudimentary form. Occasionally keywords are missing and need to be patched in, which is not a big problem in itself, but it always felt kind of hacky and I would prefer a definitive solution. Secondly, although colours are supported, they do not work out of the box. Colour mappings have to be added by hand and the ones I found online always felt off. So what else is there?

The second solution you will find if you google this longer than one minute is the minted package which relies on the python pygments package for syntax highlighting. It does a superb job of highlighting without any manual fiddling, colours feel right and the output overall looks just as beautiful as the rest of your tex’d document.

Installation of pygments is business as usual:

pip install pygments

Then in the document header just include the minted package and fancyvrb:

\usepackage{fancyvrb}
\usepackage{minted}

The only downside is the additional requirement of pygments. I try to avoid solutions like this. I occasionally switch workstations which means dependencies might not be set up and especially on windows dealing with python, packages and the command line in general is always a little bit more painful than on other systems. Unfortunately I had no success getting around this. Pygments supports tex output through a formatter which I used to generate tex files with syntax highlighting macros already in place. The macros this is relying on are undefined and compiling fails. There probably is a way to set them up, but apart from mentioning a get_style_defs() method somewhere in the pygments package, I found no documentation of actually getting these definitions. This does work however when the minted package is included. If anyone knows how to do this, I’d be happy to hear about it.
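For what it’s worth, the Pygments LaTeX formatter does seem to expose those macro definitions through get_style_defs(). The following is a sketch, assuming a reasonably recent Pygments; it has not been tested against the setup described above:

```python
from pygments import highlight
from pygments.formatters import LatexFormatter
from pygments.lexers import PythonLexer

formatter = LatexFormatter()

# The macro definitions (\PY{...} and friends) the highlighted output relies on.
# They belong in the preamble; the output also needs fancyvrb and color.
style_defs = formatter.get_style_defs()

# The highlighted snippet itself, wrapped in a fancyvrb Verbatim environment.
snippet = highlight("print('waddap?')\n", PythonLexer(), formatter)

if __name__ == "__main__":
    print(style_defs)
    print(snippet)
```

Pasting the definitions into the preamble once would, in principle, let pre-generated snippets compile on a machine without Pygments.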

Did I say only downside? Correction! Two downsides. As an external library is invoked, latex needs to be called with the --shell-escape argument. I am usually building my documents with latexmk, which I did not expect to be very cooperative. I was wrong. If you don’t already have a .latexmkrc in your home directory, just create one with the following content and you are done:

$pdf_mode = 1;
$pdflatex = 'pdflatex --shell-escape -interaction=nonstopmode -file-line-error -synctex=1 %O %S';

To use it in your document you can define a code area like this:

\begin{minted}[linenos=true]{python}
    print 'waddap?'
\end{minted}

The end result should look something like this. Neat eh?

The Sad State of (Secure) Mobile Messaging

People don’t trust Facebook. After the recent announcement that Facebook bought WhatsApp I have heard a lot of people around me voicing their discontent with this situation, wanting to switch to a different messaging platform. For me this came as a small surprise, as there was no shift whatsoever towards more secure platforms after the Snowden leaks, but I take what I can get. If this acquisition can motivate people to move somewhere else, away from an application which showcased severe security problems, I will gladly take this opportunity. The fact that Threema, an alternative messaging app which promises end-to-end encryption, doubled its user base overnight shows that not only people around me are feeling uneasy about the situation. Granted, 400,000 Threema users compared to 430,000,000 WhatsApp users doesn’t sound like much, but it’s a start. So why don’t I switch?

I make extensive use of WhatsApp and its group chat function to stay in touch with my friends in Germany. As the resident nerd, I was told to figure out what to do. Where to now? So I started considering different applications that would be worth switching to. Spoiler alert: They all suck. You don’t get people to switch away from their favorite apps that often, so better take this chance and get it right. For me, this primarily means secure end-to-end encryption when using the app. Also, it needs to be easy to use. My friends are in business, banking, one of them is a nurse. They never experienced the joy of recompiling a kernel just to get a graphics card driver to work. Manual handling of keys or overly complicated account management would be a tough sell. So what’s out there at the moment?

Threema

Let’s start with Threema, as it seems to be one of the two most popular contenders in the field right now. It has been on the market for quite some time now, so the app is quite polished. It uses encryption, the servers are located in Switzerland and not the States (although I doubt that this makes much of a difference at this point), and it supports group chat and a couple of other nice features. It costs about two bucks, which might slightly hinder adoption among my peers, but I am not too concerned about that. More severe is the fact that it requires Android 4.0 or higher. WhatsApp, however horrible it might be, still has the big advantage of running on every crappy Android phone out there. The real problem, however, is that it is not open source. Safe encryption without scrutiny by third parties is not possible and I am not willing to blindly trust some guys who promise that their crypto is secure. Even if there is no evil intent, the danger of a weakness introduced by a programming error is still there.

I’d rather have no security than a false sense of it.

Telegram

Currently in the top list on the App Store, Telegram also promises secure messaging. Now, I don’t know how I should feel about abandoning WhatsApp because it was bought by Facebook, only to use something developed by the VK founders. If the crypto works, this is fine I guess. Sadly it doesn’t. I am no expert, so I will just link to this nice blog post by cryptofails, explaining the issues in detail.

Surespot

Surespot, although a bit ugly, looks quite promising and I might consider using it some time from now. Right now some quintessential features like group chat are lacking, which immediately disqualifies it. The Surespot developer promised group chat in two months from now. If they can also get rid of the message log limit of 1000 and implement voice memos, this might actually become usable.

Whistle.im

I will just link to this analysis by the CCC Hamburg, completely dismantling the whistle.im crypto. The developers have since patched their software to address the mentioned issues, but as some of the flaws have shown a lack of basic cryptographic knowledge I am not convinced that the software is working as intended now. If no further reports pop up in the future they might gain some trust back, but it will certainly take a lot of time (and maybe some independent verification of the code) before you should even consider using this.

Chatsecure

A mobile Jabber client using OTR for encryption. The app certainly isn’t the prettiest but I could use my desktop machine to continue chatting when I got my laptop with me. The setup process, which would involve creating a Jabber account somewhere, is certainly more complicated than for the rest of the apps, but nothing that even not so computer savvy people couldn’t manage with some help. The problem is that Jabber on mobile devices sucks. The protocol was never intended to work with flaky mobile connections. There are extensions for mobile use, but the iPhone architecture doesn’t allow apps to continuously keep running in the background, which means the app disconnects every ten minutes. Besides, OTR needs both parties to be online for the key exchange. Messaging offline contacts in a secure way is not possible.

myENIGMA

No public sourcecode. NEXT!

Conclusion

Right now, nothing out there is really satisfying. I have some hope for Textsecure who are still working on an iPhone app (and I think switching to data channels for transport) and heml.is which is also still in development. Until then I don’t think I’ll be going anywhere. At least whatsapp has that awesome voice memo feature….

Starting With Clojure

One item which has been on my todo list for quite some time now is to fiddle around with and learn a functional language of some sort. Most programming I do nowadays is either in Python or Java with a little bit of Ruby or JavaScript here and there. The parallel code I wrote was in C using OpenMP or MPI, and while I admit that OpenMP can be pleasant to work with, I still wished for a nicer, more explicit way to accomplish my tasks. Functional programming languages are based on the idea of side-effect free programming (with some languages complying with this goal more than others). In principle, this allows code to be executed in parallel without requiring the programmer to explicitly juggle concurrency constructs (Threads, Runnables, #pragma omp’s or whatever) in his code.
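The idea carries over to other languages. As a rough sketch in Python (not Clojure), a function without side effects can be handed to a pool of workers without any locking. The collatz_steps function and the parallel_map helper are made up for this example, and because of Python’s global interpreter lock the threads illustrate the principle rather than a real speed-up:

```python
from concurrent.futures import ThreadPoolExecutor


def collatz_steps(n):
    """Pure function: the result depends only on n and nothing is mutated."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


def parallel_map(func, items, workers=4):
    """Apply a side-effect free function to every item on a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps the input order, so the result matches a plain map().
        return list(pool.map(func, items))


if __name__ == "__main__":
    print(parallel_map(collatz_steps, range(1, 11)))
```

Since no call touches shared state, the order in which the workers run does not matter and the result is the same as that of a sequential map.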

Another nice thing is that understanding the functional programming style helps you to write better code in general. At least that’s the word on the street, and if I remember Bloch’s Effective Java correctly, pretty much every second thing he said boiled down to: avoid side-effects, so it seems to be pretty legit.

I won’t give an introduction into Clojure per se, but I will showcase some resources that helped me to get started and made the process of learning the language a lot easier.

  • Clojure Programming by Emerick, Carper and Grand. A very extensive book which provides an easy to understand introduction as well as advanced topics.

  • Leiningen makes it easy to set up and maintain a Clojure environment on your machine. It also simplifies project management although I didn’t really come into contact with this functionality.

  • Light Table is an open source IDE with excellent Clojure support. Resolving the values of S-Expressions immediately after input and displaying the results makes it very easy to follow the program flow, even for more complex applications. As a Lisp novice who had a hard time parsing all those parentheses, this was a great help.

  • 4Clojure provides programming puzzles similar to Project Euler. The big difference is that the puzzles are designed to help you learn Clojure along the way. New concepts are introduced through successively harder problems you can solve in Clojure.

  • Multiple video lectures addressing different Clojure related topics. They range from very general ones outlining the motivation behind Clojure to more technically focussed ones dealing with concurrency in Clojure: Hammock Driven Development, The Value of Values, Concurrency in Clojure.

  • ClojureDocs provides nice API documentation, including examples, for the complete Clojure standard library. Unfortunately it is slightly outdated, although this didn’t affect me too much.

Planethatch Hackathon

A couple of weeks back I took part in the Planet Hatch Hackathon, hosted by @ianbishop. It was a rather small event with only 4 teams competing, which I didn’t mind, seeing that it was the first event of this kind I attended. For those who don’t know how a hackathon works: one person pitches an idea he wants to work on during the weekend. If you like the idea or you think you can contribute to the project with your skillset, you join in. The projects at Planet Hatch ranged from “yup, that could actually be pretty useful” to “are you kidding me” (looking at you, QR-code based bathroom social network). I worked on an intelligent alarm clock for runners. It allows you to set two distinct alarms: the first one will go off if the weather is nice enough for a run outside. If it’s raining you may want to sleep a bit longer, so the second alarm kicks in.
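The two-alarm logic boils down to very little code. This is a sketch and not the code from the hackathon: the Forecast type, the temperature threshold and the function names are assumptions made up for illustration.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class Forecast:
    raining: bool
    temperature_c: float


def nice_enough_for_a_run(forecast, min_temp_c=0.0):
    """Assumed rule: dry and not colder than min_temp_c."""
    return not forecast.raining and forecast.temperature_c >= min_temp_c


def pick_alarm(forecast, run_alarm, sleep_in_alarm):
    """Return the early alarm on running days, the later one otherwise."""
    return run_alarm if nice_enough_for_a_run(forecast) else sleep_in_alarm


if __name__ == "__main__":
    print(pick_alarm(Forecast(raining=True, temperature_c=12.0), time(6, 0), time(7, 30)))
```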

photo by @Planethatch

Working in a new team where everyone has a different level of experience, under hard time constraints, is quite different to what I am used to in my daily work as a grad student. Some things I took away from the event:

Try to get shit done: One of the guys in my team brought a RaspberryPi which was pretty awesome as I didn’t really have any opportunity to play around with one of those things so far. We had a couple cool ideas on how to incorporate the Pi into our project which, for numerous reasons, didn’t really work out. Instead of acknowledging that early on and concentrating on the actual product we continued to fiddle around with the Pi. We didn’t finish our project and some core functionality is still missing because we didn’t focus on the actual problem.

It always takes a lot more time than you think: This is probably true for every project you have ever worked on, but it becomes even more apparent if you only have one or two days to finish your project. “Just pull in the weather data, decide if you want to sound the alarm or not and then play an mp3 file. We can probably get that done during the first evening” (I think that was an actual quote). Well, we didn’t get it done the first evening; we weren’t even finished by the end of the event. There is always some component whose complexity you underestimate.

Be prepared: One of the problems we had with the Pi was a very unstable network connection (due to a 2-step auth WLAN). That wouldn’t have been a problem if we had brought a switch and a couple of ethernet cables. This might seem archaic (heck, my laptop doesn’t even have an ethernet port) but it might just save your ass. Another thing I didn’t think about beforehand was the different development environments everyone was using. A couple of times I got something to work on my MacBook only to find out that it didn’t work on my neighbour’s Ubuntu machine. I am pretty sure it would have been smarter to set up a virtual machine and work from there. I’ve been wanting to try out Vagrant for quite some time now; that would’ve been the perfect occasion. Oh well, what can you do.

Think about how to wire stuff together: If you are all working on the same codebase things are fine I guess but if there are multiple components which have to work together somehow, you better have a rough idea how to wire everything together. Surprisingly that part went over much more smoothly than I expected, mostly due to the awesomeness that is REST. My code was mostly written in Python, setting up a REST endpoint required about 3 lines of code using Flask. To communicate with other parties the Requests library came in handy. To glue different applications together, this worked just wonderfully.
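To give an idea of how little code such an endpoint needs, here is a sketch, assuming Flask is installed. The /alarm route and its payload are made up for illustration and are not the hackathon code:

```python
from flask import Flask, jsonify  # third-party: pip install flask

app = Flask(__name__)


@app.route("/alarm")
def alarm_state():
    # Hypothetical payload; the real project's fields are not documented here.
    return jsonify(ringing=False)


if __name__ == "__main__":
    # Another component could then poll the endpoint with Requests, e.g.
    #   requests.get("http://localhost:5000/alarm").json()
    app.run(port=5000)
```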

It’s a HACKathon: Half an hour before the demos we realized that the code to turn off the alarm via motion gesturing, which was working fine on the dev machine, couldn’t be run on the machine we were using for the demo. After some back and forth we decided to install the motion software, which takes a picture every time it detects movement in front of the webcam, and just monitor the folder it saves the images to. If any new pictures have been added, the webcam picked up some movement. Ugly, I know, but it worked surprisingly well for the demo. Which is what counts in the end I guess.
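The folder-watching trick can be sketched in a few lines of standard-library Python. This is a reconstruction of the idea, not the code we ran; the function names are made up, and it assumes the snapshots are saved as .jpg files:

```python
import time
from pathlib import Path


def new_images(folder, seen):
    """Return image files that appeared in folder since the last call."""
    current = {p.name for p in Path(folder).glob("*.jpg")}
    fresh = sorted(current - seen)
    seen.update(fresh)  # remember them so each file counts only once
    return fresh


def watch(folder, on_motion, interval=1.0):
    """Poll forever; call on_motion() whenever new snapshots show up."""
    seen = set()
    new_images(folder, seen)  # ignore pictures that were already there
    while True:
        if new_images(folder, seen):
            on_motion()
        time.sleep(interval)
```

Calling watch("/path/to/snapshots", stop_alarm) with some stop_alarm function would then silence the alarm as soon as the webcam picks up movement.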

By the way, the bathroom graffiti social sharing site won. So if you feel that your restroom quips are underappreciated, you should definitely check out plungr. Boy, do I hope that thing catches on.

Volumecontrol via Web

You know what the cloud is good for? Filesyncing, mail, listening to music and a bit of recreational netflix’ing now and then. You know what it isn’t good for? Adjusting the volume of your living room Linux server which is hooked up to the television and every time you sit down to watch some Doctor Who you just slightly misjudge the mid-episode volume level and now you either have to endure this situation for the next 35 minutes or you have to get up from the couch. Both of which sound subpar.

Well, fret no more. I threw a small webservice together over the weekend which provides a simple web interface to increase and decrease the volume on any Linux PC, as long as it is using ALSA audio, which I admit might pose a problem. My server is running Lubuntu, which still uses ALSA, but one could probably make it work for PulseAudio. A running PulseAudio server can be controlled with pactl, so setting the volume can be done by calling an external command, e.g.:

pactl set-sink-volume 0 -- -5%    

I don’t really know how to get the current volume level that way, but one could probably go without that.
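From Python, the pactl call above could be wrapped roughly like this. It is a sketch for the PulseAudio case only, the helper names are made up, and it is not part of the code on GitHub:

```python
import subprocess


def volume_command(delta_percent, sink="0"):
    """Build the pactl call for a relative volume change, e.g. -5 or +5."""
    # "--" keeps pactl from reading a negative value such as -5% as an option.
    return ["pactl", "set-sink-volume", str(sink), "--", f"{delta_percent:+d}%"]


def change_volume(delta_percent, sink="0"):
    """Run the command; raises CalledProcessError if pactl reports failure."""
    subprocess.run(volume_command(delta_percent, sink), check=True)
```

Passing the arguments as a list avoids going through a shell, which matters once the value comes from a web request.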

You can find the code on GitHub. If you don’t already have flask and the Python ALSA bindings you can install them and run the server with the following commands:

pip install flask
sudo apt-get install python-alsaaudio
./vserver.py

#ECSW: Speaker Series Recap

Eastcoast Startup Week was packed with great events. I’ve been to 20 Mentor Minutes last week, where I met some amazing and bright people and had a lot of interesting discussions. Startup Weekend, although we didn’t manage to place top 3, was a very good experience and I would recommend participating in such an event to everyone who has even the slightest interest in the whole startup/entrepreneurial thing. The week started with some really cool talks by Tim Burke of 26ones, Dan Martell - Serial Entrepreneur, Ben Yoskovitz of GoInstant and githubber Zach Holman.

Tim Burke

@t1mburke’s 26ones is all about really rapid idea execution. The problem isn’t to get a new idea for a product. Products solve problems, and as an engineer that’s basically what you do all day long anyway: analyze your environment and look for solutions on how to make life easier. The problem is more about validating those ideas. Is it worth investing more resources in an idea or should you just scrap it and start working on something different? You want to come to a decision pretty fast, as every minute you work on a dead end project is better spent working on something which has a future. The way they do it is to start out with some basic market research. Are there too many well funded competitors? Are there already 15 companies working with a very small revenue margin? Back off, on to the next product. And don’t be afraid to kill your idea. A better one will come along.

Does it seem like you actually stumbled upon something here which is worth executing? Are there competitors in the space with obvious shortcomings you can address? Proceed to the next stage, where you fake the whole product, front to back. Register a domain and get your designer to build a full blown website for your product, which, at this point, is vaporware. Your task is now to gain traction with your fake. If you manage to get 10 sign-ups, preorders, calls, whatever, you might be on to something. You now have a unique opportunity to talk with potential customers who want your product without having written a single line of code. Talk to them, explain that you are faking it, ask them for their input. Usually they are excited to help, excited that they can help shape the product to fit their needs. What do they want? What is it they expect from the future product? Only now would you start to implement and invest more man hours into it.

Life is too short to build stuff people won’t buy.

I found this approach to product building to be the most interesting part of the talk. He also mentioned a few rules such as Don’t be afraid of competition and Scratch your own itch but you probably heard this a thousand times already. If not, have a look at the 37signals Rework book. If you have one free afternoon ahead - read it. It shouldn’t take much longer than that to get through the book and there are quite a few interesting points being made.

Dan Martell

@danmartell’s presentation wasn’t so much about technicalities but more about the wisdom you gain when you have been in the startup business for quite some time. Some of the more notable rules would be:

Make no small plans.

Yeah…just don’t! Dream Big. And then fail big! Because…

If you are not failing on a daily basis you are not playing a big enough game.

Probably the most used metaphor during the whole week to describe entrepreneurship was that of a roller coaster. As clichéd as this may sound, there has to be some truth to it if a bunch of very experienced people throw it at you again and again.

Really understand who your customers are.

This ties in nicely with what @t1mburke said. Do your customer validation, and do it properly. There is absolutely no excuse if you fail in this department.

Hustle to help

This one I found really nice. Whenever in the future you have a conversation going along the lines of “Hey. How’s it going. What are you working on?” … your very next question should be

How can I help?

He met his fiancé this way (after some detour) which is undoubtedly one of the cooler outcomes. And if this doesn’t happen, the worst thing that will happen to you is that you meet new people and learn new stuff.

The world would probably be a much nicer place if this were a universal thing.

You are the average of the five people around you

This could perhaps be filed under the bitter truths section. Just look around you: what kind of people are you surrounding yourself with? If you are the most aspiring one, just watch out that you aren’t held back.

Ben Yoskovitz

@byosko is the author of the Lean Analytics book (you can get it here at O’Reilly), so it isn’t surprising that most of his talk centered around the topic of analytics and finding useful metrics for your startup. You want to know when it’s useful to just hack something out and when it’s time to scale, i.e. spend money. When you are using a lean approach you are zig-zagging towards your goal. Besides just providing the core value, your MVP also helps you to learn and understand the problem you are trying to solve a lot better. Once you have the feeling you have learned enough, you might consider pivoting, which is usually the right thing to do. But the one thing you want to stay away from is what he calls the lazy pivot. Don’t go the easy way or pivot to what you think might be the best solution. Focus! You have a business goal and analytics help you to move towards this goal. You just need to learn to set them up and read them properly.

Commonly tracked metrics, “Likes”, “Followers”, “Page Hits” you are probably using? Yeah sorry, they are bad. They are Vanity Metrics. Do you actually have a goal with them or is more just better? Draw a line in the sand so you can actually make a statement if you were successful or not. Change your actions according to those metrics.

A metric which doesn’t change the way you behave is a bad metric.

Track everything but focus on one thing. And then A/B test the shit out of things.

You can distinguish between leading and lagging indicators. The number of complaints, for example, is a leading indicator. If they go up, churn will inevitably follow if you can’t remedy the root cause of your customers’ discontent.

Zach Holman

The last talk, and also the one I was looking forward to the most, was by @holman.

He talked about the working environment you can find at GitHub. Since not one person has left GitHub since it was founded (though a couple were let go), they have to be doing something right. The key is to work asynchronously. Work on what you want, where you want, when you want. This is pretty easy to do when starting out, but it gets harder and harder as you grow. If you build your company around this principle, not only is it a very good tool to keep employee satisfaction high, it also makes hiring a lot easier, as you aren’t bound by geography when you make your hiring decisions. There are a couple of tools to make this work. Perhaps one of the most important is to make all information accessible. Put it in a wiki, build your own stuff, just find a way to keep everyone updated on what’s going on. GitHub, for example, uses a lot of chatrooms for the communication between individual employees. A nice side effect of this is that you don’t get pulled out of the zone constantly, as may happen when working in an office. Coding is a creative endeavor and you can’t enforce creativity. Oh, and it helps you to avoid meetings. No one likes those, right?

If tapping someone on the shoulder instantly makes you a jerk, having a meeting is probably a crime.

To avoid lock-in of knowledge within teams you want the repositories to be as permissive as possible. Of course, teams will center around certain features, but if someone wants to contribute to something different they are certainly allowed to do so by sending a pull request. Internally, almost everything works by sending pull requests.

Another thing is the side-project culture, which is valued very highly. While Google, as far as I remember, mostly abandoned its 20% project, at GitHub you are very welcome to work on stuff you are interested in. A lot of the tools they are using right now started out as someone’s side project. Smartphone-lockable doors? Check. Distributed music playlist for the office? Check.

Of course this whole approach doesn’t work for everyone. There is a huge amount of responsibility for every developer to structure his own work and his day. For that reason, if a team feels it is understaffed, it goes about hiring reinforcement by itself. Obviously they know the requirements for the open position the best and chances are, they have someone fitting this profile in their circle of friends and acquaintances.

A big tech company, run by the engineers and techies without all the administrative overhead too many managers would bring with them - probably every engineer’s wet dream - seems to scale remarkably well. Even now, with over 150 employees, GitHub does a pretty good job of staying as awesome as always.

By the way: Holman blogged about this exact topic, it’s a pretty interesting post and you should definitely hear it from the source. You can find his post here.