Entries tagged parsing

Minimal Mensa Plan

Posted on 9. Dezember 2014 Comments

We’ve been doing a lot of website scraping for a university project latel so I decided a little app to scrape the universitys‘ cantine website for the lunch menu.

minimalmensaplan-screenshot

A network connection in the main Activity is not allowed so I’m using a private class for that. Once the doInBackground method is over onPostExecute is automatically called.

Luckily the jSoup library which is used for parsing also brings a way download a website for parsing.

Document doc = Jsoup.connect("http://speiseplan.studierendenwerk-hamburg.de/de/520/2014/0/").get();

There is only one category class and it contains the date. The dish-description class contains what you will later see in the app. The size is needed for a for-loop later on.

date = doc.select(".category").text();
int maxDishes = doc.getElementsByClass("dish-description").size();

With the next piece of code all the dish descriptions are extracted. The original website contains details about the food (made with alcohol, pork or if it’s vegeterian etc). For blending this out I use a regular expression which filters out:

1 non-word character(\W = an opening bracket), then possibly multiple digits (\d) and non-word characters (\W = commas) and then another non-word character (\W = a closing bracket)

String dish = doc.getElementsByClass("dish-description").get(i).text().replaceAll("(\\W[\\d\\W]*\\W)", " ");

These dishes are then saved together with every second price (the one for students, the other one’s for employees) in a HashMap, which is then added to a list. For recognizing later, the dishes get the key „dish“ and the prices the key „price“.

Once the data extraction is done, onPostExecute is automatically called. The date is set to a TextView above the ListView and a SimpleAdapter is populating the list of HashMaps into the layout

dateTextView.setText(date);
simpleAdapter = new SimpleAdapter(MainActivity.this,
dishList, R.layout.list, new String[] { "dish",
"price" }, new int[] { R.id.text1, R.id.text2 });
setListAdapter(simpleAdapter);

Since i’s called Minimal Mensa Plan, no other features (such as caching or selecting a cantine) are available. The app in the Play Store is used for scraping the cantine at the campus Berliner Tor but it might as well be used for others, just by changing the URL. It’s released under the MIT License and available at GitHub.