Actionable Collaborative Common Operational Picture in Crisis Situation: A Comprehensive Architecture Powered with Social Media Data

Previous works in social media processing during crisis management highlight a paradox: citizens are extensively sharing data from the field of the crisis, while decision-makers are looking for information about the emerging risks they need to address. Several tools already exist to help take advantage of this important new source of data. However, few have made their way to decision-makers, mainly because they remain resource-consuming. That is why the question of a tool able to process social media in near-real time and deliver actionable information from the field is still pending. Based on a state of the art of the Natural Language Processing tools and systems dedicated to the use of social media data to improve the situational awareness of decision-makers, this paper describes a way to provide them with a first comprehensive system whose asset is to address the challenge completely, from the collection of the data to their interpretation and understanding, and finally to offer situational models. In this sense, the paper focuses on detailing the business challenges and the consequent technical challenges that are raised, and on a work-in-progress proposal to address them in a comprehensive manner.


Introduction
Crisis situations are recurrent in our societies. Whatever their nature, severity, extent, duration or complexity, these disruptions are confusing situations. The need for information then becomes crucial to provide an adequate response to ongoing events. On the one hand, emergency management cells aim at building a common operational picture (COP) from the information sent by the responders in the field. On the other hand, citizens increasingly use social media to share what is happening around them during exceptional events. They tend to react more during crisis situations and share information about them [1]. Social media such as Twitter, Facebook or Instagram therefore participate in the exchange of crisis-related information with unprecedented timeliness. For this reason, it is of utmost importance to be able to integrate citizens' social media data into the COP.
Up to this day, although many tools exist in the literature, information systems incorporating social media data into the COP are rarely used in practice. This leads to a paradoxical situation where decision-makers are looking for information to organize their resources during crisis situations, while the willingness of victims and witnesses to share information in real time is not considered. Such social media processing systems and crisis-related resources have already been developed, and the most noticeable of them are explored by [2]. Moreover, [3] points out some possible necessary improvements. Among them, "how the added-value information extracted thanks to such social media-oriented system should be integrated in the decision-making process?" echoes this context.
In a collaborative context such as crisis management, where heterogeneous actors are acting jointly or simultaneously in a coordinated manner, the ability to design a relevant, trustworthy and shareable COP is of the highest priority. Dispersed visions and compartmentalized information among responders are totally inappropriate in a context where the interdisciplinarity and complementarity of coordinated, interoperable responders are not only the doctrine, but also the best way to have a chance of performing an efficient collaborative response.
Consequently, the point of this article is: How to fully integrate the information provided by social media data, in times of crisis, into the COP to improve the common understanding and thus the collaborative response of the responders?
The remainder of this article is structured in three parts. Section 2 presents the most significant social media processing systems, their structure and the needs they aim to address. Based on this, Section 3 describes the approach chosen to tackle the broad problem raised here and illustrates it with an example. Finally, Section 4 details the next steps of this work-in-progress research.

Existing Social Media Processing Systems
Following the fact that people are posting text messages, pictures or videos about their surrounding environment during a crisis, the idea of automatically processing this data emerged. [2] lists several existing systems alongside their literature. These systems address various business concerns.
The first concern is data collection. It is achieved through requests executed against the Application Programming Interface (API) offered by the social media platforms. As an example, Twitter provides an API that allows retrieving past messages according to their ID or a username. It also allows monitoring a fraction of the activity through keywords or geoboxes (for location-based collection). This last method is tedious, as only approximately 1% of users have geolocation turned on on their device. Finally, systems such as EMERSE [4] also use the Short Message Service (SMS). But as such access is difficult to obtain, this kind of system represents a small fraction of the existing systems. Because of its ease of access, thanks to its freely available API, Twitter was chosen as the social media platform for this study.
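The two selection criteria mentioned above, keywords and geoboxes, can be illustrated with a minimal local sketch. It deliberately does not call any platform API: the `Message` record, the `(west, south, east, north)` geobox convention and the function names are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


# Hypothetical record type; real platform payloads carry many more fields.
@dataclass
class Message:
    text: str
    lon: Optional[float] = None  # None when the author did not enable geolocation
    lat: Optional[float] = None


# Assumed convention: (west, south, east, north) in decimal degrees.
GeoBox = Tuple[float, float, float, float]


def matches_keywords(msg: Message, keywords: Sequence[str]) -> bool:
    """True if any keyword occurs in the text, ignoring case."""
    text = msg.text.lower()
    return any(k.lower() in text for k in keywords)


def in_geobox(msg: Message, box: GeoBox) -> bool:
    """True only for geotagged messages whose coordinates fall inside the box."""
    if msg.lon is None or msg.lat is None:
        return False
    west, south, east, north = box
    return west <= msg.lon <= east and south <= msg.lat <= north


def select(messages: Sequence[Message], keywords: Sequence[str], box: GeoBox) -> List[Message]:
    """Keep messages that match a keyword or fall inside the geobox."""
    return [m for m in messages if matches_keywords(m, keywords) or in_geobox(m, box)]
```

Because `in_geobox` rejects every message without coordinates, the geobox criterion alone would miss the large majority of messages, which is why keyword matching is kept as an alternative criterion in `select`.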
The processing of the data is considered the most difficult task. Some of the main challenges addressed by existing systems are listed below:
• Event detection is a task considered by all the existing systems. However, systems such as ESA (Emergency Situation Awareness) [5] aim at detecting bursts on social media, based on triggers set on data volume or specific keywords.
• The most addressed challenge is filtering. Most of the systems, such as AIDR [6], Tweedr [7], ESA [5], Twitris [8], Twitcident [9] and EMERSE [4], filter the tweets according to whether or not they are related to the crisis.
• In order to improve the filtering of the tweets, additional context may be needed. So, semantic enrichment aims to add more information to the existing tweets. For example, Twitris [8] performs a sentiment analysis on the tweets in order to add a sentiment feature to them. Also, entity extraction (such as Part-of-Speech tagging, to classify words according to their grammatical role, or Named Entity Recognition in NLP) allows retrieving specific information from the message, such as names (brands, companies, people…), locations (place, street name…) or numbers (money, quantities, etc.). Twitcident [9] uses this technique to improve the semantics of the tweets. All these elements can then be associated with the metadata of the tweet to infer the context of the monitored event. Systems such as SensePlace2 [16] can help infer the location of the user thanks to the locations mentioned in the tweet.
• Filtering is a good first step towards a better use of social media data. However, in order to improve the usefulness of the tweets, several systems aim to classify them into different categories or to identify clusters of tweets. For instance, Tweedr [7] classifies the tweets according to their relation to casualties, damage, missing persons, projectile damage and health services. Twitris [8], Twitcident [9], AIDR [6] and EMERSE [4] also perform a similar classification with different categories. Systems such as CrisisTracker [10] or ESA [5] work at identifying tweets related to the same topic during an event thanks to their metadata, entities, keywords, time period or other relevant features. These features help to filter the data collected and aggregate the associated messages.
• Veracity of the data is also a huge concern in crisis management. Social media may convey useful information during an emergency but may also contain rumors or fake news. Identifying rumors is a challenging task, even for humans. To do so, identifying and describing what makes a rumor is key. [11] studies how rumors propagated on social media after the March 2012 tsunami and identifies several features common to rumors. A few years later, [12] identifies rumor signatures during the Boston bombing in order to characterize them. Retweets (RT), previous activity, or whether or not the user is present at the event site are all potential features that may be used to assess the veracity of the data on Twitter [13]. This work has made it possible to train machine learning models to automatically detect rumors. [14] presents a rumor detector and a classifier. Their model is based on Twitter NLP tools, a WEKA framework dedicated to tweet processing. Also, [15] introduces a real-time rumor debunking system using a Support Vector Machine, described as effective even with only five tweets.
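As an illustration of the volume-based triggers mentioned for event detection, the following is a minimal sketch of burst detection on per-interval message counts. It is not the method used by ESA or any other cited system: the rule (count above the mean plus `k` standard deviations of the preceding `window` intervals) and the parameter values are assumptions chosen for simplicity.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def detect_bursts(counts: Sequence[int], window: int = 5, k: float = 3.0) -> List[int]:
    """Return the indices of intervals whose message count exceeds
    mean + k * standard deviation of the `window` preceding intervals."""
    bursts = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        threshold = mean(history) + k * pstdev(history)
        if counts[i] > threshold:
            bursts.append(i)
    return bursts


# Messages per minute (made-up values); the sixth interval stands out.
per_minute = [10, 12, 11, 9, 10, 50, 11]
burst_indices = detect_bursts(per_minute)
```

The first `window` intervals are never flagged because they have no complete history, and an interval following a burst is judged against a history that contains the burst, which raises its threshold.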
The systems described previously are summarized in Tab. 1.
Tab. 1. Classification of the different systems mentioned in the literature review, according to the business issue they address.

Columns: Filtering, Semantic Enrichment, Classification/Clustering, Geotag, Veracity
ESA [5]: x x x
AIDR [6]: x x
Tweedr [7]: x x
Twitris [8]: x x x x
Twitcident [9]: x x x
EMERSE [4]: x x
CrisisTracker [10]: x x
SensePlace2 [16]: x x
[15]: x x
[12]: x x

Collaboration requires an adapted medium. So, the way the results of the processing part are displayed is crucial. It varies according to the purpose of the system. However, most of them share common elements. A geographical map allows the information to be gathered on a single, shared representation. Systems such as Twitris [8], SensePlace2 [16], Twitcident [9], CrisisTracker [10] and ESA [5] plot the tweets on a map in order to provide quick, visual information on the location of the tweets. However, this requires that the Twitter user enables geolocation on his/her tweets, which is the case for only 1% of the total volume. So, systems such as SensePlace2 go one step further, with geolocation inference according to the places mentioned in the tweet. Other common representations, such as word clouds or pie charts, are used to visually summarize the most frequent words captured by the system. Finally, a timeline of the messages marked as relevant by the system is provided in order to help the user keep access to the data.
Also, in order to improve performance, some systems, such as [6] and [10], involve digital citizens during the processing to label some of the data. These human-annotated data are then used to train the algorithms online.
Other industrial tools exist in this field. In the crisis-related domain, the most famous is Ushahidi, but many commercial/advertising solutions also exist.
To sum up, previous systems aim to filter the flow of information delivered by social media, classify the messages according to some categories, and then display those that may be interesting for the decision-makers. But these filtered data ignore the existing environment, the current context of the situation or the organization, and the mandatory collaboration. In addition, they do not feed the COP used by the organizations. The processing of this information coming from social media and its cross-checking with all the other sources of information are left to the user or to the decision-maker.
The question of the integration of these data into the organization remains, in particular the automation of the data collection, processing and display to provide actionable information to the responders.

Research Motivations
Let's consider an example. An emergency agent is using a social media processing system in a crisis cell during a flood. He/she has a system similar to one of those described in the previous section, i.e., a map, corresponding to the COP used by all the actors of the crisis response organization, and a timeline of the tweets sent in a specific area. Then, a tweet appears, saying "The dike at Atherton St is about to fail! Help #911". Current systems make it possible to display such crisis-related data. But then, the user has to check all the emerging tweets, one by one, to see whether other data relate to this event, and maybe send resources, if they are available, to get more information.
An improved version of the current system would have notified the user that there is a school nearby that can be used as a shelter according to the contingency plan. It would then have triggered an alarm to the crisis cell, and offered to send resources to evacuate the place and maybe people to assess whether or not the dike can still hold. In addition, all these elements would have been displayed on the COP. This second version would be a version that integrates the social media data into the actual organization. The remainder of the paper proposes an approach that leads to this second version of the existing systems and which provides (i) an understanding of the situation and (ii) a corresponding decision support.
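The "school nearby that can be used as a shelter" step of this scenario can be sketched as a proximity query against the facilities known to the organization. Everything here is hypothetical: the `Facility` record, the reduction of the contingency plan to a boolean `shelter` flag, the 1 km default radius, and the assumption that the event has already been geolocated.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt
from typing import List, Sequence


@dataclass
class Facility:
    name: str
    kind: str      # e.g. "school", "hospital"
    lat: float
    lon: float
    shelter: bool  # flagged as a shelter in the (hypothetical) contingency plan


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres on a sphere of radius 6371 km."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def nearby_shelters(event_lat: float, event_lon: float,
                    facilities: Sequence[Facility],
                    radius_km: float = 1.0) -> List[Facility]:
    """Shelters within radius_km of the event, closest first."""
    hits = [(haversine_km(event_lat, event_lon, f.lat, f.lon), f)
            for f in facilities if f.shelter]
    # Sort on the distance only, so Facility objects are never compared.
    return [f for d, f in sorted(hits, key=lambda pair: pair[0]) if d <= radius_km]
```

In such a design the result of `nearby_shelters` would be attached to the alert raised for the crisis cell rather than left for the operator to look up manually.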

Integrating Information into the Collaborative Decision-making Process
The proposed approach relies on an understanding of the actual rescue organization by the algorithms, to better match it. To do so, such a system should be able to match the current logic of the user. In [17], the authors highlight that in American 911 call centers, staff ask questions in order to answer the "6Ws". During a phone call, the questions asked by the call-taker are supposed to answer Where (Where is the emergency occurring?), What (What is happening? What kind of emergency is it?), Weapon (Are there weapons involved in the emergency?), When (If the event is not currently being witnessed, when did it occur?), Who (How many people are concerned by the emergency?), and Why (Why is the emergency occurring?).
Any answer related to one of these questions is then forwarded to the emergency teams. Moreover, the "6Ws" framework is shared by several call centers in the United States. This behavior highlights that the call-takers follow an underlying mental framework. They are specifically looking for answers to one of these "6Ws". This observation produces two main results. First, it highlights that in a collaborative environment, it is crucial that the stakeholders share a common vocabulary in order to perform well. Given the organization of the American 911, where the call centers are in charge of the rescue, and police and firefighters have to work side by side, implementing these common concepts and using them in the response phase of a crisis seems an obvious requirement, yet one which is not always addressed, in particular when it comes to setting up an effective collaborative decision in terms of the actions to take.
Secondly, while the "6Ws" were highlighted particularly in a Charleston call center [17], it can be assumed that such a framework is or should be used in a generic way, for any crisis-related call center or crisis cell, since its intrinsic purpose is precisely to get a good understanding of the situation that would be shared by all crisis stakeholders and responders. In particular, rescue organizations take decisions according to a specific set of concepts, such as the environment, the resources that they have, the people involved, the possible additional threats, etc. Such concepts have already been defined in ontologies or metamodels. As a metamodel provides a representation or a framework of an observed situation, it enables conceptualizing the elements that make up the situation in the form of interdependent concepts. This representation can then be interpreted as a common vocabulary shared between the organizations. Therefore, it becomes possible to describe behavior, processes, and interactions between the different actors according to occurring events.
The current approach of existing systems is that the decision makers of the organizations receive data from social media and then have to process them according to their own vocabulary. The approach proposed in this paper is instead to couple this approach with a metamodel that will both (i) provide the common vocabulary to use among all responders and decision makers and (ii) help generate actionable information (i.e. information that can be used by a decision support system to provide a collaborative response behavior that better coordinates all stakeholders). This information is organized through an information model generated according to the metamodel. In this sense, the metamodel is used to provide a situation model which instantiates its concepts and the associations between concepts (Fig. 1). Doing so makes it easier to monitor the operations and collaborate around the COP. So, the system is going to embed a module dedicated to the instantiation of crisis situation models based on the metamodel's classes.
Many authors have proposed metamodels or ontologies that cover the concepts involved during an emergency. [18] establishes a metamodel based on the interdependence of networks to study the impact of a crisis. [19] introduces a metamodel focused on crisis entities. Its purpose is to describe the activities they carry out with each other in order to respond in the best way to the crisis. [20] defines a metamodel for each of the 4 phases of the crisis (preparedness, mitigation, response and recovery). The application depicted in this paper only considers the crisis response phase, so the three other phases are not taken into account during the evaluation. [21] published an ontology to describe data flow, from the event to the decision-maker. Finally, [22] defines an ontology around terrorist risk to help decision-makers prevent terrorist attacks on a territory. All this previous work shows the interest of metamodels and ontologies in crisis management.
Previous work linked to these points has been done in order to correlate sensor data with several metamodel entities. [23] describes how sensor data are correlated with different entities from a metamodel in the use case of the Loire floods. In this case, the data coming from the sensors are automatically processed using [19]. This way, the system maintains a COP where water level data are considered in order to indicate the consequences of a water rise.
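The idea of a situation model that instantiates the concepts and associations of a metamodel can be sketched with two small data structures. This is only an illustration of the principle: the class names, the `threatens` relation and the instance identifiers are hypothetical and are not taken from the metamodel of [19].

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Metamodel:
    classes: FrozenSet[str]                        # concept names
    associations: FrozenSet[Tuple[str, str, str]]  # allowed (source class, relation, target class)


@dataclass
class SituationModel:
    metamodel: Metamodel
    instances: Dict[str, str] = field(default_factory=dict)        # instance id -> class name
    links: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_instance(self, instance_id: str, cls: str) -> None:
        """Instantiate a concept; reject classes unknown to the metamodel."""
        if cls not in self.metamodel.classes:
            raise ValueError(f"unknown metamodel class: {cls}")
        self.instances[instance_id] = cls

    def link(self, source: str, relation: str, target: str) -> None:
        """Relate two instances; reject associations the metamodel does not allow."""
        triple = (self.instances[source], relation, self.instances[target])
        if triple not in self.metamodel.associations:
            raise ValueError(f"association not allowed: {triple}")
        self.links.append((source, relation, target))


# Hypothetical two-class metamodel and one observed situation.
mm = Metamodel(frozenset({"Threat", "Environment"}),
               frozenset({("Threat", "threatens", "Environment")}))
model = SituationModel(mm)
model.add_instance("dike_failure", "Threat")
model.add_instance("atherton_st", "Environment")
model.link("dike_failure", "threatens", "atherton_st")
```

Validating every instance and link against the metamodel is what keeps the situation model expressed in the vocabulary shared by all responders.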

Current Implementation
Following the previous assumption, the evolution proposed in this paper is to enhance existing systems with a metamodel, alongside the filter. The proposed architecture is represented in Fig. 2. It is composed of different modules, each achieving a different operation of the global processing. The first module is the data collection module (Tweet collection in the figure). Then, the Information extraction module is composed of a first classifier, able to identify whether or not the data is related to a crisis, and proceeds to semantic enrichment or data normalization in order to contextualize the collected and relevant data. Next comes the Reconciliation module. Paired with the Metamodel, it classifies the information contained in the tweets with the concepts that fit best. This outputs a situation model, which then feeds the COP. However, this solution raises several challenges. First, the classifier contained in the Reconciliation module needs to "reconcile" the data it receives with the metamodel classes, in order to instantiate them. Secondly, the modules need to handle the relations between the different classes instantiated previously, in order to extract the underlying links between the different data (i.e. correlate data together and relevantly generate consistent parts of situational models).

Fig. 2. Architecture of the proposed solution with the different elements that may compose the final system.
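The chaining of the modules described above can be sketched as a small pipeline in which each module is a pluggable function. The function names and the keyword-based stand-ins are hypothetical placeholders for the real classifier, enrichment and reconciliation steps, and the term-to-class lookup is invented for the example.

```python
from typing import Callable, Dict, Iterable, List

Info = Dict[str, object]
Instance = Dict[str, str]


def run_pipeline(raw_tweets: Iterable[str],
                 is_crisis_related: Callable[[str], bool],
                 enrich: Callable[[str], Info],
                 reconcile: Callable[[Info], List[Instance]]) -> List[Instance]:
    """Chain the modules: collected tweets -> information extraction -> reconciliation."""
    situation_model: List[Instance] = []
    for tweet in raw_tweets:
        if not is_crisis_related(tweet):   # first classifier of the extraction module
            continue
        info = enrich(tweet)               # semantic enrichment / normalization
        situation_model.extend(reconcile(info))
    return situation_model


# Toy stand-ins, for illustration only.
def toy_is_crisis_related(tweet: str) -> bool:
    return any(word in tweet.lower() for word in ("dike", "flood"))


def toy_enrich(tweet: str) -> Info:
    return {"text": tweet, "tokens": tweet.lower().split()}


def toy_reconcile(info: Info) -> List[Instance]:
    lookup = {"dike": "Environment", "flood": "Threat"}  # hypothetical term -> class table
    return [{"term": t, "class": lookup[t]} for t in info["tokens"] if t in lookup]


result = run_pipeline(["The dike may fail", "Lovely lunch today"],
                      toy_is_crisis_related, toy_enrich, toy_reconcile)
```

Passing the modules as functions keeps each one replaceable, which matters here because the filtering step reuses existing solutions while the reconciliation step is still under development.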

Metamodel Chosen in this Approach
Crisis situation modelling brings constraints on the choice of a metamodel. First, it must be instantiable and provide a model of the crisis situation, either manually or automatically. Crisis situations also require these models to evolve, to keep them up to date based on the information acquired throughout the rescue operations. Finally, the models generated must be usable by their users and, here, specifically by most of them. So, these criteria can be summarized in the following requirements: (i) instantiate it manually and/or automatically, (ii) continuously update the built model, and (iii) exploit it. The system developed in this paper proposes to use the metamodel described in [19], which allows describing the management of the concepts underlying crisis resolution. In addition, it also fits the criteria mentioned previously. First, it is instantiable, as shown by [24], but also because it fits with concepts already handled by first responders, the "6Ws", which can be easily linked to the metamodel classes (Who-Actors, Where-Environment, What-Threat, etc.). Secondly, the situation model can be changed according to occurring events and the different data it receives. Finally, it responds to the goal set previously: it enables a better collaboration through common concepts shared between the stakeholders, and it is usable to display a COP. Consequently, this metamodel is the framework used to design the information in the proposed approach.
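The link between the "6Ws" and the metamodel classes can be written down as a simple lookup. Only the three pairings named in the text (Who-Actors, Where-Environment, What-Threat) are encoded; the sketch makes no assumption about the remaining Ws and routes them to an `unmapped` bucket, which is itself a hypothetical convention.

```python
from typing import Dict, List, Optional

# Only the pairings stated in the text; Weapon, When and Why are left unmapped.
SIX_WS_TO_CLASS: Dict[str, str] = {
    "Who": "Actors",
    "Where": "Environment",
    "What": "Threat",
}


def metamodel_class_for(w: str) -> Optional[str]:
    """Return the metamodel class paired with a W, or None if no pairing is known."""
    return SIX_WS_TO_CLASS.get(w.strip().capitalize())


def route_answers(answers: Dict[str, str]) -> Dict[str, List[str]]:
    """Group call-taker answers by metamodel class; unknown Ws go to 'unmapped'."""
    routed: Dict[str, List[str]] = {}
    for w, answer in answers.items():
        cls = metamodel_class_for(w)
        key = cls if cls is not None else "unmapped"
        routed.setdefault(key, []).append(answer)
    return routed


routed = route_answers({"where": "Atherton St",
                        "what": "dike about to fail",
                        "why": "unknown"})
```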

Filtering the Data Flow
The filtering task is crucial, as it reduces the processing load on humans and allows processing only crisis-related data. Data filtering already achieves acceptable performance according to the literature. Consequently, the system presented in this paper does not intend to improve this component. Filtering the data will reimplement existing solutions used and tested through systems such as those presented in the literature review (see Section 2). This will be implemented in the Tweet collection module. Once the data is filtered, it is then possible to organize it and extract meaningful information, which will ultimately lead to the instantiation of the different classes of the metamodel.

Information Representation and Word Embeddings
However, being able to extract information from tweets requires that the algorithms can catch the meaning behind the words used in the tweets. To get such a representation of words, the proposed solution uses a word embedding. A word embedding contains all the words found in the corpus and represents each of them through a vector, according to the context in which they are found. The context is represented by the words surrounding the targeted word. The result is an n-dimensional space (where n is set at training time) containing m vectors (where m is the number of different words in the dataset). Fig. 3 a) shows a 2D representation of a word embedding, after a reduction of the n dimensions using the t-SNE algorithm [26]. The interesting feature of word embeddings is that while each word keeps its semantic sense, it is expressed relative to the semantics of all the other words.
Here, the proposed solution relies on a crisis-related dataset, in order to better capture the vocabulary related to crises. However, tweets sometimes contain words that are missing from the word embedding. This issue is common for domain-specific word embeddings. The obvious way to overcome it is to get more data, which allows the missing words to be captured. But [27] proposes another solution, which consists in starting from general word embeddings (which therefore have a large vocabulary) and making them domain-specific. To do this, they modify the embedding by emphasizing the areas of the vocabulary that are related to the targeted domain. This proposition also suits the remainder of the proposed approach well.
Thus, the resulting Information Extraction module ensures a filtering step, to keep only crisis-related tweets, and then an information representation step where all the words used in the remaining dataset are compared with one another, in order to provide a crisis-specific semantic context.
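The principle that a word is represented by the words surrounding it can be illustrated with a simplified, count-based stand-in. This is not a trained embedding: instead of a learned n-dimensional vector, each word gets a sparse vector of co-occurrence counts whose dimension equals the vocabulary size. The window size and the toy sentences are assumptions.

```python
from collections import Counter, defaultdict
from math import sqrt
from typing import Dict, List, Sequence


def context_vectors(sentences: Sequence[List[str]], window: int = 2) -> Dict[str, Counter]:
    """Count, for every word, the words appearing within `window` positions of it."""
    vectors: Dict[str, Counter] = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


corpus = [["haze", "covers", "city"],
          ["smog", "covers", "city"],
          ["train", "leaves", "station"]]
vecs = context_vectors(corpus)
similar = cosine(vecs["haze"], vecs["smog"])      # the two words share all their contexts
unrelated = cosine(vecs["haze"], vecs["train"])   # no shared context
```

Words used in the same contexts end up with similar vectors even though they never co-occur, which is the property the reconciliation step relies on.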

Information/Classes Reconciliation through Clustering
Using this representation of words and their semantics, the system is going to fit the information contained in the tweets with the classes of the metamodel. To do this, the approach adopted is to identify the semantic clusters present in the word embedding. This will be accomplished using clustering algorithms. Fig. 3 b) shows a possible cluster in a word embedding. Word clusters that are semantically close to a metamodel class are then highlighted, as performed in [27].
All this work will be incorporated into the Information/Classes Reconciliation module. The output of this module is mapped on the COP, with the different instances identified on social networks linked to the other instances to which they may be connected. Developing this part should be iterative, as it will require tuning the word representation and the clustering in order to get satisfactory results.
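One possible way to relate word clusters to metamodel classes is sketched below. It is an assumption, not the procedure of [27]: words are grouped with a plain k-means, and each cluster is then labelled with the class whose seed word lies closest to the cluster centroid. The 2D vectors, the seed words and the choice of k-means are all made up for the example.

```python
from math import dist
from typing import Dict, List, Sequence

Vector = Sequence[float]


def kmeans(points: List[Vector], k: int, iterations: int = 20) -> List[int]:
    """Plain k-means; centroids start at the first k points, so the result is deterministic."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(x) / len(members) for x in zip(*members)]
    return labels


def label_clusters(words: List[str], vectors: List[Vector], labels: List[int],
                   class_seeds: Dict[str, str]) -> Dict[int, str]:
    """Map each cluster id to the class whose seed word is nearest to the cluster centroid."""
    index = dict(zip(words, vectors))
    result: Dict[int, str] = {}
    for c in sorted(set(labels)):
        members = [v for v, label in zip(vectors, labels) if label == c]
        centroid = [sum(x) / len(members) for x in zip(*members)]
        result[c] = min(class_seeds, key=lambda cls: dist(centroid, index[class_seeds[cls]]))
    return result


# Toy 2D vectors standing in for a reduced word embedding.
words = ["flood", "police", "storm", "fire", "firefighter", "medic"]
vectors = [(0.0, 0.0), (5.0, 5.0), (0.1, 0.2), (0.2, 0.1), (5.1, 4.9), (4.9, 5.2)]
labels = kmeans(vectors, k=2)
cluster_classes = label_clusters(words, vectors, labels,
                                 {"Threat": "flood", "Actors": "police"})
```

As noted in the text, both the word representation and the clustering need tuning; in this sketch that would mean revisiting `k`, the initialization and the choice of seed words.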

Conclusion
This paper presented a novel social media analysis system which aims to improve the COPs used in crisis cells through a better integration of social media data and a better sharing of this information.
The contributions are: (i) a system which takes social media processing away from humans, to let them focus on decision making, and (ii) the instantiation of an information model thanks to semantic clusters found in a word embedding. This information model is generated according to the classes provided by a metamodel.
Future work will consist in further experimentation with the modules in order to provide consistent outputs for crisis responders. Reconciliation between the entities extracted by the classifier and the model is going to require time to find the appropriate parameters for the word embedding and the clustering algorithm.
Moreover, in order to improve the accuracy and/or veracity of the data, it may be interesting to merge the data coming from social media with data coming from sensors placed on the ground (water level, CCTV…). Also, this work is not dedicated

Fig. 1. Differences between the approach proposed in this paper and the one used in existing systems.

Fig. 3. 2D representation of a word embedding created using the Singapore_Haze_2013 dataset from the CrisisLex archive [28]. Figure a) gives an overview of the word embedding. Figure b) gives an overview of a cluster of terms related to "haze".